path: root/src/Text/Pandoc/Readers
Commit message (author, date)
* Docx reader: Make use of track-changes option. (Jesse Rosenthal, 2014-06-25)
|
* Docx reader: Remove unnecessary filter in Parse. (Jesse Rosenthal, 2014-06-25)
|   mapMaybe does the filtering for us.
* Docx reader: Add rudimentary track changes support. (Jesse Rosenthal, 2014-06-25)
|   This will only read the insertions, and ignore the deletions.
* Docx reader: Parse Insertions and Deletions. (Jesse Rosenthal, 2014-06-25)
|   This is just for the Parse module, reading it into the Docx format. It still has to be translated into pandoc.
* Docx Reader: Add change types (Jesse Rosenthal, 2014-06-25)
|   Insertion and deletion. Dates are just strings for now.
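The track-changes commits above describe two change types (insertion and deletion, with dates kept as plain strings) and a reader that keeps insertions and ignores deletions. The sketch below models that over hypothetical, cut-down types: `Change`, `RunPart`, `TrackMode` and `resolve` are illustrative names, not pandoc's Docx.Parse API. `Reject` is an assumed mirror-image mode, included only to show how an option could select between behaviours.

```haskell
-- Hypothetical, cut-down model of tracked changes in a docx paragraph.
-- This is NOT pandoc's Docx.Parse type; all names here are illustrative.
data Change
  = Insertion String String String  -- change id, author, date (dates kept as plain strings)
  | Deletion  String String String
  deriving (Show, Eq)

data RunPart
  = PlainText String          -- untracked text
  | Changed Change [String]   -- text runs wrapped in an insertion or deletion
  deriving (Show, Eq)

-- Hypothetical stand-in for a track-changes setting.
data TrackMode = Accept | Reject
  deriving (Show, Eq)

-- Resolve tracked changes to plain text runs.
-- Accept: keep inserted text, drop deleted text ("read only the insertions").
-- Reject: keep deleted text, drop inserted text.
resolve :: TrackMode -> [RunPart] -> [String]
resolve mode = concatMap keep
  where
    keep (PlainText s)               = [s]
    keep (Changed (Insertion {}) ss) = if mode == Accept then ss else []
    keep (Changed (Deletion {})  ss) = if mode == Reject then ss else []
```

With `Accept`, a paragraph of plain text `"a "`, an insertion `"new"` and a deletion `"old"` resolves to `["a ", "new"]`.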
* Docx reader: Ignore zero (or negative) indent (Jesse Rosenthal, 2014-06-24)
|   If a block has an indentation less than or equal to zero, it should not be treated as a block quote.
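A minimal sketch of the zero-or-negative indent rule, assuming a hypothetical `ParStyle` record that holds an optional left indent. The reader's real paragraph-style types are not shown in this log.

```haskell
-- Hypothetical, simplified paragraph properties. Docx expresses indentation
-- in twentieths of a point; here it is just an optional integer.
newtype ParStyle = ParStyle { leftIndent :: Maybe Integer }

-- Only a strictly positive left indent turns a paragraph into a block quote;
-- zero, negative, or missing indentation leaves it as an ordinary paragraph.
isBlockQuote :: ParStyle -> Bool
isBlockQuote = maybe False (> 0) . leftIndent
```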
* Docx reader: remove T.P.Generic import. (Jesse Rosenthal, 2014-06-24)
|   This marks the removal of the final tree-walk in the code. (Though there is still one in the Lists module.)
* Docx reader: pass definition test. (Jesse Rosenthal, 2014-06-24)
|   This commit also fixes a problem with the previous code pushes, which wouldn't allow code blocks to share a div.
* Docx reader: pass code tests. (Jesse Rosenthal, 2014-06-24)
|
* Add copyright block to T.P.R.Docx.Reducible. (Jesse Rosenthal, 2014-06-23)
|
* Merge pull request #1366 from jkr/reducible3 (John MacFarlane, 2014-06-23)
|\
| |   Docx rewrite and cleanup (in terms of Reducible typeclass)
| * Use Reducible in docx reader. (Jesse Rosenthal, 2014-06-23)
| |   This cleans up the implementation, and cuts down on tree-walking. Anecdotally, I've seen about a 3-fold speedup.
| * Move some of the clean-up logic into List module. (Jesse Rosenthal, 2014-06-23)
| |   This will allow us to get rid of more general functions we no longer need in the main reader.
| * Add new typeclass, Reducible (Jesse Rosenthal, 2014-06-23)
| |   This defines a typeclass `Reducible` which allows us to "reduce" pandoc Inlines and Blocks, like so:
| |
| |       [Emph [Strong [Str "foo", Space]]] <++> [Strong [Emph [Str "bar"]], Str "baz"]
| |         = [Strong [Emph [Str "foo", Space, Str "bar"]], Str "baz"]
| |
| |   So adjacent formattings and strings are appropriately grouped. Another set of operators for `(Reducible a) => (Many a)` is also included.
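The `Reducible` grouping can be approximated with a toy over plain lists. This is a sketch under stated assumptions, not the typeclass itself: it uses a hypothetical four-constructor `Inline`, knows only `Emph` and `Strong`, treats nested single-child formatting as an unordered set of layers (rebuilt with `Strong` outermost), and only groups across the join point of `<++>`. Under those assumptions, `[Emph [Strong [Str "foo", Space]]] <++> [Strong [Emph [Str "bar"]], Str "baz"]` evaluates to `[Strong [Emph [Str "foo", Space, Str "bar"]], Str "baz"]`.

```haskell
import Data.List (sort)

-- Cut-down stand-in for pandoc's Inline type (two formats only).
data Inline = Str String | Space | Emph [Inline] | Strong [Inline]
  deriving (Show, Eq)

-- Formatting layers this sketch knows about. The derived Ord puts FStrong
-- first, so rebuilt inlines nest Strong outside Emph.
data Fmt = FStrong | FEmph
  deriving (Show, Eq, Ord)

-- Peel nested single-child formatting off an inline:
--   Emph [Strong [a, b]]  ==>  ([FEmph, FStrong], [a, b])
peel :: Inline -> ([Fmt], [Inline])
peel (Emph [x])   = let (fs, ys) = peel x in (FEmph : fs, ys)
peel (Strong [x]) = let (fs, ys) = peel x in (FStrong : fs, ys)
peel (Emph xs)    = ([FEmph], xs)
peel (Strong xs)  = ([FStrong], xs)
peel i            = ([], [i])

-- Wrap contents in the given formatting layers, outermost first.
rebuild :: [Fmt] -> [Inline] -> [Inline]
rebuild []             xs = xs
rebuild (FStrong : fs) xs = [Strong (rebuild fs xs)]
rebuild (FEmph : fs)   xs = [Emph (rebuild fs xs)]

-- Merge two neighbouring inlines when they carry the same set of formats
-- (or are both plain strings); otherwise keep them side by side.
combine :: Inline -> Inline -> [Inline]
combine (Str a) (Str b) = [Str (a ++ b)]
combine a b
  | not (null fa) && sort fa == sort fb = rebuild (sort fa) (ca <++> cb)
  | otherwise                           = [a, b]
  where
    (fa, ca) = peel a
    (fb, cb) = peel b

-- Concatenate two inline lists, grouping across the join point only.
(<++>) :: [Inline] -> [Inline] -> [Inline]
xs <++> []       = xs
[] <++> ys       = ys
xs <++> (y : ys) = init xs ++ combine (last xs) y ++ ys
```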
* | Markdown reader: Combine consecutive latex environments. (John MacFarlane, 2014-06-23)
| |   This helps when you have two minipages which can't have blank lines between them. See #690, #1196.
|/
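The commit does not say how the reader combines the environments. As an illustration only, the same effect can be shown as a post-processing pass over a hypothetical, simplified `Block` type: adjacent raw LaTeX blocks are fused into one, joined by a single newline, so no writer can put a blank line (a LaTeX paragraph break) between two side-by-side minipages.

```haskell
-- Cut-down stand-in for pandoc's Block type; the raw block's format is a
-- plain String here rather than pandoc's own Format type.
data Block = RawBlock String String | Para String
  deriving (Show, Eq)

-- Fold runs of adjacent raw LaTeX blocks into a single block joined by one
-- newline. Raw blocks in other formats are left alone.
combineLatex :: [Block] -> [Block]
combineLatex (RawBlock "latex" a : RawBlock "latex" b : rest) =
  combineLatex (RawBlock "latex" (a ++ "\n" ++ b) : rest)
combineLatex (blk : rest) = blk : combineLatex rest
combineLatex []           = []
```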
* Docx reader: Fix spacing in formatting. (Jesse Rosenthal, 2014-06-22)
|   The normalizing tests revealed a problem with unformatted spaces, brought about by `spanTrim`. This fixes it by not trimming the spaces out of spans until they are in their final form.
* Implement new normalization. (Jesse Rosenthal, 2014-06-22)
|   There were some problems with the old str normalization. This fixes those problems. Also, since it drills down on its own, it only needs to be mapped over the blocks, not walked over the tree.
* Markdown reader: Support smallcaps through span. (John MacFarlane, 2014-06-20)
|   `<span style="font-variant:small-caps;">foo</span>` will be parsed as a `SmallCaps` inline, and will work in all output formats that support small caps. Closes #1360.
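One way to recognise the small-caps span is to split its `style` attribute into property/value pairs and look up `font-variant`. The helpers below (`parseStyle`, `spanToInline`, and the cut-down `Inline` type) are hypothetical; this is a sketch, and a real CSS parser would also handle case-insensitive property names, `!important`, and similar details.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- Cut-down stand-in for a few pandoc inlines; attributes are key/value pairs.
data Inline
  = Str String
  | SmallCaps [Inline]
  | Span [(String, String)] [Inline]
  deriving (Show, Eq)

trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace

splitOn :: Char -> String -> [String]
splitOn c s = case break (== c) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn c rest

-- Parse "prop: value; prop2: value2" into trimmed (prop, value) pairs,
-- skipping empty declarations such as the one after a trailing ';'.
parseStyle :: String -> [(String, String)]
parseStyle = map toPair . filter (not . all isSpace) . splitOn ';'
  where
    toPair decl = let (k, v) = break (== ':') decl
                  in (trim k, trim (drop 1 v))

-- A span whose style sets font-variant to small-caps becomes SmallCaps;
-- any other span is kept as a Span.
spanToInline :: [(String, String)] -> [Inline] -> Inline
spanToInline attrs contents =
  case lookup "style" attrs of
    Just st | isSmallCaps st -> SmallCaps contents
    _                        -> Span attrs contents
  where
    isSmallCaps st = lookup "font-variant" (parseStyle st) == Just "small-caps"
```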
* MediaWiki reader: Tightened up template parsing. (John MacFarlane, 2014-06-20)
|   The opening "{{" must be followed by an alphanumeric or ':'. This prevents the exponential slowdown in #1033. Closes #1033.
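The tightened rule amounts to a one-character lookahead after `{{`. In this hypothetical predicate, input such as `{{{{{{` is rejected immediately instead of being tried as a template opening; failing early like this is the kind of change that keeps a backtracking parser from exploring many alternatives, though the commit does not describe the exact cause of the slowdown.

```haskell
import Data.Char (isAlphaNum)

-- A template may only start where "{{" is immediately followed by an
-- alphanumeric character or ':'. Anything else (another '{', a space,
-- end of input) is not treated as a template opening.
startsTemplate :: String -> Bool
startsTemplate ('{' : '{' : c : _) = isAlphaNum c || c == ':'
startsTemplate _                   = False
```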
* MediaWiki reader: Support --trace. (John MacFarlane, 2014-06-20)
|
* Markdown reader: Prevent spurious line breaks after list items. (John MacFarlane, 2014-06-20)
|   When the `hard_line_breaks` option was specified, pandoc would produce a spurious line break after a tight list item. This patch solves the problem. Closes #1137.
* HTML reader: Fix performance issue with malformed HTML tables. (John MacFarlane, 2014-06-20)
|   We let a `</table>` tag close an open `<tr>` or `<td>`. Closes #1167.
* Support --trace in HTML reader. (John MacFarlane, 2014-06-20)
|
* Make strNormalize go bottomUp. (Jesse Rosenthal, 2014-06-20)
|   This was how it used to be before it was folded into blockNormalize.
* Docx reader: Add a comment explaining strNormalize (Jesse Rosenthal, 2014-06-20)
|   `normalize` from Text.Pandoc.Shared is more general. In tests, though, it more than doubles the run time. `strNormalize` does less, but it does what we need. This comment is added for future maintainability.
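To illustrate the "drills down on its own" point in the normalization commits, here is a hypothetical `mergeStrs` over a cut-down `Inline` type. It is a sketch, not the reader's `strNormalize`: it only merges adjacent `Str` elements, recursing into `Emph` and `Strong` itself, so it can be applied to each block's inline list without a generic tree walk.

```haskell
-- Cut-down stand-in for pandoc's Inline type.
data Inline = Str String | Space | Emph [Inline] | Strong [Inline]
  deriving (Show, Eq)

-- Merge adjacent Str elements. The function descends into Emph and Strong
-- by itself, so it only has to be applied to each block's inline list.
mergeStrs :: [Inline] -> [Inline]
mergeStrs (Str a : Str b : rest) = mergeStrs (Str (a ++ b) : rest)
mergeStrs (Emph xs : rest)       = Emph (mergeStrs xs) : mergeStrs rest
mergeStrs (Strong xs : rest)     = Strong (mergeStrs xs) : mergeStrs rest
mergeStrs (x : rest)             = x : mergeStrs rest
mergeStrs []                     = []
```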
* Docx Reader: Normalize DefinitionLists (Jesse Rosenthal, 2014-06-20)
|   Previously DefinitionList had been left out of `blockNormalize`. Now it is included.
* Docx reader: simplify blockNormalize (Jesse Rosenthal, 2014-06-20)
|   Use a function `stripSpaces` instead of recursion. This makes it a bit easier to read and maintain, and simplifies normalizing DefinitionList, which was left out the first time.
* Docx reader: Fix hdr handling in block norm (Jesse Rosenthal, 2014-06-20)
|   `blockNormalize` previously forgot to account for the case in which a Header's inlines did not start with a space.
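A minimal sketch of the `stripSpaces` approach to block normalization, over hypothetical simplified `Inline` and `Block` types. The function names echo the commits, but the code is illustrative rather than the reader's implementation. Because the stripping is applied uniformly, a header whose inlines do not start with a space passes through unchanged.

```haskell
import Data.Char (isSpace)

-- Cut-down stand-ins for pandoc's Inline and Block types.
data Inline = Str String | Space
  deriving (Show, Eq)

data Block = Para [Inline] | Header Int [Inline]
  deriving (Show, Eq)

-- Drop leading Space elements and leading whitespace inside the first Str.
-- Input that does not start with a space is returned unchanged.
stripSpaces :: [Inline] -> [Inline]
stripSpaces (Space : xs) = stripSpaces xs
stripSpaces (Str s : xs) =
  case dropWhile isSpace s of
    "" -> stripSpaces xs
    s' -> Str s' : xs
stripSpaces xs = xs

blockNormalize :: Block -> Block
blockNormalize (Para ils)     = Para (stripSpaces ils)
blockNormalize (Header n ils) = Header n (stripSpaces ils)
```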
* HTML reader: Allow space between `<col>` and `</col>`. (John MacFarlane, 2014-06-19)
|   Test case:
|
|   ```
|   <table border="1">
|     <colgroup>
|       <col> </col>
|       <col></col>
|     </colgroup>
|     <tbody>
|       <tr>
|         <td>X</td>
|         <td>Y</td>
|       </tr>
|       <tr>
|         <td>1</td>
|         <td>2</td>
|       </tr>
|     </tbody>
|   </table>
|   ```
* Introduce blockNormalize (Jesse Rosenthal, 2014-06-19)
|   This will help take care of spaces introduced at the beginning of strings.
* Have Docx reader properly interpret tabs. (Jesse Rosenthal, 2014-06-19)
|
* Add literal tabs to parser. (Jesse Rosenthal, 2014-06-19)
|
* More polish on Haddock reader/writer. (John MacFarlane, 2014-06-18)
|
* Finished first draft of Haddock writer. (John MacFarlane, 2014-06-18)
|
* Rewrote haddock reader to use haddock-library. (John MacFarlane, 2014-06-18)
|   This brings pandoc's rendering of haddock markup in line with the new haddock. Note that we preserve line breaks in `@` code blocks, unlike the earlier version. Modified tests pass. More tests would be good.
* Removed old haddock reader code. Add dependency on haddock-library. (John MacFarlane, 2014-06-18)
|   This also removes the dependency on alex and happy.
* DocBook reader: Support <?asciidoc-br?>. (John MacFarlane, 2014-06-17)
|   Closes #1236. Note, this is a bit of a kludge, to work around the fact that xml-light doesn't parse `<?asciidoc-br?>` correctly. We preprocess the input, replacing that instruction with `<br/>`, and then parse that as a line break. Other XML instructions are simply removed from the input stream.
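The preprocessing step can be sketched as a plain string pass that runs before the XML parser. `preprocessPIs` is a hypothetical name, and the scan is deliberately naive: it does not skip comments or CDATA sections, where `<?` could legitimately appear as text.

```haskell
import Data.List (isPrefixOf)

-- Replace the asciidoc line-break instruction with <br/> and drop every other
-- XML processing instruction ("<?...?>") before the XML is parsed.
preprocessPIs :: String -> String
preprocessPIs s
  | brPI `isPrefixOf` s = "<br/>" ++ preprocessPIs (drop (length brPI) s)
  | "<?" `isPrefixOf` s = preprocessPIs (skipPI (drop 2 s))
  | otherwise = case s of
      []       -> []
      (c : cs) -> c : preprocessPIs cs
  where
    brPI = "<?asciidoc-br?>"
    -- Skip past the closing "?>"; an unterminated instruction eats the rest.
    skipPI t
      | "?>" `isPrefixOf` t = drop 2 t
      | otherwise = case t of
          []       -> []
          (_ : ts) -> skipPI ts
```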
* LaTeX reader: Correctly handle table rows with too few cells. (John MacFarlane, 2014-06-17)
|   LaTeX seems to treat them as if they have empty cells at the end. Closes #241.
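Once the column count is known (for example from the tabular column specification), padding short rows is a small helper. `padRows` is hypothetical and works on any cell type; it is a sketch, not the LaTeX reader's code.

```haskell
-- Pad each row to `ncols` cells by appending empty cells at the end, which is
-- how LaTeX appears to treat a row with too few cells. Rows that are already
-- long enough are left unchanged (replicate of a negative count is []).
padRows :: Int -> a -> [[a]] -> [[a]]
padRows ncols emptyCell = map pad
  where
    pad row = row ++ replicate (ncols - length row) emptyCell
```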
* Fixed compiler warning. (John MacFarlane, 2014-06-16)
|
* Naming: Use Docx instead of DocX. (John MacFarlane, 2014-06-16)
|   For consistency with the existing writer.
* Merge branch 'docx' of https://github.com/jkr/pandoc into jkr-docx (John MacFarlane, 2014-06-16)
|\
| * Add DocX files to tree. (Jesse Rosenthal, 2014-06-16)
| |   This introduces Text.Pandoc.DocX, and its exported `readDocX` function.
* | Org reader: make tildes create inline code. (John MacFarlane, 2014-06-16)
| |   Closes #1345. Also relabeled 'code' and 'verbatim' parsers to accord with the org-mode manual. I'm not sure what the distinction between code and verbatim is supposed to be, but I'm pretty sure both should be represented as Code inlines in pandoc. The previous behavior resulted in the text not appearing in any output format.
* | Small improvement to fix to #1333. (John MacFarlane, 2014-06-16)
| |   This allows blank lines at end of multiline headers.
* | Markdown reader: fixed #1333 (table parsing bug). (John MacFarlane, 2014-06-16)
| |
* | LaTeX reader: handle leading/trailing spaces in emph better. (John MacFarlane, 2014-06-16)
| |   `\emph{ hi }` gets parsed as `[Space, Emph [Str "hi"], Space]` so that we don't get things like `* hi *` in markdown output. Also applies to textbf and some other constructions. Closes #1146. (`--normalize` isn't touched by this, but normalization should not generally be necessary with the changes to the readers.)
* | LaTeX reader: don't assume preamble doesn't contain environments. (John MacFarlane, 2014-06-16)
| |   Closes #1338.
* | HTML reader: Fixed major parsing problem with HTML tables. (John MacFarlane, 2014-06-16)
| |   Table cells were being combined into one cell. Closes #1341.
* | Merge pull request #1344 from mpickering/master (John MacFarlane, 2014-06-16)
|\ \
| | |   Moved extractSpaces to Shared.hs
| * | Moved extractSpaces to Shared.hs (mpickering, 2014-06-16)
| | |   Generalised and moved the extractSpaces function from `HTML.hs` to `Shared.hs` so that the docx reader can also use it.
| |/
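The space handling described for `\emph{ hi }`, and the `extractSpaces` function moved into `Shared.hs` in the last commit above, can be sketched over plain lists. `liftSpaces` and its `Inline` type are hypothetical stand-ins; pandoc's own function is not reproduced here and its signature may differ.

```haskell
-- Cut-down stand-in for pandoc's Inline type.
data Inline = Str String | Space | Emph [Inline] | Strong [Inline]
  deriving (Show, Eq)

-- Move leading and trailing spaces out of a formatting container, so that
-- the contents of \emph{ hi } come out as [Space, Emph [Str "hi"], Space]
-- and a markdown writer never emits "* hi *". Runs of spaces at either end
-- collapse to a single Space.
liftSpaces :: ([Inline] -> Inline) -> [Inline] -> [Inline]
liftSpaces wrap ils = [Space | hasLead] ++ [wrap core] ++ [Space | hasTrail]
  where
    (lead, afterLead)   = span (== Space) ils
    (trailRev, coreRev) = span (== Space) (reverse afterLead)
    core                = reverse coreRev
    hasLead             = not (null lead)
    hasTrail            = not (null trailRev)
```

Passing the constructor (`Emph`, `Strong`) as the `wrap` argument is what lets one helper serve `\emph`, `\textbf` and similar constructions.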