author | Jonas Smedegaard <dr@jones.dk> | 2016-06-06 22:32:09 +0200 |
---|---|---|
committer | Jonas Smedegaard <dr@jones.dk> | 2016-06-06 22:32:09 +0200 |
commit | f8389d48da9921672da3896b85f7ed444ede714d | |
tree | d01ffee40dfdfa03f9b6a18980e5beb0c5dc262e | |
parent | 29583a109043c7b2128b37f13135a326add16996 | |
Imported Upstream version 1.17.1~dfsg
47 files changed, 5125 insertions, 1817 deletions
@@ -1,6 +1,6 @@ % Pandoc User's Guide % John MacFarlane -% January 12, 2016 +% June 4, 2016 Synopsis ======== @@ -273,26 +273,26 @@ General options (LaTeX), `beamer` (LaTeX beamer slide show), `context` (ConTeXt), `man` (groff man), `mediawiki` (MediaWiki markup), `dokuwiki` (DokuWiki markup), `textile` (Textile), `org` (Emacs Org mode), - `texinfo` (GNU Texinfo), `opml` (OPML), `docbook` (DocBook), - `opendocument` (OpenDocument), `odt` (OpenOffice text document), - `docx` (Word docx), `haddock` (Haddock markup), `rtf` (rich text - format), `epub` (EPUB v2 book), `epub3` (EPUB v3), `fb2` - (FictionBook2 e-book), `asciidoc` (AsciiDoc), `icml` (InDesign - ICML), `tei` (TEI Simple), `slidy` (Slidy HTML and javascript slide - show), `slideous` (Slideous HTML and javascript slide show), - `dzslides` (DZSlides HTML5 + javascript slide show), `revealjs` - (reveal.js HTML5 + javascript slide show), `s5` (S5 HTML and javascript - slide show), or the path of a custom lua writer (see [Custom - writers], below). Note that `odt`, `epub`, and - `epub3` output will not be directed to *stdout*; an output - filename must be specified using the `-o/--output` option. If - `+lhs` is appended to `markdown`, `rst`, `latex`, `beamer`, - `html`, or `html5`, the output will be rendered as literate - Haskell source: see [Literate Haskell - support], below. Markdown syntax - extensions can be individually enabled or disabled by appending - `+EXTENSION` or `-EXTENSION` to the format name, as described - above under `-f`. 
+ `texinfo` (GNU Texinfo), `opml` (OPML), `docbook` (DocBook 4), + `docbook5` (DocBook 5), `opendocument` (OpenDocument), `odt` + (OpenOffice text document), `docx` (Word docx), `haddock` + (Haddock markup), `rtf` (rich text format), `epub` (EPUB v2 + book), `epub3` (EPUB v3), `fb2` (FictionBook2 e-book), + `asciidoc` (AsciiDoc), `icml` (InDesign ICML), `tei` (TEI + Simple), `slidy` (Slidy HTML and javascript slide show), + `slideous` (Slideous HTML and javascript slide show), + `dzslides` (DZSlides HTML5 + javascript slide show), + `revealjs` (reveal.js HTML5 + javascript slide show), `s5` + (S5 HTML and javascript slide show), or the path of a custom + lua writer (see [Custom writers], below). Note that `odt`, + `epub`, and `epub3` output will not be directed to *stdout*; + an output filename must be specified using the `-o/--output` + option. If `+lhs` is appended to `markdown`, `rst`, `latex`, + `beamer`, `html`, or `html5`, the output will be rendered as + literate Haskell source: see [Literate Haskell support], + below. Markdown syntax extensions can be individually + enabled or disabled by appending `+EXTENSION` or + `-EXTENSION` to the format name, as described above under `-f`. `-o` *FILE*, `--output=`*FILE* @@ -538,16 +538,16 @@ General writer options `--columns=`*NUMBER* -: Specify length of lines in characters (for text wrapping). - This affects only the generated source code, not the layout on - the rendered page. +: Specify length of lines in characters. This affects text wrapping + in the generated source code (see `--wrap`). It also affects + calculation of column widths for plain text tables (see [Tables] below). `--toc`, `--table-of-contents` : Include an automatically generated table of contents (or, in the case of `latex`, `context`, `docx`, and `rst`, an instruction to create one) in the output document. This option has no effect on `man`, - `docbook`, `slidy`, `slideous`, `s5`, or `odt` output. 
+ `docbook`, `docbook5`, `slidy`, `slideous`, `s5`, or `odt` output. `--toc-depth=`*NUMBER* @@ -909,7 +909,7 @@ Math rendering in HTML `--mathml`[`=`*URL*] -: Convert TeX math to [MathML] (in `docbook` as well as `html` and `html5`). +: Convert TeX math to [MathML] (in `docbook`, `docbook5`, `html` and `html5`). In standalone `html` output, a small javascript (or a link to such a script if a *URL* is supplied) will be inserted that allows the MathML to be viewed on some browsers. @@ -1591,21 +1591,26 @@ CSS. #### Extension: `implicit_header_references` #### Pandoc behaves as if reference links have been defined for each header. -So, instead of +So, to link to a header - [header identifiers](#header-identifiers-in-html) + # Header identifiers in HTML you can simply write - [header identifiers] + [Header identifiers in HTML] or - [header identifiers][] + [Header identifiers in HTML][] or - [the section on header identifiers][header identifiers] + [the section on header identifiers][header identifiers in + HTML] + +instead of giving the identifier explicitly: + + [Header identifiers in HTML](#header-identifiers-in-html) If there are multiple headers with identical text, the corresponding reference will link to the first one only, and you will need to use explicit @@ -3835,6 +3840,7 @@ any kind. (See COPYRIGHT for full copyright and warranty notices.) Contributors include Aaron Wolen, Albert Krewinkel, +Alex Vong, Alexander Kondratskiy, Alexander Sulfrian, Alexander V Vershilov, @@ -3847,6 +3853,7 @@ Arlo O'Keeffe, Artyom Kazak, Ben Gamari, Beni Cherniavsky-Paskin, +Benoit Schweblin, Bjorn Buckwalter, Bradley Kuhn, Brent Yorgey, @@ -3854,12 +3861,16 @@ Bryan O'Sullivan, B. Scott Michel, Caleb McDaniel, Calvin Beck, +Carlos Sosa, +Chris Black, +Christian Conkle, Christoffer Ackelman, Christoffer Sawicki, Clare Macrae, Clint Adams, Conal Elliott, Craig S. Bosma, +csforste, Daniel Bergey, Daniel T. 
Staal, David Lazar, @@ -3867,27 +3878,35 @@ David Röthlisberger, Denis Laxalde, Douglas Calvert, Douglas F. Calvert, +Emanuel Evans, +Emily Eisenberg, Eric Kow, Eric Seidel, Florian Eitel, François Gannaz, Freiric Barral, +Freirich Raabe, Fyodor Sheremetyev, Gabor Pali, Gavin Beatty, +Gottfried Haider, Greg Maslov, Grégory Bataille, Greg Rundlett, gwern, Gwern Branwen, Hans-Peter Deifel, +Henrik Tramberend, Henry de Valence, +ickc, Ilya V. Portnov, infinity0x, +Ivo Clarysse, Jaime Marquínez Ferrándiz, James Aspnes, Jamie F. Olson, Jan Larres, +Jan Schulz, Jason Ronallo, Jeff Arnold, Jeff Runningen, @@ -3902,22 +3921,30 @@ Jonathan Daugherty, Josef Svenningsson, Jose Luis Duran, Julien Cretel, +Juliusz Gonera, Justin Bogner, Kelsey Hightower, +Kolen Cheung, Konstantin Zudov, +Kristof Bastiaensen, Lars-Dominik Braun, Luke Plant, Mark Szepieniec, Mark Wright, +Martin Linn, Masayoshi Takahashi, Matej Kollar, Mathias Schenner, +Mathieu Duponchelle, +Matthew Eddey, Matthew Pickering, Matthias C. M. Troffaes, Mauro Bieg, Max Bolingbroke, Max Rydahl Andersen, Merijn Verstraaten, +Michael Beaumont, +Michael Chladek, Michael Snoyman, Michael Thompson, MinRK, @@ -3927,22 +3954,29 @@ Nick Bart, Nicolas Kaiser, Nikolay Yakimov, nkalvi, +Ophir Lifshitz, +Pablo Rodríguez, Paulo Tanimoto, Paul Rivier, Peter Wang, Philippe Ombredanne, Phillip Alday, +Prayag Verma, Puneeth Chaganti, qerub, Ralf Stephan, +Raniere Silva, Recai Oktaş, +robabla, rodja.trappe, +rski, RyanGlScott, Scott Morrison, Sergei Trofimovich, Sergey Astanin, Shahbaz Youssefi, Shaun Attfield, +Sidarth Kapur, shreevatsa.public, Simon Hengel, Sumit Sahrawat, @@ -3950,6 +3984,8 @@ takahashim, thsutton, Tim Lin, Timothy Humphries, +Tiziano Müller, +Thomas Hodgson, Todd Sifleet, Tom Leese, Uli Köhler, @@ -1,3 +1,227 @@ +pandoc (1.17.1) + + * New output format: `docbook5` (Ivo Clarysse). + + * `Text.Pandoc.Options`: Add `writerDocbook5` to `WriterOptions` + (API change).
+ + * Org writer: + + + Add `:PROPERTIES:` drawer support (Albert Krewinkel, #1962). + This allows header attributes to be added to org documents in the form + of `:PROPERTIES:` drawers. All available attributes are stored as + key/value pairs. This reflects the way the org reader handles + `:PROPERTIES:` blocks. + + Add drawer capability (Carlos Sosa). For the implementation of the + Drawer element in the Org Writer, we make use of a generic Block + container with attributes. The presence of a `drawer` class defines + that the `Div` constructor is a drawer. The first class defines the + drawer name to use. The key-value list in the attributes defines + the keys to add inside the Drawer. Lastly, the list of Block elements + contains miscellaneous block elements to add inside of the Drawer. + + Use `CUSTOM_ID` in properties (Albert Krewinkel). The `ID` property is + reserved for internal use by Org-mode and should not be used. + The `CUSTOM_ID` property is to be used instead; it is converted to the + `ID` property for certain export formats. + + * LaTeX writer: + + + Ignore `--incremental` unless output format is beamer (#2843). + + Fix polyglossia to babel env mapping (Mauro Bieg, #2728). + Allow for optional argument in square brackets. + + Recognize `la-x-classic` as Classical Latin (Andrew Dunning). + This allows one to access the hyphenation patterns in CTAN's + hyph-utf8. + + Add missing languages from hyph-utf8 (Andrew Dunning). + + Improve use of `\strut` with `\minipage` inside tables + (Jose Luis Duran). This improves spacing in multiline + tables. + + Use `{}` around options containing special chars (#2892). + + Avoid lazy `foldl`. + + Don't escape underscore in labels (#2921). Previously they were + escaped as `ux5f`. + + brazilian -> brazil for polyglossia (#2953). + + * HTML writer: Ensure mathjax link is added when math appears in footnote + (#2881). Previously if a document only had math in a footnote, the + MathJax link would not be added.
+ + * EPUB writer: set `navpage` variable on nav page. + This allows templates to treat it differently. + + * DocBook writer: + + + Use docbook5 if `writerDocbook5` is set (Ivo Clarysse). + + Properly handle `ulink`/`link` (Ivo Clarysse). + + * EPUB reader: + + + Unescape URIs in spine (#2924). + + Normalise link id (Mauro Bieg). + + * Docx reader: + + + Parse `moveTo` and `moveFrom` (Jesse Rosenthal). + `moveTo` and `moveFrom` are track-changes tags that are used when a + block of text is moved in the document. We now recognize these tags and + treat them the same as `insert` and `delete`, respectively. So, + `--track-changes=accept` will show the moved version, while + `--track-changes=reject` will show the original version. + + Tests for track-changes moving (Jesse Rosenthal). + + * ODT, EPUB, Docx readers: throw `PandocError` on unzip failure + (Jesse Rosenthal). Previously, `readDocx`, `readEPUB`, and `readOdt` + would error out if zip-archive failed. We change the archive extraction + step from `toArchive` to `toArchiveOrFail`, which returns an `Either` value. + + * Markdown, HTML readers: be more forgiving about unescaped `&` in + HTML (#2410). We are now more forgiving about parsing invalid HTML with + unescaped `&` as raw HTML. (Previously any unescaped `&` + would cause pandoc not to recognize the string as raw HTML.) + + * Markdown reader: + + + Fix pandoc title blocks with lines ending in 2 spaces (#2799). + + Added `-s` to markdown-reader-more test. + + * HTML reader: fixed bug in `pClose`. This caused exponential parsing + behavior in documents with unclosed tags in `dl`, `dd`, `dt`. + + * MediaWiki reader: Allow spaces before `!` in MediaWiki table header + (roblabla). + + * RST reader: Support `:class:` option for code block in RST reader + (Sidharth Kapur). + + * Org reader (all Albert Krewinkel, except where noted otherwise): + + + Stop padding short table rows. + Emacs Org-mode doesn't add any padding to table rows.
The first + row (header or first body row) is used to determine the column count; + no other magic is performed. + + Refactor rows-to-table conversion. This refactors + the code converting a list of table lines to an org table ADT. + The old code was simplified and is now slightly less ugly. + + Fix handling of empty table cells, rows (Albert Krewinkel, #2616). + This fixes Org mode parsing of some corner cases regarding empty cells + and rows. Empty cells weren't parsed correctly, e.g. `|||` should be + two empty cells, but would be parsed as a single cell containing a pipe + character. Empty rows were parsed as alignment rows and dropped from + the output. + + Fix spacing after LaTeX-style symbols. + The org-reader was dropping space after unescaped LaTeX-style symbol + commands: `\ForAll \Auml` resulted in `∀Ä` but should give `∀ Ä` + instead. This seems to be because the LaTeX-reader treats the + command-terminating space as part of the command. Dropping the trailing + space from the symbol-command fixes this issue. + + Print empty table rows. Empty table rows should not + be dropped from the output, so row-height is always set to be at least 1. + + Move parser state into separate module. + The org reader code has become large and confusing. Extracting smaller + parts into submodules should help to clean things up. + + Add support for sub/superscript export options. + Org-mode allows specifying export settings via `#+OPTIONS` lines. + Disabling simple sub- and superscripts is one of these export options; + this option is now supported. + + Support special strings export option. Parsing of special strings + (like `...` as ellipsis or `--` as en dash) can be toggled using the `-` + option. + + Support emphasized text export option. Parsing of emphasized text can + be toggled using the `*` option. This influences parsing of text marked + as emphasized, strong, strikeout, and underline. Parsing of inline math, + code, and verbatim text is not affected by this option.
+ + Support smart quotes export option. Reading of smart quotes can be + toggled using the `'` option. + + Parse but ignore export options. All known export options are parsed + but ignored. + + Refactor block attribute handling. A parser state attribute was used + to keep track of block attributes defined in meta-lines. Global state + is undesirable, so block attributes are no longer saved as part of the + parser state. Old functions and the respective part of the parser state + are removed. + + Use custom `anyLine`. Additional state changes need to be made after + a newline is parsed, otherwise markup may not be recognized correctly. + This fixes a bug where markup after certain block-types would not be + recognized. + + Add support for `ATTR_HTML` attributes (#1906). + Arbitrary key-value pairs can be added to some block types using a + `#+ATTR_HTML` line before the block. Emacs Org-mode only includes these + when exporting to HTML, but since we cannot make this distinction here, + the attributes are always added. The functionality is now supported + for figures. + + Add `:PROPERTIES:` drawer support (#1877). + Headers can have optional `:PROPERTIES:` drawers associated with them. + These drawers contain key/value pairs like the header's `id`. The + reader adds all listed pairs to the header's attributes; `id` and + `class` attributes are handled specially to match the way `Attr` are + defined. This also changes behavior of how drawers of unknown type + are handled. Instead of including all unknown drawers, those are not + read/exported, thereby matching current Emacs behavior. + + Use `CUSTOM_ID` in properties. See above on Org writer changes. + + Respect drawer export setting. The `d` export option can be used + to control which drawers are exported and which are discarded. + Basic support for this option is added here. + + Ignore leading space in org code blocks (Emanuel Evans, #2862). + Also fix up tab handling for leading whitespace in code blocks. 
+ + Support new syntax for export blocks. Org-mode version 9 + uses a new syntax for export blocks. Instead of `#+BEGIN_<FORMAT>`, + where `<FORMAT>` is the format of the block's content, the new + format uses `#+BEGIN_export <FORMAT>`. Both types are + supported. + + Refactor `BEGIN...END` block parsing. + + Fix handling of whitespace in blocks, allowing content to be indented + less than the block header. + + Support org-ref style citations. The *org-ref* package is an + org-mode extension commonly used to manage citations in org + documents. Basic support for the `cite:citeKey` and + `[[cite:citeKey][prefix text::suffix text]]` syntax is added. + + Split code into separate modules, making for cleaner code and + better decoupling. + + * Added `docbook5` template. + + * `--mathjax` improvements: + + + Use new CommonHTML output for MathJax (updated default MathJax URL, + #2858). + + Change default mathjax setup to use `TeX-AMS_CHTML` configuration. + This is designed for cases where the input is always TeX and maximal + conformity with TeX is desired. It seems to be smaller and load faster + than what we used before. See #2858. + + Load the full MathJax config to maximize loading speed (Kolen Cheung). + + * Bumped upper version bounds to allow use of latest packages + and compilation with ghc 8. + + * Require texmath 0.8.6.2. Closes several texmath-related bugs (#2775, + #2310, #2310, #2824). This fixes behavior of roots, e.g. + `\sqrt[3]{x}`, and issues with sub/superscript positioning + and matrix column alignment in docx. + + * README: + + + Clarified documentation of `implicit_header_references` (#2904). + + Improved documentation of `--columns` option. + + * Added appveyor setup, with artefacts (Jan Schulz). + + * stack.yaml versions: Use proper flags used for texmath, pandoc-citeproc. + + * LaTeX template: support for custom font families (vladipus).
+ Needed for correct polyglossia operation with Cyrillic fonts and perhaps + can find some other usages. Example usage in YAML metadata: + + fontfamilies: + - name: \cyrillicfont + font: Liberation Serif + - name: \cyrillicfonttt + options: Scale=MatchLowercase + font: Liberation + + * Create unsigned msi as build artifact in appveyor build. + + * On travis, test with ghc 8.0.1; drop testing for ghc 7.4.1. + pandoc (1.17.0.3) * LaTeX writer: Fixed position of label in figures (#2813). diff --git a/data/templates/default.docbook5 b/data/templates/default.docbook5 new file mode 100644 index 000000000..b3a0b6def --- /dev/null +++ b/data/templates/default.docbook5 @@ -0,0 +1,30 @@ +<?xml version="1.0" encoding="utf-8" ?> +<!DOCTYPE article> +$if(mathml)$ +<article xmlns="http://docbook.org/ns/docbook" version="5.0"> +$else$ +<article xmlns="http://docbook.org/ns/docbook" version="5.0"> +$endif$ + <info> + <title>$title$</title> +$if(author)$ + <authorgroup> +$for(author)$ + <author> + $author$ + </author> +$endfor$ + </authorgroup> +$endif$ +$if(date)$ + <date>$date$</date> +$endif$ + </info> +$for(include-before)$ +$include-before$ +$endfor$ +$body$ +$for(include-after)$ +$include-after$ +$endfor$ +</article> diff --git a/data/templates/default.latex b/data/templates/default.latex index 0a1c47391..bc84520a3 100644 --- a/data/templates/default.latex +++ b/data/templates/default.latex @@ -24,6 +24,9 @@ $endif$ \usepackage{fontspec} \fi \defaultfontfeatures{Ligatures=TeX,Scale=MatchLowercase} +$for(fontfamilies)$ + \newfontfamily{$fontfamilies.name$}[$fontfamilies.options$]{$fontfamilies.font$} +$endfor$ $if(euro)$ \newcommand{\euro}{€} $endif$ diff --git a/man/pandoc.1 b/man/pandoc.1 index 372673d84..db4a7793c 100644 --- a/man/pandoc.1 +++ b/man/pandoc.1 @@ -1,5 +1,5 @@ .\"t -.TH PANDOC 1 "January 12, 2016" "pandoc 1.17.0.3" +.TH PANDOC 1 "June 4, 2016" "pandoc 1.17.1" .SH NAME pandoc - general markup converter .SH SYNOPSIS @@ -248,17 +248,17 @@ Specify output format. 
(MediaWiki markup), \f[C]dokuwiki\f[] (DokuWiki markup), \f[C]textile\f[] (Textile), \f[C]org\f[] (Emacs Org mode), \f[C]texinfo\f[] (GNU Texinfo), \f[C]opml\f[] (OPML), \f[C]docbook\f[] -(DocBook), \f[C]opendocument\f[] (OpenDocument), \f[C]odt\f[] -(OpenOffice text document), \f[C]docx\f[] (Word docx), \f[C]haddock\f[] -(Haddock markup), \f[C]rtf\f[] (rich text format), \f[C]epub\f[] (EPUB -v2 book), \f[C]epub3\f[] (EPUB v3), \f[C]fb2\f[] (FictionBook2 e\-book), -\f[C]asciidoc\f[] (AsciiDoc), \f[C]icml\f[] (InDesign ICML), -\f[C]tei\f[] (TEI Simple), \f[C]slidy\f[] (Slidy HTML and javascript -slide show), \f[C]slideous\f[] (Slideous HTML and javascript slide -show), \f[C]dzslides\f[] (DZSlides HTML5 + javascript slide show), -\f[C]revealjs\f[] (reveal.js HTML5 + javascript slide show), \f[C]s5\f[] -(S5 HTML and javascript slide show), or the path of a custom lua writer -(see Custom writers, below). +(DocBook 4), \f[C]docbook5\f[] (DocBook 5), \f[C]opendocument\f[] +(OpenDocument), \f[C]odt\f[] (OpenOffice text document), \f[C]docx\f[] +(Word docx), \f[C]haddock\f[] (Haddock markup), \f[C]rtf\f[] (rich text +format), \f[C]epub\f[] (EPUB v2 book), \f[C]epub3\f[] (EPUB v3), +\f[C]fb2\f[] (FictionBook2 e\-book), \f[C]asciidoc\f[] (AsciiDoc), +\f[C]icml\f[] (InDesign ICML), \f[C]tei\f[] (TEI Simple), \f[C]slidy\f[] +(Slidy HTML and javascript slide show), \f[C]slideous\f[] (Slideous HTML +and javascript slide show), \f[C]dzslides\f[] (DZSlides HTML5 + +javascript slide show), \f[C]revealjs\f[] (reveal.js HTML5 + javascript +slide show), \f[C]s5\f[] (S5 HTML and javascript slide show), or the +path of a custom lua writer (see Custom writers, below). Note that \f[C]odt\f[], \f[C]epub\f[], and \f[C]epub3\f[] output will not be directed to \f[I]stdout\f[]; an output filename must be specified using the \f[C]\-o/\-\-output\f[] option. @@ -584,9 +584,11 @@ Deprecated synonym for \f[C]\-\-wrap=none\f[]. 
.RE .TP .B \f[C]\-\-columns=\f[]\f[I]NUMBER\f[] -Specify length of lines in characters (for text wrapping). -This affects only the generated source code, not the layout on the -rendered page. +Specify length of lines in characters. +This affects text wrapping in the generated source code (see +\f[C]\-\-wrap\f[]). +It also affects calculation of column widths for plain text tables (see +Tables below). .RS .RE .TP @@ -595,7 +597,8 @@ Include an automatically generated table of contents (or, in the case of \f[C]latex\f[], \f[C]context\f[], \f[C]docx\f[], and \f[C]rst\f[], an instruction to create one) in the output document. This option has no effect on \f[C]man\f[], \f[C]docbook\f[], -\f[C]slidy\f[], \f[C]slideous\f[], \f[C]s5\f[], or \f[C]odt\f[] output. +\f[C]docbook5\f[], \f[C]slidy\f[], \f[C]slideous\f[], \f[C]s5\f[], or +\f[C]odt\f[] output. .RS .RE .TP @@ -1038,8 +1041,8 @@ copy of the script, so it can be cached. .RE .TP .B \f[C]\-\-mathml\f[][\f[C]=\f[]\f[I]URL\f[]] -Convert TeX math to MathML (in \f[C]docbook\f[] as well as \f[C]html\f[] -and \f[C]html5\f[]). +Convert TeX math to MathML (in \f[C]docbook\f[], \f[C]docbook5\f[], +\f[C]html\f[] and \f[C]html5\f[]). In standalone \f[C]html\f[] output, a small javascript (or a link to such a script if a \f[I]URL\f[] is supplied) will be inserted that allows the MathML to be viewed on some browsers. @@ -1950,11 +1953,11 @@ treated differently in CSS. .SS Extension: \f[C]implicit_header_references\f[] .PP Pandoc behaves as if reference links have been defined for each header. 
-So, instead of +So, to link to a header .IP .nf \f[C] -[header\ identifiers](#header\-identifiers\-in\-html) +#\ Header\ identifiers\ in\ HTML \f[] .fi .PP @@ -1962,7 +1965,7 @@ you can simply write .IP .nf \f[C] -[header\ identifiers] +[Header\ identifiers\ in\ HTML] \f[] .fi .PP @@ -1970,7 +1973,7 @@ or .IP .nf \f[C] -[header\ identifiers][] +[Header\ identifiers\ in\ HTML][] \f[] .fi .PP @@ -1978,7 +1981,16 @@ or .IP .nf \f[C] -[the\ section\ on\ header\ identifiers][header\ identifiers] +[the\ section\ on\ header\ identifiers][header\ identifiers\ in +HTML] +\f[] +.fi +.PP +instead of giving the identifier explicitly: +.IP +.nf +\f[C] +[Header\ identifiers\ in\ HTML](#header\-identifiers\-in\-html) \f[] .fi .PP @@ -4780,40 +4792,47 @@ Released under the GPL, version 2 or greater. This software carries no warranty of any kind. (See COPYRIGHT for full copyright and warranty notices.) .PP -Contributors include Aaron Wolen, Albert Krewinkel, Alexander +Contributors include Aaron Wolen, Albert Krewinkel, Alex Vong, Alexander Kondratskiy, Alexander Sulfrian, Alexander V Vershilov, Alfred Wechselberger, Andreas Lööw, Andrew Dunning, Antoine Latter, Arata Mizuki, Arlo O\[aq]Keeffe, Artyom Kazak, Ben Gamari, Beni -Cherniavsky\-Paskin, Bjorn Buckwalter, Bradley Kuhn, Brent Yorgey, Bryan -O\[aq]Sullivan, B. -Scott Michel, Caleb McDaniel, Calvin Beck, Christoffer Ackelman, -Christoffer Sawicki, Clare Macrae, Clint Adams, Conal Elliott, Craig S. -Bosma, Daniel Bergey, Daniel T. +Cherniavsky\-Paskin, Benoit Schweblin, Bjorn Buckwalter, Bradley Kuhn, +Brent Yorgey, Bryan O\[aq]Sullivan, B. +Scott Michel, Caleb McDaniel, Calvin Beck, Carlos Sosa, Chris Black, +Christian Conkle, Christoffer Ackelman, Christoffer Sawicki, Clare +Macrae, Clint Adams, Conal Elliott, Craig S. +Bosma, csforste, Daniel Bergey, Daniel T. Staal, David Lazar, David Röthlisberger, Denis Laxalde, Douglas Calvert, Douglas F. 
-Calvert, Eric Kow, Eric Seidel, Florian Eitel, François Gannaz, Freiric -Barral, Fyodor Sheremetyev, Gabor Pali, Gavin Beatty, Greg Maslov, +Calvert, Emanuel Evans, Emily Eisenberg, Eric Kow, Eric Seidel, Florian +Eitel, François Gannaz, Freiric Barral, Freirich Raabe, Fyodor +Sheremetyev, Gabor Pali, Gavin Beatty, Gottfried Haider, Greg Maslov, Grégory Bataille, Greg Rundlett, gwern, Gwern Branwen, Hans\-Peter -Deifel, Henry de Valence, Ilya V. -Portnov, infinity0x, Jaime Marquínez Ferrándiz, James Aspnes, Jamie F. -Olson, Jan Larres, Jason Ronallo, Jeff Arnold, Jeff Runningen, Jens -Petersen, Jérémy Bobbio, Jesse Rosenthal, J. +Deifel, Henrik Tramberend, Henry de Valence, ickc, Ilya V. +Portnov, infinity0x, Ivo Clarysse, Jaime Marquínez Ferrándiz, James +Aspnes, Jamie F. +Olson, Jan Larres, Jan Schulz, Jason Ronallo, Jeff Arnold, Jeff +Runningen, Jens Petersen, Jérémy Bobbio, Jesse Rosenthal, J. Lewis Muir, Joe Hillenbrand, John MacFarlane, Jonas Smedegaard, Jonathan -Daugherty, Josef Svenningsson, Jose Luis Duran, Julien Cretel, Justin -Bogner, Kelsey Hightower, Konstantin Zudov, Lars\-Dominik Braun, Luke -Plant, Mark Szepieniec, Mark Wright, Masayoshi Takahashi, Matej Kollar, -Mathias Schenner, Matthew Pickering, Matthias C. +Daugherty, Josef Svenningsson, Jose Luis Duran, Julien Cretel, Juliusz +Gonera, Justin Bogner, Kelsey Hightower, Kolen Cheung, Konstantin Zudov, +Kristof Bastiaensen, Lars\-Dominik Braun, Luke Plant, Mark Szepieniec, +Mark Wright, Martin Linn, Masayoshi Takahashi, Matej Kollar, Mathias +Schenner, Mathieu Duponchelle, Matthew Eddey, Matthew Pickering, +Matthias C. M. 
Troffaes, Mauro Bieg, Max Bolingbroke, Max Rydahl Andersen, Merijn -Verstraaten, Michael Snoyman, Michael Thompson, MinRK, Nathan Gass, Neil -Mayhew, Nick Bart, Nicolas Kaiser, Nikolay Yakimov, nkalvi, Paulo +Verstraaten, Michael Beaumont, Michael Chladek, Michael Snoyman, Michael +Thompson, MinRK, Nathan Gass, Neil Mayhew, Nick Bart, Nicolas Kaiser, +Nikolay Yakimov, nkalvi, Ophir Lifshitz, Pablo Rodríguez, Paulo Tanimoto, Paul Rivier, Peter Wang, Philippe Ombredanne, Phillip Alday, -Puneeth Chaganti, qerub, Ralf Stephan, Recai Oktaş, rodja.trappe, -RyanGlScott, Scott Morrison, Sergei Trofimovich, Sergey Astanin, Shahbaz -Youssefi, Shaun Attfield, shreevatsa.public, Simon Hengel, Sumit -Sahrawat, takahashim, thsutton, Tim Lin, Timothy Humphries, Todd -Sifleet, Tom Leese, Uli Köhler, Václav Zeman, Viktor Kronvall, Vincent, -Wikiwide, and Xavier Olive. +Prayag Verma, Puneeth Chaganti, qerub, Ralf Stephan, Raniere Silva, +Recai Oktaş, robabla, rodja.trappe, rski, RyanGlScott, Scott Morrison, +Sergei Trofimovich, Sergey Astanin, Shahbaz Youssefi, Shaun Attfield, +Sidarth Kapur, shreevatsa.public, Simon Hengel, Sumit Sahrawat, +takahashim, thsutton, Tim Lin, Timothy Humphries, Tiziano Müller, Thomas +Hodgson, Todd Sifleet, Tom Leese, Uli Köhler, Václav Zeman, Viktor +Kronvall, Vincent, Wikiwide, and Xavier Olive. .PP The Pandoc source code and all documentation may be downloaded from <http://pandoc.org>. 
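The `implicit_header_references` wording revised in both the README and the man page depends on how header text becomes an identifier and on the rule that duplicate header texts resolve to the first header. The following is a minimal Haskell sketch, not pandoc's own code, that applies the identifier rules documented for the `auto_identifiers` extension to plain header text. It assumes formatting, links and footnotes have already been stripped. The names `headerId`, `normLabel`, `implicitRefs` and `resolveRef` are hypothetical, and the case-insensitive, whitespace-collapsing label comparison is an assumption modeled on Markdown reference-link matching.

```haskell
import Data.Char (isAlpha, isAlphaNum, isSpace, toLower)
import Data.List (foldl')
import qualified Data.Map.Strict as Map

-- Hypothetical helper: derive an identifier from plain header text using
-- the documented rules (formatting, links and footnotes assumed already gone).
headerId :: String -> String
headerId txt =
  -- remove everything up to the first letter
  case dropWhile (not . isAlpha) (map (toLower . spaceToHyphen) kept) of
    ""    -> "section"          -- nothing left: fall back to "section"
    ident -> ident
  where
    -- drop punctuation except underscores, hyphens and periods
    kept = filter (\c -> isAlphaNum c || isSpace c || c `elem` "_-.") txt
    -- spaces and newlines become hyphens
    spaceToHyphen c = if isSpace c then '-' else c

-- Hypothetical label normalisation: case-insensitive, whitespace collapsed.
normLabel :: String -> String
normLabel = unwords . words . map toLower

-- Implicit reference table: normalised header text -> identifier.
-- If several headers share the same text, the first one is kept.
implicitRefs :: [String] -> Map.Map String String
implicitRefs = foldl' addRef Map.empty
  where
    addRef refs h = Map.insertWith keepOld (normLabel h) (headerId h) refs
    keepOld _new old = old

-- Resolve a reference label to a URL fragment such as "#header-identifiers-in-html".
resolveRef :: Map.Map String String -> String -> Maybe String
resolveRef refs label = ('#' :) <$> Map.lookup (normLabel label) refs
```

Loaded into GHCi, `headerId "3. Applications"` gives `"applications"` and `headerId "33"` falls back to `"section"`, and a label split across lines, as in `[the section on header identifiers][header identifiers in HTML]`, still resolves. Real pandoc also appends numeric suffixes such as `-1` to duplicate identifiers, which this sketch leaves out because the implicit reference only ever points at the first header.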
diff --git a/pandoc.cabal b/pandoc.cabal index 578cdf2ee..820e417a5 100644 --- a/pandoc.cabal +++ b/pandoc.cabal @@ -1,5 +1,5 @@ Name: pandoc -Version: 1.17.0.3 +Version: 1.17.1 Cabal-Version: >= 1.10 Build-Type: Custom License: GPL @@ -11,7 +11,7 @@ Bug-Reports: https://github.com/jgm/pandoc/issues Stability: alpha Homepage: http://pandoc.org Category: Text -Tested-With: GHC == 7.4.2, GHC == 7.6.3, GHC == 7.8.4, GHC == 7.10.2 +Tested-With: GHC == 7.6.3, GHC == 7.8.4, GHC == 7.10.2, GHC == 8.0.1 Synopsis: Conversion between markup formats Description: Pandoc is a Haskell library for converting from one markup format to another, and a command-line tool that uses @@ -39,6 +39,7 @@ Data-Files: data/templates/default.html data/templates/default.html5 data/templates/default.docbook + data/templates/default.docbook5 data/templates/default.tei data/templates/default.beamer data/templates/default.opendocument @@ -145,6 +146,7 @@ Extra-Source-Files: tests/s5-inserts.html tests/tables.context tests/tables.docbook + tests/tables.docbook5 tests/tables.dokuwiki tests/tables.icml tests/tables.html @@ -168,6 +170,7 @@ Extra-Source-Files: tests/writer.latex tests/writer.context tests/writer.docbook + tests/writer.docbook5 tests/writer.html tests/writer.man tests/writer.markdown @@ -257,7 +260,7 @@ Library text >= 0.11 && < 1.3, zip-archive >= 0.2.3.4 && < 0.4, HTTP >= 4000.0.5 && < 4000.4, - texmath >= 0.8.4.1 && < 0.9, + texmath >= 0.8.6.2 && < 0.9, xml >= 1.3.12 && < 1.4, random >= 1 && < 1.2, extensible-exceptions >= 0.1 && < 0.2, @@ -266,8 +269,8 @@ Library tagsoup >= 0.13.7 && < 0.14, base64-bytestring >= 0.1 && < 1.1, zlib >= 0.5 && < 0.7, - highlighting-kate >= 0.6.1 && < 0.7, - data-default >= 0.4 && < 0.6, + highlighting-kate >= 0.6.2 && < 0.7, + data-default >= 0.4 && < 0.8, temporary >= 1.1 && < 1.3, blaze-html >= 0.5 && < 0.9, blaze-markup >= 0.5.1 && < 0.8, @@ -277,7 +280,7 @@ Library hslua >= 0.3 && < 0.5, binary >= 0.5 && < 0.9, SHA >= 1.6 && < 1.7, - 
haddock-library >= 1.1 && < 1.3, + haddock-library >= 1.1 && < 1.5, old-time, deepseq >= 1.3 && < 1.5, JuicyPixels >= 3.1.6.1 && < 3.3, @@ -288,7 +291,7 @@ Library Build-Depends: old-locale >= 1 && < 1.1, time >= 1.2 && < 1.5 else - Build-Depends: time >= 1.5 && < 1.6 + Build-Depends: time >= 1.5 && < 1.7 if flag(network-uri) Build-Depends: network-uri >= 2.6 && < 2.7, network >= 2.6 else @@ -304,8 +307,8 @@ Library other-modules: Text.Pandoc.Data if os(windows) Cpp-options: -D_WINDOWS - Ghc-Options: -rtsopts -Wall -fno-warn-unused-do-bind - Ghc-Prof-Options: -fprof-auto-exported -rtsopts + Ghc-Options: -Wall -fno-warn-unused-do-bind + Ghc-Prof-Options: -fprof-auto-exported Default-Language: Haskell98 Other-Extensions: PatternGuards, OverloadedStrings, ScopedTypeVariables, GeneralizedNewtypeDeriving, @@ -390,6 +393,12 @@ Library Text.Pandoc.Readers.Odt.Generic.XMLConverter, Text.Pandoc.Readers.Odt.Arrows.State, Text.Pandoc.Readers.Odt.Arrows.Utils, + Text.Pandoc.Readers.Org.BlockStarts, + Text.Pandoc.Readers.Org.Blocks, + Text.Pandoc.Readers.Org.Inlines, + Text.Pandoc.Readers.Org.ParserState, + Text.Pandoc.Readers.Org.Parsing, + Text.Pandoc.Readers.Org.Shared, Text.Pandoc.Writers.Shared, Text.Pandoc.Asciify, Text.Pandoc.MIME, @@ -417,7 +426,7 @@ Executable pandoc text >= 0.11 && < 1.3, bytestring >= 0.9 && < 0.11, extensible-exceptions >= 0.1 && < 0.2, - highlighting-kate >= 0.6.1 && < 0.7, + highlighting-kate >= 0.6.2 && < 0.7, aeson >= 0.7.0.5 && < 0.12, yaml >= 0.8.8.2 && < 0.9, containers >= 0.1 && < 0.6, @@ -473,7 +482,7 @@ Test-Suite test-pandoc directory >= 1 && < 1.3, filepath >= 1.1 && < 1.5, process >= 1 && < 1.5, - highlighting-kate >= 0.6.1 && < 0.7, + highlighting-kate >= 0.6.2 && < 0.7, Diff >= 0.2 && < 0.4, test-framework >= 0.3 && < 0.9, test-framework-hunit >= 0.2 && < 0.4, @@ -1,4 +1,4 @@ -{-# LANGUAGE CPP, TupleSections #-} +{-# LANGUAGE CPP, TupleSections, ScopedTypeVariables #-} {- Copyright (C) 2006-2016 John MacFarlane <jgm@berkeley.edu> @@ 
-836,7 +836,7 @@ options = , Option "" ["mathjax"] (OptArg (\arg opt -> do - let url' = fromMaybe "https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" arg + let url' = fromMaybe "https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_CHTML-full" arg return opt { optHTMLMathMethod = MathJax url'}) "URL") "" -- "Use MathJax for HTML math" diff --git a/src/Text/Pandoc.hs b/src/Text/Pandoc.hs index b67a53f5b..0330c46e2 100644 --- a/src/Text/Pandoc.hs +++ b/src/Text/Pandoc.hs @@ -291,6 +291,8 @@ writers = [ writeHtmlString o{ writerSlideVariant = RevealJsSlides , writerHtml5 = True }) ,("docbook" , PureStringWriter writeDocbook) + ,("docbook5" , PureStringWriter $ \o -> + writeDocbook o{ writerDocbook5 = True }) ,("opml" , PureStringWriter writeOPML) ,("opendocument" , PureStringWriter writeOpenDocument) ,("latex" , PureStringWriter writeLaTeX) diff --git a/src/Text/Pandoc/Options.hs b/src/Text/Pandoc/Options.hs index 171210962..701cd8bd1 100644 --- a/src/Text/Pandoc/Options.hs +++ b/src/Text/Pandoc/Options.hs @@ -357,6 +357,7 @@ data WriterOptions = WriterOptions , writerSourceURL :: Maybe String -- ^ Absolute URL + directory of 1st source file , writerUserDataDir :: Maybe FilePath -- ^ Path of user data directory , writerCiteMethod :: CiteMethod -- ^ How to print cites + , writerDocbook5 :: Bool -- ^ Produce DocBook5 , writerHtml5 :: Bool -- ^ Produce HTML5 , writerHtmlQTags :: Bool -- ^ Use @<q>@ tags for quotes in HTML , writerBeamer :: Bool -- ^ Produce beamer LaTeX slide show @@ -403,6 +404,7 @@ instance Default WriterOptions where , writerSourceURL = Nothing , writerUserDataDir = Nothing , writerCiteMethod = Citeproc + , writerDocbook5 = False , writerHtml5 = False , writerHtmlQTags = False , writerBeamer = False diff --git a/src/Text/Pandoc/Readers/Docx.hs b/src/Text/Pandoc/Readers/Docx.hs index 604bc20de..9c7c3b264 100644 --- a/src/Text/Pandoc/Readers/Docx.hs +++ b/src/Text/Pandoc/Readers/Docx.hs @@ -100,12 +100,13 @@ 
import Text.Pandoc.Compat.Except readDocxWithWarnings :: ReaderOptions -> B.ByteString -> Either PandocError (Pandoc, MediaBag, [String]) -readDocxWithWarnings opts bytes = - case archiveToDocxWithWarnings (toArchive bytes) of - Right (docx, warnings) -> do +readDocxWithWarnings opts bytes + | Right archive <- toArchiveOrFail bytes + , Right (docx, warnings) <- archiveToDocxWithWarnings archive = do (meta, blks, mediaBag) <- docxToOutput opts docx return (Pandoc meta blks, mediaBag, warnings) - Left _ -> Left (ParseFailure "couldn't parse docx file") +readDocxWithWarnings _ _ = + Left (ParseFailure "couldn't parse docx file") readDocx :: ReaderOptions -> B.ByteString diff --git a/src/Text/Pandoc/Readers/Docx/Parse.hs b/src/Text/Pandoc/Readers/Docx/Parse.hs index 364483929..7265ef8dd 100644 --- a/src/Text/Pandoc/Readers/Docx/Parse.hs +++ b/src/Text/Pandoc/Readers/Docx/Parse.hs @@ -661,14 +661,14 @@ elemToParPart ns element | isElem ns "w" "r" element = elemToRun ns element >>= (\r -> return $ PlainRun r) elemToParPart ns element - | isElem ns "w" "ins" element + | isElem ns "w" "ins" element || isElem ns "w" "moveTo" element , Just cId <- findAttr (elemName ns "w" "id") element , Just cAuthor <- findAttr (elemName ns "w" "author") element , Just cDate <- findAttr (elemName ns "w" "date") element = do runs <- mapD (elemToRun ns) (elChildren element) return $ Insertion cId cAuthor cDate runs elemToParPart ns element - | isElem ns "w" "del" element + | isElem ns "w" "del" element || isElem ns "w" "moveFrom" element , Just cId <- findAttr (elemName ns "w" "id") element , Just cAuthor <- findAttr (elemName ns "w" "author") element , Just cDate <- findAttr (elemName ns "w" "date") element = do diff --git a/src/Text/Pandoc/Readers/EPUB.hs b/src/Text/Pandoc/Readers/EPUB.hs index 07d282708..b8a0b47e7 100644 --- a/src/Text/Pandoc/Readers/EPUB.hs +++ b/src/Text/Pandoc/Readers/EPUB.hs @@ -14,12 +14,13 @@ import Text.Pandoc.Walk (walk, query) import Text.Pandoc.Readers.HTML 
(readHtml) import Text.Pandoc.Options ( ReaderOptions(..), readerTrace) import Text.Pandoc.Shared (escapeURI, collapseFilePath, addMetaField) +import Network.URI (unEscapeString) import Text.Pandoc.MediaBag (MediaBag, insertMedia) import Text.Pandoc.Compat.Except (MonadError, throwError, runExcept, Except) import Text.Pandoc.Compat.Monoid ((<>)) import Text.Pandoc.MIME (MimeType) import qualified Text.Pandoc.Builder as B -import Codec.Archive.Zip ( Archive (..), toArchive, fromEntry +import Codec.Archive.Zip ( Archive (..), toArchiveOrFail, fromEntry , findEntryByPath, Entry) import qualified Data.ByteString.Lazy as BL (ByteString) import System.FilePath ( takeFileName, (</>), dropFileName, normalise @@ -39,7 +40,9 @@ import Text.Pandoc.Error type Items = M.Map String (FilePath, MimeType) readEPUB :: ReaderOptions -> BL.ByteString -> Either PandocError (Pandoc, MediaBag) -readEPUB opts bytes = runEPUB (archiveToEPUB opts $ toArchive bytes) +readEPUB opts bytes = case toArchiveOrFail bytes of + Right archive -> runEPUB $ archiveToEPUB opts $ archive + Left _ -> Left $ ParseFailure "Couldn't extract ePub file" runEPUB :: Except PandocError a -> Either PandocError a runEPUB = runExcept @@ -72,14 +75,15 @@ archiveToEPUB os archive = do let docSpan = B.doc $ B.para $ B.spanWith (takeFileName path, [], []) mempty return $ docSpan <> doc mimeToReader :: MonadError PandocError m => MimeType -> FilePath -> FilePath -> m Pandoc - mimeToReader "application/xhtml+xml" (normalise -> root) (normalise -> path) = do + mimeToReader "application/xhtml+xml" (unEscapeString -> root) + (unEscapeString -> path) = do fname <- findEntryByPathE (root </> path) archive html <- either throwError return . readHtml os' . 
UTF8.toStringLazy $ fromEntry fname return $ fixInternalReferences path html - mimeToReader s _ path + mimeToReader s _ (unEscapeString -> path) | s `elem` imageMimes = return $ imageToPandoc path | otherwise = return $ mempty @@ -190,8 +194,10 @@ fixInlineIRs s (Span as v) = Span (fixAttrs s as) v fixInlineIRs s (Code as code) = Code (fixAttrs s as) code -fixInlineIRs s (Link attr t ('#':url, tit)) = - Link attr t (addHash s url, tit) +fixInlineIRs s (Link as is ('#':url, tit)) = + Link (fixAttrs s as) is (addHash s url, tit) +fixInlineIRs s (Link as is t) = + Link (fixAttrs s as) is t fixInlineIRs _ v = v prependHash :: [String] -> Inline -> Inline diff --git a/src/Text/Pandoc/Readers/HTML.hs b/src/Text/Pandoc/Readers/HTML.hs index fb936cff7..164e3a98f 100644 --- a/src/Text/Pandoc/Readers/HTML.hs +++ b/src/Text/Pandoc/Readers/HTML.hs @@ -707,7 +707,7 @@ pCloses tagtype = try $ do (TagOpen t' _) | t' `closes` tagtype -> return () (TagClose "ul") | tagtype == "li" -> return () (TagClose "ol") | tagtype == "li" -> return () - (TagClose "dl") | tagtype == "li" -> return () + (TagClose "dl") | tagtype == "dd" -> return () (TagClose "table") | tagtype == "td" -> return () (TagClose "table") | tagtype == "tr" -> return () _ -> mzero @@ -971,11 +971,20 @@ htmlTag :: Monad m htmlTag f = try $ do lookAhead (char '<') inp <- getInput - let (next : rest) = canonicalizeTags $ parseTagsOptions - parseOptions{ optTagWarning = True } inp + let (next : _) = canonicalizeTags $ parseTagsOptions + parseOptions{ optTagWarning = False } inp guard $ f next + let handleTag tagname = do + -- <www.boe.es/buscar/act.php?id=BOE-A-1996-8930#a66> + -- should NOT be parsed as an HTML tag, see #2277 + guard $ not ('.' `elem` tagname) + -- <https://example.org> should NOT be a tag either. 
+ -- tagsoup will parse it as TagOpen "https:" [("example.org","")] + guard $ not (null tagname) + guard $ last tagname /= ':' + rendered <- manyTill anyChar (char '>') + return (next, rendered ++ ">") case next of - TagWarning _ -> fail "encountered TagWarning" TagComment s | "<!--" `isPrefixOf` inp -> do count (length s + 4) anyChar @@ -983,13 +992,9 @@ htmlTag f = try $ do char '>' return (next, "<!--" ++ s ++ "-->") | otherwise -> fail "bogus comment mode, HTML5 parse error" - _ -> do - -- we get a TagWarning on things like - -- <www.boe.es/buscar/act.php?id=BOE-A-1996-8930#a66> - -- which should NOT be parsed as an HTML tag, see #2277 - guard $ not $ hasTagWarning rest - rendered <- manyTill anyChar (char '>') - return (next, rendered ++ ">") + TagOpen tagname _attr -> handleTag tagname + TagClose tagname -> handleTag tagname + _ -> mzero mkAttr :: [(String, String)] -> Attr mkAttr attr = (attribsId, attribsClasses, attribsKV) diff --git a/src/Text/Pandoc/Readers/Markdown.hs b/src/Text/Pandoc/Readers/Markdown.hs index b5d175453..e43714526 100644 --- a/src/Text/Pandoc/Readers/Markdown.hs +++ b/src/Text/Pandoc/Readers/Markdown.hs @@ -122,9 +122,6 @@ inList = do ctx <- stateParserContext <$> getState guard (ctx == ListItemState) -isNull :: F Inlines -> Bool -isNull ils = B.isNull $ runF ils def - spnl :: Parser [Char] st () spnl = try $ do skipSpaces @@ -188,31 +185,38 @@ charsInBalancedBrackets openBrackets = -- document structure -- -titleLine :: MarkdownParser (F Inlines) -titleLine = try $ do +rawTitleBlockLine :: MarkdownParser String +rawTitleBlockLine = do char '%' skipSpaces - res <- many $ (notFollowedBy newline >> inline) - <|> try (endline >> whitespace) - newline + first <- anyLine + rest <- many $ try $ do spaceChar + notFollowedBy blankline + skipSpaces + anyLine + return $ trim $ unlines (first:rest) + +titleLine :: MarkdownParser (F Inlines) +titleLine = try $ do + raw <- rawTitleBlockLine + res <- parseFromString (many inline) raw return $ 
trimInlinesF $ mconcat res authorsLine :: MarkdownParser (F [Inlines]) authorsLine = try $ do - char '%' - skipSpaces - authors <- sepEndBy (many (notFollowedBy (satisfy $ \c -> - c == ';' || c == '\n') >> inline)) - (char ';' <|> - try (newline >> notFollowedBy blankline >> spaceChar)) - newline - return $ sequence $ filter (not . isNull) $ map (trimInlinesF . mconcat) authors + raw <- rawTitleBlockLine + let sep = (char ';' <* spaces) <|> newline + let pAuthors = sepEndBy + (trimInlinesF . mconcat <$> many + (try $ notFollowedBy sep >> inline)) + sep + sequence <$> parseFromString pAuthors raw dateLine :: MarkdownParser (F Inlines) dateLine = try $ do - char '%' - skipSpaces - trimInlinesF . mconcat <$> manyTill inline newline + raw <- rawTitleBlockLine + res <- parseFromString (many inline) raw + return $ trimInlinesF $ mconcat res titleBlock :: MarkdownParser () titleBlock = pandocTitleBlock <|> mmdTitleBlock diff --git a/src/Text/Pandoc/Readers/MediaWiki.hs b/src/Text/Pandoc/Readers/MediaWiki.hs index 950497992..d3cee08e2 100644 --- a/src/Text/Pandoc/Readers/MediaWiki.hs +++ b/src/Text/Pandoc/Readers/MediaWiki.hs @@ -225,7 +225,7 @@ table = do Nothing -> 1.0 caption <- option mempty tableCaption optional rowsep - hasheader <- option False $ True <$ (lookAhead (char '!')) + hasheader <- option False $ True <$ (lookAhead (skipSpaces *> char '!')) (cellspecs',hdr) <- unzip <$> tableRow let widths = map ((tableWidth *) . 
snd) cellspecs' let restwidth = tableWidth - sum widths diff --git a/src/Text/Pandoc/Readers/Odt.hs b/src/Text/Pandoc/Readers/Odt.hs index a925c1d84..68e89263c 100644 --- a/src/Text/Pandoc/Readers/Odt.hs +++ b/src/Text/Pandoc/Readers/Odt.hs @@ -59,7 +59,9 @@ readOdt _ bytes = case bytesToOdt bytes of -- bytesToOdt :: B.ByteString -> Either PandocError Pandoc -bytesToOdt bytes = archiveToOdt $ toArchive bytes +bytesToOdt bytes = case toArchiveOrFail bytes of + Right archive -> archiveToOdt archive + Left _ -> Left $ ParseFailure "Couldn't parse odt file." -- archiveToOdt :: Archive -> Either PandocError Pandoc diff --git a/src/Text/Pandoc/Readers/Org.hs b/src/Text/Pandoc/Readers/Org.hs index 7dd611be3..d593f856d 100644 --- a/src/Text/Pandoc/Readers/Org.hs +++ b/src/Text/Pandoc/Readers/Org.hs @@ -1,6 +1,3 @@ -{-# LANGUAGE OverloadedStrings #-} -{-# LANGUAGE GeneralizedNewtypeDeriving #-} -{-# LANGUAGE MultiParamTypeClasses, FlexibleContexts, FlexibleInstances #-} {- Copyright (C) 2014-2016 Albert Krewinkel <tarleb+pandoc@moltkeplatz.de> @@ -30,60 +27,35 @@ Conversion of org-mode formatted plain text to 'Pandoc' document. 
-} module Text.Pandoc.Readers.Org ( readOrg ) where -import qualified Text.Pandoc.Builder as B -import Text.Pandoc.Builder ( Inlines, Blocks, HasMeta(..), - trimInlines ) +import Text.Pandoc.Readers.Org.Blocks ( blockList, meta ) +import Text.Pandoc.Readers.Org.Parsing ( OrgParser, readWithM ) +import Text.Pandoc.Readers.Org.ParserState ( optionsToParserState ) + import Text.Pandoc.Definition -import Text.Pandoc.Compat.Monoid ((<>)) +import Text.Pandoc.Error import Text.Pandoc.Options -import qualified Text.Pandoc.Parsing as P -import Text.Pandoc.Parsing hiding ( F, unF, askF, asksF, runF - , newline, orderedListMarker - , parseFromString, blanklines - ) -import Text.Pandoc.Readers.LaTeX (inlineCommand, rawLaTeXInline) -import Text.Pandoc.Shared (compactify', compactify'DL) -import Text.TeXMath (readTeX, writePandoc, DisplayType(..)) -import qualified Text.TeXMath.Readers.MathML.EntityMap as MathMLEntityMap -import Control.Arrow (first) -import Control.Monad (foldM, guard, liftM, liftM2, mplus, mzero, when) -import Control.Monad.Reader (Reader, runReader, ask, asks, local) -import Data.Char (isAlphaNum, toLower) -import Data.Default -import Data.List (intersperse, isPrefixOf, isSuffixOf) -import qualified Data.Map as M -import qualified Data.Set as Set -import Data.Maybe (fromMaybe, isJust) -import Network.HTTP (urlEncode) +import Control.Monad.Reader ( runReader ) -import Text.Pandoc.Error -- | Parse org-mode string and return a Pandoc document. 
readOrg :: ReaderOptions -- ^ Reader options -> String -- ^ String to parse (assuming @'\n'@ line endings) -> Either PandocError Pandoc -readOrg opts s = flip runReader def $ readWithM parseOrg def{ orgStateOptions = opts } (s ++ "\n\n") - -data OrgParserLocal = OrgParserLocal { orgLocalQuoteContext :: QuoteContext } - -type OrgParser = ParserT [Char] OrgParserState (Reader OrgParserLocal) - -instance HasIdentifierList OrgParserState where - extractIdentifierList = orgStateIdentifiers - updateIdentifierList f s = s{ orgStateIdentifiers = f (orgStateIdentifiers s) } - -instance HasHeaderMap OrgParserState where - extractHeaderMap = orgStateHeaderMap - updateHeaderMap f s = s{ orgStateHeaderMap = f (orgStateHeaderMap s) } +readOrg opts s = flip runReader def $ + readWithM parseOrg (optionsToParserState opts) (s ++ "\n\n") +-- +-- Parser +-- parseOrg :: OrgParser Pandoc parseOrg = do - blocks' <- parseBlocks - st <- getState - let meta = runF (orgStateMeta' st) st - let removeUnwantedBlocks = dropCommentTrees . filter (/= Null) - return $ Pandoc meta $ removeUnwantedBlocks (B.toList $ runF blocks' st) + blocks' <- blockList + meta' <- meta + return . Pandoc meta' $ removeUnwantedBlocks blocks' + where + removeUnwantedBlocks :: [Block] -> [Block] + removeUnwantedBlocks = dropCommentTrees . filter (/= Null) -- | Drop COMMENT headers and the document tree below those headers. 
dropCommentTrees :: [Block] -> [Block] @@ -118,1504 +90,3 @@ isHeaderLevelLowerEq n blk = case blk of (Header level _ _) -> n >= level _ -> False - --- --- Parser State for Org --- - -type OrgNoteRecord = (String, F Blocks) -type OrgNoteTable = [OrgNoteRecord] - -type OrgBlockAttributes = M.Map String String - -type OrgLinkFormatters = M.Map String (String -> String) - --- | Org-mode parser state -data OrgParserState = OrgParserState - { orgStateOptions :: ReaderOptions - , orgStateAnchorIds :: [String] - , orgStateBlockAttributes :: OrgBlockAttributes - , orgStateEmphasisCharStack :: [Char] - , orgStateEmphasisNewlines :: Maybe Int - , orgStateLastForbiddenCharPos :: Maybe SourcePos - , orgStateLastPreCharPos :: Maybe SourcePos - , orgStateLastStrPos :: Maybe SourcePos - , orgStateLinkFormatters :: OrgLinkFormatters - , orgStateMeta :: Meta - , orgStateMeta' :: F Meta - , orgStateNotes' :: OrgNoteTable - , orgStateParserContext :: ParserContext - , orgStateIdentifiers :: Set.Set String - , orgStateHeaderMap :: M.Map Inlines String - } - -instance Default OrgParserLocal where - def = OrgParserLocal NoQuote - -instance HasReaderOptions OrgParserState where - extractReaderOptions = orgStateOptions - -instance HasMeta OrgParserState where - setMeta field val st = - st{ orgStateMeta = setMeta field val $ orgStateMeta st } - deleteMeta field st = - st{ orgStateMeta = deleteMeta field $ orgStateMeta st } - -instance HasLastStrPosition OrgParserState where - getLastStrPos = orgStateLastStrPos - setLastStrPos pos st = st{ orgStateLastStrPos = Just pos } - -instance HasQuoteContext st (Reader OrgParserLocal) where - getQuoteContext = asks orgLocalQuoteContext - withQuoteContext q = local (\s -> s{orgLocalQuoteContext = q}) - -instance Default OrgParserState where - def = defaultOrgParserState - -defaultOrgParserState :: OrgParserState -defaultOrgParserState = OrgParserState - { orgStateOptions = def - , orgStateAnchorIds = [] - , orgStateBlockAttributes = M.empty - , 
orgStateEmphasisCharStack = [] - , orgStateEmphasisNewlines = Nothing - , orgStateLastForbiddenCharPos = Nothing - , orgStateLastPreCharPos = Nothing - , orgStateLastStrPos = Nothing - , orgStateLinkFormatters = M.empty - , orgStateMeta = nullMeta - , orgStateMeta' = return nullMeta - , orgStateNotes' = [] - , orgStateParserContext = NullState - , orgStateIdentifiers = Set.empty - , orgStateHeaderMap = M.empty - } - -recordAnchorId :: String -> OrgParser () -recordAnchorId i = updateState $ \s -> - s{ orgStateAnchorIds = i : (orgStateAnchorIds s) } - -updateLastForbiddenCharPos :: OrgParser () -updateLastForbiddenCharPos = getPosition >>= \p -> - updateState $ \s -> s{ orgStateLastForbiddenCharPos = Just p} - -updateLastPreCharPos :: OrgParser () -updateLastPreCharPos = getPosition >>= \p -> - updateState $ \s -> s{ orgStateLastPreCharPos = Just p} - -pushToInlineCharStack :: Char -> OrgParser () -pushToInlineCharStack c = updateState $ \s -> - s{ orgStateEmphasisCharStack = c:orgStateEmphasisCharStack s } - -popInlineCharStack :: OrgParser () -popInlineCharStack = updateState $ \s -> - s{ orgStateEmphasisCharStack = drop 1 . orgStateEmphasisCharStack $ s } - -surroundingEmphasisChar :: OrgParser [Char] -surroundingEmphasisChar = - take 1 . drop 1 . 
orgStateEmphasisCharStack <$> getState - -startEmphasisNewlinesCounting :: Int -> OrgParser () -startEmphasisNewlinesCounting maxNewlines = updateState $ \s -> - s{ orgStateEmphasisNewlines = Just maxNewlines } - -decEmphasisNewlinesCount :: OrgParser () -decEmphasisNewlinesCount = updateState $ \s -> - s{ orgStateEmphasisNewlines = (\n -> n - 1) <$> orgStateEmphasisNewlines s } - -newlinesCountWithinLimits :: OrgParser Bool -newlinesCountWithinLimits = do - st <- getState - return $ ((< 0) <$> orgStateEmphasisNewlines st) /= Just True - -resetEmphasisNewlines :: OrgParser () -resetEmphasisNewlines = updateState $ \s -> - s{ orgStateEmphasisNewlines = Nothing } - -addLinkFormat :: String - -> (String -> String) - -> OrgParser () -addLinkFormat key formatter = updateState $ \s -> - let fs = orgStateLinkFormatters s - in s{ orgStateLinkFormatters = M.insert key formatter fs } - -addToNotesTable :: OrgNoteRecord -> OrgParser () -addToNotesTable note = do - oldnotes <- orgStateNotes' <$> getState - updateState $ \s -> s{ orgStateNotes' = note:oldnotes } - --- The version Text.Pandoc.Parsing cannot be used, as we need additional parts --- of the state saved and restored. -parseFromString :: OrgParser a -> String -> OrgParser a -parseFromString parser str' = do - oldLastPreCharPos <- orgStateLastPreCharPos <$> getState - updateState $ \s -> s{ orgStateLastPreCharPos = Nothing } - result <- P.parseFromString parser str' - updateState $ \s -> s{ orgStateLastPreCharPos = oldLastPreCharPos } - return result - - --- --- Adaptions and specializations of parsing utilities --- - -newtype F a = F { unF :: Reader OrgParserState a - } deriving (Monad, Applicative, Functor) - -runF :: F a -> OrgParserState -> a -runF = runReader . unF - -askF :: F OrgParserState -askF = F ask - -asksF :: (OrgParserState -> a) -> F a -asksF f = F $ asks f - -instance Monoid a => Monoid (F a) where - mempty = return mempty - mappend = liftM2 mappend - mconcat = fmap mconcat . 
sequence - -trimInlinesF :: F Inlines -> F Inlines -trimInlinesF = liftM trimInlines - -returnF :: a -> OrgParser (F a) -returnF = return . return - - --- | Like @Text.Parsec.Char.newline@, but causes additional state changes. -newline :: OrgParser Char -newline = - P.newline - <* updateLastPreCharPos - <* updateLastForbiddenCharPos - --- | Like @Text.Parsec.Char.blanklines@, but causes additional state changes. -blanklines :: OrgParser [Char] -blanklines = - P.blanklines - <* updateLastPreCharPos - <* updateLastForbiddenCharPos - --- | Succeeds when we're in list context. -inList :: OrgParser () -inList = do - ctx <- orgStateParserContext <$> getState - guard (ctx == ListItemState) - --- | Parse in different context -withContext :: ParserContext -- ^ New parser context - -> OrgParser a -- ^ Parser to run in that context - -> OrgParser a -withContext context parser = do - oldContext <- orgStateParserContext <$> getState - updateState $ \s -> s{ orgStateParserContext = context } - result <- parser - updateState $ \s -> s{ orgStateParserContext = oldContext } - return result - --- --- parsing blocks --- - -parseBlocks :: OrgParser (F Blocks) -parseBlocks = mconcat <$> manyTill block eof - -block :: OrgParser (F Blocks) -block = choice [ mempty <$ blanklines - , optionalAttributes $ choice - [ orgBlock - , figure - , table - ] - , example - , drawer - , specialLine - , header - , return <$> hline - , list - , latexFragment - , noteBlock - , paraOrPlain - ] <?> "block" - --- --- Block Attributes --- - --- | Parse optional block attributes (like #+TITLE or #+NAME) -optionalAttributes :: OrgParser (F Blocks) -> OrgParser (F Blocks) -optionalAttributes parser = try $ - resetBlockAttributes *> parseBlockAttributes *> parser - where - resetBlockAttributes :: OrgParser () - resetBlockAttributes = updateState $ \s -> - s{ orgStateBlockAttributes = orgStateBlockAttributes def } - -parseBlockAttributes :: OrgParser () -parseBlockAttributes = do - attrs <- many attribute - mapM_ 
(uncurry parseAndAddAttribute) attrs - where - attribute :: OrgParser (String, String) - attribute = try $ do - key <- metaLineStart *> many1Till nonspaceChar (char ':') - val <- skipSpaces *> anyLine - return (map toLower key, val) - -parseAndAddAttribute :: String -> String -> OrgParser () -parseAndAddAttribute key value = do - let key' = map toLower key - () <$ addBlockAttribute key' value - -lookupInlinesAttr :: String -> OrgParser (Maybe (F Inlines)) -lookupInlinesAttr attr = try $ do - val <- lookupBlockAttribute attr - maybe (return Nothing) - (fmap Just . parseFromString parseInlines) - val - -addBlockAttribute :: String -> String -> OrgParser () -addBlockAttribute key val = updateState $ \s -> - let attrs = orgStateBlockAttributes s - in s{ orgStateBlockAttributes = M.insert key val attrs } - -lookupBlockAttribute :: String -> OrgParser (Maybe String) -lookupBlockAttribute key = - M.lookup key . orgStateBlockAttributes <$> getState - - --- --- Org Blocks (#+BEGIN_... / #+END_...) --- - -type BlockProperties = (Int, String) -- (Indentation, Block-Type) - -orgBlock :: OrgParser (F Blocks) -orgBlock = try $ do - blockProp@(_, blkType) <- blockHeaderStart - ($ blockProp) $ - case blkType of - "comment" -> withRaw' (const mempty) - "html" -> withRaw' (return . (B.rawBlock blkType)) - "latex" -> withRaw' (return . (B.rawBlock blkType)) - "ascii" -> withRaw' (return . (B.rawBlock blkType)) - "example" -> withRaw' (return . 
exampleCode) - "quote" -> withParsed (fmap B.blockQuote) - "verse" -> verseBlock - "src" -> codeBlock - _ -> withParsed (fmap $ divWithClass blkType) - -blockHeaderStart :: OrgParser (Int, String) -blockHeaderStart = try $ (,) <$> indent <*> blockType - where - indent = length <$> many spaceChar - blockType = map toLower <$> (stringAnyCase "#+begin_" *> orgArgWord) - -withRaw' :: (String -> F Blocks) -> BlockProperties -> OrgParser (F Blocks) -withRaw' f blockProp = (ignHeaders *> (f <$> rawBlockContent blockProp)) - -withParsed :: (F Blocks -> F Blocks) -> BlockProperties -> OrgParser (F Blocks) -withParsed f blockProp = (ignHeaders *> (f <$> parsedBlockContent blockProp)) - -ignHeaders :: OrgParser () -ignHeaders = (() <$ newline) <|> (() <$ anyLine) - -divWithClass :: String -> Blocks -> Blocks -divWithClass cls = B.divWith ("", [cls], []) - -verseBlock :: BlockProperties -> OrgParser (F Blocks) -verseBlock blkProp = try $ do - ignHeaders - content <- rawBlockContent blkProp - fmap B.para . mconcat . intersperse (pure B.linebreak) - <$> mapM (parseFromString parseInlines) (map (++ "\n") . 
lines $ content) - -exportsCode :: [(String, String)] -> Bool -exportsCode attrs = not (("rundoc-exports", "none") `elem` attrs - || ("rundoc-exports", "results") `elem` attrs) - -exportsResults :: [(String, String)] -> Bool -exportsResults attrs = ("rundoc-exports", "results") `elem` attrs - || ("rundoc-exports", "both") `elem` attrs - -followingResultsBlock :: OrgParser (Maybe (F Blocks)) -followingResultsBlock = - optionMaybe (try $ blanklines *> stringAnyCase "#+RESULTS:" - *> blankline - *> block) - -codeBlock :: BlockProperties -> OrgParser (F Blocks) -codeBlock blkProp = do - skipSpaces - (classes, kv) <- codeHeaderArgs <|> (mempty <$ ignHeaders) - id' <- fromMaybe "" <$> lookupBlockAttribute "name" - content <- rawBlockContent blkProp - resultsContent <- followingResultsBlock - let includeCode = exportsCode kv - let includeResults = exportsResults kv - let codeBlck = B.codeBlockWith ( id', classes, kv ) content - labelledBlck <- maybe (pure codeBlck) - (labelDiv codeBlck) - <$> lookupInlinesAttr "caption" - let resultBlck = fromMaybe mempty resultsContent - return $ (if includeCode then labelledBlck else mempty) - <> (if includeResults then resultBlck else mempty) - where - labelDiv blk value = - B.divWith nullAttr <$> (mappend <$> labelledBlock value - <*> pure blk) - labelledBlock = fmap (B.plain . B.spanWith ("", ["label"], [])) - -rawBlockContent :: BlockProperties -> OrgParser String -rawBlockContent (indent, blockType) = try $ - unlines . map commaEscaped <$> manyTill indentedLine blockEnder - where - indentedLine = try $ ("" <$ blankline) <|> (indentWith indent *> anyLine) - blockEnder = try $ indentWith indent *> stringAnyCase ("#+end_" <> blockType) - -parsedBlockContent :: BlockProperties -> OrgParser (F Blocks) -parsedBlockContent blkProps = try $ do - raw <- rawBlockContent blkProps - parseFromString parseBlocks (raw ++ "\n") - --- indent by specified number of spaces (or equiv. 
tabs) -indentWith :: Int -> OrgParser String -indentWith num = do - tabStop <- getOption readerTabStop - if num < tabStop - then count num (char ' ') - else choice [ try (count num (char ' ')) - , try (char '\t' >> count (num - tabStop) (char ' ')) ] - -type SwitchOption = (Char, Maybe String) - -orgArgWord :: OrgParser String -orgArgWord = many1 orgArgWordChar - --- | Parse code block arguments --- TODO: We currently don't handle switches. -codeHeaderArgs :: OrgParser ([String], [(String, String)]) -codeHeaderArgs = try $ do - language <- skipSpaces *> orgArgWord - _ <- skipSpaces *> (try $ switch `sepBy` (many1 spaceChar)) - parameters <- manyTill blockOption newline - let pandocLang = translateLang language - return $ - if hasRundocParameters parameters - then ( [ pandocLang, rundocBlockClass ] - , map toRundocAttrib (("language", language) : parameters) - ) - else ([ pandocLang ], parameters) - where hasRundocParameters = not . null - -switch :: OrgParser SwitchOption -switch = try $ simpleSwitch <|> lineNumbersSwitch - where - simpleSwitch = (\c -> (c, Nothing)) <$> (oneOf "-+" *> letter) - lineNumbersSwitch = (\ls -> ('l', Just ls)) <$> - (string "-l \"" *> many1Till nonspaceChar (char '"')) - -translateLang :: String -> String -translateLang "C" = "c" -translateLang "C++" = "cpp" -translateLang "emacs-lisp" = "commonlisp" -- emacs lisp is not supported -translateLang "js" = "javascript" -translateLang "lisp" = "commonlisp" -translateLang "R" = "r" -translateLang "sh" = "bash" -translateLang "sqlite" = "sql" -translateLang cs = cs - --- | Prefix used for Rundoc classes and arguments. -rundocPrefix :: String -rundocPrefix = "rundoc-" - --- | The class-name used to mark rundoc blocks. 
-rundocBlockClass :: String -rundocBlockClass = rundocPrefix ++ "block" - -blockOption :: OrgParser (String, String) -blockOption = try $ do - argKey <- orgArgKey - paramValue <- option "yes" orgParamValue - return (argKey, paramValue) - -inlineBlockOption :: OrgParser (String, String) -inlineBlockOption = try $ do - argKey <- orgArgKey - paramValue <- option "yes" orgInlineParamValue - return (argKey, paramValue) - -orgArgKey :: OrgParser String -orgArgKey = try $ - skipSpaces *> char ':' - *> many1 orgArgWordChar - -orgParamValue :: OrgParser String -orgParamValue = try $ - skipSpaces - *> notFollowedBy (char ':' ) - *> many1 (noneOf "\t\n\r ") - <* skipSpaces - -orgInlineParamValue :: OrgParser String -orgInlineParamValue = try $ - skipSpaces - *> notFollowedBy (char ':') - *> many1 (noneOf "\t\n\r ]") - <* skipSpaces - -orgArgWordChar :: OrgParser Char -orgArgWordChar = alphaNum <|> oneOf "-_" - -toRundocAttrib :: (String, String) -> (String, String) -toRundocAttrib = first ("rundoc-" ++) - -commaEscaped :: String -> String -commaEscaped (',':cs@('*':_)) = cs -commaEscaped (',':cs@('#':'+':_)) = cs -commaEscaped cs = cs - -example :: OrgParser (F Blocks) -example = try $ do - return . return . 
exampleCode =<< unlines <$> many1 exampleLine - -exampleCode :: String -> Blocks -exampleCode = B.codeBlockWith ("", ["example"], []) - -exampleLine :: OrgParser String -exampleLine = try $ skipSpaces *> string ": " *> anyLine - --- Drawers for properties or a logbook -drawer :: OrgParser (F Blocks) -drawer = try $ do - drawerStart - manyTill drawerLine (try drawerEnd) - return mempty - -drawerStart :: OrgParser String -drawerStart = try $ - skipSpaces *> drawerName <* skipSpaces <* P.newline - where drawerName = try $ char ':' *> validDrawerName <* char ':' - validDrawerName = stringAnyCase "PROPERTIES" - <|> stringAnyCase "LOGBOOK" - -drawerLine :: OrgParser String -drawerLine = try anyLine - -drawerEnd :: OrgParser String -drawerEnd = try $ - skipSpaces *> stringAnyCase ":END:" <* skipSpaces <* P.newline - - --- --- Figures --- - --- Figures (Image on a line by itself, preceded by name and/or caption) -figure :: OrgParser (F Blocks) -figure = try $ do - (cap, nam) <- nameAndCaption - src <- skipSpaces *> selfTarget <* skipSpaces <* P.newline - guard (isImageFilename src) - return $ do - cap' <- cap - return $ B.para $ B.image src nam cap' - where - nameAndCaption = - do - maybeCap <- lookupInlinesAttr "caption" - maybeNam <- lookupBlockAttribute "name" - guard $ isJust maybeCap || isJust maybeNam - return ( fromMaybe mempty maybeCap - , withFigPrefix $ fromMaybe mempty maybeNam ) - withFigPrefix cs = - if "fig:" `isPrefixOf` cs - then cs - else "fig:" ++ cs - --- --- Comments, Options and Metadata -specialLine :: OrgParser (F Blocks) -specialLine = fmap return . 
try $ metaLine <|> commentLine - -metaLine :: OrgParser Blocks -metaLine = try $ mempty - <$ (metaLineStart *> (optionLine <|> declarationLine)) - -commentLine :: OrgParser Blocks -commentLine = try $ commentLineStart *> anyLine *> pure mempty - --- The order, in which blocks are tried, makes sure that we're not looking at --- the beginning of a block, so we don't need to check for it -metaLineStart :: OrgParser String -metaLineStart = try $ mappend <$> many spaceChar <*> string "#+" - -commentLineStart :: OrgParser String -commentLineStart = try $ mappend <$> many spaceChar <*> string "# " - -declarationLine :: OrgParser () -declarationLine = try $ do - key <- metaKey - inlinesF <- metaInlines - updateState $ \st -> - let meta' = B.setMeta <$> pure key <*> inlinesF <*> pure nullMeta - in st { orgStateMeta' = orgStateMeta' st <> meta' } - return () - -metaInlines :: OrgParser (F MetaValue) -metaInlines = fmap (MetaInlines . B.toList) <$> inlinesTillNewline - -metaKey :: OrgParser String -metaKey = map toLower <$> many1 (noneOf ": \n\r") - <* char ':' - <* skipSpaces - -optionLine :: OrgParser () -optionLine = try $ do - key <- metaKey - case key of - "link" -> parseLinkFormat >>= uncurry addLinkFormat - _ -> mzero - -parseLinkFormat :: OrgParser ((String, String -> String)) -parseLinkFormat = try $ do - linkType <- (:) <$> letter <*> many (alphaNum <|> oneOf "-_") <* skipSpaces - linkSubst <- parseFormat - return (linkType, linkSubst) - --- | An ad-hoc, single-argument-only implementation of a printf-style format --- parser. -parseFormat :: OrgParser (String -> String) -parseFormat = try $ do - replacePlain <|> replaceUrl <|> justAppend - where - -- inefficient, but who cares - replacePlain = try $ (\x -> concat . flip intersperse x) - <$> sequence [tillSpecifier 's', rest] - replaceUrl = try $ (\x -> concat . flip intersperse x . 
urlEncode) - <$> sequence [tillSpecifier 'h', rest] - justAppend = try $ (++) <$> rest - - rest = manyTill anyChar (eof <|> () <$ oneOf "\n\r") - tillSpecifier c = manyTill (noneOf "\n\r") (try $ string ('%':c:"")) - --- --- Headers --- - --- | Headers -header :: OrgParser (F Blocks) -header = try $ do - level <- headerStart - title <- manyTill inline (lookAhead headerEnd) - tags <- headerEnd - let inlns = trimInlinesF . mconcat $ title <> map tagToInlineF tags - st <- getState - let inlines = runF inlns st - attr <- registerHeader nullAttr inlines - return $ pure (B.headerWith attr level inlines) - where - tagToInlineF :: String -> F Inlines - tagToInlineF t = return $ B.spanWith ("", ["tag"], [("data-tag-name", t)]) mempty - -headerEnd :: OrgParser [String] -headerEnd = option [] headerTags <* newline - -headerTags :: OrgParser [String] -headerTags = try $ - skipSpaces - *> char ':' - *> many1 tag - <* skipSpaces - where tag = many1 (alphaNum <|> oneOf "@%#_") - <* char ':' - -headerStart :: OrgParser Int -headerStart = try $ - (length <$> many1 (char '*')) <* many1 (char ' ') <* updateLastPreCharPos - - --- Don't use (or need) the reader wrapper here, we want hline to be --- @show@able. Otherwise we can't use it with @notFollowedBy'@. - --- | Horizontal Line (five -- dashes or more) -hline :: OrgParser Blocks -hline = try $ do - skipSpaces - string "-----" - many (char '-') - skipSpaces - newline - return B.horizontalRule - --- --- Tables --- - -data OrgTableRow = OrgContentRow (F [Blocks]) - | OrgAlignRow [Alignment] - | OrgHlineRow - -data OrgTable = OrgTable - { orgTableColumns :: Int - , orgTableAlignments :: [Alignment] - , orgTableHeader :: [Blocks] - , orgTableRows :: [[Blocks]] - } - -table :: OrgParser (F Blocks) -table = try $ do - lookAhead tableStart - do - rows <- tableRows - cptn <- fromMaybe (pure "") <$> lookupInlinesAttr "caption" - return $ (<$> cptn) . orgToPandocTable . 
normalizeTable =<< rowsToTable rows - -orgToPandocTable :: OrgTable - -> Inlines - -> Blocks -orgToPandocTable (OrgTable _ aligns heads lns) caption = - B.table caption (zip aligns $ repeat 0) heads lns - -tableStart :: OrgParser Char -tableStart = try $ skipSpaces *> char '|' - -tableRows :: OrgParser [OrgTableRow] -tableRows = try $ many (tableAlignRow <|> tableHline <|> tableContentRow) - -tableContentRow :: OrgParser OrgTableRow -tableContentRow = try $ - OrgContentRow . sequence <$> (tableStart *> manyTill tableContentCell newline) - -tableContentCell :: OrgParser (F Blocks) -tableContentCell = try $ - fmap B.plain . trimInlinesF . mconcat <$> many1Till inline endOfCell - -endOfCell :: OrgParser Char -endOfCell = try $ char '|' <|> lookAhead newline - -tableAlignRow :: OrgParser OrgTableRow -tableAlignRow = try $ - OrgAlignRow <$> (tableStart *> manyTill tableAlignCell newline) - -tableAlignCell :: OrgParser Alignment -tableAlignCell = - choice [ try $ emptyCell *> return AlignDefault - , try $ skipSpaces - *> char '<' - *> tableAlignFromChar - <* many digit - <* char '>' - <* emptyCell - ] <?> "alignment info" - where emptyCell = try $ skipSpaces *> endOfCell - -tableAlignFromChar :: OrgParser Alignment -tableAlignFromChar = try $ choice [ char 'l' *> return AlignLeft - , char 'c' *> return AlignCenter - , char 'r' *> return AlignRight - ] - -tableHline :: OrgParser OrgTableRow -tableHline = try $ - OrgHlineRow <$ (tableStart *> char '-' *> anyLine) - -rowsToTable :: [OrgTableRow] - -> F OrgTable -rowsToTable = foldM (flip rowToContent) zeroTable - where zeroTable = OrgTable 0 mempty mempty mempty - -normalizeTable :: OrgTable - -> OrgTable -normalizeTable (OrgTable cols aligns heads lns) = - let aligns' = fillColumns aligns AlignDefault - heads' = if heads == mempty - then mempty - else fillColumns heads (B.plain mempty) - lns' = map (`fillColumns` B.plain mempty) lns - fillColumns base padding = take cols $ base ++ repeat padding - in OrgTable cols aligns' 
heads' lns' - - --- One or more horizontal rules after the first content line mark the previous --- line as a header. All other horizontal lines are discarded. -rowToContent :: OrgTableRow - -> OrgTable - -> F OrgTable -rowToContent OrgHlineRow t = maybeBodyToHeader t -rowToContent (OrgAlignRow as) t = setLongestRow as =<< setAligns as t -rowToContent (OrgContentRow rf) t = do - rs <- rf - setLongestRow rs =<< appendToBody rs t - -setLongestRow :: [a] - -> OrgTable - -> F OrgTable -setLongestRow rs t = - return t{ orgTableColumns = max (length rs) (orgTableColumns t) } - -maybeBodyToHeader :: OrgTable - -> F OrgTable -maybeBodyToHeader t = case t of - OrgTable{ orgTableHeader = [], orgTableRows = b:[] } -> - return t{ orgTableHeader = b , orgTableRows = [] } - _ -> return t - -appendToBody :: [Blocks] - -> OrgTable - -> F OrgTable -appendToBody r t = return t{ orgTableRows = orgTableRows t ++ [r] } - -setAligns :: [Alignment] - -> OrgTable - -> F OrgTable -setAligns aligns t = return $ t{ orgTableAlignments = aligns } - - --- --- LaTeX fragments --- -latexFragment :: OrgParser (F Blocks) -latexFragment = try $ do - envName <- latexEnvStart - content <- mconcat <$> manyTill anyLineNewline (latexEnd envName) - return . return $ B.rawBlock "latex" (content `inLatexEnv` envName) - where - c `inLatexEnv` e = mconcat [ "\\begin{", e, "}\n" - , c - , "\\end{", e, "}\n" - ] - -latexEnvStart :: OrgParser String -latexEnvStart = try $ do - skipSpaces *> string "\\begin{" - *> latexEnvName - <* string "}" - <* blankline - -latexEnd :: String -> OrgParser () -latexEnd envName = try $ - () <$ skipSpaces - <* string ("\\end{" ++ envName ++ "}") - <* blankline - --- | Parses a LaTeX environment name. 
-latexEnvName :: OrgParser String -latexEnvName = try $ do - mappend <$> many1 alphaNum - <*> option "" (string "*") - - --- --- Footnote defintions --- -noteBlock :: OrgParser (F Blocks) -noteBlock = try $ do - ref <- noteMarker <* skipSpaces - content <- mconcat <$> blocksTillHeaderOrNote - addToNotesTable (ref, content) - return mempty - where - blocksTillHeaderOrNote = - many1Till block (eof <|> () <$ lookAhead noteMarker - <|> () <$ lookAhead headerStart) - --- Paragraphs or Plain text -paraOrPlain :: OrgParser (F Blocks) -paraOrPlain = try $ do - ils <- parseInlines - nl <- option False (newline *> return True) - -- Read block as paragraph, except if we are in a list context and the block - -- is directly followed by a list item, in which case the block is read as - -- plain text. - try (guard nl - *> notFollowedBy (inList *> (orderedListStart <|> bulletListStart)) - *> return (B.para <$> ils)) - <|> (return (B.plain <$> ils)) - -inlinesTillNewline :: OrgParser (F Inlines) -inlinesTillNewline = trimInlinesF . mconcat <$> manyTill inline newline - - --- --- list blocks --- - -list :: OrgParser (F Blocks) -list = choice [ definitionList, bulletList, orderedList ] <?> "list" - -definitionList :: OrgParser (F Blocks) -definitionList = try $ do n <- lookAhead (bulletListStart' Nothing) - fmap B.definitionList . fmap compactify'DL . sequence - <$> many1 (definitionListItem $ bulletListStart' (Just n)) - -bulletList :: OrgParser (F Blocks) -bulletList = try $ do n <- lookAhead (bulletListStart' Nothing) - fmap B.bulletList . fmap compactify' . sequence - <$> many1 (listItem (bulletListStart' $ Just n)) - -orderedList :: OrgParser (F Blocks) -orderedList = fmap B.orderedList . fmap compactify' . sequence - <$> many1 (listItem orderedListStart) - -genericListStart :: OrgParser String - -> OrgParser Int -genericListStart listMarker = try $ - (+) <$> (length <$> many spaceChar) - <*> (length <$> listMarker <* many1 spaceChar) - --- parses bullet list marker. 
maybe we know the indent level -bulletListStart :: OrgParser Int -bulletListStart = bulletListStart' Nothing - -bulletListStart' :: Maybe Int -> OrgParser Int --- returns length of bulletList prefix, inclusive of marker -bulletListStart' Nothing = do ind <- length <$> many spaceChar - when (ind == 0) $ notFollowedBy (char '*') - oneOf bullets - many1 spaceChar - return (ind + 1) - -- Unindented lists are legal, but they can't use '*' bullets - -- We return n to maintain compatibility with the generic listItem -bulletListStart' (Just n) = do count (n-1) spaceChar - when (n == 1) $ notFollowedBy (char '*') - oneOf bullets - many1 spaceChar - return n - -bullets :: String -bullets = "*+-" - -orderedListStart :: OrgParser Int -orderedListStart = genericListStart orderedListMarker - -- Ordered list markers allowed in org-mode - where orderedListMarker = mappend <$> many1 digit <*> (pure <$> oneOf ".)") - -definitionListItem :: OrgParser Int - -> OrgParser (F (Inlines, [Blocks])) -definitionListItem parseMarkerGetLength = try $ do - markerLength <- parseMarkerGetLength - term <- manyTill (noneOf "\n\r") (try definitionMarker) - line1 <- anyLineNewline - blank <- option "" ("\n" <$ blankline) - cont <- concat <$> many (listContinuation markerLength) - term' <- parseFromString parseInlines term - contents' <- parseFromString parseBlocks $ line1 ++ blank ++ cont - return $ (,) <$> term' <*> fmap (:[]) contents' - where - definitionMarker = - spaceChar *> string "::" <* (spaceChar <|> lookAhead P.newline) - - --- parse raw text for one list item, excluding start marker and continuations -listItem :: OrgParser Int - -> OrgParser (F Blocks) -listItem start = try . 
withContext ListItemState $ do - markerLength <- try start - firstLine <- anyLineNewline - blank <- option "" ("\n" <$ blankline) - rest <- concat <$> many (listContinuation markerLength) - parseFromString parseBlocks $ firstLine ++ blank ++ rest - --- continuation of a list item - indented and separated by blankline or endline. --- Note: nested lists are parsed as continuations. -listContinuation :: Int - -> OrgParser String -listContinuation markerLength = try $ - notFollowedBy' blankline - *> (mappend <$> (concat <$> many1 listLine) - <*> many blankline) - where listLine = try $ indentWith markerLength *> anyLineNewline - -anyLineNewline :: OrgParser String -anyLineNewline = (++ "\n") <$> anyLine - - --- --- inline --- - -inline :: OrgParser (F Inlines) -inline = - choice [ whitespace - , linebreak - , cite - , footnote - , linkOrImage - , anchor - , inlineCodeBlock - , str - , endline - , emph - , strong - , strikeout - , underline - , code - , math - , displayMath - , verbatim - , subscript - , superscript - , inlineLaTeX - , smart - , symbol - ] <* (guard =<< newlinesCountWithinLimits) - <?> "inline" - -parseInlines :: OrgParser (F Inlines) -parseInlines = trimInlinesF . mconcat <$> many1 inline - --- treat these as potentially non-text when parsing inline: -specialChars :: [Char] -specialChars = "\"$'()*+-,./:<=>[\\]^_{|}~" - - -whitespace :: OrgParser (F Inlines) -whitespace = pure B.space <$ skipMany1 spaceChar - <* updateLastPreCharPos - <* updateLastForbiddenCharPos - <?> "whitespace" - -linebreak :: OrgParser (F Inlines) -linebreak = try $ pure B.linebreak <$ string "\\\\" <* skipSpaces <* newline - -str :: OrgParser (F Inlines) -str = return . B.str <$> many1 (noneOf $ specialChars ++ "\n\r ") - <* updateLastStrPos - --- | An endline character that can be treated as a space, not a structural --- break. This should reflect the values of the Emacs variable --- @org-element-pagaraph-separate@. 
-endline :: OrgParser (F Inlines) -endline = try $ do - newline - notFollowedBy blankline - notFollowedBy' exampleLine - notFollowedBy' hline - notFollowedBy' noteMarker - notFollowedBy' tableStart - notFollowedBy' drawerStart - notFollowedBy' headerStart - notFollowedBy' metaLineStart - notFollowedBy' latexEnvStart - notFollowedBy' commentLineStart - notFollowedBy' bulletListStart - notFollowedBy' orderedListStart - decEmphasisNewlinesCount - guard =<< newlinesCountWithinLimits - updateLastPreCharPos - return . return $ B.softbreak - -cite :: OrgParser (F Inlines) -cite = try $ do - guardEnabled Ext_citations - (cs, raw) <- withRaw normalCite - return $ (flip B.cite (B.text raw)) <$> cs - -normalCite :: OrgParser (F [Citation]) -normalCite = try $ char '[' - *> skipSpaces - *> citeList - <* skipSpaces - <* char ']' - -citeList :: OrgParser (F [Citation]) -citeList = sequence <$> sepBy1 citation (try $ char ';' *> skipSpaces) - -citation :: OrgParser (F Citation) -citation = try $ do - pref <- prefix - (suppress_author, key) <- citeKey - suff <- suffix - return $ do - x <- pref - y <- suff - return $ Citation{ citationId = key - , citationPrefix = B.toList x - , citationSuffix = B.toList y - , citationMode = if suppress_author - then SuppressAuthor - else NormalCitation - , citationNoteNum = 0 - , citationHash = 0 - } - where - prefix = trimInlinesF . mconcat <$> - manyTill inline (char ']' <|> (']' <$ lookAhead citeKey)) - suffix = try $ do - hasSpace <- option False (notFollowedBy nonspaceChar >> return True) - skipSpaces - rest <- trimInlinesF . mconcat <$> - many (notFollowedBy (oneOf ";]") *> inline) - return $ if hasSpace - then (B.space <>) <$> rest - else rest - -footnote :: OrgParser (F Inlines) -footnote = try $ inlineNote <|> referencedNote - -inlineNote :: OrgParser (F Inlines) -inlineNote = try $ do - string "[fn:" - ref <- many alphaNum - char ':' - note <- fmap B.para . trimInlinesF . 
mconcat <$> many1Till inline (char ']') - when (not $ null ref) $ - addToNotesTable ("fn:" ++ ref, note) - return $ B.note <$> note - -referencedNote :: OrgParser (F Inlines) -referencedNote = try $ do - ref <- noteMarker - return $ do - notes <- asksF orgStateNotes' - case lookup ref notes of - Nothing -> return $ B.str $ "[" ++ ref ++ "]" - Just contents -> do - st <- askF - let contents' = runF contents st{ orgStateNotes' = [] } - return $ B.note contents' - -noteMarker :: OrgParser String -noteMarker = try $ do - char '[' - choice [ many1Till digit (char ']') - , (++) <$> string "fn:" - <*> many1Till (noneOf "\n\r\t ") (char ']') - ] - -linkOrImage :: OrgParser (F Inlines) -linkOrImage = explicitOrImageLink - <|> selflinkOrImage - <|> angleLink - <|> plainLink - <?> "link or image" - -explicitOrImageLink :: OrgParser (F Inlines) -explicitOrImageLink = try $ do - char '[' - srcF <- applyCustomLinkFormat =<< possiblyEmptyLinkTarget - title <- enclosedRaw (char '[') (char ']') - title' <- parseFromString (mconcat <$> many inline) title - char ']' - return $ do - src <- srcF - if isImageFilename title - then pure $ B.link src "" $ B.image title mempty mempty - else linkToInlinesF src =<< title' - -selflinkOrImage :: OrgParser (F Inlines) -selflinkOrImage = try $ do - src <- char '[' *> linkTarget <* char ']' - return $ linkToInlinesF src (B.str src) - -plainLink :: OrgParser (F Inlines) -plainLink = try $ do - (orig, src) <- uri - returnF $ B.link src "" (B.str orig) - -angleLink :: OrgParser (F Inlines) -angleLink = try $ do - char '<' - link <- plainLink - char '>' - return link - -selfTarget :: OrgParser String -selfTarget = try $ char '[' *> linkTarget <* char ']' - -linkTarget :: OrgParser String -linkTarget = enclosedByPair '[' ']' (noneOf "\n\r[]") - -possiblyEmptyLinkTarget :: OrgParser String -possiblyEmptyLinkTarget = try linkTarget <|> ("" <$ string "[]") - -applyCustomLinkFormat :: String -> OrgParser (F String) -applyCustomLinkFormat link = do - let 
(linkType, rest) = break (== ':') link - return $ do - formatter <- M.lookup linkType <$> asksF orgStateLinkFormatters - return $ maybe link ($ drop 1 rest) formatter - --- | Take a link and return a function which produces new inlines when given --- description inlines. -linkToInlinesF :: String -> Inlines -> F Inlines -linkToInlinesF linkStr = - case linkStr of - "" -> pure . B.link mempty "" -- wiki link (empty by convention) - ('#':_) -> pure . B.link linkStr "" -- document-local fraction - _ -> case cleanLinkString linkStr of - (Just cleanedLink) -> if isImageFilename cleanedLink - then const . pure $ B.image cleanedLink "" "" - else pure . B.link cleanedLink "" - Nothing -> internalLink linkStr -- other internal link - --- | Cleanup and canonicalize a string describing a link. Return @Nothing@ if --- the string does not appear to be a link. -cleanLinkString :: String -> Maybe String -cleanLinkString s = - case s of - '/':_ -> Just $ "file://" ++ s -- absolute path - '.':'/':_ -> Just s -- relative path - '.':'.':'/':_ -> Just s -- relative path - -- Relative path or URL (file schema) - 'f':'i':'l':'e':':':s' -> Just $ if ("//" `isPrefixOf` s') then s else s' - _ | isUrl s -> Just s -- URL - _ -> Nothing - where - isUrl :: String -> Bool - isUrl cs = - let (scheme, path) = break (== ':') cs - in all (\c -> isAlphaNum c || c `elem` (".-"::String)) scheme - && not (null path) - -isImageFilename :: String -> Bool -isImageFilename filename = - any (\x -> ('.':x) `isSuffixOf` filename) imageExtensions && - (any (\x -> (x++":") `isPrefixOf` filename) protocols || - ':' `notElem` filename) - where - imageExtensions = [ "jpeg" , "jpg" , "png" , "gif" , "svg" ] - protocols = [ "file", "http", "https" ] - -internalLink :: String -> Inlines -> F Inlines -internalLink link title = do - anchorB <- (link `elem`) <$> asksF orgStateAnchorIds - if anchorB - then return $ B.link ('#':link) "" title - else return $ B.emph title - --- | Parse an anchor like @<<anchor-id>>@ and 
return an empty span with --- @anchor-id@ set as id. Legal anchors in org-mode are defined through --- @org-target-regexp@, which is fairly liberal. Since no link is created if --- @anchor-id@ contains spaces, we are more restrictive in what is accepted as --- an anchor. - -anchor :: OrgParser (F Inlines) -anchor = try $ do - anchorId <- parseAnchor - recordAnchorId anchorId - returnF $ B.spanWith (solidify anchorId, [], []) mempty - where - parseAnchor = string "<<" - *> many1 (noneOf "\t\n\r<>\"' ") - <* string ">>" - <* skipSpaces - --- | Replace every char but [a-zA-Z0-9_.-:] with a hypen '-'. This mirrors --- the org function @org-export-solidify-link-text@. - -solidify :: String -> String -solidify = map replaceSpecialChar - where replaceSpecialChar c - | isAlphaNum c = c - | c `elem` ("_.-:" :: String) = c - | otherwise = '-' - --- | Parses an inline code block and marks it as an babel block. -inlineCodeBlock :: OrgParser (F Inlines) -inlineCodeBlock = try $ do - string "src_" - lang <- many1 orgArgWordChar - opts <- option [] $ enclosedByPair '[' ']' inlineBlockOption - inlineCode <- enclosedByPair '{' '}' (noneOf "\n\r") - let attrClasses = [translateLang lang, rundocBlockClass] - let attrKeyVal = map toRundocAttrib (("language", lang) : opts) - returnF $ B.codeWith ("", attrClasses, attrKeyVal) inlineCode - -enclosedByPair :: Char -- ^ opening char - -> Char -- ^ closing char - -> OrgParser a -- ^ parser - -> OrgParser [a] -enclosedByPair s e p = char s *> many1Till p (char e) - -emph :: OrgParser (F Inlines) -emph = fmap B.emph <$> emphasisBetween '/' - -strong :: OrgParser (F Inlines) -strong = fmap B.strong <$> emphasisBetween '*' - -strikeout :: OrgParser (F Inlines) -strikeout = fmap B.strikeout <$> emphasisBetween '+' - --- There is no underline, so we use strong instead. -underline :: OrgParser (F Inlines) -underline = fmap B.strong <$> emphasisBetween '_' - -verbatim :: OrgParser (F Inlines) -verbatim = return . 
B.code <$> verbatimBetween '=' - -code :: OrgParser (F Inlines) -code = return . B.code <$> verbatimBetween '~' - -subscript :: OrgParser (F Inlines) -subscript = fmap B.subscript <$> try (char '_' *> subOrSuperExpr) - -superscript :: OrgParser (F Inlines) -superscript = fmap B.superscript <$> try (char '^' *> subOrSuperExpr) - -math :: OrgParser (F Inlines) -math = return . B.math <$> choice [ math1CharBetween '$' - , mathStringBetween '$' - , rawMathBetween "\\(" "\\)" - ] - -displayMath :: OrgParser (F Inlines) -displayMath = return . B.displayMath <$> choice [ rawMathBetween "\\[" "\\]" - , rawMathBetween "$$" "$$" - ] - -updatePositions :: Char - -> OrgParser (Char) -updatePositions c = do - when (c `elem` emphasisPreChars) updateLastPreCharPos - when (c `elem` emphasisForbiddenBorderChars) updateLastForbiddenCharPos - return c - -symbol :: OrgParser (F Inlines) -symbol = return . B.str . (: "") <$> (oneOf specialChars >>= updatePositions) - -emphasisBetween :: Char - -> OrgParser (F Inlines) -emphasisBetween c = try $ do - startEmphasisNewlinesCounting emphasisAllowedNewlines - res <- enclosedInlines (emphasisStart c) (emphasisEnd c) - isTopLevelEmphasis <- null . 
orgStateEmphasisCharStack <$> getState - when isTopLevelEmphasis - resetEmphasisNewlines - return res - -verbatimBetween :: Char - -> OrgParser String -verbatimBetween c = try $ - emphasisStart c *> - many1TillNOrLessNewlines 1 (noneOf "\n\r") (emphasisEnd c) - --- | Parses a raw string delimited by @c@ using Org's math rules -mathStringBetween :: Char - -> OrgParser String -mathStringBetween c = try $ do - mathStart c - body <- many1TillNOrLessNewlines mathAllowedNewlines - (noneOf (c:"\n\r")) - (lookAhead $ mathEnd c) - final <- mathEnd c - return $ body ++ [final] - --- | Parse a single character between @c@ using math rules -math1CharBetween :: Char - -> OrgParser String -math1CharBetween c = try $ do - char c - res <- noneOf $ c:mathForbiddenBorderChars - char c - eof <|> () <$ lookAhead (oneOf mathPostChars) - return [res] - -rawMathBetween :: String - -> String - -> OrgParser String -rawMathBetween s e = try $ string s *> manyTill anyChar (try $ string e) - --- | Parses the start (opening character) of emphasis -emphasisStart :: Char -> OrgParser Char -emphasisStart c = try $ do - guard =<< afterEmphasisPreChar - guard =<< notAfterString - char c - lookAhead (noneOf emphasisForbiddenBorderChars) - pushToInlineCharStack c - return c - --- | Parses the closing character of emphasis -emphasisEnd :: Char -> OrgParser Char -emphasisEnd c = try $ do - guard =<< notAfterForbiddenBorderChar - char c - eof <|> () <$ lookAhead acceptablePostChars - updateLastStrPos - popInlineCharStack - return c - where acceptablePostChars = - surroundingEmphasisChar >>= \x -> oneOf (x ++ emphasisPostChars) - -mathStart :: Char -> OrgParser Char -mathStart c = try $ - char c <* notFollowedBy' (oneOf (c:mathForbiddenBorderChars)) - -mathEnd :: Char -> OrgParser Char -mathEnd c = try $ do - res <- noneOf (c:mathForbiddenBorderChars) - char c - eof <|> () <$ lookAhead (oneOf mathPostChars) - return res - - -enclosedInlines :: OrgParser a - -> OrgParser b - -> OrgParser (F Inlines) 
-enclosedInlines start end = try $ - trimInlinesF . mconcat <$> enclosed start end inline - -enclosedRaw :: OrgParser a - -> OrgParser b - -> OrgParser String -enclosedRaw start end = try $ - start *> (onSingleLine <|> spanningTwoLines) - where onSingleLine = try $ many1Till (noneOf "\n\r") end - spanningTwoLines = try $ - anyLine >>= \f -> mappend (f <> " ") <$> onSingleLine - --- | Like many1Till, but parses at most @n+1@ lines. @p@ must not consume --- newlines. -many1TillNOrLessNewlines :: Int - -> OrgParser Char - -> OrgParser a - -> OrgParser String -many1TillNOrLessNewlines n p end = try $ - nMoreLines (Just n) mempty >>= oneOrMore - where - nMoreLines Nothing cs = return cs - nMoreLines (Just 0) cs = try $ (cs ++) <$> finalLine - nMoreLines k cs = try $ (final k cs <|> rest k cs) - >>= uncurry nMoreLines - final _ cs = (\x -> (Nothing, cs ++ x)) <$> try finalLine - rest m cs = (\x -> (minus1 <$> m, cs ++ x ++ "\n")) <$> try (manyTill p P.newline) - finalLine = try $ manyTill p end - minus1 k = k - 1 - oneOrMore cs = guard (not $ null cs) *> return cs - --- Org allows customization of the way it reads emphasis. We use the defaults --- here (see, e.g., the Emacs Lisp variable `org-emphasis-regexp-components` --- for details). - --- | Chars allowed to occur before emphasis (spaces and newlines are ok, too) -emphasisPreChars :: [Char] -emphasisPreChars = "\t \"'({" - --- | Chars allowed at after emphasis -emphasisPostChars :: [Char] -emphasisPostChars = "\t\n !\"'),-.:;?\\}" - --- | Chars not allowed at the (inner) border of emphasis -emphasisForbiddenBorderChars :: [Char] -emphasisForbiddenBorderChars = "\t\n\r \"'," - --- | The maximum number of newlines within -emphasisAllowedNewlines :: Int -emphasisAllowedNewlines = 1 - --- LaTeX-style math: see `org-latex-regexps` for details - --- | Chars allowed after an inline ($...$) math statement -mathPostChars :: [Char] -mathPostChars = "\t\n \"'),-.:;?" 
- --- | Chars not allowed at the (inner) border of math -mathForbiddenBorderChars :: [Char] -mathForbiddenBorderChars = "\t\n\r ,;.$" - --- | Maximum number of newlines in an inline math statement -mathAllowedNewlines :: Int -mathAllowedNewlines = 2 - --- | Whether we are right behind a char allowed before emphasis -afterEmphasisPreChar :: OrgParser Bool -afterEmphasisPreChar = do - pos <- getPosition - lastPrePos <- orgStateLastPreCharPos <$> getState - return . fromMaybe True $ (== pos) <$> lastPrePos - --- | Whether the parser is right after a forbidden border char -notAfterForbiddenBorderChar :: OrgParser Bool -notAfterForbiddenBorderChar = do - pos <- getPosition - lastFBCPos <- orgStateLastForbiddenCharPos <$> getState - return $ lastFBCPos /= Just pos - --- | Read a sub- or superscript expression -subOrSuperExpr :: OrgParser (F Inlines) -subOrSuperExpr = try $ - choice [ id <$> charsInBalanced '{' '}' (noneOf "\n\r") - , enclosing ('(', ')') <$> charsInBalanced '(' ')' (noneOf "\n\r") - , simpleSubOrSuperString - ] >>= parseFromString (mconcat <$> many inline) - where enclosing (left, right) s = left : s ++ [right] - -simpleSubOrSuperString :: OrgParser String -simpleSubOrSuperString = try $ - choice [ string "*" - , mappend <$> option [] ((:[]) <$> oneOf "+-") - <*> many1 alphaNum - ] - -inlineLaTeX :: OrgParser (F Inlines) -inlineLaTeX = try $ do - cmd <- inlineLaTeXCommand - maybe mzero returnF $ - parseAsMath cmd `mplus` parseAsMathMLSym cmd `mplus` parseAsInlineLaTeX cmd - where - parseAsMath :: String -> Maybe Inlines - parseAsMath cs = B.fromList <$> texMathToPandoc cs - - parseAsInlineLaTeX :: String -> Maybe Inlines - parseAsInlineLaTeX cs = maybeRight $ runParser inlineCommand state "" cs - - parseAsMathMLSym :: String -> Maybe Inlines - parseAsMathMLSym cs = B.str <$> MathMLEntityMap.getUnicode (clean cs) - -- dropWhileEnd would be nice here, but it's not available before base 4.5 - where clean = reverse . dropWhile (`elem` ("{}" :: String)) . 
reverse . drop 1 - - state :: ParserState - state = def{ stateOptions = def{ readerParseRaw = True }} - - texMathToPandoc inp = (maybeRight $ readTeX inp) >>= - writePandoc DisplayInline - -maybeRight :: Either a b -> Maybe b -maybeRight = either (const Nothing) Just - -inlineLaTeXCommand :: OrgParser String -inlineLaTeXCommand = try $ do - rest <- getInput - case runParser rawLaTeXInline def "source" rest of - Right (RawInline _ cs) -> do - let len = length cs - count len anyChar - return cs - _ -> mzero - -smart :: OrgParser (F Inlines) -smart = do - getOption readerSmart >>= guard - doubleQuoted <|> singleQuoted <|> - choice (map (return <$>) [orgApostrophe, orgDash, orgEllipses]) - where - orgDash = dash <* updatePositions '-' - orgEllipses = ellipses <* updatePositions '.' - orgApostrophe = - (char '\'' <|> char '\8217') <* updateLastPreCharPos - <* updateLastForbiddenCharPos - *> return (B.str "\x2019") - -singleQuoted :: OrgParser (F Inlines) -singleQuoted = try $ do - singleQuoteStart - updatePositions '\'' - withQuoteContext InSingleQuote $ - fmap B.singleQuoted . trimInlinesF . mconcat <$> - many1Till inline (singleQuoteEnd <* updatePositions '\'') - --- doubleQuoted will handle regular double-quoted sections, as well --- as dialogues with an open double-quote without a close double-quote --- in the same paragraph. -doubleQuoted :: OrgParser (F Inlines) -doubleQuoted = try $ do - doubleQuoteStart - updatePositions '"' - contents <- mconcat <$> many (try $ notFollowedBy doubleQuoteEnd >> inline) - (withQuoteContext InDoubleQuote $ (doubleQuoteEnd <* updateLastForbiddenCharPos) >> return - (fmap B.doubleQuoted . 
trimInlinesF $ contents)) - <|> (return $ return (B.str "\8220") <> contents) diff --git a/src/Text/Pandoc/Readers/Org/BlockStarts.hs b/src/Text/Pandoc/Readers/Org/BlockStarts.hs new file mode 100644 index 000000000..e4dc31342 --- /dev/null +++ b/src/Text/Pandoc/Readers/Org/BlockStarts.hs @@ -0,0 +1,112 @@ +{- +Copyright (C) 2014-2016 Albert Krewinkel <tarleb+pandoc@moltkeplatz.de> + +This program is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 2 of the License, or +(at your option) any later version. + +This program is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU General Public License for more details. + +You should have received a copy of the GNU General Public License +along with this program; if not, write to the Free Software +Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA +-} + +{- | + Module : Text.Pandoc.Readers.Org.BlockStarts + Copyright : Copyright (C) 2014-2016 Albert Krewinkel + License : GNU GPL, version 2 or above + + Maintainer : Albert Krewinkel <tarleb+pandoc@moltkeplatz.de> + +Parsers for the starts of Org-mode block elements.
+-} +module Text.Pandoc.Readers.Org.BlockStarts + ( exampleLineStart + , hline + , noteMarker + , tableStart + , drawerStart + , headerStart + , metaLineStart + , latexEnvStart + , commentLineStart + , bulletListStart + , orderedListStart + ) where + +import Text.Pandoc.Readers.Org.Parsing + +-- | Horizontal Line (five -- dashes or more) +hline :: OrgParser () +hline = try $ do + skipSpaces + string "-----" + many (char '-') + skipSpaces + newline + return () + +-- | Read the start of a header line, return the header level +headerStart :: OrgParser Int +headerStart = try $ + (length <$> many1 (char '*')) <* many1 (char ' ') <* updateLastPreCharPos + +tableStart :: OrgParser Char +tableStart = try $ skipSpaces *> char '|' + +latexEnvStart :: OrgParser String +latexEnvStart = try $ do + skipSpaces *> string "\\begin{" + *> latexEnvName + <* string "}" + <* blankline + where + latexEnvName :: OrgParser String + latexEnvName = try $ mappend <$> many1 alphaNum <*> option "" (string "*") + + +-- | Parses bullet list marker. 
+bulletListStart :: OrgParser ()
+bulletListStart = try $
+  choice
+  [ () <$ skipSpaces <* oneOf "+-" <* skipSpaces1
+  , () <$ skipSpaces1 <* char '*' <* skipSpaces1
+  ]
+
+genericListStart :: OrgParser String
+                 -> OrgParser Int
+genericListStart listMarker = try $
+  (+) <$> (length <$> many spaceChar)
+      <*> (length <$> listMarker <* many1 spaceChar)
+
+orderedListStart :: OrgParser Int
+orderedListStart = genericListStart orderedListMarker
+  -- Ordered list markers allowed in org-mode
+  where orderedListMarker = mappend <$> many1 digit <*> (pure <$> oneOf ".)")
+
+drawerStart :: OrgParser String
+drawerStart = try $
+  skipSpaces *> drawerName <* skipSpaces <* newline
+  where drawerName = char ':' *> manyTill nonspaceChar (char ':')
+
+metaLineStart :: OrgParser ()
+metaLineStart = try $ skipSpaces <* string "#+"
+
+commentLineStart :: OrgParser ()
+commentLineStart = try $ skipSpaces <* string "# "
+
+exampleLineStart :: OrgParser ()
+exampleLineStart = () <$ try (skipSpaces *> string ": ")
+
+noteMarker :: OrgParser String
+noteMarker = try $ do
+  char '['
+  choice [ many1Till digit (char ']')
+         , (++) <$> string "fn:"
+                <*> many1Till (noneOf "\n\r\t ") (char ']')
+         ]
diff --git a/src/Text/Pandoc/Readers/Org/Blocks.hs b/src/Text/Pandoc/Readers/Org/Blocks.hs
new file mode 100644
index 000000000..75e564f2f
--- /dev/null
+++ b/src/Text/Pandoc/Readers/Org/Blocks.hs
@@ -0,0 +1,901 @@
+{-# LANGUAGE FlexibleContexts #-}
+{-
+Copyright (C) 2014-2016 Albert Krewinkel <tarleb+pandoc@moltkeplatz.de>
+
+This program is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 2 of the License, or
+(at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; if not, write to the Free Software
+Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+-}
+
+{- |
+   Module      : Text.Pandoc.Readers.Org.Blocks
+   Copyright   : Copyright (C) 2014-2016 Albert Krewinkel
+   License     : GNU GPL, version 2 or above
+
+   Maintainer  : Albert Krewinkel <tarleb+pandoc@moltkeplatz.de>
+
+Parsers for Org-mode block elements.
+-}
+module Text.Pandoc.Readers.Org.Blocks
+  ( blockList
+  , meta
+  ) where
+
+import Text.Pandoc.Readers.Org.BlockStarts
+import Text.Pandoc.Readers.Org.Inlines
+import Text.Pandoc.Readers.Org.ParserState
+import Text.Pandoc.Readers.Org.Parsing
+import Text.Pandoc.Readers.Org.Shared
+  ( isImageFilename, rundocBlockClass, toRundocAttrib
+  , translateLang )
+
+import qualified Text.Pandoc.Builder as B
+import Text.Pandoc.Builder ( Inlines, Blocks )
+import Text.Pandoc.Definition
+import Text.Pandoc.Compat.Monoid ((<>))
+import Text.Pandoc.Options
+import Text.Pandoc.Shared ( compactify', compactify'DL )
+
+import Control.Monad ( foldM, guard, mzero )
+import Data.Char ( isSpace, toLower, toUpper)
+import Data.List ( foldl', intersperse, isPrefixOf )
+import qualified Data.Map as M
+import Data.Maybe ( fromMaybe, isNothing )
+import Network.HTTP ( urlEncode )
+
+
+--
+-- parsing blocks
+--
+
+-- | Get a list of blocks.
+blockList :: OrgParser [Block]
+blockList = do
+  blocks' <- blocks
+  st <- getState
+  return . B.toList $ runF blocks' st
+
+-- | Get the meta information saved in the state.
+meta :: OrgParser Meta
+meta = do
+  st <- getState
+  return $ runF (orgStateMeta st) st
+
+blocks :: OrgParser (F Blocks)
+blocks = mconcat <$> manyTill block eof
+
+block :: OrgParser (F Blocks)
+block = choice [ mempty <$ blanklines
+               , table
+               , orgBlock
+               , figure
+               , example
+               , genericDrawer
+               , specialLine
+               , header
+               , horizontalRule
+               , list
+               , latexFragment
+               , noteBlock
+               , paraOrPlain
+               ] <?> "block"
+
+
+--
+-- Block Attributes
+--
+
+-- | Attributes that may be added to figures (like a name or caption).
+data BlockAttributes = BlockAttributes
+  { blockAttrName :: Maybe String
+  , blockAttrCaption :: Maybe (F Inlines)
+  , blockAttrKeyValues :: [(String, String)]
+  }
+
+stringyMetaAttribute :: (String -> Bool) -> OrgParser (String, String)
+stringyMetaAttribute attrCheck = try $ do
+  metaLineStart
+  attrName <- map toUpper <$> many1Till nonspaceChar (char ':')
+  guard $ attrCheck attrName
+  skipSpaces
+  attrValue <- anyLine
+  return (attrName, attrValue)
+
+blockAttributes :: OrgParser BlockAttributes
+blockAttributes = try $ do
+  kv <- many (stringyMetaAttribute attrCheck)
+  let caption = foldl' (appendValues "CAPTION") Nothing kv
+  let kvAttrs = foldl' (appendValues "ATTR_HTML") Nothing kv
+  let name = lookup "NAME" kv
+  caption' <- maybe (return Nothing)
+                    (fmap Just . parseFromString inlines)
+                    caption
+  kvAttrs' <- parseFromString keyValues . (++ "\n") $ fromMaybe mempty kvAttrs
+  return $ BlockAttributes
+           { blockAttrName = name
+           , blockAttrCaption = caption'
+           , blockAttrKeyValues = kvAttrs'
+           }
+  where
+    attrCheck :: String -> Bool
+    attrCheck attr =
+      case attr of
+        "NAME" -> True
+        "CAPTION" -> True
+        "ATTR_HTML" -> True
+        _ -> False
+
+    appendValues :: String -> Maybe String -> (String, String) -> Maybe String
+    appendValues attrName accValue (key, value) =
+      if key /= attrName
+        then accValue
+        else case accValue of
+               Just acc -> Just $ acc ++ ' ':value
+               Nothing -> Just value
+
+keyValues :: OrgParser [(String, String)]
+keyValues = try $
+  manyTill ((,) <$> key <*> value) newline
+  where
+    key :: OrgParser String
+    key = try $ skipSpaces *> char ':' *> many1 nonspaceChar
+
+    value :: OrgParser String
+    value = skipSpaces *> manyTill anyChar endOfValue
+
+    endOfValue :: OrgParser ()
+    endOfValue =
+      lookAhead $ (() <$ try (many1 spaceChar <* key))
+              <|> () <$ newline
+
+
+--
+-- Org Blocks (#+BEGIN_... / #+END_...)
+--
+
+-- | Read an org-mode block delimited by #+BEGIN_TYPE and #+END_TYPE.
+orgBlock :: OrgParser (F Blocks)
+orgBlock = try $ do
+  blockAttrs <- blockAttributes
+  blkType <- blockHeaderStart
+  ($ blkType) $
+    case blkType of
+      "export" -> exportBlock
+      "comment" -> rawBlockLines (const mempty)
+      "html" -> rawBlockLines (return . (B.rawBlock blkType))
+      "latex" -> rawBlockLines (return . (B.rawBlock blkType))
+      "ascii" -> rawBlockLines (return . (B.rawBlock blkType))
+      "example" -> rawBlockLines (return . exampleCode)
+      "quote" -> parseBlockLines (fmap B.blockQuote)
+      "verse" -> verseBlock
+      "src" -> codeBlock blockAttrs
+      _ -> parseBlockLines (fmap $ B.divWith (mempty, [blkType], mempty))
+  where
+    blockHeaderStart :: OrgParser String
+    blockHeaderStart = try $ do
+      skipSpaces
+      blockType <- stringAnyCase "#+begin_" *> orgArgWord
+      return (map toLower blockType)
+
+rawBlockLines :: (String -> F Blocks) -> String -> OrgParser (F Blocks)
+rawBlockLines f blockType = (ignHeaders *> (f <$> rawBlockContent blockType))
+
+parseBlockLines :: (F Blocks -> F Blocks) -> String -> OrgParser (F Blocks)
+parseBlockLines f blockType = (ignHeaders *> (f <$> parsedBlockContent))
+  where
+    parsedBlockContent :: OrgParser (F Blocks)
+    parsedBlockContent = try $ do
+      raw <- rawBlockContent blockType
+      parseFromString blocks (raw ++ "\n")
+
+-- | Read the raw string content of a block
+rawBlockContent :: String -> OrgParser String
+rawBlockContent blockType = try $ do
+  blkLines <- manyTill rawLine blockEnder
+  tabLen <- getOption readerTabStop
+  return
+    . unlines
+    . stripIndent
+    . map (tabsToSpaces tabLen . commaEscaped)
+    $ blkLines
+  where
+    rawLine :: OrgParser String
+    rawLine = try $ ("" <$ blankline) <|> anyLine
+
+    blockEnder :: OrgParser ()
+    blockEnder = try $ skipSpaces <* stringAnyCase ("#+end_" <> blockType)
+
+    stripIndent :: [String] -> [String]
+    stripIndent strs = map (drop (shortestIndent strs)) strs
+
+    shortestIndent :: [String] -> Int
+    shortestIndent = minimum
+                     . map (length . takeWhile isSpace)
+                     . filter (not . null)
+
+    tabsToSpaces :: Int -> String -> String
+    tabsToSpaces _ [] = []
+    tabsToSpaces tabLen cs'@(c:cs) =
+      case c of
+        ' ' -> ' ':tabsToSpaces tabLen cs
+        '\t' -> (take tabLen $ repeat ' ') ++ tabsToSpaces tabLen cs
+        _ -> cs'
+
+    commaEscaped :: String -> String
+    commaEscaped (',':cs@('*':_)) = cs
+    commaEscaped (',':cs@('#':'+':_)) = cs
+    commaEscaped (' ':cs) = ' ':commaEscaped cs
+    commaEscaped ('\t':cs) = '\t':commaEscaped cs
+    commaEscaped cs = cs
+
+-- | Read but ignore all remaining block headers.
+ignHeaders :: OrgParser ()
+ignHeaders = (() <$ newline) <|> (() <$ anyLine)
+
+-- | Read a block containing code intended for export in specific backends
+-- only.
+exportBlock :: String -> OrgParser (F Blocks)
+exportBlock blockType = try $ do
+  exportType <- skipSpaces *> orgArgWord <* ignHeaders
+  contents <- rawBlockContent blockType
+  returnF (B.rawBlock (map toLower exportType) contents)
+
+verseBlock :: String -> OrgParser (F Blocks)
+verseBlock blockType = try $ do
+  ignHeaders
+  content <- rawBlockContent blockType
+  fmap B.para . mconcat . intersperse (pure B.linebreak)
+    <$> mapM (parseFromString inlines) (map (++ "\n") . lines $ content)
+
+-- | Read a code block and the associated results block if present. Whether
+-- either or both blocks are included in the output is determined by the
+-- "exports" argument in the block header.
+codeBlock :: BlockAttributes -> String -> OrgParser (F Blocks)
+codeBlock blockAttrs blockType = do
+  skipSpaces
+  (classes, kv) <- codeHeaderArgs <|> (mempty <$ ignHeaders)
+  content <- rawBlockContent blockType
+  resultsContent <- trailingResultsBlock
+  let id' = fromMaybe mempty $ blockAttrName blockAttrs
+  let includeCode = exportsCode kv
+  let includeResults = exportsResults kv
+  let codeBlck = B.codeBlockWith ( id', classes, kv ) content
+  let labelledBlck = maybe (pure codeBlck)
+                           (labelDiv codeBlck)
+                           (blockAttrCaption blockAttrs)
+  let resultBlck = fromMaybe mempty resultsContent
+  return $
+    (if includeCode then labelledBlck else mempty) <>
+    (if includeResults then resultBlck else mempty)
+  where
+    labelDiv :: Blocks -> F Inlines -> F Blocks
+    labelDiv blk value =
+      B.divWith nullAttr <$> (mappend <$> labelledBlock value <*> pure blk)
+
+    labelledBlock :: F Inlines -> F Blocks
+    labelledBlock = fmap (B.plain . B.spanWith ("", ["label"], []))
+
+exportsCode :: [(String, String)] -> Bool
+exportsCode attrs = not (("rundoc-exports", "none") `elem` attrs
+                         || ("rundoc-exports", "results") `elem` attrs)
+
+exportsResults :: [(String, String)] -> Bool
+exportsResults attrs = ("rundoc-exports", "results") `elem` attrs
+                       || ("rundoc-exports", "both") `elem` attrs
+
+trailingResultsBlock :: OrgParser (Maybe (F Blocks))
+trailingResultsBlock = optionMaybe . try $ do
+  blanklines
+  stringAnyCase "#+RESULTS:"
+  blankline
+  block
+
+-- | Parse code block arguments
+-- TODO: We currently don't handle switches.
+codeHeaderArgs :: OrgParser ([String], [(String, String)])
+codeHeaderArgs = try $ do
+  language <- skipSpaces *> orgArgWord
+  _ <- skipSpaces *> (try $ switch `sepBy` (many1 spaceChar))
+  parameters <- manyTill blockOption newline
+  let pandocLang = translateLang language
+  return $
+    if hasRundocParameters parameters
+    then ( [ pandocLang, rundocBlockClass ]
+         , map toRundocAttrib (("language", language) : parameters)
+         )
+    else ([ pandocLang ], parameters)
+  where
+    hasRundocParameters = not . null
+
+switch :: OrgParser (Char, Maybe String)
+switch = try $ simpleSwitch <|> lineNumbersSwitch
+  where
+    simpleSwitch = (\c -> (c, Nothing)) <$> (oneOf "-+" *> letter)
+    lineNumbersSwitch = (\ls -> ('l', Just ls)) <$>
+                        (string "-l \"" *> many1Till nonspaceChar (char '"'))
+
+blockOption :: OrgParser (String, String)
+blockOption = try $ do
+  argKey <- orgArgKey
+  paramValue <- option "yes" orgParamValue
+  return (argKey, paramValue)
+
+orgParamValue :: OrgParser String
+orgParamValue = try $
+  skipSpaces
+    *> notFollowedBy (char ':' )
+    *> many1 nonspaceChar
+    <* skipSpaces
+
+horizontalRule :: OrgParser (F Blocks)
+horizontalRule = return B.horizontalRule <$ try hline
+
+
+--
+-- Drawers
+--
+
+-- | A generic drawer which has no special meaning for org-mode.
+-- Whether or not this drawer is included in the output depends on the drawers
+-- export setting.
+genericDrawer :: OrgParser (F Blocks)
+genericDrawer = try $ do
+  name <- map toUpper <$> drawerStart
+  content <- manyTill drawerLine (try drawerEnd)
+  state <- getState
+  -- Include drawer if it is explicitly included in or not explicitly excluded
+  -- from the list of drawers that should be exported. PROPERTIES drawers are
+  -- never exported.
+  case (exportDrawers . orgStateExportSettings $ state) of
+    _ | name == "PROPERTIES" -> return mempty
+    Left names | name `elem` names -> return mempty
+    Right names | name `notElem` names -> return mempty
+    _ -> drawerDiv name <$> parseLines content
+  where
+    parseLines :: [String] -> OrgParser (F Blocks)
+    parseLines = parseFromString blocks . (++ "\n") . unlines
+
+    drawerDiv :: String -> F Blocks -> F Blocks
+    drawerDiv drawerName = fmap $ B.divWith (mempty, [drawerName, "drawer"], mempty)
+
+drawerLine :: OrgParser String
+drawerLine = anyLine
+
+drawerEnd :: OrgParser String
+drawerEnd = try $
+  skipSpaces *> stringAnyCase ":END:" <* skipSpaces <* newline
+
+-- | Read a :PROPERTIES: drawer and return the key/value pairs contained
+-- within.
+propertiesDrawer :: OrgParser [(String, String)]
+propertiesDrawer = try $ do
+  drawerType <- drawerStart
+  guard $ map toUpper drawerType == "PROPERTIES"
+  manyTill property (try drawerEnd)
+  where
+    property :: OrgParser (String, String)
+    property = try $ (,) <$> key <*> value
+
+    key :: OrgParser String
+    key = try $ skipSpaces *> char ':' *> many1Till nonspaceChar (char ':')
+
+    value :: OrgParser String
+    value = try $ skipSpaces *> manyTill anyChar (try $ skipSpaces *> newline)
+
+keyValuesToAttr :: [(String, String)] -> Attr
+keyValuesToAttr kvs =
+  let
+    lowerKvs = map (\(k, v) -> (map toLower k, v)) kvs
+    id' = fromMaybe mempty . lookup "custom_id" $ lowerKvs
+    cls = fromMaybe mempty . lookup "class" $ lowerKvs
+    kvs' = filter (flip notElem ["custom_id", "class"] . fst) lowerKvs
+  in
+    (id', words cls, kvs')
+
+
+--
+-- Figures
+--
+
+-- | Figures (Image on a line by itself, preceded by name and/or caption)
+figure :: OrgParser (F Blocks)
+figure = try $ do
+  figAttrs <- blockAttributes
+  src <- skipSpaces *> selfTarget <* skipSpaces <* newline
+  guard . not . isNothing . blockAttrCaption $ figAttrs
+  guard (isImageFilename src)
+  let figName = fromMaybe mempty $ blockAttrName figAttrs
+  let figCaption = fromMaybe mempty $ blockAttrCaption figAttrs
+  let figKeyVals = blockAttrKeyValues figAttrs
+  let attr = (mempty, mempty, figKeyVals)
+  return $ (B.para . B.imageWith attr src (withFigPrefix figName) <$> figCaption)
+  where
+    withFigPrefix :: String -> String
+    withFigPrefix cs =
+      if "fig:" `isPrefixOf` cs
+        then cs
+        else "fig:" ++ cs
+
+    selfTarget :: OrgParser String
+    selfTarget = try $ char '[' *> linkTarget <* char ']'
+
+--
+-- Examples
+--
+
+-- | Example code marked up by a leading colon.
+example :: OrgParser (F Blocks)
+example = try $ do
+  return . return . exampleCode =<< unlines <$> many1 exampleLine
+  where
+    exampleLine :: OrgParser String
+    exampleLine = try $ exampleLineStart *> anyLine
+
+exampleCode :: String -> Blocks
+exampleCode = B.codeBlockWith ("", ["example"], [])
+
+
+--
+-- Comments, Options and Metadata
+--
+
+specialLine :: OrgParser (F Blocks)
+specialLine = fmap return . try $ metaLine <|> commentLine
+
+-- The order in which blocks are tried ensures that we're not looking at
+-- the beginning of a block, so we don't need to check for it.
+metaLine :: OrgParser Blocks
+metaLine = mempty <$ metaLineStart <* (optionLine <|> declarationLine)
+
+commentLine :: OrgParser Blocks
+commentLine = commentLineStart *> anyLine *> pure mempty
+
+declarationLine :: OrgParser ()
+declarationLine = try $ do
+  key <- metaKey
+  value <- metaInlines
+  updateState $ \st ->
+    let meta' = B.setMeta key <$> value <*> pure nullMeta
+    in st { orgStateMeta = orgStateMeta st <> meta' }
+
+metaInlines :: OrgParser (F MetaValue)
+metaInlines = fmap (MetaInlines . B.toList) <$> inlinesTillNewline
+
+metaKey :: OrgParser String
+metaKey = map toLower <$> many1 (noneOf ": \n\r")
+                      <* char ':'
+                      <* skipSpaces
+
+optionLine :: OrgParser ()
+optionLine = try $ do
+  key <- metaKey
+  case key of
+    "link" -> parseLinkFormat >>= uncurry addLinkFormat
+    "options" -> () <$ sepBy spaces exportSetting
+    _ -> mzero
+
+addLinkFormat :: String
+              -> (String -> String)
+              -> OrgParser ()
+addLinkFormat key formatter = updateState $ \s ->
+  let fs = orgStateLinkFormatters s
+  in s{ orgStateLinkFormatters = M.insert key formatter fs }
+
+
+--
+-- Export Settings
+--
+
+-- | Read and process org-mode specific export options.
+exportSetting :: OrgParser ()
+exportSetting = choice
+  [ booleanSetting "^" setExportSubSuperscripts
+  , booleanSetting "'" setExportSmartQuotes
+  , booleanSetting "*" setExportEmphasizedText
+  , booleanSetting "-" setExportSpecialStrings
+  , ignoredSetting ":"
+  , ignoredSetting "<"
+  , ignoredSetting "\\n"
+  , ignoredSetting "arch"
+  , ignoredSetting "author"
+  , ignoredSetting "c"
+  , ignoredSetting "creator"
+  , complementableListSetting "d" setExportDrawers
+  , ignoredSetting "date"
+  , ignoredSetting "e"
+  , ignoredSetting "email"
+  , ignoredSetting "f"
+  , ignoredSetting "H"
+  , ignoredSetting "inline"
+  , ignoredSetting "num"
+  , ignoredSetting "p"
+  , ignoredSetting "pri"
+  , ignoredSetting "prop"
+  , ignoredSetting "stat"
+  , ignoredSetting "tags"
+  , ignoredSetting "tasks"
+  , ignoredSetting "tex"
+  , ignoredSetting "timestamp"
+  , ignoredSetting "title"
+  , ignoredSetting "toc"
+  , ignoredSetting "todo"
+  , ignoredSetting "|"
+  ] <?> "export setting"
+
+booleanSetting :: String -> ExportSettingSetter Bool -> OrgParser ()
+booleanSetting settingIdentifier setter = try $ do
+  string settingIdentifier
+  char ':'
+  value <- elispBoolean
+  updateState $ modifyExportSettings setter value
+
+-- | Read an elisp boolean. Only NIL is treated as false, non-NIL values are
+-- interpreted as true.
+elispBoolean :: OrgParser Bool
+elispBoolean = try $ do
+  value <- many1 nonspaceChar
+  return $ case map toLower value of
+             "nil" -> False
+             "{}" -> False
+             "()" -> False
+             _ -> True
+
+-- | A list or a complement list (i.e. a list starting with `not`).
+complementableListSetting :: String
+                          -> ExportSettingSetter (Either [String] [String])
+                          -> OrgParser ()
+complementableListSetting settingIdentifier setter = try $ do
+  _ <- string settingIdentifier <* char ':'
+  value <- choice [ Left <$> complementStringList
+                  , Right <$> stringList
+                  , (\b -> if b then Left [] else Right []) <$> elispBoolean
+                  ]
+  updateState $ modifyExportSettings setter value
+  where
+    -- Read a plain list of strings.
+    stringList :: OrgParser [String]
+    stringList = try $
+      char '('
+        *> sepBy elispString spaces
+        <* char ')'
+
+    -- Read an emacs lisp list specifying a complement set.
+    complementStringList :: OrgParser [String]
+    complementStringList = try $
+      string "(not "
+        *> sepBy elispString spaces
+        <* char ')'
+
+    elispString :: OrgParser String
+    elispString = try $
+      char '"'
+        *> manyTill alphaNum (char '"')
+
+ignoredSetting :: String -> OrgParser ()
+ignoredSetting s = try (() <$ string s <* char ':' <* many1 nonspaceChar)
+
+
+parseLinkFormat :: OrgParser ((String, String -> String))
+parseLinkFormat = try $ do
+  linkType <- (:) <$> letter <*> many (alphaNum <|> oneOf "-_") <* skipSpaces
+  linkSubst <- parseFormat
+  return (linkType, linkSubst)
+
+-- | An ad-hoc, single-argument-only implementation of a printf-style format
+-- parser.
+parseFormat :: OrgParser (String -> String)
+parseFormat = try $ do
+  replacePlain <|> replaceUrl <|> justAppend
+  where
+    -- inefficient, but who cares
+    replacePlain = try $ (\x -> concat . flip intersperse x)
+                     <$> sequence [tillSpecifier 's', rest]
+    replaceUrl = try $ (\x -> concat . flip intersperse x . urlEncode)
+                     <$> sequence [tillSpecifier 'h', rest]
+    justAppend = try $ (++) <$> rest
+
+    rest = manyTill anyChar (eof <|> () <$ oneOf "\n\r")
+    tillSpecifier c = manyTill (noneOf "\n\r") (try $ string ('%':c:""))
+
+--
+-- Headers
+--
+
+-- | Headers
+header :: OrgParser (F Blocks)
+header = try $ do
+  level <- headerStart
+  title <- manyTill inline (lookAhead $ optional headerTags <* newline)
+  tags <- option [] headerTags
+  newline
+  let text = tagTitle title tags
+  propAttr <- option nullAttr (keyValuesToAttr <$> propertiesDrawer)
+  attr <- registerHeader propAttr (runF text def)
+  return (B.headerWith attr level <$> text)
+  where
+    tagTitle :: [F Inlines] -> [String] -> F Inlines
+    tagTitle title tags = trimInlinesF . mconcat $ title <> map tagToInlineF tags
+
+    tagToInlineF :: String -> F Inlines
+    tagToInlineF t = return $ B.spanWith ("", ["tag"], [("data-tag-name", t)]) mempty
+
+    headerTags :: OrgParser [String]
+    headerTags = try $
+      let tag = many1 (alphaNum <|> oneOf "@%#_") <* char ':'
+      in skipSpaces
+         *> char ':'
+         *> many1 tag
+         <* skipSpaces
+
+
+--
+-- Tables
+--
+
+data OrgTableRow = OrgContentRow (F [Blocks])
+                 | OrgAlignRow [Alignment]
+                 | OrgHlineRow
+
+-- OrgTable is strongly related to the pandoc table ADT. Using the same
+-- (i.e. pandoc-global) ADT would mean that the reader would break if the
+-- global structure was to be changed, which would be bad. The final table
+-- should be generated using a builder function. Column widths aren't
+-- implemented yet, so they are not tracked here.
+data OrgTable = OrgTable
+  { orgTableAlignments :: [Alignment]
+  , orgTableHeader :: [Blocks]
+  , orgTableRows :: [[Blocks]]
+  }
+
+table :: OrgParser (F Blocks)
+table = try $ do
+  blockAttrs <- blockAttributes
+  lookAhead tableStart
+  do
+    rows <- tableRows
+    let caption = fromMaybe (return mempty) $ blockAttrCaption blockAttrs
+    return $ (<$> caption) . orgToPandocTable . normalizeTable =<< rowsToTable rows
+
+orgToPandocTable :: OrgTable
+                 -> Inlines
+                 -> Blocks
+orgToPandocTable (OrgTable aligns heads lns) caption =
+  B.table caption (zip aligns $ repeat 0) heads lns
+
+tableRows :: OrgParser [OrgTableRow]
+tableRows = try $ many (tableAlignRow <|> tableHline <|> tableContentRow)
+
+tableContentRow :: OrgParser OrgTableRow
+tableContentRow = try $
+  OrgContentRow . sequence <$> (tableStart *> many1Till tableContentCell newline)
+
+tableContentCell :: OrgParser (F Blocks)
+tableContentCell = try $
+  fmap B.plain . trimInlinesF . mconcat <$> manyTill inline endOfCell
+
+tableAlignRow :: OrgParser OrgTableRow
+tableAlignRow = try $ do
+  tableStart
+  cells <- many1Till tableAlignCell newline
+  -- Empty rows are regular (i.e. content) rows, not alignment rows.
+  guard $ any (/= AlignDefault) cells
+  return $ OrgAlignRow cells
+
+tableAlignCell :: OrgParser Alignment
+tableAlignCell =
+  choice [ try $ emptyCell *> return AlignDefault
+         , try $ skipSpaces
+                   *> char '<'
+                   *> tableAlignFromChar
+                   <* many digit
+                   <* char '>'
+                   <* emptyCell
+         ] <?> "alignment info"
+  where emptyCell = try $ skipSpaces *> endOfCell
+
+tableAlignFromChar :: OrgParser Alignment
+tableAlignFromChar = try $
+  choice [ char 'l' *> return AlignLeft
+         , char 'c' *> return AlignCenter
+         , char 'r' *> return AlignRight
+         ]
+
+tableHline :: OrgParser OrgTableRow
+tableHline = try $
+  OrgHlineRow <$ (tableStart *> char '-' *> anyLine)
+
+endOfCell :: OrgParser Char
+endOfCell = try $ char '|' <|> lookAhead newline
+
+rowsToTable :: [OrgTableRow]
+            -> F OrgTable
+rowsToTable = foldM rowToContent emptyTable
+  where emptyTable = OrgTable mempty mempty mempty
+
+normalizeTable :: OrgTable -> OrgTable
+normalizeTable (OrgTable aligns heads rows) = OrgTable aligns' heads rows
+  where
+    refRow = if heads /= mempty
+             then heads
+             else if rows == mempty then mempty else head rows
+    cols = length refRow
+    fillColumns base padding = take cols $ base ++ repeat padding
+    aligns' = fillColumns aligns AlignDefault
+
+-- One or more horizontal rules after the first content line mark the previous
+-- line as a header. All other horizontal lines are discarded.
+rowToContent :: OrgTable
+             -> OrgTableRow
+             -> F OrgTable
+rowToContent orgTable row =
+  case row of
+    OrgHlineRow -> return singleRowPromotedToHeader
+    OrgAlignRow as -> return . setAligns $ as
+    OrgContentRow cs -> appendToBody cs
+  where
+    singleRowPromotedToHeader :: OrgTable
+    singleRowPromotedToHeader = case orgTable of
+      OrgTable{ orgTableHeader = [], orgTableRows = b:[] } ->
+        orgTable{ orgTableHeader = b , orgTableRows = [] }
+      _ -> orgTable
+
+    setAligns :: [Alignment] -> OrgTable
+    setAligns aligns = orgTable{ orgTableAlignments = aligns }
+
+    appendToBody :: F [Blocks] -> F OrgTable
+    appendToBody frow = do
+      newRow <- frow
+      let oldRows = orgTableRows orgTable
+      -- NOTE: This is an inefficient O(n) operation. This should be changed
+      -- if performance ever becomes a problem.
+      return orgTable{ orgTableRows = oldRows ++ [newRow] }
+
+
+--
+-- LaTeX fragments
+--
+latexFragment :: OrgParser (F Blocks)
+latexFragment = try $ do
+  envName <- latexEnvStart
+  content <- mconcat <$> manyTill anyLineNewline (latexEnd envName)
+  return . return $ B.rawBlock "latex" (content `inLatexEnv` envName)
+  where
+    c `inLatexEnv` e = mconcat [ "\\begin{", e, "}\n"
+                               , c
+                               , "\\end{", e, "}\n"
+                               ]
+
+latexEnd :: String -> OrgParser ()
+latexEnd envName = try $
+  () <$ skipSpaces
+     <* string ("\\end{" ++ envName ++ "}")
+     <* blankline
+
+
+--
+-- Footnote definitions
+--
+noteBlock :: OrgParser (F Blocks)
+noteBlock = try $ do
+  ref <- noteMarker <* skipSpaces
+  content <- mconcat <$> blocksTillHeaderOrNote
+  addToNotesTable (ref, content)
+  return mempty
+  where
+    blocksTillHeaderOrNote =
+      many1Till block (eof <|> () <$ lookAhead noteMarker
+                           <|> () <$ lookAhead headerStart)
+
+-- Paragraphs or Plain text
+paraOrPlain :: OrgParser (F Blocks)
+paraOrPlain = try $ do
+  ils <- inlines
+  nl <- option False (newline *> return True)
+  -- Read block as paragraph, except if we are in a list context and the block
+  -- is directly followed by a list item, in which case the block is read as
+  -- plain text.
+  try (guard nl
+       *> notFollowedBy (inList *> (() <$ orderedListStart <|> bulletListStart))
+       *> return (B.para <$> ils))
+    <|> (return (B.plain <$> ils))
+
+inlinesTillNewline :: OrgParser (F Inlines)
+inlinesTillNewline = trimInlinesF . mconcat <$> manyTill inline newline
+
+
+--
+-- list blocks
+--
+
+list :: OrgParser (F Blocks)
+list = choice [ definitionList, bulletList, orderedList ] <?> "list"
+
+definitionList :: OrgParser (F Blocks)
+definitionList = try $ do n <- lookAhead (bulletListStart' Nothing)
+                          fmap B.definitionList . fmap compactify'DL . sequence
+                            <$> many1 (definitionListItem $ bulletListStart' (Just n))
+
+bulletList :: OrgParser (F Blocks)
+bulletList = try $ do n <- lookAhead (bulletListStart' Nothing)
+                      fmap B.bulletList . fmap compactify' . sequence
+                        <$> many1 (listItem (bulletListStart' $ Just n))
+
+orderedList :: OrgParser (F Blocks)
+orderedList = fmap B.orderedList . fmap compactify' . sequence
+              <$> many1 (listItem orderedListStart)
+
+bulletListStart' :: Maybe Int -> OrgParser Int
+-- returns length of bulletList prefix, inclusive of marker
+bulletListStart' Nothing = do ind <- length <$> many spaceChar
+                              oneOf (bullets $ ind == 0)
+                              skipSpaces1
+                              return (ind + 1)
+bulletListStart' (Just n) = do count (n-1) spaceChar
+                               oneOf (bullets $ n == 1)
+                               many1 spaceChar
+                               return n
+
+-- Unindented lists are legal, but they can't use '*' bullets.
+-- We return n to maintain compatibility with the generic listItem.
+bullets :: Bool -> String
+bullets unindented = if unindented then "+-" else "*+-"
+
+definitionListItem :: OrgParser Int
+                   -> OrgParser (F (Inlines, [Blocks]))
+definitionListItem parseMarkerGetLength = try $ do
+  markerLength <- parseMarkerGetLength
+  term <- manyTill (noneOf "\n\r") (try definitionMarker)
+  line1 <- anyLineNewline
+  blank <- option "" ("\n" <$ blankline)
+  cont <- concat <$> many (listContinuation markerLength)
+  term' <- parseFromString inlines term
+  contents' <- parseFromString blocks $ line1 ++ blank ++ cont
+  return $ (,) <$> term' <*> fmap (:[]) contents'
+  where
+    definitionMarker =
+      spaceChar *> string "::" <* (spaceChar <|> lookAhead newline)
+
+
+-- parse raw text for one list item, excluding start marker and continuations
+listItem :: OrgParser Int
+         -> OrgParser (F Blocks)
+listItem start = try . withContext ListItemState $ do
+  markerLength <- try start
+  firstLine <- anyLineNewline
+  blank <- option "" ("\n" <$ blankline)
+  rest <- concat <$> many (listContinuation markerLength)
+  parseFromString blocks $ firstLine ++ blank ++ rest
+
+-- continuation of a list item - indented and separated by blankline or endline.
+-- Note: nested lists are parsed as continuations.
+listContinuation :: Int
+                 -> OrgParser String
+listContinuation markerLength = try $
+  notFollowedBy' blankline
+  *> (mappend <$> (concat <$> many1 listLine)
+              <*> many blankline)
+  where
+    listLine = try $ indentWith markerLength *> anyLineNewline
+
+    -- indent by specified number of spaces (or equiv. tabs)
+    indentWith :: Int -> OrgParser String
+    indentWith num = do
+      tabStop <- getOption readerTabStop
+      if num < tabStop
+        then count num (char ' ')
+        else choice [ try (count num (char ' '))
+                    , try (char '\t' >> count (num - tabStop) (char ' ')) ]
+
+-- | Parse any line, include the final newline in the output.
+anyLineNewline :: OrgParser String
+anyLineNewline = (++ "\n") <$> anyLine
diff --git a/src/Text/Pandoc/Readers/Org/Inlines.hs b/src/Text/Pandoc/Readers/Org/Inlines.hs
new file mode 100644
index 000000000..001aeb569
--- /dev/null
+++ b/src/Text/Pandoc/Readers/Org/Inlines.hs
@@ -0,0 +1,762 @@
+{-# LANGUAGE OverloadedStrings #-}
+{-
+Copyright (C) 2014-2016 Albert Krewinkel <tarleb+pandoc@moltkeplatz.de>
+
+This program is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 2 of the License, or
+(at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; if not, write to the Free Software
+Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+-}
+
+{- |
+   Module      : Text.Pandoc.Readers.Org.Inlines
+   Copyright   : Copyright (C) 2014-2016 Albert Krewinkel
+   License     : GNU GPL, version 2 or above
+
+   Maintainer  : Albert Krewinkel <tarleb+pandoc@moltkeplatz.de>
+
+Parsers for Org-mode inline elements.
+-}
+module Text.Pandoc.Readers.Org.Inlines
+  ( inline
+  , inlines
+  , addToNotesTable
+  , linkTarget
+  ) where
+
+import Text.Pandoc.Readers.Org.BlockStarts
+import Text.Pandoc.Readers.Org.ParserState
+import Text.Pandoc.Readers.Org.Parsing
+import Text.Pandoc.Readers.Org.Shared
+  ( isImageFilename, rundocBlockClass, toRundocAttrib
+  , translateLang )
+
+import qualified Text.Pandoc.Builder as B
+import Text.Pandoc.Builder ( Inlines )
+import Text.Pandoc.Definition
+import Text.Pandoc.Compat.Monoid ( (<>) )
+import Text.Pandoc.Options
+import Text.Pandoc.Readers.LaTeX ( inlineCommand, rawLaTeXInline )
+import Text.TeXMath ( readTeX, writePandoc, DisplayType(..) )
+import qualified Text.TeXMath.Readers.MathML.EntityMap as MathMLEntityMap
+
+import Control.Monad ( guard, mplus, mzero, when )
+import Data.Char ( isAlphaNum, isSpace )
+import Data.List ( isPrefixOf )
+import Data.Maybe ( fromMaybe )
+import qualified Data.Map as M
+
+--
+-- Functions acting on the parser state
+--
+recordAnchorId :: String -> OrgParser ()
+recordAnchorId i = updateState $ \s ->
+  s{ orgStateAnchorIds = i : (orgStateAnchorIds s) }
+
+pushToInlineCharStack :: Char -> OrgParser ()
+pushToInlineCharStack c = updateState $ \s ->
+  s{ orgStateEmphasisCharStack = c:orgStateEmphasisCharStack s }
+
+popInlineCharStack :: OrgParser ()
+popInlineCharStack = updateState $ \s ->
+  s{ orgStateEmphasisCharStack = drop 1 . orgStateEmphasisCharStack $ s }
+
+surroundingEmphasisChar :: OrgParser [Char]
+surroundingEmphasisChar =
+  take 1 . drop 1 . orgStateEmphasisCharStack <$> getState
+
+startEmphasisNewlinesCounting :: Int -> OrgParser ()
+startEmphasisNewlinesCounting maxNewlines = updateState $ \s ->
+  s{ orgStateEmphasisNewlines = Just maxNewlines }
+
+decEmphasisNewlinesCount :: OrgParser ()
+decEmphasisNewlinesCount = updateState $ \s ->
+  s{ orgStateEmphasisNewlines = (\n -> n - 1) <$> orgStateEmphasisNewlines s }
+
+newlinesCountWithinLimits :: OrgParser Bool
+newlinesCountWithinLimits = do
+  st <- getState
+  return $ ((< 0) <$> orgStateEmphasisNewlines st) /= Just True
+
+resetEmphasisNewlines :: OrgParser ()
+resetEmphasisNewlines = updateState $ \s ->
+  s{ orgStateEmphasisNewlines = Nothing }
+
+addToNotesTable :: OrgNoteRecord -> OrgParser ()
+addToNotesTable note = do
+  oldnotes <- orgStateNotes' <$> getState
+  updateState $ \s -> s{ orgStateNotes' = note:oldnotes }
+
+-- | Parse a single Org-mode inline element
+inline :: OrgParser (F Inlines)
+inline =
+  choice [ whitespace
+         , linebreak
+         , cite
+         , footnote
+         , linkOrImage
+         , anchor
+         , inlineCodeBlock
+         , str
+         , endline
+         , emphasizedText
+         , code
+         , math
+         , displayMath
+         , verbatim
+         , subscript
+         , superscript
+         , inlineLaTeX
+         , smart
+         , symbol
+         ] <* (guard =<< newlinesCountWithinLimits)
+  <?> "inline"
+
+-- | Read the rest of the input as inlines.
+inlines :: OrgParser (F Inlines)
+inlines = trimInlinesF . mconcat <$> many1 inline
+
+-- treat these as potentially non-text when parsing inline:
+specialChars :: [Char]
+specialChars = "\"$'()*+-,./:<=>[\\]^_{|}~"
+
+
+whitespace :: OrgParser (F Inlines)
+whitespace = pure B.space <$ skipMany1 spaceChar
+                          <* updateLastPreCharPos
+                          <* updateLastForbiddenCharPos
+             <?> "whitespace"
+
+linebreak :: OrgParser (F Inlines)
+linebreak = try $ pure B.linebreak <$ string "\\\\" <* skipSpaces <* newline
+
+str :: OrgParser (F Inlines)
+str = return . B.str <$> many1 (noneOf $ specialChars ++ "\n\r ")
+      <* updateLastStrPos
+
+-- | An endline character that can be treated as a space, not a structural
+-- break. This should reflect the value of the Emacs variable
+-- @org-element-paragraph-separate@.
+endline :: OrgParser (F Inlines)
+endline = try $ do
+  newline
+  notFollowedBy blankline
+  notFollowedBy' exampleLineStart
+  notFollowedBy' hline
+  notFollowedBy' noteMarker
+  notFollowedBy' tableStart
+  notFollowedBy' drawerStart
+  notFollowedBy' headerStart
+  notFollowedBy' metaLineStart
+  notFollowedBy' latexEnvStart
+  notFollowedBy' commentLineStart
+  notFollowedBy' bulletListStart
+  notFollowedBy' orderedListStart
+  decEmphasisNewlinesCount
+  guard =<< newlinesCountWithinLimits
+  updateLastPreCharPos
+  return . return $ B.softbreak
+
+cite :: OrgParser (F Inlines)
+cite = try $ do
+  guardEnabled Ext_citations
+  (cs, raw) <- withRaw (pandocOrgCite <|> orgRefCite)
+  return $ (flip B.cite (B.text raw)) <$> cs
+
+-- | A citation in Pandoc Org-mode style (@[\@citekey]@).
+pandocOrgCite :: OrgParser (F [Citation])
+pandocOrgCite = try $
+  char '[' *> skipSpaces *> citeList <* skipSpaces <* char ']'
+
+orgRefCite :: OrgParser (F [Citation])
+orgRefCite = try $ normalOrgRefCite <|> (fmap (:[]) <$> linkLikeOrgRefCite)
+
+normalOrgRefCite :: OrgParser (F [Citation])
+normalOrgRefCite = try $ do
+  mode <- orgRefCiteMode
+  sequence <$> sepBy1 (orgRefCiteList mode) (char ',')
+  where
+    -- | A list of org-ref style citation keys, parsed as citation of the given
+    -- citation mode.
+    orgRefCiteList :: CitationMode -> OrgParser (F Citation)
+    orgRefCiteList citeMode = try $ do
+      key <- orgRefCiteKey
+      returnF $ Citation
+        { citationId = key
+        , citationPrefix = mempty
+        , citationSuffix = mempty
+        , citationMode = citeMode
+        , citationNoteNum = 0
+        , citationHash = 0
+        }
+
+-- | Read a link-like org-ref style citation. The citation includes pre and
+-- post text. However, multiple citations are not possible due to limitations
+-- in the syntax.
+linkLikeOrgRefCite :: OrgParser (F Citation)
+linkLikeOrgRefCite = try $ do
+  _ <- string "[["
+  mode <- orgRefCiteMode
+  key <- orgRefCiteKey
+  _ <- string "]["
+  pre <- trimInlinesF . mconcat <$> manyTill inline (try $ string "::")
+  spc <- option False (True <$ spaceChar)
+  suf <- trimInlinesF . mconcat <$> manyTill inline (try $ string "]]")
+  return $ do
+    pre' <- pre
+    suf' <- suf
+    return Citation
+      { citationId = key
+      , citationPrefix = B.toList pre'
+      , citationSuffix = B.toList (if spc then B.space <> suf' else suf')
+      , citationMode = mode
+      , citationNoteNum = 0
+      , citationHash = 0
+      }
+
+-- | Read a citation key. The characters allowed in citation keys are taken
+-- from the `org-ref-cite-re` variable in `org-ref.el`.
+orgRefCiteKey :: OrgParser String
+orgRefCiteKey = try . many1 . satisfy $ \c ->
+  isAlphaNum c || c `elem` ("-_:\\./"::String)
+
+-- | Supported citation types. Only a small subset of org-ref types is
+-- supported for now. TODO: rewrite this, use LaTeX reader as template.
+orgRefCiteMode :: OrgParser CitationMode
+orgRefCiteMode =
+  choice $ map (\(s, mode) -> mode <$ try (string s <* char ':'))
+    [ ("cite", AuthorInText)
+    , ("citep", NormalCitation)
+    , ("citep*", NormalCitation)
+    , ("citet", AuthorInText)
+    , ("citet*", AuthorInText)
+    , ("citeyear", SuppressAuthor)
+    ]
+
+citeList :: OrgParser (F [Citation])
+citeList = sequence <$> sepBy1 citation (try $ char ';' *> skipSpaces)
+
+citation :: OrgParser (F Citation)
+citation = try $ do
+  pref <- prefix
+  (suppress_author, key) <- citeKey
+  suff <- suffix
+  return $ do
+    x <- pref
+    y <- suff
+    return $ Citation{ citationId = key
+                     , citationPrefix = B.toList x
+                     , citationSuffix = B.toList y
+                     , citationMode = if suppress_author
+                                        then SuppressAuthor
+                                        else NormalCitation
+                     , citationNoteNum = 0
+                     , citationHash = 0
+                     }
+  where
+    prefix = trimInlinesF .
mconcat <$> + manyTill inline (char ']' <|> (']' <$ lookAhead citeKey)) + suffix = try $ do + hasSpace <- option False (notFollowedBy nonspaceChar >> return True) + skipSpaces + rest <- trimInlinesF . mconcat <$> + many (notFollowedBy (oneOf ";]") *> inline) + return $ if hasSpace + then (B.space <>) <$> rest + else rest + +footnote :: OrgParser (F Inlines) +footnote = try $ inlineNote <|> referencedNote + +inlineNote :: OrgParser (F Inlines) +inlineNote = try $ do + string "[fn:" + ref <- many alphaNum + char ':' + note <- fmap B.para . trimInlinesF . mconcat <$> many1Till inline (char ']') + when (not $ null ref) $ + addToNotesTable ("fn:" ++ ref, note) + return $ B.note <$> note + +referencedNote :: OrgParser (F Inlines) +referencedNote = try $ do + ref <- noteMarker + return $ do + notes <- asksF orgStateNotes' + case lookup ref notes of + Nothing -> return $ B.str $ "[" ++ ref ++ "]" + Just contents -> do + st <- askF + let contents' = runF contents st{ orgStateNotes' = [] } + return $ B.note contents' + +linkOrImage :: OrgParser (F Inlines) +linkOrImage = explicitOrImageLink + <|> selflinkOrImage + <|> angleLink + <|> plainLink + <?> "link or image" + +explicitOrImageLink :: OrgParser (F Inlines) +explicitOrImageLink = try $ do + char '[' + srcF <- applyCustomLinkFormat =<< possiblyEmptyLinkTarget + title <- enclosedRaw (char '[') (char ']') + title' <- parseFromString (mconcat <$> many inline) title + char ']' + return $ do + src <- srcF + if isImageFilename title + then pure $ B.link src "" $ B.image title mempty mempty + else linkToInlinesF src =<< title' + +selflinkOrImage :: OrgParser (F Inlines) +selflinkOrImage = try $ do + src <- char '[' *> linkTarget <* char ']' + return $ linkToInlinesF src (B.str src) + +plainLink :: OrgParser (F Inlines) +plainLink = try $ do + (orig, src) <- uri + returnF $ B.link src "" (B.str orig) + +angleLink :: OrgParser (F Inlines) +angleLink = try $ do + char '<' + link <- plainLink + char '>' + return link + +linkTarget 
:: OrgParser String +linkTarget = enclosedByPair '[' ']' (noneOf "\n\r[]") + +possiblyEmptyLinkTarget :: OrgParser String +possiblyEmptyLinkTarget = try linkTarget <|> ("" <$ string "[]") + +applyCustomLinkFormat :: String -> OrgParser (F String) +applyCustomLinkFormat link = do + let (linkType, rest) = break (== ':') link + return $ do + formatter <- M.lookup linkType <$> asksF orgStateLinkFormatters + return $ maybe link ($ drop 1 rest) formatter + +-- | Take a link and return a function which produces new inlines when given +-- description inlines. +linkToInlinesF :: String -> Inlines -> F Inlines +linkToInlinesF linkStr = + case linkStr of + "" -> pure . B.link mempty "" -- wiki link (empty by convention) + ('#':_) -> pure . B.link linkStr "" -- document-local fraction + _ -> case cleanLinkString linkStr of + (Just cleanedLink) -> if isImageFilename cleanedLink + then const . pure $ B.image cleanedLink "" "" + else pure . B.link cleanedLink "" + Nothing -> internalLink linkStr -- other internal link + +-- | Cleanup and canonicalize a string describing a link. Return @Nothing@ if +-- the string does not appear to be a link. 
+cleanLinkString :: String -> Maybe String
+cleanLinkString s =
+  case s of
+    '/':_ -> Just $ "file://" ++ s  -- absolute path
+    '.':'/':_ -> Just s  -- relative path
+    '.':'.':'/':_ -> Just s  -- relative path
+    -- Relative path or URL (file schema)
+    'f':'i':'l':'e':':':s' -> Just $ if ("//" `isPrefixOf` s') then s else s'
+    _ | isUrl s -> Just s  -- URL
+    _ -> Nothing
+ where
+   isUrl :: String -> Bool
+   isUrl cs =
+     let (scheme, path) = break (== ':') cs
+     in all (\c -> isAlphaNum c || c `elem` (".-"::String)) scheme
+          && not (null path)
+
+internalLink :: String -> Inlines -> F Inlines
+internalLink link title = do
+  anchorB <- (link `elem`) <$> asksF orgStateAnchorIds
+  if anchorB
+    then return $ B.link ('#':link) "" title
+    else return $ B.emph title
+
+-- | Parse an anchor like @<<anchor-id>>@ and return an empty span with
+-- @anchor-id@ set as id.  Legal anchors in org-mode are defined through
+-- @org-target-regexp@, which is fairly liberal.  Since no link is created if
+-- @anchor-id@ contains spaces, we are more restrictive in what is accepted as
+-- an anchor.
+
+anchor :: OrgParser (F Inlines)
+anchor = try $ do
+  anchorId <- parseAnchor
+  recordAnchorId anchorId
+  returnF $ B.spanWith (solidify anchorId, [], []) mempty
+ where
+       parseAnchor = string "<<"
+                     *> many1 (noneOf "\t\n\r<>\"' ")
+                     <* string ">>"
+                     <* skipSpaces
+
+-- | Replace every char but [a-zA-Z0-9_.-:] with a hyphen '-'.  This mirrors
+-- the org function @org-export-solidify-link-text@.
+
+solidify :: String -> String
+solidify = map replaceSpecialChar
+ where replaceSpecialChar c
+           | isAlphaNum c = c
+           | c `elem` ("_.-:" :: String) = c
+           | otherwise = '-'
+
+-- | Parses an inline code block and marks it as a babel block.
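The `inlineCodeBlock` parser defined next reads Org inline source blocks of the form `src_LANG[:key value ...]{CODE}`, where a header argument given without a value defaults to `"yes"`. As a rough, standalone illustration only, the sketch below models that syntax on plain strings using nothing but `base`. `InlineSrc`, `parseInlineSrc`, `parseOpts`, `parseBracketOpts` and `parseBody` are hypothetical names, not Pandoc functions, and edge cases (for example an empty `[]`) may behave differently from the real Parsec-based parser.

```haskell
import Control.Monad (guard)
import Data.Char (isAlphaNum)
import Data.List (stripPrefix)

-- Hypothetical, simplified model of src_LANG[:key value ...]{CODE};
-- this is not Pandoc code.
data InlineSrc = InlineSrc
  { srcLang :: String
  , srcOpts :: [(String, String)]
  , srcCode :: String
  } deriving (Eq, Show)

-- Same character class as orgArgWordChar: alphanumerics, '-' and '_'.
isArgWordChar :: Char -> Bool
isArgWordChar c = isAlphaNum c || c `elem` "-_"

-- Header arguments: ":key" optionally followed by a value that does not
-- start with ':'. A key without a value defaults to "yes".
parseOpts :: [String] -> Maybe [(String, String)]
parseOpts [] = Just []
parseOpts ((':' : key) : rest) = do
  guard (not (null key) && all isArgWordChar key)
  case rest of
    (val@(c : _) : rest') | c /= ':' -> ((key, val) :) <$> parseOpts rest'
    _                                -> ((key, "yes") :) <$> parseOpts rest
parseOpts _ = Nothing

-- Optional "[...]" section holding the header arguments.
parseBracketOpts :: String -> Maybe ([(String, String)], String)
parseBracketOpts ('[' : r) =
  case break (== ']') r of
    (inside, ']' : after) -> do
      opts <- parseOpts (words inside)
      Just (opts, after)
    _ -> Nothing
parseBracketOpts s = Just ([], s)

-- Mandatory, non-empty "{...}" code section that must stay on one line.
parseBody :: String -> Maybe String
parseBody ('{' : r) =
  case break (== '}') r of
    (body, '}' : _) | not (null body) && all (`notElem` "\n\r") body -> Just body
    _ -> Nothing
parseBody _ = Nothing

parseInlineSrc :: String -> Maybe InlineSrc
parseInlineSrc s = do
  afterPrefix <- stripPrefix "src_" s
  let (lang, rest) = span isArgWordChar afterPrefix
  guard (not (null lang))
  (opts, rest') <- parseBracketOpts rest
  code <- parseBody rest'
  Just (InlineSrc lang opts code)
```

For instance, `parseInlineSrc "src_R[:results]{x}"` yields `Just (InlineSrc "R" [("results","yes")] "x")`. Like the parser in the diff, the code section ends at the first `}`.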
+inlineCodeBlock :: OrgParser (F Inlines)
+inlineCodeBlock = try $ do
+  string "src_"
+  lang <- many1 orgArgWordChar
+  opts <- option [] $ enclosedByPair '[' ']' inlineBlockOption
+  inlineCode <- enclosedByPair '{' '}' (noneOf "\n\r")
+  let attrClasses = [translateLang lang, rundocBlockClass]
+  let attrKeyVal = map toRundocAttrib (("language", lang) : opts)
+  returnF $ B.codeWith ("", attrClasses, attrKeyVal) inlineCode
+ where
+   inlineBlockOption :: OrgParser (String, String)
+   inlineBlockOption = try $ do
+     argKey <- orgArgKey
+     paramValue <- option "yes" orgInlineParamValue
+     return (argKey, paramValue)
+
+   orgInlineParamValue :: OrgParser String
+   orgInlineParamValue = try $
+     skipSpaces
+       *> notFollowedBy (char ':')
+       *> many1 (noneOf "\t\n\r ]")
+       <* skipSpaces
+
+
+emphasizedText :: OrgParser (F Inlines)
+emphasizedText = do
+  state <- getState
+  guard . exportEmphasizedText . orgStateExportSettings $ state
+  try $ choice
+    [ emph
+    , strong
+    , strikeout
+    , underline
+    ]
+
+enclosedByPair :: Char          -- ^ opening char
+               -> Char          -- ^ closing char
+               -> OrgParser a   -- ^ parser
+               -> OrgParser [a]
+enclosedByPair s e p = char s *> many1Till p (char e)
+
+emph :: OrgParser (F Inlines)
+emph = fmap B.emph <$> emphasisBetween '/'
+
+strong :: OrgParser (F Inlines)
+strong = fmap B.strong <$> emphasisBetween '*'
+
+strikeout :: OrgParser (F Inlines)
+strikeout = fmap B.strikeout <$> emphasisBetween '+'
+
+-- There is no underline, so we use strong instead.
+underline :: OrgParser (F Inlines)
+underline = fmap B.strong <$> emphasisBetween '_'
+
+verbatim :: OrgParser (F Inlines)
+verbatim = return . B.code <$> verbatimBetween '='
+
+code :: OrgParser (F Inlines)
+code = return . B.code <$> verbatimBetween '~'
+
+subscript :: OrgParser (F Inlines)
+subscript = fmap B.subscript <$> try (char '_' *> subOrSuperExpr)
+
+superscript :: OrgParser (F Inlines)
+superscript = fmap B.superscript <$> try (char '^' *> subOrSuperExpr)
+
+math :: OrgParser (F Inlines)
+math = return . B.math <$> choice [ math1CharBetween '$'
+                                  , mathStringBetween '$'
+                                  , rawMathBetween "\\(" "\\)"
+                                  ]
+
+displayMath :: OrgParser (F Inlines)
+displayMath = return . B.displayMath <$> choice [ rawMathBetween "\\[" "\\]"
+                                                , rawMathBetween "$$" "$$"
+                                                ]
+
+updatePositions :: Char
+                -> OrgParser (Char)
+updatePositions c = do
+  when (c `elem` emphasisPreChars) updateLastPreCharPos
+  when (c `elem` emphasisForbiddenBorderChars) updateLastForbiddenCharPos
+  return c
+
+symbol :: OrgParser (F Inlines)
+symbol = return . B.str . (: "") <$> (oneOf specialChars >>= updatePositions)
+
+emphasisBetween :: Char
+                -> OrgParser (F Inlines)
+emphasisBetween c = try $ do
+  startEmphasisNewlinesCounting emphasisAllowedNewlines
+  res <- enclosedInlines (emphasisStart c) (emphasisEnd c)
+  isTopLevelEmphasis <- null . orgStateEmphasisCharStack <$> getState
+  when isTopLevelEmphasis
+       resetEmphasisNewlines
+  return res
+
+verbatimBetween :: Char
+                -> OrgParser String
+verbatimBetween c = try $
+  emphasisStart c *>
+  many1TillNOrLessNewlines 1 (noneOf "\n\r") (emphasisEnd c)
+
+-- | Parses a raw string delimited by @c@ using Org's math rules
+mathStringBetween :: Char
+                  -> OrgParser String
+mathStringBetween c = try $ do
+  mathStart c
+  body <- many1TillNOrLessNewlines mathAllowedNewlines
+                                   (noneOf (c:"\n\r"))
+                                   (lookAhead $ mathEnd c)
+  final <- mathEnd c
+  return $ body ++ [final]
+
+-- | Parse a single character between @c@ using math rules
+math1CharBetween :: Char
+                 -> OrgParser String
+math1CharBetween c = try $ do
+  char c
+  res <- noneOf $ c:mathForbiddenBorderChars
+  char c
+  eof <|> () <$ lookAhead (oneOf mathPostChars)
+  return [res]
+
+rawMathBetween :: String
+               -> String
+               -> OrgParser String
+rawMathBetween s e = try $ string s *> manyTill anyChar (try $ string e)
+
+-- | Parses the start (opening character) of emphasis
+emphasisStart :: Char -> OrgParser Char
+emphasisStart c = try $ do
+  guard =<< afterEmphasisPreChar
+  guard =<< notAfterString
+  char c
+  lookAhead (noneOf emphasisForbiddenBorderChars)
+  pushToInlineCharStack c
+  return c
+
+-- | Parses the closing character of emphasis
+emphasisEnd :: Char -> OrgParser Char
+emphasisEnd c = try $ do
+  guard =<< notAfterForbiddenBorderChar
+  char c
+  eof <|> () <$ lookAhead acceptablePostChars
+  updateLastStrPos
+  popInlineCharStack
+  return c
+ where acceptablePostChars =
+           surroundingEmphasisChar >>= \x -> oneOf (x ++ emphasisPostChars)
+
+mathStart :: Char -> OrgParser Char
+mathStart c = try $
+  char c <* notFollowedBy' (oneOf (c:mathForbiddenBorderChars))
+
+mathEnd :: Char -> OrgParser Char
+mathEnd c = try $ do
+  res <- noneOf (c:mathForbiddenBorderChars)
+  char c
+  eof <|> () <$ lookAhead (oneOf mathPostChars)
+  return res
+
+
+enclosedInlines :: OrgParser a
+                -> OrgParser b
+                -> OrgParser (F Inlines)
+enclosedInlines start end = try $
+  trimInlinesF . mconcat <$> enclosed start end inline
+
+enclosedRaw :: OrgParser a
+            -> OrgParser b
+            -> OrgParser String
+enclosedRaw start end = try $
+  start *> (onSingleLine <|> spanningTwoLines)
+ where onSingleLine = try $ many1Till (noneOf "\n\r") end
+       spanningTwoLines = try $
+         anyLine >>= \f -> mappend (f <> " ") <$> onSingleLine
+
+-- | Like many1Till, but parses at most @n+1@ lines.  @p@ must not consume
+--   newlines.
+many1TillNOrLessNewlines :: Int
+                         -> OrgParser Char
+                         -> OrgParser a
+                         -> OrgParser String
+many1TillNOrLessNewlines n p end = try $
+  nMoreLines (Just n) mempty >>= oneOrMore
+ where
+   nMoreLines Nothing cs = return cs
+   nMoreLines (Just 0) cs = try $ (cs ++) <$> finalLine
+   nMoreLines k cs = try $ (final k cs <|> rest k cs)
+                           >>= uncurry nMoreLines
+   final _ cs = (\x -> (Nothing, cs ++ x)) <$> try finalLine
+   rest m cs = (\x -> (minus1 <$> m, cs ++ x ++ "\n")) <$> try (manyTill p newline)
+   finalLine = try $ manyTill p end
+   minus1 k = k - 1
+   oneOrMore cs = guard (not $ null cs) *> return cs
+
+-- Org allows customization of the way it reads emphasis.  We use the defaults
+-- here (see, e.g., the Emacs Lisp variable `org-emphasis-regexp-components`
+-- for details).
+
+-- | Chars allowed to occur before emphasis (spaces and newlines are ok, too)
+emphasisPreChars :: [Char]
+emphasisPreChars = "\t \"'({"
+
+-- | Chars allowed after emphasis
+emphasisPostChars :: [Char]
+emphasisPostChars = "\t\n !\"'),-.:;?\\}"
+
+-- | Chars not allowed at the (inner) border of emphasis
+emphasisForbiddenBorderChars :: [Char]
+emphasisForbiddenBorderChars = "\t\n\r \"',"
+
+-- | The maximum number of newlines within emphasis
+emphasisAllowedNewlines :: Int
+emphasisAllowedNewlines = 1
+
+-- LaTeX-style math: see `org-latex-regexps` for details
+
+-- | Chars allowed after an inline ($...$) math statement
+mathPostChars :: [Char]
+mathPostChars = "\t\n \"'),-.:;?"
+
+-- | Chars not allowed at the (inner) border of math
+mathForbiddenBorderChars :: [Char]
+mathForbiddenBorderChars = "\t\n\r ,;.$"
+
+-- | Maximum number of newlines in an inline math statement
+mathAllowedNewlines :: Int
+mathAllowedNewlines = 2
+
+-- | Whether we are right behind a char allowed before emphasis
+afterEmphasisPreChar :: OrgParser Bool
+afterEmphasisPreChar = do
+  pos <- getPosition
+  lastPrePos <- orgStateLastPreCharPos <$> getState
+  return . fromMaybe True $ (== pos) <$> lastPrePos
+
+-- | Whether the parser is right after a forbidden border char
+notAfterForbiddenBorderChar :: OrgParser Bool
+notAfterForbiddenBorderChar = do
+  pos <- getPosition
+  lastFBCPos <- orgStateLastForbiddenCharPos <$> getState
+  return $ lastFBCPos /= Just pos
+
+-- | Read a sub- or superscript expression
+subOrSuperExpr :: OrgParser (F Inlines)
+subOrSuperExpr = try $
+  choice [ id <$> charsInBalanced '{' '}' (noneOf "\n\r")
+         , enclosing ('(', ')') <$> charsInBalanced '(' ')' (noneOf "\n\r")
+         , simpleSubOrSuperString
+         ] >>= parseFromString (mconcat <$> many inline)
+ where enclosing (left, right) s = left : s ++ [right]
+
+simpleSubOrSuperString :: OrgParser String
+simpleSubOrSuperString = try $ do
+  state <- getState
+  guard . exportSubSuperscripts . orgStateExportSettings $ state
+  choice [ string "*"
+         , mappend <$> option [] ((:[]) <$> oneOf "+-")
+                   <*> many1 alphaNum
+         ]
+
+inlineLaTeX :: OrgParser (F Inlines)
+inlineLaTeX = try $ do
+  cmd <- inlineLaTeXCommand
+  maybe mzero returnF $
+     parseAsMath cmd `mplus` parseAsMathMLSym cmd `mplus` parseAsInlineLaTeX cmd
+ where
+   parseAsMath :: String -> Maybe Inlines
+   parseAsMath cs = B.fromList <$> texMathToPandoc cs
+
+   parseAsInlineLaTeX :: String -> Maybe Inlines
+   parseAsInlineLaTeX cs = maybeRight $ runParser inlineCommand state "" cs
+
+   parseAsMathMLSym :: String -> Maybe Inlines
+   parseAsMathMLSym cs = B.str <$> MathMLEntityMap.getUnicode (clean cs)
+     -- drop initial backslash and any trailing "{}"
+     where clean = dropWhileEnd (`elem` ("{}" :: String)) . drop 1
+
+   state :: ParserState
+   state = def{ stateOptions = def{ readerParseRaw = True }}
+
+   texMathToPandoc :: String -> Maybe [Inline]
+   texMathToPandoc cs = (maybeRight $ readTeX cs) >>= writePandoc DisplayInline
+
+maybeRight :: Either a b -> Maybe b
+maybeRight = either (const Nothing) Just
+
+inlineLaTeXCommand :: OrgParser String
+inlineLaTeXCommand = try $ do
+  rest <- getInput
+  case runParser rawLaTeXInline def "source" rest of
+    Right (RawInline _ cs) -> do
+      -- drop any trailing whitespace; it is not part of the command as
+      -- far as org mode is concerned.
+      let cmdNoSpc = dropWhileEnd isSpace cs
+      let len = length cmdNoSpc
+      count len anyChar
+      return cmdNoSpc
+    _ -> mzero
+
+-- Taken from Data.OldList.
+dropWhileEnd :: (a -> Bool) -> [a] -> [a]
+dropWhileEnd p = foldr (\x xs -> if p x && null xs then [] else x : xs) []
+
+smart :: OrgParser (F Inlines)
+smart = do
+  getOption readerSmart >>= guard
+  doubleQuoted <|> singleQuoted <|>
+    choice (map (return <$>) [orgApostrophe, orgDash, orgEllipses])
+ where
+   orgDash = do
+     guard =<< getExportSetting exportSpecialStrings
+     dash <* updatePositions '-'
+   orgEllipses = do
+     guard =<< getExportSetting exportSpecialStrings
+     ellipses <* updatePositions '.'
+   orgApostrophe =
+     (char '\'' <|> char '\8217') <* updateLastPreCharPos
+                                  <* updateLastForbiddenCharPos
+                                  *> return (B.str "\x2019")
+
+singleQuoted :: OrgParser (F Inlines)
+singleQuoted = try $ do
+  guard =<< getExportSetting exportSmartQuotes
+  singleQuoteStart
+  updatePositions '\''
+  withQuoteContext InSingleQuote $
+    fmap B.singleQuoted . trimInlinesF . mconcat <$>
+      many1Till inline (singleQuoteEnd <* updatePositions '\'')
+
+-- doubleQuoted will handle regular double-quoted sections, as well
+-- as dialogues with an open double-quote without a close double-quote
+-- in the same paragraph.
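The `doubleQuoted` parser that follows makes one decision: if a closing double quote turns up, the collected inlines are wrapped with `B.doubleQuoted`; otherwise (an unterminated dialogue line) it returns a literal opening quote `\8220` followed by the same contents. The sketch below reduces that decision to plain strings as an illustration under stated simplifications. `Quoted`, `quoteParagraph` and `render` are hypothetical names, and nested markup, quote contexts and paragraph boundaries are all ignored.

```haskell
-- Hypothetical model of the doubleQuoted fallback on plain strings.
data Quoted
  = DoubleQuoted String  -- a closing quote was found
  | OpenQuote String     -- no closing quote: dialogue-style opening quote only
  deriving (Eq, Show)

-- The argument is the text right after an opening '"'. The second
-- component of the result is whatever follows the closing quote.
quoteParagraph :: String -> (Quoted, String)
quoteParagraph afterOpen =
  case break (== '"') afterOpen of
    (inner, '"' : rest) -> (DoubleQuoted inner, rest)
    _                   -> (OpenQuote afterOpen, "")

-- Render with curly quotes; '\8220' is the same opening quote the
-- parser emits in its fallback branch.
render :: Quoted -> String
render (DoubleQuoted s) = '\8220' : s ++ "\8221"
render (OpenQuote s)    = '\8220' : s
```

In Pandoc itself the closed case stays a structured `DoubleQuoted` inline and the writer chooses the quote characters; only the unterminated case inserts a literal character.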
+doubleQuoted :: OrgParser (F Inlines)
+doubleQuoted = try $ do
+  guard =<< getExportSetting exportSmartQuotes
+  doubleQuoteStart
+  updatePositions '"'
+  contents <- mconcat <$> many (try $ notFollowedBy doubleQuoteEnd >> inline)
+  (withQuoteContext InDoubleQuote $ (doubleQuoteEnd <* updateLastForbiddenCharPos) >> return
+       (fmap B.doubleQuoted . trimInlinesF $ contents))
+   <|> (return $ return (B.str "\8220") <> contents)
diff --git a/src/Text/Pandoc/Readers/Org/ParserState.hs b/src/Text/Pandoc/Readers/Org/ParserState.hs
new file mode 100644
index 000000000..0c58183f9
--- /dev/null
+++ b/src/Text/Pandoc/Readers/Org/ParserState.hs
@@ -0,0 +1,237 @@
+{-# LANGUAGE FlexibleInstances #-}
+{-# LANGUAGE GeneralizedNewtypeDeriving #-}
+{-# LANGUAGE MultiParamTypeClasses #-}
+{-
+Copyright (C) 2014-2016 Albert Krewinkel <tarleb+pandoc@moltkeplatz.de>
+
+This program is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 2 of the License, or
+(at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; if not, write to the Free Software
+Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+-}
+
+{- |
+   Module      : Text.Pandoc.Readers.Org.ParserState
+   Copyright   : Copyright (C) 2014-2016 Albert Krewinkel
+   License     : GNU GPL, version 2 or above
+
+   Maintainer  : Albert Krewinkel <tarleb+pandoc@moltkeplatz.de>
+
+Define the Org-mode parser state.
+-}
+module Text.Pandoc.Readers.Org.ParserState
+  ( OrgParserState (..)
+  , OrgParserLocal (..)
+  , OrgNoteRecord
+  , HasReaderOptions (..)
+  , HasQuoteContext (..)
+  , F(..)
+  , askF
+  , asksF
+  , trimInlinesF
+  , runF
+  , returnF
+  , ExportSettingSetter
+  , ExportSettings (..)
+  , setExportDrawers
+  , setExportEmphasizedText
+  , setExportSmartQuotes
+  , setExportSpecialStrings
+  , setExportSubSuperscripts
+  , modifyExportSettings
+  , optionsToParserState
+  ) where
+
+import Control.Monad (liftM, liftM2)
+import Control.Monad.Reader (Reader, runReader, ask, asks, local)
+
+import Data.Default (Default(..))
+import qualified Data.Map as M
+import qualified Data.Set as Set
+
+import Text.Pandoc.Builder ( Inlines, Blocks, trimInlines )
+import Text.Pandoc.Definition ( Meta(..), nullMeta )
+import Text.Pandoc.Options ( ReaderOptions(..) )
+import Text.Pandoc.Parsing ( HasHeaderMap(..)
+                           , HasIdentifierList(..)
+                           , HasLastStrPosition(..)
+                           , HasQuoteContext(..)
+                           , HasReaderOptions(..)
+                           , ParserContext(..)
+                           , QuoteContext(..)
+                           , SourcePos )
+
+-- | An inline note / footnote containing the note key and its (inline) value.
+type OrgNoteRecord = (String, F Blocks)
+-- | Table of footnotes
+type OrgNoteTable = [OrgNoteRecord]
+-- | Map of functions for link transformations.  The map key refers to the
+-- link type; the corresponding function transforms the given link string.
+type OrgLinkFormatters = M.Map String (String -> String)
+
+-- | Export settings <http://orgmode.org/manual/Export-settings.html>
+-- These settings can be changed via OPTIONS statements.
+data ExportSettings = ExportSettings
+  { exportDrawers :: Either [String] [String]
+    -- ^ Specify drawer names which should be exported.  @Left@ names are
+    -- explicitly excluded from the resulting output while @Right@ means that
+    -- only the listed drawer names should be included.
+  , exportEmphasizedText :: Bool -- ^ Parse emphasized text
+  , exportSmartQuotes :: Bool -- ^ Parse quotes smartly
+  , exportSpecialStrings :: Bool -- ^ Parse ellipses and dashes smartly
+  , exportSubSuperscripts :: Bool -- ^ TeX-like syntax for sub- and superscripts
+  }
+
+-- | Org-mode parser state
+data OrgParserState = OrgParserState
+  { orgStateAnchorIds :: [String]
+  , orgStateEmphasisCharStack :: [Char]
+  , orgStateEmphasisNewlines :: Maybe Int
+  , orgStateExportSettings :: ExportSettings
+  , orgStateHeaderMap :: M.Map Inlines String
+  , orgStateIdentifiers :: Set.Set String
+  , orgStateLastForbiddenCharPos :: Maybe SourcePos
+  , orgStateLastPreCharPos :: Maybe SourcePos
+  , orgStateLastStrPos :: Maybe SourcePos
+  , orgStateLinkFormatters :: OrgLinkFormatters
+  , orgStateMeta :: F Meta
+  , orgStateNotes' :: OrgNoteTable
+  , orgStateOptions :: ReaderOptions
+  , orgStateParserContext :: ParserContext
+  }
+
+data OrgParserLocal = OrgParserLocal { orgLocalQuoteContext :: QuoteContext }
+
+instance Default OrgParserLocal where
+  def = OrgParserLocal NoQuote
+
+instance HasReaderOptions OrgParserState where
+  extractReaderOptions = orgStateOptions
+
+instance HasLastStrPosition OrgParserState where
+  getLastStrPos = orgStateLastStrPos
+  setLastStrPos pos st = st{ orgStateLastStrPos = Just pos }
+
+instance HasQuoteContext st (Reader OrgParserLocal) where
+  getQuoteContext = asks orgLocalQuoteContext
+  withQuoteContext q = local (\s -> s{orgLocalQuoteContext = q})
+
+instance HasIdentifierList OrgParserState where
+  extractIdentifierList = orgStateIdentifiers
+  updateIdentifierList f s = s{ orgStateIdentifiers = f (orgStateIdentifiers s) }
+
+instance HasHeaderMap OrgParserState where
+  extractHeaderMap = orgStateHeaderMap
+  updateHeaderMap f s = s{ orgStateHeaderMap = f (orgStateHeaderMap s) }
+
+instance Default ExportSettings where
+  def = defaultExportSettings
+
+instance Default OrgParserState where
+  def = defaultOrgParserState
+
+defaultOrgParserState :: OrgParserState
+defaultOrgParserState = OrgParserState
+  { orgStateAnchorIds = []
+  , orgStateEmphasisCharStack = []
+  , orgStateEmphasisNewlines = Nothing
+  , orgStateExportSettings = def
+  , orgStateHeaderMap = M.empty
+  , orgStateIdentifiers = Set.empty
+  , orgStateLastForbiddenCharPos = Nothing
+  , orgStateLastPreCharPos = Nothing
+  , orgStateLastStrPos = Nothing
+  , orgStateLinkFormatters = M.empty
+  , orgStateMeta = return nullMeta
+  , orgStateNotes' = []
+  , orgStateOptions = def
+  , orgStateParserContext = NullState
+  }
+
+defaultExportSettings :: ExportSettings
+defaultExportSettings = ExportSettings
+  { exportDrawers = Left ["LOGBOOK"]
+  , exportEmphasizedText = True
+  , exportSmartQuotes = True
+  , exportSpecialStrings = True
+  , exportSubSuperscripts = True
+  }
+
+optionsToParserState :: ReaderOptions -> OrgParserState
+optionsToParserState opts =
+  def { orgStateOptions = opts }
+
+
+--
+-- Setter for exporting options
+--
+type ExportSettingSetter a = a -> ExportSettings -> ExportSettings
+
+-- | Set export options for drawers.  See the @exportDrawers@ in ADT
+-- @ExportSettings@ for details.
+setExportDrawers :: ExportSettingSetter (Either [String] [String])
+setExportDrawers val es = es { exportDrawers = val }
+
+-- | Set export options for emphasis parsing.
+setExportEmphasizedText :: ExportSettingSetter Bool
+setExportEmphasizedText val es = es { exportEmphasizedText = val }
+
+-- | Set export options for parsing of smart quotes.
+setExportSmartQuotes :: ExportSettingSetter Bool
+setExportSmartQuotes val es = es { exportSmartQuotes = val }
+
+-- | Set export options for parsing of special strings (like em/en dashes or
+-- ellipses).
+setExportSpecialStrings :: ExportSettingSetter Bool
+setExportSpecialStrings val es = es { exportSpecialStrings = val }
+
+-- | Set export options for sub/superscript parsing.  The short syntax will
+-- not be parsed if this is set to @False@.
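Every setter in this module has the type `ExportSettingSetter a`, i.e. `a -> ExportSettings -> ExportSettings`, so setters can be partially applied, composed with `.`, or handed to `modifyExportSettings`. The standalone sketch below copies that pattern with a record trimmed to two fields and adds a hypothetical `applyOptions` helper that reads `#+OPTIONS:`-style tokens such as `^:nil` (sub/superscripts) and `*:nil` (emphasis). Pandoc's actual OPTIONS parsing is not part of this diff and accepts more keys and values than this simplification.

```haskell
-- Standalone sketch of the setter pattern; the record is trimmed and
-- applyOption is a hypothetical simplification, not Pandoc's OPTIONS parser.
data ExportSettings = ExportSettings
  { exportEmphasizedText  :: Bool  -- toggled by the "*" option
  , exportSubSuperscripts :: Bool  -- toggled by the "^" option
  } deriving (Eq, Show)

defaultExportSettings :: ExportSettings
defaultExportSettings = ExportSettings
  { exportEmphasizedText  = True
  , exportSubSuperscripts = True
  }

type ExportSettingSetter a = a -> ExportSettings -> ExportSettings

setExportEmphasizedText :: ExportSettingSetter Bool
setExportEmphasizedText val es = es { exportEmphasizedText = val }

setExportSubSuperscripts :: ExportSettingSetter Bool
setExportSubSuperscripts val es = es { exportSubSuperscripts = val }

-- In this simplification every value except "nil" counts as enabled,
-- and unknown keys leave the settings untouched.
applyOption :: String -> ExportSettings -> ExportSettings
applyOption token =
  case break (== ':') token of
    ("*", ':' : val) -> setExportEmphasizedText (val /= "nil")
    ("^", ':' : val) -> setExportSubSuperscripts (val /= "nil")
    _                -> id

-- Apply the whitespace-separated tokens of an OPTIONS line, left to right.
applyOptions :: String -> ExportSettings -> ExportSettings
applyOptions line settings = foldl (flip applyOption) settings (words line)
```

Because a setter is just a function on the settings record, the same values compose freely, for example `setExportEmphasizedText False . setExportSubSuperscripts False`.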
+setExportSubSuperscripts :: ExportSettingSetter Bool
+setExportSubSuperscripts val es = es { exportSubSuperscripts = val }
+
+-- | Modify a parser state
+modifyExportSettings :: ExportSettingSetter a -> a -> OrgParserState -> OrgParserState
+modifyExportSettings setter val state =
+  state { orgStateExportSettings = setter val . orgStateExportSettings $ state }
+
+
+--
+-- Parser state reader
+--
+
+-- | Reader monad wrapping the parser state.  This is used to delay evaluation
+-- until all relevant information has been parsed and made available in the
+-- parser state.  See also the newtype of the same name in
+-- Text.Pandoc.Parsing.
+newtype F a = F { unF :: Reader OrgParserState a
+                } deriving (Functor, Applicative, Monad)
+
+instance Monoid a => Monoid (F a) where
+  mempty = return mempty
+  mappend = liftM2 mappend
+  mconcat = fmap mconcat . sequence
+
+runF :: F a -> OrgParserState -> a
+runF = runReader . unF
+
+askF :: F OrgParserState
+askF = F ask
+
+asksF :: (OrgParserState -> a) -> F a
+asksF f = F $ asks f
+
+trimInlinesF :: F Inlines -> F Inlines
+trimInlinesF = liftM trimInlines
+
+returnF :: Monad m => a -> m (F a)
+returnF = return . return
diff --git a/src/Text/Pandoc/Readers/Org/Parsing.hs b/src/Text/Pandoc/Readers/Org/Parsing.hs
new file mode 100644
index 000000000..8cf0c696c
--- /dev/null
+++ b/src/Text/Pandoc/Readers/Org/Parsing.hs
@@ -0,0 +1,214 @@
+{-
+Copyright (C) 2014-2016 Albert Krewinkel <tarleb+pandoc@moltkeplatz.de>
+
+This program is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 2 of the License, or
+(at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; if not, write to the Free Software
+Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+-}
+
+{- |
+   Module      : Text.Pandoc.Readers.Org.Parsing
+   Copyright   : Copyright (C) 2014-2016 Albert Krewinkel
+   License     : GNU GPL, version 2 or above
+
+   Maintainer  : Albert Krewinkel <tarleb+pandoc@moltkeplatz.de>
+
+Org-mode parsing utilities.
+
+Most functions are simply re-exports from @Text.Pandoc.Parsing@; some
+functions are adapted to Org-mode specific functionality.
+-}
+module Text.Pandoc.Readers.Org.Parsing
+  ( OrgParser
+  , anyLine
+  , blanklines
+  , newline
+  , parseFromString
+  , skipSpaces1
+  , inList
+  , withContext
+  , getExportSetting
+  , updateLastForbiddenCharPos
+  , updateLastPreCharPos
+  , orgArgKey
+  , orgArgWord
+  , orgArgWordChar
+  -- * Re-exports from Text.Pandoc.Parsing
+  , ParserContext (..)
+  , many1Till
+  , notFollowedBy'
+  , spaceChar
+  , nonspaceChar
+  , skipSpaces
+  , blankline
+  , enclosed
+  , stringAnyCase
+  , charsInBalanced
+  , uri
+  , withRaw
+  , readWithM
+  , guardEnabled
+  , updateLastStrPos
+  , notAfterString
+  , ParserState (..)
+  , registerHeader
+  , QuoteContext (..)
+  , singleQuoteStart
+  , singleQuoteEnd
+  , doubleQuoteStart
+  , doubleQuoteEnd
+  , dash
+  , ellipses
+  , citeKey
+  -- * Re-exports from Text.Parsec
+  , runParser
+  , getInput
+  , char
+  , letter
+  , digit
+  , alphaNum
+  , skipMany1
+  , spaces
+  , anyChar
+  , satisfy
+  , string
+  , count
+  , eof
+  , noneOf
+  , oneOf
+  , lookAhead
+  , notFollowedBy
+  , many
+  , many1
+  , manyTill
+  , (<|>)
+  , (<?>)
+  , choice
+  , try
+  , sepBy
+  , sepBy1
+  , option
+  , optional
+  , optionMaybe
+  , getState
+  , updateState
+  , SourcePos
+  , getPosition
+  ) where
+
+import Text.Pandoc.Readers.Org.ParserState
+
+import qualified Text.Pandoc.Parsing as P
+import Text.Pandoc.Parsing hiding ( anyLine, blanklines, newline
+                                  , parseFromString )
+
+import Control.Monad ( guard )
+import Control.Monad.Reader ( Reader )
+
+-- | The parser used to read org files.
+type OrgParser = ParserT [Char] OrgParserState (Reader OrgParserLocal)
+
+--
+-- Adaptions and specializations of parsing utilities
+--
+
+-- | Parse any line of text
+anyLine :: OrgParser String
+anyLine =
+  P.anyLine
+    <* updateLastPreCharPos
+    <* updateLastForbiddenCharPos
+
+-- The version in Text.Pandoc.Parsing cannot be used, as we need additional
+-- parts of the state saved and restored.
+parseFromString :: OrgParser a -> String -> OrgParser a
+parseFromString parser str' = do
+  oldLastPreCharPos <- orgStateLastPreCharPos <$> getState
+  updateState $ \s -> s{ orgStateLastPreCharPos = Nothing }
+  result <- P.parseFromString parser str'
+  updateState $ \s -> s{ orgStateLastPreCharPos = oldLastPreCharPos }
+  return result
+
+-- | Skip one or more tab or space characters.
+skipSpaces1 :: OrgParser ()
+skipSpaces1 = skipMany1 spaceChar
+
+-- | Like @Text.Parsec.Char.newline@, but causes additional state changes.
+newline :: OrgParser Char
+newline =
+  P.newline
+    <* updateLastPreCharPos
+    <* updateLastForbiddenCharPos
+
+-- | Like @Text.Pandoc.Parsing.blanklines@, but causes additional state changes.
+blanklines :: OrgParser [Char] +blanklines = + P.blanklines + <* updateLastPreCharPos + <* updateLastForbiddenCharPos + +-- | Succeeds when we're in list context. +inList :: OrgParser () +inList = do + ctx <- orgStateParserContext <$> getState + guard (ctx == ListItemState) + +-- | Parse in different context +withContext :: ParserContext -- ^ New parser context + -> OrgParser a -- ^ Parser to run in that context + -> OrgParser a +withContext context parser = do + oldContext <- orgStateParserContext <$> getState + updateState $ \s -> s{ orgStateParserContext = context } + result <- parser + updateState $ \s -> s{ orgStateParserContext = oldContext } + return result + +-- +-- Parser state functions +-- + +-- | Get an export setting. +getExportSetting :: (ExportSettings -> a) -> OrgParser a +getExportSetting s = s . orgStateExportSettings <$> getState + +-- | Set the current position as the last position at which a forbidden char +-- was found (i.e. a character which is not allowed at the inner border of +-- markup). +updateLastForbiddenCharPos :: OrgParser () +updateLastForbiddenCharPos = getPosition >>= \p -> + updateState $ \s -> s{ orgStateLastForbiddenCharPos = Just p} + +-- | Set the current parser position as the position at which a character was +-- seen which allows inline markup to follow. +updateLastPreCharPos :: OrgParser () +updateLastPreCharPos = getPosition >>= \p -> + updateState $ \s -> s{ orgStateLastPreCharPos = Just p} + +-- +-- Org key-value parsing +-- + +-- | Read the key of a plist style key-value list. +orgArgKey :: OrgParser String +orgArgKey = try $ + skipSpaces *> char ':' + *> many1 orgArgWordChar + +-- | Read the value of a plist style key-value list. +orgArgWord :: OrgParser String +orgArgWord = many1 orgArgWordChar + +-- | Chars treated as part of a word in plists. 
+orgArgWordChar :: OrgParser Char +orgArgWordChar = alphaNum <|> oneOf "-_" diff --git a/src/Text/Pandoc/Readers/Org/Shared.hs b/src/Text/Pandoc/Readers/Org/Shared.hs new file mode 100644 index 000000000..3ba46b9e4 --- /dev/null +++ b/src/Text/Pandoc/Readers/Org/Shared.hs @@ -0,0 +1,76 @@ +{-# LANGUAGE OverloadedStrings #-} +{- +Copyright (C) 2014-2016 Albert Krewinkel <tarleb+pandoc@moltkeplatz.de> + +This program is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 2 of the License, or +(at your option) any later version. + +This program is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU General Public License for more details. + +You should have received a copy of the GNU General Public License +along with this program; if not, write to the Free Software +Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA +-} + +{- | + Module : Text.Pandoc.Readers.Org.Options + Copyright : Copyright (C) 2014-2016 Albert Krewinkel + License : GNU GPL, version 2 or above + + Maintainer : Albert Krewinkel <tarleb+pandoc@moltkeplatz.de> + +Utility functions used in other Pandoc Org modules. +-} +module Text.Pandoc.Readers.Org.Shared + ( isImageFilename + , rundocBlockClass + , toRundocAttrib + , translateLang + ) where + +import Control.Arrow ( first ) +import Data.List ( isPrefixOf, isSuffixOf ) + + +-- | Check whether the given string looks like the path to of URL of an image. 
+isImageFilename :: String -> Bool +isImageFilename filename = + any (\x -> ('.':x) `isSuffixOf` filename) imageExtensions && + (any (\x -> (x++":") `isPrefixOf` filename) protocols || + ':' `notElem` filename) + where + imageExtensions = [ "jpeg" , "jpg" , "png" , "gif" , "svg" ] + protocols = [ "file", "http", "https" ] + +-- | Prefix used for Rundoc classes and arguments. +rundocPrefix :: String +rundocPrefix = "rundoc-" + +-- | The class-name used to mark rundoc blocks. +rundocBlockClass :: String +rundocBlockClass = rundocPrefix ++ "block" + +-- | Prefix the name of a attribute, marking it as a code execution parameter. +toRundocAttrib :: (String, String) -> (String, String) +toRundocAttrib = first (rundocPrefix ++) + +-- | Translate from Org-mode's programming language identifiers to those used +-- by Pandoc. This is useful to allow for proper syntax highlighting in +-- Pandoc output. +translateLang :: String -> String +translateLang cs = + case cs of + "C" -> "c" + "C++" -> "cpp" + "emacs-lisp" -> "commonlisp" -- emacs lisp is not supported + "js" -> "javascript" + "lisp" -> "commonlisp" + "R" -> "r" + "sh" -> "bash" + "sqlite" -> "sql" + _ -> cs diff --git a/src/Text/Pandoc/Readers/RST.hs b/src/Text/Pandoc/Readers/RST.hs index 7be0cd392..296c55f32 100644 --- a/src/Text/Pandoc/Readers/RST.hs +++ b/src/Text/Pandoc/Readers/RST.hs @@ -586,8 +586,9 @@ directive' = do case trim top of "" -> stateRstDefaultRole def role -> role }) - "code" -> codeblock (lookup "number-lines" fields) (trim top) body - "code-block" -> codeblock (lookup "number-lines" fields) (trim top) body + x | x == "code" || x == "code-block" -> + codeblock (words $ fromMaybe [] $ lookup "class" fields) + (lookup "number-lines" fields) (trim top) body "aafig" -> do let attribs = ("", ["aafig"], map (\(k,v) -> (k, trimr v)) fields) return $ B.codeBlockWith attribs $ stripTrailingNewlines body @@ -713,12 +714,13 @@ toChunks = dropWhile null . map (trim . unlines) . 
splitBy (all (`elem` (" \t" :: String))) . lines -codeblock :: Maybe String -> String -> String -> RSTParser Blocks -codeblock numberLines lang body = +codeblock :: [String] -> Maybe String -> String -> String -> RSTParser Blocks +codeblock classes numberLines lang body = return $ B.codeBlockWith attribs $ stripTrailingNewlines body - where attribs = ("", classes, kvs) - classes = "sourceCode" : lang + where attribs = ("", classes', kvs) + classes' = "sourceCode" : lang : maybe [] (\_ -> ["numberLines"]) numberLines + ++ classes kvs = case numberLines of Just "" -> [] Nothing -> [] diff --git a/src/Text/Pandoc/Writers/Docbook.hs b/src/Text/Pandoc/Writers/Docbook.hs index 2aaebf99f..9acfe289a 100644 --- a/src/Text/Pandoc/Writers/Docbook.hs +++ b/src/Text/Pandoc/Writers/Docbook.hs @@ -112,10 +112,15 @@ elementToDocbook opts lvl (Sec _ _num (id',_,_) title elements) = else elements tag = case lvl of n | n == 0 -> "chapter" - | n >= 1 && n <= 5 -> "sect" ++ show n + | n >= 1 && n <= 5 -> if writerDocbook5 opts + then "section" + else "sect" ++ show n | otherwise -> "simplesect" - in inTags True tag [("id", writerIdentifierPrefix opts ++ id') | - not (null id')] $ + idAttr = [("id", writerIdentifierPrefix opts ++ id') | not (null id')] + nsAttr = if writerDocbook5 opts && lvl == 0 then [("xmlns", "http://docbook.org/ns/docbook")] + else [] + attribs = nsAttr ++ idAttr + in inTags True tag attribs $ inTagsSimple "title" (inlinesToDocbook opts title) $$ vcat (map (elementToDocbook opts (lvl + 1)) elements') @@ -227,9 +232,11 @@ blockToDocbook opts (OrderedList (start, numstyle, _) (first:rest)) = blockToDocbook opts (DefinitionList lst) = let attribs = [("spacing", "compact") | isTightList $ concatMap snd lst] in inTags True "variablelist" attribs $ deflistItemsToDocbook opts lst -blockToDocbook _ (RawBlock f str) +blockToDocbook opts (RawBlock f str) | f == "docbook" = text str -- raw XML block - | f == "html" = text str -- allow html for backwards compatibility + | f == 
"html" = if writerDocbook5 opts + then empty -- No html in Docbook5 + else text str -- allow html for backwards compatibility | otherwise = empty blockToDocbook _ HorizontalRule = empty -- not semantic blockToDocbook opts (Table caption aligns widths headers rows) = @@ -344,7 +351,9 @@ inlineToDocbook opts (Link attr txt (src, _)) | otherwise = (if isPrefixOf "#" src then inTags False "link" $ ("linkend", drop 1 src) : idAndRole attr - else inTags False "ulink" $ ("url", src) : idAndRole attr ) $ + else if writerDocbook5 opts + then inTags False "link" $ ("xlink:href", src) : idAndRole attr + else inTags False "ulink" $ ("url", src) : idAndRole attr ) $ inlinesToDocbook opts txt inlineToDocbook opts (Image attr _ (src, tit)) = let titleDoc = if null tit diff --git a/src/Text/Pandoc/Writers/EPUB.hs b/src/Text/Pandoc/Writers/EPUB.hs index 804dbb926..90f502f6f 100644 --- a/src/Text/Pandoc/Writers/EPUB.hs +++ b/src/Text/Pandoc/Writers/EPUB.hs @@ -667,7 +667,8 @@ writeEPUB opts doc@(Pandoc meta _) = do ] ] else [] - let navData = renderHtml $ writeHtml opts' + let navData = renderHtml $ writeHtml + opts'{ writerVariables = ("navpage","true"):vars } (Pandoc (setMeta "title" (walk removeNote $ fromList $ docTitle' meta) nullMeta) (navBlocks ++ landmarks)) diff --git a/src/Text/Pandoc/Writers/HTML.hs b/src/Text/Pandoc/Writers/HTML.hs index c5b6a6db2..d8b8384e7 100644 --- a/src/Text/Pandoc/Writers/HTML.hs +++ b/src/Text/Pandoc/Writers/HTML.hs @@ -855,13 +855,12 @@ inlineToHtml opts inline = (Note contents) | writerIgnoreNotes opts -> return mempty | otherwise -> do - st <- get - let notes = stNotes st + notes <- gets stNotes let number = (length notes) + 1 let ref = show number htmlContents <- blockListToNote opts ref contents -- push contents onto front of notes - put $ st {stNotes = (htmlContents:notes)} + modify $ \st -> st {stNotes = (htmlContents:notes)} let revealSlash = ['/' | writerSlideVariant opts == RevealJsSlides] let link = H.a ! 
A.href (toValue $ "#" ++ diff --git a/src/Text/Pandoc/Writers/LaTeX.hs b/src/Text/Pandoc/Writers/LaTeX.hs index dd5b14424..888c866a6 100644 --- a/src/Text/Pandoc/Writers/LaTeX.hs +++ b/src/Text/Pandoc/Writers/LaTeX.hs @@ -39,8 +39,10 @@ import Text.Pandoc.Templates import Text.Printf ( printf ) import Network.URI ( isURI, unEscapeString ) import Data.Aeson (object, (.=), FromJSON) -import Data.List ( (\\), isInfixOf, stripPrefix, intercalate, intersperse, nub, nubBy ) -import Data.Char ( toLower, isPunctuation, isAscii, isLetter, isDigit, ord ) +import Data.List ( (\\), isInfixOf, stripPrefix, intercalate, intersperse, + nub, nubBy, foldl' ) +import Data.Char ( toLower, isPunctuation, isAscii, isLetter, isDigit, + ord, isAlphaNum ) import Data.Maybe ( fromMaybe, isJust, catMaybes ) import qualified Data.Text as T import Control.Applicative ((<|>)) @@ -223,7 +225,7 @@ pandocToLaTeX options (Pandoc meta blocks) = do ++ poly ++ "}{##2}}}\n" else "\\newcommand{\\text" ++ poly ++ "}[2][]{\\foreignlanguage{" ++ babel ++ "}{#2}}\n" ++ - "\\newenvironment{" ++ poly ++ "}[1]{\\begin{otherlanguage}{" + "\\newenvironment{" ++ poly ++ "}[2][]{\\begin{otherlanguage}{" ++ babel ++ "}}{\\end{otherlanguage}}\n" ) -- eliminate duplicates that have same polyglossia name @@ -308,7 +310,7 @@ toLabel z = go `fmap` stringToLaTeX URLString z where go [] = "" go (x:xs) | (isLetter x || isDigit x) && isAscii x = x:go xs - | elem x ("-+=:;." :: String) = x:go xs + | elem x ("_-+=:;." :: String) = x:go xs | otherwise = "ux" ++ printf "%x" (ord x) ++ go xs -- | Puts contents into LaTeX command. 
@@ -409,8 +411,6 @@ blockToLaTeX (Para [Image attr@(ident, _, _) txt (src,'f':'i':'g':':':tit)]) = d capt <- inlineListToLaTeX txt notes <- gets stNotes modify $ \st -> st{ stInMinipage = False, stNotes = [] } - ref <- text `fmap` toLabel ident - internalLinks <- gets stInternalLinks -- We can't have footnotes in the list of figures, so remove them: captForLof <- if null notes @@ -473,23 +473,27 @@ blockToLaTeX (CodeBlock (identifier,classes,keyvalAttr) str) = do st <- get let params = if writerListings (stOptions st) then (case getListingsLanguage classes of - Just l -> [ "language=" ++ l ] + Just l -> [ "language=" ++ mbBraced l ] Nothing -> []) ++ [ "numbers=left" | "numberLines" `elem` classes || "number" `elem` classes || "number-lines" `elem` classes ] ++ [ (if key == "startFrom" then "firstnumber" - else key) ++ "=" ++ attr | + else key) ++ "=" ++ mbBraced attr | (key,attr) <- keyvalAttr ] ++ (if identifier == "" then [] else [ "label=" ++ ref ]) else [] + mbBraced x = if not (all isAlphaNum x) + then "{" <> x <> "}" + else x printParams | null params = empty - | otherwise = brackets $ hcat (intersperse ", " (map text params)) + | otherwise = brackets $ hcat (intersperse ", " + (map text params)) return $ flush ("\\begin{lstlisting}" <> printParams $$ text str $$ "\\end{lstlisting}") $$ cr let highlightedCodeBlock = @@ -510,7 +514,8 @@ blockToLaTeX (RawBlock f x) blockToLaTeX (BulletList []) = return empty -- otherwise latex error blockToLaTeX (BulletList lst) = do incremental <- gets stIncremental - let inc = if incremental then "[<+->]" else "" + beamer <- writerBeamer `fmap` gets stOptions + let inc = if beamer && incremental then "[<+->]" else "" items <- mapM listItemToLaTeX lst let spacing = if isTightList lst then text "\\tightlist" @@ -670,8 +675,8 @@ tableCellToLaTeX header (width, align, blocks) = do AlignDefault -> "\\raggedright" return $ ("\\begin{minipage}" <> valign <> braces (text (printf "%.2f\\columnwidth" width)) <> - (halign <> "\\strut" 
<> cr <> cellContents <> cr) <> - "\\strut\\end{minipage}") $$ + (halign <> "\\strut" <> cr <> cellContents <> "\\strut" <> cr) <> + "\\end{minipage}") $$ notesToLaTeX notes notesToLaTeX :: [Doc] -> Doc @@ -722,7 +727,7 @@ sectionHeader :: Bool -- True for unnumbered -> State WriterState Doc sectionHeader unnumbered ident level lst = do txt <- inlineListToLaTeX lst - plain <- stringToLaTeX TextString $ foldl (++) "" $ map stringify lst + plain <- stringToLaTeX TextString $ concatMap stringify lst let noNote (Note _) = Str "" noNote x = x let lstNoNotes = walk noNote lst @@ -1034,7 +1039,7 @@ citationsToNatbib (c:cs) | citationMode c == AuthorInText = do citationsToNatbib cits = do cits' <- mapM convertOne cits - return $ text "\\citetext{" <> foldl combineTwo empty cits' <> text "}" + return $ text "\\citetext{" <> foldl' combineTwo empty cits' <> text "}" where combineTwo a b | isEmpty a = b | otherwise = a <> text "; " <> b @@ -1083,7 +1088,7 @@ citationsToBiblatex (one:[]) citationsToBiblatex (c:cs) = do args <- mapM convertOne (c:cs) - return $ text cmd <> foldl (<>) empty args + return $ text cmd <> foldl' (<>) empty args where cmd = case citationMode c of AuthorInText -> "\\textcites" @@ -1127,7 +1132,7 @@ toPolyglossiaEnv l = -- Takes a list of the constituents of a BCP 47 language code and -- converts it to a Polyglossia (language, options) tuple --- http://mirrors.concertpass.com/tex-archive/macros/latex/contrib/polyglossia/polyglossia.pdf +-- http://mirrors.ctan.org/macros/latex/contrib/polyglossia/polyglossia.pdf toPolyglossia :: [String] -> (String, String) toPolyglossia ("ar":"DZ":_) = ("arabic", "locale=algeria") toPolyglossia ("ar":"IQ":_) = ("arabic", "locale=mashriq") @@ -1155,17 +1160,21 @@ toPolyglossia ("en":"UK":_) = ("english", "variant=british") toPolyglossia ("en":"US":_) = ("english", "variant=american") toPolyglossia ("grc":_) = ("greek", "variant=ancient") toPolyglossia ("hsb":_) = ("usorbian", "") +toPolyglossia ("la":"x":"classic":_) = 
("latin", "variant=classic") toPolyglossia ("sl":_) = ("slovenian", "") toPolyglossia x = (commonFromBcp47 x, "") -- Takes a list of the constituents of a BCP 47 language code and -- converts it to a Babel language string. --- http://mirrors.concertpass.com/tex-archive/macros/latex/required/babel/base/babel.pdf --- Note that the PDF unfortunately does not contain a complete list of supported languages. +-- http://mirrors.ctan.org/macros/latex/required/babel/base/babel.pdf +-- List of supported languages (slightly outdated): +-- http://tug.ctan.org/language/hyph-utf8/doc/generic/hyph-utf8/hyphenation.pdf toBabel :: [String] -> String toBabel ("de":"1901":_) = "german" toBabel ("de":"AT":"1901":_) = "austrian" toBabel ("de":"AT":_) = "naustrian" +toBabel ("de":"CH":"1901":_) = "swissgerman" +toBabel ("de":"CH":_) = "nswissgerman" toBabel ("de":_) = "ngerman" toBabel ("dsb":_) = "lowersorbian" toBabel ("el":"polyton":_) = "polutonikogreek" @@ -1179,6 +1188,7 @@ toBabel ("fr":"CA":_) = "canadien" toBabel ("fra":"aca":_) = "acadian" toBabel ("grc":_) = "polutonikogreek" toBabel ("hsb":_) = "uppersorbian" +toBabel ("la":"x":"classic":_) = "classiclatin" toBabel ("sl":_) = "slovene" toBabel x = commonFromBcp47 x @@ -1187,12 +1197,17 @@ toBabel x = commonFromBcp47 x -- https://tools.ietf.org/html/bcp47#section-2.1 commonFromBcp47 :: [String] -> String commonFromBcp47 [] = "" -commonFromBcp47 ("pt":"BR":_) = "brazilian" +commonFromBcp47 ("pt":"BR":_) = "brazil" +-- Note: documentation says "brazilian" works too, but it doesn't seem to work +-- on some systems. See #2953. 
+commonFromBcp47 ("sr":"Cyrl":_) = "serbianc" +commonFromBcp47 ("zh":"Latn":"pinyin":_) = "pinyin" commonFromBcp47 x = fromIso $ head x where fromIso "af" = "afrikaans" fromIso "am" = "amharic" fromIso "ar" = "arabic" + fromIso "as" = "assamese" fromIso "ast" = "asturian" fromIso "bg" = "bulgarian" fromIso "bn" = "bengali" @@ -1216,12 +1231,13 @@ commonFromBcp47 x = fromIso $ head x fromIso "fur" = "friulan" fromIso "ga" = "irish" fromIso "gd" = "scottish" + fromIso "gez" = "ethiopic" fromIso "gl" = "galician" fromIso "he" = "hebrew" fromIso "hi" = "hindi" fromIso "hr" = "croatian" - fromIso "hy" = "armenian" fromIso "hu" = "magyar" + fromIso "hy" = "armenian" fromIso "ia" = "interlingua" fromIso "id" = "indonesian" fromIso "ie" = "interlingua" @@ -1229,6 +1245,7 @@ commonFromBcp47 x = fromIso $ head x fromIso "it" = "italian" fromIso "jp" = "japanese" fromIso "km" = "khmer" + fromIso "kmr" = "kurmanji" fromIso "kn" = "kannada" fromIso "ko" = "korean" fromIso "la" = "latin" @@ -1244,6 +1261,7 @@ commonFromBcp47 x = fromIso $ head x fromIso "no" = "norsk" fromIso "nqo" = "nko" fromIso "oc" = "occitan" + fromIso "pa" = "panjabi" fromIso "pl" = "polish" fromIso "pms" = "piedmontese" fromIso "pt" = "portuguese" @@ -1260,6 +1278,7 @@ commonFromBcp47 x = fromIso $ head x fromIso "ta" = "tamil" fromIso "te" = "telugu" fromIso "th" = "thai" + fromIso "ti" = "ethiopic" fromIso "tk" = "turkmen" fromIso "tr" = "turkish" fromIso "uk" = "ukrainian" @@ -1290,4 +1309,3 @@ pDocumentClass = else do P.skipMany (P.satisfy (/='{')) P.char '{' P.manyTill P.letter (P.char '}') - diff --git a/src/Text/Pandoc/Writers/Org.hs b/src/Text/Pandoc/Writers/Org.hs index 20086ed19..f87aeca81 100644 --- a/src/Text/Pandoc/Writers/Org.hs +++ b/src/Text/Pandoc/Writers/Org.hs @@ -110,6 +110,17 @@ isRawFormat f = blockToOrg :: Block -- ^ Block element -> State WriterState Doc blockToOrg Null = return empty +blockToOrg (Div (_,classes@(cls:_),kvs) bs) | "drawer" `elem` classes = do + contents <- 
blockListToOrg bs + let drawerNameTag = ":" <> text cls <> ":" + let keys = vcat $ map (\(k,v) -> + ":" <> text k <> ":" + <> space <> text v) kvs + let drawerEndTag = text ":END:" + return $ drawerNameTag $$ cr $$ keys $$ + blankline $$ contents $$ + blankline $$ drawerEndTag $$ + blankline blockToOrg (Div attrs bs) = do contents <- blockListToOrg bs let startTag = tagWithAttrs "div" attrs @@ -137,10 +148,13 @@ blockToOrg (RawBlock f str) | isRawFormat f = return $ text str blockToOrg (RawBlock _ _) = return empty blockToOrg HorizontalRule = return $ blankline $$ "--------------" $$ blankline -blockToOrg (Header level _ inlines) = do +blockToOrg (Header level attr inlines) = do contents <- inlineListToOrg inlines let headerStr = text $ if level > 999 then " " else replicate level '*' - return $ headerStr <> " " <> contents <> blankline + let drawerStr = if attr == nullAttr + then empty + else cr <> nest (level + 1) (propertiesDrawer attr) + return $ headerStr <> " " <> contents <> drawerStr <> blankline blockToOrg (CodeBlock (_,classes,_) str) = do opts <- stOptions <$> get let tabstop = writerTabStop opts @@ -170,7 +184,7 @@ blockToOrg (Table caption' _ _ headers rows) = do map ((+2) . numChars) $ transpose (headers' : rawRows) -- FIXME: Org doesn't allow blocks with height more than 1. let hpipeBlocks blocks = hcat [beg, middle, end] - where h = maximum (map height blocks) + where h = maximum (1 : map height blocks) sep' = lblock 3 $ vcat (map text $ replicate h " | ") beg = lblock 2 $ vcat (map text $ replicate h "| ") end = lblock 2 $ vcat (map text $ replicate h " |") @@ -230,6 +244,22 @@ definitionListItemToOrg (label, defs) = do contents <- liftM vcat $ mapM blockListToOrg defs return $ hang 3 "- " $ label' <> " :: " <> (contents <> cr) +-- | Convert list of key/value pairs to Org :PROPERTIES: drawer. 
+propertiesDrawer :: Attr -> Doc +propertiesDrawer (ident, classes, kv) = + let + drawerStart = text ":PROPERTIES:" + drawerEnd = text ":END:" + kv' = if (classes == mempty) then kv else ("CLASS", unwords classes):kv + kv'' = if (ident == mempty) then kv' else ("CUSTOM_ID", ident):kv' + properties = vcat $ map kvToOrgProperty kv'' + in + drawerStart <> cr <> properties <> cr <> drawerEnd + where + kvToOrgProperty :: (String, String) -> Doc + kvToOrgProperty (key, value) = + text ":" <> text key <> text ": " <> text value <> cr + -- | Convert list of Pandoc block elements to Org. blockListToOrg :: [Block] -- ^ List of block elements -> State WriterState Doc diff --git a/stack.yaml b/stack.yaml index e25ad9b07..a8a71f47e 100644 --- a/stack.yaml +++ b/stack.yaml @@ -7,9 +7,7 @@ flags: network-uri: true packages: - '.' -extra-deps: [] -# to compile against aeson 0.11.0.0: -# - 'aeson-0.11.0.0' -# - 'fail-4.9.0.0' -# - 'pandoc-types-1.16.1' -resolver: lts-5.8 +extra-deps: +- data-default-0.6.0 +- data-default-instances-base-0.1.0 +resolver: lts-6.1 diff --git a/tests/Tests/Old.hs b/tests/Tests/Old.hs index 36bb3398e..4e0eb46a4 100644 --- a/tests/Tests/Old.hs +++ b/tests/Tests/Old.hs @@ -57,7 +57,7 @@ tests = [ testGroup "markdown" "tables.txt" "tables.native" , test "pipe tables" ["-r", "markdown", "-w", "native", "--columns=80"] "pipe-tables.txt" "pipe-tables.native" - , test "more" ["-r", "markdown", "-w", "native", "-S"] + , test "more" ["-r", "markdown", "-w", "native", "-s", "-S"] "markdown-reader-more.txt" "markdown-reader-more.native" , lhsReaderTest "markdown+lhs" ] @@ -108,6 +108,9 @@ tests = [ testGroup "markdown" , test "reader" ["-r", "docbook", "-w", "native", "-s"] "docbook-xref.docbook" "docbook-xref.native" ] + , testGroup "docbook5" + [ testGroup "writer" $ writerTests "docbook5" + ] , testGroup "native" [ testGroup "writer" $ writerTests "native" , test "reader" ["-r", "native", "-w", "native", "-s"] diff --git a/tests/Tests/Readers/Docx.hs 
b/tests/Tests/Readers/Docx.hs index e09d56529..aeb6bf939 100644 --- a/tests/Tests/Readers/Docx.hs +++ b/tests/Tests/Readers/Docx.hs @@ -266,6 +266,18 @@ tests = [ testGroup "inlines" "keep deletion (all)" "docx/track_changes_deletion.docx" "docx/track_changes_deletion_all.native" + , testCompareWithOpts def{readerTrackChanges=AcceptChanges} + "move text (accept)" + "docx/track_changes_move.docx" + "docx/track_changes_move_accept.native" + , testCompareWithOpts def{readerTrackChanges=RejectChanges} + "move text (reject)" + "docx/track_changes_move.docx" + "docx/track_changes_move_reject.native" + , testCompareWithOpts def{readerTrackChanges=AllChanges} + "move text (all)" + "docx/track_changes_move.docx" + "docx/track_changes_move_all.native" ] , testGroup "media" [ testMediaBag diff --git a/tests/Tests/Readers/Org.hs b/tests/Tests/Readers/Org.hs index b095ac60a..9bd999b01 100644 --- a/tests/Tests/Readers/Org.hs +++ b/tests/Tests/Readers/Org.hs @@ -300,6 +300,42 @@ tests = , citationHash = 0} in (para $ cite [citation] "[see @item1 p. 
34-35]") + , "Org-ref simple citation" =: + "cite:pandoc" =?> + let citation = Citation + { citationId = "pandoc" + , citationPrefix = mempty + , citationSuffix = mempty + , citationMode = AuthorInText + , citationNoteNum = 0 + , citationHash = 0 + } + in (para $ cite [citation] "cite:pandoc") + + , "Org-ref simple citep citation" =: + "citep:pandoc" =?> + let citation = Citation + { citationId = "pandoc" + , citationPrefix = mempty + , citationSuffix = mempty + , citationMode = NormalCitation + , citationNoteNum = 0 + , citationHash = 0 + } + in (para $ cite [citation] "citep:pandoc") + + , "Org-ref extended citation" =: + "[[citep:Dominik201408][See page 20::, for example]]" =?> + let citation = Citation + { citationId = "Dominik201408" + , citationPrefix = toList "See page 20" + , citationSuffix = toList ", for example" + , citationMode = NormalCitation + , citationNoteNum = 0 + , citationHash = 0 + } + in (para $ cite [citation] "[[citep:Dominik201408][See page 20::, for example]]") + , "Inline LaTeX symbol" =: "\\dots" =?> para "…" @@ -308,6 +344,10 @@ tests = "\\textit{Emphasised}" =?> para (emph "Emphasised") + , "Inline LaTeX command with spaces" =: + "\\emph{Emphasis mine}" =?> + para (emph "Emphasis mine") + , "Inline LaTeX math symbol" =: "\\tau" =?> para (emph "τ") @@ -328,6 +368,10 @@ tests = "\\copy" =?> para "©" + , "MathML symbols, space separated" =: + "\\ForAll \\Auml" =?> + para "∀ Ä" + , "LaTeX citation" =: "\\cite{Coffee}" =?> let citation = Citation @@ -404,17 +448,18 @@ tests = ] =?> para "Before" <> para "After" - , "Drawer start is the only text in first line of a drawer" =: + , "Drawer markers must be the only text in the line" =: unlines [ " :LOGBOOK: foo" - , " :END:" + , " :END: bar" ] =?> - para (":LOGBOOK:" <> space <> "foo" <> softbreak <> ":END:") + para (":LOGBOOK: foo" <> softbreak <> ":END: bar") - , "Drawers with unknown names are just text" =: + , "Drawers can be arbitrary" =: unlines [ ":FOO:" + , "/bar/" , ":END:" ] =?> - 
para (":FOO:" <> softbreak <> ":END:") + divWith (mempty, ["FOO", "drawer"], mempty) (para $ emph "bar") , "Anchor reference" =: unlines [ "<<link-here>> Target." @@ -461,6 +506,34 @@ tests = , "[[expl:foo][bar]]" ] =?> (para (link "http://example.com/foo" "" "bar")) + + , "Export option: Disable simple sub/superscript syntax" =: + unlines [ "#+OPTIONS: ^:nil" + , "a^b" + ] =?> + para "a^b" + + , "Export option: directly select drawers to be exported" =: + unlines [ "#+OPTIONS: d:(\"IMPORTANT\")" + , ":IMPORTANT:" + , "23" + , ":END:" + , ":BORING:" + , "very boring" + , ":END:" + ] =?> + divWith (mempty, ["IMPORTANT", "drawer"], mempty) (para "23") + + , "Export option: exclude drawers from being exported" =: + unlines [ "#+OPTIONS: d:(not \"BORING\")" + , ":IMPORTANT:" + , "5" + , ":END:" + , ":BORING:" + , "very boring" + , ":END:" + ] =?> + divWith (mempty, ["IMPORTANT", "drawer"], mempty) (para "5") ] , testGroup "Basic Blocks" $ @@ -583,6 +656,15 @@ tests = , headerWith ("but-this-is", [], []) 2 "But this is" ] + , "Preferences are treated as header attributes" =: + unlines [ "* foo" + , " :PROPERTIES:" + , " :custom_id: fubar" + , " :bar: baz" + , " :END:" + ] =?> + headerWith ("fubar", [], [("bar", "baz")]) 1 "foo" + , "Paragraph starting with an asterisk" =: "*five" =?> para "*five" @@ -653,6 +735,17 @@ tests = para (image "the-red-queen.jpg" "fig:redqueen" "Used as a metapher in evolutionary biology.") + , "Figure with HTML attributes" =: + unlines [ "#+CAPTION: mah brain just explodid" + , "#+NAME: lambdacat" + , "#+ATTR_HTML: :style color: blue :role button" + , "[[lambdacat.jpg]]" + ] =?> + let kv = [("style", "color: blue"), ("role", "button")] + name = "fig:lambdacat" + caption = "mah brain just explodid" + in para (imageWith (mempty, mempty, kv) "lambdacat.jpg" name caption) + , "Footnote" =: unlines [ "A footnote[1]" , "" @@ -941,7 +1034,7 @@ tests = , "Empty table" =: "||" =?> - simpleTable' 1 mempty mempty + simpleTable' 1 mempty [[mempty]] , 
"Glider Table" =: unlines [ "| 1 | 0 | 0 |" @@ -996,6 +1089,17 @@ tests = , [ plain "dynamic", plain "Lisp" ] ] + , "Table with empty cells" =: + "|||c|" =?> + simpleTable' 3 mempty [[mempty, mempty, plain "c"]] + + , "Table with empty rows" =: + unlines [ "| first |" + , "| |" + , "| third |" + ] =?> + simpleTable' 1 mempty [[plain "first"], [mempty], [plain "third"]] + , "Table with alignment row" =: unlines [ "| Numbers | Text | More |" , "| <c> | <r> | |" @@ -1024,10 +1128,10 @@ tests = , "| 1 | One | foo |" , "| 2" ] =?> - table "" (zip [AlignCenter, AlignRight, AlignDefault] [0, 0, 0]) - [ plain "Numbers", plain "Text" , plain mempty ] - [ [ plain "1" , plain "One" , plain "foo" ] - , [ plain "2" , plain mempty , plain mempty ] + table "" (zip [AlignCenter, AlignRight] [0, 0]) + [ plain "Numbers", plain "Text" ] + [ [ plain "1" , plain "One" , plain "foo" ] + , [ plain "2" ] ] , "Table with caption" =: @@ -1054,6 +1158,33 @@ tests = " where greeting = \"moin\"\n" in codeBlockWith attr' code' + , "Source block with indented code" =: + unlines [ " #+BEGIN_SRC haskell" + , " main = putStrLn greeting" + , " where greeting = \"moin\"" + , " #+END_SRC" ] =?> + let attr' = ("", ["haskell"], []) + code' = "main = putStrLn greeting\n" ++ + " where greeting = \"moin\"\n" + in codeBlockWith attr' code' + + , "Source block with tab-indented code" =: + unlines [ "\t#+BEGIN_SRC haskell" + , "\tmain = putStrLn greeting" + , "\t where greeting = \"moin\"" + , "\t#+END_SRC" ] =?> + let attr' = ("", ["haskell"], []) + code' = "main = putStrLn greeting\n" ++ + " where greeting = \"moin\"\n" + in codeBlockWith attr' code' + + , "Empty source block" =: + unlines [ " #+BEGIN_SRC haskell" + , " #+END_SRC" ] =?> + let attr' = ("", ["haskell"], []) + code' = "" + in codeBlockWith attr' code' + , "Source block between paragraphs" =: unlines [ "Low German greeting" , " #+BEGIN_SRC haskell" @@ -1198,7 +1329,7 @@ tests = ] ] - , "Verse block with newlines" =: + , "Verse block with blank 
lines" =: unlines [ "#+BEGIN_VERSE" , "foo" , "" @@ -1207,6 +1338,20 @@ tests = ] =?> para ("foo" <> linebreak <> linebreak <> "bar") + , "Raw block LaTeX" =: + unlines [ "#+BEGIN_LaTeX" + , "The category $\\cat{Set}$ is adhesive." + , "#+END_LaTeX" + ] =?> + rawBlock "latex" "The category $\\cat{Set}$ is adhesive.\n" + + , "Export block HTML" =: + unlines [ "#+BEGIN_export html" + , "<samp>Hello, World!</samp>" + , "#+END_export" + ] =?> + rawBlock "html" "<samp>Hello, World!</samp>\n" + , "LaTeX fragment" =: unlines [ "\\begin{equation}" , "X_i = \\begin{cases}" diff --git a/tests/Tests/Readers/RST.hs b/tests/Tests/Readers/RST.hs index ea85a5929..622f5e48b 100644 --- a/tests/Tests/Readers/RST.hs +++ b/tests/Tests/Readers/RST.hs @@ -94,6 +94,35 @@ tests = [ "line block with blank line" =: ("A-1-B_2_C:3:D+4+E.5.F_\n\n" ++ ".. _A-1-B_2_C:3:D+4+E.5.F: https://example.com\n") =?> para (link "https://example.com" "" "A-1-B_2_C:3:D+4+E.5.F") + , "Code directive with class and number-lines" =: unlines + [ ".. code::python" + , " :number-lines: 34" + , " :class: class1 class2 class3" + , "" + , " def func(x):" + , " return y" + ] =?> + ( doc $ codeBlockWith + ( "" + , ["sourceCode", "python", "numberLines", "class1", "class2", "class3"] + , [ ("startFrom", "34") ] + ) + "def func(x):\n return y" + ) + , "Code directive with number-lines, no line specified" =: unlines + [ ".. 
code::python" + , " :number-lines: " + , "" + , " def func(x):" + , " return y" + ] =?> + ( doc $ codeBlockWith + ( "" + , ["sourceCode", "python", "numberLines"] + , [ ("startFrom", "") ] + ) + "def func(x):\n return y" + ) , testGroup "literal / line / code blocks" [ "indented literal block" =: unlines [ "::" diff --git a/tests/docx/track_changes_move.docx b/tests/docx/track_changes_move.docx Binary files differ new file mode 100644 index 000000000..b70779fd4 --- /dev/null +++ b/tests/docx/track_changes_move.docx diff --git a/tests/docx/track_changes_move_accept.native b/tests/docx/track_changes_move_accept.native new file mode 100644 index 000000000..0cf276768 --- /dev/null +++ b/tests/docx/track_changes_move_accept.native @@ -0,0 +1,3 @@ +[Para [Str "Here",Space,Str "is",Space,Str "some",Space,Str "text."] +,Para [Str "Here",Space,Str "is",Space,Str "the",Space,Str "text",Space,Str "to",Space,Str "be",Space,Str "moved."] +,Para [Str "Here",Space,Str "is",Space,Str "some",Space,Str "more",Space,Str "text."]] diff --git a/tests/docx/track_changes_move_all.native b/tests/docx/track_changes_move_all.native new file mode 100644 index 000000000..3afae83a5 --- /dev/null +++ b/tests/docx/track_changes_move_all.native @@ -0,0 +1,4 @@ +[Para [Str "Here",Space,Str "is",Space,Str "some",Space,Str "text."] +,Para [Span ("",["insertion"],[("author","Jesse Rosenthal"),("date","2016-04-16T08:20:00Z")]) [Str "Here",Space,Str "is",Space,Str "the",Space,Str "text",Space,Str "to",Space,Str "be",Space,Str "moved."]] +,Para [Str "Here",Space,Str "is",Space,Str "some",Space,Str "more",Space,Str "text."] +,Para [Span ("",["deletion"],[("author","Jesse Rosenthal"),("date","2016-04-16T08:20:00Z")]) [Str "Here",Space,Str "is",Space,Str "the",Space,Str "text",Space,Str "to",Space,Str "be",Space,Str "moved."]]] diff --git a/tests/docx/track_changes_move_reject.native b/tests/docx/track_changes_move_reject.native new file mode 100644 index 000000000..9c57871b6 --- /dev/null +++
b/tests/docx/track_changes_move_reject.native @@ -0,0 +1,3 @@ +[Para [Str "Here",Space,Str "is",Space,Str "some",Space,Str "text."] +,Para [Str "Here",Space,Str "is",Space,Str "some",Space,Str "more",Space,Str "text."] +,Para [Str "Here",Space,Str "is",Space,Str "the",Space,Str "text",Space,Str "to",Space,Str "be",Space,Str "moved."]] diff --git a/tests/mallard-reader.native b/tests/mallard-reader.native new file mode 100644 index 000000000..16274f00a --- /dev/null +++ b/tests/mallard-reader.native @@ -0,0 +1,3 @@ +Pandoc (Meta {unMeta = fromList [("guide-group",MetaInlines [Str ""]),("guide-xref",MetaInlines [Str "index#intro"]),("title",MetaInlines [Str "Title"])]}) +[Header 1 ("introduction",[],[]) [Str "Title"] +,Para [Str "This",Space,Str "is",Space,Str "a",Space,Str "test."]] diff --git a/tests/markdown-reader-more.native b/tests/markdown-reader-more.native index 0148e9394..c38ffe038 100644 --- a/tests/markdown-reader-more.native +++ b/tests/markdown-reader-more.native @@ -1,5 +1,5 @@ -[Para [Str "spanning",Space,Str "multiple",Space,Str "lines",SoftBreak,Str "%",Space,Str "Author",Space,Str "One",SoftBreak,Str "Author",Space,Str "Two;",Space,Str "Author",Space,Str "Three;",SoftBreak,Str "Author",Space,Str "Four"] -,Header 1 ("additional-markdown-reader-tests",[],[]) [Str "Additional",Space,Str "markdown",Space,Str "reader",Space,Str "tests"] +Pandoc (Meta {unMeta = fromList [("author",MetaList [MetaInlines [Str "Author",Space,Str "One"],MetaInlines [Str "Author",Space,Str "Two"],MetaInlines [Str "Author",Space,Str "Three"],MetaInlines [Str "Author",Space,Str "Four"]]),("title",MetaInlines [Str "Title",SoftBreak,Str "spanning",Space,Str "multiple",Space,Str "lines"])]}) +[Header 1 ("additional-markdown-reader-tests",[],[]) [Str "Additional",Space,Str "markdown",Space,Str "reader",Space,Str "tests"] ,Header 2 ("blank-line-before-url-in-link-reference",[],[]) [Str "Blank",Space,Str "line",Space,Str "before",Space,Str "URL",Space,Str "in",Space,Str 
"link",Space,Str "reference"] ,Para [Link ("",[],[]) [Str "foo"] ("/url",""),Space,Str "and",Space,Link ("",[],[]) [Str "bar"] ("/url","title")] ,Header 2 ("raw-context-environments",[],[]) [Str "Raw",Space,Str "ConTeXt",Space,Str "environments"] diff --git a/tests/mediawiki-reader.native b/tests/mediawiki-reader.native index cf80d0664..6afeb602c 100644 --- a/tests/mediawiki-reader.native +++ b/tests/mediawiki-reader.native @@ -252,6 +252,11 @@ Pandoc (Meta {unMeta = fromList []}) [[]] [[[Para [Str "Orange"]]]] ,Para [Str "Paragraph",Space,Str "after",Space,Str "the",Space,Str "table."] +,Table [] [AlignDefault,AlignDefault] [0.0,0.0] + [[Para [Str "fruit"]] + ,[Para [Str "topping"]]] + [[[Para [Str "apple"]] + ,[Para [Str "ice",Space,Str "cream"]]]] ,Header 2 ("notes",[],[]) [Str "notes"] ,Para [Str "My",Space,Str "note!",Note [Plain [Str "This."]]] ,Para [Str "URL",Space,Str "note.",Note [Plain [Link ("",[],[]) [Str "http://docs.python.org/library/functions.html#range"] ("http://docs.python.org/library/functions.html#range","")]]]] diff --git a/tests/mediawiki-reader.wiki b/tests/mediawiki-reader.wiki index 862bb3b48..11cd52d9c 100644 --- a/tests/mediawiki-reader.wiki +++ b/tests/mediawiki-reader.wiki @@ -381,6 +381,14 @@ and cheese |Orange |}Paragraph after the table. +{| + !fruit + !topping + |- + |apple + |ice cream + |} + == notes == My note!<ref>This.</ref> diff --git a/tests/tables.docbook5 b/tests/tables.docbook5 new file mode 100644 index 000000000..6224cf222 --- /dev/null +++ b/tests/tables.docbook5 @@ -0,0 +1,432 @@ +<para> + Simple table with caption: +</para> +<table> + <title> + Demonstration of simple table syntax. 
+ </title> + <tgroup cols="4"> + <colspec align="right" /> + <colspec align="left" /> + <colspec align="center" /> + <colspec align="left" /> + <thead> + <row> + <entry> + Right + </entry> + <entry> + Left + </entry> + <entry> + Center + </entry> + <entry> + Default + </entry> + </row> + </thead> + <tbody> + <row> + <entry> + 12 + </entry> + <entry> + 12 + </entry> + <entry> + 12 + </entry> + <entry> + 12 + </entry> + </row> + <row> + <entry> + 123 + </entry> + <entry> + 123 + </entry> + <entry> + 123 + </entry> + <entry> + 123 + </entry> + </row> + <row> + <entry> + 1 + </entry> + <entry> + 1 + </entry> + <entry> + 1 + </entry> + <entry> + 1 + </entry> + </row> + </tbody> + </tgroup> +</table> +<para> + Simple table without caption: +</para> +<informaltable> + <tgroup cols="4"> + <colspec align="right" /> + <colspec align="left" /> + <colspec align="center" /> + <colspec align="left" /> + <thead> + <row> + <entry> + Right + </entry> + <entry> + Left + </entry> + <entry> + Center + </entry> + <entry> + Default + </entry> + </row> + </thead> + <tbody> + <row> + <entry> + 12 + </entry> + <entry> + 12 + </entry> + <entry> + 12 + </entry> + <entry> + 12 + </entry> + </row> + <row> + <entry> + 123 + </entry> + <entry> + 123 + </entry> + <entry> + 123 + </entry> + <entry> + 123 + </entry> + </row> + <row> + <entry> + 1 + </entry> + <entry> + 1 + </entry> + <entry> + 1 + </entry> + <entry> + 1 + </entry> + </row> + </tbody> + </tgroup> +</informaltable> +<para> + Simple table indented two spaces: +</para> +<table> + <title> + Demonstration of simple table syntax. 
+ </title> + <tgroup cols="4"> + <colspec align="right" /> + <colspec align="left" /> + <colspec align="center" /> + <colspec align="left" /> + <thead> + <row> + <entry> + Right + </entry> + <entry> + Left + </entry> + <entry> + Center + </entry> + <entry> + Default + </entry> + </row> + </thead> + <tbody> + <row> + <entry> + 12 + </entry> + <entry> + 12 + </entry> + <entry> + 12 + </entry> + <entry> + 12 + </entry> + </row> + <row> + <entry> + 123 + </entry> + <entry> + 123 + </entry> + <entry> + 123 + </entry> + <entry> + 123 + </entry> + </row> + <row> + <entry> + 1 + </entry> + <entry> + 1 + </entry> + <entry> + 1 + </entry> + <entry> + 1 + </entry> + </row> + </tbody> + </tgroup> +</table> +<para> + Multiline table with caption: +</para> +<table> + <title> + Here's the caption. It may span multiple lines. + </title> + <tgroup cols="4"> + <colspec colwidth="15*" align="center" /> + <colspec colwidth="13*" align="left" /> + <colspec colwidth="16*" align="right" /> + <colspec colwidth="33*" align="left" /> + <thead> + <row> + <entry> + Centered Header + </entry> + <entry> + Left Aligned + </entry> + <entry> + Right Aligned + </entry> + <entry> + Default aligned + </entry> + </row> + </thead> + <tbody> + <row> + <entry> + First + </entry> + <entry> + row + </entry> + <entry> + 12.0 + </entry> + <entry> + Example of a row that spans multiple lines. + </entry> + </row> + <row> + <entry> + Second + </entry> + <entry> + row + </entry> + <entry> + 5.0 + </entry> + <entry> + Here's another one. Note the blank line between rows. 
+ </entry> + </row> + </tbody> + </tgroup> +</table> +<para> + Multiline table without caption: +</para> +<informaltable> + <tgroup cols="4"> + <colspec colwidth="15*" align="center" /> + <colspec colwidth="13*" align="left" /> + <colspec colwidth="16*" align="right" /> + <colspec colwidth="33*" align="left" /> + <thead> + <row> + <entry> + Centered Header + </entry> + <entry> + Left Aligned + </entry> + <entry> + Right Aligned + </entry> + <entry> + Default aligned + </entry> + </row> + </thead> + <tbody> + <row> + <entry> + First + </entry> + <entry> + row + </entry> + <entry> + 12.0 + </entry> + <entry> + Example of a row that spans multiple lines. + </entry> + </row> + <row> + <entry> + Second + </entry> + <entry> + row + </entry> + <entry> + 5.0 + </entry> + <entry> + Here's another one. Note the blank line between rows. + </entry> + </row> + </tbody> + </tgroup> +</informaltable> +<para> + Table without column headers: +</para> +<informaltable> + <tgroup cols="4"> + <colspec align="right" /> + <colspec align="left" /> + <colspec align="center" /> + <colspec align="right" /> + <tbody> + <row> + <entry> + 12 + </entry> + <entry> + 12 + </entry> + <entry> + 12 + </entry> + <entry> + 12 + </entry> + </row> + <row> + <entry> + 123 + </entry> + <entry> + 123 + </entry> + <entry> + 123 + </entry> + <entry> + 123 + </entry> + </row> + <row> + <entry> + 1 + </entry> + <entry> + 1 + </entry> + <entry> + 1 + </entry> + <entry> + 1 + </entry> + </row> + </tbody> + </tgroup> +</informaltable> +<para> + Multiline table without column headers: +</para> +<informaltable> + <tgroup cols="4"> + <colspec colwidth="15*" align="center" /> + <colspec colwidth="13*" align="left" /> + <colspec colwidth="16*" align="right" /> + <colspec colwidth="33*" align="left" /> + <tbody> + <row> + <entry> + First + </entry> + <entry> + row + </entry> + <entry> + 12.0 + </entry> + <entry> + Example of a row that spans multiple lines. 
+ </entry> + </row> + <row> + <entry> + Second + </entry> + <entry> + row + </entry> + <entry> + 5.0 + </entry> + <entry> + Here's another one. Note the blank line between rows. + </entry> + </row> + </tbody> + </tgroup> +</informaltable> diff --git a/tests/tables.latex b/tests/tables.latex index 96cbc9579..38d4d089e 100644 --- a/tests/tables.latex +++ b/tests/tables.latex @@ -53,46 +53,46 @@ Multiline table with caption: \caption{Here's the caption. It may span multiple lines.}\tabularnewline \toprule \begin{minipage}[b]{0.13\columnwidth}\centering\strut -Centered Header -\strut\end{minipage} & \begin{minipage}[b]{0.12\columnwidth}\raggedright\strut -Left Aligned -\strut\end{minipage} & \begin{minipage}[b]{0.14\columnwidth}\raggedleft\strut -Right Aligned -\strut\end{minipage} & \begin{minipage}[b]{0.30\columnwidth}\raggedright\strut -Default aligned -\strut\end{minipage}\tabularnewline +Centered Header\strut +\end{minipage} & \begin{minipage}[b]{0.12\columnwidth}\raggedright\strut +Left Aligned\strut +\end{minipage} & \begin{minipage}[b]{0.14\columnwidth}\raggedleft\strut +Right Aligned\strut +\end{minipage} & \begin{minipage}[b]{0.30\columnwidth}\raggedright\strut +Default aligned\strut +\end{minipage}\tabularnewline \midrule \endfirsthead \toprule \begin{minipage}[b]{0.13\columnwidth}\centering\strut -Centered Header -\strut\end{minipage} & \begin{minipage}[b]{0.12\columnwidth}\raggedright\strut -Left Aligned -\strut\end{minipage} & \begin{minipage}[b]{0.14\columnwidth}\raggedleft\strut -Right Aligned -\strut\end{minipage} & \begin{minipage}[b]{0.30\columnwidth}\raggedright\strut -Default aligned -\strut\end{minipage}\tabularnewline +Centered Header\strut +\end{minipage} & \begin{minipage}[b]{0.12\columnwidth}\raggedright\strut +Left Aligned\strut +\end{minipage} & \begin{minipage}[b]{0.14\columnwidth}\raggedleft\strut +Right Aligned\strut +\end{minipage} & \begin{minipage}[b]{0.30\columnwidth}\raggedright\strut +Default aligned\strut 
+\end{minipage}\tabularnewline \midrule \endhead \begin{minipage}[t]{0.13\columnwidth}\centering\strut -First -\strut\end{minipage} & \begin{minipage}[t]{0.12\columnwidth}\raggedright\strut -row -\strut\end{minipage} & \begin{minipage}[t]{0.14\columnwidth}\raggedleft\strut -12.0 -\strut\end{minipage} & \begin{minipage}[t]{0.30\columnwidth}\raggedright\strut -Example of a row that spans multiple lines. -\strut\end{minipage}\tabularnewline +First\strut +\end{minipage} & \begin{minipage}[t]{0.12\columnwidth}\raggedright\strut +row\strut +\end{minipage} & \begin{minipage}[t]{0.14\columnwidth}\raggedleft\strut +12.0\strut +\end{minipage} & \begin{minipage}[t]{0.30\columnwidth}\raggedright\strut +Example of a row that spans multiple lines.\strut +\end{minipage}\tabularnewline \begin{minipage}[t]{0.13\columnwidth}\centering\strut -Second -\strut\end{minipage} & \begin{minipage}[t]{0.12\columnwidth}\raggedright\strut -row -\strut\end{minipage} & \begin{minipage}[t]{0.14\columnwidth}\raggedleft\strut -5.0 -\strut\end{minipage} & \begin{minipage}[t]{0.30\columnwidth}\raggedright\strut -Here's another one. Note the blank line between rows. -\strut\end{minipage}\tabularnewline +Second\strut +\end{minipage} & \begin{minipage}[t]{0.12\columnwidth}\raggedright\strut +row\strut +\end{minipage} & \begin{minipage}[t]{0.14\columnwidth}\raggedleft\strut +5.0\strut +\end{minipage} & \begin{minipage}[t]{0.30\columnwidth}\raggedright\strut +Here's another one. 
Note the blank line between rows.\strut +\end{minipage}\tabularnewline \bottomrule \end{longtable} @@ -101,34 +101,34 @@ Multiline table without caption: \begin{longtable}[]{@{}clrl@{}} \toprule \begin{minipage}[b]{0.13\columnwidth}\centering\strut -Centered Header -\strut\end{minipage} & \begin{minipage}[b]{0.12\columnwidth}\raggedright\strut -Left Aligned -\strut\end{minipage} & \begin{minipage}[b]{0.14\columnwidth}\raggedleft\strut -Right Aligned -\strut\end{minipage} & \begin{minipage}[b]{0.30\columnwidth}\raggedright\strut -Default aligned -\strut\end{minipage}\tabularnewline +Centered Header\strut +\end{minipage} & \begin{minipage}[b]{0.12\columnwidth}\raggedright\strut +Left Aligned\strut +\end{minipage} & \begin{minipage}[b]{0.14\columnwidth}\raggedleft\strut +Right Aligned\strut +\end{minipage} & \begin{minipage}[b]{0.30\columnwidth}\raggedright\strut +Default aligned\strut +\end{minipage}\tabularnewline \midrule \endhead \begin{minipage}[t]{0.13\columnwidth}\centering\strut -First -\strut\end{minipage} & \begin{minipage}[t]{0.12\columnwidth}\raggedright\strut -row -\strut\end{minipage} & \begin{minipage}[t]{0.14\columnwidth}\raggedleft\strut -12.0 -\strut\end{minipage} & \begin{minipage}[t]{0.30\columnwidth}\raggedright\strut -Example of a row that spans multiple lines. 
-\strut\end{minipage}\tabularnewline +First\strut +\end{minipage} & \begin{minipage}[t]{0.12\columnwidth}\raggedright\strut +row\strut +\end{minipage} & \begin{minipage}[t]{0.14\columnwidth}\raggedleft\strut +12.0\strut +\end{minipage} & \begin{minipage}[t]{0.30\columnwidth}\raggedright\strut +Example of a row that spans multiple lines.\strut +\end{minipage}\tabularnewline \begin{minipage}[t]{0.13\columnwidth}\centering\strut -Second -\strut\end{minipage} & \begin{minipage}[t]{0.12\columnwidth}\raggedright\strut -row -\strut\end{minipage} & \begin{minipage}[t]{0.14\columnwidth}\raggedleft\strut -5.0 -\strut\end{minipage} & \begin{minipage}[t]{0.30\columnwidth}\raggedright\strut -Here's another one. Note the blank line between rows. -\strut\end{minipage}\tabularnewline +Second\strut +\end{minipage} & \begin{minipage}[t]{0.12\columnwidth}\raggedright\strut +row\strut +\end{minipage} & \begin{minipage}[t]{0.14\columnwidth}\raggedleft\strut +5.0\strut +\end{minipage} & \begin{minipage}[t]{0.30\columnwidth}\raggedright\strut +Here's another one. Note the blank line between rows.\strut +\end{minipage}\tabularnewline \bottomrule \end{longtable} @@ -147,22 +147,22 @@ Multiline table without column headers: \begin{longtable}[]{@{}clrl@{}} \toprule \begin{minipage}[t]{0.13\columnwidth}\centering\strut -First -\strut\end{minipage} & \begin{minipage}[t]{0.12\columnwidth}\raggedright\strut -row -\strut\end{minipage} & \begin{minipage}[t]{0.14\columnwidth}\raggedleft\strut -12.0 -\strut\end{minipage} & \begin{minipage}[t]{0.30\columnwidth}\raggedright\strut -Example of a row that spans multiple lines. 
-\strut\end{minipage}\tabularnewline +First\strut +\end{minipage} & \begin{minipage}[t]{0.12\columnwidth}\raggedright\strut +row\strut +\end{minipage} & \begin{minipage}[t]{0.14\columnwidth}\raggedleft\strut +12.0\strut +\end{minipage} & \begin{minipage}[t]{0.30\columnwidth}\raggedright\strut +Example of a row that spans multiple lines.\strut +\end{minipage}\tabularnewline \begin{minipage}[t]{0.13\columnwidth}\centering\strut -Second -\strut\end{minipage} & \begin{minipage}[t]{0.12\columnwidth}\raggedright\strut -row -\strut\end{minipage} & \begin{minipage}[t]{0.14\columnwidth}\raggedleft\strut -5.0 -\strut\end{minipage} & \begin{minipage}[t]{0.30\columnwidth}\raggedright\strut -Here's another one. Note the blank line between rows. -\strut\end{minipage}\tabularnewline +Second\strut +\end{minipage} & \begin{minipage}[t]{0.12\columnwidth}\raggedright\strut +row\strut +\end{minipage} & \begin{minipage}[t]{0.14\columnwidth}\raggedleft\strut +5.0\strut +\end{minipage} & \begin{minipage}[t]{0.30\columnwidth}\raggedright\strut +Here's another one. Note the blank line between rows.\strut +\end{minipage}\tabularnewline \bottomrule \end{longtable} diff --git a/tests/writer.docbook5 b/tests/writer.docbook5 new file mode 100644 index 000000000..5261a35be --- /dev/null +++ b/tests/writer.docbook5 @@ -0,0 +1,1395 @@ +<?xml version="1.0" encoding="utf-8" ?> +<!DOCTYPE article> +<article xmlns="http://docbook.org/ns/docbook" version="5.0"> + <info> + <title>Pandoc Test Suite</title> + <authorgroup> + <author> + <firstname>John</firstname> + <surname>MacFarlane</surname> + </author> + <author> + <firstname></firstname> + <surname>Anonymous</surname> + </author> + </authorgroup> + <date>July 17, 2006</date> + </info> +<para> + This is a set of tests for pandoc. Most of them are adapted from John + Gruber’s markdown test suite. 
+</para> +<section id="headers"> + <title>Headers</title> + <section id="level-2-with-an-embedded-link"> + <title>Level 2 with an <link xlink:href="/url">embedded + link</link></title> + <section id="level-3-with-emphasis"> + <title>Level 3 with <emphasis>emphasis</emphasis></title> + <section id="level-4"> + <title>Level 4</title> + <section id="level-5"> + <title>Level 5</title> + <para> + </para> + </section> + </section> + </section> + </section> +</section> +<section id="level-1"> + <title>Level 1</title> + <section id="level-2-with-emphasis"> + <title>Level 2 with <emphasis>emphasis</emphasis></title> + <section id="level-3"> + <title>Level 3</title> + <para> + with no blank line + </para> + </section> + </section> + <section id="level-2"> + <title>Level 2</title> + <para> + with no blank line + </para> + </section> +</section> +<section id="paragraphs"> + <title>Paragraphs</title> + <para> + Here’s a regular paragraph. + </para> + <para> + In Markdown 1.0.0 and earlier. Version 8. This line turns into a list + item. Because a hard-wrapped line in the middle of a paragraph looked like + a list item. + </para> + <para> + Here’s one with a bullet. * criminey. + </para> +<literallayout>There should be a hard line break +here.</literallayout> +</section> +<section id="block-quotes"> + <title>Block Quotes</title> + <para> + E-mail style: + </para> + <blockquote> + <para> + This is a block quote. It is pretty short. 
+ </para> + </blockquote> + <blockquote> + <para> + Code in a block quote: + </para> + <programlisting> +sub status { + print "working"; +} +</programlisting> + <para> + A list: + </para> + <orderedlist numeration="arabic" spacing="compact"> + <listitem> + <para> + item one + </para> + </listitem> + <listitem> + <para> + item two + </para> + </listitem> + </orderedlist> + <para> + Nested block quotes: + </para> + <blockquote> + <para> + nested + </para> + </blockquote> + <blockquote> + <para> + nested + </para> + </blockquote> + </blockquote> + <para> + This should not be a block quote: 2 > 1. + </para> + <para> + And a following paragraph. + </para> +</section> +<section id="code-blocks"> + <title>Code Blocks</title> + <para> + Code: + </para> + <programlisting> +---- (should be four hyphens) + +sub status { + print "working"; +} + +this code block is indented by one tab +</programlisting> + <para> + And: + </para> + <programlisting> + this code block is indented by two tabs + +These should not be escaped: \$ \\ \> \[ \{ +</programlisting> +</section> +<section id="lists"> + <title>Lists</title> + <section id="unordered"> + <title>Unordered</title> + <para> + Asterisks tight: + </para> + <itemizedlist spacing="compact"> + <listitem> + <para> + asterisk 1 + </para> + </listitem> + <listitem> + <para> + asterisk 2 + </para> + </listitem> + <listitem> + <para> + asterisk 3 + </para> + </listitem> + </itemizedlist> + <para> + Asterisks loose: + </para> + <itemizedlist> + <listitem> + <para> + asterisk 1 + </para> + </listitem> + <listitem> + <para> + asterisk 2 + </para> + </listitem> + <listitem> + <para> + asterisk 3 + </para> + </listitem> + </itemizedlist> + <para> + Pluses tight: + </para> + <itemizedlist spacing="compact"> + <listitem> + <para> + Plus 1 + </para> + </listitem> + <listitem> + <para> + Plus 2 + </para> + </listitem> + <listitem> + <para> + Plus 3 + </para> + </listitem> + </itemizedlist> + <para> + Pluses loose: + </para> + <itemizedlist> + 
<listitem> + <para> + Plus 1 + </para> + </listitem> + <listitem> + <para> + Plus 2 + </para> + </listitem> + <listitem> + <para> + Plus 3 + </para> + </listitem> + </itemizedlist> + <para> + Minuses tight: + </para> + <itemizedlist spacing="compact"> + <listitem> + <para> + Minus 1 + </para> + </listitem> + <listitem> + <para> + Minus 2 + </para> + </listitem> + <listitem> + <para> + Minus 3 + </para> + </listitem> + </itemizedlist> + <para> + Minuses loose: + </para> + <itemizedlist> + <listitem> + <para> + Minus 1 + </para> + </listitem> + <listitem> + <para> + Minus 2 + </para> + </listitem> + <listitem> + <para> + Minus 3 + </para> + </listitem> + </itemizedlist> + </section> + <section id="ordered"> + <title>Ordered</title> + <para> + Tight: + </para> + <orderedlist numeration="arabic" spacing="compact"> + <listitem> + <para> + First + </para> + </listitem> + <listitem> + <para> + Second + </para> + </listitem> + <listitem> + <para> + Third + </para> + </listitem> + </orderedlist> + <para> + and: + </para> + <orderedlist numeration="arabic" spacing="compact"> + <listitem> + <para> + One + </para> + </listitem> + <listitem> + <para> + Two + </para> + </listitem> + <listitem> + <para> + Three + </para> + </listitem> + </orderedlist> + <para> + Loose using tabs: + </para> + <orderedlist numeration="arabic"> + <listitem> + <para> + First + </para> + </listitem> + <listitem> + <para> + Second + </para> + </listitem> + <listitem> + <para> + Third + </para> + </listitem> + </orderedlist> + <para> + and using spaces: + </para> + <orderedlist numeration="arabic"> + <listitem> + <para> + One + </para> + </listitem> + <listitem> + <para> + Two + </para> + </listitem> + <listitem> + <para> + Three + </para> + </listitem> + </orderedlist> + <para> + Multiple paragraphs: + </para> + <orderedlist numeration="arabic"> + <listitem> + <para> + Item 1, graf one. + </para> + <para> + Item 1. graf two. The quick brown fox jumped over the lazy dog’s + back. 
+ </para> + </listitem> + <listitem> + <para> + Item 2. + </para> + </listitem> + <listitem> + <para> + Item 3. + </para> + </listitem> + </orderedlist> + </section> + <section id="nested"> + <title>Nested</title> + <itemizedlist spacing="compact"> + <listitem> + <para> + Tab + </para> + <itemizedlist spacing="compact"> + <listitem> + <para> + Tab + </para> + <itemizedlist spacing="compact"> + <listitem> + <para> + Tab + </para> + </listitem> + </itemizedlist> + </listitem> + </itemizedlist> + </listitem> + </itemizedlist> + <para> + Here’s another: + </para> + <orderedlist numeration="arabic" spacing="compact"> + <listitem> + <para> + First + </para> + </listitem> + <listitem> + <para> + Second: + </para> + <itemizedlist spacing="compact"> + <listitem> + <para> + Fee + </para> + </listitem> + <listitem> + <para> + Fie + </para> + </listitem> + <listitem> + <para> + Foe + </para> + </listitem> + </itemizedlist> + </listitem> + <listitem> + <para> + Third + </para> + </listitem> + </orderedlist> + <para> + Same thing but with paragraphs: + </para> + <orderedlist numeration="arabic"> + <listitem> + <para> + First + </para> + </listitem> + <listitem> + <para> + Second: + </para> + <itemizedlist spacing="compact"> + <listitem> + <para> + Fee + </para> + </listitem> + <listitem> + <para> + Fie + </para> + </listitem> + <listitem> + <para> + Foe + </para> + </listitem> + </itemizedlist> + </listitem> + <listitem> + <para> + Third + </para> + </listitem> + </orderedlist> + </section> + <section id="tabs-and-spaces"> + <title>Tabs and spaces</title> + <itemizedlist> + <listitem> + <para> + this is a list item indented with tabs + </para> + </listitem> + <listitem> + <para> + this is a list item indented with spaces + </para> + <itemizedlist> + <listitem> + <para> + this is an example list item indented with tabs + </para> + </listitem> + <listitem> + <para> + this is an example list item indented with spaces + </para> + </listitem> + </itemizedlist> + </listitem> + 
</itemizedlist> + </section> + <section id="fancy-list-markers"> + <title>Fancy list markers</title> + <orderedlist numeration="arabic"> + <listitem override="2"> + <para> + begins with 2 + </para> + </listitem> + <listitem> + <para> + and now 3 + </para> + <para> + with a continuation + </para> + <orderedlist numeration="lowerroman" spacing="compact"> + <listitem override="4"> + <para> + sublist with roman numerals, starting with 4 + </para> + </listitem> + <listitem> + <para> + more items + </para> + <orderedlist numeration="upperalpha" spacing="compact"> + <listitem> + <para> + a subsublist + </para> + </listitem> + <listitem> + <para> + a subsublist + </para> + </listitem> + </orderedlist> + </listitem> + </orderedlist> + </listitem> + </orderedlist> + <para> + Nesting: + </para> + <orderedlist numeration="upperalpha" spacing="compact"> + <listitem> + <para> + Upper Alpha + </para> + <orderedlist numeration="upperroman" spacing="compact"> + <listitem> + <para> + Upper Roman. + </para> + <orderedlist numeration="arabic" spacing="compact"> + <listitem override="6"> + <para> + Decimal start with 6 + </para> + <orderedlist numeration="loweralpha" spacing="compact"> + <listitem override="3"> + <para> + Lower alpha with paren + </para> + </listitem> + </orderedlist> + </listitem> + </orderedlist> + </listitem> + </orderedlist> + </listitem> + </orderedlist> + <para> + Autonumbering: + </para> + <orderedlist spacing="compact"> + <listitem> + <para> + Autonumber. + </para> + </listitem> + <listitem> + <para> + More. + </para> + <orderedlist spacing="compact"> + <listitem> + <para> + Nested. + </para> + </listitem> + </orderedlist> + </listitem> + </orderedlist> + <para> + Should not be a list item: + </para> + <para> + M.A. 2007 + </para> + <para> + B. 
Williams + </para> + </section> +</section> +<section id="definition-lists"> + <title>Definition Lists</title> + <para> + Tight using spaces: + </para> + <variablelist spacing="compact"> + <varlistentry> + <term> + apple + </term> + <listitem> + <para> + red fruit + </para> + </listitem> + </varlistentry> + <varlistentry> + <term> + orange + </term> + <listitem> + <para> + orange fruit + </para> + </listitem> + </varlistentry> + <varlistentry> + <term> + banana + </term> + <listitem> + <para> + yellow fruit + </para> + </listitem> + </varlistentry> + </variablelist> + <para> + Tight using tabs: + </para> + <variablelist spacing="compact"> + <varlistentry> + <term> + apple + </term> + <listitem> + <para> + red fruit + </para> + </listitem> + </varlistentry> + <varlistentry> + <term> + orange + </term> + <listitem> + <para> + orange fruit + </para> + </listitem> + </varlistentry> + <varlistentry> + <term> + banana + </term> + <listitem> + <para> + yellow fruit + </para> + </listitem> + </varlistentry> + </variablelist> + <para> + Loose: + </para> + <variablelist> + <varlistentry> + <term> + apple + </term> + <listitem> + <para> + red fruit + </para> + </listitem> + </varlistentry> + <varlistentry> + <term> + orange + </term> + <listitem> + <para> + orange fruit + </para> + </listitem> + </varlistentry> + <varlistentry> + <term> + banana + </term> + <listitem> + <para> + yellow fruit + </para> + </listitem> + </varlistentry> + </variablelist> + <para> + Multiple blocks with italics: + </para> + <variablelist> + <varlistentry> + <term> + <emphasis>apple</emphasis> + </term> + <listitem> + <para> + red fruit + </para> + <para> + contains seeds, crisp, pleasant to taste + </para> + </listitem> + </varlistentry> + <varlistentry> + <term> + <emphasis>orange</emphasis> + </term> + <listitem> + <para> + orange fruit + </para> + <programlisting> +{ orange code block } +</programlisting> + <blockquote> + <para> + orange block quote + </para> + </blockquote> + </listitem> + 
</varlistentry> + </variablelist> + <para> + Multiple definitions, tight: + </para> + <variablelist spacing="compact"> + <varlistentry> + <term> + apple + </term> + <listitem> + <para> + red fruit + </para> + <para> + computer + </para> + </listitem> + </varlistentry> + <varlistentry> + <term> + orange + </term> + <listitem> + <para> + orange fruit + </para> + <para> + bank + </para> + </listitem> + </varlistentry> + </variablelist> + <para> + Multiple definitions, loose: + </para> + <variablelist> + <varlistentry> + <term> + apple + </term> + <listitem> + <para> + red fruit + </para> + <para> + computer + </para> + </listitem> + </varlistentry> + <varlistentry> + <term> + orange + </term> + <listitem> + <para> + orange fruit + </para> + <para> + bank + </para> + </listitem> + </varlistentry> + </variablelist> + <para> + Blank line after term, indented marker, alternate markers: + </para> + <variablelist> + <varlistentry> + <term> + apple + </term> + <listitem> + <para> + red fruit + </para> + <para> + computer + </para> + </listitem> + </varlistentry> + <varlistentry> + <term> + orange + </term> + <listitem> + <para> + orange fruit + </para> + <orderedlist numeration="arabic" spacing="compact"> + <listitem> + <para> + sublist + </para> + </listitem> + <listitem> + <para> + sublist + </para> + </listitem> + </orderedlist> + </listitem> + </varlistentry> + </variablelist> +</section> +<section id="html-blocks"> + <title>HTML Blocks</title> + <para> + Simple block on one line: + </para> + <para> + foo + </para> + <para> + And nested without indentation: + </para> + <para> + foo + </para> + <para> + bar + </para> + <para> + Interpreted markdown in a table: + </para> + This is <emphasis>emphasized</emphasis> + And this is <emphasis role="strong">strong</emphasis> + <para> + Here’s a simple block: + </para> + <para> + foo + </para> + <para> + This should be a code block, though: + </para> + <programlisting> +<div> + foo +</div> +</programlisting> + <para> + As should 
this: + </para> + <programlisting> +<div>foo</div> +</programlisting> + <para> + Now, nested: + </para> + <para> + foo + </para> + <para> + This should just be an HTML comment: + </para> + <para> + Multiline: + </para> + <para> + Code block: + </para> + <programlisting> +<!-- Comment --> +</programlisting> + <para> + Just plain comment, with trailing spaces on the line: + </para> + <para> + Code: + </para> + <programlisting> +<hr /> +</programlisting> + <para> + Hr’s: + </para> +</section> +<section id="inline-markup"> + <title>Inline Markup</title> + <para> + This is <emphasis>emphasized</emphasis>, and so <emphasis>is + this</emphasis>. + </para> + <para> + This is <emphasis role="strong">strong</emphasis>, and so + <emphasis role="strong">is this</emphasis>. + </para> + <para> + An <emphasis><link xlink:href="/url">emphasized link</link></emphasis>. + </para> + <para> + <emphasis role="strong"><emphasis>This is strong and + em.</emphasis></emphasis> + </para> + <para> + So is <emphasis role="strong"><emphasis>this</emphasis></emphasis> word. + </para> + <para> + <emphasis role="strong"><emphasis>This is strong and + em.</emphasis></emphasis> + </para> + <para> + So is <emphasis role="strong"><emphasis>this</emphasis></emphasis> word. + </para> + <para> + This is code: <literal>></literal>, <literal>$</literal>, + <literal>\</literal>, <literal>\$</literal>, + <literal><html></literal>. + </para> + <para> + <emphasis role="strikethrough">This is + <emphasis>strikeout</emphasis>.</emphasis> + </para> + <para> + Superscripts: a<superscript>bc</superscript>d + a<superscript><emphasis>hello</emphasis></superscript> + a<superscript>hello there</superscript>. + </para> + <para> + Subscripts: H<subscript>2</subscript>O, H<subscript>23</subscript>O, + H<subscript>many of them</subscript>O. + </para> + <para> + These should not be superscripts or subscripts, because of the unescaped + spaces: a^b c^d, a~b c~d. 
+ </para> +</section> +<section id="smart-quotes-ellipses-dashes"> + <title>Smart quotes, ellipses, dashes</title> + <para> + <quote>Hello,</quote> said the spider. <quote><quote>Shelob</quote> is my + name.</quote> + </para> + <para> + <quote>A</quote>, <quote>B</quote>, and <quote>C</quote> are letters. + </para> + <para> + <quote>Oak,</quote> <quote>elm,</quote> and <quote>beech</quote> are names + of trees. So is <quote>pine.</quote> + </para> + <para> + <quote>He said, <quote>I want to go.</quote></quote> Were you alive in the + 70’s? + </para> + <para> + Here is some quoted <quote><literal>code</literal></quote> and a + <quote><link xlink:href="http://example.com/?foo=1&bar=2">quoted + link</link></quote>. + </para> + <para> + Some dashes: one—two — three—four — five. + </para> + <para> + Dashes between numbers: 5–7, 255–66, 1987–1999. + </para> + <para> + Ellipses…and…and…. + </para> +</section> +<section id="latex"> + <title>LaTeX</title> + <itemizedlist spacing="compact"> + <listitem> + <para> + </para> + </listitem> + <listitem> + <para> + 2 + 2 = 4 + </para> + </listitem> + <listitem> + <para> + <emphasis>x</emphasis> ∈ <emphasis>y</emphasis> + </para> + </listitem> + <listitem> + <para> + <emphasis>α</emphasis> ∧ <emphasis>ω</emphasis> + </para> + </listitem> + <listitem> + <para> + 223 + </para> + </listitem> + <listitem> + <para> + <emphasis>p</emphasis>-Tree + </para> + </listitem> + <listitem> + <para> + Here’s some display math: + $$\frac{d}{dx}f(x)=\lim_{h\to 0}\frac{f(x+h)-f(x)}{h}$$ + </para> + </listitem> + <listitem> + <para> + Here’s one that has a line break in it: + <emphasis>α</emphasis> + <emphasis>ω</emphasis> × <emphasis>x</emphasis><superscript>2</superscript>. + </para> + </listitem> + </itemizedlist> + <para> + These shouldn’t be math: + </para> + <itemizedlist spacing="compact"> + <listitem> + <para> + To get the famous equation, write <literal>$e = mc^2$</literal>. 
+ </para> + </listitem> + <listitem> + <para> + $22,000 is a <emphasis>lot</emphasis> of money. So is $34,000. (It + worked if <quote>lot</quote> is emphasized.) + </para> + </listitem> + <listitem> + <para> + Shoes ($20) and socks ($5). + </para> + </listitem> + <listitem> + <para> + Escaped <literal>$</literal>: $73 <emphasis>this should be + emphasized</emphasis> 23$. + </para> + </listitem> + </itemizedlist> + <para> + Here’s a LaTeX table: + </para> +</section> +<section id="special-characters"> + <title>Special Characters</title> + <para> + Here is some unicode: + </para> + <itemizedlist spacing="compact"> + <listitem> + <para> + I hat: Î + </para> + </listitem> + <listitem> + <para> + o umlaut: ö + </para> + </listitem> + <listitem> + <para> + section: § + </para> + </listitem> + <listitem> + <para> + set membership: ∈ + </para> + </listitem> + <listitem> + <para> + copyright: © + </para> + </listitem> + </itemizedlist> + <para> + AT&T has an ampersand in their name. + </para> + <para> + AT&T is another way to write it. + </para> + <para> + This & that. + </para> + <para> + 4 < 5. + </para> + <para> + 6 > 5. + </para> + <para> + Backslash: \ + </para> + <para> + Backtick: ` + </para> + <para> + Asterisk: * + </para> + <para> + Underscore: _ + </para> + <para> + Left brace: { + </para> + <para> + Right brace: } + </para> + <para> + Left bracket: [ + </para> + <para> + Right bracket: ] + </para> + <para> + Left paren: ( + </para> + <para> + Right paren: ) + </para> + <para> + Greater-than: > + </para> + <para> + Hash: # + </para> + <para> + Period: . + </para> + <para> + Bang: ! + </para> + <para> + Plus: + + </para> + <para> + Minus: - + </para> +</section> +<section id="links"> + <title>Links</title> + <section id="explicit"> + <title>Explicit</title> + <para> + Just a <link xlink:href="/url/">URL</link>. + </para> + <para> + <link xlink:href="/url/">URL and title</link>. + </para> + <para> + <link xlink:href="/url/">URL and title</link>. 
+ </para> + <para> + <link xlink:href="/url/">URL and title</link>. + </para> + <para> + <link xlink:href="/url/">URL and title</link> + </para> + <para> + <link xlink:href="/url/">URL and title</link> + </para> + <para> + <link xlink:href="/url/with_underscore">with_underscore</link> + </para> + <para> + Email link (<email>nobody@nowhere.net</email>) + </para> + <para> + <link xlink:href="">Empty</link>. + </para> + </section> + <section id="reference"> + <title>Reference</title> + <para> + Foo <link xlink:href="/url/">bar</link>. + </para> + <para> + Foo <link xlink:href="/url/">bar</link>. + </para> + <para> + Foo <link xlink:href="/url/">bar</link>. + </para> + <para> + With <link xlink:href="/url/">embedded [brackets]</link>. + </para> + <para> + <link xlink:href="/url/">b</link> by itself should be a link. + </para> + <para> + Indented <link xlink:href="/url">once</link>. + </para> + <para> + Indented <link xlink:href="/url">twice</link>. + </para> + <para> + Indented <link xlink:href="/url">thrice</link>. + </para> + <para> + This should [not][] be a link. + </para> + <programlisting> +[not]: /url +</programlisting> + <para> + Foo <link xlink:href="/url/">bar</link>. + </para> + <para> + Foo <link xlink:href="/url/">biz</link>. + </para> + </section> + <section id="with-ampersands"> + <title>With ampersands</title> + <para> + Here’s a <link xlink:href="http://example.com/?foo=1&bar=2">link + with an ampersand in the URL</link>. + </para> + <para> + Here’s a link with an amersand in the link text: + <link xlink:href="http://att.com/">AT&T</link>. + </para> + <para> + Here’s an <link xlink:href="/script?foo=1&bar=2">inline link</link>. + </para> + <para> + Here’s an <link xlink:href="/script?foo=1&bar=2">inline link in + pointy braces</link>. 
+ </para> + </section> + <section id="autolinks"> + <title>Autolinks</title> + <para> + With an ampersand: + <link xlink:href="http://example.com/?foo=1&bar=2">http://example.com/?foo=1&bar=2</link> + </para> + <itemizedlist spacing="compact"> + <listitem> + <para> + In a list? + </para> + </listitem> + <listitem> + <para> + <link xlink:href="http://example.com/">http://example.com/</link> + </para> + </listitem> + <listitem> + <para> + It should. + </para> + </listitem> + </itemizedlist> + <para> + An e-mail address: <email>nobody@nowhere.net</email> + </para> + <blockquote> + <para> + Blockquoted: + <link xlink:href="http://example.com/">http://example.com/</link> + </para> + </blockquote> + <para> + Auto-links should not occur here: + <literal><http://example.com/></literal> + </para> + <programlisting> +or here: <http://example.com/> +</programlisting> + </section> +</section> +<section id="images"> + <title>Images</title> + <para> + From <quote>Voyage dans la Lune</quote> by Georges Melies (1902): + </para> + <figure> + <title>lalune</title> + <mediaobject> + <imageobject> + <imagedata fileref="lalune.jpg" /> + </imageobject> + <textobject><phrase>lalune</phrase></textobject> + </mediaobject> + </figure> + <para> + Here is a movie <inlinemediaobject> + <imageobject> + <imagedata fileref="movie.jpg" /> + </imageobject> + </inlinemediaobject> icon. + </para> +</section> +<section id="footnotes"> + <title>Footnotes</title> + <para> + Here is a footnote reference,<footnote> + <para> + Here is the footnote. It can go anywhere after the footnote reference. + It need not be placed at the end of the document. + </para> + </footnote> and another.<footnote> + <para> + Here’s the long note. This one contains multiple blocks. + </para> + <para> + Subsequent blocks are indented to show that they belong to the + footnote (as with list items). 
+ </para> + <programlisting> + { <code> } +</programlisting> + <para> + If you want, you can indent every line, but you can also be lazy and + just indent the first line of each block. + </para> + </footnote> This should <emphasis>not</emphasis> be a footnote reference, + because it contains a space.[^my note] Here is an inline note.<footnote> + <para> + This is <emphasis>easier</emphasis> to type. Inline notes may contain + <link xlink:href="http://google.com">links</link> and + <literal>]</literal> verbatim characters, as well as [bracketed text]. + </para> + </footnote> + </para> + <blockquote> + <para> + Notes can go in quotes.<footnote> + <para> + In quote. + </para> + </footnote> + </para> + </blockquote> + <orderedlist numeration="arabic" spacing="compact"> + <listitem> + <para> + And in list items.<footnote> + <para> + In list. + </para> + </footnote> + </para> + </listitem> + </orderedlist> + <para> + This paragraph should not be part of the note, as it is not indented. + </para> +</section> +</article> diff --git a/tests/writer.org b/tests/writer.org index 13bacdfa6..4c7f363a6 100644 --- a/tests/writer.org +++ b/tests/writer.org @@ -9,30 +9,60 @@ markdown test suite. -------------- * Headers + :PROPERTIES: + :CUSTOM_ID: headers + :END: ** Level 2 with an [[/url][embedded link]] + :PROPERTIES: + :CUSTOM_ID: level-2-with-an-embedded-link + :END: *** Level 3 with /emphasis/ + :PROPERTIES: + :CUSTOM_ID: level-3-with-emphasis + :END: **** Level 4 + :PROPERTIES: + :CUSTOM_ID: level-4 + :END: ***** Level 5 + :PROPERTIES: + :CUSTOM_ID: level-5 + :END: * Level 1 + :PROPERTIES: + :CUSTOM_ID: level-1 + :END: ** Level 2 with /emphasis/ + :PROPERTIES: + :CUSTOM_ID: level-2-with-emphasis + :END: *** Level 3 + :PROPERTIES: + :CUSTOM_ID: level-3 + :END: with no blank line ** Level 2 + :PROPERTIES: + :CUSTOM_ID: level-2 + :END: with no blank line -------------- * Paragraphs + :PROPERTIES: + :CUSTOM_ID: paragraphs + :END: Here's a regular paragraph. @@ -48,6 +78,9 @@ here. 
-------------- * Block Quotes + :PROPERTIES: + :CUSTOM_ID: block-quotes + :END: E-mail style: @@ -87,6 +120,9 @@ And a following paragraph. -------------- * Code Blocks + :PROPERTIES: + :CUSTOM_ID: code-blocks + :END: Code: @@ -111,8 +147,14 @@ And: -------------- * Lists + :PROPERTIES: + :CUSTOM_ID: lists + :END: ** Unordered + :PROPERTIES: + :CUSTOM_ID: unordered + :END: Asterisks tight: @@ -157,6 +199,9 @@ Minuses loose: - Minus 3 ** Ordered + :PROPERTIES: + :CUSTOM_ID: ordered + :END: Tight: @@ -197,6 +242,9 @@ Multiple paragraphs: 3. Item 3. ** Nested + :PROPERTIES: + :CUSTOM_ID: nested + :END: - Tab @@ -228,6 +276,9 @@ Same thing but with paragraphs: 3. Third ** Tabs and spaces + :PROPERTIES: + :CUSTOM_ID: tabs-and-spaces + :END: - this is a list item indented with tabs @@ -238,6 +289,9 @@ Same thing but with paragraphs: - this is an example list item indented with spaces ** Fancy list markers + :PROPERTIES: + :CUSTOM_ID: fancy-list-markers + :END: 2) begins with 2 3) and now 3 @@ -276,6 +330,9 @@ B. Williams -------------- * Definition Lists + :PROPERTIES: + :CUSTOM_ID: definition-lists + :END: Tight using spaces: @@ -342,6 +399,9 @@ Blank line after term, indented marker, alternate markers: 2. sublist * HTML Blocks + :PROPERTIES: + :CUSTOM_ID: html-blocks + :END: Simple block on one line: @@ -569,6 +629,9 @@ Hr's: -------------- * Inline Markup + :PROPERTIES: + :CUSTOM_ID: inline-markup + :END: This is /emphasized/, and so /is this/. @@ -598,6 +661,9 @@ spaces: a\^b c\^d, a~b c~d. -------------- * Smart quotes, ellipses, dashes + :PROPERTIES: + :CUSTOM_ID: smart-quotes-ellipses-dashes + :END: "Hello," said the spider. "'Shelob' is my name." @@ -619,6 +685,9 @@ Ellipses...and...and.... 
-------------- * LaTeX + :PROPERTIES: + :CUSTOM_ID: latex + :END: - \cite[22-23]{smith.1899} - $2+2=4$ @@ -649,6 +718,9 @@ Cat & 1 \\ \hline -------------- * Special Characters + :PROPERTIES: + :CUSTOM_ID: special-characters + :END: Here is some unicode: @@ -703,8 +775,14 @@ Minus: - -------------- * Links + :PROPERTIES: + :CUSTOM_ID: links + :END: ** Explicit + :PROPERTIES: + :CUSTOM_ID: explicit + :END: Just a [[/url/][URL]]. @@ -725,6 +803,9 @@ Just a [[/url/][URL]]. [[][Empty]]. ** Reference + :PROPERTIES: + :CUSTOM_ID: reference + :END: Foo [[/url/][bar]]. @@ -753,6 +834,9 @@ Foo [[/url/][bar]]. Foo [[/url/][biz]]. ** With ampersands + :PROPERTIES: + :CUSTOM_ID: with-ampersands + :END: Here's a [[http://example.com/?foo=1&bar=2][link with an ampersand in the URL]]. @@ -764,6 +848,9 @@ Here's an [[/script?foo=1&bar=2][inline link]]. Here's an [[/script?foo=1&bar=2][inline link in pointy braces]]. ** Autolinks + :PROPERTIES: + :CUSTOM_ID: autolinks + :END: With an ampersand: [[http://example.com/?foo=1&bar=2]] @@ -786,6 +873,9 @@ Auto-links should not occur here: =<http://example.com/>= -------------- * Images + :PROPERTIES: + :CUSTOM_ID: images + :END: From "Voyage dans la Lune" by Georges Melies (1902): @@ -797,6 +887,9 @@ Here is a movie [[movie.jpg]] icon. -------------- * Footnotes + :PROPERTIES: + :CUSTOM_ID: footnotes + :END: Here is a footnote reference, [1] and another. 
[2] This should /not/ be a footnote reference, because it contains a space.[\^my note] Here is an inline diff --git a/tests/writers-lang-and-dir.latex b/tests/writers-lang-and-dir.latex index 056809a5e..346675353 100644 --- a/tests/writers-lang-and-dir.latex +++ b/tests/writers-lang-and-dir.latex @@ -27,16 +27,16 @@ breaklinks=true} \urlstyle{same} % don't use monospace font for urls \ifnum 0\ifxetex 1\fi\ifluatex 1\fi=0 % if pdftex - \usepackage[shorthands=off,ngerman,british,ngerman,spanish,french,main=english]{babel} + \usepackage[shorthands=off,ngerman,british,nswissgerman,spanish,french,main=english]{babel} \newcommand{\textgerman}[2][]{\foreignlanguage{ngerman}{#2}} - \newenvironment{german}[1]{\begin{otherlanguage}{ngerman}}{\end{otherlanguage}} + \newenvironment{german}[2][]{\begin{otherlanguage}{ngerman}}{\end{otherlanguage}} \newcommand{\textenglish}[2][]{\foreignlanguage{british}{#2}} - \newenvironment{english}[1]{\begin{otherlanguage}{british}}{\end{otherlanguage}} + \newenvironment{english}[2][]{\begin{otherlanguage}{british}}{\end{otherlanguage}} \let\oritextspanish\textspanish \AddBabelHook{spanish}{beforeextras}{\renewcommand{\textspanish}{\oritextspanish}} \AddBabelHook{spanish}{afterextras}{\renewcommand{\textspanish}[2][]{\foreignlanguage{spanish}{##2}}} \newcommand{\textfrench}[2][]{\foreignlanguage{french}{#2}} - \newenvironment{french}[1]{\begin{otherlanguage}{french}}{\end{otherlanguage}} + \newenvironment{french}[2][]{\begin{otherlanguage}{french}}{\end{otherlanguage}} \else \usepackage{polyglossia} \setmainlanguage[]{english} |