path: root/src/Text/Pandoc/Parsing.hs
Commit message (author, date)
...
* Parsing: Make characterReference fail if entity not found. (John MacFarlane, 2012-02-05)
* Removed module Text.Pandoc.CharacterReferences. (John MacFarlane, 2012-02-05)
  Moved characterReference parser to Text.Pandoc.Parsing.
  decodeCharacterReferences is now replaced by fromEntities in Text.Pandoc.XML.
* Complete rewrite of LaTeX reader. (John MacFarlane, 2012-02-04)
  * The new reader is more robust, accurate, and extensible. It is still quite incomplete, but it should be easier now to add features.
  * Text.Pandoc.Parsing: Added withRaw combinator.
  * Markdown reader: do escapedChar before raw latex inline. Otherwise we capture commands like \{.
  * Fixed latex citation tests for new citeproc.
  * Handle \include{} commands in latex. This is done in pandoc.hs, not the (pure) latex reader. But the reader exports the needed function, handleIncludes.
  * Moved err and warn from pandoc.hs to Shared.
  * Fixed tests - raw tex should sometimes have trailing space.
  * Updated lhs-test for highlighting-kate changes.
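The idea behind a combinator like withRaw (run a parser and also return the raw input it consumed) can be sketched with ReadP from base, whose `gather` does the same job. This is only an illustration: pandoc's withRaw is written against its own Parsec-based parser type, and the names `number` and `numberWithRaw` are hypothetical.

```haskell
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP (ReadP, gather, munch1, readP_to_S, skipSpaces)

-- Parse a non-negative integer.
number :: ReadP Int
number = read <$> munch1 isDigit

-- Return the parsed value together with the raw text that was consumed,
-- including any leading whitespace skipped before the number.
numberWithRaw :: ReadP (Int, String)
numberWithRaw = do
  (raw, n) <- gather (skipSpaces *> number)
  return (n, raw)
```

Running `readP_to_S numberWithRaw` on an input gives both the value and the exact source span, which is the property a reader needs when it wants to pass raw TeX through unchanged.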
* Fixed table parsing with wide or combining characters. (John MacFarlane, 2012-01-27)
  Closes #348. Closes #108.
* New treatment of dashes in --smart mode. (John MacFarlane, 2012-01-01)
  * `---` is always em-dash, `--` is always en-dash.
  * pandoc no longer tries to guess when `-` should be en-dash.
  * A new option, `--old-dashes`, is provided for legacy documents.
  Rationale: The rules for en-dash are too complex and language-dependent for a guesser to work reliably. This change gives users greater control. The alternative of using Unicode isn't very good, since Unicode em- and en-dashes are barely distinguishable in a monospace font.
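The new dash rule is simple enough to sketch as a pure function over a string. This is not pandoc's parser (which works on its own parser state and inline elements); `smartDashes` is a hypothetical helper that only illustrates the rule stated above.

```haskell
-- Hypothetical helper: rewrite ASCII dash runs using the rule
-- "--- is always em-dash, -- is always en-dash".
smartDashes :: String -> String
smartDashes ('-':'-':'-':rest) = '\x2014' : smartDashes rest  -- em-dash
smartDashes ('-':'-':rest)     = '\x2013' : smartDashes rest  -- en-dash
smartDashes (c:rest)           = c : smartDashes rest         -- a lone '-' is left alone
smartDashes []                 = []
```

The three-dash case is matched first, so `---` never degrades into an en-dash followed by a hyphen, and a single `-` is never guessed at.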
* Better smart quote parsing. (John MacFarlane, 2011-12-29)
  * Added stateLastStrPos to ParserState. This lets us keep track of whether we're parsing the position immediately after a 'str'. If we encounter a ' in such a location, it must be an apostrophe, and can't be a single quote start.
  * Set this in the markdown, textile, html, and rst str parsers.
  * Closes #360.
* Replaced Apostrophe, Ellipses, EmDash, EnDash w/ unicode strings. (John MacFarlane, 2011-12-27)
* Pretty: return Str with unicode instead of Apostrophe. (John MacFarlane, 2011-12-27)
* Parsing: Removed charsInBalanced', added param to charsInBalanced. (John MacFarlane, 2011-12-05)
  The extra parameter is a character parser. This is needed for proper handling of escapes, etc.
* Parsing: Changed type of escaped to return Char. (John MacFarlane, 2011-12-05)
* Added nonspaceChar to Text.Pandoc.Parsing. (John MacFarlane, 2011-07-30)
* Smart quotes: handle '...hi' properly. (John MacFarlane, 2011-07-25)
  Also added test case.
* Properly handle characters in the 128..159 range. (John MacFarlane, 2011-07-23)
  These aren't valid in HTML, but many HTML files produced by Windows tools contain them. We substitute correct unicode characters.
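A minimal sketch of this kind of substitution, assuming the intended mapping is the Windows-1252 one (the commit message does not name the table). Only a subset of the 128..159 range is listed, and `cp1252` and `fixChar` are hypothetical names, not pandoc's.

```haskell
import Data.Char (ord)
import Data.Maybe (fromMaybe)

-- Illustrative subset of the Windows-1252 assignments for 128..159.
cp1252 :: [(Int, Char)]
cp1252 =
  [ (0x80, '\x20AC') -- euro sign
  , (0x85, '\x2026') -- horizontal ellipsis
  , (0x91, '\x2018') -- left single quotation mark
  , (0x92, '\x2019') -- right single quotation mark
  , (0x93, '\x201C') -- left double quotation mark
  , (0x94, '\x201D') -- right double quotation mark
  , (0x96, '\x2013') -- en dash
  , (0x97, '\x2014') -- em dash
  ]

-- Replace a character from the table; leave every other character unchanged.
fixChar :: Char -> Char
fixChar c = fromMaybe c (lookup (ord c) cp1252)
```

Mapping `fixChar` over decoded text turns, for example, code points 0x93 and 0x94 into proper curly double quotes.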
* Revert "Parsing: Use new type aliases, PandocParser, GeneralParser." (John MacFarlane, 2011-04-29)
  This reverts commit ec5410bc4e9d228b7dc0123061d80f9addf825bf.
* Parsing: Use new type aliases, PandocParser, GeneralParser. (John MacFarlane, 2011-04-29)
  This should make it easier to change the types later.
* Changed uri parser so it doesn't include trailing punctuation. (John MacFarlane, 2011-03-18)
  So, in RST, 'http://google.com.' should be parsed as a link to 'http://google.com' followed by a period. The parser is smart enough to recognize balanced parentheses, as often occur in Wikipedia links: 'http://foo.bar/baz_(bam)'.
  Also added ()s to RST specialChars, so '(http://google.com)' will be parsed as a link in parens.
  Added test cases. Resolves Issue #291.
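The behaviour described here can be approximated with a small post-processing function. This is a sketch under assumptions: pandoc's uri parser is a Parsec parser, not a string splitter, and `trimUri` and its punctuation set are hypothetical.

```haskell
-- Split a candidate URI into (uri, trailing punctuation).
-- A closing parenthesis is kept when the parentheses in the URI are balanced.
trimUri :: String -> (String, String)
trimUri s = go (reverse s) []
  where
    go (c:revRest) trailing
      | c `elem` ".,;:!?"                            = go revRest (c : trailing)
      | c == ')' && unbalanced (reverse (c:revRest)) = go revRest (c : trailing)
    -- Reached when the input is exhausted or no guard above matched.
    go revUri trailing = (reverse revUri, trailing)

    -- True when there are more closing than opening parentheses.
    unbalanced str = count ')' str > count '(' str
    count ch = length . filter (== ch)
```

With this sketch, `trimUri "http://google.com."` separates the final period, while `trimUri "http://foo.bar/baz_(bam)"` keeps the closing parenthesis because it has a matching opener.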
* Add support for attributes in inline Code. (John MacFarlane, 2011-01-26)
  Additional related changes:
  * URLs in Code in autolinks now use class "url".
  * Require highlighting-kate 0.2.8.2, which omits the final <br/> tag, essential for inline code.
* Bumped version to 1.8; depend on pandoc-types 1.8. (John MacFarlane, 2011-01-26)
  The old TeX, HtmlInline and RawHtml elements have been removed and replaced by generic RawInline and RawBlock elements. All modules updated to use the new raw elements.
* More small parser rewrites for small performance gains. (John MacFarlane, 2011-01-19)
* Parsing: Rewrote spaceChar for significant speedup in readers. (John MacFarlane, 2011-01-19)
* Parsing: Fixed bug in grid table parser. (John MacFarlane, 2011-01-14)
  Spaces at end of line were not being stripped properly, resulting in unintended LineBreaks.
* Fixed macro parsing. (John MacFarlane, 2011-01-05)
* Moved 'macro' and 'applyMacros'' from markdown reader to Parsing. (John MacFarlane, 2011-01-04)
* New HTML reader using tagsoup as a lexer. (John MacFarlane, 2010-12-30)
  * The new reader is faster and more accurate.
  * API changes for Text.Pandoc.Readers.HTML:
    - removed rawHtmlBlock, anyHtmlBlockTag, anyHtmlInlineTag, anyHtmlTag, anyHtmlEndTag, htmlEndTag, extractTagType, htmlBlockElement, htmlComment
    - added htmlTag, htmlInBalanced, isInlineTag, isBlockTag, isTextTag
  * tagsoup is a new dependency.
  * Text.Pandoc.Parsing: Generalized type on readWith.
  * Benchmark.hs: Added length calculation to force full evaluation.
  * Updated HTML reader tests.
  * Updated markdown and textile readers to use the functions from the HTML reader.
  * Note: The markdown reader now correctly handles some cases it did not before. For example: <hr/> is reproduced without adding a space. <script> a = '<b>'; </script> is parsed correctly.
* Use functions from Text.Pandoc.Generic instead of processWith(M). (John MacFarlane, 2010-12-24)
* Added new prettyprinting module. (John MacFarlane, 2010-12-17)
  * Added Text.Pandoc.Pretty. This is better suited for pandoc than the 'pretty' package. One advantage is that we now get proper wrapping; Emph [Inline] is no longer treated as a big unwrappable unit. Previously we only got breaks for spaces at the "outer level." We can also more easily avoid doubled blank lines. Performance is significantly better as well.
  * Removed Text.Pandoc.Blocks. Text.Pandoc.Pretty allows you to define blocks and concatenate them.
  * Modified markdown, RST, org readers to use Text.Pandoc.Pretty instead of Text.PrettyPrint.HughesPJ.
  * Text.Pandoc.Shared: Added writerColumns to WriterOptions.
  * Markdown, RST, Org writers now break text at writerColumns.
  * Added --columns command-line option, which sets stColumns and writerColumns.
  * Table parsing: If the size of the header > stColumns, use the header size as 100% for purposes of calculating relative widths of columns.
* Removed HTML sanitization. (John MacFarlane, 2010-12-10)
  This is better done on the resulting HTML; use the xss-sanitize library for this. xss-sanitize is based on pandoc's sanitization, but improves it.
  - Removed stateSanitize from ParserState.
  - Removed --sanitize-html option.
* Smart punctuation: recognize entities. (John MacFarlane, 2010-12-07)
  Now &ldquo;Hi&rdquo; gets parsed as a Quoted DoubleQuote inline.
* Smart punctuation: don't allow ellipses containing spaces. (John MacFarlane, 2010-12-07)
  Previously we allowed '. . .', ' . . . ', etc. This caused too many complications, and removed the author's flexibility in combining ellipses with spaces and periods.
* Moved smartPunctuation from Markdown to Parsing. (John MacFarlane, 2010-12-07)
  + Parameterized smartPunctuation on an inline parser.
  + Handle smartPunctuation in Textile reader.
* Fix regression: markdown references should be case-insensitive. (John MacFarlane, 2010-12-05)
  This broke when we added the Key type. We had assumed that the custom case-insensitive Ord instance would ensure case-insensitive matching, but that is not how Data.Map works.
  * Added a test case for case-insensitivity in markdown-reader-more;
  * removed old refsMatch from Text.Pandoc.Parsing module;
  * hid the 'Key' constructor;
  * dropped the custom Ord and Eq instances, deriving instead;
  * added fromKey and toKey to convert between Keys and Inline lists;
  * toKey ensures that keys are case-insensitive, since this is the only way the API provides to construct a Key.
  Resolves Issue #272.
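The smart-constructor approach described in this commit can be sketched as follows. It is a simplification: the real Key wraps a list of Inline elements, while this version wraps a String, and `refs` and `lookupRef` (with their example URL) are hypothetical.

```haskell
import Data.Char (toLower)
import qualified Data.Map as M

-- Simplified stand-in for pandoc's Key. In a real module the constructor
-- would not be exported, so toKey is the only way to build a Key.
newtype Key = Key String deriving (Eq, Ord, Show)

-- Normalise case once, at construction time.
toKey :: String -> Key
toKey = Key . map toLower

fromKey :: Key -> String
fromKey (Key s) = s

-- Hypothetical reference table keyed by normalised labels.
refs :: M.Map Key String
refs = M.fromList [(toKey "Pandoc", "http://example.com/pandoc")]

lookupRef :: String -> Maybe String
lookupRef label = M.lookup (toKey label) refs
```

Because every Key is lower-cased when it is built, the derived Eq and Ord instances are enough for Data.Map lookups to behave case-insensitively.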
* Removed CITEPROC CPP conditionals from library code. (John MacFarlane, 2010-11-06)
  By Cabal policy, the API should not change depending on flags.
* Process LaTeX macros in markdown, and apply to TeX math. (John MacFarlane, 2010-10-26)
  Example:

      \newcommand{\plus}[2]{#1 + #2}
      $\plus{3}{4}$

  yields:

      3+4
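The core step of macro application, replacing `#1`, `#2`, and so on in a macro body with the supplied arguments, can be sketched in a few lines. `substArgs` is a hypothetical helper, not the function pandoc uses, and it does no parsing of `\newcommand` itself.

```haskell
import Data.Char (digitToInt, isDigit)

-- Substitute #1..#9 in a macro body with the corresponding argument.
-- A '#' that is not followed by a valid argument index is left unchanged.
substArgs :: [String] -> String -> String
substArgs args ('#':d:rest)
  | isDigit d, let n = digitToInt d, n >= 1, n <= length args
  = args !! (n - 1) ++ substArgs args rest
substArgs args (c:rest) = c : substArgs args rest
substArgs _    []       = []
```

Applied to the body `#1 + #2` with arguments `3` and `4`, this sketch keeps the body's spaces and produces `3 + 4`; the `3+4` in the commit message is the output as the commit reports it.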
* Parse \chapter{} in latex. (John MacFarlane, 2010-07-13)
  + Added stateHasChapters to ParserState.
  + If a \chapter command is encountered, this is set to True and subsequent \section commands (etc.) will be bumped up one level.
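The level bump can be illustrated with a pure function that takes the flag as an argument. The base level numbers and the name `headingLevel` are assumptions made for the sketch; only the "bump by one once a chapter has been seen" rule comes from the commit message.

```haskell
-- Heading level for a LaTeX sectioning command, given whether a \chapter
-- has been seen (the role played by stateHasChapters).
headingLevel :: Bool -> String -> Maybe Int
headingLevel hasChapters cmd = fmap bump (lookup cmd base)
  where
    -- Assumed base levels when the document has no chapters.
    base = [("chapter", 1), ("section", 1), ("subsection", 2), ("subsubsection", 3)]
    -- Once chapters are in use, everything below a chapter moves down a level.
    bump n
      | hasChapters && cmd /= "chapter" = n + 1
      | otherwise                       = n
```

In a document without chapters, `\section` stays at the top level; once the flag is set, `\section` becomes level 2 and `\chapter` takes level 1.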
* Merge branch 'atlists'. Added auto-numbered example lists. (John MacFarlane, 2010-07-11)
* Allow language-neutral table captions. (John MacFarlane, 2010-07-06)
  + Captions may now begin simply with ':', instead of 'Table:'
  + Captions may now appear either above or below the table.
  + Resolves Issue #227.
* More refactoring of grid table code. (John MacFarlane, 2010-07-05)
* Minor reformatting. (John MacFarlane, 2010-07-05)
* Moved generic grid table functions from RST reader -> Parsing. (John MacFarlane, 2010-07-05)
  Here they can be used by the Markdown reader as well.
* Moved parsing functions from Text.Pandoc.Shared to new module. (John MacFarlane, 2010-07-05)
  + Text.Pandoc.Parsing