path: root/src/Text/Pandoc/Readers/HTML.hs
Commit message (author, date)
* HTML reader: Fix performance issue with malformed HTML tables. (John MacFarlane, 2014-06-20)
  We let a `</table>` tag close an open `<tr>` or `<td>`. Closes #1167.
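
The rule in this commit can be pictured as a small predicate. This is a minimal sketch, not pandoc's actual code: the name `closesOpen` and its argument order are hypothetical, and only the two cases named in the commit message are encoded.

```haskell
-- Hypothetical sketch: does the closing tag `closing` implicitly end
-- an element `open` that is still open?
closesOpen :: String -> String -> Bool
closesOpen closing open
  | closing == open    = True                     -- the ordinary matching case
  | closing == "table" = open `elem` ["tr", "td"] -- the case from this commit
  | otherwise          = False

main :: IO ()
main = do
  print (closesOpen "table" "td")
  print (closesOpen "p" "td")
```

With a rule like this, a parser that meets `</table>` inside an unterminated cell can stop scanning at once instead of searching the rest of the input for the missing `</td>`.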
* Support --trace in HTML reader. (John MacFarlane, 2014-06-20)
* HTML reader: Allow space between `<col>` and `</col>`. (John MacFarlane, 2014-06-19)
  Test case:

  ```
  <table border="1">
  <colgroup>
  <col> </col>
  <col></col>
  </colgroup>
  <tbody>
  <tr>
  <td>X</td>
  <td>Y</td>
  </tr>
  <tr>
  <td>1</td>
  <td>2</td>
  </tr>
  </tbody>
  </table>
  ```
* HTML reader: Fixed major parsing problem with HTML tables. (John MacFarlane, 2014-06-16)
  Table cells were being combined into one cell. Closes #1341.
* Moved extractSpaces to Shared.hs (mpickering, 2014-06-16)
  Generalised and moved the extractSpaces function from `HTML.hs` to `Shared.hs` so that the docx reader can also use it.
* Update copyright notices for 2014, add missing notices (Albert Krewinkel, 2014-05-09)
* HTML reader: Treat processing instructions & declarations as block. (John MacFarlane, 2014-04-11)
  Previously these were treated as inline and included in paragraph tags in HTML or DocBook output, which is generally not what is wanted. Closes #1233.
* HTML reader: Updated `closes` with rules from HTML5 spec. (John MacFarlane, 2014-04-05)
* HTML reader: idiomatic rewriting for clarity. (John MacFarlane, 2014-04-01)
* Converted HTML reader to use builder. Fixes #1162. (Matthew Pickering, 2014-04-01)
* HTML reader: Fixed bug reading inline math with `$$`. (John MacFarlane, 2014-01-20)
  See #225.
* HTML reader: Parse name/content pairs from meta tags as metadata. (John MacFarlane, 2014-01-01)
  Closes #1106.
* HLint: use fromMaybe (Henry de Valence, 2013-12-19)
  Replace uses of `maybe x id` with `fromMaybe x`.
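
The two spellings are interchangeable, which a short example makes concrete. The function names `withMaybe` and `withFromMaybe` are made up for illustration; `maybe`, `id`, and `fromMaybe` are the standard functions from `Prelude` and `Data.Maybe`.

```haskell
import Data.Maybe (fromMaybe)

-- Both functions return the wrapped value, or the default for Nothing.
withMaybe :: Int -> Maybe Int -> Int
withMaybe def = maybe def id

withFromMaybe :: Int -> Maybe Int -> Int
withFromMaybe def = fromMaybe def

main :: IO ()
main = do
  print (withMaybe 0 (Just 5), withFromMaybe 0 (Just 5))
  print (withMaybe 0 Nothing, withFromMaybe 0 Nothing)
```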
* HTML reader: Parse LaTeX math if appropriate options are set. (John MacFarlane, 2013-12-06)
  * Moved inlineMath, displayMath from Markdown reader to Parsing.
  * Export them from Parsing. (API change.)
  * Generalize their types.
* recognize svg tag in HTML Reader (MinRK, 2013-11-07)
  avoids adding lots of `<p>` tags in embedded SVG content, for instance in markdown to HTML.
* HTML reader: Use pandoc Div and Span for raw "<div>", "<span>". (John MacFarlane, 2013-11-03)
  Only if --parse-raw.
* Adjustments for new Format newtype. (John MacFarlane, 2013-08-10)
* HTML reader: read widths from col tags if present. (John MacFarlane, 2013-07-16)
  Closes #893.
* HTML reader: Handle non-simple tables (#893). (John MacFarlane, 2013-07-16)
  Column widths are divided equally. TODO: Get column widths from col tags if present.
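
Dividing widths equally amounts to giving each of the n columns the fraction 1/n. A minimal sketch, assuming widths are expressed as fractions of the total table width; the helper name `equalWidths` is hypothetical and not taken from the pandoc source.

```haskell
-- Hypothetical helper: n columns, each taking an equal share of the width.
equalWidths :: Int -> [Double]
equalWidths n
  | n <= 0    = []                              -- no columns, no widths
  | otherwise = replicate n (1 / fromIntegral n)

main :: IO ()
main = print (equalWidths 4)
```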
* HTML reader: Generalized table parser. (John MacFarlane, 2013-07-16)
  This commit doesn't change the present behavior at all, but it will make it easier to support non-simple tables in the future.
* Use new flexible metadata type. (John MacFarlane, 2013-06-24)
  * Depend on pandoc 1.12.
  * Added yaml dependency.
  * `Text.Pandoc.XML`: Removed `stripTags`. (API change.)
  * `Text.Pandoc.Shared`: Added `metaToJSON`. This will be used in writers to create a JSON object for use in the templates from the pandoc metadata.
  * Revised readers and writers to use the new Meta type.
  * `Text.Pandoc.Options`: Added `Ext_yaml_title_block`.
  * Markdown reader: Added support for YAML metadata block. Note that it must come at the beginning of the document.
  * `Text.Pandoc.Parsing.ParserState`: Replace `stateTitle`, `stateAuthors`, `stateDate` with `stateMeta`.
  * RST reader: Improved metadata. Treat initial field list as metadata when standalone specified. Previously ALL fields "title", "author", "date" in field lists were treated as metadata, even if not at the beginning. Use `subtitle` metadata field for subtitle.
  * `Text.Pandoc.Templates`: Export `renderTemplate'` that takes a string instead of a compiled template.
  * OPML template: Use 'for' loop for authors.
  * Org template: '#+TITLE:' is inserted before the title. Previously the writer did this.
* Parsing: Better error reporting in readWith. (John MacFarlane, 2013-03-28)
  - Specialize readWith to String input.
  - On error have it print the line in which the error occurred, with a caret pointing to the column.
  - This should help diagnose parsing problems in LaTeX especially.
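
The caret-style report described here can be sketched in a few lines. This is an illustration under stated assumptions, not the code from `readWith`: the function name `showErrorContext` is hypothetical, and line and column numbers are assumed to be 1-based.

```haskell
-- Hypothetical sketch: show the offending line with a caret under the column.
-- Line and column are assumed to be 1-based.
showErrorContext :: String -> Int -> Int -> String
showErrorContext input line col =
  case drop (line - 1) (lines input) of
    []      -> ""                                  -- line number out of range
    (l : _) -> l ++ "\n" ++ replicate (col - 1) ' ' ++ "^"

main :: IO ()
main = putStrLn (showErrorContext "first line\nsecond lime" 2 10)
```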
* HTML reader: Preserve all header attributes. (John MacFarlane, 2013-02-16)
* HTML reader: Handle colgroup tag. (John MacFarlane, 2013-01-30)
* HTML reader: Added html5 tags to list of block-level tags. (John MacFarlane, 2013-01-12)
* Added Attr field to Header. (John MacFarlane, 2013-01-09)
  Previously header ids were autogenerated by the writers. Now they are generated (unless supplied explicitly) in the markdown parser, if the `header_identifiers` extension is selected. In addition, the textile reader now supports id attributes on headers.
* HTML reader: Modified htmlTag for fewer false positives. (John MacFarlane, 2012-09-15)
  A tag must start with `<` followed by `!`, `?`, `/`, or a letter. This makes it more useful in the MediaWiki and markdown parsers.
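
The stated condition translates directly into a predicate on the first two characters. A minimal sketch, assuming nothing about how `htmlTag` is actually written: the name `isTagStart` is hypothetical, and `isLetter` accepts any Unicode letter, which may be broader than what the real parser allows.

```haskell
import Data.Char (isLetter)

-- Hypothetical check: `<` must be followed by `!`, `?`, `/`, or a letter.
isTagStart :: String -> Bool
isTagStart ('<' : c : _) = c `elem` "!?/" || isLetter c
isTagStart _             = False

main :: IO ()
main = mapM_ (print . isTagStart) ["<p>", "</p>", "<!-- x -->", "< 3", "<3"]
```

A check like this rejects text such as `a < 3` or `<3` early, so a reader embedding HTML in another markup language does not mistake a stray `<` for a tag.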
* MediaWiki reader: Use MWState instead of ParserState. (John MacFarlane, 2012-09-13)
* HTML reader: Handle nested `<q>` tags properly. (John MacFarlane, 2012-09-09)
* HTML reader: Parse `<q>` as Quoted DoubleQuote. (John MacFarlane, 2012-09-09)
* Moved renderTags' from HTML reader & SelfContained to Shared. (John MacFarlane, 2012-08-15)
  Improved removal of markdown="1" attribute in Markdown reader.
* Fixed whitespace errors. (John MacFarlane, 2012-07-26)
* Use readerExtensions instead of readerStrict in readers. (John MacFarlane, 2012-07-26)
  Test individually for the extensions.
* Changed reader parameters from ParserState to ReaderOptions. (John MacFarlane, 2012-07-25)
* Moved ParseRaw from ParserState to ReaderOptions. (John MacFarlane, 2012-07-25)
* Options -> ReaderOptions. (John MacFarlane, 2012-07-25)
  Better to keep reader and writer options separate.
* Put smart, strict in separate options field in state. (John MacFarlane, 2012-07-25)
  This is the beginning of a larger transition that will make Options, not ParserState, the parameter of the read functions. (Options will also be used in writers, in place of WriterOptions.) Next step is to remove strict, replacing it with granular tests for different extensions.
* HTML reader: Fixed bug in htmlBalanced. (John MacFarlane, 2012-07-24)
  This caused hangs in parsing certain markdown input using --strict.
* Use Parser as type synonym for Parsec. (John MacFarlane, 2012-07-20)
* Text.Pandoc.Parsing: Export all Parsec functions used in pandoc code. (John MacFarlane, 2012-07-20)
  No other module directly imports Parsec. This will make it easier to change the parsing backend in the future, if we want to.
* Use Text.Parsec instead of Text.ParserCombinators.Parsec. (John MacFarlane, 2012-07-20)
* HTML reader: Support `<col>` and `<caption>` in tables. (John MacFarlane, 2012-04-29)
  Closes #486.
* HTML reader: Don't skip nonbreaking spaces. (John MacFarlane, 2012-04-28)
  Previously a paragraph containing just `&nbsp;` would be rendered as an empty paragraph. Thanks to Paul Vorbach for pointing out the bug.
* Don't escape `<` in `<style>` tags with `--self-contained`. (John MacFarlane, 2012-02-17)
  Closes #422: highlighting lost using `--self-contained`.
* Added "title" to list of docbook block-level tags. (John MacFarlane, 2012-01-12)
* Better smart quote parsing. (John MacFarlane, 2011-12-29)
  * Added stateLastStrPos to ParserState. This lets us keep track of whether we're parsing the position immediately after a 'str'. If we encounter a ' in such a location, it must be an apostrophe, and can't be a single quote start.
  * Set this in the markdown, textile, html, and rst str parsers.
  * Closes #360.
* HTML reader now recognizes DocBook block and inline tags. (John MacFarlane, 2011-10-25)
  It was always possible to include raw DocBook tags in a markdown document, but now pandoc will be able to distinguish block from inline tags and behave accordingly. Thus, for example, `<sidebar> hello </sidebar>` will not be wrapped in `<para>` tags.
* HTML reader: Fixed bug parsing tables with both thead and tbody. (John MacFarlane, 2011-08-01)
  See bug #274, which was not completely fixed by the last patch.
* Properly handle characters in the 128..159 range. (John MacFarlane, 2011-07-23)
  These aren't valid in HTML, but many HTML files produced by Windows tools contain them. We substitute correct Unicode characters.
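
One way to picture the substitution is a lookup table keyed on the code point. The sketch below assumes the stray characters come from the Windows-1252 code page, which the commit message does not state; the names `cp1252Sample` and `fixC1` are hypothetical, and the table lists only a handful of entries rather than the full range.

```haskell
import Data.Char (ord)
import Data.Maybe (fromMaybe)

-- Partial, illustrative table: code points in 128..159 and the Unicode
-- characters that Windows-1252 assigns to the corresponding bytes.
cp1252Sample :: [(Int, Char)]
cp1252Sample =
  [ (128, '\x20AC') -- euro sign
  , (133, '\x2026') -- horizontal ellipsis
  , (145, '\x2018') -- left single quotation mark
  , (146, '\x2019') -- right single quotation mark
  , (147, '\x201C') -- left double quotation mark
  , (148, '\x201D') -- right double quotation mark
  , (150, '\x2013') -- en dash
  , (151, '\x2014') -- em dash
  ]

-- Replace a character found in the table; leave every other character alone.
fixC1 :: Char -> Char
fixC1 c = fromMaybe c (lookup (ord c) cp1252Sample)

main :: IO ()
main = print (map (ord . fixC1) "\147quoted\148")
```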
* HTML reader: treat Plain as Para when needed. (John MacFarlane, 2011-07-16)
  For example, in

      Just a few glitches remaining.
      <ul><li> In this situation, one loses the list.
      </ul>
      And in this, the preformatting.
      <pre>Preformatted text not starting with its own blank line.
      </pre>

  Thanks to Dirk Laurie for noticing the issue.