path: root/src/Text/Pandoc/Readers/HTML.hs
* hlint code improvements. [John MacFarlane, 2018-01-19]
|
* HTML reader: Fix col width parsing for percentages < 10% (#4262) [n3fariox, 2018-01-15]
|   Rather than take user input and place a "0." in front, actually calculate the percentage to catch cases where small column sizes (e.g. `2%`) are needed.
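The bug is easy to see with a small sketch. Both helper names below are hypothetical and this is not pandoc's code. The first helper divides the parsed number by 100. The second reproduces the "prepend 0." approach that the commit replaced.

```haskell
import Text.Read (readMaybe)

-- Hypothetical helper: turn a percentage string such as "25%" into a
-- fraction of the table width by dividing the number by 100.
percentToFraction :: String -> Maybe Double
percentToFraction s =
  case span (/= '%') s of
    (digits, "%") -> (/ 100) <$> readMaybe digits
    _             -> Nothing

-- Hypothetical reconstruction of the old approach: put "0." in front of
-- the digits. Only two-digit whole numbers come out right; "2%" gives 0.2.
naiveFraction :: String -> Maybe Double
naiveFraction s =
  case span (/= '%') s of
    (digits, "%") -> readMaybe ("0." ++ digits)
    _             -> Nothing
```

For "25%", both helpers give 0.25. For "2%", only `percentToFraction` gives the intended 0.02.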
* Update copyright notices to include 2018 [Albert Krewinkel, 2018-01-05]
|
* Fix warning. [John MacFarlane, 2017-12-27]
|
* Small improvement to figcaption parsing. #4184. [John MacFarlane, 2017-12-27]
|
* Merge pull request #4184 from mb21/html-reader-figcaption [John MacFarlane, 2017-12-27]
|\
| |   HTML Reader: be more forgiving about figcaption
| * HTML Reader: be more forgiving about figcaption [mb21, 2017-12-23]
| |   fixes #4183
* | HTML reader: parse div with class `line-block` as LineBlock. [John MacFarlane, 2017-12-27]
| |   See #4162.
|/
* Markdown reader: accept processing instructions as raw HTML. [John MacFarlane, 2017-12-06]
|   Closes #4125.
* Add `empty_paragraphs` extension. [John MacFarlane, 2017-12-04]
|   * Deprecate `--strip-empty-paragraphs` option. Instead we now use an `empty_paragraphs` extension that can be enabled on the reader or writer. By default, disabled.
|   * Add `Ext_empty_paragraphs` constructor to `Extension`.
|   * Revert "Docx reader: don't strip out empty paragraphs."
|     This reverts commit d6c58eb836f033a48955796de4d9ffb3b30e297b.
|   * Implement `empty_paragraphs` extension in docx reader and writer, opendocument writer, html reader and writer.
|   * Add tests for `empty_paragraphs` extension.
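The sketch below shows what an opt-in extension like this can control in a reader. It is a minimal illustration under stated assumptions:

- `Block` is a cut-down stand-in for pandoc's type.
- `applyEmptyParagraphs` is a hypothetical name.
- Empty paragraphs are assumed to be dropped when the extension is off (the default) and kept when it is on.

```haskell
-- Cut-down stand-in for pandoc's Block type, for illustration only.
data Block = Para [String] | Plain [String]
  deriving (Eq, Show)

-- Hypothetical sketch: with the extension off, paragraphs with no
-- content are removed; with it on, the block list is left unchanged.
applyEmptyParagraphs :: Bool -> [Block] -> [Block]
applyEmptyParagraphs enabled
  | enabled   = id
  | otherwise = filter (not . isEmptyPara)
  where
    isEmptyPara (Para []) = True
    isEmptyPara _         = False
```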
* Fix comment typo: s/elemnet/element/ [Alexander Krotov, 2017-11-25]
|
* HTML reader: ensure we don't produce level 0 headers, even for chapter sections in epubs. [John MacFarlane, 2017-11-18]
|   This causes problems because writers aren't set up to expect these. This fixes the most immediate problem in #4076. It would be good to think more about how to propagate the information that top-level headers are chapters from the reader to the writer.
* HTML reader: hlint [Alexander Krotov, 2017-11-10]
|
* Really fix #3989. [John MacFarlane, 2017-11-01]
|   The previous fix only worked in certain cases. Other cases with `>` in an HTML attribute broke.
* hlint [Alexander Krotov, 2017-11-01]
|
* Fixed regression in parsing of HTML comments in markdown and other non-HTML formats (`Text.Pandoc.Readers.HTML.htmlTag`). [John MacFarlane, 2017-10-31]
|   The parser stopped at the first `>` character, even if it wasn't the end of the comment. Closes #4019.
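A comment such as `<!-- a > b -->` contains a `>` before its real end. The sketch below shows the distinction the fix relies on: a comment ends at the first `-->`, not at the first `>`. `splitComment` is a hypothetical helper, not the actual `htmlTag` implementation.

```haskell
import Data.List (isPrefixOf)

-- Hypothetical helper: if the input starts with an HTML comment, return
-- the whole comment and the remaining input. The comment ends at the
-- first "-->"; a bare '>' inside it is ordinary comment text.
splitComment :: String -> Maybe (String, String)
splitComment s
  | "<!--" `isPrefixOf` s = go (reverse "<!--") (drop 4 s)
  | otherwise             = Nothing
  where
    -- acc holds the characters consumed so far, in reverse order.
    go _ [] = Nothing  -- unterminated comment
    go acc cs@(c : rest)
      | "-->" `isPrefixOf` cs = Just (reverse acc ++ "-->", drop 3 cs)
      | otherwise             = go (c : acc) rest
```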
* Source code reformatting. [John MacFarlane, 2017-10-29]
|
* Consistent underline for Readers (#2270) [hftf, 2017-10-27]
|   * Added underlineSpan builder function. This can be easily updated if needed. The purpose is for Readers to transform underlines consistently.
|   * Docx Reader: Use underlineSpan and update test
|   * Org Reader: Use underlineSpan and add test
|   * Textile Reader: Use underlineSpan and add test case
|   * Txt2Tags Reader: Use underlineSpan and update test
|   * HTML Reader: Use underlineSpan and add test case
* HTML reader: close td/th should close any open block tag... [John MacFarlane, 2017-10-24]
|   Closes #3991.
* HTML reader: td should close an open th or td. [John MacFarlane, 2017-10-24]
|
* Revert "HTML reader: td or th implicitly closes blocks within last td/th." [John MacFarlane, 2017-10-24]
|   This reverts commit d2c4243f89a6368d4f9f8a511d9b026d0be19cd8.
* HTML reader: td or th implicitly closes blocks within last td/th. [John MacFarlane, 2017-10-24]
|
* HTML reader: `htmlTag` improvements. [John MacFarlane, 2017-10-23]
|   We previously failed on cases where an attribute contained a `>` character. This patch fixes the bug. Closes #3989.
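A naive `break (== '>')` would cut `<a title="x > y">` after `x `. The sketch below avoids this by tracking whether the scanner is inside a quoted attribute value. `splitTag` is a hypothetical helper, not pandoc's `htmlTag`. It also does not model unquoted attribute values or other HTML tokenizer states.

```haskell
-- Hypothetical helper: split off the first tag of the input, treating
-- '>' inside a single- or double-quoted attribute value as data.
-- Returns Nothing if the input does not start with '<' or the tag is
-- never closed.
splitTag :: String -> Maybe (String, String)
splitTag ('<' : rest) = go "<" Nothing rest
  where
    -- acc: consumed characters in reverse; second argument: open quote.
    go _ _ [] = Nothing
    go acc Nothing (c : cs)
      | c == '>'              = Just (reverse (c : acc), cs)
      | c == '"' || c == '\'' = go (c : acc) (Just c) cs
      | otherwise             = go (c : acc) Nothing cs
    go acc (Just q) (c : cs)
      | c == q    = go (c : acc) Nothing cs   -- closing quote
      | otherwise = go (c : acc) (Just q) cs  -- still inside the value
splitTag _ = Nothing
```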
* Added `--strip-comments` option, `readerStripComments` in `ReaderOptions`. [John MacFarlane, 2017-09-17]
|   * Options: Added readerStripComments to ReaderOptions.
|   * Added `--strip-comments` command-line option.
|   * Made `htmlTag` from the HTML reader sensitive to this feature. This affects Markdown and Textile input.
|   Closes #2552.
* HTML reader: Fix pattern match. [John MacFarlane, 2017-09-04]
|
* HTML reader: improved handling of figure. [John MacFarlane, 2017-08-30]
|   Previously we had a parse failure if the figure contained anything besides an image and caption.
* HTML reader: support column alignments. [John MacFarlane, 2017-08-17]
|   These can be set either with a `width` attribute or with `text-width` in a `style` attribute. Closes #1881.
* HTML reader: parse <main> like <div role=main>. (#3791) [bucklereed, 2017-08-09]
|   * HTML reader: parse <main> like <div role=main>.
|   * <main> closes <p> and behaves like a block element generally
* HTML Reader: parse figure and figcaption (#3813) [Mauro Bieg, 2017-07-22]
|
* HTML reader: Ensure that paragraphs are closed properly when the parent block element closes, even without `</p>`. [John MacFarlane, 2017-07-11]
|   Closes #3794.
* HTML reader: Use the lang value of <html> to set the lang meta value. (#3765) [bucklereed, 2017-06-27]
|   * HTML reader: Use the lang value of <html> to set the lang meta value.
|   * Fix for pre-AMP environments.
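The sketch below copies the `lang` attribute of the `<html>` start tag into metadata. It rests on two assumptions that the commit does not state:

- Metadata is modeled as a `Map String String` rather than pandoc's `Meta` type.
- The attribute overwrites any existing `lang` entry.

`setLangFromHtml` is a hypothetical name.

```haskell
import qualified Data.Map as M

-- Hypothetical sketch: given the attributes of the <html> start tag,
-- set the "lang" metadata key from its lang attribute, if present.
-- Precedence over an existing "lang" entry is an assumption.
setLangFromHtml :: [(String, String)] -> M.Map String String -> M.Map String String
setLangFromHtml attrs meta =
  case lookup "lang" attrs of
    Just l  -> M.insert "lang" l meta
    Nothing -> meta
```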
* Move CR filtering from tabFilter to the readers. [John MacFarlane, 2017-06-20]
|   The readers previously assumed that CRs had been filtered from the input. Now we strip the CRs in the readers themselves, before parsing. (The point of this is just to simplify the parsers.)
|   Shared now exports a new function `crFilter`. [API change]
|   And `tabFilter` no longer filters CRs.
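A CR filter of this kind can be expressed with a single `Data.Text` function. The name `crFilter` comes from the commit message. The body below is an illustrative sketch, not a copy of pandoc's source.

```haskell
import qualified Data.Text as T

-- Sketch of a CR filter run on reader input before parsing: every '\r'
-- is removed, so "\r\n" line endings become "\n".
crFilter :: T.Text -> T.Text
crFilter = T.filter (/= '\r')
```

This approach deletes every carriage return. Input that uses bare `\r` as a line ending (old Mac style) would therefore lose its line breaks rather than have them converted to `\n`.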
* Separated tracing from logging. [John MacFarlane, 2017-06-19]
|   Formerly tracing was just log messages with a DEBUG log level. We now make these things independent. Tracing can be turned on or off in PandocMonad using `setTrace`; it is independent of logging.
|   * Removed `DEBUG` from `Verbosity`.
|   * Removed `ParserTrace` from `LogMessage`.
|   * Added `trace`, `setTrace` to `PandocMonad`.
* Rewrote HTML reader to use Text throughout. [John MacFarlane, 2017-06-11]
|   - Export new NamedTag class from HTML reader.
|   - Effect on memory usage is modest (< 10%).
* Changed all readers to take Text instead of String. [John MacFarlane, 2017-06-10]
|   Readers: Renamed StringReader -> TextReader. Updated tests. API change.
* Fixed HTML reader. [John MacFarlane, 2017-06-02]
|
* HTML reader: Use sets instead of lists for block tag lookup. [John MacFarlane, 2017-06-01]
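The change trades a linear `elem` scan for a balanced-tree lookup. The sketch below shows the difference. The tag list is a short illustrative sample, not pandoc's full block-tag list, and all names are hypothetical.

```haskell
import qualified Data.Set as Set

-- A few block-level tag names for illustration; not pandoc's full list.
blockTagNames :: [String]
blockTagNames = ["address", "blockquote", "details", "div", "p", "table"]

-- List version: `elem` walks the list, O(n) per lookup.
isBlockTagL :: String -> Bool
isBlockTagL t = t `elem` blockTagNames

-- Set version: the set is built once; `Set.member` is O(log n) per lookup.
blockTagSet :: Set.Set String
blockTagSet = Set.fromList blockTagNames

isBlockTagS :: String -> Bool
isBlockTagS t = t `Set.member` blockTagSet
```

Both versions give the same answers. The set only pays off because the reader performs this lookup for many tags during parsing.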
|
* HTML reader: Removed "button" from block tag list. [John MacFarlane, 2017-06-01]
|   It is already in the eitherBlockOrInlineTag list, and should not be in both places. Closes #3717.
|   Note: the result of this change is that there will be p tags around the whole paragraph. That is the right result, because the `button` tags are treated as inline HTML here, and the whole chunk of text is a Markdown paragraph.
* HTML reader: Add `details` tag to list of block tags. [John MacFarlane, 2017-05-24]
|   Closes #3694.
* Update dates in copyright notices [Albert Krewinkel, 2017-05-13]
|   This follows the suggestions given by the FSF for GPL licensed software. <https://www.gnu.org/prep/maintain/html_node/Copyright-Notices.html>
* HTML reader: Revise treatment of li with id attribute. [John MacFarlane, 2017-04-23]
|   Previously we always added an empty div before the list item, but this created problems with spacing in tight lists. Now we do this:
|   If the list item contents begin with a Plain block, we modify the Plain block by adding a Span around its contents.
|   Otherwise, we add a Div around the contents of the list item (instead of adding an empty Div to the beginning, as before).
|   Closes #3596.
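The two-case rule for an `<li>` with an id can be sketched over cut-down stand-ins for pandoc's types. `Attr` is modeled on pandoc-types' (identifier, classes, key-value pairs) shape, and `attachItemId` is a hypothetical name. This is an illustration of the described rule, not the reader's code.

```haskell
-- Cut-down stand-ins for pandoc's types, for illustration only.
type Attr = (String, [String], [(String, String)])

data Inline = Str String | Span Attr [Inline]
  deriving (Eq, Show)

data Block = Plain [Inline] | Para [Inline] | Div Attr [Block]
  deriving (Eq, Show)

-- Hypothetical sketch: attach a list item's id without adding an empty
-- block in front of its contents.
attachItemId :: String -> [Block] -> [Block]
attachItemId ident (Plain ils : rest) =
  -- Contents start with Plain: wrap that Plain's inlines in a Span.
  Plain [Span (ident, [], []) ils] : rest
attachItemId ident blocks =
  -- Otherwise: wrap all of the item's blocks in a Div.
  [Div (ident, [], []) blocks]
```

In the Plain case the item's first block stays a Plain. That is what keeps a tight list tight.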
* HTML reader: Better sanity checks on raw HTML. [John MacFarlane, 2017-03-18]
|   This also affects the Markdown reader. Closes #3257.
* Issue warning for duplicate header identifiers. [John MacFarlane, 2017-03-12]
|   As noted in the previous commit, an autogenerated identifier may still coincide with an explicit identifier that is given for a header later in the document, or with an identifier on a div, span, link, or image. This commit adds a warning in this case, so users can supply an explicit identifier.
|   * Added `DuplicateIdentifier` to LogMessage.
|   * Modified HTML, Org, MediaWiki readers so their custom state type is an instance of HasLogMessages. This is necessary for `registerHeader` to issue warnings.
|   See #1745.
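Detecting a duplicate amounts to remembering which identifiers have already been seen, in document order. The sketch below returns the repeats instead of logging them. `duplicateIds` is a hypothetical helper and does not reflect the signature of `registerHeader` or the `DuplicateIdentifier` message.

```haskell
import qualified Data.Set as Set

-- Hypothetical sketch: walk identifiers in document order and collect
-- every one that was already seen earlier (one entry per repeat).
duplicateIds :: [String] -> [String]
duplicateIds = go Set.empty
  where
    go _ [] = []
    go seen (i : is)
      | i `Set.member` seen = i : go seen is
      | otherwise           = go (Set.insert i seen) is
```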
* Fixed some loose ends in #1592. [John MacFarlane, 2017-03-04]
|   Added test cases.
|   Fixed HTML reader to parse a span with class "smallcaps" as SmallCaps.
|   Fixed Markdown writer to render SmallCaps as a native span when native spans are enabled.
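The span-to-SmallCaps mapping can be sketched as a single classification step. The types are cut-down stand-ins and `mkSpan` is a hypothetical name. This sketch checks only the `smallcaps` class and drops the span's other attributes in that case; whether the real reader does more (for example, inspecting `style`) is not stated here.

```haskell
-- Cut-down stand-ins for pandoc's types, for illustration only.
type Attr = (String, [String], [(String, String)])

data Inline = Str String | SmallCaps [Inline] | Span Attr [Inline]
  deriving (Eq, Show)

-- Hypothetical sketch: a span with class "smallcaps" becomes SmallCaps;
-- any other span is kept as a Span with its attributes.
mkSpan :: Attr -> [Inline] -> Inline
mkSpan attr@(_, classes, _) contents
  | "smallcaps" `elem` classes = SmallCaps contents
  | otherwise                  = Span attr contents
```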
* Tighten up HasQuoteContext instance in HTML reader. [John MacFarlane, 2017-02-20]
|   We constrain it to the state used in the HTML reader. Otherwise we can get overlap with the general instance for ParserState m.
* Use new warnings throughout the code base. [John MacFarlane, 2017-02-11]
|
* Added Text.Pandoc.Logging (exported module). [John MacFarlane, 2017-02-10]
|   This now contains the Verbosity definition previously in Options, as well as a new LogMessage datatype that will eventually be used instead of raw strings for warnings. This will enable us, among other things, to provide machine-readable warnings if desired.
|   See #3392.
* HTML reader: Added warnings for ignored material. [John MacFarlane, 2017-02-10]
|   See #3392.
* Removed --parse-raw and readerParseRaw. [John MacFarlane, 2017-02-06]
|   These were confusing. Now we rely on the +raw_tex or +raw_html extension with latex or html input. Thus, instead of
|
|       --parse-raw -f latex
|
|   we use
|
|       -f latex+raw_tex
|
|   and instead of
|
|       --parse-raw -f html
|
|   we use
|
|       -f html+raw_html
* More logging-related changes. [John MacFarlane, 2017-01-25]
|   Class:
|   * Removed getWarnings, withWarningsToStderr
|   * Added report
|   * Added logOutput to PandocMonad
|   * Make logOutput streaming in PandocIO monad
|   * Properly reverse getLog output
|   Readers:
|   * Replaced use of trace with report DEBUG.
|   TWiki Reader: Put everything inside PandocMonad m.
|   API changes.