New HTML reader using tagsoup as a lexer.

* The new reader is faster and more accurate.
* API changes for Text.Pandoc.Readers.HTML:
  - removed rawHtmlBlock, anyHtmlBlockTag, anyHtmlInlineTag,
    anyHtmlTag, anyHtmlEndTag, htmlEndTag, extractTagType,
    htmlBlockElement, htmlComment
  - added htmlTag, htmlInBalanced, isInlineTag, isBlockTag, isTextTag
* tagsoup is a new dependency.
* Text.Pandoc.Parsing: Generalized type on readWith.
* Benchmark.hs: Added length calculation to force full evaluation.
* Updated HTML reader tests.
* Updated markdown and textile readers to use the functions from
  the HTML reader.
* Note: The markdown reader now correctly handles some cases it did
  not before. For example:

      <hr/>

  is reproduced without adding a space.

      <script>
      a = '<b>';
      </script>

  is parsed correctly.
author: John MacFarlane <jgm@berkeley.edu> 2010-12-22 20:25:15 -0800
committer: John MacFarlane <jgm@berkeley.edu> 2010-12-30 13:55:40 -0800
commit: 904050fa36715e18522d80432a2666fcbaacd105 (patch)
tree: 4745876e797d400539dd80309d31c330a013e969 /Benchmark.hs
parent: 220fe5fab89ce84fcb98f0430c4126281ca8362d (diff)
1 files changed, 5 insertions, 2 deletions
diff --git a/Benchmark.hs b/Benchmark.hs
index 67c790526..0f3520fde 100644
--- a/Benchmark.hs
+++ b/Benchmark.hs
@@ -13,8 +13,11 @@ readerBench doc (name, reader) =
       inp = writer defaultWriterOptions{ writerWrapText = True
                                        , writerLiterateHaskell =
                                           "+lhs" `isSuffixOf` name } doc
-  in  bench (name ++ " reader") $ whnf
-        (reader defaultParserState{stateSmart = True
+      -- we compute the length to force full evaluation
+      getLength (Pandoc (Meta a b c) d) =
+            length a + length b + length c + length d
+  in  bench (name ++ " reader") $ whnf (getLength .
+         reader defaultParserState{ stateSmart = True
                                   , stateStandalone = True
                                   , stateLiterateHaskell =
                                       "+lhs" `isSuffixOf` name }) inp
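
The Benchmark.hs change above wraps the reader in a length computation before handing it to whnf. A minimal sketch of why that matters, using only base: evaluating a lazily built value to weak head normal form stops at the outermost constructor, while taking the length of its lists walks every list spine. The names Doc, docLength, lazyReader, and forcesTail are hypothetical stand-ins for illustration, not part of pandoc or criterion. Note that length forces the spines only, not necessarily each element.

```haskell
import Control.Exception (ErrorCall, evaluate, try)

-- Hypothetical stand-in for a parsed document: a title and a list of blocks.
data Doc = Doc [String] [String]

-- Mirrors the getLength idea from the diff: walk every list spine.
docLength :: Doc -> Int
docLength (Doc title blocks) = length title + length blocks

-- A "reader" whose block list fails partway through, so we can observe
-- whether the tail of the list was ever evaluated.
lazyReader :: String -> Doc
lazyReader s = Doc [s] ("first block" : error "tail of block list was forced")

-- Did the evaluation raise an ErrorCall?
isError :: Either ErrorCall a -> Bool
isError = either (const True) (const False)

-- Evaluate a value to weak head normal form and report whether doing so
-- hit the error hidden in the tail of the block list.
forcesTail :: a -> IO Bool
forcesTail x = do
  r <- try (evaluate x)
  return (isError r)

main :: IO ()
main = do
  shallow <- forcesTail (lazyReader "t")
  deep    <- forcesTail (docLength (lazyReader "t"))
  putStrLn ("WHNF of the document forces the tail: " ++ show shallow)
  putStrLn ("WHNF of the length forces the tail:   " ++ show deep)
```

Under these assumptions the first check reports False and the second True, which is the motivation for benchmarking the composed function rather than the reader alone.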