Markdown reader: use CommonMark rules for list item nesting.

Closes #3511.

Previously pandoc used the four-space rule: continuation paragraphs, sublists, and other block-level content had to be indented 4 spaces. Now the indentation required is determined by the first line of the list item: to be included in the list item, blocks must be indented to the level of the first non-space content after the list marker. Exception: if there are 5 or more spaces after the list marker, then the content is interpreted as an indented code block, and continuation paragraphs must be indented two spaces beyond the end of the list marker. See the CommonMark spec for more details and examples.

Documents that adhere to the four-space rule should, in most cases, be parsed the same way by the new rules. Here are some examples of texts that will be parsed differently:

    - a
      - b

will be parsed as a list item with a sublist; under the four-space rule, it would be a list with two items.

    - a

          code

Here we have an indented code block under the list item, even though it is only indented six spaces from the margin, because it is four spaces past the point where a continuation paragraph could begin. With the four-space rule, this would be a regular paragraph rather than a code block.

    - a

            code

Here the code block will start with two spaces, whereas under the four-space rule, it would start with `code`. With the four-space rule, indented code under a list item must always be indented eight spaces from the margin, while the new rules require only that it be indented four spaces from the beginning of the first non-space text after the list marker (here, `a`).

This change was motivated by a slew of bug reports from people who expected lists to work differently (#3125, #2367, #2575, #2210, #1990, #1137, #744, #172, #137, #128) and by the growing prevalence of CommonMark (now used by GitHub, for example). Users who want to use the old rules can select the `four_space_rule` extension.

* Added `four_space_rule` extension.
* Added `Ext_four_space_rule` to `Extensions`.
* `Parsing` now exports `gobbleAtMostSpaces`, and the type of `gobbleSpaces` has been changed so that a `ReaderOptions` parameter is not needed.
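
The difference between the two rules in the three examples above can be sketched as a small toy model. This is not pandoc's parser: the names `Placement`, `classify`, `fourSpaceRule`, and `commonMarkRule` are hypothetical, and the model deliberately ignores the 5-or-more-spaces exception, lazy continuation lines, and everything else in the CommonMark spec. It only compares a line's indentation with the indentation a continuation block needs.

```haskell
-- Toy model of the nesting rules described above; NOT pandoc's implementation.
-- All names here are hypothetical and exist only for illustration.

data Placement
  = Outside          -- not indented enough to belong to the list item
  | Continuation     -- paragraph, sublist, or other block inside the item
  | IndentedCode Int -- indented code; the Int counts leading spaces kept
  deriving (Eq, Show)

-- | Classify a line indented `indent` spaces from the margin, given the
-- indentation `base` that a continuation block must have.
classify :: Int -> Int -> Placement
classify base indent
  | indent < base     = Outside
  | indent < base + 4 = Continuation
  | otherwise         = IndentedCode (indent - base - 4)

-- | Four-space rule: continuation blocks always need 4 spaces.
fourSpaceRule :: Int -> Placement
fourSpaceRule = classify 4

-- | New rule: continuation blocks need the column of the first non-space
-- content after the list marker (2 for an item written as "- a").
commonMarkRule :: Int -> Int -> Placement
commonMarkRule contentCol = classify contentCol

main :: IO ()
main = do
  -- "  - b" under "- a": sublist under the new rule, sibling item under the old
  print (commonMarkRule 2 2, fourSpaceRule 2)
  -- code indented 6 spaces: code block (new) vs. regular paragraph (old)
  print (commonMarkRule 2 6, fourSpaceRule 6)
  -- code indented 8 spaces: code starting with two spaces (new) vs. none (old)
  print (commonMarkRule 2 8, fourSpaceRule 8)
```

In this model the only thing that changes between the rules is the value of `base`, which is why documents written for the four-space rule mostly parse the same way.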
author: John MacFarlane <jgm@berkeley.edu> 2017-08-19 10:56:15 -0700
committer: John MacFarlane <jgm@berkeley.edu> 2017-08-19 15:45:01 -0700
commit: a31241a08bcd3d546528ef7eed4c126fff3cd3bd (patch)
tree: 9ef25b27aa4bb512b4d23b343c3b6b0353cdd8b9 /src/Text/Pandoc/Parsing.hs
parent: 5ab1162def4e6379c84e3363d917252155d9239a (diff)
1 files changed, 28 insertions, 8 deletions
diff --git a/src/Text/Pandoc/Parsing.hs b/src/Text/Pandoc/Parsing.hs
index 37a0b53b4..9ed18d4e0 100644
--- a/src/Text/Pandoc/Parsing.hs
+++ b/src/Text/Pandoc/Parsing.hs
@@ -50,6 +50,7 @@ module Text.Pandoc.Parsing ( takeWhileP,
                              blankline,
                              blanklines,
                              gobbleSpaces,
+                             gobbleAtMostSpaces,
                              enclosed,
                              stringAnyCase,
                              parseFromString,
@@ -380,14 +381,33 @@ blanklines = many1 blankline
 
 -- | Gobble n spaces; if tabs are encountered, expand them
 -- and gobble some or all of their spaces, leaving the rest.
-gobbleSpaces :: Monad m => ReaderOptions -> Int -> ParserT [Char] st m ()
-gobbleSpaces _    0 = return ()
-gobbleSpaces opts n = try $ do
-  char ' ' <|> do char '\t'
-                  inp <- getInput
-                  setInput $ replicate (readerTabStop opts - 1) ' ' ++ inp
-                  return ' '
-  gobbleSpaces opts (n - 1)
+gobbleSpaces :: (HasReaderOptions st, Monad m)
+             => Int -> ParserT [Char] st m ()
+gobbleSpaces 0 = return ()
+gobbleSpaces n
+  | n < 0     = error "gobbleSpaces called with negative number"
+  | otherwise = try $ do
+      char ' ' <|> eatOneSpaceOfTab
+      gobbleSpaces (n - 1)
+
+eatOneSpaceOfTab :: (HasReaderOptions st, Monad m) => ParserT [Char] st m Char
+eatOneSpaceOfTab = do
+  char '\t'
+  tabstop <- getOption readerTabStop
+  inp <- getInput
+  setInput $ replicate (tabstop - 1) ' ' ++ inp
+  return ' '
+
+-- | Gobble up to n spaces; if tabs are encountered, expand them
+-- and gobble some or all of their spaces, leaving the rest.
+gobbleAtMostSpaces :: (HasReaderOptions st, Monad m)
+                   => Int -> ParserT [Char] st m Int
+gobbleAtMostSpaces 0 = return 0
+gobbleAtMostSpaces n
+  | n < 0     = error "gobbleAtMostSpaces called with negative number"
+  | otherwise = option 0 $ do
+      char ' ' <|> eatOneSpaceOfTab
+      (+ 1) <$> gobbleAtMostSpaces (n - 1)
 
 -- | Parses material enclosed between start and end parsers.
 enclosed :: (Show end, Stream s  m Char) => ParserT s st m t   -- ^ start parser
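
To make the behaviour of the two helpers in this diff easier to see, here is a pure sketch over plain `String`s rather than Parsec parsers. The names `eatOne`, `gobbleExactly`, and `gobbleAtMost` are hypothetical stand-ins, `tabStop` plays the role of `readerTabStop`, and negative counts are handled quietly instead of calling `error` as the real functions do.

```haskell
-- Pure model of gobbleSpaces / gobbleAtMostSpaces on Strings.
-- Hypothetical names; an illustration only, not the code in Parsing.hs.

-- | Consume one column of indentation: either a space, or one column of a
-- tab (the tab is expanded and the remaining tabStop - 1 spaces are put back).
eatOne :: Int -> String -> Maybe String
eatOne _       (' '  : rest) = Just rest
eatOne tabStop ('\t' : rest) = Just (replicate (tabStop - 1) ' ' ++ rest)
eatOne _       _             = Nothing

-- | All or nothing, like gobbleSpaces under `try`: either exactly n columns
-- are consumed, or the result is Nothing and the input counts as untouched.
gobbleExactly :: Int -> Int -> String -> Maybe String
gobbleExactly tabStop n s
  | n < 0     = Nothing  -- the real function calls `error` here
  | n == 0    = Just s
  | otherwise = eatOne tabStop s >>= gobbleExactly tabStop (n - 1)

-- | Like gobbleAtMostSpaces: consume up to n columns, never fail, and
-- return how many columns were consumed together with the remaining input.
gobbleAtMost :: Int -> Int -> String -> (Int, String)
gobbleAtMost tabStop n s
  | n <= 0    = (0, s)   -- the real function calls `error` for n < 0
  | otherwise =
      case eatOne tabStop s of
        Nothing   -> (0, s)
        Just rest ->
          let (k, rest') = gobbleAtMost tabStop (n - 1) rest
          in  (k + 1, rest')

main :: IO ()
main = do
  print (gobbleExactly 4 2 "\tx")  -- two columns of the tab eaten, two left
  print (gobbleExactly 4 3 "  x")  -- only two spaces available, so Nothing
  print (gobbleAtMost 4 3 "  x")   -- takes the two spaces that are there
```

The count returned by the at-most variant is what lets a caller find out how far a line was actually indented, which the all-or-nothing variant cannot report.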