summaryrefslogtreecommitdiff
path: root/doc/using-the-pandoc-api.md
diff options
context:
space:
mode:
authorJohn MacFarlane <jgm@berkeley.edu>2017-10-24 22:25:45 -0700
committerJohn MacFarlane <jgm@berkeley.edu>2017-10-24 22:25:59 -0700
commit07bf48d07b58d490c14022a6714f37aa97ca53ab (patch)
treec839bdfd03c7570e50a1c041c3c167e637228b85 /doc/using-the-pandoc-api.md
parentf82bcc2bf344a884851a7e7f475986055df6c27a (diff)
More progress on using-the-pandoc-api.md.
Diffstat (limited to 'doc/using-the-pandoc-api.md')
-rw-r--r--doc/using-the-pandoc-api.md270
1 files changed, 238 insertions, 32 deletions
diff --git a/doc/using-the-pandoc-api.md b/doc/using-the-pandoc-api.md
index 75bf227f1..c00db5b10 100644
--- a/doc/using-the-pandoc-api.md
+++ b/doc/using-the-pandoc-api.md
@@ -9,77 +9,283 @@ Detailed API documentation at the level of individual functions
and types is available at
<https://hackage.haskell.org/package/pandoc>.
-# Pandoc's architecture, and a simple example
+# Pandoc's architecture
-Pandoc structure, readers, writers.
+Pandoc is structured as a set of *readers*, which translate
+various input formats into an abstract syntax tree (the
+Pandoc AST) representing a structured document, and a set of
+*writers*, which render this AST into various input formats.
+Pictorially:
-example of using a reader.
+```
+[input format] ==reader==> [Pandoc AST] ==writer==> [output format]
+```
+
+This architecture allows pandoc to perform $M \times n$
+conversions with $M$ readers and $N$ writers.
+
+The Pandoc AST is defined in the
+[pandoc-types](https://hackage.haskell.org/package/pandoc-types)
+package. You should start by looking at the Haddock
+documentation for
+[Text.Pandoc.Definition](https://hackage.haskell.org/package/pandoc-types/docs/Text-Pandoc-Definition.html). As you'll see, a `Pandoc` is
+composed of some metadata and a list of `Block`s. There are
+various kinds of `Block`, including `Para` (paragraph),
+`Header` (section heading), and `BlockQuote`. Some of the
+`Block`s (like `BlockQuote`) contain lists of `Block`s,
+while others (like `Para`) contain lists of `Inline`s, and
+still others (like `CodeBlock`) contain plain text or
+nothing. `Inline`s are the basic elements of paragraphs.
+The distinction between `Block` and `Inline` in the type
+system makes it impossible to represent, for example,
+a link (`Inline`) whose link text is a block quote (`Block`).
+This expressive limitation is mostly a help rather than a
+hindrance, since many of the formats pandoc supports have
+similar limitations.
+
+The best way to explore the pandoc AST is to use `pandoc -t
+native`, which will display the AST correspoding to some
+Markdown input:
+
+```
+% echo -e "1. *foo*\n2. bar" | pandoc -t native
+[OrderedList (1,Decimal,Period)
+ [[Plain [Emph [Str "foo"]]]
+ ,[Plain [Str "bar"]]]]
+```
+
+# A simple example
+
+Here is a simple example of the use of a pandoc reader and
+writer to perform a conversion inside ghci:
+
+```
+import Text.Pandoc
+import qualified Data.Text as T
+import qualified Data.Text.IO as TIO
+
+main :: IO ()
+main = do
+ result <- runIO $ do
+ doc <- readMarkdown def (T.pack "[testing](url)")
+ writeRST def doc
+ rst <- handleError result
+ TIO.putStrLn rst
+```
-example of using a writer.
+Some notes:
-chaining them together.
+1. The first part constructs a conversion pipeline: the input
+ string is passed to `readMarkdown`, and the resulting Pandoc
+ AST (`doc`) is then rendered by `writeRST`. The conversion
+ pipeline is "run" by `runIO`---more on that below.
+
+2. `result` has the type `Either PandocError Text`. We could
+ pattern-match on this manually, but it's simpler in this
+ context to use the `handleError` function from
+ Text.Pandoc.Error. This exits with an appropriate error
+ code and message if the value is a `Left`, and returns the
+ `Text` if the value is a `Right`.
# The PandocMonad class
-Pandoc's functions define computations that can be run in
-any instance of the `PandocMonad` typeclass. Two instances
-are provided: `PandocIO` and `PandocPure`. The difference is
-that computations run in `PandocIO` are allowed to do IO
-(for example, read a file), while computations in `PandocPure`
-are free of any side effects. `PandocPure` is useful when
-you want to prevent users from doing anything malicious.
+Let's look at the types of `readMarkdown` and `writeRST`:
+
+```haskell
+readMarkdown :: PandocMonad m => ReaderOptions -> Text -> m Pandoc
+
+writeRST :: PandocMonad m => WriterOptions -> Pandoc -> m Text
+```
-Here's an example of such a computation, from the module
-`Text.Pandoc.Class`:
+The `PandocMonad m =>` part is a typeclass constraint.
+It says that `readMarkdown` and `writeRST` define computations
+that can be used in any instance of the `PandocMonad`
+type class. `PandocMonad` is defined in the module
+Text.Pandoc.Class.
+
+Two instances of `PandocMonad` are provided: `PandocIO` and
+`PandocPure`. The difference is that computations run in
+`PandocIO` are allowed to do IO (for example, read a file),
+while computations in `PandocPure` are free of any side effects.
+`PandocPure` is useful for sandboxed environments, when you want
+to prevent users from doing anything malicious. To run the
+conversion in `PandocIO`, use `runIO` (as above). To run it in
+`PandocPure`, use `runPure`.
+
+As you can see from the Haddocks,
+[Text.Pandoc.Class](https://hackage.haskell.org/package/pandoc/docs/Text-Pandoc-Class.html)
+exports many auxiliary functions that can be used in any
+instance of `PandocMonad`. For example:
```haskell
-- | Get the verbosity level.
getVerbosity :: PandocMonad m => m Verbosity
-```
-motivations
-Class.
+-- | Set the verbosity level.
+setVerbosity :: PandocMonad m => Verbosity -> m ()
-# The Pandoc structure
+-- Get the accomulated log messages (in temporal order).
+getLog :: PandocMonad m => m [LogMessage]
+getLog = reverse <$> getsCommonState stLog
-blocks/inlines
+-- | Log a message using 'logOutput'. Note that
+-- 'logOutput' is called only if the verbosity
+-- level exceeds the level of the message, but
+-- the message is added to the list of log messages
+-- that will be retrieved by 'getLog' regardless
+-- of its verbosity level.
+report :: PandocMonad m => LogMessage -> m ()
-# Readers and writers
+-- | Fetch an image or other item from the local filesystem or the net.
+-- Returns raw content and maybe mime type.
+fetchItem :: PandocMonad m
+ => String
+ -> m (B.ByteString, Maybe MimeType)
-getReader, getWriter
+setResourcePath :: PandocMonad m => [FilePath] -> m ()
+```
+
+If we wanted more verbose informational messages
+during the conversion we defined in the previous
+section, we could do this:
+
+```haskell
+ result <- runIO $ do
+ setVerbosity INFO
+ doc <- readMarkdown def (T.pack "[testing](url)")
+ writeRST def doc
+```
# Options
-various reader and writer options you can set.
-templates
-extensions
+The first argument of each reader or writer is for
+options controlling the behavior of the reader or writer:
+`ReaderOptions` for readers and `WriterOptions`
+for writers. These are defined in
+[Text.Pandoc.Options](https://hackage.haskell.org/package/pandoc/docs/Text-Pandoc-Options.html). It is a good idea to study these
+options to see what can be adjusted.
+
+`def` (from Data.Default) denotes a default value for
+each kind of option. (You can also use `defaultWriterOptions`
+and `defaultReaderOptions`.) Generally you'll want to use
+the defaults and modify them only when needed, for example:
+
+```haskell
+ writeRST def{ writerReferenceLinks = True }
+```
+
+Some particularly important options to know about:
+
+1. `writerTemplate`: By default, this is `Nothing`, which
+ means that a document fragment will be produced. If you
+ want a full document, you need to specify `Just template`,
+ where `template` is a String containing the template's
+ contents (not the path).
+
+2. `readerExtensions` and `writerExtensions`: These specify
+ the extensions to be used in parsing and rendering.
+ Extensions are defined in [Text.Pandoc.Extensions](https://hackage.haskell.org/package/pandoc/docs/Text-Pandoc-Extensions.html).
# Builder
+
Inlines vs Inline, etc.
-Monoid
-example: report from CSV data
+Concatenating lists is slow. So we use special types Inlines and Blocks that wrap Sequences of Inline and Block elements.
+
+Monoid - makes it easy to build up docs programatically.
+
+Example.
+Here’s a JSON data source about CNG fueling stations in the
+Chicago area: cng_fuel_chicago.json. Boss says: write me a
+letter in Word listing all the stations that take the Voyager
+card.
+
+```
+[ {
+ "state" : "IL",
+ "city" : "Chicago",
+ "fuel_type_code" : "CNG",
+ "zip" : "60607",
+ "station_name" : "Clean Energy - Yellow Cab",
+ "cards_accepted" : "A D M V Voyager Wright_Exp CleanEnergy",
+ "street_address" : "540 W Grenshaw"
+}, ...
+```
+
+No need to open Word for this job! fuel.hs
+
+```
+{-# LANGUAGE OverloadedStrings #-}
+import Text.Pandoc.Builder
+import Text.Pandoc
+import Data.Monoid ((<>), mempty, mconcat)
+import Data.Aeson
+import Control.Applicative
+import Control.Monad (mzero)
+import qualified Data.ByteString.Lazy as BL
+import qualified Data.Text as T
+import Data.List (intersperse)
+
+data Station = Station{
+ address :: String
+ , name :: String
+ , cardsAccepted :: [String]
+ } deriving Show
+
+instance FromJSON Station where
+ parseJSON (Object v) = Station <$>
+ v .: "street_address" <*>
+ v .: "station_name" <*>
+ (words <$> (v .:? "cards_accepted" .!= ""))
+ parseJSON _ = mzero
+
+createLetter :: [Station] -> Pandoc
+createLetter stations = doc $
+ para "Dear Boss:" <>
+ para "Here are the CNG stations that accept Voyager cards:" <>
+ simpleTable [plain "Station", plain "Address", plain "Cards accepted"]
+ (map stationToRow stations) <>
+ para "Your loyal servant," <>
+ plain (image "JohnHancock.png" "" mempty)
+ where
+ stationToRow station =
+ [ plain (text $ name station)
+ , plain (text $ address station)
+ , plain (mconcat $ intersperse linebreak $ map text $ cardsAccepted station)
+ ]
+
+main :: IO ()
+main = do
+ json <- BL.readFile "cng_fuel_chicago.json"
+ let letter = case decode json of
+ Just stations -> createLetter [s | s <- stations,
+ "Voyager" `elem` cardsAccepted s]
+ Nothing -> error "Could not decode JSON"
+ BL.writeFile "letter.docx" =<< writeDocx def letter
+ putStrLn "Created letter.docx"
+```
+
+
# Templates and other data files
+readDataFile
+
# Handling errors and warnings
# Generic transformations
Walk and syb for AST transformations
-# Filters
-
-just the basic idea of toJSONFilter
-the rest can be left to filters.md
+Filters: see filters.md
-# Self-contained
+but, how do you run filters from a program?
+need to export these functions from Text.Pandoc.App!
# PDF
-# Custom PandocMonad instances
# Creating a front-end