More progress on using-the-pandoc-api.md.

author: John MacFarlane <jgm@berkeley.edu> 2017-10-24 22:25:45 -0700
committer: John MacFarlane <jgm@berkeley.edu> 2017-10-24 22:25:59 -0700
commit: 07bf48d07b58d490c14022a6714f37aa97ca53ab (patch)
tree: c839bdfd03c7570e50a1c041c3c167e637228b85 /doc/using-the-pandoc-api.md
parent: f82bcc2bf344a884851a7e7f475986055df6c27a (diff)
1 files changed, 238 insertions, 32 deletions
diff --git a/doc/using-the-pandoc-api.md b/doc/using-the-pandoc-api.md
index 75bf227f1..c00db5b10 100644
--- a/doc/using-the-pandoc-api.md
+++ b/doc/using-the-pandoc-api.md
@@ -9,77 +9,283 @@ Detailed API documentation at the level of individual functions
 and types is available at
 <https://hackage.haskell.org/package/pandoc>.
 
-# Pandoc's architecture, and a simple example
+# Pandoc's architecture
 
-Pandoc structure, readers, writers.
+Pandoc is structured as a set of *readers*, which translate
+various input formats into an abstract syntax tree (the
+Pandoc AST) representing a structured document, and a set of
+*writers*, which render this AST into various input formats.
+Pictorially:
 
-example of using a reader.
+```
+[input format] ==reader==> [Pandoc AST] ==writer==> [output format]
+```
+
+This architecture allows pandoc to perform $M \times n$
+conversions with $M$ readers and $N$ writers.
+
+The Pandoc AST is defined in the
+[pandoc-types](https://hackage.haskell.org/package/pandoc-types)
+package.  You should start by looking at the Haddock
+documentation for
+[Text.Pandoc.Definition](https://hackage.haskell.org/package/pandoc-types/docs/Text-Pandoc-Definition.html).  As you'll see, a `Pandoc` is
+composed of some metadata and a list of `Block`s.  There are
+various kinds of `Block`, including `Para` (paragraph),
+`Header` (section heading), and `BlockQuote`.  Some of the
+`Block`s (like `BlockQuote`) contain lists of `Block`s,
+while others (like `Para`) contain lists of `Inline`s, and
+still others (like `CodeBlock`) contain plain text or
+nothing.  `Inline`s are the basic elements of paragraphs.
+The distinction between `Block` and `Inline` in the type
+system makes it impossible to represent, for example,
+a link (`Inline`) whose link text is a block quote (`Block`).
+This expressive limitation is mostly a help rather than a
+hindrance, since many of the formats pandoc supports have
+similar limitations.
+
+The best way to explore the pandoc AST is to use `pandoc -t
+native`, which will display the AST correspoding to some
+Markdown input:
+
+```
+% echo -e "1. *foo*\n2. bar" | pandoc -t native
+[OrderedList (1,Decimal,Period)
+ [[Plain [Emph [Str "foo"]]]
+ ,[Plain [Str "bar"]]]]
+```
+
+# A simple example
+
+Here is a simple example of the use of a pandoc reader and
+writer to perform a conversion inside ghci:
+
+```
+import Text.Pandoc
+import qualified Data.Text as T
+import qualified Data.Text.IO as TIO
+
+main :: IO ()
+main = do
+  result <- runIO $ do
+    doc <- readMarkdown def (T.pack "[testing](url)")
+    writeRST def doc
+  rst <- handleError result
+  TIO.putStrLn rst
+```
 
-example of using a writer.
+Some notes:
 
-chaining them together.
+1. The first part constructs a conversion pipeline: the input
+   string is passed to `readMarkdown`, and the resulting Pandoc
+   AST (`doc`) is then rendered by `writeRST`.  The conversion
+   pipeline is "run" by `runIO`---more on that below.
+
+2. `result` has the type `Either PandocError Text`.  We could
+   pattern-match on this manually, but it's simpler in this
+   context to use the `handleError` function from
+   Text.Pandoc.Error.  This exits with an appropriate error
+   code and message if the value is a `Left`, and returns the
+   `Text` if the value is a `Right`.
 
 # The PandocMonad class
 
-Pandoc's functions define computations that can be run in
-any instance of the `PandocMonad` typeclass.  Two instances
-are provided: `PandocIO` and `PandocPure`. The difference is
-that computations run in `PandocIO` are allowed to do IO
-(for example, read a file), while computations in `PandocPure`
-are free of any side effects.  `PandocPure` is useful when
-you want to prevent users from doing anything malicious.
+Let's look at the types of `readMarkdown` and `writeRST`:
+
+```haskell
+readMarkdown :: PandocMonad m => ReaderOptions -> Text -> m Pandoc
+
+writeRST :: PandocMonad m => WriterOptions -> Pandoc -> m Text
+```
 
-Here's an example of such a computation, from the module
-`Text.Pandoc.Class`:
+The `PandocMonad m =>` part is a typeclass constraint.
+It says that `readMarkdown` and `writeRST` define computations
+that can be used in any instance of the `PandocMonad`
+type class.  `PandocMonad` is defined in the module
+Text.Pandoc.Class.
+
+Two instances of `PandocMonad` are provided: `PandocIO` and
+`PandocPure`. The difference is that computations run in
+`PandocIO` are allowed to do IO (for example, read a file),
+while computations in `PandocPure` are free of any side effects.
+`PandocPure` is useful for sandboxed environments, when you want
+to prevent users from doing anything malicious.  To run the
+conversion in `PandocIO`, use `runIO` (as above).  To run it in
+`PandocPure`, use `runPure`.
+
+As you can see from the Haddocks,
+[Text.Pandoc.Class](https://hackage.haskell.org/package/pandoc/docs/Text-Pandoc-Class.html)
+exports many auxiliary functions that can be used in any
+instance of `PandocMonad`.  For example:
 
 ```haskell
 -- | Get the verbosity level.
 getVerbosity :: PandocMonad m => m Verbosity
-```
 
-motivations
-Class.
+-- | Set the verbosity level.
+setVerbosity :: PandocMonad m => Verbosity -> m ()
 
-# The Pandoc structure
+-- Get the accomulated log messages (in temporal order).
+getLog :: PandocMonad m => m [LogMessage]
+getLog = reverse <$> getsCommonState stLog
 
-blocks/inlines
+-- | Log a message using 'logOutput'.  Note that
+-- 'logOutput' is called only if the verbosity
+-- level exceeds the level of the message, but
+-- the message is added to the list of log messages
+-- that will be retrieved by 'getLog' regardless
+-- of its verbosity level.
+report :: PandocMonad m => LogMessage -> m ()
 
-# Readers and writers
+-- | Fetch an image or other item from the local filesystem or the net.
+-- Returns raw content and maybe mime type.
+fetchItem :: PandocMonad m
+          => String
+          -> m (B.ByteString, Maybe MimeType)
 
-getReader, getWriter
+setResourcePath :: PandocMonad m => [FilePath] -> m ()
+```
+
+If we wanted more verbose informational messages
+during the conversion we defined in the previous
+section, we could do this:
+
+```haskell
+  result <- runIO $ do
+    setVerbosity INFO
+    doc <- readMarkdown def (T.pack "[testing](url)")
+    writeRST def doc
+```
 
 # Options
 
-various reader and writer options you can set.
-templates
-extensions
+The first argument of each reader or writer is for
+options controlling the behavior of the reader or writer:
+`ReaderOptions` for readers and `WriterOptions`
+for writers.  These are defined in
+[Text.Pandoc.Options](https://hackage.haskell.org/package/pandoc/docs/Text-Pandoc-Options.html).  It is a good idea to study these
+options to see what can be adjusted.
+
+`def` (from Data.Default) denotes a default value for
+each kind of option.  (You can also use `defaultWriterOptions`
+and `defaultReaderOptions`.)  Generally you'll want to use
+the defaults and modify them only when needed, for example:
+
+```haskell
+    writeRST def{ writerReferenceLinks = True }
+```
+
+Some particularly important options to know about:
+
+1.  `writerTemplate`:  By default, this is `Nothing`, which
+    means that a document fragment will be produced. If you
+    want a full document, you need to specify `Just template`,
+    where `template` is a String containing the template's
+    contents (not the path).
+
+2.  `readerExtensions` and `writerExtensions`:  These specify
+    the extensions to be used in parsing and rendering.
+    Extensions are defined in [Text.Pandoc.Extensions](https://hackage.haskell.org/package/pandoc/docs/Text-Pandoc-Extensions.html).
 
 # Builder
 
+
 Inlines vs Inline, etc.
-Monoid
 
-example: report from CSV data
+Concatenating lists is slow. So we use special types Inlines and Blocks that wrap Sequences of Inline and Block elements.
+
+Monoid - makes it easy to build up docs programatically.
+
+Example.
+Here’s a JSON data source about CNG fueling stations in the
+Chicago area: cng_fuel_chicago.json. Boss says: write me a
+letter in Word listing all the stations that take the Voyager
+card.
+
+```
+[ {
+  "state" : "IL",
+  "city" : "Chicago",
+  "fuel_type_code" : "CNG",
+  "zip" : "60607",
+  "station_name" : "Clean Energy - Yellow Cab",
+  "cards_accepted" : "A D M V Voyager Wright_Exp CleanEnergy",
+  "street_address" : "540 W Grenshaw"
+}, ...
+```
+
+No need to open Word for this job! fuel.hs
+
+```
+{-# LANGUAGE OverloadedStrings #-}
+import Text.Pandoc.Builder
+import Text.Pandoc
+import Data.Monoid ((<>), mempty, mconcat)
+import Data.Aeson
+import Control.Applicative
+import Control.Monad (mzero)
+import qualified Data.ByteString.Lazy as BL
+import qualified Data.Text as T
+import Data.List (intersperse)
+
+data Station = Station{
+    address        :: String
+  , name           :: String
+  , cardsAccepted  :: [String]
+  } deriving Show
+
+instance FromJSON Station where
+    parseJSON (Object v) = Station <$>
+       v .: "street_address" <*>
+       v .: "station_name" <*>
+       (words <$> (v .:? "cards_accepted" .!= ""))
+    parseJSON _          = mzero
+
+createLetter :: [Station] -> Pandoc
+createLetter stations = doc $
+    para "Dear Boss:" <>
+    para "Here are the CNG stations that accept Voyager cards:" <>
+    simpleTable [plain "Station", plain "Address", plain "Cards accepted"]
+           (map stationToRow stations) <>
+    para "Your loyal servant," <>
+    plain (image "JohnHancock.png" "" mempty)
+  where
+    stationToRow station =
+      [ plain (text $ name station)
+      , plain (text $ address station)
+      , plain (mconcat $ intersperse linebreak $ map text $ cardsAccepted station)
+      ]
+
+main :: IO ()
+main = do
+  json <- BL.readFile "cng_fuel_chicago.json"
+  let letter = case decode json of
+                    Just stations -> createLetter [s | s <- stations,
+                                        "Voyager" `elem` cardsAccepted s]
+                    Nothing       -> error "Could not decode JSON"
+  BL.writeFile "letter.docx" =<< writeDocx def letter
+  putStrLn "Created letter.docx"
+```
+
+
 
 # Templates and other data files
 
+readDataFile
+
 # Handling errors and warnings
 
 # Generic transformations
 
 Walk and syb for AST transformations
 
-# Filters
-
-just the basic idea of toJSONFilter
-the rest can be left to filters.md
+Filters: see filters.md
 
-# Self-contained
+but, how do you run filters from a program?
+need to export these functions from Text.Pandoc.App!
 
 
 # PDF
 
-# Custom PandocMonad instances
 
 # Creating a front-end
author	John MacFarlane <jgm@berkeley.edu>	2017-10-24 22:25:45 -0700
committer	John MacFarlane <jgm@berkeley.edu>	2017-10-24 22:25:59 -0700
commit	07bf48d07b58d490c14022a6714f37aa97ca53ab (patch)
tree	c839bdfd03c7570e50a1c041c3c167e637228b85 /doc/using-the-pandoc-api.md
parent	f82bcc2bf344a884851a7e7f475986055df6c27a (diff)