-rw-r--r-- doc/lua-filters.md 221
1 files changed, 150 insertions, 71 deletions
diff --git a/doc/lua-filters.md b/doc/lua-filters.md
index 320e983ad..5d0bfaf1e 100644
--- a/doc/lua-filters.md
+++ b/doc/lua-filters.md
@@ -1,86 +1,156 @@
-Lua Filters
-===========
+% Pandoc Lua Filters
+% Albert Krewinkel, John MacFarlane
+% August 21, 2017
+
+# Introduction
+
+Pandoc has long supported filters, which allow the pandoc
+abstract syntax tree (AST) to be manipulated between the parsing
+and the writing phase. Traditional pandoc filters accept a JSON
+representation of the pandoc AST and produce an altered JSON
+representation of the AST. They may be written in any
+programming language, and invoked from pandoc using the
+`--filter` option.
+
+Although traditional filters are very flexible, they have a
+couple of disadvantages. First, there is some overhead in
+writing JSON to stdout and reading it from stdin (twice,
+once on each side of the filter). Second, whether a filter
+works depends on details of the user's environment.
+A filter may require an interpreter for a certain programming
+language to be available, as well as a library for manipulating
+the pandoc AST in JSON form. One cannot simply provide a filter
+that can be used by anyone who has a certain version of the
+pandoc executable.
+
+Starting with pandoc 2.0, we have made it possible to write
+filters in Lua without any external dependencies at all.
+A Lua interpreter and a Lua library for creating pandoc filters
+are built into the pandoc executable. Pandoc data types
+are marshalled to Lua directly, avoiding the overhead of writing
+JSON to stdout and reading it from stdin.
+
+Here is an example of a lua filter that converts strong emphasis
+to small caps:
-Pandoc expects lua files to return a list of filters. The filters in that list
-are called sequentially, each on the result of the previous filter. If there is
-no value returned by the filter script, then pandoc will try to generate a
-filter by collecting all top-level functions whose names correspond to those of
-pandoc elements (e.g., `Str`, `Para`, `Meta`, or `Pandoc`).
+``` lua
+return {
+  {
+    Strong = function (elem)
+      return pandoc.SmallCaps(elem.c)
+    end,
+  }
+}
+```
-Filters are expected to be put into separate files and are passed via the
-`--lua-filter` command-line argument. E.g., if a filter is defined in a file
-`current-date.lua`, then it would be applied like this:
+or equivalently,
- pandoc --lua-filter=current-date.lua -f markdown MANUAL.txt
+``` lua
+function Strong(elem)
+  return pandoc.SmallCaps(elem.c)
+end
+```
-The `--lua-filter` can be supplied multiple times, causing the filters to be
-applied sequentially in the order they were given. If other, non-Lua filters are
-given as well (via `--filter`), then those are executed *after* all Lua filters
-have been applied.
+This says: walk the AST, and when you find a Strong element,
+replace it with a SmallCaps element with the same content.
-Lua Filter Structure
---------------------
+To run it, save it in a file, say `smallcaps.lua`, and invoke
+pandoc with `--lua-filter=smallcaps.lua`.
-Lua filters are tables with element names as keys and values consisting
-of functions acting on those elements.
+Here's a quick performance comparison, using a version of the
+pandoc manual, MANUAL.txt, and versions of the same filter
+written in compiled Haskell (`smallcaps`) and interpreted Python
+(`smallcaps.py`):
-Filter Application
-------------------
+| Command | Time |
+|--------------------------------------------------|------:|
+| `pandoc MANUAL.txt` | 1.01s |
+| `pandoc MANUAL.txt --filter ./smallcaps` | 1.36s |
+| `pandoc MANUAL.txt --filter ./smallcaps.py` | 1.40s |
+| `pandoc MANUAL.txt --lua-filter ./smallcaps.lua` | 1.03s |
-For each filter, the document is traversed and each element subjected to
-the filter. Elements for which the filter contains an entry (i.e. a
-function of the same name) are passed to lua element filtering function.
-In other words, filter entries will be called for each corresponding
-element in the document, getting the respective element as input.
+As you can see, the lua filter avoids the substantial overhead
+associated with marshalling to and from JSON over a pipe.
-The element function's output must be an element of the same type as the
-input. This means a filter function acting on an inline element must
-return an inline, and a block element must remain a block element after
-filter application. Pandoc will throw an error if this condition is
-violated.
+# Lua filter structure
+
+Lua filters are tables with element names as keys and values
+consisting of functions acting on those elements.
+
+Filters are expected to be put into separate files and are
+passed via the `--lua-filter` command-line argument. For
+example, if a filter is defined in a file `current-date.lua`,
+then it would be applied like this:
+
+    pandoc --lua-filter=current-date.lua -f markdown MANUAL.txt
+
+The `--lua-filter` option can be supplied multiple times, causing
+the filters to be applied sequentially in the order they were given.
+If other, non-Lua filters are given as well (via `--filter`),
+then those are executed *after* all Lua filters have been
+applied.
+
+Pandoc expects each lua file to return a list of filters. The
+filters in that list are called sequentially, each on the result
+of the previous filter. If there is no value returned by the
+filter script, then pandoc will try to generate a single filter
+by collecting all top-level functions whose names correspond to
+those of pandoc elements (e.g., `Str`, `Para`, `Meta`, or
+`Pandoc`). (That is why the two examples above are equivalent.)
+
+For each filter, the document is traversed and each element
+subjected to the filter. Elements for which the filter contains
+an entry (i.e. a function of the same name) are passed to the Lua
+element filtering function. In other words, filter entries will
+be called for each corresponding element in the document,
+getting the respective element as input.
+
+The element function's output must be an element of the same
+type as the input. This means a filter function acting on an
+inline element must return an inline, and a block element must
+remain a block element after filter application. Pandoc will
+throw an error if this condition is violated.
Elements without matching functions are left untouched.
See [module documentation](pandoc-module.html) for a list of pandoc
elements.
+# Pandoc Module
-Pandoc Module
-=============
+The `pandoc` lua module is loaded into the filter's lua
+environment and provides a set of functions and constants to
+make creation and manipulation of elements easier. The global
+variable `pandoc` is bound to the module and should generally
+not be overwritten for this reason.
-The `pandoc` lua module is loaded into the filter's lua environment and
-provides a set of functions and constants to make creation and
-manipulation of elements easier. The global variable `pandoc` is bound
-to the module and should generally not be overwritten for this reason.
+The module provides two main kinds of functionality: element
+creator functions and access to some of pandoc's main
+functions.
-Two major functionalities are provided by the module: element creator
-functions and access to some of pandoc's main functionalities.
+## Element creation
-Element creation
-----------------
+Element creator functions like `Str`, `Para`, and `Pandoc` are
+designed to allow easy creation of new elements that are simple
+to use and can be read back from the lua environment.
+Internally, pandoc uses these functions to create the lua
+objects which are passed to element filter functions. This means
+that elements created via this module behave exactly like the
+elements a filter function receives as its parameter.
-Element creator functions like `Str`, `Para`, and `Pandoc` are designed to
-allow easy creation of new elements that are simple to use and can be
-read back from the lua environment. Internally, pandoc uses these
-functions to create the lua objects which are passed to element filter
-functions. This means that elements created via this module will behave
-exactly as those elements accessible through the filter function parameter.
+## Exposed pandoc functionality
-Exposed pandoc functionality
-----------------------------
+Some filters will require access to certain functions provided
+by pandoc. This is currently limited to the `read` function,
+which allows strings to be parsed into pandoc documents from
+within the Lua filter.
-Some filters will require access to certain functions provided by
-pandoc. This is currently limited to the `read` function which allows to
-parse strings into pandoc documents from within the lua filter.
+# Examples
+## Macro substitution
-Examples
---------
-
-### Macro substitution.
-
-The following filter converts strings containing `{{helloworld}}` with
-emphasized text.
+The following filter converts the string `{{helloworld}}` into
+emphasized text "Hello, World".
``` lua
return {
@@ -96,9 +166,11 @@ return {
}
```
-### Default metadata file
+## Default metadata file
-Using the metadata from an external file as default values.
+This filter causes metadata defined in an external file
+(`metadata-file.yaml`) to be used as default values in
+a document's metadata:
``` lua
-- read metadata file into string
@@ -122,7 +194,10 @@ return {
}
```
-### Setting the date in the metadata
+## Setting the date in the metadata
+
+This filter sets the date in the document's metadata to the
+current date:
```lua
function Meta(m)
@@ -131,7 +206,7 @@ function Meta(m)
end
```
-### Extracting information about links
+## Extracting information about links
This filter prints a table of all the URLs linked to
in the document, together with the number of links to
@@ -168,11 +243,15 @@ function Doc (blocks, meta)
end
```
-### Replacing placeholders with their metadata value
+## Replacing placeholders with their metadata value
+
+Lua filter functions are run in the order
+
+> *Inlines → Blocks → Meta → Pandoc*.
-Lua filter functions are run in the order *Inlines → Blocks → Meta → Pandoc*.
-Passing information from a higher level (e.g., metadata) to a lower level (e.g.,
-inlines) is still possible by using two filters living in the same file:
+Passing information from a higher level (e.g., metadata) to a
+lower level (e.g., inlines) is still possible by using two
+filters living in the same file:
``` lua
local vars = {}
@@ -200,8 +279,8 @@ If the contents of file `occupations.md` is
``` markdown
---
-name: John MacFarlane
-occupation: Professor of Philosophy
+name: Samuel Q. Smith
+occupation: Professor of Phrenology
---
Name
@@ -218,10 +297,10 @@ then running `pandoc --lua-filter=meta-vars.lua occupations.md` will output:
``` html
<dl>
<dt>Name</dt>
-<dd><p><span>John MacFarlane</span></p>
+<dd><p><span>Samuel Q. Smith</span></p>
</dd>
<dt>Occupation</dt>
-<dd><p><span>Professor of Philosophy</span></p>
+<dd><p><span>Professor of Phrenology</span></p>
</dd>
</dl>
```
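
As a rough illustration of the dispatch model the patched documentation describes (a list of filters, each a table of functions keyed by element name, with unmatched elements left untouched), here is a minimal sketch in plain Lua. It does not use pandoc at all: the `walk` and `apply_filters` helpers and the `t`/`c` node fields are hypothetical stand-ins, and the children-first traversal order is an assumption of the sketch, not a statement about pandoc's internals.

```lua
-- Toy illustration only: these tables are NOT pandoc's real element types.
-- Each node has a tag `t` and a content field `c` (a string or a list of nodes).

-- Hypothetical helper: apply one filter table to a node and its children.
local function walk(node, filter)
  -- Assumption of this sketch: children are visited before their parent.
  if type(node.c) == "table" then
    for i, child in ipairs(node.c) do
      node.c[i] = walk(child, filter)
    end
  end
  -- Look up a function named after the node's tag.
  local fn = filter[node.t]
  if fn then
    return fn(node)
  end
  -- No matching entry: the node is left untouched.
  return node
end

-- Hypothetical helper: run a list of filters in order,
-- each on the result of the previous one.
local function apply_filters(doc, filters)
  for _, filter in ipairs(filters) do
    doc = walk(doc, filter)
  end
  return doc
end

-- A list containing a single filter that turns Strong into SmallCaps.
local filters = {
  {
    Strong = function (elem)
      return { t = "SmallCaps", c = elem.c }
    end,
  },
}

local doc = {
  t = "Para",
  c = {
    { t = "Str", c = "plain" },
    { t = "Strong", c = { { t = "Str", c = "bold" } } },
  },
}

local result = apply_filters(doc, filters)
print(result.c[2].t)
```

Run with a standalone `lua` interpreter, the script prints the tag of the second child after filtering. The `Para` and `Str` nodes pass through unchanged because the filter table has no entries for them, which mirrors the "elements without matching functions are left untouched" rule in the documentation.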