author fiddlosopher <fiddlosopher@788f1e2b-df1e-0410-8736-df70ead52e1b> 2006-10-17 14:22:29 +0000
committer fiddlosopher <fiddlosopher@788f1e2b-df1e-0410-8736-df70ead52e1b> 2006-10-17 14:22:29 +0000
commit df7b68225101966051f8b592a27127bf789eb81e
tree a063e97ed58d0bdb2cbb5a95c3e8c1bcce54aa00 /README
parent e7dbfef4d8aa528d9245424e9c372e900a774c90
initial import
git-svn-id: https://pandoc.googlecode.com/svn/trunk@2 788f1e2b-df1e-0410-8736-df70ead52e1b
Diffstat (limited to 'README')
-rw-r--r-- README 508
1 files changed, 508 insertions, 0 deletions
diff --git a/README b/README
new file mode 100644
index 000000000..e387e5f3c
--- /dev/null
+++ b/README
@@ -0,0 +1,508 @@
+% pandoc
+% John MacFarlane
+% August 10, 2006
+
+`pandoc` converts files from one markup format to another. It can
+read [markdown] and (with some limitations) [reStructuredText], [HTML], and
+[LaTeX], and it can write [markdown], [reStructuredText], [HTML],
+[LaTeX], [RTF], and [S5] HTML slide shows. It is written in
+[Haskell], using the excellent [Parsec] parser combinator library.
+
+[markdown]: http://daringfireball.net/projects/markdown/
+[reStructuredText]: http://docutils.sourceforge.net/docs/ref/rst/introduction.html
+[S5]: http://meyerweb.com/eric/tools/s5/
+[HTML]: http://www.w3.org/TR/html40/
+[LaTeX]: http://www.latex-project.org/
+[RTF]: http://en.wikipedia.org/wiki/Rich_Text_Format
+[Haskell]: http://www.haskell.org/
+[Parsec]: http://www.cs.uu.nl/~daan/download/parsec/parsec.html
+
+(c) 2006 John MacFarlane (jgm At berkeley.edu). Released under the
+[GPL], version 2 or greater. This software carries no warranty of
+any kind. (See LICENSE for full copyright and warranty notices.)
+
+[GPL]: http://www.gnu.org/copyleft/gpl.html
+
+# Installation
+
+## Installing GHC
+
+To compile `pandoc`, you'll need [GHC] version 6.4 or greater.
+
+If you don't have GHC already, you can get it from the
+[GHC Download] page.
+
+[GHC]: http://www.haskell.org/ghc/
+[GHC Download]: http://www.haskell.org/ghc/download.html
+
+Note: As of this writing, there's no MacOS X installer package for
+GHC 6.4.2 (the latest version). There is an installer for
+GHC 6.4.1 [here](http://www.haskell.org/ghc/download_ghc_641.html#macosx).
+It will work just fine on PPC-based Macs. GHC has not yet been ported
+to Intel Macs: see <http://hackage.haskell.org/trac/ghc/wiki/X86OSXGhc>.
+
+You'll also need standard build tools: GNU Make, sed, bash, and perl.
+These are standard on unix systems (including MacOS X). If you're
+using Windows, you can install [Cygwin].
+
+[Cygwin]: http://www.cygwin.com/
+
+Note: I have tested `pandoc` on MacOS X and Linux systems. I have not
+tried it on Windows, and I have no idea whether it will work on Windows.
+
+## Installing `pandoc`
+
+1. Change to the directory containing the `pandoc` distribution.
+
+2. Compile:
+
+ make
+
+3. Optional, but recommended:
+
+ make test
+
+4. If you want to install the `pandoc` program and the relevant wrappers
+ and documents (including this file) into the `/usr/local` directory, type:
+
+ make install
+
+ If you only want the `pandoc` program and the shell scripts `latex2markdown`,
+ `markdown2latex`, `markdown2pdf`, `markdown2html`, `html2markdown` installed
+ into your `~/bin` directory, type (note the **`-exec`** suffix):
+
+ PREFIX=~ make install-exec
+
+5. If you want to install the Pandoc library modules for use in
+ other Haskell programs, type (as root):
+
+ make install-lib
+
+6. To install the library documentation (into `/usr/local/pandoc-doc`),
+ type:
+
+ make install-lib-doc
+
+# Using `pandoc`
+
+You can run `pandoc` like this:
+
+ ./pandoc
+
+If you copy the `pandoc` executable to a directory in your path
+(perhaps using `make install`), you can invoke it without the "./":
+
+ pandoc
+
+If you run `pandoc` without arguments, it will accept input from
+STDIN. If you run it with file names as arguments, it will take input
+from those files. It accepts several command-line options. For a
+list, type
+
+ pandoc -h
+
+The most important options specify the format of the source file and
+the output. The default reader is markdown; the default writer is
+HTML. So if you don't specify a reader or writer, `pandoc` will
+convert markdown to HTML. To convert markdown to LaTeX, you could
+write:
+
+ pandoc -w latex input.txt
+
+To convert HTML to markdown:
+
+ pandoc -r html -w markdown input.txt
+
+Supported writers include markdown, LaTeX, HTML, RTF,
+reStructuredText, and S5 (which produces an HTML file that behaves like
+a PowerPoint slide show). Supported readers include markdown, HTML, LaTeX, and
+reStructuredText. Note that the rst (reStructuredText) reader only
+parses a subset of rst syntax. For example, it doesn't handle tables,
+definition lists, option lists, or footnotes. It handles only the
+constructs expressible in unextended markdown. But for simple
+documents it should be adequate. The LaTeX and HTML readers are also
+limited in what they can do.
+
+`pandoc` writes its output to STDOUT. If you want to write to a file,
+use redirection:
+
+ pandoc input.txt > output.html
+
+Note that you can specify multiple input files on the command line.
+`pandoc` will concatenate them all (with blank lines between them)
+before parsing:
+
+ pandoc -s chapter1.txt chapter2.txt chapter3.txt references.txt > book.html
+
+## Character encoding
+
+Unfortunately, due to limitations in GHC, `pandoc` does not
+automatically detect the system's local character encoding. Hence,
+all input and output is assumed to be in the UTF-8 encoding. If your
+input contains non-ASCII characters (accented letters, for example),
+convert the file to UTF-8 before processing it with `pandoc`. This can be done by
+piping the input through [`iconv`]: for example,
+
+ iconv -t utf-8 source.txt | pandoc > output.html
+
+will convert `source.txt` from the local encoding to UTF-8, then
+convert it to HTML, putting the output in `output.html`.
+
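As a rough illustration of the re-encoding step that `iconv` performs in the pipeline above, here is a minimal Python sketch. It is not part of `pandoc`; the function name `to_utf8` and the choice of `latin-1` as the "local" encoding are assumptions made only for the example.

```python
def to_utf8(data: bytes, local_encoding: str = "latin-1") -> bytes:
    """Re-encode bytes from an assumed local encoding into UTF-8."""
    # Decode with the (assumed) local encoding, then encode as UTF-8.
    return data.decode(local_encoding).encode("utf-8")


if __name__ == "__main__":
    raw = "caf\u00e9".encode("latin-1")  # b'caf\xe9' in Latin-1
    print(to_utf8(raw))                  # b'caf\xc3\xa9' in UTF-8
```

Pure ASCII input passes through unchanged, which is why the conversion only matters for files that contain non-ASCII characters.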
+[`iconv`]: http://www.gnu.org/software/libiconv/
+
+The shell scripts (described below) automatically convert the source
+from the local encoding to UTF-8 before passing it to `pandoc`.
+
+## The shell scripts
+
+For convenience, five shell scripts have been included that make it
+easy to run `pandoc` without remembering all the command-line options.
+All of the scripts presuppose that `pandoc` is in the path, and
+`html2markdown` also presupposes that `curl` and `tidy` are in the
+path.
+
+1. `markdown2html` converts markdown to HTML, running `iconv` first to
+ convert the file to UTF-8. (This can be used as a replacement for
+ `Markdown.pl`.)
+
+2. `html2markdown` can take either a filename or a URL as argument. If
+ it is given a URL, it uses `curl` to fetch the contents of the
+ specified URL, then filters this through `tidy` to straighten up the
+ HTML and convert to UTF-8, and finally passes this HTML to `pandoc` to
+ produce markdown text:
+
+ html2markdown http://www.fsf.org
+
+ html2markdown www.fsf.org
+
+ html2markdown subdir/mylocalfile.html
+
+3. `latex2markdown` converts a LaTeX file to markdown.
+
+ latex2markdown mytexfile.tex
+
+4. `markdown2latex` converts markdown to LaTeX:
+
+ markdown2latex mytextfile.txt
+
+5. `markdown2pdf` converts markdown to PDF, using LaTeX, but removing
+ all the intermediate files created by LaTeX. Example:
+
+ markdown2pdf mytextfile.txt
+
+ creates a file `mytextfile.pdf` in the working directory.
+
+# Command-line options
+
+Various command-line options can be used to customize the output.
+For a complete list, type
+
+ pandoc --help
+
+`-p` or `--preserve-tabs` causes tabs in the source text to be
+preserved, rather than converted to spaces (the default).
+
+`--tabstop` allows the user to set the tab stop (which defaults to 4).
+
+`-R` or `--parse-raw` causes the HTML and LaTeX readers to parse HTML
+codes and LaTeX environments that they can't translate as raw HTML or
+LaTeX. Raw HTML can be printed in markdown, reStructuredText, HTML,
+and S5 output; raw LaTeX can be printed in markdown, reStructuredText,
+and LaTeX output. The default is for the readers to omit
+untranslatable HTML codes and LaTeX environments. (The LaTeX reader
+does pass through untranslatable LaTeX commands, even if `-R` is not
+specified.)
+
+`-s` or `--standalone` causes `pandoc` to produce a standalone file,
+complete with appropriate document headers. By default, `pandoc`
+produces a fragment.
+
+`--custom-header` can be used to specify a custom document header. To
+see the headers used by default, use the `-D` option: for example,
+`pandoc -D html` prints the default HTML header.
+
+`-c` or `--css` allows the user to specify a custom stylesheet that
+will be linked to in HTML and S5 output.
+
+`-H` or `--include-in-header` specifies a file to be included
+(verbatim) at the end of the document header. This can be used, for
+example, to include special CSS or javascript in HTML documents.
+
+`-B` or `--include-before-body` specifies a file to be included
+(verbatim) at the beginning of the document body (after the `<body>`
+tag in HTML, or the `\begin{document}` command in LaTeX). This can be
+used to include navigation bars or banners in HTML documents.
+
+`-A` or `--include-after-body` specifies a file to be included
+(verbatim) at the end of the document body (before the `</body>` tag in
+HTML, or the `\end{document}` command in LaTeX).
+
+`-T` or `--title-prefix` specifies a string to be included as a prefix
+at the beginning of the title that appears in the HTML header (but not
+in the title as it appears at the beginning of the HTML body). (See
+below on Titles.)
+
+`-S` or `--smartypants` causes `pandoc` to produce typographically
+correct HTML output, along the lines of John Gruber's [Smartypants].
+Straight quotes are converted to curly quotes, `---` to dashes, and
+`...` to ellipses.
+
+[Smartypants]: http://daringfireball.net/projects/smartypants/
+
+`-m` or `--asciimathml` will cause LaTeX formulas (between $ signs) in
+HTML or S5 to display as formulas rather than as code. The trick will
+not work in all browsers, but it works in Firefox. Peter Jipsen's
+[ASCIIMathML] script is used to do the magic.
+
+[ASCIIMathML]: http://www1.chapman.edu/~jipsen/mathml/asciimath.html
+
+`-i` or `--incremental` causes all lists in S5 output to be displayed
+incrementally by default (one item at a time). The normal default
+is for lists to be displayed all at once.
+
+`-N` or `--number-sections` causes sections to be numbered in LaTeX
+output. By default, sections are not numbered.
+
+# `pandoc`'s markdown vs. standard markdown
+
+In parsing markdown, `pandoc` departs from and extends [standard markdown]
+in a few respects. (To run `pandoc` on the official
+markdown test suite, type `make markdown_tests`.)
+
+[standard markdown]: http://daringfireball.net/projects/markdown/syntax
+
+## Lists
+
+`pandoc` behaves differently from standard markdown on some "edge
+cases" involving lists. Consider this source:
+
+    1. First
+    2. Second:
+        - Fee
+        - Fie
+        - Foe
+
+    3. Third
+
+`pandoc` transforms this into a "compact list" (with no `<p>` tags
+around "First", "Second", or "Third"), while markdown puts `<p>`
+tags around "Second" and "Third" (but not "First"), because of
+the blank line before "Third". `pandoc` follows a simple rule:
+if the text is followed by a blank line, it is treated as a
+paragraph. Since "Second" is followed by a list, and not a blank
+line, it isn't treated as a paragraph. The fact that the list
+is followed by a blank line is irrelevant.
+
+## Literal quotes in titles
+
+Standard markdown allows unescaped literal quotes in titles, as
+in
+
+ [foo]: "bar "embedded" baz"
+
+`pandoc` requires all quotes within titles to be escaped:
+
+ [foo]: "bar \"embedded\" baz"
+
+## Reference links
+
+`pandoc` allows implicit reference links in either of two styles:
+
+ 1. Here's my [link]
+ 2. Here's my [link][]
+
+ [link]: linky.com
+
+If there's no corresponding reference, the implicit reference link
+will appear as regular bracketed text. Note: even `[link][]` will
+appear as `[link]` if there's no reference for `link`. If you want
+`[link][]`, use a backslash escape: `\[link]\[]`.
+
+## Footnotes
+
+`pandoc`'s markdown allows footnotes, using the following syntax:
+
+ here is a footnote reference,^(1) and another.^(longnote)
+
+ ^(1) Here is the footnote. It can go anywhere in the document,
+ except in embedded contexts like block quotes or lists.
+
+ ^(longnote) Here's the other note. This one contains multiple
+ blocks.
+ ^
+ ^ Caret characters are used to indicate that the blocks all belong
+ to a single footnote (as with block quotes).
+ ^
+ ^ If you want, you can use a caret at the beginning of every line,
+ ^ as with blockquotes, but all that you need is a caret at the
+ ^ beginning of the first line of the block and any preceding
+ ^ blank lines.
+
+Footnote references may not contain spaces, tabs, or newlines.
+
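The footnote marker syntax above can be sketched with a small Python regular expression. This is only an illustration, not `pandoc`'s parser: the names `NOTE` and `footnote_labels` are hypothetical, and the sketch assumes that a marker at the very start of a line opens a footnote definition while any other marker is a reference.

```python
import re

# Assumption: a label is a run of characters with no whitespace or parentheses,
# matching the rule that references may not contain spaces, tabs, or newlines.
NOTE = re.compile(r"\^\(([^\s()]+)\)")


def footnote_labels(text):
    """Return (references, definitions) found in the text, in order."""
    refs, defs = [], []
    for line in text.split("\n"):
        for match in NOTE.finditer(line):
            # Assumption: a marker at column 0 starts a definition.
            if match.start() == 0:
                defs.append(match.group(1))
            else:
                refs.append(match.group(1))
    return refs, defs


if __name__ == "__main__":
    sample = (
        "here is a footnote reference,^(1) and another.^(longnote)\n"
        "\n"
        "^(1) Here is the footnote.\n"
    )
    print(footnote_labels(sample))
```

A marker such as `^(two words)` is not matched at all, since its label contains a space.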
+## Embedded HTML
+
+`pandoc` treats embedded HTML in markdown a bit differently than
+Markdown 1.0. While Markdown 1.0 leaves HTML blocks exactly as they
+are, `pandoc` treats text between HTML tags as markdown. Thus, for
+example, `pandoc` will turn
+
+ <table>
+ <tr>
+ <td>*one*</td>
+ <td>[a link](http://google.com)</td>
+ </tr>
+ </table>
+
+into
+
+ <table>
+ <tr>
+ <td><em>one</em></td>
+ <td><a href="http://google.com">a link</a></td>
+ </tr>
+ </table>
+
+whereas Markdown 1.0 will preserve it as is.
+
+There is one exception to this rule: text between `<script>` and
+`</script>` tags is not interpreted as markdown.
+
+This departure from standard markdown should make it easier to mix
+markdown with HTML block elements. For example, one can surround
+a block of markdown text with `<div>` tags without preventing it
+from being interpreted as markdown.
+
+## Title blocks
+
+If the file begins with a title block
+
+ % title
+ % author(s) (separated by commas)
+ % date
+
+it will be parsed as bibliographic information, not regular text. (It
+will be used, for example, in the title of standalone LaTeX or HTML
+output.) The block may contain just a title, a title and an author,
+or all three lines. Each must begin with a % and fit on one line.
+The title may contain standard inline formatting. If you want to
+include an author but no title, or a title and a date but no author,
+you need a blank line:
+
+ % My title
+ %
+ % June 15, 2006
+
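The title block rules above (up to three leading `%` lines, authors separated by commas, an empty `%` line standing in for a missing field) can be sketched in Python as follows. This is an illustrative reading of the rules, not `pandoc`'s Haskell implementation; the function name `parse_title_block` and the returned dictionary layout are assumptions.

```python
def parse_title_block(text):
    """Split a document into title-block fields and the remaining body."""
    lines = text.split("\n")
    fields = []
    # Consume at most three leading lines that begin with '%'.
    while len(fields) < 3 and lines and lines[0].startswith("%"):
        fields.append(lines.pop(0)[1:].strip())
    # Missing trailing fields are treated as empty.
    fields += [""] * (3 - len(fields))
    title, author_line, date = fields
    # Authors are separated by commas; an empty line yields no authors.
    authors = [a.strip() for a in author_line.split(",") if a.strip()]
    return {"title": title, "authors": authors, "date": date,
            "body": "\n".join(lines)}


if __name__ == "__main__":
    doc = "% My title\n%\n% June 15, 2006\n\nBody text.\n"
    print(parse_title_block(doc))
```

Because only the second line is split on commas, a date such as `June 15, 2006` stays in one piece.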
+Titles will be written only when the `--standalone` (`-s`) option is
+chosen. In HTML output, titles will appear twice: once in the
+document head -- this is the title that will appear at the top of the
+window in a browser -- and once at the beginning of the document body.
+The title in the document head can have an optional prefix attached
+(`--title-prefix` or `-T` option). The title in the body appears as
+an H1 element with class "title", so it can be suppressed or
+reformatted with CSS.
+
+If a title prefix is specified with `-T` and no title block appears
+in the document, the title prefix will be used by itself as the
+HTML title.
+
+## Box-style blockquotes
+
+`pandoc` supports emacs-style boxquote block quotes, in addition to
+standard markdown (email-style) block quotes:
+
+ ,----
+ | They look like this.
+ `----
+
+## Inline LaTeX
+
+Anything between two $ characters will be parsed as LaTeX math. The
+opening $ must have a character immediately to its right, while the
+closing $ must have a character immediately to its left. Thus,
+`$20,000 and $30,000` won't parse as math. The $ character can be
+escaped with a backslash if needed.
+
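The delimiter rule just described can be approximated with a regular expression. The Python sketch below is an illustration only, not how `pandoc` itself parses math; the names `MATH` and `find_math` are hypothetical, "character" is read here as "non-whitespace character", and for simplicity the sketch does not allow an escaped `$` inside a formula.

```python
import re

# Opening '$': not escaped, followed directly by a non-space character.
# Closing '$': not escaped, preceded directly by a non-space character.
MATH = re.compile(r"(?<!\\)\$([^\s$](?:[^$]*?[^\s$])?)(?<!\\)\$")


def find_math(text):
    """Return the contents of every $...$ span that satisfies the rule."""
    return MATH.findall(text)


if __name__ == "__main__":
    print(find_math("$20,000 and $30,000"))               # no math spans
    print(find_math(r"Euler: $e^{i\pi} + 1 = 0$ holds"))  # one math span
```

In the first example the second `$` is preceded by a space, so it cannot close a formula and the currency amounts are left alone.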
+If you pass the `-m` (`--asciimathml`) option to `pandoc`, it will
+include the [ASCIIMathML] script in the resulting HTML. This will
+cause LaTeX math to be displayed as formulas in better browsers.
+
+[ASCIIMathML]: http://www1.chapman.edu/~jipsen/mathml/asciimath.html
+
+Inline LaTeX commands will also be preserved and passed unchanged
+to the LaTeX writer. Thus, for example, you can use LaTeX to
+include BibTeX citations:
+
+ This result was proved in \cite{jones.1967}.
+
+You can also use LaTeX environments. For example,
+
+ \begin{tabular}{|l|l|}\hline
+ Age & Frequency \\ \hline
+ 18--25 & 15 \\
+ 26--35 & 33 \\
+ 36--45 & 22 \\ \hline
+ \end{tabular}
+
+Note, however, that material between the begin and end tags will
+be interpreted as raw LaTeX, not as markdown.
+
+## Custom headers
+
+When run with the "standalone" option (`-s`), `pandoc` creates a
+standalone file, complete with an appropriate header. To see the
+default headers used for HTML and LaTeX, use the following commands:
+
+ pandoc -D html
+
+ pandoc -D latex
+
+If you want to use a different header, just create a file containing
+it and specify it on the command line as follows:
+
+ pandoc --custom-header=MyHeaderFile
+
+# Producing S5 with `pandoc`
+
+Producing an [S5] slide show with `pandoc` is easy. A title page is
+constructed automatically from the document's title block (see above).
+Each section (with a level-one header) produces a single slide. (Note
+that if the section is too big, the slide will not fit on the page; S5
+is not smart enough to produce multiple pages.)
+
+Here's the markdown source for a simple slide show, `eating.txt`:
+
+ % Eating Habits
+ % John Doe
+ % March 22, 2005
+
+ # In the morning
+
+ - Eat eggs
+ - Drink coffee
+
+ # In the evening
+
+ - Eat spaghetti
+ - Drink wine
+
+To produce the slide show, simply type
+
+ pandoc -w s5 -s eating.txt > eating.html
+
+and open up `eating.html` in a browser. The HTML file embeds
+all the required javascript and CSS, so no other files are necessary.
+
+Note that by default, the S5 writer produces lists that display
+"all at once." If you want your lists to display incrementally
+(one item at a time), use the `-i` option. If you want a
+particular list to depart from the default (that is, to display
+incrementally without the `-i` option and all at once with the
+`-i` option), put it in a block quote:
+
+ > - Eat spaghetti
+ > - Drink wine
+
+In this way incremental and nonincremental lists can be mixed in
+a single document.
+
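The interaction between the `-i` option and block-quoted lists amounts to a simple toggle, which the following Python sketch spells out. It is only a restatement of the rule in the text above; the function name `displays_incrementally` and its parameters are hypothetical.

```python
def displays_incrementally(i_option: bool, in_block_quote: bool) -> bool:
    """Return True if a list is shown one item at a time.

    A block quote flips whatever default the -i option selects, so the
    list is incremental exactly when one of the two is set, but not both.
    """
    return i_option != in_block_quote


if __name__ == "__main__":
    for i_option in (False, True):
        for quoted in (False, True):
            print(i_option, quoted, displays_incrementally(i_option, quoted))
```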