From df7b68225101966051f8b592a27127bf789eb81e Mon Sep 17 00:00:00 2001 From: fiddlosopher Date: Tue, 17 Oct 2006 14:22:29 +0000 Subject: initial import git-svn-id: https://pandoc.googlecode.com/svn/trunk@2 788f1e2b-df1e-0410-8736-df70ead52e1b --- README | 508 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 508 insertions(+) create mode 100644 README (limited to 'README') diff --git a/README b/README new file mode 100644 index 000000000..e387e5f3c --- /dev/null +++ b/README @@ -0,0 +1,508 @@ +% pandoc +% John MacFarlane +% August 10, 2006 + +`pandoc` converts files from one markup format to another. It can +read [markdown] and (with some limitations) [reStructuredText], [HTML], and +[LaTeX], and it can write [markdown], [reStructuredText], [HTML], +[LaTeX], [RTF], and [S5] HTML slide shows. It is written in +[Haskell], using the excellent [Parsec] parser combinator library. + +[markdown]: http://daringfireball.net/projects/markdown/ +[reStructuredText]: http://docutils.sourceforge.net/docs/ref/rst/introduction.html +[S5]: http://meyerweb.com/eric/tools/s5/ +[HTML]: http://www.w3.org/TR/html40/ +[LaTeX]: http://www.latex-project.org/ +[RTF]: http://en.wikipedia.org/wiki/Rich_Text_Format +[Haskell]: http://www.haskell.org/ +[Parsec]: http://www.cs.uu.nl/~daan/download/parsec/parsec.html + +(c) 2006 John MacFarlane (jgm At berkeley.edu). Released under the +[GPL], version 2 or greater. This software carries no warranty of +any kind. (See LICENSE for full copyright and warranty notices.) + +[GPL]: http://www.gnu.org/copyleft/gpl.html + +# Installation + +## Installing GHC + +To compile `pandoc`, you'll need [GHC] version 6.4 or greater. + +If you don't have GHC already, you can get it from the +[GHC Download] page. + +[GHC]: http://www.haskell.org/ghc/ +[GHC Download]: http://www.haskell.org/ghc/download.html + +Note: As of this writing, there's no MacOS X installer package for +GHC 6.4.2 (the latest version). 
There is an installer for +GHC 6.4.1 [here](http://www.haskell.org/ghc/download_ghc_641.html#macosx). +It will work just fine on PPC-based Macs. GHC has not yet been ported +to Intel Macs: see . + +You'll also need standard build tools: GNU Make, sed, bash, and perl. +These are standard on unix systems (including MacOS X). If you're +using Windows, you can install [Cygwin]. + +[Cygwin]: http://www.cygwin.com/ + +Note: I have tested `pandoc` on MacOS X and Linux systems. I have not +tried it on Windows, and I have no idea whether it will work on Windows. + +## Installing `pandoc` + +1. Change to the directory containing the `pandoc` distribution. + +2. Compile: + + make + +3. Optional, but recommended: + + make test + +4. If you want to install the `pandoc` program and the relevant wrappers + and documents (including this file) into `/usr/local` directory, type: + + make install + + If you only want the `pandoc` program and the shell scripts `latex2markdown`, + `markdown2latex`, `markdown2pdf`, `markdown2html`, `html2markdown` installed + into your `~/bin` directory, type (note the **`-exec`** suffix): + + PREFIX=~ make install-exec + +5. If you want to install the Pandoc library modules for use in + other Haskell programs, type (as root): + + make install-lib + +6. To install the library documentation (into `/usr/local/pandoc-doc`), + type: + + make install-lib-doc + +# Using `pandoc` + +You can run `pandoc` like this: + + ./pandoc + +If you copy the `pandoc` executable to a directory in your path +(perhaps using `make install`), you can invoke it without the "./": + + pandoc + +If you run `pandoc` without arguments, it will accept input from +STDIN. If you run it with file names as arguments, it will take input +from those files. It accepts several command-line options. For a +list, type + + pandoc -h + +The most important options specify the format of the source file and +the output. The default reader is markdown; the default writer is +HTML. 
So if you don't specify a reader or writer, `pandoc` will +convert markdown to HTML. To convert markdown to LaTeX, you could +write: + + pandoc -w latex input.txt + +To convert html to markdown: + + pandoc -r html -w markdown input.txt + +Supported writers include markdown, LaTeX, HTML, RTF, +reStructuredText, and S5 (which produces an HTML file that acts like +powerpoint). Supported readers include markdown, HTML, LaTeX, and +reStructuredText. Note that the rst (reStructuredText) reader only +parses a subset of rst syntax. For example, it doesn't handle tables, +definition lists, option lists, or footnotes. It handles only the +constructs expressible in unextended markdown. But for simple +documents it should be adequate. The LaTeX and HTML readers are also +limited in what they can do. + +`pandoc` writes its output to STDOUT. If you want to write to a file, +use redirection: + + pandoc input.txt > output.html + +Note that you can specify multiple input files on the command line. +`pandoc` will concatenate them all (with blank lines between them) +before parsing: + + pandoc -s chapter1.txt chapter2.txt chapter3.txt references.txt > book.html + +## Character encoding + +Unfortunately, due to limitations in GHC, `pandoc` does not +automatically detect the system's local character encoding. Hence, +all input and output is assumed to be in the UTF-8 encoding. If you +use accented or foreign characters, you should convert the input file +to UTF-8 before processing it with `pandoc`. This can be done by +piping the input through [`iconv`]: for example, + + iconv -t utf-8 source.txt | pandoc > output.html + +will convert `source.txt` from the local encoding to UTF-8, then +convert it to HTML, putting the output in `output.html`. + +[`iconv`]: http://www.gnu.org/software/libiconv/ + +The shell scripts (described below) automatically convert the source +from the local encoding to UTF-8 before running them through `pandoc`. 
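When the source encoding is known, it can be named explicitly with `iconv`'s `-f` option rather than relying on the local encoding. The following is a minimal sketch, not part of the `pandoc` distribution: the file names are placeholders, and ISO-8859-1 is only an assumed example encoding.

```shell
# Create a small Latin-1 sample (placeholder file name): \351 is the
# single byte 0xE9, which is "e with acute accent" in ISO-8859-1.
printf 'caf\351\n' > sample-latin1.txt

# Convert explicitly from ISO-8859-1 (-f) to UTF-8 (-t).
iconv -f ISO-8859-1 -t UTF-8 sample-latin1.txt > sample-utf8.txt

# Hand the UTF-8 text to pandoc, if it is installed and in the path.
if command -v pandoc >/dev/null 2>&1; then
    pandoc sample-utf8.txt > sample.html
fi
```

Naming the source encoding avoids surprises when a file was written on a machine whose local encoding differs from the current one.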
+ +## The shell scripts + +For convenience, five shell scripts have been included that make it +easy to run `pandoc` without remembering all the command-line options. +All of the scripts presuppose that `pandoc` is in the path, and +`html2markdown` also presupposes that `curl` and `tidy` are in the +path. + +1. `markdown2html` converts markdown to HTML, running `iconv` first to + convert the file to UTF-8. (This can be used as a replacement for + `Markdown.pl`.) + +2. `html2markdown` can take either a filename or a URL as argument. If + it is given a URL, it uses `curl` to fetch the contents of the + specified URL, then filters this through `tidy` to straighten up the + HTML and convert to UTF-8, and finally passes this HTML to `pandoc` to + produce markdown text: + + html2markdown http://www.fsf.org + + html2markdown www.fsf.org + + html2markdown subdir/mylocalfile.html + +3. `latex2markdown` converts a LaTeX file to markdown. + + latex2markdown mytexfile.tex + +4. `markdown2latex` converts markdown to LaTeX: + + markdown2latex mytextfile.txt + +5. `markdown2pdf` converts markdown to PDF, using LaTeX, but removing + all the intermediate files created by LaTeX. Example: + + markdown2pdf mytextfile.txt + + creates a file `mytextfile.pdf` in the working directory. + +# Command-line options + +Various command-line options can be used to customize the output. +For a complete list, type + + pandoc --help + +`-p` or `--preserve-tabs` causes tabs in the source text to be +preserved, rather than converted to spaces (the default). + +`--tabstop` allows the user to set the tab stop (which defaults to 4). + +`-R` or `--parse-raw` causes the HTML and LaTeX readers to parse HTML +codes and LaTeX environments that it can't translate as raw HTML or +LaTeX. Raw HTML can be printed in markdown, reStructuredText, HTML, +and S5 output; raw LaTeX can be printed in markdown, reStructuredText, +and LaTeX output. 
The default is for the readers to omit +untranslatable HTML codes and LaTeX environments. (The LaTeX reader +does pass through untranslatable LaTeX commands, even if `-R` is not +specified.) + +`-s` or `--standalone` causes `pandoc` to produce a standalone file, +complete with appropriate document headers. By default, `pandoc` +produces a fragment. + +`--custom-header` can be used to specify a custom document header. To +see the headers used by default, use the `-D` option: for example, +`pandoc -D html` prints the default HTML header. + +`-c` or `--css` allows the user to specify a custom stylesheet that +will be linked to in HTML and S5 output. + +`-H` or `--include-in-header` specifies a file to be included +(verbatim) at the end of the document header. This can be used, for +example, to include special CSS or javascript in HTML documents. + +`-B` or `--include-before-body` specifies a file to be included +(verbatim) at the beginning of the document body (after the `<body>` +tag in HTML, or the `\begin{document}` command in LaTeX). This can be +used to include navigation bars or banners in HTML documents. + +`-A` or `--include-after-body` specifies a file to be included +(verbatim) at the end of the document body (before the `</body>` tag in +HTML, or the `\end{document}` command in LaTeX). + +`-T` or `--title-prefix` specifies a string to be included as a prefix +at the beginning of the title that appears in the HTML header (but not +in the title as it appears at the beginning of the HTML body). (See +below on Titles.) + +`-S` or `--smartypants` causes `pandoc` to produce typographically +correct HTML output, along the lines of John Gruber's [Smartypants]. +Straight quotes are converted to curly quotes, `---` to dashes, and +`...` to ellipses. + +[Smartypants]: http://daringfireball.net/projects/smartypants/ + +`-m` or `--asciimathml` will cause LaTeX formulas (between $ signs) in +HTML or S5 to display as formulas rather than as code. 
The trick will +not work in all browsers, but it works in Firefox. Peter Jipsen's +[ASCIIMathML] script is used to do the magic. + +[ASCIIMathML]: http://www1.chapman.edu/~jipsen/mathml/asciimath.html + +`-i` or `--incremental` causes all lists in S5 output to be displayed +incrementally by default (one item at a time). The normal default +is for lists to be displayed all at once. + +`-N` or `--number-sections` causes sections to be numbered in LaTeX +output. By default, sections are not numbered. + +# `pandoc`'s markdown vs. standard markdown + +In parsing markdown, `pandoc` departs from and extends [standard markdown] +in a few respects. (To run `pandoc` on the official +markdown test suite, type `make markdown_tests`.) + +[standard markdown]: http://daringfireball.net/projects/markdown/syntax + +## Lists + +`pandoc` behaves differently from standard markdown on some "edge +cases" involving lists. Consider this source: + + 1. First + 2. Second: + - Fee + - Fie + - Foe + + 3. Third + +`pandoc` transforms this into a "compact list" (with no `

<p>` tags +around "First", "Second", or "Third"), while markdown puts `<p>
` +tags around "Second" and "Third" (but not "First"), because of +the blank space around "Third". `pandoc` follows a simple rule: +if the text is followed by a blank line, it is treated as a +paragraph. Since "Second" is followed by a list, and not a blank +line, it isn't treated as a paragraph. The fact that the list +is followed by a blank line is irrelevant. + +## Literal quotes in titles + +Standard markdown allows unescaped literal quotes in titles, as +in + + [foo]: "bar "embedded" baz" + +`pandoc` requires all quotes within titles to be escaped: + + [foo]: "bar \"embedded\" baz" + +## Reference links + +`pandoc` allows implicit reference links in either of two styles: + + 1. Here's my [link] + 2. Here's my [link][] + + [link]: linky.com + +If there's no corresponding reference, the implicit reference link +will appear as regular bracketed text. Note: even `[link][]` will +appear as `[link]` if there's no reference for `link`. If you want +`[link][]`, use a backslash escape: `\[link]\[]`. + +## Footnotes + +`pandoc`'s markdown allows footnotes, using the following syntax: + + here is a footnote reference,^(1) and another.^(longnote) + + ^(1) Here is the footnote. It can go anywhere in the document, + except in embedded contexts like block quotes or lists. + + ^(longnote) Here's the other note. This one contains multiple + blocks. + ^ + ^ Caret characters are used to indicate that the blocks all belong + to a single footnote (as with block quotes). + ^ + ^ If you want, you can use a caret at the beginning of every line, + ^ as with blockquotes, but all that you need is a caret at the + ^ beginning of the first line of the block and any preceding + ^ blank lines. + +Footnote references may not contain spaces, tabs, or newlines. + +## Embedded HTML + +`pandoc` treats embedded HTML in markdown a bit differently than +Markdown 1.0. While Markdown 1.0 leaves HTML blocks exactly as they +are, `pandoc` treats text between HTML tags as markdown. 
Thus, for +example, `pandoc` will turn + + + + + + +
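<table> <tr> <td>*one*</td> <td>[a link](http://google.com)</td> </tr> </table>
+ +into + + + + + + +
<table> <tr> <td><em>one</em></td> <td><a href="http://google.com">a link</a></td> </tr> </table>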
+ +whereas Markdown 1.0 will preserve it as is. + +There is one exception to this rule: text between `<script>` and +`</script>` tags is not interpreted as markdown. + +This departure from standard markdown should make it easier to mix +markdown with HTML block elements. For example, one can surround +a block of markdown text with `<div>
` tags without preventing it +from being interpreted as markdown. + +## Title blocks + +If the file begins with a title block + + % title + % author(s) (separated by commas) + % date + +it will be parsed as bibliographic information, not regular text. (It +will be used, for example, in the title of standalone LaTeX or HTML +output.) The block may contain just a title, a title and an author, +or all three lines. Each must begin with a % and fit on one line. +The title may contain standard inline formatting. If you want to +include an author but no title, or a title and a date but no author, +you need a blank line: + + % My title + % + % June 15, 2006 + +Titles will be written only when the `--standalone` (`-s`) option is +chosen. In HTML output, titles will appear twice: once in the +document head -- this is the title that will appear at the top of the +window in a browser -- and once at the beginning of the document body. +The title in the document head can have an optional prefix attached +(`--title-prefix` or `-T` option). The title in the body appears as +an H1 element with class "title", so it can be suppressed or +reformatted with CSS. + +If a title prefix is specified with `-T` and no title block appears +in the document, the title prefix will be used by itself as the +HTML title. + +## Box-style blockquotes + +`pandoc` supports emacs-style boxquote block quotes, in addition to +standard markdown (email-style) boxquotes: + + ,---- + | They look like this. + `---- + +## Inline LaTeX + +Anything between two $ characters will be parsed as LaTeX math. The +opening $ must have a character immediately to its right, while the +closing $ must have a character immediately to its left. Thus, +`$20,000 and $30,000` won't parse as math. The $ character can be +escaped with a backslash if needed. + +If you pass the `-m` (`--asciimathml`) option to `pandoc`, it will +include the [ASCIIMathML] script in the resulting HTML. 
This will +cause LaTeX math to be displayed as formulas in better browsers. + +[ASCIIMathML]: http://www1.chapman.edu/~jipsen/mathml/asciimath.html + +Inline LaTeX commands will also be preserved and passed unchanged +to the LaTeX writer. Thus, for example, you can use LaTeX to +include BibTeX citations: + + This result was proved in \cite{jones.1967}. + +You can also use LaTeX environments. For example, + + \begin{tabular}{|l|l|}\hline + Age & Frequency \\ \hline + 18--25 & 15 \\ + 26--35 & 33 \\ + 36--45 & 22 \\ \hline + \end{tabular} + +Note, however, that material between the begin and end tags will +be interpreted as raw LaTeX, not as markdown. + +## Custom headers + +When run with the "standalone" option (`-s`), `pandoc` creates a +standalone file, complete with an appropriate header. To see the +default headers used for html and latex, use the following commands: + + pandoc -D html + + pandoc -D latex + +If you want to use a different header, just create a file containing +it and specify it on the command line as follows: + + pandoc --custom-header=MyHeaderFile + +# Producing S5 with `pandoc` + +Producing an [S5] slide show with `pandoc` is easy. A title page is +constructed automatically from the document's title block (see above). +Each section (with a level-one header) produces a single slide. (Note +that if the section is too big, the slide will not fit on the page; S5 +is not smart enough to produce multiple pages.) + +Here's the markdown source for a simple slide show, `eating.txt`: + + % Eating Habits + % John Doe + % March 22, 2005 + + # In the morning + + - Eat eggs + - Drink coffee + + # In the evening + + - Eat spaghetti + - Drink wine + +To produce the slide show, simply type + + pandoc -w s5 -s eating.txt > eating.html + +and open up `eating.html` in a browser. The HTML file embeds +all the required javascript and CSS, so no other files are necessary. + +Note that by default, the S5 writer produces lists that display +"all at once." 
If you want your lists to display incrementally +(one item at a time), use the `-i` option. If you want a +particular list to depart from the default (that is, to display +incrementally without the `-i` option and all at once with the +`-i` option), put it in a block quote: + + > - Eat spaghetti + > - Drink wine + +In this way incremental and nonincremental lists can be mixed in +a single document. + -- cgit v1.2.3