From df7b68225101966051f8b592a27127bf789eb81e Mon Sep 17 00:00:00 2001
From: fiddlosopher <fiddlosopher@788f1e2b-df1e-0410-8736-df70ead52e1b>
Date: Tue, 17 Oct 2006 14:22:29 +0000
Subject: initial import

git-svn-id: https://pandoc.googlecode.com/svn/trunk@2 788f1e2b-df1e-0410-8736-df70ead52e1b
---
 README | 508 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 508 insertions(+)
 create mode 100644 README

diff --git a/README b/README
new file mode 100644
index 000000000..e387e5f3c
--- /dev/null
+++ b/README
@@ -0,0 +1,508 @@
+% pandoc
+% John MacFarlane
+% August 10, 2006 
+
+`pandoc` converts files from one markup format to another.  It can
+read [markdown] and (with some limitations) [reStructuredText], [HTML], and
+[LaTeX], and it can write [markdown], [reStructuredText], [HTML],
+[LaTeX], [RTF], and [S5] HTML slide shows.  It is written in
+[Haskell], using the excellent [Parsec] parser combinator library.
+
+[markdown]: http://daringfireball.net/projects/markdown/
+[reStructuredText]: http://docutils.sourceforge.net/docs/ref/rst/introduction.html
+[S5]: http://meyerweb.com/eric/tools/s5/
+[HTML]:  http://www.w3.org/TR/html40/
+[LaTeX]:  http://www.latex-project.org/
+[RTF]:  http://en.wikipedia.org/wiki/Rich_Text_Format
+[Haskell]:  http://www.haskell.org/
+[Parsec]:  http://www.cs.uu.nl/~daan/download/parsec/parsec.html
+
+(c) 2006 John MacFarlane (jgm At berkeley.edu). Released under the
+[GPL], version 2 or greater.  This software carries no warranty of
+any kind.  (See LICENSE for full copyright and warranty notices.)
+
+[GPL]: http://www.gnu.org/copyleft/gpl.html
+
+# Installation
+
+## Installing GHC
+
+To compile `pandoc`, you'll need [GHC] version 6.4 or greater.  
+
+If you don't have GHC already, you can get it from the 
+[GHC Download] page.
+
+[GHC]: http://www.haskell.org/ghc/
+[GHC Download]: http://www.haskell.org/ghc/download.html
+
+Note:  As of this writing, there's no MacOS X installer package for
+GHC 6.4.2 (the latest version).  There is an installer for
+GHC 6.4.1 [here](http://www.haskell.org/ghc/download_ghc_641.html#macosx).
+It will work just fine on PPC-based Macs.  GHC has not yet been ported
+to Intel Macs:  see <http://hackage.haskell.org/trac/ghc/wiki/X86OSXGhc>.
+
+You'll also need standard build tools: GNU Make, sed, bash, and perl.
+These are standard on unix systems (including MacOS X).  If you're
+using Windows, you can install [Cygwin].
+
+[Cygwin]: http://www.cygwin.com/
+
+Note:  I have tested `pandoc` on MacOS X and Linux systems.  I have not
+tried it on Windows, and I have no idea whether it will work on Windows.
+  
+## Installing `pandoc`
+
+1.  Change to the directory containing the `pandoc` distribution.
+
+2.  Compile:
+
+            make
+
+3.  Optional, but recommended:
+
+            make test
+
+4.  If you want to install the `pandoc` program and the relevant wrappers
+    and documents (including this file) into the `/usr/local` directory, type:
+            
+            make install
+    
+    If you only want the `pandoc` program and the shell scripts `latex2markdown`,
+    `markdown2latex`, `markdown2pdf`, `markdown2html`, `html2markdown` installed
+    into your `~/bin` directory, type (note the **`-exec`** suffix):
+
+            PREFIX=~ make install-exec
+
+5.  If you want to install the Pandoc library modules for use in 
+    other Haskell programs, type (as root):
+
+            make install-lib
+   
+6.  To install the library documentation (into `/usr/local/pandoc-doc`), 
+    type:
+
+            make install-lib-doc
+ 
+# Using `pandoc`
+
+You can run `pandoc` like this:
+
+    ./pandoc
+
+If you copy the `pandoc` executable to a directory in your path
+(perhaps using `make install`), you can invoke it without the "./":
+
+    pandoc
+
+If you run `pandoc` without arguments, it will accept input from
+STDIN.  If you run it with file names as arguments, it will take input
+from those files.  It accepts several command-line options.  For a
+list, type
+
+    pandoc -h
+
+The most important options specify the format of the source file and
+the output.  The default reader is markdown; the default writer is
+HTML.  So if you don't specify a reader or writer, `pandoc` will
+convert markdown to HTML.  To convert markdown to LaTeX, you could
+write:
+
+    pandoc -w latex input.txt
+
+To convert HTML to markdown:
+
+    pandoc -r html -w markdown input.txt
+
+Supported writers include markdown, LaTeX, HTML, RTF,
+reStructuredText, and S5 (which produces an HTML file that acts like
+PowerPoint).  Supported readers include markdown, HTML, LaTeX, and
+reStructuredText.  Note that the rst (reStructuredText) reader only
+parses a subset of rst syntax.  For example, it doesn't handle tables,
+definition lists, option lists, or footnotes.  It handles only the
+constructs expressible in unextended markdown.  But for simple
+documents it should be adequate.  The LaTeX and HTML readers are also
+limited in what they can do.  
+
+`pandoc` writes its output to STDOUT.  If you want to write to a file,
+use redirection:
+
+	pandoc input.txt > output.html
+
+Note that you can specify multiple input files on the command line.
+`pandoc` will concatenate them all (with blank lines between them)
+before parsing:
+
+	pandoc -s chapter1.txt chapter2.txt chapter3.txt references.txt > book.html
+
+## Character encoding
+
+Unfortunately, due to limitations in GHC, `pandoc` does not
+automatically detect the system's local character encoding.  Hence,
+all input and output is assumed to be in the UTF-8 encoding.  If your
+text contains non-ASCII characters, you should convert the input file
+to UTF-8 before processing it with `pandoc`.  This can be done by
+piping the input through [`iconv`]: for example,
+
+	iconv -t utf-8 source.txt | pandoc > output.html
+
+will convert `source.txt` from the local encoding to UTF-8, then
+convert it to HTML, putting the output in `output.html`.
+
+[`iconv`]: http://www.gnu.org/software/libiconv/
+
+The shell scripts (described below) automatically convert the source
+from the local encoding to UTF-8 before running it through `pandoc`.
+
+## The shell scripts 
+
+For convenience, five shell scripts have been included that make it
+easy to run `pandoc` without remembering all the command-line options.
+All of the scripts presuppose that `pandoc` is in the path, and
+`html2markdown` also presupposes that `curl` and `tidy` are in the
+path.
+
+1.  `markdown2html` converts markdown to HTML, running `iconv` first to
+	convert the file to UTF-8.  (This can be used as a replacement for
+	`Markdown.pl`.)
+
+2.	`html2markdown` can take either a filename or a URL as argument.  If
+	it is given a URL, it uses `curl` to fetch the contents of the
+	specified URL, then filters this through `tidy` to straighten up the
+	HTML and convert to UTF-8, and finally passes this HTML to `pandoc` to
+	produce markdown text:
+
+	    html2markdown http://www.fsf.org
+
+	    html2markdown www.fsf.org
+
+	    html2markdown subdir/mylocalfile.html
+
+3. 	`latex2markdown` converts a LaTeX file to markdown. 
+
+	    latex2markdown mytexfile.tex
+
+4. 	`markdown2latex` converts markdown to LaTeX:
+
+	    markdown2latex mytextfile.txt
+
+5.	`markdown2pdf` converts markdown to PDF, using LaTeX, but removing
+	all the intermediate files created by LaTeX.  Example:
+
+	    markdown2pdf mytextfile.txt
+
+	creates a file `mytextfile.pdf` in the working directory.
+
+# Command-line options
+
+Various command-line options can be used to customize the output.  
+For a complete list, type 
+
+    pandoc --help  
+
+`-p` or `--preserve-tabs` causes tabs in the source text to be
+preserved, rather than converted to spaces (the default).
+
+`--tabstop` allows the user to set the tab stop (which defaults to 4).  
+
+`-R` or `--parse-raw` causes the HTML and LaTeX readers to parse HTML
+codes and LaTeX environments that they can't translate as raw HTML or
+LaTeX.  Raw HTML can be printed in markdown, reStructuredText, HTML,
+and S5 output; raw LaTeX can be printed in markdown, reStructuredText,
+and LaTeX output.  The default is for the readers to omit
+untranslatable HTML codes and LaTeX environments.  (The LaTeX reader
+does pass through untranslatable LaTeX commands, even if `-R` is not
+specified.)
+
+`-s` or `--standalone` causes `pandoc` to produce a standalone file,
+complete with appropriate document headers.  By default, `pandoc`
+produces a fragment.
+
+`--custom-header` can be used to specify a custom document header.  To
+see the headers used by default, use the `-D` option: for example,
+`pandoc -D html` prints the default HTML header.
+
+`-c` or `--css` allows the user to specify a custom stylesheet that
+will be linked to in HTML and S5 output.
+
+`-H` or `--include-in-header` specifies a file to be included
+(verbatim) at the end of the document header.  This can be used, for
+example, to include special CSS or javascript in HTML documents.
+
+`-B` or `--include-before-body` specifies a file to be included
+(verbatim) at the beginning of the document body (after the `<body>`
+tag in HTML, or the `\begin{document}` command in LaTeX).  This can be
+used to include navigation bars or banners in HTML documents.
+
+`-A` or `--include-after-body` specifies a file to be included
+(verbatim) at the end of the document body (before the `</body>` tag in
+HTML, or the `\end{document}` command in LaTeX).
+
+`-T` or `--title-prefix` specifies a string to be included as a prefix
+at the beginning of the title that appears in the HTML header (but not
+in the title as it appears at the beginning of the HTML body).  (See
+below on Titles.)
+
+`-S` or `--smartypants` causes `pandoc` to produce typographically
+correct HTML output, along the lines of John Gruber's [Smartypants].
+Straight quotes are converted to curly quotes, `---` to dashes, and
+`...` to ellipses.
+
+[Smartypants]: http://daringfireball.net/projects/smartypants/
+
+`-m` or `--asciimathml` will cause LaTeX formulas (between $ signs) in
+HTML or S5 to display as formulas rather than as code.  The trick will
+not work in all browsers, but it works in Firefox.  Peter Jipsen's
+[ASCIIMathML] script is used to do the magic.
+
+[ASCIIMathML]: http://www1.chapman.edu/~jipsen/mathml/asciimath.html
+
+`-i` or `--incremental` causes all lists in S5 output to be displayed
+incrementally by default (one item at a time).  The normal default
+is for lists to be displayed all at once.  
+
+`-N` or `--number-sections` causes sections to be numbered in LaTeX
+output.  By default, sections are not numbered.
+
+# `pandoc`'s markdown vs. standard markdown
+
+In parsing markdown, `pandoc` departs from and extends [standard markdown]
+in a few respects.  (To run `pandoc` on the official
+markdown test suite, type `make markdown_tests`.)
+
+[standard markdown]:  http://daringfireball.net/projects/markdown/syntax
+
+## Lists
+
+`pandoc` behaves differently from standard markdown on some "edge
+cases" involving lists.  Consider this source: 
+
+    1.  First
+    2.  Second:
+        -   Fee
+        -   Fie
+        -   Foe
+
+    3.  Third
+
+`pandoc` transforms this into a "compact list" (with no `<p>` tags
+around "First", "Second", or "Third"), while markdown puts `<p>`
+tags around "Second" and "Third" (but not "First"), because of
+the blank space around "Third".  `pandoc` follows a simple rule:
+if the text is followed by a blank line, it is treated as a
+paragraph.  Since "Second" is followed by a list, and not a blank
+line, it isn't treated as a paragraph.  The fact that the list
+is followed by a blank line is irrelevant. 
+
+## Literal quotes in titles
+
+Standard markdown allows unescaped literal quotes in titles, as
+in 
+
+    [foo]: "bar "embedded" baz"
+
+`pandoc` requires all quotes within titles to be escaped:
+
+    [foo]: "bar \"embedded\" baz"
+
+## Reference links
+
+`pandoc` allows implicit reference links in either of two styles:
+
+    1. Here's my [link]
+    2. Here's my [link][]
+
+    [link]: linky.com
+
+If there's no corresponding reference, the implicit reference link
+will appear as regular bracketed text.  Note: even `[link][]` will
+appear as `[link]` if there's no reference for `link`.  If you want
+`[link][]`, use a backslash escape: `\[link]\[]`.
+
+## Footnotes
+
+`pandoc`'s markdown allows footnotes, using the following syntax:
+
+	here is a footnote reference,^(1) and another.^(longnote)
+
+	^(1) Here is the footnote.  It can go anywhere in the document,
+    except in embedded contexts like block quotes or lists.	
+
+	^(longnote) Here's the other note.  This one contains multiple
+	blocks.  
+	^
+	^ Caret characters are used to indicate that the blocks all belong
+    to a single footnote (as with block quotes).
+	^
+	^ If you want, you can use a caret at the beginning of every line,
+    ^ as with blockquotes, but all that you need is a caret at the
+    ^ beginning of the first line of the block and any preceding 
+    ^ blank lines.
+
+Footnote references may not contain spaces, tabs, or newlines.
+
+## Embedded HTML
+
+`pandoc` treats embedded HTML in markdown a bit differently than
+Markdown 1.0.  While Markdown 1.0 leaves HTML blocks exactly as they
+are, `pandoc` treats text between HTML tags as markdown.  Thus, for
+example, `pandoc` will turn
+
+    <table>
+        <tr>
+            <td>*one*</td>
+            <td>[a link](http://google.com)</td>
+        </tr>
+    </table>
+
+into
+
+    <table>
+        <tr>
+            <td><em>one</em></td>
+            <td><a href="http://google.com">a link</a></td>
+        </tr>
+    </table>
+
+whereas Markdown 1.0 will preserve it as is.  
+
+There is one exception to this rule:  text between `<script>` and
+`</script>` tags is not interpreted as markdown.
+
+This departure from standard markdown should make it easier to mix
+markdown with HTML block elements.  For example, one can surround
+a block of markdown text with `<div>` tags without preventing it
+from being interpreted as markdown.
+
+## Title blocks
+
+If the file begins with a title block
+
+	% title
+	% author(s) (separated by commas)
+	% date
+
+it will be parsed as bibliographic information, not regular text.  (It
+will be used, for example, in the title of standalone LaTeX or HTML
+output.)  The block may contain just a title, a title and an author,
+or all three lines.  Each must begin with a % and fit on one line.
+The title may contain standard inline formatting.  If you want to
+include an author but no title, or a title and a date but no author,
+you need a blank line:
+
+	% My title
+	% 
+	% June 15, 2006
+
+Titles will be written only when the `--standalone` (`-s`) option is
+chosen.  In HTML output, titles will appear twice: once in the
+document head -- this is the title that will appear at the top of the
+window in a browser -- and once at the beginning of the document body.
+The title in the document head can have an optional prefix attached
+(`--title-prefix` or `-T` option).  The title in the body appears as
+an H1 element with class "title", so it can be suppressed or
+reformatted with CSS.
+
+If a title prefix is specified with `-T` and no title block appears
+in the document, the title prefix will be used by itself as the
+HTML title.
+
+## Box-style blockquotes
+
+`pandoc` supports emacs-style boxquote block quotes, in addition to
+standard markdown (email-style) block quotes:
+
+	,----
+	| They look like this.
+	`----
+
+## Inline LaTeX
+
+Anything between two $ characters will be parsed as LaTeX math.  The
+opening $ must have a non-space character immediately to its right,
+while the closing $ must have a non-space character immediately to its
+left.  Thus, `$20,000 and $30,000` won't parse as math.  The $ character
+can be escaped with a backslash if needed.
+
+If you pass the `-m` (`--asciimathml`) option to `pandoc`, it will
+include the [ASCIIMathML] script in the resulting HTML.  This will
+cause LaTeX math to be displayed as formulas in better browsers.
+
+[ASCIIMathML]: http://www1.chapman.edu/~jipsen/mathml/asciimath.html
+
+Inline LaTeX commands will also be preserved and passed unchanged
+to the LaTeX writer.  Thus, for example, you can use LaTeX to
+include BibTeX citations:
+
+	This result was proved in \cite{jones.1967}.
+
+You can also use LaTeX environments.  For example,
+
+	\begin{tabular}{|l|l|}\hline
+    Age & Frequency \\ \hline
+	18--25  & 15 \\
+    26--35  & 33 \\ 
+    36--45  & 22 \\ \hline
+	\end{tabular}
+
+Note, however, that material between the begin and end tags will
+be interpreted as raw LaTeX, not as markdown.
+
+## Custom headers
+
+When run with the "standalone" option (`-s`), `pandoc` creates a
+standalone file, complete with an appropriate header.  To see the
+default headers used for HTML and LaTeX, use the following commands:
+
+    pandoc -D html
+
+    pandoc -D latex 
+
+If you want to use a different header, just create a file containing
+it and specify it on the command line as follows:
+
+    pandoc --custom-header=MyHeaderFile
+
+# Producing S5 with `pandoc`
+
+Producing an [S5] slide show with `pandoc` is easy.  A title page is
+constructed automatically from the document's title block (see above).
+Each section (with a level-one header) produces a single slide.  (Note
+that if the section is too big, the slide will not fit on the page; S5
+is not smart enough to produce multiple pages.)  
+
+Here's the markdown source for a simple slide show, `eating.txt`:
+
+	% Eating Habits
+	% John Doe
+	% March 22, 2005
+
+	# In the morning
+
+	- Eat eggs
+	- Drink coffee
+
+	# In the evening
+
+	- Eat spaghetti
+	- Drink wine
+
+To produce the slide show, simply type
+
+	pandoc -w s5 -s eating.txt > eating.html
+
+and open up `eating.html` in a browser.  The HTML file embeds
+all the required javascript and CSS, so no other files are necessary.
+
+Note that by default, the S5 writer produces lists that display
+"all at once."  If you want your lists to display incrementally
+(one item at a time), use the `-i` option.  If you want a
+particular list to depart from the default (that is, to display
+incrementally without the `-i` option and all at once with the
+`-i` option), put it in a block quote:
+
+	> - Eat spaghetti
+	> - Drink wine
+
+In this way incremental and nonincremental lists can be mixed in
+a single document.
+
-- 
cgit v1.2.3