Minor corrections and improvements to README.

git-svn-id: https://pandoc.googlecode.com/svn/trunk@10 788f1e2b-df1e-0410-8736-df70ead52e1b
author: fiddlosopher <fiddlosopher@788f1e2b-df1e-0410-8736-df70ead52e1b> 2006-10-27 03:16:13 +0000
committer: fiddlosopher <fiddlosopher@788f1e2b-df1e-0410-8736-df70ead52e1b> 2006-10-27 03:16:13 +0000
commit: 3a9d4b2d1688ca9a5964c8355b1ad6dfc3639c0d (patch)
tree: cd0863718deca07d78591e12befc1c7aa53daf73 /README
parent: 86e8b9635a18c3c85f933b97d0da15a1638fe408 (diff)
1 file changed, 88 insertions, 80 deletions
diff --git a/README b/README
index e387e5f3c..141e7d6f5 100644
--- a/README
+++ b/README
@@ -2,11 +2,19 @@
 % John MacFarlane
 % August 10, 2006 
 
-`pandoc` converts files from one markup format to another.  It can
-read [markdown] and (with some limitations) [reStructuredText], [HTML], and
-[LaTeX], and it can write [markdown], [reStructuredText], [HTML],
-[LaTeX], [RTF], and [S5] HTML slide shows.  It is written in
-[Haskell], using the excellent [Parsec] parser combinator library.
+`pandoc` is a [Haskell] library for converting files from one markup
+format to another, and a command-line tool that uses this library. It can
+read [markdown] and (subsets of) [reStructuredText], [HTML], and [LaTeX],
+and it can write [markdown], [reStructuredText], [HTML], [LaTeX], [RTF],
+and [S5] HTML slide shows. `pandoc`'s version of markdown contains some
+enhancements, like footnotes and embedded LaTeX.
+
+In contrast to existing tools for converting markdown to HTML, which
+use regex substitutions, `pandoc` has a modular design: it consists of a
+set of readers, which parse text in a given format and produce a native
+representation of the document, and a set of writers, which convert
+this native representation into a target format. Thus, adding an input
+or output format requires only adding a reader or writer.
 
 [markdown]: http://daringfireball.net/projects/markdown/
 [reStructuredText]: http://docutils.sourceforge.net/docs/ref/rst/introduction.html
@@ -15,7 +23,6 @@ read [markdown] and (with some limitations) [reStructuredText], [HTML], and
 [LaTeX]:  http://www.latex-project.org/
 [RTF]:  http://en.wikipedia.org/wiki/Rich_Text_Format
 [Haskell]:  http://www.haskell.org/
-[Parsec]:  http://www.cs.uu.nl/~daan/download/parsec/parsec.html
 
 (c) 2006 John MacFarlane (jgm At berkeley.edu). Released under the
 [GPL], version 2 or greater.  This software carries no warranty of
@@ -27,7 +34,7 @@ any kind.  (See LICENSE for full copyright and warranty notices.)
 
 ## Installing GHC
 
-To compile `pandoc`, you'll need [GHC] version 6.4 or greater.  
+To compile `pandoc`, you'll need [GHC] version 6.4 or greater. 
 
 If you don't have GHC already, you can get it from the 
 [GHC Download] page.
@@ -35,64 +42,57 @@ If you don't have GHC already, you can get it from the
 [GHC]: http://www.haskell.org/ghc/
 [GHC Download]: http://www.haskell.org/ghc/download.html
 
-Note:  As of this writing, there's no MacOS X installer package for
-GHC 6.4.2 (the latest version).  There is an installer for
-GHC 6.4.1 [here](http://www.haskell.org/ghc/download_ghc_641.html#macosx).
-It will work just fine on PPC-based Macs.  GHC has not yet been ported
-to Intel Macs:  see <http://hackage.haskell.org/trac/ghc/wiki/X86OSXGhc>.
-
-You'll also need standard build tools: GNU Make, sed, bash, and perl.
+You'll also need standard build tools: GNU `make`, `sed`, `bash`, and `perl`.
 These are standard on unix systems (including MacOS X).  If you're
 using Windows, you can install [Cygwin].
 
 [Cygwin]: http://www.cygwin.com/
 
-Note:  I have tested `pandoc` on MacOS X and Linux systems.  I have not
-tried it on Windows, and I have no idea whether it will work on Windows.
-  
 ## Installing `pandoc`
 
 1.  Change to the directory containing the `pandoc` distribution.
 
 2.  Compile:
 
-            make
+        make
 
-3.  Optional, but recommended:
+3.  See if it worked (optional, but recommended): 
 
-            make test
+        make test
 
-4.  If you want to install the `pandoc` program and the relevant wrappers 
-    and documents (including this file) into `/usr/local` directory, type:
-            
-            make install
-    
-    If you only want the `pandoc` program and the shell scripts `latex2markdown`,
-    `markdown2latex`, `markdown2pdf`, `markdown2html`, `html2markdown` installed
-    into your `~/bin` directory, type (note the **`-exec`** suffix):
+4.  Install:
 
-            PREFIX=~ make install-exec
+        make install
 
-5.  If you want to install the Pandoc library modules for use in 
-    other Haskell programs, type (as root):
+    Note:  This installs `pandoc`, together with its wrappers and
+    documentation, into the `/usr/local` directory, which requires root
+    privileges.  If you don't have root privileges or would prefer to
+    install `pandoc` and the associated shell scripts into your `~/bin`
+    directory, type this instead:
 
-            make install-lib
-   
-6.  To install the library documentation (into `/usr/local/pandoc-doc`), 
-    type:
+        PREFIX=~ make install-exec
 
-            make install-lib-doc
- 
-# Using `pandoc`
+5.  Install Haskell libraries (optional):
+
+        make install-lib
+
+6.  Install library documentation into `/usr/local/pandoc-doc` (optional):
+
+        make install-lib-doc
+
+## Removing `pandoc`
 
-You can run `pandoc` like this:
+Each of the installation steps described above can be reversed:
 
-    ./pandoc
+    make uninstall
 
-If you copy the `pandoc` executable to a directory in your path
-(perhaps using `make install`), you can invoke it without the "./":
+    PREFIX=~ make uninstall-exec
 
-    pandoc
+    make uninstall-lib
+
+    make uninstall-lib-doc
+ 
+# Using `pandoc`
 
 If you run `pandoc` without arguments, it will accept input from
 STDIN.  If you run it with file names as arguments, it will take input
@@ -104,29 +104,34 @@ list, type
 The most important options specify the format of the source file and
 the output.  The default reader is markdown; the default writer is
 HTML.  So if you don't specify a reader or writer, `pandoc` will
-convert markdown to HTML.  To convert markdown to LaTeX, you could
-write:
+convert markdown to HTML.  For example,
 
-    pandoc -w latex input.txt
+    pandoc hello.txt
+
+will convert `hello.txt` from markdown to HTML.  For other conversions,
+you must specify a reader and/or a writer using the `-r` and `-w`
+flags.  To convert markdown to LaTeX, you would write:
+
+    pandoc -w latex hello.txt
 
 To convert html to markdown:
 
-    pandoc -r html -w markdown input.txt
+    pandoc -r html -w markdown hello.txt
 
-Supported writers include markdown, LaTeX, HTML, RTF,
-reStructuredText, and S5 (which produces an HTML file that acts like
-powerpoint).  Supported readers include markdown, HTML, LaTeX, and
-reStructuredText.  Note that the rst (reStructuredText) reader only
-parses a subset of rst syntax.  For example, it doesn't handle tables,
-definition lists, option lists, or footnotes.  It handles only the
-constructs expressible in unextended markdown.  But for simple
-documents it should be adequate.  The LaTeX and HTML readers are also
-limited in what they can do.  
+Supported writers include `markdown`, `latex`, `html`, `rtf` (rich text
+format), `rst` (reStructuredText), and `s5` (which produces an HTML
+file that acts like powerpoint).  Supported readers include `markdown`,
+`html`, `latex`, and `rst`.  Note that the `rst` reader only parses
+a subset of reStructuredText syntax.  For example, it doesn't handle
+tables, definition lists, option lists, or footnotes.  It handles only the
+constructs expressible in unextended markdown.  But for simple documents
+it should be adequate.  The `latex` and `html` readers are also limited
+in what they can do.
 
 `pandoc` writes its output to STDOUT.  If you want to write to a file,
 use redirection:
 
-	pandoc input.txt > output.html
+	pandoc hello.txt > hello.html
 
 Note that you can specify multiple input files on the command line.
 `pandoc` will concatenate them all (with blank lines between them)
@@ -134,14 +139,18 @@ before parsing:
 
 	pandoc -s chapter1.txt chapter2.txt chapter3.txt references.txt > book.html
 
+(The `-s` option here tells `pandoc` to produce a standalone HTML file,
+with a proper header, rather than a fragment.  For more details on this
+and many other command-line options, see below.)
+
 ## Character encoding
 
-Unfortunately, due to limitations in GHC, `pandoc` does not
-automatically detect the system's local character encoding.  Hence,
-all input and output is assumed to be in the UTF-8 encoding.  If you
-use accented or foreign characters, you should convert the input file
-to UTF-8 before processing it with `pandoc`.  This can be done by
-piping the input through [`iconv`]: for example,
+Unfortunately, due to limitations in GHC, `pandoc` does not automatically
+detect the system's local character encoding.  Hence, all input and
+output is assumed to be in the UTF-8 encoding.  If you use accented or
+foreign characters, you should convert the input file to UTF-8 before
+processing it with `pandoc`.  This can be done by piping the input through
+[`iconv`]: for example,
 
 	iconv -t utf-8 source.txt | pandoc > output.html
 
@@ -158,18 +167,18 @@ from the local encoding to UTF-8 before running them through `pandoc`.
 For convenience, five shell scripts have been included that make it
 easy to run `pandoc` without remembering all the command-line options.
 All of the scripts presuppose that `pandoc` is in the path, and
-`html2markdown` also presupposes that `curl` and `tidy` are in the
-path.
+some have additional requirements.  (For example, `html2markdown`
+uses `tidy`, and `markdown2pdf` uses `pdflatex`.)
 
 1.  `markdown2html` converts markdown to HTML, running `iconv` first to
 	convert the file to UTF-8.  (This can be used as a replacement for
 	`Markdown.pl`.)
 
 2.	`html2markdown` can take either a filename or a URL as argument.  If
-	it is given a URL, it uses `curl` to fetch the contents of the
-	specified URL, then filters this through `tidy` to straighten up the
-	HTML and convert to UTF-8, and finally passes this HTML to `pandoc` to
-	produce markdown text:
+	it is given a URL, it uses `curl`, `wget`, or an available text-based
+    browser to fetch the contents of the specified URL, then filters this
+	through `tidy` to straighten up the HTML and convert to UTF-8,
+	and finally passes this HTML to `pandoc` to produce markdown text:
 
 	    html2markdown http://www.fsf.org
 
@@ -185,24 +194,23 @@ path.
 
 	    markdown2latex mytextfile.txt
 
-5.	`markdown2pdf` converts markdown to PDF, using LaTeX, but removing
-	all the intermediate files created by LaTeX.  Example:
+5.	`markdown2pdf` converts markdown to PDF using `pdflatex`.  Example:
 
 	    markdown2pdf mytextfile.txt
 
-	creates a file `mytextfile.pdf` in the working directory.
+	creates a file `mytextfile.pdf`.
 
 # Command-line options
 
-Various command-line options can be used to customize the output.  
+Various command-line options can be used to customize the output.
 For a complete list, type 
 
-    pandoc --help  
+    pandoc --help
 
 `-p` or `--preserve-tabs` causes tabs in the source text to be
 preserved, rather than converted to spaces (the default).
 
-`--tabstop` allows the user to set the tab stop (which defaults to 4).  
+`--tabstop` allows the user to set the tab stop (which defaults to 4).
 
 `-R` or `--parse-raw` causes the HTML and LaTeX readers to parse HTML
 codes and LaTeX environments that it can't translate as raw HTML or
@@ -258,7 +266,7 @@ not work in all browsers, but it works in Firefox.  Peter Jipsen's
 
 `-i` or `--incremental` causes all lists in S5 output to be displayed
 incrementally by default (one item at a time).  The normal default
-is for lists to be displayed all at once.  
+is for lists to be displayed all at once.
 
 `-N` or `--number-sections` causes sections to be numbered in LaTeX
 output.  By default, sections are not numbered.
@@ -267,7 +275,7 @@ output.  By default, sections are not numbered.
 
 In parsing markdown, `pandoc` departs from and extends [standard markdown]
 in a few respects.  (To run `pandoc` on the official
-markdown test suite, type `make markdown_tests`.)
+markdown test suite, type `make test-markdown`.)
 
 [standard markdown]:  http://daringfireball.net/projects/markdown/syntax
 
@@ -328,7 +336,7 @@ appear as `[link]` if there's no reference for `link`.  If you want
     except in embedded contexts like block quotes or lists.	
 
 	^(longnote) Here's the other note.  This one contains multiple
-	blocks.  
+	blocks.
 	^
 	^ Caret characters are used to indicate that the blocks all belong
     to a single footnote (as with block quotes).
@@ -363,7 +371,7 @@ into
         </tr>
     </table>
 
-whereas Markdown 1.0 will preserve it as is.  
+whereas Markdown 1.0 will preserve it as is.
 
 There is one exception to this rule:  text between `<script>` and
 `</script>` tags is not interpreted as markdown.
@@ -468,7 +476,7 @@ Producing an [S5] slide show with `pandoc` is easy.  A title page is
 constructed automatically from the document's title block (see above).
 Each section (with a level-one header) produces a single slide.  (Note
 that if the section is too big, the slide will not fit on the page; S5
-is not smart enough to produce multiple pages.)  
+is not smart enough to produce multiple pages.)
 
 Here's the markdown source for a simple slide show, `eating.txt`:
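
Note on the reader/writer design described in the new README introduction: the following is a minimal sketch in Python, not pandoc's actual Haskell code. It only illustrates the idea that readers parse into a shared native representation and writers render from it, so a new format needs only one new function. Every name here (`Para`, `read_plain`, `write_html`, `write_latex`, `READERS`, `WRITERS`, `convert`) is hypothetical, and the "plain" input format is a stand-in for a real markup parser.

```python
from dataclasses import dataclass
from html import escape
from typing import Callable, Dict, List


@dataclass
class Para:
    """One paragraph of text in the toy native representation."""
    text: str


# The shared intermediate form that every reader produces
# and every writer consumes.
Document = List[Para]


def read_plain(source: str) -> Document:
    """Toy reader: blank-line-separated chunks become paragraphs."""
    chunks = [chunk.strip() for chunk in source.split("\n\n")]
    # Collapse internal line breaks and runs of spaces to single spaces.
    return [Para(" ".join(chunk.split())) for chunk in chunks if chunk]


def write_html(doc: Document) -> str:
    """Toy writer: one <p> element per paragraph, HTML-escaped."""
    return "\n".join(f"<p>{escape(p.text)}</p>" for p in doc)


def write_latex(doc: Document) -> str:
    """Toy writer: paragraphs separated by blank lines (no escaping)."""
    return "\n\n".join(p.text for p in doc)


# Supporting another format means adding one entry to one of these tables.
READERS: Dict[str, Callable[[str], Document]] = {"plain": read_plain}
WRITERS: Dict[str, Callable[[Document], str]] = {
    "html": write_html,
    "latex": write_latex,
}


def convert(source: str, reader: str = "plain", writer: str = "html") -> str:
    """Parse with the named reader, then render with the named writer."""
    return WRITERS[writer](READERS[reader](source))
```

Calling `convert(text, writer="latex")` reuses the same parsed document with a different writer, which is the property the README contrasts with regex-substitution converters.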