author fiddlosopher <fiddlosopher@788f1e2b-df1e-0410-8736-df70ead52e1b> 2010-02-02 07:37:01 +0000
committer fiddlosopher <fiddlosopher@788f1e2b-df1e-0410-8736-df70ead52e1b> 2010-02-02 07:37:01 +0000
commit 9fee73d2a335e7ea8dbbfc149cfa4be580afbdca
parent 19b0c72dd18050a00dd77bb3dfddd0d0702d157f
Allow absolute URI as parameter (in this case, content is downloaded).
+ Adds dependency on HTTP.
+ If a parameter is an absolute URI, pandoc will try to get the content via HTTP.
+ So, you can do: pandoc -r html -w markdown
git-svn-id: 788f1e2b-df1e-0410-8736-df70ead52e1b
4 files changed, 28 insertions, 7 deletions
diff --git a/README b/README
index f06bdbb6a..34f3e455a 100644
--- a/README
+++ b/README
@@ -62,12 +62,17 @@ Note that you can specify multiple input files on the command line.
`pandoc` will concatenate them all (with blank lines between them)
before parsing:
- pandoc -s ch1.txt ch2.txt refs.txt > book.html
+ pandoc -s ch1.txt ch2.txt refs.txt > book.html
(The `-s` option here tells `pandoc` to produce a standalone HTML file,
with a proper header, rather than a fragment. For more details on this
and many other command-line options, see below.)
+Instead of a filename, you can specify an absolute URI. In this
+case pandoc will attempt to download the content via HTTP:
+ pandoc -f html -t markdown
The format of the input and output can be specified explicitly using
command-line options. The input format can be specified using the
`-r/--read` or `-f/--from` options, the output format using the
@@ -113,7 +118,9 @@ Character encodings
All input is assumed to be in the UTF-8 encoding, and all output
-is in UTF-8. If your local character encoding is not UTF-8 and you use
+is in UTF-8 (unless your version of pandoc was compiled using
+GHC 6.12 or higher, in which case the local encoding will be used).
+If your local character encoding is not UTF-8 and you use
accented or foreign characters, you should pipe the input and output
through [`iconv`]. For example,
diff --git a/man/man1/ b/man/man1/
index 6ade178b7..4c6be3faf 100644
--- a/man/man1/
+++ b/man/man1/
@@ -26,6 +26,11 @@ format). For output to a file, use the `-o` option:
pandoc -o output.html input.txt
+Instead of a file, an absolute URI may be given. In this case
+pandoc will fetch the content using HTTP:
+ pandoc -f html -t markdown
The input and output formats may be specified using command-line options
(see **OPTIONS**, below, for details). If these formats are not
specified explicitly, Pandoc will attempt to determine them
@@ -48,9 +53,10 @@ markdown: the differences are described in the *README* file in
the user documentation. If standard markdown syntax is desired, the
`--strict` option may be used.
-Pandoc uses the UTF-8 character encoding for both input and output.
-If your local character encoding is not UTF-8, you should pipe input
-and output through `iconv`:
+Pandoc uses the UTF-8 character encoding for both input and output
+(unless compiled with GHC 6.12 or higher, in which case it uses
+the local encoding). If your local character encoding is not UTF-8, you
+should pipe input and output through `iconv`:
iconv -t utf-8 input.txt | pandoc | iconv -f utf-8
diff --git a/pandoc.cabal b/pandoc.cabal
index eba38ca3d..7cc77e72c 100644
--- a/pandoc.cabal
+++ b/pandoc.cabal
@@ -145,7 +145,8 @@ Library
mtl >= 1.1, network >= 2, filepath >= 1.1,
process >= 1, directory >= 1,
bytestring >= 0.9, zip-archive >=,
- utf8-string >= 0.3, old-time >= 1
+ utf8-string >= 0.3, old-time >= 1,
+ HTTP >= 4000.0
if impl(ghc >= 6.10)
Build-depends: base >= 4 && < 5, syb
diff --git a/src/pandoc.hs b/src/pandoc.hs
index f64f218fe..9b05f054c 100644
--- a/src/pandoc.hs
+++ b/src/pandoc.hs
@@ -59,6 +59,9 @@ import Text.CSL
import Text.Pandoc.Biblio
import Control.Monad (when, unless)
+import Network.HTTP
+import Network.URI (parseURI)
+import Data.ByteString.Lazy.UTF8 (toString)
copyrightMessage :: String
copyrightMessage = "\nCopyright (C) 2006-8 John MacFarlane\n" ++
@@ -731,7 +734,11 @@ main = do
let readSources [] = mapM readSource ["-"]
readSources srcs = mapM readSource srcs
readSource "-" = getContents
- readSource src = readFile src
+ readSource src = case parseURI src of
+ Just u -> readURI u
+ Nothing -> readFile src
+ readURI uri = simpleHTTP (mkRequest GET uri) >>= getResponseBody >>=
+ return . toString -- treat all as UTF8
let convertTabs = tabFilter (if preserveTabs then 0 else tabStop)
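
Note on the `readSource` change above: the new code sends every argument that `parseURI` accepts down the HTTP path, so the decision between "download" and "read a local file" rests entirely on whether the string parses as an absolute URI. The sketch below shows that three-way dispatch (stdin, remote, local file) as a pure function so it can be tried without network access. It is not the code from this commit: the `Source` type and the `classify` helper are hypothetical names, and the test for a remote source is narrowed to a literal `http://` prefix using only `base`, on the assumption that arguments with other schemes (which `parseURI` may also accept) should still be treated as file names.

```haskell
import Data.Char (toLower)
import Data.List (isPrefixOf)

-- | Where an input argument should be read from.
-- Hypothetical type for illustration; the commit itself has no such type.
data Source
  = StdIn               -- the conventional "-" argument
  | Remote String       -- an http:// URL to be fetched
  | LocalFile FilePath  -- anything else is treated as a path
  deriving (Eq, Show)

-- | Decide how to read one command-line argument.
-- Assumption: only a case-insensitive "http://" prefix counts as remote.
-- The commit instead uses Network.URI.parseURI, which is broader.
classify :: String -> Source
classify "-" = StdIn
classify arg
  | "http://" `isPrefixOf` map toLower arg = Remote arg
  | otherwise                              = LocalFile arg

main :: IO ()
main = mapM_ (print . classify)
  [ "-"
  , "ch1.txt"
  , "http://example.com/page.html"  -- placeholder URL, not from the commit
  , "C:/notes.txt"
  ]
```

Keeping the classification separate from the I/O makes the rule easy to test; the actual reading (`getContents`, `readFile`, or an HTTP request) can then pattern-match on the result.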