From 9fee73d2a335e7ea8dbbfc149cfa4be580afbdca Mon Sep 17 00:00:00 2001
From: fiddlosopher
Date: Tue, 2 Feb 2010 07:37:01 +0000
Subject: Allow absolute URI as parameter (in this case, content is downloaded).

+ Adds dependency on HTTP.
+ If a parameter is an absolute URI, pandoc will try to get the content
  via HTTP.
+ So, you can do: pandoc -r html -w markdown http://www.fsf.org

git-svn-id: https://pandoc.googlecode.com/svn/trunk@1826 788f1e2b-df1e-0410-8736-df70ead52e1b
---
 README               | 11 +++++++++--
 man/man1/pandoc.1.md | 12 +++++++++---
 pandoc.cabal         |  3 ++-
 src/pandoc.hs        |  9 ++++++++-
 4 files changed, 28 insertions(+), 7 deletions(-)

diff --git a/README b/README
index f06bdbb6a..34f3e455a 100644
--- a/README
+++ b/README
@@ -62,12 +62,17 @@ Note that you can specify multiple input files on the command line.
 `pandoc` will concatenate them all (with blank lines between them)
 before parsing:
 
-    pandoc -s ch1.txt ch2.txt refs.txt > book.html
+    pandoc -s ch1.txt ch2.txt refs.txt > book.html
 
 (The `-s` option here tells `pandoc` to produce a standalone HTML file,
 with a proper header, rather than a fragment. For more details on this
 and many other command-line options, see below.)
 
+Instead of a filename, you can specify an absolute URI. In this
+case pandoc will attempt to download the content via HTTP:
+
+    pandoc -f html -t markdown http://www.fsf.org
+
 The format of the input and output can be specified explicitly using
 command-line options. The input format can be specified using the
 `-r/--read` or `-f/--from` options, the output format using the
@@ -113,7 +118,9 @@ Character encodings
 -------------------
 
 All input is assumed to be in the UTF-8 encoding, and all output
-is in UTF-8. If your local character encoding is not UTF-8 and you use
+is in UTF-8 (unless your version of pandoc was compiled using
+GHC 6.12 or higher, in which case the local encoding will be used).
+If your local character encoding is not UTF-8 and you use
 accented or foreign characters, you should pipe the input and output
 through [`iconv`]. For example,
 
diff --git a/man/man1/pandoc.1.md b/man/man1/pandoc.1.md
index 6ade178b7..4c6be3faf 100644
--- a/man/man1/pandoc.1.md
+++ b/man/man1/pandoc.1.md
@@ -26,6 +26,11 @@ format). For output to a file, use the `-o` option:
 
     pandoc -o output.html input.txt
 
+Instead of a file, an absolute URI may be given. In this case
+pandoc will fetch the content using HTTP:
+
+    pandoc -f html -t markdown http://www.fsf.org
+
 The input and output formats may be specified using command-line options
 (see **OPTIONS**, below, for details). If these formats are not
 specified explicitly, Pandoc will attempt to determine them
@@ -48,9 +53,10 @@ markdown: the differences are described in the *README* file in
 the user documentation. If standard markdown syntax is desired, the
 `--strict` option may be used.
 
-Pandoc uses the UTF-8 character encoding for both input and output.
-If your local character encoding is not UTF-8, you should pipe input
-and output through `iconv`:
+Pandoc uses the UTF-8 character encoding for both input and output
+(unless compiled with GHC 6.12 or higher, in which case it uses
+the local encoding). If your local character encoding is not UTF-8, you
+should pipe input and output through `iconv`:
 
     iconv -t utf-8 input.txt | pandoc | iconv -f utf-8
 
diff --git a/pandoc.cabal b/pandoc.cabal
index eba38ca3d..7cc77e72c 100644
--- a/pandoc.cabal
+++ b/pandoc.cabal
@@ -145,7 +145,8 @@ Library
                  mtl >= 1.1, network >= 2, filepath >= 1.1,
                  process >= 1, directory >= 1,
                  bytestring >= 0.9, zip-archive >= 0.1.1.4,
-                 utf8-string >= 0.3, old-time >= 1
+                 utf8-string >= 0.3, old-time >= 1,
+                 HTTP >= 4000.0
   if impl(ghc >= 6.10)
     Build-depends: base >= 4 && < 5, syb
   else
diff --git a/src/pandoc.hs b/src/pandoc.hs
index f64f218fe..9b05f054c 100644
--- a/src/pandoc.hs
+++ b/src/pandoc.hs
@@ -59,6 +59,9 @@ import Text.CSL
 import Text.Pandoc.Biblio
 #endif
 import Control.Monad (when, unless)
+import Network.HTTP
+import Network.URI (parseURI)
+import Data.ByteString.Lazy.UTF8 (toString)
 
 copyrightMessage :: String
 copyrightMessage = "\nCopyright (C) 2006-8 John MacFarlane\n" ++
@@ -731,7 +734,11 @@ main = do
   let readSources [] = mapM readSource ["-"]
       readSources srcs = mapM readSource srcs
       readSource "-" = getContents
-      readSource src = readFile src
+      readSource src = case parseURI src of
+                            Just u -> readURI u
+                            Nothing -> readFile src
+      readURI uri = simpleHTTP (mkRequest GET uri) >>= getResponseBody >>=
+                      return . toString -- treat all as UTF8
 
   let convertTabs = tabFilter (if preserveTabs then 0 else tabStop)
 
-- 
cgit v1.2.3