-rw-r--r-- README 30
-rw-r--r-- debian/changelog 3
-rw-r--r-- man/man1/html2markdown.1 41
-rw-r--r-- man/man1/markdown2pdf.1 19
4 files changed, 45 insertions, 48 deletions
diff --git a/README b/README
index de1efc1bc..f95a93758 100644
--- a/README
+++ b/README
@@ -176,20 +176,32 @@ may be used in Windows under Cygwin.)
markdown2pdf -o "My Book.pdf" chap1.txt chap2.txt chap3.txt
If no input file is specified, input will be taken from STDIN.
+ All of `pandoc`'s options will work with `markdown2pdf` as well.
2. `html2markdown` grabs a web page from a file or URL and converts
it to markdown-formatted text, using `tidy` and `pandoc`.
- Unless input is from STDIN, an attempt is made to determine the
- character encoding of the page from the "Content-type" meta tag.
- If this is not present, UTF-8 is assumed. Alternatively, a character
- encoding may be specified explicitly using the `-e` option.
- `html2markdown` searches for an available program (`wget`, `curl`,
- or a text-mode browser) to fetch the contents of a URL.
- Optionally, the `-g` command may be used to specify the command
- to be used:
+ All of `pandoc`'s options will work with `html2markdown` as well.
+ In addition, the following special options may be used.
+ The special options must be separated from the `html2markdown`
+ command and any regular Pandoc options by the delimiter `--`:
- html2markdown -g 'wget --user=foo --password=bar' mysite.com
+ html2markdown -o out.txt -- -e latin1 -g curl google.com
+
+ The `-e` or `--encoding` option specifies the character encoding
+ of the HTML input. If this option is not specified, and input
+ is not from STDIN, `html2markdown` will attempt to determine the
+ page's character encoding from the "Content-type" meta tag.
+ If this is not present, UTF-8 is assumed.
+
+ The `-g` or `--grabber` option specifies the command to be used to
+ fetch the contents of a URL:
+
+ html2markdown -g 'curl --user foo:bar' www.mysite.com
+
+ If this option is not specified, `html2markdown` searches for an
+ available program (`wget`, `curl`, or a text-mode browser) to fetch
+ the contents of a URL.
3. `hsmarkdown` is designed to be used as a drop-in replacement for
`Markdown.pl`. It forces `pandoc` to convert from markdown to
diff --git a/debian/changelog b/debian/changelog
index a06c40579..8ce9acc47 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -210,9 +210,6 @@ pandoc (0.3) unstable; urgency=low
+ getopts shell builtin is used for portable option parsing.
+ Improved html2markdown's web grabber code, making it more robust,
configurable and verbose. Added '-e', '-g' options.
- Possible use case:
- # Use wget by setting timeout to 10 seconds and limit retries to 2.
- html2markdown -g 'wget --timeout=10 --tries=2'
-- Recai Oktaş <roktas@debian.org> Fri, 05 Jan 2007 09:41:19 +0200
diff --git a/man/man1/html2markdown.1 b/man/man1/html2markdown.1
index 542d26852..78c27808e 100644
--- a/man/man1/html2markdown.1
+++ b/man/man1/html2markdown.1
@@ -2,7 +2,8 @@
.SH NAME
html2markdown \- converts HTML to markdown-formatted text
.SH SYNOPSIS
-\fBhtml2markdown\fR [\fIoptions\fR] [\fIinput\-file\fR or \fIURL\fR]
+\fBhtml2markdown\fR [\fIpandoc\-options\fR]
+[\-\- \fIspecial\-options\fR] [\fIinput\-file\fR or \fIURL\fR]
.SH DESCRIPTION
\fBhtml2markdown\fR converts \fIinput\-file\fR or \fIURL\fR (or text
from STDIN) from HTML to markdown\-formatted plain text.
@@ -14,10 +15,12 @@ option.
\fBhtml2markdown\fR uses the character encoding specified in the
"Content-type" meta tag. If this is not present, or if input comes
from STDIN, UTF-8 is assumed. A character encoding may be specified
-explicitly using the \fB\-e\fR option.
-.PP
-\fBhtml2markdown\fR is a wrapper for \fBpandoc\fR.
+explicitly using the \fB\-e\fR special option.
.SH OPTIONS
+.PP
+\fBhtml2markdown\fR is a wrapper for \fBpandoc\fR, so all of
+\fBpandoc\fR's options may be used. See \fBpandoc\fR(1) for
+a complete list. The following options are most relevant:
.TP
.B \-s, \-\-standalone
Include title, author, and date information (if present) at the
@@ -26,12 +29,6 @@ top of markdown output.
.B \-o FILE, \-\-output=FILE
Write output to \fIFILE\fR instead of STDOUT.
.TP
-.B \-p, \-\-preserve-tabs
-Preserve tabs instead of converting them to spaces.
-.TP
-.B \-\-tab-stop=\fITABSTOP\fB
-Specify tab stop (default is 4).
-.TP
.B \-\-strict
Use strict markdown syntax, with no extensions or variants.
.TP
@@ -54,29 +51,29 @@ Use contents of \fIFILE\fR
as the document header (overriding the default header, which can be
printed using '\fBpandoc \-D markdown\fR'). Implies
\fB-s\fR.
+.SH "SPECIAL OPTIONS"
+.PP
+In addition, the following special options may be used. The special
+options must be separated from the \fBhtml2markdown\fR command and any
+regular \fBpandoc\fR options by the delimiter `\-\-', as in
+.IP
+.B html2markdown \-o foo.txt \-\- \-g 'curl \-u bar:baz' \-e latin1
+.B www.foo.com
.TP
-.B \-v, \-\-version
-Print version.
-.TP
-.B \-h, \-\-help
-Show usage message.
-.TP
-.B \-e \fIencoding\fR
+.B \-e \fIencoding\fR, \-\-encoding=\fIencoding\fR
Assume the character encoding \fIencoding\fR in reading HTML.
(Note: \fIencoding\fR will be passed to \fBiconv\fR; a list of
available encodings may be obtained using `\fBiconv \-l\fR'.)
-If the \fB\-e\fR option is not specified and input is not from
+If this option is not specified and input is not from
STDIN, \fBhtml2markdown\fR will try to extract the character encoding
from the "Content-type" meta tag. If no character encoding is
specified in this way, or if input is from STDIN, UTF-8 will be
assumed.
.TP
-.B \-g \fIcommand\fR
+.B \-g \fIcommand\fR, \-\-grabber=\fIcommand\fR
Use \fIcommand\fR to fetch the contents of a URL. (By default,
\fBhtml2markdown\fR searches for an available program or text-based
-browser to fetch the contents of a URL.) For example:
-.IP
-html2markdown \-g 'wget \-\-user=foo \-\-password=bar' mysite.com
+browser to fetch the contents of a URL.)
.SH "SEE ALSO"
\fBpandoc\fR(1),
diff --git a/man/man1/markdown2pdf.1 b/man/man1/markdown2pdf.1
index 4524c0ac2..3162742bb 100644
--- a/man/man1/markdown2pdf.1
+++ b/man/man1/markdown2pdf.1
@@ -23,19 +23,16 @@ output through \fBiconv\fR:
\fBmarkdown2pdf\fR assumes that the 'unicode' and 'fancyvrb' packages
are in latex's search path. If these packages are not included in your
latex setup, they can be obtained from <http://ctan.org>.
-.PP
-\fBmarkdown2pdf\fR is a wrapper around \fBpandoc\fR.
.SH OPTIONS
+.PP
+\fBmarkdown2pdf\fR is a wrapper around \fBpandoc\fR, so all of
+\fBpandoc\fR's options can be used with \fBmarkdown2pdf\fR as well.
+See \fBpandoc\fR(1) for a complete list.
+The following options are most relevant:
.TP
.B \-o FILE, \-\-output=FILE
Write output to \fIFILE\fR.
.TP
-.B \-p, \-\-preserve-tabs
-Preserve tabs instead of converting them to spaces.
-.TP
-.B \-\-tab-stop=\fITABSTOP\fB
-Specify tab stop (default is 4).
-.TP
.B \-\-strict
Use strict markdown syntax, with no extensions or variants.
.TP
@@ -57,12 +54,6 @@ Include (LaTeX) contents of \fIFILE\fR at the end of the document body.
Use contents of \fIFILE\fR
as the LaTeX document header (overriding the default header, which can be
printed using '\fBpandoc \-D latex\fR'). Implies \fB-s\fR.
-.TP
-.B \-v, \-\-version
-Print version.
-.TP
-.B \-h, \-\-help
-Show usage message.
.SH "SEE ALSO"
\fBpandoc\fR(1),
\fBpdflatex\fR(1)
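
The `--` delimiter convention introduced in the README and `html2markdown.1` hunks above can be illustrated with a short shell sketch. This is not the code from the `html2markdown` script itself, only a minimal illustration of the idea, assuming that everything before the first `--` is a regular `pandoc` option and everything after it is a special option or the input. The function name `split_at_delimiter` and the `pandoc:` / `special:` labels are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of html2markdown): label each argument as
# belonging to the regular pandoc options (before the first `--`) or to the
# special options and input (after it).
split_at_delimiter() {
    section=pandoc
    for arg in "$@"; do
        # Only the first `--` acts as the delimiter; it is not printed.
        if [ "$section" = pandoc ] && [ "$arg" = "--" ]; then
            section=special
            continue
        fi
        printf '%s: %s\n' "$section" "$arg"
    done
}

# Same argument list as the README example in this commit.
split_at_delimiter -o out.txt -- -e latin1 -g curl google.com
```

With the README's example arguments, `-o` and `out.txt` are labelled `pandoc`, and `-e`, `latin1`, `-g`, `curl`, and `google.com` are labelled `special`. A real wrapper would still need to parse the special options (for instance with `getopts`, which the changelog mentions) and forward the rest to `pandoc`.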