.TH HTML2MARKDOWN 1 "December 15, 2006" Pandoc "User Manuals"
.SH NAME
html2markdown \- converts HTML to markdown-formatted text
.SH SYNOPSIS
\fBhtml2markdown\fR [\fIoptions\fR] [\fIinput\-file\fR or \fIURL\fR]
.SH DESCRIPTION
\fBhtml2markdown\fR converts \fIinput\-file\fR or \fIURL\fR (or text
from STDIN) from HTML to markdown\-formatted plain text. 
If a URL is specified, \fBhtml2markdown\fR uses an available program
(e.g. wget, w3m, lynx or curl) to fetch its contents.  Output is sent
to STDOUT unless an output file is specified using the \fB\-o\fR
option.
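.PP
As a minimal illustration, assuming a hypothetical local file
\fIpage.html\fR, the following reads HTML from STDIN and sends the
markdown to STDOUT, here redirected by the shell to \fIpage.txt\fR:
.IP
html2markdown < page.html > page.txt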
.PP
\fBhtml2markdown\fR uses the character encoding specified in the
"Content-type" meta tag.  If this is not present, or if input comes
from STDIN, UTF-8 is assumed.  A character encoding may be specified
explicitly using the \fB\-e\fR option.
.PP
\fBhtml2markdown\fR is a wrapper for \fBpandoc\fR.
.SH OPTIONS
.TP
.B \-s, \-\-standalone
Include title, author, and date information (if present) at the
top of markdown output.
.TP
.B \-o \fIFILE\fB, \-\-output=\fIFILE\fB
Write output to \fIFILE\fR instead of STDOUT.
.TP
.B \-p, \-\-preserve\-tabs
Preserve tabs instead of converting them to spaces.
.TP
.B \-\-tab\-stop=\fITABSTOP\fB
Specify tab stop (default is 4).
.TP
.B \-R, \-\-parse\-raw
Parse untranslatable HTML codes as raw HTML.
.TP
.B \-H \fIFILE\fB, \-\-include\-in\-header=\fIFILE\fB
Include contents of \fIFILE\fR at the end of the header.  Implies
\fB\-s\fR.
.TP
.B \-B \fIFILE\fB, \-\-include\-before\-body=\fIFILE\fB
Include contents of \fIFILE\fR at the beginning of the document body.
.TP
.B \-A \fIFILE\fB, \-\-include\-after\-body=\fIFILE\fB
Include contents of \fIFILE\fR at the end of the document body.
.TP
.B \-C \fIFILE\fB, \-\-custom\-header=\fIFILE\fB
Use contents of \fIFILE\fR
as the document header (overriding the default header, which can be
printed using '\fBpandoc \-D markdown\fR').  Implies
\fB\-s\fR.
.TP
.B \-v, \-\-version
Print version.
.TP
.B \-h, \-\-help
Show usage message.
.TP
.B \-e \fIencoding\fR
Assume the character encoding \fIencoding\fR when reading HTML.
(Note: \fIencoding\fR will be passed to \fBiconv\fR; a list of
available encodings may be obtained using `\fBiconv \-l\fR'.)
If the \fB\-e\fR option is not specified and input is not from
STDIN, \fBhtml2markdown\fR will try to extract the character encoding
from the "Content-type" meta tag.  If no character encoding is
specified in this way, or if input is from STDIN, UTF-8 will be
assumed.
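.IP
For example, assuming a hypothetical file \fIpage.html\fR that is
encoded in ISO\-8859\-1 (the exact encoding names accepted depend on
the local \fBiconv\fR):
.IP
html2markdown \-e ISO\-8859\-1 \-o page.txt page.html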
.TP
.B \-g \fIcommand\fR
Use \fIcommand\fR to fetch the contents of a URL.  (By default,
\fBhtml2markdown\fR searches for an available program or text-based
browser to fetch the contents of a URL.)  For example:
.IP
html2markdown \-g 'wget \-\-user=foo \-\-password=bar' mysite.com

.SH "SEE ALSO"
\fBpandoc\fR(1),
\fBiconv\fR(1)
.SH AUTHOR
John MacFarlane and Recai Oktas