author fiddlosopher <fiddlosopher@788f1e2b-df1e-0410-8736-df70ead52e1b> 2008-01-03 21:32:32 +0000
committer fiddlosopher <fiddlosopher@788f1e2b-df1e-0410-8736-df70ead52e1b> 2008-01-03 21:32:32 +0000
commit 5df912b162575cb9daf6702bb7f2c2a5858c0b00
tree b7e165f47e19839fe30ddcd8250f4fb2e89e4ebb /man
parent a505f70f0b33b7fa52ad4d8df77ebff090b36d01
Added optional HTML sanitization using a whitelist.
When this option is specified (--sanitize-html on the command line),
unsafe HTML tags will be replaced by HTML comments, and unsafe HTML
attributes will be removed. This option should be especially useful
for those who want to use pandoc libraries in web applications, where
users will provide the input.

+ Main.hs: Added --sanitize-html option.
+ Text.Pandoc.Shared: Added stateSanitizeHTML to ParserState.
+ Text.Pandoc.Readers.HTML:
  - Added whitelists of sanitaryTags and sanitaryAttributes.
  - Added parsers to check these lists (and state) to see if a given
    tag or attribute should be counted unsafe.
  - Modified anyHtmlTag and anyHtmlEndTag to replace unsafe tags with
    comments.
  - Modified htmlAttribute to remove unsafe attributes.
  - Modified htmlScript and htmlStyle to remove these elements if
    unsafe.
  - Modified rawHtmlBlock to use anyHtmlBlockTag instead of anyHtmlTag
    and anyHtmlEndTag. This fixes a bug in markdown parsing, where
    inline tags would be included in raw HTML blocks.
  - Modified anyHtmlBlockTag to test for (not inline) rather than
    directly for block. This allows us to handle e.g. docbook in the
    markdown reader.
  - Minor tweaks in nonTitleNonHead and parseTitle.
+ Text.Pandoc.Readers.Markdown:
  - In non-strict mode use rawHtmlBlocks instead of htmlBlock.
    Simplified htmlBlock, since we know it's only called in strict
    mode.
+ Modified README and man pages to document new option.

git-svn-id: https://pandoc.googlecode.com/svn/trunk@1166 788f1e2b-df1e-0410-8736-df70ead52e1b
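
For readers who want a concrete picture of the behaviour described in this message, here is a minimal sketch of whitelist sanitization in Python. It is not pandoc's Haskell implementation. The names `SANITARY_TAGS` and `SANITARY_ATTRIBUTES` only echo `sanitaryTags` and `sanitaryAttributes` from the message; their contents, the placeholder comment text, and the `sanitize_html` helper are all assumptions made for illustration.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelists: the real lists in Text.Pandoc.Readers.HTML differ.
SANITARY_TAGS = {"a", "b", "em", "li", "p", "strong", "ul"}
SANITARY_ATTRIBUTES = {"href", "title"}
# Assumed wording for the comment that stands in for an unsafe tag.
PLACEHOLDER = "<!-- unsafe HTML removed -->"
# Elements whose content is dropped along with the tags when unsafe.
DROP_WITH_CONTENT = {"script", "style"}


class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skipping = False  # True while inside an unsafe script/style

    def handle_starttag(self, tag, attrs):
        if tag not in SANITARY_TAGS:
            # Unsafe tag: emit a comment in its place.
            self.out.append(PLACEHOLDER)
            self.skipping = tag in DROP_WITH_CONTENT
            return
        kept = []
        for name, value in attrs:
            if name in SANITARY_ATTRIBUTES:  # unsafe attributes are omitted
                kept.append(' {}="{}"'.format(name, escape(value or "", quote=True)))
        self.out.append("<{}{}>".format(tag, "".join(kept)))

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT and self.skipping:
            self.skipping = False  # element already replaced at its start tag
        elif tag in SANITARY_TAGS:
            self.out.append("</{}>".format(tag))
        else:
            self.out.append(PLACEHOLDER)

    def handle_data(self, data):
        if not self.skipping:
            self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&{};".format(name))

    def handle_charref(self, name):
        self.out.append("&#{};".format(name))


def sanitize_html(text):
    """Return text with non-whitelisted tags and attributes neutralized."""
    parser = WhitelistSanitizer()
    parser.feed(text)
    parser.close()
    return "".join(parser.out)
```

Calling `sanitize_html('<p onclick="x()" title="t">hi</p>')` keeps the `p` element and its `title` attribute but drops `onclick`. The sketch does not inspect attribute values (for example `javascript:` URLs in `href`), so it should not be treated as a complete defence for untrusted input.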
Diffstat (limited to 'man')
-rw-r--r-- man/man1/html2markdown.1.md | 4
-rw-r--r-- man/man1/pandoc.1.md | 5
2 files changed, 9 insertions, 0 deletions
diff --git a/man/man1/html2markdown.1.md b/man/man1/html2markdown.1.md
index 6c5d2dcc8..19d5104af 100644
--- a/man/man1/html2markdown.1.md
+++ b/man/man1/html2markdown.1.md
@@ -51,6 +51,10 @@ a complete list. The following options are most relevant:
\--no-wrap
: Disable text wrapping in output. (Default is to wrap text.)
+\--sanitize-html
+: Sanitizes HTML using a whitelist. Unsafe tags are replaced by HTML
+ comments; unsafe attributes are omitted.
+
-H *FILE*, \--include-in-header=*FILE*
: Include contents of *FILE* at the end of the header. Implies
`-s`.
diff --git a/man/man1/pandoc.1.md b/man/man1/pandoc.1.md
index 37d3dc262..427004419 100644
--- a/man/man1/pandoc.1.md
+++ b/man/man1/pandoc.1.md
@@ -126,6 +126,11 @@ to Pandoc. Or use `html2markdown`(1), a wrapper around `pandoc`.
\--no-wrap
: Disable text wrapping in output. (Default is to wrap text.)
+\--sanitize-html
+: Sanitizes HTML (in markdown or HTML input) using a whitelist.
+ Unsafe tags are replaced by HTML comments; unsafe attributes
+ are omitted.
+
\--toc, \--table-of-contents
: Include an automatically generated table of contents (HTML, markdown,
RTF) or an instruction to create one (LaTeX, reStructuredText).