author Johannes 'josch' Schauer <josch@debian.org> 2018-07-20 07:21:40 +0200
committer Johannes 'josch' Schauer <josch@debian.org> 2018-07-20 07:21:40 +0200
commit aa564ac57de87724808ae3c2c6baf92688d181cc
tree ffa2a656b8a0935220ae295a95971e0d97576daf
parent 20da8c12b9524ea6d405cf12df5a2f426b60b4c5

Import upstream version 0.3.0
-rw-r--r--  CHANGES.rst                         108
-rw-r--r--  MANIFEST.in                         3
-rw-r--r--  PKG-INFO                            106
-rw-r--r--  README.md                           98
-rw-r--r--  setup.cfg                           1
-rw-r--r--  setup.py                            28
-rw-r--r--  src/img2pdf.egg-info/PKG-INFO       106
-rw-r--r--  src/img2pdf.egg-info/SOURCES.txt    8
-rw-r--r--  src/img2pdf.egg-info/requires.txt   3
-rwxr-xr-x  src/img2pdf.py                      383
-rw-r--r--  src/tests/__init__.py               231
-rw-r--r--  src/tests/input/CMYK.tif            bin 0 -> 22286 bytes
-rw-r--r--  src/tests/input/animation.gif       bin 0 -> 1930 bytes
-rw-r--r--  src/tests/input/gray.png            bin 0 -> 814 bytes
-rw-r--r--  src/tests/input/mono.tif            bin 0 -> 262 bytes
-rw-r--r--  src/tests/input/normal.png          bin 1130 -> 4992 bytes
-rw-r--r--  src/tests/output/CMYK.jpg.pdf       bin 5560 -> 5558 bytes
-rw-r--r--  src/tests/output/CMYK.tif.pdf       bin 1724 -> 1722 bytes
-rw-r--r--  src/tests/output/animation.gif.pdf  bin 0 -> 6070 bytes
-rw-r--r--  src/tests/output/gray.png.pdf       bin 0 -> 1329 bytes
-rw-r--r--  src/tests/output/mono.png.pdf       bin 915 -> 958 bytes
-rw-r--r--  src/tests/output/mono.tif.pdf       bin 0 -> 921 bytes
-rw-r--r--  src/tests/output/normal.jpg.pdf     bin 3091 -> 3089 bytes
-rw-r--r--  src/tests/output/normal.png.pdf     bin 1573 -> 1670 bytes
24 files changed, 762 insertions, 313 deletions
diff --git a/CHANGES.rst b/CHANGES.rst
new file mode 100644
index 0000000..d4476a8
--- /dev/null
+++ b/CHANGES.rst
@@ -0,0 +1,108 @@
+=======
+CHANGES
+=======
+
+0.3.0
+-----
+
+ - Store non-jpeg images using PNG compression
+ - Support arbitrarily large pages via PDF /UserUnit field
+ - Disallow input with alpha channel as it cannot be preserved
+ - Add option --pillow-limit-break to support very large input
+
+0.2.4
+-----
+
+ - Restore support for Python 2.7
+ - Add support for PyPy
+ - Add support for testing using tox
+
+0.2.3
+-----
+
+ - version number bump for botched pypi upload...
+
+0.2.2
+-----
+
+ - automatic monochrome CCITT Group4 encoding via Pillow/libtiff
+
+0.2.1
+-----
+
+ - set img2pdf as /producer value
+ - support multi-frame images like multipage TIFF and animated GIF
+ - support for palette images like GIF
+ - support all colorspaces and image formats known by PIL
+ - read horizontal and vertical dpi from JPEG2000 files
+
+0.2.0
+-----
+
+ - now Python3 only
+ - pep8 compliant code
+ - update my email to josch@mister-muffin.de
+ - move from github to gitlab.mister-muffin.de/josch/img2pdf
+ - use logging module
+ - add extensive test suite
+ - ability to read from standard input
+ - pdf writer:
+ - make more compatible with the interface of pdfrw module
+ - print floats that equal their integer conversion as integers
+ - do not print trailing zeroes for floating point numbers
+ - print more linebreaks
+ - add binary string at beginning of PDF to indicate that the PDF
+ contains binary data
+ - handle datetime and unicode strings by using utf-16-be encoding
+ - new options (see --help for more details):
+ - --without-pdfrw
+ - --imgsize
+ - --border
+ - --fit
+ - --auto-orient
+ - --viewer-panes
+ - --viewer-initial-page
+ - --viewer-magnification
+ - --viewer-page-layout
+ - --viewer-fit-window
+ - --viewer-center-window
+ - --viewer-fullscreen
+ - remove short options for metadata command line arguments
+ - correctly encode and escape non-ascii metadata
+ - explicitly store date in UTC and allow parsing all date formats understood
+ by dateutil and `date --date`
+
+0.1.5
+-----
+
+- Enable support for CMYK images
+- Rework test suite
+- support file objects as input
+
+0.1.4
+-----
+
+- add Python 3 support
+- make output reproducible by sorting and --nodate option
+
+0.1.3
+-----
+
+- Avoid leaking file descriptors
+- Convert unrecognized colorspaces to RGB
+
+0.1.1
+-----
+
+- allow running src/img2pdf.py standalone
+- license change from GPL to LGPL
+- Add pillow 2.4.0 support
+- add options to specify pdf dimensions in points
+
+0.1.0 (unreleased)
+------------------
+
+- Initial PyPI release.
+- Modified code to create proper package.
+- Added tests.
+- Added console script entry point.
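The 0.3.0 changelog entry above mentions support for arbitrarily large pages via the PDF /UserUnit field. The idea can be sketched roughly as follows. This is not img2pdf's actual code: the function name `fit_page_with_userunit` is hypothetical, and the sketch assumes the commonly cited viewer limit of 14400 default user space units per page side, with /UserUnit acting as a multiplier on the 1/72 inch unit.

```python
# Hypothetical sketch, not taken from img2pdf itself.
# Assumption: a page side may not exceed 14400 default user space units.
PDF_MAX_UNITS = 14400.0


def fit_page_with_userunit(width_pt, height_pt):
    """Return (userunit, mediabox_width, mediabox_height).

    userunit is None when the page already fits and no /UserUnit
    entry would be needed.
    """
    largest = max(width_pt, height_pt)
    if largest <= PDF_MAX_UNITS:
        return None, width_pt, height_pt
    # Scale the unit up so that the longest side lands exactly on the limit.
    userunit = largest / PDF_MAX_UNITS
    return userunit, width_pt / userunit, height_pt / userunit


small = fit_page_with_userunit(595, 842)
huge = fit_page_with_userunit(28800, 14400)
```

With these inputs, the A4-sized page needs no scaling, while the oversized page is described with a doubled unit and a halved media box.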
diff --git a/MANIFEST.in b/MANIFEST.in
index 534bab3..4ee2b37 100644
--- a/MANIFEST.in
+++ b/MANIFEST.in
@@ -1,6 +1,9 @@
include README.md
include test_comp.sh
+include CHANGES.rst
recursive-include src *.jpg
recursive-include src *.pdf
recursive-include src *.png
+recursive-include src *.tif
+recursive-include src *.gif
recursive-include src *.py
diff --git a/PKG-INFO b/PKG-INFO
index 870fa2d..e3ecf4b 100644
--- a/PKG-INFO
+++ b/PKG-INFO
@@ -1,43 +1,46 @@
Metadata-Version: 1.1
Name: img2pdf
-Version: 0.2.3
+Version: 0.3.0
Summary: Convert images to PDF via direct JPEG inclusion.
Home-page: https://gitlab.mister-muffin.de/josch/img2pdf
Author: Johannes 'josch' Schauer
Author-email: josch@mister-muffin.de
License: LGPL
-Download-URL: https://gitlab.mister-muffin.de/josch/img2pdf/repository/archive.tar.gz?ref=0.2.3
+Download-URL: https://gitlab.mister-muffin.de/josch/img2pdf/repository/archive.tar.gz?ref=0.3.0
+Description-Content-Type: UNKNOWN
Description: img2pdf
=======
Losslessly convert raster images to PDF. The file size will not unnecessarily
- increase. One major application would be a number of scans made in JPEG format
- which should now become part of a single PDF document. Existing solutions
- would either re-encode the input JPEG files (leading to quality loss) or store
- them in the zip/flate format which results into the PDF becoming unnecessarily
- large in terms of its file size.
+ increase. It can for example be used to create a PDF document from a number of
+ scans that are only available in JPEG format. Existing solutions would either
+ re-encode the input JPEG files (leading to quality loss) or store them in the
+ zip/flate format which results into the PDF becoming unnecessarily large in
+ terms of its file size.
Background
----------
- Quality loss can be avoided when converting JPEG and JPEG2000 images to PDF by
- embedding them without re-encoding. I wrote this piece of python code.
- because I was missing a tool to do this automatically. Img2pdf basically just
- wraps JPEG images into the PDF container as they are.
+ Quality loss can be avoided when converting PNG, JPEG and JPEG2000 images to
+ PDF by embedding them into the PDF without re-encoding them. This is what
+ img2pdf does. It thus treats the PDF format merely as a container format for
+ storing one or more JPEGs or PNGs without re-encoding the images themselves.
- If you know an existing tool which allows one to embed JPEG and JPEG2000 images
- into a PDF container without recompression, please contact me so that I can put
- this code into the garbage bin.
+ If you know an existing tool which allows one to embed PNG, JPEG and JPEG2000
+ images into a PDF container without recompression, please contact me so that I
+ can put this code into the garbage bin.
Functionality
-------------
- This program will take a list of images and produce a PDF file with the images
- embedded in it. JPEG and JPEG2000 images will be included without
- recompression. Raster images in other formats will be included with zip/flate
- encoding which usually leads to an increase in the resulting size because
- formats like png compress better than PDF which just zip/flate compresses the
- RGB data. As a result, this tool is able to losslessly wrap images into a PDF
+ This program will take a list of raster images and produce a PDF file with the
+ images embedded in it. PNG, JPEG and JPEG2000 images will be included without
+ recompression and the resulting PDF will only be slightly larger than the input
+ images due to the overhead of the PDF container. Raster images in other
+ formats (like gif or tif) will be included using the lossless zip/flate
+ encoding using the PNG Paeth predictor.
+
+ As a result, this tool is able to losslessly wrap raster images into a PDF
container with a quality to filesize ratio that is typically better (in case of
JPEG and JPEG2000 images) or equal (in case of other formats) than that of
existing tools.
@@ -61,13 +64,17 @@ Description: img2pdf
However, this approach will result in PDF files that are a few times larger
than the input JPEG or JPEG2000 file.
- img2pdf is able to losslessly embed JPEG and JPEG2000 files into a PDF
+ Furthermore, when converting PNG images, popular tools like imagemagick use
+ flate encoding without a predictor. This means that the image file size ends up
+ being several orders of magnitude larger than necessary.
+
+ img2pdf is able to losslessly embed PNG, JPEG and JPEG2000 files into a PDF
container without additional overhead (aside from the PDF structure itself),
save other graphics formats using lossless zip compression, and produce
multi-page PDF files when more than one input image is given.
- Also, since JPEG and JPEG2000 images are not reencoded, conversion with img2pdf
- is several times faster than with other tools.
+ Also, since PNG, JPEG and JPEG2000 images are not reencoded, conversion with
+ img2pdf is several times faster than with other tools.
Usage
-----
@@ -76,7 +83,9 @@ Description: img2pdf
descriptor.
If no output file is specified with the `-o`/`--output` option, output will be
- done to stdout.
+ done to stdout. A typical invocation is:
+
+ img2pdf img1.png img2.jpg -o out.pdf
The detailed documentation can be accessed by running:
@@ -89,14 +98,6 @@ Description: img2pdf
If you find a JPEG or JPEG2000 file that, when embedded cannot be read
by the Adobe Acrobat Reader, please contact me.
- For lossless conversion of formats other than JPEG or JPEG2000, zip/flate
- encoding is used. This choice is based on tests I did with a number of images.
- I converted them into PDF using the lossless variants of the compression
- formats offered by imagemagick. In all my tests, zip/flate encoding performed
- best. You can verify my findings using the test_comp.sh script with any input
- image given as a commandline argument. If you find an input file that is
- outperformed by another lossless compression method, contact me.
-
I have not yet figured out how to determine the colorspace of JPEG2000 files.
Therefore JPEG2000 files use DeviceRGB by default. For JPEG2000 files with
other colorspaces, you must explicitly specify it using the `--colorspace`
@@ -123,19 +124,19 @@ Description: img2pdf
You can then install the package using:
- $ pip install img2pdf
+ $ pip3 install img2pdf
If you prefer to install from source code use:
$ cd img2pdf/
- $ pip install .
+ $ pip3 install .
To test the console script without installing the package on your system,
use virtualenv:
$ cd img2pdf/
$ virtualenv ve
- $ ve/bin/pip install .
+ $ ve/bin/pip3 install .
You can then test the converter using:
@@ -144,10 +145,36 @@ Description: img2pdf
The package can also be used as a library:
import img2pdf
- pdf_bytes = img2pdf.convert('test.jpg')
- file = open("name.pdf","wb")
- file.write(pdf_bytes)
+ # opening from filename
+ with open("name.pdf","wb") as f:
+ f.write(img2pdf.convert('test.jpg'))
+
+ # opening from file handle
+ with open("name.pdf","wb") as f1, open("test.jpg") as f2:
+ f1.write(img2pdf.convert(f2))
+
+ # using in-memory image data
+ with open("name.pdf","wb") as f:
+ f.write(img2pdf.convert("\x89PNG..."))
+
+ # multiple inputs (variant 1)
+ with open("name.pdf","wb") as f:
+ f.write(img2pdf.convert("test1.jpg", "test2.png"))
+
+ # multiple inputs (variant 2)
+ with open("name.pdf","wb") as f:
+ f.write(img2pdf.convert(["test1.jpg", "test2.png"]))
+
+ # writing to file descriptor
+ with open("name.pdf","wb") as f1, open("test.jpg") as f2:
+ img2pdf.convert(f2, outputstream=f1)
+
+ # specify paper size (A4)
+ a4inpt = (img2pdf.mm_to_pt(210),img2pdf.mm_to_pt(297))
+ layout_fun = img2pdf.get_layout_fun(a4inpt)
+ with open("name.pdf","wb") as f:
+ f.write(img2pdf.convert('test.jpg', layout_fun=layout_fun))
Keywords: jpeg pdf converter
Platform: UNKNOWN
@@ -156,9 +183,12 @@ Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Other Audience
Classifier: Environment :: Console
Classifier: Programming Language :: Python
+Classifier: Programming Language :: Python :: 2
+Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: Implementation :: CPython
+Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: License :: OSI Approved :: GNU Lesser General Public License v3 (LGPLv3)
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
diff --git a/README.md b/README.md
index 27637d6..249abb8 100644
--- a/README.md
+++ b/README.md
@@ -2,33 +2,35 @@ img2pdf
=======
Losslessly convert raster images to PDF. The file size will not unnecessarily
-increase. One major application would be a number of scans made in JPEG format
-which should now become part of a single PDF document. Existing solutions
-would either re-encode the input JPEG files (leading to quality loss) or store
-them in the zip/flate format which results into the PDF becoming unnecessarily
-large in terms of its file size.
+increase. It can for example be used to create a PDF document from a number of
+scans that are only available in JPEG format. Existing solutions would either
+re-encode the input JPEG files (leading to quality loss) or store them in the
+zip/flate format which results into the PDF becoming unnecessarily large in
+terms of its file size.
Background
----------
-Quality loss can be avoided when converting JPEG and JPEG2000 images to PDF by
-embedding them without re-encoding. I wrote this piece of python code.
-because I was missing a tool to do this automatically. Img2pdf basically just
-wraps JPEG images into the PDF container as they are.
+Quality loss can be avoided when converting PNG, JPEG and JPEG2000 images to
+PDF by embedding them into the PDF without re-encoding them. This is what
+img2pdf does. It thus treats the PDF format merely as a container format for
+storing one or more JPEGs or PNGs without re-encoding the images themselves.
-If you know an existing tool which allows one to embed JPEG and JPEG2000 images
-into a PDF container without recompression, please contact me so that I can put
-this code into the garbage bin.
+If you know an existing tool which allows one to embed PNG, JPEG and JPEG2000
+images into a PDF container without recompression, please contact me so that I
+can put this code into the garbage bin.
Functionality
-------------
-This program will take a list of images and produce a PDF file with the images
-embedded in it. JPEG and JPEG2000 images will be included without
-recompression. Raster images in other formats will be included with zip/flate
-encoding which usually leads to an increase in the resulting size because
-formats like png compress better than PDF which just zip/flate compresses the
-RGB data. As a result, this tool is able to losslessly wrap images into a PDF
+This program will take a list of raster images and produce a PDF file with the
+images embedded in it. PNG, JPEG and JPEG2000 images will be included without
+recompression and the resulting PDF will only be slightly larger than the input
+images due to the overhead of the PDF container. Raster images in other
+formats (like gif or tif) will be included using the lossless zip/flate
+encoding using the PNG Paeth predictor.
+
+As a result, this tool is able to losslessly wrap raster images into a PDF
container with a quality to filesize ratio that is typically better (in case of
JPEG and JPEG2000 images) or equal (in case of other formats) than that of
existing tools.
@@ -52,13 +54,17 @@ imagemagick, one has to use zip compression:
However, this approach will result in PDF files that are a few times larger
than the input JPEG or JPEG2000 file.
-img2pdf is able to losslessly embed JPEG and JPEG2000 files into a PDF
+Furthermore, when converting PNG images, popular tools like imagemagick use
+flate encoding without a predictor. This means that the image file size ends up
+being several orders of magnitude larger than necessary.
+
+img2pdf is able to losslessly embed PNG, JPEG and JPEG2000 files into a PDF
container without additional overhead (aside from the PDF structure itself),
save other graphics formats using lossless zip compression, and produce
multi-page PDF files when more than one input image is given.
-Also, since JPEG and JPEG2000 images are not reencoded, conversion with img2pdf
-is several times faster than with other tools.
+Also, since PNG, JPEG and JPEG2000 images are not reencoded, conversion with
+img2pdf is several times faster than with other tools.
Usage
-----
@@ -67,7 +73,9 @@ The images must be provided as files because img2pdf needs to seek in the file
descriptor.
If no output file is specified with the `-o`/`--output` option, output will be
-done to stdout.
+done to stdout. A typical invocation is:
+
+ img2pdf img1.png img2.jpg -o out.pdf
The detailed documentation can be accessed by running:
@@ -80,14 +88,6 @@ Bugs
If you find a JPEG or JPEG2000 file that, when embedded cannot be read
by the Adobe Acrobat Reader, please contact me.
-For lossless conversion of formats other than JPEG or JPEG2000, zip/flate
-encoding is used. This choice is based on tests I did with a number of images.
-I converted them into PDF using the lossless variants of the compression
-formats offered by imagemagick. In all my tests, zip/flate encoding performed
-best. You can verify my findings using the test_comp.sh script with any input
-image given as a commandline argument. If you find an input file that is
-outperformed by another lossless compression method, contact me.
-
I have not yet figured out how to determine the colorspace of JPEG2000 files.
Therefore JPEG2000 files use DeviceRGB by default. For JPEG2000 files with
other colorspaces, you must explicitly specify it using the `--colorspace`
@@ -114,19 +114,19 @@ with the following command:
You can then install the package using:
- $ pip install img2pdf
+ $ pip3 install img2pdf
If you prefer to install from source code use:
$ cd img2pdf/
- $ pip install .
+ $ pip3 install .
To test the console script without installing the package on your system,
use virtualenv:
$ cd img2pdf/
$ virtualenv ve
- $ ve/bin/pip install .
+ $ ve/bin/pip3 install .
You can then test the converter using:
@@ -135,7 +135,33 @@ You can then test the converter using:
The package can also be used as a library:
import img2pdf
- pdf_bytes = img2pdf.convert('test.jpg')
- file = open("name.pdf","wb")
- file.write(pdf_bytes)
+ # opening from filename
+ with open("name.pdf","wb") as f:
+ f.write(img2pdf.convert('test.jpg'))
+
+ # opening from file handle
+ with open("name.pdf","wb") as f1, open("test.jpg") as f2:
+ f1.write(img2pdf.convert(f2))
+
+ # using in-memory image data
+ with open("name.pdf","wb") as f:
+ f.write(img2pdf.convert("\x89PNG..."))
+
+ # multiple inputs (variant 1)
+ with open("name.pdf","wb") as f:
+ f.write(img2pdf.convert("test1.jpg", "test2.png"))
+
+ # multiple inputs (variant 2)
+ with open("name.pdf","wb") as f:
+ f.write(img2pdf.convert(["test1.jpg", "test2.png"]))
+
+ # writing to file descriptor
+ with open("name.pdf","wb") as f1, open("test.jpg") as f2:
+ img2pdf.convert(f2, outputstream=f1)
+
+ # specify paper size (A4)
+ a4inpt = (img2pdf.mm_to_pt(210),img2pdf.mm_to_pt(297))
+ layout_fun = img2pdf.get_layout_fun(a4inpt)
+ with open("name.pdf","wb") as f:
+ f.write(img2pdf.convert('test.jpg', layout_fun=layout_fun))
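The A4 example in the README passes millimetre values through `img2pdf.mm_to_pt` before building the layout function. The arithmetic behind such a conversion can be sketched as below. This is an illustration only: `mm_to_pt_sketch` is a hypothetical stand-in, not the library function, and it assumes the usual definitions of 72 points per inch and 25.4 mm per inch.

```python
# Hypothetical stand-in for the unit conversion, not img2pdf's own function.
# Assumptions: 1 inch = 72 pt and 1 inch = 25.4 mm.
def mm_to_pt_sketch(length_mm):
    """Convert a length in millimetres to PDF points."""
    return length_mm * 72.0 / 25.4


# A4 is 210 mm x 297 mm.
a4_pt = (mm_to_pt_sketch(210), mm_to_pt_sketch(297))
```

Under these assumptions an A4 page comes out at roughly 595 by 842 points.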
diff --git a/setup.cfg b/setup.cfg
index 8c9157d..9f88734 100644
--- a/setup.cfg
+++ b/setup.cfg
@@ -4,5 +4,4 @@ description-file = README.md
[egg_info]
tag_build =
tag_date = 0
-tag_svn_revision = 0
diff --git a/setup.py b/setup.py
index 874380c..56e9c4c 100644
--- a/setup.py
+++ b/setup.py
@@ -1,6 +1,21 @@
+import sys
from setuptools import setup
-VERSION = "0.2.3"
+PY3 = sys.version_info[0] >= 3
+
+VERSION = "0.3.0"
+
+INSTALL_REQUIRES = (
+ 'Pillow',
+)
+
+TESTS_REQUIRE = (
+ 'pdfrw',
+)
+
+if not PY3:
+ INSTALL_REQUIRES += ('enum34',)
+
setup(
name='img2pdf',
@@ -17,9 +32,12 @@ setup(
'Intended Audience :: Other Audience',
'Environment :: Console',
'Programming Language :: Python',
+ 'Programming Language :: Python :: 2',
+ 'Programming Language :: Python :: 2.7',
'Programming Language :: Python :: 3',
'Programming Language :: Python :: 3.5',
'Programming Language :: Python :: Implementation :: CPython',
+ "Programming Language :: Python :: Implementation :: PyPy",
'License :: OSI Approved :: GNU Lesser General Public License v3 '
'(LGPLv3)',
'Natural Language :: English',
@@ -32,9 +50,11 @@ setup(
include_package_data=True,
test_suite='tests.test_suite',
zip_safe=True,
- install_requires=(
- 'Pillow',
- ),
+ install_requires=INSTALL_REQUIRES,
+ tests_require=TESTS_REQUIRE,
+ extras_require={
+ 'test': TESTS_REQUIRE,
+ },
entry_points='''
[console_scripts]
img2pdf = img2pdf:main
diff --git a/src/img2pdf.egg-info/PKG-INFO b/src/img2pdf.egg-info/PKG-INFO
index 870fa2d..e3ecf4b 100644
--- a/src/img2pdf.egg-info/PKG-INFO
+++ b/src/img2pdf.egg-info/PKG-INFO
@@ -1,43 +1,46 @@
Metadata-Version: 1.1
Name: img2pdf
-Version: 0.2.3
+Version: 0.3.0
Summary: Convert images to PDF via direct JPEG inclusion.
Home-page: https://gitlab.mister-muffin.de/josch/img2pdf
Author: Johannes 'josch' Schauer
Author-email: josch@mister-muffin.de
License: LGPL
-Download-URL: https://gitlab.mister-muffin.de/josch/img2pdf/repository/archive.tar.gz?ref=0.2.3
+Download-URL: https://gitlab.mister-muffin.de/josch/img2pdf/repository/archive.tar.gz?ref=0.3.0
+Description-Content-Type: UNKNOWN
Description: img2pdf
=======
Losslessly convert raster images to PDF. The file size will not unnecessarily
- increase. One major application would be a number of scans made in JPEG format
- which should now become part of a single PDF document. Existing solutions
- would either re-encode the input JPEG files (leading to quality loss) or store
- them in the zip/flate format which results into the PDF becoming unnecessarily
- large in terms of its file size.
+ increase. It can for example be used to create a PDF document from a number of
+ scans that are only available in JPEG format. Existing solutions would either
+ re-encode the input JPEG files (leading to quality loss) or store them in the
+ zip/flate format which results into the PDF becoming unnecessarily large in
+ terms of its file size.
Background
----------
- Quality loss can be avoided when converting JPEG and JPEG2000 images to PDF by
- embedding them without re-encoding. I wrote this piece of python code.
- because I was missing a tool to do this automatically. Img2pdf basically just
- wraps JPEG images into the PDF container as they are.
+ Quality loss can be avoided when converting PNG, JPEG and JPEG2000 images to
+ PDF by embedding them into the PDF without re-encoding them. This is what
+ img2pdf does. It thus treats the PDF format merely as a container format for
+ storing one or more JPEGs or PNGs without re-encoding the images themselves.
- If you know an existing tool which allows one to embed JPEG and JPEG2000 images
- into a PDF container without recompression, please contact me so that I can put
- this code into the garbage bin.
+ If you know an existing tool which allows one to embed PNG, JPEG and JPEG2000
+ images into a PDF container without recompression, please contact me so that I
+ can put this code into the garbage bin.
Functionality
-------------
- This program will take a list of images and produce a PDF file with the images
- embedded in it. JPEG and JPEG2000 images will be included without
- recompression. Raster images in other formats will be included with zip/flate
- encoding which usually leads to an increase in the resulting size because
- formats like png compress better than PDF which just zip/flate compresses the
- RGB data. As a result, this tool is able to losslessly wrap images into a PDF
+ This program will take a list of raster images and produce a PDF file with the
+ images embedded in it. PNG, JPEG and JPEG2000 images will be included without
+ recompression and the resulting PDF will only be slightly larger than the input
+ images due to the overhead of the PDF container. Raster images in other
+ formats (like gif or tif) will be included using the lossless zip/flate
+ encoding using the PNG Paeth predictor.
+
+ As a result, this tool is able to losslessly wrap raster images into a PDF
container with a quality to filesize ratio that is typically better (in case of
JPEG and JPEG2000 images) or equal (in case of other formats) than that of
existing tools.
@@ -61,13 +64,17 @@ Description: img2pdf
However, this approach will result in PDF files that are a few times larger
than the input JPEG or JPEG2000 file.
- img2pdf is able to losslessly embed JPEG and JPEG2000 files into a PDF
+ Furthermore, when converting PNG images, popular tools like imagemagick use
+ flate encoding without a predictor. This means that the image file size ends up
+ being several orders of magnitude larger than necessary.
+
+ img2pdf is able to losslessly embed PNG, JPEG and JPEG2000 files into a PDF
container without additional overhead (aside from the PDF structure itself),
save other graphics formats using lossless zip compression, and produce
multi-page PDF files when more than one input image is given.
- Also, since JPEG and JPEG2000 images are not reencoded, conversion with img2pdf
- is several times faster than with other tools.
+ Also, since PNG, JPEG and JPEG2000 images are not reencoded, conversion with
+ img2pdf is several times faster than with other tools.
Usage
-----
@@ -76,7 +83,9 @@ Description: img2pdf
descriptor.
If no output file is specified with the `-o`/`--output` option, output will be
- done to stdout.
+ done to stdout. A typical invocation is:
+
+ img2pdf img1.png img2.jpg -o out.pdf
The detailed documentation can be accessed by running:
@@ -89,14 +98,6 @@ Description: img2pdf
If you find a JPEG or JPEG2000 file that, when embedded cannot be read
by the Adobe Acrobat Reader, please contact me.
- For lossless conversion of formats other than JPEG or JPEG2000, zip/flate
- encoding is used. This choice is based on tests I did with a number of images.
- I converted them into PDF using the lossless variants of the compression
- formats offered by imagemagick. In all my tests, zip/flate encoding performed
- best. You can verify my findings using the test_comp.sh script with any input
- image given as a commandline argument. If you find an input file that is
- outperformed by another lossless compression method, contact me.
-
I have not yet figured out how to determine the colorspace of JPEG2000 files.
Therefore JPEG2000 files use DeviceRGB by default. For JPEG2000 files with
other colorspaces, you must explicitly specify it using the `--colorspace`
@@ -123,19 +124,19 @@ Description: img2pdf
You can then install the package using:
- $ pip install img2pdf
+ $ pip3 install img2pdf
If you prefer to install from source code use:
$ cd img2pdf/
- $ pip install .
+ $ pip3 install .
To test the console script without installing the package on your system,
use virtualenv:
$ cd img2pdf/
$ virtualenv ve
- $ ve/bin/pip install .
+ $ ve/bin/pip3 install .
You can then test the converter using:
@@ -144,10 +145,36 @@ Description: img2pdf
The package can also be used as a library:
import img2pdf
- pdf_bytes = img2pdf.convert('test.jpg')
- file = open("name.pdf","wb")
- file.write(pdf_bytes)
+ # opening from filename
+ with open("name.pdf","wb") as f:
+ f.write(img2pdf.convert('test.jpg'))
+
+ # opening from file handle
+ with open("name.pdf","wb") as f1, open("test.jpg") as f2:
+ f1.write(img2pdf.convert(f2))
+
+ # using in-memory image data
+ with open("name.pdf","wb") as f:
+ f.write(img2pdf.convert("\x89PNG..."))
+
+ # multiple inputs (variant 1)
+ with open("name.pdf","wb") as f:
+ f.write(img2pdf.convert("test1.jpg", "test2.png"))
+
+ # multiple inputs (variant 2)
+ with open("name.pdf","wb") as f:
+ f.write(img2pdf.convert(["test1.jpg", "test2.png"]))
+
+ # writing to file descriptor
+ with open("name.pdf","wb") as f1, open("test.jpg") as f2:
+ img2pdf.convert(f2, outputstream=f1)
+
+ # specify paper size (A4)
+ a4inpt = (img2pdf.mm_to_pt(210),img2pdf.mm_to_pt(297))
+ layout_fun = img2pdf.get_layout_fun(a4inpt)
+ with open("name.pdf","wb") as f:
+ f.write(img2pdf.convert('test.jpg', layout_fun=layout_fun))
Keywords: jpeg pdf converter
Platform: UNKNOWN
@@ -156,9 +183,12 @@ Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Other Audience
Classifier: Environment :: Console
Classifier: Programming Language :: Python
+Classifier: Programming Language :: Python :: 2
+Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: Implementation :: CPython
+Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: License :: OSI Approved :: GNU Lesser General Public License v3 (LGPLv3)
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
diff --git a/src/img2pdf.egg-info/SOURCES.txt b/src/img2pdf.egg-info/SOURCES.txt
index add31f1..ae6e816 100644
--- a/src/img2pdf.egg-info/SOURCES.txt
+++ b/src/img2pdf.egg-info/SOURCES.txt
@@ -1,3 +1,4 @@
+CHANGES.rst
MANIFEST.in
README.md
setup.cfg
@@ -15,11 +16,18 @@ src/img2pdf.egg-info/top_level.txt
src/img2pdf.egg-info/zip-safe
src/tests/__init__.py
src/tests/input/CMYK.jpg
+src/tests/input/CMYK.tif
+src/tests/input/animation.gif
+src/tests/input/gray.png
src/tests/input/mono.png
+src/tests/input/mono.tif
src/tests/input/normal.jpg
src/tests/input/normal.png
src/tests/output/CMYK.jpg.pdf
src/tests/output/CMYK.tif.pdf
+src/tests/output/animation.gif.pdf
+src/tests/output/gray.png.pdf
src/tests/output/mono.png.pdf
+src/tests/output/mono.tif.pdf
src/tests/output/normal.jpg.pdf
src/tests/output/normal.png.pdf
\ No newline at end of file
diff --git a/src/img2pdf.egg-info/requires.txt b/src/img2pdf.egg-info/requires.txt
index 7e2fba5..3a24589 100644
--- a/src/img2pdf.egg-info/requires.txt
+++ b/src/img2pdf.egg-info/requires.txt
@@ -1 +1,4 @@
Pillow
+
+[test]
+pdfrw
diff --git a/src/img2pdf.py b/src/img2pdf.py
index 20fe784..48ef964 100755
--- a/src/img2pdf.py
+++ b/src/img2pdf.py
@@ -1,4 +1,5 @@
#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
# Copyright (C) 2012-2014 Johannes 'josch' Schauer <j.schauer at email.de>
#
@@ -27,8 +28,11 @@ from jp2 import parsejp2
from enum import Enum
from io import BytesIO
import logging
+import struct
-__version__ = "0.2.3"
+PY3 = sys.version_info[0] >= 3
+
+__version__ = "0.3.0"
default_dpi = 96.0
papersizes = {
"letter": "8.5inx11in",
@@ -58,7 +62,7 @@ PageOrientation = Enum('PageOrientation', 'portrait landscape')
Colorspace = Enum('Colorspace', 'RGB L 1 CMYK CMYK;I RGBA P other')
-ImageFormat = Enum('ImageFormat', 'JPEG JPEG2000 CCITTGroup4 other')
+ImageFormat = Enum('ImageFormat', 'JPEG JPEG2000 CCITTGroup4 PNG other')
PageMode = Enum('PageMode', 'none outlines thumbs')
@@ -268,17 +272,33 @@ class MyPdfWriter():
self.addobj(page)
-class MyPdfString():
- @classmethod
- def encode(cls, string):
- try:
- string = string.encode('ascii')
- except UnicodeEncodeError:
- string = b"\xfe\xff"+string.encode("utf-16-be")
- string = string.replace(b'\\', b'\\\\')
- string = string.replace(b'(', b'\\(')
- string = string.replace(b')', b'\\)')
- return b'(' + string + b')'
+if PY3:
+ class MyPdfString():
+ @classmethod
+ def encode(cls, string, hextype=False):
+ if hextype:
+ return b'< ' + b' '.join(("%06x"%c).encode('ascii') for c in string) + b' >'
+ else:
+ try:
+ string = string.encode('ascii')
+ except UnicodeEncodeError:
+ string = b"\xfe\xff"+string.encode("utf-16-be")
+ string = string.replace(b'\\', b'\\\\')
+ string = string.replace(b'(', b'\\(')
+ string = string.replace(b')', b'\\)')
+ return b'(' + string + b')'
+else:
+ class MyPdfString(object):
+ @classmethod
+ def encode(cls, string, hextype=False):
+ if hextype:
+ return b'< ' + b' '.join(("%06x"%c).encode('ascii') for c in string) + b' >'
+ else:
+ # This mimics exactly what pdfrw does.
+ string = string.replace(b'\\', b'\\\\')
+ string = string.replace(b'(', b'\\(')
+ string = string.replace(b')', b'\\)')
+ return b'(' + string + b')'
class pdfdoc(object):
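Note on the hunk above: the hextype branch serializes a palette (a sequence of 24-bit RGB integers) as a PDF hexadecimal string. A minimal sketch of that formatting follows. The standalone helper name `encode_hex_palette` is hypothetical and is not part of the patch; it only mirrors the `"%06x"` formatting shown in the diff.

```python
def encode_hex_palette(palette):
    """Format 24-bit RGB integers as a PDF hex string (hypothetical helper)."""
    # Each palette entry becomes six lowercase hex digits, two per RGB channel.
    body = b' '.join(("%06x" % color).encode('ascii') for color in palette)
    # PDF hex strings are delimited by angle brackets; whitespace inside is ignored.
    return b'< ' + body + b' >'


print(encode_hex_palette([0xFF0000, 0x00FF00, 0x0000FF]))
```

Zero-padding to six digits matters here: a colour such as 0x0000FF would otherwise collapse to `ff` and shift every following palette entry.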
@@ -354,14 +374,15 @@ class pdfdoc(object):
def add_imagepage(self, color, imgwidthpx, imgheightpx, imgformat, imgdata,
imgwidthpdf, imgheightpdf, imgxpdf, imgypdf, pagewidth,
- pageheight):
+ pageheight, userunit=None, palette=None):
if self.with_pdfrw:
- from pdfrw import PdfDict, PdfName, PdfObject
+ from pdfrw import PdfDict, PdfName, PdfObject, PdfString
from pdfrw.py23_diffs import convert_load
else:
PdfDict = MyPdfDict
PdfName = MyPdfName
PdfObject = MyPdfObject
+ PdfString = MyPdfString
convert_load = my_convert_load
if color == Colorspace['1'] or color == Colorspace.L:
@@ -370,21 +391,24 @@ class pdfdoc(object):
colorspace = PdfName.DeviceRGB
elif color == Colorspace.CMYK or color == Colorspace['CMYK;I']:
colorspace = PdfName.DeviceCMYK
+ elif color == Colorspace.P:
+ if self.with_pdfrw:
+ raise Exception("pdfrw does not support hex strings for palette image input, re-run with --without-pdfrw")
+ colorspace = [ PdfName.Indexed, PdfName.DeviceRGB, len(palette)-1, PdfString.encode(palette, hextype=True)]
else:
raise UnsupportedColorspaceError("unsupported color space: %s"
% color.name)
# either embed the whole jpeg or deflate the bitmap representation
- logging.debug(imgformat)
if imgformat is ImageFormat.JPEG:
- ofilter = [PdfName.DCTDecode]
+ ofilter = PdfName.DCTDecode
elif imgformat is ImageFormat.JPEG2000:
- ofilter = [PdfName.JPXDecode]
+ ofilter = PdfName.JPXDecode
self.writer.version = "1.5" # jpeg2000 needs pdf 1.5
elif imgformat is ImageFormat.CCITTGroup4:
ofilter = [PdfName.CCITTFaxDecode]
else:
- ofilter = [PdfName.FlateDecode]
+ ofilter = PdfName.FlateDecode
image = PdfDict(stream=convert_load(imgdata))
@@ -398,7 +422,17 @@ class pdfdoc(object):
if imgformat is ImageFormat.CCITTGroup4:
image[PdfName.BitsPerComponent] = 1
else:
- image[PdfName.BitsPerComponent] = 8
+ if color == Colorspace['1']:
+ image[PdfName.BitsPerComponent] = 1
+ elif color == Colorspace.P:
+ if len(palette) <= 2**1:
+ image[PdfName.BitsPerComponent] = 1
+ elif len(palette) <= 2**4:
+ image[PdfName.BitsPerComponent] = 4
+ else:
+ image[PdfName.BitsPerComponent] = 8
+ else:
+ image[PdfName.BitsPerComponent] = 8
if color == Colorspace['CMYK;I']:
# Inverts all four channels
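Note on the hunk above: for palette images the patch derives /BitsPerComponent from the number of palette entries. The sketch below restates only that branch as a hypothetical function `bits_per_component`. It mirrors the thresholds in the diff, which jump from 1 bit to 4 bits and have no 2-bit case.

```python
def bits_per_component(palette_len):
    """Mirror the palette-size thresholds used in the patch (hypothetical helper)."""
    if palette_len <= 2 ** 1:
        return 1   # up to 2 colours fit in one bit per pixel
    elif palette_len <= 2 ** 4:
        return 4   # up to 16 colours fit in four bits per pixel
    else:
        return 8   # anything larger uses one byte per pixel


for n in (2, 3, 16, 17, 256):
    print(n, bits_per_component(n))
```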
@@ -411,6 +445,26 @@ class pdfdoc(object):
decodeparms[PdfName.Columns] = imgwidthpx
decodeparms[PdfName.Rows] = imgheightpx
image[PdfName.DecodeParms] = [decodeparms]
+ elif imgformat is ImageFormat.PNG:
+ decodeparms = PdfDict()
+ decodeparms[PdfName.Predictor] = 15
+ if color in [ Colorspace.P, Colorspace['1'], Colorspace.L ]:
+ decodeparms[PdfName.Colors] = 1
+ else:
+ decodeparms[PdfName.Colors] = 3
+ decodeparms[PdfName.Columns] = imgwidthpx
+ if color == Colorspace['1']:
+ decodeparms[PdfName.BitsPerComponent] = 1
+ elif color == Colorspace.P:
+ if len(palette) <= 2**1:
+ decodeparms[PdfName.BitsPerComponent] = 1
+ elif len(palette) <= 2**4:
+ decodeparms[PdfName.BitsPerComponent] = 4
+ else:
+ decodeparms[PdfName.BitsPerComponent] = 8
+ else:
+ decodeparms[PdfName.BitsPerComponent] = 8
+ image[PdfName.DecodeParms] = decodeparms
text = ("q\n%0.4f 0 0 %0.4f %0.4f %0.4f cm\n/Im0 Do\nQ" %
(imgwidthpdf, imgheightpdf, imgxpdf, imgypdf)).encode("ascii")
@@ -423,6 +477,11 @@ class pdfdoc(object):
page[PdfName.MediaBox] = [0, 0, pagewidth, pageheight]
page[PdfName.Resources] = resources
page[PdfName.Contents] = content
+ if userunit is not None:
+ # /UserUnit requires PDF 1.6
+ if self.writer.version < '1.6':
+ self.writer.version = '1.6'
+ page[PdfName.UserUnit] = userunit
self.writer.addpage(page)
@@ -582,6 +641,21 @@ def get_imgmetadata(imgdata, imgformat, default_dpi, colorspace, rawdata=None):
ndpi = (int(round(ndpi[0])), int(round(ndpi[1])))
ics = imgdata.mode
+ if ics in ["LA", "PA", "RGBA"]:
+ logging.warning("Image contains transparency which cannot be retained in PDF.")
+ logging.warning("img2pdf will not perform a lossy operation.")
+ logging.warning("You can remove the alpha channel using imagemagick:")
+ logging.warning(" $ convert input.png -background white -alpha remove -alpha off output.png")
+ raise Exception("Refusing to work on images with alpha channel")
+
+
+ # Since commit 07a96209597c5e8dfe785c757d7051ce67a980fb or release 4.1.0
+ # Pillow retrieves the DPI from EXIF if it cannot find the DPI in the JPEG
+ # header. In that case it can happen that the horizontal and vertical DPI
+ # are set to zero.
+ if ndpi == (0, 0):
+ ndpi = (default_dpi, default_dpi)
+
logging.debug("input dpi = %d x %d", *ndpi)
if colorspace:
@@ -621,7 +695,13 @@ def transcode_monochrome(imgdata):
# Convert the image to Group 4 in memory. If libtiff is not installed and
# Pillow is not compiled against it, .save() will raise an exception.
newimgio = BytesIO()
- imgdata.save(newimgio, format='TIFF', compression='group4')
+
+ # We create a whole new PIL image because otherwise, with some input
+ # images, libtiff fails an assert and the whole process is killed by a
+ # SIGABRT:
+ # https://gitlab.mister-muffin.de/josch/img2pdf/issues/46
+ im = Image.frombytes(imgdata.mode, imgdata.size, imgdata.tobytes())
+ im.save(newimgio, format='TIFF', compression='group4')
# Open new image in memory
newimgio.seek(0)
@@ -649,6 +729,25 @@ def transcode_monochrome(imgdata):
return ccittdata
+def parse_png(rawdata):
+ pngidat = b""
+ palette = []
+ i = 16
+ while i < len(rawdata):
+ # once we can require Python >= 3.2 we can use int.from_bytes() instead
+ n, = struct.unpack('>I', rawdata[i-8:i-4])
+ if i + n > len(rawdata):
+ raise Exception("invalid png: %d %d %d"%(i, n, len(rawdata)))
+ if rawdata[i-4:i] == b"IDAT":
+ pngidat += rawdata[i:i+n]
+ elif rawdata[i-4:i] == b"PLTE":
+ for j in range(i, i+n, 3):
+ # with int.from_bytes() we would not have to prepend extra zeroes
+ color, = struct.unpack('>I', b'\x00'+rawdata[j:j+3])
+ palette.append(color)
+ i += n
+ i += 12
+ return pngidat, palette
def read_images(rawdata, colorspace, first_frame_only=False):
im = BytesIO(rawdata)
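Note on parse_png above: a PNG file is an 8-byte signature followed by chunks, each laid out as a 4-byte big-endian length, a 4-byte type, the data, and a 4-byte CRC. The sketch below is an illustration of that layout, not the patch's implementation. The helpers `make_chunk` and `iter_chunks` are hypothetical names. It builds a 1x1 grayscale PNG by hand and walks its chunks. Because IHDR must be the first chunk, its interlace byte always sits at file offset 28.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(ctype, data):
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(ctype + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", crc)


def iter_chunks(rawdata):
    """Yield (type, data) for every chunk after the 8-byte signature."""
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(rawdata):
        length, = struct.unpack(">I", rawdata[pos:pos + 4])
        ctype = rawdata[pos + 4:pos + 8]
        end = pos + 8 + length
        if end + 4 > len(rawdata):
            raise ValueError("truncated PNG chunk")
        yield ctype, rawdata[pos + 8:end]
        pos = end + 4  # skip the 4-byte CRC


# IHDR: width, height, bit depth, colour type, compression, filter, interlace
ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
# One scanline: filter-type byte 0 followed by a single gray pixel
idat = zlib.compress(b"\x00\x7f")
png = (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
       + make_chunk(b"IDAT", idat) + make_chunk(b"IEND", b""))

chunks = list(iter_chunks(png))
print([ctype for ctype, _ in chunks])
print("interlaced:", png[28:29] != b"\x00")
```

Slicing with `png[28:29]` rather than indexing keeps the comparison valid on both Python 2 and Python 3, where indexing a byte string returns different types.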
@@ -658,7 +757,7 @@ def read_images(rawdata, colorspace, first_frame_only=False):
imgdata = Image.open(im)
except IOError as e:
# test if it is a jpeg2000 image
- if rawdata[:12] != "\x00\x00\x00\x0C\x6A\x50\x20\x20\x0D\x0A\x87\x0A":
+ if rawdata[:12] != b"\x00\x00\x00\x0C\x6A\x50\x20\x20\x0D\x0A\x87\x0A":
raise ImageOpenError("cannot read input image (not jpeg2000). "
"PIL: error reading image: %s" % e)
# image is jpeg2000
@@ -675,6 +774,8 @@ def read_images(rawdata, colorspace, first_frame_only=False):
# depending on the input format, determine whether to pass the raw
# image or the zlib compressed color information
+
+ # JPEG and JPEG2000 can be embedded into the PDF as-is
if imgformat == ImageFormat.JPEG or imgformat == ImageFormat.JPEG2000:
color, ndpi, imgwidthpx, imgheightpx = get_imgmetadata(
imgdata, imgformat, default_dpi, colorspace, rawdata)
@@ -685,81 +786,106 @@ def read_images(rawdata, colorspace, first_frame_only=False):
if color == Colorspace['RGBA']:
raise JpegColorspaceError("jpeg can't have an alpha channel")
im.close()
- return [(color, ndpi, imgformat, rawdata, imgwidthpx, imgheightpx)]
- else:
- result = []
- img_page_count = 0
- # loop through all frames of the image (example: multipage TIFF)
- while True:
- try:
- imgdata.seek(img_page_count)
- except EOFError:
- break
+ return [(color, ndpi, imgformat, rawdata, imgwidthpx, imgheightpx, [])]
+
+ # We can directly embed the IDAT chunk of PNG images if the PNG is not
+ # interlaced
+ #
+ # PIL does not provide the information whether a PNG was stored interlaced
+ # or not. Thus, we retrieve that info manually by looking at byte 13 in the
+ # IHDR chunk. We know where to find that in the file because the IHDR chunk
+ # must be the first chunk.
+ if imgformat == ImageFormat.PNG and rawdata[28] == 0:
+ color, ndpi, imgwidthpx, imgheightpx = get_imgmetadata(
+ imgdata, imgformat, default_dpi, colorspace, rawdata)
+ pngidat, palette = parse_png(rawdata)
+ return [(color, ndpi, imgformat, pngidat, imgwidthpx, imgheightpx, palette)]
- if first_frame_only and img_page_count > 0:
- break
+ # Everything else has to be encoded
- logging.debug("Converting frame: %d" % img_page_count)
+ result = []
+ img_page_count = 0
+ # loop through all frames of the image (example: multipage TIFF)
+ while True:
+ try:
+ imgdata.seek(img_page_count)
+ except EOFError:
+ break
- color, ndpi, imgwidthpx, imgheightpx = get_imgmetadata(
- imgdata, imgformat, default_dpi, colorspace)
+ if first_frame_only and img_page_count > 0:
+ break
- newimg = None
- if color == Colorspace['1']:
- try:
- ccittdata = transcode_monochrome(imgdata)
- imgformat = ImageFormat.CCITTGroup4
- result.append((color, ndpi, imgformat, ccittdata,
- imgwidthpx, imgheightpx))
- img_page_count += 1
- continue
- except Exception as e:
- logging.debug(e)
- logging.debug("Converting colorspace 1 to L")
- newimg = imgdata.convert('L')
- color = Colorspace.L
- elif color in [Colorspace.RGB, Colorspace.L, Colorspace.CMYK,
- Colorspace["CMYK;I"]]:
- logging.debug("Colorspace is OK: %s", color)
- newimg = imgdata
- elif color in [Colorspace.RGBA, Colorspace.P, Colorspace.other]:
- logging.debug("Converting colorspace %s to RGB", color)
- newimg = imgdata.convert('RGB')
- color = Colorspace.RGB
- else:
- raise ValueError("unknown colorspace: %s" % color.name)
+ logging.debug("Converting frame: %d" % img_page_count)
+
+ color, ndpi, imgwidthpx, imgheightpx = get_imgmetadata(
+ imgdata, imgformat, default_dpi, colorspace)
+
+ newimg = None
+ if color == Colorspace['1']:
+ try:
+ ccittdata = transcode_monochrome(imgdata)
+ imgformat = ImageFormat.CCITTGroup4
+ result.append((color, ndpi, imgformat, ccittdata,
+ imgwidthpx, imgheightpx, []))
+ img_page_count += 1
+ continue
+ except Exception as e:
+ logging.debug(e)
+ logging.debug("Converting colorspace 1 to L")
+ newimg = imgdata.convert('L')
+ color = Colorspace.L
+ elif color in [Colorspace.RGB, Colorspace.L, Colorspace.CMYK,
+ Colorspace["CMYK;I"], Colorspace.P]:
+ logging.debug("Colorspace is OK: %s", color)
+ newimg = imgdata
+ else:
+ raise ValueError("unknown or unsupported colorspace: %s" % color.name)
+ # the PNG format does not support CMYK, so we fall back to normal
+ # compression
+ if color in [Colorspace.CMYK, Colorspace["CMYK;I"]]:
imggz = zlib.compress(newimg.tobytes())
result.append((color, ndpi, imgformat, imggz, imgwidthpx,
- imgheightpx))
- img_page_count += 1
- # the python-pil version 2.3.0-1ubuntu3 in Ubuntu does not have the
- # close() method
- try:
- imgdata.close()
- except AttributeError:
- pass
- im.close()
- return result
+ imgheightpx, []))
+ else:
+ # A cheap way to retrieve a PNG encoding of the payload is to just
+ # save it with PIL. In the future this could be replaced by a
+ # dedicated function applying the Paeth PNG filter to the raw pixels.
+ pngbuffer = BytesIO()
+ newimg.save(pngbuffer, format="png")
+ pngidat, palette = parse_png(pngbuffer.getvalue())
+ imgformat = ImageFormat.PNG
+ result.append((color, ndpi, imgformat, pngidat, imgwidthpx,
+ imgheightpx, palette))
+ img_page_count += 1
+ # the python-pil version 2.3.0-1ubuntu3 in Ubuntu does not have the
+ # close() method
+ try:
+ imgdata.close()
+ except AttributeError:
+ pass
+ im.close()
+ return result
# converts a length in pixels to a length in PDF units (1/72 of an inch)
def px_to_pt(length, dpi):
- return 72*length/dpi
+ return 72.0*length/dpi
def cm_to_pt(length):
- return (72*length)/2.54
+ return (72.0*length)/2.54
def mm_to_pt(length):
- return (72*length)/25.4
+ return (72.0*length)/25.4
def in_to_pt(length):
- return 72*length
+ return 72.0*length
-def get_layout_fun(pagesize, imgsize, border, fit, auto_orient):
+def get_layout_fun(pagesize=None, imgsize=None, border=None, fit=None,
+ auto_orient=False):
def fitfun(fit, imgwidth, imgheight, fitwidth, fitheight):
if fitwidth is None and fitheight is None:
raise ValueError("fitwidth and fitheight cannot both be None")
@@ -970,6 +1096,17 @@ def get_fixed_dpi_layout_fun(fixed_dpi):
return fixed_dpi_layout_fun
+def find_scale(pagewidth, pageheight):
+ """Find the power of 10 (10, 100, 1000...) that will reduce the scale
+ below the PDF specification limit of 14400 PDF units (=200 inches)"""
+ from math import log10, ceil
+
+ major = max(pagewidth, pageheight)
+ oversized = major / 14400.0
+
+ return 10 ** ceil(log10(oversized))
+
+
# given one or more input image, depending on outputstream, either return a
# string containing the whole PDF if outputstream is None or write the PDF
# data to the given file-like object and return None
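Note on find_scale above: the function picks the smallest power of ten that brings the longer page side under the 14400-unit limit. That factor becomes /UserUnit, and all page and image dimensions are divided by it. A worked sketch follows, using the poster size (97200 x 50400) from the test table in this commit. The constant name `PDF_MAX_UNITS` is an assumption introduced for readability.

```python
from math import ceil, log10

PDF_MAX_UNITS = 14400.0  # 200 inches at 72 units per inch


def find_scale(pagewidth, pageheight):
    """Smallest power of 10 that shrinks the longer side below the limit."""
    major = max(pagewidth, pageheight)
    return 10 ** ceil(log10(major / PDF_MAX_UNITS))


width, height = 97200.0, 50400.0
userunit = find_scale(width, height)
# 97200 / 14400 = 6.75, log10(6.75) is about 0.83, so the factor is 10**1
print(userunit, width / userunit, height / userunit)
```

A viewer that honours /UserUnit multiplies the stored 9720 x 5040 media box by the factor again, so the physical page size is unchanged.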
@@ -977,20 +1114,31 @@ def get_fixed_dpi_layout_fun(fixed_dpi):
# Input images can be given as file like objects (they must implement read()),
# as a binary string representing the image content or as filenames to the
# images.
-def convert(*images, title=None,
- author=None, creator=None, producer=None, creationdate=None,
- moddate=None, subject=None, keywords=None, colorspace=None,
- nodate=False, layout_fun=default_layout_fun, viewer_panes=None,
- viewer_initial_page=None, viewer_magnification=None,
- viewer_page_layout=None, viewer_fit_window=False,
- viewer_center_window=False, viewer_fullscreen=False,
- with_pdfrw=True, outputstream=None, first_frame_only=False):
-
- pdf = pdfdoc("1.3", title, author, creator, producer, creationdate,
- moddate, subject, keywords, nodate, viewer_panes,
- viewer_initial_page, viewer_magnification, viewer_page_layout,
- viewer_fit_window, viewer_center_window, viewer_fullscreen,
- with_pdfrw)
+def convert(*images, **kwargs):
+
+ _default_kwargs = dict(
+ title=None,
+ author=None, creator=None, producer=None, creationdate=None,
+ moddate=None, subject=None, keywords=None, colorspace=None,
+ nodate=False, layout_fun=default_layout_fun, viewer_panes=None,
+ viewer_initial_page=None, viewer_magnification=None,
+ viewer_page_layout=None, viewer_fit_window=False,
+ viewer_center_window=False, viewer_fullscreen=False,
+ with_pdfrw=True, outputstream=None, first_frame_only=False,
+ allow_oversized=True)
+ for kwname, default in _default_kwargs.items():
+ if kwname not in kwargs:
+ kwargs[kwname] = default
+
+ pdf = pdfdoc(
+ "1.3",
+ kwargs['title'], kwargs['author'], kwargs['creator'],
+ kwargs['producer'], kwargs['creationdate'], kwargs['moddate'],
+ kwargs['subject'], kwargs['keywords'], kwargs['nodate'],
+ kwargs['viewer_panes'], kwargs['viewer_initial_page'],
+ kwargs['viewer_magnification'], kwargs['viewer_page_layout'],
+ kwargs['viewer_fit_window'], kwargs['viewer_center_window'],
+ kwargs['viewer_fullscreen'], kwargs['with_pdfrw'])
# backwards compatibility with older img2pdf versions where the first
# argument to the function had to be given as a list
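Note on the convert() signature above: Python 2 does not accept named parameters after `*images`, so the patch collects everything in `**kwargs` and fills in defaults by hand. A minimal sketch of the same pattern follows, with a hypothetical function `describe` and a shortened default table. The check for unknown keywords is an addition for illustration and does not appear in the hunks shown here.

```python
_DEFAULTS = dict(title=None, nodate=False, allow_oversized=True)


def describe(*images, **kwargs):
    """Emulate keyword-only arguments in a way that also runs on Python 2."""
    # Illustrative addition: reject misspelled option names early.
    unknown = sorted(set(kwargs) - set(_DEFAULTS))
    if unknown:
        raise TypeError(
            "unexpected keyword argument(s): %s" % ", ".join(unknown))
    # Fill in every option the caller did not pass explicitly.
    for name, default in _DEFAULTS.items():
        kwargs.setdefault(name, default)
    return images, kwargs


print(describe("a.png", "b.png", title="scan"))
```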
@@ -999,6 +1147,9 @@ def convert(*images, title=None,
if isinstance(images[0], (list, tuple)):
images = images[0]
+ if not isinstance(images, (list, tuple)):
+ images = [images]
+
for img in images:
# img is allowed to be a path, a binary string representing image data
# or a file-like object (really anything that implements read())
@@ -1019,25 +1170,35 @@ def convert(*images, title=None,
# name so we now try treating it as raw image content
rawdata = img
- for color, ndpi, imgformat, imgdata, imgwidthpx, imgheightpx \
- in read_images(rawdata, colorspace, first_frame_only):
+ for color, ndpi, imgformat, imgdata, imgwidthpx, imgheightpx, palette \
+ in read_images(
+ rawdata, kwargs['colorspace'], kwargs['first_frame_only']):
pagewidth, pageheight, imgwidthpdf, imgheightpdf = \
- layout_fun(imgwidthpx, imgheightpx, ndpi)
+ kwargs['layout_fun'](imgwidthpx, imgheightpx, ndpi)
+
+ userunit = None
if pagewidth < 3.00 or pageheight < 3.00:
logging.warning("pdf width or height is below 3.00 - too "
"small for some viewers!")
elif pagewidth > 14400.0 or pageheight > 14400.0:
- raise PdfTooLargeError(
+ if kwargs['allow_oversized']:
+ userunit = find_scale(pagewidth, pageheight)
+ pagewidth /= userunit
+ pageheight /= userunit
+ imgwidthpdf /= userunit
+ imgheightpdf /= userunit
+ else:
+ raise PdfTooLargeError(
"pdf width or height must not exceed 200 inches.")
# the image is always centered on the page
imgxpdf = (pagewidth - imgwidthpdf)/2.0
imgypdf = (pageheight - imgheightpdf)/2.0
pdf.add_imagepage(color, imgwidthpx, imgheightpx, imgformat,
imgdata, imgwidthpdf, imgheightpdf, imgxpdf,
- imgypdf, pagewidth, pageheight)
+ imgypdf, pagewidth, pageheight, userunit, palette)
- if outputstream:
- pdf.tostream(outputstream)
+ if kwargs['outputstream']:
+ pdf.tostream(kwargs['outputstream'])
return
return pdf.tostring()
@@ -1318,15 +1479,13 @@ def main():
parser = argparse.ArgumentParser(
formatter_class=argparse.RawDescriptionHelpFormatter,
description='''\
-Losslessly convert raster images to PDF without re-encoding JPEG and JPEG2000
-images. This leads to a lossless conversion of JPEG and JPEG2000 images with
-the only added file size coming from the PDF container itself.
-
-Other raster graphics formats are losslessly stored in a zip/flate encoding of
-their RGB representation. This might increase file size and does not store
-transparency. There is nothing that can be done about that until the PDF format
-allows embedding other image formats like PNG. Thus, img2pdf is primarily
-useful to convert JPEG and JPEG2000 images to PDF.
+Losslessly convert raster images to PDF without re-encoding PNG, JPEG, and
+JPEG2000 images. This leads to a lossless conversion of PNG, JPEG and JPEG2000
+images with the only added file size coming from the PDF container itself.
+Other raster graphics formats are losslessly stored using the same encoding
+that PNG uses. Since PDF does not support images with transparency and since
+img2pdf aims to never be lossy, input images with an alpha channel are not
+supported.
The output is sent to standard output so that it can be redirected into a file
or to another program as part of a shell pipe. To directly write the output
@@ -1501,6 +1660,15 @@ RGB.''')
"input image be converted into a page in the resulting PDF."
)
+ outargs.add_argument(
+ "--pillow-limit-break", action="store_true",
+ help="img2pdf uses the Python Imaging Library Pillow to read input "
+ "images. Pillow limits the maximum input image size to %d pixels "
+ "to prevent decompression bomb denial of service attacks. If "
+ "your input image contains more pixels than that, use this "
+ "option to disable this safety measure during this run of img2pdf"
+ %Image.MAX_IMAGE_PIXELS)
+
sizeargs = parser.add_argument_group(
title='Image and page size and layout arguments',
description='''\
@@ -1674,6 +1842,9 @@ values set via the --border option.
if args.verbose:
logging.basicConfig(level=logging.DEBUG)
+ if args.pillow_limit_break:
+ Image.MAX_IMAGE_PIXELS = None
+
layout_fun = get_layout_fun(args.pagesize, args.imgsize, args.border,
args.fit, args.auto_orient)
diff --git a/src/tests/__init__.py b/src/tests/__init__.py
index 506fc48..b1c1797 100644
--- a/src/tests/__init__.py
+++ b/src/tests/__init__.py
@@ -1,14 +1,23 @@
import unittest
-import os
import img2pdf
+import os
+import struct
+import sys
import zlib
from PIL import Image
-from io import BytesIO
-import struct
+from io import StringIO, BytesIO
HERE = os.path.dirname(__file__)
+PY3 = sys.version_info[0] >= 3
+
+if PY3:
+ PdfReaderIO = StringIO
+else:
+ PdfReaderIO = BytesIO
+
+
# convert +set date:create +set date:modify -define png:exclude-chunk=time
# we define some variables so that the table below can be narrower
@@ -17,6 +26,7 @@ psp = (504, 972) # --pagesize portrait
isl = (756, 324) # --imgsize landscape
isp = (324, 756) # --imgsize portrait
border = (162, 270) # --border
+poster = (97200, 50400)
# there is no need to have test cases with the same images with inverted
# orientation (landscape/portrait) because --pagesize and --imgsize are
# already inverted
@@ -395,6 +405,8 @@ layout_test_cases = [
(972, 504), (864, 432)),
(psl, isl, border, f_enlarge, 1, (972, 504), (756, 252), # 179
(972, 504), (864, 432)),
+ (poster, None, None, f_fill, 0, (97200, 50400), (151200, 50400),
+ (97200, 50400), (100800, 50400)),
]
@@ -459,6 +471,10 @@ def test_suite():
files = os.listdir(os.path.join(HERE, "input"))
for with_pdfrw, test_name in [(a, b) for a in [True, False]
for b in files]:
+ # we do not test animation.gif with pdfrw because it doesn't support
+ # saving hexadecimal palette data
+ if test_name == 'animation.gif' and with_pdfrw:
+ continue
inputf = os.path.join(HERE, "input", test_name)
if not os.path.isfile(inputf):
continue
@@ -470,107 +486,142 @@ def test_suite():
orig_imgdata = inf.read()
output = img2pdf.convert(orig_imgdata, nodate=True,
with_pdfrw=with_pdfrw)
- from io import StringIO, BytesIO
from pdfrw import PdfReader, PdfName, PdfWriter
from pdfrw.py23_diffs import convert_load, convert_store
- x = PdfReader(StringIO(convert_load(output)))
+ x = PdfReader(PdfReaderIO(convert_load(output)))
self.assertEqual(sorted(x.keys()), [PdfName.Info, PdfName.Root,
PdfName.Size])
- self.assertEqual(x.Size, '7')
+ self.assertIn(x.Root.Pages.Count, ('1', '2'))
+ if x.Root.Pages.Count == '1':
+ self.assertEqual(x.Size, '7')
+ self.assertEqual(len(x.Root.Pages.Kids), 1)
+ elif x.Root.Pages.Count == '2':
+ self.assertEqual(x.Size, '10')
+ self.assertEqual(len(x.Root.Pages.Kids), 2)
self.assertEqual(x.Info, {})
self.assertEqual(sorted(x.Root.keys()), [PdfName.Pages,
PdfName.Type])
self.assertEqual(x.Root.Type, PdfName.Catalog)
self.assertEqual(sorted(x.Root.Pages.keys()),
[PdfName.Count, PdfName.Kids, PdfName.Type])
- self.assertEqual(x.Root.Pages.Count, '1')
self.assertEqual(x.Root.Pages.Type, PdfName.Pages)
- self.assertEqual(len(x.Root.Pages.Kids), 1)
- self.assertEqual(sorted(x.Root.Pages.Kids[0].keys()),
- [PdfName.Contents, PdfName.MediaBox,
- PdfName.Parent, PdfName.Resources, PdfName.Type])
- self.assertEqual(x.Root.Pages.Kids[0].MediaBox,
- ['0', '0', '115', '48'])
- self.assertEqual(x.Root.Pages.Kids[0].Parent, x.Root.Pages)
- self.assertEqual(x.Root.Pages.Kids[0].Type, PdfName.Page)
- self.assertEqual(x.Root.Pages.Kids[0].Resources.keys(),
- [PdfName.XObject])
- self.assertEqual(x.Root.Pages.Kids[0].Resources.XObject.keys(),
- [PdfName.Im0])
- self.assertEqual(x.Root.Pages.Kids[0].Contents.keys(),
- [PdfName.Length])
- self.assertEqual(x.Root.Pages.Kids[0].Contents.Length,
- str(len(x.Root.Pages.Kids[0].Contents.stream)))
- self.assertEqual(x.Root.Pages.Kids[0].Contents.stream,
- "q\n115.0000 0 0 48.0000 0.0000 0.0000 cm\n/Im0 "
- "Do\nQ")
+ orig_img = Image.open(f)
+ for pagenum in range(len(x.Root.Pages.Kids)):
+ # retrieve the original image frame that this page was
+ # generated from
+ orig_img.seek(pagenum)
+ cur_page = x.Root.Pages.Kids[pagenum]
- imgprops = x.Root.Pages.Kids[0].Resources.XObject.Im0
+ ndpi = orig_img.info.get("dpi", (96.0, 96.0))
+ # In python3, the returned dpi value for some tiff images will
+ # not be an integer but a float. To make the behaviour of
+ # img2pdf the same between python2 and python3, we convert that
+ # float into an integer by rounding.
+ # Search online for the 72.009 dpi problem for more info.
+ ndpi = (int(round(ndpi[0])), int(round(ndpi[1])))
+ imgwidthpx, imgheightpx = orig_img.size
+ pagewidth = 72.0*imgwidthpx/ndpi[0]
+ pageheight = 72.0*imgheightpx/ndpi[1]
- # test if the filter is valid:
- self.assertIn(
- imgprops.Filter, [[PdfName.DCTDecode], [PdfName.JPXDecode],
- [PdfName.FlateDecode],
- [PdfName.CCITTFaxDecode]])
- # test if the colorspace is valid
- self.assertIn(
- imgprops.ColorSpace, [PdfName.DeviceGray, PdfName.DeviceRGB,
- PdfName.DeviceCMYK])
- # test if the image has correct size
- orig_img = Image.open(f)
- self.assertEqual(imgprops.Width, str(orig_img.size[0]))
- self.assertEqual(imgprops.Height, str(orig_img.size[1]))
- # if the input file is a jpeg then it should've been copied
- # verbatim into the PDF
- if imgprops.Filter in [[PdfName.DCTDecode], [PdfName.JPXDecode]]:
- self.assertEqual(
- x.Root.Pages.Kids[0].Resources.XObject.Im0.stream,
- convert_load(orig_imgdata))
- elif imgprops.Filter == [PdfName.CCITTFaxDecode]:
- tiff_header = tiff_header_for_ccitt(
- int(imgprops.Width), int(imgprops.Height),
- int(imgprops.Length), 4)
- imgio = BytesIO()
- imgio.write(tiff_header)
- imgio.write(convert_store(
- x.Root.Pages.Kids[0].Resources.XObject.Im0.stream))
- imgio.seek(0)
- im = Image.open(imgio)
- self.assertEqual(im.tobytes(), orig_img.tobytes())
- try:
- im.close()
- except AttributeError:
- pass
+ def format_float(f):
+ if int(f) == f:
+ return str(int(f))
+ else:
+ return ("%.4f" % f).rstrip("0")
+
+ self.assertEqual(sorted(cur_page.keys()),
+ [PdfName.Contents, PdfName.MediaBox,
+ PdfName.Parent, PdfName.Resources,
+ PdfName.Type])
+ self.assertEqual(cur_page.MediaBox,
+ ['0', '0', format_float(pagewidth),
+ format_float(pageheight)])
+ self.assertEqual(cur_page.Parent, x.Root.Pages)
+ self.assertEqual(cur_page.Type, PdfName.Page)
+ self.assertEqual(cur_page.Resources.keys(),
+ [PdfName.XObject])
+ self.assertEqual(cur_page.Resources.XObject.keys(),
+ [PdfName.Im0])
+ self.assertEqual(cur_page.Contents.keys(),
+ [PdfName.Length])
+ self.assertEqual(cur_page.Contents.Length,
+ str(len(cur_page.Contents.stream)))
+ self.assertEqual(cur_page.Contents.stream,
+ "q\n%.4f 0 0 %.4f 0.0000 0.0000 cm\n"
+ "/Im0 Do\nQ" % (pagewidth, pageheight))
+
+ imgprops = cur_page.Resources.XObject.Im0
+
+ # test if the filter is valid:
+ self.assertIn(
+ imgprops.Filter, [PdfName.DCTDecode, PdfName.JPXDecode,
+ PdfName.FlateDecode,
+ [PdfName.CCITTFaxDecode]])
+
+ # test if the image has correct size
+ self.assertEqual(imgprops.Width, str(orig_img.size[0]))
+ self.assertEqual(imgprops.Height, str(orig_img.size[1]))
+ # if the input file is a jpeg then it should've been copied
+ # verbatim into the PDF
+ if imgprops.Filter in [PdfName.DCTDecode,
+ PdfName.JPXDecode]:
+ self.assertEqual(
+ cur_page.Resources.XObject.Im0.stream,
+ convert_load(orig_imgdata))
+ elif imgprops.Filter == [PdfName.CCITTFaxDecode]:
+ tiff_header = tiff_header_for_ccitt(
+ int(imgprops.Width), int(imgprops.Height),
+ int(imgprops.Length), 4)
+ imgio = BytesIO()
+ imgio.write(tiff_header)
+ imgio.write(convert_store(
+ cur_page.Resources.XObject.Im0.stream))
+ imgio.seek(0)
+ im = Image.open(imgio)
+ self.assertEqual(im.tobytes(), orig_img.tobytes())
+ try:
+ im.close()
+ except AttributeError:
+ pass
- elif imgprops.Filter == [PdfName.FlateDecode]:
- # otherwise, the data is flate encoded and has to be equal to
- # the pixel data of the input image
- imgdata = zlib.decompress(
- convert_store(
- x.Root.Pages.Kids[0].Resources.XObject.Im0.stream))
- colorspace = imgprops.ColorSpace
- if colorspace == PdfName.DeviceGray:
- colorspace = 'L'
- elif colorspace == PdfName.DeviceRGB:
- colorspace = 'RGB'
- elif colorspace == PdfName.DeviceCMYK:
- colorspace = 'CMYK'
- else:
- raise Exception("invalid colorspace")
- im = Image.frombytes(colorspace, (int(imgprops.Width),
- int(imgprops.Height)),
- imgdata)
- if orig_img.mode == '1':
- orig_img = orig_img.convert("L")
- elif orig_img.mode not in ("RGB", "L", "CMYK", "CMYK;I"):
- orig_img = orig_img.convert("RGB")
- self.assertEqual(im.tobytes(), orig_img.tobytes())
- # the python-pil version 2.3.0-1ubuntu3 in Ubuntu does not have
- # the close() method
- try:
- im.close()
- except AttributeError:
- pass
+ elif imgprops.Filter == PdfName.FlateDecode:
+ # otherwise, the data is flate encoded and has to be equal
+ # to the pixel data of the input image
+ imgdata = zlib.decompress(
+ convert_store(cur_page.Resources.XObject.Im0.stream))
+ if imgprops.DecodeParms:
+ if orig_img.format == 'PNG':
+ pngidat, palette = img2pdf.parse_png(orig_imgdata)
+ else:
+ pngbuffer = BytesIO()
+ orig_img.save(pngbuffer, format="png")
+ pngidat, palette = img2pdf.parse_png(pngbuffer.getvalue())
+ self.assertEqual(zlib.decompress(pngidat), imgdata)
+ else:
+ colorspace = imgprops.ColorSpace
+ if colorspace == PdfName.DeviceGray:
+ colorspace = 'L'
+ elif colorspace == PdfName.DeviceRGB:
+ colorspace = 'RGB'
+ elif colorspace == PdfName.DeviceCMYK:
+ colorspace = 'CMYK'
+ else:
+ raise Exception("invalid colorspace")
+ im = Image.frombytes(colorspace, (int(imgprops.Width),
+ int(imgprops.Height)),
+ imgdata)
+ if orig_img.mode == '1':
+ self.assertEqual(im.tobytes(),
+ orig_img.convert("L").tobytes())
+ elif orig_img.mode not in ("RGB", "L", "CMYK", "CMYK;I"):
+ self.assertEqual(im.tobytes(),
+ orig_img.convert("RGB").tobytes())
+ # the python-pil version 2.3.0-1ubuntu3 in Ubuntu does not
+ # have the close() method
+ try:
+ im.close()
+ except AttributeError:
+ pass
# now use pdfrw to parse and then write out both pdfs and check the
# result for equality
y = PdfReader(out)
diff --git a/src/tests/input/CMYK.tif b/src/tests/input/CMYK.tif
new file mode 100644
index 0000000..8e3803e
--- /dev/null
+++ b/src/tests/input/CMYK.tif
Binary files differ
diff --git a/src/tests/input/animation.gif b/src/tests/input/animation.gif
new file mode 100644
index 0000000..af4b278
--- /dev/null
+++ b/src/tests/input/animation.gif
Binary files differ
diff --git a/src/tests/input/gray.png b/src/tests/input/gray.png
new file mode 100644
index 0000000..48247fd
--- /dev/null
+++ b/src/tests/input/gray.png
Binary files differ
diff --git a/src/tests/input/mono.tif b/src/tests/input/mono.tif
new file mode 100644
index 0000000..53e85bc
--- /dev/null
+++ b/src/tests/input/mono.tif
Binary files differ
diff --git a/src/tests/input/normal.png b/src/tests/input/normal.png
index 87b9a6e..394f965 100644
--- a/src/tests/input/normal.png
+++ b/src/tests/input/normal.png
Binary files differ
diff --git a/src/tests/output/CMYK.jpg.pdf b/src/tests/output/CMYK.jpg.pdf
index bfe67f3..9efbe16 100644
--- a/src/tests/output/CMYK.jpg.pdf
+++ b/src/tests/output/CMYK.jpg.pdf
Binary files differ
diff --git a/src/tests/output/CMYK.tif.pdf b/src/tests/output/CMYK.tif.pdf
index b00586b..242bac7 100644
--- a/src/tests/output/CMYK.tif.pdf
+++ b/src/tests/output/CMYK.tif.pdf
Binary files differ
diff --git a/src/tests/output/animation.gif.pdf b/src/tests/output/animation.gif.pdf
new file mode 100644
index 0000000..fdfd460
--- /dev/null
+++ b/src/tests/output/animation.gif.pdf
Binary files differ
diff --git a/src/tests/output/gray.png.pdf b/src/tests/output/gray.png.pdf
new file mode 100644
index 0000000..3f2d4c3
--- /dev/null
+++ b/src/tests/output/gray.png.pdf
Binary files differ
diff --git a/src/tests/output/mono.png.pdf b/src/tests/output/mono.png.pdf
index eda3ec7..c773715 100644
--- a/src/tests/output/mono.png.pdf
+++ b/src/tests/output/mono.png.pdf
Binary files differ
diff --git a/src/tests/output/mono.tif.pdf b/src/tests/output/mono.tif.pdf
new file mode 100644
index 0000000..d23e65e
--- /dev/null
+++ b/src/tests/output/mono.tif.pdf
Binary files differ
diff --git a/src/tests/output/normal.jpg.pdf b/src/tests/output/normal.jpg.pdf
index 87d2645..7acbe20 100644
--- a/src/tests/output/normal.jpg.pdf
+++ b/src/tests/output/normal.jpg.pdf
Binary files differ
diff --git a/src/tests/output/normal.png.pdf b/src/tests/output/normal.png.pdf
index 2628c5d..971475f 100644
--- a/src/tests/output/normal.png.pdf
+++ b/src/tests/output/normal.png.pdf
Binary files differ