author Johannes Schauer Marin Rodrigues <josch@debian.org> 2022-04-08 13:59:51 +0200
committer Johannes Schauer Marin Rodrigues <josch@debian.org> 2022-04-08 13:59:51 +0200
commit 7abe2f2f089f38a0ba403da8f1459f5c6bf2ffa6
tree 90fb49f6079574eb37b8d96c199ebc17d488b667
parent 0b488a95d20ff071cdd5d20a4800e48bed8bfe58
New upstream version 0.4.4
-rw-r--r-- CHANGES.rst 12
-rw-r--r-- PKG-INFO 612
-rw-r--r-- README.md 33
-rw-r--r-- setup.py 4
-rw-r--r-- src/img2pdf.egg-info/PKG-INFO 612
-rwxr-xr-x src/img2pdf.py 163
-rwxr-xr-x src/img2pdf_test.py 152
-rw-r--r-- src/jp2.py 2
-rw-r--r-- src/tests/input/animation.gif bin 1930 -> 1962 bytes
-rw-r--r-- src/tests/output/animation.gif.pdf bin 6070 -> 6101 bytes
10 files changed, 857 insertions, 733 deletions
diff --git a/CHANGES.rst b/CHANGES.rst
index 68a0187..9d6b3f7 100644
--- a/CHANGES.rst
+++ b/CHANGES.rst
@@ -2,6 +2,18 @@
CHANGES
=======
+0.4.4 (2022-04-07)
+------------------
+
+ - --viewer-page-layout support for twopageright and twopageleft
+ - Add B and JB paper sizes
+ - support for pikepdf (>= 5.0.0) and Pillow (>= 9.1.0)
+
+0.4.3 (2021-10-24)
+------------------
+
+ - fix --viewer-initial-page (broken in last release)
+
0.4.2 (2021-10-11)
------------------
diff --git a/PKG-INFO b/PKG-INFO
index a1b430d..4b1c86a 100644
--- a/PKG-INFO
+++ b/PKG-INFO
@@ -1,312 +1,12 @@
Metadata-Version: 2.1
Name: img2pdf
-Version: 0.4.2
+Version: 0.4.4
Summary: Convert images to PDF via direct JPEG inclusion.
Home-page: https://gitlab.mister-muffin.de/josch/img2pdf
-Author: Johannes 'josch' Schauer
+Author: Johannes Schauer Marin Rodrigues
Author-email: josch@mister-muffin.de
License: LGPL
-Download-URL: https://gitlab.mister-muffin.de/josch/img2pdf/repository/archive.tar.gz?ref=0.4.2
-Description: [![Travis Status](https://travis-ci.com/josch/img2pdf.svg?branch=main)](https://app.travis-ci.com/josch/img2pdf)
- [![Appveyor Status](https://ci.appveyor.com/api/projects/status/2kws3wkqvi526llj/branch/main?svg=true)](https://ci.appveyor.com/project/josch/img2pdf/branch/main)
-
- img2pdf
- =======
-
- Lossless conversion of raster images to PDF. You should use img2pdf if your
- priorities are (in this order):
-
- 1. **always lossless**: the image embedded in the PDF will always have the
- exact same color information for every pixel as the input
- 2. **small**: if possible, the difference in filesize between the input image
- and the output PDF will only be the overhead of the PDF container itself
- 3. **fast**: if possible, the input image is just pasted into the PDF document
- as-is without any CPU hungry re-encoding of the pixel data
-
- Conventional conversion software (like ImageMagick) would either:
-
- 1. not be lossless because lossy re-encoding to JPEG
- 2. not be small because using wasteful flate encoding of raw pixel data
- 3. not be fast because input data gets re-encoded
-
- Another advantage of not having to re-encode the input (in most common
- situations) is, that img2pdf is able to handle much larger input than other
- software, because the raw pixel data never has to be loaded into memory.
-
- The following table shows how img2pdf handles different input depending on the
- input file format and image color space.
-
- | Format | Colorspace | Result |
- | -------------------- | ------------------------------ | ------------- |
- | JPEG | any | direct |
- | JPEG2000 | any | direct |
- | PNG (non-interlaced) | any | direct |
- | TIFF (CCITT Group 4) | monochrome | direct |
- | any | any except CMYK and monochrome | PNG Paeth |
- | any | monochrome | CCITT Group 4 |
- | any | CMYK | flate |
-
- For JPEG, JPEG2000, non-interlaced PNG and TIFF images with CCITT Group 4
- encoded data, img2pdf directly embeds the image data into the PDF without
- re-encoding it. It thus treats the PDF format merely as a container format for
- the image data. In these cases, img2pdf only increases the filesize by the size
- of the PDF container (typically around 500 to 700 bytes). Since data is only
- copied and not re-encoded, img2pdf is also typically faster than other
- solutions for these input formats.
-
- For all other input types, img2pdf first has to transform the pixel data to
- make it compatible with PDF. In most cases, the PNG Paeth filter is applied to
- the pixel data. For monochrome input, CCITT Group 4 is used instead. Only for
- CMYK input no filter is applied before finally applying flate compression.
-
- Usage
- -----
-
- The images must be provided as files because img2pdf needs to seek in the file
- descriptor.
-
- If no output file is specified with the `-o`/`--output` option, output will be
- done to stdout. A typical invocation is:
-
- $ img2pdf img1.png img2.jpg -o out.pdf
-
- The detailed documentation can be accessed by running:
-
- $ img2pdf --help
-
- Bugs
- ----
-
- - If you find a JPEG, JPEG2000, PNG or CCITT Group 4 encoded TIFF file that,
- when embedded into the PDF cannot be read by the Adobe Acrobat Reader,
- please contact me.
-
- - I have not yet figured out how to determine the colorspace of JPEG2000
- files. Therefore JPEG2000 files use DeviceRGB by default. For JPEG2000
- files with other colorspaces, you must explicitly specify it using the
- `--colorspace` option.
-
- - An error is produced if the input image is broken. This commonly happens if
- the input image has an invalid EXIF Orientation value of zero. Even though
- only nine different values from 1 to 9 are permitted, Anroid phones and
- Canon DSLR cameras produce JPEG images with the invalid value of zero.
- Either fix your input images with `exiftool` or similar software before
- passing the JPEG to `img2pdf` or run `img2pdf` with `--rotation=ifvalid`.
-
- - img2pdf uses PIL (or Pillow) to obtain image meta data and to convert the
- input if necessary. To prevent decompression bomb denial of service attacks,
- Pillow limits the maximum number of pixels an input image is allowed to
- have. If you are sure that you know what you are doing, then you can disable
- this safeguard by passing the `--pillow-limit-break` option to img2pdf. This
- allows one to process even very large input images.
-
- Installation
- ------------
-
- On a Debian- and Ubuntu-based systems, img2pdf can be installed from the
- official repositories:
-
- $ apt install img2pdf
-
- If you want to install it using pip, you can run:
-
- $ pip3 install img2pdf
-
- If you prefer to install from source code use:
-
- $ cd img2pdf/
- $ pip3 install .
-
- To test the console script without installing the package on your system,
- use virtualenv:
-
- $ cd img2pdf/
- $ virtualenv ve
- $ ve/bin/pip3 install .
-
- You can then test the converter using:
-
- $ ve/bin/img2pdf -o test.pdf src/tests/test.jpg
-
- For Microsoft Windows users, PyInstaller based .exe files are produced by
- appveyor. If you don't want to install Python before using img2pdf you can head
- to appveyor and click on "Artifacts" to download the latest version:
- https://ci.appveyor.com/project/josch/img2pdf
-
- GUI
- ---
-
- There exists an experimental GUI with all settings currently disabled. You can
- directly convert images to PDF but you cannot set any options via the GUI yet.
- If you are interested in adding more features to the PDF, please submit a merge
- request. The GUI is based on tkinter and works on Linux, Windows and MacOS.
-
- ![](screenshot.png)
-
- Library
- -------
-
- The package can also be used as a library:
-
- import img2pdf
-
- # opening from filename
- with open("name.pdf","wb") as f:
- f.write(img2pdf.convert('test.jpg'))
-
- # opening from file handle
- with open("name.pdf","wb") as f1, open("test.jpg") as f2:
- f1.write(img2pdf.convert(f2))
-
- # using in-memory image data
- with open("name.pdf","wb") as f:
- f.write(img2pdf.convert("\x89PNG...")
-
- # multiple inputs (variant 1)
- with open("name.pdf","wb") as f:
- f.write(img2pdf.convert("test1.jpg", "test2.png"))
-
- # multiple inputs (variant 2)
- with open("name.pdf","wb") as f:
- f.write(img2pdf.convert(["test1.jpg", "test2.png"]))
-
- # convert all files ending in .jpg inside a directory
- dirname = "/path/to/images"
- imgs = []
- for fname in os.listdir(dirname):
- if not fname.endswith(".jpg"):
- continue
- path = os.path.join(dirname, fname)
- if os.path.isdir(path):
- continue
- imgs.append(path)
- with open("name.pdf","wb") as f:
- f.write(img2pdf.convert(imgs))
-
- # convert all files ending in .jpg in a directory and its subdirectories
- dirname = "/path/to/images"
- imgs = []
- for r, _, f in os.walk(dirname):
- for fname in f:
- if not fname.endswith(".jpg"):
- continue
- imgs.append(os.path.join(r, fname))
- with open("name.pdf","wb") as f:
- f.write(img2pdf.convert(imgs))
-
-
- # convert all files matching a glob
- import glob
- with open("name.pdf","wb") as f:
- f.write(img2pdf.convert(glob.glob("/path/to/*.jpg")))
-
- # writing to file descriptor
- with open("name.pdf","wb") as f1, open("test.jpg") as f2:
- img2pdf.convert(f2, outputstream=f1)
-
- # specify paper size (A4)
- a4inpt = (img2pdf.mm_to_pt(210),img2pdf.mm_to_pt(297))
- layout_fun = img2pdf.get_layout_fun(a4inpt)
- with open("name.pdf","wb") as f:
- f.write(img2pdf.convert('test.jpg', layout_fun=layout_fun))
-
- # use a fixed dpi of 300 instead of reading it from the image
- dpix = dpiy = 300
- layout_fun = img2pdf.get_fixed_dpi_layout_fun((dpix, dpiy))
- with open("name.pdf","wb") as f:
- f.write(img2pdf.convert('test.jpg', layout_fun=layout_fun))
-
- # create a PDF/A-1b compliant document by passing an ICC profile
- with open("name.pdf","wb") as f:
- f.write(img2pdf.convert('test.jpg', pdfa="/usr/share/color/icc/sRGB.icc"))
-
- Comparison to ImageMagick
- -------------------------
-
- Create a large test image:
-
- $ convert logo: -resize 8000x original.jpg
-
- Convert it into PDF using ImageMagick and img2pdf:
-
- $ time img2pdf original.jpg -o img2pdf.pdf
- $ time convert original.jpg imagemagick.pdf
-
- Notice how ImageMagick took an order of magnitude longer to do the conversion
- than img2pdf. It also used twice the memory.
-
- Now extract the image data from both PDF documents and compare it to the
- original:
-
- $ pdfimages -all img2pdf.pdf tmp
- $ compare -metric AE original.jpg tmp-000.jpg null:
- 0
- $ pdfimages -all imagemagick.pdf tmp
- $ compare -metric AE original.jpg tmp-000.jpg null:
- 118716
-
- To get lossless output with ImageMagick we can use Zip compression but that
- unnecessarily increases the size of the output:
-
- $ convert original.jpg -compress Zip imagemagick.pdf
- $ pdfimages -all imagemagick.pdf tmp
- $ compare -metric AE original.jpg tmp-000.png null:
- 0
- $ stat --format="%s %n" original.jpg img2pdf.pdf imagemagick.pdf
- 1535837 original.jpg
- 1536683 img2pdf.pdf
- 9397809 imagemagick.pdf
-
- Comparison to pdfLaTeX
- ----------------------
-
- pdfLaTeX performs a lossless conversion from included images to PDF by default.
- If the input is a JPEG, then it simply embeds the JPEG into the PDF in the same
- way as img2pdf does it. But for other image formats it uses flate compression
- of the plain pixel data and thus needlessly increases the output file size:
-
- $ convert logo: -resize 8000x original.png
- $ cat << END > pdflatex.tex
- \documentclass{article}
- \usepackage{graphicx}
- \begin{document}
- \includegraphics{original.png}
- \end{document}
- END
- $ pdflatex pdflatex.tex
- $ stat --format="%s %n" original.png pdflatex.pdf
- 4500182 original.png
- 9318120 pdflatex.pdf
-
- Comparison to podofoimg2pdf
- ---------------------------
-
- Like pdfLaTeX, podofoimg2pdf is able to perform a lossless conversion from JPEG
- to PDF by plainly embedding the JPEG data into the pdf container. But just like
- pdfLaTeX it uses flate compression for all other file formats, thus sometimes
- resulting in larger files than necessary.
-
- $ convert logo: -resize 8000x original.png
- $ podofoimg2pdf out.pdf original.png
- stat --format="%s %n" original.png out.pdf
- 4500181 original.png
- 9335629 out.pdf
-
- It also only supports JPEG, PNG and TIF as input and lacks many of the
- convenience features of img2pdf like page sizes, borders, rotation and
- metadata.
-
- Comparison to Tesseract OCR
- ---------------------------
-
- Tesseract OCR comes closest to the functionality img2pdf provides. It is able
- to convert JPEG and PNG input to PDF without needlessly increasing the filesize
- and is at the same time lossless. So if your input is JPEG and PNG images, then
- you should safely be able to use Tesseract instead of img2pdf. For other input,
- Tesseract might not do a lossless conversion. For example it converts CMYK
- input to RGB and removes the alpha channel from images with transparency. For
- multipage TIFF or animated GIF, it will only convert the first frame.
-
+Download-URL: https://gitlab.mister-muffin.de/josch/img2pdf/repository/archive.tar.gz?ref=0.4.4
Keywords: jpeg pdf converter
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
@@ -323,3 +23,309 @@ Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Description-Content-Type: text/markdown
Provides-Extra: gui
+License-File: LICENSE
+
+[![Travis Status](https://travis-ci.com/josch/img2pdf.svg?branch=main)](https://app.travis-ci.com/josch/img2pdf)
+[![Appveyor Status](https://ci.appveyor.com/api/projects/status/2kws3wkqvi526llj/branch/main?svg=true)](https://ci.appveyor.com/project/josch/img2pdf/branch/main)
+
+img2pdf
+=======
+
+Lossless conversion of raster images to PDF. You should use img2pdf if your
+priorities are (in this order):
+
+ 1. **always lossless**: the image embedded in the PDF will always have the
+ exact same color information for every pixel as the input
+ 2. **small**: if possible, the difference in filesize between the input image
+ and the output PDF will only be the overhead of the PDF container itself
+ 3. **fast**: if possible, the input image is just pasted into the PDF document
+ as-is without any CPU hungry re-encoding of the pixel data
+
+Conventional conversion software (like ImageMagick) would either:
+
+ 1. not be lossless because lossy re-encoding to JPEG
+ 2. not be small because using wasteful flate encoding of raw pixel data
+ 3. not be fast because input data gets re-encoded
+
+Another advantage of not having to re-encode the input (in most common
+situations) is, that img2pdf is able to handle much larger input than other
+software, because the raw pixel data never has to be loaded into memory.
+
+The following table shows how img2pdf handles different input depending on the
+input file format and image color space.
+
+| Format | Colorspace | Result |
+| ------------------------------------- | ------------------------------ | ------------- |
+| JPEG | any | direct |
+| JPEG2000 | any | direct |
+| PNG (non-interlaced, no transparency) | any | direct |
+| TIFF (CCITT Group 4) | monochrome | direct |
+| any | any except CMYK and monochrome | PNG Paeth |
+| any | monochrome | CCITT Group 4 |
+| any | CMYK | flate |
+
+For JPEG, JPEG2000, non-interlaced PNG and TIFF images with CCITT Group 4
+encoded data, img2pdf directly embeds the image data into the PDF without
+re-encoding it. It thus treats the PDF format merely as a container format for
+the image data. In these cases, img2pdf only increases the filesize by the size
+of the PDF container (typically around 500 to 700 bytes). Since data is only
+copied and not re-encoded, img2pdf is also typically faster than other
+solutions for these input formats.
+
+For all other input types, img2pdf first has to transform the pixel data to
+make it compatible with PDF. In most cases, the PNG Paeth filter is applied to
+the pixel data. For monochrome input, CCITT Group 4 is used instead. Only for
+CMYK input no filter is applied before finally applying flate compression.
+
+Usage
+-----
+
+The images must be provided as files because img2pdf needs to seek in the file
+descriptor.
+
+If no output file is specified with the `-o`/`--output` option, output will be
+done to stdout. A typical invocation is:
+
+ $ img2pdf img1.png img2.jpg -o out.pdf
+
+The detailed documentation can be accessed by running:
+
+ $ img2pdf --help
+
+Bugs
+----
+
+ - If you find a JPEG, JPEG2000, PNG or CCITT Group 4 encoded TIFF file that,
+ when embedded into the PDF cannot be read by the Adobe Acrobat Reader,
+ please contact me.
+
+ - An error is produced if the input image is broken. This commonly happens if
+ the input image has an invalid EXIF Orientation value of zero. Even though
+ only nine different values from 1 to 9 are permitted, Anroid phones and
+ Canon DSLR cameras produce JPEG images with the invalid value of zero.
+ Either fix your input images with `exiftool` or similar software before
+ passing the JPEG to `img2pdf` or run `img2pdf` with `--rotation=ifvalid`
+ (if you run img2pdf from the commandline) or by passing
+ `rotation=img2pdf.Rotation.ifvalid` as an argument to `convert()` when using
+ img2pdf as a library.
+
+ - img2pdf uses PIL (or Pillow) to obtain image meta data and to convert the
+ input if necessary. To prevent decompression bomb denial of service attacks,
+ Pillow limits the maximum number of pixels an input image is allowed to
+ have. If you are sure that you know what you are doing, then you can disable
+ this safeguard by passing the `--pillow-limit-break` option to img2pdf. This
+ allows one to process even very large input images.
+
+Installation
+------------
+
+On a Debian- and Ubuntu-based systems, img2pdf can be installed from the
+official repositories:
+
+ $ apt install img2pdf
+
+If you want to install it using pip, you can run:
+
+ $ pip3 install img2pdf
+
+If you prefer to install from source code use:
+
+ $ cd img2pdf/
+ $ pip3 install .
+
+To test the console script without installing the package on your system,
+use virtualenv:
+
+ $ cd img2pdf/
+ $ virtualenv ve
+ $ ve/bin/pip3 install .
+
+You can then test the converter using:
+
+ $ ve/bin/img2pdf -o test.pdf src/tests/test.jpg
+
+For Microsoft Windows users, PyInstaller based .exe files are produced by
+appveyor. If you don't want to install Python before using img2pdf you can head
+to appveyor and click on "Artifacts" to download the latest version:
+https://ci.appveyor.com/project/josch/img2pdf
+
+GUI
+---
+
+There exists an experimental GUI with all settings currently disabled. You can
+directly convert images to PDF but you cannot set any options via the GUI yet.
+If you are interested in adding more features to the PDF, please submit a merge
+request. The GUI is based on tkinter and works on Linux, Windows and MacOS.
+
+![](screenshot.png)
+
+Library
+-------
+
+The package can also be used as a library:
+
+ import img2pdf
+
+ # opening from filename
+ with open("name.pdf","wb") as f:
+ f.write(img2pdf.convert('test.jpg'))
+
+ # opening from file handle
+ with open("name.pdf","wb") as f1, open("test.jpg") as f2:
+ f1.write(img2pdf.convert(f2))
+
+ # using in-memory image data
+ with open("name.pdf","wb") as f:
+ f.write(img2pdf.convert("\x89PNG...")
+
+ # multiple inputs (variant 1)
+ with open("name.pdf","wb") as f:
+ f.write(img2pdf.convert("test1.jpg", "test2.png"))
+
+ # multiple inputs (variant 2)
+ with open("name.pdf","wb") as f:
+ f.write(img2pdf.convert(["test1.jpg", "test2.png"]))
+
+ # convert all files ending in .jpg inside a directory
+ dirname = "/path/to/images"
+ imgs = []
+ for fname in os.listdir(dirname):
+ if not fname.endswith(".jpg"):
+ continue
+ path = os.path.join(dirname, fname)
+ if os.path.isdir(path):
+ continue
+ imgs.append(path)
+ with open("name.pdf","wb") as f:
+ f.write(img2pdf.convert(imgs))
+
+ # convert all files ending in .jpg in a directory and its subdirectories
+ dirname = "/path/to/images"
+ imgs = []
+ for r, _, f in os.walk(dirname):
+ for fname in f:
+ if not fname.endswith(".jpg"):
+ continue
+ imgs.append(os.path.join(r, fname))
+ with open("name.pdf","wb") as f:
+ f.write(img2pdf.convert(imgs))
+
+
+ # convert all files matching a glob
+ import glob
+ with open("name.pdf","wb") as f:
+ f.write(img2pdf.convert(glob.glob("/path/to/*.jpg")))
+
+ # ignore invalid rotation values in the input images
+ with open("name.pdf","wb") as f:
+ f.write(img2pdf.convert('test.jpg'), rotation=img2pdf.Rotation.ifvalid)
+
+ # writing to file descriptor
+ with open("name.pdf","wb") as f1, open("test.jpg") as f2:
+ img2pdf.convert(f2, outputstream=f1)
+
+ # specify paper size (A4)
+ a4inpt = (img2pdf.mm_to_pt(210),img2pdf.mm_to_pt(297))
+ layout_fun = img2pdf.get_layout_fun(a4inpt)
+ with open("name.pdf","wb") as f:
+ f.write(img2pdf.convert('test.jpg', layout_fun=layout_fun))
+
+ # use a fixed dpi of 300 instead of reading it from the image
+ dpix = dpiy = 300
+ layout_fun = img2pdf.get_fixed_dpi_layout_fun((dpix, dpiy))
+ with open("name.pdf","wb") as f:
+ f.write(img2pdf.convert('test.jpg', layout_fun=layout_fun))
+
+ # create a PDF/A-1b compliant document by passing an ICC profile
+ with open("name.pdf","wb") as f:
+ f.write(img2pdf.convert('test.jpg', pdfa="/usr/share/color/icc/sRGB.icc"))
+
+Comparison to ImageMagick
+-------------------------
+
+Create a large test image:
+
+ $ convert logo: -resize 8000x original.jpg
+
+Convert it into PDF using ImageMagick and img2pdf:
+
+ $ time img2pdf original.jpg -o img2pdf.pdf
+ $ time convert original.jpg imagemagick.pdf
+
+Notice how ImageMagick took an order of magnitude longer to do the conversion
+than img2pdf. It also used twice the memory.
+
+Now extract the image data from both PDF documents and compare it to the
+original:
+
+ $ pdfimages -all img2pdf.pdf tmp
+ $ compare -metric AE original.jpg tmp-000.jpg null:
+ 0
+ $ pdfimages -all imagemagick.pdf tmp
+ $ compare -metric AE original.jpg tmp-000.jpg null:
+ 118716
+
+To get lossless output with ImageMagick we can use Zip compression but that
+unnecessarily increases the size of the output:
+
+ $ convert original.jpg -compress Zip imagemagick.pdf
+ $ pdfimages -all imagemagick.pdf tmp
+ $ compare -metric AE original.jpg tmp-000.png null:
+ 0
+ $ stat --format="%s %n" original.jpg img2pdf.pdf imagemagick.pdf
+ 1535837 original.jpg
+ 1536683 img2pdf.pdf
+ 9397809 imagemagick.pdf
+
+Comparison to pdfLaTeX
+----------------------
+
+pdfLaTeX performs a lossless conversion from included images to PDF by default.
+If the input is a JPEG, then it simply embeds the JPEG into the PDF in the same
+way as img2pdf does it. But for other image formats it uses flate compression
+of the plain pixel data and thus needlessly increases the output file size:
+
+ $ convert logo: -resize 8000x original.png
+ $ cat << END > pdflatex.tex
+ \documentclass{article}
+ \usepackage{graphicx}
+ \begin{document}
+ \includegraphics{original.png}
+ \end{document}
+ END
+ $ pdflatex pdflatex.tex
+ $ stat --format="%s %n" original.png pdflatex.pdf
+ 4500182 original.png
+ 9318120 pdflatex.pdf
+
+Comparison to podofoimg2pdf
+---------------------------
+
+Like pdfLaTeX, podofoimg2pdf is able to perform a lossless conversion from JPEG
+to PDF by plainly embedding the JPEG data into the pdf container. But just like
+pdfLaTeX it uses flate compression for all other file formats, thus sometimes
+resulting in larger files than necessary.
+
+ $ convert logo: -resize 8000x original.png
+ $ podofoimg2pdf out.pdf original.png
+ stat --format="%s %n" original.png out.pdf
+ 4500181 original.png
+ 9335629 out.pdf
+
+It also only supports JPEG, PNG and TIF as input and lacks many of the
+convenience features of img2pdf like page sizes, borders, rotation and
+metadata.
+
+Comparison to Tesseract OCR
+---------------------------
+
+Tesseract OCR comes closest to the functionality img2pdf provides. It is able
+to convert JPEG and PNG input to PDF without needlessly increasing the filesize
+and is at the same time lossless. So if your input is JPEG and PNG images, then
+you should safely be able to use Tesseract instead of img2pdf. For other input,
+Tesseract might not do a lossless conversion. For example it converts CMYK
+input to RGB and removes the alpha channel from images with transparency. For
+multipage TIFF or animated GIF, it will only convert the first frame.
+
+
+
diff --git a/README.md b/README.md
index 5e47383..1bca7f2 100644
--- a/README.md
+++ b/README.md
@@ -27,15 +27,15 @@ software, because the raw pixel data never has to be loaded into memory.
The following table shows how img2pdf handles different input depending on the
input file format and image color space.
-| Format | Colorspace | Result |
-| -------------------- | ------------------------------ | ------------- |
-| JPEG | any | direct |
-| JPEG2000 | any | direct |
-| PNG (non-interlaced) | any | direct |
-| TIFF (CCITT Group 4) | monochrome | direct |
-| any | any except CMYK and monochrome | PNG Paeth |
-| any | monochrome | CCITT Group 4 |
-| any | CMYK | flate |
+| Format | Colorspace | Result |
+| ------------------------------------- | ------------------------------ | ------------- |
+| JPEG | any | direct |
+| JPEG2000 | any | direct |
+| PNG (non-interlaced, no transparency) | any | direct |
+| TIFF (CCITT Group 4) | monochrome | direct |
+| any | any except CMYK and monochrome | PNG Paeth |
+| any | monochrome | CCITT Group 4 |
+| any | CMYK | flate |
For JPEG, JPEG2000, non-interlaced PNG and TIFF images with CCITT Group 4
encoded data, img2pdf directly embeds the image data into the PDF without
@@ -72,17 +72,15 @@ Bugs
when embedded into the PDF cannot be read by the Adobe Acrobat Reader,
please contact me.
- - I have not yet figured out how to determine the colorspace of JPEG2000
- files. Therefore JPEG2000 files use DeviceRGB by default. For JPEG2000
- files with other colorspaces, you must explicitly specify it using the
- `--colorspace` option.
-
- An error is produced if the input image is broken. This commonly happens if
the input image has an invalid EXIF Orientation value of zero. Even though
only nine different values from 1 to 9 are permitted, Anroid phones and
Canon DSLR cameras produce JPEG images with the invalid value of zero.
Either fix your input images with `exiftool` or similar software before
- passing the JPEG to `img2pdf` or run `img2pdf` with `--rotation=ifvalid`.
+ passing the JPEG to `img2pdf` or run `img2pdf` with `--rotation=ifvalid`
+ (if you run img2pdf from the commandline) or by passing
+ `rotation=img2pdf.Rotation.ifvalid` as an argument to `convert()` when using
+ img2pdf as a library.
- img2pdf uses PIL (or Pillow) to obtain image meta data and to convert the
input if necessary. To prevent decompression bomb denial of service attacks,
@@ -191,6 +189,10 @@ The package can also be used as a library:
with open("name.pdf","wb") as f:
f.write(img2pdf.convert(glob.glob("/path/to/*.jpg")))
+ # ignore invalid rotation values in the input images
+ with open("name.pdf","wb") as f:
+ f.write(img2pdf.convert('test.jpg'), rotation=img2pdf.Rotation.ifvalid)
+
# writing to file descriptor
with open("name.pdf","wb") as f1, open("test.jpg") as f2:
img2pdf.convert(f2, outputstream=f1)
@@ -297,3 +299,4 @@ you should safely be able to use Tesseract instead of img2pdf. For other input,
Tesseract might not do a lossless conversion. For example it converts CMYK
input to RGB and removes the alpha channel from images with transparency. For
multipage TIFF or animated GIF, it will only convert the first frame.
+
diff --git a/setup.py b/setup.py
index 070b1ea..57a34af 100644
--- a/setup.py
+++ b/setup.py
@@ -1,7 +1,7 @@
import sys
from setuptools import setup
-VERSION = "0.4.2"
+VERSION = "0.4.4"
INSTALL_REQUIRES = (
"Pillow",
@@ -11,7 +11,7 @@ INSTALL_REQUIRES = (
setup(
name="img2pdf",
version=VERSION,
- author="Johannes 'josch' Schauer",
+ author="Johannes Schauer Marin Rodrigues",
author_email="josch@mister-muffin.de",
description="Convert images to PDF via direct JPEG inclusion.",
long_description=open("README.md").read(),
diff --git a/src/img2pdf.egg-info/PKG-INFO b/src/img2pdf.egg-info/PKG-INFO
index a1b430d..4b1c86a 100644
--- a/src/img2pdf.egg-info/PKG-INFO
+++ b/src/img2pdf.egg-info/PKG-INFO
@@ -1,312 +1,12 @@
Metadata-Version: 2.1
Name: img2pdf
-Version: 0.4.2
+Version: 0.4.4
Summary: Convert images to PDF via direct JPEG inclusion.
Home-page: https://gitlab.mister-muffin.de/josch/img2pdf
-Author: Johannes 'josch' Schauer
+Author: Johannes Schauer Marin Rodrigues
Author-email: josch@mister-muffin.de
License: LGPL
-Download-URL: https://gitlab.mister-muffin.de/josch/img2pdf/repository/archive.tar.gz?ref=0.4.2
-Description: [![Travis Status](https://travis-ci.com/josch/img2pdf.svg?branch=main)](https://app.travis-ci.com/josch/img2pdf)
- [![Appveyor Status](https://ci.appveyor.com/api/projects/status/2kws3wkqvi526llj/branch/main?svg=true)](https://ci.appveyor.com/project/josch/img2pdf/branch/main)
-
- img2pdf
- =======
-
- Lossless conversion of raster images to PDF. You should use img2pdf if your
- priorities are (in this order):
-
- 1. **always lossless**: the image embedded in the PDF will always have the
- exact same color information for every pixel as the input
- 2. **small**: if possible, the difference in filesize between the input image
- and the output PDF will only be the overhead of the PDF container itself
- 3. **fast**: if possible, the input image is just pasted into the PDF document
- as-is without any CPU hungry re-encoding of the pixel data
-
- Conventional conversion software (like ImageMagick) would either:
-
- 1. not be lossless because lossy re-encoding to JPEG
- 2. not be small because using wasteful flate encoding of raw pixel data
- 3. not be fast because input data gets re-encoded
-
- Another advantage of not having to re-encode the input (in most common
- situations) is, that img2pdf is able to handle much larger input than other
- software, because the raw pixel data never has to be loaded into memory.
-
- The following table shows how img2pdf handles different input depending on the
- input file format and image color space.
-
- | Format | Colorspace | Result |
- | -------------------- | ------------------------------ | ------------- |
- | JPEG | any | direct |
- | JPEG2000 | any | direct |
- | PNG (non-interlaced) | any | direct |
- | TIFF (CCITT Group 4) | monochrome | direct |
- | any | any except CMYK and monochrome | PNG Paeth |
- | any | monochrome | CCITT Group 4 |
- | any | CMYK | flate |
-
- For JPEG, JPEG2000, non-interlaced PNG and TIFF images with CCITT Group 4
- encoded data, img2pdf directly embeds the image data into the PDF without
- re-encoding it. It thus treats the PDF format merely as a container format for
- the image data. In these cases, img2pdf only increases the filesize by the size
- of the PDF container (typically around 500 to 700 bytes). Since data is only
- copied and not re-encoded, img2pdf is also typically faster than other
- solutions for these input formats.
-
- For all other input types, img2pdf first has to transform the pixel data to
- make it compatible with PDF. In most cases, the PNG Paeth filter is applied to
- the pixel data. For monochrome input, CCITT Group 4 is used instead. Only for
- CMYK input no filter is applied before finally applying flate compression.
-
- Usage
- -----
-
- The images must be provided as files because img2pdf needs to seek in the file
- descriptor.
-
- If no output file is specified with the `-o`/`--output` option, output will be
- done to stdout. A typical invocation is:
-
- $ img2pdf img1.png img2.jpg -o out.pdf
-
- The detailed documentation can be accessed by running:
-
- $ img2pdf --help
-
- Bugs
- ----
-
- - If you find a JPEG, JPEG2000, PNG or CCITT Group 4 encoded TIFF file that,
- when embedded into the PDF cannot be read by the Adobe Acrobat Reader,
- please contact me.
-
- - I have not yet figured out how to determine the colorspace of JPEG2000
- files. Therefore JPEG2000 files use DeviceRGB by default. For JPEG2000
- files with other colorspaces, you must explicitly specify it using the
- `--colorspace` option.
-
- - An error is produced if the input image is broken. This commonly happens if
- the input image has an invalid EXIF Orientation value of zero. Even though
- only eight different values from 1 to 8 are permitted, Android phones and
- Canon DSLR cameras produce JPEG images with the invalid value of zero.
- Either fix your input images with `exiftool` or similar software before
- passing the JPEG to `img2pdf` or run `img2pdf` with `--rotation=ifvalid`.
-
- - img2pdf uses PIL (or Pillow) to obtain image meta data and to convert the
- input if necessary. To prevent decompression bomb denial of service attacks,
- Pillow limits the maximum number of pixels an input image is allowed to
- have. If you are sure that you know what you are doing, then you can disable
- this safeguard by passing the `--pillow-limit-break` option to img2pdf. This
- allows one to process even very large input images.
-
- Installation
- ------------
-
- On Debian- and Ubuntu-based systems, img2pdf can be installed from the
- official repositories:
-
- $ apt install img2pdf
-
- If you want to install it using pip, you can run:
-
- $ pip3 install img2pdf
-
- If you prefer to install from source code use:
-
- $ cd img2pdf/
- $ pip3 install .
-
- To test the console script without installing the package on your system,
- use virtualenv:
-
- $ cd img2pdf/
- $ virtualenv ve
- $ ve/bin/pip3 install .
-
- You can then test the converter using:
-
- $ ve/bin/img2pdf -o test.pdf src/tests/test.jpg
-
- For Microsoft Windows users, PyInstaller based .exe files are produced by
- appveyor. If you don't want to install Python before using img2pdf you can head
- to appveyor and click on "Artifacts" to download the latest version:
- https://ci.appveyor.com/project/josch/img2pdf
-
- GUI
- ---
-
- There exists an experimental GUI with all settings currently disabled. You can
- directly convert images to PDF but you cannot set any options via the GUI yet.
- If you are interested in adding more features to the PDF, please submit a merge
- request. The GUI is based on tkinter and works on Linux, Windows and MacOS.
-
- ![](screenshot.png)
-
- Library
- -------
-
- The package can also be used as a library:
-
- import img2pdf
-
- # opening from filename
- with open("name.pdf","wb") as f:
- f.write(img2pdf.convert('test.jpg'))
-
- # opening from file handle
- with open("name.pdf","wb") as f1, open("test.jpg","rb") as f2:
- f1.write(img2pdf.convert(f2))
-
- # using in-memory image data
- with open("name.pdf","wb") as f:
- f.write(img2pdf.convert("\x89PNG..."))
-
- # multiple inputs (variant 1)
- with open("name.pdf","wb") as f:
- f.write(img2pdf.convert("test1.jpg", "test2.png"))
-
- # multiple inputs (variant 2)
- with open("name.pdf","wb") as f:
- f.write(img2pdf.convert(["test1.jpg", "test2.png"]))
-
- # convert all files ending in .jpg inside a directory
- dirname = "/path/to/images"
- imgs = []
- for fname in os.listdir(dirname):
- if not fname.endswith(".jpg"):
- continue
- path = os.path.join(dirname, fname)
- if os.path.isdir(path):
- continue
- imgs.append(path)
- with open("name.pdf","wb") as f:
- f.write(img2pdf.convert(imgs))
-
- # convert all files ending in .jpg in a directory and its subdirectories
- dirname = "/path/to/images"
- imgs = []
- for r, _, f in os.walk(dirname):
- for fname in f:
- if not fname.endswith(".jpg"):
- continue
- imgs.append(os.path.join(r, fname))
- with open("name.pdf","wb") as f:
- f.write(img2pdf.convert(imgs))
-
-
- # convert all files matching a glob
- import glob
- with open("name.pdf","wb") as f:
- f.write(img2pdf.convert(glob.glob("/path/to/*.jpg")))
-
- # writing to file descriptor
- with open("name.pdf","wb") as f1, open("test.jpg","rb") as f2:
- img2pdf.convert(f2, outputstream=f1)
-
- # specify paper size (A4)
- a4inpt = (img2pdf.mm_to_pt(210),img2pdf.mm_to_pt(297))
- layout_fun = img2pdf.get_layout_fun(a4inpt)
- with open("name.pdf","wb") as f:
- f.write(img2pdf.convert('test.jpg', layout_fun=layout_fun))
-
- # use a fixed dpi of 300 instead of reading it from the image
- dpix = dpiy = 300
- layout_fun = img2pdf.get_fixed_dpi_layout_fun((dpix, dpiy))
- with open("name.pdf","wb") as f:
- f.write(img2pdf.convert('test.jpg', layout_fun=layout_fun))
-
- # create a PDF/A-1b compliant document by passing an ICC profile
- with open("name.pdf","wb") as f:
- f.write(img2pdf.convert('test.jpg', pdfa="/usr/share/color/icc/sRGB.icc"))
-
- Comparison to ImageMagick
- -------------------------
-
- Create a large test image:
-
- $ convert logo: -resize 8000x original.jpg
-
- Convert it into PDF using ImageMagick and img2pdf:
-
- $ time img2pdf original.jpg -o img2pdf.pdf
- $ time convert original.jpg imagemagick.pdf
-
- Notice how ImageMagick took an order of magnitude longer to do the conversion
- than img2pdf. It also used twice the memory.
-
- Now extract the image data from both PDF documents and compare it to the
- original:
-
- $ pdfimages -all img2pdf.pdf tmp
- $ compare -metric AE original.jpg tmp-000.jpg null:
- 0
- $ pdfimages -all imagemagick.pdf tmp
- $ compare -metric AE original.jpg tmp-000.jpg null:
- 118716
-
- To get lossless output with ImageMagick we can use Zip compression but that
- unnecessarily increases the size of the output:
-
- $ convert original.jpg -compress Zip imagemagick.pdf
- $ pdfimages -all imagemagick.pdf tmp
- $ compare -metric AE original.jpg tmp-000.png null:
- 0
- $ stat --format="%s %n" original.jpg img2pdf.pdf imagemagick.pdf
- 1535837 original.jpg
- 1536683 img2pdf.pdf
- 9397809 imagemagick.pdf
-
- Comparison to pdfLaTeX
- ----------------------
-
- pdfLaTeX performs a lossless conversion from included images to PDF by default.
- If the input is a JPEG, then it simply embeds the JPEG into the PDF in the same
- way as img2pdf does it. But for other image formats it uses flate compression
- of the plain pixel data and thus needlessly increases the output file size:
-
- $ convert logo: -resize 8000x original.png
- $ cat << END > pdflatex.tex
- \documentclass{article}
- \usepackage{graphicx}
- \begin{document}
- \includegraphics{original.png}
- \end{document}
- END
- $ pdflatex pdflatex.tex
- $ stat --format="%s %n" original.png pdflatex.pdf
- 4500182 original.png
- 9318120 pdflatex.pdf
-
- Comparison to podofoimg2pdf
- ---------------------------
-
- Like pdfLaTeX, podofoimg2pdf is able to perform a lossless conversion from JPEG
- to PDF by plainly embedding the JPEG data into the pdf container. But just like
- pdfLaTeX it uses flate compression for all other file formats, thus sometimes
- resulting in larger files than necessary.
-
- $ convert logo: -resize 8000x original.png
- $ podofoimg2pdf out.pdf original.png
- $ stat --format="%s %n" original.png out.pdf
- 4500181 original.png
- 9335629 out.pdf
-
- It also only supports JPEG, PNG and TIF as input and lacks many of the
- convenience features of img2pdf like page sizes, borders, rotation and
- metadata.
-
- Comparison to Tesseract OCR
- ---------------------------
-
- Tesseract OCR comes closest to the functionality img2pdf provides. It is able
- to convert JPEG and PNG input to PDF without needlessly increasing the filesize
- and is at the same time lossless. So if your input is JPEG and PNG images, then
- you should safely be able to use Tesseract instead of img2pdf. For other input,
- Tesseract might not do a lossless conversion. For example it converts CMYK
- input to RGB and removes the alpha channel from images with transparency. For
- multipage TIFF or animated GIF, it will only convert the first frame.
-
+Download-URL: https://gitlab.mister-muffin.de/josch/img2pdf/repository/archive.tar.gz?ref=0.4.4
Keywords: jpeg pdf converter
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
@@ -323,3 +23,309 @@ Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Description-Content-Type: text/markdown
Provides-Extra: gui
+License-File: LICENSE
+
+[![Travis Status](https://travis-ci.com/josch/img2pdf.svg?branch=main)](https://app.travis-ci.com/josch/img2pdf)
+[![Appveyor Status](https://ci.appveyor.com/api/projects/status/2kws3wkqvi526llj/branch/main?svg=true)](https://ci.appveyor.com/project/josch/img2pdf/branch/main)
+
+img2pdf
+=======
+
+Lossless conversion of raster images to PDF. You should use img2pdf if your
+priorities are (in this order):
+
+ 1. **always lossless**: the image embedded in the PDF will always have the
+ exact same color information for every pixel as the input
+ 2. **small**: if possible, the difference in filesize between the input image
+ and the output PDF will only be the overhead of the PDF container itself
+ 3. **fast**: if possible, the input image is just pasted into the PDF document
+ as-is without any CPU hungry re-encoding of the pixel data
+
+Conventional conversion software (like ImageMagick) would either:
+
+ 1. not be lossless because of lossy re-encoding to JPEG
+ 2. not be small because of wasteful flate encoding of raw pixel data
+ 3. not be fast because the input data gets re-encoded
+
+Another advantage of not having to re-encode the input (in most common
+situations) is that img2pdf can handle much larger input than other
+software, because the raw pixel data never has to be loaded into memory.
+
+The following table shows how img2pdf handles different input depending on the
+input file format and image color space.
+
+| Format | Colorspace | Result |
+| ------------------------------------- | ------------------------------ | ------------- |
+| JPEG | any | direct |
+| JPEG2000 | any | direct |
+| PNG (non-interlaced, no transparency) | any | direct |
+| TIFF (CCITT Group 4) | monochrome | direct |
+| any | any except CMYK and monochrome | PNG Paeth |
+| any | monochrome | CCITT Group 4 |
+| any | CMYK | flate |
+
+For JPEG, JPEG2000, non-interlaced PNG and TIFF images with CCITT Group 4
+encoded data, img2pdf directly embeds the image data into the PDF without
+re-encoding it. It thus treats the PDF format merely as a container format for
+the image data. In these cases, img2pdf only increases the filesize by the size
+of the PDF container (typically around 500 to 700 bytes). Since data is only
+copied and not re-encoded, img2pdf is also typically faster than other
+solutions for these input formats.
+
+For all other input types, img2pdf first has to transform the pixel data to
+make it compatible with PDF. In most cases, the PNG Paeth filter is applied to
+the pixel data. For monochrome input, CCITT Group 4 is used instead. Only for
+CMYK input no filter is applied before finally applying flate compression.
+
+Usage
+-----
+
+The images must be provided as files because img2pdf needs to seek in the file
+descriptor.
+
+If no output file is specified with the `-o`/`--output` option, output will be
+done to stdout. A typical invocation is:
+
+ $ img2pdf img1.png img2.jpg -o out.pdf
+
+The detailed documentation can be accessed by running:
+
+ $ img2pdf --help
+
+Bugs
+----
+
+ - If you find a JPEG, JPEG2000, PNG or CCITT Group 4 encoded TIFF file that,
+ when embedded into the PDF cannot be read by the Adobe Acrobat Reader,
+ please contact me.
+
+ - An error is produced if the input image is broken. This commonly happens if
+ the input image has an invalid EXIF Orientation value of zero. Even though
+ only eight different values from 1 to 8 are permitted, Android phones and
+ Canon DSLR cameras produce JPEG images with the invalid value of zero.
+ Either fix your input images with `exiftool` or similar software before
+ passing the JPEG to `img2pdf`, or tell img2pdf to ignore invalid values: pass
+ `--rotation=ifvalid` when running img2pdf from the command line, or pass
+ `rotation=img2pdf.Rotation.ifvalid` as an argument to `convert()` when using
+ img2pdf as a library.
+
+ - img2pdf uses PIL (or Pillow) to obtain image meta data and to convert the
+ input if necessary. To prevent decompression bomb denial of service attacks,
+ Pillow limits the maximum number of pixels an input image is allowed to
+ have. If you are sure that you know what you are doing, then you can disable
+ this safeguard by passing the `--pillow-limit-break` option to img2pdf. This
+ allows one to process even very large input images.
+
+Installation
+------------
+
+On Debian- and Ubuntu-based systems, img2pdf can be installed from the
+official repositories:
+
+ $ apt install img2pdf
+
+If you want to install it using pip, you can run:
+
+ $ pip3 install img2pdf
+
+If you prefer to install from source code use:
+
+ $ cd img2pdf/
+ $ pip3 install .
+
+To test the console script without installing the package on your system,
+use virtualenv:
+
+ $ cd img2pdf/
+ $ virtualenv ve
+ $ ve/bin/pip3 install .
+
+You can then test the converter using:
+
+ $ ve/bin/img2pdf -o test.pdf src/tests/test.jpg
+
+For Microsoft Windows users, PyInstaller based .exe files are produced by
+appveyor. If you don't want to install Python before using img2pdf you can head
+to appveyor and click on "Artifacts" to download the latest version:
+https://ci.appveyor.com/project/josch/img2pdf
+
+GUI
+---
+
+There exists an experimental GUI with all settings currently disabled. You can
+directly convert images to PDF but you cannot set any options via the GUI yet.
+If you are interested in adding more features to the GUI, please submit a merge
+request. The GUI is based on tkinter and works on Linux, Windows and MacOS.
+
+![](screenshot.png)
+
+Library
+-------
+
+The package can also be used as a library:
+
+ import img2pdf
+
+ # opening from filename
+ with open("name.pdf","wb") as f:
+ f.write(img2pdf.convert('test.jpg'))
+
+ # opening from file handle
+ with open("name.pdf","wb") as f1, open("test.jpg","rb") as f2:
+ f1.write(img2pdf.convert(f2))
+
+ # using in-memory image data
+ with open("name.pdf","wb") as f:
+ f.write(img2pdf.convert("\x89PNG..."))
+
+ # multiple inputs (variant 1)
+ with open("name.pdf","wb") as f:
+ f.write(img2pdf.convert("test1.jpg", "test2.png"))
+
+ # multiple inputs (variant 2)
+ with open("name.pdf","wb") as f:
+ f.write(img2pdf.convert(["test1.jpg", "test2.png"]))
+
+ # convert all files ending in .jpg inside a directory
+ dirname = "/path/to/images"
+ imgs = []
+ for fname in os.listdir(dirname):
+ if not fname.endswith(".jpg"):
+ continue
+ path = os.path.join(dirname, fname)
+ if os.path.isdir(path):
+ continue
+ imgs.append(path)
+ with open("name.pdf","wb") as f:
+ f.write(img2pdf.convert(imgs))
+
+ # convert all files ending in .jpg in a directory and its subdirectories
+ dirname = "/path/to/images"
+ imgs = []
+ for r, _, f in os.walk(dirname):
+ for fname in f:
+ if not fname.endswith(".jpg"):
+ continue
+ imgs.append(os.path.join(r, fname))
+ with open("name.pdf","wb") as f:
+ f.write(img2pdf.convert(imgs))
+
+
+ # convert all files matching a glob
+ import glob
+ with open("name.pdf","wb") as f:
+ f.write(img2pdf.convert(glob.glob("/path/to/*.jpg")))
+
+ # ignore invalid rotation values in the input images
+ with open("name.pdf","wb") as f:
+ f.write(img2pdf.convert('test.jpg', rotation=img2pdf.Rotation.ifvalid))
+
+ # writing to file descriptor
+ with open("name.pdf","wb") as f1, open("test.jpg","rb") as f2:
+ img2pdf.convert(f2, outputstream=f1)
+
+ # specify paper size (A4)
+ a4inpt = (img2pdf.mm_to_pt(210),img2pdf.mm_to_pt(297))
+ layout_fun = img2pdf.get_layout_fun(a4inpt)
+ with open("name.pdf","wb") as f:
+ f.write(img2pdf.convert('test.jpg', layout_fun=layout_fun))
+
+ # use a fixed dpi of 300 instead of reading it from the image
+ dpix = dpiy = 300
+ layout_fun = img2pdf.get_fixed_dpi_layout_fun((dpix, dpiy))
+ with open("name.pdf","wb") as f:
+ f.write(img2pdf.convert('test.jpg', layout_fun=layout_fun))
+
+ # create a PDF/A-1b compliant document by passing an ICC profile
+ with open("name.pdf","wb") as f:
+ f.write(img2pdf.convert('test.jpg', pdfa="/usr/share/color/icc/sRGB.icc"))
+
+Comparison to ImageMagick
+-------------------------
+
+Create a large test image:
+
+ $ convert logo: -resize 8000x original.jpg
+
+Convert it into PDF using ImageMagick and img2pdf:
+
+ $ time img2pdf original.jpg -o img2pdf.pdf
+ $ time convert original.jpg imagemagick.pdf
+
+Notice how ImageMagick took an order of magnitude longer to do the conversion
+than img2pdf. It also used twice the memory.
+
+Now extract the image data from both PDF documents and compare it to the
+original:
+
+ $ pdfimages -all img2pdf.pdf tmp
+ $ compare -metric AE original.jpg tmp-000.jpg null:
+ 0
+ $ pdfimages -all imagemagick.pdf tmp
+ $ compare -metric AE original.jpg tmp-000.jpg null:
+ 118716
+
+To get lossless output with ImageMagick we can use Zip compression but that
+unnecessarily increases the size of the output:
+
+ $ convert original.jpg -compress Zip imagemagick.pdf
+ $ pdfimages -all imagemagick.pdf tmp
+ $ compare -metric AE original.jpg tmp-000.png null:
+ 0
+ $ stat --format="%s %n" original.jpg img2pdf.pdf imagemagick.pdf
+ 1535837 original.jpg
+ 1536683 img2pdf.pdf
+ 9397809 imagemagick.pdf
+
+Comparison to pdfLaTeX
+----------------------
+
+pdfLaTeX performs a lossless conversion from included images to PDF by default.
+If the input is a JPEG, then it simply embeds the JPEG into the PDF in the same
+way as img2pdf does it. But for other image formats it uses flate compression
+of the plain pixel data and thus needlessly increases the output file size:
+
+ $ convert logo: -resize 8000x original.png
+ $ cat << END > pdflatex.tex
+ \documentclass{article}
+ \usepackage{graphicx}
+ \begin{document}
+ \includegraphics{original.png}
+ \end{document}
+ END
+ $ pdflatex pdflatex.tex
+ $ stat --format="%s %n" original.png pdflatex.pdf
+ 4500182 original.png
+ 9318120 pdflatex.pdf
+
+Comparison to podofoimg2pdf
+---------------------------
+
+Like pdfLaTeX, podofoimg2pdf is able to perform a lossless conversion from JPEG
+to PDF by plainly embedding the JPEG data into the pdf container. But just like
+pdfLaTeX it uses flate compression for all other file formats, thus sometimes
+resulting in larger files than necessary.
+
+ $ convert logo: -resize 8000x original.png
+ $ podofoimg2pdf out.pdf original.png
+ $ stat --format="%s %n" original.png out.pdf
+ 4500181 original.png
+ 9335629 out.pdf
+
+It also only supports JPEG, PNG and TIF as input and lacks many of the
+convenience features of img2pdf like page sizes, borders, rotation and
+metadata.
+
+Comparison to Tesseract OCR
+---------------------------
+
+Tesseract OCR comes closest to the functionality img2pdf provides. It is able
+to convert JPEG and PNG input to PDF without needlessly increasing the filesize
+and is at the same time lossless. So if your input is JPEG and PNG images, then
+you should safely be able to use Tesseract instead of img2pdf. For other input,
+Tesseract might not do a lossless conversion. For example it converts CMYK
+input to RGB and removes the alpha channel from images with transparency. For
+multipage TIFF or animated GIF, it will only convert the first frame.
+
+
+
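The format/colorspace table in the README text above can be read as a small decision procedure. The following is a minimal sketch of that reading, not img2pdf's actual code: `choose_encoding` and its keyword parameters are hypothetical names introduced here for illustration, and the real logic in `src/img2pdf.py` handles more cases than the table lists.

```python
def choose_encoding(fmt, colorspace, *, interlaced=False,
                    transparency=False, ccitt_group4=False):
    """Return the 'Result' column of the README table for one input image.

    Hypothetical helper: the names and string values are assumptions that
    mirror the README table, not part of the img2pdf API.
    """
    # Rows 1-2: JPEG and JPEG2000 data is embedded without re-encoding.
    if fmt in ("JPEG", "JPEG2000"):
        return "direct"
    # Row 3: PNG is embedded directly only if non-interlaced and opaque.
    if fmt == "PNG" and not interlaced and not transparency:
        return "direct"
    # Row 4: monochrome TIFF that already uses CCITT Group 4 compression.
    if fmt == "TIFF" and ccitt_group4 and colorspace == "monochrome":
        return "direct"
    # Remaining rows depend only on the colorspace.
    if colorspace == "monochrome":
        return "CCITT Group 4"
    if colorspace == "CMYK":
        return "flate"
    return "PNG Paeth"


if __name__ == "__main__":
    samples = [
        ("JPEG", "RGB", {}),
        ("PNG", "RGB", {"interlaced": True}),
        ("GIF", "monochrome", {}),
        ("TIFF", "CMYK", {}),
    ]
    for fmt, colorspace, flags in samples:
        print(fmt, colorspace, flags, "->",
              choose_encoding(fmt, colorspace, **flags))
```

The order of the checks matters: the "direct" rows are tested first, so a monochrome non-interlaced PNG is still embedded as-is rather than being transcoded to CCITT Group 4.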
diff --git a/src/img2pdf.py b/src/img2pdf.py
index d6bb54f..39a311b 100755
--- a/src/img2pdf.py
+++ b/src/img2pdf.py
@@ -22,7 +22,17 @@ import sys
import os
import zlib
import argparse
-from PIL import Image, TiffImagePlugin
+from PIL import Image, TiffImagePlugin, GifImagePlugin
+
+if hasattr(GifImagePlugin, "LoadingStrategy"):
+ # Pillow 9.0.0 started emitting all frames but the first as RGB instead of
+ # P to make sure that more than 256 colors can be represented. But palette
+ # images compress far better than RGB images in PDF so we instruct Pillow
+ # to only emit RGB frames if the palette differs and return P otherwise.
+ # This works since Pillow 9.1.0.
+ GifImagePlugin.LOADING_STRATEGY = (
+ GifImagePlugin.LoadingStrategy.RGB_AFTER_DIFFERENT_PALETTE_ONLY
+ )
# TiffImagePlugin.DEBUG = True
from PIL.ExifTags import TAGS
@@ -50,7 +60,7 @@ try:
except ImportError:
have_pikepdf = False
-__version__ = "0.4.2"
+__version__ = "0.4.4"
default_dpi = 96.0
papersizes = {
"letter": "8.5inx11in",
@@ -61,6 +71,20 @@ papersizes = {
"a4": "210mmx297mm",
"a5": "148mmx210mm",
"a6": "105mmx148mm",
+ "b0": "1000mmx1414mm",
+ "b1": "707mmx1000mm",
+ "b2": "500mmx707mm",
+ "b3": "353mmx500mm",
+ "b4": "250mmx353mm",
+ "b5": "176mmx250mm",
+ "b6": "125mmx176mm",
+ "jb0": "1030mmx1456mm",
+ "jb1": "728mmx1030mm",
+ "jb2": "515mmx728mm",
+ "jb3": "364mmx515mm",
+ "jb4": "257mmx364mm",
+ "jb5": "182mmx257mm",
+ "jb6": "128mmx182mm",
"legal": "8.5inx14in",
"tabloid": "11inx17in",
}
@@ -73,6 +97,20 @@ papernames = {
"a4": "A4",
"a5": "A5",
"a6": "A6",
+ "b0": "B0",
+ "b1": "B1",
+ "b2": "B2",
+ "b3": "B3",
+ "b4": "B4",
+ "b5": "B5",
+ "b6": "B6",
+ "jb0": "JB0",
+ "jb1": "JB1",
+ "jb2": "JB2",
+ "jb3": "JB3",
+ "jb4": "JB4",
+ "jb5": "JB5",
+ "jb6": "JB6",
"legal": "Legal",
"tabloid": "Tabloid",
}
@@ -91,7 +129,10 @@ ImageFormat = Enum("ImageFormat", "JPEG JPEG2000 CCITTGroup4 PNG GIF TIFF MPO ot
PageMode = Enum("PageMode", "none outlines thumbs")
-PageLayout = Enum("PageLayout", "single onecolumn twocolumnright twocolumnleft")
+PageLayout = Enum(
+ "PageLayout",
+ "single onecolumn twocolumnright twocolumnleft twopageright twopageleft",
+)
Magnification = Enum("Magnification", "fit fith fitbh")
@@ -389,6 +430,28 @@ class ExifOrientationError(Exception):
pass
+# temporarily change the attribute of an object using a context manager
+class temp_attr:
+ def __init__(self, obj, field, value):
+ self.obj = obj
+ self.field = field
+ self.value = value
+
+ def __enter__(self):
+ self.exists = False
+ if hasattr(self.obj, self.field):
+ self.exists = True
+ self.old_value = getattr(self.obj, self.field)
+ print(f"setting {self.obj}.{self.field} = {self.value}")
+ setattr(self.obj, self.field, self.value)
+
+ def __exit__(self, exctype, excinst, exctb):
+ if self.exists:
+ setattr(self.obj, self.field, self.old_value)
+ else:
+ delattr(self.obj, self.field)
+
+
# without pdfrw this function is a no-op
def my_convert_load(string):
return string
@@ -1106,6 +1169,18 @@ class pdfdoc(object):
[initial_page, PdfName.XYZ, NullObject, NullObject, 0]
)
+ # The /OpenAction array must contain the page as an indirect object.
+ # This changed some time after 4.2.0 and on or before 5.0.0 and current
+ # versions require to use .obj or otherwise we get:
+ # TypeError: Can't convert ObjectHelper (or subclass) to Object
+ # implicitly. Use .obj to get access the underlying object.
+ # See https://github.com/pikepdf/pikepdf/issues/313 for details.
+ if self.engine == Engine.pikepdf:
+ if isinstance(initial_page, pikepdf.Page):
+ initial_page = self.writer.make_indirect(initial_page.obj)
+ else:
+ initial_page = self.writer.make_indirect(initial_page)
+
if self.magnification == Magnification.fit:
catalog[PdfName.OpenAction] = PdfArray([initial_page, PdfName.Fit])
elif self.magnification == Magnification.fith:
@@ -1136,6 +1211,14 @@ class pdfdoc(object):
catalog[PdfName.PageLayout] = PdfName.TwoColumnRight
elif self.page_layout == PageLayout.twocolumnleft:
catalog[PdfName.PageLayout] = PdfName.TwoColumnLeft
+ elif self.page_layout == PageLayout.twopageright:
+ catalog[PdfName.PageLayout] = PdfName.TwoPageRight
+ if self.output_version < "1.5":
+ self.output_version = "1.5"
+ elif self.page_layout == PageLayout.twopageleft:
+ catalog[PdfName.PageLayout] = PdfName.TwoPageLeft
+ if self.output_version < "1.5":
+ self.output_version = "1.5"
elif self.page_layout is None:
pass
else:
@@ -1273,17 +1356,25 @@ def get_imgmetadata(
elif value in (2, 4, 5, 7):
if rotreq == Rotation.ifvalid:
logger.warning(
- "Unsupported flipped rotation mode (%d)", value
+ "Unsupported flipped rotation mode (%d): use "
+ "--rotation=ifvalid or "
+ "rotation=img2pdf.Rotation.ifvalid to ignore",
+ value,
)
else:
raise ExifOrientationError(
- "Unsupported flipped rotation mode (%d)" % value
+ "Unsupported flipped rotation mode (%d): use "
+ "--rotation=ifvalid or "
+ "rotation=img2pdf.Rotation.ifvalid to ignore" % value
)
else:
if rotreq == Rotation.ifvalid:
logger.warning("Invalid rotation (%d)", value)
else:
- raise ExifOrientationError("Invalid rotation (%d)" % value)
+ raise ExifOrientationError(
+ "Invalid rotation (%d): use --rotation=ifvalid "
+ "or rotation=img2pdf.Rotation.ifvalid to ignore" % value
+ )
elif rotreq in (Rotation.none, Rotation["0"]):
rotation = 0
elif rotreq == Rotation["90"]:
@@ -1390,27 +1481,29 @@ def transcode_monochrome(imgdata):
# into putting everything into a single strip. Thanks to Andrew Murray for
# the hack.
#
- # This can be dropped once this gets merged:
- # https://github.com/python-pillow/Pillow/pull/5744
- pillow__getitem__ = TiffImagePlugin.ImageFileDirectory_v2.__getitem__
-
- def __getitem__(self, tag):
- overrides = {
- TiffImagePlugin.ROWSPERSTRIP: imgdata.size[1],
- TiffImagePlugin.STRIPBYTECOUNTS: [
- (imgdata.size[0] + 7) // 8 * imgdata.size[1]
- ],
- TiffImagePlugin.STRIPOFFSETS: [0],
- }
- return overrides.get(tag, pillow__getitem__(self, tag))
+ # Since version 8.4.0 Pillow allows us to modify the strip size explicitly
+ tmp_strip_size = (imgdata.size[0] + 7) // 8 * imgdata.size[1]
+ if hasattr(TiffImagePlugin, "STRIP_SIZE"):
+ # we are using Pillow 8.4.0 or later
+ with temp_attr(TiffImagePlugin, "STRIP_SIZE", tmp_strip_size):
+ im.save(newimgio, format="TIFF", compression="group4")
+ else:
+ # only needed for Pillow 8.3.x but works for versions before that as
+ # well
+ pillow__getitem__ = TiffImagePlugin.ImageFileDirectory_v2.__getitem__
+
+ def __getitem__(self, tag):
+ overrides = {
+ TiffImagePlugin.ROWSPERSTRIP: imgdata.size[1],
+ TiffImagePlugin.STRIPBYTECOUNTS: [tmp_strip_size],
+ TiffImagePlugin.STRIPOFFSETS: [0],
+ }
+ return overrides.get(tag, pillow__getitem__(self, tag))
- # use try/finally to make sure that __getitem__ is reset even if save()
- # raises an exception
- try:
- TiffImagePlugin.ImageFileDirectory_v2.__getitem__ = __getitem__
- im.save(newimgio, format="TIFF", compression="group4")
- finally:
- TiffImagePlugin.ImageFileDirectory_v2.__getitem__ = pillow__getitem__
+ with temp_attr(
+ TiffImagePlugin.ImageFileDirectory_v2, "__getitem__", __getitem__
+ ):
+ im.save(newimgio, format="TIFF", compression="group4")
# Open new image in memory
newimgio.seek(0)
@@ -1803,8 +1896,8 @@ def read_images(rawdata, colorspace, first_frame_only=False, rot=None):
smaskidat, _, _ = to_png_data(a)
logger.warning(
- "Image contains an alpha channel which will be stored "
- "as a separate soft mask (/SMask) image in PDF."
+ "Image contains an alpha channel. Computing a separate "
+ "soft mask (/SMask) image to store transparency in PDF."
)
elif color in [Colorspace.P, Colorspace.PA] and iccp is not None:
# PDF does not support palette images with icc profile
@@ -2135,7 +2228,11 @@ def get_fixed_dpi_layout_fun(fixed_dpi):
def find_scale(pagewidth, pageheight):
"""Find the power of 10 (10, 100, 1000...) that will reduce the scale
- below the PDF specification limit of 14400 PDF units (=200 inches)"""
+ below the PDF specification limit of 14400 PDF units (=200 inches).
+ In principle we could also choose a scale that is not a power of 10.
+ We use powers of 10 because numbers in the PDF format are represented
+ in base-10 and using powers of 10 will thus just shift the comma and
+ keep the numbers easily readable by humans as well."""
from math import log10, ceil
major = max(pagewidth, pageheight)
@@ -3481,7 +3578,7 @@ Examples:
$ img2pdf --output out.pdf --colorspace L input.jp2
-Written by Johannes 'josch' Schauer <josch@mister-muffin.de>
+Written by Johannes Schauer Marin Rodrigues <josch@mister-muffin.de>
Report bugs at https://gitlab.mister-muffin.de/josch/img2pdf/issues
"""
@@ -3886,7 +3983,9 @@ and left/right, respectively. It is not possible to specify asymmetric borders.
'Valid values are "single" (display single pages), "onecolumn" '
'(one continuous column), "twocolumnright" (two continuous '
'columns with odd number pages on the right) and "twocolumnleft" '
- "(two continuous columns with odd numbered pages on the left)",
+ "(two continuous columns with odd numbered pages on the left), "
+ '"twopageright" (two pages with odd numbered page on the right) '
+ 'and "twopageleft" (two pages with odd numbered page on the left)',
)
viewerargs.add_argument(
"--viewer-fit-window",
@@ -3937,7 +4036,7 @@ and left/right, respectively. It is not possible to specify asymmetric borders.
# On windows, each positional argument can expand into multiple paths
# because we do globbing ourselves. Here we flatten the list of lists
# again.
- images = chain.from_iterable(args.images)
+ images = list(chain.from_iterable(args.images))
elif len(args.images) == 0 and len(args.from_file) > 0:
images = args.from_file
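The `temp_attr` context manager introduced in the `src/img2pdf.py` hunks above replaces the earlier try/finally monkey-patching. The sketch below reproduces the class from the diff (without its `print` call) and exercises it on a stand-in object. The `plugin` namespace and its `STRIP_SIZE` value of 65536 are assumptions chosen for illustration; they stand in for `TiffImagePlugin` and do not reflect Pillow's actual defaults.

```python
from types import SimpleNamespace


class temp_attr:
    """Temporarily set obj.field to value for the duration of a with-block."""

    def __init__(self, obj, field, value):
        self.obj = obj
        self.field = field
        self.value = value

    def __enter__(self):
        # Remember whether the attribute existed so that __exit__ can either
        # restore the old value or remove the attribute again.
        self.exists = hasattr(self.obj, self.field)
        if self.exists:
            self.old_value = getattr(self.obj, self.field)
        setattr(self.obj, self.field, self.value)

    def __exit__(self, exctype, excinst, exctb):
        if self.exists:
            setattr(self.obj, self.field, self.old_value)
        else:
            delattr(self.obj, self.field)


# Hypothetical stand-in for a module such as TiffImagePlugin.
plugin = SimpleNamespace(STRIP_SIZE=65536)

# Existing attribute: overridden inside the block, restored afterwards.
with temp_attr(plugin, "STRIP_SIZE", 1024):
    inside = plugin.STRIP_SIZE

# Missing attribute: created inside the block, deleted afterwards.
with temp_attr(plugin, "EXTRA", True):
    had_extra = hasattr(plugin, "EXTRA")

# The old value is restored even if the block raises.
try:
    with temp_attr(plugin, "STRIP_SIZE", 1):
        raise RuntimeError("simulated failure inside the block")
except RuntimeError:
    pass
```

Because `__exit__` runs whether or not the block raises, this gives the same guarantee as the removed try/finally code while letting both the `STRIP_SIZE` path and the `__getitem__` fallback share one mechanism.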
diff --git a/src/img2pdf_test.py b/src/img2pdf_test.py
index 8d4f2b5..80dd8e0 100755
--- a/src/img2pdf_test.py
+++ b/src/img2pdf_test.py
@@ -20,6 +20,8 @@ import warnings
import json
import pathlib
+img2pdfprog = os.getenv("img2pdfprog", default="src/img2pdf.py")
+
ICC_PROFILE = None
ICC_PROFILE_PATHS = (
# Debian
@@ -2188,15 +2190,31 @@ def gif_palette8_img(tmp_path_factory, tmp_palette8_png):
@pytest.fixture(scope="session")
def gif_animation_img(tmp_path_factory, tmp_normal_png, tmp_inverse_png):
in_img = tmp_path_factory.mktemp("gif_animation_img") / "in.gif"
+ pal_img = tmp_path_factory.mktemp("gif_animation_img") / "pal.gif"
+ tmp_img = tmp_path_factory.mktemp("gif_animation_img") / "tmp.gif"
subprocess.check_call(
CONVERT
+ [
str(tmp_normal_png),
str(tmp_inverse_png),
- "-strip",
- str(in_img),
+ str(tmp_img),
]
)
+ # create palette image with all unique colors
+ subprocess.check_call(
+ CONVERT
+ + [
+ str(tmp_img),
+ "-unique-colors",
+ str(pal_img),
+ ]
+ )
+ # make sure all frames have the same palette by using -remap
+ subprocess.check_call(
+ CONVERT + [str(tmp_img), "-strip", "-remap", str(pal_img), str(in_img)]
+ )
+ pal_img.unlink()
+ tmp_img.unlink()
identify = json.loads(
subprocess.check_output(CONVERT + [str(in_img) + "[0]", "json:"])
)
@@ -2226,6 +2244,7 @@ def gif_animation_img(tmp_path_factory, tmp_normal_png, tmp_inverse_png):
"y": 0,
}, str(identify)
assert identify[0]["image"].get("compression") == "LZW", str(identify)
+ colormap_frame0 = identify[0]["image"].get("colormap")
identify = json.loads(
subprocess.check_output(CONVERT + [str(in_img) + "[1]", "json:"])
)
@@ -2256,6 +2275,8 @@ def gif_animation_img(tmp_path_factory, tmp_normal_png, tmp_inverse_png):
}, str(identify)
assert identify[0]["image"].get("compression") == "LZW", str(identify)
assert identify[0]["image"].get("scene") == 1, str(identify)
+ colormap_frame1 = identify[0]["image"].get("colormap")
+ assert colormap_frame0 == colormap_frame1
yield in_img
in_img.unlink()
@@ -3914,7 +3935,7 @@ def jpg_pdf(tmp_path_factory, jpg_img, request):
out_pdf = tmp_path_factory.mktemp("jpg_pdf") / "out.pdf"
subprocess.check_call(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + request.param,
@@ -3941,7 +3962,7 @@ def jpg_rot_pdf(tmp_path_factory, jpg_rot_img, request):
out_pdf = tmp_path_factory.mktemp("jpg_rot_pdf") / "out.pdf"
subprocess.check_call(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + request.param,
@@ -3969,7 +3990,7 @@ def jpg_cmyk_pdf(tmp_path_factory, jpg_cmyk_img, request):
out_pdf = tmp_path_factory.mktemp("jpg_cmyk_pdf") / "out.pdf"
subprocess.check_call(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + request.param,
@@ -3999,7 +4020,7 @@ def jpg_2000_pdf(tmp_path_factory, jpg_2000_img, request):
out_pdf = tmp_path_factory.mktemp("jpg_2000_pdf") / "out.pdf"
subprocess.check_call(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + request.param,
@@ -4026,7 +4047,7 @@ def png_rgb8_pdf(tmp_path_factory, png_rgb8_img, request):
out_pdf = tmp_path_factory.mktemp("png_rgb8_pdf") / "out.pdf"
subprocess.check_call(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + request.param,
@@ -4056,7 +4077,7 @@ def png_rgba8_pdf(tmp_path_factory, png_rgba8_img, request):
out_pdf = tmp_path_factory.mktemp("png_rgba8_pdf") / "out.pdf"
subprocess.check_call(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + request.param,
@@ -4096,7 +4117,7 @@ def gif_transparent_pdf(tmp_path_factory, gif_transparent_img, request):
out_pdf = tmp_path_factory.mktemp("gif_transparent_pdf") / "out.pdf"
subprocess.check_call(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + request.param,
@@ -4136,7 +4157,7 @@ def png_rgb16_pdf(tmp_path_factory, png_rgb16_img, request):
out_pdf = tmp_path_factory.mktemp("png_rgb16_pdf") / "out.pdf"
subprocess.check_call(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + request.param,
@@ -4166,7 +4187,7 @@ def png_interlaced_pdf(tmp_path_factory, png_interlaced_img, request):
out_pdf = tmp_path_factory.mktemp("png_interlaced_pdf") / "out.pdf"
subprocess.check_call(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + request.param,
@@ -4196,7 +4217,7 @@ def png_gray1_pdf(tmp_path_factory, tmp_gray1_png, request):
out_pdf = tmp_path_factory.mktemp("png_gray1_pdf") / "out.pdf"
subprocess.check_call(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + request.param,
@@ -4226,7 +4247,7 @@ def png_gray2_pdf(tmp_path_factory, tmp_gray2_png, request):
out_pdf = tmp_path_factory.mktemp("png_gray2_pdf") / "out.pdf"
subprocess.check_call(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + request.param,
@@ -4256,7 +4277,7 @@ def png_gray4_pdf(tmp_path_factory, tmp_gray4_png, request):
out_pdf = tmp_path_factory.mktemp("png_gray4_pdf") / "out.pdf"
subprocess.check_call(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + request.param,
@@ -4286,7 +4307,7 @@ def png_gray8_pdf(tmp_path_factory, tmp_gray8_png, request):
out_pdf = tmp_path_factory.mktemp("png_gray8_pdf") / "out.pdf"
subprocess.check_call(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + request.param,
@@ -4316,7 +4337,7 @@ def png_gray8a_pdf(tmp_path_factory, png_gray8a_img, request):
out_pdf = tmp_path_factory.mktemp("png_gray8a_pdf") / "out.pdf"
subprocess.check_call(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + request.param,
@@ -4356,7 +4377,7 @@ def png_gray16_pdf(tmp_path_factory, tmp_gray16_png, request):
out_pdf = tmp_path_factory.mktemp("png_gray16_pdf") / "out.pdf"
subprocess.check_call(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + request.param,
@@ -4386,7 +4407,7 @@ def png_palette1_pdf(tmp_path_factory, tmp_palette1_png, request):
out_pdf = tmp_path_factory.mktemp("png_palette1_pdf") / "out.pdf"
subprocess.check_call(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + request.param,
@@ -4417,7 +4438,7 @@ def png_palette2_pdf(tmp_path_factory, tmp_palette2_png, request):
out_pdf = tmp_path_factory.mktemp("png_palette2_pdf") / "out.pdf"
subprocess.check_call(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + request.param,
@@ -4448,7 +4469,7 @@ def png_palette4_pdf(tmp_path_factory, tmp_palette4_png, request):
out_pdf = tmp_path_factory.mktemp("png_palette4_pdf") / "out.pdf"
subprocess.check_call(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + request.param,
@@ -4479,7 +4500,7 @@ def png_palette8_pdf(tmp_path_factory, tmp_palette8_png, request):
out_pdf = tmp_path_factory.mktemp("png_palette8_pdf") / "out.pdf"
subprocess.check_call(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + request.param,
@@ -4510,7 +4531,7 @@ def png_icc_pdf(tmp_path_factory, tmp_icc_png, tmp_icc_profile, request):
out_pdf = tmp_path_factory.mktemp("png_icc_pdf") / "out.pdf"
subprocess.check_call(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + request.param,
@@ -4546,7 +4567,7 @@ def gif_palette1_pdf(tmp_path_factory, gif_palette1_img, request):
out_pdf = tmp_path_factory.mktemp("gif_palette1_pdf") / "out.pdf"
subprocess.check_call(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + request.param,
@@ -4577,7 +4598,7 @@ def gif_palette2_pdf(tmp_path_factory, gif_palette2_img, request):
out_pdf = tmp_path_factory.mktemp("gif_palette2_pdf") / "out.pdf"
subprocess.check_call(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + request.param,
@@ -4608,7 +4629,7 @@ def gif_palette4_pdf(tmp_path_factory, gif_palette4_img, request):
out_pdf = tmp_path_factory.mktemp("gif_palette4_pdf") / "out.pdf"
subprocess.check_call(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + request.param,
@@ -4639,7 +4660,7 @@ def gif_palette8_pdf(tmp_path_factory, gif_palette8_img, request):
out_pdf = tmp_path_factory.mktemp("gif_palette8_pdf") / "out.pdf"
subprocess.check_call(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + request.param,
@@ -4671,7 +4692,7 @@ def gif_animation_pdf(tmp_path_factory, gif_animation_img, request):
out_pdf = tmpdir / "out.pdf"
subprocess.check_call(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + request.param,
@@ -4710,7 +4731,7 @@ def tiff_cmyk8_pdf(tmp_path_factory, tiff_cmyk8_img, request):
out_pdf = tmp_path_factory.mktemp("tiff_cmyk8_pdf") / "out.pdf"
subprocess.check_call(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + request.param,
@@ -4737,7 +4758,7 @@ def tiff_rgb8_pdf(tmp_path_factory, tiff_rgb8_img, request):
out_pdf = tmp_path_factory.mktemp("tiff_rgb8_pdf") / "out.pdf"
subprocess.check_call(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + request.param,
@@ -4767,7 +4788,7 @@ def tiff_gray1_pdf(tmp_path_factory, tiff_gray1_img, request):
out_pdf = tmp_path_factory.mktemp("tiff_gray1_pdf") / "out.pdf"
subprocess.check_call(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + request.param,
@@ -4798,7 +4819,7 @@ def tiff_gray2_pdf(tmp_path_factory, tiff_gray2_img, request):
out_pdf = tmp_path_factory.mktemp("tiff_gray2_pdf") / "out.pdf"
subprocess.check_call(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + request.param,
@@ -4828,7 +4849,7 @@ def tiff_gray4_pdf(tmp_path_factory, tiff_gray4_img, request):
out_pdf = tmp_path_factory.mktemp("tiff_gray4_pdf") / "out.pdf"
subprocess.check_call(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + request.param,
@@ -4858,7 +4879,7 @@ def tiff_gray8_pdf(tmp_path_factory, tiff_gray8_img, request):
out_pdf = tmp_path_factory.mktemp("tiff_gray8_pdf") / "out.pdf"
subprocess.check_call(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + request.param,
@@ -4889,7 +4910,7 @@ def tiff_multipage_pdf(tmp_path_factory, tiff_multipage_img, request):
out_pdf = tmpdir / "out.pdf"
subprocess.check_call(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + request.param,
@@ -4927,7 +4948,7 @@ def tiff_palette1_pdf(tmp_path_factory, tiff_palette1_img, request):
out_pdf = tmp_path_factory.mktemp("tiff_palette1_pdf") / "out.pdf"
subprocess.check_call(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + request.param,
@@ -4958,7 +4979,7 @@ def tiff_palette2_pdf(tmp_path_factory, tiff_palette2_img, request):
out_pdf = tmp_path_factory.mktemp("tiff_palette2_pdf") / "out.pdf"
subprocess.check_call(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + request.param,
@@ -4989,7 +5010,7 @@ def tiff_palette4_pdf(tmp_path_factory, tiff_palette4_img, request):
out_pdf = tmp_path_factory.mktemp("tiff_palette4_pdf") / "out.pdf"
subprocess.check_call(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + request.param,
@@ -5020,7 +5041,7 @@ def tiff_palette8_pdf(tmp_path_factory, tiff_palette8_img, request):
out_pdf = tmp_path_factory.mktemp("tiff_palette8_pdf") / "out.pdf"
subprocess.check_call(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + request.param,
@@ -5053,7 +5074,7 @@ def tiff_ccitt_lsb_m2l_white_pdf(
out_pdf = tmp_path_factory.mktemp("tiff_ccitt_lsb_m2l_white_pdf") / "out.pdf"
subprocess.check_call(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + request.param,
@@ -5086,7 +5107,7 @@ def tiff_ccitt_msb_m2l_white_pdf(
out_pdf = tmp_path_factory.mktemp("tiff_ccitt_msb_m2l_white_pdf") / "out.pdf"
subprocess.check_call(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + request.param,
@@ -5119,7 +5140,7 @@ def tiff_ccitt_msb_l2m_white_pdf(
out_pdf = tmp_path_factory.mktemp("tiff_ccitt_msb_l2m_white_pdf") / "out.pdf"
subprocess.check_call(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + request.param,
@@ -5152,7 +5173,7 @@ def tiff_ccitt_lsb_m2l_black_pdf(
out_pdf = tmp_path_factory.mktemp("tiff_ccitt_lsb_m2l_black_pdf") / "out.pdf"
subprocess.check_call(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + request.param,
@@ -5183,7 +5204,7 @@ def tiff_ccitt_nometa1_pdf(tmp_path_factory, tiff_ccitt_nometa1_img, request):
out_pdf = tmp_path_factory.mktemp("tiff_ccitt_nometa1_pdf") / "out.pdf"
subprocess.check_call(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + request.param,
@@ -5214,7 +5235,7 @@ def tiff_ccitt_nometa2_pdf(tmp_path_factory, tiff_ccitt_nometa2_img, request):
out_pdf = tmp_path_factory.mktemp("tiff_ccitt_nometa2_pdf") / "out.pdf"
subprocess.check_call(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + request.param,
@@ -5373,7 +5394,7 @@ def test_png_rgba16(tmp_path_factory, png_rgba16_img, engine):
0
!= subprocess.run(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + engine,
@@ -5405,7 +5426,7 @@ def test_png_gray16a(tmp_path_factory, png_gray16a_img, engine):
0
!= subprocess.run(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + engine,
@@ -5645,7 +5666,7 @@ def test_tiff_float(tmp_path_factory, tiff_float_img, engine):
0
!= subprocess.run(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + engine,
@@ -5683,7 +5704,7 @@ def test_tiff_cmyk16(tmp_path_factory, tiff_cmyk16_img, engine):
0
!= subprocess.run(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + engine,
@@ -5719,7 +5740,7 @@ def test_tiff_rgb12(tmp_path_factory, tiff_rgb12_img, engine):
0
!= subprocess.run(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + engine,
@@ -5743,7 +5764,7 @@ def test_tiff_rgb14(tmp_path_factory, tiff_rgb14_img, engine):
0
!= subprocess.run(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + engine,
@@ -5767,7 +5788,7 @@ def test_tiff_rgb16(tmp_path_factory, tiff_rgb16_img, engine):
0
!= subprocess.run(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + engine,
@@ -5790,7 +5811,7 @@ def test_tiff_rgba8(tmp_path_factory, tiff_rgba8_img, engine):
0
!= subprocess.run(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + engine,
@@ -5813,7 +5834,7 @@ def test_tiff_rgba16(tmp_path_factory, tiff_rgba16_img, engine):
0
!= subprocess.run(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + engine,
@@ -5884,7 +5905,7 @@ def test_tiff_gray16(tmp_path_factory, tiff_gray16_img, engine):
0
!= subprocess.run(
[
- "src/img2pdf.py",
+ img2pdfprog,
"--producer=",
"--nodate",
"--engine=" + engine,
@@ -6710,29 +6731,6 @@ def test_general(general_input, engine):
y = pikepdf.open(out)
pydictx = rec(x.Root)
pydicty = rec(y.Root)
- if f.endswith(os.path.sep + "animation.gif"):
- # starting with PIL 8.2.0 the palette is half the size when encoding
- # our test GIF image as PNG
- #
- # to still compare successfully, we truncate the expected palette
- import PIL
-
- if PIL.__version__ >= "8.2.0":
- assert len(pydictx["/Pages"]["/Kids"]) == 2
- for p in pydictx["/Pages"]["/Kids"]:
- assert p["/Resources"]["/XObject"]["/Im0"]["/ColorSpace"][2] == 127
- assert len(pydicty["/Pages"]["/Kids"]) == 2
- for p in pydicty["/Pages"]["/Kids"]:
- cs = p["/Resources"]["/XObject"]["/Im0"]["/ColorSpace"]
- cs[2] = decimal.Decimal("127")
- cs[3] = cs[3][:384]
- else:
- assert (
- pydictx["/Pages"]["/Kids"][0]["/Resources"]["/XObject"]["/Im0"][
- "/ColorSpace"
- ][2]
- == 255
- )
assert pydictx == pydicty
# the python-pil version 2.3.0-1ubuntu3 in Ubuntu does not have the
# close() method
diff --git a/src/jp2.py b/src/jp2.py
index 1f99a5c..ae54746 100644
--- a/src/jp2.py
+++ b/src/jp2.py
@@ -1,6 +1,6 @@
#!/usr/bin/env python
#
-# Copyright (C) 2013 Johannes 'josch' Schauer <j.schauer at email.de>
+# Copyright (C) 2013 Johannes Schauer Marin Rodrigues <j.schauer at email.de>
#
# this module is heavily based upon jpylyzer which is
# KB / National Library of the Netherlands, Open Planets Foundation
diff --git a/src/tests/input/animation.gif b/src/tests/input/animation.gif
index af4b278..d60a237 100644
--- a/src/tests/input/animation.gif
+++ b/src/tests/input/animation.gif
Binary files differ
diff --git a/src/tests/output/animation.gif.pdf b/src/tests/output/animation.gif.pdf
index fdfd460..2af1ba4 100644
--- a/src/tests/output/animation.gif.pdf
+++ b/src/tests/output/animation.gif.pdf
Binary files differ
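
Note on the src/img2pdf_test.py hunks above: every fixture and test that used to invoke the hard-coded "src/img2pdf.py" now passes an `img2pdfprog` variable instead. Its definition is not in this part of the diff, so the sketch below is only one plausible way to set up such a variable, not the upstream code. The environment variable name, the fallback default, and the helpers `resolve_img2pdfprog` and `build_command` are all assumptions made for illustration. Only the "--producer=", "--nodate" and "--engine=" arguments are taken from the hunks.

```python
import os

# Assumption: fall back to the in-tree script that the old tests hard-coded.
DEFAULT_PROG = "src/img2pdf.py"


def resolve_img2pdfprog(environ=os.environ):
    """Return the program the tests should run.

    Hypothetical helper: reads an environment variable (the name is an
    assumption) so that an installed img2pdf can be tested in place of
    the in-tree script.
    """
    return environ.get("img2pdfprog") or DEFAULT_PROG


def build_command(prog, engine, extra_args):
    """Build the argument list in the shape used by the fixtures above.

    extra_args stands in for the arguments elided from the hunks
    (input and output paths and so on).
    """
    return [prog, "--producer=", "--nodate", "--engine=" + engine] + list(extra_args)


img2pdfprog = resolve_img2pdfprog()
```

With a single module-level variable, the same test suite can run against the source tree during development and against an installed executable during packaging, without editing each `subprocess.check_call` or `subprocess.run` call site.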