diff options
author | Johannes 'josch' Schauer <josch@debian.org> | 2020-04-05 20:23:55 +0200 |
---|---|---|
committer | Johannes 'josch' Schauer <josch@debian.org> | 2020-04-05 20:23:55 +0200 |
commit | 140ed3b81e844b06f82bb5819fe335b514b2aed4 (patch) | |
tree | 81a5733683330dee8788300c1ea130bf2823ec5c /src/img2pdf.egg-info/PKG-INFO |
Import Upstream version 0.3.3
Diffstat (limited to 'src/img2pdf.egg-info/PKG-INFO')
-rw-r--r-- | src/img2pdf.egg-info/PKG-INFO | 245 |
1 files changed, 245 insertions, 0 deletions
diff --git a/src/img2pdf.egg-info/PKG-INFO b/src/img2pdf.egg-info/PKG-INFO new file mode 100644 index 0000000..7553591 --- /dev/null +++ b/src/img2pdf.egg-info/PKG-INFO @@ -0,0 +1,245 @@ +Metadata-Version: 2.1 +Name: img2pdf +Version: 0.3.3 +Summary: Convert images to PDF via direct JPEG inclusion. +Home-page: https://gitlab.mister-muffin.de/josch/img2pdf +Author: Johannes 'josch' Schauer +Author-email: josch@mister-muffin.de +License: LGPL +Download-URL: https://gitlab.mister-muffin.de/josch/img2pdf/repository/archive.tar.gz?ref=0.3.3 +Description: img2pdf + ======= + + Lossless conversion of raster images to PDF. You should use img2pdf if your + priorities are (in this order): + + 1. **always lossless**: the image embedded in the PDF will always have the + exact same color information for every pixel as the input + 2. **small**: if possible, the difference in filesize between the input image + and the output PDF will only be the overhead of the PDF container itself + 3. **fast**: if possible, the input image is just pasted into the PDF document + as-is without any CPU hungry re-encoding of the pixel data + + Conventional conversion software (like ImageMagick) would either: + + 1. not be lossless because lossy re-encoding to JPEG + 2. not be small because using wasteful flate encoding of raw pixel data + 3. not be fast because input data gets re-encoded + + Another advantage of not having to re-encode the input (in most common + situations) is, that img2pdf is able to handle much larger input than other + software, because the raw pixel data never has to be loaded into memory. + + The following table shows how img2pdf handles different input depending on the + input file format and image color space. + + | Format | Colorspace | Result | + | -------------------- | ------------------------------ | ------------- | + | JPEG | any | direct | + | JPEG2000 | any | direct | + | PNG (non-interlaced) | any | direct | + | TIFF (CCITT Group 4) | monochrome | direct | + | any | any except CMYK and monochrome | PNG Paeth | + | any | monochrome | CCITT Group 4 | + | any | CMYK | flate | + + For JPEG, JPEG2000, non-interlaced PNG and TIFF images with CCITT Group 4 + encoded data, img2pdf directly embeds the image data into the PDF without + re-encoding it. It thus treats the PDF format merely as a container format for + the image data. In these cases, img2pdf only increases the filesize by the size + of the PDF container (typically around 500 to 700 bytes). Since data is only + copied and not re-encoded, img2pdf is also typically faster than other + solutions for these input formats. + + For all other input types, img2pdf first has to transform the pixel data to + make it compatible with PDF. In most cases, the PNG Paeth filter is applied to + the pixel data. For monochrome input, CCITT Group 4 is used instead. Only for + CMYK input no filter is applied before finally applying flate compression. + + Usage + ----- + + The images must be provided as files because img2pdf needs to seek in the file + descriptor. + + If no output file is specified with the `-o`/`--output` option, output will be + done to stdout. A typical invocation is: + + $ img2pdf img1.png img2.jpg -o out.pdf + + The detailed documentation can be accessed by running: + + $ img2pdf --help + + Bugs + ---- + + - If you find a JPEG, JPEG2000, PNG or CCITT Group 4 encoded TIFF file that, + when embedded into the PDF cannot be read by the Adobe Acrobat Reader, + please contact me. + + - I have not yet figured out how to determine the colorspace of JPEG2000 + files. Therefore JPEG2000 files use DeviceRGB by default. For JPEG2000 + files with other colorspaces, you must explicitly specify it using the + `--colorspace` option. + + - Input images with alpha channels are not allowed. PDF doesn't support alpha + channels in images and thus, the alpha channel of the input would have to be + discarded. But img2pdf will always be lossless and thus, input images must + not carry transparency information. + + - img2pdf uses PIL (or Pillow) to obtain image meta data and to convert the + input if necessary. To prevent decompression bomb denial of service attacks, + Pillow limits the maximum number of pixels an input image is allowed to + have. If you are sure that you know what you are doing, then you can disable + this safeguard by passing the `--pillow-limit-break` option to img2pdf. This + allows one to process even very large input images. + + Installation + ------------ + + On a Debian- and Ubuntu-based systems, img2pdf can be installed from the + official repositories: + + $ apt install img2pdf + + If you want to install it using pip, you can run: + + $ pip3 install img2pdf + + If you prefer to install from source code use: + + $ cd img2pdf/ + $ pip3 install . + + To test the console script without installing the package on your system, + use virtualenv: + + $ cd img2pdf/ + $ virtualenv ve + $ ve/bin/pip3 install . + + You can then test the converter using: + + $ ve/bin/img2pdf -o test.pdf src/tests/test.jpg + + The package can also be used as a library: + + import img2pdf + + # opening from filename + with open("name.pdf","wb") as f: + f.write(img2pdf.convert('test.jpg')) + + # opening from file handle + with open("name.pdf","wb") as f1, open("test.jpg") as f2: + f1.write(img2pdf.convert(f2)) + + # using in-memory image data + with open("name.pdf","wb") as f: + f.write(img2pdf.convert("\x89PNG...") + + # multiple inputs (variant 1) + with open("name.pdf","wb") as f: + f.write(img2pdf.convert("test1.jpg", "test2.png")) + + # multiple inputs (variant 2) + with open("name.pdf","wb") as f: + f.write(img2pdf.convert(["test1.jpg", "test2.png"])) + + # writing to file descriptor + with open("name.pdf","wb") as f1, open("test.jpg") as f2: + img2pdf.convert(f2, outputstream=f1) + + # specify paper size (A4) + a4inpt = (img2pdf.mm_to_pt(210),img2pdf.mm_to_pt(297)) + layout_fun = img2pdf.get_layout_fun(a4inpt) + with open("name.pdf","wb") as f: + f.write(img2pdf.convert('test.jpg', layout_fun=layout_fun)) + + Comparison to ImageMagick + ------------------------- + + Create a large test image: + + $ convert logo: -resize 8000x original.jpg + + Convert it into PDF using ImageMagick and img2pdf: + + $ time img2pdf original.jpg -o img2pdf.pdf + $ time convert original.jpg imagemagick.pdf + + Notice how ImageMagick took an order of magnitude longer to do the conversion + than img2pdf. It also used twice the memory. + + Now extract the image data from both PDF documents and compare it to the + original: + + $ pdfimages -all img2pdf.pdf tmp + $ compare -metric AE original.jpg tmp-000.jpg null: + 0 + $ pdfimages -all imagemagick.pdf tmp + $ compare -metric AE original.jpg tmp-000.jpg null: + 118716 + + To get lossless output with ImageMagick we can use Zip compression but that + unnecessarily increases the size of the output: + + $ convert original.jpg -compress Zip imagemagick.pdf + $ pdfimages -all imagemagick.pdf tmp + $ compare -metric AE original.jpg tmp-000.png null: + 0 + $ stat --format="%s %n" original.jpg img2pdf.pdf imagemagick.pdf + 1535837 original.jpg + 1536683 img2pdf.pdf + 9397809 imagemagick.pdf + + Comparison to pdfLaTeX + ---------------------- + + pdfLaTeX performs a lossless conversion from included images to PDF by default. + If the input is a JPEG, then it simply embeds the JPEG into the PDF in the same + way as img2pdf does it. But for other image formats it uses flate compression + of the plain pixel data and thus needlessly increases the output file size: + + $ convert logo: -resize 8000x original.png + $ cat << END > pdflatex.tex + \documentclass{article} + \usepackage{graphicx} + \begin{document} + \includegraphics{original.png} + \end{document} + END + $ pdflatex pdflatex.tex + $ stat --format="%s %n" original.png pdflatex.pdf + 4500182 original.png + 9318120 pdflatex.pdf + + Comparison to Tesseract OCR + --------------------------- + + Tesseract OCR comes closest to the functionality img2pdf provides. It is able + to convert JPEG and PNG input to PDF without needlessly increasing the filesize + and is at the same time lossless. So if your input is JPEG and PNG images, then + you should safely be able to use Tesseract instead of img2pdf. For other input, + Tesseract might not do a lossless conversion. For example it converts CMYK + input to RGB and removes the alpha channel from images with transparency. For + multipage TIFF or animated GIF, it will only convert the first frame. + +Keywords: jpeg pdf converter +Platform: UNKNOWN +Classifier: Development Status :: 5 - Production/Stable +Classifier: Intended Audience :: Developers +Classifier: Intended Audience :: Other Audience +Classifier: Environment :: Console +Classifier: Programming Language :: Python +Classifier: Programming Language :: Python :: 2 +Classifier: Programming Language :: Python :: 2.7 +Classifier: Programming Language :: Python :: 3 +Classifier: Programming Language :: Python :: 3.5 +Classifier: Programming Language :: Python :: Implementation :: CPython +Classifier: Programming Language :: Python :: Implementation :: PyPy +Classifier: License :: OSI Approved :: GNU Lesser General Public License v3 (LGPLv3) +Classifier: Natural Language :: English +Classifier: Operating System :: OS Independent +Provides-Extra: test |