author Johannes Schauer <josch@debian.org> 2016-08-16 16:35:49 +0200
committer Johannes Schauer <josch@debian.org> 2016-08-16 16:35:49 +0200
commit 65960d846bed2bbc1a83269b715bb78e83e36478
tree 49728d11d3f9269ad1856c1eaaf32b6a4a933028 /src
Import img2pdf_0.2.1.orig.tar.gz
[dgit import orig img2pdf_0.2.1.orig.tar.gz]
Diffstat (limited to 'src')
-rw-r--r-- src/img2pdf.egg-info/PKG-INFO 164
-rw-r--r-- src/img2pdf.egg-info/SOURCES.txt 23
-rw-r--r-- src/img2pdf.egg-info/dependency_links.txt 1
-rw-r--r-- src/img2pdf.egg-info/entry_points.txt 4
-rw-r--r-- src/img2pdf.egg-info/pbr.json 1
-rw-r--r-- src/img2pdf.egg-info/requires.txt 1
-rw-r--r-- src/img2pdf.egg-info/top_level.txt 2
-rw-r--r-- src/img2pdf.egg-info/zip-safe 1
-rwxr-xr-x src/img2pdf.py 1650
-rw-r--r-- src/jp2.py 124
-rw-r--r-- src/tests/__init__.py 557
-rw-r--r-- src/tests/input/CMYK.jpg bin 0 -> 4788 bytes
-rw-r--r-- src/tests/input/normal.jpg bin 0 -> 2348 bytes
-rw-r--r-- src/tests/input/normal.png bin 0 -> 1130 bytes
-rw-r--r-- src/tests/output/CMYK.jpg.pdf bin 0 -> 5560 bytes
-rw-r--r-- src/tests/output/CMYK.tif.pdf bin 0 -> 1724 bytes
-rw-r--r-- src/tests/output/normal.jpg.pdf bin 0 -> 3091 bytes
-rw-r--r-- src/tests/output/normal.png.pdf bin 0 -> 1573 bytes
18 files changed, 2528 insertions, 0 deletions
diff --git a/src/img2pdf.egg-info/PKG-INFO b/src/img2pdf.egg-info/PKG-INFO
new file mode 100644
index 0000000..b18e9d6
--- /dev/null
+++ b/src/img2pdf.egg-info/PKG-INFO
@@ -0,0 +1,164 @@
+Metadata-Version: 1.1
+Name: img2pdf
+Version: 0.2.1
+Summary: Convert images to PDF via direct JPEG inclusion.
+Home-page: https://gitlab.mister-muffin.de/josch/img2pdf
+Author: Johannes 'josch' Schauer
+Author-email: josch@mister-muffin.de
+License: LGPL
+Download-URL: https://gitlab.mister-muffin.de/josch/img2pdf/repository/archive.tar.gz?ref=0.2.1
+Description: img2pdf
+ =======
+
+ Losslessly convert raster images to PDF. The file size will not unnecessarily
+ increase. One major application would be a number of scans made in JPEG format
+ which should now become part of a single PDF document. Existing solutions
+ would either re-encode the input JPEG files (leading to quality loss) or store
+ them in the zip/flate format which results in the PDF becoming unnecessarily
+ large in terms of its file size.
+
+ Background
+ ----------
+
+ Quality loss can be avoided when converting JPEG and JPEG2000 images to PDF by
+ embedding them without re-encoding. I wrote this piece of Python code
+ because I was missing a tool to do this automatically. Img2pdf basically just
+ wraps JPEG images into the PDF container as they are.
+
+ If you know an existing tool which allows one to embed JPEG and JPEG2000 images
+ into a PDF container without recompression, please contact me so that I can put
+ this code into the garbage bin.
+
+ Functionality
+ -------------
+
+ This program will take a list of images and produce a PDF file with the images
+ embedded in it. JPEG and JPEG2000 images will be included without
+ recompression. Raster images in other formats will be included with zip/flate
+ encoding, which usually increases the resulting size because formats like PNG
+ compress better than PDF, which merely zip/flate-compresses the RGB data. As a
+ result, this tool is able to losslessly wrap images into a PDF container with
+ a quality-to-filesize ratio that is typically better than (in case of JPEG and
+ JPEG2000 images) or equal to (in case of other formats) that of existing
+ tools.
+
+ For example, imagemagick will re-encode the input JPEG image (thus changing
+ its content):
+
+ $ convert img.jpg img.pdf
+ $ pdfimages img.pdf img.extr # not using -j to be extra sure there is no recompression
+ $ compare -metric AE img.jpg img.extr-000.ppm null:
+ 1.6301e+06
+
+ If one wants to losslessly convert from any format to PDF with
+ imagemagick, one has to use zip compression:
+
+ $ convert img.jpg -compress Zip img.pdf
+ $ pdfimages img.pdf img.extr # not using -j to be extra sure there is no recompression
+ $ compare -metric AE img.jpg img.extr-000.ppm null:
+ 0
+
+ However, this approach will result in PDF files that are a few times larger
+ than the input JPEG or JPEG2000 file.
+
+ img2pdf is able to losslessly embed JPEG and JPEG2000 files into a PDF
+ container without additional overhead (aside from the PDF structure itself),
+ save other graphics formats using lossless zip compression, and produce
+ multi-page PDF files when more than one input image is given.
+
+ Also, since JPEG and JPEG2000 images are not reencoded, conversion with img2pdf
+ is several times faster than with other tools.
+
+ Usage
+ -----
+
+ The images must be provided as files because img2pdf needs to seek in the file
+ descriptor.
+
+ If no output file is specified with the `-o`/`--output` option, output is
+ written to stdout.
+
+ The detailed documentation can be accessed by running:
+
+ img2pdf --help
+
+
+ Bugs
+ ----
+
+ If you find a JPEG or JPEG2000 file that, when embedded, cannot be read
+ by the Adobe Acrobat Reader, please contact me.
+
+ For lossless conversion of formats other than JPEG or JPEG2000, zip/flate
+ encoding is used. This choice is based on tests I did with a number of images.
+ I converted them into PDF using the lossless variants of the compression
+ formats offered by imagemagick. In all my tests, zip/flate encoding performed
+ best. You can verify my findings using the test_comp.sh script with any input
+ image given as a commandline argument. If you find an input file that is
+ outperformed by another lossless compression method, contact me.
+
+ I have not yet figured out how to determine the colorspace of JPEG2000 files.
+ Therefore JPEG2000 files use DeviceRGB by default. For JPEG2000 files with
+ other colorspaces, you must explicitly specify it using the `--colorspace`
+ option.
+
+ It might be possible to store transparency using masks but it is not clear
+ what the utility of such a functionality would be.
+
+ Most vector graphic formats can be losslessly turned into PDF (minus some of
+ the features unsupported by PDF) but img2pdf will currently turn vector
+ graphics into their lossy raster representations. For converting vector
+ graphics to PDF, use another tool like inkscape and then join the resulting
+ pages with a tool like pdftk.
+
+ A configuration file could be used for default options.
+
+ Installation
+ ------------
+
+ On Debian- and Ubuntu-based systems, dependencies may be installed
+ with the following command:
+
+ apt-get install python3 python3-pil python3-setuptools
+
+ You can then install the package using:
+
+ $ pip install img2pdf
+
+ If you prefer to install from source code use:
+
+ $ cd img2pdf/
+ $ pip install .
+
+ To test the console script without installing the package on your system,
+ use virtualenv:
+
+ $ cd img2pdf/
+ $ virtualenv ve
+ $ ve/bin/pip install .
+
+ You can then test the converter using:
+
+ $ ve/bin/img2pdf -o test.pdf src/tests/test.jpg
+
+ The package can also be used as a library:
+
+ import img2pdf
+ pdf_bytes = img2pdf.convert('test.jpg')
+
+ with open("name.pdf", "wb") as f:
+     f.write(pdf_bytes)
+
+Keywords: jpeg pdf converter
+Platform: UNKNOWN
+Classifier: Development Status :: 5 - Production/Stable
+Classifier: Intended Audience :: Developers
+Classifier: Intended Audience :: Other Audience
+Classifier: Environment :: Console
+Classifier: Programming Language :: Python
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.4
+Classifier: Programming Language :: Python :: Implementation :: CPython
+Classifier: License :: OSI Approved :: GNU Lesser General Public License v3 (LGPLv3)
+Classifier: Natural Language :: English
+Classifier: Operating System :: OS Independent
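Note on the description above: for inputs other than JPEG and JPEG2000, the README says the pixel data is stored with zip/flate encoding, which is lossless. A minimal sketch of that property using Python's standard `zlib` module follows. The 2x2 pixel buffer is made up for illustration, and this is not the project's own code path, only the same compression primitive.

```python
import zlib

# Hypothetical 2x2 RGB image as raw bytes (3 bytes per pixel, row by row).
raw_rgb = bytes([
    255, 0, 0,      0, 255, 0,       # red, green
    0, 0, 255,      255, 255, 255,   # blue, white
])

# What a /FlateDecode image stream would carry.
compressed = zlib.compress(raw_rgb)

# What a PDF viewer recovers when it decodes the stream.
restored = zlib.decompress(compressed)

# Flate is lossless: the decoded bytes match the input exactly.
assert restored == raw_rgb
```

For an input this small the compressed form can be larger than the raw bytes, which matches the README's remark that flate-encoded output is not necessarily smaller than the source file.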
diff --git a/src/img2pdf.egg-info/SOURCES.txt b/src/img2pdf.egg-info/SOURCES.txt
new file mode 100644
index 0000000..192589d
--- /dev/null
+++ b/src/img2pdf.egg-info/SOURCES.txt
@@ -0,0 +1,23 @@
+MANIFEST.in
+README.md
+setup.cfg
+setup.py
+test_comp.sh
+src/img2pdf.py
+src/jp2.py
+src/img2pdf.egg-info/PKG-INFO
+src/img2pdf.egg-info/SOURCES.txt
+src/img2pdf.egg-info/dependency_links.txt
+src/img2pdf.egg-info/entry_points.txt
+src/img2pdf.egg-info/pbr.json
+src/img2pdf.egg-info/requires.txt
+src/img2pdf.egg-info/top_level.txt
+src/img2pdf.egg-info/zip-safe
+src/tests/__init__.py
+src/tests/input/CMYK.jpg
+src/tests/input/normal.jpg
+src/tests/input/normal.png
+src/tests/output/CMYK.jpg.pdf
+src/tests/output/CMYK.tif.pdf
+src/tests/output/normal.jpg.pdf
+src/tests/output/normal.png.pdf
\ No newline at end of file
diff --git a/src/img2pdf.egg-info/dependency_links.txt b/src/img2pdf.egg-info/dependency_links.txt
new file mode 100644
index 0000000..8b13789
--- /dev/null
+++ b/src/img2pdf.egg-info/dependency_links.txt
@@ -0,0 +1 @@
+
diff --git a/src/img2pdf.egg-info/entry_points.txt b/src/img2pdf.egg-info/entry_points.txt
new file mode 100644
index 0000000..59301dc
--- /dev/null
+++ b/src/img2pdf.egg-info/entry_points.txt
@@ -0,0 +1,4 @@
+
+ [console_scripts]
+ img2pdf = img2pdf:main
+
\ No newline at end of file
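The `console_scripts` entry above maps a command name to a `module:function` target: once the package is installed, an `img2pdf` executable is generated that imports the `img2pdf` module and calls its `main`. A small sketch of how such a line breaks apart, using plain string handling for illustration only (this is not how setuptools parses it internally):

```python
# The entry as it appears in entry_points.txt.
entry = "img2pdf = img2pdf:main"

# Left of "=" is the command name, right of it the target.
command, target = (part.strip() for part in entry.split("=", 1))

# The target names a module and a callable inside it.
module_name, func_name = target.split(":", 1)

print(command, module_name, func_name)
```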
diff --git a/src/img2pdf.egg-info/pbr.json b/src/img2pdf.egg-info/pbr.json
new file mode 100644
index 0000000..bc27bf9
--- /dev/null
+++ b/src/img2pdf.egg-info/pbr.json
@@ -0,0 +1 @@
+{"is_release": false, "git_version": "d78b2cb"}
\ No newline at end of file
diff --git a/src/img2pdf.egg-info/requires.txt b/src/img2pdf.egg-info/requires.txt
new file mode 100644
index 0000000..7e2fba5
--- /dev/null
+++ b/src/img2pdf.egg-info/requires.txt
@@ -0,0 +1 @@
+Pillow
diff --git a/src/img2pdf.egg-info/top_level.txt b/src/img2pdf.egg-info/top_level.txt
new file mode 100644
index 0000000..0636fd7
--- /dev/null
+++ b/src/img2pdf.egg-info/top_level.txt
@@ -0,0 +1,2 @@
+img2pdf
+jp2
diff --git a/src/img2pdf.egg-info/zip-safe b/src/img2pdf.egg-info/zip-safe
new file mode 100644
index 0000000..8b13789
--- /dev/null
+++ b/src/img2pdf.egg-info/zip-safe
@@ -0,0 +1 @@
+
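The `img2pdf.py` file added next expresses all page and image dimensions in PDF points (1/72 of an inch) and defines small helpers such as `px_to_pt` and `mm_to_pt` for that. The sketch below restates those two formulas as standalone functions and works through a few values; the 1240 px scan at 150 dpi is a hypothetical input chosen for illustration.

```python
def px_to_pt(length_px, dpi):
    """Convert a length in pixels to PDF points (1 pt = 1/72 inch)."""
    return 72 * length_px / dpi


def mm_to_pt(length_mm):
    """Convert millimetres to PDF points (25.4 mm per inch)."""
    return 72 * length_mm / 25.4


# Hypothetical scan: 1240 px wide at 150 dpi.
scan_width_pt = px_to_pt(1240, 150)

# A4 paper, 210 mm x 297 mm, as listed in the papersizes table.
a4_width_pt = mm_to_pt(210)
a4_height_pt = mm_to_pt(297)

# At the default of 96 dpi, 96 pixels span exactly one inch.
one_inch_pt = px_to_pt(96, 96.0)

print(scan_width_pt, round(a4_width_pt, 2), round(a4_height_pt, 2), one_inch_pt)
```

The scan comes out slightly narrower than an A4 page, which is the kind of comparison the layout code has to make when fitting an image onto a page.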
diff --git a/src/img2pdf.py b/src/img2pdf.py
new file mode 100755
index 0000000..2042d13
--- /dev/null
+++ b/src/img2pdf.py
@@ -0,0 +1,1650 @@
+#!/usr/bin/env python3
+
+# Copyright (C) 2012-2014 Johannes 'josch' Schauer <j.schauer at email.de>
+#
+# This program is free software: you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation, either
+# version 3 of the License, or (at your option) any later
+# version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public
+# License along with this program. If not, see
+# <http://www.gnu.org/licenses/>.
+
+import sys
+import os
+import zlib
+import argparse
+from PIL import Image
+from datetime import datetime
+from jp2 import parsejp2
+from enum import Enum
+from io import BytesIO
+import logging
+
+__version__ = "0.2.1"
+default_dpi = 96.0
+papersizes = {
+ "letter": "8.5inx11in",
+ "a0": "841mmx1189mm",
+ "a1": "594mmx841mm",
+ "a2": "420mmx594mm",
+ "a3": "297mmx420mm",
+ "a4": "210mmx297mm",
+ "a5": "148mmx210mm",
+ "a6": "105mmx148mm",
+}
+papernames = {
+ "letter": "Letter",
+ "a0": "A0",
+ "a1": "A1",
+ "a2": "A2",
+ "a3": "A3",
+ "a4": "A4",
+ "a5": "A5",
+ "a6": "A6",
+}
+
+
+FitMode = Enum('FitMode', 'into fill exact shrink enlarge')
+
+PageOrientation = Enum('PageOrientation', 'portrait landscape')
+
+Colorspace = Enum('Colorspace', 'RGB L 1 CMYK CMYK;I RGBA P other')
+
+ImageFormat = Enum('ImageFormat', 'JPEG JPEG2000 other')
+
+PageMode = Enum('PageMode', 'none outlines thumbs')
+
+PageLayout = Enum('PageLayout',
+ 'single onecolumn twocolumnright twocolumnleft')
+
+Magnification = Enum('Magnification', 'fit fith fitbh')
+
+ImgSize = Enum('ImgSize', 'abs perc dpi')
+
+Unit = Enum('Unit', 'pt cm mm inch')
+
+ImgUnit = Enum('ImgUnit', 'pt cm mm inch perc dpi')
+
+
+class NegativeDimensionError(Exception):
+ pass
+
+
+class UnsupportedColorspaceError(Exception):
+ pass
+
+
+class ImageOpenError(Exception):
+ pass
+
+
+class JpegColorspaceError(Exception):
+ pass
+
+
+class PdfTooLargeError(Exception):
+ pass
+
+
+# without pdfrw this function is a no-op
+def my_convert_load(string):
+ return string
+
+
+def parse(cont, indent=1):
+ if type(cont) is dict:
+ return b"<<\n"+b"\n".join(
+ [4 * indent * b" " + k + b" " + parse(v, indent+1)
+ for k, v in sorted(cont.items())])+b"\n"+4*(indent-1)*b" "+b">>"
+ elif type(cont) is int:
+ return str(cont).encode()
+ elif type(cont) is float:
+ if int(cont) == cont:
+ return parse(int(cont))
+ else:
+ return ("%0.4f" % cont).rstrip("0").encode()
+ elif isinstance(cont, MyPdfDict):
+ # if cont got an identifier, then addobj() has been called with it
+ # and a link to it will be added, otherwise add it inline
+ if hasattr(cont, "identifier"):
+ return ("%d 0 R" % cont.identifier).encode()
+ else:
+ return parse(cont.content, indent)
+ elif type(cont) is str or isinstance(cont, bytes):
+ if type(cont) is str and type(cont) is not bytes:
+ raise TypeError(
+ "parse must be passed a bytes object in py3. Got: %s" % cont)
+ return cont
+ elif isinstance(cont, list):
+ return b"[ "+b" ".join([parse(c, indent) for c in cont])+b" ]"
+ else:
+ raise TypeError("cannot handle type %s with content %s" % (type(cont),
+ cont))
+
+
+class MyPdfDict(object):
+ def __init__(self, *args, **kw):
+ self.content = dict()
+ if args:
+ if len(args) == 1:
+ args = args[0]
+ self.content.update(args)
+ self.stream = None
+ for key, value in kw.items():
+ if key == "stream":
+ self.stream = value
+ self.content[MyPdfName.Length] = len(value)
+ elif key == "indirect":
+ pass
+ else:
+ self.content[getattr(MyPdfName, key)] = value
+
+ def tostring(self):
+ if self.stream is not None:
+ return (
+ ("%d 0 obj\n" % self.identifier).encode() +
+ parse(self.content) +
+ b"\nstream\n" + self.stream + b"\nendstream\nendobj\n")
+ else:
+ return ("%d 0 obj\n" % self.identifier).encode() + \
+ parse(self.content) + b"\nendobj\n"
+
+ def __setitem__(self, key, value):
+ self.content[key] = value
+
+ def __getitem__(self, key):
+ return self.content[key]
+
+
+class MyPdfName():
+ def __getattr__(self, name):
+ return b'/' + name.encode('ascii')
+MyPdfName = MyPdfName()
+
+
+class MyPdfObject(bytes):
+ def __new__(cls, string):
+ return bytes.__new__(cls, string.encode('ascii'))
+
+
+class MyPdfArray(list):
+ pass
+
+
+class MyPdfWriter():
+ def __init__(self, version="1.3"):
+ self.objects = []
+ # create an incomplete pages object so that a /Parent entry can be
+ # added to each page
+ self.pages = MyPdfDict(Type=MyPdfName.Pages, Kids=[], Count=0)
+ self.catalog = MyPdfDict(Pages=self.pages, Type=MyPdfName.Catalog)
+ self.version = version # default pdf version 1.3
+ self.pagearray = []
+
+ def addobj(self, obj):
+ newid = len(self.objects)+1
+ obj.identifier = newid
+ self.objects.append(obj)
+
+ def tostream(self, info, stream):
+ xreftable = list()
+
+ # justification of the random binary garbage in the header from
+ # adobe:
+ #
+ # > Note: If a PDF file contains binary data, as most do (see Section
+ # > 3.1, “Lexical Conventions”), it is recommended that the header
+ # > line be immediately followed by a comment line containing at
+ # > least four binary characters—that is, characters whose codes are
+ # > 128 or greater. This ensures proper behavior of file transfer
+ # > applications that inspect data near the beginning of a file to
+ # > determine whether to treat the file’s contents as text or as
+ # > binary.
+ #
+ # the choice of binary characters is arbitrary but those four seem to
+ # be used elsewhere.
+ pdfheader = ('%%PDF-%s\n' % self.version).encode('ascii')
+ pdfheader += b'%\xe2\xe3\xcf\xd3\n'
+ stream.write(pdfheader)
+
+ # From section 3.4.3 of the PDF Reference (version 1.7):
+ #
+ # > Each entry is exactly 20 bytes long, including the end-of-line
+ # > marker.
+ # >
+ # > [...]
+ # >
+ # > The format of an in-use entry is
+ # > nnnnnnnnnn ggggg n eol
+ # > where
+ # > nnnnnnnnnn is a 10-digit byte offset
+ # > ggggg is a 5-digit generation number
+ # > n is a literal keyword identifying this as an in-use entry
+ # > eol is a 2-character end-of-line sequence
+ # >
+ # > [...]
+ # >
+ # > If the file’s end-of-line marker is a single character (either a
+ # > carriage return or a line feed), it is preceded by a single space;
+ #
+ # Since we chose to use a single character eol marker, we precede it by
+ # a space
+ pos = len(pdfheader)
+ xreftable.append(b"0000000000 65535 f \n")
+ for o in self.objects:
+ xreftable.append(("%010d 00000 n \n" % pos).encode())
+ content = o.tostring()
+ stream.write(content)
+ pos += len(content)
+
+ xrefoffset = pos
+ stream.write(b"xref\n")
+ stream.write(("0 %d\n" % len(xreftable)).encode())
+ for x in xreftable:
+ stream.write(x)
+ stream.write(b"trailer\n")
+ stream.write(parse({b"/Size": len(xreftable), b"/Info": info,
+ b"/Root": self.catalog})+b"\n")
+ stream.write(b"startxref\n")
+ stream.write(("%d\n" % xrefoffset).encode())
+ stream.write(b"%%EOF\n")
+ return
+
+ def addpage(self, page):
+ page[b"/Parent"] = self.pages
+ self.pagearray.append(page)
+ self.pages.content[b"/Kids"].append(page)
+ self.pages.content[b"/Count"] += 1
+ self.addobj(page)
+
+
+class MyPdfString():
+ @classmethod
+ def encode(cls, string):
+ try:
+ string = string.encode('ascii')
+ except UnicodeEncodeError:
+ string = b"\xfe\xff"+string.encode("utf-16-be")
+ string = string.replace(b'\\', b'\\\\')
+ string = string.replace(b'(', b'\\(')
+ string = string.replace(b')', b'\\)')
+ return b'(' + string + b')'
+
+
+class pdfdoc(object):
+ def __init__(self, version="1.3", title=None, author=None, creator=None,
+ producer=None, creationdate=None, moddate=None, subject=None,
+ keywords=None, nodate=False, panes=None, initial_page=None,
+ magnification=None, page_layout=None, fit_window=False,
+ center_window=False, fullscreen=False, with_pdfrw=True):
+ if with_pdfrw:
+ try:
+ from pdfrw import PdfWriter, PdfDict, PdfName, PdfString
+ self.with_pdfrw = True
+ except ImportError:
+ PdfWriter = MyPdfWriter
+ PdfDict = MyPdfDict
+ PdfName = MyPdfName
+ PdfString = MyPdfString
+ self.with_pdfrw = False
+ else:
+ PdfWriter = MyPdfWriter
+ PdfDict = MyPdfDict
+ PdfName = MyPdfName
+ PdfString = MyPdfString
+ self.with_pdfrw = False
+
+ now = datetime.now()
+ self.info = PdfDict(indirect=True)
+
+ def datetime_to_pdfdate(dt):
+ return dt.strftime("%Y%m%d%H%M%SZ")
+
+ if title is not None:
+ self.info[PdfName.Title] = PdfString.encode(title)
+ if author is not None:
+ self.info[PdfName.Author] = PdfString.encode(author)
+ if creator is not None:
+ self.info[PdfName.Creator] = PdfString.encode(creator)
+ if producer is not None:
+ self.info[PdfName.Producer] = PdfString.encode(producer)
+ if creationdate is not None:
+ self.info[PdfName.CreationDate] = \
+ PdfString.encode("D:"+datetime_to_pdfdate(creationdate))
+ elif not nodate:
+ self.info[PdfName.CreationDate] = \
+ PdfString.encode("D:"+datetime_to_pdfdate(now))
+ if moddate is not None:
+ self.info[PdfName.ModDate] = \
+ PdfString.encode("D:"+datetime_to_pdfdate(moddate))
+ elif not nodate:
+ self.info[PdfName.ModDate] = PdfString.encode(
+ "D:"+datetime_to_pdfdate(now))
+ if subject is not None:
+ self.info[PdfName.Subject] = PdfString.encode(subject)
+ if keywords is not None:
+ self.info[PdfName.Keywords] = PdfString.encode(",".join(keywords))
+
+ self.writer = PdfWriter()
+ self.writer.version = version
+ # this is done because pdfrw adds info, catalog and pages as the first
+ # three objects in this order
+ if not self.with_pdfrw:
+ self.writer.addobj(self.info)
+ self.writer.addobj(self.writer.catalog)
+ self.writer.addobj(self.writer.pages)
+
+ self.panes = panes
+ self.initial_page = initial_page
+ self.magnification = magnification
+ self.page_layout = page_layout
+ self.fit_window = fit_window
+ self.center_window = center_window
+ self.fullscreen = fullscreen
+
+ def add_imagepage(self, color, imgwidthpx, imgheightpx, imgformat, imgdata,
+ imgwidthpdf, imgheightpdf, imgxpdf, imgypdf, pagewidth,
+ pageheight):
+ if self.with_pdfrw:
+ from pdfrw import PdfDict, PdfName
+ from pdfrw.py23_diffs import convert_load
+ else:
+ PdfDict = MyPdfDict
+ PdfName = MyPdfName
+ convert_load = my_convert_load
+
+ if color == Colorspace.L:
+ colorspace = PdfName.DeviceGray
+ elif color == Colorspace.RGB:
+ colorspace = PdfName.DeviceRGB
+ elif color == Colorspace.CMYK or color == Colorspace['CMYK;I']:
+ colorspace = PdfName.DeviceCMYK
+ else:
+ raise UnsupportedColorspaceError("unsupported color space: %s"
+ % color.name)
+
+ # either embed the whole jpeg or deflate the bitmap representation
+ if imgformat is ImageFormat.JPEG:
+ ofilter = [PdfName.DCTDecode]
+ elif imgformat is ImageFormat.JPEG2000:
+ ofilter = [PdfName.JPXDecode]
+ self.writer.version = "1.5" # jpeg2000 needs pdf 1.5
+ else:
+ ofilter = [PdfName.FlateDecode]
+
+ image = PdfDict(stream=convert_load(imgdata))
+
+ image[PdfName.Type] = PdfName.XObject
+ image[PdfName.Subtype] = PdfName.Image
+ image[PdfName.Filter] = ofilter
+ image[PdfName.Width] = imgwidthpx
+ image[PdfName.Height] = imgheightpx
+ image[PdfName.ColorSpace] = colorspace
+ # hardcoded as PIL doesn't provide bits for non-jpeg formats
+ image[PdfName.BitsPerComponent] = 8
+
+ if color == Colorspace['CMYK;I']:
+ # Inverts all four channels
+ image[PdfName.Decode] = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
+
+ text = ("q\n%0.4f 0 0 %0.4f %0.4f %0.4f cm\n/Im0 Do\nQ" %
+ (imgwidthpdf, imgheightpdf, imgxpdf, imgypdf)).encode("ascii")
+
+ content = PdfDict(stream=convert_load(text))
+ resources = PdfDict(XObject=PdfDict(Im0=image))
+
+ page = PdfDict(indirect=True)
+ page[PdfName.Type] = PdfName.Page
+ page[PdfName.MediaBox] = [0, 0, pagewidth, pageheight]
+ page[PdfName.Resources] = resources
+ page[PdfName.Contents] = content
+
+ self.writer.addpage(page)
+
+ if not self.with_pdfrw:
+ self.writer.addobj(content)
+ self.writer.addobj(image)
+
+ def tostring(self):
+ stream = BytesIO()
+ self.tostream(stream)
+ return stream.getvalue()
+
+ def tostream(self, outputstream):
+ if self.with_pdfrw:
+ from pdfrw import PdfDict, PdfName, PdfArray, PdfObject
+ else:
+ PdfDict = MyPdfDict
+ PdfName = MyPdfName
+ PdfObject = MyPdfObject
+ PdfArray = MyPdfArray
+ NullObject = PdfObject('null')
+ TrueObject = PdfObject('true')
+
+ # We fill the catalog with more information like /ViewerPreferences,
+ # /PageMode, /PageLayout or /OpenAction only now because the latter refers
+ # to a page object which has to be present so that we can get its id.
+ #
+ # Furthermore, if using pdfrw, the trailer is cleared every time a page
+ # is added, so we can only start using it after all pages have been
+ # written.
+
+ if self.with_pdfrw:
+ catalog = self.writer.trailer.Root
+ else:
+ catalog = self.writer.catalog
+
+ if self.fullscreen or self.fit_window or self.center_window or \
+ self.panes is not None:
+ catalog[PdfName.ViewerPreferences] = PdfDict()
+
+ if self.fullscreen:
+ # this setting might be overwritten later by the page mode
+ catalog[PdfName.ViewerPreferences][PdfName.NonFullScreenPageMode] \
+ = PdfName.UseNone
+
+ if self.panes == PageMode.thumbs:
+ catalog[PdfName.ViewerPreferences][PdfName.NonFullScreenPageMode] \
+ = PdfName.UseThumbs
+ # this setting might be overwritten later if fullscreen
+ catalog[PdfName.PageMode] = PdfName.UseThumbs
+ elif self.panes == PageMode.outlines:
+ catalog[PdfName.ViewerPreferences][PdfName.NonFullScreenPageMode] \
+ = PdfName.UseOutlines
+ # this setting might be overwritten later if fullscreen
+ catalog[PdfName.PageMode] = PdfName.UseOutlines
+ elif self.panes in [PageMode.none, None]:
+ pass
+ else:
+ raise ValueError("unknown page mode: %s" % self.panes)
+
+ if self.fit_window:
+ catalog[PdfName.ViewerPreferences][PdfName.FitWindow] = TrueObject
+
+ if self.center_window:
+ catalog[PdfName.ViewerPreferences][PdfName.CenterWindow] = \
+ TrueObject
+
+ if self.fullscreen:
+ catalog[PdfName.PageMode] = PdfName.FullScreen
+
+ # see table 8.2 in section 8.2.1 in
+ # http://partners.adobe.com/public/developer/en/pdf/PDFReference16.pdf
+ # Fit - Fits the page to the window.
+ # FitH - Fits the width of the page to the window.
+ # FitV - Fits the height of the page to the window.
+ # FitR - Fits the rectangle specified by the four coordinates to the
+ # window.
+ # FitB - Fits the page bounding box to the window. This basically
+ # reduces the amount of whitespace (margins) that is displayed
+ # and thus focussing more on the text content.
+ # FitBH - Fits the width of the page bounding box to the window.
+ # FitBV - Fits the height of the page bounding box to the window.
+
+ # by default the initial page is the first one
+ initial_page = self.writer.pagearray[0]
+ # we set the open action here to make sure we open on the requested
+ # initial page but this value might be overwritten by a custom open
+ # action later while still taking the requested initial page into
+ # account
+ if self.initial_page is not None:
+ initial_page = self.writer.pagearray[self.initial_page - 1]
+ catalog[PdfName.OpenAction] = PdfArray([initial_page, PdfName.XYZ,
+ NullObject, NullObject, 0])
+
+ if self.magnification == Magnification.fit:
+ catalog[PdfName.OpenAction] = PdfArray([initial_page, PdfName.Fit])
+ elif self.magnification == Magnification.fith:
+ pagewidth = initial_page[PdfName.MediaBox][2]
+ catalog[PdfName.OpenAction] = PdfArray(
+ [initial_page, PdfName.FitH, pagewidth])
+ elif self.magnification == Magnification.fitbh:
+ # quick hack to determine the image width on the page
+ imgwidth = float(initial_page[PdfName.Contents].stream.split()[4])
+ catalog[PdfName.OpenAction] = PdfArray(
+ [initial_page, PdfName.FitBH, imgwidth])
+ elif isinstance(self.magnification, float):
+ catalog[PdfName.OpenAction] = PdfArray(
+ [initial_page, PdfName.XYZ, NullObject, NullObject,
+ self.magnification])
+ elif self.magnification is None:
+ pass
+ else:
+ raise ValueError("unknown magnification: %s" % self.magnification)
+
+ if self.page_layout == PageLayout.single:
+ catalog[PdfName.PageLayout] = PdfName.SinglePage
+ elif self.page_layout == PageLayout.onecolumn:
+ catalog[PdfName.PageLayout] = PdfName.OneColumn
+ elif self.page_layout == PageLayout.twocolumnright:
+ catalog[PdfName.PageLayout] = PdfName.TwoColumnRight
+ elif self.page_layout == PageLayout.twocolumnleft:
+ catalog[PdfName.PageLayout] = PdfName.TwoColumnLeft
+ elif self.page_layout is None:
+ pass
+ else:
+ raise ValueError("unknown page layout: %s" % self.page_layout)
+
+ # now write out the PDF
+ if self.with_pdfrw:
+ self.writer.trailer.Info = self.info
+ self.writer.write(outputstream)
+ else:
+ self.writer.tostream(self.info, outputstream)
+
+
+def get_imgmetadata(imgdata, imgformat, default_dpi, colorspace, rawdata=None):
+ if imgformat == ImageFormat.JPEG2000 \
+ and rawdata is not None and imgdata is None:
+ # this codepath gets called if the PIL installation is not able to
+ # handle JPEG2000 files
+ imgwidthpx, imgheightpx, ics, hdpi, vdpi = parsejp2(rawdata)
+
+ if hdpi is None:
+ hdpi = default_dpi
+ if vdpi is None:
+ vdpi = default_dpi
+ ndpi = (hdpi, vdpi)
+ else:
+ imgwidthpx, imgheightpx = imgdata.size
+
+ ndpi = imgdata.info.get("dpi", (default_dpi, default_dpi))
+ # In python3, the returned dpi value for some tiff images will
+ # not be an integer but a float. To make the behaviour of
+ # img2pdf the same between python2 and python3, we convert that
+ # float into an integer by rounding.
+ # Search online for the 72.009 dpi problem for more info.
+ ndpi = (int(round(ndpi[0])), int(round(ndpi[1])))
+ ics = imgdata.mode
+
+ logging.debug("input dpi = %d x %d", *ndpi)
+
+ if colorspace:
+ color = colorspace
+ logging.debug("input colorspace (forced) = %s", color)
+ else:
+ color = None
+ for c in Colorspace:
+ if c.name == ics:
+ color = c
+ if color is None:
+ color = Colorspace.other
+ if color == Colorspace.CMYK and imgformat == ImageFormat.JPEG:
+ # Adobe inverts CMYK JPEGs for some reason, and others
+ # have followed suit as well. Some software assumes the
+ # JPEG is inverted if the Adobe tag (APP14) is present, while
+ # other software assumes all CMYK JPEGs are inverted. I don't
+ # have enough experience with these to know which is
+ # better for images currently in the wild, so I'm going
+ # with the first approach for now.
+ if "adobe" in imgdata.info:
+ color = Colorspace['CMYK;I']
+ logging.debug("input colorspace = %s", color.name)
+
+ logging.debug("width x height = %dpx x %dpx", imgwidthpx, imgheightpx)
+
+ return (color, ndpi, imgwidthpx, imgheightpx)
+
+
+def read_images(rawdata, colorspace, first_frame_only=False):
+ im = BytesIO(rawdata)
+ im.seek(0)
+ imgdata = None
+ try:
+ imgdata = Image.open(im)
+ except IOError as e:
+ # test if it is a jpeg2000 image
+ if rawdata[:12] != b"\x00\x00\x00\x0C\x6A\x50\x20\x20\x0D\x0A\x87\x0A":
+ raise ImageOpenError("cannot read input image (not jpeg2000). "
+ "PIL: error reading image: %s" % e)
+ # image is jpeg2000
+ imgformat = ImageFormat.JPEG2000
+ else:
+ imgformat = None
+ for f in ImageFormat:
+ if f.name == imgdata.format:
+ imgformat = f
+ if imgformat is None:
+ imgformat = ImageFormat.other
+
+ logging.debug("imgformat = %s", imgformat.name)
+
+ # depending on the input format, determine whether to pass the raw
+ # image or the zlib compressed color information
+ if imgformat == ImageFormat.JPEG or imgformat == ImageFormat.JPEG2000:
+ color, ndpi, imgwidthpx, imgheightpx = get_imgmetadata(
+ imgdata, imgformat, default_dpi, colorspace, rawdata)
+ if color == Colorspace['1']:
+ raise JpegColorspaceError("jpeg can't be monochrome")
+ if color == Colorspace['P']:
+ raise JpegColorspaceError("jpeg can't have a color palette")
+ if color == Colorspace['RGBA']:
+ raise JpegColorspaceError("jpeg can't have an alpha channel")
+ im.close()
+ return [(color, ndpi, imgformat, rawdata, imgwidthpx, imgheightpx)]
+ else:
+ result = []
+ img_page_count = 0
+ # loop through all frames of the image (example: multipage TIFF)
+ while True:
+ try:
+ imgdata.seek(img_page_count)
+ except EOFError:
+ break
+
+ if first_frame_only and img_page_count > 0:
+ break
+
+ logging.debug("Converting frame: %d" % img_page_count)
+
+ color, ndpi, imgwidthpx, imgheightpx = get_imgmetadata(
+ imgdata, imgformat, default_dpi, colorspace)
+
+ # because we do not support /CCITTFaxDecode
+ if color == Colorspace['1']:
+ logging.debug("Converting colorspace 1 to L")
+ newimg = imgdata.convert('L')
+ color = Colorspace.L
+ elif color in [Colorspace.RGB, Colorspace.L, Colorspace.CMYK,
+ Colorspace["CMYK;I"]]:
+ logging.debug("Colorspace is OK: %s", color)
+ newimg = imgdata
+ elif color in [Colorspace.RGBA, Colorspace.P, Colorspace.other]:
+ logging.debug("Converting colorspace %s to RGB", color)
+ newimg = imgdata.convert('RGB')
+ color = Colorspace.RGB
+ else:
+ raise ValueError("unknown colorspace: %s" % color.name)
+ imggz = zlib.compress(newimg.tobytes())
+ result.append((color, ndpi, imgformat, imggz, imgwidthpx,
+ imgheightpx))
+ img_page_count += 1
+ # the python-pil version 2.3.0-1ubuntu3 in Ubuntu does not have the
+ # close() method
+ try:
+ imgdata.close()
+ except AttributeError:
+ pass
+ im.close()
+ return result
+
+
+# converts a length in pixels to a length in PDF units (1/72 of an inch)
+def px_to_pt(length, dpi):
+ return 72*length/dpi
+
+
+def cm_to_pt(length):
+ return (72*length)/2.54
+
+
+def mm_to_pt(length):
+ return (72*length)/25.4
+
+
+def in_to_pt(length):
+ return 72*length
+
+
+def get_layout_fun(pagesize, imgsize, border, fit, auto_orient):
+ def fitfun(fit, imgwidth, imgheight, fitwidth, fitheight):
+ if fitwidth is None and fitheight is None:
+ raise ValueError("fitwidth and fitheight cannot both be None")
+ # if fit is fill or enlarge then it is okay if one of the dimensions
+        # is negative but one of them must still be positive
+ # if fit is not fill or enlarge then both dimensions must be positive
+ if fit in [FitMode.fill, FitMode.enlarge] and \
+ fitwidth is not None and fitwidth < 0 and \
+ fitheight is not None and fitheight < 0:
+ raise ValueError("cannot fit into a rectangle where both "
+ "dimensions are negative")
+ elif fit not in [FitMode.fill, FitMode.enlarge] and \
+ ((fitwidth is not None and fitwidth < 0) or
+ (fitheight is not None and fitheight < 0)):
+            raise ValueError("cannot fit into a rectangle where either "
+                             "dimension is negative")
+
+ def default():
+ if fitwidth is not None and fitheight is not None:
+ newimgwidth = fitwidth
+ newimgheight = (newimgwidth * imgheight)/imgwidth
+ if newimgheight > fitheight:
+ newimgheight = fitheight
+ newimgwidth = (newimgheight * imgwidth)/imgheight
+ elif fitwidth is None and fitheight is not None:
+ newimgheight = fitheight
+ newimgwidth = (newimgheight * imgwidth)/imgheight
+ elif fitheight is None and fitwidth is not None:
+ newimgwidth = fitwidth
+ newimgheight = (newimgwidth * imgheight)/imgwidth
+ else:
+ raise ValueError("fitwidth and fitheight cannot both be None")
+ return newimgwidth, newimgheight
+ if fit is None or fit == FitMode.into:
+ return default()
+ elif fit == FitMode.fill:
+ if fitwidth is not None and fitheight is not None:
+ newimgwidth = fitwidth
+ newimgheight = (newimgwidth * imgheight)/imgwidth
+ if newimgheight < fitheight:
+ newimgheight = fitheight
+ newimgwidth = (newimgheight * imgwidth)/imgheight
+ elif fitwidth is None and fitheight is not None:
+ newimgheight = fitheight
+ newimgwidth = (newimgheight * imgwidth)/imgheight
+ elif fitheight is None and fitwidth is not None:
+ newimgwidth = fitwidth
+ newimgheight = (newimgwidth * imgheight)/imgwidth
+ else:
+ raise ValueError("fitwidth and fitheight cannot both be None")
+ return newimgwidth, newimgheight
+ elif fit == FitMode.exact:
+ if fitwidth is not None and fitheight is not None:
+ return fitwidth, fitheight
+ elif fitwidth is None and fitheight is not None:
+ newimgheight = fitheight
+ newimgwidth = (newimgheight * imgwidth)/imgheight
+ elif fitheight is None and fitwidth is not None:
+ newimgwidth = fitwidth
+ newimgheight = (newimgwidth * imgheight)/imgwidth
+ else:
+ raise ValueError("fitwidth and fitheight cannot both be None")
+ return newimgwidth, newimgheight
+ elif fit == FitMode.shrink:
+ if fitwidth is not None and fitheight is not None:
+ if imgwidth <= fitwidth and imgheight <= fitheight:
+ return imgwidth, imgheight
+ elif fitwidth is None and fitheight is not None:
+ if imgheight <= fitheight:
+ return imgwidth, imgheight
+ elif fitheight is None and fitwidth is not None:
+ if imgwidth <= fitwidth:
+ return imgwidth, imgheight
+ else:
+ raise ValueError("fitwidth and fitheight cannot both be None")
+ return default()
+ elif fit == FitMode.enlarge:
+ if fitwidth is not None and fitheight is not None:
+ if imgwidth > fitwidth or imgheight > fitheight:
+ return imgwidth, imgheight
+ elif fitwidth is None and fitheight is not None:
+ if imgheight > fitheight:
+ return imgwidth, imgheight
+ elif fitheight is None and fitwidth is not None:
+ if imgwidth > fitwidth:
+ return imgwidth, imgheight
+ else:
+ raise ValueError("fitwidth and fitheight cannot both be None")
+ return default()
+ else:
+ raise NotImplementedError
+ # if no layout arguments are given, then the image size is equal to the
+ # page size and will be drawn with the default dpi
+ if pagesize is None and imgsize is None and border is None:
+ return default_layout_fun
+ if pagesize is None and imgsize is None and border is not None:
+ def layout_fun(imgwidthpx, imgheightpx, ndpi):
+ imgwidthpdf = px_to_pt(imgwidthpx, ndpi[0])
+ imgheightpdf = px_to_pt(imgheightpx, ndpi[1])
+ pagewidth = imgwidthpdf+2*border[1]
+ pageheight = imgheightpdf+2*border[0]
+ return pagewidth, pageheight, imgwidthpdf, imgheightpdf
+ return layout_fun
+ if border is None:
+ border = (0, 0)
+ # if the pagesize is given but the imagesize is not, then the imagesize
+ # will be calculated from the pagesize, taking into account the border
+ # and the fitting
+ if pagesize is not None and imgsize is None:
+ def layout_fun(imgwidthpx, imgheightpx, ndpi):
+ if pagesize[0] is not None and pagesize[1] is not None and \
+ auto_orient and \
+ ((imgwidthpx > imgheightpx and
+ pagesize[0] < pagesize[1]) or
+ (imgwidthpx < imgheightpx and pagesize[0] > pagesize[1])):
+ pagewidth, pageheight = pagesize[1], pagesize[0]
+ newborder = border[1], border[0]
+ else:
+ pagewidth, pageheight = pagesize[0], pagesize[1]
+ newborder = border
+ if pagewidth is not None:
+ fitwidth = pagewidth-2*newborder[1]
+ else:
+ fitwidth = None
+ if pageheight is not None:
+ fitheight = pageheight-2*newborder[0]
+ else:
+ fitheight = None
+ if fit in [FitMode.fill, FitMode.enlarge] and \
+ fitwidth is not None and fitwidth < 0 and \
+ fitheight is not None and fitheight < 0:
+                raise NegativeDimensionError(
+                    "at least one border dimension must be smaller than half "
+ "the respective page dimension")
+ elif fit not in [FitMode.fill, FitMode.enlarge] \
+ and ((fitwidth is not None and fitwidth < 0) or
+ (fitheight is not None and fitheight < 0)):
+ raise NegativeDimensionError(
+ "one border dimension is larger than half of the "
+ "respective page dimension")
+ imgwidthpdf, imgheightpdf = \
+ fitfun(fit, px_to_pt(imgwidthpx, ndpi[0]),
+ px_to_pt(imgheightpx, ndpi[1]),
+ fitwidth, fitheight)
+ if pagewidth is None:
+ pagewidth = imgwidthpdf+border[1]*2
+ if pageheight is None:
+ pageheight = imgheightpdf+border[0]*2
+ return pagewidth, pageheight, imgwidthpdf, imgheightpdf
+ return layout_fun
+
+ def scale_imgsize(s, px, dpi):
+ if s is None:
+ return None
+ mode, value = s
+ if mode == ImgSize.abs:
+ return value
+ if mode == ImgSize.perc:
+ return (px_to_pt(px, dpi)*value)/100
+ if mode == ImgSize.dpi:
+ return px_to_pt(px, value)
+ raise NotImplementedError
+ if pagesize is None and imgsize is not None:
+ def layout_fun(imgwidthpx, imgheightpx, ndpi):
+ imgwidthpdf, imgheightpdf = \
+ fitfun(fit, px_to_pt(imgwidthpx, ndpi[0]),
+ px_to_pt(imgheightpx, ndpi[1]),
+ scale_imgsize(imgsize[0], imgwidthpx, ndpi[0]),
+ scale_imgsize(imgsize[1], imgheightpx, ndpi[1]))
+ pagewidth = imgwidthpdf+2*border[1]
+ pageheight = imgheightpdf+2*border[0]
+ return pagewidth, pageheight, imgwidthpdf, imgheightpdf
+ return layout_fun
+ if pagesize is not None and imgsize is not None:
+ def layout_fun(imgwidthpx, imgheightpx, ndpi):
+ if pagesize[0] is not None and pagesize[1] is not None and \
+ auto_orient and \
+ ((imgwidthpx > imgheightpx and
+ pagesize[0] < pagesize[1]) or
+ (imgwidthpx < imgheightpx and pagesize[0] > pagesize[1])):
+ pagewidth, pageheight = pagesize[1], pagesize[0]
+ else:
+ pagewidth, pageheight = pagesize[0], pagesize[1]
+ imgwidthpdf, imgheightpdf = \
+ fitfun(fit, px_to_pt(imgwidthpx, ndpi[0]),
+ px_to_pt(imgheightpx, ndpi[1]),
+ scale_imgsize(imgsize[0], imgwidthpx, ndpi[0]),
+ scale_imgsize(imgsize[1], imgheightpx, ndpi[1]))
+ return pagewidth, pageheight, imgwidthpdf, imgheightpdf
+ return layout_fun
+ raise NotImplementedError
+
+
+def default_layout_fun(imgwidthpx, imgheightpx, ndpi):
+ imgwidthpdf = pagewidth = px_to_pt(imgwidthpx, ndpi[0])
+ imgheightpdf = pageheight = px_to_pt(imgheightpx, ndpi[1])
+ return pagewidth, pageheight, imgwidthpdf, imgheightpdf
+
+
+def get_fixed_dpi_layout_fun(fixed_dpi):
+ """Layout function that overrides whatever DPI is claimed in input images.
+
+ >>> layout_fun = get_fixed_dpi_layout_fun((300, 300))
+ >>> convert(image1, layout_fun=layout_fun, ... outputstream=...)
+ """
+ def fixed_dpi_layout_fun(imgwidthpx, imgheightpx, ndpi):
+ return default_layout_fun(imgwidthpx, imgheightpx, fixed_dpi)
+ return fixed_dpi_layout_fun
+
+
+# given one or more input images, depending on outputstream, either return a
+# string containing the whole PDF if outputstream is None or write the PDF
+# data to the given file-like object and return None
+#
+# Input images can be given as file like objects (they must implement read()),
+# as a binary string representing the image content or as filenames to the
+# images.
+def convert(*images, title=None,
+ author=None, creator=None, producer=None, creationdate=None,
+ moddate=None, subject=None, keywords=None, colorspace=None,
+ nodate=False, layout_fun=default_layout_fun, viewer_panes=None,
+ viewer_initial_page=None, viewer_magnification=None,
+ viewer_page_layout=None, viewer_fit_window=False,
+ viewer_center_window=False, viewer_fullscreen=False,
+ with_pdfrw=True, outputstream=None, first_frame_only=False):
+
+ pdf = pdfdoc("1.3", title, author, creator, producer, creationdate,
+ moddate, subject, keywords, nodate, viewer_panes,
+ viewer_initial_page, viewer_magnification, viewer_page_layout,
+ viewer_fit_window, viewer_center_window, viewer_fullscreen,
+ with_pdfrw)
+
+ for img in images:
+ # img is allowed to be a path, a binary string representing image data
+ # or a file-like object (really anything that implements read())
+ try:
+ rawdata = img.read()
+ except AttributeError:
+ # the thing doesn't have a read() function, so try if we can treat
+ # it as a file name
+ try:
+ with open(img, "rb") as f:
+ rawdata = f.read()
+            except Exception:
+ # whatever the exception is (string could contain NUL
+ # characters or the path could just not exist) it's not a file
+ # name so we now try treating it as raw image content
+ rawdata = img
+
+ for color, ndpi, imgformat, imgdata, imgwidthpx, imgheightpx \
+ in read_images(rawdata, colorspace, first_frame_only):
+ pagewidth, pageheight, imgwidthpdf, imgheightpdf = \
+ layout_fun(imgwidthpx, imgheightpx, ndpi)
+ if pagewidth < 3.00 or pageheight < 3.00:
+ logging.warning("pdf width or height is below 3.00 - too "
+ "small for some viewers!")
+ elif pagewidth > 14400.0 or pageheight > 14400.0:
+ raise PdfTooLargeError(
+ "pdf width or height must not exceed 200 inches.")
+ # the image is always centered on the page
+ imgxpdf = (pagewidth - imgwidthpdf)/2.0
+ imgypdf = (pageheight - imgheightpdf)/2.0
+ pdf.add_imagepage(color, imgwidthpx, imgheightpx, imgformat,
+ imgdata, imgwidthpdf, imgheightpdf, imgxpdf,
+ imgypdf, pagewidth, pageheight)
+
+ if outputstream:
+ pdf.tostream(outputstream)
+ return
+
+ return pdf.tostring()
+
+
+def parse_num(num, name):
+ if num == '':
+ return None
+ unit = None
+ if num.endswith("pt"):
+ unit = Unit.pt
+ elif num.endswith("cm"):
+ unit = Unit.cm
+ elif num.endswith("mm"):
+ unit = Unit.mm
+ elif num.endswith("in"):
+ unit = Unit.inch
+ else:
+ try:
+ num = float(num)
+ except ValueError:
+ msg = "%s is not a floating point number and doesn't have a " \
+ "valid unit: %s" % (name, num)
+ raise argparse.ArgumentTypeError(msg)
+ if unit is None:
+ unit = Unit.pt
+ else:
+ num = num[:-2]
+ try:
+ num = float(num)
+ except ValueError:
+ msg = "%s is not a floating point number: %s" % (name, num)
+ raise argparse.ArgumentTypeError(msg)
+ if unit == Unit.cm:
+ num = cm_to_pt(num)
+ elif unit == Unit.mm:
+ num = mm_to_pt(num)
+ elif unit == Unit.inch:
+ num = in_to_pt(num)
+ return num
+
+
+def parse_imgsize_num(num, name):
+ if num == '':
+ return None
+ unit = None
+ if num.endswith("pt"):
+ unit = ImgUnit.pt
+ elif num.endswith("cm"):
+ unit = ImgUnit.cm
+ elif num.endswith("mm"):
+ unit = ImgUnit.mm
+ elif num.endswith("in"):
+ unit = ImgUnit.inch
+ elif num.endswith("dpi"):
+ unit = ImgUnit.dpi
+ elif num.endswith("%"):
+ unit = ImgUnit.perc
+ else:
+ try:
+ num = float(num)
+ except ValueError:
+ msg = "%s is not a floating point number and doesn't have a " \
+ "valid unit: %s" % (name, num)
+ raise argparse.ArgumentTypeError(msg)
+ if unit is None:
+ unit = ImgUnit.pt
+ else:
+ # strip off unit from string
+ if unit == ImgUnit.dpi:
+ num = num[:-3]
+ elif unit == ImgUnit.perc:
+ num = num[:-1]
+ else:
+ num = num[:-2]
+ try:
+ num = float(num)
+ except ValueError:
+ msg = "%s is not a floating point number: %s" % (name, num)
+ raise argparse.ArgumentTypeError(msg)
+ if unit == ImgUnit.cm:
+ num = (ImgSize.abs, cm_to_pt(num))
+ elif unit == ImgUnit.mm:
+ num = (ImgSize.abs, mm_to_pt(num))
+ elif unit == ImgUnit.inch:
+ num = (ImgSize.abs, in_to_pt(num))
+ elif unit == ImgUnit.pt:
+ num = (ImgSize.abs, num)
+ elif unit == ImgUnit.dpi:
+ num = (ImgSize.dpi, num)
+ elif unit == ImgUnit.perc:
+ num = (ImgSize.perc, num)
+ return num
+
+
+def parse_pagesize_rectarg(string):
+ transposed = string.endswith("^T")
+ if transposed:
+ string = string[:-2]
+ if papersizes.get(string.lower()):
+ string = papersizes[string.lower()]
+ if 'x' not in string:
+ # if there is no separating "x" in the string, then the string is
+ # interpreted as the width
+ w = parse_num(string, "width")
+ h = None
+ else:
+ w, h = string.split('x', 1)
+ w = parse_num(w, "width")
+ h = parse_num(h, "height")
+ if transposed:
+ w, h = h, w
+ if w is None and h is None:
+ raise argparse.ArgumentTypeError("at least one dimension must be "
+ "specified")
+ return w, h
+
+
+def parse_imgsize_rectarg(string):
+ transposed = string.endswith("^T")
+ if transposed:
+ string = string[:-2]
+ if papersizes.get(string.lower()):
+ string = papersizes[string.lower()]
+ if 'x' not in string:
+ # if there is no separating "x" in the string, then the string is
+ # interpreted as the width
+ w = parse_imgsize_num(string, "width")
+ h = None
+ else:
+ w, h = string.split('x', 1)
+ w = parse_imgsize_num(w, "width")
+ h = parse_imgsize_num(h, "height")
+ if transposed:
+ w, h = h, w
+ if w is None and h is None:
+ raise argparse.ArgumentTypeError("at least one dimension must be "
+ "specified")
+ return w, h
+
+
+def parse_colorspacearg(string):
+ for c in Colorspace:
+ if c.name == string:
+ return c
+ allowed = ", ".join([c.name for c in Colorspace])
+ raise argparse.ArgumentTypeError("Unsupported colorspace: %s. Must be one "
+ "of: %s." % (string, allowed))
+
+
+def parse_borderarg(string):
+ if ':' in string:
+ h, v = string.split(':', 1)
+ if h == '':
+ raise argparse.ArgumentTypeError("missing value before colon")
+ if v == '':
+ raise argparse.ArgumentTypeError("missing value after colon")
+ else:
+ if string == '':
+ raise argparse.ArgumentTypeError("border option cannot be empty")
+ h, v = string, string
+    h, v = parse_num(h, "top/bottom border"), parse_num(v, "left/right border")
+ if h is None and v is None:
+ raise argparse.ArgumentTypeError("missing value")
+ return h, v
+
+
+def input_images(path):
+ if path == '-':
+ # we slurp in all data from stdin because we need to seek in it later
+ result = sys.stdin.buffer.read()
+ if len(result) == 0:
+ raise argparse.ArgumentTypeError("\"%s\" is empty" % path)
+ else:
+ try:
+ if os.path.getsize(path) == 0:
+ raise argparse.ArgumentTypeError("\"%s\" is empty" % path)
+ # test-read a byte from it so that we can abort early in case
+ # we cannot read data from the file
+ with open(path, "rb") as im:
+ im.read(1)
+ except IsADirectoryError:
+ raise argparse.ArgumentTypeError(
+ "\"%s\" is a directory" % path)
+ except PermissionError:
+ raise argparse.ArgumentTypeError(
+ "\"%s\" permission denied" % path)
+ except FileNotFoundError:
+ raise argparse.ArgumentTypeError(
+ "\"%s\" does not exist" % path)
+ result = path
+ return result
+
+
+def parse_fitarg(string):
+ for m in FitMode:
+ if m.name == string.lower():
+ return m
+ raise argparse.ArgumentTypeError("unknown fit mode: %s" % string)
+
+
+def parse_panes(string):
+ for m in PageMode:
+ if m.name == string.lower():
+ return m
+ allowed = ", ".join([m.name for m in PageMode])
+ raise argparse.ArgumentTypeError("Unsupported page mode: %s. Must be one "
+ "of: %s." % (string, allowed))
+
+
+def parse_magnification(string):
+ for m in Magnification:
+ if m.name == string.lower():
+ return m
+ try:
+ return float(string)
+ except ValueError:
+ pass
+ allowed = ", ".join([m.name for m in Magnification])
+ raise argparse.ArgumentTypeError("Unsupported magnification: %s. Must be "
+ "a floating point number or one of: %s." %
+ (string, allowed))
+
+
+def parse_layout(string):
+ for l in PageLayout:
+ if l.name == string.lower():
+ return l
+ allowed = ", ".join([l.name for l in PageLayout])
+ raise argparse.ArgumentTypeError("Unsupported page layout: %s. Must be "
+ "one of: %s." % (string, allowed))
+
+
+def valid_date(string):
+ # first try parsing in ISO8601 format
+ try:
+ return datetime.strptime(string, "%Y-%m-%d")
+ except ValueError:
+ pass
+ try:
+ return datetime.strptime(string, "%Y-%m-%dT%H:%M")
+ except ValueError:
+ pass
+ try:
+ return datetime.strptime(string, "%Y-%m-%dT%H:%M:%S")
+ except ValueError:
+ pass
+ # then try dateutil
+ try:
+ from dateutil import parser
+ except ImportError:
+ pass
+ else:
+ try:
+ return parser.parse(string)
+        except (TypeError, ValueError):
+ pass
+ # as a last resort, try the local date utility
+ try:
+ import subprocess
+ except ImportError:
+ pass
+ else:
+ try:
+ utime = subprocess.check_output(["date", "--date", string, "+%s"])
+        except (subprocess.CalledProcessError, OSError):
+ pass
+ else:
+ return datetime.utcfromtimestamp(int(utime))
+ raise argparse.ArgumentTypeError("cannot parse date: %s" % string)
+
+
+def main():
+ rendered_papersizes = ""
+ for k, v in sorted(papersizes.items()):
+ rendered_papersizes += " %-8s %s\n" % (papernames[k], v)
+
+ parser = argparse.ArgumentParser(
+ formatter_class=argparse.RawDescriptionHelpFormatter,
+ description='''\
+Losslessly convert raster images to PDF without re-encoding JPEG and JPEG2000
+images. This leads to a lossless conversion of JPEG and JPEG2000 images with
+the only added file size coming from the PDF container itself.
+
+Other raster graphics formats are losslessly stored in a zip/flate encoding of
+their RGB representation. This might increase file size and does not store
+transparency. There is nothing that can be done about that until the PDF format
+allows embedding other image formats like PNG. Thus, img2pdf is primarily
+useful to convert JPEG and JPEG2000 images to PDF.
+
+The output is sent to standard output so that it can be redirected into a file
+or to another program as part of a shell pipe. To directly write the output
+into a file, use the -o or --output option.
+''',
+ epilog='''\
+Colorspace
+
+ Currently, the colorspace must be forced for JPEG 2000 images that are not in
+ the RGB colorspace. Available colorspace options are based on Python Imaging
+ Library (PIL) short handles.
+
+    RGB      RGB color
+    L        Grayscale
+    1        Black and white (internally converted to grayscale)
+    CMYK     CMYK color
+    CMYK;I   CMYK color with inversion (for CMYK JPEG files from Adobe)
+
+Paper sizes
+
+  You can specify the shorthand paper size names shown in the first column of
+  the table below as arguments to the --pagesize and --imgsize options. The
+  width and height they map to are shown in the second column. Giving the
+  value in the second column has the same effect as giving the shorthand in
+  the first column. Appending ^T (a caret/circumflex followed by the letter
+  T) turns the paper size from portrait into landscape. The postfix thus
+  symbolizes the transpose. The values are case insensitive.
+
+%s
+
+Fit options
+
+ The img2pdf options for the --fit argument are shown in the first column in
+ the table below. The function of these options can be mapped to the geometry
+ operators of imagemagick. For users who are familiar with imagemagick, the
+ corresponding operator is shown in the second column. The third column shows
+ whether or not the aspect ratio is preserved for that option (same as in
+ imagemagick). Just like imagemagick, img2pdf tries hard to preserve the
+ aspect ratio, so if the --fit argument is not given, then the default is
+ "into" which corresponds to the absence of any operator in imagemagick.
+ The value of the --fit option is case insensitive.
+
+  into     |   | Y | The default. Width and height values specify maximum
+           |   |   | values.
+  ---------+---+---+----------------------------------------------------------
+  fill     | ^ | Y | Width and height values specify the minimum values.
+  ---------+---+---+----------------------------------------------------------
+  exact    | ! | N | Width and height emphatically given.
+  ---------+---+---+----------------------------------------------------------
+  shrink   | > | Y | Shrinks an image with dimensions larger than the given
+           |   |   | ones (and otherwise behaves like "into").
+  ---------+---+---+----------------------------------------------------------
+  enlarge  | < | Y | Enlarges an image with dimensions smaller than the given
+           |   |   | ones (and otherwise behaves like "into").
+
+Examples
+
+ Lines starting with a dollar sign denote commands you can enter into your
+ terminal. The dollar sign signifies your command prompt. It is not part of
+ the command you type.
+
+ Convert two scans in JPEG format to a PDF document.
+
+ $ img2pdf --output out.pdf page1.jpg page2.jpg
+
+ Convert a directory of JPEG images into a PDF with printable A4 pages in
+ landscape mode. On each page, the photo takes the maximum amount of space
+ while preserving its aspect ratio and a print border of 2 cm on the top and
+ bottom and 2.5 cm on the left and right hand side.
+
+ $ img2pdf --output out.pdf --pagesize A4^T --border 2cm:2.5cm *.jpg
+
+ On each A4 page, fit images into a 10 cm times 15 cm rectangle but keep the
+ original image size if the image is smaller than that.
+
+ $ img2pdf --output out.pdf -S A4 --imgsize 10cmx15cm --fit shrink *.jpg
+
+ Prepare a directory of photos to be printed borderless on photo paper with a
+ 3:2 aspect ratio and rotate each page so that its orientation is the same as
+ the input image.
+
+ $ img2pdf --output out.pdf --pagesize 15cmx10cm --auto-orient *.jpg
+
+ Encode a grayscale JPEG2000 image. The colorspace has to be forced as img2pdf
+ cannot read it from the JPEG2000 file automatically.
+
+ $ img2pdf --output out.pdf --colorspace L input.jp2
+
+Argument parsing
+
+ Argument long options can be abbreviated to a prefix if the abbreviation is
+  unambiguous. That is, the prefix must match a unique option.
+
+ Beware of your shell interpreting argument values as special characters (like
+ the semicolon in the CMYK;I colorspace option). If in doubt, put the argument
+ values in single quotes.
+
+ If you want an argument value to start with one or more minus characters, you
+ must use the long option name and join them with an equal sign like so:
+
+ $ img2pdf --author=--test--
+
+ If your input file name starts with one or more minus characters, either
+ separate the input files from the other arguments by two minus signs:
+
+ $ img2pdf -- --my-file-starts-with-two-minuses.jpg
+
+ Or be more explicit about its relative path by prepending a ./:
+
+ $ img2pdf ./--my-file-starts-with-two-minuses.jpg
+
+ The order of non-positional arguments (all arguments other than the input
+ images) does not matter.
+''' % rendered_papersizes)
+
+ parser.add_argument(
+ 'images', metavar='infile', type=input_images, nargs='*',
+ help='Specifies the input file(s) in any format that can be read by '
+ 'the Python Imaging Library (PIL). If no input images are given, then '
+ 'a single image is read from standard input. The special filename "-" '
+ 'can be used once to read an image from standard input. To read a '
+ 'file in the current directory with the filename "-", pass it to '
+ 'img2pdf by explicitly stating its relative path like "./-".')
+ parser.add_argument(
+ '-v', '--verbose', action="store_true",
+ help='Makes the program operate in verbose mode, printing messages on '
+ 'standard error.')
+ parser.add_argument(
+ '-V', '--version', action='version', version='%(prog)s '+__version__,
+ help="Prints version information and exits.")
+
+ outargs = parser.add_argument_group(
+ title='General output arguments',
+ description='')
+
+ outargs.add_argument(
+ '-o', '--output', metavar='out', type=argparse.FileType('wb'),
+ default=sys.stdout.buffer,
+ help='Makes the program output to a file instead of standard output.')
+ outargs.add_argument(
+ '-C', '--colorspace', metavar='colorspace', type=parse_colorspacearg,
+ help='''
+Forces the PIL colorspace. See the epilogue for a list of possible values.
+Usually the PDF colorspace would be derived from the color space of the input
+image. This option overrides the automatically detected colorspace from the
+input image and thus forces a certain colorspace in the output PDF /ColorSpace
+property. This is useful for JPEG 2000 images with a different colorspace than
+RGB.''')
+
+ outargs.add_argument(
+ '-D', '--nodate', action="store_true",
+ help='Suppresses timestamps in the output and thus makes the output '
+ 'deterministic between individual runs. You can also manually '
+ 'set a date using the --moddate and --creationdate options.')
+
+ outargs.add_argument(
+ "--without-pdfrw", action="store_true",
+ help="By default, img2pdf uses the pdfrw library to create the output "
+ "PDF if pdfrw is available. If you want to use the internal PDF "
+ "generator of img2pdf even if pdfrw is present, then pass this "
+ "option. This can be useful if you want to have unicode metadata "
+ "values which pdfrw does not yet support (See "
+ "https://github.com/pmaupin/pdfrw/issues/39) or if you want the "
+ "PDF code to be more human readable.")
+
+ outargs.add_argument(
+ "--first-frame-only", action="store_true",
+ help="By default, img2pdf will convert multi-frame images like "
+ "multi-page TIFF or animated GIF images to one page per frame. "
+ "This option will only let the first frame of every multi-frame "
+ "input image be converted into a page in the resulting PDF."
+ )
+
+ sizeargs = parser.add_argument_group(
+ title='Image and page size and layout arguments',
+ description='''\
+
+Every input image will be placed on its own page. The image size is controlled
+by the dpi value of the input image or, if unset or missing, the default dpi of
+%.2f. By default, each page will have the same size as the image it shows.
+Thus, there will be no visible border between the image and the page border by
+default. If image size and page size are made different from each other by the
+options in this section, the image will always be centered in both dimensions.
+
+The image size and page size can be explicitly set using the --imgsize and
+--pagesize options, respectively. If either dimension of the image size is
+specified but the same dimension of the page size is not, then the latter will
+be derived from the former using an optional minimal distance between the image
+and the page border (given by the --border option) and/or a certain fitting
+strategy (given by the --fit option). The converse happens if a dimension of
+the page size is set but the same dimension of the image size is not.
+
+Any length value in the options below is represented by the meta variable L,
+which is a floating point value with an optional unit appended (without a
+space between them). The default unit is pt (1/72 inch, the PDF unit) and
+other allowed units are cm (centimeter), mm (millimeter), and in (inch).
+
+Any size argument of the format LxL in the options below specifies the width
+and height of a rectangle where the first L represents the width and the second
+L represents the height with an optional unit following each value as described
+above. Either width or height may be omitted. If the height is omitted, the
+separating x can be omitted as well. Omitting the width requires prefixing the
+height with the separating x. The missing dimension will be chosen so as not
+to change the image aspect ratio. Instead of giving the width and height
+explicitly, you may also specify some (case-insensitive) common page sizes such
+as letter and A4. See the epilogue at the bottom for a complete list of the
+valid sizes.
+
+The --fit option scales to fit the image into a rectangle that is either
+derived from the --imgsize option or otherwise from the --pagesize option.
+If the --border option is given in addition to the --imgsize option while the
+--pagesize option is not given, then the page size will be calculated from the
+image size, respecting the border setting. If the --border option is given in
+addition to the --pagesize option while the --imgsize option is not given, then
+the image size will be calculated from the page size, respecting the border
+setting. If the --border option is given while both the --pagesize and
+--imgsize options are passed, then the --border option will be ignored.
+
+''' % default_dpi)
+
+ sizeargs.add_argument(
+ '-S', '--pagesize', metavar='LxL', type=parse_pagesize_rectarg,
+ help='''
+Sets the size of the PDF pages. The short-option is the upper case S because
+it is a mnemonic for being bigger than the image size.''')
+
+ sizeargs.add_argument(
+ '-s', '--imgsize', metavar='LxL', type=parse_imgsize_rectarg,
+ help='''
+Sets the size of the images on the PDF pages. In addition, the unit dpi is
+allowed which will set the image size as a value of dots per inch. Instead of
+a unit, width and height values may also have a percentage sign appended,
+indicating a resize of the image by that percentage. The short-option is the
+lower case s because it is a mnemonic for being smaller than the page size.
+''')
+ sizeargs.add_argument(
+ '-b', '--border', metavar='L[:L]', type=parse_borderarg,
+ help='''
+Specifies the minimal distance between the image border and the PDF page
+border. This value is overridden by explicit values set by --pagesize or
+--imgsize. The value will be used when calculating page dimensions from the
+image dimensions or the other way round. One or two length values can be given
+as an argument, separated by a colon. One value specifies the minimal border on
+all four sides. Two values specify the minimal border on the top/bottom and
+left/right, respectively. It is not possible to specify asymmetric borders
+because images will always be centered on the page.
+''')
+ sizeargs.add_argument(
+ '-f', '--fit', metavar='FIT', type=parse_fitarg,
+ default=FitMode.into, help='''
+
+If --imgsize is given, fits the image using these dimensions. Otherwise, fit
+the image into the dimensions given by --pagesize. FIT is one of into, fill,
+exact, shrink and enlarge. The default value is "into". See the epilogue at the
+bottom for a description of the FIT options.
+
+''')
+ sizeargs.add_argument(
+ '-a', '--auto-orient', action="store_true",
+ help='''
+If both dimensions of the page are given via --pagesize, conditionally swaps
+these dimensions such that the page orientation is the same as the orientation
+of the input image. If the orientation of a page gets flipped, then so do the
+values set via the --border option.
+''')
+
+ metaargs = parser.add_argument_group(title='Arguments setting metadata',
+ description='')
+ metaargs.add_argument(
+ '--title', metavar='title', type=str,
+ help='Sets the title metadata value')
+ metaargs.add_argument(
+ '--author', metavar='author', type=str,
+ help='Sets the author metadata value')
+ metaargs.add_argument(
+ '--creator', metavar='creator', type=str,
+ help='Sets the creator metadata value')
+ metaargs.add_argument(
+ '--producer', metavar='producer', type=str,
+ default="img2pdf " + __version__,
+        help='Sets the producer metadata value (default is: img2pdf '
+             'followed by its version number)')
+ metaargs.add_argument(
+ '--creationdate', metavar='creationdate', type=valid_date,
+ help='Sets the UTC creation date metadata value in YYYY-MM-DD or '
+ 'YYYY-MM-DDTHH:MM or YYYY-MM-DDTHH:MM:SS format or any format '
+ 'understood by python dateutil module or any format understood '
+ 'by `date --date`')
+ metaargs.add_argument(
+ '--moddate', metavar='moddate', type=valid_date,
+ help='Sets the UTC modification date metadata value in YYYY-MM-DD '
+ 'or YYYY-MM-DDTHH:MM or YYYY-MM-DDTHH:MM:SS format or any format '
+ 'understood by python dateutil module or any format understood '
+ 'by `date --date`')
+ metaargs.add_argument(
+ '--subject', metavar='subject', type=str,
+ help='Sets the subject metadata value')
+ metaargs.add_argument(
+ '--keywords', metavar='kw', type=str, nargs='+',
+ help='Sets the keywords metadata value (can be given multiple times)')
+
+    viewerargs = parser.add_argument_group(
+        title='PDF viewer arguments',
+        description='PDF files can specify how they are meant to be '
+                    'presented to the user by a PDF viewer')
+
+    viewerargs.add_argument(
+        '--viewer-panes', metavar="PANES", type=parse_panes,
+        help='Instruct the PDF viewer which side panes to show. Valid values '
+             'are "outlines" and "thumbs". It is not possible to specify both '
+             'at the same time.')
+    viewerargs.add_argument(
+        '--viewer-initial-page', metavar="NUM", type=int,
+        help='Instruct the PDF viewer to show the given page instead of the '
+             'first page. Page numbers start with 1.')
+    viewerargs.add_argument(
+        '--viewer-magnification', metavar="MAG", type=parse_magnification,
+        help='Instruct the PDF viewer to open the PDF with a certain zoom '
+             'level. Valid values are either a floating point number giving '
+             'the exact zoom level, "fit" (zoom to fit whole page), "fith" '
+             '(zoom to fit page width) or "fitbh" (zoom to fit visible page '
+             'width).')
+    viewerargs.add_argument(
+        '--viewer-page-layout', metavar="LAYOUT", type=parse_layout,
+        help='Instruct the PDF viewer how to arrange the pages on the screen. '
+             'Valid values are "single" (display single pages), "onecolumn" '
+             '(one continuous column), "twocolumnright" (two continuous '
+             'columns with odd numbered pages on the right) and "twocolumnleft" '
+             '(two continuous columns with odd numbered pages on the left)')
+    viewerargs.add_argument(
+        '--viewer-fit-window', action="store_true",
+        help='Instruct the PDF viewer to resize the window to fit the page '
+             'size')
+    viewerargs.add_argument(
+        '--viewer-center-window', action="store_true",
+        help='Instruct the PDF viewer to center the PDF viewer window')
+    viewerargs.add_argument(
+        '--viewer-fullscreen', action="store_true",
+        help='Instruct the PDF viewer to open the PDF in fullscreen mode')
+
+    args = parser.parse_args()
+
+    if args.verbose:
+        logging.basicConfig(level=logging.DEBUG)
+
+    layout_fun = get_layout_fun(args.pagesize, args.imgsize, args.border,
+                                args.fit, args.auto_orient)
+
+    # if no positional arguments were supplied, read a single image from
+    # standard input
+    if len(args.images) == 0:
+        logging.info("reading image from standard input")
+        try:
+            args.images = [sys.stdin.buffer.read()]
+        except KeyboardInterrupt:
+            exit(0)
+
+    # with the number of pages being equal to the number of images, the
+    # value passed to --viewer-initial-page must be between 1 and that number
+    if args.viewer_initial_page is not None:
+        if args.viewer_initial_page < 1:
+            parser.print_usage(file=sys.stderr)
+            logging.error("%s: error: argument --viewer-initial-page: must be "
+                          "greater than zero" % parser.prog)
+            exit(2)
+        if args.viewer_initial_page > len(args.images):
+            parser.print_usage(file=sys.stderr)
+            logging.error("%s: error: argument --viewer-initial-page: must be "
+                          "less than or equal to the total number of pages" %
+                          parser.prog)
+            exit(2)
+
+    try:
+        convert(
+            *args.images, title=args.title, author=args.author,
+            creator=args.creator, producer=args.producer,
+            creationdate=args.creationdate, moddate=args.moddate,
+            subject=args.subject, keywords=args.keywords,
+            colorspace=args.colorspace, nodate=args.nodate,
+            layout_fun=layout_fun, viewer_panes=args.viewer_panes,
+            viewer_initial_page=args.viewer_initial_page,
+            viewer_magnification=args.viewer_magnification,
+            viewer_page_layout=args.viewer_page_layout,
+            viewer_fit_window=args.viewer_fit_window,
+            viewer_center_window=args.viewer_center_window,
+            viewer_fullscreen=args.viewer_fullscreen, with_pdfrw=not
+            args.without_pdfrw, outputstream=args.output,
+            first_frame_only=args.first_frame_only)
+    except Exception as e:
+        logging.error("error: " + str(e))
+        if logging.getLogger().isEnabledFor(logging.DEBUG):
+            import traceback
+            traceback.print_exc(file=sys.stderr)
+        exit(1)
+
+if __name__ == '__main__':
+    main()
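Review note on the `--viewer-initial-page` check at the end of `main()`: the range test depends only on the page number and the number of input images, so it can be tried in isolation. The following is a minimal sketch under that assumption, not part of the patch. The helper name `validate_initial_page` and the parser with `prog="demo"` are hypothetical; only the option name, its `type=int`, and the two error messages are taken from the diff.

```python
import argparse


def validate_initial_page(page, num_images):
    """Return an error message, or None if the page number is acceptable."""
    if page is None:
        return None
    if page < 1:
        return "must be greater than zero"
    if page > num_images:
        return "must be less than or equal to the total number of pages"
    return None


# Hypothetical stand-in parser with only the two arguments needed here.
parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("images", nargs="*")
parser.add_argument("--viewer-initial-page", metavar="NUM", type=int)

# Two images but page 3 requested: the second check should trigger.
args = parser.parse_args(["a.jpg", "b.jpg", "--viewer-initial-page", "3"])
error = validate_initial_page(args.viewer_initial_page, len(args.images))
```

Keeping the check in a function that returns a message, rather than calling `exit(2)` directly, is only a convenience for testing; the patch itself prints the usage text and exits.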
diff --git a/src/jp2.py b/src/jp2.py
new file mode 100644
index 0000000..7f61312
--- /dev/null
+++ b/src/jp2.py
@@ -0,0 +1,124 @@
+#!/usr/bin/env python
+#
+# Copyright (C) 2013 Johannes 'josch' Schauer <j.schauer at email.de>
+#
+# this module is heavily based upon jpylyzer which is
+# KB / National Library of the Netherlands, Open Planets Foundation
+# and released under the same license conditions
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+import struct
+
+
+def getBox(data, byteStart, noBytes):
+    boxLengthValue = struct.unpack(">I", data[byteStart:byteStart+4])[0]
+    boxType = data[byteStart+4:byteStart+8]
+    contentsStartOffset = 8
+    if boxLengthValue == 1:
+        boxLengthValue = struct.unpack(">Q", data[byteStart+8:byteStart+16])[0]
+        contentsStartOffset = 16
+    if boxLengthValue == 0:
+        boxLengthValue = noBytes-byteStart
+    byteEnd = byteStart + boxLengthValue
+    boxContents = data[byteStart+contentsStartOffset:byteEnd]
+    return (boxLengthValue, boxType, byteEnd, boxContents)
+
+
+def parse_ihdr(data):
+    height = struct.unpack(">I", data[0:4])[0]
+    width = struct.unpack(">I", data[4:8])[0]
+    return width, height
+
+
+def parse_colr(data):
+    meth = struct.unpack(">B", data[0:1])[0]
+    if meth != 1:
+        raise Exception("only enumerated color method supported")
+    enumCS = struct.unpack(">I", data[3:])[0]
+    if enumCS == 16:
+        return "RGB"
+    elif enumCS == 17:
+        return "L"
+    else:
+        raise Exception("only sRGB and greyscale color space is supported, "
+                        "got %d" % enumCS)
+
+
+def parse_resc(data):
+    hnum, hden, vnum, vden, hexp, vexp = struct.unpack(">HHHHBB", data)
+    hdpi = ((hnum/hden) * (10**hexp) * 100)/2.54
+    vdpi = ((vnum/vden) * (10**vexp) * 100)/2.54
+    return hdpi, vdpi
+
+
+def parse_res(data):
+    hdpi, vdpi = None, None
+    noBytes = len(data)
+    byteStart = 0
+    boxLengthValue = 1  # dummy value for while loop condition
+    while byteStart < noBytes and boxLengthValue != 0:
+        boxLengthValue, boxType, byteStart, boxContents = \
+            getBox(data, byteStart, noBytes)  # byteStart skips past this box
+        if boxType == b'resc':
+            hdpi, vdpi = parse_resc(boxContents)
+            break
+    return hdpi, vdpi
+
+
+def parse_jp2h(data):
+    width, height, colorspace, hdpi, vdpi = None, None, None, None, None
+    noBytes = len(data)
+    byteStart = 0
+    boxLengthValue = 1  # dummy value for while loop condition
+    while byteStart < noBytes and boxLengthValue != 0:
+        boxLengthValue, boxType, byteEnd, boxContents = \
+            getBox(data, byteStart, noBytes)
+        if boxType == b'ihdr':
+            width, height = parse_ihdr(boxContents)
+        elif boxType == b'colr':
+            colorspace = parse_colr(boxContents)
+        elif boxType == b'res ':
+            hdpi, vdpi = parse_res(boxContents)
+        byteStart = byteEnd
+    return (width, height, colorspace, hdpi, vdpi)
+
+
+def parsejp2(data):
+    noBytes = len(data)
+    byteStart = 0
+    boxLengthValue = 1  # dummy value for while loop condition
+    width, height, colorspace, hdpi, vdpi = None, None, None, None, None
+    while byteStart < noBytes and boxLengthValue != 0:
+        boxLengthValue, boxType, byteEnd, boxContents = \
+            getBox(data, byteStart, noBytes)
+        if boxType == b'jp2h':
+            width, height, colorspace, hdpi, vdpi = parse_jp2h(boxContents)
+            break
+        byteStart = byteEnd
+    if not width:
+        raise Exception("no width in jp2 header")
+    if not height:
+        raise Exception("no height in jp2 header")
+    if not colorspace:
+        raise Exception("no colorspace in jp2 header")
+    # retrieving the dpi is optional so we do not error out if not present
+    return (width, height, colorspace, hdpi, vdpi)
+
+if __name__ == "__main__":
+    import sys
+    width, height, colorspace = parsejp2(open(sys.argv[1], "rb").read())[:3]
+    sys.stdout.write("width = %d\n" % width)
+    sys.stdout.write("height = %d\n" % height)
+    sys.stdout.write("colorspace = %s\n" % colorspace)
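Review note on `getBox` in jp2.py: each JP2 box starts with a 4-byte big-endian length and a 4-byte type, with a length of 1 signalling a 64-bit extended length and a length of 0 meaning "until the end of the data". The sketch below mirrors that logic in a standalone function and runs it on a synthetic `jp2h` box. The names `get_box` and `make_box` are hypothetical, and the `ihdr` payload is deliberately truncated to the two fields that `parse_ihdr` reads, so this is not a valid JP2 file.

```python
import struct


def get_box(data, start, total):
    """Standalone copy of the box-header logic from jp2.py's getBox."""
    length = struct.unpack(">I", data[start:start + 4])[0]
    box_type = data[start + 4:start + 8]
    offset = 8
    if length == 1:  # 64-bit extended length follows the type field
        length = struct.unpack(">Q", data[start + 8:start + 16])[0]
        offset = 16
    if length == 0:  # box runs to the end of the data
        length = total - start
    end = start + length
    return length, box_type, end, data[start + offset:end]


def make_box(box_type, payload):
    """Hypothetical helper: wrap a payload in a plain 8-byte box header."""
    return struct.pack(">I", 8 + len(payload)) + box_type + payload


# Synthetic header: ihdr stores height first, then width.
ihdr = make_box(b"ihdr", struct.pack(">II", 48, 115))
# colr: method 1 (enumerated), two ignored bytes, enumCS 16 (sRGB).
colr = make_box(b"colr", struct.pack(">BBBI", 1, 0, 0, 16))
jp2h = make_box(b"jp2h", ihdr + colr)

length, box_type, end, contents = get_box(jp2h, 0, len(jp2h))
_, first_type, first_end, first = get_box(contents, 0, len(contents))
_, second_type, _, second = get_box(contents, first_end, len(contents))

height, width = struct.unpack(">II", first[0:8])
enum_cs = struct.unpack(">I", second[3:])[0]
```

Passing the end offset of one box as the start of the next is what lets the loops in `parse_jp2h` and `parsejp2` walk sibling boxes.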
diff --git a/src/tests/__init__.py b/src/tests/__init__.py
new file mode 100644
index 0000000..b668054
--- /dev/null
+++ b/src/tests/__init__.py
@@ -0,0 +1,557 @@
+import unittest
+
+import os
+import img2pdf
+import zlib
+from PIL import Image
+
+HERE = os.path.dirname(__file__)
+
+# convert +set date:create +set date:modify -define png:exclude-chunk=time
+
+# we define some variables so that the table below can be narrower
+psl = (972, 504) # --pagesize landscape
+psp = (504, 972) # --pagesize portrait
+isl = (756, 324) # --imgsize landscape
+isp = (324, 756) # --imgsize portrait
+border = (162, 270) # --border
+# there is no need to have test cases with the same images with inverted
+# orientation (landscape/portrait) because --pagesize and --imgsize are
+# already inverted
+im1 = (864, 288) # imgpx #1 => 648x216
+im2 = (1152, 576) # imgpx #2 => 864x432
+# shortcuts for fit modes
+f_into = img2pdf.FitMode.into
+f_fill = img2pdf.FitMode.fill
+f_exact = img2pdf.FitMode.exact
+f_shrink = img2pdf.FitMode.shrink
+f_enlarge = img2pdf.FitMode.enlarge
+layout_test_cases = [
+    # psl=972x504, psp=504x972, isl=756x324, isp=324x756, border=162:270
+    # columns: --pagesize, --imgsize, --border, --fit, -a, pagepdf, imgpdf
+    # (first row of each case holds the im1 result, second row the im2 result)
+ (None, None, None, f_into, 0, (648, 216), (648, 216), # 000
+ (864, 432), (864, 432)),
+ (None, None, None, f_into, 1, (648, 216), (648, 216), # 001
+ (864, 432), (864, 432)),
+ (None, None, None, f_fill, 0, (648, 216), (648, 216), # 002
+ (864, 432), (864, 432)),
+ (None, None, None, f_fill, 1, (648, 216), (648, 216), # 003
+ (864, 432), (864, 432)),
+ (None, None, None, f_exact, 0, (648, 216), (648, 216), # 004
+ (864, 432), (864, 432)),
+ (None, None, None, f_exact, 1, (648, 216), (648, 216), # 005
+ (864, 432), (864, 432)),
+ (None, None, None, f_shrink, 0, (648, 216), (648, 216), # 006
+ (864, 432), (864, 432)),
+ (None, None, None, f_shrink, 1, (648, 216), (648, 216), # 007
+ (864, 432), (864, 432)),
+ (None, None, None, f_enlarge, 0, (648, 216), (648, 216), # 008
+ (864, 432), (864, 432)),
+ (None, None, None, f_enlarge, 1, (648, 216), (648, 216), # 009
+ (864, 432), (864, 432)),
+ (None, None, border, f_into, 0, (1188, 540), (648, 216), # 010
+ (1404, 756), (864, 432)),
+ (None, None, border, f_into, 1, (1188, 540), (648, 216), # 011
+ (1404, 756), (864, 432)),
+ (None, None, border, f_fill, 0, (1188, 540), (648, 216), # 012
+ (1404, 756), (864, 432)),
+ (None, None, border, f_fill, 1, (1188, 540), (648, 216), # 013
+ (1404, 756), (864, 432)),
+ (None, None, border, f_exact, 0, (1188, 540), (648, 216), # 014
+ (1404, 756), (864, 432)),
+ (None, None, border, f_exact, 1, (1188, 540), (648, 216), # 015
+ (1404, 756), (864, 432)),
+ (None, None, border, f_shrink, 0, (1188, 540), (648, 216), # 016
+ (1404, 756), (864, 432)),
+ (None, None, border, f_shrink, 1, (1188, 540), (648, 216), # 017
+ (1404, 756), (864, 432)),
+ (None, None, border, f_enlarge, 0, (1188, 540), (648, 216), # 018
+ (1404, 756), (864, 432)),
+ (None, None, border, f_enlarge, 1, (1188, 540), (648, 216), # 019
+ (1404, 756), (864, 432)),
+ (None, isp, None, f_into, 0, (324, 108), (324, 108), # 020
+ (324, 162), (324, 162)),
+ (None, isp, None, f_into, 1, (324, 108), (324, 108), # 021
+ (324, 162), (324, 162)),
+ (None, isp, None, f_fill, 0, (2268, 756), (2268, 756), # 022
+ (1512, 756), (1512, 756)),
+ (None, isp, None, f_fill, 1, (2268, 756), (2268, 756), # 023
+ (1512, 756), (1512, 756)),
+ (None, isp, None, f_exact, 0, (324, 756), (324, 756), # 024
+ (324, 756), (324, 756)),
+ (None, isp, None, f_exact, 1, (324, 756), (324, 756), # 025
+ (324, 756), (324, 756)),
+ (None, isp, None, f_shrink, 0, (324, 108), (324, 108), # 026
+ (324, 162), (324, 162)),
+ (None, isp, None, f_shrink, 1, (324, 108), (324, 108), # 027
+ (324, 162), (324, 162)),
+ (None, isp, None, f_enlarge, 0, (648, 216), (648, 216), # 028
+ (864, 432), (864, 432)),
+ (None, isp, None, f_enlarge, 1, (648, 216), (648, 216), # 029
+ (864, 432), (864, 432)),
+ (None, isp, border, f_into, 0, (864, 432), (324, 108), # 030
+ (864, 486), (324, 162)),
+ (None, isp, border, f_into, 1, (864, 432), (324, 108), # 031
+ (864, 486), (324, 162)),
+ (None, isp, border, f_fill, 0, (2808, 1080), (2268, 756), # 032
+ (2052, 1080), (1512, 756)),
+ (None, isp, border, f_fill, 1, (2808, 1080), (2268, 756), # 033
+ (2052, 1080), (1512, 756)),
+ (None, isp, border, f_exact, 0, (864, 1080), (324, 756), # 034
+ (864, 1080), (324, 756)),
+ (None, isp, border, f_exact, 1, (864, 1080), (324, 756), # 035
+ (864, 1080), (324, 756)),
+ (None, isp, border, f_shrink, 0, (864, 432), (324, 108), # 036
+ (864, 486), (324, 162)),
+ (None, isp, border, f_shrink, 1, (864, 432), (324, 108), # 037
+ (864, 486), (324, 162)),
+ (None, isp, border, f_enlarge, 0, (1188, 540), (648, 216), # 038
+ (1404, 756), (864, 432)),
+ (None, isp, border, f_enlarge, 1, (1188, 540), (648, 216), # 039
+ (1404, 756), (864, 432)),
+ (None, isl, None, f_into, 0, (756, 252), (756, 252), # 040
+ (648, 324), (648, 324)),
+ (None, isl, None, f_into, 1, (756, 252), (756, 252), # 041
+ (648, 324), (648, 324)),
+ (None, isl, None, f_fill, 0, (972, 324), (972, 324), # 042
+ (756, 378), (756, 378)),
+ (None, isl, None, f_fill, 1, (972, 324), (972, 324), # 043
+ (756, 378), (756, 378)),
+ (None, isl, None, f_exact, 0, (756, 324), (756, 324), # 044
+ (756, 324), (756, 324)),
+ (None, isl, None, f_exact, 1, (756, 324), (756, 324), # 045
+ (756, 324), (756, 324)),
+ (None, isl, None, f_shrink, 0, (648, 216), (648, 216), # 046
+ (648, 324), (648, 324)),
+ (None, isl, None, f_shrink, 1, (648, 216), (648, 216), # 047
+ (648, 324), (648, 324)),
+ (None, isl, None, f_enlarge, 0, (756, 252), (756, 252), # 048
+ (864, 432), (864, 432)),
+ (None, isl, None, f_enlarge, 1, (756, 252), (756, 252), # 049
+ (864, 432), (864, 432)),
+    # psl=972x504, psp=504x972, isl=756x324, isp=324x756, border=162:270
+    # columns: --pagesize, --imgsize, --border, --fit, -a, pagepdf, imgpdf
+    # (first row of each case holds the im1 result, second row the im2 result)
+ (None, isl, border, f_into, 0, (1296, 576), (756, 252), # 050
+ (1188, 648), (648, 324)),
+ (None, isl, border, f_into, 1, (1296, 576), (756, 252), # 051
+ (1188, 648), (648, 324)),
+ (None, isl, border, f_fill, 0, (1512, 648), (972, 324), # 052
+ (1296, 702), (756, 378)),
+ (None, isl, border, f_fill, 1, (1512, 648), (972, 324), # 053
+ (1296, 702), (756, 378)),
+ (None, isl, border, f_exact, 0, (1296, 648), (756, 324), # 054
+ (1296, 648), (756, 324)),
+ (None, isl, border, f_exact, 1, (1296, 648), (756, 324), # 055
+ (1296, 648), (756, 324)),
+ (None, isl, border, f_shrink, 0, (1188, 540), (648, 216), # 056
+ (1188, 648), (648, 324)),
+ (None, isl, border, f_shrink, 1, (1188, 540), (648, 216), # 057
+ (1188, 648), (648, 324)),
+ (None, isl, border, f_enlarge, 0, (1296, 576), (756, 252), # 058
+ (1404, 756), (864, 432)),
+ (None, isl, border, f_enlarge, 1, (1296, 576), (756, 252), # 059
+ (1404, 756), (864, 432)),
+ (psp, None, None, f_into, 0, (504, 972), (504, 168), # 060
+ (504, 972), (504, 252)),
+ (psp, None, None, f_into, 1, (972, 504), (972, 324), # 061
+ (972, 504), (972, 486)),
+ (psp, None, None, f_fill, 0, (504, 972), (2916, 972), # 062
+ (504, 972), (1944, 972)),
+ (psp, None, None, f_fill, 1, (972, 504), (1512, 504), # 063
+ (972, 504), (1008, 504)),
+ (psp, None, None, f_exact, 0, (504, 972), (504, 972), # 064
+ (504, 972), (504, 972)),
+ (psp, None, None, f_exact, 1, (972, 504), (972, 504), # 065
+ (972, 504), (972, 504)),
+ (psp, None, None, f_shrink, 0, (504, 972), (504, 168), # 066
+ (504, 972), (504, 252)),
+ (psp, None, None, f_shrink, 1, (972, 504), (648, 216), # 067
+ (972, 504), (864, 432)),
+ (psp, None, None, f_enlarge, 0, (504, 972), (648, 216), # 068
+ (504, 972), (864, 432)),
+ (psp, None, None, f_enlarge, 1, (972, 504), (972, 324), # 069
+ (972, 504), (972, 486)),
+ (psp, None, border, f_into, 0, None, None, None, None), # 070
+ (psp, None, border, f_into, 1, None, None, None, None), # 071
+ (psp, None, border, f_fill, 0, (504, 972), (1944, 648), # 072
+ (504, 972), (1296, 648)),
+ (psp, None, border, f_fill, 1, (972, 504), (648, 216), # 073
+ (972, 504), (648, 324)),
+ (psp, None, border, f_exact, 0, None, None, None, None), # 074
+ (psp, None, border, f_exact, 1, None, None, None, None), # 075
+ (psp, None, border, f_shrink, 0, None, None, None, None), # 076
+ (psp, None, border, f_shrink, 1, None, None, None, None), # 077
+ (psp, None, border, f_enlarge, 0, (504, 972), (648, 216), # 078
+ (504, 972), (864, 432)),
+ (psp, None, border, f_enlarge, 1, (972, 504), (648, 216), # 079
+ (972, 504), (864, 432)),
+ (psp, isp, None, f_into, 0, (504, 972), (324, 108), # 080
+ (504, 972), (324, 162)),
+ (psp, isp, None, f_into, 1, (972, 504), (324, 108), # 081
+ (972, 504), (324, 162)),
+ (psp, isp, None, f_fill, 0, (504, 972), (2268, 756), # 082
+ (504, 972), (1512, 756)),
+ (psp, isp, None, f_fill, 1, (972, 504), (2268, 756), # 083
+ (972, 504), (1512, 756)),
+ (psp, isp, None, f_exact, 0, (504, 972), (324, 756), # 084
+ (504, 972), (324, 756)),
+ (psp, isp, None, f_exact, 1, (972, 504), (324, 756), # 085
+ (972, 504), (324, 756)),
+ (psp, isp, None, f_shrink, 0, (504, 972), (324, 108), # 086
+ (504, 972), (324, 162)),
+ (psp, isp, None, f_shrink, 1, (972, 504), (324, 108), # 087
+ (972, 504), (324, 162)),
+ (psp, isp, None, f_enlarge, 0, (504, 972), (648, 216), # 088
+ (504, 972), (864, 432)),
+ (psp, isp, None, f_enlarge, 1, (972, 504), (648, 216), # 089
+ (972, 504), (864, 432)),
+ (psp, isp, border, f_into, 0, (504, 972), (324, 108), # 090
+ (504, 972), (324, 162)),
+ (psp, isp, border, f_into, 1, (972, 504), (324, 108), # 091
+ (972, 504), (324, 162)),
+ (psp, isp, border, f_fill, 0, (504, 972), (2268, 756), # 092
+ (504, 972), (1512, 756)),
+ (psp, isp, border, f_fill, 1, (972, 504), (2268, 756), # 093
+ (972, 504), (1512, 756)),
+ (psp, isp, border, f_exact, 0, (504, 972), (324, 756), # 094
+ (504, 972), (324, 756)),
+ (psp, isp, border, f_exact, 1, (972, 504), (324, 756), # 095
+ (972, 504), (324, 756)),
+ (psp, isp, border, f_shrink, 0, (504, 972), (324, 108), # 096
+ (504, 972), (324, 162)),
+ (psp, isp, border, f_shrink, 1, (972, 504), (324, 108), # 097
+ (972, 504), (324, 162)),
+ (psp, isp, border, f_enlarge, 0, (504, 972), (648, 216), # 098
+ (504, 972), (864, 432)),
+ (psp, isp, border, f_enlarge, 1, (972, 504), (648, 216), # 099
+ (972, 504), (864, 432)),
+    # psl=972x504, psp=504x972, isl=756x324, isp=324x756, border=162:270
+    # columns: --pagesize, --imgsize, --border, --fit, -a, pagepdf, imgpdf
+    # (first row of each case holds the im1 result, second row the im2 result)
+ (psp, isl, None, f_into, 0, (504, 972), (756, 252), # 100
+ (504, 972), (648, 324)),
+ (psp, isl, None, f_into, 1, (972, 504), (756, 252), # 101
+ (972, 504), (648, 324)),
+ (psp, isl, None, f_fill, 0, (504, 972), (972, 324), # 102
+ (504, 972), (756, 378)),
+ (psp, isl, None, f_fill, 1, (972, 504), (972, 324), # 103
+ (972, 504), (756, 378)),
+ (psp, isl, None, f_exact, 0, (504, 972), (756, 324), # 104
+ (504, 972), (756, 324)),
+ (psp, isl, None, f_exact, 1, (972, 504), (756, 324), # 105
+ (972, 504), (756, 324)),
+ (psp, isl, None, f_shrink, 0, (504, 972), (648, 216), # 106
+ (504, 972), (648, 324)),
+ (psp, isl, None, f_shrink, 1, (972, 504), (648, 216), # 107
+ (972, 504), (648, 324)),
+ (psp, isl, None, f_enlarge, 0, (504, 972), (756, 252), # 108
+ (504, 972), (864, 432)),
+ (psp, isl, None, f_enlarge, 1, (972, 504), (756, 252), # 109
+ (972, 504), (864, 432)),
+ (psp, isl, border, f_into, 0, (504, 972), (756, 252), # 110
+ (504, 972), (648, 324)),
+ (psp, isl, border, f_into, 1, (972, 504), (756, 252), # 111
+ (972, 504), (648, 324)),
+ (psp, isl, border, f_fill, 0, (504, 972), (972, 324), # 112
+ (504, 972), (756, 378)),
+ (psp, isl, border, f_fill, 1, (972, 504), (972, 324), # 113
+ (972, 504), (756, 378)),
+ (psp, isl, border, f_exact, 0, (504, 972), (756, 324), # 114
+ (504, 972), (756, 324)),
+ (psp, isl, border, f_exact, 1, (972, 504), (756, 324), # 115
+ (972, 504), (756, 324)),
+ (psp, isl, border, f_shrink, 0, (504, 972), (648, 216), # 116
+ (504, 972), (648, 324)),
+ (psp, isl, border, f_shrink, 1, (972, 504), (648, 216), # 117
+ (972, 504), (648, 324)),
+ (psp, isl, border, f_enlarge, 0, (504, 972), (756, 252), # 118
+ (504, 972), (864, 432)),
+ (psp, isl, border, f_enlarge, 1, (972, 504), (756, 252), # 119
+ (972, 504), (864, 432)),
+ (psl, None, None, f_into, 0, (972, 504), (972, 324), # 120
+ (972, 504), (972, 486)),
+ (psl, None, None, f_into, 1, (972, 504), (972, 324), # 121
+ (972, 504), (972, 486)),
+ (psl, None, None, f_fill, 0, (972, 504), (1512, 504), # 122
+ (972, 504), (1008, 504)),
+ (psl, None, None, f_fill, 1, (972, 504), (1512, 504), # 123
+ (972, 504), (1008, 504)),
+ (psl, None, None, f_exact, 0, (972, 504), (972, 504), # 124
+ (972, 504), (972, 504)),
+ (psl, None, None, f_exact, 1, (972, 504), (972, 504), # 125
+ (972, 504), (972, 504)),
+ (psl, None, None, f_shrink, 0, (972, 504), (648, 216), # 126
+ (972, 504), (864, 432)),
+ (psl, None, None, f_shrink, 1, (972, 504), (648, 216), # 127
+ (972, 504), (864, 432)),
+ (psl, None, None, f_enlarge, 0, (972, 504), (972, 324), # 128
+ (972, 504), (972, 486)),
+ (psl, None, None, f_enlarge, 1, (972, 504), (972, 324), # 129
+ (972, 504), (972, 486)),
+ (psl, None, border, f_into, 0, (972, 504), (432, 144), # 130
+ (972, 504), (360, 180)),
+ (psl, None, border, f_into, 1, (972, 504), (432, 144), # 131
+ (972, 504), (360, 180)),
+ (psl, None, border, f_fill, 0, (972, 504), (540, 180), # 132
+ (972, 504), (432, 216)),
+ (psl, None, border, f_fill, 1, (972, 504), (540, 180), # 133
+ (972, 504), (432, 216)),
+ (psl, None, border, f_exact, 0, (972, 504), (432, 180), # 134
+ (972, 504), (432, 180)),
+ (psl, None, border, f_exact, 1, (972, 504), (432, 180), # 135
+ (972, 504), (432, 180)),
+ (psl, None, border, f_shrink, 0, (972, 504), (432, 144), # 136
+ (972, 504), (360, 180)),
+ (psl, None, border, f_shrink, 1, (972, 504), (432, 144), # 137
+ (972, 504), (360, 180)),
+ (psl, None, border, f_enlarge, 0, (972, 504), (648, 216), # 138
+ (972, 504), (864, 432)),
+ (psl, None, border, f_enlarge, 1, (972, 504), (648, 216), # 139
+ (972, 504), (864, 432)),
+ (psl, isp, None, f_into, 0, (972, 504), (324, 108), # 140
+ (972, 504), (324, 162)),
+ (psl, isp, None, f_into, 1, (972, 504), (324, 108), # 141
+ (972, 504), (324, 162)),
+ (psl, isp, None, f_fill, 0, (972, 504), (2268, 756), # 142
+ (972, 504), (1512, 756)),
+ (psl, isp, None, f_fill, 1, (972, 504), (2268, 756), # 143
+ (972, 504), (1512, 756)),
+ (psl, isp, None, f_exact, 0, (972, 504), (324, 756), # 144
+ (972, 504), (324, 756)),
+ (psl, isp, None, f_exact, 1, (972, 504), (324, 756), # 145
+ (972, 504), (324, 756)),
+ (psl, isp, None, f_shrink, 0, (972, 504), (324, 108), # 146
+ (972, 504), (324, 162)),
+ (psl, isp, None, f_shrink, 1, (972, 504), (324, 108), # 147
+ (972, 504), (324, 162)),
+ (psl, isp, None, f_enlarge, 0, (972, 504), (648, 216), # 148
+ (972, 504), (864, 432)),
+ (psl, isp, None, f_enlarge, 1, (972, 504), (648, 216), # 149
+ (972, 504), (864, 432)),
+    # psl=972x504, psp=504x972, isl=756x324, isp=324x756, border=162:270
+    # columns: --pagesize, --imgsize, --border, --fit, -a, pagepdf, imgpdf
+    # (first row of each case holds the im1 result, second row the im2 result)
+ (psl, isp, border, f_into, 0, (972, 504), (324, 108), # 150
+ (972, 504), (324, 162)),
+ (psl, isp, border, f_into, 1, (972, 504), (324, 108), # 151
+ (972, 504), (324, 162)),
+ (psl, isp, border, f_fill, 0, (972, 504), (2268, 756), # 152
+ (972, 504), (1512, 756)),
+ (psl, isp, border, f_fill, 1, (972, 504), (2268, 756), # 153
+ (972, 504), (1512, 756)),
+ (psl, isp, border, f_exact, 0, (972, 504), (324, 756), # 154
+ (972, 504), (324, 756)),
+ (psl, isp, border, f_exact, 1, (972, 504), (324, 756), # 155
+ (972, 504), (324, 756)),
+ (psl, isp, border, f_shrink, 0, (972, 504), (324, 108), # 156
+ (972, 504), (324, 162)),
+ (psl, isp, border, f_shrink, 1, (972, 504), (324, 108), # 157
+ (972, 504), (324, 162)),
+ (psl, isp, border, f_enlarge, 0, (972, 504), (648, 216), # 158
+ (972, 504), (864, 432)),
+ (psl, isp, border, f_enlarge, 1, (972, 504), (648, 216), # 159
+ (972, 504), (864, 432)),
+ (psl, isl, None, f_into, 0, (972, 504), (756, 252), # 160
+ (972, 504), (648, 324)),
+ (psl, isl, None, f_into, 1, (972, 504), (756, 252), # 161
+ (972, 504), (648, 324)),
+ (psl, isl, None, f_fill, 0, (972, 504), (972, 324), # 162
+ (972, 504), (756, 378)),
+ (psl, isl, None, f_fill, 1, (972, 504), (972, 324), # 163
+ (972, 504), (756, 378)),
+ (psl, isl, None, f_exact, 0, (972, 504), (756, 324), # 164
+ (972, 504), (756, 324)),
+ (psl, isl, None, f_exact, 1, (972, 504), (756, 324), # 165
+ (972, 504), (756, 324)),
+ (psl, isl, None, f_shrink, 0, (972, 504), (648, 216), # 166
+ (972, 504), (648, 324)),
+ (psl, isl, None, f_shrink, 1, (972, 504), (648, 216), # 167
+ (972, 504), (648, 324)),
+ (psl, isl, None, f_enlarge, 0, (972, 504), (756, 252), # 168
+ (972, 504), (864, 432)),
+ (psl, isl, None, f_enlarge, 1, (972, 504), (756, 252), # 169
+ (972, 504), (864, 432)),
+ (psl, isl, border, f_into, 0, (972, 504), (756, 252), # 170
+ (972, 504), (648, 324)),
+ (psl, isl, border, f_into, 1, (972, 504), (756, 252), # 171
+ (972, 504), (648, 324)),
+ (psl, isl, border, f_fill, 0, (972, 504), (972, 324), # 172
+ (972, 504), (756, 378)),
+ (psl, isl, border, f_fill, 1, (972, 504), (972, 324), # 173
+ (972, 504), (756, 378)),
+ (psl, isl, border, f_exact, 0, (972, 504), (756, 324), # 174
+ (972, 504), (756, 324)),
+ (psl, isl, border, f_exact, 1, (972, 504), (756, 324), # 175
+ (972, 504), (756, 324)),
+ (psl, isl, border, f_shrink, 0, (972, 504), (648, 216), # 176
+ (972, 504), (648, 324)),
+ (psl, isl, border, f_shrink, 1, (972, 504), (648, 216), # 177
+ (972, 504), (648, 324)),
+ (psl, isl, border, f_enlarge, 0, (972, 504), (756, 252), # 178
+ (972, 504), (864, 432)),
+ (psl, isl, border, f_enlarge, 1, (972, 504), (756, 252), # 179
+ (972, 504), (864, 432)),
+]
+
+
+def test_suite():
+    class TestImg2Pdf(unittest.TestCase):
+        pass
+
+    for i, (psopt, isopt, border, fit, ao, pspdf1, ispdf1,
+            pspdf2, ispdf2) in enumerate(layout_test_cases):
+        if isopt is not None:
+            isopt = ((img2pdf.ImgSize.abs, isopt[0]),
+                     (img2pdf.ImgSize.abs, isopt[1]))
+
+        def layout_handler(
+                self, psopt, isopt, border, fit, ao, pspdf, ispdf, im):
+            layout_fun = img2pdf.get_layout_fun(psopt, isopt, border, fit, ao)
+            try:
+                pwpdf, phpdf, iwpdf, ihpdf = \
+                    layout_fun(im[0], im[1], (img2pdf.default_dpi,
+                                              img2pdf.default_dpi))
+                self.assertEqual((pwpdf, phpdf), pspdf)
+                self.assertEqual((iwpdf, ihpdf), ispdf)
+            except img2pdf.NegativeDimensionError:
+                self.assertEqual(None, pspdf)
+                self.assertEqual(None, ispdf)
+
+        def layout_handler_im1(self, psopt=psopt, isopt=isopt, border=border,
+                               fit=fit, ao=ao, pspdf=pspdf1, ispdf=ispdf1):
+            layout_handler(self, psopt, isopt, border, fit, ao, pspdf, ispdf,
+                           im1)
+        setattr(TestImg2Pdf, "test_layout_%03d_im1" % i, layout_handler_im1)
+
+        def layout_handler_im2(self, psopt=psopt, isopt=isopt, border=border,
+                               fit=fit, ao=ao, pspdf=pspdf2, ispdf=ispdf2):
+            layout_handler(self, psopt, isopt, border, fit, ao, pspdf, ispdf,
+                           im2)
+        setattr(TestImg2Pdf, "test_layout_%03d_im2" % i, layout_handler_im2)
+
+    files = os.listdir(os.path.join(HERE, "input"))
+    for with_pdfrw, test_name in [(a, b) for a in [True, False]
+                                  for b in files]:
+        inputf = os.path.join(HERE, "input", test_name)
+        if not os.path.isfile(inputf):
+            continue
+        outputf = os.path.join(HERE, "output", test_name+".pdf")
+        assert os.path.isfile(outputf)
+
+        def handle(self, f=inputf, out=outputf, with_pdfrw=with_pdfrw):
+            with open(f, "rb") as inf:
+                orig_imgdata = inf.read()
+            output = img2pdf.convert(orig_imgdata, nodate=True,
+                                     with_pdfrw=with_pdfrw)
+            from io import StringIO, BytesIO
+            from pdfrw import PdfReader, PdfName, PdfWriter
+            from pdfrw.py23_diffs import convert_load, convert_store
+            x = PdfReader(StringIO(convert_load(output)))
+            self.assertEqual(sorted(x.keys()), [PdfName.Info, PdfName.Root,
+                                                PdfName.Size])
+            self.assertEqual(x.Size, '7')
+            self.assertEqual(x.Info, {})
+            self.assertEqual(sorted(x.Root.keys()), [PdfName.Pages,
+                                                     PdfName.Type])
+            self.assertEqual(x.Root.Type, PdfName.Catalog)
+            self.assertEqual(sorted(x.Root.Pages.keys()),
+                             [PdfName.Count, PdfName.Kids, PdfName.Type])
+            self.assertEqual(x.Root.Pages.Count, '1')
+            self.assertEqual(x.Root.Pages.Type, PdfName.Pages)
+            self.assertEqual(len(x.Root.Pages.Kids), 1)
+            self.assertEqual(sorted(x.Root.Pages.Kids[0].keys()),
+                             [PdfName.Contents, PdfName.MediaBox,
+                              PdfName.Parent, PdfName.Resources, PdfName.Type])
+            self.assertEqual(x.Root.Pages.Kids[0].MediaBox,
+                             ['0', '0', '115', '48'])
+            self.assertEqual(x.Root.Pages.Kids[0].Parent, x.Root.Pages)
+            self.assertEqual(x.Root.Pages.Kids[0].Type, PdfName.Page)
+            self.assertEqual(x.Root.Pages.Kids[0].Resources.keys(),
+                             [PdfName.XObject])
+            self.assertEqual(x.Root.Pages.Kids[0].Resources.XObject.keys(),
+                             [PdfName.Im0])
+            self.assertEqual(x.Root.Pages.Kids[0].Contents.keys(),
+                             [PdfName.Length])
+            self.assertEqual(x.Root.Pages.Kids[0].Contents.Length,
+                             str(len(x.Root.Pages.Kids[0].Contents.stream)))
+            self.assertEqual(x.Root.Pages.Kids[0].Contents.stream,
+                             "q\n115.0000 0 0 48.0000 0.0000 0.0000 cm\n/Im0 "
+                             "Do\nQ")
+
+            imgprops = x.Root.Pages.Kids[0].Resources.XObject.Im0
+
+            # test if the filter is valid:
+            self.assertIn(
+                imgprops.Filter, [[PdfName.DCTDecode], [PdfName.JPXDecode],
+                                  [PdfName.FlateDecode]])
+            # test if the colorspace is valid
+            self.assertIn(
+                imgprops.ColorSpace, [PdfName.DeviceGray, PdfName.DeviceRGB,
+                                      PdfName.DeviceCMYK])
+            # test if the image has correct size
+            orig_img = Image.open(f)
+            self.assertEqual(imgprops.Width, str(orig_img.size[0]))
+            self.assertEqual(imgprops.Height, str(orig_img.size[1]))
+            # if the input file is a jpeg then it should've been copied
+            # verbatim into the PDF
+            if imgprops.Filter in [[PdfName.DCTDecode], [PdfName.JPXDecode]]:
+                self.assertEqual(
+                    x.Root.Pages.Kids[0].Resources.XObject.Im0.stream,
+                    convert_load(orig_imgdata))
+            elif imgprops.Filter == [PdfName.FlateDecode]:
+                # otherwise, the data is flate encoded and has to be equal to
+                # the pixel data of the input image
+                imgdata = zlib.decompress(
+                    convert_store(
+                        x.Root.Pages.Kids[0].Resources.XObject.Im0.stream))
+                colorspace = imgprops.ColorSpace
+                if colorspace == PdfName.DeviceGray:
+                    colorspace = 'L'
+                elif colorspace == PdfName.DeviceRGB:
+                    colorspace = 'RGB'
+                elif colorspace == PdfName.DeviceCMYK:
+                    colorspace = 'CMYK'
+                else:
+                    raise Exception("invalid colorspace")
+                im = Image.frombytes(colorspace, (int(imgprops.Width),
+                                                  int(imgprops.Height)),
+                                     imgdata)
+                if orig_img.mode == '1':
+                    orig_img = orig_img.convert("L")
+                elif orig_img.mode not in ("RGB", "L", "CMYK", "CMYK;I"):
+                    orig_img = orig_img.convert("RGB")
+                self.assertEqual(im.tobytes(), orig_img.tobytes())
+                # the python-pil version 2.3.0-1ubuntu3 in Ubuntu does not have
+                # the close() method
+                try:
+                    im.close()
+                except AttributeError:
+                    pass
+            # now use pdfrw to parse and then write out both pdfs and check the
+            # result for equality
+            y = PdfReader(out)
+            outx = BytesIO()
+            outy = BytesIO()
+            xwriter = PdfWriter()
+            ywriter = PdfWriter()
+            xwriter.trailer = x
+            ywriter.trailer = y
+            xwriter.write(outx)
+            ywriter.write(outy)
+            self.assertEqual(outx.getvalue(), outy.getvalue())
+            # the python-pil version 2.3.0-1ubuntu3 in Ubuntu does not have the
+            # close() method
+            try:
+                orig_img.close()
+            except AttributeError:
+                pass
+        if with_pdfrw:
+            setattr(TestImg2Pdf, "test_%s_with_pdfrw" % test_name, handle)
+        else:
+            setattr(TestImg2Pdf, "test_%s_without_pdfrw" % test_name, handle)
+
+    return unittest.TestSuite((
+        unittest.makeSuite(TestImg2Pdf),
+    ))
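Review note on the expected values in `layout_test_cases`: several rows can be checked by hand. The sketch below reproduces the arithmetic for the first rows of cases 040 (`into`) and 042 (`fill`) with `--imgsize` 756x324 and image `im1`. It is an independent re-derivation, not the code in img2pdf.py: the function names are hypothetical, and the 96 dpi default is an assumption inferred from the comment "864 px => 648 pt" in the test file. `Fraction` is used so the comparisons are exact.

```python
from fractions import Fraction

ASSUMED_DPI = 96  # assumption inferred from "864 px => 648 pt" in the tests


def px_to_pt(px, dpi=ASSUMED_DPI):
    """Convert pixels to PDF points (1 pt = 1/72 inch), exactly."""
    return Fraction(px * 72, dpi)


def scale_to_box(img, box, pick):
    """Scale img (w, h) by one factor; pick=min fits into, pick=max fills."""
    factor = pick(Fraction(box[0]) / img[0], Fraction(box[1]) / img[1])
    return (img[0] * factor, img[1] * factor)


im1 = (px_to_pt(864), px_to_pt(288))  # image size in points
isl = (756, 324)                      # --imgsize landscape from the table

into = scale_to_box(im1, isl, min)    # compare with case 040, first row
fill = scale_to_box(im1, isl, max)    # compare with case 042, first row
```

Under these assumptions `into` is 756x252 and `fill` is 972x324, which matches the table: the smaller scale factor keeps the image inside the box, while the larger one covers it.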
diff --git a/src/tests/input/CMYK.jpg b/src/tests/input/CMYK.jpg
new file mode 100644
index 0000000..44213a8
--- /dev/null
+++ b/src/tests/input/CMYK.jpg
Binary files differ
diff --git a/src/tests/input/normal.jpg b/src/tests/input/normal.jpg
new file mode 100644
index 0000000..2c036e9
--- /dev/null
+++ b/src/tests/input/normal.jpg
Binary files differ
diff --git a/src/tests/input/normal.png b/src/tests/input/normal.png
new file mode 100644
index 0000000..87b9a6e
--- /dev/null
+++ b/src/tests/input/normal.png
Binary files differ
diff --git a/src/tests/output/CMYK.jpg.pdf b/src/tests/output/CMYK.jpg.pdf
new file mode 100644
index 0000000..bfe67f3
--- /dev/null
+++ b/src/tests/output/CMYK.jpg.pdf
Binary files differ
diff --git a/src/tests/output/CMYK.tif.pdf b/src/tests/output/CMYK.tif.pdf
new file mode 100644
index 0000000..b00586b
--- /dev/null
+++ b/src/tests/output/CMYK.tif.pdf
Binary files differ
diff --git a/src/tests/output/normal.jpg.pdf b/src/tests/output/normal.jpg.pdf
new file mode 100644
index 0000000..87d2645
--- /dev/null
+++ b/src/tests/output/normal.jpg.pdf
Binary files differ
diff --git a/src/tests/output/normal.png.pdf b/src/tests/output/normal.png.pdf
new file mode 100644
index 0000000..2628c5d
--- /dev/null
+++ b/src/tests/output/normal.png.pdf
Binary files differ