diff --git a/doc/source/description/sift.rst b/doc/source/description/sift.rst
index d4a9fad..71ba9c5 100644
--- a/doc/source/description/sift.rst
+++ b/doc/source/description/sift.rst
@@ -1,25 +1,27 @@
General introduction to sift.
=============================
-silx.image.sift, a parallel version of SIFT algorithm
------------------------------------------------------
+silx.opencl.sift, a parallel version of the SIFT algorithm
+-----------------------------------------------------------
SIFT (Scale-Invariant Feature Transform) is an algorithm developed by David Lowe in 1999.
It is a worldwide reference for image alignment and object recognition.
The robustness of this method enables it to detect features at different scales,
angles and illuminations of a scene.
The implementation available in silx uses OpenCL, meaning that it can run on
-Graphics Processing Units and Central Processing Units as well.
-Interest points are detected in the image, then data structures called
-*descriptors* are built to be characteristic of the scene, so that two different
-images of the same scene have similar descriptors. They are robust to transformations
-like translation, rotation, rescaling and illumination change, which make SIFT
-interesting for image stitching.
+Graphics Processing Units (GPUs) as well as on Central Processing Units (CPUs).
+Interest points (keypoints) are detected in the image, then data structures called
+*descriptors*, characteristic of the scene, are extracted so that two different
+images of the same scene exhibit similar descriptors.
+The descriptors are insensitive to transformations like translation, rotation,
+rescaling and illumination changes, making SIFT interesting for image stitching.
+
In the first stage, descriptors are computed from the input images.
-Then, they are compared to determine the geometric transformation to apply in
-order to align the images.
-silx.image.sift can run on most graphic cards and CPU, making it usable on many setups.
-OpenCL processes are handled from Python with PyOpenCL, a module to access OpenCL parallel computation API.
+Then, they are compared against each other to determine the geometric transformation
+to apply in order to align the images.
+*silx.opencl.sift* can run on most GPUs and CPUs, offering great flexibility.
+OpenCL processes are driven from Python via PyOpenCL, a module providing access
+to the OpenCL parallel computation API.
Introduction
@@ -27,48 +29,40 @@ Introduction
The European Synchrotron Radiation Facility (ESRF) beamline ID21 developed a
full-field method for X-ray absorption near-edge spectroscopy (XANES).
-Since the flat field images are not acquired simultaneously with the sample
-transmission images, a realignment procedure has to be performed.
-Serial SIFT implementation used to take about 8 seconds per frame, and one stack
-can have up to 500 frames.
-It is a bottleneck in the global process, therefore a parallel version had to be implemented.
-silx.image.sift differs from existing parallel implementations of SIFT in the way
-that the whole process is done on the device, enabling crucial speed-ups.
+Since the flat field images and the sample transmission images are not acquired
+simultaneously, an image realignment procedure has to be performed.
+The sequential SIFT implementation used to take about 8 seconds per frame, and
+one typical stack can have up to 500 frames.
+This is a bottleneck in the global process, hence a parallel version had to be
+implemented.
+*silx.opencl.sift* differs from existing parallel implementations of SIFT in
+that the whole process is executed on the device, enabling crucial speed-ups.
-Launching silx.image.sift
--------------------------
+Launching silx.opencl.sift
+--------------------------
-silx.image.sift is written in Python, and handles the OpenCL kernels with PyOpenCL.
+silx.opencl.sift is written in Python, and handles the OpenCL kernels through PyOpenCL.
This enables simple and efficient access to GPU resources.
-The project is installed as a Python library and can be imported in a script.
+The package is installed as a Python library and can be imported in a script.
-Before image alignment, points of interest (keypoints) have to be detected in each image.
+Prior to image alignment, keypoints have to be detected in each image and their
+descriptors calculated.
The whole process can be launched with a few lines of code.
How to use it
.............
-silx.image.sift is installed as part of silx and requires PyOpenCL to be useable.
-It generates a library that can be imported, then used to compute a list of descriptors from an image.
-The image can be in RGB values, but all the process is done on grayscale values.
-One can specify the devicetype, either CPU or GPU.
-
-.. Although being integrated in ESRF EDNA framework for online image alignment,
- and thus mostly used by developers, silx.image.sift provides example scripts.
-
- .. code-block:: python
-
- #TODO: update
- python test/demo.py --type=GPU my_image.jpg
-
-This computes and shows the keypoints on the input image.
-One can also launch silx.image.sift interactively with iPython :
+*silx.opencl.sift* is installed as a part of *silx* and requires PyOpenCL.
+Color input images are automatically converted to grayscale during the processing.
+One can specify the device type, either "CPU" or "GPU".
+One can launch silx.opencl.sift interactively with IPython; the following
+snippet computes and shows the last keypoint of the input image:
.. code-block:: python
- from silx.image import sift
+ from silx.opencl import sift
import numpy
import scipy.misc
image_rgb = scipy.misc.imread("my_image.jpg")
@@ -77,44 +71,48 @@ One can also launch silx.image.sift interactively with iPython :
print(kp[-1])
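+
+The abridged snippet above can be expanded into a complete, self-contained
+sketch. The file name and device type are placeholders; the ``SiftPlan`` class
+and its parameters are described in the following sections:
+
+.. code-block:: python
+
+    import scipy.misc
+    from silx.opencl import sift
+
+    image = scipy.misc.imread("my_image.jpg")   # RGB data are handled as grayscale internally
+    # Compile the kernels and allocate the device buffers for this image geometry
+    sift_ocl = sift.SiftPlan(template=image, devicetype="GPU")
+    kp = sift_ocl.keypoints(image)              # numpy array of keypoints with their descriptors
+    print(kp[-1])
+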
-silx.image.sift files
-.....................
+silx.opencl.sift files
+......................
-The Python sources are in the ``silx.image.sift`` module:
+The Python sources are in the ``silx.opencl.sift`` module:
.. code-block:: python
from silx.opencl import sift
print(sift.__file__)
-The file ``plan.py`` corresponds to the keypoint extraction and handles the whole process:
-from kernel compilation to descriptors generation as numpy array.
-The OpenCL kernels are distributed as *resources* in the "openCL" folder; they are compiled on the fly.
-Several kernels have multiple implementations, depending the architecture to run on.
+The file ``plan.py`` contains the keypoint extraction code and drives the whole
+process: from kernel compilation to descriptors generation as numpy array.
+The OpenCL kernels are distributed as *resources* in the "openCL" folder; they
+are compiled on the fly.
+Several kernels have multiple implementations, depending on the architecture
+they run on.
-The file ``match.py`` does the matching between two lists of keypoints returned by ``plan.py``.
+The file ``match.py`` carries out the matching between two lists of keypoints
+returned by ``plan.py``.
-The file ``alignment.py`` does the image alignment : it computes the keypoints
+The file ``alignment.py`` performs the image alignment: it computes the keypoints
from two images (``plan.py``), then uses the matching result (``match.py``)
to find out the transformation aligning the second image on the first.
-Each of those module contain a class which holds GPU contexts, memory and kernel.
+Each of these modules contains a class which holds the GPU context, memory and kernels.
They are expensive to instantiate and should be re-used as much as possible.
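+
+A sketch of this re-use, processing a whole stack of frames with a single
+``SiftPlan`` instance (the random stack is a placeholder):
+
+.. code-block:: python
+
+    import numpy
+    from silx.opencl import sift
+
+    frames = numpy.random.random((10, 512, 512)).astype(numpy.float32)
+    # Instantiate once: kernel compilation and buffer allocation happen here
+    sift_ocl = sift.SiftPlan(template=frames[0], devicetype="GPU")
+    # Re-use the same plan (and its device buffers) for every frame
+    all_keypoints = [sift_ocl.keypoints(frame) for frame in frames]
+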
Overall process
***************
-The different steps of SIFT are handled by ``plan.py``.
-When launched, it automatically choose the best device to run on, unless a device
+The different steps of SIFT are controlled by ``plan.py``.
+When launched, it automatically chooses the best device to run on, unless a device
is explicitly provided in the options.
All the OpenCL kernels that can be compiled are built on the fly.
-Buffers are pre-allocated on the device, and all the steps are executed on rge device (GPU).
+Buffers are pre-allocated on the device, and all the steps are executed on the device (GPU).
At each *octave* (scale level), keypoints are returned to the CPU and the buffers are re-used.
-Once the keypoints are computed, the keypoints of two different images can be compared.
+Once computed, the keypoints of two different images can be compared.
This matching is done by ``match.py``.
-It simply takes the descriptors of the two lists of keypoints, and compare them with a L1 distance.
-It returns a vector of *matchings*, i.e couples of similar keypoints.
+It simply takes the descriptors of the two lists of keypoints, and compares them
+using an L1 distance (sum of absolute differences).
+It returns a vector of *matches*, i.e. pairs of similar keypoints.
For image alignment, ``alignment.py`` takes the matching vector between two images
and determines the transformation to be done in order to align the second image on the first.
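+
+A minimal sketch of this two-step usage, assuming the ``MatchPlan`` and
+``LinearAlign`` classes of the module (``kp1``, ``kp2``, ``image1`` and
+``image2`` are placeholders):
+
+.. code-block:: python
+
+    from silx.opencl import sift
+
+    # Matching: compare two sets of keypoints computed by SiftPlan
+    mp = sift.MatchPlan()
+    match = mp.match(kp1, kp2)          # array of matched keypoint pairs
+
+    # Alignment: resample image2 onto the reference image1
+    aligner = sift.LinearAlign(image1)
+    corrected = aligner.align(image2)
+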
@@ -137,25 +135,25 @@ The keypoints are detected in several steps according to Lowe's paper_ :
keypoints :math:`(x,y,s, \theta)`
* Descriptor computation: a histogram of orientations is built around every keypoint,
  then concatenated in a 128-value vector.
- This vector is called *SIFT descriptor*, it is robust to rotation, illumination, translation and scaling.
+ This vector, called the *SIFT descriptor*, is insensitive to rotation, illumination, translation and scaling.
The scale variation is simulated by blurring the image.
-A very blurred image represents a scene seen from a distance, for small details are not visible.
+A very blurred image represents a scene seen from a distance, in which small
+details are no longer visible.
Unlike existing parallel versions of SIFT, the entire process is done on the
device to avoid time-consuming transfers between CPU and GPU.
This leads to several tricky parts like the use of atomic instructions, or
-writing different versions of the same kernel to adapt to every platform.
-
+using different versions of the same kernel tailored for different platforms.
Keypoints detection
...................
The image is increasingly blurred to imitate the scale variations.
-This is done by convolving with a gaussian kernel.
-Then, consecutive blurs are subtracted to get *differences of gaussians (DoG)*.
+This is done by convolving the image with a Gaussian kernel.
+Then, consecutive blurs are subtracted to get *differences of Gaussians (DoG)*.
In these DoG, every pixel is tested. Let :math:`(x,y)` be the pixel position in
the current (blurred) image, and :math:`s` its *scale* (that is, the blur factor).
The point :math:`(x,y,s)` is a local maximum in the scale-space if
@@ -170,10 +168,10 @@ The point :math:`(x,y,s)` is a local maximum in the scale-space if
:alt: detection in scale-space
-For these steps, we highly benefit from the parallelism : every pixel is handled
-by a GPU thread.
-Besides, convolution is implemented in the direct space (without Fourier Transform)
-and is quite fast (50 times faster than the convolutions done by the C++ reference
+Those steps highly benefit from the parallelism of OpenCL: every pixel is processed
+by a different thread.
+Besides, the convolution is implemented in the direct space (without Fourier Transform)
+and is quite fast (50 times faster than the convolutions in the C++ reference
implementation).
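+
+For illustration only, the scale-space construction can be sketched in a few
+lines of numpy/scipy (the actual implementation uses on-device OpenCL
+convolutions, not scipy):
+
+.. code-block:: python
+
+    import numpy
+    from scipy.ndimage import gaussian_filter
+
+    def dog_stack(image, init_sigma=1.6, scales=3):
+        """Build the DoG stack of one octave: Scales+3 blurs give Scales+2 DoGs."""
+        k = 2.0 ** (1.0 / scales)      # blur ratio between consecutive scales
+        blurs = [gaussian_filter(image.astype(numpy.float32), init_sigma * k ** i)
+                 for i in range(scales + 3)]
+        return numpy.stack([b2 - b1 for b1, b2 in zip(blurs, blurs[1:])])
+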
@@ -182,27 +180,28 @@ Keypoints refinement
At this stage, many keypoints are not reliable. Low-contrast keypoints are discarded,
and keypoints located on an edge are rejected as well.
-For keypoints located on an edge, principal curvature across the edge is much larger
-than the principal curvature along it. Finding these principal curvatures amounts
+For keypoints located on an edge, the principal curvature across the edge is much larger
+than the principal curvature along it.
+Finding these principal curvatures amounts
to solving for the eigenvalues of the second-order Hessian matrix of the current DoG.
-The ratio of the eigenvalues :math:`r` is compared to a threshold :math:`\dfrac{(r+1)^2}{r} < R`
-with R defined by taking r=10.
-To improve keypoints accuracy, the coordinates are interpolated with a second-order Taylor development.
+To improve the accuracy of the keypoints, their coordinates are interpolated
+with a second-order Taylor expansion.
.. math::
D \left( \vec{x} + \vec{\delta_x} \right) \simeq D + \dfrac{\partial D}{\partial \vec{x}} \cdot \vec{\delta_x} + \dfrac{1}{2} \left( \vec{\delta_x} \right)^T \cdot \left( H \right) \cdot \vec{\delta_x} \qquad \text{with } H = \dfrac{\partial^2 D}{\partial \vec{x}^2}
-Keypoints that were too far from a *true* (interpolated) extremum are rejected.
+Keypoints too far from a *true* (interpolated) extremum are also rejected.
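+
+A numpy sketch of this interpolation, solving :math:`H \cdot \vec{\delta_x} = - \nabla D`
+with central finite differences on the DoG stack ``D`` (indexed as scale, row, column):
+
+.. code-block:: python
+
+    import numpy
+
+    def interpolation_offset(D, s, y, x):
+        """One refinement step around a candidate extremum (illustrative)."""
+        g = 0.5 * numpy.array([D[s + 1, y, x] - D[s - 1, y, x],
+                               D[s, y + 1, x] - D[s, y - 1, x],
+                               D[s, y, x + 1] - D[s, y, x - 1]])
+        # Second derivatives by central finite differences
+        Hss = D[s + 1, y, x] - 2 * D[s, y, x] + D[s - 1, y, x]
+        Hyy = D[s, y + 1, x] - 2 * D[s, y, x] + D[s, y - 1, x]
+        Hxx = D[s, y, x + 1] - 2 * D[s, y, x] + D[s, y, x - 1]
+        Hsy = 0.25 * (D[s + 1, y + 1, x] - D[s + 1, y - 1, x] - D[s - 1, y + 1, x] + D[s - 1, y - 1, x])
+        Hsx = 0.25 * (D[s + 1, y, x + 1] - D[s + 1, y, x - 1] - D[s - 1, y, x + 1] + D[s - 1, y, x - 1])
+        Hyx = 0.25 * (D[s, y + 1, x + 1] - D[s, y + 1, x - 1] - D[s, y - 1, x + 1] + D[s, y - 1, x - 1])
+        H = numpy.array([[Hss, Hsy, Hsx], [Hsy, Hyy, Hyx], [Hsx, Hyx, Hxx]])
+        # The keypoint is rejected if any |offset| component exceeds 0.5
+        return numpy.linalg.solve(H, -g)
+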
Orientation assignment
......................
-An orientation has to be assigned to each keypoint so that SIFT descriptors will
-be invariant to rotation. For each blurred version of the image, the gradient
+An orientation has to be assigned to each keypoint, so that SIFT descriptors will
+be invariant to rotation.
+For each blurred version of the image, the gradient
magnitude and orientation are computed.
From the neighborhood of a keypoint, a histogram of orientations is built
(36 bins, 1 bin per 10 degrees).
@@ -218,28 +217,38 @@ keypoint with a different orientation.
The parallel implementation of this step is complex, and the performance strongly
depends on the graphics card the program is running on.
-That is why there are different files for this kernel, adapted for different platforms.
-The file to compile is automatically determined in ``plan.py``.
+That is why different OpenCL kernels have been written with the same signature,
+but adapted to different platforms.
+The kernel to be used is automatically determined in ``plan.py``.
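+
+The histogram construction itself can be sketched with numpy (illustrative only;
+``grad_mag`` and ``grad_ori`` are the gradient magnitude and orientation in the
+keypoint neighbourhood and ``weight`` a Gaussian window, all placeholders):
+
+.. code-block:: python
+
+    import numpy
+
+    def orientation_histogram(grad_mag, grad_ori, weight, nbins=36):
+        """36-bin histogram of gradient orientations (1 bin per 10 degrees)."""
+        bins = ((grad_ori + numpy.pi) * nbins / (2 * numpy.pi)).astype(int) % nbins
+        hist = numpy.bincount(bins.ravel(),
+                              weights=(grad_mag * weight).ravel(),
+                              minlength=nbins)
+        return hist    # dominant orientation: 10 degrees * argmax(hist)
+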
Descriptor computation
......................
A histogram of orientations is built around every keypoint.
-The neighborhood is divided into 4 regions of 4 sub-regions of 4x4 pixels.
+The neighborhood is divided into a 4x4 grid of sub-regions, each of 4x4 pixels.
In every sub-region, an 8-bin histogram is computed; then, all the histograms are
-concatenated in a 128-values descriptor.
-The histogram is weighted by the gradient magnitudes and the current scale factor,
-so that the descriptor is robust to rotation, illumination, translation and scaling.
-Here again, there are several files adapted to different platforms.
+concatenated in a 128-value descriptor (4x4x8 = 128).
+The concatenated histogram is weighted by the gradient magnitudes and the current
+scale factor, so that the descriptor is invariant to rotation, illumination,
+translation and scaling.
+Here again, there are several kernels adapted to different platforms.
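+
+The layout of the descriptor can be sketched as follows (illustrative only;
+the accumulation of the weighted orientations is elided):
+
+.. code-block:: python
+
+    import numpy
+
+    # 4x4 grid of sub-regions, 8 orientation bins each
+    hist = numpy.zeros((4, 4, 8), dtype=numpy.float32)
+    # ... accumulate the weighted gradient orientations of each sub-region ...
+    descriptor = hist.ravel()              # 4 * 4 * 8 = 128 values
+    norm = numpy.linalg.norm(descriptor)
+    if norm > 0:
+        descriptor /= norm                 # normalization against illumination changes
+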
Image matching and alignment
----------------------------
-Matching is also explained in this tutorial, once the keypoints are
+Matching is also explained in this tutorial: once the keypoints are extracted
+from two images, their descriptors (128-value vectors) are compared two by two,
+using the L1-norm (sum of absolute differences).
+For a given keypoint K1 from image 1, a keypoint K2 from image 2 matches K1
+if the L1-distance between K1 and K2 is much shorter than that of any other
+pair (K1, Kn), Kn being any other keypoint of image 2.
+Once keypoints are matched, the affine transformation that minimizes the
+least-squares displacement is obtained using a singular value decomposition of the
+overdetermined system of equations.
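+
+A numpy sketch of this least-squares fit (``p1`` and ``p2`` are (n, 2) arrays
+of matched keypoint coordinates, placeholders; ``numpy.linalg.lstsq`` solves
+the overdetermined system through an SVD):
+
+.. code-block:: python
+
+    import numpy
+
+    def fit_affine(p1, p2):
+        """Affine transform (matrix, offset) mapping points p2 onto points p1."""
+        n = p1.shape[0]
+        A = numpy.zeros((2 * n, 6))
+        A[0::2, 0:2] = p2      # rows for the x equations
+        A[0::2, 2] = 1.0
+        A[1::2, 3:5] = p2      # rows for the y equations
+        A[1::2, 5] = 1.0
+        sol = numpy.linalg.lstsq(A, p1.ravel(), rcond=None)[0]
+        return sol[[0, 1, 3, 4]].reshape(2, 2), sol[[2, 5]]
+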
.. figure:: img/sift_match1.png
:align: center
@@ -251,12 +260,15 @@ Matching is also explained in this tutorial, once the keypoints are
:alt: Another example of image matching for pattern recognition
+
Performances
------------
-The aim of silx.image.sift is to fasten the SIFT keypoint extraction by running it on GPU.
-On big images with many keypoints, it enables a speed-up between 30 and 50 times.
-The following benchmark was done on an Intel Xeon E5-2667 (2.90GHz, 2x6 cores)
+The aim of silx.opencl.sift is to speed up the SIFT keypoint extraction by
+running it on GPU.
+On big images with many keypoints, this module enables a speed-up between 30 and
+50 times.
+The following benchmark has been carried out on an Intel Xeon E5-2667 (2.90GHz, 2x6 cores)
CPU, and a NVidia Tesla K20m GPU.
@@ -264,7 +276,8 @@ CPU, and a NVidia Tesla K20m GPU.
:align: center
:alt: Benchmark GPU vs CPU
-silx.image.sift can also be run on CPU, even running up to 10 times faster than the C++ implementation.
+*silx.opencl.sift* can also be run on CPU, even running up to 10 times faster
+than the reference C++ implementation.
.. figure:: img/sift_bench_cpu0.png
:align: center
@@ -275,125 +288,134 @@ silx.image.sift can also be run on CPU, even running up to 10 times faster than
SIFT parameters
---------------
-Command line parameters
-.......................
+SiftPlan constructor parameters
+...............................
-When launched from the command line, silx.image.sift can handle several options
+When instantiated, silx.opencl.sift.SiftPlan can take several optional parameters
like the device to run on and the *number of pixels per keypoint*.
-By default ``PIX_PER_KP`` is 10, meaning that we gess one keypoint will be found
-for every 10 pixels.
-This is for buffers allocation on the device, as the number of keypoints that
+By default ``PIX_PER_KP`` is 10, meaning that one keypoint is expected to be found
+for every 10 pixels.
+This estimate is used for buffer allocation on the device, as the number
+of keypoints that
will be found is unknown, and strongly depends on the type of image.
-10 pixels per keypoint is a high estimation, even for images with many features
-like landscapes.
-For example, this 5.8 MPixels image_ gives about 2500 keypoints, which makes
-2270 pixels per keypoints.
+10 pixels per keypoint is a conservative estimation, even for images with many
+features like landscapes.
+For example, a 5.8 MPixel image (of the ESRF) yields about 2500 keypoints, hence
+about 2270 pixels per keypoint.
-.. _image: http://www.lightsources.org/imagebank/image/esr032
-
-If you have big images with few features and the image does not fit on the GPU,
-you can increase ``PIX_PER_KP`` in the command line options in order to
+If one has large images with few features and the image does not fit on the GPU,
+one can increase ``PIX_PER_KP`` in the constructor options in order to
decrease the amount of memory required.
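+
+For instance (``image`` being a placeholder):
+
+.. code-block:: python
+
+    from silx.opencl import sift
+
+    # Expect one keypoint per 50 pixels instead of 10: smaller device buffers
+    sift_ocl = sift.SiftPlan(template=image, devicetype="GPU", PIX_PER_KP=50)
+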
Advanced SIFT parameters
........................
-The file ``param.py`` contains SIFT default parameters, recommended by
-David Lowe in his paper_ or by the authors of the C++ version in ASIFT_.
-You should not modify these values unless you know what you are doing.
-Some parameters require to understand several aspects of the algorithm,
+The file ``param.py`` in the source folder contains SIFT default parameters,
+recommended by David Lowe in his paper_ or by the authors of the C++ version in ASIFT_.
+These values should not be modified unless one is an advanced SIFT user.
+Some parameters require the understanding of several aspects of the algorithm,
explained in Lowe's original paper.
.. _ASIFT: http://www.ipol.im/pub/art/2011/my-asift
-``DoubleImSize`` (0 by default) is for the pre-blur factor of the image.
+``DoubleImSize`` (0 by default) stands for the pre-blur factor of the image.
At the beginning, the original image is blurred (*prior-smoothing*) to eliminate noise.
-The standard deviation of the gaussian filter is either ``1.52`` if DoubleImSize is 0, or ``1.25`` if DoubleImSize is 1.
-Setting this parameter to 1 decrease the prior-smoothing factor, the algorithm will certainly find more keypoints but less accurate.
+The standard deviation of the Gaussian filter is either ``1.52`` (if DoubleImSize is 0),
+or ``1.25`` (if DoubleImSize is 1).
+Setting this parameter to 1 decreases the prior-smoothing factor; the algorithm
+will certainly find more keypoints, but less accurate ones, as they result from
+the noise of the first octave.
``InitSigma`` (1.6 by default) is the prior-smoothing factor.
-The original image is blurred by a gaussian filter which standard deviation is
+The original image is blurred by a Gaussian filter whose standard deviation is
:math:`\sqrt{\text{InitSigma}^2 - c^2}`.
-with ``c == 0.5`` if ``DoubleImSize == 0`` or ``c == 1`` otherwise.
-If the prior-smoothing factor is decreased, the algorithm will certainly find more
-keypoint, but they will be less accurate.
+with ``c = 0.5`` if ``DoubleImSize == 0`` or ``c = 1`` otherwise
+(with the defaults, :math:`\sqrt{1.6^2 - 0.5^2} \simeq 1.52`, the pre-blur value quoted above).
+Once again, if the prior-smoothing factor is decreased, the algorithm will find
+more keypoints in the first octave, located in the noise of the image.
-``BorderDist`` (5 by default) is the minimal distance to borders:
+``BorderDist`` (5 by default) is the minimum distance from a keypoint to the image
+borders.
+Borders tend to create artefacts in the blurring procedure and in the gradient computation:
pixels that are less than ``BorderDist`` pixels from the border will be ignored
for the processing.
-If features are likely to be near the borders, decreasing this parameter will
-enable to detect them.
+If the features of interest are near the borders, decreasing this parameter will
+enable one to detect them, but their descriptors are probably less reliable.
``Scales`` (3 by default) is the number of Difference of Gaussians (DoG) that will
-actually be used for keypoints detection.
-In the gaussian pyramid, Scales+3 blurs are made, from which Scales+2 DoGs are computed.
+actually be used for keypoints detection within an octave.
+In the Gaussian hierarchical pyramid, Scales+3 successively blurred images are
+used to compute Scales+2 DoGs.
The DoGs in the middle are used to detect keypoints in the scale-space.
If ``Scales`` is 3, there will be 6 blurs and 5 DoGs in an octave, and 3 DoGs
will be used for local extrema detection.
-Increasing Scales will make more blurred images in an octave, so SIFT can detect
-a few more strong keypoints.
-However, it will slow down the execution for few additional keypoints.
+Increasing Scales will produce more blurred images in an octave, thus SIFT can
+detect a few more reliable keypoints; however, it will slow down the execution
+for only a few additional keypoints.
``PeakThresh`` (255 * 0.04/3.0 by default) is the grayscale threshold for keypoints
refinement.
-To discard low-contrast keypoints, every pixel which grayscale value is below
-this threshold can not become a keypoint.
+To discard low-contrast keypoints, every pixel whose grayscale value is below
+this threshold cannot become a keypoint.
Decreasing this threshold will lead to a larger number of keypoints, which can
be useful for detecting features in low-contrast areas.
``EdgeThresh`` (0.06 by default) and ``EdgeThresh1`` (0.08 by default) are the
-limit ratio of principal curvatures while testing if keypoints are located on an edge.
+limit ratios of principal curvatures while testing if keypoints are located on an edge.
Those points are not reliable for they are sensitive to noise.
For such points, the principal curvature across the edge is much larger than the
principal curvature along it.
Finding these principal curvatures amounts to solving for the eigenvalues of the
second-order Hessian matrix of the current DoG.
-The ratio of the eigenvalues :math:`r` is compared to a threshold :math:`\dfrac{(r+1)^2}{r} < R`
-with R defined by taking r=10, which gives
-:math:`\frac{(r+1)^2}{r} = 12.1`, and 1/12.1 = 0.08.
+The ratio of the eigenvalues :math:`r` is compared to a threshold
+:math:`\dfrac{(r+1)^2}{r} < R`, with :math:`R` defined by taking :math:`r = 10`,
+which gives :math:`\frac{(r+1)^2}{r} = 12.1`, and :math:`1/12.1 \approx 0.08`.
In the first octave, the value 0.06 is taken instead of 0.08.
-Decreasing these values lead to a larger number of keypoints, but sensivite to
-noise because they are located on edges.
-
-``OriSigma`` (1.5 by default) is related to the radius of gaussian weighting in
-orientation assignment.
-In this stage, for a given keypoint, we look in a region of radius
-:math:`3 \times s \times \text{OriSigma}` with :math:`s` the scale of the current keypoint.
-Increasing it will not lead to increase the number of keypoints found;
-it will take a larger area into account while computing the orientation assignment.
-Thus, the descriptor will be characteristic of a larger neighbourhood.
+Decreasing these values leads to a larger number of keypoints, but these are more
+sensitive to noise because they are located on edges, along which they tend to slide.
+
+``OriSigma`` (1.5 by default) is related to the radius of Gaussian weighting while
+assessing the orientation for a keypoint.
+At this stage, for a given keypoint, we look into a region of radius
+:math:`3 \times s \times \text{OriSigma}`, :math:`s` being the scale of the
+current keypoint.
+Increasing ``OriSigma`` will not lead to increasing the number of keypoints found;
+it will instead take a larger area into account while determining the orientation
+assignment.
+The descriptor will therefore be characteristic of a larger neighbourhood.
``MatchRatio`` (0.73 by default) is the threshold used for image alignment.
Descriptors are compared with a :math:`L^1`-distance.
-For a given descriptor, if the ratio between the closest-neighbor the
-second-closest-neighbor is below this threshold, then a matching is added to the list.
-Increasing this value leads to a larger number of matchings, certainly less accurate.
+For a given descriptor, if the ratio of distances between the closest-neighbour and the
+second-closest-neighbour is below this threshold, then the closest-neighbour
+matches the descriptor and the matching pair is added to the list.
+Increasing this value leads to a larger number of matches, which are certainly
+less accurate.
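+
+A brute-force numpy sketch of this ratio test (``desc1`` and ``desc2`` are
+(n, 128) descriptor arrays, placeholders; the actual matching is performed by
+``match.py`` on the device):
+
+.. code-block:: python
+
+    import numpy
+
+    def ratio_test(desc1, desc2, match_ratio=0.73):
+        """Match descriptors with the L1 distance and Lowe's ratio test."""
+        d1 = desc1.astype(numpy.float32)
+        d2 = desc2.astype(numpy.float32)
+        pairs = []
+        for i, d in enumerate(d1):
+            dist = numpy.abs(d2 - d).sum(axis=1)    # L1 distance to every descriptor
+            closest, second = numpy.argsort(dist)[:2]
+            if dist[closest] < match_ratio * dist[second]:
+                pairs.append((i, closest))
+        return pairs
+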
Region of Interest for image alignment
......................................
-When processing the image matching, a region of interest (ROI) can be specified
-on the image.
-It is a binary image which can have any shape.
-For instance, if a sample is centered on the image, the user can select the
-center of the image before processing.
+When performing the image matching, a region of interest (ROI) can be specified
+in the image.
+The ROI is a binary image which can have any shape.
+For instance, if a sample is centred on the image, the user can select the
+centre of the image before processing it.
.. figure:: img/sift_frame_ROI.png
:align: center
:alt: Sample with region of interest
-It both accelerates the processing and avoids to do match keypoints that are not
-on the sample.
+This both accelerates the processing and avoids trying to match keypoints that
+are not on the sample.
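+
+Building such a binary mask is straightforward with numpy (illustrative only;
+how the mask is passed to the alignment code depends on its API, and the shape
+and radius below are placeholders):
+
+.. code-block:: python
+
+    import numpy
+
+    shape = (2048, 2048)                    # placeholder image shape
+    y, x = numpy.ogrid[:shape[0], :shape[1]]
+    cy, cx = shape[0] / 2.0, shape[1] / 2.0
+    # Circular ROI centred on the image; any shape is allowed
+    roi = ((y - cy) ** 2 + (x - cx) ** 2) < 800 ** 2
+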
References
..........
-- David G. Lowe, Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision, vol. 60, no 2, 2004, p. 91–110 - "http://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf"
+- David G. Lowe, Distinctive image features from scale-invariant keypoints,
+ International Journal of Computer Vision, vol. 60, no 2, 2004, p. 91–110 -
+ "http://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf"