General introduction to sift.
===================================

silx.image.sift, a parallel version of SIFT algorithm
--------------------------------------------------------

SIFT (Scale-Invariant Feature Transform) is an algorithm developed by David Lowe in 1999.
It is a worldwide reference for image alignment and object recognition.
The robustness of this method makes it possible to detect features at different scales and angles, and under different illuminations of a scene.
The implementation available in silx uses OpenCL, meaning that it can run on Graphics Processing Units as well as on Central Processing Units.

Interest points are detected in the image, then data structures called *descriptors* are built to be characteristic of the scene, so that two different images of the same scene have similar descriptors.
They are robust to transformations like translation, rotation, rescaling and illumination change, which makes SIFT interesting for image stitching.
In the first stage, descriptors are computed from the input images. Then, they are compared to determine the geometric transformation to apply in order to align the images.
silx.image.sift can run on most graphics cards and CPUs, making it usable on many setups.
OpenCL processes are handled from Python with PyOpenCL, a module to access the OpenCL parallel computation API.

Introduction
----------------

The European Synchrotron Radiation Facility (ESRF) beamline ID21 developed a full-field method for X-ray absorption near-edge spectroscopy (XANES).
Since the flat-field images are not acquired simultaneously with the sample transmission images, a realignment procedure has to be performed.
The serial SIFT implementation used to take about 8 seconds per frame, and one stack can have up to 500 frames.
This is a bottleneck in the global process, therefore a parallel version had to be implemented.
silx.image.sift differs from existing parallel implementations of SIFT in that the whole process is done on the device, enabling crucial speed-ups.

Launching silx.image.sift
----------------------------

silx.image.sift is written in Python and handles the OpenCL kernels with PyOpenCL.
This enables simple and efficient access to GPU resources.
The project is installed as a Python library and can be imported in a script.

Before image alignment, points of interest (keypoints) have to be detected in each image.
The whole process can be launched with a few lines of code.

How to use it
................

silx.image.sift is installed as part of silx and requires PyOpenCL to be usable.
It provides a module that can be imported, then used to compute a list of descriptors from an image.
The image can be in RGB values, but the whole process is done on grayscale values.
One can specify the devicetype, either CPU or GPU.

.. Although being integrated in the ESRF EDNA framework for online image alignment, and thus mostly used by developers, silx.image.sift provides example scripts.

   .. code-block:: bash

      #TODO: update
      python test/demo.py --type=GPU my_image.jpg

   This computes and shows the keypoints on the input image.

One can also launch silx.image.sift interactively with IPython:

.. code-block:: python

    from silx.image import sift
    import numpy
    import scipy.misc
    # scipy.misc.imread was removed in SciPy >= 1.2; imageio.imread is a drop-in replacement
    image_rgb = scipy.misc.imread("my_image.jpg")
    sift_ocl = sift.SiftPlan(template=image_rgb, devicetype="GPU")
    kp = sift_ocl.keypoints(image_rgb)
    print(kp[-1])

silx.image.sift files
........................

The Python sources are in the silx.image.sift module:

.. code-block:: python

    from silx.image import sift
    print(sift.__file__)

The file plan.py corresponds to the keypoint extraction and handles the whole process, from kernel compilation to descriptor generation as a numpy array.
The OpenCL kernels are distributed as *resources* in the "openCL" folder; they are compiled on the fly.
Several kernels have multiple implementations, depending on the architecture they run on.

The file match.py does the matching between two lists of keypoints returned by plan.py.

The file alignment.py does the image alignment: it computes the keypoints of two images (plan.py), then uses the matching result (match.py) to find the transformation aligning the second image on the first.

Each of these modules contains a class which holds the GPU context, memory and kernels.
They are expensive to instantiate and should be re-used as much as possible.
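
The following sketch shows how these classes are meant to be chained and re-used. It is a minimal example, not a definitive recipe: the class names ``MatchPlan`` and ``LinearAlign`` are assumed here to correspond to match.py and alignment.py, and the file names are illustrative; check the API reference of your silx version for the exact names and signatures.

.. code-block:: python

    from silx.image import sift
    import scipy.misc

    image1 = scipy.misc.imread("frame_000.jpg")  # illustrative file names
    image2 = scipy.misc.imread("frame_001.jpg")

    # One plan per image shape/dtype: it keeps the OpenCL context, buffers
    # and compiled kernels alive, so it should be re-used for every frame.
    sift_ocl = sift.SiftPlan(template=image1, devicetype="GPU")
    kp1 = sift_ocl.keypoints(image1)
    kp2 = sift_ocl.keypoints(image2)

    # Pair the two keypoint lists (class implemented in match.py, name assumed)
    match_plan = sift.MatchPlan()
    matching = match_plan.match(kp1, kp2)
    print("Matched keypoint pairs: %d" % len(matching))

    # Align the second image on the first (class implemented in alignment.py, name assumed)
    aligner = sift.LinearAlign(image1, devicetype="GPU")
    aligned = aligner.align(image2)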

Overall process
******************

The different steps of SIFT are handled by plan.py.
When launched, it automatically chooses the best device to run on, unless a device is explicitly provided in the options.
All the OpenCL kernels that can be compiled are built on the fly.
Buffers are pre-allocated on the device, and all the steps are executed on the device (GPU).
At each *octave* (scale level), the keypoints are returned to the CPU and the buffers are re-used.

Once the keypoints are computed, the keypoints of two different images can be compared. This matching is done by match.py.
It simply takes the descriptors of the two lists of keypoints and compares them with an :math:`L^1` distance.
It returns a vector of *matchings*, i.e. pairs of similar keypoints.

For image alignment, alignment.py takes the matching vector between two images and determines the transformation to apply in order to align the second image on the first.

SIFT keypoints computation
-----------------------------

The keypoints are detected in several steps according to Lowe's paper_ :

.. _paper: http://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf

* Keypoint detection: local extrema are detected in the *scale-space* :math:`(x, y, s)`. Every pixel is compared to its neighborhood in the image itself, and in the previous/next scale-factor images.
* Keypoint refinement: keypoints located on edges are discarded. Additionally, a second-order interpolation is done to improve the keypoint accuracy, modifying the coordinates :math:`(x, y, s)`.
* Orientation assignment: a characteristic orientation :math:`\theta` is assigned to each keypoint :math:`(x, y, s, \theta)`.
* Descriptor computation: a histogram of orientations is built around every keypoint, then concatenated into a 128-value vector. This vector, called the *SIFT descriptor*, is robust to rotation, illumination, translation and scaling.

The scale variation is simulated by blurring the image: a very blurred image represents a scene seen from far away, where small details are not visible.

Unlike existing parallel versions of SIFT, the entire process is done on the device to avoid time-consuming transfers between CPU and GPU.
This leads to several tricky parts, like the use of atomic instructions, or writing different versions of the same kernel to adapt to every platform.

Keypoints detection
......................

The image is increasingly blurred to imitate the scale variations. This is done by convolving the image with a Gaussian kernel.
Then, consecutive blurs are subtracted to get *differences of Gaussians (DoG)*.
In these DoG, every pixel is tested.
Let :math:`(x, y)` be the pixel position in the current (blurred) image, and :math:`s` its *scale* (that is, the blur factor). The point :math:`(x, y, s)` is a local maximum in the scale-space if

* :math:`D(x-1, y, s) < D(x, y, s)` and :math:`D(x, y, s) > D(x+1, y, s)` (local maximum in :math:`x`)
* :math:`D(x, y-1, s) < D(x, y, s)` and :math:`D(x, y, s) > D(x, y+1, s)` (local maximum in :math:`y`)
* :math:`D(x, y, s-1) < D(x, y, s)` and :math:`D(x, y, s) > D(x, y, s+1)` (local maximum in :math:`s`)

.. figure:: img/sift_dog1.png
   :align: center
   :alt: detection in scale-space

For these steps, we highly benefit from the parallelism: every pixel is handled by a GPU thread.
Besides, the convolution is implemented in the direct space (without Fourier transform) and is quite fast (50 times faster than the convolutions done by the C++ reference implementation).
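
The conditions above can be illustrated with a small NumPy sketch, independent of the OpenCL kernels actually used by silx (the device code works per octave and includes additional tests; this only encodes the axis-wise comparisons listed above):

.. code-block:: python

    import numpy
    from scipy.ndimage import gaussian_filter

    def is_scale_space_maximum(dog, x, y, s):
        """Return True if dog[s, y, x] is a local maximum of the DoG stack
        along the x, y and s directions (borders are not handled here)."""
        v = dog[s, y, x]
        return (dog[s, y, x - 1] < v > dog[s, y, x + 1] and   # maximum in x
                dog[s, y - 1, x] < v > dog[s, y + 1, x] and   # maximum in y
                dog[s - 1, y, x] < v > dog[s + 1, y, x])      # maximum in s

    # Build a toy DoG stack from increasingly blurred versions of an image;
    # gaussian_filter plays the role of the OpenCL convolution kernel here.
    image = numpy.random.random((64, 64)).astype(numpy.float32)
    sigmas = [1.6 * 2 ** (i / 3.0) for i in range(5)]
    blurs = numpy.array([gaussian_filter(image, sigma) for sigma in sigmas])
    dog = blurs[1:] - blurs[:-1]
    print(is_scale_space_maximum(dog, x=32, y=32, s=1))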

Keypoints refinement
.......................

At this stage, many keypoints are not reliable. Low-contrast keypoints are discarded, and keypoints located on an edge are rejected as well.
For keypoints located on an edge, the principal curvature across the edge is much larger than the principal curvature along it.
Finding these principal curvatures amounts to solving for the eigenvalues of the second-order Hessian matrix of the current DoG.
The ratio of the eigenvalues :math:`r` is compared to a threshold: the keypoint is kept if :math:`\dfrac{(r+1)^2}{r} < R`, with :math:`R` defined by taking :math:`r = 10`.

To improve the keypoint accuracy, the coordinates are interpolated with a second-order Taylor expansion.

.. math::

   D \left( \vec{x} + \vec{\delta_x} \right) \simeq D + \dfrac{\partial D}{\partial \vec{x}} \cdot \vec{\delta_x} + \dfrac{1}{2} \left( \vec{\delta_x} \right)^T \cdot \left( H \right) \cdot \vec{\delta_x} \qquad \text{with } H = \dfrac{\partial^2 D}{\partial \vec{x}^2}

Keypoints that are too far from a *true* (interpolated) extremum are rejected.

Orientation assignment
.........................

An orientation has to be assigned to each keypoint so that SIFT descriptors are invariant to rotation.
For each blurred version of the image, the gradient magnitude and orientation are computed.
From the neighborhood of a keypoint, a histogram of orientations is built (36 bins, 1 bin per 10 degrees).

.. figure:: img/sift_orientation.png
   :align: center
   :alt: orientation assignment

The maximum value of this histogram is the dominant orientation; it is defined as the characteristic orientation of the keypoint.
Additionally, every peak greater than 80% of the maximum generates a new keypoint with a different orientation.

The parallel implementation of this step is complex, and the performance strongly depends on the graphics card the program is running on.
That is why there are different files for this kernel, adapted to different platforms. The file to compile is automatically determined in plan.py.

Descriptor computation
.........................

A histogram of orientations is built around every keypoint.
The neighborhood is divided into a 4x4 grid of sub-regions of 4x4 pixels each.
In every sub-region, an 8-bin histogram is computed; then, all the histograms are concatenated into a 128-value descriptor.
The histogram is weighted by the gradient magnitudes and the current scale factor, so that the descriptor is robust to rotation, illumination, translation and scaling.
Here again, there are several files adapted to different platforms.
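
The 128 values thus come from 4 x 4 sub-regions with 8 orientation bins each. Purely as an illustration of this layout (this is not the silx OpenCL kernel: the real descriptor also includes the weighting described above and the rotation to the keypoint orientation), a descriptor could be assembled from a 16x16 neighborhood as follows:

.. code-block:: python

    import numpy

    def descriptor_layout(magnitude, orientation):
        """Concatenate 8-bin orientation histograms over a 4x4 grid of
        4x4-pixel sub-regions: 4 * 4 * 8 = 128 values.

        magnitude and orientation are 16x16 arrays (gradient magnitude and
        gradient orientation in radians) around one keypoint."""
        descriptor = numpy.zeros(128, dtype=numpy.float32)
        bins = ((orientation % (2 * numpy.pi)) / (2 * numpy.pi) * 8).astype(int) % 8
        for i in range(4):          # sub-region row
            for j in range(4):      # sub-region column
                sub_bins = bins[4 * i:4 * i + 4, 4 * j:4 * j + 4].ravel()
                sub_mag = magnitude[4 * i:4 * i + 4, 4 * j:4 * j + 4].ravel()
                hist = numpy.bincount(sub_bins, weights=sub_mag, minlength=8)
                offset = (4 * i + j) * 8
                descriptor[offset:offset + 8] = hist
        return descriptor

    magnitude = numpy.random.random((16, 16))
    orientation = numpy.random.uniform(0, 2 * numpy.pi, (16, 16))
    print(descriptor_layout(magnitude, orientation).shape)   # (128,)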

Image matching and alignment
-------------------------------

Matching, which is also covered in this tutorial, is performed once the keypoints of both images have been computed.

.. figure:: img/sift_match1.png
   :align: center
   :alt: Example of image matching for pattern recognition

.. figure:: img/sift_match2.png
   :align: center
   :alt: Another example of image matching for pattern recognition

Performances
---------------

The aim of silx.image.sift is to speed up the SIFT keypoint extraction by running it on a GPU.
On big images with many keypoints, it provides a speed-up between 30 and 50 times.
The following benchmark was done on an Intel Xeon E5-2667 (2.90 GHz, 2x6 cores) CPU and an NVidia Tesla K20m GPU.

.. figure:: img/sift_bench_gpu0.png
   :align: center
   :alt: Benchmark GPU vs CPU

silx.image.sift can also be run on a CPU, where it runs up to 10 times faster than the C++ implementation.

.. figure:: img/sift_bench_cpu0.png
   :align: center
   :alt: Benchmark on CPU: OpenCL implementation vs C++ implementation

SIFT parameters
------------------

Command line parameters
..........................

When launched from the command line, silx.image.sift can handle several options, like the device to run on and the *number of pixels per keypoint*.
By default PIX_PER_KP is 10, meaning that we guess one keypoint will be found for every 10 pixels.
This value is used for buffer allocation on the device, as the number of keypoints that will be found is unknown and strongly depends on the type of image.
10 pixels per keypoint is a generous estimate, even for images with many features like landscapes: for example, this 5.8 MPixel image_ gives about 2500 keypoints, which makes 2270 pixels per keypoint.

.. _image: http://www.lightsources.org/imagebank/image/esr032

If you have big images with few features and the image does not fit on the GPU, you can increase PIX_PER_KP in the command line options in order to decrease the amount of memory required.
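
The same trade-off applies from the Python API, assuming your silx version exposes the pixel-per-keypoint ratio as a ``PIX_PER_KP`` keyword of ``SiftPlan`` (check the API reference; the keyword name and the file name below are illustrative):

.. code-block:: python

    from silx.image import sift
    import scipy.misc

    image = scipy.misc.imread("large_image_with_few_features.jpg")

    # Reserve one keypoint slot per 50 pixels instead of one per 10:
    # smaller keypoint buffers, hence less device memory required.
    sift_ocl = sift.SiftPlan(template=image, devicetype="GPU", PIX_PER_KP=50)
    kp = sift_ocl.keypoints(image)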

Advanced SIFT parameters
...........................

The file param.py contains the SIFT default parameters, recommended by David Lowe in his paper_ or by the authors of the C++ version in ASIFT_.
You should not modify these values unless you know what you are doing.
Some parameters require understanding several aspects of the algorithm, explained in Lowe's original paper.

.. _ASIFT: http://www.ipol.im/pub/art/2011/my-asift

DoubleImSize (0 by default) controls the pre-blur factor of the image.
At the beginning, the original image is blurred (*prior-smoothing*) to eliminate noise.
The standard deviation of the Gaussian filter is either 1.52 if DoubleImSize is 0, or 1.25 if DoubleImSize is 1.
Setting this parameter to 1 decreases the prior-smoothing factor; the algorithm will certainly find more keypoints, but they will be less accurate.

InitSigma (1.6 by default) is the prior-smoothing factor.
The original image is blurred by a Gaussian filter whose standard deviation is :math:`\sqrt{\text{InitSigma}^2 - c^2}`, with :math:`c = 0.5` if DoubleImSize is 0, or :math:`c = 1` otherwise.
If the prior-smoothing factor is decreased, the algorithm will certainly find more keypoints, but they will be less accurate.

BorderDist (5 by default) is the minimal distance to the borders: pixels closer than BorderDist pixels to a border are ignored during the processing.
If features are likely to be near the borders, decreasing this parameter makes it possible to detect them.

Scales (3 by default) is the number of Differences of Gaussians (DoG) that will actually be used for keypoint detection.
In the Gaussian pyramid, Scales+3 blurs are made, from which Scales+2 DoGs are computed.
The DoGs in the middle are used to detect keypoints in the scale-space: if Scales is 3, there will be 6 blurs and 5 DoGs in an octave, and 3 DoGs will be used for local extrema detection.
Increasing Scales yields more blurred images per octave, so SIFT can detect a few more strong keypoints.
However, it will slow down the execution for only a few additional keypoints.

PeakThresh (255 * 0.04/3.0 by default) is the grayscale threshold for keypoint refinement.
To discard low-contrast keypoints, every pixel whose grayscale value is below this threshold cannot become a keypoint.
Decreasing this threshold leads to a larger number of keypoints, which can be useful for detecting features in low-contrast areas.

EdgeThresh (0.06 by default) and EdgeThresh1 (0.08 by default) are the limit ratios of principal curvatures used when testing whether keypoints are located on an edge.
Such points are not reliable, for they are sensitive to noise: for them, the principal curvature across the edge is much larger than the principal curvature along it.
Finding these principal curvatures amounts to solving for the eigenvalues of the second-order Hessian matrix of the current DoG.
The ratio of the eigenvalues :math:`r` is compared to a threshold :math:`\dfrac{(r+1)^2}{r} < R`, with :math:`R` defined by taking :math:`r = 10`, which gives :math:`\frac{(r+1)^2}{r} = 12.1` and :math:`1/12.1 \simeq 0.08`.
In the first octave, the value 0.06 is used instead of 0.08.
Decreasing these values leads to a larger number of keypoints, but they are sensitive to noise because they are located on edges.

OriSigma (1.5 by default) is related to the radius of the Gaussian weighting in the orientation assignment.
In this stage, for a given keypoint, we look in a region of radius :math:`3 \times s \times \text{OriSigma}`, with :math:`s` the scale of the current keypoint.
Increasing it will not increase the number of keypoints found; it takes a larger area into account when computing the orientation assignment, so the descriptor will be characteristic of a larger neighbourhood.

MatchRatio (0.73 by default) is the threshold used for image alignment.
Descriptors are compared with an :math:`L^1` distance.
For a given descriptor, if the ratio between the distance to the closest neighbor and the distance to the second-closest neighbor is below this threshold, then a matching is added to the list.
Increasing this value leads to a larger number of matchings, but they are certainly less accurate.

Region of Interest for image alignment
..........................................

When processing the image matching, a region of interest (ROI) can be specified on the image.
It is a binary image which can have any shape.
For instance, if a sample is centered on the image, the user can select the center of the image before processing.

.. figure:: img/sift_frame_ROI.png
   :align: center
   :alt: Sample with region of interest

This both accelerates the processing and avoids matching keypoints that are not on the sample.

References
.............

- David G. Lowe, Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision, vol. 60, no. 2, 2004, pp. 91–110
- http://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf