Scale Invariant Feature Transform (SIFT) is an image descriptor for image-based matching and recognition developed by David Lowe. Like other local descriptors, it is used for a wide range of computer vision tasks that involve point matching for object recognition. The SIFT descriptor is invariant to geometric transformations of the image domain such as translation, rotation, and scaling, and it is also robust to moderate perspective transformations and variations in illumination. It has been experimentally shown to be useful and effective in practice for object recognition and image matching under real-world conditions.
SIFT comprises a method for detecting interest points in a grey-level image, where statistics of local gradient directions of image intensities are accumulated to give a summarizing description of the local image structure in a neighborhood around each interest point. This descriptor is then used to match corresponding interest points between different images. The SIFT descriptor has later been extended from grey-level to color images.
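In practice, this detect-and-describe pipeline is available off the shelf; the snippet below is a minimal sketch using OpenCV's SIFT implementation, assuming OpenCV 4.4 or later (where SIFT lives in the main module) and a placeholder image path:

```python
import cv2

# Load the image in grey-level form, since the original SIFT
# operates on image intensities ("scene.jpg" is a placeholder path).
gray = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)

# Detect interest points and compute their descriptors in one call.
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(gray, None)

# Each key point carries a location, scale and orientation; each
# descriptor is a 128-dimensional vector of local gradient statistics.
print(len(keypoints), descriptors.shape)
```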
The SIFT algorithm uses the Difference of Gaussians (DoG) as an approximation of the Laplacian of Gaussian (LoG), which is computationally expensive to evaluate directly. The Difference of Gaussians is obtained as the difference between two Gaussian blurrings of an image with different values of σ, where σ acts as a scale parameter.
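A minimal sketch of this computation, assuming NumPy and OpenCV and an illustrative σ pair (the base σ and the ratio k between the two scales are free parameters; σ = 1.6 and k = √2 are common choices):

```python
import cv2
import numpy as np

def difference_of_gaussians(gray, sigma=1.6, k=np.sqrt(2)):
    """Approximate the Laplacian of Gaussian at scale sigma by
    subtracting two Gaussian-blurred copies of the same image."""
    img = gray.astype(np.float32)
    # ksize=(0, 0) lets OpenCV derive the kernel size from sigma.
    blur_lo = cv2.GaussianBlur(img, (0, 0), sigma)
    blur_hi = cv2.GaussianBlur(img, (0, 0), k * sigma)
    return blur_hi - blur_lo
```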
Once the DoG images are computed, they are searched for local extrema over scale and space: each pixel is compared with its 8 neighbors in the same scale as well as 9 pixels in the next scale and 9 pixels in the previous scale. If it is a local extremum, it is a potential key point. This process is repeated over the different octaves of the image in the Gaussian pyramid, as shown in Figure 2.12. An image pyramid is a series of images, each obtained by downsampling (scaling down by a certain factor) the previous one.
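The 26-neighbor comparison can be sketched as follows, assuming three adjacent DoG images of the same octave as NumPy arrays; border handling and the octave loop are omitted for brevity:

```python
import numpy as np

def is_scale_space_extremum(dog_prev, dog_curr, dog_next, r, c):
    """Check whether pixel (r, c) of the current DoG image is an
    extremum of its 26 neighbors: 8 in the same scale plus 9 in
    each of the two adjacent scales (a simplified check)."""
    cube = np.stack([dog_prev[r-1:r+2, c-1:c+2],
                     dog_curr[r-1:r+2, c-1:c+2],
                     dog_next[r-1:r+2, c-1:c+2]])
    center = dog_curr[r, c]
    return center >= cube.max() or center <= cube.min()
```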
The next step is key point localization. Once the potential key point locations are found, they have to be refined to obtain a more accurate estimate of the extremum location, and a contrast threshold is applied: if the intensity at an extremum is below this threshold, the key point is rejected.

Orientation then has to be taken into account: an orientation is assigned to each key point to achieve invariance to image rotation. A neighborhood is taken around the key point location, with a size that depends on the scale, and the gradient magnitude and direction are calculated in this region and accumulated into an orientation histogram. To find the dominant orientation, peaks are detected in this histogram. If there is more than one dominant orientation around the interest point, multiple peaks are accepted when the height of a secondary peak is above 80% of the height of the highest peak, and in that case each accepted peak is used to compute a separate image descriptor for the corresponding orientation estimate.
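The orientation assignment can be illustrated with the simplified sketch below: gradients are taken from a fixed square intensity patch around the key point, a 36-bin histogram is assumed, and the Gaussian weighting and peak interpolation of the full algorithm are omitted:

```python
import numpy as np

def dominant_orientations(patch, n_bins=36):
    """Accumulate gradient directions over a patch into a histogram
    and return every orientation within 80% of the strongest peak."""
    gy, gx = np.gradient(patch.astype(np.float32))
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 360.0

    hist, _ = np.histogram(angle, bins=n_bins, range=(0.0, 360.0),
                           weights=magnitude)

    # Accept every bin within 80% of the highest one (a simplification
    # of true local-peak detection); each yields its own orientation.
    bin_width = 360.0 / n_bins
    peaks = np.flatnonzero(hist >= 0.8 * hist.max())
    return [(b + 0.5) * bin_width for b in peaks]
```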
To create the key point descriptor, a neighborhood around the key point is taken and divided into sub-blocks, and for each sub-block an orientation histogram is created; the concatenated histograms form the descriptor vector. Key points between two images are then matched by identifying their nearest neighbors in descriptor space.
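A minimal matching sketch using OpenCV's brute-force matcher; the 0.75 threshold is an illustrative value for Lowe's ratio test, which rejects a match when its nearest neighbor is not clearly closer than the second nearest ("img1.jpg" and "img2.jpg" are placeholder paths):

```python
import cv2

# Detect key points and compute SIFT descriptors for both images.
sift = cv2.SIFT_create()
g1 = cv2.imread("img1.jpg", cv2.IMREAD_GRAYSCALE)
g2 = cv2.imread("img2.jpg", cv2.IMREAD_GRAYSCALE)
kp1, des1 = sift.detectAndCompute(g1, None)
kp2, des2 = sift.detectAndCompute(g2, None)

# Brute-force nearest-neighbor search in descriptor space under the
# L2 norm, returning the two closest candidates per query descriptor.
matcher = cv2.BFMatcher(cv2.NORM_L2)
candidates = matcher.knnMatch(des1, des2, k=2)

# Ratio test: keep a match only when the best candidate is clearly
# closer than the runner-up.
good = [m for m, n in candidates if m.distance < 0.75 * n.distance]
print(f"{len(good)} of {len(candidates)} tentative matches kept")
```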