Image Local Feature Descriptors in Augmented Reality

          As computer vision practitioners, we deal almost daily with the information “hidden” in images, turning it into information that is “visible” and useful to our algorithms. In this entry I want to talk about the image matching process. Image matching is the Computer Vision technique of finding enough corresponding patches or strong features between two or more images to state that one image is contained in the other, or that both images show the same scene. Several approaches have been proposed in the literature for this purpose, but we are going to focus on local feature approaches.

     Local feature representations of images are widely used for matching and recognition in the field of computer vision and, lately, also in Augmented Reality applications that overlay augmented information on the real world. Robust feature detectors and descriptors such as SIFT, SURF, FAST, Harris-Affine or GLOH (to name some examples) have become a core component of these kinds of applications. The main idea is to first detect features and then compute a set of descriptors for those features. One important thing to keep in mind is that all these methods will later be ported to mobile devices, where they can become very heavy processes that fail to reach real-time rates. Several techniques have therefore been developed lately so that the chosen feature detection and descriptor extraction methods can be implemented on mobile devices with real-time performance. But that is a further step I do not want to focus on in this entry.

Local Features

           But what is a local feature? A local feature is an image pattern that differs from its immediate neighborhood. This difference is usually associated with a change in some image property; the most commonly considered properties are texture, intensity and color.

          Many methods for detecting such keypoints have been proposed over the years, but we focus only on efficient implementations rather than on simple corner or edge detection, such as the Harris corner detector alone. The reason is simple: plain corner/edge detectors produce many points that are not invariant to lighting, rotation or scale, so on their own they are not useful for our purpose. In this section we describe several feature detectors that were designed with computational efficiency as one of the main objectives, although some of them still do not meet real-time requirements. These approaches are: SIFT, SURF, FAST and STAR.

SIFT (Scale-Invariant Feature Transform)

SURF (Speeded-Up Robust Features)

FAST (Features from Accelerated Segment Test)

STAR (based on CenSurE: Center Surround Extremas)
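As a rough illustration of how detectors in this family work, here is a minimal NumPy sketch of the FAST-9 segment test. It is deliberately simplified: the real detector adds a high-speed pre-test on four circle pixels, machine-learned decision trees and non-maximum suppression, none of which is included here.

```python
import numpy as np

def fast_corner_test(img, y, x, threshold=20, n_contiguous=9):
    """Simplified FAST-9 segment test: (y, x) is a corner candidate if at
    least n_contiguous pixels on the 16-pixel circle of radius 3 around it
    are all brighter or all darker than the centre by more than threshold."""
    # Offsets of the 16 pixels on the radius-3 Bresenham circle, in order.
    circle = [(-3, 0), (-3, 1), (-2, 2), (-1, 3), (0, 3), (1, 3), (2, 2), (3, 1),
              (3, 0), (3, -1), (2, -2), (1, -3), (0, -3), (-1, -3), (-2, -2), (-3, -1)]
    center = int(img[y, x])
    ring = np.array([int(img[y + dy, x + dx]) for dy, dx in circle])
    brighter = ring > center + threshold
    darker = ring < center - threshold
    # Look for a contiguous run of length n_contiguous on the circle
    # (the circle wraps around, so scan the mask doubled).
    for mask in (brighter, darker):
        run = 0
        for v in np.concatenate([mask, mask]):
            run = run + 1 if v else 0
            if run >= n_contiguous:
                return True
    return False

# A bright square on a dark background: the square's corner should fire
# the test, while a pixel in the uniform interior should not.
img = np.zeros((20, 20), dtype=np.uint8)
img[8:, 8:] = 255
print(fast_corner_test(img, 8, 8))    # True: corner of the square
print(fast_corner_test(img, 14, 14))  # False: uniform neighbourhood
```

The segment test is cheap (only pixel comparisons, no gradients or convolutions), which is why FAST variants are popular on mobile hardware.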

A good local feature must be highly distinctive, i.e. have a low probability of a mismatch, and it should also be easy to extract. It should be tolerant to things like:

    • Rotation
    • Changes in illumination
    • Uniform scaling
    • Minor changes in viewing direction


As we have seen, these features should be both repeatable and invariant to various viewing conditions such as lighting, viewpoint and object orientation. Depending on the task to be carried out, some invariances become more important than others.
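To see why tolerance to illumination matters, here is a small NumPy sketch (not taken from any of the methods above) showing one simple way to obtain it: normalising a patch to zero mean and unit norm makes its representation insensitive to a uniform brightness change, whereas comparing raw pixels does not.

```python
import numpy as np

rng = np.random.default_rng(1)
patch = rng.integers(0, 200, size=(8, 8)).astype(np.float64)
brighter = patch + 40.0  # same patch under stronger uniform illumination

def normalise(v):
    """Zero-mean, unit-norm vector: removes additive brightness offsets."""
    v = v - v.mean()
    return v / np.linalg.norm(v)

raw, raw_bright = patch.ravel(), brighter.ravel()
print(np.allclose(raw, raw_bright))                        # False: raw pixels differ
print(np.allclose(normalise(raw), normalise(raw_bright)))  # True: normalised views agree
```

Real descriptors achieve such invariances with more sophisticated machinery (gradient orientations, scale-space extrema), but the principle is the same: describe the patch in a way that cancels the nuisance variation.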

Feature Descriptors

    Once the feature keypoints have been detected in the scene, each one must be described by a vector containing information about the detected patch; the contents of this vector differ depending on the approach selected. Many different descriptors have been proposed in the literature in recent years. However, it is unclear which descriptors are most appropriate and how their performance depends on the interest point detector. Descriptors should be distinctive and, at the same time, robust both to changes in viewing conditions and to errors of the point detector.

SIFT (Scale-Invariant Feature Transform)

SURF (Speeded-Up Robust Features)

FERNS (Tracking by Classification)

BRIEF (Binary Robust Independent Elementary Features)
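As a rough illustration of the BRIEF idea, the sketch below builds a binary descriptor from pixel-pair intensity comparisons inside a patch and matches descriptors with the Hamming distance. The random pair layout is a simplifying assumption of this sketch: the original method smooths the patch first and uses a fixed, carefully sampled comparison pattern.

```python
import numpy as np

rng = np.random.default_rng(0)

PATCH = 31   # patch side length
N_BITS = 256 # descriptor length in bits

# Fixed set of pixel pairs (y1, x1, y2, x2) inside the patch; bit i is 1
# when the first pixel of pair i is brighter than the second.
pairs = rng.integers(0, PATCH, size=(N_BITS, 4))

def brief_descriptor(img, y, x):
    """Binary descriptor for the PATCH x PATCH patch centred at (y, x)."""
    r = PATCH // 2
    patch = img[y - r:y + r + 1, x - r:x + r + 1]
    return (patch[pairs[:, 0], pairs[:, 1]] > patch[pairs[:, 2], pairs[:, 3]]).astype(np.uint8)

def hamming(d1, d2):
    """Number of differing bits: the natural distance for binary descriptors."""
    return int(np.count_nonzero(d1 != d2))

img = rng.integers(0, 256, size=(64, 64)).astype(np.uint8)
d1 = brief_descriptor(img, 20, 20)
d2 = brief_descriptor(img, 20, 20)  # same patch again
d3 = brief_descriptor(img, 40, 40)  # unrelated patch
print(hamming(d1, d2))  # 0: identical patches match perfectly
print(hamming(d1, d3))  # large: unrelated patches disagree on many bits
```

Binary descriptors like this are attractive on mobile hardware because the Hamming distance reduces to XOR plus a population count, which is far cheaper than the Euclidean distances used for SIFT or SURF vectors.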

      To migrate the work carried out on desktop to mobile devices, we are only interested in approaches that provide robust feature descriptors for real-time object recognition and tracking applications. SIFT and SURF are known to be strong but computationally expensive descriptors, while Ferns classification is fast but requires large amounts of memory. This renders their original designs unsuitable for mobile phones, so they have to be adapted if we want them to work on such devices. How they are adapted will depend on the final requirements of the developer and the hardware available at the time.

      This has been a very coarse overview of a deep and fascinating field, and this entry is only the tip of the iceberg of image matching, but I really hope that after reading it you have a better understanding of the world of feature descriptors.


Do not forget to check out our AR Browser and Image Matching SDKs.
