Posts Tagged ‘Computer Vision’

How to solve the Image Distortion Problem

Posted on: No Comments

      Theoretically, it is possible to define lens which will not introduce distortions. In practice, however, no lens is perfect. This is mainly due to manufacturing factors; it is much easier to make a “spherical” lens than to make a more mathematically ideal “parabolic” lens. It is also difficult to mechanically align the lens and imager exactly. Here we describe the two main lens distortions and how to model them. Radial distortions arise as a result of the shape of lens, whereas tangential distortions arise from the assembly process of the camera as a whole.

      We start with the radial distortion. The lenses of real cameras often noticeably distort the location of pixels near the edges of the imager. This bulging phenomenon is the source of the “barrel” or “fish-eye” effect. Figure1 gives some intuition as to why radial distortion occurs. With some lenses, rays farther from the center of the lens are bent more than those closer in. A typical inexpensive lens is, in effect, stronger than it ought to be as you get farther from the center. Barrel distortion is particularly noticeable in cheap web cameras but less apparent in high-end cameras, where a lot of effort is put into fancy lens systems that minimize radial distortion.

Figure 1.

        For radial distortions, the distortion is 0 at the (optical) center of the imager and increases as we move toward the periphery. In practice, this distortion is small and can be characterized by the first few terms of a Taylor series expansion around r = 0. For cheaper web cameras, we generally use the first two such terms; the first of which is conventionally called k1 and the second k2. For highly distorted cameras such as fish-eye lenses we can use a third radial distortion term k3.

In general, the radial location of a point on the image will be rescaled according to the following equations:

Here, (x, y) is the original location (on the imager) of the distorted point and (xcorrected, ycorrected) is the new location as a result of the correction.

The second largest common distortion is known as tangential distortion. This distortion is due to manufacturing defects resulting from the lens not being exactly parallel to the imaging plane.

Tangential distortion is minimally characterized by two additional parameters, p1 and p2, such that:

Thus, in total five (or six in some cases) distortion coefficients are going to be required.

Distortion example

      To ilustrate these theoretical points, I am going to show a couple of images taken by the same camera. The first of these images will show the picture with the distortion effect, whereas the second one will show the result of applying “undistortion” functions.

Figure 2. The image on the left shows a distorted image, while the right image shows an image in which the distortion has been corrected through the methods explained.


Image Locals Features Descriptors in Augmented Reality

Posted on: 1 Comment

          As computer vision experts, we have to handle almost daily with the information “hidden” on images in order to generate information “visible” and useful for our algorithms. In this entry I want to talk about the Image Matching process. The image matching is the technique used in Computer Vision to find enough patches or strong features in a couple – or more- images in order to be able to state that one of these images is contained on the other one, or that both images are the same image. For this purpose several approaches have been proposed in the literature, but we are going to focus on local features approaches.

     Local Feature representation of images are widely used for matching and recognition in the field of computer vision and, lately also used in  Augmented Reality applications by adding any augmented information on the real world. Robust feature descriptors such as SIFT, SURF, FAST, Harris-Affine or GLOH (to name some examples) have become a core component in those kind of applications. The main idea about it is first detect features and then compute a set of descriptors for these features. One important thing to keep in mind is that all these methods will be ported to mobile devices later, and they could drive to very heavy processes, not reaching real-time rates. Thus, several techniques are lately developed so that the features detection and descriptors extraction methods selected can be implemented in mobiles devices with a real-time performance. But this is another step I do not want to focus on in this entry.

Local Features

           But what is a local feature? A local feature is an image pattern which differs from its immediate neighborhood. This difference can be associated with a change of an image property, which the commonly considered are texture, intensity and color.


Computer Vision for Augmented Reality

Posted on: 1 Comment

Augmented Reality (AR) is a exponentially growing area in the so-called Virtual Environment (VE) field. While VE, or Virtual Reality (VR), provides a complete immersive experience into a fully synthetic scenario, where the user cannot see and interact with the real surroundings, AR allows to see the real world, placing virtual objects or superimposing virtual information. In other words, AR does not substitute reality, but integrates and supplements it. It takes the real objects as a foundation to add contextual information helping the user to deep his understanding about the subject. The potential application domains of this technology are vast, including medical, education and training, manufacturing and repair, annotation and visualization, path planning, gaming and entertainment, military. The related market is exploding: according recent studies, the installed base of AR-capable mobile device has grown from 8 million in 2009 to more than 100 million in 2010, producing a global revenue that is estimated to reach $1.5 billion by 2015.

Figure 1. Different AR scenarios

But, besides the final AR tangible (or visible) results, and market predictions, one can ask: what is the enabling technology that effectively allows augmenting our reality? In other words, beyond the virtual information rendered, how can the device, or the application, be aware of the world and select the appropriate content to present to the user?

From a generic point of view, this task is far from being trivial: in fact, while for a human being the surrounding understanding is somehow unconsciously and easily reached in almost all the scenarios in fractions of second, for a computer-based machine the things are way more complicated. What is done in the practice is to constrain the scenario and sense the world status through multi-modal sensors (i.e., images, videos, sounds, inertial sensors): the discrete information flow is then fused, merged, and processed by so-called Artificial Intelligence (AI) algorithms that try to give a plausible explanation to the provided data. 

Very close to AI, often intersecting and relying on it, and strongly related to AR application development is the Computer Vision. In fact, since the main cue for the actual AR systems is the artificial vision, this field has gained increasing importance in the AR context. As AI aims at the surroundings understanding relying on generic low-level sensor data, Computer Vision is concerned with duplicating, or emulating the capabilities of the Human Vision System (HVS) by reconstructing, and interpreting, a 3D scene only from its 2D projections, or images. Although it may seem simple, any task related to Computer Vision can become arbitrarily complex: this is due to the intrinsic nature of the problem, so-called ill-posed inverse. In other words, a 3D scene understanding has to be reached from its 2D projections, in fact losing one spatial dimension. Furthermore, given the AR interactivity constraints, the tasks have to be performed in real-time, or near real-time.

Figure 2. Typical Computer VIsion processing pipeline