Computer Vision for Augmented Reality

Augmented Reality (AR) is a rapidly growing area within the so-called Virtual Environment (VE) field. While VE, or Virtual Reality (VR), immerses the user completely in a fully synthetic scenario, where the real surroundings can be neither seen nor interacted with, AR lets the user see the real world while placing virtual objects in it or superimposing virtual information on it. In other words, AR does not replace reality, but integrates and supplements it. It takes real objects as a foundation and adds contextual information that helps the user deepen their understanding of the subject. The potential application domains of this technology are vast, including medicine, education and training, manufacturing and repair, annotation and visualization, path planning, gaming and entertainment, and the military. The related market is exploding: according to recent studies, the installed base of AR-capable mobile devices grew from 8 million in 2009 to more than 100 million in 2010, producing global revenue that is estimated to reach $1.5 billion by 2015.

Figure 1. Different AR scenarios

But beyond the tangible (or visible) results of AR and the market predictions, one can ask: what is the enabling technology that actually allows us to augment our reality? In other words, beyond the virtual information being rendered, how can the device, or the application, be aware of the world and select the appropriate content to present to the user?

In general, this task is far from trivial: while a human being understands the surroundings almost unconsciously and effortlessly, in nearly any scenario and within fractions of a second, things are far more complicated for a computer. In practice, the scenario is constrained and the state of the world is sensed through multi-modal sources (e.g., images, videos, sounds, inertial sensors): the discrete information flow is then fused and processed by so-called Artificial Intelligence (AI) algorithms that try to give a plausible explanation of the provided data.

Very close to AI, often intersecting with and relying on it, and strongly related to AR application development, is Computer Vision. Since artificial vision is the main cue for current AR systems, this field has gained increasing importance in the AR context. While AI aims at understanding the surroundings from generic low-level sensor data, Computer Vision is concerned with duplicating, or emulating, the capabilities of the Human Vision System (HVS) by reconstructing and interpreting a 3D scene solely from its 2D projections, or images. Although it may seem simple, any task related to Computer Vision can become arbitrarily complex: this is due to the intrinsic nature of the problem, a so-called ill-posed inverse problem. In other words, an understanding of the 3D scene has to be reached from its 2D projections, in which one spatial dimension has been lost. Furthermore, given the interactivity constraints of AR, the tasks have to be performed in real time, or near real time.
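To make the loss of one spatial dimension concrete, here is a minimal sketch in Python. It assumes an ideal pinhole camera with a focal length of 1 and no lens distortion; the function name `project` and the sample points are purely illustrative and do not come from any particular library.

```python
def project(point, focal_length=1.0):
    """Project a 3D point (X, Y, Z), given in camera coordinates,
    onto the image plane of an ideal pinhole camera (an assumption)."""
    x, y, z = point
    if z <= 0:
        raise ValueError("the point must lie in front of the camera (Z > 0)")
    # Perspective division: the depth Z is divided out and not kept.
    return (focal_length * x / z, focal_length * y / z)


# Two different 3D points on the same viewing ray: the second one is
# simply twice as far from the camera as the first.
near_point = (1.0, 2.0, 4.0)
far_point = (2.0, 4.0, 8.0)

near_pixel = project(near_point)
far_pixel = project(far_point)

# Both points land on the same image position, so the image alone
# cannot tell us which of the two was actually observed.
print(near_pixel, far_pixel)
```

Recovering the missing depth therefore requires extra information, such as a second view, known object sizes, or prior assumptions about the scene, which is why the inverse problem is called ill-posed.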

Figure 2. Typical Computer Vision processing pipeline

From an algorithmic point of view, the development of Computer Vision solutions can be split into three layers. At the lowest level there are image acquisition and the processing needed for basic feature extraction, such as corners, edges, contours, and motion estimation. On top of this sits an intermediate-level vision processing layer, where object and feature recognition and tracking are carried out, including 3D scene modelling and reconstruction. Finally, at the top of the processing pyramid there is the so-called high-level vision, which interprets the evolving information provided by the intermediate layer. In a broad sense, this interpretation includes understanding the surroundings and involves a conceptual description of the scene. The high-level information acquired in this way can then be fed back to the intermediate- and low-level tasks.
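As a rough illustration of the lowest layer, the sketch below extracts edge pixels from a tiny synthetic grayscale image using central-difference gradients and a fixed threshold. It is only one of many possible low-level feature extractors, written in plain Python for readability; the function names, the image, and the threshold value are assumptions made for this example, and a real system would normally rely on an optimized image-processing library.

```python
def gradient_magnitude(image):
    """Return the per-pixel gradient magnitude of a grayscale image
    (a list of rows), using central differences. Border pixels stay 0."""
    height, width = len(image), len(image[0])
    magnitude = [[0.0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            gx = (image[r][c + 1] - image[r][c - 1]) / 2.0  # horizontal change
            gy = (image[r + 1][c] - image[r - 1][c]) / 2.0  # vertical change
            magnitude[r][c] = (gx * gx + gy * gy) ** 0.5
    return magnitude


def edge_pixels(image, threshold):
    """Return the (row, column) positions whose gradient magnitude
    reaches the threshold, scanned row by row."""
    magnitude = gradient_magnitude(image)
    return [
        (r, c)
        for r, row in enumerate(magnitude)
        for c, value in enumerate(row)
        if value >= threshold
    ]


# Synthetic 5x6 image: dark left half (0), bright right half (100).
image = [[0, 0, 0, 100, 100, 100] for _ in range(5)]

edges = edge_pixels(image, threshold=25.0)
print(edges)  # interior pixels on both sides of the dark/bright boundary
```

The intermediate layer would then consume such features, for example by grouping edge pixels into contours or tracking them across frames, before any high-level interpretation takes place.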

Although many significant problems have been successfully tackled, many challenges still remain that prevent the implementation of mature AR applications. Research is approaching the limit of what can be done with so-called low-level approaches ("blind" processing of low-level visual features). Applying high-level techniques for understanding the surroundings therefore seems to be the necessary next step if computer intelligence is to go further. In light of this, one of the most promising approaches for bridging the gap between the current state-of-the-art techniques and a natural, pleasant interaction with the surroundings is the massive application of machine learning strategies. As already demonstrated in limited contexts (such as object recognition and feature matching), impressive results can be obtained, and are likely to be obtained in more and more scenarios, allowing increasingly advanced AR applications to be developed.

Do not forget to check out our AR Browser and Image Matching SDKs.


One Response

  1. Yasen Zhang says:

    I am a researcher and being prepared to diving into AR. Just for deep study in this website
