Augmented Reality (AR) is a rapidly growing area within the so-called Virtual Environment (VE) field. While VE, or Virtual Reality (VR), provides a fully immersive experience in an entirely synthetic scenario, where the user can neither see nor interact with the real surroundings, AR lets the user see the real world while placing virtual objects in it or superimposing virtual information on it. In other words, AR does not replace reality, but integrates and supplements it. It takes real objects as a foundation and adds contextual information that helps the user deepen their understanding of the subject. The potential application domains of this technology are vast, including medicine, education and training, manufacturing and repair, annotation and visualization, path planning, gaming and entertainment, and the military. The related market is exploding: according to recent studies, the installed base of AR-capable mobile devices grew from 8 million in 2009 to more than 100 million in 2010, producing global revenue that is estimated to reach $1.5 billion by 2015.
Figure 1. Different AR scenarios
Beyond the tangible (or visible) results of AR and the market predictions, one may ask: what is the enabling technology that actually allows us to augment our reality? In other words, beyond the virtual information being rendered, how can the device, or the application, be aware of the world and select the appropriate content to present to the user?
In general, this task is far from trivial: while a human being understands the surroundings almost unconsciously and effortlessly, in nearly every scenario and within a fraction of a second, for a computer-based machine things are far more complicated. In practice, the scenario is constrained and the state of the world is sensed through multi-modal sensor data (e.g., images, video, sound, inertial measurements): the discrete information flows are then fused and processed by so-called Artificial Intelligence (AI) algorithms that try to give a plausible explanation of the data provided.
Very close to AI, often intersecting with and relying on it, and strongly related to AR application development, is Computer Vision. Since artificial vision is the main cue for current AR systems, this field has gained increasing importance in the AR context. Whereas AI aims to understand the surroundings from generic low-level sensor data, Computer Vision is concerned with duplicating, or emulating, the capabilities of the Human Vision System (HVS) by reconstructing and interpreting a 3D scene solely from its 2D projections, or images. Although it may seem simple, any Computer Vision task can become arbitrarily complex: this is due to the intrinsic nature of the problem, which is a so-called ill-posed inverse problem. In other words, an understanding of a 3D scene has to be reached from its 2D projections, which have lost one spatial dimension. Furthermore, given the interactivity constraints of AR, these tasks have to be performed in real time, or near real time.
Figure 2. Typical Computer Vision processing pipeline
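To make the loss of one spatial dimension concrete, here is a minimal sketch that assumes an ideal pinhole camera. The text does not name a camera model, so this is an assumption, and the function `project` and the parameter `focal_length` are illustrative names only. It shows two distinct 3D points that lie on the same viewing ray and land on the same 2D image point, so the image alone cannot tell them apart.

```python
def project(point, focal_length=1.0):
    """Project a 3D point (X, Y, Z), given in camera coordinates,
    onto the image plane of an ideal pinhole camera (assumed model)."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    # Perspective division: depth Z is divided out and does not survive
    # as a separate coordinate in the image.
    return (focal_length * x / z, focal_length * y / z)


# Two different 3D points on the same viewing ray (hypothetical values).
near = (1.0, 2.0, 4.0)
far = (2.0, 4.0, 8.0)  # same direction from the camera, twice as far

print(project(near))  # (0.25, 0.5)
print(project(far))   # (0.25, 0.5)
```

Because every point along a viewing ray maps to the same pixel, recovering the scene from a single image has no unique solution unless further constraints are added, such as multiple views, known object sizes, or other sensors. This is one way to read the term "ill-posed inverse problem" used above.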