Image Tracking in Augmented Reality: the translation between the 2D and the 3D world.

Image tracking is well known in the Computer Vision and Augmented Reality worlds, mainly because of the interaction it offers the user when a 3D model is rendered over the tracked image. At ARLab we believe this product bridges the gap between how augmented reality was seen in recent years and how it is seen today. It makes the user truly feel the new experience that Augmented Reality can offer by mixing the real and the virtual world in a single space and moment.

But the steps between recognizing an image and placing a 3D model over it are not easy, and many possible issues have to be taken into consideration. Since any normal mobile application will run in a daily environment, aspects such as brightness changes, tilt angles or sudden rotations must be filtered and managed in order to achieve good, robust tracking performance. And this is only the beginning: once the image tracking is performed in the 2D world, it is time to dig into the three-dimensional space.
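As a rough illustration of this kind of filtering, the sketch below (assuming OpenCV and NumPy; the CLAHE settings and the smoothing factor are illustrative choices, not ARLab's actual values) normalizes frame brightness before tracking and damps jitter in the tracked target corners:

```python
import cv2
import numpy as np

# Equalize local contrast so the tracker is less sensitive to lighting
# changes; these CLAHE parameters are illustrative defaults.
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))

def normalize_frame(frame_bgr):
    """Convert a camera frame to grayscale and flatten brightness variations."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    return clahe.apply(gray)

def smooth_corners(prev_corners, new_corners, alpha=0.6):
    """Exponentially smooth the tracked target corners to damp the jitter
    caused by sudden rotations; alpha is a hypothetical tuning value."""
    if prev_corners is None:
        return new_corners
    return alpha * new_corners + (1.0 - alpha) * prev_corners
```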

Regarding the 2D tracking of the object, or the image, the algorithm has to be able to track the image across the scene while the user moves the mobile device, or even while the target itself is moving. Since Augmented Reality is known as a technology that lets the user interact with the augmented world in real time, this tracking must be a cheap, low-latency process so that the whole pipeline reaches a frame rate of at least 15 fps in the worst case. A crucial and determinant step for the later 3D tracking stages is to establish a relationship between what we see in the 2D world (the device camera) and the 3D world (the world where we want to augment this information). If this relationship goes wrong at any moment, the overlaid 3D world will be wrongly placed, and the user's experience may differ from the expected one.
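As a sketch of what this low-cost 2D tracking step can look like (assuming OpenCV's pyramidal Lucas-Kanade optical flow; this is an illustration, not the SDK's actual implementation), the function below follows the target's keypoints from one camera frame to the next:

```python
import cv2
import numpy as np

# Illustrative Lucas-Kanade parameters: search window, pyramid depth and
# termination criteria for the iterative flow solver.
lk_params = dict(winSize=(21, 21), maxLevel=3,
                 criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01))

def track_keypoints(prev_gray, curr_gray, prev_pts):
    """Follow the target's 2D keypoints into the next frame and keep
    only the correspondences that were tracked reliably."""
    next_pts, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, curr_gray, prev_pts, None, **lk_params)
    good = status.ravel() == 1
    return prev_pts[good], next_pts[good]
```

Per-frame optical flow like this is far cheaper than re-detecting the target from scratch on every frame, which is what makes a worst-case budget of 15 fps realistic on mobile hardware.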

Knowing the variables that relate the two worlds, such as the location of the target on the screen, the target's location in the 3D world, and the intrinsic parameter matrix of the camera used, we can establish a relation between them mathematically, using geometric transformations. The main geometric transformations, from simplest to most complex, are the following (a sketch of the most general case follows the list):

  • Translation.
  • Euclidean (translation + rotation).
  • Similarity (translation + rotation + scale).
  • Affine transformations.
  • Projective transformations.
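To make the relation concrete, the sketch below (again assuming OpenCV; the function and variable names are hypothetical) estimates the most general of these transformations, a projective transformation or homography, between the reference target image and the current frame, and then recovers the 3D pose from it using the camera's intrinsic matrix:

```python
import cv2
import numpy as np

def target_pose(ref_pts, frame_pts, K, target_w, target_h):
    """Estimate the homography between the reference target and the current
    frame, then recover the camera's 3D pose relative to the target.
    K is the camera's intrinsic parameter matrix; target_w and target_h are
    the target's assumed real-world dimensions."""
    # Robust projective transformation from the 2D point correspondences.
    H, inliers = cv2.findHomography(ref_pts, frame_pts, cv2.RANSAC, 5.0)
    if H is None:
        return None

    # Project the target's reference corners into the frame with H.
    ref_corners = np.float32([[0, 0], [target_w, 0],
                              [target_w, target_h], [0, target_h]]).reshape(-1, 1, 2)
    frame_corners = cv2.perspectiveTransform(ref_corners, H)

    # The same corners in 3D, on the z = 0 plane of the flat target.
    corners_3d = np.float32([[0, 0, 0], [target_w, 0, 0],
                             [target_w, target_h, 0], [0, target_h, 0]])

    # Solve the 2D-3D relationship: rotation and translation of the camera
    # with respect to the target.
    ok, rvec, tvec = cv2.solvePnP(corners_3d, frame_corners, K, None)
    return (rvec, tvec) if ok else None
```

The resulting rotation and translation vectors are precisely the 2D-to-3D relationship described above: handing them to the rendering engine is what places the 3D model on top of the tracked image.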

Do not forget to check out our AR Browser and Image Matching SDKs.
