Code Optimization…arranging the code to improve performance.

      One of the most important things to keep in mind when we implement our algorithms is that they must run in real time on mobile devices. This means that all of the code we write will run on devices with limited processing power.

      In the first stage, our computer vision developers implement the algorithms needed to deliver the features that each product requires. Once this is done, it is time to integrate the code into mobile devices. At this point the code is not yet optimised, so it will be slow and will not reach the speed that the QA team accepts as good. This is when the Real-Time team enters the scene, taking the code previously implemented and optimising it, replacing many instructions and structures with ones that mobile processors run faster.

      When working on mobile devices we also need to take care of multithreading, that is, splitting the work into parts that can run in parallel. We can also use NEON (the SIMD instruction set of ARM processors) or even hand-written assembler (ASM) to speed up some parts of our code. Furthermore, we usually recommend working in C rather than C++, because C is a language in which we can optimize the code much better.
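As a rough illustration of the multithreading idea, the sketch below splits the rows of an image into horizontal bands and processes each band in its own thread. It assumes POSIX threads are available. The names `band_t`, `process_band` and `process_image_parallel`, and the per-pixel operation (a simple inversion), are hypothetical and chosen only for this example; they are not taken from our SDKs.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_THREADS 8

/* Hypothetical work item: one horizontal band of an 8-bit image. */
typedef struct {
    uint8_t *pixels;    /* first pixel of the whole image */
    int      width;     /* pixels per row */
    int      row_begin; /* first row of this band (inclusive) */
    int      row_end;   /* last row of this band (exclusive) */
} band_t;

/* Thread entry point: invert every pixel in the band. */
static void *process_band(void *arg)
{
    band_t *b = (band_t *)arg;
    for (int y = b->row_begin; y < b->row_end; y++) {
        uint8_t *row = b->pixels + (size_t)y * (size_t)b->width;
        for (int x = 0; x < b->width; x++)
            row[x] = (uint8_t)(255 - row[x]);
    }
    return NULL;
}

/* Split the rows into n_threads bands and process them in parallel.
 * Returns 0 on success, -1 if a thread could not be created. */
static int process_image_parallel(uint8_t *pixels, int width, int height,
                                  int n_threads)
{
    assert(n_threads > 0 && n_threads <= MAX_THREADS);

    pthread_t threads[MAX_THREADS];
    band_t    bands[MAX_THREADS];
    int       started = 0;
    int       status = 0;

    for (int i = 0; i < n_threads; i++) {
        bands[i].pixels    = pixels;
        bands[i].width     = width;
        bands[i].row_begin = (int)((long)height * i / n_threads);
        bands[i].row_end   = (int)((long)height * (i + 1) / n_threads);
        if (pthread_create(&threads[i], NULL, process_band, &bands[i]) != 0) {
            status = -1;
            break;
        }
        started++;
    }

    /* Wait for every thread that was actually started. */
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    return status;
}
```

Because each band touches a disjoint set of rows, the threads need no locking. On a phone the number of threads would normally be matched to the number of available cores.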

Example of optimization.


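C (scalar version)

for( ; x < colsn; x++ )
{
    // difference of the neighbours cn elements to each side
    deriv_type t0 = (deriv_type)(trow0[x+cn] - trow0[x-cn]);
    // weighted sum of the three neighbours with weights 3, 10, 3
    deriv_type t1 = (deriv_type)((trow1[x+cn] + trow1[x-cn])*3 + trow1[x]*10);
    // the two results are stored interleaved
    drow[x*2] = t0; drow[x*2+1] = t1;
}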

NEON (intrinsics version)

const int16x8_t vk3 = { 3, 3, 3, 3, 3, 3, 3, 3 };
const int16x8_t vk10 = { 10, 10, 10, 10, 10, 10, 10, 10 };
// 8 elements per iteration; any remaining columns must be handled by the scalar loop
for( ; x <= colsn - 8; x += 8 )
{
    int16x8x2_t loaded;
    int16x8_t t0a = vld1q_s16(&trow0[x + cn]);
    int16x8_t t0b = vld1q_s16(&trow0[x - cn]);
    loaded.val[0] = vsubq_s16(t0a, t0b); // t0 = (trow0[x + cn] - trow0[x - cn])
    loaded.val[1] = vld1q_s16(&trow1[x + cn]);
    int16x8_t t1b = vld1q_s16(&trow1[x - cn]);
    int16x8_t t1c = vld1q_s16(&trow1[x]);
    loaded.val[1] = vaddq_s16(loaded.val[1], t1b);       // trow1[x + cn] + trow1[x - cn]
    loaded.val[1] = vmulq_s16(loaded.val[1], vk3);       // ... * 3
    loaded.val[1] = vmlaq_s16(loaded.val[1], t1c, vk10); // ... + trow1[x] * 10
    // interleaved store, equivalent to drow[x*2] = t0; drow[x*2+1] = t1
    vst2q_s16(&drow[x*2], loaded);
}
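The two versions above can be combined into one function that uses NEON where it is available and falls back to the scalar loop elsewhere. The sketch below is one way to do that, under some assumptions: `deriv_type` is taken to be `int16_t`, the wrapper name `scharr_row` is hypothetical, and the caller is assumed to provide `cn` readable elements before and after each input row. The NEON branch is compiled only when the compiler defines `__ARM_NEON`, and it writes the two result vectors interleaved with `vst2q_s16`.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical wrapper around the loops shown above.
 * trow0 and trow1 must be readable from index -cn to colsn + cn - 1,
 * and drow must have room for 2 * colsn values. */
static void scharr_row(const int16_t *trow0, const int16_t *trow1,
                       int16_t *drow, int colsn, int cn)
{
    assert(colsn >= 0 && cn > 0);
    int x = 0;

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    const int16x8_t vk3  = vdupq_n_s16(3);
    const int16x8_t vk10 = vdupq_n_s16(10);

    /* Vector part: 8 columns per iteration, stops before the tail. */
    for (; x <= colsn - 8; x += 8) {
        int16x8x2_t out;
        out.val[0] = vsubq_s16(vld1q_s16(&trow0[x + cn]),
                               vld1q_s16(&trow0[x - cn]));
        int16x8_t sum = vaddq_s16(vld1q_s16(&trow1[x + cn]),
                                  vld1q_s16(&trow1[x - cn]));
        /* sum * 3 + trow1[x] * 10 */
        out.val[1] = vmlaq_s16(vmulq_s16(sum, vk3),
                               vld1q_s16(&trow1[x]), vk10);
        /* Interleaved store: t0 at even indices, t1 at odd indices. */
        vst2q_s16(&drow[x * 2], out);
    }
#endif

    /* Scalar part: the whole row without NEON, otherwise only the tail. */
    for (; x < colsn; x++) {
        int16_t t0 = (int16_t)(trow0[x + cn] - trow0[x - cn]);
        int16_t t1 = (int16_t)((trow1[x + cn] + trow1[x - cn]) * 3
                               + trow1[x] * 10);
        drow[x * 2]     = t0;
        drow[x * 2 + 1] = t1;
    }
}
```

Keeping the scalar loop after the vector loop means the row length does not have to be a multiple of 8, and the same source file still builds on a desktop machine for testing.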

Do not forget to check out our AR Browser and Image Matching SDKs.
