UME::SIMD Tutorial 3: Getting back to the scalar world (LOAD/STORE)

We have already discussed two steps of computing with UME::SIMD: initializing vectors and performing calculations. But what can we do with the vectors once the calculations are finished? Unfortunately, most standard functions and existing libraries don't use UME::SIMD types, so we need a way to get back to the scalar world.

One way of doing so, presented in the previous tutorial, is a horizontal reduction. Because a horizontal reduction returns a scalar, its result can be passed as input to any existing function that takes a scalar:

#include <umesimd/UMESimd.h>

void foo(float x) {
    // ... do something with x
}

...

int main() {
    UME::SIMD::SIMDVec<float, 4> a, b, c;
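    // ... initialize b and c (see the previous tutorials)
    a = b * c;            // vertical (element-wise) multiplication
    float t0 = a.hmul();  // horizontal multiplication of all elements of a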
    foo(t0);
    return 0;
}
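To make the difference between the vertical b * c and the horizontal hmul() concrete, here is a minimal scalar sketch. ToyVec4 is a hypothetical stand-in written only for this illustration, not a UME::SIMD type: it keeps four floats in a plain array and uses ordinary loops, so it shows what the operations compute, not how the library implements them.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical 4-lane stand-in for SIMDVec<float, 4>; NOT the UME::SIMD type.
// It only models what a vertical multiply and a horizontal multiply compute.
struct ToyVec4 {
    float lane[4];

    // Read one lane (bounds-checked when assertions are enabled).
    float get(std::size_t i) const {
        assert(i < 4);
        return lane[i];
    }

    // Vertical operation: lane i of the result is lane[i] * other.lane[i].
    ToyVec4 operator*(const ToyVec4& other) const {
        ToyVec4 r{};
        for (std::size_t i = 0; i < 4; ++i) r.lane[i] = lane[i] * other.lane[i];
        return r;
    }

    // Horizontal reduction: multiply all lanes together into one scalar.
    float hmul() const {
        float p = 1.0f;
        for (float x : lane) p *= x;
        return p;
    }
};
```

For example, with b = {1, 2, 3, 4} and c = {2, 2, 2, 2}, the vertical product a is {2, 4, 6, 8}, and a.hmul() collapses it into the single scalar 2 * 4 * 6 * 8 = 384.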

While extremely useful in certain situations, horizontal reductions are not always the right solution. Very often we want to perform only vertical operations and store the results in a memory array. Here is how to do it:

#include <umesimd/UMESimd.h>

void foo(float *x) {
    // ... do something with the array pointed to by x
}

...

int main() {
    UME::SIMD::SIMDVec<float, 4> a, b, c;
    float temp[4];
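    // ... initialize b and c (see the previous tutorials)
    a = b * c;

    // STORE the 4 values from the vector
    // into consecutive elements of 'temp'
    a.store(temp);

    // pass 'temp' to a function that expects a plain float array
    foo(temp);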
    return 0;
}
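What a.store(temp) does can be sketched the same way. Under the same assumptions (ToyVec4 and the helper sum4 are hypothetical names, not part of UME::SIMD), lane i of the vector is copied to temp[i], after which temp is an ordinary float array that any scalar function can consume.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical 4-lane stand-in (NOT UME::SIMD) illustrating store(ptr):
// lane i of the vector is written to ptr[i].
struct ToyVec4 {
    float lane[4];

    // STORE: copy the four lanes into consecutive memory locations.
    // The caller must provide room for at least 4 floats.
    void store(float* dst) const {
        assert(dst != nullptr);
        for (std::size_t i = 0; i < 4; ++i) dst[i] = lane[i];
    }
};

// Hypothetical scalar consumer playing the role of foo() above:
// it only understands plain float arrays and sums 4 values.
float sum4(const float* x) {
    return x[0] + x[1] + x[2] + x[3];
}
```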

In the most common scenario, the workflow with SIMD types is as follows:

  1. LOAD values from memory locations to UME::SIMD vectors
  2. perform calculations using UME::SIMD vectors
  3. STORE values from UME::SIMD vectors to memory locations.
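The three steps can be sketched as a loop over arrays of arbitrary length, processed four elements at a time. ToyVec4 and multiply_arrays are hypothetical names for this illustration; the real SIMDVec<float, 4> provides its own load and store operations. The sketch also assumes the array length need not be a multiple of 4, so the leftover elements are handled by a plain scalar loop (a common approach, not the only one).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical 4-lane stand-in for SIMDVec<float, 4> (NOT the UME::SIMD API).
struct ToyVec4 {
    float lane[4];

    // LOAD: read 4 consecutive floats starting at src.
    static ToyVec4 load(const float* src) {
        ToyVec4 v{};
        for (std::size_t i = 0; i < 4; ++i) v.lane[i] = src[i];
        return v;
    }
    // STORE: write the 4 lanes to consecutive locations starting at dst.
    void store(float* dst) const {
        for (std::size_t i = 0; i < 4; ++i) dst[i] = lane[i];
    }
    // Vertical multiplication, lane by lane.
    ToyVec4 operator*(const ToyVec4& o) const {
        ToyVec4 r{};
        for (std::size_t i = 0; i < 4; ++i) r.lane[i] = lane[i] * o.lane[i];
        return r;
    }
};

// out[i] = x[i] * y[i] for i in [0, n), following LOAD -> compute -> STORE.
void multiply_arrays(const float* x, const float* y, float* out, std::size_t n) {
    assert(x != nullptr && y != nullptr && out != nullptr);
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        ToyVec4 vx = ToyVec4::load(x + i);  // 1. LOAD
        ToyVec4 vy = ToyVec4::load(y + i);
        ToyVec4 vr = vx * vy;               // 2. compute
        vr.store(out + i);                  // 3. STORE
    }
    // Scalar tail for the last n % 4 elements that do not fill a whole vector.
    for (; i < n; ++i) out[i] = x[i] * y[i];
}
```

Called on arrays of 6 elements, multiply_arrays handles elements 0 to 3 in one LOAD/compute/STORE round and the remaining 2 in the scalar tail.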

Remember that storing values from a vector does not modify the vector, so you can keep using it in further calculations.
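A sketch of this point, again with a hypothetical ToyVec4 stand-in rather than the library type: because store() only reads the lanes (it is a const member function here), the vector can be stored and then still be used as an operand. store_then_reuse is an illustrative helper, not library code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical 4-lane stand-in (NOT UME::SIMD). store() is const: it only
// reads the lanes, so the vector keeps its values after being stored.
struct ToyVec4 {
    float lane[4];

    void store(float* dst) const {
        assert(dst != nullptr);
        for (std::size_t i = 0; i < 4; ++i) dst[i] = lane[i];
    }
    ToyVec4 operator+(const ToyVec4& o) const {
        ToyVec4 r{};
        for (std::size_t i = 0; i < 4; ++i) r.lane[i] = lane[i] + o.lane[i];
        return r;
    }
};

// Stores 'a' into 'first', then keeps using 'a' to compute a + a into 'second'.
void store_then_reuse(const ToyVec4& a, float* first, float* second) {
    a.store(first);          // a is unchanged by the store...
    ToyVec4 twice = a + a;   // ...so it can still feed further calculations
    twice.store(second);
}
```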

PERFORMANCE HINT: once values are loaded into a vector, perform as many calculations as possible before storing the results back. LOAD/STORE operations move data between memory and registers and can introduce significant latency into your computational kernels.
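The hint can be illustrated with two hypothetical kernels that both compute out = (x * y + z)^2 for four elements. kernel_roundtrip STOREs to a temporary array and LOADs it back after every step, while kernel_fused loads once, keeps the intermediate result in a vector, and stores once. With this scalar ToyVec4 stand-in (not UME::SIMD) an optimizing compiler may erase the difference, so the sketch only shows the shape of the code, not measured performance.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical 4-lane stand-in (NOT the UME::SIMD API).
struct ToyVec4 {
    float lane[4];

    static ToyVec4 load(const float* src) {  // LOAD 4 consecutive floats
        assert(src != nullptr);
        ToyVec4 v{};
        for (std::size_t i = 0; i < 4; ++i) v.lane[i] = src[i];
        return v;
    }
    void store(float* dst) const {           // STORE 4 lanes
        assert(dst != nullptr);
        for (std::size_t i = 0; i < 4; ++i) dst[i] = lane[i];
    }
    ToyVec4 operator*(const ToyVec4& o) const {
        ToyVec4 r{};
        for (std::size_t i = 0; i < 4; ++i) r.lane[i] = lane[i] * o.lane[i];
        return r;
    }
    ToyVec4 operator+(const ToyVec4& o) const {
        ToyVec4 r{};
        for (std::size_t i = 0; i < 4; ++i) r.lane[i] = lane[i] + o.lane[i];
        return r;
    }
};

// out = (x * y + z)^2, with a STORE and a re-LOAD after every step.
void kernel_roundtrip(const float* x, const float* y, const float* z, float* out) {
    float tmp[4];
    (ToyVec4::load(x) * ToyVec4::load(y)).store(tmp);
    (ToyVec4::load(tmp) + ToyVec4::load(z)).store(tmp);
    (ToyVec4::load(tmp) * ToyVec4::load(tmp)).store(out);
}

// Same result: LOAD once, keep the intermediate in a vector, STORE once.
void kernel_fused(const float* x, const float* y, const float* z, float* out) {
    ToyVec4 t = ToyVec4::load(x) * ToyVec4::load(y) + ToyVec4::load(z);
    (t * t).store(out);
}
```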

In plain C++, load and store operations are hidden from the programmer behind the array indexing operator[]. Most modern compilers also analyze the code and eliminate redundant loads and stores (MOV instructions on x86) where possible. Since the compiler has no special knowledge of UME::SIMD types, the burden of deciding when these operations should happen rests on you, the user.
