All posts by Przemysław Karpiński

UME::SIMD Tutorial 3: Getting back to scalar world (LOAD/STORE)

We have already discussed two steps of computation using UME::SIMD: initializing vectors and performing calculations. But what can we do with vectors once we have finished our calculations? Unfortunately, most standard functions and existing libraries don’t use the UME::SIMD types, so we need a way to get back to the scalar world.

One method of doing so, presented in the previous tutorial, is called horizontal reduction. Since horizontal reductions return a scalar type, the returned scalars can be used as inputs to any already defined function.

#include <umesimd/UMESimd.h>

float foo(float x) {
    ... // Do something with x
}

int main() {
    UME::SIMD::SIMDVec<float, 4> a(1.0f, 2.0f, 3.0f, 4.0f);
    float t0 = a.hmul(); // Horizontal multiplication
    float t1 = foo(t0);  // Use the scalar result in a regular function
    return 0;
}

While extremely useful in certain situations, horizontal reductions are not always the proper solution. Very often we want to perform only vertical operations, and store the results into some memory array. Here is how to do it:

#include <umesimd/UMESimd.h>

float foo(float *x) {
    ... // Do something with an array of x
}

int main() {
    UME::SIMD::SIMDVec<float, 4> a, b, c;
    float temp[4];

    c = a + b; // Some vertical calculations

    // STORE the values from vector 'c'
    // in consecutive fields of 'temp':
    c.store(temp);

    // pass 'temp' to the function:
    foo(temp);
    return 0;
}

In the most common scenario, the workflow with SIMD types will be as follows:

  1. LOAD values from memory locations to UME::SIMD vectors
  2. perform calculations using UME::SIMD vectors
  3. STORE values from UME::SIMD vectors to memory locations.
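In plain scalar C++, these three steps amount to the following sketch; each loop corresponds to a single UME::SIMD operation on a SIMDVec&lt;float, 4&gt; (noted in the comments):

```cpp
#include <cstddef>

// Scalar sketch of the LOAD -> compute -> STORE workflow for 4-element vectors.
// Each loop below corresponds to one UME::SIMD operation.
void add_arrays(const float* x, const float* y, float* result) {
    float a[4], b[4], c[4];
    for (std::size_t i = 0; i < 4; i++) a[i] = x[i];        // 1. LOAD:    a.load(x)
    for (std::size_t i = 0; i < 4; i++) b[i] = y[i];        //             b.load(y)
    for (std::size_t i = 0; i < 4; i++) c[i] = a[i] + b[i]; // 2. compute: c = a + b
    for (std::size_t i = 0; i < 4; i++) result[i] = c[i];   // 3. STORE:   c.store(result)
}
```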

Remember that storing values from a vector doesn’t destroy the values within the vector, so you can still use the vector for some other calculations.

PERFORMANCE HINT: once values are loaded to a vector, perform as many calculations as possible before storing the results back. LOAD/STORE operations are equivalent to moving data between memory and registers, and can introduce significant latency into your computational kernels.

In C++ the load and store operations are hidden from the programmer behind the array indexing operator[]. Most modern compilers perform additional deductions, eliding the load/store (MOV on x86) operations when possible. Since the compiler doesn’t have any special knowledge of the UME::SIMD types, the burden of deciding when these operations should happen rests on you, the user.

UME::SIMD Tutorial 2: Calculations

Horizontal vs. Vertical vectorization

Once the SIMD vectors are initialized with data, they can be operated upon. When operating on vectors, we need to distinguish two directions of operations: horizontal and vertical. The easiest way to understand the difference is by looking at the following diagram:


In the diagram we can distinguish two directions, relating to either data orientation or instruction orientation. Vertical vectorization is the process of operating elementwise on the elements of each vector. For the first ADD operation, the result is as if we executed the following list of operations:
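For SIMDVec&lt;float, 4&gt; operands, the vertical ADD behaves as this plain-C++ sketch of the semantics (not library code):

```cpp
// Scalar equivalent of the vertical (elementwise) ADD: c = a + b.
void vertical_add(const float a[4], const float b[4], float c[4]) {
    c[0] = a[0] + b[0];
    c[1] = a[1] + b[1];
    c[2] = a[2] + b[2];
    c[3] = a[3] + b[3];
}
```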


Mind that vertical operations can also operate on a single vector, as with the SQRT operation:
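A single-operand vertical operation such as SQRT still produces one result per element; in scalar form:

```cpp
#include <cmath>

// Scalar equivalent of the vertical SQRT: one square root per element.
void vertical_sqrt(const float a[4], float b[4]) {
    for (int i = 0; i < 4; i++) {
        b[i] = std::sqrt(a[i]);
    }
}
```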


A horizontal operation, or a reduction, is one that applies an operator between the elements of a single vector. From the example above this would be the HADD operation, which has the same meaning as:
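In scalar form, HADD on a 4-element vector amounts to:

```cpp
// Scalar equivalent of the horizontal ADD (HADD): a reduction to one scalar.
float horizontal_add(const float a[4]) {
    return a[0] + a[1] + a[2] + a[3];
}
```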


Another way of remembering the distinction between horizontal and vertical operations is as follows: a vertical operation always returns a vector, while a horizontal one always returns a scalar. By convention, names of horizontal operations are prefixed with the letter H, as in HADD.

Invoking operations in UME::SIMD

When operating on vectors simply use overloaded operators, as if you were working with scalars:

UME::SIMD::SIMDVec<float, 4> a, b, c, d, e, f;
c = a + b;
e = c * d;

For additional operations, ones that don’t have an equivalent C++ operator, you can use either the special namespace UME::SIMD::FUNCTIONS or the Member Function Interface (MFI) invocation convention:

f = UME::SIMD::FUNCTIONS::sqrt(e); // or
f = e.sqrt(); // MFI call

The MFI invocation convention is especially useful when using UME::SIMD as a code-generation target, as it allows invoking every operation supported by the library in the same way. For regular development I advise using operators and functions, as this results in better code readability.

Invoking a horizontal operation cannot be done using operator syntax, as the concept of a reduction operation is not present in the current C++ language. You therefore have to use either the FUNCTIONS namespace or MFI:

float x = UME::SIMD::FUNCTIONS::hadd(f); // or
float y = f.hadd(); // MFI call

Which syntax convention to use is up to you.

UME::SIMD Tutorial 1: Vector declaration and basic initialization

The programming model used in UME::SIMD is very simple. Instead of using scalar variables, use vector variables. A simple vector declaration can look like:

UME::SIMD::SIMDVec<float, 8> x_vec;

In the above declaration two template parameters have to be passed: the fundamental type used to represent each element (float) and the number of elements packed in the vector (8). The supported fundamental types are:

  • unsigned integer: uint8_t (8b), uint16_t (16b), uint32_t (32b) and uint64_t (64b);
  • signed integer: int8_t (8b), int16_t (16b), int32_t (32b) and int64_t (64b);
  • floating point: float(32b) and double (64b).

For the vector length two rules apply:

  • the vector length is a power of 2, starting with ‘1’,
  • the maximum size of a vector is not higher than 1024b.

But what is the reason for these rules? Both limitations come from hardware constraints. At the hardware level it only makes sense to have registers whose length is a power of 2, as arbitrary vector sizes would require additional die area. At the same time, the hardware limit is placed on the number of bits rather than the number of elements: a vector of 32 64-bit elements would occupy a 2048-bit register, while a vector of 32 8-bit elements would only use a 256-bit register. Unfortunately this means that in the current model we can operate on up to 128-element uint8_t vectors, but only on 16-element uint64_t vectors.
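The arithmetic behind these limits is easy to verify at compile time (a plain-C++ sketch; the constant names are illustrative, not part of the library):

```cpp
#include <cstdint>

// The 1024-bit cap yields different maximum element counts per scalar type.
constexpr int kMaxVectorBits = 1024;
constexpr int kMaxLenUint8   = kMaxVectorBits / (8 * static_cast<int>(sizeof(std::uint8_t)));  // 128 elements
constexpr int kMaxLenUint64  = kMaxVectorBits / (8 * static_cast<int>(sizeof(std::uint64_t))); // 16 elements

static_assert(kMaxLenUint8 == 128, "uint8_t vectors: up to 128 elements");
static_assert(kMaxLenUint64 == 16, "uint64_t vectors: up to 16 elements");
```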

Once a vector is declared, it can be used in a similar way to any fundamental type… with some exceptions. The first problem is: how do we put actual data into this vector type?

Initialization from scalars

All vector elements can be initialized with the same value, taken from a scalar variable or constant. To initialize the vector with a scalar (or a constant), simply write something like:

UME::SIMD::SIMDVec<float, 8> x_vec1(3.14f);
UME::SIMD::SIMDVec<float, 8> x_vec2=3.14f;

float y=2.71f;

UME::SIMD::SIMDVec<float, 8> y_vec1(y);
UME::SIMD::SIMDVec<float, 8> y_vec2=y;

If the initialization has to take place after the vector declaration, it is enough to use the assignment operator (‘=’):

UME::SIMD::SIMDVec<float, 8> x_vec1, x_vec2;
float x=2.71f;

x_vec1 = x;     // broadcast 'x' to all elements
x_vec2 = 3.14f; // broadcast a constant

Broadcast initialization is not always what we want, as we might need a different initial value in every vector cell. You can initialize a vector from multiple scalar variables in the following way:

float x1=1.0f, x2=2.0f, x3=3.0f, x4=4.0f;
UME::SIMD::SIMDVec<float, 4> x_vec(x1, x2, x3, x4);

Initialization from memory

In many situations we don’t want to simply propagate a single value to a vector register. If the initialization data is stored in a memory array, it is possible to load the values into the vector register:

float x[4]={1.0f, 2.0f, 3.0f, 4.0f};
// Load-initialize with 'x':
UME::SIMD::SIMDVec<float, 4> x_vec1(x);
// Load from 'x' after vector declaration:
UME::SIMD::SIMDVec<float, 4> x_vec2;
x_vec2.load(x);

Loading is important as it allows the use of memory pointers instead of scalar variables. This initialization mode is especially important for long vectors (imagine initializing a long SIMDVec using individual scalars…).

In the next tutorial we will look into some basic computations that can be performed using vector primitives.

UME::SIMD Tutorial 0: Installation

The library is provided in header-only form, which makes its installation trivial. We will present installation procedures for both Linux and Windows operating systems. Mind that, as there is no perfectly portable build system, we limit ourselves to the most common tools.

This installation procedure shows only the steps required to install UME::SIMD as a standalone library, but we make some small adjustments so that switching to the full UME framework remains possible in the future.

Linux installation procedure

The library in its primitive form doesn’t require any build system. The only configuration needed is passing proper compilation flags to the compiler. The specific flags depend on the hardware, the compiler, and the operating system. We will discuss specific configurations in future tutorials in this series.

This installation procedure should work for most Linux-based systems and requires a Git client.

    1. Navigate to the directory where you want to store the downloaded library and create a directory for the framework:
$ mkdir ume
$ cd ume
    2. Clone the repository:
$ git clone 
    3. You might want to check out a tagged version to keep track of library updates:
$ cd umesimd
$ git checkout tags/v0.8.1
$ cd ..
    4. Export the library directory (you can check the full path with):
$ pwd
    5. Include the library code into your C++ project:

Now you should be able to include the library into your program using following directive:

 #include <umesimd/UMESimd.h>

That’s it!

Windows installation procedure

The Windows installation procedure is similar, but requires additional steps to configure the specific IDE. We will discuss this configuration for MS Visual Studio 2015 only, but similar rules apply to other IDEs.

    1. Navigate to the directory where you want to store the downloaded library and create a directory for the framework:
c:\> mkdir ume
c:\> cd ume
    2. Clone the repository:
c:\ume> git clone 
    3. You might want to check out a tagged version to keep track of library updates:
c:\ume> cd umesimd
c:\ume\umesimd> git checkout tags/v0.8.1
c:\ume\umesimd> cd ..
    4. Create a new project (or open an existing one):

[Ctrl+Shift+N] or File->New->Project…


    5. Add a new source file to the project:

[Ctrl+Shift+A] or Project->Add New Item…


    6. Fill the main.cpp with:
#include <umesimd/UMESimd.h>

int main() {
    return 0;
}
    7. Open the project properties and configure the path to the ‘ume’ directory created before:

[Alt+Enter] or Project->Properties


There you go!

Vectorization #1: the difficulty of notation, part 2

In the previous post we showed a simple example of a vectorization process. One thing we did there was to assume a certain behaviour of the \cdot operation. Before getting to a more complicated vectorization example, we have to discuss this earlier omission. It is critical to understanding the more complex example.

We will also discuss the problem of operation-symbol ambiguity, and why we actually have to use multiple operations with different symbols.

Operations definition (aka. operator overloading)

Before we try applying a second round of vectorization to our previous equations, we have to discuss one more thing: the definition of operations. (Remember my comment about the \circ vs. \cdot symbols in the previous post?) Here’s the explanation.

If we have an equation of the following form:

a = b_1 \cdot c_1 + b_2 \cdot c_2 + ... + b_N \cdot c_N

and we want to represent it in a shorter, vector form, we could write:

\textbf{b} = (b_1, b_2, ..., b_N)
\textbf{c} = (c_1, c_2, ... , c_N)

a = \textbf{b} * \textbf{c}

We know this simplification as the vector dot product. In mathematics it is usually denoted with the ‘\cdot ‘ symbol.

On the other hand, we might face a set of equations, such as:

a_1 = b_1 \cdot c_1
a_2 = b_2 \cdot c_2
a_N = b_N \cdot c_N

In which case our best possible choice could be to re-write it as:

\textbf{a} = (a_1, a_2, ..., a_N)
\textbf{b} = (b_1, b_2, ..., b_N)
\textbf{c} = (c_1, c_2, ..., c_N)

\textbf{a} = \textbf{b} * \textbf{c}

This element-wise multiplication is known as the Hadamard product and, in commonly accepted mathematical notation, should be written as:

\textbf{a} = \textbf{b} \circ \textbf{c}

On one hand, we want a simpler notation, which boils down to a smaller number of distinct symbols and variable types. On the other hand, we would like the notation to be as readable as possible, so we increase the number of types and symbols. In programming languages we are free to define as many ‘functions’ as we like to describe the behavior of the operations we will be performing. But having too many functions requires programmers to memorize many of them to navigate the code efficiently.

What, then, is the practical limitation that prohibits us from re-using operation symbols? If we used the same symbol for both operations, the only way to tell one operation from the other would be to look at the type of the result. In the first case this is a scalar; in the second, a vector. This would be possible, but when dealing with more complex equations it can really complicate our notation.
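The same dilemma appears in C++ itself: overload resolution never considers the return type, so two operations distinguished only by whether they produce a scalar or a vector cannot share one operator symbol. A minimal sketch of the usual resolution, with the illustrative names dot and hadamard (not taken from any library):

```cpp
#include <array>

using Vec3 = std::array<double, 3>;

// Dot product: vector operands, SCALAR result.
double dot(const Vec3& b, const Vec3& c) {
    return b[0] * c[0] + b[1] * c[1] + b[2] * c[2];
}

// Hadamard (elementwise) product: the same operands, VECTOR result.
Vec3 hadamard(const Vec3& b, const Vec3& c) {
    return { b[0] * c[0], b[1] * c[1], b[2] * c[2] };
}

// A single operator* could be defined as only one of the two: C++ cannot
// select between overloads that differ solely in their return type.
```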

First let’s define the ‘+’ operation in the following way:

\textbf{a} + \textbf{b} = (a_1 + b_1, a_2 + b_2, ..., a_N + b_N)
a + \textbf{b} = a + b_1 + b_2 + ... + b_N
\textbf{a} + b = a_1 + a_2 + ... + a_N + b
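For example, with \textbf{a} = (1, 2), \textbf{b} = (3, 4) and the scalar 5, these definitions give:

```latex
\textbf{a} + \textbf{b} = (1 + 3, 2 + 4) = (4, 6)
5 + \textbf{b} = 5 + 3 + 4 = 12
\textbf{a} + 5 = 1 + 2 + 5 = 8
```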

Example of an expression to be evaluated:

\textbf{a} = \textbf{b} * \textbf{c} + \textbf{d} * \textbf{e}

We have now to work it back, starting from the type of a result:

  1. \textbf{a} is a vector, so according to regular precedence, the result of ‘+’ operation has to be a vector.
  2. The result of \textbf{b} * \textbf{c} + \textbf{d} *\textbf{e} is a vector only if both \textbf{b} * \textbf{c} and \textbf{d} * \textbf{e} are vectors.
  3. Both ‘* ‘ operations result in a vector only when the requested operation is a Hadamard product.

So in the example above, the meaning of * is that of the mathematical \circ .

What about the following expression, in which the only difference is that a is a scalar:

a = \textbf{b} * \textbf{c} + \textbf{d} * \textbf{e}

Again working it backwards:

  1. a is a scalar in multiple situations:
    1. \textbf{b} * \textbf{c} result is scalar and \textbf{d} * \textbf{e} is a vector
    2. \textbf{b} * \textbf{c} result is vector and \textbf{d} * \textbf{e} is a scalar
    3. Both results of: \textbf{b} * \textbf{c} and \textbf{d} * \textbf{e} are scalars
  2. Since \textbf{b} * \textbf{c} and \textbf{d} * \textbf{e} cannot both be vectors at the same time, the result of \textbf{b} * \textbf{c} can be a vector only when the result of \textbf{d} * \textbf{e} is a scalar.
  3. Likewise, the result of \textbf{d} * \textbf{e} can be a vector only when the result of \textbf{b} * \textbf{c} is a scalar.

At this moment our deduction reveals a phenomenon known in programming as symbol ambiguity. This phenomenon is a nightmare for both programming-language and compiler designers. Between points 2 and 3 there exists an ambiguity: we cannot deduce anything about the result type of one operand without knowing the result of the other. Without an additional rule saying how such a conflict should be resolved, we simply cannot deduce anything about the ‘*’ operations! We could modify our rules for +, saying for example: ‘if a + operator is supposed to result in a scalar, assume both of its operands result in scalars’. If such a rule held, then we could immediately say that the meaning of * would be that of \circ . The definition of:

\textbf{a} + \textbf{b} = (a_1 + b_1, a_2 + b_2, ..., a_N + b_N)

would immediately become:

\textbf{a} + \textbf{b}  -> vector: \textbf{a} + \textbf{b} = (a_1 + b_1, a_2 + b_2, ..., a_N + b_N)
\textbf{a} + \textbf{b}  -> scalar: \textbf{a} + \textbf{b} = a_1 + a_2 + ... + a_N + b_1 + b_2 + ... + b_N

Multiplying such rules is not very welcome, as it can lead to an ambiguity avalanche and, as a result, an explosion in the description of our notation. When an ambiguity occurs, the simplest solution is to use a separate symbol with its own definition.

The continuation in: Vectorization #1: the difficulty of notation, part 3

Vectorization #1: the difficulty of notation, part 1

In this series of posts, I will look at different aspects of vectorization. We will start today with scalar/vector/matrix notations, and try to explain the benefits and complications of using them. I won’t be talking much about performance or programming in the strict sense; rather, I will try to explain the problem of finding a fair notation. The mathematics presented here is simplified so that non-mathematicians can also follow the thought process without (too) much effort, although some knowledge of real vector/matrix algebra might be of use.

Vectorizing scalar equations

The vector term originates from mathematics. Say you have a group of equations, like:

a_1 = b_1 + c_1 \cdot d_1
a_2 = b_2 + c_2 \cdot d_2
a_N = b_N + c_N \cdot d_N

All of the variables: a_1, b_1, c_1, d_1, a_2, b_2, c_2, d_2 ..., a_N, b_N, c_N, d_N are used in the equation of the same exact form:

\alpha = \beta + \gamma \cdot \delta

In order to simplify the first notation, you can first group the variables at the same position in equations in a following way, forming vectors:

\textbf{a} = (a_1, a_2, ..., a_N)
\textbf{b} = (b_1, b_2, ..., b_N)
\textbf{c} = (c_1, c_2, ..., c_N)
\textbf{d} = (d_1, d_2, ..., d_N)

Now you can replace the original set of equations with a vector form (a cautious mathematician might now point out that ‘\cdot ‘ is not a correct vector operation symbol here, and that we should use \circ instead; this is part of the thought experiment, so please bear with this insanity for a while):

\textbf{a} = \textbf{b} + \textbf{c} \cdot \textbf{d}
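For instance, with N = 2, \textbf{b} = (1, 2), \textbf{c} = (3, 4) and \textbf{d} = (5, 6), the single vector equation reproduces both scalar equations at once:

```latex
\textbf{a} = (1, 2) + (3, 4) \cdot (5, 6) = (1 + 15, 2 + 24) = (16, 26)
```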

An entity representing a singular value, such as a_1 is called a scalar. A list (or more precisely ‘an ordered-set’) of elements, such as \textbf{a} is called a vector.

By vectorizing our equations, we replaced an unknown number (N) of equations, each over a known number of variables (a_1, b_1, c_1, d_1), with a known number of equations (1) over an unknown number of packed variables (a_1, ..., a_N, b_1, ..., b_N, c_1, ..., c_N, d_1, ..., d_N).

The obvious gain from vector notation is the simplification of mathematical proofs. Instead of repeating each time that a certain property applies to the equations for a_1, a_2, ..., a_N, we can simply state that the property applies to all elements of \textbf{a}. We still have a more complicated definition of the types we manipulate, but we usually deal with that only at the beginning and possibly at the end of a proof.

Now imagine the following set of equations:

a_{11} = 3 \cdot b_{11} + 4 \cdot c_{11} + 5 \cdot d_{11}
a_{12} = 3 \cdot b_{12} + 4 \cdot c_{12} + 5 \cdot d_{12}
a_{1M} = 3 \cdot b_{1M} + 4 \cdot c_{1M} + 5 \cdot d_{1M}
a_{21} = 1 \cdot b_{21} + 5 \cdot c_{21} + 4 \cdot d_{21}
a_{22} = 1 \cdot b_{22} + 5 \cdot c_{22} + 4 \cdot d_{22}
a_{2M} = 1 \cdot b_{2M} + 5 \cdot c_{2M} + 4 \cdot d_{2M}
a_{N1} = 7 \cdot b_{N1} + 3 \cdot c_{N1} + 9 \cdot d_{N1}
a_{N2} = 7 \cdot b_{N2} + 3 \cdot c_{N2} + 9 \cdot d_{N2}
a_{NM} = 7 \cdot b_{NM} + 3 \cdot c_{NM} + 9 \cdot d_{NM}

If we apply our vector notation then we have:

\textbf{a}_1 = (a_{11}, a_{12}, ..., a_{1M})
\textbf{a}_2 = (a_{21}, a_{22}, ..., a_{2M})

\textbf{a}_N = (a_{N1}, a_{N2}, ..., a_{NM})

\textbf{b}_1 = (b_{11}, b_{12}, ..., b_{1M})
\textbf{b}_2 = (b_{21}, b_{22}, ..., b_{2M})

\textbf{b}_N = (b_{N1}, b_{N2}, ..., b_{NM})

\textbf{c}_1 = (c_{11}, c_{12}, ..., c_{1M})
\textbf{c}_2 = (c_{21}, c_{22}, ..., c_{2M})

\textbf{c}_N = (c_{N1}, c_{N2}, ..., c_{NM})

\textbf{d}_1 = (d_{11}, d_{12}, ..., d_{1M})
\textbf{d}_2 = (d_{21}, d_{22}, ..., d_{2M})

\textbf{d}_N = (d_{N1}, d_{N2}, ..., d_{NM})

And the re-written equations:

\textbf{a}_1 = 3 \cdot \textbf{b}_1 + 4 \cdot \textbf{c}_1 + 5 \cdot \textbf{d}_1
\textbf{a}_2 = 1 \cdot \textbf{b}_2 + 5 \cdot \textbf{c}_2 + 4 \cdot \textbf{d}_2

\textbf{a}_N = 7 \cdot \textbf{b}_N + 3 \cdot \textbf{c}_N + 9 \cdot \textbf{d}_N

So we replaced a long set of equations, with a shorter set of equations. But again if we start performing proofs on such equations, we would end up repeating everything as in the first example.

We can also think about another way of representing the same set of equations. Instead of grouping all a’s and all b’s together, we could use the following grouping:

\textbf{p}_1 = (3, 4, 5)
\textbf{p}_2 = (1, 5, 4)
\textbf{p}_N = (7, 3, 9)

\textbf{q}_{11} = (b_{11}, c_{11}, d_{11})
\textbf{q}_{12} = (b_{12}, c_{12}, d_{12})
\textbf{q}_{NM} = (b_{NM}, c_{NM}, d_{NM})

Our second vector form becomes then:

a_{11} = \textbf{p}_1 \cdot \textbf{q}_{11}
a_{12} = \textbf{p}_1 \cdot \textbf{q}_{12}
a_{1M} = \textbf{p}_1 \cdot \textbf{q}_{1M}
a_{21} = \textbf{p}_2 \cdot \textbf{q}_{21}
a_{22} = \textbf{p}_2 \cdot \textbf{q}_{22}
a_{2M} = \textbf{p}_2 \cdot \textbf{q}_{2M}
a_{N1} = \textbf{p}_N \cdot \textbf{q}_{N1}
a_{N2} = \textbf{p}_N \cdot \textbf{q}_{N2}
a_{NM} = \textbf{p}_N \cdot \textbf{q}_{NM}
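To see the simplification at work on a single equation, take the illustrative values b_{11} = 1, c_{11} = 2, d_{11} = 3; the first equation becomes a single dot product:

```latex
a_{11} = \textbf{p}_1 \cdot \textbf{q}_{11} = (3, 4, 5) \cdot (1, 2, 3) = 3 \cdot 1 + 4 \cdot 2 + 5 \cdot 3 = 26
```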

While in this case we didn’t decrease the number of equations, we made each of them much simpler. As with the previous transformation, we could vectorize again and get a yet simpler notation for our set of equations. We will do that in a moment. But first we need to clarify something about the ‘\cdot ‘ symbol we’ve been using so carelessly so far.

The continuation in: Vectorization #1: the difficulty of notation, part 2

What is performance?

Since this is the first post on this blog, I decided to start with a ‘lighter’ (or more philosophical) topic. I will try to abstract it from code development, as this kind of discussion can easily be held for any type of project or activity.

In my work I often hear phrases like: ‘With these changes we have gained X% of performance’. ‘Good job!’ is the answer most people would dream of. Instead, an experienced audience approaches sentences like this with a dose of healthy distance. The statement is usually followed by a series of probing questions:

  • ‘What is the reference you used to evaluate this gain?’,
  • ‘What is the maximum performance we could get?’,
  • ‘Have you tried comparing these results with XYZ?’,
  • ‘What are you measuring?’,
  • ‘So what?’ etc.

If the discussion continues, the number of questions can grow ad infinitum. But are these questions valid, and do they have valid answers? Can we answer them right away, and is it even practical to do so? Is this information useful to the people listening? Let’s try analysing the real reasons why these questions are asked.

So what?

Let’s start with the last question. In order to answer it, you have to ask some more questions:

  • External impact: Will this performance gain impact my customers/users’ experience in any way?
  • Internal impact: Can this modification be reflected in other places in the project/company/community to gain performance there?
  • Local impact: Will this modification impact our future project evolution (e.g. code development)?

As you probably noticed, it is all about impact. If you spend six months trying to increase performance and nobody notices the change, then you have effectively wasted that time. On the contrary, if you can positively impact your customers, then it might be worth doing. Knowing the expectations of external customers will help you better assess the significance of a specific optimization.

The optimizations that have internal impact can be useful only if they can be practically transferred to other projects in the close vicinity of what you are doing (e.g. inside your company/organization or within the community).

Doing optimizations that have local impact is worth the effort only if they accelerate the work that you and your colleagues are doing.

Donald Knuth coined a very popular quotation:

Premature optimization is the root of all evil.

While this phrase seems to be overused lately to justify incompetence and/or laziness, one thing is sure:

Do not start optimizing unless you know what kind of impact it brings.

What are you measuring?

A branch of science called metrology has been developed around this question, and it is a fundamental prerequisite for any optimization. There are two aspects of this science that you should consider when starting your optimization process.

At the core of any optimization process is something called ‘an objective function’. A few examples of different optimization problems and objective functions are:

  • A stock market investor wants to maximize amount of money gained.
  • An Olympic runner wants to minimize the time it takes him to run 100 meters distance.
  • An ecology activist wants to minimize environmental impact of industry.

First of all, you need to know what your objective function is and what unit you will be measuring. In the first example, the amount of dollars is a fair unit of profit. The same amount expressed in euros would be acceptable as well, as it is straightforward to convert between the two. In the second example we would probably use seconds as the unit of choice, but we might as well use miles per hour.

The choice of a metric for the last example is a very complex task. First of all, a good metric has to be easily quantifiable. If we cannot quantify, we usually cannot compare. How then would we quantify environmental impact? By the number of cancer deaths caused by nuclear waste? By the amount of CO2 in the atmosphere? By the number of species going extinct per year? Or maybe by a weighted sum of all of these quantities?

This brings us to the second aspect of proper measurement: the methodology of measurement and the comparability of results. Comparing the profits of two stock brokers residing in New York (with ‘dollars’ as the unit) is trivial. A similar comparison between one residing in New York and another in London (whose unit is ‘pounds’) can be done if we know the current exchange rates. Because exchange rates change every second, we might find out that, due to some financial crisis, our leading American stock broker is suddenly not worth a penny in the UK. For the second example, it would be reasonable to compare two runners participating in the same contest. Comparing a handicapped runner with a healthy one is considered unfair. Comparing a runner with an F1 car would be considered nonsense.

In the last example we cannot even translate the concept of CO2 into the number of species extinct in the last century. While we could possibly find some correlation between the two, the number of data samples for the extinctions wouldn’t be meaningful in a statistical sense. And of course we cannot (for both practical and moral reasons) repeat the extinction events.

Be careful about metrics and measurements.

Have you tried comparing this results with XYZ?

Another cliché is: ‘Do not compare apples with pears.’ Of course you can compare apples and pears, if you choose a proper feature common to both of them.

To have a valid comparison we need a proper experimental setting. To design a proper comparison experiment, we need a deep understanding of what we are trying to compare and for what reason. Say that you already have a setting that compares apples of different sorts, one that measures: the radius of an apple, the saturation of its red and/or green skin colour, or the amount of sugar that can be extracted from it. Your best customer comes in one day and says: ‘Well, we’ve been selling the best apples for the past ten years, and we are really happy with the data you’ve been giving us. We would like to broaden our market and start selling pears. Would you be so kind as to give us the same data about different varieties of those?’.
Now this gets you thinking… As an expert in apple optimization, you can say: ‘Well, it doesn’t make sense to measure the radius of a pear, as it is not spherical. We would have to develop a mathematical model of the pear’s curvature and then compare that. Pears are also rarely red, but we could adjust the saturation check for yellow and/or green. Extracting the sugar would probably stay the same?’. The excited customer then says: ‘Great! Can you also produce some kind of comparison with the apples we’ve been selling so far?’.

Now that last sentence is really confusing…

In our little example the only feature we can compare without much doubt is the sugar content. We could also compare green (or maybe even yellow) skin saturation for both types of fruit, but once we get a truly yellow pear and a truly red apple, this comparison stops making any sense. What can you do in this case? You have to start developing a new experiment. You have to pick different qualities to compare, such as acidic content, pulp hardness, juiciness, etc. If you keep treating the radius as a metric, however, then the question somebody will eventually ask is: ‘But how are you measuring the radius of a pear?’.

This kind of question is commonly asked in the scientific community, and very cleverly evaded by sales officers in industry. You have to be really careful about what you are comparing and what experimental setting you are using.

What is the maximum performance we could get?

This is a really common question, a really interesting one, really tricky to answer in one sentence, and in most cases an irrelevant one.

In rare cases the optimal solution is already known, and we could use it as a baseline. More often, we don’t know the best possible solution, but we know the state-of-the-art. Lastly, we might have no clue whatsoever about possible top performance.

If we are in the first category, it is pretty straightforward to assess the impact of further optimizations. If we are in the second group, then we can at least tell whether we have gotten better than the best-known solution. When we are in the last group, the whole science unravels.

A popular method of evaluating potential performance is to create a mathematical model of an ideal solution and perform a simulation. This is a tedious and expensive process. Think of the Olympic runners: how complex would a model of the human body have to be to give us results accurate to a 100th of a second? And we could always develop an even more precise model for this kind of evaluation.

For the majority of practical engineering it is probably unnecessary to know this limit. Why? Well, the debate goes back to Knuth: we should focus on the optimizations having the biggest impact, and try improving those first. In other cases (e.g. ultra-high-frequency trading) you are only interested in having a solution that works a fraction of a second faster than anybody else’s, so getting 0.001% faster than your competitors is enough to solve all your problems.

There is only one reason that, I think, makes knowing the maximum performance valuable: letting go and moving on. If you know the maximum you can get, and you know you are very close to that value, you can move on to another optimization problem. There is a rule in economics called ‘the law of diminishing returns’, which states:

The return on investment decreases as you increase investments in only one factor, and keep other factors constant.

In other words: if you cannot optimize further in the local context, you have to start optimizing in a broader sense. If you think about the problem of minimizing fuel consumption in airplanes, the optimizations might already be so advanced that the only viable solution is a full re-design of the machines.

Estimating maximum performance can be difficult and expensive. Spending the same resources on the optimization itself can be more useful.

What is the reference you used to evaluate this gain?

A reference value is a value that we compare against. For a given experiment, we can say that we have improved if the new result is better than the reference result; if the result is worse than the reference, we observe a performance degradation. Depending on which reference we select, we can observe an improvement of hundreds of percent, a decrease of hundreds of percent, or no difference at all! Now, this is a playground for the marketing department!

I am of the opinion that it is most important to select one reference point at the beginning of the optimization process and stick with it for the rest of the project. A common mistake is to show improvement over the previous iteration of the optimization process. This leads to performance gains jumping around over time and misleading result presentations, and it makes the overall post-optimization analysis more difficult.

An example: say you start training for the 100 m run. Your first result in the optimization process (or ‘training’) is 100 seconds. It is nowhere near the Olympic record, but that doesn’t matter at this moment: this result will let you track your future progress. Next you decide you will train for a six-month period and record a timing every month. You get the following values (mind there are 7 results, but only 6 months):

100, 40, 30, 25, 33, 22, 20

You can now calculate the improvement over the previous iteration, computed as (old − new) / new:

-, 150%, 33%, 20%, -24%, 50%, 10%

As you can see, it is difficult to say what happened, and if you don’t explain to your audience that the negative value (e.g. caused by an injury) is actually a very good result, people might get confused. The 50% improvement right after that incident looks misleading too.

If you instead express every result as a speedup against the initial reference (the reference time divided by the new time), you get the following:

-, 250%, 333%, 400%, 303%, 454%, 500%

Now that is much easier to read. It turns out that the ‘10%’ of the last improvement corresponds to a 500% speedup relative to the reference; in other words, a 5x improvement over your initial result. I don’t know about you, but for me this representation looks more satisfying and is easier to work with. You could even apply some inverse polynomial regression to figure out that in the next 6 months you could get to 13 s timings! I agree – this is not a very realistic example, but it still proves the point.

People often use the first notation instead of the second one. The reason is simple: no systematic performance evaluation over the project’s evolution. It is much easier to just take the current code repository snapshot, do the measurements, make some optimization changes, and run the measurements again. It is compelling, because otherwise the measurements would have to be carried back over past (and ancient) versions of the project, and people usually don’t want to spend time doing so.

Take measurements from the beginning of your optimization process. Always use the same reference.

What is ‘Performance’?

I would try defining performance as a quality represented by a quantitative value, measured in some experimental setting and used to compare different solutions to the same problem.

Now this sounds more scientific than I intended, but I guess it is necessary to establish some common language, so that I can use the word ‘performance’ in future posts without being exposed to linguistic struggles.

Performance only makes sense when it is used for comparison. There is no reason to talk about performance when your project doesn’t solve an actual problem. And there should be no misconception that decreasing performance is necessarily a bad thing. You can add a feature to your project and suddenly have a different evaluation setting available. Even if the measured values degrade now, you are also solving a different problem! In that sense you should start the optimization process from the beginning, discarding the old reference.

About this blog

Anyway, this entry is already too long for a blog post, so it is time to close it.

The topics I will be exploring in this blog will usually relate to the questions discussed above. Because this is intended to be a software engineering blog, I will try to provide you with practical tips on how to perform this kind of evaluation in the context of software.
However, since performance is a general topic, I will also try presenting different techniques, such as statistical tools, which can be applied in other domains as well.

It’s time to gain performance (and some feedback).