UME::SIMD Tutorials 6: Static alignment

As explained briefly in the previous tutorial, data alignment can be an important factor in performance-sensitive code. In this tutorial we will go through direct techniques that can be used for static alignment control. Static alignment is the alignment of variables, including class members, which can be deduced at compile time.

Variable alignment

Static alignment can be controlled in C++ using the alignas specifier. Let’s look at an example containing basic scalar variables only:

#include <cstdint>   // uint8_t, uint32_t, uint64_t
#include <iostream>
#include <iomanip>

void print_address(uint64_t addr) {
  std::cout << std::showbase << std::internal << std::setfill('0') <<std::hex << std::setw(4) << addr << std::dec;
}

int alignment(uint64_t addr) {
 uint64_t result = 1;
 for(uint64_t i = 2; i < 4096; i*=2) {
  if(addr % i != 0) break;
  result = i;
 }
 return result;
}

int main()
{
  uint8_t a;
  uint32_t b;
  alignas(16) uint32_t c;  // forced to a 16B boundary
  alignas(4) uint8_t d;    // forced to a 4B boundary

  std::cout << "[a]"
    << "\naddress: ";
    print_address((uint64_t)&a);
    std::cout << "\nvariable alignment: " << alignment((uint64_t)&a)
    << "\nrequired type alignment: " << alignof(decltype(a)) << "\n";
  std::cout << "\n[b]"
    << "\naddress: ";
    print_address((uint64_t)&b);
    std::cout << "\nvariable alignment: " << alignment((uint64_t)&b)
    << "\nrequired type alignment: " << alignof(decltype(b)) << "\n";
  std::cout << "\n[c]"
    << "\naddress: ";
    print_address((uint64_t)&c);
    std::cout << "\nvariable alignment: " << alignment((uint64_t)&c)
    << "\nrequired type alignment: " << alignof(decltype(c)) << "\n";

  std::cout << "\n[d]"
    << "\naddress: ";
    print_address((uint64_t)&d);
    std::cout << "\nvariable alignment: " << alignment((uint64_t)&d)
    << "\nrequired type alignment: " << alignof(decltype(d)) << "\n";
}

Running the above code gives output similar to this:

[a]
address: 0x7ffdf0452c04
variable alignment: 4
required type alignment: 1
[b]
address: 0x7ffdf0452c00
variable alignment: 1024
required type alignment: 4
[c]
address: 0x7ffdf0452c10
variable alignment: 16
required type alignment: 4
[d]
address: 0x7ffdf0452c08
variable alignment: 8
required type alignment: 1

I recommend re-running the example a couple of times to observe how the specific alignments change.

The first variable a is required to be aligned to a 1B boundary, as its natural alignment is 1B (sizeof(uint8_t)). The actual alignment is 4B, but this is OK, as it respects the 1B requirement (as well as 2B). In other runs it can be aligned to any allowed boundary.

The second variable b also respects the natural alignment, which is in this case 4B (sizeof(uint32_t)). The alignment of this variable should never be smaller than 4B.

For variables c and d we artificially force the alignment to be 16B and 4B respectively. In the first case the actual alignment ends up exactly as requested. For d we end up with a stronger alignment (8B) but, as before, this is also acceptable.

Array alignment

We can also use the alignas specifier for statically declared arrays.

#include <cstdint>   // uint8_t, uint32_t, uint64_t
#include <iostream>
#include <iomanip>

void print_address(uint64_t addr) {
  std::cout << std::showbase << std::internal << std::setfill('0') <<std::hex << std::setw(4) << addr << std::dec;
}

int alignment(uint64_t addr) {
 uint64_t result = 1;
 for(uint64_t i = 2; i < 4096; i*=2) {
  if(addr % i != 0) break;
  result = i;
 }
 return result;
}

int main()
{
  uint8_t a[10];
  uint32_t b[20];
  alignas(16) uint32_t c[7];
  alignas(4) uint8_t d[13];

  std::cout << "[a]"
    << "\naddress: ";
    print_address((uint64_t)&a);
    std::cout << "\narray alignment: " << alignment((uint64_t)&a)
    << "\narray size: " << sizeof(a)
    << "\nrequired type alignment: " << alignof(decltype(a)) << "\n";
  std::cout << "\n[b]"
    << "\naddress: ";
    print_address((uint64_t)&b);
    std::cout << "\narray alignment: " << alignment((uint64_t)&b)
    << "\narray size: " << sizeof(b)
    << "\nrequired type alignment: " << alignof(decltype(b)) << "\n";
  std::cout << "\n[c]"
    << "\naddress: ";
    print_address((uint64_t)&c);
    std::cout << "\narray alignment: " << alignment((uint64_t)&c)
    << "\narray size: " << sizeof(c)
    << "\nrequired type alignment: " << alignof(decltype(c)) << "\n";
  std::cout << "\n[d]"
    << "\naddress: ";
    print_address((uint64_t)&d);
    std::cout << "\narray alignment: " << alignment((uint64_t)&d)
    << "\narray size: " << sizeof(d)
    << "\nrequired type alignment: " << alignof(decltype(d)) << "\n";
}

Example output:

[a]
address: 0x7ffcfddbfc79
array alignment: 1
array size: 10
required type alignment: 1
[b]
address: 0x7ffcfddbfc00
array alignment: 1024
array size: 80
required type alignment: 4
[c]
address: 0x7ffcfddbfc50
array alignment: 16
array size: 28
required type alignment: 4
[d]
address: 0x7ffcfddbfc6c
array alignment: 4
array size: 13
required type alignment: 1

As defined in C++, the minimal alignment requirement of an array is the natural alignment of its elements. So the required alignment is 1B for a, the array of 8-bit integers, and 4B for b, the array of 32-bit integers. Forcing the alignment of an array only forces the alignment of its first element. In this example the alignment is forced for c and d. Note that alignof(decltype(c)) still reports 4, because alignas applied to a variable does not change the alignment of its type.

PITFALL: Mind that the rest of the elements are guaranteed to be aligned only to their natural alignment, since C++ requires the contiguous packing of plain array elements. In other words: alignment of c[0] will be at least 16, but the alignment of c[1] might be only 4.

Member alignment

We will only discuss structures, but the general conclusions also apply to classes.

Again, here is some code you can tinker with:

#include <cstdint>   // uint8_t, uint32_t, uint64_t
#include <iostream>
#include <iomanip>

void print_address(uint64_t addr) {
  std::cout << std::showbase << std::internal << std::setfill('0') <<std::hex << std::setw(4) << addr << std::dec;
}

void print_raw(uint8_t *ptr, int length) {
  for(int i = 0; i < length; i++) {
    std::cout << std::showbase << std::internal << std::setfill('0') <<std::hex << std::setw(4) << (uint32_t)ptr[i] << " " ;
  }
  std::cout << std::dec << std::endl;
}

int alignment(uint64_t addr) {
 uint64_t result = 1;  // an odd address is only 1B aligned
 for(uint64_t i = 2; i < 4096; i*=2) {
  if(addr % i != 0) break;
  result = i;
 }
 return result;
}

struct S_unaligned{
 uint8_t a;
 uint8_t b;

 // We initialize the data structure with some data
 // so that the output is more readable
 S_unaligned() : a(3), b(7) {}
};

struct S_aligned_a{
 alignas(16) uint8_t a;
 uint8_t b;

 S_aligned_a(): a(3), b(7) {}
};

struct S_aligned_b{
 uint8_t a;
 alignas(16) uint8_t b;

 S_aligned_b(): a(3), b(7) {}
};

int main()
{
  S_unaligned x0;

  std::cout << "[S_unaligned] size: " << sizeof(x0)
    << "\naddress: " << &x0
    << "\nrequired type alignment: " << alignof(S_unaligned)
    << "\nobject alignment: " << alignment((uint64_t)&x0)
    << "\ndata:";
  print_raw((uint8_t*) &x0, sizeof(x0));

  alignas(128) S_unaligned x0_a;
  std::cout << "\n[aligned S_unaligned] size: " << sizeof(x0_a)
    << "\naddress: " << &x0_a
    << "\nrequired type alignment: " << alignof(S_unaligned)
    << "\nobject alignment: " << alignment((uint64_t)&x0_a)
    << "\ndata:";
  print_raw((uint8_t*) &x0_a, sizeof(x0_a));

  S_aligned_a x1;
  std::cout << "\n[S_aligned_a] size: " << sizeof(x1)
    << "\naddress: " << &x1
    << "\nrequired type alignment: " << alignof(S_aligned_a)
    << "\nobject alignment: " << alignment((uint64_t)&x1)
    << "\ndata:";
  print_raw((uint8_t*) &x1, sizeof(x1));

  alignas(128) S_aligned_a x1_a;
  std::cout << "\n[aligned S_aligned_a] size: " << sizeof(x1_a)
    << "\naddress: " << &x1_a
    << "\nrequired type alignment: " << alignof(S_aligned_a)
    << "\nobject alignment: " << alignment((uint64_t)&x1_a)
    << "\ndata:";
  print_raw((uint8_t*) &x1_a, sizeof(x1_a));  

  S_aligned_b x2;
  std::cout << "\n[S_aligned_b] size: " << sizeof(x2)
    << "\naddress: " << &x2
    << "\nrequired type alignment: " << alignof(S_aligned_b)
    << "\nobject alignment: " << alignment((uint64_t)&x2)
    << "\ndata:";
  print_raw((uint8_t*) &x2, sizeof(x2));

  alignas(128) S_aligned_b x2_a;
  std::cout << "\n[aligned S_aligned_b] size: " << sizeof(x2_a)
    << "\naddress: " << &x2_a
    << "\nrequired type alignment: " << alignof(S_aligned_b)
    << "\nobject alignment: " << alignment((uint64_t)&x2_a)
    << "\ndata:";
  print_raw((uint8_t*) &x2_a, sizeof(x2_a));
}

We will now discuss a few selected structures.

Structure #1: No members aligned

For the S_unaligned x0 the output is:

[S_unaligned] size: 2
address: 0x7fff2c36ca78
required type alignment: 1
object alignment: 8
data:0x03 0x07

A few things to point out here. First of all, the size of the S_unaligned structure is 2 bytes. This does not mean that its alignment requirement is 2B as well. The structure has two fields, each of them 1B long (natural alignment equal to 1B), and in C++ a structure has to be aligned at least as strictly as its most strictly aligned member. Here that is 1B, as indicated by:

required type alignment: 1

While the structure itself has to be aligned to 1B boundary, in this particular case it is aligned to 8B boundary:

object alignment: 8

Again, this is OK, as a higher alignment also satisfies all lower alignment requirements, i.e. an 8B aligned structure is also 4B, 2B and 1B aligned.

When we align an object of this structure (x0_a, declared with alignas(128)), we obtain something like:

[aligned S_unaligned] size: 2
address: 0x7ffe25de5080
required type alignment: 1
object alignment: 128
data:0x03 0x07

Nothing changed, except for the minimal value the object alignment can take.

Structure #2: only the first member aligned

Now let’s modify the structure to have the a member aligned to some boundary. In this particular example, we are aligning the first field to 16B boundary.

  struct S_aligned_a{
   alignas(16) uint8_t a;
   uint8_t b;

   S_aligned_a(): a(3), b(7) {}
  };

The output for a specific execution is:

[S_aligned_a] size: 16
address: 0x7ffc5b06a0f0
required type alignment: 16
object alignment: 16
data:0x03 0x07 0xea 0xe9 0x21 0x7f 0000 0000 0x90 0x77 0xea 0xe9 0x21 0x7f 0000 0000

A few interesting things happened here. First of all, the size of the structure immediately got larger. The size is now 16B, which is 8 times larger than the initial one! But we can also see from data: 0x03 0x07 ... that a and b are still in the same place within the structure. What is happening? Let’s imagine an example with pointer arithmetic:

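S_aligned_a data[10];
S_aligned_a* ptr = &data[0]; // start at the first element

for (int i = 0; i < 10; i++) {
 // cast so that the uint8_t member prints as a number, not a character
 std::cout << (int)ptr->a << std::endl;
 ptr++;
}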

Due to how pointer arithmetic works, when incrementing the pointer, as in the loop using ptr, the address stored in the pointer has to be increased by the size of the pointer’s base type. If the size were 2B, as for the original structure, the pointer would be increased by 2B each time. This means that we would end up accessing structures of S_aligned_a type at the following addresses:


0x7ffc5b06a0f0,
0x7ffc5b06a0f2,
0x7ffc5b06a0f4,
0x7ffc5b06a0f6,
0x7ffc5b06a0f8,
0x7ffc5b06a0fa,
0x7ffc5b06a0fc,
0x7ffc5b06a0fe,
0x7ffc5b06a100,
...

Out of all the above addresses, only 0x7ffc5b06a0f0 and 0x7ffc5b06a100 are aligned to a 16B boundary! This means that a would not be aligned for data[1], data[2], …, data[7], data[9] etc.
Now the obvious solution is to add padding at the end of each structure object, so that the next structure, starting directly after the previous one, is automatically aligned to the required boundary. This is exactly how it is done. Unfortunately in some specific cases, such as the one presented, this can grow the required size enormously.

As in the previous case, aligning the whole object doesn’t change much:

[aligned S_aligned_a] size: 16
address: 0x7ffe25de5000
required type alignment: 16
object alignment: 2048
data:0x03 0x07 0x60 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000

Structure #3: only the second member aligned

Now let’s try defining another structure, this time with only b aligned:

struct S_aligned_b{
 uint8_t a;
 alignas(16) uint8_t b;

 S_aligned_b(): a(3), b(7) {}
};

A particular test execution gives us:


[S_aligned_b] size: 32
address: 0x7ffc5b06a160
required type alignment: 16
object alignment: 32
data:0x03 0000 0000 0000 0x10 0000 0000 0000 0x01 0000 0000 0000 0000 0000 0000 0000 0x07 0x19 0x60 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000

What?! Yet another size increase!? C++ does not allow the compiler to reorder these members (among other things, this could cause serialization problems), so the only thing it can do is insert unused padding between a and b. Let’s think for a bit about why this is necessary.

The a member is aligned to a 1B boundary, that is, to its natural alignment. Let’s assume that an object of the S_aligned_b structure is aligned to a 16B boundary. a would then be aligned to 16B, but the next 15 bytes would at best be aligned to 1, 2, 4 or 8 byte boundaries. This means that the first unoccupied address after a that is aligned to a 16B boundary is at offset 16 from the beginning of the structure. Hence the space between a and b has to be padded with 15 unused bytes, so that the alignment requirement for b is met. b then occupies offset 16, and the structure is padded at the end up to the next multiple of 16, which gives the 32B total.

We could use the same reasoning to explain why the required type alignment is 16: it has to be, to guarantee that b is properly aligned.

Once again, aligning the object itself does not have a drastic impact:

[aligned S_aligned_b] size: 32
address: 0x7ffe25de4f80
required type alignment: 16
object alignment: 128
data:0x03 0x8e 0x9f 0xa2 0x01 0x7f 0000 0000 0x90 0x17 0x9f 0xa2 0x01 0x7f 0000 0000 0x07 0xc0 0x66 0xa2 0x01 0x7f 0000 0000 0x80 0x7c 0x9f 0xa2 0x01 0x7f 0000 0000

Structure packing

As shown before, the size of the structure can grow surprisingly quickly. Can we fight it, and how? We can reorder the members of a class/struct so that they are packed more tightly. The following code shows this:

#include <cstdint>   // uint8_t, uint16_t, uint32_t, uint64_t
#include <iostream>
#include <iomanip>

void print_address(uint64_t addr) {
  std::cout << std::showbase << std::internal << std::setfill('0') <<std::hex << std::setw(4) << addr << std::dec;
}

void print_raw(uint8_t *ptr, int length) {
  for(int i = 0; i < length; i++) {
    std::cout << std::showbase << std::internal << std::setfill('0') <<std::hex << std::setw(4) << (uint32_t)ptr[i] << " " ;
  }
  std::cout << std::dec << std::endl;
}

int alignment(uint64_t addr) {
 uint64_t result = 1;  // an odd address is only 1B aligned
 for(uint64_t i = 2; i < 4096; i*=2) {
  if(addr % i != 0) break;
  result = i;
 }
 return result;
}

struct S{
 uint8_t a;
 uint16_t b;
 uint32_t c;
 uint64_t d;
 alignas(256) uint64_t e;

 S(): a(3), b(7), c(9), d(1), e(6) {}
};

struct S_packed{
 alignas(256) uint64_t e;
 uint64_t d;
 uint32_t c;
 uint16_t b;
 uint8_t a;

 S_packed(): a(3), b(7), c(9), d(1), e(6) {}
};

struct S_packed_noalign{
 uint64_t e;
 uint64_t d;
 uint32_t c;
 uint16_t b;
 uint8_t a;

 S_packed_noalign(): a(3), b(7), c(9), d(1), e(6) {}
};


int main()
{
  S x0;

  std::cout << "[S] size: " << sizeof(x0) 
    << "\naddress: " << &x0
    << "\nrequired type alignment: " << alignof(S)
    << "\nobject alignment: " << alignment((uint64_t)&x0)
    << "\ndata:";
  print_raw((uint8_t*) &x0, sizeof(x0));
  
  S_packed x1;
  
  std::cout << "[S_packed] size: " << sizeof(x1) 
    << "\naddress: " << &x1
    << "\nrequired type alignment: " << alignof(S_packed)
    << "\nobject alignment: " << alignment((uint64_t)&x1)
    << "\ndata:";
  print_raw((uint8_t*) &x1, sizeof(x1));
  
  S_packed_noalign x2;
  
  std::cout << "[S_packed_noalign] size: " << sizeof(x2) 
    << "\naddress: " << &x2
    << "\nrequired type alignment: " << alignof(S_packed_noalign)
    << "\nobject alignment: " << alignment((uint64_t)&x2)
    << "\ndata:";
  print_raw((uint8_t*) &x2, sizeof(x2));

}

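The output (the bytes that matter for us are the initialized member values 0x03, 0x07, 0x09, 0x01 and 0x06; everything else is padding holding leftover stack contents):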

[S] size: 512
address: 0x7fff1beb2b00
required type alignment: 256
object alignment: 256
data:0x03 0x34 0x07 0000 0x09 0000 0000 0000 0x01 0000 0000 0000 0000 0000 0000 0000 0x46 0x1d 0x41 0x11 0x6a 0x7f 0000 0000 0x58 0xb7 0xbc 0x10 0x6a 0x7f 0000 0000 0x98 0x66 0x40 0x11 0x6a 0x7f 0000 0000 0000 0000 0000 0000 0x01 0000 0000 0000 0xec 0x07 0000 0000 0x01 0000 0000 0000 0x25 0x0c 0x78 0x11 0x6a 0x7f 0000 0000 0x48 0xec 0xbb 0x10 0x6a 0x7f 0000 0000 0xa8 0x45 0x99 0x11 0x6a 0x7f 0000 0000 0xc0 0x2c 0xeb 0x1b 0xff 0x7f 0000 0000 0x16 0x04 0x78 0x11 0x6a 0x7f 0000 0000 0x50 0x42 0x99 0x11 0x6a 0x7f 0000 0000 0x10 0x2c 0xeb 0x1b 0xff 0x7f 0000 0000 0x5e 0x96 0x93 0x1c 0000 0000 0000 0000 0x10 0x2c 0xeb 0x1b 0xff 0x7f 0000 0000 0x07 0000 0000 0000 0000 0000 0000 0000 0xb8 0x4f 0x96 0x11 0x6a 0x7f 0000 0000 0xda 0x16 0x87 0x30 0000 0000 0000 0000 0x25 0x0c 0x78 0x11 0x6a 0x7f 0000 0000 0x07 0000 0000 0000 0000 0000 0000 0000 0x5b 0x1c 0xc2 0000 0000 0000 0000 0000 0x1a 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0x07 0000 0000 0000 0000 0000 0000 0000 0x90 0x2d 0xeb 0x1b 0xff 0x7f 0000 0000 0xa8 0x06 0x3f 0x11 0x6a 0x7f 0000 0000 0xb0 0xc3 0x40 0x11 0x6a 0x7f 0000 0000 0x70 0x2c 0xeb 0x1b 0xff 0x7f 0000 0000 0x16 0x04 0x78 0x11 0x6a 0x7f 0000 0000 0xa8 0x06 0x3f 0x11 0x6a 0x7f 0000 0000 0xa0 0x2c 0xeb 0x1b 0xff 0x7f 0000 0000 0x06 0000 0000 0000 0000 0000 0000 0000 0xa0 0x2c 0xeb 0x1b 0xff 0x7f 0000 0000 0x07 0000 0000 0000 0000 0000 0000 0000 0xb8 0x4f 0x96 0x11 0x6a 0x7f 0000 0000 0xed 0xe9 0x43 0x2b 0000 0000 0000 0000 0x62 0x0d 0x78 0x11 0x6a 0x7f 0000 0000 0x62 0x85 0x43 0x11 0x6a 0x7f 0000 0000 0xa7 0x0f 0xad 0000 0000 0000 0000 0000 0x2d 0000 0000 0000 0x6a 0x7f 0000 0000 0xb8 0xd6 0xbb 0x10 0x6a 0x7f 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0x20 0x2e 0xeb 0x1b 0xff 0x7f 0000 0000 0x48 0xec 0xbb 0x10 0x6a 0x7f 0000 0000 0xf8 0x35 0xbc 0x10 0x6a 0x7f 0000 0000 0xe0 0x2d 0xeb 0x1b 0xff 0x7f 0000 0000 0x08 0x2e 0xeb 0x1b 0xff 0x7f 0000 0000 0x50 0x42 0x99 0x11 0x6a 0x7f 0000 0000 0x20 0x31 0x96 0x11 0x6a 0x7f 0000 
0000 0xda 0x16 0x87 0x30 0000 0000 0000 0000 0xea 0x0f 0x78 0x11 0x6a 0x7f 0000 0000 0x88 0x91 0x99 0x11 0x6a 0x7f 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0x60 0x30 0x96 0x11 0x6a 0x7f 0000 0000 0xf8 0x44 0x96 0x11 0x6a 0x7f 0000 0000 0xe7 0x08 0x40 0000 0000 0000 0000 0000 0x58 0xb7 0xbc 0x10 0x6a 0x7f 0000 0000 0x10 0x04 0x40 0000 0000 0000 0000 0000 0000 0000 0000 0000 0x01 0000 0000 0000 0x12 0x03 0000 0000 0x01 0000 0000 0000 0x50 0x42 0x99 0x11 0x6a 0x7f 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0xe0 0x94 0x99 0x11 0x6a 0x7f 0000 0000
[S_packed] size: 256
address: 0x7fff1beb2d00
required type alignment: 256
object alignment: 256
data:0x06 0000 0000 0000 0000 0000 0000 0000 0x01 0000 0000 0000 0000 0000 0000 0000 0x09 0000 0000 0000 0x07 0000 0x03 0000 0x60 0x30 0x96 0x11 0x6a 0x7f 0000 0000 0xed 0xe9 0x43 0x2b 0000 0000 0000 0000 0xea 0x0f 0x78 0x11 0x6a 0x7f 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0x60 0x30 0x96 0x11 0x6a 0x7f 0000 0000 0x01 0000 0000 0000 0xff 0x7f 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0x01 0000 0000 0000 0xff 0x7f 0000 0000 0x88 0x91 0x99 0x11 0x6a 0x7f 0000 0000 0x3a 0x02 0000 0000 0000 0000 0000 0000 0x64 0x4c 0x41 0x11 0x6a 0x7f 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0xd5 0xb9 0x78 0x11 0x6a 0x7f 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0xe0 0x94 0x99 0x11 0x6a 0x7f 0000 0000 0x20 0x2e 0xeb 0x1b 0xff 0x7f 0000 0000 0x50 0x42 0x99 0x11 0x6a 0x7f 0000 0000 0x38 0x2e 0xeb 0x1b 0xff 0x7f 0000 0000 0x85 0x4d 0x02 0x04 0x01 0000 0000 0000 0x50 0xea 0x40 0x11 0x6a 0x7f 0000 0000 0xe7 0x08 0x40 0000 0000 0000 0000 0000 0x48 0x96 0x38 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0x94 0x5e 0x77 0x11 0x6a 0x7f 0000 0000 0x90 0xe7 0x76 0x11 0x6a 0x7f 0000 0000 0000 0x90 0x3e 0x11 0x6a 0x7f 0000 0000 0x80 0x4c 0x77 0x11 0x6a 0x7f 0000 0000 0x18 0xf2 0x76 0x11 0x6a 0x7f 0000 0000 0x50 0x95 0x76 0x11 0x6a 0x7f 0000 0000
[S_packed_noalign] size: 24
address: 0x7fff1beb2e00
required type alignment: 8
object alignment: 512
data:0x06 0000 0000 0000 0000 0000 0000 0000 0x01 0000 0000 0000 0000 0000 0000 0000 0x09 0000 0000 0000 0x07 0000 0x03 0000

The general rule for reordering is to calculate max(required_alignment(member), sizeof(member)) for each member and declare the members with higher values first. In this particular case, where one of the fields requires a very high alignment, this packing technique halved the size of the structure.
Also, if we can relax the alignment requirement, as in the structure S_packed_noalign, then the amount of space required will be even smaller. We might however run into problems with efficient register LOAD/STORE operations if the alignment criteria are not met.

PITFALL: If the structures are nested, modifications made in components might change their size/alignment requirements. Therefore the enclosing structures might need to be rearranged as well. A common, nasty scenario is when the alignment problems appear after a remote commit, not directly related to the affected structure. In such scenarios, seemingly innocent changes can drastically decrease the performance of an application.

Summary

We discussed the topic of static alignment. A few performance hints to remember:

  1. The alignment requirement can be controlled using the alignas specifier.
  2. The alignment requirement of a type (its natural alignment, unless overridden) can be obtained using the alignof operator.
  3. Alignment of an array is set only for the first element. Therefore consecutive elements might not respect the necessary alignment constraint.
  4. When aligning members of structures the size of the structure might increase substantially due to padding introduced by the compiler.
  5. Structures can be packed densely using manual reordering of members, so that the resulting size is decreased.

I don’t think that the topic of static alignment is exhausted here, but the post already got eXtra Large. So let me know if something requires a more thorough explanation.

In the next post I will discuss how alignment can be controlled in a dynamic way.
