In the previous post we discussed how to control the alignment of statically declared variables, and struct/class members. In this post we will look at ways of performing dynamic allocations with alignment control.
The C++ way
Unfortunately, so far there is no convenient, standard way to perform aligned allocations. Current C++ requires the user to overallocate and then use only the part of the buffer that starts at an aligned offset. This can be done using the std::align function.
Example code:
#include <cstdint>
#include <cstdlib>
#include <iomanip>
#include <iostream>
#include <memory>

void print_address(uint64_t addr) {
    std::cout << std::showbase << std::internal << std::setfill('0')
              << std::hex << std::setw(4) << addr << std::dec;
}

// Returns the largest power of two (up to 4096) that divides 'addr'.
int alignment(uint64_t addr) {
    uint64_t result = 1;
    for (uint64_t i = 2; i <= 4096; i *= 2) {
        if (addr % i != 0) break;
        result = i;
    }
    return result;
}

int main() {
    const int ALIGNMENT = 64; // align to 64B (512b) boundary
    int ARRAY_LENGTH = 5;     // size in number of array fields
    void *original_ptr;       // this has to be 'void*' so that we could use it with std::align
    void *aligned_ptr;

    std::size_t SIZE = ARRAY_LENGTH * sizeof(float);
    std::size_t OVERALLOCATED_SIZE // overallocated size in bytes
        = (SIZE + ALIGNMENT - sizeof(float));

    original_ptr = malloc(OVERALLOCATED_SIZE);

    std::cout << "[original_ptr]"
              << "\naddress: ";
    print_address((uint64_t)original_ptr);
    std::cout << "\nvariable alignment: " << alignment((uint64_t)original_ptr)
              << "\nrequired type alignment: " << alignof(decltype(*(float*)original_ptr))
              << "\nrequired ptr alignment: " << alignof(decltype(original_ptr))
              << "\n";

    // We would like to retain the 'original_ptr'
    // to be able to free the buffer in the future
    aligned_ptr = original_ptr;
    aligned_ptr = std::align(
        ALIGNMENT,
        sizeof(float),
        aligned_ptr,
        OVERALLOCATED_SIZE);

    if (aligned_ptr == nullptr) {
        std::cout << "std::align failed: no sufficient space in the buffer.\n";
        free(original_ptr);
        return 1;
    }

    std::cout << "[aligned_ptr]"
              << "\naddress: ";
    print_address((uint64_t)aligned_ptr);
    std::cout << "\nvariable alignment: " << alignment((uint64_t)aligned_ptr)
              << "\nrequired type alignment: " << alignof(decltype(*(float*)aligned_ptr))
              << "\nrequired ptr alignment: " << alignof(decltype(aligned_ptr))
              << "\n";

    free(original_ptr);
}
Example output:
[original_ptr]
address: 0x1109c20
variable alignment: 32
required type alignment: 4
required ptr alignment: 8
[aligned_ptr]
address: 0x1109c40
variable alignment: 64
required type alignment: 4
required ptr alignment: 8
What the code does is:
- it calculates the overallocation size (OVERALLOCATED_SIZE), taking into account additional padding,
- allocates an oversized buffer using a standard allocation method,
- creates a new pointer to point at properly aligned location within the buffer,
- uses the allocated data to do something (possibly useful),
- frees the data using the original pointer.
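The steps above can be bundled into a small helper class. This is only a sketch: AlignedBuffer is a hypothetical name, not part of the standard library or of UME, and it pads by a conservative ALIGNMENT - 1 bytes instead of the tighter value used in the example, so that it does not depend on the element type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>

// Hypothetical helper (an assumption for illustration): owns an overallocated
// malloc buffer and exposes an aligned pointer into it.
class AlignedBuffer {
public:
    AlignedBuffer(std::size_t size, std::size_t alignment)
        : original_(nullptr), aligned_(nullptr) {
        // std::align requires a power-of-two alignment.
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

        // Step 1: overallocation size; alignment - 1 extra bytes always suffice.
        std::size_t space = size + alignment - 1;

        // Step 2: allocate the oversized buffer with a standard method.
        original_ = std::malloc(space);
        if (original_ == nullptr) return;

        // Step 3: find the aligned location. std::align updates the pointer
        // it is given, so it works on a copy of the original pointer.
        void *p = original_;
        aligned_ = std::align(alignment, size, p, space);
    }

    // Step 5: free the data using the original pointer.
    ~AlignedBuffer() { std::free(original_); }

    AlignedBuffer(const AlignedBuffer&) = delete;
    AlignedBuffer& operator=(const AlignedBuffer&) = delete;

    // Step 4: the caller uses the aligned data through this pointer.
    // Returns nullptr if the allocation failed.
    void *get() const { return aligned_; }

private:
    void *original_; // what malloc returned, needed for free
    void *aligned_;  // aligned location inside the buffer
};
```

With this sketch, `AlignedBuffer buf(5 * sizeof(float), 64);` followed by `float *data = static_cast<float*>(buf.get());` gives a 64B-aligned array of five floats that is released when buf goes out of scope.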
Doing all of this by hand is troublesome. First of all, we need to calculate the size of the new buffer manually. This is not difficult, but it is a potential source of errors, and it becomes tedious when aligned allocations are frequent in the code. The allocated size can be calculated from the following formula:
std::size_t OVERALLOCATED_SIZE = (SIZE + ALIGNMENT - sizeof(float));
A small diagram illustrates why this formula works:
In the worst case, malloc returns an address that lies just after a 64B-aligned address (4 bytes past it, assuming the returned address is at least float-aligned). This means that the first element has to be shifted 60B to the right (ALIGNMENT - sizeof(float)). The last element has to be shifted as well, so the span between the original location of the first element and the end of the last element at its new location is 60 + 5*sizeof(float), that is 80 bytes. This is exactly how much space we need to allocate.
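The same arithmetic can be written as a compile-time check. This is a minimal sketch: overallocated_size and its guaranteed parameter are hypothetical names introduced here, and the numbers assume a 4-byte float, as in the example.

```cpp
#include <cstddef>

// Hypothetical helper (an assumption for illustration): worst-case buffer
// size needed to place 'size' bytes at an 'alignment'-byte boundary, when the
// allocator is assumed to return addresses that are multiples of 'guaranteed'.
constexpr std::size_t overallocated_size(std::size_t size,
                                         std::size_t alignment,
                                         std::size_t guaranteed) {
    // Worst-case shift is (alignment - guaranteed); the payload follows it.
    return size + alignment - guaranteed;
}

// The case from the text: 5 floats of 4 bytes each, 64B alignment.
// Worst-case shift: 64 - 4 = 60 bytes; payload: 5 * 4 = 20 bytes.
static_assert(overallocated_size(20, 64, 4) == 80,
              "60 bytes of padding plus 20 bytes of data");
```

If nothing is known about the alignment of the address returned by the allocator, passing 1 for guaranteed gives the conservative SIZE + ALIGNMENT - 1.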
Secondly, we have to keep track of the original location of the memory buffer. This requires us to manage two pointers: one for the original buffer, and one for the aligned location. Even then it is easy to fall into the pitfall of passing the original pointer directly to std::align: the function takes the pointer by reference and may modify it, after which the buffer can no longer be freed correctly. We can avoid this by copying the original pointer into a new pointer and passing the copy instead:
aligned_ptr = original_ptr;
aligned_ptr = std::align(
    ALIGNMENT,
    sizeof(float),
    aligned_ptr,
    OVERALLOCATED_SIZE);
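To see what std::align actually changes, the following sketch calls it on local copies and reports the values afterwards. AlignResult and align_in_buffer are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical record of what std::align did to its by-reference arguments.
struct AlignResult {
    void *returned;          // return value of std::align
    void *ptr_after;         // value of the pointer argument after the call
    std::size_t space_after; // value of the space argument after the call
};

// Calls std::align on copies of 'buffer' and 'space', so the caller's own
// variables stay untouched, and returns the modified copies for inspection.
AlignResult align_in_buffer(void *buffer, std::size_t space,
                            std::size_t alignment, std::size_t size) {
    // std::align requires a power-of-two alignment.
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    void *ptr = buffer; // copy: std::align may move this forward
    void *result = std::align(alignment, size, ptr, space);
    return AlignResult{result, ptr, space};
}
```

On success, the pointer argument is moved to the aligned address and the space argument is reduced by the number of bytes skipped. On failure, std::align returns nullptr and leaves both arguments unchanged. Either way, the pointer originally returned by malloc has to be kept separately for the call to free.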
Dynamic allocation using UME
UME offers a set of portable wrappers for aligned allocation. You can access them through the UME::DynamicMemory::AlignedMalloc() and UME::DynamicMemory::AlignedFree() functions. The behaviour is almost identical to the standard malloc call, with the difference that an additional ALIGNMENT parameter has to be passed. Here’s a code example:
#include <cstdint>
#include <iomanip>
#include <iostream>
#include <umesimd/UMESimd.h>

void print_address(uint64_t addr) {
    std::cout << std::showbase << std::internal << std::setfill('0')
              << std::hex << std::setw(4) << addr << std::dec;
}

// Returns the largest power of two (up to 4096) that divides 'addr'.
int alignment(uint64_t addr) {
    uint64_t result = 1;
    for (uint64_t i = 2; i <= 4096; i *= 2) {
        if (addr % i != 0) break;
        result = i;
    }
    return result;
}

int main() {
    const int ALIGNMENT = 128; // align to 128B boundary
    int ARRAY_LENGTH = 5;      // size in number of array fields
    std::size_t SIZE = ARRAY_LENGTH * sizeof(float);

    float *ptr = (float*) UME::DynamicMemory::AlignedMalloc(SIZE, ALIGNMENT);

    std::cout << "[ptr]"
              << "\naddress: ";
    print_address((uint64_t)ptr);
    std::cout << "\nvariable alignment: " << alignment((uint64_t)ptr)
              << "\nrequired type alignment: " << alignof(decltype(*ptr))
              << "\nrequired ptr alignment: " << alignof(decltype(ptr))
              << "\n";

    UME::DynamicMemory::AlignedFree(ptr);
}
Nothing more, nothing less. No additional calculations required, no additional pointers to track, simple syntax.