Custom C++20 Memory Allocators for STL Containers

Custom C++ Memory Allocators and using them with STL containers

If you're making a game engine, or really any memory-intensive application, it's a good idea to use a custom memory allocator. In this article, I'll go over why you want one, how to build them in C++, and how to use them with the standard STL containers like std::vector and std::unordered_map.

Why write this? Because while there's plenty of information about custom allocators out there, I found it hard to find solid C++ code, and even harder to find anything about making those allocators compatible with the STL containers. So I'm putting everything I've learned in one place.

Why do we use custom memory allocators?

Performance. The slowest part of modern computer hardware is not the processor, it's the memory bus. Put another way, it's how quickly memory can be retrieved and then travel down the wire to the processor. Nowadays your gaming machine is using DDR4 or DDR5 RAM, which is really fast stuff, but we can always be faster. That's why you may have heard of things on your CPU like the L1 cache and the shared L3 cache.

These "caches" are all just efforts to keep the data the processor is likely to need next physically close to the processor's circuitry. You can optimise your code to make the most of these caches, for example by choosing the order in which you loop through a multi-dimensional array.
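As a quick illustration of that loop-order point (the array size here is arbitrary, just for the example): iterating row by row walks memory in the order C++ lays it out, while iterating column by column jumps around and defeats the cache.

#include <array>
#include <cstddef>

// Cache-friendly: visits elements in the order they sit in memory (row-major).
float SumRowMajor(const std::array<std::array<float, 512>, 512>& grid)
{
    float total = 0.0f;
    for(std::size_t row = 0; row < 512; ++row)
        for(std::size_t col = 0; col < 512; ++col)
            total += grid[row][col];
    return total;
}

// Cache-unfriendly: every access jumps a whole row (512 floats) ahead.
float SumColumnMajor(const std::array<std::array<float, 512>, 512>& grid)
{
    float total = 0.0f;
    for(std::size_t col = 0; col < 512; ++col)
        for(std::size_t row = 0; row < 512; ++row)
            total += grid[row][col];
    return total;
}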

But that's NOT what a custom memory allocator will do. Caches are a topic for a different article. What a custom allocator will do is make fewer calls to acquire new memory.

For example, let’s talk about STL Containers. std::vector is wonderful, isn’t it? We use it all the time. As a contiguous block of memory, we can use it for all sorts of things. Let’s say I have the following:

#include <vector>
#include <cstdint>

int main()
{
    std::vector<std::uint16_t> myData({10u, 24u, 4859u});

    return 0;
}

How much memory has my myData vector used? Well, we've got three 16-bit integers, so maybe 48 bits (6 bytes)? That's almost correct. The vector itself just holds a pointer to where the elements actually are, somewhere on the heap. That pointer uses up a tiny amount of memory (so small we don't care about it, and it lives on the stack where it belongs), but it points to a memory location where at least 48 bits are reserved for this vector.

I say "at least" because it's implementation-specific. Clang's version of std::vector might do something different to MinGW's.

So, somewhere in memory, we’ve got about 48-bits used by this vector. Here’s a diagram showing the heap memory available to our program and what happens when we create the above vector.

Heap allocation after creating the vector

Now, what happens when I add another element?

...
myData.push_back(837u);
...
New element added

That shouldn't be a surprise: there are now 2 more bytes allocated to the vector. Everything has shifted because the vector has said, "oh hold on, I need to find 8 bytes now, not 6". So it has to request 8 bytes of contiguous memory from the operating system. This is a costly operation, because the OS now needs to find 8 contiguous bytes somewhere and hand them to the vector (and on top of that, the vector then needs to copy its values into the new allocation and destruct the old ones).

Previously, after the 6-byte allocation, there was a contiguous block of 10 bytes remaining; we could have created another vector holding five 16-bit integers. But now, after the request for a new block of memory, we can't fit a 10-byte vector anywhere. Instead we can only fit a 6-byte and a 2-byte one.
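Incidentally, you can watch these reallocations happen for yourself. A small sketch (the element values are just for illustration): whenever data() changes between push_back calls, the vector has requested a new block and copied everything across.

#include <cstdio>
#include <cstdint>
#include <vector>

int main()
{
    std::vector<std::uint16_t> myData({10u, 24u, 4859u});

    for(std::uint16_t value : {837u, 99u, 1024u, 7u})
    {
        myData.push_back(value);

        // capacity() is how many elements fit in the current heap block,
        // data() is where that block lives.
        std::printf("size=%zu capacity=%zu data=%p\n",
                    myData.size(), myData.capacity(),
                    static_cast<void*>(myData.data()));
    }

    return 0;
}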

Fragmentation – when lots of things are being added and removed, the free pieces of memory get chopped up and shuffled around. Eventually, when some container requests n bytes, there may not be n contiguous bytes free anywhere, and the allocation fails (std::bad_alloc) even though there's plenty of memory free in total.

Ignoring this fragmentation for now, each of these allocations costs time. What our custom allocators allow us to do is request a big chunk of contiguous memory up front, and then every time a container requests n bytes, we dish out some of the memory we've already allocated. We do this with STL containers by making our custom allocators compatible with the interface std::allocator_traits expects. Now the operating system doesn't need to spend precious cycles finding free memory while our program is running; it only needed to do that once, when our program started.

The second reason to use custom allocators is control and monitoring. With a custom allocation architecture, you know exactly how much memory your game is using, where it's being used, and when you're getting close to running out. We can see how fragmented our memory is, and we can even take steps to defragment it if needed. It's also just fun to manage memory yourself and explore all the different ways of handling it (that's the control part).

How much of a performance increase does a custom memory allocator provide?

I've got a test suite, using the custom allocator patterns I'll show you, that compares the out-of-the-box STL container allocators against the custom ones. The containers themselves don't change; only the allocator does. The performance differences are quite extreme. These numbers are with -O3 set too, by the way, so the standard library is free to be as fast as possible. Each test is run 100 times and the times averaged.

Scenario 1: std::vector of 2000 elements, each individually emplaced. Element uses heap memory internally (object with pointers).

std::vector<Element> a;
for(int i = 0; i < 2000; ++i)
{
    a.emplace_back(Element(i));
}

Operating on the data involves invoking a method that does something with the instance's internal pointers. The vector is then allowed to fall out of scope and destruct.

| Metric   | STL Allocator Time | Custom Allocator Time | % Decrease |
|----------|--------------------|-----------------------|------------|
| Creation | 411 microseconds   | 76 microseconds       | 81%        |
| Operate  | negligible         | negligible            | negligible |
| Deletion | 70 microseconds    | 16 microseconds       | 77%        |

Scenario 2: same as above, but an std::list.

| Metric   | STL Allocator Time | Custom Allocator Time | % Decrease |
|----------|--------------------|-----------------------|------------|
| Creation | 1076 microseconds  | 33 microseconds       | 96%        |
| Operate  | 61 microseconds    | 9 microseconds        | 85%        |
| Deletion | 276 microseconds   | 13 microseconds       | 95%        |

Scenario 3: same as above, but an std::unordered_map.

| Metric   | STL Allocator Time | Custom Allocator Time | % Decrease |
|----------|--------------------|-----------------------|------------|
| Creation | 433 microseconds   | 191 microseconds      | 55%        |
| Operate  | 15 microseconds    | 16 microseconds       | 0%         |
| Deletion | 134 microseconds   | 45 microseconds       | 66%        |

Results using STL containers with a custom allocator

These are simple tests, and I haven’t explained exactly what kind of allocator I was using in each container, but there are obviously big speed benefits to this.

Allocation Types

Next, let’s cover two different kinds of allocator you might want to create. Each of these will have a diagram illustrating a few different actions that could be taken.

There are more allocation patterns than just these ones, but I’ve found that these cover most use cases I’ve come up with. Just these two are enough for me to get the above speed gains.

Fixed Linear Allocator

This allocator is the simplest. It has a chunk of memory, and when an allocation request comes in, it provides the next free chunk. When deallocation requests come in, it does nothing. The allocations are linear: they go in one direction only. But! You can always rewind back to a previous allocation; you just can't deallocate/free memory in the middle.

In the diagram below, the following is taking place:

  1. A 3-byte allocation is requested. In response, our Linear Allocator:
    • Moves its internal free pointer to the third byte. This is denoted by the ‘free’ section
    • Returns a pointer to the start of the 3-byte section reserved for the requester
  2. A 2-byte allocation is requested. Same process as above occurs. There are now 2 pointers out in our program, one with 3 bytes of storage, the other with 2 bytes.
  3. A 4-byte allocation is requested.
  4. We ‘rewind’ the 4 byte allocation. In a way, you can think of this as the 4-byte allocation being returned to us to be used again by someone else.
  5. A 14-byte allocation is requested. But unfortunately, we don’t have that many free bytes in our Linear Allocator, so we’re out of memory. Exactly what you do at this point is a question for your implementation.
You can only add and remove allocations from the end of a Linear Allocator

At first, it might seem like a bad idea, but when speed is absolutely essential, this is a good choice. It's also useful when you first request your entire game's memory from the operating system. Each sub-system in your game will live for the entire length of the program's execution, so the sub-system memory can be allocated linearly; we're never going to, for example, free up the audio sub-system's memory in the middle of the allocator and use it for something else.

Being able to rewind to the start of the "buffer" is also useful when you have a lot of temporary "things". For example, a particle system: when you're done with all the sparks, you can just rewind to the start of the linear allocator and start again.
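Here's a rough sketch of that per-frame pattern using the LinearAllocator implemented later in this article (the sizes, the SimulateFrame function and the sparks buffer are made up for illustration):

#include <cstdlib>

#include "linearAllocator.hpp"

void SimulateFrame(LinearAllocator& frameAlloc)
{
    // Remember where the allocator was before this frame's temporary data.
    void* marker = frameAlloc.GetCurrent();

    // Hypothetical per-frame scratch data for a particle burst.
    void* sparks = frameAlloc.Allocate(256 * sizeof(float), alignof(float));
    // ... emit and draw sparks using this memory ...
    (void)sparks;

    // Done with this frame's temporaries: one pointer assignment, no frees.
    frameAlloc.Rewind(marker);
}

int main()
{
    void* memory = std::malloc(4096);
    {
        LinearAllocator frameAlloc(4096, memory);

        for(int frame = 0; frame < 3; ++frame)
        {
            SimulateFrame(frameAlloc);
        }
    } // the allocator is destroyed (and Cleared) before the raw memory is freed
    std::free(memory);

    return 0;
}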

There is a slight extension to a linear allocator like this called a Dynamic Linear Allocator. It works much the same way, with memory handed out in a straight line, except that it has the backing of another allocator: you request memory from it, and when its current chunk runs out, it requests another chunk from the backing allocator.
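I don't include code for that variant in this article, so here's a minimal sketch of the idea. The class name and members are my own, it leans on the align_forward_adjustment and ptr_add helpers shown later, and for brevity it only tracks its current block; a real implementation would remember every block it acquires so they can be returned to the backing allocator.

#include <cassert>

#include "allocator.hpp"

class DynamicLinearAllocator
{
public:
    DynamicLinearAllocator(Allocator& backing, const std::size_t blockSize)
    :
        m_backing(backing),
        m_blockSize(blockSize)
    {
        AcquireBlock();
    }

    void* Allocate(const std::size_t size,
                   const std::uintptr_t alignment = sizeof(std::intptr_t))
    {
        std::size_t adjustment = align_forward_adjustment(m_current, alignment);

        if(m_used + adjustment + size > m_blockSize)
        {
            // current block is exhausted, grab a fresh one from the backing allocator
            AcquireBlock();
            adjustment = align_forward_adjustment(m_current, alignment);
            assert(adjustment + size <= m_blockSize); // single requests must fit a block
        }

        void* alignedAddr = ptr_add(m_current, adjustment);
        m_current = ptr_add(alignedAddr, size);
        m_used += adjustment + size;

        return alignedAddr;
    }

private:
    void AcquireBlock()
    {
        // note: this simplified sketch forgets the previous block; a real one
        // would record every block so it can hand them back to m_backing
        m_current = m_backing.Allocate(m_blockSize);
        m_used = 0;
    }

    Allocator& m_backing;
    std::size_t m_blockSize;
    std::size_t m_used = 0;
    void* m_current = nullptr;
};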

Free-List Allocator

This allocator is a lot like the allocator your operating system uses. If you remember your computer science theory... lol, just kidding. A free list is basically a way of handing out memory in chunks of the requested size, where a chunk freed in the "middle" can be reused later, and two adjacent free chunks get merged into one larger chunk.

That's a little hard to wrap your head around, so let's start at the beginning. First, we have the memory of our allocator. It has what we call a free-list pointer that points to the start of our memory. The free-list pointer keeps track of our memory; it contains two things:

  1. A pointer to the next free-block of memory
  2. The size of this free block of memory

So to start with, we just have a free pointer that points to the start of our memory, and says it’s n-bytes.
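In code, that bookkeeping is nothing more than a tiny struct living inside the free memory itself (a conceptual sketch; the implementation later in the article uses exactly this idea):

#include <cstddef>

struct FreeBlock
{
    std::size_t size;   // how many bytes are free, starting at this struct's address
    FreeBlock*  next;   // the next free region, or nullptr if this is the last one
};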

Next, let’s make a request for 3-bytes of memory. To do this, we:

  1. Keep track that a 3-byte allocation was made at the current address
  2. Move the free-pointer ahead 3-bytes

So, our free pointer points to a block 3 bytes into our memory, and says it's n-3 bytes in size.

No problem. Now let’s move ahead and say there’s then been a few more allocations of various sizes:

Unlike our Linear Allocator, we can free up some memory. Let's free the 4-byte allocation at positions 3-6. The deallocation/free request comes in with pointer address '3'. We then look up (remember, we were keeping track) what the allocation size was at position '3'. With this information, we can now say that those 4 bytes are free again.

To denote ‘free’, we can move our free-pointer back to position 3 and say that the next 4 bytes from there are available. But now… how do we keep track that position ‘a’ is free for the next 6 bytes as well? We could have a bunch of free pointers, each one telling us how much space is available there… and in fact that’s exactly what we’ll do, sort of. If you remember before, the free pointer tells us the size of the memory available at that position, as well as the position of the next free pointer.

In other words, each free block tells you the address of the next free block:

Alright, makes some sense? So every time an allocation request comes in we do the following:

  1. Check the free list pointer. Does it have enough space at this location to meet the request?
    1. Yes, then use that requested space. Is there space remaining in the block?
      1. Yes, create a new free list pointer for the remaining space and point it at the next free block
      2. No, use the ‘next’ free list pointer as the new start of the free list pointers
    2. No, move onto the next free list pointer

Isn’t this just a linked list? Yes, that’s exactly what this is.

To be sure it’s clear, let’s make a request for another 2 bytes.

And then, let’s make a request for 3 bytes:

Do you see what happened there? On the request for 3 bytes, the first of the free pointers didn't have enough space, so we moved on to the next one and used it, moving that pointer forward past the new allocation. Of course, we then had to update the previous free pointer to point to the newly moved second pointer.

There’s one final trick. Let’s free up the 2-byte allocation in position 7.

This is going to be a problem, isn't it? If we then request a 4-byte allocation, we're going to 'run out of memory'. Do you see why? The first pointer is going to say, "sorry, I only have two bytes, try the next guy". We'll move to the next guy and he'll say the same, "sorry, only two bytes here, try the next guy". The final pointer only has 3 bytes, so we're unable to meet the request for 4 bytes.

But there’s 4 free bytes right there. Yes! There is. So the final trick is that when allocations are freed, we need to check if they can be merged, either to a free block before or after it (or both). So that 2-byte allocation being freed, should result in:

And that is a Free-List Allocator. There are a few more tricks in the implementation details... so let's get to them!

C++ Custom Allocator Implementation

The first thing we’ll need is an interface for our allocators to inherit from. This will make things a little easier down the track.

m_start is a pointer to the start of the memory available to the concrete implementation of Allocator. What this means is that to use it, you first need to std::malloc some memory and provide it to your concrete Allocator instance (you'll see later that you can also use one Allocator to provide the memory another Allocator will use).

m_size is how much data is at m_start for this Allocator.

m_numAllocations is how many allocations this Allocator has made. It’s for reporting and monitoring purposes only.

m_usedBytes keeps track of how much of m_size has been handed out. It's mostly for reporting and monitoring, and for making sure we don't try to Allocate memory we don't currently have.

You'll notice that we've explicitly deleted the copy constructor and copy assignment. The reason is that you can't have two copies both changing the underlying m_start and m_usedBytes and giving out allocations (because then the same memory would be given out twice). But we do allow move semantics.

This is an abstract class because the Allocate and Free methods are left for the concrete classes to define. The remaining getters are self explanatory.

#include <cstddef>
#include <cstdint>

class Allocator
{
public:
	Allocator(const std::size_t sizeBytes, void* const start) noexcept;

	Allocator(const Allocator&) = delete;
	Allocator& operator=(const Allocator&) = delete;
	Allocator(Allocator&&) noexcept;
	Allocator& operator=(Allocator&&) noexcept;

	virtual ~Allocator() noexcept;

	virtual void* Allocate(const std::size_t& size,
		const std::uintptr_t& alignment = sizeof(std::intptr_t)) = 0;

	virtual void Free(void* const ptr) = 0;

	const std::size_t& GetSize() const noexcept;
	const std::size_t& GetUsed() const noexcept;
	const std::size_t& GetNumAllocation() const noexcept;

	const void* GetStart() const noexcept;

protected:

	std::size_t m_size;
	std::size_t m_usedBytes;
	std::size_t m_numAllocations;

	void* m_start;
};

#include "allocator.hpp"
#include <cassert>

Allocator::Allocator(const std::size_t sizeBytes, void* const start) noexcept
:
    m_size(sizeBytes),
    m_usedBytes(0),
    m_numAllocations(0),
    m_start(start)
{
    assert(sizeBytes > 0);
}

Allocator::Allocator(Allocator&& other) noexcept
:
    m_size(other.m_size),
    m_usedBytes(other.m_usedBytes),
    m_numAllocations(other.m_numAllocations),
    m_start(other.m_start)
{
    other.m_start = nullptr;
    other.m_size = 0;
    other.m_numAllocations = 0;
    other.m_usedBytes = 0;
}

Allocator& Allocator::operator=(Allocator&& rhs) noexcept
{
    m_size = rhs.m_size;
    m_usedBytes = rhs.m_usedBytes;
    m_numAllocations = rhs.m_numAllocations;
    m_start = rhs.m_start;

    rhs.m_start = nullptr;
    rhs.m_size = 0;
    rhs.m_numAllocations = 0;
    rhs.m_usedBytes = 0;

    return *this;
}

Allocator::~Allocator() noexcept
{
    assert(m_numAllocations == 0 && m_usedBytes == 0);
}

const std::size_t& Allocator::GetSize() const noexcept
{
    return m_size;
}

const std::size_t& Allocator::GetUsed() const noexcept
{
    return m_usedBytes;
}

const std::size_t& Allocator::GetNumAllocation() const noexcept
{
    return m_numAllocations;
}

const void* Allocator::GetStart() const noexcept
{
    return m_start;
}

C++ Linear Allocator

Here’s our implementation of the Linear Allocator.

m_current is keeping track of where the free section of our memory is. Every time there’s an allocation, we move the m_current pointer along, ready for the next allocation.

Allocate is going to move the m_current pointer along by size bytes (plus any alignment adjustment) and return the address where the allocation starts. Free is going to do nothing! Remember, Linear Allocators can only grow and shrink from the end, so we can't free arbitrary blocks of memory. Instead, we provide Rewind to go back to a specific point in the memory (a previous return value from Allocate), and Clear to go all the way back to m_start.

#include "allocator.hpp"

class LinearAllocator : public Allocator
{
public:
	LinearAllocator(const std::size_t sizeBytes,
					void* const start) noexcept;

	LinearAllocator(const LinearAllocator&) = delete;
	LinearAllocator& operator=(const LinearAllocator&)
		= delete;
	LinearAllocator(LinearAllocator&&) noexcept;
	LinearAllocator& operator=(LinearAllocator&&) noexcept;

	virtual ~LinearAllocator() noexcept;

	virtual void* Allocate(const std::size_t& size,
						   const std::uintptr_t& alignment
						       = sizeof(std::intptr_t)) override;

	virtual void Free(void* const ptr) noexcept override final;

	void* GetCurrent() const noexcept;

	virtual void Rewind(void* const mark) noexcept;
	virtual void Clear() noexcept;

protected:
	void* m_current;
};Code language: C++ (cpp)
#include "linearAllocator.hpp"

#include <utility>

LinearAllocator::LinearAllocator(const std::size_t sizeBytes,
                                 void* const start) noexcept
:
    Allocator(sizeBytes, start),
    m_current(const_cast<void*>(start))
{

}

LinearAllocator::LinearAllocator(LinearAllocator&& other) noexcept
:
    Allocator(std::move(other)),
    m_current(other.m_current)
{
    other.m_current = nullptr;
}

LinearAllocator::~LinearAllocator() noexcept
{
    Clear();
}

LinearAllocator& LinearAllocator::operator=(LinearAllocator&& rhs) noexcept
{
    Allocator::operator=(std::move(rhs));
    m_current = rhs.m_current;
    rhs.m_current = nullptr;
    return *this;
}


void* LinearAllocator::Allocate(const std::size_t& size,
                                     const std::uintptr_t& alignment)
{
    assert(size > 0 && alignment > 0);

   std::size_t adjustment
        = align_forward_adjustment(m_current, alignment);

    if(m_usedBytes + adjustment + size > m_size)
        throw std::bad_alloc();

    void* alignedAddr = ptr_add(m_current, adjustment);

    m_current = ptr_add(alignedAddr, size);

    m_usedBytes = reinterpret_cast<std::uintptr_t>(m_current)
        - reinterpret_cast<std::uintptr_t>(m_start);

    ++m_numAllocations;


    return alignedAddr;
}

void LinearAllocator::Free([[maybe_unused]] void* const ptr) noexcept
{
    // you can't free from a linear allocator
}


void LinearAllocator::Rewind(void* const mark) noexcept
{
    assert(m_current >= mark && m_start <= mark);

    m_current = mark;

    m_usedBytes = reinterpret_cast<std::uintptr_t>(m_current)
        - reinterpret_cast<std::uintptr_t>(m_start);

}

void LinearAllocator::Clear() noexcept
{
    m_numAllocations = 0;
    m_usedBytes = 0;
    m_current = const_cast<void*>(m_start);
}

void* LinearAllocator::GetCurrent() const noexcept
{
    return m_current;
}

I hope that's straightforward; it's a direct implementation of the ideas above.

The biggest thing to take note of is the alignment requirements. If the allocation requested has a specific alignment need, we just move far enough ahead to meet that requirement. Here’s a handy little implementation of align_forward_adjustment and ptr_add:

inline std::size_t align_forward_adjustment
(const void* const ptr, const std::size_t & alignment) noexcept
{
	const auto iptr = reinterpret_cast<std::uintptr_t>(ptr);
	const auto aligned = (iptr - 1u + alignment) & -alignment;
	return aligned - iptr;
}

inline void* ptr_add(const void* const p, const std::uintptr_t& amount) noexcept
{
	return reinterpret_cast<void*>
		(reinterpret_cast<std::uintptr_t>(p) + amount);
}
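I haven't shown ptr_sub or align_forward_adjustment_with_header anywhere above, even though the FreeListAllocator code below uses them. Here's one way they could be written to match how they're used (the header variant returns an adjustment that both aligns the address and leaves room for a T-sized header just before it):

// a counterpart to ptr_add
inline void* ptr_sub(const void* const p, const std::uintptr_t& amount) noexcept
{
	return reinterpret_cast<void*>
		(reinterpret_cast<std::uintptr_t>(p) - amount);
}

// adjustment that aligns the address AND reserves space for a header of type T
// immediately before the aligned address
template<typename T>
inline std::size_t align_forward_adjustment_with_header
(const void* const ptr, const std::size_t& alignment) noexcept
{
	std::size_t adjustment = align_forward_adjustment(ptr, alignment);

	if(adjustment < sizeof(T))
	{
		const std::size_t needed = sizeof(T) - adjustment;

		// bump the adjustment up in whole alignment steps until the header fits
		adjustment += alignment * ((needed + alignment - 1) / alignment);
	}

	return adjustment;
}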

C++ Free-List Allocator

The Free-List Allocator is a bit of a different beast:
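Here's a trimmed-down sketch of the class, showing just the pieces the functions below rely on (the real listing has a bit more to it, but the member names match how they're used):

#include "allocator.hpp"

class FreeListAllocator : public Allocator
{
public:
	FreeListAllocator(const std::size_t sizeBytes, void* const start) noexcept;

	virtual void* Allocate(const std::size_t& size,
		const std::uintptr_t& alignment = sizeof(std::intptr_t)) override;

	virtual void Free(void* const ptr) noexcept override;

protected:
	// stored immediately before every allocation we hand out
	struct AllocationHeader
	{
		std::size_t size;          // total bytes taken out of the free list
		std::uintptr_t adjustment; // bytes skipped for alignment + this header
	};

	// lives inside each free region of memory, forming a linked list
	struct FreeBlock
	{
		std::size_t size;
		FreeBlock* next;
	};

	// head of the free list; the constructor points this at m_start with
	// size = sizeBytes and next = nullptr
	FreeBlock* m_freeBlocks;
};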

You'll see the AllocationHeader struct at work a little later, but the FreeBlock struct is just what we already talked about. We always have at least one: m_freeBlocks points to the first, and each one tells us how big its free region is and where the next one is.

I won't go through the whole class (it's much the same as the LinearAllocator); instead, let's take a look at the Allocate function first.

void* FreeListAllocator::Allocate(const std::size_t& size,
                                  const std::uintptr_t& alignment)
{
    // find the best fitting block that is at least size + header and then
    // aligned. "best fitting" is the smallest possible block unless we find
    // one that is exactly the right size

    assert(size > 0 && alignment > 0);

    FreeBlock* prevFreeBlock = nullptr;
    FreeBlock* freeBlock = m_freeBlocks;

    FreeBlock* bestFitPrev = nullptr;
    FreeBlock* bestFit = nullptr;

    std::uintptr_t bestFitAdjustment = 0;
    std::size_t bestFitTotalSize = 0;

    while(freeBlock != nullptr)
    {
        std::uintptr_t adjustment
            = align_forward_adjustment_with_header<AllocationHeader>
                (freeBlock, alignment);

        // total size needed to store the data + the header and be aligned
        std::size_t totalSize = size + adjustment;

        // is this block big enough, and a better fit than the previous candidate?
        // "better fit" means it's smaller, so we favor splitting small blocks
        if(freeBlock->size >= totalSize &&
           (bestFit == nullptr || freeBlock->size < bestFit->size))
        {
            bestFitPrev = prevFreeBlock;
            bestFit = freeBlock;
            bestFitAdjustment = adjustment;
            bestFitTotalSize = totalSize;

            // if this is a perfect match we can exit early
            if(freeBlock->size == totalSize)
            {
                break;
            }
        }

        prevFreeBlock = freeBlock;
        freeBlock = freeBlock->next;
    }

    if(bestFit == nullptr)
    {
        // we failed to find a suitable free block
        throw std::bad_alloc();
    }

    if(bestFit->size - bestFitTotalSize <= sizeof(AllocationHeader))
    {
        // this bestFit can't be split into two because the remainder wouldn't
        // be enough to hold another allocation (need at least the size of the
        // header + 1 byte)

        // adjust the total size of this allocation to be the entirety of the
        // free block
        bestFitTotalSize = bestFit->size;

        // remove the free block from the list
        if(bestFitPrev != nullptr)
            bestFitPrev->next = bestFit->next;
        else
            m_freeBlocks = bestFit->next;
    }
    else
    {
        assert(bestFitTotalSize > sizeof(FreeBlock)); // not really needed

        // split the best fit block into how much we need for this allocation
        // and a new block of the remainder

        // new block starts at bestFit + the size we need
        FreeBlock* newBlock
            = reinterpret_cast<FreeBlock*>(ptr_add(bestFit, bestFitTotalSize));

        // it has the remaining size
        newBlock->size = bestFit->size - bestFitTotalSize;
        // its next is the block we're splitting's next
        newBlock->next = bestFit->next;

        // insert the new block into the free list
        if(bestFitPrev != nullptr)
            bestFitPrev->next = newBlock;
        else
            m_freeBlocks = newBlock;
    }

    // get the allocations aligned address
    std::uintptr_t alignedAddr
        = reinterpret_cast<std::uintptr_t>(bestFit) + bestFitAdjustment;

    // shift it back into the space we allowed for the header and populate
    // the header
    AllocationHeader* header = reinterpret_cast<AllocationHeader*>
                                (alignedAddr - sizeof(AllocationHeader));
    header->size = bestFitTotalSize;
    header->adjustment = bestFitAdjustment;

    m_usedBytes += bestFitTotalSize;
    ++m_numAllocations;


    return reinterpret_cast<void*>(alignedAddr);
}

Hopefully the comments left inline explain things as you read through. The real trick is that every allocation is actually a little bigger than what was requested. That little bit extra is used to store an AllocationHeader, which as a POD struct has a known size. So we hand the requester the start of their memory, but just prior to it we keep track of the total size and alignment adjustment of that allocation.

We do this so that when we Free the allocation later, we have the details needed to free the right amount of memory in the right place. This wasn't required for the LinearAllocator because you can't actually Free any of its data. But any additional allocator types you create yourself will need some way of keeping track of these details.

The other trick is how we take the memory we need and then adjust the linked list to take that chunk out of the free list. You'll see there are times when the leftover space isn't big enough for another allocation, so we include it in the allocation we hand out. The caller won't know they have a little more memory than they asked for, and it won't hurt them, but it saves us having to track piddly little bits of too-small memory.

Let’s take a look at how we Free the memory later. Remember, we could be freeing any previously allocated block, in any order.

void FreeListAllocator::Free(void* const ptr) noexcept
{
    assert(ptr != nullptr);

    // retrieve the header from the space we allocated prior to ptr
    AllocationHeader* header = reinterpret_cast<AllocationHeader*>
        (ptr_sub(ptr, sizeof(AllocationHeader)));

    // retrieve the real start of the allocation by moving backwards by the
    // amount needed for alignment
    std::uintptr_t blockStart
        = reinterpret_cast<std::uintptr_t>(ptr) - header->adjustment;
    std::size_t blockSize = header->size;
    std::uintptr_t blockEnd = blockStart + blockSize;

    FreeBlock* prevFreeBlock = nullptr;
    FreeBlock* freeBlock = m_freeBlocks;

    // find the first free block that starts after or is at the boundary of this
    // block
    while(freeBlock != nullptr)
    {
        if(reinterpret_cast<std::uintptr_t>(freeBlock) >= blockEnd)
        {
            break;
        }

        prevFreeBlock = freeBlock;
        freeBlock = freeBlock->next;
    }

    if(prevFreeBlock == nullptr)
    {
        // no free block starts before the one we're freeing (the list may even
        // be empty), so we can add it to the start of the free list

        prevFreeBlock = reinterpret_cast<FreeBlock*>(blockStart);
        prevFreeBlock->size = blockSize;
        prevFreeBlock->next = m_freeBlocks;

        m_freeBlocks = prevFreeBlock;
    }
    else if(reinterpret_cast<std::uintptr_t>(prevFreeBlock)
            + prevFreeBlock->size == blockStart)
    {
        // the free block immediately before the one we're freeing ends exactly
        // where our block starts.

        // this means our block can be merged into it, coalescing the two into
        // one larger free block

        prevFreeBlock->size += blockSize;
    }
    else
    {
        // the free block before the one we're freeing (if any) doesn't touch it,
        // so we can't coalesce with it.

        // so let's create a new block at the position we're freeing and insert
        // it into the free list

        FreeBlock* temp = reinterpret_cast<FreeBlock*>(blockStart);
        temp->size = blockSize;
        temp->next = prevFreeBlock->next;

        prevFreeBlock->next = temp;
        prevFreeBlock = temp;
    }

    if(reinterpret_cast<std::uintptr_t>(prevFreeBlock) + prevFreeBlock->size
       == reinterpret_cast<std::uintptr_t>(prevFreeBlock->next))
    {
        // the new or merged block ends right on the next block on the free list
        // so we can merge them

        prevFreeBlock->size += prevFreeBlock->next->size;
        prevFreeBlock->next = prevFreeBlock->next->next;
    }

    --m_numAllocations;
    m_usedBytes -= blockSize;
}

The comments should explain everything again. The interesting part is how we step back just before the ptr being freed to find our AllocationHeader, which tells us the real start and size of the memory to free. The second interesting part is how we merge blocks when they touch each other, which keeps large allocations possible in those spaces. Remember the description at the start of the article: if we didn't merge, we'd only ever have a bunch of tiny blocks, even though the contiguous memory for larger allocations is right there.


I'll leave the custom allocators at that. Honestly, this is enough for you to make a game. But what I found is that you really do want to use the STL containers. I like to build a lot of things myself when I could just use a library, but I personally draw the line at the STL. I don't want to write my own unordered_map.

But you can’t use the allocators above in the STL Containers. Not yet anyway, you need an adaptor.

STL Allocator Adaptor

Let’s take a quick look at the definition of an std::vector.

template<
    class T,
    class Allocator = std::allocator<T>
> class vector;

You’ve been quite happily making vectors like std::vector<int> myIntegers; and probably never really worried about that second template argument. So what is it? Here’s the definition:

The std::allocator class template is the default Allocator used by all standard library containers if no user-specified allocator is provided. The default allocator is stateless, that is, all instances of the given allocator are interchangeable, compare equal and can deallocate memory allocated by any other instance of the same allocator type.

https://en.cppreference.com/w/cpp/memory/allocator

Okay, that's a little something. The things to note are that it's stateless, and that any instance can deallocate memory allocated by another instance. Neither is true for our custom allocators. One instance of FreeListAllocator cannot free memory allocated by another instance of FreeListAllocator, and they're full of state: pointers, free lists and so on.

So back to std::vector. What does it say about the Allocator template parameter? As of C++20:

An allocator that is used to acquire/release memory and to construct/destroy the elements in that memory. The type must meet the requirements of Allocator. The program is ill-formed if Allocator::value_type is not the same as T.

https://en.cppreference.com/w/cpp/container/vector

Okay, so what are the requirements of Allocator? You can find them at https://en.cppreference.com/w/cpp/named_req/Allocator, but in all honesty there's a lot to parse there. The important thing to note is:

  1. All STL Containers that use an Allocator do so via std::allocator_traits, which provides default implementations for most of those requirements. In short, as long as the Allocator we pass to an STL Container is compatible with std::allocator_traits, we can use it.

That's really it! There's a lot of material about stateful vs stateless allocators, but it can be summed up as "you can use stateful ones, but doing so requires you to pass the allocator's state to the container", which we'll get to at the end of this article.

What about the std::pmr stuff? These give the STL containers (and your allocators) a way to, for example, assign a container using one type of allocator to a container using a different type; what matters is that the allocators share a base class. For our purposes this won't work, but there are times when it might.

However, the goal of our allocators here is speed, and the virtual dispatch polymorphism introduces is overhead we don't want. That said, it can make for cleaner usage if you have very specific needs.

Alright, so how do we use our custom allocators with STL Containers again? That’s right, with an adaptor. I won’t worry about a UML diagram, it’s very simple. Here it is:

template<typename T, CustomAllocator Alloc>
class STLAdaptor
{
public:

    typedef T value_type; 


    STLAdaptor() = delete;

    STLAdaptor(Alloc& allocator) noexcept
    :
        m_allocator(allocator)
    {

    }

    template<typename U>
    STLAdaptor(const STLAdaptor<U, Alloc>& other) noexcept
    :
        m_allocator(other.m_allocator)
    {}

    [[nodiscard]] constexpr T* allocate(std::size_t n)
    {
        return reinterpret_cast<T*>
            (m_allocator.Allocate(n * sizeof(T), alignof(T)));
    }

    constexpr void deallocate(T* p, [[maybe_unused]] std::size_t n)
    noexcept
    {
        m_allocator.Free(p);
    }

    std::size_t MaxAllocationSize() const noexcept
    {
        return m_allocator.GetSize();
    }

    bool operator==(const STLAdaptor<T,Alloc>& rhs) const noexcept
    {
        if constexpr(std::is_base_of_v<FixedAllocator,Alloc>)
        {
            return m_allocator.GetStart() == rhs.m_allocator.GetStart();
        }
        else
        {
            DynamicAllocator::BlockDesc* a =
                reinterpret_cast<DynamicAllocator*>
                (&m_allocator)->m_currentBlock;

            while(a->prevBlock != nullptr)
            {
                a = a->prevBlock;
            }

            DynamicAllocator::BlockDesc* b =
                reinterpret_cast<DynamicAllocator*>
                (&rhs.m_allocator)->m_currentBlock;

            while(b->prevBlock != nullptr)
            {
                b = b->prevBlock;
            }

            return a == b;
        }
    }

    bool operator!=(const STLAdaptor<T,Alloc>& rhs) const noexcept
    {
        return !(*this == rhs);
    }


    Alloc& m_allocator; 
};
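One thing I've glossed over: CustomAllocator here is a C++20 concept from my codebase that I haven't shown. Something along these lines would do the job, constraining Alloc to the interface the adaptor actually calls (treat this as a sketch rather than the exact definition):

#include <concepts>
#include <cstddef>
#include <cstdint>

template<typename T>
concept CustomAllocator = requires(T allocator, std::size_t size,
                                   std::uintptr_t alignment, void* ptr)
{
    { allocator.Allocate(size, alignment) } -> std::same_as<void*>;
    { allocator.Free(ptr) };
    { allocator.GetSize() } -> std::convertible_to<std::size_t>;
    { allocator.GetStart() };
};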

Alright, so what's happening here? We've basically created a wrapper class around our Allocators that matches the function calls and minimum requirements std::allocator_traits expects to see. So, when we create a vector:

std::vector<int> myVector;

what it actually means (remember the Allocator template parameter has a default) is this:

std::vector<int, std::allocator<int>> myVector;

And inside the code for std::vector, it accesses std::allocator_traits<std::allocator<int>>. This default setup basically routes all the allocate and deallocate calls through to operator new and operator delete. With our STLAdaptor, we can instead do:

const std::size_t myMemorySize = 20000; // measured in bytes
void* memory = std::malloc(myMemorySize);
FreeListAllocator myFreeListAllocator(myMemorySize, memory);
std::vector<int, STLAdaptor<int, FreeListAllocator>> myVector(myFreeListAllocator);

Notice that I had to wrap our Allocator in the adaptor? And that I had to pass an instance of our allocator to the vector? That's all that's needed. But it's also a little unwieldy, so I have a few aliases like the following:

template<typename T, typename Alloc>
using vector = std::vector<T,memory::STLAdaptor<T, Alloc>>;

and

template<typename Key, typename T, typename Alloc,
typename Hash = std::hash<Key>, typename KeyEqual = std::equal_to<Key>>
using unordered_map = 
    std::unordered_map<Key, T, Hash, KeyEqual,
    memory::STLAdaptor<std::pair<const Key, T>, Alloc>>;

namespace std
{
    template<typename Alloc>
    struct hash<string<Alloc>>
    :
        public __hash_base<size_t, string<Alloc>>
    {
        size_t operator()(const string<Alloc>& __s) const noexcept
        {
            return std::_Hash_impl::hash(__s.data(), __s.length());
        }
    };
}

The latter is for the normal hash function in an unordered_map or unordered_set to work correctly.
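That hash specialization assumes a string alias built the same way as the vector and unordered_map ones above. I haven't shown it, but it would look something like this (note that the specialization leans on libstdc++ internals like _Hash_impl, so on another standard library you'd want to hash the string's contents some other way, for example via std::hash<std::string_view>):

template<typename Alloc>
using string = std::basic_string<char, std::char_traits<char>,
    memory::STLAdaptor<char, Alloc>>;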

The Bad News

You thought this would be all roses and sunshine didn’t you? Sorry to say, there are some minor drawbacks.

  1. You have to pass an instance of your Allocator to the container. That's not really a big deal, but it's different from what you're used to. On the plus side, it forces you to really think about what sort of allocator you need at any given time.
  2. You cannot mix allocators. That means you cannot assign between containers that use different allocator types, so I can't do std::vector<int> normalVector = vectorWithCustomAllocator. But if you're using custom allocators, you'd use them everywhere anyway (and if you really need to cross over, see the sketch below).
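If you ever do need to cross that boundary, you can still copy element by element, because iterators don't care about allocator types. A quick sketch, reusing myFreeListAllocator from the earlier example:

std::vector<int, STLAdaptor<int, FreeListAllocator>> customVector(myFreeListAllocator);
customVector.assign({1, 2, 3});

// element-wise copy into a normally-allocated vector; the allocators never mix
std::vector<int> normalVector(customVector.begin(), customVector.end());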

That’s it for bad news.

The Good News

You now have complete control and knowledge of the memory footprint of your game. You can also chain allocators, for example:

#include <cstdlib> // std::malloc
// plus the LinearAllocator and FreeListAllocator headers from earlier

int main()
{
    const std::size_t totalMem = 200000;
    void* memStart = std::malloc(totalMem);

    LinearAllocator linearAlloc(totalMem, memStart);

    FreeListAllocator soundAlloc(totalMem/2, linearAlloc.Allocate(totalMem/2));
    LinearAllocator renderAlloc(totalMem/2, linearAlloc.Allocate(totalMem/2));

    return 0;
}

The STL Adaptor also works for all containers, including strings. Basically, you should never need to invoke the expensive new and delete ever again. If you’ve read my article on making your own Entity Component System, you can also change that to use these same custom allocators (that’s what I did).

You’ll also see those same speed increases as above. It might not be much, but you might eke out a few more frames per second, even on a simple PC game.


That's it! Thanks for reading. If you see any errors, or anything is unclear, feel free to leave a comment or discuss in the forums.
