Automatic Serialization in C++ for Game Engines

You’ve made your game, probably made an engine, and now you’re ready to put on the finishing touches. Players are definitely going to want to save their game. So, that means storing all the current game state to a file… Wow, that’s a lot of work, right? Now every enemy, character, item, fallen tree, exploded building, location, and so on needs to be put into a file. And then you need to read it back again. What a nightmare.

In this article, I’m going to show you how I set up my very own serialisation library. Sure, there are existing libraries you can use, but like always, I don’t even look at them; I’m more interested in doing it myself.

Caveats

Of course, there are a few things we’re going to have to align on, right off the bat, for this to work. The following is a list of concessions we need to make in order to allow for easy serialisation. I’ll justify each as well.

References and Pointers can’t be serialised

Oh, that’s a big one. Let’s get it right out the way. You can serialise references and pointers, I’ve done it before, but it’s a nightmare. For example, let’s say we have some objects like so (don’t worry about the poor design choices for now):

class Player
{
public:
    int m_health;
};

class Enemy
{
public:
    Player* m_targetPlayer;
    int m_health;
};

int main()
{
    Player players[2];
    Enemy enemies[5];

    enemies[0].m_targetPlayer = &players[0];
    
    ...
    return 0;
}

Now, if you hot-save in the middle of the game, how do we store the fact that Enemy 1 is currently targeting Player 1? Our game data is using a pointer to the targeted player. We can’t just write that pointer to the file and reload it later, because the memory locations when we reload the game are all different, so that won’t work. We can’t write a copy of Player 1 to the file, because we don’t want to reload and create a second instance of player 1.

That’s the difficulty. But you can work around it the hard way.

When serialising an object, if you come across a pointer you could:

  1. Add it to a list of “pointers yet to be serialised”
  2. Record, in the serialised form of the current object, that it will need the new address of the pointed-to object

When you serialise an object, also write out its current address. After you’ve finished serialising, every object should be in the file alongside its original address, and anything that was holding a pointer or reference should have stored the original address of its target. In the end, you might have a few “dangling” objects that only ever existed on the heap, in which case you can serialize them last.

When deserializing, anything that says “oh, my pointer referenced this original memory address” can look up the already deserialized object that had that address. If that object hasn’t been loaded yet, keep a list of the pointers that are still waiting for it (pointers to pointers, in effect). When the object is finally deserialized, go through that list and point each waiting pointer at the new memory address.

Essentially it’s a two-pass method. Pass one, read objects you can and leave reminders for pointers. Pass two, patch the pointers.
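To make the two passes a bit more concrete, here’s a minimal sketch using the Player and Enemy structs from earlier. The function names (oldAddressOf, save, load, demoPointerPatching) and the Reminder struct are made up for this illustration, and it assumes every object lives in a vector whose size is fixed before any addresses are taken. It is not a general solution, just the idea in miniature.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <unordered_map>
#include <vector>

struct Player { int m_health = 0; };
struct Enemy  { Player* m_targetPlayer = nullptr; int m_health = 0; };

// The original address doubles as a temporary ID inside the file.
std::uintptr_t oldAddressOf(const Player* p)
{
    return reinterpret_cast<std::uintptr_t>(p);
}

// Enemies are written first on purpose, so loading has to defer the patching.
void save(std::ostream& out, const std::vector<Enemy>& enemies,
          const std::vector<Player>& players)
{
    out << enemies.size() << " ";
    for (const Enemy& e : enemies)
        out << oldAddressOf(e.m_targetPlayer) << " " << e.m_health << " ";

    out << players.size() << " ";
    for (const Player& p : players)
        out << oldAddressOf(&p) << " " << p.m_health << " ";
}

void load(std::istream& in, std::vector<Enemy>& enemies,
          std::vector<Player>& players)
{
    struct Reminder { Player** slot; std::uintptr_t oldAddress; };
    std::vector<Reminder> reminders;
    std::unordered_map<std::uintptr_t, Player*> oldToNew;

    // Pass one: read the objects and leave reminders for the pointers.
    std::size_t count = 0;
    in >> count;
    enemies.assign(count, Enemy{});
    for (Enemy& e : enemies)
    {
        std::uintptr_t oldTarget = 0;
        in >> oldTarget >> e.m_health;
        reminders.push_back({&e.m_targetPlayer, oldTarget});
    }

    in >> count;
    players.assign(count, Player{});
    for (Player& p : players)
    {
        std::uintptr_t oldAddress = 0;
        in >> oldAddress >> p.m_health;
        oldToNew[oldAddress] = &p;
    }

    // Pass two: patch every remembered pointer to the new address.
    for (const Reminder& r : reminders)
    {
        auto it = oldToNew.find(r.oldAddress);
        *r.slot = (it != oldToNew.end()) ? it->second : nullptr;
    }
}

// Round trip through an in-memory stream instead of a real save file.
void demoPointerPatching()
{
    std::vector<Player> players(2);
    std::vector<Enemy> enemies(2);
    players[1].m_health = 42;
    enemies[0].m_targetPlayer = &players[1];

    std::stringstream file;
    save(file, enemies, players);

    std::vector<Player> loadedPlayers;
    std::vector<Enemy> loadedEnemies;
    load(file, loadedEnemies, loadedPlayers);

    assert(loadedEnemies[0].m_targetPlayer == &loadedPlayers[1]);
    assert(loadedEnemies[1].m_targetPlayer == nullptr);
}
```

Even in this tiny form you can see the bookkeeping piling up, and it only handles one pointer type.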

Alternative

All in all, serialising pointers is messy, hard to debug, and error prone. For the sake of engine speed and extensibility, you really shouldn’t be throwing pointers around anyway. It can be done, but it’s horrible. And if you’re using an Entity Component System, you very likely have no pointers to worry about.

An extra note: if you have polymorphic pointers, you also need to keep track of the true class when serializing (for example, through an overridden virtual function). Deserialization will require a map in your code and a special factory function to create the correct sub-class. More and more pain.
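As a rough idea of what that map and factory function could look like, here’s a small sketch. Item, Sword, Potion, itemFactories, saveItem and loadItem are all hypothetical names invented for this example; the point is only that the type name goes into the file first, and the loader uses it to pick which sub-class to construct.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <sstream>
#include <string>

struct Item
{
    virtual ~Item() = default;
    virtual std::string typeName() const = 0;     // the "true class"
    virtual void write(std::ostream& out) const = 0;
    virtual void read(std::istream& in) = 0;
};

struct Sword : Item
{
    int m_damage = 0;
    std::string typeName() const override { return "Sword"; }
    void write(std::ostream& out) const override { out << m_damage << " "; }
    void read(std::istream& in) override { in >> m_damage; }
};

struct Potion : Item
{
    int m_healing = 0;
    std::string typeName() const override { return "Potion"; }
    void write(std::ostream& out) const override { out << m_healing << " "; }
    void read(std::istream& in) override { in >> m_healing; }
};

using ItemFactory = std::function<std::unique_ptr<Item>()>;

// Maps the type name stored in the file to a function that creates that type.
const std::map<std::string, ItemFactory>& itemFactories()
{
    static const std::map<std::string, ItemFactory> table = {
        {"Sword",  [] { return std::unique_ptr<Item>(new Sword); }},
        {"Potion", [] { return std::unique_ptr<Item>(new Potion); }},
    };
    return table;
}

void saveItem(std::ostream& out, const Item& item)
{
    out << item.typeName() << " ";   // type tag first, then the data
    item.write(out);
}

std::unique_ptr<Item> loadItem(std::istream& in)
{
    std::string name;
    in >> name;
    auto it = itemFactories().find(name);
    if (it == itemFactories().end())
        return nullptr;              // unknown type in the file
    std::unique_ptr<Item> item = it->second();
    item->read(in);
    return item;
}

void demoPolymorphicLoad()
{
    Potion potion;
    potion.m_healing = 25;

    std::stringstream file;
    saveItem(file, potion);

    std::unique_ptr<Item> loaded = loadItem(file);
    assert(loaded && loaded->typeName() == "Potion");
}
```

Every new sub-class needs an entry in that table, which is exactly the kind of manual upkeep that makes this approach painful.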

Aside from ECS, you can also do away with pointers by using some other method of indirection. Identifiers that each map to a unique instance of an object often work well, with a globally accessible map from identifiers to actual pointers. That map can be deserialized separately from everything else, and if all your objects use identifiers instead of actual pointers, everything will go together again. Of course, looking up the pointer has a speed cost, but you should be reducing pointer use anyway.
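Here’s a minimal sketch of the identifier idea, reworking the earlier Enemy so it stores an ID rather than a Player*. PlayerId, playerRegistry, findPlayer, saveEnemy and loadEnemy are names I’ve made up for illustration, and for brevity the registry here owns the players directly rather than holding pointers to them.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <unordered_map>

using PlayerId = std::uint32_t;   // assumption: 0 is reserved for "no target"

struct Player { int m_health = 0; };

struct Enemy
{
    PlayerId m_targetPlayer = 0;  // an ID, not a pointer
    int m_health = 0;
};

// Globally accessible lookup from identifier to the actual object.
std::unordered_map<PlayerId, Player>& playerRegistry()
{
    static std::unordered_map<PlayerId, Player> registry;
    return registry;
}

// Resolve an ID to a pointer only at the moment it is needed.
Player* findPlayer(PlayerId id)
{
    auto it = playerRegistry().find(id);
    return it != playerRegistry().end() ? &it->second : nullptr;
}

// With no pointers inside Enemy, it serialises like any other plain data.
void saveEnemy(std::ostream& out, const Enemy& e)
{
    out << e.m_targetPlayer << " " << e.m_health << " ";
}

void loadEnemy(std::istream& in, Enemy& e)
{
    in >> e.m_targetPlayer >> e.m_health;
}

void demoIdentifiers()
{
    playerRegistry()[1].m_health = 80;

    Enemy enemy;
    enemy.m_targetPlayer = 1;

    std::stringstream file;
    saveEnemy(file, enemy);

    Enemy loaded;
    loadEnemy(file, loaded);

    // The ID survives the round trip and still resolves to the same player.
    assert(findPlayer(loaded.m_targetPlayer)->m_health == 80);
}
```

The lookup in findPlayer is the speed cost mentioned above; in exchange, the saved file never contains a memory address.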

In a future article, I will cover the cases of serialisation of pointers.

Objects need to be default constructible and copyable

This is a bit of a burden, but the way I have things laid out here, anything that’s serialised needs to be default constructible (or otherwise you can construct a “dummy” one), and copyable.

For me and my purposes, this isn’t a big problem. The primary reason is my reliance on an Entity Component System, which makes zeroing out my structs super easy and non-problematic.

But maybe things aren’t as simple for you. You can work around these limitations by adjusting the code that does the serialisation (you’ll see it later). It shouldn’t be too burdensome to write a “Loader” or “Loading Function” that inspects the file for the next “class” to be constructed and handles things from there, potentially even passing the stream as a constructor argument to the class itself.
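For the “stream as a constructor argument” idea, something along these lines could work. Checkpoint and its members are hypothetical, and this is only a sketch of the workaround, not part of the library built in this article.

```cpp
#include <cassert>
#include <sstream>

class Checkpoint
{
public:
    Checkpoint(int level, int score) : m_level(level), m_score(score) {}

    // Loading constructor: builds the object straight from the stream,
    // so no default constructor is needed. Members are initialised in
    // declaration order, which matches the order they were written in.
    explicit Checkpoint(std::istream& in)
    :
        m_level(readInt(in)),
        m_score(readInt(in))
    {
    }

    void save(std::ostream& out) const
    {
        out << m_level << " " << m_score << " ";
    }

    int level() const { return m_level; }
    int score() const { return m_score; }

private:
    static int readInt(std::istream& in)
    {
        int value = 0;
        in >> value;
        return value;
    }

    // const members rule out "default construct, then fill in later".
    const int m_level;
    const int m_score;
};

void demoLoadingConstructor()
{
    Checkpoint original(3, 1500);

    std::stringstream file;
    original.save(file);

    Checkpoint loaded(file);
    assert(loaded.level() == 3);
    assert(loaded.score() == 1500);
}
```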

You have to add some special code to your classes

This caveat isn’t as bad. If you want a class to be “serializable”, you’re going to have to add some special code to it. I’ve seen this done a few different ways, for example, inheriting from a “Serializable” class. The way we’re going to do it is hopefully a little less annoying.

The reason you must always add some special code is that C++, as of C++20 anyway, offers no real reflection capability. Reflection gives a program knowledge of its own structure at runtime: which members a type has, what they’re called, and what types they are. In JavaScript, for example, you can loop over an object’s properties by name at runtime without knowing them in advance.

We haven’t got that. Without it, any serialisation code we write must be told, by you, what the object looks like (what members it has to serialise). I’ve seen a few different ways that people add this:

  1. Code/scripts that read the source code headers of your project and generate serialisation code automatically. This is really cool, but personally I find it a bit too… recursive… strange… I don’t know. It sits wrong with me.
  2. Inheriting from a “Serializable” class that requires you to override a “Serialise” and a “Deserialise” method. Not great, you’re basically just writing all the code yourself. It’s a naïve approach, but works very well for small projects.

We’re going to use MACROS. Actually, let’s get started.

Why use Preprocessor Macros for C++ Serialization?

The aim of this article is to use the pre-processor to write our serialisation code for us. Yes, that’s right, we’ll write a program to write our program.

First of all; here’s the naïve approach mentioned above (sans inheritance from a base Serializable class, we don’t really need to do that):

class ObjectA
{
public:
    std::string m_str;
    int m_int;
    float m_float;

    void Serialise(std::ostream& stream) const
    {
        stream << m_str << " " << m_int << " " << m_float << " ";
    }

    void Deserialise(std::istream& stream)
    {
        stream >> m_str >> m_int >> m_float;
    }
};

We create an std::ofstream and pass it to the Serialise method and we’ll have an output. That same output back through an std::ifstream and passed to Deserialise will recreate the object.
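Here’s a quick round trip to show that in action. I’m using a std::stringstream as a stand-in for the file so the example has nothing to clean up; std::ofstream and std::ifstream would be passed in exactly the same way. The roundTrip and demoNaiveRoundTrip helpers and the default member values are additions for this sketch.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// ObjectA as above, with default values added for this sketch.
class ObjectA
{
public:
    std::string m_str;
    int m_int = 0;
    float m_float = 0.0f;

    void Serialise(std::ostream& stream) const
    {
        stream << m_str << " " << m_int << " " << m_float << " ";
    }

    void Deserialise(std::istream& stream)
    {
        stream >> m_str >> m_int >> m_float;
    }
};

// Hypothetical helper: writes an object out and reads it back into a new one.
ObjectA roundTrip(const ObjectA& original)
{
    std::stringstream file;
    original.Serialise(file);

    ObjectA copy;
    copy.Deserialise(file);
    return copy;
}

void demoNaiveRoundTrip()
{
    ObjectA a;
    a.m_str = "sword";
    a.m_int = 3;
    a.m_float = 1.5f;

    ObjectA b = roundTrip(a);
    assert(b.m_str == "sword");
    assert(b.m_int == 3);
    assert(b.m_float == 1.5f);

    // Limitation: operator>> stops at whitespace, so a string containing
    // a space does not survive the trip.
    ObjectA c;
    c.m_str = "iron sword";
    ObjectA d = roundTrip(c);
    assert(d.m_str == "iron");
}
```

That last case is worth keeping in mind: the naïve version only works for strings without spaces in them.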

Like I said, for simple use-cases, this works well. We might even change it up a bit to be fancier:

class ObjectA
{
public:
    std::string m_str;
    int m_int;
    float m_float;

    friend std::ostream& operator<<(std::ostream& stream, const ObjectA& obj)
    {
        stream << obj.m_str << " " << obj.m_int << " " << obj.m_float << " ";
        return stream;
    }

    friend std::istream& operator>>(std::istream& stream, ObjectA& obj)
    {
        stream >> obj.m_str >> obj.m_int >> obj.m_float;
        return stream;
    }
};

And in this way we can invoke it directly on a stream, outputFile << objectAInstance; and it works well.

The big issue is that we have to do this for every kind of object. When you have a tonne of different things that need to be serialised… well that’s a big ask. Our preprocessor macros are going to write the above code for us, that’s why we use it.

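Be wary of differences between compilers when it comes to preprocessor macros. They’re supposed to be somewhat standard, but they aren’t always. In particular, I know that various versions of the MSVC compiler can be troublesome with variadic macros like the ones used here; its newer conforming preprocessor (enabled with /Zc:preprocessor) behaves more like GCC and Clang.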

Building Blocks

First, let’s create a couple of objects we want to be able to serialise. Thing1 and Thing2.

// Thing2 is defined first because Thing1 holds Thing2 members by value.
class Thing2
{
public:
    Thing2()
    :
        m_int(0),
        m_int2(0),
        m_vec({3.2f,.4f,1224.3f})
    {

    }

    int m_int;
    int m_int2;

    std::vector<float> m_vec;
};

class Thing1
{
public:

    Thing1()
    :
        m_int(0),
        m_float(0),
        m_string("something"),
        m_vecs(4)
    {

    }

private:
    int m_int;
    float m_float;
    std::string m_string;
    Thing2 m_thing2;
    std::vector<Thing2> m_vecs;
};

These are a little silly, but they’ll illustrate the point well. We’re going to want our serialization to serialize m_int, m_float, m_string, m_thing2, and m_vecs from Thing1; and we want it to serialize m_int, m_int2, and m_vec from Thing2.

Our macros will only insert calls to functions, so we need to provide the functions that do the actual work. If we want to be able to serialise different types, we need functions that can handle each of them. Let’s start with the general case:

template<typename StreamType, typename T>
requires std::derived_from<StreamType,std::ostream>
void serialise(StreamType& s, const T& t)
{
    s << t << " ";
}

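Easy and simple enough, I think. Serialisation takes some kind of output stream and simply writes the value to it. The requires clause is a C++20 constraint (it needs the <concepts> header) that restricts StreamType to output streams.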

Note that I’m adding a space. That’s because I’m going to serialise to text. You might want to serialise to binary. Heck, you might want to serialise to JSON or YAML or XML. So think about that when you adopt this code for your own game.

I’ve chosen simple text output because it’s easy and I can manipulate values before reloading. However, JSON or some other structured text format would be easier to make sense of. Binary would probably save space.
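If you do want to go the binary route, the generic pair might look something like the sketch below. serialiseBinary, deserialiseBinary and demoBinaryRoundTrip are names I’ve invented here. It assumes trivially copyable types only and the same machine and compiler on both ends, since it makes no attempt to deal with endianness or differing type sizes.

```cpp
#include <cassert>
#include <ios>
#include <sstream>
#include <type_traits>

// Writes the raw bytes of a value. No separator is needed, because the
// reader knows exactly how many bytes each type occupies.
template<typename T>
void serialiseBinary(std::ostream& s, const T& t)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw byte copies are only safe for trivially copyable types");
    s.write(reinterpret_cast<const char*>(&t), sizeof(T));
}

template<typename T>
void deserialiseBinary(std::istream& s, T& t)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw byte copies are only safe for trivially copyable types");
    s.read(reinterpret_cast<char*>(&t), sizeof(T));
}

void demoBinaryRoundTrip()
{
    // In-memory stand-in for a file opened with std::ios::binary.
    std::stringstream file(std::ios::in | std::ios::out | std::ios::binary);

    serialiseBinary(file, 42);
    serialiseBinary(file, 3.25f);

    int i = 0;
    float f = 0.0f;
    deserialiseBinary(file, i);
    deserialiseBinary(file, f);

    assert(i == 42);
    assert(f == 3.25f);
}
```

Strings and vectors would still need their own versions that write a length first, just like the text ones.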

To reverse the above serialisation we would:

template<typename StreamType, typename T>
requires std::derived_from<StreamType,std::istream>
void deserialise(StreamType& s, T& t)
{
    s >> t;
}

No issues, very easy.

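Oh, but wait. We need to be able to serialize std::vector and std::string as well, and reload them. The generic version can’t handle a std::vector at all, and it breaks a std::string as soon as the string contains a space, because operator>> stops reading at whitespace. We’ll need to add some specialised versions of these functions: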

// Alloc appears in the parameter type so that the compiler can deduce it.
template<typename StreamType, typename Alloc>
requires std::derived_from<StreamType,std::ostream>
void serialise(StreamType& s,
               const std::basic_string<char, std::char_traits<char>, Alloc>& t)
{
    // Length first, so the reader knows how many characters follow.
    s << t.size() << " " << t << " ";
}

template<typename StreamType, typename Alloc>
requires std::derived_from<StreamType,std::istream>
void deserialise(StreamType& s,
                 std::basic_string<char, std::char_traits<char>, Alloc>& t)
{
    t.clear();

    std::size_t len = 0;
    s >> len;
    s.get(); // consume the single space written after the length

    for(std::size_t i = 0; i < len; ++i)
    {
        // get() is an unformatted read, so spaces inside the string are
        // kept (operator>> would silently skip them).
        char c = '\0';
        s.get(c);
        t += c;
    }
}



template<typename StreamType, typename T, typename Alloc>
requires
    std::derived_from<StreamType,std::ostream>
void serialise(StreamType& s, const std::vector<T, Alloc>& t)
{
    s << t.size() << " ";
    for(const T& tt : t)
    {
        // Go through serialise() so nested strings and vectors work too.
        serialise(s, tt);
    }
}

template<typename StreamType, typename T, typename Alloc>
requires
    std::derived_from<StreamType,std::istream>
void deserialise(StreamType& s, std::vector<T, Alloc>& t)
{
    t.clear();

    std::size_t len = 0;
    s >> len;
    for(std::size_t i = 0; i < len; ++i)
    {
        T c{};
        deserialise(s, c);
        t.push_back(c);
    }
}

To serialize a string, you need to first output how many characters it has. That way, when you deserialize the same string, you know how many characters to read back in. An std::vector is much the same, serialize the number of elements so you know how many to read back in.

You might be wondering why I’ve got those Alloc arguments in there. Well, if you’re using custom allocators with the STL containers, you’ll need them like the above to be compatible. And if you’re not using custom allocators, like how our Thing1 and Thing2 classes aren’t, then the calls still work.

I won’t write serialization functions for all the STL Containers, but that’s how it works. You can easily write your own functions for the other containers you’re using.
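As an example of writing your own, here’s roughly what a std::map pair could look like. To keep the listing self-contained, it uses simplified stand-ins for the generic serialise and deserialise functions (plain std::ostream& and std::istream& parameters, no requires clause); that simplification and the demoMapRoundTrip helper are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <utility>

// Simplified stand-ins for the generic functions.
template<typename T>
void serialise(std::ostream& s, const T& t) { s << t << " "; }

template<typename T>
void deserialise(std::istream& s, T& t) { s >> t; }

// Same pattern as std::vector: element count first, then each key and value.
template<typename K, typename V, typename Compare, typename Alloc>
void serialise(std::ostream& s, const std::map<K, V, Compare, Alloc>& m)
{
    s << m.size() << " ";
    for (const auto& entry : m)
    {
        serialise(s, entry.first);
        serialise(s, entry.second);
    }
}

template<typename K, typename V, typename Compare, typename Alloc>
void deserialise(std::istream& s, std::map<K, V, Compare, Alloc>& m)
{
    m.clear();

    std::size_t len = 0;
    s >> len;
    for (std::size_t i = 0; i < len; ++i)
    {
        K key{};
        V value{};
        deserialise(s, key);
        deserialise(s, value);
        m.emplace(std::move(key), std::move(value));
    }
}

void demoMapRoundTrip()
{
    std::map<std::string, int> resources = {{"gold", 10}, {"wood", 3}};

    std::stringstream file;
    serialise(file, resources);

    std::map<std::string, int> loaded;
    deserialise(file, loaded);
    assert(loaded == resources);
}
```

The compiler picks the std::map overload over the generic one because it is the more specialised match, which is the same mechanism that makes the std::vector overload work.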

Looping Preprocessor Macro

It’ll be clearer why we need it later, but we need some preprocessor code that will allow us to loop through variadic preprocessor arguments and output some constant arguments.

I.e., I could write MACRO_COMMAND(1, 2, 3, 4) or MACRO_COMMAND(1, 2, 3, 4, 5, 6) or MACRO_COMMAND(1). I need to be able to send a variable number. The following achieves this. I’ll explain it now, but it probably won’t click in your head for a little bit.

#define CONCATENATE(arg1, arg2)   CONCATENATE1(arg1, arg2)
#define CONCATENATE1(arg1, arg2)  CONCATENATE2(arg1, arg2)
#define CONCATENATE2(arg1, arg2)  arg1##arg2

#define FOR_EACH_1(what, o, i, x)         \
    what(o, i, x)

#define FOR_EACH_2(what, o, i, x, ...)    \
    what(o, i, x);                        \
    FOR_EACH_1(what, o, i, __VA_ARGS__)

#define FOR_EACH_3(what, o, i, x, ...)    \
    what(o, i, x);                        \
    FOR_EACH_2(what, o, i, __VA_ARGS__)

#define FOR_EACH_4(what, o, i, x, ...)    \
    what(o, i, x);                        \
    FOR_EACH_3(what, o, i,  __VA_ARGS__)

#define FOR_EACH_5(what, o, i, x, ...)    \
    what(o, i, x);                        \
    FOR_EACH_4(what, o, i,  __VA_ARGS__)

#define FOR_EACH_6(what, o, i, x, ...)    \
    what(o, i, x);                        \
    FOR_EACH_5(what, o, i, __VA_ARGS__)

#define FOR_EACH_7(what, o, i, x, ...)    \
    what(o, i, x);                        \
    FOR_EACH_6(what, o, i,  __VA_ARGS__)

#define FOR_EACH_8(what, o, i, x, ...)    \
    what(o, i, x);                        \
    FOR_EACH_7(what, o, i,  __VA_ARGS__)

#define FOR_EACH_NARG(...) FOR_EACH_NARG_(__VA_ARGS__, FOR_EACH_RSEQ_N())
#define FOR_EACH_NARG_(...) FOR_EACH_ARG_N(__VA_ARGS__)
#define FOR_EACH_ARG_N(_1, _2, _3, _4, _5, _6, _7, _8, N, ...) N
#define FOR_EACH_RSEQ_N() 8, 7, 6, 5, 4, 3, 2, 1, 0

#define FOR_EACH_(N, what, ...) CONCATENATE(FOR_EACH_, N)(what, __VA_ARGS__)
#define FOR_EACH(what, o, i, ...) \
FOR_EACH_(FOR_EACH_NARG(__VA_ARGS__), what, o, i, __VA_ARGS__)

What this lets me do is make a call like:

FOR_EACH(MACRO_COMMAND, repeated_arg_o, repeated_arg_i, for_each_arg1, for_each_arg2, for_each_arg3)

and the preprocessor would replace it with:

MACRO_COMMAND(repeated_arg_o, repeated_arg_i, for_each_arg1);
MACRO_COMMAND(repeated_arg_o, repeated_arg_i, for_each_arg2);
MACRO_COMMAND(repeated_arg_o, repeated_arg_i, for_each_arg3)

Of course, then it would replace MACRO_COMMAND with the actual thing, but you can see how it loops through a variable number of arguments, keeping the first two consistent. As written, it handles up to eight looped arguments; for classes with more members, add further FOR_EACH_N macros and extend the counting sequence to match.
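If you want to convince yourself the loop works before wiring it into serialisation, a toy command helps. The sketch below repeats the looping macros, cut down to three arguments to keep it short, and uses a made-up APPEND_SCALED command. I’ve only reasoned this through against the behaviour of GCC and Clang; MSVC’s older preprocessor may behave differently, as warned earlier.

```cpp
#include <cassert>
#include <vector>

#define CONCATENATE(arg1, arg2)   CONCATENATE1(arg1, arg2)
#define CONCATENATE1(arg1, arg2)  CONCATENATE2(arg1, arg2)
#define CONCATENATE2(arg1, arg2)  arg1##arg2

#define FOR_EACH_1(what, o, i, x)         \
    what(o, i, x)

#define FOR_EACH_2(what, o, i, x, ...)    \
    what(o, i, x);                        \
    FOR_EACH_1(what, o, i, __VA_ARGS__)

#define FOR_EACH_3(what, o, i, x, ...)    \
    what(o, i, x);                        \
    FOR_EACH_2(what, o, i, __VA_ARGS__)

// Counts the looped arguments by seeing which number lands in slot N.
#define FOR_EACH_NARG(...) FOR_EACH_NARG_(__VA_ARGS__, FOR_EACH_RSEQ_N())
#define FOR_EACH_NARG_(...) FOR_EACH_ARG_N(__VA_ARGS__)
#define FOR_EACH_ARG_N(_1, _2, _3, N, ...) N
#define FOR_EACH_RSEQ_N() 3, 2, 1, 0

#define FOR_EACH_(N, what, ...) CONCATENATE(FOR_EACH_, N)(what, __VA_ARGS__)
#define FOR_EACH(what, o, i, ...) \
FOR_EACH_(FOR_EACH_NARG(__VA_ARGS__), what, o, i, __VA_ARGS__)

// Toy command: the first two arguments repeat, the third is the looped one.
#define APPEND_SCALED(container, factor, value) \
    container.push_back((factor) * (value))

std::vector<int> scaledValues()
{
    std::vector<int> out;
    // Expands to three push_back calls: 10 * 1, 10 * 2 and 10 * 3.
    FOR_EACH(APPEND_SCALED, out, 10, 1, 2, 3);
    return out;
}

void demoForEach()
{
    std::vector<int> values = scaledValues();
    assert(values.size() == 3);
    assert(values[0] == 10 && values[1] == 20 && values[2] == 30);
}
```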

Operator >> and operator << Macro Replacements

Now you can see where this is going. We need to add a friend std::ostream& operator<<() and friend std::istream& operator>>() method to all classes we wish to serialise. That way we can write:

Thing1 thing1;
std::ofstream file;
...

file << thing1;

and it will all work. The magic in this is we want to take our class definitions from above, and add a single line of code:

class Thing2
{
public:
    Thing2()
    :
        m_int(0),
        m_int2(0),
        m_vec({1.3f,1.4f,2.5f})
    {

    }

    int m_int;
    int m_int2;
    std::vector<float> m_vec;

    SERIALISE(Thing2, m_int, m_int2, m_vec)
};

class Thing1
{
public:

    Thing1()
    :
        m_int(0),
        m_float(0),
        m_string("something"),
        m_vecs(4)
    {

    }

    SERIALISE(Thing1, m_int, m_float, m_string, m_thing2, m_vecs)

private:
    int m_int;
    float m_float;
    std::string m_string;
    Thing2 m_thing2;
    std::vector<Thing2> m_vecs;
};

So we need some macros that will take those two SERIALISE lines and replace them with the serialisation friend functions.

#define MEMBER_SERIALISE(outputStream, instance, memberName) \
        serialise(outputStream, instance.memberName);

#define MEMBER_DESERIALISE(inputStream, instance, memberName) \
        deserialise(inputStream, instance.memberName);


#define SERIALISE(className, ...)                                      \
friend std::ostream& operator<<(std::ostream& outputStream,            \
                                const className& instance)             \
{                                                                      \
    FOR_EACH(MEMBER_SERIALISE, outputStream, instance, __VA_ARGS__)    \
    return outputStream;                                               \
}                                                                      \
                                                                       \
friend std::istream& operator>>(std::istream& inputStream,             \
                                className& instance)                   \
{                                                                      \
    FOR_EACH(MEMBER_DESERIALISE, inputStream, instance, __VA_ARGS__)   \
    return inputStream;                                                \
}

And that’s it! When you call SERIALISE you provide the class name, followed by a list of member variables that should be included in the serialisation. Because C++ has no reflection capability, you have to do this manual “this is what I want serialised”, but it’s not a big ask (just a single line of code).
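To see the whole thing working end to end, here’s a compact, self-contained sketch. It is deliberately cut down: the looping macro only goes up to three members, the generic serialise and deserialise functions are simplified stand-ins without the requires clause, and Position and Monster are hypothetical classes invented for the demonstration.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>

// Simplified stand-ins for the generic functions.
template<typename T>
void serialise(std::ostream& s, const T& t) { s << t << " "; }

template<typename T>
void deserialise(std::istream& s, T& t) { s >> t; }

// Looping macro, trimmed to three looped arguments.
#define CONCATENATE(arg1, arg2)   CONCATENATE1(arg1, arg2)
#define CONCATENATE1(arg1, arg2)  CONCATENATE2(arg1, arg2)
#define CONCATENATE2(arg1, arg2)  arg1##arg2

#define FOR_EACH_1(what, o, i, x)       what(o, i, x)
#define FOR_EACH_2(what, o, i, x, ...)  what(o, i, x); FOR_EACH_1(what, o, i, __VA_ARGS__)
#define FOR_EACH_3(what, o, i, x, ...)  what(o, i, x); FOR_EACH_2(what, o, i, __VA_ARGS__)

#define FOR_EACH_NARG(...) FOR_EACH_NARG_(__VA_ARGS__, FOR_EACH_RSEQ_N())
#define FOR_EACH_NARG_(...) FOR_EACH_ARG_N(__VA_ARGS__)
#define FOR_EACH_ARG_N(_1, _2, _3, N, ...) N
#define FOR_EACH_RSEQ_N() 3, 2, 1, 0

#define FOR_EACH_(N, what, ...) CONCATENATE(FOR_EACH_, N)(what, __VA_ARGS__)
#define FOR_EACH(what, o, i, ...) \
FOR_EACH_(FOR_EACH_NARG(__VA_ARGS__), what, o, i, __VA_ARGS__)

#define MEMBER_SERIALISE(outputStream, instance, memberName) \
        serialise(outputStream, instance.memberName);

#define MEMBER_DESERIALISE(inputStream, instance, memberName) \
        deserialise(inputStream, instance.memberName);

#define SERIALISE(className, ...)                                      \
friend std::ostream& operator<<(std::ostream& outputStream,            \
                                const className& instance)             \
{                                                                      \
    FOR_EACH(MEMBER_SERIALISE, outputStream, instance, __VA_ARGS__)    \
    return outputStream;                                               \
}                                                                      \
                                                                       \
friend std::istream& operator>>(std::istream& inputStream,             \
                                className& instance)                   \
{                                                                      \
    FOR_EACH(MEMBER_DESERIALISE, inputStream, instance, __VA_ARGS__)   \
    return inputStream;                                                \
}

class Position
{
public:
    int m_x = 0;
    int m_y = 0;

    SERIALISE(Position, m_x, m_y)
};

class Monster
{
public:
    int m_health = 0;
    float m_speed = 0.0f;
    Position m_position;   // nested class, serialised via its own operators

    SERIALISE(Monster, m_health, m_speed, m_position)
};

void demoMacroRoundTrip()
{
    Monster original;
    original.m_health = 75;
    original.m_speed = 2.5f;
    original.m_position.m_x = 3;
    original.m_position.m_y = -4;

    std::stringstream file;   // stand-in for a save file
    file << original;

    Monster loaded;
    file >> loaded;

    assert(loaded.m_health == 75);
    assert(loaded.m_speed == 2.5f);
    assert(loaded.m_position.m_x == 3);
    assert(loaded.m_position.m_y == -4);
}
```

Notice that Monster never mentions how Position is serialised. The generic serialise function just uses operator<<, which the macro generated inside Position.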

If you take the above two classes and run g++ -E myfile.cpp, it will output the code after all the preprocessor commands have been applied. The relevant code, formatted more nicely for humans (newline characters and tabs added), gives us the following:

class Thing2
{
public:
    Thing2()
    :
        m_int(0),
        m_int2(0),
        m_vec({1.3f,1.4f,2.5f})
    {

    }

    int m_int;
    int m_int2;
    std::vector<float> m_vec;

    friend std::ostream& operator<<(std::ostream& outputStream, const Thing2& instance)
    {
        serialise(outputStream, instance.m_int);;
        serialise(outputStream, instance.m_int2);;
        serialise(outputStream, instance.m_vec);
        return outputStream;
    }

    friend std::istream& operator>>(std::istream& inputStream, Thing2& instance)
    {
        deserialise(inputStream, instance.m_int);;
        deserialise(inputStream, instance.m_int2);;
        deserialise(inputStream, instance.m_vec);
        return inputStream;
    }
};

class Thing1
{
public:

    Thing1()
    :
        m_int(0),
        m_float(0),
        m_string("something"),
        m_vecs(4)
    {

    }

    friend std::ostream& operator<<(std::ostream& outputStream, const Thing1& instance)
    {
        serialise(outputStream, instance.m_int);;
        serialise(outputStream, instance.m_float);;
        serialise(outputStream, instance.m_string);;
        serialise(outputStream, instance.m_thing2);;
        serialise(outputStream, instance.m_vecs);
        return outputStream;
    }

    friend std::istream& operator>>(std::istream& inputStream, Thing1& instance)
    {
        deserialise(inputStream, instance.m_int);;
        deserialise(inputStream, instance.m_float);;
        deserialise(inputStream, instance.m_string);;
        deserialise(inputStream, instance.m_thing2);;
        deserialise(inputStream, instance.m_vecs);
        return inputStream;
    }

private:
    int m_int;
    float m_float;
    std::string m_string;
    Thing2 m_thing2;
    std::vector<Thing2> m_vecs;
};

Next Steps

Now all you need to do is write serialisation overloads for the other STL containers you intend to use. Then all your classes just need a serialisation macro added to them. You might have some cases where you’re playing with raw pointers and the like. In those cases you can write your own operator<< and operator>> by hand; they will still be compatible with the generated ones because, after all, the macro just produces those same operators.

Also, have a think about doing this with something like JSON. It’s actually not too hard if you’re familiar with the format. For example, you already have the member names in the macro, those would be used for the keys.
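As a starting point for the JSON idea, the preprocessor’s # operator turns a macro argument into a string literal, which gives you the key for free. The sketch below is write-only and handles numeric members only (no string escaping, no nesting). JsonWriter, MEMBER_JSON, JSON_SERIALISE, toJson and the Settings class are all hypothetical names, and the looping macro is again trimmed to three members.

```cpp
#include <cassert>
#include <sstream>
#include <string>

#define CONCATENATE(arg1, arg2)   CONCATENATE1(arg1, arg2)
#define CONCATENATE1(arg1, arg2)  CONCATENATE2(arg1, arg2)
#define CONCATENATE2(arg1, arg2)  arg1##arg2

#define FOR_EACH_1(what, o, i, x)       what(o, i, x)
#define FOR_EACH_2(what, o, i, x, ...)  what(o, i, x); FOR_EACH_1(what, o, i, __VA_ARGS__)
#define FOR_EACH_3(what, o, i, x, ...)  what(o, i, x); FOR_EACH_2(what, o, i, __VA_ARGS__)

#define FOR_EACH_NARG(...) FOR_EACH_NARG_(__VA_ARGS__, FOR_EACH_RSEQ_N())
#define FOR_EACH_NARG_(...) FOR_EACH_ARG_N(__VA_ARGS__)
#define FOR_EACH_ARG_N(_1, _2, _3, N, ...) N
#define FOR_EACH_RSEQ_N() 3, 2, 1, 0

#define FOR_EACH_(N, what, ...) CONCATENATE(FOR_EACH_, N)(what, __VA_ARGS__)
#define FOR_EACH(what, o, i, ...) \
FOR_EACH_(FOR_EACH_NARG(__VA_ARGS__), what, o, i, __VA_ARGS__)

// Hypothetical helper: writes "key":value pairs, comma-separated.
class JsonWriter
{
public:
    explicit JsonWriter(std::ostream& out) : m_out(out), m_first(true) {}

    template<typename T>
    void member(const char* key, const T& value)
    {
        if (!m_first)
            m_out << ",";
        m_out << "\"" << key << "\":" << value;
        m_first = false;
    }

private:
    std::ostream& m_out;
    bool m_first;
};

// #memberName turns the member's name into a string literal for the key.
#define MEMBER_JSON(writer, instance, memberName) \
        writer.member(#memberName, instance.memberName)

#define JSON_SERIALISE(className, ...)                                 \
friend void toJson(std::ostream& out, const className& instance)       \
{                                                                      \
    JsonWriter writer(out);                                            \
    out << "{";                                                        \
    FOR_EACH(MEMBER_JSON, writer, instance, __VA_ARGS__);              \
    out << "}";                                                        \
}

class Settings
{
public:
    int m_width = 800;
    int m_height = 600;

    JSON_SERIALISE(Settings, m_width, m_height)
};

std::string toJsonString(const Settings& settings)
{
    std::ostringstream out;
    toJson(out, settings);
    return out.str();
}

void demoJson()
{
    assert(toJsonString(Settings()) == "{\"m_width\":800,\"m_height\":600}");
}
```

Reading JSON back in is the harder half, since you need a parser, but the same member list in the macro tells you which keys to look for.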


I hope you learned something useful in this. Any questions or comments, leave them here or hop into the forums to discuss.
