Reflection in C++ – part 2

In a previous note I wrote a little bit about my experimental reflection system for C++. Today, I’d like to describe a simple load-in-place mechanism I built on top of that.

The basic idea behind load-in-place (LIP) is to minimize the processing needed after data is loaded from a file. Ideally, all that needs to be done is a single call to ‘fread’ into a memory buffer. For a more detailed description of such a system, see the Fast File Loading article by Ent/Incognita.

Let’s consider the possible scenarios when trying to load an object from a file without any further processing:

  • a raw ‘C’ object with no pointers and no virtual functions, consisting only of pure value members. No problems here: we can just load it from the file and it’s ready to use.

  • an object containing pointers. Things get more complicated, as the pointer values will obviously point to random places in memory after loading. One popular method is to ‘fix up’ pointers after loading. When saving the object, we also save the blocks of memory that its pointers refer to and store additional info that links each pointer to its block (for example, an index into a fix-up table).

  • an object containing virtual functions. We need to initialize the vtable somehow. One way to achieve this is to call placement new on the object’s memory. However, we have to remember that the constructor will automatically call the constructors of all contained objects, effectively overwriting the member variables that we’ve just loaded. I’ll write a little bit more on that later.
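
The pointer case can be illustrated with a minimal sketch. The `Path` type and the `SavePath`/`LoadPathInPlace` helpers are made up for this example and are not part of the reflection system; a real implementation would drive the same steps from reflection data rather than hard-coding a single pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical 'C'-like object with a single pointer member.
struct Path
{
    std::uint32_t numPoints;
    float*        points;   // stale after a raw load, must be patched
};

// Save: copy the object, then the block its pointer refers to.
// The pointee always lands directly after the object, so the
// "fix-up table" here is the single implicit offset sizeof(Path).
std::vector<std::uint8_t> SavePath(const Path& p)
{
    const std::size_t pointBytes = p.numPoints * sizeof(float);
    std::vector<std::uint8_t> blob(sizeof(Path) + pointBytes);
    std::memcpy(blob.data(), &p, sizeof(Path));
    if (pointBytes != 0)
        std::memcpy(blob.data() + sizeof(Path), p.points, pointBytes);
    return blob;
}

// Load in place: no copying, just patch the pointer so that it
// refers to the block stored behind the object.
// 'mem' must be suitably aligned for Path.
Path* LoadPathInPlace(std::uint8_t* mem)
{
    assert(mem != nullptr);
    Path* p = reinterpret_cast<Path*>(mem);
    p->points = reinterpret_cast<float*>(mem + sizeof(Path));
    return p;
}
```

After `LoadPathInPlace`, the object can be used directly from the loaded buffer; the pointer value that was written to the file is never read.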

Rough sketch of my approach:

  • iterate over all fields whose reflection type is a pointer or a class. For every pointer, prepare a fixup entry and save the related object or block of memory (for objects, we basically do the same thing for their fields again). For every class, iterate over its fields recursively. I do have some special cases here, like the vector class, where I need to save the whole memory block (the vector contents) associated with the vector.

  • write the object header + pointer fixups. The object header contains the type ID, the total size of the saved data and the number of pointer fixups. Every pointer fixup entry consists of: the offset of the pointer to patch (from the start of the main object’s memory), the offset of the final pointer value (to get the final pointer value we only need to add this offset to the start of the main object’s memory), and a type tag for objects that require vtable reinitialization.

  • write the raw memory associated with all collected objects.
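
The layout described in these steps might look roughly like the sketch below. The field names `m_pointerOffset`, `m_pointerValueOffset` and `m_typeTag` match the patching loop shown later in this note; `ObjectHeader`, its members, the 32-bit field widths and the `WriteBlob` helper are assumptions for illustration, not the actual file format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Assumed header layout: type ID, total payload size, fix-up count.
struct ObjectHeader
{
    std::uint32_t m_typeTag;
    std::uint32_t m_size;
    std::uint32_t m_numPointerFixups;
};

// One entry per pointer; both offsets are relative to the start of
// the main object's memory.
struct PointerFixup
{
    std::uint32_t m_pointerOffset;      // where the pointer lives
    std::uint32_t m_pointerValueOffset; // where it should point
    std::uint32_t m_typeTag;            // 0 = no vtable to reinitialize
};

// Emits the header, then the fix-up table, then the raw object memory.
std::vector<std::uint8_t> WriteBlob(std::uint32_t typeTag,
    const std::vector<PointerFixup>& fixups,
    const std::vector<std::uint8_t>& rawMemory)
{
    for (const PointerFixup& f : fixups)
    {
        // Both offsets must stay inside the saved memory block.
        assert(f.m_pointerOffset + sizeof(void*) <= rawMemory.size());
        assert(f.m_pointerValueOffset < rawMemory.size());
    }

    ObjectHeader header;
    header.m_typeTag = typeTag;
    header.m_size = static_cast<std::uint32_t>(rawMemory.size());
    header.m_numPointerFixups = static_cast<std::uint32_t>(fixups.size());

    const std::size_t fixupBytes = fixups.size() * sizeof(PointerFixup);
    std::vector<std::uint8_t> out(sizeof(header) + fixupBytes + rawMemory.size());
    std::uint8_t* dst = out.data();
    std::memcpy(dst, &header, sizeof(header));
    dst += sizeof(header);
    if (fixupBytes != 0)
        std::memcpy(dst, fixups.data(), fixupBytes);
    dst += fixupBytes;
    if (!rawMemory.empty())
        std::memcpy(dst, rawMemory.data(), rawMemory.size());
    return out;
}
```

A loader would read the header and the fix-up table first, then read `m_size` bytes into a suitably aligned buffer and patch that buffer in place.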

One tricky question here is: how do we know how to initialize the vtable? Up to this point the whole system was rather portable. Sure, PDBs aren’t present on PS3, but the layout of objects shouldn’t change (objects that need reflection, that is), so for multiplatform projects you can still generate reflection info from a PDB and use it with other compilers. Now, however, we need to obtain a function address, which will obviously vary from platform to platform. I added two special functions that Reflector will try to search for. Here they are, together with the special constructor they rely on:

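// Special constructor used only for vtable setup: v is left
// uninitialized, so the data we've just loaded survives.
explicit SuperBar(EInitVTable): v(rde::noinitialize) {}
// Heap-allocates a default-constructed instance (construction by name).
static void* Reflection_CreateInstance()
{
    return new SuperBar();
}
// Placement new over already loaded memory, to set up the vtable.
static void* Reflection_InitVTable(void* mem)
{
    return new (mem) SuperBar(INIT_VTABLE);
}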
When those functions are found (they’re identified by name and must be static), Reflector will save their addresses. Later, we add the module base address to those, which makes it possible to call the correct placement new for a type determined at runtime. One tricky thing here, as you can see, is that I call a special constructor for v (v is an rde::vector here). The reason has been mentioned before: we don’t want constructors to overwrite values that we’ve just loaded (in the case of a vector, the default constructor would set the capacity to 0). I considered patching int/float values as well (just like pointers), but it felt a little bit too brute-force. I had access to the source code anyway (I use RDESTL), so I decided it’d be more elegant to make it more LIP-friendly instead. In this particular case it doesn’t matter so much, because you shouldn’t be doing any operations on a loaded vector anyway (it doesn’t really own the memory, so it shouldn’t try to free or reallocate it).
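
The lookup from a type tag to the right placement new can be sketched as follows. `EInitVTable`, `INIT_VTABLE` and `Reflection_InitVTable` follow the snippet above; `Widget` and `InitRegistry` are hypothetical names. Note the swapped-in technique: instead of addresses saved by Reflector and rebased with the module base address, this sketch simply registers function pointers at startup. Also keep in mind that the C++ standard does not promise that members skipped by such a constructor keep the bytes that were in memory beforehand; the trick relies on compiler behaviour (GCC, for example, has a -fno-lifetime-dse switch that is related to this), so treat the sketch as showing the dispatch only.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <new>

enum EInitVTable { INIT_VTABLE };

// Hypothetical reflected class with a virtual function.
struct Widget
{
    Widget() : m_id(0) {}
    // Vtable-only constructor: deliberately leaves m_id untouched.
    explicit Widget(EInitVTable) {}
    virtual ~Widget() {}
    virtual int Kind() const { return 7; }

    static void* Reflection_InitVTable(void* mem)
    {
        return new (mem) Widget(INIT_VTABLE);
    }

    int m_id;
};

typedef void* (*InitVTableFn)(void* mem);

// Maps a type tag stored in the file to the matching init function.
class InitRegistry
{
public:
    void Register(std::uint32_t typeTag, InitVTableFn fn)
    {
        m_fns[typeTag] = fn;
    }
    // Runs placement new for the type identified by 'typeTag'.
    void* InitVTable(std::uint32_t typeTag, void* mem) const
    {
        const std::map<std::uint32_t, InitVTableFn>::const_iterator it =
            m_fns.find(typeTag);
        assert(it != m_fns.end());
        return it->second(mem);
    }
private:
    std::map<std::uint32_t, InitVTableFn> m_fns;
};
```

With something like this in place, the loader only needs the type tag from a fix-up entry to make virtual calls work on a freshly loaded object.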

Here’s the loop patching all the pointers. Vtable initialization makes it a little bit more complicated than it should be. If you only save ‘C’-like objects, this could be dropped (I actually think that’s a good idea for most data/resource classes; it’d also help with the problem mentioned above).

for (PointerFixups::const_iterator it = pointerFixups.begin();
    it != pointerFixups.end(); ++it)
{
    rde::uint8_t* pptr = objectMem8 + it->m_pointerOffset;
    void* patchedMem = objectMem8 + it->m_pointerValueOffset;
    *reinterpret_cast<void**>(pptr) = patchedMem;

    if (it->m_typeTag != 0)
    {
        const rde::TypeClass* fieldType =
             static_cast<const rde::TypeClass*>
                (typeRegistry.FindType(it->m_typeTag));
        // We already initialized vtable for 'main' object.
        if (patchedMem != objectMem)
            fieldType->InitVTable(patchedMem);
    }
}

The generated assembly is as simple as it gets (a whole 4 instructions for patching a single pointer):

; 543 : rde::uint8_t* pptr = objectMem8 + it->m_pointerOffset;
; 544 : void* patchedMem = objectMem8 + it->m_pointerValueOffset;
mov esi, DWORD PTR [edi-4]
; 545 : *reinterpret_cast<void**>(pptr) = patchedMem;
mov eax, DWORD PTR [edi-8]
add esi, ebx
mov DWORD PTR [ebx+eax], esi

Offsets could probably be packed into 16 bits if necessary.
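
A minimal sketch of what 16-bit packing could look like, assuming the whole saved block is smaller than 64 KB. `PackedFixup`, `PackFixup` and `ApplyPackedFixup` are hypothetical names, and the type tag is left out here for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Assumed compact entry: 4 bytes instead of two 32-bit offsets.
struct PackedFixup
{
    std::uint16_t m_pointerOffset;
    std::uint16_t m_pointerValueOffset;
};
static_assert(sizeof(PackedFixup) == 4, "expected two tightly packed 16-bit offsets");

// Returns false (and leaves 'out' untouched) when an offset does not
// fit into 16 bits, so the caller can fall back to the wide format.
bool PackFixup(std::uint32_t pointerOffset,
               std::uint32_t pointerValueOffset,
               PackedFixup& out)
{
    if (pointerOffset > 0xFFFFu || pointerValueOffset > 0xFFFFu)
        return false;
    out.m_pointerOffset = static_cast<std::uint16_t>(pointerOffset);
    out.m_pointerValueOffset = static_cast<std::uint16_t>(pointerValueOffset);
    return true;
}

// Patches one pointer inside the loaded block.
void ApplyPackedFixup(std::uint8_t* objectMem8, const PackedFixup& fixup)
{
    assert(objectMem8 != nullptr);
    void* patchedMem = objectMem8 + fixup.m_pointerValueOffset;
    // memcpy instead of a cast, so the store is safe for any alignment.
    std::memcpy(objectMem8 + fixup.m_pointerOffset, &patchedMem, sizeof(patchedMem));
}
```

The trade-off is a hard size limit per saved object in exchange for a smaller fix-up table.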

One last thing that I wanted was support for special field flags. It’s mostly for the editor: sometimes you may want to mark a field as ‘hidden’ or ‘read-only’ (so that it’s not possible to edit it), or attach other annotations (like a text description, more informative than the name). The best way I could think of was adding those annotations in a comment (a little bit like in C#, only C# has special keywords for this), then extracting them. The problem is that I couldn’t find a way to determine which source file a type has been defined in. In the end, I scan all source files listed in the .PDB, which can be very slow with bigger projects. One possible way to speed it up a little would be to determine where the type’s constructor is defined, then scan only the files related to that compilation unit. Example of a ‘hidden’ field:

// [Hidden]
float* p; // This field will set FieldFlags::HIDDEN flag

The whole ‘flag’ system is a little bit experimental at this stage (it feels messy), so it’s disabled by default. In order to enable it, Reflector needs to be run with the ‘-flag source_path’ argument. The source_path narrows the choice of source files a little bit so that, for example, compiler includes are not scanned; in my example it’d be “reflection”, as that’s the main tree.
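
Extracting such an annotation from a source line could be sketched like this. `FieldFlags::HIDDEN` and the `[Hidden]` tag come from the example above; the `READ_ONLY` flag, the `[ReadOnly]` tag and the `ParseFieldFlags` function are assumptions added for illustration, and the sketch does not show how Reflector actually scans files.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

namespace FieldFlags
{
    enum Enum : std::uint32_t
    {
        NONE      = 0,
        HIDDEN    = 1 << 0,
        READ_ONLY = 1 << 1   // hypothetical, not in the original
    };
}

// Returns the flags found in the '//' comment part of a source line.
// Tags outside of a comment are ignored.
std::uint32_t ParseFieldFlags(const std::string& line)
{
    std::uint32_t flags = FieldFlags::NONE;
    const std::string::size_type comment = line.find("//");
    if (comment == std::string::npos)
        return flags;
    if (line.find("[Hidden]", comment) != std::string::npos)
        flags |= FieldFlags::HIDDEN;
    if (line.find("[ReadOnly]", comment) != std::string::npos)
        flags |= FieldFlags::READ_ONLY;
    return flags;
}
```

A scanner would call this on the comment line directly preceding a field declaration and attach the result to that field’s reflection data.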

Some random notes:

  • the load-in-place system is far from complete. Right now vector is the only supported container, it can only save vectors of POD types, etc. It’s a proof of concept rather than a complete solution. Still, I’m already using it in my home projects and it works well enough for my purposes. It should be quite easy to create a more flexible serialization system that doesn’t rely on pointer patching, but iterates over all fields and saves them in a more “traditional” way.

  • at some point I’d like to make it possible for the whole reflection system to switch to a purely hash-based version. One problem is that I couldn’t really think of a good way to detect special cases for LIP then (right now, if a type name starts with ‘rde::vector<’ it’s assumed to be a vector, which can’t be done with hashes). Admittedly, I didn’t try very hard.

  • as I wrote before, the system should be semi-portable. If LIP/construction by name is not needed, then reflection info generated from a PDB should be valid on other platforms as well (provided they don’t differ too much from Win32/X360). Obviously, it won’t work with platform-specific types (rendering structures etc.), but normally you don’t really expose/reflect those.
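
For the hash-based note above, one possible direction (an untested idea, not something the current system does) would be to resolve the special case while the name is still available, and store the result next to the hash. `HashTypeName` (a plain 32-bit FNV-1a), `TypeKind`, `HashedTypeInfo` and `MakeHashedTypeInfo` are all hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// 32-bit FNV-1a, used here only as an example hash function.
std::uint32_t HashTypeName(const char* name)
{
    std::uint32_t hash = 2166136261u;
    for (; *name != '\0'; ++name)
    {
        hash ^= static_cast<std::uint8_t>(*name);
        hash *= 16777619u;
    }
    return hash;
}

// The name-based check: does the type name start with 'rde::vector<'?
bool IsVectorTypeName(const char* typeName)
{
    static const char kPrefix[] = "rde::vector<";
    return std::strncmp(typeName, kPrefix, sizeof(kPrefix) - 1) == 0;
}

enum TypeKind { KIND_PLAIN, KIND_VECTOR };

// What a hash-only runtime would keep per type: no name, but the
// special case has already been resolved while the name was known.
struct HashedTypeInfo
{
    std::uint32_t m_nameHash;
    TypeKind      m_kind;
};

HashedTypeInfo MakeHashedTypeInfo(const char* typeName)
{
    HashedTypeInfo info;
    info.m_nameHash = HashTypeName(typeName);
    info.m_kind = IsVectorTypeName(typeName) ? KIND_VECTOR : KIND_PLAIN;
    return info;
}
```

The cost is one extra field per type; the runtime then never needs the name string to decide how a field has to be saved.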

Source code to both Reflector & my test app can be downloaded here.

Old comments

Arseny Kapoulkine 2009-11-30 06:30:07

My experience with serialization is a bit different.
a. Layouts do differ; even without platform-specific types and multiple inheritance, this
struct barbase { int a; };
struct bar: barbase { virtual void foo(); double b; };
has size=24 in MSVC (Win/X360) and size=16 in GCC (Win/PS3).
b. We frequently use pointers to arrays, so vector-specific solution is not sufficient. Since our meta-info is in code form, there is a special pointerToArray construct; we were planning to switch to PDB/ELF (via libbfd) parsing for reflection for quite some time now, we’ll require some sort of annotation support for sure.
c. Saving platform-specific types is a necessary thing (although we expose them in Windows export code to be able to serialize them, of course).
d. LIP and SIP are more or less orthogonal; our LIP patches pointers, but SIP is slightly more complex than simple iteration and pointer following.
e. Something we also use is so-called post-constructors - bits of code attached to specific classes that execute after the whole graph that contains those classes gets loaded; you can rely on pointer values inside objects you’re pointing to there.

admin 2009-11-30 09:16:44

Arseny - good points, thanks for sharing. About b) - surely you need to store the number of elements as well? In such a case there’s no reason not to encapsulate it in some kind of object (ptr+size pair) and have a custom save handler (like in the vector example here).
