The Smart Struct
Chad Austin (aegis at aegisknight dot org)
2003.07.13


This document describes a way to use the C++ preprocessor to convert a
single structural description of a block of memory into serialization
and deserialization methods.


When reading headers from binary files or sending packets over the
network, software developers often write C code like this:

    struct Header {
        char  magic[4];
        int   version;
        int   manufacturer;
        char  code;
        char  revision;
        short flags;
        float x;
        float y;
    };

    struct Header header;

    /* ...... */

    write(file, &header, sizeof(header));

    /* ...... */

    read(file, &header, sizeof(header));

However, the C and C++ standards do not guarantee the binary layout of
structs: padding may be inserted between fields to place them on
aligned memory addresses.  Directly accessing unaligned addresses can
be slow and, on some platforms, impossible.  On x86 architectures,
unaligned reads are allowed, but they are inefficient.  MIPS is an
architecture where unaligned reads crash your executable.

(Note: When I say struct, I mean any POD (plain old data) aggregate
type, as defined by the C++ standard.  This includes structs and
classes, as long as they don't have any non-POD elements or virtual
functions.)

Consider the following struct:

    struct S {
       char c;
       int i;
    };

In order to align the address of S::i to a 4-byte alignment, compilers
will insert three bytes of padding between 'c' and 'i'.  So, on most
platforms, sizeof(S) == 8.  There are various compiler options,
#pragmas, and compiler-specific attributes that let you manually
specify how to pack structs, so even on a given platform and compiler,
you should not make assumptions about the in-memory layout of the
struct.
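
To see what a particular compiler actually does with S, the layout can
be queried with sizeof and offsetof.  The following is a minimal
sketch; the helper names (packedSizeOfS, paddingInS, printLayoutOfS)
are made up for this illustration, and the printed numbers are
implementation-defined, so none are promised here.

```cpp
#include <stddef.h>  // offsetof, size_t
#include <stdio.h>

struct S {
    char c;
    int  i;
};

// Sum of the member sizes: what S would occupy with no padding at all.
inline size_t packedSizeOfS() {
    return sizeof(char) + sizeof(int);
}

// Bytes the compiler added beyond the members themselves
// (padding between members plus any trailing padding).
inline size_t paddingInS() {
    return sizeof(S) - packedSizeOfS();
}

// Prints the layout chosen by the current compiler and settings.
inline void printLayoutOfS() {
    printf("offsetof(S, c) = %u\n", (unsigned)offsetof(S, c));
    printf("offsetof(S, i) = %u\n", (unsigned)offsetof(S, i));
    printf("sizeof(S)      = %u\n", (unsigned)sizeof(S));
    printf("packed size    = %u\n", (unsigned)packedSizeOfS());
    printf("padding        = %u\n", (unsigned)paddingInS());
}
```

Calling printLayoutOfS() under different packing options is a quick
way to confirm that sizeof(S) is not something to rely on in a file
format.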

Moreover, two platforms with different endianness will write the same
struct in a different way.  The 16-bit integer 0xABCD will be written
as two bytes: (0xCD, 0xAB) on little-endian architectures and (0xAB,
0xCD) on big-endian ones.
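
The byte order of the machine you are on can be observed directly by
copying a value into a byte array.  This is a small sketch with
hypothetical helper names (nativeBytes16, isLittleEndian,
printBytesOfABCD); it assumes <stdint.h> is available for uint16_t.

```cpp
#include <stdint.h>
#include <stdio.h>
#include <string.h>

// Copies the in-memory representation of a 16-bit value into out[0..1].
inline void nativeBytes16(uint16_t value, unsigned char out[2]) {
    memcpy(out, &value, 2);
}

// True when the least significant byte is stored at the lowest address.
inline bool isLittleEndian() {
    unsigned char b[2];
    nativeBytes16(0xABCD, b);
    return b[0] == 0xCD;
}

// Shows which of the two orderings described above this machine uses.
inline void printBytesOfABCD() {
    unsigned char b[2];
    nativeBytes16(0xABCD, b);
    printf("0xABCD is stored as (0x%02X, 0x%02X): %s-endian\n",
           b[0], b[1], isLittleEndian() ? "little" : "big");
}
```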

It's possible to manually address some of these issues by writing the
struct elements individually and inserting endian-conversion functions
in just the right places, but manually doing anything is error-prone,
especially when you want to change the underlying struct after it has
been in use for a while.  Also, the format of a header or packet is
much clearer when it is represented by a small block of descriptive
code instead of imperative read and write commands.
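
For comparison, here is roughly what the manual approach looks like
for a cut-down, hypothetical two-field header (MiniHeader and all the
helper names are invented for this sketch, and little-endian is chosen
arbitrarily as the on-disk order).  Note that the layout ends up
spelled out three separate times, and all three must be kept in sync
by hand.

```cpp
#include <stdint.h>

// Hypothetical header with two fields.
struct MiniHeader {
    uint32_t version;
    uint16_t flags;
};

// First copy of the layout: 4 + 2 bytes, no padding in the stream.
enum { MINI_HEADER_SIZE = 6 };

inline void putLE16(unsigned char* p, uint16_t v) {
    p[0] = (unsigned char)(v & 0xFF);
    p[1] = (unsigned char)(v >> 8);
}

inline void putLE32(unsigned char* p, uint32_t v) {
    for (int k = 0; k < 4; ++k) {
        p[k] = (unsigned char)((v >> (8 * k)) & 0xFF);
    }
}

inline uint16_t getLE16(const unsigned char* p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

inline uint32_t getLE32(const unsigned char* p) {
    uint32_t v = 0;
    for (int k = 0; k < 4; ++k) {
        v |= (uint32_t)p[k] << (8 * k);
    }
    return v;
}

// Second copy of the layout: field order and offsets for writing.
inline void writeMiniHeader(const MiniHeader& h, unsigned char* out) {
    putLE32(out + 0, h.version);
    putLE16(out + 4, h.flags);
}

// Third copy of the layout: the same order and offsets for reading.
inline void readMiniHeader(const unsigned char* in, MiniHeader& h) {
    h.version = getLE32(in + 0);
    h.flags   = getLE16(in + 4);
}
```

Adding a field means touching the struct, the size constant, the
writer, and the reader, which is exactly the kind of duplication a
single description is meant to remove.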

Ultimately, you want one place in the code where the byte-level layout
of the struct's storage is represented.  You can accomplish this with
clever use of the C++ preprocessor.  I will briefly describe the
mental path I took in developing this mechanism, starting with what
the end result looks like:

    #define BODY(_)                     \
        _(array)   (char,   magic, 4)   \
        _(field_le)(int,    i)          \
        _(field)   (char,   c)          \
        _(pad)     (3)                  \
        _(field_be)(double, d)          \
        _(field_be)(float,  f)

    DEFINE_SMART_STRUCT(SmartStruct, BODY)
    #undef BODY

DEFINE_SMART_STRUCT is a macro that takes two arguments: the name of
the smart struct and another macro describing its contents.  The smart
struct uses the description of its contents embodied in BODY (which
can be named something else) in four (or more) places: struct
definition, size calculation, bytestream serialization, and
deserialization.  Each uses the contents in a different way, so the
struct fields cannot be defined using standard C declarations.  They
must be defined with replaceable macros.
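
The idea of one description expanded several ways can be shown in
isolation.  The sketch below is a cut-down illustration, not the full
implementation: it supports a single field kind and only two of the
uses (member definition and size calculation).  The MINI_ and POINT_
names are hypothetical and chosen only to avoid clashing with the real
macros.

```cpp
#include <stddef.h>

// What a "field" means for each use of the body.
#define MINI_DEFINE_field(type, name) type name;
#define MINI_SIZE_field(type, name)   total += sizeof(type);

// Dispatchers: paste the field kind onto a per-use prefix.
#define MINI_DEFINE(kind) MINI_DEFINE_##kind
#define MINI_SIZE(kind)   MINI_SIZE_##kind

// The single description of the struct's contents.
#define POINT_BODY(_)     \
    _(field)(char,  tag)  \
    _(field)(float, x)    \
    _(field)(float, y)

struct Point {
    POINT_BODY(MINI_DEFINE)    // expands to: char tag; float x; float y;

    static size_t size() {
        size_t total = 0;
        POINT_BODY(MINI_SIZE)  // expands to: total += sizeof(char); ...
        return total;
    }
};

#undef POINT_BODY
```

Point::size() adds up the member sizes only, so it can be smaller
than sizeof(Point) whenever the compiler pads the in-memory struct.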

Originally, I designed the smart struct to pass every structure
definition macro directly into the body macro as a separate
parameter.  As I added more functionality, the number of parameters
grew and grew.  This implementation didn't scale at all with new
functionality, and I hadn't yet considered backwards compatibility!

I realized I could pass a single "macro object" (above, named as a
single underscore) into BODY and use that object to gain access to
other macros by name.  At the expense of some readability, this
achieves backwards compatibility: new kinds of fields can be added
without changing the signature of any existing body macro.
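
The difference between the two designs might look like this.  Both
halves are illustrative reconstructions with hypothetical names
(DEF_*, OLD_BODY, NEW_BODY), covering only member definition; the
first half is a guess at the shape of the parameter-per-macro design,
not the original code.

```cpp
// Per-kind macros for one use only: declaring members.
#define DEF_field(type, name)       type name;
#define DEF_array(type, name, size) type name[size];

// Parameter-per-macro style: every field kind is its own parameter,
// so adding a kind changes the signature of every body macro and
// every place that invokes one.
#define OLD_BODY(field, array) \
    array(char, magic, 4)      \
    field(int,  version)

struct OldStyle {
    OLD_BODY(DEF_field, DEF_array)
};

// Macro-object style: one parameter, and kinds are looked up by name
// through token pasting, so a new kind only needs a new DEF_<kind>.
#define DEF(kind) DEF_##kind

#define NEW_BODY(_)            \
    _(array)(char, magic, 4)   \
    _(field)(int,  version)

struct NewStyle {
    NEW_BODY(DEF)
};

#undef OLD_BODY
#undef NEW_BODY
```

Both structs end up with the same members.  With NEW_BODY, the
expansion of _(field)(int, version) goes through DEF(field), which
becomes DEF_field, which is then applied to (int, version).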

Through some magic under the hood -- the full implementation is
provided below -- DEFINE_SMART_STRUCT takes a single description of a
struct and provides a smart struct implementation capable of
calculating its packed size, writing itself, or reading itself.

A note on performance: The macro-generated code for the struct is just
as efficient as if you had written your own methods to do element-wise
reads and writes.  Depending on the application, you should be able to
improve the macro-generated code to be faster for your purposes.  The
size() static method optimizes out (in Visual C++, at least) to a
single integer constant, which is just as fast as calculating it
yourself.
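
One possible direction for such an improvement, sketched under the
assumption that the packed size is known at compile time: collect the
fields in a fixed-size buffer and issue a single fwrite at the end.
PackBuffer is a hypothetical helper, not part of the implementation
below, and it copies bytes in native order; any endian swap would be
applied to a copy of the value before calling put().

```cpp
#include <stdio.h>
#include <string.h>

// Fixed-capacity byte buffer; no heap allocation is involved.
template<size_t Capacity>
struct PackBuffer {
    unsigned char bytes[Capacity];
    size_t used;

    PackBuffer() : used(0) {}

    // Appends the raw bytes of t.  Returns false, leaving the buffer
    // unchanged, if t does not fit in the remaining space.
    template<typename T>
    bool put(const T& t) {
        if (sizeof(t) > Capacity - used) {
            return false;
        }
        memcpy(bytes + used, &t, sizeof(t));
        used += sizeof(t);
        return true;
    }

    // Writes everything collected so far with one I/O call.
    bool flush(FILE* file) const {
        return fwrite(bytes, 1, used, file) == used;
    }
};
```

A generated write method could then call put() once per field and
flush() once per struct, instead of calling fwrite for every field.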


Full (untested) implementation:


/******************************************************************************
*******************************************************************************
******************************************************************************/

/* This code is not intended for direct use, but for explanatory
   purposes.  Once you understand the basic concepts, integrating its
   ideas into your own code should be straightforward.  Making the
   code below faster is an exercise for the reader.  (Here's a hint,
   though.  Reduce the number of I/O calls per write() and read() to
   O(1) instead of O(n) without using any allocations.) */

#include <algorithm>
#include <stdio.h>
using std::swap;


template<typename T>
void swapEndian(T& t) {
    unsigned char* c = reinterpret_cast<unsigned char*>(&t);
    for (size_t s = 0; s < sizeof(t) / 2; ++s) {
        swap(c[s], c[sizeof(t) - s - 1]);
    }
}


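/* I chose to use FILE* here for explanation only.  Using iostreams or
   your own I/O routines would be just as appropriate.  (The reason I
   used FILE* in the first place is that stdio.h stuff looks a LOT
   nicer than iostreams in the generated assembly.)

   The endian-aware functions below assume that BIG_ENDIAN is defined
   only when compiling for a big-endian target.  Some system headers
   define BIG_ENDIAN as a constant on every target, so check what
   your platform does or substitute a macro of your own. */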


template<typename T>
bool write(FILE* file, const T& t) {
    return fwrite(&t, 1, sizeof(t), file) == sizeof(t);
}

template<typename T>
bool write_le(FILE* file, const T& t) {
#ifdef BIG_ENDIAN
    T s = t;
    swapEndian(s);
    return write(file, s);
#else
    return write(file, t);
#endif
}

template<typename T>
bool write_be(FILE* file, const T& t) {
#ifdef BIG_ENDIAN
    return write(file, t);
#else
    T s = t;
    swapEndian(s);
    return write(file, s);
#endif
}

template<int size>
bool write_zero(FILE* file) {
    static char zero[size]; // uninitialized static = zero
    return fwrite(zero, 1, size, file) == size;
}


template<typename T>
bool read(FILE* file, T& t) {
    return fread(&t, 1, sizeof(t), file) == sizeof(t);
}

template<typename T>
bool read_le(FILE* file, T& t) {
#ifdef BIG_ENDIAN
    bool result = read(file, t);
    swapEndian(t);
    return result;
#else
    return read(file, t);
#endif
}

template<typename T>
bool read_be(FILE* file, T& t) {
#ifdef BIG_ENDIAN
    return read(file, t);
#else
    bool result = read(file, t);
    swapEndian(t);
    return result;
#endif
}

template<int size>
bool skip(FILE* file) {
    static char dummy[size];
    return fread(dummy, 1, size, file) == size;
}


#define DEFINE_field(type, name)          type name;
#define DEFINE_field_le(type, name)       DEFINE_field(type, name)
#define DEFINE_field_be(type, name)       DEFINE_field(type, name)
#define DEFINE_array(type, name, size)    type name[size];
#define DEFINE_array_le(type, name, size) DEFINE_array(type, name, size)
#define DEFINE_array_be(type, name, size) DEFINE_array(type, name, size)
#define DEFINE_pad(size)

#define SIZE_field(type, name)          total += sizeof(type);
#define SIZE_field_le(type, name)       SIZE_field(type, name)
#define SIZE_field_be(type, name)       SIZE_field(type, name)
#define SIZE_array(type, name, size)    total += sizeof(type) * size;
#define SIZE_array_le(type, name, size) SIZE_array(type, name, size)
#define SIZE_array_be(type, name, size) SIZE_array(type, name, size)
#define SIZE_pad(size)                  total += size;

#define FOR_IDX(size) for (size_t _i = 0; _i < size; ++_i)

#define WRITE_field(type, name)          if (!write(file, name))    return false;
#define WRITE_field_le(type, name)       if (!write_le(file, name)) return false;
#define WRITE_field_be(type, name)       if (!write_be(file, name)) return false;
#define WRITE_array(type, name, size)    FOR_IDX(size) { WRITE_field   (type, name[_i]); }
#define WRITE_array_le(type, name, size) FOR_IDX(size) { WRITE_field_le(type, name[_i]); }
#define WRITE_array_be(type, name, size) FOR_IDX(size) { WRITE_field_be(type, name[_i]); }
#define WRITE_pad(size)                  if (!write_zero<size>(file)) return false;

#define READ_field(type, name)          if (!read(file, name))    return false;
#define READ_field_le(type, name)       if (!read_le(file, name)) return false;
#define READ_field_be(type, name)       if (!read_be(file, name)) return false;
#define READ_array(type, name, size)    FOR_IDX(size) { READ_field   (type, name[_i]); }
#define READ_array_le(type, name, size) FOR_IDX(size) { READ_field_le(type, name[_i]); }
#define READ_array_be(type, name, size) FOR_IDX(size) { READ_field_be(type, name[_i]); }
#define READ_pad(size)                  if (!skip<size>(file)) return false;


#define DEFINE(x) DEFINE_##x
#define SIZE(x)   SIZE_##x
#define WRITE(x)  WRITE_##x
#define READ(x)   READ_##x


#define DEFINE_SMART_STRUCT(name, body)                \
    struct name {                                      \
        body(DEFINE)                                   \
                                                       \
        static size_t size() {                         \
            size_t total = 0;                          \
            body(SIZE)                                 \
            return total;                              \
        }                                              \
                                                       \
        bool writeImpl(FILE* file) const {             \
            body(WRITE)                                \
            return true;                               \
        }                                              \
                                                       \
        bool readImpl(FILE* file) {                    \
            body(READ)                                 \
            return true;                               \
        }                                              \
    };                                                 \
                                                       \
    bool write(FILE* file, const name& n) {            \
        return n.writeImpl(file);                      \
    }                                                  \
                                                       \
    bool read(FILE* file, name& n) {                   \
        return n.readImpl(file);                       \
    }


/* This is what you edit to make your own structs.  The name BODY is
   arbitrary; you can choose any name you like.  All that's required
   is that you define a macro that takes one argument, uses it as
   shown below, and that you pass that macro to DEFINE_SMART_STRUCT,
   along with the name of the resulting struct.
*/

#define BODY(_)                     \
    _(array)   (char,   magic, 4)   \
    _(field_le)(int,    i)          \
    _(field)   (char,   c)          \
    _(pad)     (3)                  \
    _(field_be)(double, d)          \
    _(field_be)(float,  f)

DEFINE_SMART_STRUCT(SmartStruct, BODY)
#undef BODY


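int main() {
    // the packed size of the struct can be calculated...
    // (%u expects an unsigned int, so the size_t result is cast)
    printf("size: %u\n", (unsigned)SmartStruct::size());

    // and the struct itself can be written and read
    SmartStruct ss = SmartStruct();  // zero-initialize every field
    if (!write(stdout, ss)) return 1;
    if (!read(stdin, ss))   return 1;
    return 0;
}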


/******************************************************************************
*******************************************************************************
******************************************************************************/