Compressing integers

²/Apr 2012 By admin

Programming games is an uphill battle. We always try to fight for more, but our resources are limited. There’s never enough RAM, the CPUs are never too powerful and obviously same thing applies to bandwidth - we could always use some more. The eternal problem of mutiplayer game is trying to send as much information as possible using as little memory as possible. One of the weapons in our arsenal is compression. I will focus on integers today as there seems to be enough information about floats/vectors out there.

Let’s start with easy cases first (I’ll link to Shawn Hargreaves blog, good descriptions there):

“enum” type integers with limited range of possible values - use bitfields. If you know you only have 4 types of grenades, no need to waste 8 bits for grenade type, you only need 2 bits, obviously
arithmetic encoding for even more effective bitpacking

That’s all nice and good, but things aren’t always that straightforward. Imagine 32-bit variable that can take the complete range of values. Typical example is health/damage. Some level 1 creature might have 50hp, but your endgame boss sits on 250k. Same thing with damage, there’s a difference between poking someone with a stick and dropping a nuke from the orbit. Our previous approaches won’t help us much here, because even though we know the range – it’s huge. We could give up and always send 32-bits, but it seems wasteful, after all in majority of cases we require much less memory (huge damage spikes are rare).

At first, co-worker directed me towards Google’s Protocol Buffers (or rather the encoding scheme they used). They were created with more complicated applications in mind, but it seems like they’d work for us quite nicely. You’ll want to read the linked document for details, but the general idea in a nutshell is - we employ base 128 varints. We only use 7 bits in a byte for our data and the remaining bit is used to signalize if there are more bytes to consider. Some examples:

1 - 1 byte, 0000 0001 (MSB not set),
300 - 2 bytes, 1010 1100 0000 0010 (MSB set in the first byte, so there’s more data, MSB not set in the second (last) one)

Sounds better, but is it the best we can do? That depends on the typical distribution of values for our transmitted variable. For my purposes (damage/health) it still wasn’t perfect. My data was heavily skewed towards zero. Most typically values were in 0-230 range. Problem with 128 base varints is that values greater than 127 will not fit in a single byte (as we only have 7 bits)…

I decided to go with simpler solution that fit my data better - every ‘varint’ data starts with 2 bits telling us how many bytes follow. Examples:

1 = 10 bits, 00 0000001
128 = 10 bits, 00 10000000
300 = 18 bits, 01 100101100

It costs us additional 2 bits in every case, so in the worst situation we’re sending 34 bits, but this should happen once a blue moon. For values from [0,127] range it’s a little bit worse than base 128 as it requires 10 bits instead of 8. However, for the [128-255] it still only needs 10 bits (not 16). There’s no clear winner here, it all depends on your dataset, for me solution #2 was performing better (sending 10 bits in the overhelming majority of cases).

Old comments

Jo??o Matos 2012-04-02 04:26:34

Thanks for sharing. I am using a scheme inspired by Protocol Buffers for serialization, but your approach could come handy if I never need to squeeze it further.

Alex Turner 2012-04-02 09:21:21

Hi,
I liked this post but it worries me. In what are you sending these bit sets? Using less that a word for data can cause all sorts of data alignment slowdowns. This can be as much an issue in tcp/ip stacks are it is with in-memory work. I would love to know a little bit more about the context of the compression so we could understand why it is required and what benefits it gives.

admin 2012-04-02 13:24:39

Alex - usually games send data as one binary blob of data (bit stream). Yes, data in a bit stream is misaligned, but it’s unpacked on the other end before accessing.

Krzysztof Narkowicz 2012-04-02 18:35:50

Hi, it sounds wasteful to always send 2 additional bits. You could send 1 additional bit for 1 byte ints and 2 for 2 byte ints. Just like in Protocol Buffers.

Krzysztof Narkowicz 2012-04-02 19:27:04

I mean it’s hard to imagine what kind of data could require more than 2^16 values.

admin 2012-04-02 19:34:51

Yes, if you know you’re fine with 16-bits, it obviously becomes easier. Highly game specific, but for health points 65536 is not enough sometimes. Quick example, Ragnaros (one of WoW Molten Core bosses) has ~1mln HP.

Ian Romanick 2012-04-06 17:06:20

You could also use some sort of logarithmic mapping. Your boss may have 250k hp, but you probably don’t need a slightly more powerful boss with 250k+2 hp. Just use some bits for the exponent and some bits for the mantissa.
If you expect the exponents to usually be small, use some sort of fixed Huffman-like encoding for them. You could even use a different table for each level in the game so that whatever the most common exponent range is for a given level will get the fewest bits.

ed 2012-04-08 07:55:19

Positions may require more than 2^16, especially when you’re using UDP and you must send full positions instead of deltas.
When using TCP you can send deltas for position/rotation/hp changes etc. And for deltas you can use variety of encodings like Shannon-Fano, Elias gamma coding (used in video encoders) and so on.

admin 2012-04-09 13:40:11

@Ian - good ideas there. One of the problems with health though, it tends to drop :) So usually you need to represent both 250k & 250k - 10 (or whatever the damage is). We can try sending damage itself and do the calculation on the other end, but it suffers from the same problem (ie. range might be huge).

Sergey Makeev 2012-04-11 18:33:36

Spend higher bit, or a fixed number of bits is not very memory efficient.
You can just store numbers 0..254 as usual and value is 255 is a flag there is more data in stream.

Sergey Makeev 2012-04-11 19:46:30

1 = 8 bits, 00000001
128 = 8 bits, 10000000
300 = 16 bits, 00101110 11111111 (46 + 254)

Sergey Makeev 2012-04-12 04:47:54

My method is thrash, sorry :) It’s dont work with huge numbers its all.

.m.m.s.

Maciej Sinilo