Playing with bits

⁹/Feb 2010 By admin

Little challenge posted by a workmate. Good for those 5 minute breaks at work when you need to force your brain to think about something else than your current task. Recode the following piece of code so that it doesn’t use branches/multiplies:

unsigned char* oldStart = 0;
if (m_start + size  < m_bottom)
{
    oldStart = m_start;
    m_start += size;
}
return oldStart;

Don’t read below if you want to try it yourself.

My approach (not too portable, assumes 32-bit pointers):

const size_t d = (m_start + size) - m_bottom;
const unsigned int sign = d >> 31;
const unsigned int mask = 0xFFFFFFFF - (sign - 1);
oldStart = (unsigned char*)((size_t)m_start & mask);
m_start += size & mask;

I’m quite sure mask could be calculated in a smarter way. My first version used lookup tables (2 elements, indexed with sign), but this one should perform a little bit better (version with LUT had some implicit multiplications when accessing tables).

Old comments

M 2010-02-14 12:44:58

unsigned int mask = (m_start + size >= m_bottom)-1;

Arseny Kapoulkine 2010-02-10 18:21:20

It’s easy to make it 64-bit proof - just use intptr_t / uintptr_t instead of int/unsigned int and sizeof(intptr_t) * 8 - 1 instead of 31 (well, this assumes a 8-bit byte… :)

ed 2010-02-10 20:11:58

I took a bit different approach.
unsigned char* oldStart = m_start;
m_start += size;
uint32 mask_sign = (m_start - m_bottom) >> 31;
oldStart &= mask_sign;
size = (mask_sign ^ size) & size;
m_start -= size;
Mask is not reversed. OldStart equals m_start at the beginning, so instead of setting oldStart to m_start using a mask, we will clear to zero when condition is true. Then we just mangle a bit with mask and size, and we clear size when cond = true.
And if I’m not wrong, then this will require one arithmetic less.

ed 2010-02-10 20:13:19

errata:
Mask is not reversed. OldStart equals m_start at the beginning, so instead of setting oldStart to m_start using a mask, we will clear to zero when condition is “”“NOT”“” true. Then we just mangle a bit with mask and size, and we clear size when cond = true.
And if Iâ??m not wrong, then this will require one arithmetic less.

sopyer 2010-02-09 21:11:45

Why not use conditional assignment? It is not branching instruction:
bool cond = m_start+size < m_bottom;
unsigned char* oldStart = cond?m_start:0;
m_start+=cond?size:0;
return oldStart;

Arseny Kapoulkine 2010-02-09 21:52:09

The better way to generate mask is to take advantage of sign bit extension when right shifting (arithmetic shift):
const int d = (m_start + size) - m_bottom;
const unsigned int mask = d >> 31;

js 2010-02-09 22:01:07

My answer would be:
unsigned int oldStart = min(m_start+size, m_bottom) % m_bottom;
Of course, you could argue that min is not branchless but there exist a branchless version of min
#define branchless_min(a,b) b + ((a-b) & (a-b)>>31)
unsigned int oldStart = branchless_min((m_start+size), m_bottom) % m_bottom;
Additionally, one would point that it will not work if m_bottom equals zero then I would do this:
#define branchless_max(a,b) a - ((a-b) & (a-b)>>31)
#define branchless_min(a,b) b + ((a-b) & (a-b)>>31)
unsigned int oldStart = branchless_min((m_start+size), m_bottom) % branchless_max(m_bottom,1);
ps: I agree that #defines are ugly, you can replace them with templates

Joshua 2010-02-09 23:21:59

sopyer:
Actually, in many compilers (particularly targeting x86 and variants), using conditional assignment actually generates a branch instruction because short of a conditional move (which is only available on newer processors), branching is the fastest way to do it. MIPS and PowerPC both have instructions to do this directly without a branch (slt for Set on Less Than).
Also, the conditional operator (?:) is no better than an if in terms of generating branching instructions. It’s just syntactic sugar.

sopyer 2010-02-10 07:37:46

@Joshua
As far as I know x86 assembler has command cmov__ which is available from Pentium Pro processor family. But I am actually unaware about its support by compilers.

admin 2010-02-10 08:25:34

IIRC latest GCC is smart enough to use cmov__ whenever it can, I don’t think MSVC will do it, though. It’s not a “real” piece of code we’re trying to optimize here, BTW, just a little brain exercise.
@Arseny: good call, I had a feeling my mask calculation looks a little bit too complicated.

Sander van Rossen 2010-02-10 09:49:28

enum IntegerTraits
{
INTEGER_BITS = (sizeof(int) * 8),
INTEGER_MAX = (1u <> (INTEGER_BITS - 1);
const unsigned int mask = ((unsigned int)INTEGER_MAX) - (sign - 1);
oldStart = (unsigned char*)((size_t)m_start & mask);
m_start += size & mask;
Not tested, but shouldn’t this make it portable?

Sander van Rossen 2010-02-10 09:51:53

Hmm.. not sure what happened to the above post, but it seems to have swallowed some text..
The essence of it all is the:
INTEGER_BITS = (sizeof(int) * 8),
INTEGER_MAX = (1u << (INTEGER_BITS - 1))
enum part

admin 2010-02-10 10:33:46

Sander: almost portable now :) Will probably break with 64-bit pointers.

.m.m.s.

Maciej Sinilo