Null references

One of the most common questions new C++ programmers ask is how pointers and references differ and which one to use. One difference people often cite is that “references can never be NULL”. That’s true according to the standard, but in practice, especially when mixing pointers and references, nothing prevents you from doing this:

1  void Bar(Foo& f)
2  {
3      f.x = 5;
4  }
5  ...
6  Foo* pf = NULL;
7  Bar(*pf);

Technically, this code doesn’t conform to the standard: dereferencing a NULL pointer (line 7) is undefined behaviour. In practice, I have yet to see a compiler that gives a damn. The program will typically crash later, when trying to write to x (line 3).
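If a pointer has to be turned into a reference, the NULL check has to happen while it is still a pointer. A minimal sketch of that idea, assuming a stand-in definition of Foo and a hypothetical helper TryBar (neither comes from the original code):

```cpp
#include <cassert>

// Stand-in for the Foo used above; the real type is assumed to have an int x.
struct Foo { int x = 0; };

void Bar(Foo& f)
{
    f.x = 5;
}

// Hypothetical helper: form the reference only after the pointer
// has been checked, so *pf is never evaluated for a null pointer.
bool TryBar(Foo* pf)
{
    if (pf == nullptr)
        return false;
    Bar(*pf);
    return true;
}

// Usage sketch: the null case is rejected before any reference exists.
void ExampleTryBar()
{
    Foo foo;
    assert(TryBar(&foo) && foo.x == 5);
    assert(!TryBar(nullptr));
}
```

Once Bar has been entered, it is too late to test anything: the undefined behaviour has already happened at the call site.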

What’s interesting, however, is that some compilers (I know of GCC/SNC, but there might be more) take advantage of the fact that standard-conforming code cannot contain null references.

Consider the following snippet:

struct Lol { int x;};
struct Cat { int y;};
struct Lolcat : public Lol, public Cat {};
Cat* GetCatPtr(Lolcat* x) { return x; }
Cat& GetCatRef(Lolcat& x) { return x; }

With multiple inheritance, converting to one of the base classes means adding its “offset” (sizeof(Lol) for Cat) to the object address. There’s one gotcha, though: a NULL pointer must not be shifted, because converting 0 should obviously not yield 4 (or whatever the offset is). Let’s see what MSVC does:

Cat *GetCatPtr(Lolcat *x) {return x;}
007A3340 8B 44 24 04          mov         eax,dword ptr [esp+4]  
007A3344 85 C0                test        eax,eax  
007A3346 74 04                je          GetCatRef+0Ch (7A334Ch)  
007A3348 83 C0 04             add         eax,4  
007A334B C3                   ret  
007A334C 33 C0                xor         eax,eax  
007A334E C3                   ret

It treats both cases exactly the same. In fact, it generates only one function and calls it in both situations (which is why the jump target in the listing above is labelled GetCatRef+0Ch).

The code spat out by GCC is more interesting:

000000000000000c <._Z9GetCatPtrP6Lolcat>:
       c:	2c 03 00 00 	cmpwi   r3,0
      10:	38 80 00 00 	li      r4,0
      14:	41 82 00 0c 	beq     20 <._Z9GetCatPtrP6Lolcat+0x14>
      18:	30 83 00 04 	addic   r4,r3,4
      1c:	78 84 00 20 	clrldi  r4,r4,32		# 20
      20:	60 83 00 00 	ori     r3,r4,0
      24:	4e 80 00 20 	blr

0000000000000028 <._Z9GetCatRefR6Lolcat>:
      28:	30 63 00 04 	addic   r3,r3,4
      2c:	78 63 00 20 	clrldi  r3,r3,32		# 20
      30:	4e 80 00 20 	blr

As you can see, there is no NULL check in the GetCatRef function, as the compiler assumes the reference is always valid. It’s doubtful you’ll ever see the impact of this optimization, especially in games, where it’s relatively rare to use MI, not to mention casting to base thousands of times a frame. Still, it’s a fun bit of trivia and one example where there’s an actual difference between pointers and references in C++.
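For illustration, the two conversions can be written out by hand. This is only a rough sketch of what the generated code does, not what any compiler literally emits; CatOffset, ManualCatPtr and ManualCatRef are hypothetical names, and the offset is measured on a real object rather than assumed to be 4:

```cpp
#include <cassert>
#include <cstddef>

struct Lol { int x; };
struct Cat { int y; };
struct Lolcat : public Lol, public Cat {};

// Byte offset of the Cat base inside a Lolcat, measured on a live object.
std::ptrdiff_t CatOffset()
{
    Lolcat probe{};
    Lolcat* whole = &probe;
    Cat* base = whole;  // compiler-generated adjustment
    return reinterpret_cast<char*>(base) - reinterpret_cast<char*>(whole);
}

// Pointer version: NULL must stay NULL, so the shift is guarded.
Cat* ManualCatPtr(Lolcat* x)
{
    if (x == nullptr)
        return nullptr;
    return reinterpret_cast<Cat*>(reinterpret_cast<char*>(x) + CatOffset());
}

// Reference version: a valid reference is never null, so no guard is needed.
Cat& ManualCatRef(Lolcat& x)
{
    return *reinterpret_cast<Cat*>(reinterpret_cast<char*>(&x) + CatOffset());
}

// Usage sketch: both hand-written versions agree with the built-in conversions.
void ExampleAdjust()
{
    Lolcat lc{};
    assert(ManualCatPtr(&lc) == static_cast<Cat*>(&lc));
    assert(ManualCatPtr(nullptr) == nullptr);
    assert(&ManualCatRef(lc) == &static_cast<Cat&>(lc));
}
```

The only difference between the two functions is the guard, which is exactly the compare-and-branch missing from the GetCatRef listing.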

Old comments

Trillian 2012-03-25 23:02:25

Nice to know, I never thought about that.
Out of curiosity, I can see that MSVC disassembly is x86, but what is the architecture of the GCC one? I’ve never seen those opcodes.

wiewior 2012-03-26 00:36:26

@Trillian
That’s PowerPC, hint was GCC/SNC in text (SNC being the console compiler.)

peterchen 2012-04-01 19:14:00

Never rely on undefined behavior. You get away with it only for so long.
See e.g. here: http://blog.regehr.org/archives/213
