Hidden danger of the BSF instruction

Had a very interesting debugging session recently. I’ve been investigating some of external crash reports, all I had was a crash dump related to a fairly innocent-looking piece of code. Here’s a minimal snippet demonstrating the problem:

struct Tree
{
    void* items[32];
};

#pragma intrinsic(_BitScanForward)
__declspec(noinline) void* Grab(Tree* t, unsigned int n)
{
    unsigned int j = 0;
    _BitScanForward((unsigned long *)&j, n);
    return t->items[j];
}

Easy enough, right? Seemingly, nothing can go wrong here. We find first set bit and use to index our table. ‘n’ is 32-bit, so our index is guaranteed to be in the 0-31 range, so it’s all good. Well, why is it crashing in extreme rare cases then? Let’s take a look at generated assembly code (x64), maybe it’ll help decipher this madness:

and    DWORD PTR j$[rsp], 0
bsf    eax, edx
; 667  :     return t->items[j];
mov    rax, QWORD PTR [rcx+rax*8]
ret    0

Three measly instructions, nothing extraordinary here, right? Don’t click “read the rest” if you want to give it a try yourself.

I probably should have gone with another title as it’s a hint here, so if you think the problem lies in the BSF instruction, you’re right. Let’s think… What happens if there are no bits set at all? Interestingly, Intel & AMD act differently (yay, more fun). See Wikipedia note on x86-64, specifically this fragment (emphasis mine):

“Intel 64’s BSF and BSR instructions act differently than AMD64’s when the source is zero and the operand size is 32 bits. The processor sets the zero flag and leaves the upper 32 bits of the destination undefined.

Uh, oh, what?? Noticed how we use 64-bits of RAX as an index? Now, try running this code:

__declspec(noinline) __int64 Get64()
{
    return __int64(0xF00000000) | 0x1;
}
[..]
__int64 x = Get64();
Tree t;
void* item = Grab(&t, 0);

Generated assembly:

// Get64
; 672  :     return __int64(0xF00000000) | 0x1;
mov    rax, 64424509441            ; 0000000f00000001H
ret    0

[..]

call    ?Get64@?A0x333a7d86@@YA_JXZ        ; `anonymous namespace'::Get64
mov    r8, rax
; 678  :     Tree t;
; 679  :     void* mem = Grab(&t, 0);
xor    edx, edx
lea    rcx, QWORD PTR t$[rsp]
call    ?Grab@?A0x333a7d86@@YAPEAXPEAUTree@1@I@Z ; `anonymous namespace'::Grab

Basically, if we get inside Grab function with EAX containing anything greater than 31 (and n == 0), we’ll get an access violation crash (or read random memory if we’re “lucky”, that’s why having something in upper 32bits is more likely to crash immediately)… In my tests (Ivy Bridge laptop) it doesn’t seem like EAX is modified at all in such case. Technically, it’s a programmer’s mistake (as 0 is an edge case and should be handled accordingly), so hard to put all blame on CPUs/compiler here, but it’s a very subtle one, caused me some head scratching (especially that it was buried much deeper than in my example). [All caveats described here apply to the BSR as well]

More Reading
Newer// Rust pathtracer