Know your assembly (part N)

An entertaining head-scratcher from today. Application has suddenly started crashing at launch, seemingly inside steam_api.dll. It’d be easy to blame external code, but as mentioned - it’s a new thing, so most likely related to our changes. To make things more interesting, it’s only 32-bit build that crashes, 64-bit seems fine. High-level code looks like (simplified):

 1struct tester
 2{
 3    virtual bool foo();
 4    virtual void bar()
 5    {
 6    }
 7    virtual void lol()
 8    {
 9        if(!foo())
10        {
11            printf("Failed\n");
12            return;
13        }
14        bar();
15    }
16};

Crash occurs when trying to call bar() [the original code was actually crashing ‘inside’ bar, which debugger claimed to be inside aforementioned DLL]:

 100199FB3 8B 06            mov         eax,dword ptr [esi]
 200199FB5 8B 10            mov         edx,dword ptr [eax]
 300199FB7 FF D2            call        edx
 400199FB9 84 C0            test        al,al
 500199FBB 75 10            jne         tester::lol+1Dh (199FCDh)
 6[...]
 700199FCD 8B 06            mov         eax,dword ptr [esi]
 800199FCF 8B 50 04         mov         edx,dword ptr [eax+4] ***
 900199FD2 8B CE            mov         ecx,esi
1000199FD4 5E               pop         esi
1100199FD5 FF E2            jmp         edx

Crash line marked with stars - Access violation reading location 0x00000018, EAX=0x14 at this point. That’d suggest something’s wrong with our vtable. How can this be, though, we have just called another virtual method (foo) and it’s been fine! As you might have guessed, it’s foo itself that’s wreaking havoc. It’s been modified recently and now contains the following code:

1static BOOL MyCallback(LPVOID, PENUM_PAGE_FILE_INFORMATION, LPCTSTR);
2virtual bool foo()
3{
4    EnumPageFiles(MyCallback, NULL);
5    return true;
6}

EnumPageFiles is a system function, so we’ll ignore it for now. Perhaps there’s something in the MyCallback that causes problems. Let’s remove everything and try running with empty callback function. Still crashes. Remove call to EnumPageFiles altogether - works fine…

Did some more poking and discovered that the vtable itself is actually OK and never modified by our code (data breakpoint at EAX+4 before calling foo). It’s the value of ESI that changes! See the 2 places where we move [ESI] to EAX? ESI differs between these 2 points. It’s like foo doesn’t restore it properly! Let’s keep digging… Deep inside the EnumPageFiles there is a push/pop esi pair, so it should be fine, right? The problem is, ESP doesn’t match. When trying to pop, it’s 12 bytes less than it should be, so we’re popping a completely unrelated value (in the original code it happened to point to the middle of some function in Steam DLL). You can probably guess where this is going. Stack mismatches like that are usually as sign of calling convention issues. Quick inspection of the code that executes our callback confirms it’s the problem:

175C08D83 50               push        eax
275C08D84 8D 45 E0         lea         eax,[ebp-20h]
375C08D87 50               push        eax
475C08D88 FF 75 0C         push        dword ptr [ebp+0Ch]
575C08D8B FF 55 08         call        dword ptr [ebp+8]
6...
7// Callback body (return true):
8003C9F90 B8 01 00 00 00   mov         eax,1
9003C9F95 C3               ret

As you can see, caller pushes 3 arguments to the stack (12 bytes!), but there’s no code that pops them (we expect callee to clean-up). Consulting the documentation confirms our findings, callback function is supposed to follow the stdcall convention. We’re not done yet, adding

static BOOL CALLBACK MyCallback(LPVOID, PENUM_PAGE_FILE_INFORMATION, LPCTSTR)

doesn’t seem to cut it, compiler complains: _EnumPageFilesW’ : cannot convert parameter 1 from ‘BOOL (__stdcall *)(LPVOID,PENUM_PAGE_FILE_INFORMATION,LPCTSTR)’ to ‘PENUM_PAGE_FILECALLBACKW’

As it turns out, the PENUM_PAGE_FILE_CALLBACKW type doesn’t actually include __stdcall… Let’s try to force it nonetheless:

1PENUM_PAGE_FILE_CALLBACKW cb = (PENUM_PAGE_FILE_CALLBACKW)MyCallback;
2EnumPageFiles(cb, NULL);
3...
4// Callback code:
500D99F90 B8 01 00 00 00   mov         eax,1
600D99F95 C2 0C 00         ret         0Ch

As you can see, MyCallback now cleans everything up properly and - as expected - code runs without crashing. Not sure where does the discrepancy between system headers and documentation come from.

Now, you might wonder - why did the x64 build worked fine? Well, we’ve been lucky. Callback function has only 3 arguments and x64 calling convention will pass them all in registers (r8, rdx, rcx) so that stack stays untouched. If it had just 1 more – we’d run into trouble. (Correction: as pointed by Ofek, this would still be fine, there’s 1 calling convention on x64 anyway). Interestingly enough, debug version worked “fine” as well (or at least was hiding the problem more effectively, as it was using more registers so one of the top functions was saving/restoring ESI too).

Old comments

Ofek 2015-06-03 08:35:41

I had very similar issues in the past.
I’d have guessed calling conventions are non-issue on x64 regardless of the number of arguments at the callback. Aren’t they part of an ABI? Aren’t calling convention declarations ignored by an x64 compiler?

admin 2015-06-03 12:29:37

Doh, that’s right, of course.

Arseny Kapoulkine 2015-06-03 14:29:51

Ugh. Looks like a bug in psapi.h?
This is probably worth reporting via MSConnect. Maybe they will fix it…

More Reading
Older// C++ 11 final