vector constructor iterator

As promised, today a little bit about the mysterious ‘vector constructor iterator’. Essentially, it’s a helper function generated by MSVC to initialize arrays of UDTs (user-defined types). Unfortunately, the compiler sometimes gets confused and spits out superfluous code. Consider the following example, taken almost as-is from a certain engine (names hidden/changed to protect the innocent):

struct Vector4
{
	float X, Y, Z, W;
	RDE_FORCEINLINE Vector4() {};
        ...implementation
};
struct Matrix
{
	RDE_FORCEINLINE Matrix() {}
	Vector4	v[4];
};

As you can see, the default constructors deliberately leave the member variables uninitialized for maximum performance. Now, here’s a declaration from a function operating on an array of matrices:

Matrix tabMatrix[128];

What code would you expect to be generated here? None, right? After all, there’s nothing to do. Well, it turns out the compiler has a different opinion. MSVC will actually generate the following:

00BC5329  push        7Fh
00BC532B  lea         esi,[esp+224h]
00BC5332  pop         edi
00BC5333  push        offset Vector4::Vector4 (0BC1CE3h)
00BC5338  push        4
00BC533A  push        10h
00BC533C  mov         eax,esi
00BC533E  call        `vector constructor iterator' (0BC1171h)
...
`vector constructor iterator':
00BC1171  push        esi
00BC1172  mov         esi,eax
00BC1174  jmp         `vector constructor iterator'+0Fh (0BC1180h)
00BC1176  mov         ecx,esi
00BC1178  call        dword ptr [esp+10h]                 <--- Vector4 constructor
00BC117C  add         esi,dword ptr [esp+8]
00BC1180  dec         dword ptr [esp+0Ch]               <---- Number of rows in matrix
00BC1184  jns         `vector constructor iterator'+5 (0BC1176h)
00BC1186  pop         esi
00BC1187  ret         0Ch
...
//Vector4() {};
00BC1CE3  mov         eax,ecx
00BC1CE5  ret
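
Read as C++, the listing above amounts to roughly the following. This is only a sketch reconstructed from the disassembly: the names vectorConstructorIterator, constructVector4 and constructMatrixArray, as well as the call counter, are made up for illustration, and the outer loop is written out by hand because its tail is elided in the listing.

```cpp
#include <cstddef>
#include <new>

// Stand-in for the Vector4 from the post: user-provided, empty constructor.
struct Vector4 {
    float X, Y, Z, W;
    Vector4() {}
};

// Illustration only: counts constructor calls so the cost is visible.
static int g_ctorCalls = 0;

// Thunk with the shape the helper expects: receives `this`, runs the constructor.
static void constructVector4(void* self) {
    ++g_ctorCalls;
    new (self) Vector4;  // default-initialize in place; members stay uninitialized
}

// Rough C++ rendering of `vector constructor iterator':
// pointer in eax, then element size, element count and constructor on the stack.
static void vectorConstructorIterator(void* first, std::size_t elemSize,
                                      int count, void (*ctor)(void*)) {
    unsigned char* p = static_cast<unsigned char*>(first);
    while (--count >= 0) {  // dec dword ptr [esp+0Ch] / jns
        ctor(p);            // call dword ptr [esp+10h]
        p += elemSize;      // add esi, dword ptr [esp+8]
    }
}

// Models `Matrix tabMatrix[128];`: one helper call per matrix, 4 rows each.
// `storage` must be suitably aligned and hold matrixCount * 4 Vector4 objects.
int constructMatrixArray(unsigned char* storage, int matrixCount) {
    g_ctorCalls = 0;
    const std::size_t matrixSize = 4 * sizeof(Vector4);
    for (int i = 0; i < matrixCount; ++i) {
        vectorConstructorIterator(storage + i * matrixSize,
                                  sizeof(Vector4), 4, constructVector4);
    }
    return g_ctorCalls;
}
```

For 128 matrices, constructMatrixArray reports 512 calls to a constructor that does nothing.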

As you can see, we have two loops here. First, we iterate over all the matrices, calling ‘vector constructor iterator’ for each of them. Then, inside it, we iterate over the 4 rows of each matrix, calling the empty Vector4::Vector4 (which does nothing… Well, it copies this to eax, but that doesn’t really accomplish anything here). In general, much ado about nothing, as none of this code is needed. The easiest way to get rid of it is to remove the default constructors altogether and rely on those generated by the compiler. Sadly, that’s not very practical: you most likely have some other constructors, so you have to provide a default one as well. Here’s a less elegant solution that gets the job done:

	__declspec(align(16)) unsigned char matrixMem[128 * sizeof(Matrix)];
	Matrix* tabMatrix = (Matrix*)matrixMem;

As expected, the code above results in 0 assembly instructions. This particular behavior seems to be triggered by two levels of constructors (i.e. Matrix, then Vector4). If the Matrix class contained 16 floats or 4 vector registers instead, the generated code would be much better, too.
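
On a compiler with C++11 support (newer than the VS 2008 discussed here), there may be a middle ground: a default constructor declared with = default stays trivial even when other constructors exist. The sketch below assumes C++11; the four-float constructor and the matrixIsTriviallyConstructible helper are hypothetical additions, and whether a particular MSVC version then drops the iterator call still has to be checked in the disassembly.

```cpp
#include <type_traits>

struct Vector4 {
    float X, Y, Z, W;
    Vector4() = default;  // compiler-generated, trivial: leaves members uninitialized
    // Hypothetical extra constructor, the usual reason a default one must be declared.
    Vector4(float x, float y, float z, float w) : X(x), Y(y), Z(z), W(w) {}
};

struct Matrix {
    Matrix() = default;   // trivial as well, because Vector4's default constructor is
    Vector4 v[4];
};

// Fails to compile if someone later turns a constructor back into `Matrix() {}`.
static_assert(std::is_trivially_default_constructible<Matrix>::value,
              "Matrix should have a trivial default constructor");

// Small helper so the property can also be queried at run time.
bool matrixIsTriviallyConstructible() {
    return std::is_trivially_default_constructible<Matrix>::value;
}
```

The static_assert makes the intent explicit: an array of such matrices needs no constructor calls as far as the language is concerned.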

Old comments

Pierre 2011-02-22 08:50:57

The vector constructor iterators have been driving me mad for years. In the past (~VC6) I thought they only appeared when exceptions were enabled. But I recently spotted a few of them with exceptions disabled, on Xbox. It was exactly in a case similar to what you describe here. A colleague reported there was an open compiler issue about it at MS, but it’s unlikely they’re going to do anything about it…

Stephen 2011-02-22 12:47:13

We’ve had to remove all constructors from lowlevel structures. There are more subtle problems with spurious stores to the stack. Kill them all, it’s the only way to be sure.

David Sveningsson 2011-02-22 16:42:21

I was a bit surprised reading this, so I had to test it with gcc and clang. I noticed that at -O0 both gcc and clang exhibit this behavior but at any higher optimization level it is optimized away. Doesn’t MSVC optimize this even with optimization turned on?

admin 2011-02-23 01:07:57

@David: it doesn’t seem to be related to optimization options, rather to enabling/disabling exception support, as noticed by Pierre.

Darren 2011-02-23 12:51:53

Change your __forceinline to just a regular inline and see what happens. I can get the same thing to happen here in a minimal test app if I use __forceinline on the constructors, but not if I use just inline.

gpakosz 2011-02-23 20:24:32

Your post rings a bell, reminds me about this answer I gave on stackoverflow.com http://bit.ly/fH7IEA
I didn’t launch the compiler yet to verify on your sample but I think it’s all about trivial constructors and/or destructors.
Your Vector4 and Matrix classes have empty constructors. Albeit doing nothing, they are not trivial constructors and I believe the compiler fails to diagnose they actually don’t throw. As a consequence, it inserts stack unwinding related code.
I would be glad if someone could confirm or deny my investigations. Microsoft compiler team, are you reading this? :)
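
The language-level part of this claim can be checked with type traits on a C++11 compiler. The sketch below uses made-up type names and only shows what the standard says about such constructors; it does not show what MSVC's code generator actually reasons from, so the hypothesis above remains unconfirmed.

```cpp
#include <type_traits>

// Empty but user-provided: neither trivial nor known to be non-throwing.
struct EmptyBody {
    float x;
    EmptyBody() {}
};

// Same constructor, explicitly promised not to throw. Still not trivial.
struct EmptyBodyNoexcept {
    float x;
    EmptyBodyNoexcept() noexcept {}
};

// Compiler-generated: trivial and non-throwing.
struct Defaulted {
    float x;
    Defaulted() = default;
};

constexpr bool kEmptyBodyTrivial =
    std::is_trivially_default_constructible<EmptyBody>::value;
constexpr bool kEmptyBodyNothrow =
    std::is_nothrow_default_constructible<EmptyBody>::value;
constexpr bool kNoexceptTrivial =
    std::is_trivially_default_constructible<EmptyBodyNoexcept>::value;
constexpr bool kNoexceptNothrow =
    std::is_nothrow_default_constructible<EmptyBodyNoexcept>::value;
constexpr bool kDefaultedTrivial =
    std::is_trivially_default_constructible<Defaulted>::value;
constexpr bool kDefaultedNothrow =
    std::is_nothrow_default_constructible<Defaulted>::value;

static_assert(!kEmptyBodyTrivial && !kEmptyBodyNothrow,
              "an empty user-provided constructor is non-trivial and potentially throwing");
static_assert(!kNoexceptTrivial && kNoexceptNothrow,
              "noexcept fixes the exception question, not triviality");
static_assert(kDefaultedTrivial && kDefaultedNothrow,
              "a defaulted constructor is trivial and non-throwing here");
```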

admin 2011-02-25 05:18:45

@Darren - forceinline/inline doesn’t seem to make a difference for me (VS 2008).

gpakosz 2011-02-25 16:16:56

With VS 2008, I’m experiencing the same as @Darren. In release mode, when I remove __forceinline, there is no code generated by MSVC++.
However, when adding __forceinline back, I can see `vector constructor iterator' again.
Lesson of the day: just don’t mess with inlining? :)
Finally, the bug report mentioned by Pierre is likely to be: http://connect.microsoft.com/VisualStudio/feedback/details/576348/c-eh-vector-constructor-iterator-called-unnecessarily

Darren 2011-02-28 16:51:05

Interesting that forceinline doesn’t change things for you - I’m definitely seeing a difference here, in VS2008. I have the service pack installed (amongst other stuff) - I don’t know if that’ll make a difference.
For reference, here’s my test code - a single .cpp, make a new console project in VS and add this as a source file.
--8<--
#include <stdio.h>
#include
#if 0
#define INLINE inline
#else
#define INLINE __forceinline
#endif
struct Vector4
{
    INLINE Vector4() {}
    float x, y, z, w;
};
struct Matrix
{
    INLINE Matrix() {}
    Vector4 m_rows[4];
};
const unsigned int uNumMatrices = 128;
int main( int argc, char* argv[] )
{
    Matrix m[uNumMatrices];
    // Fill every row so the array is actually used
    for ( unsigned int j=0; j != uNumMatrices; ++j )
    {
        for ( unsigned int i=0; i != 4; ++i )
        {
            m[j].m_rows[i].x = 1.0f;
            m[j].m_rows[i].y = 2.0f;
            m[j].m_rows[i].z = 3.0f;
            m[j].m_rows[i].w = 4.0f;
        }
    }
    // Print the values back so the stores cannot be optimized away
    for ( unsigned int j=0; j != uNumMatrices; ++j )
    {
        for ( unsigned int i=0; i != 4; ++i )
        {
            printf( "Row %u = [ %f, %f, %f, %f ]\n", i, m[j].m_rows[i].x, m[j].m_rows[i].y, m[j].m_rows[i].z, m[j].m_rows[i].w );
        }
    }
    return 0;
}
--8<--
Changing the "#if 0" to "#if 1" to toggle between "inline" and "__forceinline" should, hopefully, show the differences when you step through the disassembly.
Of course, it could be this is one issue with the compiler and you're having a completely different problem :-).

admin 2011-03-04 03:00:06

No difference… Weird, I know. Maybe I’m missing some SP (I do have SP1, IIRC).
