EA STL released, updated benchmarks

²²/Oct 2010 By admin

Big weekend news were EA releasing some of their codebase source, including parts of EASTL. Given that the original EASTL document has been the main inspiration for my RDE STL experiment, I was very interested in finally seeing the code itself. First impression - it’s still big. It is leaner than STL, but it is a big library. That’s quite understandable if you remember it’s not a local, one studio thing, it’s used by the whole organization. Thus, it includes features like optional exception support, that in other situation could be easily ignored. I was especially curious to test how their hash map implementation performed when compared to Google’s or RDESTL.

hash_map tests were done using Google’s sparsehash (v1.9), ran them on my old laptop (Core Duo 2.2GHz, first value) and desktop machine (8 core, i7@2.8GHz, second value). All benchmarks were run for three different types of objects - sizes of 4 bytes (5000000 iterations), 16 bytes (1250000 iterations) and 256 bytes (156250 iterations). Results presented in tables below (omitted sparse_hash_map (slowest by far) and std::map. See my previous post for test description):

4 byte objects	dense_hash_map	rde::hash_map	stdext::hash_map	eastl::hash_map
map_grow	1273.8/365.9 ns	699.0/205.9 ns	2095.8/535.7 ns	1815.9/402.3 ns
map_predict/grow	535.6/169.4 ns	461.5/151.3 ns	2209.3/544.2 ns	1833.8/421.3 ns
map_replace	351.9/128. ns	369.9/116.9 ns	540.2/167.0 ns	395.6/121.9 ns
map_replace2	377.0/130.4 ns	322.3/94.2 ns	513.5/158.3 ns	354.1/118.8 ns
map_fetch	370.9/125.8 ns	309.3/98.4 ns	514.3/155.0 ns	412.9/103.7 ns
map_fetch_empty	101.1/28.2 ns	35.5/8.4 ns	224.6/54.9 ns	191.3/42.8 ns
map_remove	495.2/158.8 ns	370.4/121.4 ns	2131.5/577.2 ns	587.9/163.2 ns
map_toggle	1050.9/238.9 ns	479.5/121.6 ns	3470.2/844.6 ns	1193.9/264.5 ns

16 byte objects	dense_hash_map	rde::hash_map	stdext::hash_map	eastl::hash_map
map_grow	2760.6/685.5 ns	1235.3/282.7 ns	1964.5/1016.3 ns	2057.8/522.9 ns
map_predict/grow	934.5/258.5 ns	715.4/210.7 ns	1802.9/1026.8 ns	2044.4/528.7 ns
map_replace	693.0/199.8 ns	569.9/172.0 ns	774.8/178.5 ns	651.9/214.8 ns
map_replace2	826.6/204.4 ns	549.5/168.9 ns	783.1/176.4 ns	628.4/204.2 ns
map_fetch	712.9/203.6 ns	537.5/159.0 ns	765.9/177.1 ns	627.4/200.2 ns
map_fetch_empty	152.3/40.5 ns	87.7/23.7 ns	169.5/91.6 ns	369.7/78.8 ns
map_remove	931.8/252.9 ns	588.1/164.0 ns	1708.3/1481.7 ns	823.1/221.4 ns
map_toggle	2220.7/509.0 ns	903.3/215.5 ns	1414.6/896.6 ns	1785.7/371.3 ns

256 byte objects	dense_hash_map	rde::hash_map	stdext::hash_map	eastl::hash_map
map_grow	16839.8/3724.5 ns	5786.5/1360.1 ns	12130.4/3012.7 ns	7629.7/1831.8 ns
map_predict/grow	4675.8/1174.3 ns	3845.0/934.9 ns	12281.7/2969.9 ns	7411.1/1852.1 ns
map_replace	3893.9/999.0 ns	3334.1/800.1 ns	3307.8/789.5 ns	3239.0/763.0 ns
map_replace2	4039.3/1018.4 ns	3298.5/807.4 ns	3222.9/815.0 ns	3234.3/838.3 ns
map_fetch	3941.6/978.8 ns	3186.0/820.3 ns	3173.7/797.6 ns	3188.8/819.7 ns
map_fetch_empty	233.6/66.3 ns	158.3/46.1 ns	3057.8/749.3 ns	2954.8/720.3 ns
map_remove	4457.8/1116.7 ns	3301.5/785.4 ns	9395.7/2347.9 ns	3408.8/928.7 ns
map_toggle	9773.2/2334.4 ns	6438.6/1525.4 ns	13127.5/2931.8 ns	9738.0/2437.8 ns

Note: couldn’t find reserve/resize method for EASTL, that’s why map_grow is the same as map_predict/grow. RDE tries to be as cache friendly as possible and uses open addressing, so it’s the most effective with small objects. When iterating to next object means a cache miss anyway, most of its advantages is lost, as is clearly visible for 256 byte items. Actually, for the last group, std::map was the fastest, beating all hash_maps easily. Below you can find comparative charts (results from the 8-core machine), last one includes std::map.

I ran also my own test, which included adding names of all files from c:\Windows\System32 to a (hash)map, performing some look-ups and then removing them. In general EASTL was quite close to RDESTL here (excluding construction), both quite clearly faster than standard implementation. std::map, rde::map & eastl::map were included in the test as well, and while they were slower than hash_maps, EASTL was clearly the fastest from those three (with RDE ‘fetch’ being the slowest, slower than MSVC implementation even, will have to take a look there). Typical results (in ms, laptop):

	Grow	Find	Remove	Total
RDE hashmap	191	133	105	429
EASTL hashmap	182	155	108	445
STD hashmap	222	185	240	647
EASTL map	253	584	278	1115
RDE map	252	709	338	1299
STD map	228	601	482	1311

I tried some less interesting containers, like vector as well. EASTL implementation is really nice, it’s a little bit faster than RDE in most important areas (push_back), while slightly slower when removing objects. There are some neat ideas there, like storing capacity same way as begin/end (ie. as vector), so that we don’t need to ‘convert’ between pointer/index when testing if vector should grow. I plan to incorporate some of those into RDE soon (it could use some overhaul/clean-up anyway).

In general, switching to EASTL should be a no brainer for any companies that are still using STL, if only for better memory management. If you have your own implementation, you probably want to run similar benchmarks yourself first, but I’d say it’s worth at least giving a try (not the whole package maybe, as I mentioned - it may be a little too big for everyone’s taste) . It’s clearly visible that it was written by someone, who actually cared about performance/memory management.

Appendix: all charts in this note were generated using Perl. I wrote a little script that parsed program output and generated images using built-in module. It was the first time I wrote something in Perl. Impressions? Boy, is it powerful when it comes to working with strings (shocker, I know). Whole parsing code is maybe 15 lines and I’m sure could be half of it, were it written by someone who actually knew what he was doing. Syntax is ugly, yes’ It’s good to be concise, but here it was pushed a little bit too far. I think I still prefer it to Tcl. Thing I absolutely hated was that statements need to end with a semicolon. For some reason when I code in a scripting language I immediately switch to sloppy, non semicolon mode. What’s more annoying - error messages are completely bogus in such case, I think I’ve seen at least 10 different ones. After a while I just learnt that if message didn’t make any sense - it’s a semicolon missing somewhere.

Old comments

Paul 2010-10-22 08:53:33

Which compiler and OS did you run theses tests with?
Have you also compared to tr1:unordered_map?

Ofek 2010-10-22 09:59:42

Maybe add STLPort to the benchmark?

Sebastian 2010-10-22 12:38:56

Thanks for posting these benchmarks. Like the previous poster I think it would be great if you could include a test case for unordered_map. I’m mostly interested in boost’s implementation.
http://www.boost.org/doc/libs/1_44_0/doc/html/unordered.html

snk_kid 2010-10-22 13:12:53

You should try do this with with (pre-)C++0x enabled library implementations that support features such as move semantics and again later once C++0x offically out and has better support with other compilers.
VC++2010 supports move semantics and in their STL implementation. tr1::unordered_map is std::unordered_map in C++0x, there should not be any significant difference between std::unordered_map and stdext::hash_map except for move semantics and slight interface differences.

admin 2010-10-22 23:23:57

Code was compiled with MSVC 2008.
Tests run under Windows Vista 32-bit (laptop) and Windows 7 64-bit (desktop).
I’ll try to add boost/STLPort over the weekend. Basically, it’s unmodified Google’s benchmark from their sparsehash library, so if anyone’s in hurry, it should be simple enough to give it a whirl.

snk_kid 2010-10-23 13:43:44

Ideally you should do this for a number of compilers, there will be differences. You should definately do this with VC++2010 because you should see a sigifnicant differences because of rvalue references and move semantics and a better optimizing compiler. Get the express edition it’s free.

admin 2010-10-23 14:03:23

I’ll give 2010 a try some day. Reason why I’m not particularly interested in testing move semantics right now is that it’s not widely support by console (X360/PS3) compilers anyway.

Steve 2010-10-25 06:53:29

http://github.com/paulhodge/EASTL/

xp 2010-10-25 18:32:58

for those wondering how unordered_map performs, check out this benchmark: http://attractivechaos.wordpress.com/2008/10/07/another-look-at-my-old-benchmark/
it’s a little dated but I think most results are still good today.

EASTL publicada como open source « martin b.r. 2010-11-12 10:57:27

[…] autor, Paul Pedriana, ha conseguido publicar una versi??n (m??s o menos el 60%) como open source. Aqui teneis un extenso analisis de sus […]

Adrian 2010-11-18 11:32:30

Would be interesting to see a comparison vs PS3/360.
I’d bet that they’ve paid more attention to keeping a smaller cache happy and the differences between best and worst are exaggerated further.
Adrian

admin 2010-11-19 03:19:44

X360 benchmarks can be found here: http://msinilo.pl/blog/?p=675 .

dk1rein 2010-11-22 10:35:11

Would be interesting to see a comparison vs PS3/360.

Hubert 2011-12-08 14:46:54

I did a comparison of std STL, RDE, ea STL on PC/win7, x360 and ps3: http://imageshack.us/f/545/wyniky.png/
Compiler was VS2010, all optimizations on
Can you comment on the results? Especially about vector.erase on x360
If you want I can share benchmark’s code

admin 2011-12-08 15:37:58

Thanks for the test, looks interesting. What was the stored element type? I’d have to take a look at erase() myself, I might be doing something stupid there, I mostly use unordered version, so didn’t focus on it too much (it’s still surprising it’s so slow only on X360). If stored element is POD, specializing is_pod might help.

Hubert 2011-12-09 16:35:12

Thanks for answer :] All tests for vector, list and deque were made on uint32_t.
Today I tested it also on android 2.3 NEON (Xperia Play), and again - vector.erase in rde lasts 13843m, compared to ~8200 in stl and ea. I’m gonna test also how it behaves for bigger structs, and come back here.

admin 2012-09-05 03:12:00

Doh, forgot about this one. Seems like there was a bug that would cause unsigned values to be treated as types requiring copy constructor and nullify some of optimizations. It has been fixed, feel free to give a new version a try.

.m.m.s.

Maciej Sinilo