EA STL released, updated benchmarks

Big weekend news were EA releasing some of their codebase source, including parts of EASTL. Given that the original EASTL document has been the main inspiration for my RDE STL experiment, I was very interested in finally seeing the code itself. First impression - it’s still big. It is leaner than STL, but it is a big library. That’s quite understandable if you remember it’s not a local, one studio thing, it’s used by the whole organization. Thus, it includes features like optional exception support, that in other situation could be easily ignored. I was especially curious to test how their hash map implementation performed when compared to Google’s or RDESTL.

hash_map tests were done using Google’s sparsehash (v1.9), ran them on my old laptop (Core Duo 2.2GHz, first value) and desktop machine (8 core, i7@2.8GHz, second value). All benchmarks were run for three different types of objects - sizes of 4 bytes (5000000 iterations), 16 bytes (1250000 iterations) and 256 bytes (156250 iterations). Results presented in tables below (omitted sparse_hash_map (slowest by far) and std::map. See my previous post for test description):

4 byte objects dense_hash_map rde::hash_map stdext::hash_map eastl::hash_map
map_grow 1273.8/365.9 ns 699.0/205.9 ns 2095.8/535.7 ns 1815.9/402.3 ns
map_predict/grow 535.6/169.4 ns 461.5/151.3 ns 2209.3/544.2 ns 1833.8/421.3 ns
map_replace 351.9/128. ns 369.9/116.9 ns 540.2/167.0 ns 395.6/121.9 ns
map_replace2 377.0/130.4 ns 322.3/94.2 ns 513.5/158.3 ns 354.1/118.8 ns
map_fetch 370.9/125.8 ns 309.3/98.4 ns 514.3/155.0 ns 412.9/103.7 ns
map_fetch_empty 101.1/28.2 ns 35.5/8.4 ns 224.6/54.9 ns 191.3/42.8 ns
map_remove 495.2/158.8 ns 370.4/121.4 ns 2131.5/577.2 ns 587.9/163.2 ns
map_toggle 1050.9/238.9 ns 479.5/121.6 ns 3470.2/844.6 ns 1193.9/264.5 ns

16 byte objects dense_hash_map rde::hash_map stdext::hash_map eastl::hash_map
map_grow 2760.6/685.5 ns 1235.3/282.7 ns 1964.5/1016.3 ns 2057.8/522.9 ns
map_predict/grow 934.5/258.5 ns 715.4/210.7 ns 1802.9/1026.8 ns 2044.4/528.7 ns
map_replace 693.0/199.8 ns 569.9/172.0 ns 774.8/178.5 ns 651.9/214.8 ns
map_replace2 826.6/204.4 ns 549.5/168.9 ns 783.1/176.4 ns 628.4/204.2 ns
map_fetch 712.9/203.6 ns 537.5/159.0 ns 765.9/177.1 ns 627.4/200.2 ns
map_fetch_empty 152.3/40.5 ns 87.7/23.7 ns 169.5/91.6 ns 369.7/78.8 ns
map_remove 931.8/252.9 ns 588.1/164.0 ns 1708.3/1481.7 ns 823.1/221.4 ns
map_toggle 2220.7/509.0 ns 903.3/215.5 ns 1414.6/896.6 ns 1785.7/371.3 ns

256 byte objects dense_hash_map rde::hash_map stdext::hash_map eastl::hash_map
map_grow 16839.8/3724.5 ns 5786.5/1360.1 ns 12130.4/3012.7 ns 7629.7/1831.8 ns
map_predict/grow 4675.8/1174.3 ns 3845.0/934.9 ns 12281.7/2969.9 ns 7411.1/1852.1 ns
map_replace 3893.9/999.0 ns 3334.1/800.1 ns 3307.8/789.5 ns 3239.0/763.0 ns
map_replace2 4039.3/1018.4 ns 3298.5/807.4 ns 3222.9/815.0 ns 3234.3/838.3 ns
map_fetch 3941.6/978.8 ns 3186.0/820.3 ns 3173.7/797.6 ns 3188.8/819.7 ns
map_fetch_empty 233.6/66.3 ns 158.3/46.1 ns 3057.8/749.3 ns 2954.8/720.3 ns
map_remove 4457.8/1116.7 ns 3301.5/785.4 ns 9395.7/2347.9 ns 3408.8/928.7 ns
map_toggle 9773.2/2334.4 ns 6438.6/1525.4 ns 13127.5/2931.8 ns 9738.0/2437.8 ns
Note: couldn’t find reserve/resize method for EASTL, that’s why map_grow is the same as map_predict/grow. RDE tries to be as cache friendly as possible and uses open addressing, so it’s the most effective with small objects. When iterating to next object means a cache miss anyway, most of its advantages is lost, as is clearly visible for 256 byte items. Actually, for the last group, std::map was the fastest, beating all hash_maps easily. Below you can find comparative charts (results from the 8-core machine), last one includes std::map.

graph4 graph16

graph256

I ran also my own test, which included adding names of all files from c:\Windows\System32 to a (hash)map, performing some look-ups and then removing them. In general EASTL was quite close to RDESTL here (excluding construction), both quite clearly faster than standard implementation. std::map, rde::map & eastl::map were included in the test as well, and while they were slower than hash_maps, EASTL was clearly the fastest from those three (with RDE ‘fetch’ being the slowest, slower than MSVC implementation even, will have to take a look there). Typical results (in ms, laptop):

Grow Find Remove Total
RDE hashmap 191 133 105 429
EASTL hashmap 182 155 108 445
STD hashmap 222 185 240 647
EASTL map 253 584 278 1115
RDE map 252 709 338 1299
STD map 228 601 482 1311
I tried some less interesting containers, like vector as well. EASTL implementation is really nice, it’s a little bit faster than RDE in most important areas (push_back), while slightly slower when removing objects. There are some neat ideas there, like storing capacity same way as begin/end (ie. as vector), so that we don’t need to ‘convert’ between pointer/index when testing if vector should grow. I plan to incorporate some of those into RDE soon (it could use some overhaul/clean-up anyway).

In general, switching to EASTL should be a no brainer for any companies that are still using STL, if only for better memory management. If you have your own implementation, you probably want to run similar benchmarks yourself first, but I’d say it’s worth at least giving a try (not the whole package maybe, as I mentioned - it may be a little too big for everyone’s taste) . It’s clearly visible that it was written by someone, who actually cared about performance/memory management.

Appendix: all charts in this note were generated using Perl. I wrote a little script that parsed program output and generated images using built-in module. It was the first time I wrote something in Perl. Impressions? Boy, is it powerful when it comes to working with strings (shocker, I know). Whole parsing code is maybe 15 lines and I’m sure could be half of it, were it written by someone who actually knew what he was doing. Syntax is ugly, yes’ It’s good to be concise, but here it was pushed a little bit too far. I think I still prefer it to Tcl. Thing I absolutely hated was that statements need to end with a semicolon. For some reason when I code in a scripting language I immediately switch to sloppy, non semicolon mode. What’s more annoying - error messages are completely bogus in such case, I think I’ve seen at least 10 different ones. After a while I just learnt that if message didn’t make any sense - it’s a semicolon missing somewhere.

Old comments

Paul 2010-10-22 08:53:33

Which compiler and OS did you run theses tests with?
Have you also compared to tr1:unordered_map?

Ofek 2010-10-22 09:59:42

Maybe add STLPort to the benchmark?

Sebastian 2010-10-22 12:38:56

Thanks for posting these benchmarks. Like the previous poster I think it would be great if you could include a test case for unordered_map. I’m mostly interested in boost’s implementation.
http://www.boost.org/doc/libs/1_44_0/doc/html/unordered.html

snk_kid 2010-10-22 13:12:53

You should try do this with with (pre-)C++0x enabled library implementations that support features such as move semantics and again later once C++0x offically out and has better support with other compilers.
VC++2010 supports move semantics and in their STL implementation. tr1::unordered_map is std::unordered_map in C++0x, there should not be any significant difference between std::unordered_map and stdext::hash_map except for move semantics and slight interface differences.

admin 2010-10-22 23:23:57

Code was compiled with MSVC 2008.
Tests run under Windows Vista 32-bit (laptop) and Windows 7 64-bit (desktop).
I’ll try to add boost/STLPort over the weekend. Basically, it’s unmodified Google’s benchmark from their sparsehash library, so if anyone’s in hurry, it should be simple enough to give it a whirl.

snk_kid 2010-10-23 13:43:44

Ideally you should do this for a number of compilers, there will be differences. You should definately do this with VC++2010 because you should see a sigifnicant differences because of rvalue references and move semantics and a better optimizing compiler. Get the express edition it’s free.

admin 2010-10-23 14:03:23

I’ll give 2010 a try some day. Reason why I’m not particularly interested in testing move semantics right now is that it’s not widely support by console (X360/PS3) compilers anyway.

Steve 2010-10-25 06:53:29

http://github.com/paulhodge/EASTL/

xp 2010-10-25 18:32:58

for those wondering how unordered_map performs, check out this benchmark: http://attractivechaos.wordpress.com/2008/10/07/another-look-at-my-old-benchmark/
it’s a little dated but I think most results are still good today.

EASTL publicada como open source « martin b.r. 2010-11-12 10:57:27

[…] autor, Paul Pedriana, ha conseguido publicar una versi??n (m??s o menos el 60%) como open source. Aqui teneis un extenso analisis de sus […]

Adrian 2010-11-18 11:32:30

Would be interesting to see a comparison vs PS3/360.
I’d bet that they’ve paid more attention to keeping a smaller cache happy and the differences between best and worst are exaggerated further.
Adrian

admin 2010-11-19 03:19:44

X360 benchmarks can be found here: http://msinilo.pl/blog/?p=675 .

dk1rein 2010-11-22 10:35:11

Would be interesting to see a comparison vs PS3/360.

Hubert 2011-12-08 14:46:54

I did a comparison of std STL, RDE, ea STL on PC/win7, x360 and ps3: http://imageshack.us/f/545/wyniky.png/
Compiler was VS2010, all optimizations on
Can you comment on the results? Especially about vector.erase on x360
If you want I can share benchmark’s code

admin 2011-12-08 15:37:58

Thanks for the test, looks interesting. What was the stored element type? I’d have to take a look at erase() myself, I might be doing something stupid there, I mostly use unordered version, so didn’t focus on it too much (it’s still surprising it’s so slow only on X360). If stored element is POD, specializing is_pod might help.

Hubert 2011-12-09 16:35:12

Thanks for answer :] All tests for vector, list and deque were made on uint32_t.
Today I tested it also on android 2.3 NEON (Xperia Play), and again - vector.erase in rde lasts 13843m, compared to ~8200 in stl and ea. I’m gonna test also how it behaves for bigger structs, and come back here.

admin 2012-09-05 03:12:00

Doh, forgot about this one. Seems like there was a bug that would cause unsigned values to be treated as types requiring copy constructor and nullify some of optimizations. It has been fixed, feel free to give a new version a try.