Compilers are (too) smart

Last week I noticed some unexpected assembly code in a function calling std::unordered_map::find (yes, I know, friends don’t let friends use STL, but that is a topic for another discussion). I decided to investigate a bit and it turned out to be quite a journey. It was a Clang compiler, ARM-based platform, I will use x64 here as behavior is similar and it is easier to repro in the Compiler Explorer.

Streaky locks

I have been comparing how locks (critical sections) behave on different platforms recently and noticed Windows seemed to be quite different than others. The behavior is consistent across two PC machines I tried (desktop and a laptop), test code is a bit too long to include it in the post, but it is linked here. The gist of it is really simple though - we have multiple threads (even 2 is enough) fighting for access to a shared resource, guarded by a critical section.

Know your assembly (part N+1)

Recently we have updated some of our compilers (Clang) and started running into mysterious problems, mostly in the physics code. We were able to narrow it down to raycasts failing to hit… sometimes. What made it a bit more annoying - it was only happening on 1 platform I had no easy access too. It was ARM based, but another ARM platform was fine. It seemed like the problem was deep in the guts of PhysX, code is publicly available, so I can link it here (rayAABBIntersect2).

Not all dot products are equal

I had another interesting debugging session recently with quite an unexpected conclusion. It all started when we received a new crash report - a certain platform (and 1 platform only) was crashing in a fairly old and seemingly innocent fragment of code: someModule->GetElements(elements); std::sort(elements.begin(), elements.end(), [&origin](const Element* a, const Element* b) { return a->pos.DistSqr(origin) < b->pos.DistSqr(origin); }); Crash was fairly rare although quite consistent in a particular location in the game, release build only, crashing on access violation inside the predicate.

operator[] considered harmful

I’ve always felt conflicted about the subscription operator[] in standard containers. It makes sense for random access structures like vectors, but it gets a little bit problematic elsewhere. Consider this seemingly innocent piece of code (it’s not 1:1, but this is code you can easily find in many codebases): if(someMap[key].value < x) { someMap[key].value = x; } 2 lines of actual code, but more than 1 problem, this snippet is potentially incorrect and inefficient.

Know your assembly (part 5)

The other day I was looking at a crash dump for a friend. A discussion that followed made me realize it might be worth to write a short post explaining why sometimes 2 seemingly almost identical function calls behave very differently. Consider the following code snippet (simplified): 1struct Foo 2{ 3 __declspec(noinline) const int& GetX() const { return x; } 4 virtual const int& GetY() const { return y; } 5 6 int x, y; 7}; 8 9int Cat(int); 10int Lol(Foo* f) 11{ 12 const int& x = f->GetX(); 13 const int& y = f->GetY(); // *** 14 15 return Cat(x+y); 16}

Simple multithreading tricks

Today I’d like to share a simple multithreading trick you can use to minimize “bubbles” in your pipeline. “Simple” because it applies mostly to “oldschool” threading systems, ie. the ones with main thread concept that kicks jobs and eventually flushes them. Cool kids using proper task graphs where everything just flows beautifully should not need it. Imagine we have a simple scenario, our thread produces some work, continues with whatever it’s doing, eventually waits for the job to finish (ideally this overlaps the work from previous point, so not much to do here) and processes results.

Zig pathtracer

If you’ve been reading this blog for a bit, you might know that when I experiment with new programming languages, I like to write a simple pathtracer, to get a better “feel”. It’s very much based on smallpt (99-line pathtracer in C++), but I do not want to make it as short as possible. I am more interested in how easily I can get it to run and what language ‘features’ I can use.

A debugger barrier

I’ve recently been asked by a friend for a little help with debugging a problem he was running into. Occasionally the program would freeze while trying to process a chunk of data and never moved on to the next one. Application is heavily threaded and processing is done by thread B, while thread A does its own job and periodically checks if work has been finished. If so, it sends it for further transformations and queues more work for thread B.

Grafana for dummies

Grafana is a very popular “analytics platform” or in more professional terms - a system to create pretty graphs. It’s very popular for monitoring system metrics, but really can be used for any timeseries data. It supports plethora of data sources and there is a decent chance you can use one of the off-the-shelf solutions to do 99% of the work for you (for example for some basic system metrics, especially on Linux).

Microcorruption writeup

Microcorruption is my ongoing “distraction” – it’s an online CTF. I’m way late to the party and have been doing it on and off since… 2013. How does it work exactly? To use their own description: “tl;dr: Given a debugger and a device, find an input that unlocks it. Solve the level with that input. You’ve been given access to a device that controls a lock. Your job: defeat the lock by exploiting bugs in the device’s code.

API granularity

I’d be the first to admit I don’t have much experience designing public APIs. I typically work on code that’s fairly specific to the game we’re making and while some of it is expected to be reused, our potential user pool is very limited, we’re still talking just one team, so <40 people and definitely not thousands. I’m still successfully using some little utilities/helpers I designed/coded 10+ years ago, but every now and then I run into a situation where decisions taken back then catch up with me and force to rewrite the API.

Ranged based "for" story

Consider the following, seemingly innocent (and completely made up) fragment of code: typedef std::map<int, std::string> MyMap; void foo(const MyMap& m) { for(const std::pair<int, std::string>& i : m) { if(i.first) { printf("%s\n", i.second.c_str()); } } } Looks simple enough, right? Just iterate over all elements of the map and print the value for non-zero keys. We use range-based for construct, use references, so do not expect any copies. Let’s just quickly make sure it all works as expected and consult Compiler Explorer.

Two Stage Push & Pop

I probably mentioned this before, but SPSC (single producer, single consumer) queue is one of my favorite structures. If implemented correctly, it’s actually 100% lock-free and is also surprisingly versatile (not all problems require MPMC!). I typically use a slightly modified version of this or an unbounded version, based on code by Dmitry Vyukov. Both implementations are very simple and possibly lack some of the modern C++ bells and whistles, like emplace, but these should not be hard to add.

Circular buffers (to the rescue)

Circular buffers are one (if not the) of my favorite data structures for some quick&dirty debugging. A simple, not production ready version can be implemented in a few lines of code (not ashamed to admit, I usually just copy paste these and remove when I’m done) and they’re a great tool to “record” a history of generic events. Any time I run into a seemingly random/unpredictable issue that might take a long time to repro, they’re on my short list.

Vanishing warning

Yet another MSVC story. Visual Studio has a nice compile-time warning when trying to access a static array with invalid index - C4789. According to documentation it’s mostly meant for various ‘copy’ functions (memcpy/strcpy etc), but it seems to work on ‘simple’ accesses as well. Consider (here’s a Godbolt link): struct Tab { float tab[2]; }; void Foo(const Tab&); void Bar(float forward) { Tab tab;[0] = forward;[2] = forward; // OOB access!

Compilers are smart

I recently transitioned to Visual Studio 2017 and while it went relatively painless, the new and improved optimizer uncovered some subtle issues lurking in the code (to be fair, Clang/GCC has been behaving same way for a long time now). The code in question was actually quite ancient and originated from this Devmaster forum post (gone now, but found if using The Wayback Machine): Fast and accurate sine/cosine. To be more precise, it was this version with ‘fast wrapping’:


Ever since I was a kid I was always fascinated by the northern lights and hoped to see them in person one day. Poland is too far south/densely populated to spot them, though, so it wasn’t a very realistic dream. In 2009 I did move to Sweden, but still, this was Stockholm area, so my chances might have been bigger, but not by much (plus only been there for a year).

Neural networks and the stock market

Machine learning and neural networks seem to be all the rage recently. I was sifting through my old drives the other day and found my master thesis. It’s actually vaguely related, subject was Neural Networks for Stock Market Forecasting. Complete thesis can be downloaded here, but comes with 2 major caveats: it’s 100% in Polish (so actual title is Sztuczne sieci neuronowe w prognozowaniu kursów giełdowych) it’s over 15 years old Sadly, I could not find an accompanying MATLAB program I wrote.

Who ate my stack

The Old New Thing is one of my favorite blogs. It’s a collection of Windows development anecdotes, but every now and then Raymond will post a gnarly debugging/crash story. I’ve recently found some of my old notes related to a crash I was chasing in a third-party library, it reminded me a little bit of The Old New Thing and I decided to try something similar. The whole thing took place a few years ago, we were using some super early versions of a certain vendor library (no sources).