Is your multi-threaded application running like a dog on Solaris?
Mine was, and after weeks of head scratching I was reduced to suggesting that the client run it on Linux instead, which was a mite embarrassing, to say the least.
Fast forward a year or so and I discover the wonderful libumem or, because we have to support Solaris 8, the almost-as-wonderful mtmalloc (incidentally, Oracle need to up their SEO game; it took me ages to dig up that link). Both are memory allocators optimised for multithreaded programs, and they speed things up a lot! Brilliantly, all you need to do is set LD_PRELOAD and you're off; you don't even need to recompile anything.
Except that I am not working on that application any more. Now I am trying to make something else go faster. And this something else is old, and sits on the legacy comms layer.
Way back in the dim and distant past, somebody who isn't here any more wrote a buffered interface for the comms layer. Now, if you or I were implementing a data buffer we'd probably put it in a unit-tested class, and be all agile about it, and probably use templates and a design pattern and stuff. We'd definitely keep at least two counters: one to track the size of the buffer, and one to track the size of the data in it.
Things weren't like that back then. Classes were just considered uppity structures, unit testing was viewed as a dubious eccentricity, and templates were for Microsoft Word and Microsoft Word only. Our long-forgotten developer didn't need two counters. He just made it so the buffer was always the size of the data in it, and whenever that changed he realloc'd it.
No messin'!
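To make the difference concrete, here is a rough sketch of the two styles in C. It is not the original comms code: the struct and function names are made up for illustration, the `reallocs` counter exists only to show how often each style calls the allocator, and the starting capacity of 64 bytes with doubling is just one common growth policy.

```c
#include <stdlib.h>
#include <string.h>

/* Legacy style (hypothetical reconstruction): the allocation is always
   exactly as large as the data, so every append calls realloc. */
struct exact_buf {
    char  *data;
    size_t len;       /* size of the data, and also of the allocation */
    size_t reallocs;  /* instrumentation for this example only */
};

int exact_append(struct exact_buf *b, const void *src, size_t n)
{
    if (n == 0)
        return 0;
    char *p = realloc(b->data, b->len + n);  /* realloc(NULL, n) acts as malloc */
    if (p == NULL)
        return -1;                           /* old buffer is still valid */
    b->reallocs++;
    memcpy(p + b->len, src, n);
    b->data = p;
    b->len += n;
    return 0;
}

/* Two-counter style: cap tracks the allocation, len tracks the data.
   The allocator is only called when the data outgrows the capacity. */
struct grow_buf {
    char  *data;
    size_t len;
    size_t cap;
    size_t reallocs;  /* instrumentation for this example only */
};

int grow_append(struct grow_buf *b, const void *src, size_t n)
{
    if (n == 0)
        return 0;
    if (b->len + n > b->cap) {
        size_t cap = b->cap ? b->cap : 64;   /* assumed starting capacity */
        while (cap < b->len + n)
            cap *= 2;                        /* assumed policy: double until it fits */
        char *p = realloc(b->data, cap);
        if (p == NULL)
            return -1;
        b->reallocs++;
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}
```

Appending 1000 single bytes to a zero-initialised buffer costs 1000 realloc calls in the first style and 5 in the second. With a realloc that can often grow a block in place that difference is tolerable; with one that always copies, it is not.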
The writers of mtmalloc and libumem were not like that. They were like you and me, and they did not worry about realloc, because everyone these days uses C++y things like new and delete and there is no C++y way to do realloc. So when they implemented realloc they simply did a malloc, a memcpy, and a free, which means every call copies the whole buffer, even when it only grows by a byte.
Easy!
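In C, that kind of realloc looks roughly like the sketch below. This is not the actual mtmalloc or libumem source, and `naive_realloc` is a made-up name. A real allocator would look up the old block's size in its own bookkeeping; here the caller passes it in so the example stands on its own.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: realloc built from malloc + memcpy + free.
   old_size is passed explicitly; a real allocator would read it
   from the block's header instead. */
void *naive_realloc(void *old, size_t old_size, size_t new_size)
{
    if (new_size == 0) {
        free(old);
        return NULL;
    }

    void *fresh = malloc(new_size);
    if (fresh == NULL)
        return NULL;                 /* old block is left untouched */

    if (old != NULL) {
        /* Copy whichever is smaller: the old contents or the new size. */
        size_t keep = old_size < new_size ? old_size : new_size;
        memcpy(fresh, old, keep);    /* always a full copy, never in place */
        free(old);
    }
    return fresh;
}
```

Each call costs a copy proportional to the size of the buffer, so a buffer that is realloc'd on every change pays that price every single time.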
Except you know what happens next. I set my LD_PRELOAD and everything runs swimmingly for a while, until it hits a certain data rate where, suddenly, the comms buffering is being exercised and the application slows back down to a crawl, even worse than it was before. So I have to open up the labyrinthine, antediluvian comms code and take out all those reallocs.
Which was no fun.