Friday 21 September 2012

One Definition to Rule them all

Generally I am a little bit suspicious of people who can quote the C++ standard; knowing the exact definitions and all the obscure intricacies should only be necessary for compiler writers. The rest of us should get by with a broad understanding of how things work, and the compiler should catch us when we stray.

In fact, and as an aside, I would actually consider too much knowledge of operator precedence to be positively harmful. People who know things can't help but use that knowledge, and they end up leaving out "unnecessary" brackets in compound expressions because it's obvious to them that they are unnecessary. Some idiot who hasn't memorised the rules (like me) then inherits the code, makes an incorrect assumption about what it's trying to do, and screws it up.

Bytes are cheap, just use brackets.
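A classic example of the sort of trap I mean (values invented for illustration):

#include <cstdio>

int main()
{
    unsigned flags = 0x4;
    const unsigned MASK = 0x4;

    // Intended: "is the masked bit clear?" But == binds tighter than &,
    // so this parses as flags & (MASK == 0), i.e. flags & 0: always false.
    if (flags & MASK == 0)
        std::printf("bit clear? (never prints)\n");

    // With the brackets it does what it says:
    if ((flags & MASK) == 0)
        std::printf("bit clear\n");
    else
        std::printf("bit set\n"); // this is what actually prints
}

(To be fair, most compilers will warn about that first if; the nastier cases are the ones they stay quiet about.)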

At least that was what I thought until I got bitten firmly in the arse by this little beauty:
http://stackoverflow.com/questions/6379422/c-multiple-classes-with-same-name
Had I been more familiar with the standard, I would have spotted it immediately.

In my case I had a little interface I needed to mock for a couple of tests.

class ThingResponder
{
public: 
   virtual void respond(int) = 0;
};

In the first test I didn't care about it.

test1.cpp:

class MyThingResponder : public ThingResponder
{
public: 
   virtual void respond(int) {}
};

void test1::someTest()
{
    MyThingResponder responder;
    foo(responder);
    ...

But in the second test I wanted to verify it was called, so did something a little different

test2.cpp:

#include <vector>

class MyThingResponder : public ThingResponder
{
public:
    std::vector<int> responses;
    virtual void respond(int response)
    {
        responses.push_back(response);
    }
};

void test2::someTest()
{
    MyThingResponder responder;
    foo(responder);
    ...

Can you guess what happened next?

I had already broken the one definition rule. This produced no build errors or warnings; instead, when the calling frame in test1 allocated a MyThingResponder it used the local definition, but the constructor it called was the one from test2.cpp (the linker is entitled to assume all definitions of a class and its inline functions are identical, so it silently keeps just one copy). This meant that after I wrote test2, test1 started crashing in weird ways, because the constructor for MyThingResponder was initialising a std::vector that wasn't there and screwing up a load of other local variables.

Had it been the other way around, the responses vector would never have been initialised, and the error would have been even harder to track down.

These are the things that scared me.

  1. As we move to using more functors and the like, small locally defined classes with generic names will become more common.
  2. There is no way to guard against this. Anyone else's new class could break your code.
And these are the things I realised:
  1. I am going to put everything in a namespace from now on. Everything! (See the sketch below.)
  2. I suppose I do kind-of have to know the obscure intricacies of the C++ standard after all.
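For what it's worth, the cheapest insurance turns out to be an anonymous namespace: it gives each file's class internal linkage, so the two definitions stop being the same entity and can no longer collide. A minimal sketch of the fix, applied to the tests above:

test1.cpp:

namespace // anonymous: this MyThingResponder is private to this file
{
    class MyThingResponder : public ThingResponder
    {
    public:
        virtual void respond(int) {}
    };
}

test2.cpp:

#include <vector>

namespace // a different, unrelated MyThingResponder, which is now fine
{
    class MyThingResponder : public ThingResponder
    {
    public:
        std::vector<int> responses;
        virtual void respond(int response)
        {
            responses.push_back(response);
        }
    };
}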

Tuesday 18 September 2012

Another pointless C vs C++ musing

I really wanted to agree with this blog post. I largely agreed with part 1, and having once tried to integrate ZeroMQ with an application that was tied to an old version of gcc, I really wished he'd written it in C too.

Ultimately I think it's a bit wrong-headed though. He seems to want to break encapsulation for a performance benefit (which is fine, so long as you appreciate the trade-off and genuinely do need the speed) and for some reason thinks that's fine in C but not in C++. Personally I think ugly, hard-to-maintain code is ugly and hard to maintain whatever language you write it in.

I have a pet theory that C will outlive C++. Managed languages will slowly intrude on the application space because the pain of developing in them is so much less, and the performance penalty will keep shrinking until the one outweighs the other for all but a tiny subset of problems. Even the resource-scarce embedded space, where you'd have expected C++ to trounce, say, Java every time, seems to have been largely ceded: Android is prepared to take the hit of a JVM in return for stability and safety (how stable and safe is arguable, but certainly more stable and safe than running native code from unknown third parties).

Meanwhile C++, with horrors such as exceptions and (yuck!) template meta-programming, is never going to make much of an inroad into the systems space. Operating systems, while they may employ more and more C++ components, are for the foreseeable future going to be written mostly in C. In fact, it's clear from the noises coming out of Microsoft that they'd like to do as much as possible in C#, and are only stymied by performance issues; performance issues that will surely be resolved sooner or later, if only by faster hardware.

So, to come full circle, one of the pain points of C++ that really irritates me, and does not seem to get talked about much, is people who insist on writing APIs in C++. In theory it's all nice: you simply compile everything up and off you go, and if they've used some newer language feature then you update your compiler. In the real world it doesn't work that way. If I need to update my compiler I need to recompile about a dozen other open-source C++ libraries, and go and find updated versions of the two or three third-party proprietary ones. All that functionality then needs to be re-tested, most of it by human beings, all at great cost.

For these reasons we rarely update compilers, and so the chances of a given C++ API not compiling on one of the (several) compilers we use are greatly increased. Which greatly increases the pain of using it.

On top of all that, I can no longer dynamically load it at run time (or at least, not without jumping through more hoops than is worthwhile), which means it becomes an absolute dependency whether or not the user needs the functionality, and I either have to link it statically, bloating the binary, or leave myself vulnerable to version mismatches.
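This is a large part of why, when I do have to ship functionality written in C++, I'd rather put a flat C facade in front of it: the C ABI is stable on a given platform, so clients can dlopen the library without caring what lives behind the wall. A rough sketch of the pattern (the Widget names are invented):

/* widget_api.h - the only header clients ever see; pure C. */
#ifdef __cplusplus
extern "C" {
#endif

typedef struct widget widget;        /* opaque handle */

widget* widget_create(void);
void    widget_frob(widget* w, int n);
void    widget_destroy(widget* w);

#ifdef __cplusplus
}
#endif

// widget_api.cpp - the implementation is free to be as C++y as it likes.
#include "widget_api.h"
#include "Widget.hpp" // the real C++ class (hypothetical)

extern "C" widget* widget_create(void)
{
    // exceptions must never cross the C boundary, so catch them here
    try { return reinterpret_cast<widget*>(new Widget); }
    catch (...) { return 0; }
}

extern "C" void widget_frob(widget* w, int n)
{
    reinterpret_cast<Widget*>(w)->frob(n);
}

extern "C" void widget_destroy(widget* w)
{
    delete reinterpret_cast<Widget*>(w);
}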

All of this makes me sad. I love C++; I like the philosophy behind it, believe that smart pointers trump any other form of garbage collection, think that templates are great, and think that it is generally just unforgiving enough to force people into actually thinking about what their code is doing rather than muddling through by trial and error. But I think the lack of a standard ABI (is one even possible? I don't know) is becoming such a large pain point that it will eventually (though hopefully not in my professional lifetime) kill the language off.

Tuesday 29 May 2012

The Unit Testing Catch 22

I am unit testing. Unit testing falls closer to the second of the two types of tests I enumerated here and is, I think everyone agrees, a 'good thing.'

The problem with unit testing is that, like everything short of full functional testing, it relies on domain knowledge. In the days when everything was specified in documents, which architects held meetings about and then bestowed, like blessings, upon the rest of us, this was not a big deal; the document described exactly what stimulus the unit would receive, and if it was wrong it was not your problem. These days we are all agile and stuff, there are no documents, and we are forced to rely on our own understanding of the wider system to know what stimulus our unit will receive; in a large system that understanding is almost certainly incomplete.

The catch-22 here is that the majority of bugs are due to incomplete domain knowledge. I simply didn't realise that the system would, on some rare occasions, provide me with two notifications before actually doing anything. I assumed that when I received the second notification the first operation was complete. No amount of unit testing would have caught this bug, because I would no more have tested for it than I would have tested how the unit coped with receiving references to deleted objects.

This is why I am suspicious of mocking frameworks. Mocking is useful, obviously, but it forces you to rely even more on domain knowledge, which is a point of weakness.
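To make that concrete (names invented, and grossly simplified): my unit assumed one notification meant one completed operation, and so, naturally, did my hand-rolled mock. The test passes and proves nothing, because the mock encodes exactly the same misunderstanding as the code it's testing.

#include <cassert>

// The unit under test, carrying my faulty assumption: every
// notification means the previous operation has completed.
class OperationTracker
{
public:
    OperationTracker() : completed(0) {}
    void onNotification() { ++completed; }
    int  completedOperations() const { return completed; }
private:
    int completed;
};

void someTest()
{
    OperationTracker tracker;

    // The "mock" system: one notification per operation, because that
    // is what I believed the real system did. The real system
    // occasionally sends two before doing anything at all.
    tracker.onNotification();

    assert(tracker.completedOperations() == 1); // passes, proves nothing
}

int main() { someTest(); }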

Monday 9 January 2012

Solaris Memory Allocators

Is your multi-threaded application running like a dog on Solaris?

Mine was, and after weeks of head scratching I was reduced to suggesting that the client ran it on Linux instead, which was a mite embarrassing to say the least.

Fast forward a year or so and I discover the wonderful libumem or, because we have to support Sol8, the almost-as-wonderful mtmalloc (incidentally, Oracle need to up their SEO game; it took me ages to dig up that link): two multithread-optimised memory allocators that speed things up a lot! Brilliantly, all you need to do is set LD_PRELOAD and you're off; you don't even need to recompile anything.

Except that I am not working on that application any more. Now I am trying to make something else go faster. And this something else is old, and sits on the legacy comms layer.

Way back in the dim and distant past somebody who isn't here any more wrote a buffered interface for the comms layer. Now, if you or I were implementing a data buffer, we'd probably put it in a unit-tested class, and be all agile about it, and probably use templates and a design pattern and stuff. We'd definitely keep at least two counters: one to track the capacity of the buffer and one to track the size of the data in it.

Things weren't like that back then. Classes were just considered uppity structures, unit-testing was viewed as a dubious eccentricity, and templates were for Microsoft Word and Microsoft Word only. Our long forgotten developer didn't need two counters. He just made it so the buffer was always the size of the data in it and whenever that changed he realloc'd it.

No messin'!
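The shape of the thing, reconstructed from memory (names invented, error handling elided):

#include <cstdlib>
#include <cstring>

// One counter, not two: the allocation is always exactly the size of
// the data, so every single append is a realloc.
struct CommsBuffer
{
    char*  data;
    size_t size;
};

void append(CommsBuffer* buf, const char* bytes, size_t n)
{
    buf->data = static_cast<char*>(std::realloc(buf->data, buf->size + n));
    std::memcpy(buf->data + buf->size, bytes, n);
    buf->size += n;
}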

The writers of mtmalloc and libumem were not like that; they were like you and me, and they did not worry about realloc, because everyone these days uses C++y things like new and delete and there is no C++y way to do realloc. So when they implemented realloc they simply did a malloc, a memcpy, and a free.
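In other words, something morally equivalent to this (a sketch of the behaviour, not their actual code; a real realloc isn't passed the old size, the allocator tracks it internally):

#include <cstdlib>
#include <cstring>

// A realloc that can never grow in place: every call pays a full
// allocate-copy-free. Harmless now and then; quadratic misery for a
// buffer that reallocs on every append.
void* copying_realloc(void* old_ptr, size_t old_size, size_t new_size)
{
    void* new_ptr = std::malloc(new_size);
    if (new_ptr != 0 && old_ptr != 0)
    {
        std::memcpy(new_ptr, old_ptr, old_size < new_size ? old_size : new_size);
        std::free(old_ptr);
    }
    return new_ptr;
}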

Easy!

Except you know what happens next. I set my LD_PRELOAD and everything runs swimmingly, for a while, until it hits a certain data rate where, suddenly, the comms buffering is being exercised and the application slows back down to a crawl, even worse than it was before, and I have to open up the labyrinthine, antediluvian comms code and take out all those reallocs.

Which was no fun.