C++ Cargo Cults, RVO and Copy Elision
So I was watching the talks from Microsoft’s Going Native conference – it was the typical mix of modern C++ nonsense; all make_unique, rvalue references and, my new favourite*, the promise of decltype(auto) for function return types. As I watched Scott Meyers’ talk (don’t bother, it’s not worth it – here is a summary “words words words std::move words”, only imagine it going on for 59 more minutes), something caught my eye (apart from his hair, I mean – I honestly don’t know how that is legal in this day and age – and this is coming from someone whose locks look “hobbit like” on a good day), namely a bit of code and a weird justification:
void Function(const std::string argument) { // Do some stuff with the string here }
After showing this, he said that the argument was passed by value “for speed“, which I can only assume was some kind of mouth-based typo for “inefficiency” (or possibly “contrivance, so I can talk about std::move until all of you congeal”). To his – and his hair’s – credit, he then took back the suggestion that it was actually faster, pointing instead to someone’s blog post as the source of the confusion. My curiosity piqued, I went in search of the post in question.
* Please don’t use this.
Want Speed? Don’t Invoke the Copy Constructor for Nontrivial Objects
Okay, it’s less catchy than the blog post that caused this whole episode, but I guarantee it is at least 90% more accurate. It seems that the author is under the (surprisingly common) mistaken impression that compilers are filled with magical elves that will wave away inefficiency with their pixie wands. From that position, he begins with a rather outlandish claim – that passing compound objects (e.g., a vector of strings no less) by value is faster than passing them by reference. I can accept his functional programming arguments (“mutation is so ew“) – not that I agree with them, mind – but surely, surely he is mistaken.
Return Value Optimization
Let’s start at the top – RVO, or return value optimization, allows the compiler to construct the return value of a function in place at the call site – an operation called elision, which basically means “I totally did what you wanted me to”. It’s something that has existed in compilers since time immemorial, as it’s quite easy to create situations where you have to return an object (potentially with an expensive construction cost) by value.
Let’s start by creating a class which will track our constructor/destructor calls – the class will just wrap an int, but the principle is the same; imagine it’s something really expensive, like, I don’t know, a vector of strings.
class TrackingClass { int Value; public: TrackingClass(int _x) : Value(_x) { printf("Constructor, val: %d\n", Value); } TrackingClass(const TrackingClass& cc) : Value(cc.Value) { printf("Copy Constructor, val: %d\n", Value); } ~TrackingClass() { printf("Destructor\n"); } };
By the way, I am using printf rather than iostreams because the latter was possibly the worst designed library in the STL**, and that is saying a lot. Okay, now we need some test code.
TrackingClass Fibonacci(int n) { if (n < 2) return TrackingClass(1); int a = 1, b = 1; for (int i = 2; i < n; i++) { int c = a + b; a = b; b = c; } return TrackingClass(b); } int main() { TrackingClass t = Fibonacci(10); printf("End"); }
Okay – simple, we’re returning an object by value. What does the program print?
End
Destructor
Pretty cool! The object is made directly into the return value – even in Debug mode for Visual Studio, so you don’t need to crank up the optimization settings to take advantage of it. There are a few limitations though. First, you might notice that my Fibonacci function returns temporaries only; I create the returned value in the returning statement. This is important – as I could have written the function in a way that would throw a spanner in the works. Let’s say we add a SetValue method to our TrackingClass, so we can change the value at a later time. What happens when we use this function instead?
TrackingClass Fibonacci2(int n) { TrackingClass t; // default constructor if (n < 2) t.SetValue(1); else { int a = 1, b = 1; for (int i = 2; i < n; i++) { int c = a + b; a = b; b = c; } t.SetValue(b); } return t; }
And the results:
End
Destructor
It turns out that I have again dodged a bullet – I have taken advantage of two forms of RVO, Unnamed Return Value Optimization (the first case – all return statements are temporaries), and Named Return Value Optimization in the second case, where I returned only one object. Note that NRVO is less… effective in debug mode, so don’t rely on it always! What happens if we make a little change for the case where the app asks for Fibonacci numbers where N < 2? Let’s also modify our test function too, while we’re at it.
TrackingClass Fibonacci3(int n) { if (n < 2) return TrackingClass(1); else { TrackingClass t; int a = 1, b = 1; for (int i = 2; i < n; i++) { int c = a + b; a = b; b = c; } t.SetValue(b); return t; } } int main() { TrackingClass t = Fibonacci3(1); TrackingClass t2 = Fibonacci3(10); printf("End"); }
Who wants to bet what the output is?
Default Constructor
Copy Constructor, val: 55
Destructor
End
Destructor
Bleh. When I wrote this, I didn’t expect it to elide the first (URVO) case, so that was a surprise (if I make another temp TrackingClass in that branch, we lose all RVO). But the point is, suddenly the shit has hit the wall – by making a fairly insubstantial change to the function body, you have to go through the copy constructor to get your value back. So remember – return only temporaries or one variable local to the function. For reference:
- RVO happens basically all the time – it is the common case with compilers and you do not need to inline the heck out of everything;
- Update: NRVO doesn’t always happen in debug mode;
- Optimization isn’t going to save you if you break the requirements for URVO/NRVO;
- RVO works across compilation modules;
- RVO works across DLL/library boundaries.
The last two are true because the ABI for various C++ compilers is designed to set up space for return values at the call site, rather than some kind of max optimizer hackery. But does this mean we should return everything by value? Does it heck! Firstly, writing your code to be perfect for SRVO or URVO is not always going to be the best option, and secondly, I present an example that is close to my heart:
class Example { shared_ptr<Thing> m_Resource; public: shared_ptr<Thing> GetResource() { return m_Resource; } const shared_ptr<Thing>& GetResourceByRef() { return m_Resource; } };
Surely we’ve just determined that GetResource is just as good as GetResourceByRef, I mean, that shit has got to be elided, am I right? Well yes, and no – you will get all the RVO your little heart could desire, but the fact of the matter is that by returning by value, you have to make a copy of the shared pointer at least once, which is where the cost is. Sure, you dodged a copy, but that just makes it bad rather than terrible.
C++11 added a new constructor variant which I feel I should mention at this point – it’s called the move constructor, and it looks like this:
TrackingClass(TrackingClass&& cc) : Value(cc.Value) { printf("Move Constructor, val: %d\n", Value); }
The job of the move constructor is essentially to steal the pertinent bits from the argument – the compiler emits move operations when it is sure you will never want the object again (like a return value about to go out of scope), and if you declare it for your types, you will see move being called rather than the copy constructor. When writing the move constructor, you have to do two things:
- Steal all the “costly” data and state info – e.g., for a vector you take the count and the buffer;
- Trick the object you steal the stuff from into forgetting about everything it owns.
Point 2 is critical – the compiler will still call the destructor on the item you moved from. So in the vector case, you have to null the heck out of it to stop it from deciding “hey, I’m out of scope, better delete that buffer!”. The reason you have to do this is because move semantics are a horrible hack on an, essentially, broken language.
** Both in terms of performance and design. Go on, try it. I’ll wait (heck I’ll have to). Also, imbue() and getloc()? Really guys?
Okay, so what about pass by value?
Yeah, enough about RVO. That stuff is great, it works. In C++11 you can even bandaid over the bits that are a bit wobbly. But this post is meant to be about passing objects by value, like vectors (of strings!!!!). I looked through the guys post for a sensible demonstration, but unfortunately all his code consisted of sorting a vector in an improbable way, so let’s make up our own stuff!
TrackingClass Increment(TrackingClass t) { return TrackingClass(t.GetValue() + 1); } TrackingClass IncrementRef(const TrackingClass& t) { return TrackingClass(t.GetValue() + 1); }
And some testing code:
int main() { TrackingClass r = Fibonacci(10); printf("Pre-Increment\n"); TrackingClass t = Increment(r); printf("End"); }
What do we end up getting? No surprises here – RVO works because this is the better version of the Fibonacci function, and we have to copy the value into the function argument.
Pre-Increment
Copy Constructor, val: 55
Constructor, val: 56
Destructor
End
Destructor
Destructor
And if we replace the call to Increment with IncrementRef:
Pre-Increment
Constructor, val: 56
End
Destructor
Destructor
Score – one less copy constructor, one less destroyed temporary. A victory in anyone’s book? Perhaps for most, but not our intrepid hero, Dan, who chips away at the received wisdom. “What,” he asks himself, “if I put the return value of a function as the argument?”. Good question Dan, what does happen? Let’s make a small change:
int main() { TrackingClass t = Increment(Fibonacci(10)); printf("End"); }
And the output?
Constructor, val: 56
Destructor
End
Destructor
Pretty good all up, we got some sexy elision going on (the IncrementRef case is identical), so maybe Dan isn’t as crazy as I’ve been implying this whole post?
Sorry Dan. This is a bad idea. Yes, it works fine for temporaries (rvalues, in the parlance of people who care too much), but it doesn’t work fine for:
- Members
- Globals
- Statics
- Locals
Oh hell, that’s like 99% of the stuff we care about! Actually it’s all making sense – if you write C++ in a purely functional way, you won’t need any of those pesky fields! So in conclusion, pass by reference if the object is “complex”. The compiler may find it harder to do SSA on your code, but the compiler is not as clever as people say it is.
(This article continues with a look at rvalue references)
A Short Rant
All these things bring to the fore something that has irritated me a great deal – a lot of people who really should know better advocate this sort of thing, as well as all the nu-C++isms like not managing memory yourself, or blindly using the STL types without thinking (std::distance into a std::set is fantastic and coming to production code near you!). There is also general derision for the kind of C++ I, and many in my field write – the game industry tends to be very results based, and I think it’s quite unfair to accuse us of “cargo culting”. It’s also irritating to see the language try to deprecate the low level parts – several talks in the MS conference said things along the lines of “don’t do this [perfectly reasonable C++ code like I would write], unless you’re writing something which needs ~*high performance*~.”
Listen, if I wasn’t writing something which needed to be fast, I wouldn’t use C++.
Good article about RVO and copy elision, but the whole rant about “pass by value” is totally wrong. You missed the point of the “Want speed?” article.
The suggestion is “pass by value” for the SPECIFIC case when your function is going to “make a copy of the parameter anyway”, instead of passing a reference and doing the copy inside the function.
For all other cases, the “pass by reference” should by used as always.
Yeah, and the whole rant about Scott Meyers too, he happens to be one of the most smart people I’ve had the pleasure to listen to. Judging someone by its appearance is anything but smart, also.
Don’t worry, I am aware that I’m a terrible person.