Skip to content

RVO 2.0 and std::move – Still Not Ideal

October 3, 2013

Part 2

(This is a continuation from my post about return value optimization)

I thought it would be good to dig a little deeper into RVO and argument passing, mainly for my own curiosity’s sake. This time, we’re going to go down the rabbit hole – how are return values actually returned, and what happens when we return one of the arguments from our function? Let’s start, again, with RVO.

RVO Setup and Mechanism

Imagine your program’s stack – you have a bunch of locals in the frame you’re in, one of them is where you want to place the return value of a function call. To set up the call, you have to push the arguments of the function (from back to front), but crucially, the compiler inserts one extra argument – a hidden parameter that is the pointer to where you want the return to go.

RVO2-1

When the function is called, a function without RVO simply allocates space for the return in its own frame, and then copies it in as the last operation before returning to the caller:

RVO2-2

Whereas a function with RVO enabled will use the parent stack frame (or an arbitrary block of memory) to store its own local, avoiding the copy:

RVO2-3

Passing hidden parameters is nothing new – the __cdecl x86/MS-x64 calling convention, which we are using here, will return integer values in the register EAX/RAX, floats in ST0/XMM0, and it will use the hidden parameter for returning everything else. Remember that in C++, all member function calls have a hidden parameter (the this pointer) passed through ECX/RCX before calling any methods. The magic of RVO is the compiler using the address in the hidden parameter as the location for an object of local scope, the return value of the method.

Caveats, Corner Cases and General Failures

I got a bit overexcited when I did the last post about this, proclaiming that the single return value optimization was something that happened pretty reliably. Well, it isn’t. Not only is it unreliable, but it relies on the optimizer doing sufficient program flow analysis to determine if it is completely safe. Two examples:

TrackingClass RVOTest1(int n)
{
	TrackingClass t;
	if (n < 2)
	{
		t.SetValue(1);
	}
	else
	{
		int a = 1, b = 1;
		for (int i = 2; i < n; i++)
		{
			int c = a + b;
			a = b;
			b = c;
		}
		t.SetValue(b);
	}
	return t;
}

 

TrackingClass RVOTest2(int n)
{
	TrackingClass t;
	if (n < 2)
	{
		t.SetValue(1);
		return t; // extra return here
	}
	else
	{
		int a = 1, b = 1;
		for (int i = 2; i < n; i++)
		{
			int c = a + b;
			a = b;
			b = c;
		}
		t.SetValue(b);
	}
	return t;
}

RVOTest1 will, with optimizations on, get the return value optimized into the caller’s scope. RVOTest2 does not. With or without optimizations. Full disclosure, at this point I tried GCC (4.8), and it did a better job that MSVC (2013), correctly RVOing both cases, even without optimizations. It behaved the same as MSVC when it encountered a mix of named and unnamed returns, however (which is particularly harsh because I am careful to have only one of the returnable objects “alive” at any moment, so it’s not like that slot is somehow being used).

Copy Elision and Passthrough Functions

Cast your mind back – I made the argument in the previous post that passing heavy objects by value isn’t a great idea, as copy elision for function arguments only happens with rvalues (which makes sense, they have to be in the right place on the stack for it to work). But there is a special case of functions I didn’t mention – mutators that modify an input and return a new object based on that input. The example function we were looking at before was this:

TrackingClass IncrementRef(const TrackingClass& t)
{
	return TrackingClass(t.GetValue() + 1);
}

How can we be most optimal with this sort of function? Well, first, let’s think about what happens when RVO and elision are working perfectly. Remember that for this case:

int main()
{
	TrackingClass q = IncrementRef(Fibonacci(10));
	printf("End");
}

The output was:

Constructor, val: 55
Constructor, val: 56
Destructor
End
Destructor

We do get copy elision, and return optimization, but we still end up constructing two whole objects and throwing away one of them – quite costly if the object has associated allocations. Luckily, C++11 has rvalue references and move constructors. We should be able to write a version of this that will be called when we know we’re getting a temporary as our argument, in which case we should be able to avoid extra costly work by moving the data between instances – we’ll still have two objects (at least!), but hopefully we can take advantage of move semantics to improve the efficiency. Okay, so let’s dive in – we should be able to overload our existing code to take an rvalue reference variant of the function:

TrackingClass Increment(TrackingClass&& t)
{
	return TrackingClass(t.GetValue() + 1);
}

Note that the function does not take a const parameter. after making sure the class has a defined move constructor, we can see what output do we get from this:

Constructor, val: 55
Constructor, val: 56
Destructor
End
Destructor

Hmm. Same as before – it’s because we’re creating a new object from the constructor that takes an int, so let’s just modify the object “in-flight” as it were.

TrackingClass Increment(TrackingClass&& t)
{
	t.SetValue(t.GetValue() + 1);
	return t;
}
Constructor, val: 55
Copy Constructor, val: 56
Destructor
End
Destructor

This is actually worse in some ways – we’re clearly copying the object to where we want the return to go, but it’s an rvalue! Shouldn’t it be moved? Maybe we can just change the return type to TrackingClass&&? (protip: you can’t)

I’ll save you the trouble, this is how you fix this issue:

TrackingClass Increment(TrackingClass&& t)
{
	t.SetValue(t.GetValue() + 1);
	return std::move(t);
}
Constructor, val: 55
Move Constructor, val: 56
Destructor
End
Destructor

Surprisingly, our argument seems to have lost the move-semantic-ness that it started with when it entered this function. std::move is identical to static_cast<T&&>, so clearly, as far as the compiler is concerned, by the time we return our object it’s been converted into an lvalue. Once we put the std::move in there, it’s perfect again. We could have also solved it by doing this:

TrackingClass Increment(TrackingClass&& t)
{
	TrackingClass q(std::move(t)); // std::move required here as well
	q.SetValue(q.GetValue() + 1);
	return q;
}

Which just forces our RVO-able “local” q to be created via the move constructor. As you might imagine, things deteriorate pretty quickly once you start branching, although the compiler seems happy to emit move constructors for anything it can’t RVO, except passing through rvalues, where you have to explicitly invoke move sematics.

Morals

Well I hope we’ve all grown as people during this adventure. I think we can draw a few conclusions though:

  • RVO is pretty good, but less good if it’s not unnamed;
  • Given how RVOs and return types work, it’s no wonder that slight mismatches in object sizes completely hose your stack;
  • The compiler will dutifully emit move instructions when it can’t RVO, so if you have a heavy object type, make sure you write a move constructor!
  • Returning rvalues from functions doesn’t emit move instructions, unless you explicitly invoke it via std::move;
  • By having a copy of your function which takes an rvalue-ref (as well as a const ref variant), you can potentially get better performance (this is pretty visible in the performance jumps seen in std::string since C++11);
  • C++ is a broken language and it should really just work instead.

And on that bombshell, I will end this post!

Leave a Comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: