The ‘Type of Lambda’

In my first post about functional programming styles I sort of skipped over what exactly a lambda expression ‘becomes’ when it’s evaluated. As this is generally not that well documented I thought it would be worth talking about it briefly.

The Standard (N3242) starts out by stating what most people would assume at clause 5.1.2.1:

“Lambda expressions provide a concise way to create simple function objects.”

So we know that the result of evaluating a lambda expression is a function object. This is called the ‘closure object’ and is a prvalue:

“The evaluation of a lambda-expression results in a prvalue temporary. This temporary is called the closure object.” (5.1.2.2)

As a reminder, ‘prvalue’ stands for pure rvalue. For all practical purposes a prvalue IS an rvalue, it’s just that the standard is being exact here by differentiating two different kinds of rvalues, i.e. xvalues and prvalues.

That’s because there is a subtle difference between an object “near the end of its lifetime” (3.10.1) and something that, for example, ONLY appears on the right hand side of an expression. As there can be no question that the latter is actually an rvalue, the standard calls this a pure rvalue or prvalue:

“An xvalue (an “eXpiring” value) also refers to an object, usually near the end of its lifetime (so that its resources may be moved, for example). […]

— An rvalue (so called, historically, because rvalues could appear on the right-hand side of an assignment expression) is an xvalue, a temporary object (12.2) or subobject thereof, or a value that is not associated with an object.

— A prvalue (“pure” rvalue) is an rvalue that is not an xvalue. [ Example: The result of calling a function whose return type is not a reference is a prvalue. The value of a literal […] or true is also a prvalue. —end example ]” (3.10.1)

The intuition behind an xvalue is that that it could seen as BOTH an lvalue and an rvalue in context. Consider this snippet of code:

Image

Our object foo is of automatic storage duration which means that it will be deleted at the end of the block. This makes it safe to ‘move’ from foo, i.e. to convert it to an rvalue using std::move. However, our object has a name, ‘foo’, which makes it an lvalue (remember that the rule of thumb is that anything that has a name is an lvalue). It is only because it is nearing the end of its lifetime that we can use it like an rvalue (provided of course we do not access/modify it after having treated it like an rvalue). Thus, foo is an lvalue that can be used like an rvalue, which the standard calls an xvalue. This is expressed in the following diagram found at 3.10.1:

Image

The gist of this is that lambda expressions return temporary objects in the same way that objects are returned by value from functions, which makes them pure rvalues.

Now that we know what kind of thing is being returned by lambda expression, we have to look at its type, i.e. what exactly we are getting. The standard tells us that this type is unique as well as implementation defined, so let’s call it ‘type lambda’ for now (Although if you were speaking standardese you would call this the ‘closure type’).

“The type of the lambda-expression (which is also the type of the closure object) is a unique, unnamed non-union class type — called the closure type — whose properties are described below. […] An implementation may define the closure type differently from what is described below provided this does not alter the observable behavior of the program [..]” (5.1.2.3)

The last line of this quote is a promise that there are certain things about type lambda that will always be the same and this is what we can exploit when using it. The most important of those things for the purposes of this discussion is that type lambda will have an operator () that matches our definition in the lambda expression:

“The closure type for a lambda-expression has a public inline function call operator (13.5.4) whose parameters and return type are described by the lambda-expression’s parameter-declaration-clause and trailing return-type respectively. This function call operator is declared const (9.3.1) if and only if the lambda expression’s parameter-declaration-clause is not followed by mutable. It is neither virtual nor declared volatile.“ (5.1.2.5)

This promise is important because it tells us how we can use a type lambda. Let’s assume we are dealing with a simple lambda expression that simply adds its two integer arguments and returns the result by value.

First of all, you could simply use auto like this:

Image

The expression statement ‘X(11, 22)’ is guaranteed to work because of the promise the standard makes, regardless of what the compiler deduces auto to be.

This also means that we can use a std::function of the appropriate type here instead of auto:

Image

Similarly, if you were writing a function that takes a lambda and then applies it to some arguments, simply use templates and then call the overloaded operator:

Image

As an aside, this also gives you a compile-time check whether the lambda is of the correct type which will help to identify errors as early as possible. This is because the compiler knows exactly what operator () will look like for the type you are trying to pass to the function.

I hope this helps clarify some things. As always, thx for reading 🙂

Advertisements

Functional Programming Styles in C++ (1 of n) : ‘Partial Application’ using ‘std::bind’

While I am definitely a supporter of the imperative style of programming I have found some ideas from functional programming extremely useful in producing clean code. I will demonstrate some of these ideas in this series of posts (please click on the images to expand them).

The C++ 11 standard expanded the capability to use functional styles directly in standard C++, especially in the classes found in the <functional> header (based on the excellent equivalent from boost) and the inclusion of lambda expressions directly into the language. As there are many excellent tutorials about lambda expressions out there, I will not go into detail about them here except to note their connection to the topics we are concerned with, namely that they have to do with functions.

For everyone who has never programmed in a functional language before, it is worth noting that purely functional languages do not have the concept of objects or side effects at all. In fact, the user has no access to memory and everything is defined using functions and recursion as well as a usually limited set of primitive types and more complex but also pre-defined ones such as tuples and lists (e.g. Miranda).

Technically, what C++ understands as a lambda expression is not the same as what functional programming languages understand by that term. In C++, lambda expressions are a way to write inline definitions of functions objects (or ‘functors’). These get handled by C++ in exactly the same way as ‘regularly’ defined functors and have nothing to do with the lambda calculus in a formal way. However, they derive their name from the idea that there should be a short-hand notation for the type of things they are meant to be used for. This mimics the terse syntactic style of many functional languages.

Let’s look at an example. The STL sorting algorithm ‘std::sort’ can take a predicate as an optional third argument. Thus, if I wanted to sort an array from largest to smallest (ignoring std::greater and std::less for now), I would either have to write a custom function or a functor like this:

Image

The problem is that the code for either of these is quite verbose for what we are trying to achieve and might be located in a completely different block than the one we are in now. This will make it harder for anyone to understand and make the code bloated. What C++ learned from the functional style is that an inline definition should be possible here. Lambda expressions are an awesome way to do this and will even become truly polymorphic in C++ 14!

Using lambda expressions our code becomes the following:

Image

While lambda expressions have deservedly had the most attention of all functional elements in C++ 11, we are concerned with the less famous ones here.

Now that the introduction is out of the way let’s turn to the main concern of this post: partial application.

Functional programming languages usually let a programmer define functions in a curried style. This means that rather than having a function that has to be called with a pre-defined number of parameters we can call it with just one argument to start with and supply the others later.

Consider the following C++ function:

int add(int x, int y) {
                return x + y;
}

An equivalent curried function would look like this in Miranda:

add x y = x + y

The first thing that might look strange here is the absence of brackets around the function parameters x and y on the left hand side of the expression. This reflects the idea that this function can be evaluated when only one argument is present, i.e. the function is curried.

Actually, that last sentence was slightly incorrect in that the function can only be partially evaluated or ‘partially applied’ because the + operator needs to have a right and a left-hand side! So what does the function add return if we partially apply it to say, 6?

The answer is easy: Another function with expects only one more argument that looks like this:

f x = x + 6

This doesn’t work for us because (non-variadic) functions in C++ expect a fixed number of arguments and this is fundamental to the imperative style and memory model. Hence we cannot do ‘partial application’. But what we can do is save some arguments and then return something that expects the rest before the function gets called at all, i.e. we do not evaluate our ‘add’ function until both arguments are provided. In other words, we can bind arguments to certain function parameters. In C++ 11, we can do this using std::bind with the help of some placeholders:

Image

In this example, we bind a specialisation of ‘add’ to 6 (for the ‘right’ parameter) which gives us a function called bound_function which expects one more argument. We specify this in the bind expression by using one of the placeholders defined in the std::placeholders namespace, namely ‘_1’. These are uglified by a leading underscore (highlighting once more that no one should use similar names for user-defined variables) and you can have any number of these in your bind expression (depending on the function you want to bind).

Because this process goes against the way C++ normally likes to think, there are some problems with this approach. First of all the actual type of bound_function that the compiler will deduce when you’re using auto will be very complicated. Secondly, operator () of that type has a variadic templated parameter, i.e. it takes any number of arguments. However, only the first of these gets evaluated and the rest discarded so the following compiles and does not produce incorrect results:

Image

While this produces the correct result your code will be harder for someone else to understand, especially as being able to supply more than the expected number of arguments might lead someone else to think your code does something which it actually does not do. Worst of all, neither the compiler nor the runtime will ever complain!

Now let’s have a look at the type that std::bind returns. You might need that if you are trying to define a function whose return type is the result of a std::bind expression. One such case is the famous ‘cancel’ ‘combinator’ function (I know combinators aren’t relevant for C++ but it’s a good example).

If you ask the compiler for the type of our cancel function (specialised for int in this particular case) it will tell you that you are dealing with:

std::_Bind<true,T,T (__cdecl *const )(T,T),X &,std::_Ph<1> &>' to 'int (__cdecl *)(int,int)

That’s not something we want to use! More troubling is that the type of std::bind is actually unspecified, i.e. implementation defined! So what is the solution here?

As always, members of the standard’s committee were aware of this problem and gave us a really smart answer with C++ 11. And the added bonus of that solution is that it gets rid of the problem of not having the number of arguments defined.

The solution is to assign the result of a bind expression to a std::function of the appropriate type:

Functional_1_5

Because ‘cancel’ explicitly returns something of type function, the auto in main will also be of that type.

Note that we are using the new ‘trailing-return-type’ syntax here because it would otherwise be very tedious to get template argument deduction to correctly figure out the return type of the ‘func’ parameter. This way, we can use a decltype expression with the actual parameters to figure out the return type of the itself templated argument to this function more easily. The standard (N3242) highlights this in clause 8.3.12:

“A trailing-return-type is most useful for a type that would be more complicated to specify before the declarator-id:

template <class T, class U> auto add(T t, U u) -> decltype(t + u);

rather than

template <class T, class U> decltype((*(T*)0) + (*(U*)0)) add(T t, U u);

With this knowledge, we can now safely and transparently pass around bound functions and use them in a straightforward manner.