The ‘Type of Lambda’

In my first post about functional programming styles I sort of skipped over what exactly a lambda expression ‘becomes’ when it’s evaluated. As this is generally not that well documented I thought it would be worth talking about it briefly.

The Standard (N3242) starts out by stating what most people would assume at clause 5.1.2.1:

“Lambda expressions provide a concise way to create simple function objects.”

So we know that the result of evaluating a lambda expression is a function object. This is called the ‘closure object’ and is a prvalue:

“The evaluation of a lambda-expression results in a prvalue temporary. This temporary is called the closure object.” (5.1.2.2)

As a reminder, ‘prvalue’ stands for pure rvalue. For all practical purposes a prvalue IS an rvalue; the standard is simply being precise here by distinguishing between two kinds of rvalues, i.e. xvalues and prvalues.

That’s because there is a subtle difference between an object “near the end of its lifetime” (3.10.1) and something that, for example, ONLY ever appears on the right-hand side of an expression. As there can be no question that the latter is actually an rvalue, the standard calls it a pure rvalue or prvalue:

“An xvalue (an “eXpiring” value) also refers to an object, usually near the end of its lifetime (so that its resources may be moved, for example). […]

— An rvalue (so called, historically, because rvalues could appear on the right-hand side of an assignment expression) is an xvalue, a temporary object (12.2) or subobject thereof, or a value that is not associated with an object.

— A prvalue (“pure” rvalue) is an rvalue that is not an xvalue. [ Example: The result of calling a function whose return type is not a reference is a prvalue. The value of a literal […] or true is also a prvalue. —end example ]” (3.10.1)

The intuition behind an xvalue is that it can be seen as BOTH an lvalue and an rvalue, depending on context. Consider this snippet of code:

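A minimal sketch of such a snippet (the object name `foo` and the use of std::string are illustrative):

```cpp
#include <string>
#include <utility>
#include <vector>

std::vector<std::string> make_words() {
    std::string foo = "hello";  // foo has a name, so it is an lvalue
    std::vector<std::string> words;
    // foo is near the end of its lifetime, so it is safe to treat it like an
    // rvalue: std::move(foo) is an xvalue, and its contents may be moved from.
    words.push_back(std::move(foo));
    return words;  // foo is not used again after being moved from
}
```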

Our object foo has automatic storage duration, which means that it will be destroyed at the end of the block. This makes it safe to ‘move’ from foo, i.e. to convert it to an rvalue using std::move. However, our object has a name, ‘foo’, which makes it an lvalue (remember that the rule of thumb is that anything that has a name is an lvalue). It is only because it is nearing the end of its lifetime that we can use it like an rvalue (provided, of course, we do not access/modify it after having treated it like an rvalue). Thus, foo is an lvalue that can be used like an rvalue, which the standard calls an xvalue. This is expressed in the following diagram found at 3.10.1:

expression
├── glvalue
│   ├── lvalue
│   └── xvalue
└── rvalue
    ├── xvalue
    └── prvalue

The gist of this is that a lambda expression yields a temporary object in the same way that a function returning by value does, which makes the result a pure rvalue.

Now that we know what kind of thing is returned by a lambda expression, we have to look at its type, i.e. what exactly we are getting. The standard tells us that this type is unique as well as implementation-defined, so let’s call it ‘type lambda’ for now (although if you were speaking standardese you would call it the ‘closure type’).

“The type of the lambda-expression (which is also the type of the closure object) is a unique, unnamed non-union class type — called the closure type — whose properties are described below. […] An implementation may define the closure type differently from what is described below provided this does not alter the observable behavior of the program [..]” (5.1.2.3)

The last line of this quote is a promise that there are certain things about type lambda that will always be the same and this is what we can exploit when using it. The most important of those things for the purposes of this discussion is that type lambda will have an operator () that matches our definition in the lambda expression:

“The closure type for a lambda-expression has a public inline function call operator (13.5.4) whose parameters and return type are described by the lambda-expression’s parameter-declaration-clause and trailing return-type respectively. This function call operator is declared const (9.3.1) if and only if the lambda expression’s parameter-declaration-clause is not followed by mutable. It is neither virtual nor declared volatile.“ (5.1.2.5)
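In other words, a lambda such as `[](int a, int b) { return a + b; }` behaves roughly as if the compiler had written a class like this (the name `closure` is illustrative; the real type is unique and unnamed):

```cpp
// Roughly what the compiler generates for [](int a, int b) { return a + b; }
class closure {
public:
    int operator()(int a, int b) const {  // const because the lambda is not 'mutable'
        return a + b;
    }
};
```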

This promise is important because it tells us how we can use a type lambda. Let’s assume we are dealing with a simple lambda expression that simply adds its two integer arguments and returns the result by value.

First of all, you could simply use auto like this:

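For instance, along these lines (using `X` as the variable name):

```cpp
// The closure type is unnamed, so auto is the natural way to store the result.
auto X = [](int a, int b) { return a + b; };
int result = X(11, 22);  // calls the closure type's operator()
```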

The expression statement ‘X(11, 22)’ is guaranteed to work because of the promise the standard makes, regardless of what the compiler deduces auto to be.

This also means that we can use a std::function of the appropriate type here instead of auto:

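For our two-int addition lambda, the appropriate type is `std::function<int(int, int)>`:

```cpp
#include <functional>

// std::function<int(int, int)> can store any callable with this signature,
// including our unnamed closure type.
std::function<int(int, int)> f = [](int a, int b) { return a + b; };
int sum = f(11, 22);
```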

Similarly, if you were writing a function that takes a lambda and then applies it to some arguments, simply use templates and then call the overloaded operator:

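A sketch of such a function (the name `apply_binary` is illustrative):

```cpp
// A function template that accepts any callable; the compiler checks at
// instantiation time that F has a matching operator().
template <typename F>
int apply_binary(F func, int a, int b) {
    return func(a, b);
}
```

Calling `apply_binary([](int a, int b) { return a + b; }, 11, 22)` then yields 33.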

As an aside, this also gives you a compile-time check of whether the lambda is of the correct type, which will help to identify errors as early as possible. This is because the compiler knows exactly what operator () will look like for the type you are trying to pass to the function.

I hope this helps clarify some things. As always, thx for reading 🙂

Implementing Semaphores in C++11

[NOTE: Thanks to everyone for all the great feedback on this. Someone flagged a potential issue with the final implementation at the end, so I’ve removed it temporarily until I get a chance to review it in detail next weekend.]

C++ 11 added a lot of advanced constructs used in parallel programming to the standard library. While atomics, mutexes, condition variables and threads are all available, there is no ‘out of the box’ implementation of semaphores. This is not a defect at all as it is good to keep the standard library from becoming too bloated. As I will show in this post, the committee actually provided all the building blocks necessary for us to create our own synchronisation tools. In this tutorial, we will be using mutexes, atomics and condition variables to implement a general-purpose semaphore class ourselves.

If you are already familiar with semaphores and concurrency, feel free to skip the next section, which is a quick refresher on semaphores and why they might be useful.

Most modern computers, from high-end desktops to mobile phones (excluding the simplistic ones in microwaves etc.), are capable of executing code in parallel. However, we as software engineers keep writing programs as if the machine we are dealing with were sequential. The obvious reason is that, given the complexity modern hardware has reached and the non-deterministic nature of executing programs on a multi-core CPU, it would be impossible for us to write or understand code explicitly written for that hardware. Therefore, we leave it to the compiler and the hardware to run our sequential program in parallel where they can. As compilers generally cannot know how data is shared between threads, we sometimes have to intervene to protect bits of the code that should only be executed sequentially. We do this by using things such as mutexes, condition variables, monitors, semaphores, etc.

These rather abstract and vague observations reflect how the C++ 11 memory model is designed, i.e. code will behave as if the program were executed sequentially, as long as threads are correctly synchronised. Moving to the technically accurate description, C++ 11 guarantees sequential consistency in the absence of data races. A data race occurs if two or more threads access the same shared resource at the same time and at least one of these threads is a writer. It is easy to see why the standard says that such a race triggers undefined behaviour, as it makes no sense to write AND read the same thing at the same time. This is succinctly summarised in clause 1.10.21 of the standard:

“The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior. [ Note: It can be shown that programs that correctly use mutexes and memory_order_seq_cst operations to prevent all data races and use no other synchronization operations behave as if the operations executed by their constituent threads were simply interleaved, with each value computation of an object being taken from the last side effect on that object in that interleaving. This is normally referred to as “sequential consistency”. However, this applies only to data-race-free programs, and data-race-free programs cannot observe most program transformations that do not change single-threaded program semantics. In fact, most single-threaded program transformations continue to be allowed, since any program that behaves differently as a result must perform an undefined operation. —end note ]”

Semaphores are one specific way of avoiding race conditions. They can be thought of as ‘special’ integers with three functions:

  1. A constructor that takes an integer argument
  2. A function wait()
  3. A function signal()

Note that there is no way to read the value of the integer once created. The reason is that the internal state of the object may change at any time, so there is no point in returning its value: it could be stale as soon as the next instruction is executed.

The ‘internal value’ of the semaphore is, however, important for understanding its operation. When a thread calls wait() and the semaphore is zero or negative, the calling thread will not proceed until it is woken up. If the semaphore is positive, it is decremented immediately and wait() will return.

This means that whenever a call to wait() returns, the semaphore has been decremented, indicating that a thread has invoked wait().

Conversely, if signal() is called, the semaphore is incremented which will cause one and only one of the waiting threads to proceed (if any are waiting).

Depending on the implementation of the semaphore a thread might busy wait in the wait() function, e.g. by using a spin lock, or it may be sent to sleep and woken up by a call to signal() later. In some books, you will find the operation of the semaphore described using queues.

This means that it is crucial what value we initialise the semaphore with as it determines how many threads can concurrently access the critical region, i.e. the one we wish to protect. Another way to look at this is that the initial value determines how many threads can call wait() before one of them will be blocked.

If the semaphore were initialised to 1 and two threads were running concurrently, one of them would call wait() first, decrement the semaphore to 0 and proceed to the critical section. The other thread, finding the semaphore to be 0, would then have to wait until the first thread calls signal(), thereby incrementing the semaphore to 1 and waking up the waiting thread. This other thread would then in turn decrement the semaphore to 0 again and proceed to the critical section.

However, if we had initialised the semaphore to 2, neither of the threads would have had to wait as decrementing 2 once still yields a value greater than 0.

If you are interested, I can highly recommend Allen Downey’s brilliant ‘The Little Book of Semaphores’ (http://greenteapress.com/semaphores/). It’s free, very clearly written and comes with lots of practical examples.

Now that we’ve assembled some background, let’s have a look at how to implement semaphores in C++ 11.

Our class will need to implement the three functions given above. Therefore we are left with the following outline:
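A sketch of that outline (empty bodies for now; member names are illustrative at this stage):

```cpp
// Outline only: the bodies are filled in over the course of the post.
class semaphore {
public:
    explicit semaphore(int /*n*/) {}  // 1. constructor taking an integer
    void wait() {}                    // 2. wait()
    void signal() {}                  // 3. signal()
};
```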

Next, we need some way to track the value of the semaphore. You can see at this point that we have to say ‘value of the semaphore’ because although thinking about a semaphore as a special integer is helpful as a concept, we are now dealing with a class type.

We know that several threads might call signal() and wait() concurrently. Therefore our internal ‘counter’ int, which we will name ‘n_’, will have to be protected. Otherwise, we will almost certainly trigger undefined behaviour.

One way to do this would be to use a mutex to lock both signal() and wait(). However, locks are expensive so we want to avoid them as much as possible. The easiest way we can make sure to avoid a race condition is by using atomics which guarantee the ‘atomicity’ of simple instructions such as increment and decrement. These are defined in the <atomic> header and are templated on a single parameter. Atomics allow us to perform a single ‘read-modify-write’ instruction that other threads cannot interfere with. Adding this to the code gives us:
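A sketch of that intermediate stage might look like this (note that wait() does not actually block anyone yet):

```cpp
#include <atomic>

class semaphore {
public:
    explicit semaphore(int n) : n_(n) {}  // initialise the atomic counter
    void wait()   { --n_; }               // atomic decrement; no blocking yet
    void signal() { ++n_; }               // atomic increment
private:
    std::atomic<int> n_;
};
```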

Note how we’ve also added the increment and decrement as described earlier as well as initialisation of the atomic counter n_ in the constructor.

Next, we need some way to make threads ‘wait’ if they call our member function wait() under the appropriate circumstances. We also need the logical complement of this operation in signal(), when threads that have been temporarily suspended need to wake up again.

One way to approach this is by using condition variables (detailed in clause 30.5 of the standard) and mutexes. Doing so has some consequences for the design of our class, as copying or moving a semaphore object should not be permitted. Although we could in principle move from an instance of type semaphore, copying would be ill-defined (think, for example, about what would happen to the condition variable), and in terms of how semaphores are used it generally makes no sense to copy them. They are, after all, synchronisation primitives designed for one specific part of our code.
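One textbook way of putting these pieces together looks roughly like this (a sketch only, not this post’s final implementation; the counter is protected by the mutex, so it need not be atomic in this variant):

```cpp
#include <condition_variable>
#include <mutex>

class semaphore {
public:
    explicit semaphore(int n) : n_(n) {}
    semaphore(const semaphore&) = delete;             // no copying
    semaphore& operator=(const semaphore&) = delete;  // no copy assignment

    void wait() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return n_ > 0; });  // sleep until the count is positive
        --n_;
    }

    void signal() {
        std::lock_guard<std::mutex> lock(m_);
        ++n_;
        cv_.notify_one();  // wake at most one waiting thread
    }

private:
    int n_;  // protected by m_, so no atomic is needed here
    std::mutex m_;
    std::condition_variable cv_;
};
```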

Following this logic, we should prevent the user from accessing the copy and move constructors of our class. This matters because the compiler will implicitly generate a copy constructor for a class if we haven’t declared one ourselves. The standard mentions this at 3.1.3:

“[ Note: In some circumstances, C++ implementations implicitly define the default constructor (12.1), copy constructor (12.8), move constructor (12.8), copy assignment operator (12.8), move assignment operator (12.8), or destructor (12.4) member functions. —end note ]”

Even though not every compiler implements a default move constructor at present (e.g. Visual C++), most compilers at least create a copy constructor.

Under C++ 98, the convention was to simply declare the copy constructor private (without defining it!). We could do the same here for the constructors/operators we wish to make inaccessible, but setting their access-specifier (which is what the standard calls public, protected and private; see clause 11.1.1-2) to private seems more like a hack.

This is why C++ 11 allows us to use the ‘delete’ keyword in this context. If like me you want to learn to speak standardese this is called a ‘deleted definition’ and is defined at clause 8.4.3:

“8.4.3 Deleted definitions [dcl.fct.def.delete]

1 A function definition of the form: attribute-specifier-seq(opt) decl-specifier-seq(opt) declarator = delete; is called a deleted definition. A function with a deleted definition is also called a deleted function.

2 A program that refers to a deleted function implicitly or explicitly, other than to declare it, is ill-formed.

[ Note: This includes calling the function implicitly or explicitly and forming a pointer or pointer-to-member to the function. It applies even for references in expressions that are not potentially-evaluated. If a function is overloaded, it is referenced only if the function is selected by overload resolution. —end note ]

3 [ Example: One can enforce non-default initialization and non-integral initialization with

struct sometype {
    sometype() = delete; // OK, but redundant
    some_type(std::intmax_t) = delete;
    some_type(double);
};

—end example ]”

8.4.3.2 is really important here because it tells us that a program that attempts to call a deleted function is ‘ill-formed’ meaning that it will not compile, therefore allowing us to spot errors as soon as possible.

In fact, clause 30.5.1 details that this is exactly how the class condition_variable deals with this problem in its implementation:

    “namespace std {
        class condition_variable {
        public:
            condition_variable();
            ~condition_variable();
            condition_variable(const condition_variable&) = delete;
            condition_variable& operator=(const condition_variable&) = delete; […]”

[Please check back for the final implementation soon – this is under review following some discussion.]