Implementing Semaphores in C++11

[NOTE: Thanks to everyone for all the great feedback on this. Someone flagged a potential issue with the final implementation at the end, so I’ve removed it it temporarily until I get a chance to review it in detail next weekend.]

C++ 11 added a lot of advanced constructs used in parallel programming to the standard library. While atomics, mutexes, condition variables and threads are all available, there is no ‘out of the box’ implementation of semaphores. This is not a defect at all as it is good to keep the standard library from becoming too bloated. As I will show in this post, the committee actually provided all the building blocks necessary for us to create our own synchronisation tools. In this tutorial, we will be using mutexes, atomics and condition variables to implement a general-purpose semaphore class ourselves.

If you are already familiar with semaphores and concurrency, feel free to skip the next section which is quick refresher on semaphores and why they might be useful.

Most modern computers from high-end desktops to mobile phones (excluding the simplistic ones in microwaves etc.) are capable of executing code in parallel. However, we as software engineers keep writing programs as if the machine we are dealing with was sequential. This has the obvious reason that with the complexity that hardware has reached and the non-deterministic nature of executing programs on a multi-core CPU it would be impossible for us to write or understand code explicitly written for that hardware. Therefore, we leave it to the compiler to convert our sequential program into one that can be executed in parallel. As compilers generally cannot know how data is shared between threads, we sometimes have to intervene to protect bits of the code that should only be executed sequentially. We do this by using things such as mutexes, condition variables, monitors, semaphores, etc.

These rather abstract and vague observations express how the C++ 11 memory model is designed, i.e. code will behave as if the program was executed sequentially, as long as threads are correctly synchronised. Moving to the technically accurate description, C++ 11 guarantees sequential consistency in the absence of race conditions. A race conditions occurs if 2 or more threads access the same shared resource at the same time and at least one these threads is a writer. It is easy to see why the standard says that such a race triggers undefined behaviour as it makes no sense to write AND read the same thing at the same time. This is succinctly summarised in clause 1.10.21 of the standard:

“The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior. [ Note: It can be shown that programs that correctly use mutexes and memory_order_- seq_cst operations to prevent all data races and use no other synchronization operations behave as if the operations executed by their constituent threads were simply interleaved, with each value computation of an object being taken from the last side effect on that object in that interleaving. This is normally referred to as “sequential consistency”. However, this applies only to data-race-free programs, and data-race-free programs cannot observe most program transformations that do not change single-threaded program semantics. In fact, most single-threaded program transformations continue to be allowed, since any program that behaves differently as a result must perform an undefined operation. —end note ]”

Semaphores are one specific way of avoiding race conditions. They can be thought of ‘special’ integers with three functions:

  • 1. A constructor that takes a integer argument
  • 2. A function wait()
  • 3. A function signal()

Note that there is no way to read the value of the integer once created. The reason is that the internal state of the object may change at any time, so there is no point in returning a variable: it would be unreliable as soon as the next instruction is executed.

The ‘internal value’ of the semaphore is however important to understand its operation. When a thread calls wait() and the semaphore is negative, the calling thread will not proceed until it is awakened. If the semaphore is positive, it is decremented immediately and wait() will return.

This means that whenever a call to wait() returns, the semaphore is decremented, indicating that a thread has invoked wait().

Conversely, if signal() is called, the semaphore is incremented which will cause one and only one of the waiting threads to proceed (if any are waiting).

Depending on the implementation of the semaphore a thread might busy wait in the wait() function, e.g. by using a spin lock, or it may be sent to sleep and woken up by a call to signal() later. In some books, you will find the operation of the semaphore described using queues.

This means that it is crucial what value we initialise the semaphore with as it determines how many threads can concurrently access the critical region, i.e. the one we wish to protect. Another way to look at this is that the initial value determines how many threads can call wait() before one of them will be blocked.

If the semaphore was initialised to 1 and there were two threads running concurrently either one of them will call wait() first, decrement the semaphore to 0 and proceed to the critical section. This will cause the other thread to wait (which will find the semaphore to be 0) until the first thread calls signal(), thereby waking up the waiting thread and incrementing the semaphore to 1. This other thread will then in turn decrement the semaphore to 0 again and proceed to the critical section.

However, if we had initialised the semaphore to 2, neither of the threads would have had to wait as decrementing 2 once still yields a value greater than 0.

If you are interested, I can highly recommend Allen Downey’s brilliant ‘The Little Book of Semaphores’ (http://greenteapress.com/semaphores/). It’s free, very clearly written and comes with lots of practical examples.

Now that we’ve assembled some background, let’s have a look at how to implement semaphores in C++ 11.

Our class will need to implement the three functions given above. Therefore we are left with the following outline (please click on the pics to expand them):

Next, we need some way to track the value of the semaphore. You can see at this point that we have to say ‘value of the semaphore’ because although thinking about a semaphore as a special integer is helpful as a concept, we are now dealing with a class type.

We know that several threads might call signal() and wait() concurrently. Therefore our internal ‘counter’ int which we will name ‘n_’ will have to be protected. Otherwise, we will almost definitely trigger undefined behaviour.

One way to do this would be to use a mutex to lock both signal() and wait(). However, locks are expensive so we want to avoid them as much as possible. The easiest way we can make sure to avoid a race condition is by using atomics which guarantee the ‘atomicity’ of simple instructions such as increment and decrement. These are defined in the <atomic> header and are templated on a single parameter. Atomics allow us to perform a single ‘read-modify-write’ instruction that other threads cannot interfere with. Adding this to the code gives us:

Note how we’ve also added the increment and decrement as described earlier as well as initialisation of the atomic counter n_ in the constructor.

Next, we need some way to make threads ‘wait’ if they call our member function wait() under the appropriate circumstances. We also the need the logical complement of this operation in signal() when threads that have been temporarily suspended need to wake up again.

One way to approach this is by using condition variables (detailed in clause 30.5 of the standard) and mutexes. Doing so should have some consequences for the design of our class as copying or moving a semaphore object should not be permitted. Although we could in principle move from an instance of type semaphore, copying would be ill defined (Thinking for example about what would happen to the condition variable) and it generally does not make sense in terms of using semaphores to copy them. They are after all synchronisations primitives designed for one specific part of our code.

Following this logic, we should prevent the user being able to access the copy and move constructors of our class. This is necessary as the compiler can generate a copy constructor for a class we define by default if we haven’t done so. The standard mentions this at 3.1.3:

“Note: In some circumstances, C++ implementations implicitly define the default constructor (12.1), copy constructor (12.8), move constructor (12.8), copy assignment operator (12.8), move assignment operator (12.8), or destructor (12.4) member functions. —end note ]

Even though not every compiler implements a default move constructor at present (e.g. Visual C++), most compilers at least create a copy constructor.

Under C++ 98, the convention was to simply declare the copy constructor private (without defining it!). We could do the same here for the constructors/operators we wish to make inaccessible but setting their access-specifier (which is what the standard calls public, protected and private; see clause 11.1.1-2) to private seems more like a hack.

This is why C++ 11 allows us to use the ‘delete’ keyword in this context. If like me you want to learn to speak standardese this is called a ‘deleted definition’ and is defined at clause 8.4.3:

“8.4.3 Deleted definitions [dcl.fct.def.delete]

1 A function definition of the form: attribute-specifier-seqopt decl-specifier-seqopt declarator = delete; is called a deleted definition. A function with a deleted definition is also called a deleted function.

2 A program that refers to a deleted function implicitly or explicitly, other than to declare it, is ill-formed.

[ Note: This includes calling the function implicitly or explicitly and forming a pointer or pointer-to-member to the function. It applies even for references in expressions that are not potentially-evaluated. If a function is overloaded, it is referenced only if the function is selected by overload resolution. —end note ]

3 [ Example: One can enforce non-default initialization and non-integral initialization with

struct sometype {

sometype() = delete; // OK, but redundant

some_type(std::intmax_t) = delete;

some_type(double);

};

—end example ]”

8.4.3.2 is really important here because it tells us that a program that attempts to call a deleted function is ‘ill-formed’ meaning that it will not compile, therefore allowing us to spot errors as soon as possible.

In fact, cause 30.5.1 details that this is exactly how the class condition_variable deals with this problem in its implementation:

    “namespace std {

class condition_variable {

public:

condition_variable();

~condition_variable();

condition_variable(const condition_variable&) = delete;

condition_variable& operator=(const condition_variable&) = delete; […]”

[Please check back for the final implementation soon – this is under review following some discussion.]

Advertisements

One thought on “Implementing Semaphores in C++11

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s