# Notes on Controllable Heliotrope Video

This post documents three challenges faced when implementing an algorithm for controllable heliotrope video as describe here. I completed this as coursework for my current degree and this post is a slightly modified version of the report I submitted.

1. # Trajectory Similarity

“Compute better paths that somehow take both image-difference AND trajectory-similarity into account.”

As no further guidance is provided in the notes, I define ‘trajectory similarity’ to mean how close the interpolated path comes to the input path which for our purposes, always is a linear element with known start and end points.

Rather than modifying the distance matrix itself, my method applies during the selection of the optimal path after the optical flow calculation.

Apart from the normal consideration of how close the interpolated path comes to the target point, this adds the factor of how close it came to the actual path that the user specified. The relative importance of those factors can be tuned by the user by adjusting a weighting coefficient, β, and by optimally applying a non-linear weighting as detailed below.

1. ## Calculation of trajectory similarity

The main input path the user provides is a piecewise-linear function. Thus, if a curve-shaped path is desired, the user can generate it by increasing the number of segments in that function. My trajectory method ensures that the synthesised paths come as close as possible to the actual ‘segments’ of the function, thereby guaranteeing a trajectory that comes close to the line input by the user originally.

Thus, the output of the optical flow for each path is evaluated once based on how close the end point comes to the desired output and additionally for how similar the trajectory was to the line segment. To do, this we calculate the average distance of all advected points in the path to the line segment. Using the appropriate weighting as given by β, we then combine this with the information of end point similarity. For every possible path through the graph, we thus calculate its final cost p:

Where

• ‘dist’ represents the distance between its two operands,
• ‘path’ one possible path in
• the set of Paths, P, which is generated by graph traversal
• and line_segment is the current piece-wise linear element

This results in a vector expressing the cost we have given to all possible paths. The algorithm then simply selects the path with the minimal cost.

Note: As the optical flow tends to result in points that are clustered together, it is more stable to calculate the distance of each point along the optical flow to a truncated version of the piecewise linear element. Thus, if the number of substeps in the path was 3, we truncate the line segment three times as follows:

This is the implementation that can found in my script.

Because of the limited amount of input images and available paths, this does not actually make a difference for the input sequence in the majority of cases unless β is very close to 1. However, tests on different datasets should reveal the effect of this additional constraint more readily.

1. ## Weighting

A user might wish to specify that trajectory similarity is only important for a specific part of the line segment. In this case, we can simply apply a weighting function to the distance values summed up for every path, which would modify our equation as follows.

Where w is a (non-linear) function yielding the appropriate weights.

Note that weighting is currently not implemented but one way the system could be expanded in future.

1. # Slow Motion

“Render slow motion interpolation based initially on the flow between real images in the collection. If synthesized images have artefacts, also consider inventing some post-processing tricks.”

My submission implements two approaches to slow motion. Both methods allow a variable number of interpolated frames.

Creating a slow motion video requires the reconstruction of frames in between the images in the selected path. Thus the problem can be seen as ‘guessing’ the values of a discrete function which describes the pixels P of each frame over time and for which samples at the start and end of the interval we are interested in are known. I call these images I1 and I2 respectively. Regardless of how sophisticated our method is, the approximation will always be a guess. However, we can improve it considerably by some inferences outlined in B.

1. ## The naïve method

The simplest way to achieve slow motion is to simply blend between the two frames that represent the discrete samples between which we wish to interpolate. The following graph illustrates how our scalar alpha which controls the transparency behaves for I1 and I2 in a time interval:

As expected, this method produces ghosting artefacts that are very noticeable.

1. ## Slow motion based on optical flow

Using information from the optical flow, we can warp the image on a per-pixel basis to create the missing information between frames.

Thus, for every time interval, we can warp the image at I1 by the results of the optical flow divided by a factor that represents the current sub-step of the frame we wish to interpolate.

While this produces satisfactory results in some scenarios, this method encounters significant problems if an object is for example rotated by a significant amount between I1 and I2.

This is because we are warping the image only based on I1. To overcome this limitation, I have implemented a backwards-forwards warp that generates half the required samples from I1 and half from I2 (using an inverse transform of the optical flow results at I2). Thus, most artefacts of this kind can be eliminated resulting in smooth, naturalistic looking slow motion.

I also implemented one post-processing technique. To avoid black seems at the borders of the image which result from information not being available ‘outside’ the area covered by the pixels at each frame, any pixel for which complementary information cannot be found is set to its original value. As the background in the current image is static anyway, this technique works but might need to be amended for a different type of input sequence with motion at its edges.

Note: In future versions, it might be worth considering implementing a per-pixel motion blur based on the optical flow which would very straightforward given the data already available.

1. # Custom Video

“Produce and use your own data.”

In addition to the above, I have produced my own data for the system. The motivation for this was the creation of stylised motion, specifically creating a system that would allow a ball to be moved across a ‘drawing board’. The stylisation results from the fact that diagonal motion would be disallowed for the ball and would yield motion similar to a shape like this:

Settings this up proved far more difficult than I had original expected as enough motion samples needed to be provided for the ball with the added constraint that diagonal motion had to be more expensive in terms of frame-to-frame difference than non-diagonal motion.

As this extra bit of control was need, I opted to produce the material digitally. The following screen shot illustrates scene set-up in Maya:

Although this setup might seem rather simplistic, exact measurement of distances and timing of the animation was required. The animation of the ball can be described as looping through each line using the following pattern:

Figure 2: The animation of the ball

To move across the entire board, I rendered roughly 300 frames:

Figure 3: A Frame from the drawing board footage

Note that I added ambient occlusion to make the scene less artificial, i.e. to provide the optical flow with some difficulties.

Experiments carried out with the scene proved to be successful as the exact predicted behaviour of the ball is produced (Please see the readme file for information on how to run this configuration of the script).

Figure 4: Left: Desired Path (Red) & Calculated Path Outcomes (Blue); Right: Selected Trajectory (time-blend)

Thus, the footage created provides a virtual drawing board in which the ‘pencil’ can only move horizontally and vertically.

Note: I additionally created and rendered a sequence of a waving flag textured with a checkerboard pattern (in order to help the optical flow) to test whether any meaningful results could be yielded by the method, for example for use in the far background of scenes.

As can be seen from the images, the wind force is animated so information is provided of the flag moving all directions outlined in figure 4. Unfortunately this experiment was unsuccessful but I have included the footage for completeness.

Figure 5: A frame from the flag footage

# The ‘Type of Lambda’

In my first post about functional programming styles I sort of skipped over what exactly a lambda expression ‘becomes’ when it’s evaluated. As this is generally not that well documented I thought it would be worth talking about it briefly.

The Standard (N3242) starts out by stating what most people would assume at clause 5.1.2.1:

“Lambda expressions provide a concise way to create simple function objects.”

So we know that the result of evaluating a lambda expression is a function object. This is called the ‘closure object’ and is a prvalue:

“The evaluation of a lambda-expression results in a prvalue temporary. This temporary is called the closure object.” (5.1.2.2)

As a reminder, ‘prvalue’ stands for pure rvalue. For all practical purposes a prvalue IS an rvalue, it’s just that the standard is being exact here by differentiating two different kinds of rvalues, i.e. xvalues and prvalues.

That’s because there is a subtle difference between an object “near the end of its lifetime” (3.10.1) and something that, for example, ONLY appears on the right hand side of an expression. As there can be no question that the latter is actually an rvalue, the standard calls this a pure rvalue or prvalue:

“An xvalue (an “eXpiring” value) also refers to an object, usually near the end of its lifetime (so that its resources may be moved, for example). […]

— An rvalue (so called, historically, because rvalues could appear on the right-hand side of an assignment expression) is an xvalue, a temporary object (12.2) or subobject thereof, or a value that is not associated with an object.

— A prvalue (“pure” rvalue) is an rvalue that is not an xvalue. [ Example: The result of calling a function whose return type is not a reference is a prvalue. The value of a literal […] or true is also a prvalue. —end example ]” (3.10.1)

The intuition behind an xvalue is that that it could seen as BOTH an lvalue and an rvalue in context. Consider this snippet of code:

Our object foo is of automatic storage duration which means that it will be deleted at the end of the block. This makes it safe to ‘move’ from foo, i.e. to convert it to an rvalue using std::move. However, our object has a name, ‘foo’, which makes it an lvalue (remember that the rule of thumb is that anything that has a name is an lvalue). It is only because it is nearing the end of its lifetime that we can use it like an rvalue (provided of course we do not access/modify it after having treated it like an rvalue). Thus, foo is an lvalue that can be used like an rvalue, which the standard calls an xvalue. This is expressed in the following diagram found at 3.10.1:

The gist of this is that lambda expressions return temporary objects in the same way that objects are returned by value from functions, which makes them pure rvalues.

Now that we know what kind of thing is being returned by lambda expression, we have to look at its type, i.e. what exactly we are getting. The standard tells us that this type is unique as well as implementation defined, so let’s call it ‘type lambda’ for now (Although if you were speaking standardese you would call this the ‘closure type’).

“The type of the lambda-expression (which is also the type of the closure object) is a unique, unnamed non-union class type — called the closure type — whose properties are described below. […] An implementation may define the closure type differently from what is described below provided this does not alter the observable behavior of the program [..]” (5.1.2.3)

The last line of this quote is a promise that there are certain things about type lambda that will always be the same and this is what we can exploit when using it. The most important of those things for the purposes of this discussion is that type lambda will have an operator () that matches our definition in the lambda expression:

“The closure type for a lambda-expression has a public inline function call operator (13.5.4) whose parameters and return type are described by the lambda-expression’s parameter-declaration-clause and trailing return-type respectively. This function call operator is declared const (9.3.1) if and only if the lambda expression’s parameter-declaration-clause is not followed by mutable. It is neither virtual nor declared volatile.“ (5.1.2.5)

This promise is important because it tells us how we can use a type lambda. Let’s assume we are dealing with a simple lambda expression that simply adds its two integer arguments and returns the result by value.

First of all, you could simply use auto like this:

The expression statement ‘X(11, 22)’ is guaranteed to work because of the promise the standard makes, regardless of what the compiler deduces auto to be.

This also means that we can use a std::function of the appropriate type here instead of auto:

Similarly, if you were writing a function that takes a lambda and then applies it to some arguments, simply use templates and then call the overloaded operator:

As an aside, this also gives you a compile-time check whether the lambda is of the correct type which will help to identify errors as early as possible. This is because the compiler knows exactly what operator () will look like for the type you are trying to pass to the function.

I hope this helps clarify some things. As always, thx for reading 🙂

# Implementing Semaphores in C++11

[NOTE: Thanks to everyone for all the great feedback on this. Someone flagged a potential issue with the final implementation at the end, so I’ve removed it it temporarily until I get a chance to review it in detail next weekend.]

C++ 11 added a lot of advanced constructs used in parallel programming to the standard library. While atomics, mutexes, condition variables and threads are all available, there is no ‘out of the box’ implementation of semaphores. This is not a defect at all as it is good to keep the standard library from becoming too bloated. As I will show in this post, the committee actually provided all the building blocks necessary for us to create our own synchronisation tools. In this tutorial, we will be using mutexes, atomics and condition variables to implement a general-purpose semaphore class ourselves.

If you are already familiar with semaphores and concurrency, feel free to skip the next section which is quick refresher on semaphores and why they might be useful.

Most modern computers from high-end desktops to mobile phones (excluding the simplistic ones in microwaves etc.) are capable of executing code in parallel. However, we as software engineers keep writing programs as if the machine we are dealing with was sequential. This has the obvious reason that with the complexity that hardware has reached and the non-deterministic nature of executing programs on a multi-core CPU it would be impossible for us to write or understand code explicitly written for that hardware. Therefore, we leave it to the compiler to convert our sequential program into one that can be executed in parallel. As compilers generally cannot know how data is shared between threads, we sometimes have to intervene to protect bits of the code that should only be executed sequentially. We do this by using things such as mutexes, condition variables, monitors, semaphores, etc.

These rather abstract and vague observations express how the C++ 11 memory model is designed, i.e. code will behave as if the program was executed sequentially, as long as threads are correctly synchronised. Moving to the technically accurate description, C++ 11 guarantees sequential consistency in the absence of race conditions. A race conditions occurs if 2 or more threads access the same shared resource at the same time and at least one these threads is a writer. It is easy to see why the standard says that such a race triggers undefined behaviour as it makes no sense to write AND read the same thing at the same time. This is succinctly summarised in clause 1.10.21 of the standard:

“The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior. [ Note: It can be shown that programs that correctly use mutexes and memory_order_- seq_cst operations to prevent all data races and use no other synchronization operations behave as if the operations executed by their constituent threads were simply interleaved, with each value computation of an object being taken from the last side effect on that object in that interleaving. This is normally referred to as “sequential consistency”. However, this applies only to data-race-free programs, and data-race-free programs cannot observe most program transformations that do not change single-threaded program semantics. In fact, most single-threaded program transformations continue to be allowed, since any program that behaves differently as a result must perform an undefined operation. —end note ]”

Semaphores are one specific way of avoiding race conditions. They can be thought of ‘special’ integers with three functions:

• 1. A constructor that takes a integer argument
• 2. A function wait()
• 3. A function signal()

Note that there is no way to read the value of the integer once created. The reason is that the internal state of the object may change at any time, so there is no point in returning a variable: it would be unreliable as soon as the next instruction is executed.

The ‘internal value’ of the semaphore is however important to understand its operation. When a thread calls wait() and the semaphore is negative, the calling thread will not proceed until it is awakened. If the semaphore is positive, it is decremented immediately and wait() will return.

This means that whenever a call to wait() returns, the semaphore is decremented, indicating that a thread has invoked wait().

Conversely, if signal() is called, the semaphore is incremented which will cause one and only one of the waiting threads to proceed (if any are waiting).

Depending on the implementation of the semaphore a thread might busy wait in the wait() function, e.g. by using a spin lock, or it may be sent to sleep and woken up by a call to signal() later. In some books, you will find the operation of the semaphore described using queues.

This means that it is crucial what value we initialise the semaphore with as it determines how many threads can concurrently access the critical region, i.e. the one we wish to protect. Another way to look at this is that the initial value determines how many threads can call wait() before one of them will be blocked.

If the semaphore was initialised to 1 and there were two threads running concurrently either one of them will call wait() first, decrement the semaphore to 0 and proceed to the critical section. This will cause the other thread to wait (which will find the semaphore to be 0) until the first thread calls signal(), thereby waking up the waiting thread and incrementing the semaphore to 1. This other thread will then in turn decrement the semaphore to 0 again and proceed to the critical section.

However, if we had initialised the semaphore to 2, neither of the threads would have had to wait as decrementing 2 once still yields a value greater than 0.

If you are interested, I can highly recommend Allen Downey’s brilliant ‘The Little Book of Semaphores’ (http://greenteapress.com/semaphores/). It’s free, very clearly written and comes with lots of practical examples.

Now that we’ve assembled some background, let’s have a look at how to implement semaphores in C++ 11.

Our class will need to implement the three functions given above. Therefore we are left with the following outline (please click on the pics to expand them):

Next, we need some way to track the value of the semaphore. You can see at this point that we have to say ‘value of the semaphore’ because although thinking about a semaphore as a special integer is helpful as a concept, we are now dealing with a class type.

We know that several threads might call signal() and wait() concurrently. Therefore our internal ‘counter’ int which we will name ‘n_’ will have to be protected. Otherwise, we will almost definitely trigger undefined behaviour.

One way to do this would be to use a mutex to lock both signal() and wait(). However, locks are expensive so we want to avoid them as much as possible. The easiest way we can make sure to avoid a race condition is by using atomics which guarantee the ‘atomicity’ of simple instructions such as increment and decrement. These are defined in the <atomic> header and are templated on a single parameter. Atomics allow us to perform a single ‘read-modify-write’ instruction that other threads cannot interfere with. Adding this to the code gives us:

Note how we’ve also added the increment and decrement as described earlier as well as initialisation of the atomic counter n_ in the constructor.

Next, we need some way to make threads ‘wait’ if they call our member function wait() under the appropriate circumstances. We also the need the logical complement of this operation in signal() when threads that have been temporarily suspended need to wake up again.

One way to approach this is by using condition variables (detailed in clause 30.5 of the standard) and mutexes. Doing so should have some consequences for the design of our class as copying or moving a semaphore object should not be permitted. Although we could in principle move from an instance of type semaphore, copying would be ill defined (Thinking for example about what would happen to the condition variable) and it generally does not make sense in terms of using semaphores to copy them. They are after all synchronisations primitives designed for one specific part of our code.

Following this logic, we should prevent the user being able to access the copy and move constructors of our class. This is necessary as the compiler can generate a copy constructor for a class we define by default if we haven’t done so. The standard mentions this at 3.1.3:

“Note: In some circumstances, C++ implementations implicitly define the default constructor (12.1), copy constructor (12.8), move constructor (12.8), copy assignment operator (12.8), move assignment operator (12.8), or destructor (12.4) member functions. —end note ]

Even though not every compiler implements a default move constructor at present (e.g. Visual C++), most compilers at least create a copy constructor.

Under C++ 98, the convention was to simply declare the copy constructor private (without defining it!). We could do the same here for the constructors/operators we wish to make inaccessible but setting their access-specifier (which is what the standard calls public, protected and private; see clause 11.1.1-2) to private seems more like a hack.

This is why C++ 11 allows us to use the ‘delete’ keyword in this context. If like me you want to learn to speak standardese this is called a ‘deleted definition’ and is defined at clause 8.4.3:

“8.4.3 Deleted definitions [dcl.fct.def.delete]

1 A function definition of the form: attribute-specifier-seqopt decl-specifier-seqopt declarator = delete; is called a deleted definition. A function with a deleted definition is also called a deleted function.

2 A program that refers to a deleted function implicitly or explicitly, other than to declare it, is ill-formed.

[ Note: This includes calling the function implicitly or explicitly and forming a pointer or pointer-to-member to the function. It applies even for references in expressions that are not potentially-evaluated. If a function is overloaded, it is referenced only if the function is selected by overload resolution. —end note ]

3 [ Example: One can enforce non-default initialization and non-integral initialization with

struct sometype {

sometype() = delete; // OK, but redundant

some_type(std::intmax_t) = delete;

some_type(double);

};

—end example ]”

8.4.3.2 is really important here because it tells us that a program that attempts to call a deleted function is ‘ill-formed’ meaning that it will not compile, therefore allowing us to spot errors as soon as possible.

In fact, cause 30.5.1 details that this is exactly how the class condition_variable deals with this problem in its implementation:

“namespace std {

class condition_variable {

public:

condition_variable();

~condition_variable();

condition_variable(const condition_variable&) = delete;

condition_variable& operator=(const condition_variable&) = delete; […]”

[Please check back for the final implementation soon – this is under review following some discussion.]

# Functional Programming Styles in C++ (1 of n) : ‘Partial Application’ using ‘std::bind’

While I am definitely a supporter of the imperative style of programming I have found some ideas from functional programming extremely useful in producing clean code. I will demonstrate some of these ideas in this series of posts (please click on the images to expand them).

The C++ 11 standard expanded the capability to use functional styles directly in standard C++, especially in the classes found in the <functional> header (based on the excellent equivalent from boost) and the inclusion of lambda expressions directly into the language. As there are many excellent tutorials about lambda expressions out there, I will not go into detail about them here except to note their connection to the topics we are concerned with, namely that they have to do with functions.

For everyone who has never programmed in a functional language before, it is worth noting that purely functional languages do not have the concept of objects or side effects at all. In fact, the user has no access to memory and everything is defined using functions and recursion as well as a usually limited set of primitive types and more complex but also pre-defined ones such as tuples and lists (e.g. Miranda).

Technically, what C++ understands as a lambda expression is not the same as what functional programming languages understand by that term. In C++, lambda expressions are a way to write inline definitions of functions objects (or ‘functors’). These get handled by C++ in exactly the same way as ‘regularly’ defined functors and have nothing to do with the lambda calculus in a formal way. However, they derive their name from the idea that there should be a short-hand notation for the type of things they are meant to be used for. This mimics the terse syntactic style of many functional languages.

Let’s look at an example. The STL sorting algorithm ‘std::sort’ can take a predicate as an optional third argument. Thus, if I wanted to sort an array from largest to smallest (ignoring std::greater and std::less for now), I would either have to write a custom function or a functor like this:

The problem is that the code for either of these is quite verbose for what we are trying to achieve and might be located in a completely different block than the one we are in now. This will make it harder for anyone to understand and make the code bloated. What C++ learned from the functional style is that an inline definition should be possible here. Lambda expressions are an awesome way to do this and will even become truly polymorphic in C++ 14!

Using lambda expressions our code becomes the following:

While lambda expressions have deservedly had the most attention of all functional elements in C++ 11, we are concerned with the less famous ones here.

Now that the introduction is out of the way let’s turn to the main concern of this post: partial application.

Functional programming languages usually let a programmer define functions in a curried style. This means that rather than having a function that has to be called with a pre-defined number of parameters we can call it with just one argument to start with and supply the others later.

Consider the following C++ function:

```int add(int x, int y) {
return x + y;
}```

An equivalent curried function would look like this in Miranda:

`add x y = x + y`

The first thing that might look strange here is the absence of brackets around the function parameters x and y on the left hand side of the expression. This reflects the idea that this function can be evaluated when only one argument is present, i.e. the function is curried.

Actually, that last sentence was slightly incorrect in that the function can only be partially evaluated or ‘partially applied’ because the + operator needs to have a right and a left-hand side! So what does the function add return if we partially apply it to say, 6?

The answer is easy: Another function with expects only one more argument that looks like this:

`f x = x + 6`

This doesn’t work for us because (non-variadic) functions in C++ expect a fixed number of arguments and this is fundamental to the imperative style and memory model. Hence we cannot do ‘partial application’. But what we can do is save some arguments and then return something that expects the rest before the function gets called at all, i.e. we do not evaluate our ‘add’ function until both arguments are provided. In other words, we can bind arguments to certain function parameters. In C++ 11, we can do this using std::bind with the help of some placeholders:

In this example, we bind a specialisation of ‘add’ to 6 (for the ‘right’ parameter) which gives us a function called bound_function which expects one more argument. We specify this in the bind expression by using one of the placeholders defined in the std::placeholders namespace, namely ‘_1’. These are uglified by a leading underscore (highlighting once more that no one should use similar names for user-defined variables) and you can have any number of these in your bind expression (depending on the function you want to bind).

Because this process goes against the way C++ normally likes to think, there are some problems with this approach. First of all the actual type of bound_function that the compiler will deduce when you’re using auto will be very complicated. Secondly, operator () of that type has a variadic templated parameter, i.e. it takes any number of arguments. However, only the first of these gets evaluated and the rest discarded so the following compiles and does not produce incorrect results:

While this produces the correct result your code will be harder for someone else to understand, especially as being able to supply more than the expected number of arguments might lead someone else to think your code does something which it actually does not do. Worst of all, neither the compiler nor the runtime will ever complain!

Now let’s have a look at the type that std::bind returns. You might need that if you are trying to define a function whose return type is the result of a std::bind expression. One such case is the famous ‘cancel’ ‘combinator’ function (I know combinators aren’t relevant for C++ but it’s a good example).

If you ask the compiler for the type of our cancel function (specialised for int in this particular case) it will tell you that you are dealing with:

`std::_Bind<true,T,T (__cdecl *const )(T,T),X &,std::_Ph<1> &>' to 'int (__cdecl *)(int,int)`

That’s not something we want to use! More troubling is that the type of std::bind is actually unspecified, i.e. implementation defined! So what is the solution here?

As always, members of the standard’s committee were aware of this problem and gave us a really smart answer with C++ 11. And the added bonus of that solution is that it gets rid of the problem of not having the number of arguments defined.

The solution is to assign the result of a bind expression to a std::function of the appropriate type:

Because ‘cancel’ explicitly returns something of type function, the auto in main will also be of that type.

Note that we are using the new ‘trailing-return-type’ syntax here because it would otherwise be very tedious to get template argument deduction to correctly figure out the return type of the ‘func’ parameter. This way, we can use a decltype expression with the actual parameters to figure out the return type of the itself templated argument to this function more easily. The standard (N3242) highlights this in clause 8.3.12:

“A trailing-return-type is most useful for a type that would be more complicated to specify before the declarator-id:

template <class T, class U> auto add(T t, U u) -> decltype(t + u);

rather than

template <class T, class U> decltype((*(T*)0) + (*(U*)0)) add(T t, U u);

With this knowledge, we can now safely and transparently pass around bound functions and use them in a straightforward manner.

## Behind the Magic: “Pacific Rim” Hong Kong Battle

### Video

ILM has just posted a couple of awesome making of videos of their spectacular work on Pacific Rim! Worth a look as always, those guys are a real inspiration! 🙂

# A Practical Tutorial on C# – C++ Interop using (mostly) Standard C++

Sometimes you will find yourself wanting to call a C++ dll from C# or have part of your project written in C++ and the other part in C#. This tutorial is about how to call a native dll from C# in a straightforward manner and how to handle callbacks from C++ (please click on the pics to expand them).

While I was working the ‘medi-board’ project for University College London it was decided fairly early on that we needed some kind of interoperability between C# and C++. Specifically, we needed to achieve the following things:

1. The native object needed to be used like any C# object (with a clean interface)
2. We needed bi-directional data exchange between C++ and C#
3. We needed a C# delegate function to be invoked as a callback function from C++

While there are some really good tutorials out there, I feel there is general lack of information on how to accomplish this in a simple, straightforward manner. I hope this tutorial can help clarify some things.

The first decision to make is whether to accomplish the majority of the interop functionality from the C# of the C++ side. We opted for the former because:

• We wanted to use standard C++ and not rely on visual C++ as the native code had to be cross-platform.
• We knew the C# code would never be used in any other interop and could thus be specific whereas specialising on the C++ side would eventually lead to having multiple specialisations in the code

First, it has to be noted that because Microsoft’s CLR provides such a rich set of functionality, there are many ways to accomplish interoperability between C# and C++. Anyone interested should have a look here Microsoft’s Tutorials on this.

Our starting point is a solution with the following two projects:

•  ‘Interop’: A C# Console Application
• ‘Library’: An empty Visual C++ project (It’s not going to visual C++ only but that’s what the template is called)

Which should look something like this in your visual studio (I’m using VS 2012 here):

Before we actually do some coding, let’s examine our general strategy. The idea is that there is a static interface both on the managed and the native side. The native wrapper exposes the object/functions we want to use in such a way that C# can understand them. The managed wrapper ‘imports’ that exposed C++ interface in a static way. Because we want to be able to use our C++ object like any C# object, we encapsulate the managed wrapper further in a standard C# class. It is important to note that with this approach C# has ownership of the C++ instance and hence is responsible for deleting it appropriately.

On the C++ side, we add a simple class called ‘NativeObject’ to test our code:

The implementation of this object is trivial for now:

Next, we need to create the native wrapper that exposes functionality so it will be visible in the DLL’s interface.

While most of this should look very familiar, note how we had to wrap the constructor and destructor of the object in some extra functions and that whenever we wish to invoke a member function we need to also specify the instance.

Also we had to explicitly specify linkage by using ‘extern “C”’. Those not interested in the details can safely skip this section. The C++ standard (I’m using version N3242) states in clause 7.5 that:

“All function types, function names with external linkage, and variable names with external linkage have a language linkage […]. The default language linkage of all function types, function names, and variable names is C++ language linkage. Two function types with different language linkages are distinct types even if they are otherwise identical.

2 Linkage (3.5) between C++ and non-C++ code fragments can be achieved using a linkage-specification:

extern string-literal { declaration-seqopt}

extern string-literal declaration

The string-literal indicates the required language linkage. This International Standard specifies the semantics for the string-literals “C” and “C++”. “

This is basically saying that for every function for which we do not explicitly specify another linkage, the compiler assumes that we meant “C++”, but that we have the option change this default. The reason why this matters for our interop is that specifying C-linkage means that the compiler is not allowed to do some stuff that other languages do not understand, most importantly ‘name-mangling’. If you’re overloading a function or have the same name declared in different namespaces, the compiler will eventually have to use both of them in the same parse tree and namespaces do not exist in that.

So in order to differentiate between those different names, additional information such as the namespace that name was declared in is encoded into the function name used internally.

As we will see below in order for the interop to work, C# has to know the exact name of the native function – so we cannot allow name mangling. (See IBM’s notes on this)

Using the ‘dumpbin’ utility (See Microsoft dumpbin’s help page) under windows, we can have a look at this in action. Open your ‘VS2012 x64 Native Tools Command Prompt’ and navigate to the directory .dll is in.

Then use the command “dumpbin /EXPORTS Library.dll” which will give you the dll exports. In a version with the ‘extern “C”’ deleted, the output looks like:

Note how the names are mangled. If I add the ‘extern “C”’ linkage specification again, the exports looks like:

This is exactly what we want and presents C# an interface it can understand.

The fact that we pass a pointer to every function of our wrapper highlights again that C# has ownership of our object handle. The implementation of these functions is therefore straightforward:

And that’s it – We’ve written all the code we need on the C++ side. In order to create the managed wrapper, we have to use .Net’s ‘platform invoke’ or p-invoke functionality (If you want to know more about it, read this article).

All we have to do to use it is to import ‘System.Runtime.InteropServices’ and provide functions that match exactly those defined in our native wrapper. Note how we can represent the NativeObject handles with the type ‘IntPtr’ here.

This is actually all we have to do to get the interop to work. However, our goal was to encapsulate this rather ugly static interface in a nice C# object. To do so, we just add another class called ManagedObject that has a private IntPtr and calls the static functions of ‘Managed_Wrapper’:

The only thing that is absolutely crucial here is that delete() is called before this object goes out of scope to prevent memory leaks. This is because garbage collection will only delete the IntPtr as it has no knowledge of that native instance it points to (and C# has no control over the native heap anyway).

In order to use this new class, consider the following example:

This will print 11 followed by 22 as expected.

In order to make this compile we will have to adjust the following settings:

C++:

• Change the configuration type to .dll:

• Change the Platform to x64 in the Configuration Manager:

C#:

• Set platform target in the Project settings to x64

• Make sure the set the C# project as the startup project that ‘Build’ is enabled in the configuration manager

If you attempt to run the code you will most likely receive this error:

This is because by default, C++ and C# projects have a different output directory for the files they create. To avoid having to copy stuff by hand it’s useful to use a post-build-event here, i.e. some simple operation that takes place after your code is compiled. In our case, we want to use it to copy the .dll to the correct directory, namely that one that our .exe file from the C# project is in.

To set this up navigate to the following section of the C++ project’s properties and edit the ‘command line’:

The command

xcopy /d /y “\$(SolutionDir)x64\Debug\*.*” “\$(SolutionDir)Interop\bin\Debug\”

Tells VS to copy all files from the first directory to the second if they are a ‘newer’ version of the ones currently in that directory.

Note that if you run into problems with this, please consider the following common sources of errors:

• Your projects/folder hierarchy is different to mine
• Configuration of 32/64bit does not match. It is vital to get this right because the pointers we are passing around have to come from the same address space
• Spaces in Folder names (Yes, the system is fairly archaic)
• If you’ve changed from debug to release, this command might fail if the ‘Release’ folder does not already exist, i.e. it won’t create folders for you

If you the program, it produces the following output as expected:

Finally, let’s consider a few special cases (I will use some code examples from the ‘medi-board’ project here:

Passing Arrays/Strings

I will show passing Strings as char arrays so this technique is applicable to all arrays. Again, there are ‘cleaner’ ways to do this when you’re writing visual C++ but we want to keep our C++ code as close to standard C++ as possible.

On the native side we simply have:

And in C#

Note how we pass the length separately so we know how long the array is on the C++ side (please note that this has the same security implications as C arrays). As an aside, a C# String as the ‘ToCharArray’ function that makes it really easy to use this method.

Callback functions

Callback functions are somewhat tricky to get right. Our approach was to register a function ptr with our native libraries that could then be used as a callback:

With ‘__event_callback’ being defined as:

We need to use __stdcall to make sure we know who cleans the stack after the function call. This is Windows specific feature (please read more about it here if you’re interested).

On the C# side, this gives us:

We always pass a delegate here to make a programmer able to use the C# function in a generic way. If this code seems strange, please make sure you have looked carefully at delegate functions first as this is the hardest part of understanding this bit of code. Everything else is merely using some pre-defined features of the CLR.

Hope this tutorials helps and let me know what you think! Many thx for reading 🙂