The ‘Type of Lambda’

In my first post about functional programming styles I sort of skipped over what exactly a lambda expression ‘becomes’ when it is evaluated. As this is generally not well documented, I thought it would be worth talking about it briefly.

The Standard (N3242) starts out by stating what most people would assume at clause 5.1.2.1:

“Lambda expressions provide a concise way to create simple function objects.”

So we know that the result of evaluating a lambda expression is a function object. This is called the ‘closure object’ and is a prvalue:

“The evaluation of a lambda-expression results in a prvalue temporary. This temporary is called the closure object.” (5.1.2.2)

As a reminder, ‘prvalue’ stands for pure rvalue. For all practical purposes a prvalue IS an rvalue; the standard is simply being precise here by distinguishing between two kinds of rvalues, namely xvalues and prvalues.

That’s because there is a subtle difference between an object “near the end of its lifetime” (3.10.1) and something that, for example, ONLY appears on the right hand side of an expression. As there can be no question that the latter is actually an rvalue, the standard calls this a pure rvalue or prvalue:

“An xvalue (an “eXpiring” value) also refers to an object, usually near the end of its lifetime (so that its resources may be moved, for example). […]

— An rvalue (so called, historically, because rvalues could appear on the right-hand side of an assignment expression) is an xvalue, a temporary object (12.2) or subobject thereof, or a value that is not associated with an object.

— A prvalue (“pure” rvalue) is an rvalue that is not an xvalue. [ Example: The result of calling a function whose return type is not a reference is a prvalue. The value of a literal […] or true is also a prvalue. —end example ]” (3.10.1)

The intuition behind an xvalue is that it can be seen as BOTH an lvalue and an rvalue, depending on context. Consider this snippet of code:

Image
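(The original code screenshot is no longer available; the following is a minimal sketch of the kind of snippet described here, with the container, the string member and all names being my own assumptions.)

#include <string>
#include <utility>
#include <vector>

int main()
{
    std::vector<std::string> sink;

    {
        std::string foo = "some fairly long string";   // automatic storage duration
        sink.push_back(std::move(foo));                // treat foo like an rvalue so its
                                                       // resources may be moved from
    }                                                  // foo is destroyed here anyway

    return 0;
}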

Our object foo has automatic storage duration, which means that it will be destroyed at the end of the block. This makes it safe to ‘move’ from foo, i.e. to convert it to an rvalue using std::move. However, our object has a name, ‘foo’, which makes it an lvalue (remember that the rule of thumb is that anything that has a name is an lvalue). It is only because it is nearing the end of its lifetime that we can use it like an rvalue (provided of course we do not rely on its value after having treated it like an rvalue). Thus, foo is an lvalue that can be used like an rvalue, which the standard calls an xvalue. This is expressed in the following diagram found at 3.10.1:

[Diagram at 3.10: every expression is either a glvalue or an rvalue; glvalues comprise lvalues and xvalues, rvalues comprise xvalues and prvalues.]

The gist of this is that lambda expressions return temporary objects in the same way that objects are returned by value from functions, which makes them pure rvalues.

Now that we know what kind of thing is produced by a lambda expression, we have to look at its type, i.e. what exactly we are getting. The standard tells us that this type is unique and unnamed, and that its exact details are implementation-defined, so let’s call it ‘type lambda’ for now (although if you were speaking standardese you would call it the ‘closure type’).

“The type of the lambda-expression (which is also the type of the closure object) is a unique, unnamed non-union class type — called the closure type — whose properties are described below. […] An implementation may define the closure type differently from what is described below provided this does not alter the observable behavior of the program [..]” (5.1.2.3)

The last line of this quote is a promise that certain things about type lambda will always be the same, and this is what we can exploit when using it. The most important of these for the purposes of this discussion is that type lambda will have an operator () that matches our definition in the lambda expression:

“The closure type for a lambda-expression has a public inline function call operator (13.5.4) whose parameters and return type are described by the lambda-expression’s parameter-declaration-clause and trailing return-type respectively. This function call operator is declared const (9.3.1) if and only if the lambda expression’s parameter-declaration-clause is not followed by mutable. It is neither virtual nor declared volatile.“ (5.1.2.5)

This promise is important because it tells us how we can use a type lambda. Let’s assume we are dealing with a simple lambda expression that adds its two integer arguments and returns the result by value.

First of all, you could simply use auto like this:

Image
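(The original screenshot is not available; a minimal sketch of what the example presumably looked like, using the name X referenced below:)

#include <iostream>

int main()
{
    // The closure object is stored in X; its exact type is left to the compiler.
    auto X = [](int a, int b) { return a + b; };

    std::cout << X(11, 22) << '\n';   // prints 33
    return 0;
}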

The expression statement ‘X(11, 22)’ is guaranteed to work because of the promise the standard makes, regardless of what the compiler deduces auto to be.

This also means that we can use a std::function of the appropriate type here instead of auto:

Image
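(Again a sketch rather than the original code: the closure object converts to a std::function with a matching signature.)

#include <functional>
#include <iostream>

int main()
{
    std::function<int(int, int)> X = [](int a, int b) { return a + b; };

    std::cout << X(11, 22) << '\n';   // prints 33
    return 0;
}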

Similarly, if you were writing a function that takes a lambda and then applies it to some arguments, simply use templates and then call the overloaded operator:

Image
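(A sketch of this pattern; the function name ‘apply’ is my own choice and not from the original post.)

#include <iostream>

// Accept any callable via a template parameter and simply invoke its operator ().
template <typename Func>
int apply(Func f, int a, int b)
{
    return f(a, b);
}

int main()
{
    std::cout << apply([](int a, int b) { return a + b; }, 11, 22) << '\n';   // prints 33
    return 0;
}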

As an aside, this also gives you a compile-time check of whether the lambda is of the correct type, which helps to identify errors as early as possible. This is because the compiler knows exactly what operator () will look like for the type you are trying to pass to the function.

I hope this helps clarify some things. As always, thx for reading 🙂

Functional Programming Styles in C++ (1 of n) : ‘Partial Application’ using ‘std::bind’

While I am definitely a supporter of the imperative style of programming, I have found some ideas from functional programming extremely useful in producing clean code. I will demonstrate some of these ideas in this series of posts (please click on the images to expand them).

The C++ 11 standard expanded the capability to use functional styles directly in standard C++, most notably through the classes found in the <functional> header (based on the excellent equivalents from boost) and the inclusion of lambda expressions directly in the language. As there are many excellent tutorials about lambda expressions out there, I will not go into detail about them here except to note their connection to the topics we are concerned with, namely that they have to do with functions.

For everyone who has never programmed in a functional language before, it is worth noting that purely functional languages do not have the concept of objects or side effects at all. In fact, the user has no access to memory and everything is defined using functions and recursion as well as a usually limited set of primitive types and more complex but also pre-defined ones such as tuples and lists (e.g. Miranda).

Technically, what C++ understands as a lambda expression is not the same as what functional programming languages understand by that term. In C++, lambda expressions are a way to write inline definitions of function objects (or ‘functors’). These get handled by C++ in exactly the same way as ‘regularly’ defined functors and have nothing to do with the lambda calculus in a formal way. However, they derive their name from the idea that there should be a short-hand notation for the kind of thing they are meant to be used for. This mimics the terse syntactic style of many functional languages.

Let’s look at an example. The STL sorting algorithm ‘std::sort’ can take a predicate as an optional third argument. Thus, if I wanted to sort an array from largest to smallest (ignoring std::greater and std::less for now), I would either have to write a custom function or a functor like this:

Image
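(The original screenshot is not available; the ‘verbose’ alternatives presumably looked roughly like this, with names of my own choosing:)

#include <algorithm>
#include <iostream>
#include <vector>

// Option 1: a free function used as the predicate.
bool descending(int lhs, int rhs)
{
    return lhs > rhs;
}

// Option 2: a hand-written functor.
struct Descending
{
    bool operator()(int lhs, int rhs) const { return lhs > rhs; }
};

int main()
{
    std::vector<int> v{ 3, 1, 4, 1, 5, 9 };

    std::sort(v.begin(), v.end(), descending);   // or: std::sort(v.begin(), v.end(), Descending());

    for (int i : v) std::cout << i << ' ';       // 9 5 4 3 1 1
    return 0;
}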

The problem is that the code for either of these is quite verbose for what we are trying to achieve and might be located in a completely different block from the one we are in now. This makes the code harder to understand and bloats it. What C++ learned from the functional style is that an inline definition should be possible here. Lambda expressions are an awesome way to do this and will even become truly polymorphic in C++ 14!

Using lambda expressions our code becomes the following:

Image
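(A sketch of the lambda version:)

#include <algorithm>
#include <iostream>
#include <vector>

int main()
{
    std::vector<int> v{ 3, 1, 4, 1, 5, 9 };

    // The predicate is now defined inline, right where it is used.
    std::sort(v.begin(), v.end(), [](int lhs, int rhs) { return lhs > rhs; });

    for (int i : v) std::cout << i << ' ';   // 9 5 4 3 1 1
    return 0;
}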

While lambda expressions have deservedly had the most attention of all functional elements in C++ 11, we are concerned with the less famous ones here.

Now that the introduction is out of the way let’s turn to the main concern of this post: partial application.

Functional programming languages usually let a programmer define functions in a curried style. This means that rather than having a function that has to be called with a pre-defined number of parameters we can call it with just one argument to start with and supply the others later.

Consider the following C++ function:

int add(int x, int y) {
    return x + y;
}

An equivalent curried function would look like this in Miranda:

add x y = x + y

The first thing that might look strange here is the absence of brackets around the function parameters x and y on the left hand side of the expression. This reflects the idea that this function can be evaluated when only one argument is present, i.e. the function is curried.

Actually, that last sentence was slightly incorrect in that the function can only be partially evaluated or ‘partially applied’, because the + operator needs to have both a left-hand and a right-hand side! So what does the function add return if we partially apply it to, say, 6?

The answer is easy: another function, which expects only one more argument and looks like this:

f x = x + 6

This doesn’t work for us because (non-variadic) functions in C++ expect a fixed number of arguments and this is fundamental to the imperative style and memory model. Hence we cannot do ‘partial application’. But what we can do is save some arguments and then return something that expects the rest before the function gets called at all, i.e. we do not evaluate our ‘add’ function until both arguments are provided. In other words, we can bind arguments to certain function parameters. In C++ 11, we can do this using std::bind with the help of some placeholders:

Image
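(The original screenshot is not available; the following sketch assumes a templated ‘add’, so that “a specialisation of ‘add’” makes sense, and binds the value 6 as described below:)

#include <functional>
#include <iostream>

template <typename T>
T add(T x, T y)
{
    return x + y;
}

int main()
{
    using namespace std::placeholders;   // brings _1, _2, ... into scope

    // Bind 6 to the 'right' parameter of add<int>; _1 marks the argument we supply later.
    auto bound_function = std::bind(add<int>, _1, 6);

    std::cout << bound_function(10) << '\n';   // prints 16
    return 0;
}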

In this example, we bind a specialisation of ‘add’ with 6 (for the ‘right’ parameter), which gives us a callable object named bound_function that expects one more argument. We specify this in the bind expression by using one of the placeholders defined in the std::placeholders namespace, namely ‘_1’. These are uglified by a leading underscore (highlighting once more that no one should use similar names for user-defined variables) and you can have any number of them in your bind expression (depending on the function you want to bind).

Because this process goes against the way C++ normally likes to think, there are some problems with this approach. First of all, the actual type of bound_function that the compiler will deduce when you’re using auto will be very complicated. Secondly, operator () of that type has a variadic template parameter, i.e. it accepts any number of arguments. However, only the arguments matched by placeholders are actually used and the rest are silently discarded, so the following compiles and does not produce incorrect results:

Image
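(A sketch of what such surplus-argument calls look like, continuing the bound_function example from above:)

#include <functional>
#include <iostream>

template <typename T>
T add(T x, T y)
{
    return x + y;
}

int main()
{
    using namespace std::placeholders;
    auto bound_function = std::bind(add<int>, _1, 6);

    // Only the argument matched by _1 is used; the surplus ones are silently ignored.
    std::cout << bound_function(10, 999) << '\n';            // prints 16, no warning, no error
    std::cout << bound_function(10, "noise", 3.14) << '\n';  // also compiles and also prints 16
    return 0;
}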

While this produces the correct result, your code will be harder for others to understand, especially as being able to supply more than the expected number of arguments might lead someone to think your code does something which it actually does not do. Worst of all, neither the compiler nor the runtime will ever complain!

Now let’s have a look at the type that std::bind returns. You might need it if you are trying to define a function whose return type is the result of a std::bind expression. One such case is the famous ‘cancel’ combinator (I know combinators aren’t directly relevant for C++, but it’s a good example).

If you ask the compiler for the type of our cancel function (specialised for int in this particular case) it will tell you that you are dealing with:

std::_Bind<true,T,T (__cdecl *const )(T,T),X &,std::_Ph<1> &>

That’s not something we want to use! More troubling is that the exact type returned by std::bind is unspecified, i.e. it differs between implementations! So what is the solution here?

As always, members of the standards committee were aware of this problem and gave us a really smart answer with C++ 11. An added bonus of that solution is that it also gets rid of the problem of the number of arguments not being enforced.

The solution is to assign the result of a bind expression to a std::function of the appropriate type:

Image
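(The original code is not shown, so the following is only a guess at its general shape: a templated ‘cancel’ that binds away the second argument and returns the result as a std::function, using the trailing-return-type syntax discussed below.)

#include <functional>
#include <iostream>

template <typename T>
T add(T x, T y)
{
    return x + y;
}

// Bind away one argument and hand the result back as a std::function of a known type.
template <typename Func, typename T>
auto cancel(Func func, T bound) -> std::function<decltype(func(bound, bound))(T)>
{
    using namespace std::placeholders;
    return std::bind(func, _1, bound);   // the unspecified bind type converts to std::function
}

int main()
{
    auto add_six = cancel(add<int>, 6);   // add_six is a std::function<int(int)>
    std::cout << add_six(10) << '\n';     // prints 16
    // std::cout << add_six(10, 999);     // would not compile: std::function enforces its arity
    return 0;
}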

Because ‘cancel’ explicitly returns something of type std::function, the auto in main will also be of that type.

Note that we are using the new ‘trailing-return-type’ syntax here because it would otherwise be very tedious to get template argument deduction to correctly figure out the return type of the ‘func’ parameter. This way, we can use a decltype expression involving the actual parameters to figure out the return type of the (itself templated) argument to this function more easily. The standard (N3242) highlights this in clause 8.3.12:

“A trailing-return-type is most useful for a type that would be more complicated to specify before the declarator-id:

template <class T, class U> auto add(T t, U u) -> decltype(t + u);

rather than

template <class T, class U> decltype((*(T*)0) + (*(U*)0)) add(T t, U u);”

With this knowledge, we can now safely and transparently pass around bound functions and use them in a straightforward manner.

A Practical Tutorial on C# – C++ Interop using (mostly) Standard C++

Sometimes you will find yourself wanting to call a C++ dll from C# or have part of your project written in C++ and the other part in C#. This tutorial is about how to call a native dll from C# in a straightforward manner and how to handle callbacks from C++ (please click on the pics to expand them).

While I was working on the ‘medi-board’ project for University College London, it was decided fairly early on that we needed some kind of interoperability between C# and C++. Specifically, we needed to achieve the following things:

  1. The native object needed to be used like any C# object (with a clean interface)
  2. We needed bi-directional data exchange between C++ and C#
  3. We needed a C# delegate function to be invoked as a callback function from C++

While there are some really good tutorials out there, I feel there is a general lack of information on how to accomplish this in a simple, straightforward manner. I hope this tutorial can help clarify some things.

The first decision to make is whether to accomplish the majority of the interop functionality from the C# or the C++ side. We opted for the former because:

  • We wanted to use standard C++ and not rely on visual C++ as the native code had to be cross-platform.
  • We knew the C# code would never be used in any other interop and could thus be specific whereas specialising on the C++ side would eventually lead to having multiple specialisations in the code

First, it has to be noted that because Microsoft’s CLR provides such a rich set of functionality, there are many ways to accomplish interoperability between C# and C++. Anyone interested should have a look at Microsoft’s tutorials on this.

Our starting point is a solution with the following two projects:

  •  ‘Interop’: A C# Console Application
  • ‘Library’: An empty Visual C++ project (it’s not going to be Visual C++ only, but that’s what the template is called)

This should look something like the following in Visual Studio (I’m using VS 2012 here):

1

Before we actually do some coding, let’s examine our general strategy. The idea is that there is a static interface both on the managed and the native side. The native wrapper exposes the object/functions we want to use in such a way that C# can understand them. The managed wrapper ‘imports’ that exposed C++ interface in a static way. Because we want to be able to use our C++ object like any C# object, we encapsulate the managed wrapper further in a standard C# class. It is important to note that with this approach C# has ownership of the C++ instance and hence is responsible for deleting it appropriately.

2

On the C++ side, we add a simple class called ‘NativeObject’ to test our code:

3

The implementation of this object is trivial for now:

4
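(The original screenshots are not available; a minimal stand-in covering both the declaration and the trivial implementation might look like this, with all names being assumptions:)

// NativeObject.h
class NativeObject
{
public:
    NativeObject() : value_(0) {}
    ~NativeObject() {}

    void setValue(int value) { value_ = value; }
    int  getValue() const    { return value_; }

private:
    int value_;
};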

Next, we need to create the native wrapper that exposes functionality so it will be visible in the DLL’s interface.

5
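(A sketch of what such a wrapper header could look like; DLL_EXPORT and the function names are my own assumptions, not the original code:)

// NativeWrapper.h
#include "NativeObject.h"

#ifdef _WIN32
#define DLL_EXPORT __declspec(dllexport)
#else
#define DLL_EXPORT
#endif

extern "C"
{
    // Creation and destruction are wrapped in plain functions...
    DLL_EXPORT NativeObject* CreateNativeObject();
    DLL_EXPORT void          DeleteNativeObject(NativeObject* instance);

    // ...and every member function additionally takes the instance it operates on.
    DLL_EXPORT void SetValue(NativeObject* instance, int value);
    DLL_EXPORT int  GetValue(NativeObject* instance);
}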

While most of this should look very familiar, note how we had to wrap the constructor and destructor of the object in some extra functions and that whenever we wish to invoke a member function we need to also specify the instance.

Also we had to explicitly specify linkage by using ‘extern “C”’. Those not interested in the details can safely skip this section. The C++ standard (I’m using version N3242) states in clause 7.5 that:

“All function types, function names with external linkage, and variable names with external linkage have a language linkage […]. The default language linkage of all function types, function names, and variable names is C++ language linkage. Two function types with different language linkages are distinct types even if they are otherwise identical.

2 Linkage (3.5) between C++ and non-C++ code fragments can be achieved using a linkage-specification:

linkage-specification:

extern string-literal { declaration-seq_opt }

extern string-literal declaration

The string-literal indicates the required language linkage. This International Standard specifies the semantics for the string-literals “C” and “C++”.”

This is basically saying that for every function for which we do not explicitly specify another linkage, the compiler assumes that we meant “C++”, but that we have the option to change this default. The reason this matters for our interop is that specifying C linkage means the compiler is not allowed to do some things that other languages do not understand, most importantly ‘name mangling’. If you overload a function, or declare the same name in different namespaces, the compiler eventually has to emit all of those entities as flat symbol names, and concepts such as namespaces and overloading do not exist at that level.

So in order to differentiate between those different names, additional information such as the namespace that name was declared in is encoded into the function name used internally.

As we will see below, in order for the interop to work, C# has to know the exact name of the native function – so we cannot allow name mangling. (See IBM’s notes on this.)

Using the ‘dumpbin’ utility (see Microsoft’s dumpbin help page) under Windows, we can have a look at this in action. Open your ‘VS2012 x64 Native Tools Command Prompt’ and navigate to the directory the .dll is in.

Then use the command “dumpbin /EXPORTS Library.dll”, which will give you the DLL’s exports. In a version with the ‘extern “C”’ deleted, the output looks like this:

6

Note how the names are mangled. If I add the ‘extern “C”’ linkage specification again, the exports look like this:

7

This is exactly what we want and presents C# with an interface it can understand.

The fact that we pass a pointer to every function of our wrapper highlights again that C# has ownership of our object handle. The implementation of these functions is therefore straightforward:

8
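(Continuing the hypothetical wrapper from above, the definitions simply forward to the instance:)

// NativeWrapper.cpp
#include "NativeWrapper.h"

extern "C"
{
    NativeObject* CreateNativeObject()                        { return new NativeObject(); }
    void          DeleteNativeObject(NativeObject* instance)  { delete instance; }

    void SetValue(NativeObject* instance, int value) { instance->setValue(value); }
    int  GetValue(NativeObject* instance)            { return instance->getValue(); }
}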

And that’s it – we’ve written all the code we need on the C++ side. In order to create the managed wrapper, we have to use .NET’s ‘platform invoke’ or P/Invoke functionality (if you want to know more about it, read this article).

All we have to do to use it is to import ‘System.Runtime.InteropServices’ and provide functions that match exactly those defined in our native wrapper. Note how we can represent the NativeObject handles with the type ‘IntPtr’ here.

9

This is actually all we have to do to get the interop to work. However, our goal was to encapsulate this rather ugly static interface in a nice C# object. To do so, we just add another class called ManagedObject that has a private IntPtr and calls the static functions of ‘Managed_Wrapper’:

10

The only thing that is absolutely crucial here is that delete() is called before this object goes out of scope, to prevent memory leaks. This is because garbage collection will only delete the IntPtr, as it has no knowledge of the native instance it points to (and C# has no control over the native heap anyway).

In order to use this new class, consider the following example:

11

This will print 11 followed by 22 as expected.

In order to make this compile we will have to adjust the following settings:

C++:

  • Change the configuration type to .dll:

12

  • Change the Platform to x64 in the Configuration Manager:

13

C#:

  • Set platform target in the Project settings to x64

14

  • Make sure to set the C# project as the startup project and that ‘Build’ is enabled for it in the Configuration Manager

If you attempt to run the code you will most likely receive this error:

15

This is because, by default, C++ and C# projects have different output directories for the files they create. To avoid having to copy stuff by hand it’s useful to use a post-build event here, i.e. a simple operation that takes place after your code is compiled. In our case, we want to use it to copy the .dll to the correct directory, namely the one that our .exe file from the C# project is in.

To set this up navigate to the following section of the C++ project’s properties and edit the ‘command line’:

16

17

The command

xcopy /d /y "$(SolutionDir)x64\Debug\*.*" "$(SolutionDir)Interop\bin\Debug\"

tells VS to copy all files from the first directory to the second if they are a ‘newer’ version of the ones currently in that directory.

Note that if you run into problems with this, please consider the following common sources of errors:

  • Your projects/folder hierarchy is different to mine
  • Configuration of 32/64bit does not match. It is vital to get this right because the pointers we are passing around have to come from the same address space
  • Spaces in Folder names (Yes, the system is fairly archaic)
  • If you’ve changed from debug to release, this command might fail if the ‘Release’ folder does not already exist, i.e. it won’t create folders for you

If you run the program, it produces the following output as expected:

18

Finally, let’s consider a few special cases (I will use some code examples from the ‘medi-board’ project here):

Passing Arrays/Strings

I will show how to pass strings as char arrays, so this technique is applicable to all arrays. Again, there are ‘cleaner’ ways to do this when you’re writing Visual C++, but we want to keep our C++ code as close to standard C++ as possible.

On the native side we simply have:

19
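(A sketch of the native side, reusing the hypothetical DLL_EXPORT macro and NativeObject from the earlier sketches; the real signature from the project is not shown:)

#include <string>
#include "NativeWrapper.h"

extern "C" DLL_EXPORT void SetName(NativeObject* instance, const char* name, int length)
{
    // Copy exactly 'length' characters; the buffer is not assumed to be null-terminated.
    std::string copy(name, name + length);
    (void)instance;   // ...store or otherwise use 'copy' via the instance as required...
}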

And in C#

20

Note how we pass the length separately so we know how long the array is on the C++ side (please note that this has the same security implications as C arrays). As an aside, a C# String has a ‘ToCharArray’ function that makes it really easy to use this method.

Callback functions

Callback functions are somewhat tricky to get right. Our approach was to register a function pointer with our native libraries that could then be used as a callback:

21

With ‘__event_callback’ being defined as:

22
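(A sketch of this machinery, with assumed names apart from __event_callback and reusing the hypothetical DLL_EXPORT macro: a typedef for the callback, a registration function exposed to C#, and a helper that fires the callback from the native side.)

#include "NativeWrapper.h"   // hypothetical header from the earlier sketches (for DLL_EXPORT)

extern "C"
{
    // The calling convention is pinned down so both sides agree on who cleans up the stack.
    typedef void(__stdcall* __event_callback)(int eventId);

    static __event_callback g_callback = nullptr;

    DLL_EXPORT void RegisterCallback(__event_callback callback)
    {
        g_callback = callback;
    }
}

// Somewhere inside the native library an event eventually fires:
void RaiseEvent(int eventId)
{
    if (g_callback)
        g_callback(eventId);   // this calls straight back into the registered C# delegate
}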

We need to use __stdcall to make sure we know who cleans up the stack after the function call. This is a Windows-specific feature (please read more about it here if you’re interested).

On the C# side, this gives us:

23

We always pass a delegate here so that a programmer can use the C# function in a generic way. If this code seems strange, please make sure you have looked carefully at delegate functions first, as this is the hardest part of understanding this bit of code. Everything else is merely using some pre-defined features of the CLR.

Hope this tutorial helps and let me know what you think! Many thx for reading 🙂