Study Notes: Functional Programming with C++

I have been enthused with functional programming recently. As C++ is my favourite language (or most familiar language), I have also tried to implement in C++ some of the techniques I learnt. And this blog serves to record my learning experience. It is my sincere hope that it will be useful to other people that come to functional programming with an imperative programming background.

Without further ado, let me describe first what I wanted to implement:

  • Map
  • Reduce
  • Pipeline

If you are not familiar with these concepts, do not worry, as I will show you the code in both C++ and some ‘more functional’ languages. The functional language implementations are always easy to understand, even if you do not know the languages!

Map

The concept of the map function is quite simple: it applies the function parameter to each item in a list parameter, and return the result as a list. The following examples shows the application of the successor function to a list of numbers (in Haskell):

> print (map succ [1,2,3,4])
[2,3,4,5]

Map is normally a built-in function in a functional language, but implementing it is trivial too. The Haskell implementation might be the most succinct:

map f [] = []
map f (x:xs) = f x : map f xs

It does the work recursively: apply f to the head of the list (x), and concatenate the result with that of map f applied to the rest of the list (xs) until the list is empty. The procedure may be more explicit in the Scheme code below:

(define (map f l)
  (cond
    ((null? l) l)
    (else (cons (f (car l)) (map f (car l))))))

So how do we implement it in C++?

Actually C++98[1] already has something quite close: the std::transform [2] function template. The problem is it is not as composable as the functional equivalent, and you cannot just take the return result and print. The equivalent code for the Haskell example above is as follows (in C++11[3] style):

#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>

using namespace std;

template <typename Type>
ostream& operator<<(ostream& os, const vector<Type>& v)
{
    os << "[ ";
    copy(v.begin(), v.end(),
         ostream_iterator<Type>(os, " "));
    os << "]";
    return os;
}

int main()
{
    auto const succ = [](int x) { return x + 1; };
    vector<int> v1{1, 2, 3, 4};
    vector<int> v2;
    transform(v1.begin(), v1.end(), back_inserter(v2), succ);
    cout << v2 << endl;
}

The programmer has to define the return variable first, and something like back_inserter[4] needs to be explicitly used. The flexibility is there, but the programmer needs to take extra burdens.

Of course, the C++ language provides enough facilities to define an alternative function. It is actually pretty simple[5] (though the C++ syntax is intimidating indeed for people newly coming into the template world):

template <template <typename,typename> class OutCont=vector,
          template <typename> class Alloc = allocator,
          typename Fn, class Cont>
OutCont<typename Cont::value_type,
        Alloc<typename Cont::value_type>>
map(Fn mapfn, const Cont& inputs)
{
    OutCont<typename Cont::value_type,
        Alloc<typename Cont::value_type>> result;
    for (auto& item : inputs)
        result.push_back(mapfn(item));
    return result;
}

With this function template, you can now write

    cout << map(succ, v1) << endl;

or even

    cout << map(succ, vector<int>{1, 2, 3, 4}) << endl;

Reduce

The reduce function, also called fold, reduces a list to single value. Its usage can be powerfully demonstrated in the following Haskell example:

> foldl (+) 0 [1..100]
5050

The implementation of foldl (called thus as there is also a foldr function) should be like follows:

foldl f z []     = z
foldl f z (x:xs) = foldl f (f z x) xs

I.e., this function applies the two-argument function f recursively over the list items, and the parameter z is used as the initial value. I will show immediately my C++ code for comparison:

template <typename Fn, class Cont>
typename Cont::value_type
reduce(Fn reducefn, const Cont& inputs,
       typename Cont::value_type initval =
       typename Cont::value_type())
{
    auto result = initval;
    for (auto& item : inputs)
        result = reducefn(result, item);
    return result;
}

One can use the template like the code below:

    cout << reduce(plus<int>(), vector<int>{1, 2, 3, 4, 5})
         << endl;

It needs to be mentioned that C++ has a std::accumulate[6] function template, which is similar to reduce, but it suffers from the similar problem like std::transform. I will not elaborate the details here.

Pipeline

Pipelining is about composing functions to form a new function. Haskell supports composition directly in the language with the operator ‘.’. In order to calculate sqrt(x + 1), one only needs to define a new function like follows:

plus_1_sqrt = sqrt . succ

The following is something similar in Python using the reduce function:

def pipeline_func(data, fns):
    return reduce(lambda a, x: x(a),
                  fns,
                  data)

One could use the function like follows:

def plus_1(x):
    return x + 1

pipeline_func(3, [plus_1, math.sqrt])  # result is 2.0

I actually was frustrated at this point about how to implement it in C++. I simply did not have a clue. After some googling, and especially after finding the insightful blogs of Bartosz Milewski[7], I had better ideas. Specifically, the blog ‘What Does Haskell Have to Do with C++?’[8] enlightened me. I quickly came up with this solution[9]:

template <typename Tp>
auto apply(Tp&& data)
{
    return forward<Tp>(data);
}

template <typename Tp, typename Fn, typename... Fargs>
auto apply(Tp&& data, Fn fn, Fargs... args)
{
    return apply(fn(forward<Tp>(data)), args...);
}

In order to make type inference work (which depends on the input and return types of the passed function arguments), a C++14[10] compiler is needed. Using Clang 3.4+ or GCC 4.9 (-std=c++1y needs to be specified on the command line), the following code will print the correct result 55:

    auto const sqr = [](int x) { return x * x; };
    auto const square_list =
        [=](const vector<int>& data) {
            return map(sqr, data);
        };
    auto const sum_list =
        [=](const vector<int>& data) {
            return reduce(plus<int>(), data);
        };
    cout << apply(vector<int>{1, 2, 3, 4, 5},
                  square_list,
                  sum_list)
         << endl;

There is a minor problem, though. In Haskell, the result of composition is a function that can be passed around (with type inference); in Python, the function list can be passed  around (no type inference, due to its dynamic nature). In my implementation of apply, the programmer has to specify the function list exactly at the point of calling in order to make type inference work. This significantly limits its usefulness.

I realized the solution only a few weeks later. The key issue was that Haskell is a lazy language[11], but C++ is an eager language. Actually the answer was always in front of me, but I just failed to see it for a long time. All I needed was an extra layer of indirection, which lambda expressions fit nicely. After learning that, everything seems simple:

template <typename Tp>
auto compose()
{
    return apply<Tp>;
}

template <typename Tp, typename Fn, typename... Fargs>
auto compose(Fn fn, Fargs... args)
{
    return [=](Tp&& x)
    {
        return fn(compose<Tp>(args...)(forward<Tp>(x)));
    };
}

You can see that it is very much like the apply implementation, but the additional lambda expression makes lazy evaluation possible. Incidentally, compose() with no argument returns the identity function (id in Haskell).

The following code demonstrates its use:

    auto const squared_sum =
        compose<const vector<int>&>(sum_list, square_list);
    cout << squared_sum(vector<int>{1, 2, 3, 4, 5}) << endl;

(Please be aware that apply and compose take function arguments in the opposite order—the latter takes arguments like the Haskell ‘.’ operator.)

Ranges

During the search for functional programming information, I encountered Eric Niebler’s blog ‘Range Comprehensions’[12], which is simply amazing. If you have read thus far and are interested in functional programming with C++, definitely have a look at his blog. He provided a comprehensive library with many Haskell-like features, and the C++ standard committee liked it too! Hopefully we will see it in some production libraries soon.

Summary

As C++ evolves, more features are added to the language to enable a more functional style for programming. This is a good thing, as it allows people to be more productive, while keeping easy interaction with the traditional C/C++ code. I am learning, and wish my learning experience could be useful to others too.

May Imperative and Functional live happily together ever after!

Footnotes

1. C++03
2. transform – C++ Reference
3. C++11
4. back_inserter – C++ Reference
5. There are ways to optimize the code for efficiency (and make the code more complicated). You may want to check out a more complete implementation here.
6. accumulate – C++ Reference
7. Bartosz Milewski’s Programming Cafe
8. What Does Haskell Have to Do with C++?
9. To be honest, perfect forwarding was added later. Worse, I did not make it right. See Type Deduction and My Reference Mistakes for my updated code.
10. C++14
11. Lazy evaluation
12. Range Comprehensions

13 thoughts on “Study Notes: Functional Programming with C++

  1. Quoting N Wirth, “Algorithms + Data Structures = Programs”. I can see how functional programming does a fine job on the algorithms side. My concern is how it handles the data structures side. I’m still unconvinced that I could build a typical application (one with both complex function and complex data) using functional programming. I’m sure the function side of the application is manageable, but how much extra work would be involved in making the data management workable?

    Like

  2. Actually I have similar concerns. Given that some respectful programmers are advocating for function programming, I believe there is at least something to learn. At least I love how things can be composed together in the functional languages.

    There are big projects written in languages like Common Lisp, but it does not seem there are many, or they are familiar to most programmers. I am learning to know the benefits and shortcomings of functional programming, and see whether I can apply some of the functional techniques in more traditional programming :-).

    Like

  3. The most prominent way to build data structures in the second-wave functional programming languages (languages having roots in the 80ies: SML, Haskell, OCaml) is to use algebraic data types (ADTs). ADTs combine two necessary ways of composing types: taking a record (“multiplying” types), and taking a discriminated union (“adding” types). Manipulating ADTs is supported by the languages: pattern matching is compiled into optimized dispatches. If the encoding of a data structure is very informative — the record labels and variant / tag names correspond to the meaning of the data — you can expose them to users. In OCaml, you can expose ADTs in a private way: only for pattern matching, not for creating new values. Otherwise, you can put the ADT data structure behind an interface. For efficiency reasons, besides ADTs you also need hash tables. OCaml had good support for hash tables from the beginning, partly because OCaml allows for imperative programming (it is multi-paradigm).

    I’d love to know how to do discriminated unions in C++ efficiently and succinctly.

    Like

    • As far as I know, C++ does not support discriminated unions in a nice way from the language. Many programmer still use hand-crafted tagged unions like the following:

      struct IntOrPointer {
          enum Type { Int, Pointer } type_;
          union {
              int value_;
              IntOrPointer* pointer_;
          };
          IntOrPointer(int value)
              : type_(Int), value_(value) {}
          IntOrPointer(IntOrPointer* pointer)
              : type_(Pointer), pointer_(pointer) {}
      };
      

      You can see that constructors can be used to help simplify things a little bit.

      This is still not convenient, but C++ provides mechanisms to allow library writers to extend the language. Boost.Variant is one such library. The code above can be simply written as:

      typedef boost::make_recursive_variant<
          int, boost::recursive_variant_*>::type IntOrPointer;
      

      (If no recursive types are involved, things can be simpler, like boost::variant<int, string>.)

      It is actually better than the struct definition, as the struct could not be output to an I/O stream without writing a function of this prototype: ostream& operator<<(ostream& os, const IntOrPointer& iop);. Now the library handles it automatically already!

      You can find more details about the Boost.Variant library here.

      Like

      • wu yongwei:
        Tagged unions (or variants in Pascal) are discriminate unions, structs (or records in Pascal).(Programs = Data Structures + Programs, mentioned above, used Pascal)
        I remember that the book Little Java a few patterns, “https://mitpress.mit.edu/books/little-java-few-patterns”, by Dan Friedman and Mat Felleisen has discriminate unions as virtual classes, it is a book in the style of The Little Lisper that you know.
        I don’t know C++ nor Java. If C++ has virtual classes like Java, unions can be implemented in that way too.
        I remember (since 2001) that they implemented lists by means of virtual classes, those lists where pizzas or something like that, where nil was the base and the elements toppings. Cons putting a topping to the pizza. That simple, but they worked the example to an interesting modular structure. Take a look on it, you will enjoy it, the style is like The Little Lisper that you already know.

        Like

  4. Thank you very much for your post. Is there any way to generalize the mapfn function to have output containers of the same type as input containers? For example, if I say
    list words; words.push_back(“hello”); words.push_back(“world”);
    auto first_letters = mapfn([](auto s) {return s[0];},words);
    then first_letters is a vector, not a list.

    Like

    • Oh, it is an interesting question. It is achievable in C++14, with return type deduction. You also need a container rebinder. There are some library implementations, but I am simply borrowing from this StackOverflow answer:

      http://stackoverflow.com/questions/5052211/changing-value-type-of-a-given-stl-container

      The solution is as follows:

      template <typename Cont, typename T>
      struct rebind_seq_container;
      
      template <typename Value, typename Alloc,
                template <typename, typename> class Cont,
                typename T>
      struct rebind_seq_container<Cont<Value, Alloc>, T> {
          typedef Cont<
              T, typename Alloc::template rebind<T>::other> type;
      };
      
      template <typename Fn, class Cont>
      auto map(Fn mapfn, const Cont& inputs)
      {
          typedef decltype(mapfn(std::declval<
              typename Cont::value_type>())) result_type;
          typename rebind_seq_container<Cont, result_type>::type
              result;
          for (auto& item : inputs)
              result.push_back(mapfn(item));
          return result;
      }
      

      Like

      • Thanks again, your blog has been very helpful to me. I have been using another map implementation from a stackoverflow question I asked:
        http://stackoverflow.com/questions/33069172/functional-c-map-combinator-using-auto
        I have been using a slightly-modified version of Piotr Skotnicki’s code. When I switched to this version and tried to compile with an existing body of code, I got compiler errors when the type deduction failed for lambda expressions involving map. This may be because of my Lisp accent 🙂

        Ultimately I’d like to use a version of this that works like Piotr Snotnicki’s code, but using std::transform instead of boost stuff. I would expect such an implementation to port easily to std::experimental::parallel::transform, which would be like having Clojure’s pmap in C++.

        Like

  5. This version seems to behave well when its output is returned by a lambda expression (which has been a problem with numerous attempts at this problem I have seen). Apologies for the lack of formatting (YW EDIT: I have formatted for you).

    template <typename function, template class collection,
              typename scalar_type, typename... types>
    auto map(function f, collection<scalar_type, types...> c)
    {
      collection<std::decay_t<decltype(f(*c.begin()))>> result;
      if (!c.empty()) {
        result = collection<std::decay_t<decltype(f(*c.begin()))>>(
            c.size(), f(*c.begin()));
        std::transform(c.begin(), c.end(), result.begin(), f);
      }
      return result;
    }
    

    This should be easy to parallelize. Thanks again for your excellent article.

    Like

Leave a comment