Y Combinator and C++

If one searches for ‘Y combinator’ now, it is likely that the first hit is Paul Graham’s incubator company Y Combinator. I am not sure whether Paul likes this fact or not—as a Lisp hacker, he must like the Y combinator very much to name his company with it, but now people looking for information for the mathematical/programmatic concept may be distracted by his company information. If he is truly still a Lisp hacker, he may even regret it a bit—though it apparently benefits his company.

Y combinator is an intriguing concept. When I first saw it, I spent the whole weekend reading and studying about it. Still, I did not fully get it. People think the Y combinator is an important concept in functional programming1:

… we can similarly use knowledge of the Y combinator as a dividing line between programmers who are “functionally literate” (i.e. have a reasonably deep knowledge of functional programming) and those who aren’t.

There are a lot of existing materials to introduce the Y combinator, but I have not yet known one that is written for C++ programmers. I would like to fill this gap.

This said, I still will start from functional languages, but you, dear reader, are not required to know one in advance. The focus is on the concept, and I will show running C++ code soon.

Now the journey begins.

Lambda Calculus

Of course you can find a huge amount of materials about lambda calculus, and I will describe just enough for C++ programmers to have the basic concept. Instead of the more common way of defining a square function as ‘sqr(x) = x · x’, one can define it as a lambda expression:

     sqr = λx.x · x

And that is exactly how languages like Haskell and Scheme define such a function, bar the syntax difference.

Haskell:

sqr = \ x -> x * x

Scheme:

(define sqr (lambda (x) (* x x)))

The key benefit is that you do not need a name for functions now. In order to apply the sqr function to 3, one can simply write (you may notice that parentheses are only used to override associativity, but not required around 3: this is the usual convention):

     (λx.x · x) 3

It is also intuitive how it is evaluated (called β-reduction in the terminology of lambda calculus): one replaces the occurrences of x with 3, and removes the initial λx. So the result is simply 3 · 3 = 9.

The C++ code for printing the result would be (assuming int type; also note that parentheses are required around ‘3’ in C++):

cout << [](int x) { return x * x; } (3) << endl;

However, considering that C++’s grammar is not really helpful to see lambda expressions clearly, it is probably better to use a name wherever appropriate:

auto const sqr = [](int x) { return x * x; };
cout << sqr(3) << endl;

Please be aware that auto is used here, as each C++ lambda expression has a unique type. However, a lambda expression can be converted to a function object (defined in <functional>). The example above can be changed as follows (‘std::’ is omitted), and the code will still compile and run (maybe with a negligible overhead):

function<int(int)> const sqr = [](int x) { return x * x; };
cout << sqr(3) << endl;

Fixed-point Combinator

A fixed-point combinator2 is a higher-order function that satisfied the following equation:

     y f = f (y f)

One can see immediately that this definition can lead to infinite expansion:

     y f = f (y f) = f (f (y f)) = f (… f (y f)…)

And that is exactly where the power lies. If the function y is given, it is possible to define a recursive function in a non-recursive way! Let us take factorial as an example. The traditional definition should be like:

     fact n = If IsZero n then 1 else n × fact (n − 1)

Assume a function F exists so that fact = y F:

     (y F) n = If IsZero n then 1 else n × (y F) (n − 1)
     F (y F) n = If IsZero n then 1 else n × (y F) (n − 1)

Replace y F with f:

     F f n = If IsZero n then 1 else n × f (n − 1)

We now get a function F that can be defined without any recursions. We can define factorial exactly this way in Haskell:

fix f = f (fix f)
fact = fix $ \ f n -> if (n == 0) then 1 else n * f (n - 1)

This definition has two problems:

  • It works only in a lazy language3 like Haskell.
  • Recursion occurs in the definition of fix.

I will address the first problem in the next section, and leave the second to the section after.

How to Overcome Laziness

The laziness of Haskell is great power, but it is not available in most other languages. As I mentioned in my last blog4, the solution is quite simple: an additional layer of indirection. A lambda expression is not evaluated before the argument is given, so the following transformation will help (η-abstraction; assuming f takes one argument):

     fλx.f x

The following program works in lazy Scheme:

(define Y
  (lambda (f)
    (f (Y f))))

(define F
  (lambda (f)
    (lambda (n)
      (cond
        ((eq? n 0) 1)
        (else (* n (f (- n 1))))))))

(define fact (Y F))

Before going ahead to transform the code for strict5 Scheme, let us analyse the involved types first. That will be very important when we try to implement it in C++, and the misunderstanding of the types led me to think that the Y combinator was unimplementable in C++. A Haskell-like notation is used below:

  • f (in F) ∷ int → int
  • F f ∷ int → int
  • Y F ∷ int → int
  • F ∷ (int → int) → int → int
  • Y ∷ ((int → int) → int → int) → int → int

We can see that f (see line 10 in the code above), F f (see line 7), and Y F (which is the factorial function) are all first-order functions that takes an integer and returns an integer. Therefore, F is a second-order function that takes such a first-order function and returns a first-order function. Our Y takes a second-order function like F, and returns a first-order function.

Seeing that Y f  is a first-order function, we can now apply the η-abstraction on it to eliminate the infinite expansion. The following definition of Y makes the program work again in strict Scheme:

(define Y
  (lambda (f)
    (f (lambda (x) ((Y f) x)))))

In the meantime, we have prepared enough for an implementation in C++, and I can show you the code now:

#include <iostream>
#include <functional>

using namespace std;

template <typename T, typename R>
function<R(T)> Y(function<function<R(T)>(function<R(T)>)> f)
{   // Y f = f (λx.(Y f) x)
    return f([=](T x) { return Y(f)(x); });
}

typedef function<int(int)>         fn_1ord;
typedef function<fn_1ord(fn_1ord)> fn_2ord;

fn_2ord almost_fact = [](fn_1ord f)
{
    return [f](int n)
    {
        if (n == 0) return 1; else return n * f(n - 1);
    };
};

int main()
{
    fn_1ord fact = Y(almost_fact);
    cout << "fact(5) = " << fact(5) << endl;
}

Hopefully everything now looks familiar and natural (except the C++ syntax, which is still quite cumbersome for the job).

Curry’s Y Combinator

Now we can come back and address the recursive definition problem. It is probably not a problem for practical usage, but it is very interesting indeed. The solution we encounter most often, the Y combinator, is attributed to Haskell Curry6 (yes, the Haskell language is named after him). It is also mind-boggling, like any of the self-referencing puzzles and paradoxes. I am amazed by it, and that is why I want to write this blog.

There are some good references about deriving the Y combinator7 8 9, and I will simply present the result here:

     Y = λf.(λx.f (x x)) (λx.f (x x))

I also want to show to you that this definition satisfies the fixed-point equation:

     Y F = (λf.(λx.f (x x)) (λx.f (x x))) F   (expand Y)
           = (λx.F (x x)) (λx.F (x x))          (β-reduction on the bound variable f)
           = F ((λx.F (x x)) (λx.F (x x)))   (β-reduction on the first bound variable x)
           = F (Y F)

Notice there is an equivalent form (β-reduce it once to get the Y above):

     Y = λf.(λx.x x) (λx.f (x x))

Basically, this is the definition in lazy Scheme:

(define Y
  (lambda (f)
    ((lambda (x) (x x))
     (lambda (x) (f (x x))))))

Recalling that the function argument of Y—which is f above—takes a first-order function as argument, we know x x should return a first-order function. So we can apply the η-abstraction to get the strict Scheme version:

(define Y
  (lambda (f)
    ((lambda (x) (x x))
     (lambda (x) (f (lambda (y) ((x x) y)))))))

Before implementing it in C++, we have one more difficulty. A statically typed language cannot make a function call like x x, at least not directly so, as the type system will forbid it. Even Haskell, which normally allows people to write very succinct code, cannot express the Y as directly as Scheme. One has to fool the type system with a proxy object.

Here is how it goes in C++:

template <typename F>
struct self_ref_func {
    function<F(self_ref_func)> fn;
};

I.e. an object of this type contains a function that takes an object of its own type and returns an object of the template type, which is again a function. E.g.:

typedef function<int(int)>     fn_1ord;
typedef self_ref_func<fn_1ord> fn_self_ref;
fn_self_ref x = { … };         // assign a suitable value to x
x.fn(x);                       // (x x)

With the help of this self-referencing type, Curry’s Y combinator can finally be implemented in C++:

template <typename T, typename R>
function<R(T)> Y(function<function<R(T)>(function<R(T)>)> f)
{   // Y = λf.(λx.x x) (λx.f (λy.(x x) y))
    typedef function<R(T)>          fn_1ord;
    typedef self_ref_func<fn_1ord>  fn_self_ref;
    fn_self_ref r = {
        [f](fn_self_ref x)
        {   // λx.f (λy.(x x) y)
            return f(fn_1ord([x](T y)
                             {
                                 return x.fn(x)(y);
                             }));
        }
    };
    return r.fn(r);
}

Summary

I have shown how the famous Y combinator can be implemented in C++, and what techniques are required to do it. I wish you found the information useful. If you have gone thus far, you will also probably be interested in reading about practical uses of the Y combinator10, and you will probably want a decent Haskell implementation11 and Scheme implementation12. I have also put the C++ code for Y (slightly changed) in a public repository13, as ready-to-use library code. People have actually implemented the Y combinator in more than five dozen languages14.

My C++ code is tested on Clang 3.5 and GCC 4.9 in C++11 mode. Haskell code is tested on GHC 7.8.3, and Scheme code is tested on DrRacket 6.1.

Have fun!

Update (5 May 2016)

The code here only demonstrates what can be done in C++, but is not optimized for speed. I have just noticed that there is now a proposal15 to add Y combinator to the C++ standard library. To my amazement, the reference implementation is both simple and fast—in my test, its speed is close to that of native recursive functions (-O3 under Clang, or -O2 under GCC). You probably want to check it out!


  1. Mike Vanier: The Y Combinator (Slight Return) or How to Succeed at Recursion Without Really Recursing, section ‘Why Y?’. 
  2. Wikipedia: Fixed-point combinator
  3. Wikipedia: Lazy evaluation
  4. Yongwei Wu: Study Notes: Functional Programming with C++
  5. Wikipedia: Strict programming language
  6. Felice Cardone and J. Roger Hindley: History of Lambda-calculus and Combinatory Logic (PDF), pp. 8–9. 
  7. Mike Vanier: The Y Combinator (Slight Return) or How to Succeed at Recursion Without Really Recursing, section ‘Deriving the Y combinator’. 
  8. Matthias Felleisen: A Lecture on the Why of Y (PS). 
  9. Daniel P. Friedman and Matthias Felleisen: The Little Schemer, chapter 9. MIT Press, Cambridge, Mass., 4th edition, 1995. 
  10. Bruce J. McAdams: That About Wraps it Up: Using FIX to Handle Errors Without Exceptions, and Other Programming Tricks
  11. GHC is the standard Haskell implementation. 
  12. Racket is a nice Scheme platform. 
  13. functional.h contains fix_simple, fix_curry, and fix_function_converter make_curry, among other templates. 
  14. Rosetta Code: Y Combinator
  15. Yegor Derevenets: A Proposal to Add Y Combinator to the Standard Library
Advertisements

Study Notes: Functional Programming with C++

I have been enthused with functional programming recently. As C++ is my favourite language (or most familiar language), I have also tried to implement in C++ some of the techniques I learnt. And this blog serves to record my learning experience. It is my sincere hope that it will be useful to other people that come to functional programming with an imperative programming background.

Without further ado, let me describe first what I wanted to implement:

  • Map
  • Reduce
  • Pipeline

If you are not familiar with these concepts, do not worry, as I will show you the code in both C++ and some ‘more functional’ languages. The functional language implementations are always easy to understand, even if you do not know the languages!

Map

The concept of the map function is quite simple: it applies the function parameter to each item in a list parameter, and return the result as a list. The following examples shows the application of the successor function to a list of numbers (in Haskell):

> print (map succ [1,2,3,4])
[2,3,4,5]

Map is normally a built-in function in a functional language, but implementing it is trivial too. The Haskell implementation might be the most succinct:

map f [] = []
map f (x:xs) = f x : map f xs

It does the work recursively: apply f to the head of the list (x), and concatenate the result with that of map f applied to the rest of the list (xs) until the list is empty. The procedure may be more explicit in the Scheme code below:

(define (map f l)
  (cond
    ((null? l) l)
    (else (cons (f (car l)) (map f (car l))))))

So how do we implement it in C++?

Actually C++98[1] already has something quite close: the std::transform [2] function template. The problem is it is not as composable as the functional equivalent, and you cannot just take the return result and print. The equivalent code for the Haskell example above is as follows (in C++11[3] style):

#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>

using namespace std;

template <typename Type>
ostream& operator<<(ostream& os, const vector<Type>& v)
{
    os << "[ ";
    copy(v.begin(), v.end(),
         ostream_iterator<Type>(os, " "));
    os << "]";
    return os;
}

int main()
{
    auto const succ = [](int x) { return x + 1; };
    vector<int> v1{1, 2, 3, 4};
    vector<int> v2;
    transform(v1.begin(), v1.end(), back_inserter(v2), succ);
    cout << v2 << endl;
}

The programmer has to define the return variable first, and something like back_inserter[4] needs to be explicitly used. The flexibility is there, but the programmer needs to take extra burdens.

Of course, the C++ language provides enough facilities to define an alternative function. It is actually pretty simple[5] (though the C++ syntax is intimidating indeed for people newly coming into the template world):

template <template <typename,typename> class OutCont=vector,
          template <typename> class Alloc = allocator,
          typename Fn, class Cont>
OutCont<typename Cont::value_type,
        Alloc<typename Cont::value_type>>
map(Fn mapfn, const Cont& inputs)
{
    OutCont<typename Cont::value_type,
        Alloc<typename Cont::value_type>> result;
    for (auto& item : inputs)
        result.push_back(mapfn(item));
    return result;
}

With this function template, you can now write

    cout << map(succ, v1) << endl;

or even

    cout << map(succ, vector<int>{1, 2, 3, 4}) << endl;

Reduce

The reduce function, also called fold, reduces a list to single value. Its usage can be powerfully demonstrated in the following Haskell example:

> foldl (+) 0 [1..100]
5050

The implementation of foldl (called thus as there is also a foldr function) should be like follows:

foldl f z []     = z
foldl f z (x:xs) = foldl f (f z x) xs

I.e., this function applies the two-argument function f recursively over the list items, and the parameter z is used as the initial value. I will show immediately my C++ code for comparison:

template <typename Fn, class Cont>
typename Cont::value_type
reduce(Fn reducefn, const Cont& inputs,
       typename Cont::value_type initval =
       typename Cont::value_type())
{
    auto result = initval;
    for (auto& item : inputs)
        result = reducefn(result, item);
    return result;
}

One can use the template like the code below:

    cout << reduce(plus<int>(), vector<int>{1, 2, 3, 4, 5})
         << endl;

It needs to be mentioned that C++ has a std::accumulate[6] function template, which is similar to reduce, but it suffers from the similar problem like std::transform. I will not elaborate the details here.

Pipeline

Pipelining is about composing functions to form a new function. Haskell supports composition directly in the language with the operator ‘.’. In order to calculate sqrt(x + 1), one only needs to define a new function like follows:

plus_1_sqrt = sqrt . succ

The following is something similar in Python using the reduce function:

def pipeline_func(data, fns):
    return reduce(lambda a, x: x(a),
                  fns,
                  data)

One could use the function like follows:

def plus_1(x):
    return x + 1

pipeline_func(3, [plus_1, math.sqrt])  # result is 2.0

I actually was frustrated at this point about how to implement it in C++. I simply did not have a clue. After some googling, and especially after finding the insightful blogs of Bartosz Milewski[7], I had better ideas. Specifically, the blog ‘What Does Haskell Have to Do with C++?’[8] enlightened me. I quickly came up with this solution[9]:

template <typename Tp>
auto apply(Tp&& data)
{
    return forward<Tp>(data);
}

template <typename Tp, typename Fn, typename... Fargs>
auto apply(Tp&& data, Fn fn, Fargs... args)
{
    return apply(fn(forward<Tp>(data)), args...);
}

In order to make type inference work (which depends on the input and return types of the passed function arguments), a C++14[10] compiler is needed. Using Clang 3.4+ or GCC 4.9 (-std=c++1y needs to be specified on the command line), the following code will print the correct result 55:

    auto const sqr = [](int x) { return x * x; };
    auto const square_list =
        [=](const vector<int>& data) {
            return map(sqr, data);
        };
    auto const sum_list =
        [=](const vector<int>& data) {
            return reduce(plus<int>(), data);
        };
    cout << apply(vector<int>{1, 2, 3, 4, 5},
                  square_list,
                  sum_list)
         << endl;

There is a minor problem, though. In Haskell, the result of composition is a function that can be passed around (with type inference); in Python, the function list can be passed  around (no type inference, due to its dynamic nature). In my implementation of apply, the programmer has to specify the function list exactly at the point of calling in order to make type inference work. This significantly limits its usefulness.

I realized the solution only a few weeks later. The key issue was that Haskell is a lazy language[11], but C++ is an eager language. Actually the answer was always in front of me, but I just failed to see it for a long time. All I needed was an extra layer of indirection, which lambda expressions fit nicely. After learning that, everything seems simple:

template <typename Tp>
auto compose()
{
    return apply<Tp>;
}

template <typename Tp, typename Fn, typename... Fargs>
auto compose(Fn fn, Fargs... args)
{
    return [=](Tp&& x)
    {
        return fn(compose<Tp>(args...)(forward<Tp>(x)));
    };
}

You can see that it is very much like the apply implementation, but the additional lambda expression makes lazy evaluation possible. Incidentally, compose() with no argument returns the identity function (id in Haskell).

The following code demonstrates its use:

    auto const squared_sum =
        compose<const vector<int>&>(sum_list, square_list);
    cout << squared_sum(vector<int>{1, 2, 3, 4, 5}) << endl;

(Please be aware that apply and compose take function arguments in the opposite order—the latter takes arguments like the Haskell ‘.’ operator.)

Ranges

During the search for functional programming information, I encountered Eric Niebler’s blog ‘Range Comprehensions’[12], which is simply amazing. If you have read thus far and are interested in functional programming with C++, definitely have a look at his blog. He provided a comprehensive library with many Haskell-like features, and the C++ standard committee liked it too! Hopefully we will see it in some production libraries soon.

Summary

As C++ evolves, more features are added to the language to enable a more functional style for programming. This is a good thing, as it allows people to be more productive, while keeping easy interaction with the traditional C/C++ code. I am learning, and wish my learning experience could be useful to others too.

May Imperative and Functional live happily together ever after!

Footnotes

1. C++03
2. transform – C++ Reference
3. C++11
4. back_inserter – C++ Reference
5. There are ways to optimize the code for efficiency (and make the code more complicated). You may want to check out a more complete implementation here.
6. accumulate – C++ Reference
7. Bartosz Milewski’s Programming Cafe
8. What Does Haskell Have to Do with C++?
9. To be honest, perfect forwarding was added later. Worse, I did not make it right. See Type Deduction and My Reference Mistakes for my updated code.
10. C++14
11. Lazy evaluation
12. Range Comprehensions