Y Combinator and C++

If one searches for ‘Y combinator’ now, it is likely that the first hit is Paul Graham’s incubator company Y Combinator. I am not sure whether Paul likes this fact or not—as a Lisp hacker, he must like the Y combinator very much to name his company with it, but now people looking for information for the mathematical/programmatic concept may be distracted by his company information. If he is truly still a Lisp hacker, he may even regret it a bit—though it apparently benefits his company.

Y combinator is an intriguing concept. When I first saw it, I spent the whole weekend reading and studying about it. Still, I did not fully get it. People think the Y combinator is an important concept in functional programming1:

… we can similarly use knowledge of the Y combinator as a dividing line between programmers who are “functionally literate” (i.e. have a reasonably deep knowledge of functional programming) and those who aren’t.

There are a lot of existing materials to introduce the Y combinator, but I have not yet known one that is written for C++ programmers. I would like to fill this gap.

This said, I still will start from functional languages, but you, dear reader, are not required to know one in advance. The focus is on the concept, and I will show running C++ code soon.

Now the journey begins.

Lambda Calculus

Of course you can find a huge amount of materials about lambda calculus, and I will describe just enough for C++ programmers to have the basic concept. Instead of the more common way of defining a square function as ‘sqr(x) = x · x’, one can define it as a lambda expression:

     sqr = λx.x · x

And that is exactly how languages like Haskell and Scheme define such a function, bar the syntax difference.

Haskell:

sqr = \ x -> x * x

Scheme:

(define sqr (lambda (x) (* x x)))

The key benefit is that you do not need a name for functions now. In order to apply the sqr function to 3, one can simply write (you may notice that parentheses are only used to override associativity, but not required around 3: this is the usual convention):

     (λx.x · x) 3

It is also intuitive how it is evaluated (called β-reduction in the terminology of lambda calculus): one replaces the occurrences of x with 3, and removes the initial λx. So the result is simply 3 · 3 = 9.

The C++ code for printing the result would be (assuming int type; also note that parentheses are required around ‘3’ in C++):

cout << [](int x) { return x * x; } (3) << endl;

However, considering that C++’s grammar is not really helpful to see lambda expressions clearly, it is probably better to use a name wherever appropriate:

auto const sqr = [](int x) { return x * x; };
cout << sqr(3) << endl;

Please be aware that auto is used here, as each C++ lambda expression has a unique type. However, a lambda expression can be converted to a function object (defined in <functional>). The example above can be changed as follows (‘std::’ is omitted), and the code will still compile and run (maybe with a negligible overhead):

function<int(int)> const sqr = [](int x) { return x * x; };
cout << sqr(3) << endl;

Fixed-point Combinator

A fixed-point combinator2 is a higher-order function that satisfied the following equation:

     y f = f (y f)

One can see immediately that this definition can lead to infinite expansion:

     y f = f (y f) = f (f (y f)) = f (… f (y f)…)

And that is exactly where the power lies. If the function y is given, it is possible to define a recursive function in a non-recursive way! Let us take factorial as an example. The traditional definition should be like:

     fact n = If IsZero n then 1 else n × fact (n − 1)

Assume a function F exists so that fact = y F:

     (y F) n = If IsZero n then 1 else n × (y F) (n − 1)
     F (y F) n = If IsZero n then 1 else n × (y F) (n − 1)

Replace y F with f:

     F f n = If IsZero n then 1 else n × f (n − 1)

We now get a function F that can be defined without any recursions. We can define factorial exactly this way in Haskell:

fix f = f (fix f)
fact = fix $ \ f n -> if (n == 0) then 1 else n * f (n - 1)

This definition has two problems:

  • It works only in a lazy language3 like Haskell.
  • Recursion occurs in the definition of fix.

I will address the first problem in the next section, and leave the second to the section after.

How to Overcome Laziness

The laziness of Haskell is great power, but it is not available in most other languages. As I mentioned in my last blog4, the solution is quite simple: an additional layer of indirection. A lambda expression is not evaluated before the argument is given, so the following transformation will help (η-abstraction; assuming f takes one argument):

     fλx.f x

The following program works in lazy Scheme:

(define Y
  (lambda (f)
    (f (Y f))))

(define F
  (lambda (f)
    (lambda (n)
      (cond
        ((eq? n 0) 1)
        (else (* n (f (- n 1))))))))

(define fact (Y F))

Before going ahead to transform the code for strict5 Scheme, let us analyse the involved types first. That will be very important when we try to implement it in C++, and the misunderstanding of the types led me to think that the Y combinator was unimplementable in C++. A Haskell-like notation is used below:

  • f (in F) ∷ int → int
  • F f ∷ int → int
  • Y F ∷ int → int
  • F ∷ (int → int) → int → int
  • Y ∷ ((int → int) → int → int) → int → int

We can see that f (see line 10 in the code above), F f (see line 7), and Y F (which is the factorial function) are all first-order functions that take an integer and return an integer. Therefore, F is a second-order function that takes such a first-order function and returns a first-order function. Our Y takes a second-order function like F, and returns a first-order function.

Seeing that Y f  is a first-order function, we can now apply the η-abstraction on it to eliminate the infinite expansion. The following definition of Y makes the program work again in strict Scheme:

(define Y
  (lambda (f)
    (f (lambda (x) ((Y f) x)))))

In the meantime, we have prepared enough for an implementation in C++, and I can show you the code now:

#include <iostream>
#include <functional>

using namespace std;

template <typename T, typename R>
function<R(T)> Y(
    function<function<R(T)>(function<R(T)>)> f)
{   // Y f = f (λx.(Y f) x)
    return f([=](T x) { return Y(f)(x); });
}

typedef function<int(int)>         fn_1ord;
typedef function<fn_1ord(fn_1ord)> fn_2ord;

fn_2ord almost_fact = [](fn_1ord f)
{
    return [f](int n)
    {
        if (n == 0) return 1; else return n * f(n - 1);
    };
};

int main()
{
    fn_1ord fact = Y(almost_fact);
    cout << "fact(5) = " << fact(5) << endl;
}

Hopefully everything now looks familiar and natural (except the C++ syntax, which is still quite cumbersome for the job).

Curry’s Y Combinator

Now we can come back and address the recursive definition problem. It is probably not a problem for practical usage, but it is very interesting indeed. The solution we encounter most often, the Y combinator, is attributed to Haskell Curry6 (yes, the Haskell language is named after him). It is also mind-boggling, like any of the self-referencing puzzles and paradoxes. I am amazed by it, and that is why I want to write this blog.

There are some good references about deriving the Y combinator7 8 9, and I will simply present the result here:

     Y = λf.(λx.f (x x)) (λx.f (x x))

I also want to show to you that this definition satisfies the fixed-point equation:

     Y F = (λf.(λx.f (x x)) (λx.f (x x))) F   (expand Y)
            = (λx.F (x x)) (λx.F (x x))           (β-reduction on the bound variable f)
            = F ((λx.F (x x)) (λx.F (x x)))   (β-reduction on the first bound variable x)
            = F (Y F)

Notice there is an equivalent form (β-reduce it once to get the Y above):

     Y = λf.(λx.x x) (λx.f (x x))

Basically, this is the definition in lazy Scheme:

(define Y
  (lambda (f)
    ((lambda (x) (x x))
     (lambda (x) (f (x x))))))

Recalling that the function argument of Y—which is f above—takes a first-order function as argument, we know x x should return a first-order function. So we can apply the η-abstraction to get the strict Scheme version:

(define Y
  (lambda (f)
    ((lambda (x) (x x))
     (lambda (x) (f (lambda (y) ((x x) y)))))))

Before implementing it in C++, we have one more difficulty. A statically typed language cannot make a function call like x x, at least not directly so, as the type system will forbid it. Even Haskell, which normally allows people to write very succinct code, cannot express the Y as directly as Scheme. One has to fool the type system with a proxy object.

Here is how it goes in C++:

template <typename F>
struct self_ref_func {
    function<F(self_ref_func)> fn;
};

I.e. an object of this type contains a function that takes an object of its own type and returns an object of the template type, which is again a function. E.g.:

typedef function<int(int)>     fn_1ord;
typedef self_ref_func<fn_1ord> fn_self_ref;
fn_self_ref x = { … };    // assign a suitable value to x
x.fn(x);                  // (x x)

With the help of this self-referencing type, Curry’s Y combinator can finally be implemented in C++:

template <typename T, typename R>
function<R(T)> Y(
    function<function<R(T)>(function<R(T)>)> f)
{   // Y = λf.(λx.x x) (λx.f (λy.(x x) y))
    typedef function<R(T)>          fn_1ord;
    typedef self_ref_func<fn_1ord>  fn_self_ref;
    fn_self_ref r = {
        [f](fn_self_ref x)
        {   // λx.f (λy.(x x) y)
            return f(fn_1ord([x](T y)
                             {
                                 return x.fn(x)(y);
                             }));
        }
    };
    return r.fn(r);
}

Summary

I have shown how the famous Y combinator can be implemented in C++, and what techniques are required to do it. I wish you found the information useful. If you have gone thus far, you will also probably be interested in reading about practical uses of the Y combinator10, and you will probably want a decent Haskell implementation11 and Scheme implementation12. I have also put the C++ code for Y (slightly changed) in a public repository13, as ready-to-use library code. People have actually implemented the Y combinator in more than five dozen languages14.

My C++ code is tested on Clang 3.5 and GCC 4.9 in C++11 mode. Haskell code is tested on GHC 7.8.3, and Scheme code is tested on DrRacket 6.1.

Have fun!

Update (5 May 2016)

The code here only demonstrates what can be done in C++, but is not optimized for speed. I have just noticed that there is now a proposal15 to add Y combinator to the C++ standard library. To my amazement, the reference implementation is both simple and fast—in my test, its speed is close to that of native recursive functions (-O3 under Clang, or -O2 under GCC). You probably want to check it out!


  1. Mike Vanier: The Y Combinator (Slight Return) or How to Succeed at Recursion Without Really Recursing, section ‘Why Y?’. 
  2. Wikipedia: Fixed-point combinator
  3. Wikipedia: Lazy evaluation
  4. Yongwei Wu: Study Notes: Functional Programming with C++
  5. Wikipedia: Strict programming language
  6. Felice Cardone and J. Roger Hindley: History of Lambda-calculus and Combinatory Logic (PDF), pp. 8–9. 
  7. Mike Vanier: The Y Combinator (Slight Return) or How to Succeed at Recursion Without Really Recursing, section ‘Deriving the Y combinator’. 
  8. Matthias Felleisen: A Lecture on the Why of Y (PS). 
  9. Daniel P. Friedman and Matthias Felleisen: The Little Schemer, chapter 9. MIT Press, Cambridge, Mass., 4th edition, 1995. 
  10. Bruce J. McAdams: That About Wraps it Up: Using FIX to Handle Errors Without Exceptions, and Other Programming Tricks
  11. GHC is the standard Haskell implementation. 
  12. Racket is a nice Scheme platform. 
  13. functional.h contains fix_simple, fix_curry, and fix_function_converter make_curry, among other templates. 
  14. Rosetta Code: Y Combinator
  15. Yegor Derevenets: A Proposal to Add Y Combinator to the Standard Library

5 thoughts on “Y Combinator and C++

  1. In Wikipedia, the Y combinator is an “implementation of the fixed point combinator”, which in turn is defined as a “higher-order function that returns some fixed point of its argument function, if one exists”. The property Yf=f(Yf) is a sequence there. You use Yf=f(Yf) as a definition. The Y combinator you implemented works for your definition, but it’s not a fixed point combinator as it does not return the fixed point of a function. Is the Wikipedia definition wrong?

    Like

    • y f = f (y f) is the (symbolic) definition of a fixed-point combinator, but it is not the definition of the Y combinator. The Y combinator refers the Curry’s version, λf.(λx.f (x x)) (λx.f (x x)), and it is a fixed-point combinator.

      Like

      • Well λf.(λx.f (x x)) (λx.f (x x)) seems right, but above you cite Y = λf.(λx.x x) (λx.f (x x)) and comment in the code you comment is as Y = λf.(λx.x x) (λx.f (λy.(x x) y)). This is hard to get. Did you check f(Yf), f(f(Yf)), …?

        Like

      • @Pavel, it seems you have missed this sentence in the article: ‘Notice there is an equivalent form (β-reduce it once to get the Y above)’.

        Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s