Upgrading to Boost 1.61 in MacPorts

The Boost version in MacPorts was still 1.59.0—a year old now. When I wrote about Boost.Coroutine2, I found I had to install the latest Boost version 1.61.0. So I had two sets of Boost libraries on my hard drive, which made things . . . er . . . a little bit complicated. After I built Microsoft’s cpprestsdk last night—I managed to make it find and use the MacPorts Boost libraries—I feel more urged to change the situation. So this morning I subscribed to the MacPorts mailing list and posted the question about the outdated version problem. With the help from Mr Michael Dickens and Google, I have a working port of Boost 1.61.0 now. This article will document the procedure how it works.

The first thing one needs to do is check out the port files from the MacPorts Subversion repository. In my case, The boost files are under devel/boost. So I checked out only the boost directory into ~/Programming/MacPorts/devel.

One then needs to tell MacPorts to look for ports in that directory. There are two steps involved:

  1. Add the URL of the local ports directory (e.g. ‘file:///Users/yongwei/Programming/MacPorts’ in my case) to /opt/local/etc/macports/sources.conf, above the default rsync URL.
  2. Run the portindex command under that directory. It needs to be rerun every time a Portfile is changed.

Now MacPorts should find ports first in my local ports directory and then the system default. And I could begin patching the files.

It turned out that people tried to update boost half a year ago for Boost 1.60, but they found there were failing ports and the ABI was incompatible with 1.59. The patch was still good to me, as I had now a good example. I simply applied the patch, ran portindex again, and went ahead to port upgrade boost.

The procedure turned out quite smooth, though mkvtoolnix, the only installed port that depended on boost on my laptop, failed to run after the upgrade. I had to port uninstall it and then port install it again (rebuilding it).

After I had some confidence, I began to change the port files. I changed first Portfile, which contained the version information and file checksums. Updating them was trivial. When I could see the new version 1.61.0 from port info boost, I kicked off the build with port upgrade boost again.

Then came the more painful process of fixing the patch files under devel/boost/files (the ‘patch’ I mentioned a moment ago actually contained patches for these patch files). Most of these MacPorts-specific patch files could be applied without any problems, but one of them failed. It was actually due to trivial code changes in Boost, but I still had to check all the rejections, manually apply the changes, and generate a new patch file. After that, everything went on smoothly.

Against all my hopes, I found that I had to rebuild mkvtoolnix yet again. So the ABI instability is really an issue, and I understand now why boost was stuck at the old version for such a long time. However, I consider my task completed, when I uploaded the updated patch to the MacPorts ticket. At least I have the new working port of boost for myself now. And you can have it too.

Python yield and C++ Coroutines

Back in 2008, an old friend challenged me with a programming puzzle, when we both attended a wedding. He later gave a solution in Python. Comparing with my C++ solution, the Python one has about half the code lines. I was not smart enough to begin learning Python then, but instead put an end to my little interest in Python with a presentation in C++ Conference China 2009. I compared the basic constructs, and concluded they were equivalent, not realizing that Python had more to offer than that trivial programming exercise showed.

Fast forwarding to today (2016), I am really writing some (small) programs in Python. I have begun to appreciate Python more and more. For many of the tasks, the performance loss in executing the code is ignorable, but the productivity boost is huge. I have also realized that there are constructs in Python that are not easily reproducible in other languages. Generator/yield is one of them.

The other day, I was writing a small program to sort hosts based on dot-reversed order so as to group the host names in a more reasonable order (regarding ‘www.wordpress.com’ as ‘com.wordpress.www’). I quickly came up with a solution in Python. The code is very short and I can show it right here:

def backsort(lines):
    result = {}
    for line in lines:
        result['.'.join(reversed(line.split('.')))] = line
    return map(lambda item: item[1],
               sorted(result.items()))

Of course, we can implement a similar function in C++11. We will immediately find that there are no standard implementations for split and join (see Appendix below for my implementation). Regardless, we can write some code like:

template <typename C>
vector<string> backsort(C&& lines)
{
    map<string, string> rmap;
    for (auto& line : lines) {
        auto split_line = split(line, '.');
        reverse(split_line.begin(), split_line.end());
        rmap[join(split_line, '.')] = line;
    }
    vector<string> result(rmap.size());
    transform(rmap.begin(), rmap.end(), result.begin(),
              [](const pair<string, string>& pr)
              {
                  return pr.second;
              });
    return result;
}

Even though it has twice the non-trivial lines of code and is a function template, there is immediately something Python can do readily but C++ cannot. I can give the Python file handle (like os.stdin) directly to backsort, and the for line will iterate through the file content.1 This is because the Python file object implements the iterator protocol over lines of text, but the C++ istream does not do anything similar.

Let us forget this C++ detail, and focus on the problem. My Python code accepts an iterator, and ‘backsorts’ all the input lines. Can we make it process multiple files (like the cat command line), without changing the backsort function?

Of course it can be done. There is a traditional way, and there is a smart way. The traditional way is write a class that implements the iterator protocol (which can be readily modelled by C++):

class cat:
    def __init__(self, files):
        self.files = files
        self.cur_file = None

    def __iter__(self):
        return self

    def next(self):
        while True:
            if self.cur_file:
                line = self.cur_file.readline()
                if line:
                    return line.rstrip('\n')
                self.cur_file.close()
            if self.files:
                self.cur_file = open(self.files[0])
                self.files = self.files[1:]
            else:
                raise StopIteration()

We can then cat files by the following lines:

if __name__ == '__main__':
    if sys.argv[1:]:
        for line in cat(sys.argv[1:]):
            print(line)

Using yield, we can reduce the 18 lines of code of cat to only 5:

def cat(files):
    for fn in files:
        with open(fn) as f:
            for line in f:
                yield line.rstrip('\n')

There is no more bookkeeping of the current file and the unprocessed files, and everything is wrapped in simple loops. Isn’t that amazing? I actually learnt about the concept before (in C#), but never used it in real code—perhaps because I was too much framed by existing code, using callbacks, observer pattern, and the like.—Those ‘patterns’ now look ugly, when compared to the simplicity of generators.

Here comes the real challenge for C++ developers: Can we do the same in C++? Can we do something better than inelegant callbacks? 2


My investigations so far indicate the following: No C++ standards (up to C++14) support such constructs, and there is no portable way to implement them as a library.

Are we doomed? No. Apart from standardization efforts regarding coroutines (which is the ancient name for a superset of generators, dated from 1958) in C++,3 there have been at least five cross-platform implementations for C++:

  • The unofficial Boost.Coroutine by Giovanni P. Deretta (2006), compatible with Windows, Linux, and maybe a few Unix variants (tested not working on OS X); apparently abandoned.4
  • The official Boost.Coroutine by Oliver Kowalke (2013), compatible with ARM, MIPS, PPC, SPARC, x86, and x86-64 architectures.
  • The official Boost.Coroutine2 by Oliver Kowalke (2015), compatible with the same hardware architectures but only C++ compilers/code conformant to the C++14 standard.
  • Mordor by Mozy (2010), compatible with Windows, Linux, and OS X, but seemingly no longer maintained.
  • CO2 by Jamboree (2015), supporting stackless coroutines only, using preprocessor tricks, and requiring C++14.

As Boost.Coroutine2 looks modern, is well-maintained, and is very much comparable to the Python constructs, I will use it in the rest of this article.5 It hides all the platform details with the help of Boost.Context. Now I can write code simply as follows for cat:

typedef boost::coroutines2::coroutine<const string&> coro_t;

void cat(coro_t::push_type& yield, int argc, char* argv[])
{
    for (int i = 1; i < argc; ++i) {
        ifstream ifs(argv[i]);
        for (;;) {
            string line;
            if (getline(ifs, line)) {
                yield(line);
            } else {
                break;
            }
        }
    }
}

int main(int argc, char* argv[])
{
    using namespace std::placeholders;
    for (auto& line : coro_t::pull_type(
             boost::coroutines2::fixedsize_stack(),
             bind(cat, _1, argc, argv))) {
        cout << line << endl;
    }
}

Is this simple and straightforward? The only thing that is not quite intuitive is the detail that the constructor of pull_type expects the second argument to be a function object that takes a push_type& as the only argument. That is why we need to use bind to generate it—a lambda expression being the other alternative.

I definitely believe being able to write coroutines is a big step forward to make C++ more expressive. I can foresee many tasks simplified, like recursive parsing. I believe this will prove very helpful in the C++ weaponry. I only wish we could see it standardized soon.

Appendix

The complete backsort code in Python:

#!/usr/bin/env python
#coding: utf-8

import sys

def cat(files):
    for fn in files:
        with open(fn) as f:
            for line in f:
                yield line.rstrip('\n')

def backsort(lines):
    result = {}
    for line in lines:
        result['.'.join(reversed(line.split('.')))] = line
    return map(lambda item: item[1],
               sorted(result.items()))

def main():
    if sys.argv[1:]:
        result = backsort(cat(sys.argv[1:]))
    else:
        result = backsort(map(
                lambda line: line.rstrip('\n'), sys.stdin))
    for line in result:
        print(line)

if __name__ == '__main__':
    main()

The complete backsort code in C++:

#include <assert.h>         // assert
#include <algorithm>        // std::reverse/transform
#include <fstream>          // std::ifstream
#include <functional>       // std::bind
#include <iostream>         // std::cin/cout
#include <map>              // std::map
#include <string>           // std::string
#include <vector>           // std::vector
#include <boost/coroutine2/all.hpp>

using namespace std;

typedef boost::coroutines2::coroutine<const string&> coro_t;

void cat(coro_t::push_type& yield, int argc, char* argv[])
{
    for (int i = 1; i < argc; ++i) {
        ifstream ifs(argv[i]);
        for (;;) {
            string line;
            if (getline(ifs, line)) {
                yield(line);
            } else {
                break;
            }
        }
    }
}

vector<string> split(const string& str, char delim)
{
    vector<string> result;
    string::size_type last_pos = 0;
    string::size_type pos = str.find(delim);
    while (pos != string::npos) {
        result.push_back(
            str.substr(last_pos, pos - last_pos));
        last_pos = pos + 1;
        pos = str.find(delim, last_pos);
        if (pos == string::npos) {
            result.push_back(str.substr(last_pos));
        }
    }
    return result;
}

template <typename C>
string join(const C& str_list, char delim)
{
    string result;
    for (auto& item : str_list) {
        result += item;
        result += delim;
    }
    if (result.size() != 0) {
        result.resize(result.size() - 1);
    }
    return result;
}

template <typename C>
vector<string> backsort(C&& lines)
{
    map<string, string> rmap;
    for (auto& line : lines) {
        auto split_line = split(line, '.');
        reverse(split_line.begin(), split_line.end());
        rmap[join(split_line, '.')] = line;
    }
    vector<string> result(rmap.size());
    transform(rmap.begin(), rmap.end(), result.begin(),
              [](const pair<string, string>& pr)
              {
                  return pr.second;
              });
    return result;
}

class istream_line_reader {
public:
    class iterator { // implements InputIterator
    public:
        typedef const string& reference;
        typedef string value_type;

        iterator() : stream_(nullptr) {}
        explicit iterator(istream& is) : stream_(&is)
        {
            ++*this;
        }

        reference operator*()
        {
            assert(stream_ != nullptr);
            return line_;
        }
        value_type* operator->()
        {
            assert(stream_ != nullptr);
            return &line_;
        }
        iterator& operator++()
        {
            getline(*stream_, line_);
            if (!*stream_) {
                stream_ = nullptr;
            }
            return *this;
        }
        iterator operator++(int)
        {
            iterator temp(*this);
            ++*this;
            return temp;
        }

        bool operator==(const iterator& rhs) const
        {
            return stream_ == rhs.stream_;
        }
        bool operator!=(const iterator& rhs) const
        {
            return !operator==(rhs);
        }

    private:
        istream* stream_;
        string line_;
    };

    explicit istream_line_reader(istream& is)
        : stream_(is)
    {
    }
    iterator begin() const
    {
        return iterator(stream_);
    }
    iterator end() const
    {
        return iterator();
    }

private:
    istream& stream_;
};

int main(int argc, char* argv[])
{
    using namespace std::placeholders;
    vector<string> result;
    if (argc > 1) {
        result = backsort(coro_t::pull_type(
            boost::coroutines2::fixedsize_stack(),
            bind(cat, _1, argc, argv)));
    } else {
        result = backsort(istream_line_reader(cin));
    }
    for (auto& item : result) {
       cout << item << endl;
    }
}

The istream_line_reader class is not really necessary, and we can simplify it with coroutines. I am including it here only to show what we have to write ‘normally’ (if we cannot use coroutines). Even if we remove it entirely, the C++ version will still have about three times as many non-trivial lines of code as the Python equivalent. It is enough proof to me that I should move away from C++ a little bit. . . .


  1. There is one gotcha: the ‘\n’ character will be part of the string. It will be handled in my solution. 
  2. Generally speaking, callbacks or similar techniques are what C++ programmers tend to use in similar circumstances, if the ‘producer’ part is complicated (otherwise the iterator pattern may be more suitable). Unfortunately, we cannot then combine the use of two simple functions like cat and backsort simultaneously. If we used callbacks, backsort would need to be modified and fragmented. 
  3. P0057 is one such effort, which is experimentally implemented in Visual Studio 2015
  4. According to the acknowledgement pages of next two Boost projects, Giovanni Deretta contributed to them. So his work was not in vain. 
  5. This said, CO2 is also well-maintained, and is more efficient if only a stackless coroutine is needed. See Jamboree’s answer on StackOverflow. The difference looks small to me, and preprocessor tricks are not my favourites, so I will not spend more time on CO2 for now.