A Small Experiment of System Scripting in Python

My main laptop is still on Mac OS X Lion (10.7). I know I am guilty of exposing my laptop to potential security risks,1 but some of my paid applications do not work on newer OS X versions without an upgrade. I am an austere person and do not want to pay the money yet. In addition, I am also a little bit nostalgic about the skeuomorphic design, though I know some day I will have to use a Mac that has the latest macOS version in order to use new applications. Anyway, I am just procrastinating now, until some sexy new laptop from Apple makes me take out my wallet, or my old laptop goes crazy.

Sorry for this verbose beginning. What I really want to whine about is that Homebrew has stopped supporting my obsolete version of OS X, and I am relying more and more on MacPorts.2 I even had to rebuild most of my ‘ports’ (the term for packages in MacPorts) because the ‘standard’ way of building ports on Lion does not use libc++, while it is necessary for some ports.3 Unlike Homebrew, MacPorts does not show whether a dependency of a port is already installed or not. Worse, MacPorts packages often have heavy dependencies. For example, the command-line tool mkvtoolnix currently has 20 (recursive) dependencies in Homebrew, but 60 dependencies in MacPorts. My default compiler is clang-3.7, which has 46 dependencies. That pretty much makes the ‘port rdeps’ command useless.

A Google search showed this port command could be helpful:

port echo rdepof:PORT_NAME and not installed

However, more investigation showed there were several problems:

  • One cannot specify variants (like ‘+openmp’).
  • An option (like ‘configure.compiler=macports-clang-3.7’) can affect dependencies, but options do not have the intended effect in the ‘port echo’ command.
  • The recursion is not ‘cut’ when a port is already installed, which can result in unnecessary ports.

This problem had fretted me for some time, before I finally decided to take some action. Naturally, the ultimate solution is write some code. I normally use Bash or Perl for such scripting tasks, but, as I have become more and more interested in Python recently, I decided also to give Python a try to see how it handles such tasks.

I first wrote a Bash version for comparison purposes. It was not recursive, though (too cumbersome for Bash):

#!/bin/bash
function escape {
  printf "%s" "$1" | sed 's/[.*\[]/\\&/g'
}

INSTALLED=`port installed \
         | sed -n 's/^  \([A-Za-z_][^ ]*\).*/-e ^\1$/p'`
INSTALLED_ESC=`escape "$INSTALLED"`
port deps "$@" | sed -n 's/.*Dependencies:[[:space:]]*//p' \
               | sed $'s/, /\\\n/g' \
               | sort \
               | uniq \
               | grep -v $INSTALLED_ESC

Let me explain the code quickly (assuming you are familiar with the basic use of Bash and common Unix tools). ‘port installed’ returns the installed ports, and every line beginning with two spaces are port names followed by other information (like version). I retrieve the port names, and wrap each of them with ‘-e ^…$’. Since they will be used for grep, special characters need to be escaped (practically only ‘.’). I then invoke ‘port deps’ with the command-line arguments, look for lines containing ‘Dependencies:’, get everything after it, split at the commas to get the depended ports, sort the ports, remove duplicates, and filter out all installed ports from the result.

It basically works, and the code is succinct. It is also far from elegant, and quite error-prone. A Bash function feels like a hack. The quotation rules are tricky (when invoking escape, $INSTALLED must be quoted; but when invoking grep, $INSTALLED_ESC must not be quoted). Escaping can easily get problematic when used inside quotation marks. And so on. . . . It is difficult to imagine people can write Bash scripts without some trial and error, even though only a few lines are written.

I knew some Python, but I am not very familiar with it. So I was basically writing while Googling. I got the first version, sort of an equivalent of the Bash script, in about two hours:

#!/usr/bin/env python
#coding: utf-8

import re
import sys
import subprocess

# Gets command output as a list of lines
def popen_readlines(cmd):
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    p.wait()
    if p.returncode != 0:
        raise subprocess.CalledProcessError(p.returncode, \
                                            cmd)
    else:
        return map(lambda line: line.rstrip('\n'), \
                   p.stdout.readlines())

# Gets the port name from a line like
# "  gcc6 @6.1.0_0 (active)"
def get_port_name(port_line):
    return re.sub(r'^  (\S+).*', r'\1', port_line)

# Gets installed ports as a set
def get_installed():
    installed_ports_lines = \
            popen_readlines(['port', 'installed'])[1:]
    installed_ports = \
            set(map(get_port_name, installed_ports_lines))
    return installed_ports

# Gets dependencies for the given port list (which may
# contain options etc.), as a list, excluding items in
# ignored_ports
def get_deps(ports, ignored_ports):
    deps_raw = popen_readlines(['port', 'deps'] + ports)
    uninstalled_ports = []
    for line in deps_raw:
        if re.search(r'Dependencies:', line):
            deps = re.sub(r'.*Dependencies:\s*', '', \
                          line).split(', ')
            uninstalled_ports += \
                [x for x in deps if x not in ignored_ports]
            ignored_ports |= set(deps)
    return uninstalled_ports

def main():
    if sys.argv[1:]:
        installed_ports = get_installed()
        uninstalled_ports = get_deps(sys.argv[1:], \
                                     installed_ports)
        for port in uninstalled_ports:
            print port

if __name__ == '__main__':
    main()

A few things immediately came to notice:

  • The code is apparently more verbose than Bash or Perl, but arguably also clearer and more readable.
  • Strings are ubiquitous in Bash, but lists are ubiquitous in Python. Python allowed backticks (`…`) for piping, but they are deprecated now in favour of the subprocess routines, which accept the command line as a list.
  • The set is a built-in type and is a breeze to use.
  • I/O is not as easy as in Perl (thinking of <> and chomp now), but can be easily simplified with helper functions, as composability is very good.
  • List comprehension and map are very helpful to keep the code concise.

It is not all. The real fun was that it was easy to convert the code to work recursively on all depended ports. I only needed to add/change seven lines of code, at the beginning and end of get_deps:

def get_deps(ports, ignored_ports):
    # New code to end the recursion
    if ports == []:
        return []

    # This part is not changed
    deps_raw = popen_readlines(['port', 'deps'] + ports)
    uninstalled_ports = []
    for line in deps_raw:
        if re.search(r'Dependencies:', line):
            deps = re.sub(r'.*Dependencies:\s*', '', \
                          line).split(', ')
            uninstalled_ports += \
                [x for x in deps if x not in ignored_ports]
            ignored_ports |= set(deps)

    # New code to call recursively and collect the result
    results = []
    for port in uninstalled_ports:
        results.append(port)
        results += get_deps([port], ignored_ports)
    return results

The output did not show any indentation yet, and I found another problem later. The improved final code looks as follows:

#!/usr/bin/env python
#coding: utf-8

import re
import sys
import subprocess

# Gets command output as a list of lines
def popen_readlines(cmd):
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    p.wait()
    if p.returncode != 0:
        raise subprocess.CalledProcessError(p.returncode, \
                                            cmd)
    else:
        return map(lambda line: line.rstrip('\n'), \
                   p.stdout.readlines())

# Gets the port name from a line like
# "  gcc6 @6.1.0_0 (active)"
def get_port_name(port_line):
    return re.sub(r'^  (\S+).*', r'\1', port_line)

# Gets installed ports as a set
def get_installed():
    installed_ports_lines = \
            popen_readlines(['port', 'installed'])[1:]
    installed_ports = \
            set(map(get_port_name, installed_ports_lines))
    return installed_ports

# Gets port names from items that may contain version
# specifications, variants, or options
def get_ports(ports_and_specs):
    requested_ports = set()
    for item in ports_and_specs:
        if not (re.search(r'^[-+@]', item) or \
                re.search(r'=', item)):
            requested_ports.add(item)
    return requested_ports

# Gets dependencies for the given port list (which may
# contain options etc.), as a list of tuples (combining
# with level), excluding items in ignored_ports
def get_deps(ports, ignored_ports, level):
    if ports == []:
        return []

    deps_raw = popen_readlines(['port', 'deps'] + ports)
    uninstalled_ports = []
    for line in deps_raw:
        if re.search(r'Dependencies:', line):
            deps = re.sub(r'.*Dependencies:\s*', '', \
                          line).split(', ')
            uninstalled_ports += \
                [x for x in deps if x not in ignored_ports]
            ignored_ports |= set(deps)

    port_level_pairs = []
    for port in uninstalled_ports:
        port_level_pairs += [(port, level)]
        port_level_pairs += get_deps([port], \
                                     ignored_ports, \
                                     level + 1)
    return port_level_pairs

def main():
    if sys.argv[1:]:
        ports_and_specs = sys.argv[1:]
        ignored_ports = get_installed() | \
                        get_ports(ports_and_specs)
        uninstalled_ports = get_deps(ports_and_specs, \
                                     ignored_ports, 0)
        for (port, level) in uninstalled_ports:
            print ' ' * (level * 2) + port

if __name__ == '__main__':
    main()

I would say I am very happy, even excited, with the experiment results. No wonder Python has been a great success, despite being verbose and having a slightly weird syntax :-). I guess I would do more Python in the future.

By the way, the code in this article is in Python 2. Python 3 is stricter and even more verbose: I do not see the benefits of using it for system scripting (yet).


  1. Not really. My MacBook Pro has the firewall turned on, it is behind the home router nearly at all times, and I do not visit strange web sites—not with Safari at least. 
  2. Honestly, it is not the fault of Homebrew, or even Apple. However, I do miss the support lifecycle that Microsoft provided for Windows XP. 
  3. For more details, Using libc++ on older system explains the why and how. 
Advertisements