25x Performance Boost in Two Hours

Our system has a find_child_regions API, which, as the name indicates, can find subregions of a region up to a certain level. It needs to look up two MongoDB collections, combine the data in a certain structure, and return the result in JSON.

One day, it was reported that the API was slow for big data sets. Tests showed that it took more than 50 seconds to return close to 6000 records. Er . . . that means the average processing speed is only about 100 records a second—not terribly slow, but definitely not ideal.

When there is a performance problem, a profiler is always your friend.1 Profiling quickly revealed that a database read function was called about twice as many times as the number of returned records, and that it occupied the biggest chunk of time. The reason was that the code first found all the IDs of the regions to return, and then read all the data and generated the result. Since the data had already been read once when the IDs were returned, they could be saved and reused. I had to write a new function, which resembled the function that returned region IDs, but returned objects that contained all the data read instead (we already had such a class). I also needed to split the result-generating function into two, so that either the region IDs or the data objects could be accepted. (I could not change these functions directly, as they had many callers other than find_child_regions; changing all of them at once would have been both risky and unnecessary.)
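The gist of the change, in a minimal sketch (all names below are made up for illustration; the real functions and the data-object class were our internal ones): the result-generating logic is split so that it can accept either IDs or already-loaded objects.

class Region:
    """Hypothetical stand-in for our data-object class."""
    def __init__(self, region_id, data):
        self.region_id = region_id
        self.data = data

def read_region(region_id):
    """Placeholder for the expensive per-record database read."""
    return Region(region_id, {})

def build_result_from_ids(region_ids):
    # Old path: every ID triggers another database read
    return build_result_from_objects(read_region(i) for i in region_ids)

def build_result_from_objects(regions):
    # New path: reuse the objects the traversal has already read
    return [r for r in regions if r]

find_child_regions could then call the new traversal, which returned the data objects directly, and pass them straight to the object-accepting half, skipping the second round of reads.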

In about 30 minutes, this change generated the expected improvement: call time was shortened to about 30 seconds. A good start!

While the improvement percentage looked nice, the absolute time taken was still a bit long. So I continued to look for further optimization chances.

Seeing that database reading was still the bottleneck and the database read function was still called for each record returned, I thought I should try batch reading. Fortunately, I found I only needed to change one function. Basically, I needed to change something like the following

result = []
for x in xs:
    object_id = f(x)
    obj = get_from_db(object_id, …)
    if obj:
        result.append(obj)
return result

to

object_ids = [f(x) for x in xs]
return find_in_db({"_id": {"$in": object_ids}}, …)

I.e. in that specific function, all data of one level of subregions were read in one batch. Getting four levels of subregions took only four database reads, instead of 6000. This reduced the latency significantly.
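To make the per-level batching concrete, here is a minimal, self-contained sketch. The function names, the child_ids_of accessor and the toy data are all invented for illustration; only the query shape matches the find_in_db call above.

def find_regions_by_level(root, max_level, find_in_db, child_ids_of):
    """Sketch: one batched read per level of subregions.

    child_ids_of extracts the child-region IDs from a region document;
    find_in_db performs one batched query (like the $in query above).
    """
    result = []
    frontier = [root]
    for _ in range(max_level):
        object_ids = [i for region in frontier for i in child_ids_of(region)]
        children = find_in_db({"_id": {"$in": object_ids}})
        if not children:
            break
        result.extend(children)
        frontier = children
    return result

# Toy stand-in data and query function, only to make the sketch runnable
REGIONS = {
    1: {"_id": 1, "children": [2, 3]},
    2: {"_id": 2, "children": [4]},
    3: {"_id": 3, "children": []},
    4: {"_id": 4, "children": []},
}

def fake_find_in_db(query):
    return [REGIONS[i] for i in query["_id"]["$in"] if i in REGIONS]

print(find_regions_by_level(REGIONS[1], 4, fake_find_in_db,
                            lambda region: region["children"]))

With this shape, the number of database round trips equals the number of levels, which is why four levels of subregions needed only four reads.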

In 30 minutes, the call time was again reduced, from 30 seconds to 14 seconds. Not bad!

Again, the profiler showed that database reading was still the bottleneck. I ran more experiments and found that the data object could be sizeable, whereas we did not always need all the data fields. We might only need, say, 100 bytes from each record, but the average size of each region was more than 50 KB. The functions involved always read the full record, something equivalent to the traditional SQL statement ‘SELECT * FROM ...’. It was convenient, but not efficient. The MongoDB API provided a projection parameter, which allowed callers to specify which fields to read from the collection, so I tried it. We had the infrastructure in place, and it was not very difficult. It took me about an hour to make it fully work, as many functions needed to be changed to pass the (optional) projection/field names around. When it finally worked, the result was stunning: if one only needed the basic fields about the regions, the call time could be less than 2 seconds. Terrific!
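Roughly speaking, at the pymongo level the change amounts to passing a projection to find. The sketch below uses invented database, collection and field names; our code went through internal wrapper functions rather than raw pymongo calls.

from pymongo import MongoClient

client = MongoClient()               # assumes a local MongoDB instance
regions = client["gis"]["regions"]   # hypothetical database/collection names

object_ids = []  # in real code, the IDs gathered for one level of subregions

# Without a projection, every record is transferred in full ('SELECT *')
full_docs = regions.find({"_id": {"$in": object_ids}})

# With a projection, only the named fields (plus _id) are transferred
basic_docs = regions.find({"_id": {"$in": object_ids}},
                          projection={"name": 1, "level": 1, "parent_id": 1})

Combined with the batched $in query, this is what brought the per-record transfer down from tens of kilobytes to roughly the hundred bytes we actually needed.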

While Python is not a performant language, and I still like C++, I am glad that Python was chosen for this project. The performance improvement from using C++ would have been negligible when the call time was more than 50 seconds, and still small after the time was brought down to less than 2 seconds. Meanwhile, it would have been simply impossible for me to refactor the code and achieve the same performance in two hours if the code had been written in C++. I highly doubt I could have finished the job even in a full day. I would probably have been fighting with the compiler and type system most of the time, instead of focusing on the logic and testing.

Life is short—choose your language wisely.


  1. Being able to profile Python programs easily was actually the main reason I purchased a professional licence of PyCharm, instead of just using the Community Edition. 

Pipenv and Relocatable Virtual Environments

Pipenv is a very useful tool for creating and maintaining independent Python working environments. Using it is a breeze. There are enough online tutorials about it, so I will only talk about one specific thing in this article: how to move a virtual environment to another machine.

The reason I need to make virtual environments movable is that our clients do not usually allow direct Internet access in production environments, so we cannot install packages from online sources on production servers. They also often enforce a certain directory structure. So we need to prepare the environment in our test environment, and it would be better if we did not need to worry about where we put the result on the production server. Virtual environments, especially with the help of Pipenv, seem to provide a nice and painless way of achieving this effect—if we can just make the result of pipenv install movable, or, to use the virtualenv term, relocatable.

virtualenv is already able to make most of the virtual environment relocatable. When working with Pipenv, it can be as simple as

virtualenv --relocatable `pipenv --venv`

There are two problems, though:

  • The activate script is not made relocatable: it still hard-codes the absolute path of the virtual environment in the VIRTUAL_ENV variable, which breaks once the environment is moved.
  • The lib64 symbolic link inside the virtual environment is an absolute link, which also breaks after a move.1

They are not difficult to solve, and we can conquer them one by one.

As pointed out in the issue discussion, one only needs to replace one line in activate to make it relocatable. What is originally

VIRTUAL_ENV="/home/yongwei/.local/share/virtualenvs/something--PD5l8nP"

should be changed to

VIRTUAL_ENV=$(cd $(dirname "$BASH_SOURCE"); dirname `pwd`)

To be on the safe side, I look for exactly that line and replace it, so some sed tricks are needed. I also need to take care of the differences between BSD sed and GNU sed, but that is a problem I had already solved before.

The second problem is even easier. Creating a new relative symlink solves the problem.

I’ll share the final result here: a simple script that can make a virtual environment relocatable, as well as create a tarball from it. The archive name has ‘-venv-platform’ as its suffix, but the archive does not include a root directory. Keep this in mind when you unpack the tarball.

#!/bin/sh
# Make the current Pipenv virtual environment relocatable and pack it
# into a tarball in the current directory.

# Pick the right in-place option for GNU sed vs BSD sed
case $(sed --version 2>&1) in
  *GNU*) sed_i () { sed -i "$@"; };;
  *) sed_i () { sed -i '' "$@"; };;
esac

# Escape a string so that it can be used literally in a sed pattern
sed_escape() {
  echo "$1" | sed -e 's/[]\/$*.^[]/\\&/g'
}

VENV_PATH=`pipenv --venv`
if [ $? -ne 0 ]; then
  exit 1
fi
virtualenv --relocatable "$VENV_PATH"

VENV_PATH_ESC=`sed_escape "$VENV_PATH"`
RUN_PATH=`pwd`
BASE_NAME=`basename "$RUN_PATH"`
PLATFORM=`python -c 'import sys; print(sys.platform)'`
cd "$VENV_PATH"
# Replace the hard-coded VIRTUAL_ENV path in activate with one computed
# relative to the location of the activate script itself
sed_i "s/^VIRTUAL_ENV=\"$VENV_PATH_ESC\"/VIRTUAL_ENV=\$(cd \$(dirname \"\$BASH_SOURCE\"); dirname \`pwd\`)/" bin/activate
# Recreate lib64 as a relative symlink
[ -h lib64 ] && rm -f lib64 && ln -s lib lib64
tar cvfz "$RUN_PATH/$BASE_NAME-venv-$PLATFORM.tar.gz" .

After running the script, I can copy the resulting tarball to another machine running the same OS, unpack it, and then either use the activate script or set the PYTHONPATH environment variable to make my Python program work. Problem solved.
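If you would rather not touch activate or the shell configuration at all, roughly the same effect as setting PYTHONPATH can be achieved from inside the program. This is only a sketch; the unpacking location below is made up:

import site
import sys

# Hypothetical directory where the venv tarball was unpacked
venv_dir = "/opt/myapp/venv"

# Add the environment's site-packages to sys.path (and process its .pth
# files), roughly what pointing PYTHONPATH at this directory would do
site.addsitedir("{}/lib/python{}.{}/site-packages".format(
    venv_dir, sys.version_info[0], sys.version_info[1]))

Unlike PYTHONPATH, site.addsitedir also processes .pth files, which some packages rely on.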

A last note: I have not touched activate.csh and activate.fish, as I do not use them. If you did, you would need to update the script accordingly. That would be your homework as an open-source user. 😼


  1. I tried removing it, and Pipenv was very unhappy. 

A VPN Issue with MTU

One environment I have access to uses a PPTP VPN to allow people to connect to the site remotely.1 One thing that had been troublesome was that people kept complaining that they could not access the Internet after connecting to the VPN.

I was not concerned in the beginning, as my test showed no problem: it seemed my browser had no problems opening http://www.taobao.com/ after connecting to the VPN. Actually, my test was flawed and limited, as I only accessed one or two sites in a virtual machine (my laptop ran a macOS version that no longer supported PPTP). More on this in a moment.

Our previous VPN server had a problem, and we switched to the Linux-based pptpd last week. After the set-up was done, I checked with other users and found the web access problem persisted. This time I sat down with one user and looked into the problem together. It turned out that, after connecting to the VPN, he was able to access http://www.taobao.com/, but not http://www.baidu.com/, which was actually the default web page for many people. And I could reproduce this behaviour in my virtual machine. . . .

My experience told me that it looked very much like an MTU-related problem (I have encountered plenty of MTU-related networking problems). I checked the server-side script, and found it already clamped the MSS value to 1356, while the MTU value for the PPP connections was 1396: an MSS of 1356 plus 20 bytes of TCP header and 20 bytes of IP header is exactly 1396. All seemed quite reasonable.

When in doubt about a network problem, a sniffer should always be in your arsenal. I launched tcpdump on the server, and analysed the result in Wireshark. Things soon became clearer.

For the traffic between the pptpd server and Baidu (when a client visited the web site), the following things occurred:

  1. The pptpd server started a connection to the web server, with MSS = 1356
  2. The web server responded with MSS = 1380
  3. The web server soon sent a packet as large as 1420 bytes (TCP payload length is 1380 bytes)
  4. The pptpd server responded with ICMP Destination unreachable (Fragmentation needed), in which the next-hop MTU of 1396 was reported
  5. The above two steps were repeated, and nothing was improved

For the traffic between the pptpd server and Taobao, things were slightly different:

  1. The pptpd server started a connection to the web server, with MSS = 1356
  2. The web server responded with MSS = 1380
  3. The web server soon sent a packet as large as 1420 bytes (TCP payload length is 1380 bytes)
  4. The pptpd server responded with ICMP Destination unreachable (Fragmentation needed), in which the next-hop MTU of 1396 was reported
  5. A few milliseconds later, the web server began to send TCP packets no larger than 1396 bytes
  6. Now the pptpd server and the web server continued to exchange packets without any problems

Apparently there was an ICMP black hole between our server and the Baidu server, but not between our server and the Taobao server.

Once the issue was found, the solution was easy. Initially, I just ran a cron job to check all the PPP connections and change their MTU value to 1468 (though 1420 should be good enough in my case). The better way, of course, was to change the MTU on new client connections. It could be done via the script /etc/ppp/ip-up, but the environment variable name for the network interface that I first found on the web turned out to be wrong. After dumping all the existing environment variables in the script, I finally got the correct name. The following line in /etc/ppp/ip-up was able to get the job done:

ifconfig $IFNAME mtu 1468

Only one thing remained mysterious now: why didn’t the MSS value in the server script take effect? A packet capture on a server I could control confirmed my guess: the MSS value in the TCP SYN packets from our pptpd server had been rewritten to 1380 somewhere along the way. It could be the router, or the ISP. Whatever it was, it really should not have adjusted the MSS value upwards.

In summary, problems occurred because:

  • The MSS value was increased in transit, but pptpd did not know it and still enforced a small MTU value on the PPP connections, which no longer matched the advertised MSS
  • Path MTU discovery also failed because of the existence of ICMP black holes

Bad things can always happen, and we sometimes just have to find a way around.


  1. PPTP is not considered secure enough, but is quite convenient, especially because UDP port 500 is not usable in our case due to a router compatibility problem. 😔