One environment I have access to uses a PPTP VPN to allow people to connect to the site remotely.[1] A recurring annoyance was that people kept complaining that they could not access the Internet after connecting to the VPN.
I was not concerned in the beginning, as my own test showed nothing wrong: my browser opened http://www.taobao.com/ without trouble after connecting to the VPN. In fact, my test was flawed and limited, as I only accessed one or two sites in a virtual machine (my laptop ran a macOS version that no longer supported PPTP). More on this shortly.
Our previous VPN server had a problem, and we switched to the Linux-based pptpd last week. After the set-up was done, I checked with other users and found the web access problem persisted. This time I sat down with one user and looked into the problem together. It turned out that, after connecting to the VPN, he was able to access http://www.taobao.com/, but not http://www.baidu.com/, which was actually the default web page for many people. And I could reproduce this behaviour in my virtual machine…
My experience told me that this looked very much like an MTU-related problem (I have encountered plenty of MTU-related networking problems). I checked the server-side script, and found it already clamped the MSS value to 1356, while the MTU value for the PPP connections was 1396, i.e. the MSS left exactly 40 bytes for the IP and TCP headers. All seemed quite reasonable.
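The relationship between these two numbers can be checked with a few lines of shell arithmetic. This is only a sketch: it assumes plain IPv4 and TCP headers of 20 bytes each with no options, and the helper names are made up for illustration.

```shell
#!/bin/sh
# Largest TCP segment that fits a given MTU,
# assuming a 20-byte IPv4 header and a 20-byte TCP header (no options).
mss_for_mtu() {
    echo $(( $1 - 40 ))
}

# Size of the IP packet that carries one full segment of the given MSS.
packet_for_mss() {
    echo $(( $1 + 40 ))
}

mss_for_mtu 1396      # prints 1356, the clamp value in the server script
packet_for_mss 1380   # prints 1420, too large for a 1396-byte MTU
```

The second call shows why a peer that believes the MSS is 1380 will send packets the PPP link cannot carry unfragmented.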
When in doubt about a network problem, a sniffer should always be in your arsenal. I launched tcpdump on the server, and analysed the result in Wireshark. Things soon became clearer.
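A capture of this kind can be sketched as follows. The interface name, the output path, the helper name, and the filter itself are assumptions for illustration, not the exact command that was run on the server; the filter keeps web traffic plus ICMP "fragmentation needed" messages (type 3, code 4).

```shell
#!/bin/sh
# BPF filter: HTTP traffic, plus ICMP destination unreachable with code 4
# (fragmentation needed), which is what path MTU discovery relies on.
CAPTURE_FILTER='tcp port 80 or (icmp[icmptype] == icmp-unreach and icmp[icmpcode] == 4)'

# Hypothetical helper: capture on interface $1 into file $2 for Wireshark.
#   -n    do not resolve names
#   -s 0  keep whole packets, not just the first bytes
#   -w    write raw packets to a file instead of printing them
capture_web_and_fragneeded() {
    tcpdump -n -s 0 -i "$1" -w "$2" "$CAPTURE_FILTER"
}

# Example invocation (needs root; eth0 and the path are placeholders):
# capture_web_and_fragneeded eth0 /tmp/pptpd-web.pcap
```

The resulting file can then be opened in Wireshark, where the MSS option of each SYN and the next-hop MTU of each ICMP message are shown in the packet details.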
For the traffic between the pptpd server and Baidu (when a client visited the web site), the following things occurred:
- The pptpd server started a connection to the web server, with MSS = 1356
- The web server responded with MSS = 1380
- The web server soon sent a packet as large as 1420 bytes (TCP payload length is 1380 bytes)
- The pptpd server responded with ICMP Destination unreachable (Fragmentation needed), in which the next-hop MTU of 1396 was reported
- The above two steps were repeated, and nothing improved
For the traffic between the pptpd server and Taobao, things were slightly different:
- The pptpd server started a connection to the web server, with MSS = 1356
- The web server responded with MSS = 1380
- The web server soon sent a packet as large as 1420 bytes (TCP payload length is 1380 bytes)
- The pptpd server responded with ICMP Destination unreachable (Fragmentation needed), in which the next-hop MTU of 1396 was reported
- A few milliseconds later, the web server began to send TCP packets no larger than 1396 bytes
- Now the pptpd server and the web server continued to exchange packets without any problems
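Apparently there was an ICMP black hole between our server and the Baidu server (something on the path dropped or ignored our Fragmentation needed messages, so the sender never learnt the smaller MTU), but not between our server and the Taobao server.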
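The difference between the two traces boils down to one question: do oversized packets keep arriving after the first Fragmentation needed report? A rough sketch of that check is below. The function name is hypothetical, and the packet sizes fed to it are illustrative values modelled on the two traces, not data exported from the real capture.

```shell
#!/bin/sh
# Reads IP packet lengths (one per line) seen AFTER the first
# "fragmentation needed" report, and compares them with the reported MTU ($1).
classify_peer() {
    awk -v mtu="$1" '
        $1 + 0 > mtu + 0 { over++ }
        END { print (over > 0 ? "blackhole-suspected" : "pmtud-ok") }
    '
}

# Baidu-like trace: the sender keeps retrying with 1420-byte packets.
printf '1420\n1420\n1420\n' | classify_peer 1396

# Taobao-like trace: the sender drops to packets that fit the MTU.
printf '1396\n1396\n1100\n' | classify_peer 1396
```

The first call prints blackhole-suspected and the second prints pmtud-ok.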
Once the issue was found, the solution was easy. Initially, I just ran a cron job to check all the PPP connections and change their MTU value to 1468 (though 1420 should be good enough in my case). The better way, of course, was to change the MTU on new client connections. It could be done via the script /etc/ppp/ip-up, but the environment variable name for the network interface, which I found on the web, was wrong in the beginning. After dumping all the existing environment variables in the script, I finally got the correct name. The following line in /etc/ppp/ip-up was able to get the job done:
ifconfig $IFNAME mtu 1468
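A slightly more defensive version of that fragment might look like the sketch below. The helper name and the dump path are hypothetical; the only things taken from the setup above are the IFNAME variable, which pppd exports to the script, and the MTU value of 1468.

```shell
#!/bin/sh
# Sketch of a fragment for /etc/ppp/ip-up, which pppd runs when a link comes up.
PPP_MTU=1468

# Hypothetical helper: set the MTU of one PPP interface.
raise_ppp_mtu() {
    ifconfig "$1" mtu "$PPP_MTU"
}

# Debugging aid: dump the environment pppd provides to find the right
# variable name (the path is a placeholder).
# env > /tmp/ip-up-env.txt

# Only act when pppd has actually told us which interface came up.
if [ -n "${IFNAME:-}" ]; then
    raise_ppp_mtu "$IFNAME"
fi
```

The guard means the fragment does nothing if it is ever run by hand without IFNAME set, instead of calling ifconfig with an empty interface name.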
Only one thing remained mysterious now: why didn’t the MSS value in the server script take effect? A packet capture on a remote server I controlled confirmed what I had guessed, i.e. the MSS value in the TCP SYN packets from our pptpd server had been rewritten to 1380 somewhere along the way. It could be the router, or the ISP. Whatever it was, it really should not have raised the value.
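Checking the MSS that actually arrives at the far end can be sketched like this. The function name is hypothetical, and the sample line is hand-written in the style of tcpdump's output, using documentation addresses; it is not taken from the real capture.

```shell
#!/bin/sh
# On the remote machine, SYN packets can be printed with something like:
#   tcpdump -n -v 'tcp[tcpflags] & tcp-syn != 0'
# tcpdump shows the TCP options of each SYN, including "mss <value>".

# Pull the MSS value out of tcpdump-style lines read from stdin.
extract_mss() {
    sed -n 's/.*mss \([0-9][0-9]*\).*/\1/p'
}

# Hand-written sample line for illustration.
sample='IP 192.0.2.10.40000 > 198.51.100.20.80: Flags [S], seq 1, win 29200, options [mss 1380,sackOK,nop,wscale 7], length 0'
printf '%s\n' "$sample" | extract_mss
```

If the value printed on the remote side is larger than the one clamped on the pptpd server, something in between has rewritten the option.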
In summary, problems occurred because:
- The MSS value was increased, but pptpd did not know and still enforced a small MTU value on the PPP connections, which no longer matched the MSS
- Path MTU discovery also failed because of the existence of ICMP black holes
Bad things can always happen, and we sometimes just have to find a way around.
[1] PPTP is not considered secure enough, but it is quite convenient, especially because UDP port 500 is not usable in our case due to a router compatibility problem. 😔