A Journey of Purely Static Linking

As I mentioned last time, I found Microsoft has really messed up its console Unicode support when the C runtime DLL (as versus the static runtime library) is used. So I decided to have a try with linking everything statically in my project that uses C++ REST SDK (a.k.a. cpprestsdk). This is not normally recommended, but in my case it has two obvious advantages:

  • It would solve the Unicode I/O problem.
  • It would be possible to ship just the binaries without requiring the target PC to install the Microsoft Visual C++ runtime.

It took me several hours to get it rolling, but I felt it was worthwhile.

Before I start, I need to mention that cpprestsdk has a solution file that supports building a static library. It turned out not satisfactory:

  • It used NuGet packages for Boost and OpenSSL, and both versions were out of date. Worse, my Visual Studio 2017 IDE hung while I tried to update the packages. Really a nuisance.
  • The static library, as well as all its dependencies like Boost and OpenSSL, still uses the C runtime DLL. I figured it might be easier to go completely on my own.

Prerequisites

Boost

This part is straightforward. After going into the Boost directory, I only need to type (I use version 1.65.1):

bootstrap.bat
.\b2.exe toolset=msvc -j 2 --with-chrono --with-date_time --with-regex --with-system --with-thread release link=static runtime-link=static stage

OpenSSL

As I already have Perl and NASM installed, installing OpenSSL is trivial too (I use version 1.0.2l):

perl Configure VC-WIN32 --prefix=C:/Libraries/OpenSSL
ms\do_nasm
nmake -f ms\nt.mak
nmake -f ms\nt.mak install

zlib

This part requires a small change to the build script (for version 1.2.11). I need to open win32\Makefile.msc and change all occurrences of ‘-MD’ to ‘-MT’. Then these commands will work:

nmake -f win32\Makefile.msc zlib.lib
mkdir C:\Libraries\zlib
mkdir C:\Libraries\zlib\include
mkdir C:\Libraries\zlib\lib
copy zconf.h C:\Libraries\zlib\include
copy zlib.h C:\Libraries\zlib\include
copy zlib.lib C:\Libraries\zlib\lib

Building C++ REST SDK

We need to set some environment variables to help the CMake build system find where the libraries are. I set them in ‘Control Panel > System > Advanced system settings > Environment variables’:1

BOOST_ROOT=C:\src\boost_1_65_1
OPENSSL_ROOT_DIR=C:\Libraries\OpenSSL
INCLUDE=C:\Libraries\zlib\include
LIB=C:\Libraries\zlib\lib

(The above setting assumes Boost is unpacked under C:\src.)

We would need to create the solution files for the current environment under cpprestsdk:

cd Release
mkdir build
cd build
cmake ..

If the environment is set correctly, the last command should succeed and report no errors. A cpprest.sln should be generated now.

We then open this solution file. As we only need the release files, we should change the ‘Solution Configuration’ from ‘Debug’ to ‘Release’. After that, we need to find the project ‘cpprest’ in ‘Solution Explorer’, go to its ‘Properties’, and make the following changes under ‘General’:

  • Set Target Name to ‘cpprest’.
  • Set Target Extension to ‘.lib’.
  • Set Configuration Type to ‘Static library (.lib)’.

And the most important change we need under ‘C/C++ > Code Generation’:

  • Set Runtime Library to ‘Multi-threaded (/MT)’.

Click on ‘OK’ to accept the changes. Then we can build this project.

Like zlib, we need to copy the header and library files to a new path, and add the include and lib directories to environment variables INCLUDE and LIB, respectively. In my case, I have:

INCLUDE=C:\Libraries\cpprestsdk\include;C:\Libraries\zlib\include
LIB=C:\Libraries\cpprestsdk\lib;C:\Libraries\zlib\lib

Change to my Project

Of course, the cpprestsdk-based project needs to be adjusted too. I will first show the diff, and then give some explanations:

--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -24,14 +24,25 @@ set(CMAKE_CXX_FLAGS "${ELPP_FLAGS}")

 if(WIN32)
 set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS}\
- -DELPP_UNICODE -D_WIN32_WINNT=0x0601")
-set(USED_LIBS Boost::dynamic_linking ${Boost_DATE_TIME_LIBRARY} ${Boost_SYSTEM_LIBRARY} ${Boost_THREAD_LIBRARY})
+ -DELPP_UNICODE -D_WIN32_WINNT=0x0601 -D_NO_ASYNCRTIMP")
+set(USED_LIBS Winhttp httpapi bcrypt crypt32 zlib)
 else(WIN32)
 set(USED_LIBS "-lcrypto" OpenSSL::SSL ${Boost_CHRONO_LIBRARY} ${Boost_SYSTEM_LIBRARY} ${Boost_THREAD_LIBRARY})
 endif(WIN32)

 if(MSVC)
 set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /EHsc /W3")
+set(CompilerFlags
+        CMAKE_CXX_FLAGS
+        CMAKE_CXX_FLAGS_DEBUG
+        CMAKE_CXX_FLAGS_RELEASE
+        CMAKE_C_FLAGS
+        CMAKE_C_FLAGS_DEBUG
+        CMAKE_C_FLAGS_RELEASE)
+foreach(CompilerFlag ${CompilerFlags})
+  string(REPLACE "/MD" "/MT" ${CompilerFlag}
+         "${${CompilerFlag}}")
+endforeach()
 else(MSVC)
 set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -W -Wall -Wfatal-errors")
 endif(MSVC)

There are two blocks of changes. In the first block, one can see that the Boost libraries are no longer needed, but, instead, one needs to link the Windows dependencies of cpprestsdk (I found the list in Release\build\src\cpprest.vcxproj), as well as zlib. One also needs to explicitly define _NO_ASYNCRTIMP so that the cpprestsdk functions will be not treated as dllimport.

As CMake defaults to using ‘/MD’, the second block of changes replaces all occurrences of ‘/MD’ with ‘/MT’ in the compiler flags.2 With these changes, I am able to generate an executable without any external dependencies.

A Gotcha

I am now used to using cmake without specifying the ‘-G’ option on Windows. By default, CMake generates Visual Studio project files: they have several advantages, including multiple configurations (selectable on the MSBuild command line like ‘/p:Configuration=Release’), and parallel building (say, using ‘/m:2’ to take advantage of two processor cores). Neither is possible with nmake. However, the executables built by this method still behave abnormally regarding outputting non-ASCII characters. Actually, I believe everything is still OK at the linking stage, but the build process then touches the executables in some mysterious way and the result becomes bad. I am not familiar with MSBuild well enough to manipulate the result, so I am going back to using ‘cmake -G "NMake Makefiles" -DCMAKE_BUILD_TYPE=Release’ followed by ‘nmake’ for now.


  1. CMake can recognize Boost and OpenSSL by some known environment variables. I failed to find one that really worked for zlib, so the INCLUDE and LIB variables need to be explicitly set. 
  2. This technique is shamelessly copied from a StackOverflow answer
Advertisements

Another Microsoft Unicode I/O Problem

I encountered an annoying bug in Visual C++ 2017 recently. It started when I found my favourite logging library, Easylogging++, output Chinese as garbage characters on Windows. Checking the documentation carefully, I noted that I should have used the macro START_EASYLOGGINGPP. It turned out to be worse: all output starting from the Chinese character was gone. Puzzled but busy, I put it down and worked on something else.

I spend another hour of detective work on this issue today. The result was quite surprising.

  • First, it is not an issue with Easylogging++. The problem can occur if I purely use std::wcout.
  • Second, the magical thing about START_EASYLOGGINGPP is that it will invoke std::locale::global(std::locale("")). This is the switch that leads to the different behaviour.
  • Myteriously, with the correct locale setting, I can get the correct result with both std::wcout and Easylogging++ in a test program. I was not able to get it working in my real project.
  • Finally, it turns out that the difference above is caused by /MT vs. /MD! The former (default if neither is specified on the command line) tells the Visual C++ compiler to use the static multi-threading library, and the latter (set by default in Visual Studio projects) tells the compiler to use the dynamic multi-threading library.

People may remember that I wrote about MSVCRT.DLL Console I/O Bug. While Visual C++ 2013 shows consistent behaviour between /MT and /MD, Visual C++ 2015 and 2017 exhibit the same woeful bug when /MD is specified on the command line. This is something perplexingly weird: it seems someone at Microsoft messed up with the MSVCRT.DLL shipped with Windows first (around 2006), and then the problem spread to the Visual C++ runtime DLL after nearly a decade!

I am using many modern C++ features, so I definitely do not want to go back to Visual C++ 2013 for the new project. It seems I have to tolerate garbage characters in the log for now. Meanwhile, I submitted a bug to Microsoft. Given that I have a bug report that is deferred for four years, I am not very hopeful. But let us wait and see.

MSVCRT.DLL Console I/O Bug

I have been quite annoyed by a Windows bug that causes a huge number of open-source command-line tools to choke on multi-byte characters at the Windows Command Prompt. The MSVCRT.DLL shipped with Windows Vista or later has been having big troubles with such characters. While Microsoft tools and compilers after Visual Studio 6.0 do not use this DLL anymore, the GNU tools on Windows, usually built by MinGW or Mingw-w64, are dependent on this DLL and suffer from this problem. One cannot even use ls to display a Chinese file name, when the system locale is set to Chinese.

The following simple code snippet demonstrates the problem:

#include <locale.h>
#include <stdio.h>

char msg[] = "\xd7\xd6\xb7\xfb Char";
wchar_t wmsg[] = L"字符 char";

void Test1()
{
    char* ptr = msg;
    printf("Test 1: ");
    while (*ptr) {
        putchar(*ptr++);
    }
    putchar('\n');
}

void Test2()
{
    printf("Test 2: ");
    puts(msg);
}

void Test3()
{
    wchar_t* ptr = wmsg;
    printf("Test 3: ");
    while (*ptr) {
        putwchar(*ptr++);
    }
    putwchar(L'\n');
}

int main()
{
    char buffer[32];
    puts("Default C locale");
    Test1();
    Test2();
    Test3();
    putchar('\n');
    puts("Chinese locale");
    setlocale(LC_CTYPE, "Chinese_China.936");
    Test1();
    Test2();
    Test3();
    putchar('\n');
    puts("English locale");
    setlocale(LC_CTYPE, "English_United States.1252");
    Test1();
    Test2();
    Test3();
}

When built with a modern version of Visual Studio, it gives the expected output (console code page is 936):

Default C locale
Test 1: 字符 Char
Test 2: 字符 Char
Test 3:  char

Chinese locale
Test 1: 字符 Char
Test 2: 字符 Char
Test 3: 字符 char

English locale
Test 1: ×?·? Char
Test 2: ×?·? Char
Test 3:  char

I.e. when the locale is the default ‘C’, the ‘ANSI’ version of character output routines can successfully output single-byte and multi-byte characters, while putwchar, the ‘Unicode’ version of putchar, fails at the multi-byte characters (reasonably, as the C locale does not understand how to translate Chinese characters). When the locale is set correctly to code page 936 (Simplified Chinese), everything is correct. When the locale is set to code page 1252 (Latin), the corresponding characters at the same code points of the original Chinese characters (‘×Ö·û’ instead of ‘字符’) are shown with the ‘ANSI’ routines, though ‘Ö’ (\xd6) and ‘û’ (\xfb) are shown as ‘?’ because they do not exist in code page 936. The Chinese characters, of course, cannot be shown with putwchar in this locale, just like the C locale.

When built with GCC, the result is woeful:

Default C locale
Test 1: 字符 Char
Test 2: 字符 Char
Test 3:  char

Chinese locale
Test 1:  Char
Test 2: 字符 Char
Test 3:  char

English locale
Test 1: ×?·? Char
Test 2: ×?·? Char
Test 3:  char

Two things are worth noticing:

  • putchar stops working for Chinese when the locale is correctly set.
  • putwchar never works for Chinese.

Horrible and thoroughly broken! (Keep in mind that Microsoft is to blame here. You can compile the program with MSVC 6.0 using the /MD option, and the result will be the same—an executable that works in Windows XP but not in Windows Vista or later.)

I attacked this problem a few years ago, and tried some workarounds. The solution I came up with looked so fragile that I did not push it up to the MinGW library. It was a personal failure, as well as an indication that working around a buggy implementation without affecting the application code can be very difficult or just impossible.


The problem occurs only with the console, where the Microsoft runtime does some translation (broken in MSVCRT.DLL, but OK in newer MSVC runtimes). It vanishes when users redirect the output from the console. So one solution is not to use the Command Prompt at all. The Cygwin Terminal may be a good choice, especially for people familiar with Linux/Unix. I have Cygwin installed, but sometimes I still want to do things in the more Windows-y way. I figured I could make a small tool (like cat) to get the input from stdin, and forward everything to stdout. As long as this tool is compiled by a Microsoft compiler, things should be OK. Then I thought a script could be faster. Finally, I came up with putting the following line into an mbf.bat:

@perl -p -e ""

(Perl is still wonderful for text processing, even in this ‘empty’ program!)

Now the executables built by GCC and MSVC give the same result, if we append ‘|mbf’ on the command line:

Default C locale
Test 1: 字符 Char
Test 2: 字符 Char
Test 3:  char

Chinese locale
Test 1: 字符 Char
Test 2: 字符 Char
Test 3: 字符 char

English locale
Test 1: 字符 Char
Test 2: 字符 Char
Test 3:  char

If you know how to make Microsoft fix the DLL problem, do it. Otherwise you know at least a workaround now. 🙂


The following code is my original partial solution to the problem, and it may be helpful to your GCC-based project. I don’t claim any copyright of it, nor will I take any responsibilities for its use.

/* mingw_mbcs_safe_io.c */

#include <mbctype.h>
#include <stdio.h>

/* Output functions that work with the Windows 7+ MSVCRT.DLL
 * for multi-byte characters on the console.  Please notice
 * that buffering must not be enabled for the console (e.g.
 * by calling setvbuf); otherwise weird things may occur. */

int __cdecl _mgw_flsbuf(int ch, FILE* fp)
{
  static char lead = '\0';
  int ret = 1;

  if (lead != '\0')
    {
      ret = fprintf(fp, "%c%c", lead, ch);
      lead = '\0';
      if (ret < 0)
        return EOF;
    }
  else if (_ismbblead(ch))
    lead = ch;
  else
    return _flsbuf(ch, fp);

  return ch;
}

int __cdecl putc(int ch, FILE* fp)
{
  static __thread char lead = '\0';
  int ret = 1;

  if (lead != '\0')
    {
      ret = fprintf(fp, "%c%c", lead, ch);
      lead = '\0';
    }
  else if (_ismbblead(ch))
    lead = ch;
  else
    ret = fprintf(fp, "%c", ch);

  if (ret < 0)
    return EOF;
  else
    return ch;
}

int __cdecl putchar(int ch)
{
  putc(ch, stdout);
}

int __cdecl _mgwrt_putchar(int ch)
{
  putc(ch, stdout);
}