MSVCRT.DLL Console I/O Bug

I have been quite annoyed by a Windows bug that causes a huge number of open-source command-line tools to choke on multi-byte characters at the Windows Command Prompt. The MSVCRT.DLL shipped with Windows Vista and later has serious trouble with such characters. While Microsoft tools and compilers after Visual Studio 6.0 no longer use this DLL, the GNU tools on Windows, usually built with MinGW or MinGW-w64, depend on it and suffer from this problem. One cannot even use ls to display a Chinese file name when the system locale is set to Chinese.

The following simple code snippet demonstrates the problem:

#include <locale.h>
#include <stdio.h>
#include <wchar.h>

char msg[] = "\xd7\xd6\xb7\xfb Char";
wchar_t wmsg[] = L"字符 char";

void Test1()
{
    char* ptr = msg;
    printf("Test 1: ");
    while (*ptr) {
        putchar(*ptr++);
    }
    putchar('\n');
}

void Test2()
{
    printf("Test 2: ");
    puts(msg);
}

void Test3()
{
    wchar_t* ptr = wmsg;
    printf("Test 3: ");
    while (*ptr) {
        putwchar(*ptr++);
    }
    putwchar(L'\n');
}

int main()
{
    puts("Default C locale");
    Test1();
    Test2();
    Test3();
    putchar('\n');
    puts("Chinese locale");
    setlocale(LC_CTYPE, "Chinese_China.936");
    Test1();
    Test2();
    Test3();
    putchar('\n');
    puts("English locale");
    setlocale(LC_CTYPE, "English_United States.1252");
    Test1();
    Test2();
    Test3();
}

When built with a modern version of Visual Studio, it gives the expected output (console code page is 936):

Default C locale
Test 1: 字符 Char
Test 2: 字符 Char
Test 3:  char

Chinese locale
Test 1: 字符 Char
Test 2: 字符 Char
Test 3: 字符 char

English locale
Test 1: ×?·? Char
Test 2: ×?·? Char
Test 3:  char

That is, when the locale is the default ‘C’, the ‘ANSI’ versions of the character output routines successfully output both single-byte and multi-byte characters, while putwchar, the ‘Unicode’ version of putchar, fails on the Chinese characters (reasonably, as the C locale does not know how to translate them). When the locale is correctly set to code page 936 (Simplified Chinese), everything works. When the locale is set to code page 1252 (Latin), the ‘ANSI’ routines show the characters that sit at the same code points in that code page (‘×Ö·û’ instead of ‘字符’), though ‘Ö’ (\xd6) and ‘û’ (\xfb) appear as ‘?’ because they do not exist in code page 936, the console code page. The Chinese characters, of course, cannot be shown with putwchar in this locale, just as in the C locale.

When built with GCC, the result is woeful:

Default C locale
Test 1: 字符 Char
Test 2: 字符 Char
Test 3:  char

Chinese locale
Test 1:  Char
Test 2: 字符 Char
Test 3:  char

English locale
Test 1: ×?·? Char
Test 2: ×?·? Char
Test 3:  char

Two things are worth noticing:

  • putchar stops working for Chinese when the locale is correctly set.
  • putwchar never works for Chinese.

Horrible and thoroughly broken! (Keep in mind that Microsoft is to blame here. You can compile the program with MSVC 6.0 using the /MD option, and the result will be the same—an executable that works in Windows XP but not in Windows Vista or later.)

I attacked this problem a few years ago, and tried some workarounds. The solution I came up with looked so fragile that I did not push it up to the MinGW library. It was a personal failure, as well as an indication that working around a buggy implementation without affecting the application code can be very difficult or just impossible.


The problem occurs only with the console, where the Microsoft runtime does some translation (broken in MSVCRT.DLL, but OK in newer MSVC runtimes). It vanishes when users redirect the output from the console. So one solution is not to use the Command Prompt at all. The Cygwin Terminal may be a good choice, especially for people familiar with Linux/Unix. I have Cygwin installed, but sometimes I still want to do things in the more Windows-y way. I figured I could make a small tool (like cat) to get the input from stdin, and forward everything to stdout. As long as this tool is compiled by a Microsoft compiler, things should be OK. Then I thought a script could be faster. Finally, I came up with putting the following line into an mbf.bat:

@perl -p -e ""

(Perl is still wonderful for text processing, even in this ‘empty’ program!)
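For comparison, the small cat-like tool mentioned above could look roughly like the following C sketch. It is not the tool I actually used (I went with the Perl one-liner); copy_stream is a hypothetical helper name, and the idea only helps if the program is built against a runtime whose console output works, such as a recent MSVC one.

```c
#include <assert.h>
#include <stdio.h>

/* Copy every byte from `in` to `out` unchanged and return the number of
 * bytes copied, or -1 on a read or write error.  copy_stream is a
 * hypothetical helper name, not part of any library. */
long copy_stream(FILE *in, FILE *out)
{
    char buf[4096];
    long total = 0;
    size_t n;

    assert(in != NULL && out != NULL);
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n)
            return -1;
        total += (long)n;
    }
    /* Report failure if reading stopped on an error or flushing fails. */
    if (ferror(in) || fflush(out) != 0)
        return -1;
    return total;
}
```

A main function for the tool would then need only one statement: return copy_stream(stdin, stdout) < 0;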

Now the executables built by GCC and MSVC give the same result, if we append ‘|mbf’ on the command line:

Default C locale
Test 1: 字符 Char
Test 2: 字符 Char
Test 3:  char

Chinese locale
Test 1: 字符 Char
Test 2: 字符 Char
Test 3: 字符 char

English locale
Test 1: 字符 Char
Test 2: 字符 Char
Test 3:  char

If you know how to make Microsoft fix the DLL problem, do it. Otherwise you know at least a workaround now. 🙂
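Since the problem shows up only when output goes to the console, a program can at least detect that situation. The following is a minimal sketch, assuming the usual _isatty/_fileno functions from <io.h> on Windows and isatty/fileno elsewhere; is_console is a hypothetical helper name. It only detects the case and does not fix anything. Note that _isatty reports any character device, not just the console.

```c
#ifndef _WIN32
#define _POSIX_C_SOURCE 200809L  /* makes fileno() visible under strict -std=c99 */
#endif

#include <stdio.h>

#ifdef _WIN32
#include <io.h>                  /* _isatty */
#define IS_TTY(fd)  _isatty(fd)
#define FILE_NO(fp) _fileno(fp)
#else
#include <unistd.h>              /* isatty */
#define IS_TTY(fd)  isatty(fd)
#define FILE_NO(fp) fileno(fp)
#endif

/* Return 1 if the stream is attached to a console or terminal (more
 * precisely, a character device), and 0 if it is a file or a pipe.
 * is_console is a hypothetical name used only for this illustration. */
int is_console(FILE *fp)
{
    return IS_TTY(FILE_NO(fp)) != 0;
}
```

A tool could call is_console(stdout) at start-up and, for example, warn the user to pipe the output through mbf when it returns 1.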


The following code is my original partial solution to the problem, and it may be helpful to your GCC-based project. I don’t claim any copyright of it, nor will I take any responsibilities for its use.

/* mingw_mbcs_safe_io.c */

#include <mbctype.h>
#include <stdio.h>

/* Output functions that work with the Windows 7+ MSVCRT.DLL
 * for multi-byte characters on the console.  Please notice
 * that buffering must not be enabled for the console (e.g.
 * by calling setvbuf); otherwise weird things may occur. */

int __cdecl _mgw_flsbuf(int ch, FILE* fp)
{
  static char lead = '\0';
  int ret = 1;

  if (lead != '\0')
    {
      ret = fprintf(fp, "%c%c", lead, ch);
      lead = '\0';
      if (ret < 0)
        return EOF;
    }
  else if (_ismbblead(ch))
    lead = ch;
  else
    return _flsbuf(ch, fp);

  return ch;
}

int __cdecl putc(int ch, FILE* fp)
{
  static __thread char lead = '\0';
  int ret = 1;

  if (lead != '\0')
    {
      ret = fprintf(fp, "%c%c", lead, ch);
      lead = '\0';
    }
  else if (_ismbblead(ch))
    lead = ch;
  else
    ret = fprintf(fp, "%c", ch);

  if (ret < 0)
    return EOF;
  else
    return ch;
}

int __cdecl putchar(int ch)
{
  return putc(ch, stdout);
}

int __cdecl _mgwrt_putchar(int ch)
{
  return putc(ch, stdout);
}
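
The idea behind the code above, holding back a lead byte until its trail byte arrives and then sending the two together, can also be shown in a portable form. The sketch below is only an illustration and not part of the original workaround: is_cp936_lead and write_by_character are hypothetical names, and the lead-byte range 0x81 to 0xFE is hard-coded for code page 936, whereas the real code asks _ismbblead about the current code page.

```c
#include <stdio.h>

/* Assumption: in code page 936 (GBK) a lead byte lies in 0x81..0xFE.
 * On Windows, _ismbblead() answers this for the current code page. */
static int is_cp936_lead(unsigned char c)
{
    return c >= 0x81 && c <= 0xFE;
}

/* Write the string to `out` with one fwrite call per character, so that
 * a lead byte is never handed to the stream without its trail byte.
 * Returns the number of characters written, or -1 on a write error. */
int write_by_character(const char *s, FILE *out)
{
    int count = 0;

    while (*s != '\0') {
        size_t len = 1;
        /* A lead byte followed by another byte forms one character. */
        if (is_cp936_lead((unsigned char)*s) && s[1] != '\0')
            len = 2;
        if (fwrite(s, 1, len, out) != len)
            return -1;
        s += len;
        ++count;
    }
    return count;
}
```

For the test string "\xd7\xd6\xb7\xfb Char" this issues seven writes: two double-byte characters, the space, and the four letters. As with the code above, this matters only when the stream is unbuffered, because a buffered stream merges the writes anyway.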

Installing Clang 3.5 for Windows

I had used LLVM 3.4 on Windows for quite some time. It worked well and had all the features I needed, mostly the beautiful C++11/C++14 compiler and the easy-to-use Clang-Format. However, the C++ compiler works only when certain GCC versions are installed alongside it, and the GCC 4.6.3 I installed for Clang conflicts with the GCC 4.9 I use. The major issue is the C++ run-time library libstdc++-6.dll, which actually has many variants due to the combination of different thread models and different exception handling methods. As a result, executables generated by GCC 4.9 crash when the libstdc++-6.dll from GCC 4.6.3 appears earlier in the path, and executables generated by Clang crash when the libstdc++-6.dll from GCC 4.9 appears earlier in the path. I do not like this situation. So when I installed LLVM 3.5 recently, I tried new combinations and made sure everything works together. I would like to share the result.

Let me first list the binary files one needs to download:

I install Clang to the default location, C:\Program Files (x86)\LLVM. For the rest of this article, I assume GCC 4.9.2 is extracted to C:\ (so all files are under C:\mingw32), and GCC 4.8.2 is extracted to C:\Temp (all files are under C:\Temp\mingw32).

Although I need GCC 4.9 for the best and latest C++ features, Clang does not work with it. One can tell from the error output of Clang that it should work with the MinGW-w64 GCC 4.8.2:

ignoring nonexistent directory "C:\Program Files (x86)\LLVM\bin\..\lib\clang\3.5.0/../../../include/c++/4.7.0"
ignoring nonexistent directory "C:\Program Files (x86)\LLVM\bin\..\lib\clang\3.5.0/../../../include/c++/4.7.0/x86_64-w64-mingw32"
ignoring nonexistent directory "C:\Program Files (x86)\LLVM\bin\..\lib\clang\3.5.0/../../../include/c++/4.7.0/i686-w64-mingw32"
ignoring nonexistent directory "C:\Program Files (x86)\LLVM\bin\..\lib\clang\3.5.0/../../../include/c++/4.7.0/backward"
ignoring nonexistent directory "C:\Program Files (x86)\LLVM\bin\..\lib\clang\3.5.0/../../../include/c++/4.7.1"
ignoring nonexistent directory "C:\Program Files (x86)\LLVM\bin\..\lib\clang\3.5.0/../../../include/c++/4.7.1/x86_64-w64-mingw32"
ignoring nonexistent directory "C:\Program Files (x86)\LLVM\bin\..\lib\clang\3.5.0/../../../include/c++/4.7.1/i686-w64-mingw32"
ignoring nonexistent directory "C:\Program Files (x86)\LLVM\bin\..\lib\clang\3.5.0/../../../include/c++/4.7.1/backward"
ignoring nonexistent directory "C:\Program Files (x86)\LLVM\bin\..\lib\clang\3.5.0/../../../include/c++/4.7.2"
ignoring nonexistent directory "C:\Program Files (x86)\LLVM\bin\..\lib\clang\3.5.0/../../../include/c++/4.7.2/x86_64-w64-mingw32"
ignoring nonexistent directory "C:\Program Files (x86)\LLVM\bin\..\lib\clang\3.5.0/../../../include/c++/4.7.2/i686-w64-mingw32"
ignoring nonexistent directory "C:\Program Files (x86)\LLVM\bin\..\lib\clang\3.5.0/../../../include/c++/4.7.2/backward"
ignoring nonexistent directory "C:\Program Files (x86)\LLVM\bin\..\lib\clang\3.5.0/../../../include/c++/4.7.3"
ignoring nonexistent directory "C:\Program Files (x86)\LLVM\bin\..\lib\clang\3.5.0/../../../include/c++/4.7.3/x86_64-w64-mingw32"
ignoring nonexistent directory "C:\Program Files (x86)\LLVM\bin\..\lib\clang\3.5.0/../../../include/c++/4.7.3/i686-w64-mingw32"
ignoring nonexistent directory "C:\Program Files (x86)\LLVM\bin\..\lib\clang\3.5.0/../../../include/c++/4.7.3/backward"
ignoring nonexistent directory "C:\Program Files (x86)\LLVM\bin\..\lib\clang\3.5.0/../../../include/c++/4.8.0"
ignoring nonexistent directory "C:\Program Files (x86)\LLVM\bin\..\lib\clang\3.5.0/../../../include/c++/4.8.0/x86_64-w64-mingw32"
ignoring nonexistent directory "C:\Program Files (x86)\LLVM\bin\..\lib\clang\3.5.0/../../../include/c++/4.8.0/i686-w64-mingw32"
ignoring nonexistent directory "C:\Program Files (x86)\LLVM\bin\..\lib\clang\3.5.0/../../../include/c++/4.8.0/backward"
ignoring nonexistent directory "C:\Program Files (x86)\LLVM\bin\..\lib\clang\3.5.0/../../../include/c++/4.8.1"
ignoring nonexistent directory "C:\Program Files (x86)\LLVM\bin\..\lib\clang\3.5.0/../../../include/c++/4.8.1/x86_64-w64-mingw32"
ignoring nonexistent directory "C:\Program Files (x86)\LLVM\bin\..\lib\clang\3.5.0/../../../include/c++/4.8.1/i686-w64-mingw32"
ignoring nonexistent directory "C:\Program Files (x86)\LLVM\bin\..\lib\clang\3.5.0/../../../include/c++/4.8.1/backward"
ignoring nonexistent directory "C:\Program Files (x86)\LLVM\bin\..\lib\clang\3.5.0/../../../include/c++/4.8.2"
ignoring nonexistent directory "C:\Program Files (x86)\LLVM\bin\..\lib\clang\3.5.0/../../../include/c++/4.8.2/x86_64-w64-mingw32"
ignoring nonexistent directory "C:\Program Files (x86)\LLVM\bin\..\lib\clang\3.5.0/../../../include/c++/4.8.2/i686-w64-mingw32"
ignoring nonexistent directory "C:\Program Files (x86)\LLVM\bin\..\lib\clang\3.5.0/../../../include/c++/4.8.2/backward"
ignoring nonexistent directory "c:/MinGW/lib/gcc/mingw32/4.7.0/include/c++"
ignoring nonexistent directory "c:/MinGW/lib/gcc/mingw32/4.7.0/include/c++/mingw32"
ignoring nonexistent directory "c:/MinGW/lib/gcc/mingw32/4.7.0/include/c++/backward"
ignoring nonexistent directory "c:/MinGW/lib/gcc/mingw32/4.7.1/include/c++"
ignoring nonexistent directory "c:/MinGW/lib/gcc/mingw32/4.7.1/include/c++/mingw32"
ignoring nonexistent directory "c:/MinGW/lib/gcc/mingw32/4.7.1/include/c++/backward"
ignoring nonexistent directory "c:/MinGW/lib/gcc/mingw32/4.7.2/include/c++"
ignoring nonexistent directory "c:/MinGW/lib/gcc/mingw32/4.7.2/include/c++/mingw32"
ignoring nonexistent directory "c:/MinGW/lib/gcc/mingw32/4.7.2/include/c++/backward"
ignoring nonexistent directory "c:/MinGW/lib/gcc/mingw32/4.7.3/include/c++"
ignoring nonexistent directory "c:/MinGW/lib/gcc/mingw32/4.7.3/include/c++/mingw32"
ignoring nonexistent directory "c:/MinGW/lib/gcc/mingw32/4.7.3/include/c++/backward"
ignoring nonexistent directory "c:/MinGW/lib/gcc/mingw32/4.8.0/include/c++"
ignoring nonexistent directory "c:/MinGW/lib/gcc/mingw32/4.8.0/include/c++/mingw32"
ignoring nonexistent directory "c:/MinGW/lib/gcc/mingw32/4.8.0/include/c++/backward"
ignoring nonexistent directory "c:/MinGW/lib/gcc/mingw32/4.8.1/include/c++"
ignoring nonexistent directory "c:/MinGW/lib/gcc/mingw32/4.8.1/include/c++/mingw32"
ignoring nonexistent directory "c:/MinGW/lib/gcc/mingw32/4.8.1/include/c++/backward"
ignoring nonexistent directory "c:/MinGW/lib/gcc/mingw32/4.8.2/include/c++"
ignoring nonexistent directory "c:/MinGW/lib/gcc/mingw32/4.8.2/include/c++/mingw32"
ignoring nonexistent directory "c:/MinGW/lib/gcc/mingw32/4.8.2/include/c++/backward"
ignoring nonexistent directory "/usr/include/c++/4.4"
ignoring nonexistent directory "/usr/local/include"
ignoring nonexistent directory "C:\Program Files (x86)\LLVM\bin\..\lib\clang\3.5.0\../../../x86_64-w64-mingw32/include"
ignoring nonexistent directory "/mingw/include"
ignoring nonexistent directory "c:/mingw/include"
ignoring nonexistent directory "/usr/include"

(As one may expect from the error messages, the official MinGW GCC, currently at version 4.8.1, also works with Clang. I personally prefer MinGW-w64, as its GCC is more usable—e.g., the MinGW version supports only Win32 threads, and therefore does not support std::thread. MinGW does not provide GCC 4.9 yet, and you can’t put C:\MinGW\bin in the path, if you want to use MinGW-w64 GCC 4.9 simultaneously. You do need to put either C:\MinGW\bin or C:\mingw32\bin—for MinGW-w64 GCC 4.9—in the path, as Clang cannot find a working GCC for linking otherwise. If you use only MinGW GCC 4.8.1, or only MinGW-w64 GCC 4.9, this configuration works.)

Now back to MinGW-w64 GCC 4.8.2. Depending on the size of your hard disk, you may want to tailor it. In my case, I removed all traces of Fortran, Ada, and Objective-C, as well as build-info.txt, etc, license, opt, and shared from C:\Temp\mingw32. After that, you need to do the following to make GCC 4.8.2 work for Clang:

  • Make directory c++ under C:\Temp\mingw32\include.
  • Make directory 4.8.2 under C:\Temp\mingw32\include\c++.
  • Copy all contents under C:\Temp\mingw32\i686-w64-mingw32\include\c++ to C:\Temp\mingw32\include\c++\4.8.2.
  • Move all contents under C:\Temp\mingw32 to C:\Program Files (x86)\LLVM, merging with existing directories there.
  • Remove the empty C:\Temp\mingw32.

You can now add both C:\mingw32\bin and C:\Program Files (x86)\LLVM\bin to the path: both Clang and GCC are at hand and won’t conflict with each other.