Fixing A VS2017 15.6 Installation Problem

After installing the latest Visual Studio 2017 15.6.6 (Community Edition), I found my custom setting of environment variables INCLUDE lost effect in the Developer Command Prompt. Strangely, LIB was still there. Some tracing indicated that it was a bug in the .BAT files Microsoft provided to initialize the environment. The offending lines are the following (in C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\Tools\vsdevcmd\core\winsdk.bat; one of them is very long and requires scrolling for viewing):

@REM the folowing are architecture neutral
set __tmpwinsdk_include=
if "%INCLUDE%" NEQ "" set "__tmp_include=;%INCLUDE%"
set "INCLUDE=%WindowsSdkDir%include\%WindowsSDKVersion%shared;%WindowsSdkDir%include\%WindowsSDKVersion%um;%WindowsSdkDir%include\%WindowsSDKVersion%winrt;%WindowsSdkDir%include\%WindowsSDKVersion%cppwinrt%__tmpwinsdk_include%"
set __tmpwinsdk_include=

Apparently somebody missed renaming __tmp_include to __tmpwinsdk_include. Doing that myself fixed the problem.

I’ve reported the problem to Microsoft. In the meanwhile, you know how to fix it if you encounter the same problem.

Advertisements

Another Microsoft Unicode I/O Problem

I encountered an annoying bug in Visual C++ 2017 recently. It started when I found my favourite logging library, Easylogging++, output Chinese as garbage characters on Windows. Checking the documentation carefully, I noted that I should have used the macro START_EASYLOGGINGPP. It turned out to be worse: all output starting from the Chinese character was gone. Puzzled but busy, I put it down and worked on something else.

I spend another hour of detective work on this issue today. The result was quite surprising.

  • First, it is not an issue with Easylogging++. The problem can occur if I purely use std::wcout.
  • Second, the magical thing about START_EASYLOGGINGPP is that it will invoke std::locale::global(std::locale("")). This is the switch that leads to the different behaviour.
  • Myteriously, with the correct locale setting, I can get the correct result with both std::wcout and Easylogging++ in a test program. I was not able to get it working in my real project.
  • Finally, it turns out that the difference above is caused by /MT vs. /MD! The former (default if neither is specified on the command line) tells the Visual C++ compiler to use the static multi-threading library, and the latter (set by default in Visual Studio projects) tells the compiler to use the dynamic multi-threading library.

People may remember that I wrote about MSVCRT.DLL Console I/O Bug. While Visual C++ 2013 shows consistent behaviour between /MT and /MD, Visual C++ 2015 and 2017 exhibit the same woeful bug when /MD is specified on the command line. This is something perplexingly weird: it seems someone at Microsoft messed up with the MSVCRT.DLL shipped with Windows first (around 2006), and then the problem spread to the Visual C++ runtime DLL after nearly a decade!

I am using many modern C++ features, so I definitely do not want to go back to Visual C++ 2013 for new projects. It seems I have to tolerate garbage characters in the log for now. Meanwhile, I submitted a bug to Microsoft. Given that I have a bug report that is deferred for four years, I am not very hopeful. But let us wait and see.

Update (20 December 2017)

A few more tests show that the debug versions (/MTd and /MDd) both work well. So only the default release build (using the dynamic C runtime) exhibits this problem, where the executable depends on DLLs like api-ms-win-crt-stdio-l1-1-0.dll. It seems this issue is related to the Universal C Runtime introduced in Visual Studio 2015 and Windows 10. . . .

Update (25 March 2018)

The bug was closed, and a Microsoft developer indicated that the issue had already been fixed since the Windows 10 Anniversary Update SDK (build 10.0.14393). Actually I had had build 10.0.15063 installed. The reason why I still saw the problem was that the Universal C Runtime on Windows 7 had not been updated (‘the issue will be fixed in a future update to the Universal C Runtime on Windows 7’), and I should not have seen the problem on a Windows 10 box. The current workaround is either use static linking (as I did), or copy the redistributable DLLs under C:\Program Files (x86)\Windows Kits\10\Redist\ucrt\DLLs\x86 (or x64 etc.) to the app directory (so called ‘app-local deployment’; which should not be used on Windows 10, as the system version is always preferred). My test showed that copying ucrtbase.dll was enough to fix my test case.

MSVCRT.DLL Console I/O Bug

I have been quite annoyed by a Windows bug that causes a huge number of open-source command-line tools to choke on multi-byte characters at the Windows Command Prompt. The MSVCRT.DLL shipped with Windows Vista or later has been having big troubles with such characters. While Microsoft tools and compilers after Visual Studio 6.0 do not use this DLL anymore, the GNU tools on Windows, usually built by MinGW or Mingw-w64, are dependent on this DLL and suffer from this problem. One cannot even use ls to display a Chinese file name, when the system locale is set to Chinese.

The following simple code snippet demonstrates the problem:

#include <locale.h>
#include <stdio.h>

char msg[] = "\xd7\xd6\xb7\xfb Char";
wchar_t wmsg[] = L"字符 char";

void Test1()
{
    char* ptr = msg;
    printf("Test 1: ");
    while (*ptr) {
        putchar(*ptr++);
    }
    putchar('\n');
}

void Test2()
{
    printf("Test 2: ");
    puts(msg);
}

void Test3()
{
    wchar_t* ptr = wmsg;
    printf("Test 3: ");
    while (*ptr) {
        putwchar(*ptr++);
    }
    putwchar(L'\n');
}

int main()
{
    char buffer[32];
    puts("Default C locale");
    Test1();
    Test2();
    Test3();
    putchar('\n');
    puts("Chinese locale");
    setlocale(LC_CTYPE, "Chinese_China.936");
    Test1();
    Test2();
    Test3();
    putchar('\n');
    puts("English locale");
    setlocale(LC_CTYPE, "English_United States.1252");
    Test1();
    Test2();
    Test3();
}

When built with a modern version of Visual Studio, it gives the expected output (console code page is 936):

Default C locale
Test 1: 字符 Char
Test 2: 字符 Char
Test 3:  char

Chinese locale
Test 1: 字符 Char
Test 2: 字符 Char
Test 3: 字符 char

English locale
Test 1: ×?·? Char
Test 2: ×?·? Char
Test 3:  char

I.e. when the locale is the default ‘C’, the ‘ANSI’ version of character output routines can successfully output single-byte and multi-byte characters, while putwchar, the ‘Unicode’ version of putchar, fails at the multi-byte characters (reasonably, as the C locale does not understand how to translate Chinese characters). When the locale is set correctly to code page 936 (Simplified Chinese), everything is correct. When the locale is set to code page 1252 (Latin), the corresponding characters at the same code points of the original Chinese characters (‘×Ö·û’ instead of ‘字符’) are shown with the ‘ANSI’ routines, though ‘Ö’ (\xd6) and ‘û’ (\xfb) are shown as ‘?’ because they do not exist in code page 936. The Chinese characters, of course, cannot be shown with putwchar in this locale, just like the C locale.

When built with GCC, the result is woeful:

Default C locale
Test 1: 字符 Char
Test 2: 字符 Char
Test 3:  char

Chinese locale
Test 1:  Char
Test 2: 字符 Char
Test 3:  char

English locale
Test 1: ×?·? Char
Test 2: ×?·? Char
Test 3:  char

Two things are worth noticing:

  • putchar stops working for Chinese when the locale is correctly set.
  • putwchar never works for Chinese.

Horrible and thoroughly broken! (Keep in mind that Microsoft is to blame here. You can compile the program with MSVC 6.0 using the /MD option, and the result will be the same—an executable that works in Windows XP but not in Windows Vista or later.)

I attacked this problem a few years ago, and tried some workarounds. The solution I came up with looked so fragile that I did not push it up to the MinGW library. It was a personal failure, as well as an indication that working around a buggy implementation without affecting the application code can be very difficult or just impossible.


The problem occurs only with the console, where the Microsoft runtime does some translation (broken in MSVCRT.DLL, but OK in newer MSVC runtimes). It vanishes when users redirect the output from the console. So one solution is not to use the Command Prompt at all. The Cygwin Terminal may be a good choice, especially for people familiar with Linux/Unix. I have Cygwin installed, but sometimes I still want to do things in the more Windows-y way. I figured I could make a small tool (like cat) to get the input from stdin, and forward everything to stdout. As long as this tool is compiled by a Microsoft compiler, things should be OK. Then I thought a script could be faster. Finally, I came up with putting the following line into an mbf.bat:

@perl -p -e ""

(Perl is still wonderful for text processing, even in this ‘empty’ program!)

Now the executables built by GCC and MSVC give the same result, if we append ‘|mbf’ on the command line:

Default C locale
Test 1: 字符 Char
Test 2: 字符 Char
Test 3:  char

Chinese locale
Test 1: 字符 Char
Test 2: 字符 Char
Test 3: 字符 char

English locale
Test 1: 字符 Char
Test 2: 字符 Char
Test 3:  char

If you know how to make Microsoft fix the DLL problem, do it. Otherwise you know at least a workaround now. 🙂


The following code is my original partial solution to the problem, and it may be helpful to your GCC-based project. I don’t claim any copyright of it, nor will I take any responsibilities for its use.

/* mingw_mbcs_safe_io.c */

#include <mbctype.h>
#include <stdio.h>

/* Output functions that work with the Windows 7+ MSVCRT.DLL
 * for multi-byte characters on the console.  Please notice
 * that buffering must not be enabled for the console (e.g.
 * by calling setvbuf); otherwise weird things may occur. */

int __cdecl _mgw_flsbuf(int ch, FILE* fp)
{
  static char lead = '\0';
  int ret = 1;

  if (lead != '\0')
    {
      ret = fprintf(fp, "%c%c", lead, ch);
      lead = '\0';
      if (ret < 0)
        return EOF;
    }
  else if (_ismbblead(ch))
    lead = ch;
  else
    return _flsbuf(ch, fp);

  return ch;
}

int __cdecl putc(int ch, FILE* fp)
{
  static __thread char lead = '\0';
  int ret = 1;

  if (lead != '\0')
    {
      ret = fprintf(fp, "%c%c", lead, ch);
      lead = '\0';
    }
  else if (_ismbblead(ch))
    lead = ch;
  else
    ret = fprintf(fp, "%c", ch);

  if (ret < 0)
    return EOF;
  else
    return ch;
}

int __cdecl putchar(int ch)
{
  putc(ch, stdout);
}

int __cdecl _mgwrt_putchar(int ch)
{
  putc(ch, stdout);
}

A Complaint of ODF’s Asian Language Support

I have recently read about news about better support for ODF from Google. The author then went on to complain that neither Google nor Microsoft makes ‘it easy to use ODF as part of a workflow’. This reminds me that maybe I should write down a long-time complaint I have for ODF.

I have always loved open standards. However, there are not only open and proprietary standards, there are also good and bad standards. ODF looks pretty bad regarding Asian language support. It can be powerfully demonstrated by this image:

ODF Issue

If you are interested in it, you can download the document yourself. It simply contains four lines:

  • The first line has a left quotation mark, the English word ‘Test’, and the corresponding Chinese word. It looks OK.
  • The second line is a duplication of the first line, with an additional colon added at the beginning. It immediately changes the font of the left quotation mark.
  • The third line is a duplication of the second line, with the Chinese word removed. Both quotation marks are now using the default Western font ‘Times New Roman’.
  • The fourth line is a duplication of the third line, with the leading colon removed. Weirdly enough, the left quotation mark now uses the Chinese font. (This may be related to my using the Chinese OpenOffice version or Chinese Windows OS.)

Is it ridiculous that adding or removing a character can change how other characters are rendered? Still, I would not blog about it, if it had only been a bug in OpenOffice (actually I filed three bug reports back in 2006—more ancient than I thought—and this bug remains unfixed ). It actually seems a problem in the ODF standard. After extracting the content from the .ODT file (as a zip file), I can shrink the content of the document to these XML lines (content.xml with irrelevant contents removed and the result reformatted):

<office:font-face-decls>
<style:font-face
    style:name="Times New Roman"
    style:font-family-generic="roman"
    style:font-pitch="variable"/>
<style:font-face
    style:name="宋体"
    style:font-family-generic="system"
    style:font-pitch="variable"/>
</office:font-face-decls>
<office:automatic-styles>
<style:style
    style:name="P1"
    style:family="paragraph"
    style:parent-style-name="Standard">
<style:text-properties
    fo:font-size="12pt" fo:language="en" fo:country="GB"
    style:language-asian="zh" style:country-asian="CN"/>
</style:style>
</office:automatic-styles>
<office:body>
<office:text>
<text:p text:style-name="P1">“Test测试”</text:p>
<text:p text:style-name="P1">:“Test测试”</text:p>
<text:p text:style-name="P1">:“Test”</text:p>
<text:p text:style-name="P1">“Test”</text:p>
</office:text>
</office:body>

The problem is that instead of specifying a single language on any text, it specifies both a ‘fo:language’ and a ‘style:language-asian’. The designer of this feature definitely did not think carefully about the fact that many symbols exist in both Asian and non-Asian contexts and can often be rendered differently!

When I repeated the same process in Microsoft Word (on Windows), all text appeared correctly—Microsoft applications recognize which keyboard I use and which language it represents. Pasting as plain text introduced one error (as no language information is present). Even in that case, fixing the problem is easier. In OpenOffice I have to change the font manually, but in Microsoft Word I only need to specify the correct language (‘Office, this is English, not Chinese’). It is much more intuitive and natural.

I also analysed the XML in the resulting .DOCX file. Its styles.xml contained this:

<w:lang w:val="en-US" w:eastAsia="zh-CN" w:bidi="ar-SA"/>

So these are default languages. I had to use UK English and Traditional Chinese to force Word to specify the languages in the document explicitly. The embedded document.xml now contains content like the following:

<w:p>
<w:r>
<w:rPr>
<w:rFonts w:eastAsia="PMingLiU" w:hint="eastAsia"/>
<w:lang w:eastAsia="zh-TW"/>
</w:rPr>
<w:t>“</w:t>
</w:r>
<w:r>
<w:rPr>
<w:rFonts w:eastAsia="PMingLiU"/>
<w:lang w:val="en-GB" w:eastAsia="zh-TW"/>
</w:rPr>
<w:t>Test</w:t>
</w:r>
<w:r>
<w:rPr>
<w:rFonts w:eastAsia="PMingLiU" w:hint="eastAsia"/>
<w:lang w:eastAsia="zh-TW"/>
</w:rPr>
<w:t>測試”</w:t>
</w:r>
</w:p>
...
<w:p>
<w:r>
<w:rPr>
<w:rFonts w:eastAsia="PMingLiU"/>
<w:lang w:val="en-GB" w:eastAsia="zh-TW"/>
</w:rPr>
<w:t>“Test”</w:t>
</w:r>
</w:p>

We can argue the structure is somewhat similar (compare ‘w:val’ in <w:lang> with ‘fo:language’ and ‘fo:country’, and ‘w:eastAsia’ with ‘style:language-asian’ and ‘style:country-asian’), but the semantics are obviously different, and text of different languages is not mixed together. The English text has the language attribute <w:lang w:val="en-GB" w:eastAsia="zh-TW"/>, and the Chinese text has only <w:lang w:eastAsia="zh-TW"/>. It looks to me a more robust approach to processing mixed text.

Although it might be true that Microsoft lobbied strongly to get OOXML approved as an international standard, I do not think ODF’s openness alone is enough to make people truly adopt it.