
The "ANSI" codepage is basically legacy: Windows 9X era. All modern software should be Unicode (that is, UTF-16) based anyway.

Basically, when the Ansi code page stuff was originally designed, UTF-8 wasn't even invented and so support for multi-byte encodings was rather haphazard (i.e. most Ansi code pages are single byte, with the exception of some East Asian code pages which are one-or-two byte). Adding support for "proper" multi-byte encodings was probably deemed not worth the effort when all new development should be done in UTF-16 anyway.


Michael Kaplan, an internationalization expert from Microsoft, tried to answer this <a href="http://www.siao2.com/2006/10/11/816996.aspx" rel="noreferrer">on his blog</a>.

Basically his explanation is that even though the "ANSI" versions of Windows API functions are meant to handle different code pages, historically there was an implicit expectation that character encodings would require at most two bytes per code point. UTF-8 doesn't meet that expectation, and changing all of those functions now would require a massive amount of testing.
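To make the "at most two bytes" expectation concrete, here is a minimal sketch in C that measures how many bytes UTF-8 characters occupy. The function names `utf8_seq_len` and `utf8_max_seq_len` are made up for this illustration; they are not part of any Windows or CRT API.

```c
#include <assert.h>
#include <stddef.h>

/* Length in bytes of the UTF-8 sequence introduced by `lead`,
   or 0 if `lead` cannot start a sequence (hypothetical helper). */
size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;                  /* ASCII */
    if (lead >= 0xC2 && lead <= 0xDF) return 2; /* U+0080..U+07FF */
    if (lead >= 0xE0 && lead <= 0xEF) return 3; /* U+0800..U+FFFF */
    if (lead >= 0xF0 && lead <= 0xF4) return 4; /* U+10000..U+10FFFF */
    return 0;                                   /* continuation or invalid byte */
}

/* Longest sequence length seen in a NUL-terminated string. Scanning stops
   at the first invalid lead byte or truncated sequence. */
size_t utf8_max_seq_len(const char *s)
{
    assert(s != NULL);
    size_t max = 0;
    while (*s != '\0') {
        size_t len = utf8_seq_len((unsigned char)*s);
        if (len == 0) break;
        /* Every trailing byte must look like 10xxxxxx. */
        for (size_t i = 1; i < len; i++) {
            if (((unsigned char)s[i] & 0xC0) != 0x80) return max;
        }
        if (len > max) max = len;
        s += len;
    }
    return max;
}
```

Calling `utf8_max_seq_len("\xE2\x82\xAC")` (the euro sign) yields 3, and any character outside the Basic Multilingual Plane yields 4, so code that budgets two bytes per character would undercount.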


<code>_setmbcp()</code> is a VC++ RTL function, not a Win32 API function. It only affects how the RTL interprets strings. It has no effect whatsoever on Win32 API <code>A</code> functions. When they call their <code>W</code> counterparts internally, the <code>A</code> functions always use <code>MultiByteToWideChar()</code> and <code>WideCharToMultiByte()</code> specifying codepage 0 (<code>CP_ACP</code>) to use the system default Ansi codepage for the conversions.
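The conversion step described above can be approximated as follows. This is only a sketch, not the actual internals of any `A` function: `ansi_to_wide` is a hypothetical helper name, and the non-Windows branch is a stand-in that uses `mbstowcs` (which follows the current C locale) purely so the example compiles elsewhere.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: convert a NUL-terminated narrow string to a newly
   allocated wide string. The caller frees the result; NULL on failure. */
wchar_t *ansi_to_wide(const char *s)
{
    assert(s != NULL);
#ifdef _WIN32
    /* CP_ACP (0) selects the system default ANSI code page. A length of -1
       makes the call include the terminating NUL in its count. */
    int n = MultiByteToWideChar(CP_ACP, 0, s, -1, NULL, 0);
    if (n <= 0) return NULL;
    wchar_t *w = malloc((size_t)n * sizeof *w);
    if (w == NULL) return NULL;
    if (MultiByteToWideChar(CP_ACP, 0, s, -1, w, n) == 0) {
        free(w);
        return NULL;
    }
    return w;
#else
    /* Assumption: portable stand-in for non-Windows builds. */
    size_t n = mbstowcs(NULL, s, 0);
    if (n == (size_t)-1) return NULL;
    wchar_t *w = malloc((n + 1) * sizeof *w);
    if (w == NULL) return NULL;
    if (mbstowcs(w, s, n + 1) == (size_t)-1) {
        free(w);
        return NULL;
    }
    return w;
#endif
}
```

On Windows the result could then be passed to the matching `W` function. Note that nothing here consults `_setmbcp()`, which is the point of the answer above.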


The reason is exactly what was said in <a href="https://stackoverflow.com/a/21523914/995714">jamesdlin's answer</a> and the comments below it: <a href="https://docs.microsoft.com/en-us/cpp/text/mbcs-support-in-visual-cpp" rel="nofollow noreferrer">MBCS is the same as DBCS in Windows</a>, and some functions don't work with characters that are longer than 2 bytes.
<blockquote>
Microsoft said that a UTF-8 locale might break some functions as they were written to assume multibyte encodings used no more than 2 bytes per character, thus code pages with more bytes such as UTF-8 (and also GB 18030, cp54936) could not be set as the locale.
<a href="https://en.wikipedia.org/wiki/Unicode_in_Microsoft_Windows#UTF-8" rel="nofollow noreferrer">https://en.wikipedia.org/wiki/Unicode_in_Microsoft_Windows#UTF-8</a>
</blockquote>
So UTF-8 was allowed in functions like read/write, but not when used as a locale.
<hr />
However, Microsoft has finally fixed that, so now we can <a href="https://docs.microsoft.com/en-us/windows/uwp/design/globalizing/use-utf8-code-page" rel="nofollow noreferrer">use UTF-8 as a locale</a>. In fact, MS has even started recommending the ANSI APIs (<code>-A</code>) again instead of the Unicode (<code>-W</code>) versions as before. There are some new options in MSVC to set the charset: <a href="https://docs.microsoft.com/en-us/cpp/build/reference/execution-charset-set-execution-character-set?view=vs-2019" rel="nofollow noreferrer"><code>/execution-charset:utf-8</code></a> and <a href="https://docs.microsoft.com/en-us/cpp/build/reference/utf-8-set-source-and-executable-character-sets-to-utf-8?view=vs-2019" rel="nofollow noreferrer"><code>/utf-8</code></a>. You can also set the ActiveCodePage property in the appxmanifest of a UWP app.
Before those options were introduced, Windows 10 insider build 17035 had already added a &quot;Beta: Use Unicode UTF-8 for worldwide language support&quot; checkbox for setting the locale code page to UTF-8.
<a href="https://i.stack.imgur.com/heCud.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/heCud.png" alt="Beta: Use Unicode UTF-8 for worldwide language support" /></a>
To open that dialog box, open the Start menu, type &quot;region&quot; and select Region settings &gt; Additional date, time &amp; regional settings &gt; Change date, time, or number formats &gt; Administrative.
After enabling it, you can call <code>setlocale()</code> to change to a UTF-8 locale:
<blockquote>
Starting in Windows 10 build 17134 (April 2018 Update), the Universal C Runtime supports using a UTF-8 code page. This means that <code>char</code> strings passed to C runtime functions will expect strings in the UTF-8 encoding. To enable UTF-8 mode, use &quot;UTF-8&quot; as the code page when using <code>setlocale</code>. For example, <code>setlocale(LC_ALL, &quot;.utf8&quot;)</code> will use the current default Windows ANSI code page (ACP) for the locale and UTF-8 for the code page.
<a href="https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/setlocale-wsetlocale?view=vs-2019#utf-8-support" rel="nofollow noreferrer">UTF-8 Support</a>
</blockquote>
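As a minimal sketch of the call quoted above, assuming a C runtime that accepts the <code>.utf8</code> locale string (the Universal CRT from build 17134 onward, per the quote), a program can request the locale and check whether the runtime accepted it. The wrapper name `try_utf8_locale` is made up for this illustration.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical wrapper: ask the C runtime for a UTF-8 locale.
   Returns the locale name reported by the runtime, or NULL if the
   request was refused (for example on a runtime without UTF-8 support). */
const char *try_utf8_locale(void)
{
    /* ".utf8" keeps the default language/region and switches only the
       code page, as described in the quoted documentation. */
    return setlocale(LC_ALL, ".utf8");
}

/* Returns 1 if the UTF-8 locale is active afterwards, 0 otherwise.
   When setlocale() fails, the previous locale stays in effect. */
int enable_utf8_locale(void)
{
    return try_utf8_locale() != NULL;
}
```

Checking the return value matters because `setlocale` returns `NULL` and leaves the current locale unchanged when it cannot honor the request, so a caller can fall back to the `W` APIs in that case.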
You can also use this in older Windows versions:
<blockquote>
To use this feature on an OS prior to Windows 10, such as Windows 7, you must use <a href="https://docs.microsoft.com/en-us/cpp/windows/universal-crt-deployment?view=vs-2019#local-deployment" rel="nofollow noreferrer">app-local deployment</a> or link statically using version 17134 of the Windows SDK or later. For Windows 10 operating systems prior to 17134, only static linking is supported.
</blockquote>
<h1>See also</h1>
<ul>
<li><a href="https://superuser.com/q/1033088/241386">Is it possible to set “locale” of a Windows application to UTF-8?</a></li>
</ul>

The Windows <a href="https://web.archive.org/web/20100108193149/http://msdn.microsoft.com/en-us/library/883tf19a(VS.80).aspx" rel="nofollow noreferrer"><code>_setmbcp</code></a> function allows any valid code page...
<blockquote>
(except UTF-7 and UTF-8, which are not supported)
</blockquote>
OK, not supporting UTF-7 makes sense: Characters have non-unique representations and that introduces complexity and security risks.
But why not UTF-8?
As I understand it, the &quot;ANSI&quot; versions of the Windows API functions convert their arguments to UTF-16, call the equivalent &quot;W&quot; function, and convert any strings in the output to &quot;ANSI&quot;. This is what I've been doing manually. So why can't Windows do it for me?

Why isn't UTF-8 allowed as the "ANSI" code page?
