MySQL 8.0 Reference Manual

10.10.7.1 The cp932 Character Set

Why is cp932 needed?

In MySQL, the sjis character set corresponds to the Shift_JIS character set defined by IANA, which supports JIS X0201 and JIS X0208 characters. (See http://www.iana.org/assignments/character-sets.)

However, the meaning of "SHIFT JIS" as a descriptive term has become very vague and it often includes the extensions to Shift_JIS that are defined by various vendors.

For example, "SHIFT JIS" used in Japanese Windows environments is a Microsoft extension of Shift_JIS and its exact name is Microsoft Windows Codepage : 932 or cp932. In addition to the characters supported by Shift_JIS, cp932 supports extension characters such as NEC special characters, NEC selected—IBM extended characters, and IBM selected characters.

Many Japanese users have experienced problems using these extension characters. These problems stem from the following factors:

  • MySQL automatically converts character sets.

  • Character sets are converted using Unicode (ucs2).

  • The sjis character set does not support the conversion of these extension characters.

  • There are several conversion rules from so-called "SHIFT JIS" to Unicode, and some characters are converted to Unicode differently depending on the conversion rule. MySQL supports only one of these rules (described later).

The MySQL cp932 character set is designed to solve these problems.

Because MySQL supports character set conversion, it is important to separate IANA Shift_JIS and cp932 into two different character sets because they provide different conversion rules.

How does cp932 differ from sjis?

The cp932 character set differs from sjis in the following ways:

  • cp932 supports NEC special characters, NEC selected—IBM extended characters, and IBM selected characters.

  • Some cp932 characters have two different code points, both of which convert to the same Unicode code point. When converting from Unicode back to cp932, one of the code points must be selected. For this "round trip conversion," the rule recommended by Microsoft is used. (See http://support.microsoft.com/kb/170559/EN-US/.)

    The conversion rule works like this:

    • If the character is in both JIS X 0208 and NEC special characters, use the code point of JIS X 0208.

    • If the character is in both NEC special characters and IBM selected characters, use the code point of NEC special characters.

    • If the character is in both IBM selected characters and NEC selected—IBM extended characters, use the code point of IBM extended characters.

    The table shown at https://msdn.microsoft.com/en-us/goglobal/cc305152.aspx provides information about the Unicode values of cp932 characters. For cp932 table entries with characters under which a four-digit number appears, the number represents the corresponding Unicode (ucs2) encoding. For table entries in which an underlined two-digit value appears, there is a range of cp932 character values that begin with those two digits. Clicking such a table entry takes you to a page that displays the Unicode value for each of the cp932 characters that begin with those digits.

    The following links are of special interest. They correspond to the encodings for the following sets of characters:

    • NEC special characters (lead byte 0x87):

      https://msdn.microsoft.com/en-us/goglobal/gg674964
    • NEC selected—IBM extended characters (lead bytes 0xED and 0xEE):

      https://msdn.microsoft.com/en-us/goglobal/gg671837 https://msdn.microsoft.com/en-us/goglobal/gg671838
    • IBM selected characters (lead bytes 0xFA, 0xFB, 0xFC):

      https://msdn.microsoft.com/en-us/goglobal/gg671839 https://msdn.microsoft.com/en-us/goglobal/gg671840 https://msdn.microsoft.com/en-us/goglobal/gg671841
  • cp932 supports conversion of user-defined characters in combination with eucjpms, and solves the problems with sjis/ujis conversion. For details, please refer to http://www.sljfaq.org/afaq/encodings.html.
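
The round-trip rule in the list above can be sketched in Python as a priority ordering over the character groups. This is only an illustration, not MySQL's implementation: the names PRIORITY, classify, and pick_round_trip_code are hypothetical, the groups are recognized purely by the lead bytes listed above, any other lead byte is treated as JIS X 0208 for simplicity, and the third rule is read here as preferring the IBM selected (0xFA to 0xFC) code point.

```python
# Groups in priority order for Unicode -> cp932 conversion (highest first).
# Ranking "IBM selected" above "NEC selected-IBM extended" is this sketch's
# reading of the third rule; treat it as an assumption.
PRIORITY = (
    "JIS X 0208",
    "NEC special",
    "IBM selected",
    "NEC selected-IBM extended",
)


def classify(code):
    """Classify a two-byte cp932 code (for example 0x8790) by its lead byte."""
    lead = code >> 8
    if lead == 0x87:
        return "NEC special"
    if lead in (0xED, 0xEE):
        return "NEC selected-IBM extended"
    if 0xFA <= lead <= 0xFC:
        return "IBM selected"
    # Simplification: every other lead byte is treated as JIS X 0208.
    return "JIS X 0208"


def pick_round_trip_code(candidates):
    """Return the candidate code whose group ranks highest in PRIORITY."""
    return min(candidates, key=lambda code: PRIORITY.index(classify(code)))


# Illustrative candidate lists: each list stands for cp932 codes that are
# assumed to convert to the same Unicode code point.
print(hex(pick_round_trip_code([0x8790, 0x81E0])))  # 0x81e0
print(hex(pick_round_trip_code([0xED40, 0xFA5C])))  # 0xfa5c
```

Expressing the rule as an ordered tuple keeps the three pairwise comparisons consistent with one another, which is convenient when a character appears in more than two groups.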

For some characters, conversion to and from ucs2 is different for sjis and cp932. The following tables illustrate these differences.

Conversion to ucs2:

sjis/cp932 Value   sjis -> ucs2 Conversion   cp932 -> ucs2 Conversion
5C                 005C                      005C
7E                 007E                      007E
815C               2015                      2015
815F               005C                      FF3C
8160               301C                      FF5E
8161               2016                      2225
817C               2212                      FF0D
8191               00A2                      FFE0
8192               00A3                      FFE1
81CA               00AC                      FFE2
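
As a rough aid for reading the table, the following Python sketch stores its rows in a dictionary and lists the byte values whose Unicode result depends on the character set. The names TO_UCS2 and differing_values are hypothetical; the data is copied from the table and nothing here calls MySQL.

```python
# sjis/cp932 value -> (sjis -> ucs2 result, cp932 -> ucs2 result),
# copied row by row from the "Conversion to ucs2" table.
TO_UCS2 = {
    0x5C: (0x005C, 0x005C),
    0x7E: (0x007E, 0x007E),
    0x815C: (0x2015, 0x2015),
    0x815F: (0x005C, 0xFF3C),
    0x8160: (0x301C, 0xFF5E),
    0x8161: (0x2016, 0x2225),
    0x817C: (0x2212, 0xFF0D),
    0x8191: (0x00A2, 0xFFE0),
    0x8192: (0x00A3, 0xFFE1),
    0x81CA: (0x00AC, 0xFFE2),
}


def differing_values(table):
    """Return the byte values that sjis and cp932 convert to different code points."""
    return sorted(
        value
        for value, (via_sjis, via_cp932) in table.items()
        if via_sjis != via_cp932
    )


for value in differing_values(TO_UCS2):
    via_sjis, via_cp932 = TO_UCS2[value]
    print(f"{value:04X}: sjis -> U+{via_sjis:04X}, cp932 -> U+{via_cp932:04X}")
```

Seven of the ten listed values convert differently, so the same stored bytes can yield different Unicode text depending on which character set a column is declared with.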

Conversion from ucs2:

ucs2 value   ucs2 -> sjis Conversion   ucs2 -> cp932 Conversion
005C         815F                      5C
007E         7E                        7E
00A2         8191                      3F
00A3         8192                      3F
00AC         81CA                      3F
2015         815C                      815C
2016         8161                      3F
2212         817C                      3F
2225         3F                        8161
301C         8160                      3F
FF0D         3F                        817C
FF3C         3F                        815F
FF5E         3F                        8160
FFE0         3F                        8191
FFE1         3F                        8192
FFE2         3F                        81CA
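
The practical consequence of this table can be sketched in Python by listing which Unicode values end up as 3F in each character set. 0x3F is the ASCII question mark, which this sketch reads as "no mapping available"; that reading, and the names FROM_UCS2, NO_MAPPING, and lost_in, are assumptions made for illustration. The data is copied from the table.

```python
NO_MAPPING = 0x3F  # ASCII "?", read here as "character could not be converted"

# ucs2 value -> (ucs2 -> sjis result, ucs2 -> cp932 result),
# copied row by row from the "Conversion from ucs2" table.
FROM_UCS2 = {
    0x005C: (0x815F, 0x5C),
    0x007E: (0x7E, 0x7E),
    0x00A2: (0x8191, 0x3F),
    0x00A3: (0x8192, 0x3F),
    0x00AC: (0x81CA, 0x3F),
    0x2015: (0x815C, 0x815C),
    0x2016: (0x8161, 0x3F),
    0x2212: (0x817C, 0x3F),
    0x2225: (0x3F, 0x8161),
    0x301C: (0x8160, 0x3F),
    0xFF0D: (0x3F, 0x817C),
    0xFF3C: (0x3F, 0x815F),
    0xFF5E: (0x3F, 0x8160),
    0xFFE0: (0x3F, 0x8191),
    0xFFE1: (0x3F, 0x8192),
    0xFFE2: (0x3F, 0x81CA),
}


def lost_in(charset):
    """Return the listed Unicode values that become 0x3F in the given character set."""
    column = {"sjis": 0, "cp932": 1}[charset]
    return sorted(
        code_point
        for code_point, targets in FROM_UCS2.items()
        if targets[column] == NO_MAPPING
    )


for charset in ("sjis", "cp932"):
    print(charset, [f"U+{code_point:04X}" for code_point in lost_in(charset)])
```

Among the listed values, the two sets of lost characters do not overlap: each value that one character set cannot represent is representable in the other.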

Users of any Japanese character sets should be aware that using --character-set-client-handshake (or --skip-character-set-client-handshake) has an important effect. See Section 5.1.7, "Server Command Options".