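# 24.3. Character Set Support

24.3.1. Supported Character Sets

24.3.2. Setting the Character Set

24.3.3. Automatic Character Set Conversion Between Server and Client

24.3.4. Available Character Set Conversions

24.3.5. Further Reading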

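The character set support in PostgreSQL allows you to store text in a variety of character sets (also called encodings), including single-byte character sets such as the ISO 8859 series and multiple-byte character sets such as EUC (Extended Unix Code), UTF-8, and Mule internal code. All supported character sets can be used transparently by clients, but a few are not supported for use within the server (that is, as a server-side encoding). The default character set is selected while initializing your PostgreSQL database cluster using `initdb`. It can be overridden when you create a database, so you can have multiple databases, each with a different character set.

An important restriction, however, is that each database's character set must be compatible with the database's `LC_CTYPE` (character classification) and `LC_COLLATE` (string sort order) locale settings. For the `C` or `POSIX` locale, any character set is allowed, but for other libc-provided locales there is only one character set that will work correctly. (On Windows, however, UTF-8 encoding can be used with any locale.) If you have ICU support configured, ICU-provided locales can be used with most but not all server-side encodings.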

### 24.3.1. Supported Character Sets

Table 24.1 shows the character sets available for use in PostgreSQL.

**Table 24.1. PostgreSQL Character Sets**

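|Name|Description|Language|Server?|ICU?|Bytes/Char|Aliases|
|----|-----------|--------|-------|----|----------|-------|
|`BIG5`|Big Five|Traditional Chinese|No|No|1-2|`WIN950`, `Windows950`|
|`EUC_CN`|Extended UNIX Code-CN|Simplified Chinese|Yes|Yes|1-3| |
|`EUC_JP`|Extended UNIX Code-JP|Japanese|Yes|Yes|1-3| |
|`EUC_JIS_2004`|Extended UNIX Code-JP, JIS X 0213|Japanese|Yes|No|1-3| |
|`EUC_KR`|Extended UNIX Code-KR|Korean|Yes|Yes|1-3| |
|`EUC_TW`|Extended UNIX Code-TW|Traditional Chinese, Taiwanese|Yes|Yes|1-3| |
|`GB18030`|National Standard|Chinese|No|No|1-4| |
|`GBK`|Extended National Standard|Simplified Chinese|No|No|1-2|`WIN936`, `Windows936`|
|`ISO_8859_5`|ISO 8859-5, ECMA 113|Latin/Cyrillic|Yes|Yes|1| |
|`ISO_8859_6`|ISO 8859-6, ECMA 114|Latin/Arabic|Yes|Yes|1| |
|`ISO_8859_7`|ISO 8859-7, ECMA 118|Latin/Greek|Yes|Yes|1| |
|`ISO_8859_8`|ISO 8859-8, ECMA 121|Latin/Hebrew|Yes|Yes|1| |
|`JOHAB`|JOHAB|Korean (Hangul)|No|No|1-3| |
|`KOI8R`|KOI8-R|Cyrillic (Russian)|Yes|Yes|1|`KOI8`|
|`KOI8U`|KOI8-U|Cyrillic (Ukrainian)|Yes|Yes|1| |
|`LATIN1`|ISO 8859-1, ECMA 94|Western European|Yes|Yes|1|`ISO88591`|
|`LATIN2`|ISO 8859-2, ECMA 94|Central European|Yes|Yes|1|`ISO88592`|
|`LATIN3`|ISO 8859-3, ECMA 94|South European|Yes|Yes|1|`ISO88593`|
|`LATIN4`|ISO 8859-4, ECMA 94|North European|Yes|Yes|1|`ISO88594`|
|`LATIN5`|ISO 8859-9, ECMA 128|Turkish|Yes|Yes|1|`ISO88599`|
|`LATIN6`|ISO 8859-10, ECMA 144|Nordic|Yes|Yes|1|`ISO885910`|
|`LATIN7`|ISO 8859-13|Baltic|Yes|Yes|1|`ISO885913`|
|`LATIN8`|ISO 8859-14|Celtic|Yes|Yes|1|`ISO885914`|
|`LATIN9`|ISO 8859-15|LATIN1 with Euro and accents|Yes|Yes|1|`ISO885915`|
|`LATIN10`|ISO 8859-16, ASRO SR 14111|Romanian|Yes|No|1|`ISO885916`|
|`MULE_INTERNAL`|Mule internal code|Multilingual Emacs|Yes|No|1-4| |
|`SJIS`|Shift JIS|Japanese|No|No|1-2|`Mskanji`, `ShiftJIS`, `WIN932`, `Windows932`|
|`SHIFT_JIS_2004`|Shift JIS, JIS X 0213|Japanese|No|No|1-2| |
|`SQL_ASCII`|unspecified (see text)|any|Yes|No|1| |
|`UHC`|Unified Hangul Code|Korean|No|No|1-2|`WIN949`, `Windows949`|
|`UTF8`|Unicode, 8-bit|all|Yes|Yes|1-4|`Unicode`|
|`WIN866`|Windows CP866|Cyrillic|Yes|Yes|1|`ALT`|
|`WIN874`|Windows CP874|Thai|Yes|No|1| |
|`WIN1250`|Windows CP1250|Central European|Yes|Yes|1| |
|`WIN1251`|Windows CP1251|Cyrillic|Yes|Yes|1| |
|`WIN1252`|Windows CP1252|Western European|Yes|Yes|1| |
|`WIN1253`|Windows CP1253|Greek|Yes|Yes|1| |
|`WIN1254`|Windows CP1254|Turkish|Yes|Yes|1| |
|`WIN1255`|Windows CP1255|Hebrew|Yes|Yes|1| |
|`WIN1256`|Windows CP1256|Arabic|Yes|Yes|1| |
|`WIN1257`|Windows CP1257|Baltic|Yes|Yes|1| |
|`WIN1258`|Windows CP1258|Vietnamese|Yes|Yes|1|`ABC`, `TCVN`, `TCVN5712`, `VSCII`|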
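
The Bytes/Char column can be explored outside the database. The following is a minimal Python sketch, not part of PostgreSQL: the `CODECS` mapping and the `byte_length` helper are hypothetical names, and the Python codec names are assumed to correspond closely enough to the PostgreSQL encodings for illustration.

```python
# Sketch: count how many bytes one character occupies in several encodings.
# Assumption: the Python codec on the right approximates the PostgreSQL
# encoding named on the left; these are Python names, not PostgreSQL ones.
CODECS = {
    "UTF8": "utf-8",
    "EUC_JP": "euc_jp",
    "SJIS": "shift_jis",
    "LATIN1": "latin-1",
}


def byte_length(char: str, pg_name: str) -> int:
    """Return the encoded size of `char` under the codec mapped to `pg_name`."""
    return len(char.encode(CODECS[pg_name]))


if __name__ == "__main__":
    for char in ("A", "é", "日"):
        for pg_name in CODECS:
            try:
                print(char, pg_name, byte_length(char, pg_name))
            except UnicodeEncodeError:
                # The character has no representation in this encoding.
                print(char, pg_name, "not representable")
```

An ASCII letter takes one byte in every encoding shown, while a kanji takes three bytes in UTF-8 and two in EUC_JP or Shift JIS, consistent with the ranges in the table.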

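Not all client APIs support all the listed character sets. For example, the PostgreSQL JDBC driver does not support `MULE_INTERNAL`, `LATIN6`, `LATIN8`, and `LATIN10`.

The `SQL_ASCII` setting behaves considerably differently from the other settings. When the server character set is `SQL_ASCII`, the server interprets byte values 0–127 according to the ASCII standard, while byte values 128–255 are taken as uninterpreted characters. No encoding conversion will be done when the setting is `SQL_ASCII`. Thus, this setting is not so much a declaration that a specific encoding is in use, as a declaration of ignorance about the encoding. In most cases, if you are working with any non-ASCII data, it is unwise to use the `SQL_ASCII` setting because PostgreSQL will be unable to help you by converting or validating non-ASCII characters.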
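
To see why uninterpreted high bytes are risky, consider how the same byte reads under different single-byte encodings. This is a small Python sketch using standard-library codecs; the helper names `interpretations` and `is_pure_ascii` are made up for illustration and do not correspond to anything in PostgreSQL.

```python
# Sketch: one byte in the 128-255 range, three different readings.
raw = b"\xe9"


def interpretations(data: bytes) -> dict:
    """Decode the same bytes under several single-byte encodings."""
    return {codec: data.decode(codec) for codec in ("latin-1", "koi8_r", "cp1251")}


def is_pure_ascii(data: bytes) -> bool:
    """True when every byte is in the 0-127 range that SQL_ASCII interprets."""
    return all(b < 128 for b in data)


if __name__ == "__main__":
    for codec, text in interpretations(raw).items():
        print(codec, repr(text))
    print("pure ASCII:", is_pure_ascii(raw))
```

Each codec yields a different character for the same stored byte, so without a declared encoding there is no way to tell which one the writer meant. Only data for which `is_pure_ascii` holds is unambiguous.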

### 24.3.2. Setting the Character Set

`initdb` defines the default character set (encoding) for a PostgreSQL cluster. For example,

initdb -E EUC_JP

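sets the default character set to `EUC_JP` (Extended Unix Code for Japanese). You can use `--encoding` instead of `-E` if you prefer longer option strings. If no `-E` or `--encoding` option is given, `initdb` attempts to determine the appropriate encoding to use based on the specified or default locale.

You can specify a non-default encoding at database creation time, provided that the encoding is compatible with the selected locale: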

createdb -E EUC_KR -T template0 --lc-collate=ko_KR.euckr --lc-ctype=ko_KR.euckr korean

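This will create a database named `korean` that uses the character set `EUC_KR` and the locale `ko_KR`. Another way to accomplish this is to use this SQL command: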

CREATE DATABASE korean WITH ENCODING 'EUC_KR' LC_COLLATE='ko_KR.euckr' LC_CTYPE='ko_KR.euckr' TEMPLATE=template0;

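Notice that the above commands specify copying the `template0` database. When copying any other database, the encoding and locale settings cannot be changed from those of the source database, because that might result in corrupt data. For more information, see Section 23.3.

The encoding for a database is stored in the system catalog `pg_database`. You can see it by using the `psql` `-l` option or the `\l` command.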

$ psql -l
                                         List of databases
   Name    |  Owner   | Encoding  |  Collation  |    Ctype    |          Access Privileges          
### Important

 On most modern operating systems, PostgreSQL can determine which character set is implied by the `LC_CTYPE` setting, and it will enforce that only the matching database encoding is used. On older systems it is your responsibility to ensure that you use the encoding expected by the locale you have selected. A mistake in this area is likely to lead to strange behavior of locale-dependent operations such as sorting.

PostgreSQL will allow superusers to create databases with `SQL_ASCII` encoding even when `LC_CTYPE` is not `C` or `POSIX`. As noted above, `SQL_ASCII` does not enforce that the data stored in the database has any particular encoding, and so this choice poses risks of locale-dependent misbehavior. Using this combination of settings is deprecated and may someday be forbidden altogether.

### 24.3.3. Automatic Character Set Conversion Between Server and Client

PostgreSQL supports automatic character set conversion between server and client for many combinations of character sets ([Section 24.3.4](multibyte.html#MULTIBYTE-CONVERSIONS-SUPPORTED) shows which ones).

 To enable automatic character set conversion, you have to tell PostgreSQL the character set (encoding) you would like to use in the client. There are several ways to accomplish this:

* Using the `\encoding` command in psql. `\encoding` allows you to change client encoding on the fly. For example, to change the encoding to `SJIS`, type:

\encoding SJIS

* libpq ([Section 34.11](libpq-control.html)) has functions to control the client encoding.

* Using `SET client_encoding TO`. Setting the client encoding can be done with this SQL command:

SET CLIENT_ENCODING TO 'value';

 Also you can use the standard SQL syntax `SET NAMES` for this purpose:

SET NAMES 'value';

 To query the current client encoding:

SHOW client_encoding;

 To return to the default encoding:

RESET client_encoding;

* Using `PGCLIENTENCODING`. If the environment variable `PGCLIENTENCODING` is defined in the client's environment, that client encoding is automatically selected when a connection to the server is made. (This can subsequently be overridden using any of the other methods mentioned above.)

* Using the configuration variable [client\_encoding](runtime-config-client.html#GUC-CLIENT-ENCODING). If the `client_encoding` variable is set, that client encoding is automatically selected when a connection to the server is made. (This can subsequently be overridden using any of the other methods mentioned above.)

If the conversion of a particular character is not possible — suppose you chose `EUC_JP` for the server and `LATIN1` for the client, and some Japanese characters are returned that do not have a representation in `LATIN1` — an error is reported.
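
The failure case described above can be reproduced in miniature with Python's standard-library codecs. This is only an analogy: the `convert` helper is hypothetical, and Python's codecs are not the conversion routines PostgreSQL uses, so details such as error messages will differ.

```python
def convert(data: bytes, server_codec: str, client_codec: str) -> bytes:
    """Re-encode server-side bytes for the client.

    Raises UnicodeEncodeError when a character has no representation
    in the client encoding.
    """
    return data.decode(server_codec).encode(client_codec)


# Japanese text as it might be stored under an EUC_JP server encoding.
stored = "日本語".encode("euc_jp")

if __name__ == "__main__":
    # A client encoding that covers Japanese converts cleanly.
    print(convert(stored, "euc_jp", "shift_jis"))
    # LATIN1 has no Japanese characters, so the conversion is rejected.
    try:
        convert(stored, "euc_jp", "latin-1")
    except UnicodeEncodeError as exc:
        print("conversion failed:", exc.reason)
```

As in the server, the sketch reports an error rather than silently substituting or dropping the characters that cannot be represented.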

If the client character set is defined as `SQL_ASCII`, encoding conversion is disabled, regardless of the server's character set. (However, if the server's character set is not `SQL_ASCII`, the server will still check that incoming data is valid for that encoding; so the net effect is as though the client character set were the same as the server's.) Just as for the server, use of `SQL_ASCII` is unwise unless you are working with all-ASCII data.

### 24.3.4. Available Character Set Conversions

PostgreSQL allows conversion between any two character sets for which a conversion function is listed in the [`pg_conversion`](catalog-pg-conversion.html) system catalog. PostgreSQL comes with some predefined conversions, as summarized in [Table 24.2](multibyte.html#MULTIBYTE-TRANSLATION-TABLE) and shown in more detail in [Table 24.3](multibyte.html#BUILTIN-CONVERSIONS-TABLE). You can create a new conversion using the SQL command [CREATE CONVERSION](sql-createconversion.html). (To be used for automatic client/server conversions, a conversion must be marked as “default” for its character set pair.)

**Table 24.2. Built-in Client/Server Character Set Conversions**

|Server Character Set|                                                           Available Client Character Sets                                                           |
|--------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------|
|       `BIG5`       |                                                        *not supported as a server encoding*                                                         |
|      `EUC_CN`      |                                                         *EUC\_CN*, `MULE_INTERNAL`, `UTF8`                                                          |
|      `EUC_JP`      |                                                     *EUC\_JP*, `MULE_INTERNAL`, `SJIS`, `UTF8`                                                      |
|   `EUC_JIS_2004`   |                                                     *EUC\_JIS\_2004*, `SHIFT_JIS_2004`, `UTF8`                                                      |
|      `EUC_KR`      |                                                         *EUC\_KR*, `MULE_INTERNAL`, `UTF8`                                                          |
|      `EUC_TW`      |                                                     *EUC\_TW*, `BIG5`, `MULE_INTERNAL`, `UTF8`                                                      |
|     `GB18030`      |                                                        *not supported as a server encoding*                                                         |
|       `GBK`        |                                                        *not supported as a server encoding*                                                         |
|    `ISO_8859_5`    |                                        *ISO\_8859\_5*, `KOI8R`, `MULE_INTERNAL`, `UTF8`, `WIN866`, `WIN1251`                                        |
|    `ISO_8859_6`    |                                                               *ISO\_8859\_6*, `UTF8`                                                                |
|    `ISO_8859_7`    |                                                               *ISO\_8859\_7*, `UTF8`                                                                |
|    `ISO_8859_8`    |                                                               *ISO\_8859\_8*, `UTF8`                                                                |
|      `JOHAB`       |                                                        *not supported as a server encoding*                                                         |
|      `KOI8R`       |                                         *KOI8R*, `ISO_8859_5`, `MULE_INTERNAL`, `UTF8`, `WIN866`, `WIN1251`                                         |
|      `KOI8U`       |                                                                   *KOI8U*, `UTF8`                                                                   |
|      `LATIN1`      |                                                          *LATIN1*, `MULE_INTERNAL`, `UTF8`                                                          |
|      `LATIN2`      |                                                    *LATIN2*, `MULE_INTERNAL`, `UTF8`, `WIN1250`                                                     |
|      `LATIN3`      |                                                          *LATIN3*, `MULE_INTERNAL`, `UTF8`                                                          |
|      `LATIN4`      |                                                          *LATIN4*, `MULE_INTERNAL`, `UTF8`                                                          |
|      `LATIN5`      |                                                                  *LATIN5*, `UTF8`                                                                   |
|      `LATIN6`      |                                                                  *LATIN6*, `UTF8`                                                                   |
|      `LATIN7`      |                                                                  *LATIN7*, `UTF8`                                                                   |
|      `LATIN8`      |                                                                  *LATIN8*, `UTF8`                                                                   |
|      `LATIN9`      |                                                                  *LATIN9*, `UTF8`                                                                   |
|     `LATIN10`      |                                                                  *LATIN10*, `UTF8`                                                                  |
|  `MULE_INTERNAL`   |*MULE\_INTERNAL*, `BIG5`, `EUC_CN`, `EUC_JP`, `EUC_KR`, `EUC_TW`, `ISO_8859_5`, `KOI8R`, `LATIN1` to `LATIN4`, `SJIS`, `WIN866`, `WIN1250`, `WIN1251`|
|       `SJIS`       |                                                        *not supported as a server encoding*                                                         |
|  `SHIFT_JIS_2004`  |                                                        *not supported as a server encoding*                                                         |
|    `SQL_ASCII`     |                                                       *any (no conversion will be performed)*                                                       |
|       `UHC`        |                                                        *not supported as a server encoding*                                                         |
|       `UTF8`       |                                                              *all supported encodings*                                                              |
|      `WIN866`      |                                         *WIN866*, `ISO_8859_5`, `KOI8R`, `MULE_INTERNAL`, `UTF8`, `WIN1251`                                         |
|      `WIN874`      |                                                                  *WIN874*, `UTF8`                                                                   |
|     `WIN1250`      |                                                    *WIN1250*, `LATIN2`, `MULE_INTERNAL`, `UTF8`                                                     |
|     `WIN1251`      |                                         *WIN1251*, `ISO_8859_5`, `KOI8R`, `MULE_INTERNAL`, `UTF8`, `WIN866`                                         |
|     `WIN1252`      |                                                                  *WIN1252*, `UTF8`                                                                  |
|     `WIN1253`      |                                                                  *WIN1253*, `UTF8`                                                                  |
|     `WIN1254`      |                                                                  *WIN1254*, `UTF8`                                                                  |
|     `WIN1255`      |                                                                  *WIN1255*, `UTF8`                                                                  |
|     `WIN1256`      |                                                                  *WIN1256*, `UTF8`                                                                  |
|     `WIN1257`      |                                                                  *WIN1257*, `UTF8`                                                                  |
|     `WIN1258`      |                                                                  *WIN1258*, `UTF8`                                                                  |

**Table 24.3. All Built-in Character Set Conversions**

|                                                                                                                                                             Conversion Name [<sup class="footnote" id="id-1.6.11.5.8.4.2.4.1.1.1">[a]</sup>](#ftn.id-1.6.11.5.8.4.2.4.1.1.1)                                                                                                                                                             |Source Encoding |Destination Encoding|
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------|--------------------|
|                                                                                                                                                                                                             `big5_to_euc_tw`                                                                                                                                                                                                             |     `BIG5`     |      `EUC_TW`      |
|                                                                                                                                                                                                              `big5_to_mic`                                                                                                                                                                                                               |     `BIG5`     |  `MULE_INTERNAL`   |
|                                                                                                                                                                                                              `big5_to_utf8`                                                                                                                                                                                                              |     `BIG5`     |       `UTF8`       |
|                                                                                                                                                                                                             `euc_cn_to_mic`                                                                                                                                                                                                              |    `EUC_CN`    |  `MULE_INTERNAL`   |
|                                                                                                                                                                                                             `euc_cn_to_utf8`                                                                                                                                                                                                             |    `EUC_CN`    |       `UTF8`       |
|                                                                                                                                                                                                             `euc_jp_to_mic`                                                                                                                                                                                                              |    `EUC_JP`    |  `MULE_INTERNAL`   |
|                                                                                                                                                                                                             `euc_jp_to_sjis`                                                                                                                                                                                                             |    `EUC_JP`    |       `SJIS`       |
|                                                                                                                                                                                                             `euc_jp_to_utf8`                                                                                                                                                                                                             |    `EUC_JP`    |       `UTF8`       |
|                                                                                                                                                                                                             `euc_kr_to_mic`                                                                                                                                                                                                              |    `EUC_KR`    |  `MULE_INTERNAL`   |
|                                                                                                                                                                                                             `euc_kr_to_utf8`                                                                                                                                                                                                             |    `EUC_KR`    |       `UTF8`       |
|                                                                                                                                                                                                             `euc_tw_to_big5`                                                                                                                                                                                                             |    `EUC_TW`    |       `BIG5`       |
|                                                                                                                                                                                                             `euc_tw_to_mic`                                                                                                                                                                                                              |    `EUC_TW`    |  `MULE_INTERNAL`   |
|                                                                                                                                                                                                             `euc_tw_to_utf8`                                                                                                                                                                                                             |    `EUC_TW`    |       `UTF8`       |
|                                                                                                                                                                                                            `gb18030_to_utf8`                                                                                                                                                                                                             |   `GB18030`    |       `UTF8`       |
|                                                                                                                                                                                                              `gbk_to_utf8`                                                                                                                                                                                                               |     `GBK`      |       `UTF8`       |
|                                                                                                                                                                                                          `iso_8859_10_to_utf8`                                                                                                                                                                                                           |    `LATIN6`    |       `UTF8`       |
|                                                                                                                                                                                                          `iso_8859_13_to_utf8`                                                                                                                                                                                                           |    `LATIN7`    |       `UTF8`       |
|                                                                                                                                                                                                          `iso_8859_14_to_utf8`                                                                                                                                                                                                           |    `LATIN8`    |       `UTF8`       |
|                                                                                                                                                                                                          `iso_8859_15_to_utf8`                                                                                                                                                                                                           |    `LATIN9`    |       `UTF8`       |
|                                                                                                                                                                                                          `iso_8859_16_to_utf8`                                                                                                                                                                                                           |   `LATIN10`    |       `UTF8`       |
|                                                                                                                                                                                                           `iso_8859_1_to_mic`                                                                                                                                                                                                            |    `LATIN1`    |  `MULE_INTERNAL`   |
|                                                                                                                                                                                                           `iso_8859_1_to_utf8`                                                                                                                                                                                                           |    `LATIN1`    |       `UTF8`       |
|                                                                                                                                                                                                           `iso_8859_2_to_mic`                                                                                                                                                                                                            |    `LATIN2`    |  `MULE_INTERNAL`   |
|                                                                                                                                                                                                           `iso_8859_2_to_utf8`                                                                                                                                                                                                           |    `LATIN2`    |       `UTF8`       |
|                                                                                                                                                                                                       `iso_8859_2_to_windows_1250`                                                                                                                                                                                                       |    `LATIN2`    |     `WIN1250`      |
|                                                                                                                                                                                                           `iso_8859_3_to_mic`                                                                                                                                                                                                            |    `LATIN3`    |  `MULE_INTERNAL`   |
|                                                                                                                                                                                                           `iso_8859_3_to_utf8`                                                                                                                                                                                                           |    `LATIN3`    |       `UTF8`       |
|                                                                                                                                                                                                           `iso_8859_4_to_mic`                                                                                                                                                                                                            |    `LATIN4`    |  `MULE_INTERNAL`   |
|                                                                                                                                                                                                           `iso_8859_4_to_utf8`                                                                                                                                                                                                           |    `LATIN4`    |       `UTF8`       |
|                                                                                                                                                                                                          `iso_8859_5_to_koi8_r`                                                                                                                                                                                                          |  `ISO_8859_5`  |      `KOI8R`       |
|                                                                                                                                                                                                           `iso_8859_5_to_mic`                                                                                                                                                                                                            |  `ISO_8859_5`  |  `MULE_INTERNAL`   |
|                                                                                                                                                                                                           `iso_8859_5_to_utf8`                                                                                                                                                                                                           |  `ISO_8859_5`  |       `UTF8`       |
|                                                                                                                                                                                                       `iso_8859_5_to_windows_1251`                                                                                                                                                                                                       |  `ISO_8859_5`  |     `WIN1251`      |
|                                                                                                                                                                                                       `iso_8859_5_to_windows_866`                                                                                                                                                                                                        |  `ISO_8859_5`  |      `WIN866`      |
|                                                                                                                                                                                                           `iso_8859_6_to_utf8`                                                                                                                                                                                                           |  `ISO_8859_6`  |       `UTF8`       |
|                                                                                                                                                                                                           `iso_8859_7_to_utf8`                                                                                                                                                                                                           |  `ISO_8859_7`  |       `UTF8`       |
|                                                                                                                                                                                                           `iso_8859_8_to_utf8`                                                                                                                                                                                                           |  `ISO_8859_8`  |       `UTF8`       |
|                                                                                                                                                                                                           `iso_8859_9_to_utf8`                                                                                                                                                                                                           |    `LATIN5`    |       `UTF8`       |
|                                                                                                                                                                                                             `johab_to_utf8`                                                                                                                                                                                                              |    `JOHAB`     |       `UTF8`       |
|                                                                                                                                                                                                          `koi8_r_to_iso_8859_5`                                                                                                                                                                                                          |    `KOI8R`     |    `ISO_8859_5`    |
|                                                                                                                                                                                                             `koi8_r_to_mic`                                                                                                                                                                                                              |    `KOI8R`     |  `MULE_INTERNAL`   |
|                                                                                                                                                                                                             `koi8_r_to_utf8`                                                                                                                                                                                                             |    `KOI8R`     |       `UTF8`       |
|                                                                                                                                                                                                         `koi8_r_to_windows_1251`                                                                                                                                                                                                         |    `KOI8R`     |     `WIN1251`      |
|                                                                                                                                                                                                         `koi8_r_to_windows_866`                                                                                                                                                                                                          |    `KOI8R`     |      `WIN866`      |
|                                                                                                                                                                                                             `koi8_u_to_utf8`                                                                                                                                                                                                             |    `KOI8U`     |       `UTF8`       |
|                                                                                                                                                                                                              `mic_to_big5`                                                                                                                                                                                                               |`MULE_INTERNAL` |       `BIG5`       |
|                                                                                                                                                                                                             `mic_to_euc_cn`                                                                                                                                                                                                              |`MULE_INTERNAL` |      `EUC_CN`      |
|                                                                                                                                                                                                             `mic_to_euc_jp`                                                                                                                                                                                                              |`MULE_INTERNAL` |      `EUC_JP`      |
|                                                                                                                                                                                                             `mic_to_euc_kr`                                                                                                                                                                                                              |`MULE_INTERNAL` |      `EUC_KR`      |
|                                                                                                                                                                                                             `mic_to_euc_tw`                                                                                                                                                                                                              |`MULE_INTERNAL` |      `EUC_TW`      |
|                                                                                                                                                                                                           `mic_to_iso_8859_1`                                                                                                                                                                                                            |`MULE_INTERNAL` |      `LATIN1`      |
|                                                                                                                                                                                                           `mic_to_iso_8859_2`                                                                                                                                                                                                            |`MULE_INTERNAL` |      `LATIN2`      |
|                                                                                                                                                                                                           `mic_to_iso_8859_3`                                                                                                                                                                                                            |`MULE_INTERNAL` |      `LATIN3`      |
|                                                                                                                                                                                                           `mic_to_iso_8859_4`                                                                                                                                                                                                            |`MULE_INTERNAL` |      `LATIN4`      |
|                                                                                                                                                                                                           `mic_to_iso_8859_5`                                                                                                                                                                                                            |`MULE_INTERNAL` |    `ISO_8859_5`    |
|                                                                                                                                                                                                             `mic_to_koi8_r`                                                                                                                                                                                                              |`MULE_INTERNAL` |      `KOI8R`       |
|                                                                                                                                                                                                              `mic_to_sjis`                                                                                                                                                                                                               |`MULE_INTERNAL` |       `SJIS`       |
|                                                                                                                                                                                                          `mic_to_windows_1250`                                                                                                                                                                                                           |`MULE_INTERNAL` |     `WIN1250`      |
|                                                                                                                                                                                                          `mic_to_windows_1251`                                                                                                                                                                                                           |`MULE_INTERNAL` |     `WIN1251`      |
|                                                                                                                                                                                                           `mic_to_windows_866`                                                                                                                                                                                                           |`MULE_INTERNAL` |      `WIN866`      |
|                                                                                                                                                                                                             `sjis_to_euc_jp`                                                                                                                                                                                                             |     `SJIS`     |      `EUC_JP`      |
|                                                                                                                                                                                                              `sjis_to_mic`                                                                                                                                                                                                               |     `SJIS`     |  `MULE_INTERNAL`   |
|                                                                                                                                                                                                              `sjis_to_utf8`                                                                                                                                                                                                              |     `SJIS`     |       `UTF8`       |
|                                                                                                                                                                                                          `windows_1258_to_utf8`                                                                                                                                                                                                          |   `WIN1258`    |       `UTF8`       |
|                                                                                                                                                                                                              `uhc_to_utf8`                                                                                                                                                                                                               |     `UHC`      |       `UTF8`       |
|                                                                                                                                                                                                              `utf8_to_big5`                                                                                                                                                                                                              |     `UTF8`     |       `BIG5`       |
|                                                                                                                                                                                                             `utf8_to_euc_cn`                                                                                                                                                                                                             |     `UTF8`     |      `EUC_CN`      |
|                                                                                                                                                                                                             `utf8_to_euc_jp`                                                                                                                                                                                                             |     `UTF8`     |      `EUC_JP`      |
|                                                                                                                                                                                                             `utf8_to_euc_kr`                                                                                                                                                                                                             |     `UTF8`     |      `EUC_KR`      |
|                                                                                                                                                                                                             `utf8_to_euc_tw`                                                                                                                                                                                                             |     `UTF8`     |      `EUC_TW`      |
|                                                                                                                                                                                                            `utf8_to_gb18030`                                                                                                                                                                                                             |     `UTF8`     |     `GB18030`      |
|                                                                                                                                                                                                              `utf8_to_gbk`                                                                                                                                                                                                               |     `UTF8`     |       `GBK`        |
|                                                                                                                                                                                                           `utf8_to_iso_8859_1`                                                                                                                                                                                                           |     `UTF8`     |      `LATIN1`      |
|                                                                                                                                                                                                          `utf8_to_iso_8859_10`                                                                                                                                                                                                           |     `UTF8`     |      `LATIN6`      |
|                                                                                                                                                                                                          `utf8_to_iso_8859_13`                                                                                                                                                                                                           |     `UTF8`     |      `LATIN7`      |
|                                                                                                                                                                                                          `utf8_to_iso_8859_14`                                                                                                                                                                                                           |     `UTF8`     |      `LATIN8`      |
|                                                                                                                                                                                                          `utf8_to_iso_8859_15`                                                                                                                                                                                                           |     `UTF8`     |      `LATIN9`      |
|                                                                                                                                                                                                          `utf8_to_iso_8859_16`                                                                                                                                                                                                           |     `UTF8`     |     `LATIN10`      |
|                                                                                                                                                                                                           `utf8_to_iso_8859_2`                                                                                                                                                                                                           |     `UTF8`     |      `LATIN2`      |
|                                                                                                                                                                                                           `utf8_to_iso_8859_3`                                                                                                                                                                                                           |     `UTF8`     |      `LATIN3`      |
|                                                                                                                                                                                                           `utf8_to_iso_8859_4`                                                                                                                                                                                                           |     `UTF8`     |      `LATIN4`      |
|                                                                                                                                                                                                           `utf8_to_iso_8859_5`                                                                                                                                                                                                           |     `UTF8`     |    `ISO_8859_5`    |
|                                                                                                                                                                                                           `utf8_to_iso_8859_6`                                                                                                                                                                                                           |     `UTF8`     |    `ISO_8859_6`    |
|                                                                                                                                                                                                           `utf8_to_iso_8859_7`                                                                                                                                                                                                           |     `UTF8`     |    `ISO_8859_7`    |
|                                                                                                                                                                                                           `utf8_to_iso_8859_8`                                                                                                                                                                                                           |     `UTF8`     |    `ISO_8859_8`    |
|                                                                                                                                                                                                           `utf8_to_iso_8859_9`                                                                                                                                                                                                           |     `UTF8`     |      `LATIN5`      |
|                                                                                                                                                                                                             `utf8_to_johab`                                                                                                                                                                                                              |     `UTF8`     |      `JOHAB`       |
|                                                                                                                                                                                                             `utf8_to_koi8_r`                                                                                                                                                                                                             |     `UTF8`     |      `KOI8R`       |
|                                                                                                                                                                                                             `utf8_to_koi8_u`                                                                                                                                                                                                             |     `UTF8`     |      `KOI8U`       |
|                                                                                                                                                                                                              `utf8_to_sjis`                                                                                                                                                                                                              |     `UTF8`     |       `SJIS`       |
|                                                                                                                                                                                                          `utf8_to_windows_1258`                                                                                                                                                                                                          |     `UTF8`     |     `WIN1258`      |
|                                                                                                                                                                                                              `utf8_to_uhc`                                                                                                                                                                                                               |     `UTF8`     |       `UHC`        |
|                                                                                                                                                                                                          `utf8_to_windows_1250`                                                                                                                                                                                                          |     `UTF8`     |     `WIN1250`      |
|                                                                                                                                                                                                          `utf8_to_windows_1251`                                                                                                                                                                                                          |     `UTF8`     |     `WIN1251`      |
|                                                                                                                                                                                                          `utf8_to_windows_1252`                                                                                                                                                                                                          |     `UTF8`     |     `WIN1252`      |
|                                                                                                                                                                                                          `utf8_to_windows_1253`                                                                                                                                                                                                          |     `UTF8`     |     `WIN1253`      |
|                                                                                                                                                                                                          `utf8_to_windows_1254`                                                                                                                                                                                                          |     `UTF8`     |     `WIN1254`      |
|                                                                                                                                                                                                          `utf8_to_windows_1255`                                                                                                                                                                                                          |     `UTF8`     |     `WIN1255`      |
|                                                                                                                                                                                                          `utf8_to_windows_1256`                                                                                                                                                                                                          |     `UTF8`     |     `WIN1256`      |
|                                                                                                                                                                                                          `utf8_to_windows_1257`                                                                                                                                                                                                          |     `UTF8`     |     `WIN1257`      |
|                                                                                                                                                                                                          `utf8_to_windows_866`                                                                                                                                                                                                           |     `UTF8`     |      `WIN866`      |
|                                                                                                                                                                                                          `utf8_to_windows_874`                                                                                                                                                                                                           |     `UTF8`     |      `WIN874`      |
|                                                                                                                                                                                                       `windows_1250_to_iso_8859_2`                                                                                                                                                                                                       |   `WIN1250`    |      `LATIN2`      |
|                                                                                                                                                                                                          `windows_1250_to_mic`                                                                                                                                                                                                           |   `WIN1250`    |  `MULE_INTERNAL`   |
|                                                                                                                                                                                                          `windows_1250_to_utf8`                                                                                                                                                                                                          |   `WIN1250`    |       `UTF8`       |
|                                                                                                                                                                                                       `windows_1251_to_iso_8859_5`                                                                                                                                                                                                       |   `WIN1251`    |    `ISO_8859_5`    |
|                                                                                                                                                                                                         `windows_1251_to_koi8_r`                                                                                                                                                                                                         |   `WIN1251`    |      `KOI8R`       |
|                                                                                                                                                                                                          `windows_1251_to_mic`                                                                                                                                                                                                           |   `WIN1251`    |  `MULE_INTERNAL`   |
|                                                                                                                                                                                                          `windows_1251_to_utf8`                                                                                                                                                                                                          |   `WIN1251`    |       `UTF8`       |
|                                                                                                                                                                                                      `windows_1251_to_windows_866`                                                                                                                                                                                                       |   `WIN1251`    |      `WIN866`      |
|                                                                                                                                                                                                          `windows_1252_to_utf8`                                                                                                                                                                                                          |   `WIN1252`    |       `UTF8`       |
|                                                                                                                                                                                                          `windows_1256_to_utf8`                                                                                                                                                                                                          |   `WIN1256`    |       `UTF8`       |
|                                                                                                                                                                                                       `windows_866_to_iso_8859_5`                                                                                                                                                                                                        |    `WIN866`    |    `ISO_8859_5`    |
|                                                                                                                                                                                                         `windows_866_to_koi8_r`                                                                                                                                                                                                          |    `WIN866`    |      `KOI8R`       |
|                                                                                                                                                                                                           `windows_866_to_mic`                                                                                                                                                                                                           |    `WIN866`    |  `MULE_INTERNAL`   |
|                                                                                                                                                                                                          `windows_866_to_utf8`                                                                                                                                                                                                           |    `WIN866`    |       `UTF8`       |
|                                                                                                                                                                                                      `windows_866_to_windows_1251`                                                                                                                                                                                                       |    `WIN866`    |       `WIN`        |
|                                                                                                                                                                                                          `windows_874_to_utf8`                                                                                                                                                                                                           |    `WIN874`    |       `UTF8`       |
|                                                                                                                                                                                                          `euc_jis_2004_to_utf8`                                                                                                                                                                                                          | `EUC_JIS_2004` |       `UTF8`       |
|                                                                                                                                                                                                          `utf8_to_euc_jis_2004`                                                                                                                                                                                                          |     `UTF8`     |   `EUC_JIS_2004`   |
|                                                                                                                                                                                                         `shift_jis_2004_to_utf8`                                                                                                                                                                                                         |`SHIFT_JIS_2004`|       `UTF8`       |
|                                                                                                                                                                                                         `utf8_to_shift_jis_2004`                                                                                                                                                                                                         |     `UTF8`     |  `SHIFT_JIS_2004`  |
|                                                                                                                                                                                                     `euc_jis_2004_to_shift_jis_2004`                                                                                                                                                                                                     | `EUC_JIS_2004` |  `SHIFT_JIS_2004`  |
|                                                                                                                                                                                                     `shift_jis_2004_to_euc_jis_2004`                                                                                                                                                                                                     |`SHIFT_JIS_2004`|   `EUC_JIS_2004`   |

[a] The conversion names follow a standard naming scheme: the official name of the source encoding with all non-alphanumeric characters replaced by underscores, followed by `_to_`, followed by the similarly processed destination encoding name. As a result, these names sometimes deviate from the customary encoding names shown in [Table 24.1](multibyte.html#CHARSET-TABLE).
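
The naming scheme described in the footnote can be illustrated with a small Python sketch. The helper `conversion_name` is hypothetical and not part of PostgreSQL; the "official" encoding names passed to it and the lowercasing step are assumptions based on how the names appear in the table. Entries that use short forms, such as `windows_866_to_mic` or the `utf8` suffix, are not reproduced by this simple rule.

```python
import re


def conversion_name(source_official: str, dest_official: str) -> str:
    """Build a conversion name from two official encoding names.

    Hypothetical helper for illustration only; it is not a PostgreSQL API.
    """

    def normalize(name: str) -> str:
        # Replace each non-alphanumeric character with an underscore.
        # Lowercasing is an assumption drawn from the names in the table.
        return re.sub(r"[^0-9A-Za-z]", "_", name).lower()

    return f"{normalize(source_official)}_to_{normalize(dest_official)}"


# The input spellings below are assumed official names, chosen for illustration.
print(conversion_name("Windows-866", "KOI8-R"))
print(conversion_name("EUC_JIS_2004", "SHIFT_JIS_2004"))
```

With these inputs the sketch yields `windows_866_to_koi8_r` and `euc_jis_2004_to_shift_jis_2004`, both of which appear in the table.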

### 24.3.5. Further Reading

These are good sources to start learning about various kinds of encoding systems.

*CJKV Information Processing: Chinese, Japanese, Korean & Vietnamese Computing*

Contains detailed explanations of `EUC_JP`, `EUC_CN`, `EUC_KR`, `EUC_TW`.

[https://www.unicode.org/](https://www.unicode.org/)

The web site of the Unicode Consortium.

[RFC 3629](https://tools.ietf.org/html/rfc3629)

UTF-8 (8-bit UCS/Unicode Transformation Format) is defined here.