# 24.3. Character Set Support
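The character set support in PostgreSQL allows you to store text in a variety of character sets (also called encodings), including single-byte character sets such as the ISO 8859 series and multiple-byte character sets such as EUC (Extended Unix Code), UTF-8, and Mule internal code. All supported character sets can be used transparently by clients, but a few are not supported for use within the server (that is, as a server-side encoding). The default character set is selected while initializing your PostgreSQL database cluster using `initdb`. It can be overridden when you create a database, so you can have multiple databases each with a different character set.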
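An important restriction, however, is that each database's character set must be compatible with the database's `LC_CTYPE` (character classification) and `LC_COLLATE` (string sort order) locale settings. For the `C` or `POSIX` locale, any character set is allowed, but for other libc-provided locales there is only one character set that will work correctly. (On Windows, however, UTF-8 encoding can be used with any locale.) If you have ICU support configured, ICU-provided locales can be used with most but not all server-side encodings.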
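The compatibility rule above can be expressed as a small check. The following is a hypothetical Python model, not PostgreSQL's implementation. It assumes the libc locale name carries its codeset after a dot (as in `ko_KR.euckr`). The `CODESET_TO_ENCODING` mapping is an illustrative excerpt invented for this sketch, and ICU-provided locales are not modeled.

```python
# Hypothetical model of the encoding/locale compatibility rule; not PostgreSQL code.
import re
from typing import Optional

# Illustrative excerpt: normalized libc codeset name -> PostgreSQL encoding name.
CODESET_TO_ENCODING = {
    "utf8": "UTF8",
    "eucjp": "EUC_JP",
    "euckr": "EUC_KR",
    "euccn": "EUC_CN",
    "iso88591": "LATIN1",
    "iso88595": "ISO_8859_5",
    "koi8r": "KOI8R",
    "cp1251": "WIN1251",
}


def locale_encoding(locale: str) -> Optional[str]:
    """Return the encoding implied by a libc locale name, or None if unknown."""
    if "." not in locale:
        return None
    codeset = locale.split(".", 1)[1].split("@", 1)[0]
    # Normalize spellings such as "UTF-8", "utf8" or "EUC-KR" to one key.
    key = re.sub(r"[^a-z0-9]", "", codeset.lower())
    return CODESET_TO_ENCODING.get(key)


def is_compatible(encoding: str, lc_ctype: str, lc_collate: str,
                  on_windows: bool = False) -> bool:
    """Check a database encoding against both LC_CTYPE and LC_COLLATE."""
    for loc in (lc_ctype, lc_collate):
        if loc in ("C", "POSIX"):
            continue  # any character set is allowed
        if on_windows and encoding == "UTF8":
            continue  # on Windows, UTF-8 can be used with any locale
        if locale_encoding(loc) != encoding:
            return False  # a libc locale works with only one encoding
    return True
```

For example, `is_compatible("EUC_KR", "ko_KR.euckr", "ko_KR.euckr")` holds. `is_compatible("UTF8", "ko_KR.euckr", "ko_KR.euckr")` does not, because that locale implies `EUC_KR`.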
### 24.3.1. Supported Character Sets
[Table 24.1](multibyte.html#CHARSET-TABLE) shows the character sets available for use in PostgreSQL.
**Table 24.1. PostgreSQL Character Sets**
| Name | Description | Language | Server? | ICU? | Bytes/Char | Aliases |
|---|---|---|---|---|---|---|
| `BIG5` | Big Five | Traditional Chinese | No | No | 1-2 | `WIN950`, `Windows950` |
| `EUC_CN` | Extended UNIX Code-CN | Simplified Chinese | Yes | Yes | 1-3 | |
| `EUC_JP` | Extended UNIX Code-JP | Japanese | Yes | Yes | 1-3 | |
| `EUC_JIS_2004` | Extended UNIX Code-JP, JIS X 0213 | Japanese | Yes | No | 1-3 | |
| `EUC_KR` | Extended UNIX Code-KR | Korean | Yes | Yes | 1-3 | |
| `EUC_TW` | Extended UNIX Code-TW | Traditional Chinese, Taiwanese | Yes | Yes | 1-3 | |
| `GB18030` | National Standard | Chinese | No | No | 1-4 | |
| `GBK` | Extended National Standard | Simplified Chinese | No | No | 1-2 | `WIN936`, `Windows936` |
| `ISO_8859_5` | ISO 8859-5, ECMA 113 | Latin/Cyrillic | Yes | Yes | 1 | |
| `ISO_8859_6` | ISO 8859-6, ECMA 114 | Latin/Arabic | Yes | Yes | 1 | |
| `ISO_8859_7` | ISO 8859-7, ECMA 118 | Latin/Greek | Yes | Yes | 1 | |
| `ISO_8859_8` | ISO 8859-8, ECMA 121 | Latin/Hebrew | Yes | Yes | 1 | |
| `JOHAB` | JOHAB | Korean (Hangul) | No | No | 1-3 | |
| `KOI8R` | KOI8-R | Cyrillic (Russian) | Yes | Yes | 1 | `KOI8` |
| `KOI8U` | KOI8-U | Cyrillic (Ukrainian) | Yes | Yes | 1 | |
| `LATIN1` | ISO 8859-1, ECMA 94 | Western European | Yes | Yes | 1 | `ISO88591` |
| `LATIN2` | ISO 8859-2, ECMA 94 | Central European | Yes | Yes | 1 | `ISO88592` |
| `LATIN3` | ISO 8859-3, ECMA 94 | South European | Yes | Yes | 1 | `ISO88593` |
| `LATIN4` | ISO 8859-4, ECMA 94 | North European | Yes | Yes | 1 | `ISO88594` |
| `LATIN5` | ISO 8859-9, ECMA 128 | Turkish | Yes | Yes | 1 | `ISO88599` |
| `LATIN6` | ISO 8859-10, ECMA 144 | Nordic | Yes | Yes | 1 | `ISO885910` |
| `LATIN7` | ISO 8859-13 | Baltic | Yes | Yes | 1 | `ISO885913` |
| `LATIN8` | ISO 8859-14 | Celtic | Yes | Yes | 1 | `ISO885914` |
| `LATIN9` | ISO 8859-15 | LATIN1 with Euro and accents | Yes | Yes | 1 | `ISO885915` |
| `LATIN10` | ISO 8859-16, ASRO SR 14111 | Romanian | Yes | No | 1 | `ISO885916` |
| `MULE_INTERNAL` | Mule internal code | Multilingual Emacs | Yes | No | 1-4 | |
| `SJIS` | Shift JIS | Japanese | No | No | 1-2 | `Mskanji`, `ShiftJIS`, `WIN932`, `Windows932` |
| `SHIFT_JIS_2004` | Shift JIS, JIS X 0213 | Japanese | No | No | 1-2 | |
| `SQL_ASCII` | unspecified (see text) | any | Yes | No | 1 | |
| `UHC` | Unified Hangul Code | Korean | No | No | 1-2 | `WIN949`, `Windows949` |
| `UTF8` | Unicode, 8-bit | all | Yes | Yes | 1-4 | `Unicode` |
| `WIN866` | Windows CP866 | Cyrillic | Yes | Yes | 1 | `ALT` |
| `WIN874` | Windows CP874 | Thai | Yes | No | 1 | |
| `WIN1250` | Windows CP1250 | Central European | Yes | Yes | 1 | |
| `WIN1251` | Windows CP1251 | Cyrillic | Yes | Yes | 1 | `WIN` |
| `WIN1252` | Windows CP1252 | Western European | Yes | Yes | 1 | |
| `WIN1253` | Windows CP1253 | Greek | Yes | Yes | 1 | |
| `WIN1254` | Windows CP1254 | Turkish | Yes | Yes | 1 | |
| `WIN1255` | Windows CP1255 | Hebrew | Yes | Yes | 1 | |
| `WIN1256` | Windows CP1256 | Arabic | Yes | Yes | 1 | |
| `WIN1257` | Windows CP1257 | Baltic | Yes | Yes | 1 | |
| `WIN1258` | Windows CP1258 | Vietnamese | Yes | Yes | 1 | `ABC`, `TCVN`, `TCVN5712`, `VSCII` |
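The Bytes/Char column can be observed with Python's standard codecs. This sketch uses Python codec names (`utf-8`, `euc_jp`, `shift_jis`, and so on) as stand-ins for the PostgreSQL encodings. That mapping is an assumption. Python's conversion tables may differ from PostgreSQL's in edge cases, so treat the numbers as illustrative.

```python
# Illustrative: encoded length of single characters in a few encodings.
# The PostgreSQL-name -> Python-codec mapping is an assumption for this sketch.
PG_TO_PYTHON_CODEC = {
    "UTF8": "utf-8",
    "EUC_JP": "euc_jp",
    "SJIS": "shift_jis",
    "LATIN1": "latin_1",
    "WIN1251": "cp1251",
    "GB18030": "gb18030",
}


def bytes_per_char(ch: str, pg_encoding: str) -> int:
    """Return how many bytes one character occupies in the given encoding."""
    return len(ch.encode(PG_TO_PYTHON_CODEC[pg_encoding]))


for ch in ["A", "é", "日", "😀"]:
    sizes = {}
    for enc in PG_TO_PYTHON_CODEC:
        try:
            sizes[enc] = bytes_per_char(ch, enc)
        except UnicodeEncodeError:
            sizes[enc] = None  # character has no representation in this encoding
    print(repr(ch), sizes)
```

ASCII letters take one byte everywhere. In `UTF8`, the same text can take 1 to 4 bytes per character, which matches the 1-4 range in the table.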
Not all client APIs support all the listed character sets. For example, the PostgreSQL JDBC driver does not support `MULE_INTERNAL`, `LATIN6`, `LATIN8`, and `LATIN10`.
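The `SQL_ASCII` setting behaves considerably differently from the other settings. When the server character set is `SQL_ASCII`, the server interprets byte values 0-127 according to the ASCII standard, while byte values 128-255 are taken as uninterpreted characters. No encoding conversion will be done when the setting is `SQL_ASCII`. Thus, this setting is not so much a declaration that a specific encoding is in use, as a declaration of ignorance about the encoding. In most cases, if you are working with any non-ASCII data, it is unwise to use the `SQL_ASCII` setting, because PostgreSQL will be unable to help you by converting or validating non-ASCII characters.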
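The practical difference can be sketched by comparing how a `SQL_ASCII` server and a `UTF8` server would treat the same incoming bytes. The functions `accept_sql_ascii` and `accept_utf8` are hypothetical names for this illustration. The real validation happens inside the server.

```python
# Hypothetical illustration of SQL_ASCII versus a validating server encoding.


def accept_sql_ascii(data: bytes) -> bytes:
    """SQL_ASCII: bytes 0-127 are ASCII, bytes 128-255 are stored uninterpreted."""
    return data  # no conversion and no validation


def accept_utf8(data: bytes) -> bytes:
    """UTF8: reject byte sequences that are not valid UTF-8."""
    data.decode("utf-8")  # raises UnicodeDecodeError on invalid input
    return data


latin1_bytes = "café".encode("latin_1")  # b'caf\xe9'
utf8_bytes = "café".encode("utf-8")      # b'caf\xc3\xa9'

# A SQL_ASCII database stores both, so one column can end up holding
# a mixture of encodings that nothing can later tell apart.
stored = [accept_sql_ascii(latin1_bytes), accept_sql_ascii(utf8_bytes)]

# A UTF8 database accepts the UTF-8 bytes but rejects the LATIN1 bytes.
accept_utf8(utf8_bytes)
try:
    accept_utf8(latin1_bytes)
except UnicodeDecodeError:
    print("rejected: not valid UTF-8")
```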
### 24.3.2. Setting the Character Set
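`initdb` defines the default character set (encoding) for a PostgreSQL cluster. For example,
```
initdb -E EUC_JP
```
sets the default character set to `EUC_JP` (Extended Unix Code for Japanese). You can use `--encoding` instead of `-E` if you prefer longer option strings. If no `-E` or `--encoding` option is given, `initdb` attempts to determine the appropriate encoding to use based on the specified or default locale.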
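The selection logic described above can be modeled as follows. This is a simplified, hypothetical sketch. The real `initdb` asks the operating system for the locale's codeset, and the small mapping table here is an assumption. The sketch does reflect that an explicit `-E`/`--encoding` takes precedence. It also reflects that the `C` and `POSIX` locales imply no particular encoding, in which case it falls back to `SQL_ASCII`.

```python
# Hypothetical model of how a cluster's default encoding could be chosen.
import re
from typing import Optional

CODESET_TO_ENCODING = {  # illustrative excerpt, not PostgreSQL's full table
    "utf8": "UTF8",
    "eucjp": "EUC_JP",
    "euckr": "EUC_KR",
    "iso88591": "LATIN1",
}


def choose_encoding(option: Optional[str], locale: str) -> str:
    """Mimic `initdb [-E enc]`: an explicit encoding wins, else derive from locale."""
    if option is not None:
        return option  # value given with -E or --encoding
    if locale in ("C", "POSIX"):
        return "SQL_ASCII"  # these locales imply no particular encoding
    codeset = locale.partition(".")[2].partition("@")[0]
    key = re.sub(r"[^a-z0-9]", "", codeset.lower())
    if key not in CODESET_TO_ENCODING:
        raise ValueError(f"cannot determine an encoding for locale {locale!r}")
    return CODESET_TO_ENCODING[key]
```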
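You can specify a non-default encoding at database creation time, provided that the encoding is compatible with the selected locale:
```
createdb -E EUC_KR -T template0 --lc-collate=ko_KR.euckr --lc-ctype=ko_KR.euckr korean
```
This will create a database named `korean` that uses the character set `EUC_KR` and locale `ko_KR`. Another way to accomplish this is to use this SQL command:
```
CREATE DATABASE korean WITH ENCODING 'EUC_KR' LC_COLLATE='ko_KR.euckr' LC_CTYPE='ko_KR.euckr' TEMPLATE=template0;
```
Notice that the above commands specify copying the `template0` database. When copying any other database, the encoding and locale settings cannot be changed from those of the source database, because that might result in corrupt data. For more information see Section 23.3.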
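The template restriction can be expressed as a simple pre-check. This hypothetical sketch models only the rule stated above: a different encoding or locale is allowed only when copying `template0`. The `DatabaseSpec` type and `check_template` function are invented for the example. The real server also performs other checks, such as locale compatibility.

```python
# Hypothetical pre-check for the CREATE DATABASE template rule; not PostgreSQL code.
from dataclasses import dataclass


@dataclass(frozen=True)
class DatabaseSpec:
    encoding: str
    lc_collate: str
    lc_ctype: str


def check_template(template_name: str, source: DatabaseSpec, new: DatabaseSpec) -> None:
    """Raise ValueError if copying a non-template0 database would change settings."""
    if template_name == "template0":
        return  # template0 may be copied with a different (compatible) encoding/locale
    if new != source:
        raise ValueError(
            f"cannot change encoding or locale when copying {template_name!r}; "
            "use template0 instead"
        )


source = DatabaseSpec("UTF8", "en_US.UTF-8", "en_US.UTF-8")
korean = DatabaseSpec("EUC_KR", "ko_KR.euckr", "ko_KR.euckr")

check_template("template0", source, korean)  # allowed
check_template("template1", source, source)  # allowed: nothing changes
```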
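The encoding for a database is stored in the system catalog `pg_database`. You can see it by using the `psql` `-l` option or the `\l` command.
```
$ psql -l
List of databases
Name | Owner | Encoding | Collation | Ctype | Access Privileges
```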
### Important
On most modern operating systems, PostgreSQL can determine which character set is implied by the `LC_CTYPE` setting, and it will enforce that only the matching database encoding is used. On older systems it is your responsibility to ensure that you use the encoding expected by the locale you have selected. A mistake in this area is likely to lead to strange behavior of locale-dependent operations such as sorting.
PostgreSQL will allow superusers to create databases with `SQL_ASCII` encoding even when `LC_CTYPE` is not `C` or `POSIX`. As noted above, `SQL_ASCII` does not enforce that the data stored in the database has any particular encoding, and so this choice poses risks of locale-dependent misbehavior. Using this combination of settings is deprecated and may someday be forbidden altogether.
### 24.3.3. Automatic Character Set Conversion Between Server and Client
PostgreSQL supports automatic character set conversion between server and client for many combinations of character sets ([Section 24.3.4](multibyte.html#MULTIBYTE-CONVERSIONS-SUPPORTED) shows which ones).
To enable automatic character set conversion, you have to tell PostgreSQL the character set (encoding) you would like to use in the client. There are several ways to accomplish this:
* Using the `\encoding` command in psql. `\encoding` allows you to change client encoding on the fly. For example, to change the encoding to `SJIS`, type:
  ```
  \encoding SJIS
  ```
* libpq ([Section 34.11](libpq-control.html)) has functions to control the client encoding.
* Using `SET client_encoding TO`. Setting the client encoding can be done with this SQL command:
  ```
  SET CLIENT_ENCODING TO 'value';
  ```
  Also you can use the standard SQL syntax `SET NAMES` for this purpose:
  ```
  SET NAMES 'value';
  ```
  To query the current client encoding:
  ```
  SHOW client_encoding;
  ```
  To return to the default encoding:
  ```
  RESET client_encoding;
  ```
* Using `PGCLIENTENCODING`. If the environment variable `PGCLIENTENCODING` is defined in the client's environment, that client encoding is automatically selected when a connection to the server is made. (This can subsequently be overridden using any of the other methods mentioned above.)
* Using the configuration variable [client\_encoding](runtime-config-client.html#GUC-CLIENT-ENCODING). If the `client_encoding` variable is set, that client encoding is automatically selected when a connection to the server is made. (This can subsequently be overridden using any of the other methods mentioned above.)
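The order in which these mechanisms take effect can be summarized in a small model. This is a hypothetical sketch with three assumptions. First, with nothing else set, the client encoding defaults to the database encoding. Second, `PGCLIENTENCODING` (sent at connection time) overrides the `client_encoding` configuration setting. Third, `\encoding`, `SET client_encoding` and `SET NAMES` all act as later session-level overrides. psql-specific behavior and `RESET` are not modeled.

```python
# Hypothetical model of which client encoding a session ends up using.
from typing import Optional


def effective_client_encoding(
    database_encoding: str,
    config_client_encoding: Optional[str] = None,  # client_encoding configuration variable
    pgclientencoding_env: Optional[str] = None,    # PGCLIENTENCODING at connection time
    session_override: Optional[str] = None,        # \encoding, SET client_encoding, SET NAMES
) -> str:
    """Return the first setting that applies, from most to least specific."""
    for candidate in (session_override, pgclientencoding_env, config_client_encoding):
        if candidate is not None:
            return candidate
    return database_encoding  # assumed default: same as the database encoding
```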
If the conversion of a particular character is not possible — suppose you chose `EUC_JP` for the server and `LATIN1` for the client, and some Japanese characters are returned that do not have a representation in `LATIN1` — an error is reported.
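The failure case can be reproduced outside the server with Python's codecs. This sketch uses `euc_jp` and `latin_1` as stand-ins for `EUC_JP` and `LATIN1`, which is an assumption. PostgreSQL reports the failure with its own error message, which is not shown here.

```python
# Illustrative: converting EUC_JP text for a LATIN1 client fails on Japanese characters.


def convert(data: bytes, source_codec: str, dest_codec: str) -> bytes:
    """Decode from the source encoding and re-encode in the destination encoding."""
    return data.decode(source_codec).encode(dest_codec)  # strict error handling


stored = "Tokyo 東京".encode("euc_jp")  # text held by an EUC_JP server

print(convert("Tokyo".encode("euc_jp"), "euc_jp", "latin_1"))  # ASCII converts fine

try:
    convert(stored, "euc_jp", "latin_1")
except UnicodeEncodeError:
    print("conversion failed: no LATIN1 representation for some characters")
```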
If the client character set is defined as `SQL_ASCII`, encoding conversion is disabled, regardless of the server's character set. (However, if the server's character set is not `SQL_ASCII`, the server will still check that incoming data is valid for that encoding; so the net effect is as though the client character set were the same as the server's.) Just as for the server, use of `SQL_ASCII` is unwise unless you are working with all-ASCII data.
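The handling of incoming data can be modeled as below. This hypothetical sketch uses Python codec names for the encodings (an assumption) and represents `SQL_ASCII` as `None`. It covers three cases: normal conversion, validation only when the client is `SQL_ASCII`, and no checking at all when the server is `SQL_ASCII`.

```python
# Hypothetical model of how incoming client bytes are handled by the server.
from typing import Optional


def incoming_to_server(data: bytes, client_codec: Optional[str],
                       server_codec: Optional[str]) -> bytes:
    """Return the bytes the server would store; None stands for SQL_ASCII."""
    if server_codec is None:
        return data  # SQL_ASCII server: no conversion, no validation
    if client_codec is None:
        data.decode(server_codec)  # validate only; raises if invalid
        return data  # net effect: as if client encoding == server encoding
    return data.decode(client_codec).encode(server_codec)  # normal conversion
```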
### 24.3.4. Available Character Set Conversions
PostgreSQL allows conversion between any two character sets for which a conversion function is listed in the [`pg_conversion`](catalog-pg-conversion.html) system catalog. PostgreSQL comes with some predefined conversions, as summarized in [Table 24.2](multibyte.html#MULTIBYTE-TRANSLATION-TABLE) and shown in more detail in [Table 24.3](multibyte.html#BUILTIN-CONVERSIONS-TABLE). You can create a new conversion using the SQL command [CREATE CONVERSION](sql-createconversion.html). (To be used for automatic client/server conversions, a conversion must be marked as “default” for its character set pair.)
**Table 24.2. Built-in Client/Server Character Set Conversions**
|Server Character Set| Available Client Character Sets |
|--------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------|
| `BIG5` | *not supported as a server encoding* |
| `EUC_CN` | *EUC\_CN*, `MULE_INTERNAL`, `UTF8` |
| `EUC_JP` | *EUC\_JP*, `MULE_INTERNAL`, `SJIS`, `UTF8` |
| `EUC_JIS_2004` | *EUC\_JIS\_2004*, `SHIFT_JIS_2004`, `UTF8` |
| `EUC_KR` | *EUC\_KR*, `MULE_INTERNAL`, `UTF8` |
| `EUC_TW` | *EUC\_TW*, `BIG5`, `MULE_INTERNAL`, `UTF8` |
| `GB18030` | *not supported as a server encoding* |
| `GBK` | *not supported as a server encoding* |
| `ISO_8859_5` | *ISO\_8859\_5*, `KOI8R`, `MULE_INTERNAL`, `UTF8`, `WIN866`, `WIN1251` |
| `ISO_8859_6` | *ISO\_8859\_6*, `UTF8` |
| `ISO_8859_7` | *ISO\_8859\_7*, `UTF8` |
| `ISO_8859_8` | *ISO\_8859\_8*, `UTF8` |
| `JOHAB` | *not supported as a server encoding* |
| `KOI8R` | *KOI8R*, `ISO_8859_5`, `MULE_INTERNAL`, `UTF8`, `WIN866`, `WIN1251` |
| `KOI8U` | *KOI8U*, `UTF8` |
| `LATIN1` | *LATIN1*, `MULE_INTERNAL`, `UTF8` |
| `LATIN2` | *LATIN2*, `MULE_INTERNAL`, `UTF8`, `WIN1250` |
| `LATIN3` | *LATIN3*, `MULE_INTERNAL`, `UTF8` |
| `LATIN4` | *LATIN4*, `MULE_INTERNAL`, `UTF8` |
| `LATIN5` | *LATIN5*, `UTF8` |
| `LATIN6` | *LATIN6*, `UTF8` |
| `LATIN7` | *LATIN7*, `UTF8` |
| `LATIN8` | *LATIN8*, `UTF8` |
| `LATIN9` | *LATIN9*, `UTF8` |
| `LATIN10` | *LATIN10*, `UTF8` |
| `MULE_INTERNAL` |*MULE\_INTERNAL*, `BIG5`, `EUC_CN`, `EUC_JP`, `EUC_KR`, `EUC_TW`, `ISO_8859_5`, `KOI8R`, `LATIN1` to `LATIN4`, `SJIS`, `WIN866`, `WIN1250`, `WIN1251`|
| `SJIS` | *not supported as a server encoding* |
| `SHIFT_JIS_2004` | *not supported as a server encoding* |
| `SQL_ASCII` | *any (no conversion will be performed)* |
| `UHC` | *not supported as a server encoding* |
| `UTF8` | *all supported encodings* |
| `WIN866` | *WIN866*, `ISO_8859_5`, `KOI8R`, `MULE_INTERNAL`, `UTF8`, `WIN1251` |
| `WIN874` | *WIN874*, `UTF8` |
| `WIN1250` | *WIN1250*, `LATIN2`, `MULE_INTERNAL`, `UTF8` |
| `WIN1251` | *WIN1251*, `ISO_8859_5`, `KOI8R`, `MULE_INTERNAL`, `UTF8`, `WIN866` |
| `WIN1252` | *WIN1252*, `UTF8` |
| `WIN1253` | *WIN1253*, `UTF8` |
| `WIN1254` | *WIN1254*, `UTF8` |
| `WIN1255` | *WIN1255*, `UTF8` |
| `WIN1256` | *WIN1256*, `UTF8` |
| `WIN1257` | *WIN1257*, `UTF8` |
| `WIN1258` | *WIN1258*, `UTF8` |
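Table 24.2 is essentially a lookup table. The sketch below encodes a small excerpt of it (the full table is not reproduced) to check whether a client encoding can be used with a given server encoding. The names `SERVER_TO_CLIENTS`, `NOT_SERVER_ENCODINGS` and `client_allowed` are hypothetical.

```python
# Hypothetical lookup over an excerpt of Table 24.2.
NOT_SERVER_ENCODINGS = {"BIG5", "GB18030", "GBK", "JOHAB", "SJIS", "SHIFT_JIS_2004", "UHC"}

SERVER_TO_CLIENTS = {  # excerpt only; see Table 24.2 for the full list
    "EUC_JP": {"EUC_JP", "MULE_INTERNAL", "SJIS", "UTF8"},
    "KOI8R": {"KOI8R", "ISO_8859_5", "MULE_INTERNAL", "UTF8", "WIN866", "WIN1251"},
    "LATIN1": {"LATIN1", "MULE_INTERNAL", "UTF8"},
    "WIN1252": {"WIN1252", "UTF8"},
}


def client_allowed(server: str, client: str) -> bool:
    """Return True if `client` can be used as the client encoding for `server`."""
    if server in NOT_SERVER_ENCODINGS:
        raise ValueError(f"{server} is not supported as a server encoding")
    if server in ("UTF8", "SQL_ASCII"):
        # UTF8 converts to all supported encodings; SQL_ASCII performs no conversion.
        return True
    if server not in SERVER_TO_CLIENTS:
        raise KeyError(f"{server} is not in this excerpt of Table 24.2")
    return client in SERVER_TO_CLIENTS[server]
```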
**Table 24.3. All Built-in Character Set Conversions**
| Conversion Name [a] | Source Encoding | Destination Encoding |
|---|---|---|
| `big5_to_euc_tw` | `BIG5` | `EUC_TW` |
| `big5_to_mic` | `BIG5` | `MULE_INTERNAL` |
| `big5_to_utf8` | `BIG5` | `UTF8` |
| `euc_cn_to_mic` | `EUC_CN` | `MULE_INTERNAL` |
| `euc_cn_to_utf8` | `EUC_CN` | `UTF8` |
| `euc_jp_to_mic` | `EUC_JP` | `MULE_INTERNAL` |
| `euc_jp_to_sjis` | `EUC_JP` | `SJIS` |
| `euc_jp_to_utf8` | `EUC_JP` | `UTF8` |
| `euc_kr_to_mic` | `EUC_KR` | `MULE_INTERNAL` |
| `euc_kr_to_utf8` | `EUC_KR` | `UTF8` |
| `euc_tw_to_big5` | `EUC_TW` | `BIG5` |
| `euc_tw_to_mic` | `EUC_TW` | `MULE_INTERNAL` |
| `euc_tw_to_utf8` | `EUC_TW` | `UTF8` |
| `gb18030_to_utf8` | `GB18030` | `UTF8` |
| `gbk_to_utf8` | `GBK` | `UTF8` |
| `iso_8859_10_to_utf8` | `LATIN6` | `UTF8` |
| `iso_8859_13_to_utf8` | `LATIN7` | `UTF8` |
| `iso_8859_14_to_utf8` | `LATIN8` | `UTF8` |
| `iso_8859_15_to_utf8` | `LATIN9` | `UTF8` |
| `iso_8859_16_to_utf8` | `LATIN10` | `UTF8` |
| `iso_8859_1_to_mic` | `LATIN1` | `MULE_INTERNAL` |
| `iso_8859_1_to_utf8` | `LATIN1` | `UTF8` |
| `iso_8859_2_to_mic` | `LATIN2` | `MULE_INTERNAL` |
| `iso_8859_2_to_utf8` | `LATIN2` | `UTF8` |
| `iso_8859_2_to_windows_1250` | `LATIN2` | `WIN1250` |
| `iso_8859_3_to_mic` | `LATIN3` | `MULE_INTERNAL` |
| `iso_8859_3_to_utf8` | `LATIN3` | `UTF8` |
| `iso_8859_4_to_mic` | `LATIN4` | `MULE_INTERNAL` |
| `iso_8859_4_to_utf8` | `LATIN4` | `UTF8` |
| `iso_8859_5_to_koi8_r` | `ISO_8859_5` | `KOI8R` |
| `iso_8859_5_to_mic` | `ISO_8859_5` | `MULE_INTERNAL` |
| `iso_8859_5_to_utf8` | `ISO_8859_5` | `UTF8` |
| `iso_8859_5_to_windows_1251` | `ISO_8859_5` | `WIN1251` |
| `iso_8859_5_to_windows_866` | `ISO_8859_5` | `WIN866` |
| `iso_8859_6_to_utf8` | `ISO_8859_6` | `UTF8` |
| `iso_8859_7_to_utf8` | `ISO_8859_7` | `UTF8` |
| `iso_8859_8_to_utf8` | `ISO_8859_8` | `UTF8` |
| `iso_8859_9_to_utf8` | `LATIN5` | `UTF8` |
| `johab_to_utf8` | `JOHAB` | `UTF8` |
| `koi8_r_to_iso_8859_5` | `KOI8R` | `ISO_8859_5` |
| `koi8_r_to_mic` | `KOI8R` | `MULE_INTERNAL` |
| `koi8_r_to_utf8` | `KOI8R` | `UTF8` |
| `koi8_r_to_windows_1251` | `KOI8R` | `WIN1251` |
| `koi8_r_to_windows_866` | `KOI8R` | `WIN866` |
| `koi8_u_to_utf8` | `KOI8U` | `UTF8` |
| `mic_to_big5` |`MULE_INTERNAL` | `BIG5` |
| `mic_to_euc_cn` |`MULE_INTERNAL` | `EUC_CN` |
| `mic_to_euc_jp` |`MULE_INTERNAL` | `EUC_JP` |
| `mic_to_euc_kr` |`MULE_INTERNAL` | `EUC_KR` |
| `mic_to_euc_tw` |`MULE_INTERNAL` | `EUC_TW` |
| `mic_to_iso_8859_1` |`MULE_INTERNAL` | `LATIN1` |
| `mic_to_iso_8859_2` |`MULE_INTERNAL` | `LATIN2` |
| `mic_to_iso_8859_3` |`MULE_INTERNAL` | `LATIN3` |
| `mic_to_iso_8859_4` |`MULE_INTERNAL` | `LATIN4` |
| `mic_to_iso_8859_5` |`MULE_INTERNAL` | `ISO_8859_5` |
| `mic_to_koi8_r` |`MULE_INTERNAL` | `KOI8R` |
| `mic_to_sjis` |`MULE_INTERNAL` | `SJIS` |
| `mic_to_windows_1250` |`MULE_INTERNAL` | `WIN1250` |
| `mic_to_windows_1251` |`MULE_INTERNAL` | `WIN1251` |
| `mic_to_windows_866` |`MULE_INTERNAL` | `WIN866` |
| `sjis_to_euc_jp` | `SJIS` | `EUC_JP` |
| `sjis_to_mic` | `SJIS` | `MULE_INTERNAL` |
| `sjis_to_utf8` | `SJIS` | `UTF8` |
| `windows_1258_to_utf8` | `WIN1258` | `UTF8` |
| `uhc_to_utf8` | `UHC` | `UTF8` |
| `utf8_to_big5` | `UTF8` | `BIG5` |
| `utf8_to_euc_cn` | `UTF8` | `EUC_CN` |
| `utf8_to_euc_jp` | `UTF8` | `EUC_JP` |
| `utf8_to_euc_kr` | `UTF8` | `EUC_KR` |
| `utf8_to_euc_tw` | `UTF8` | `EUC_TW` |
| `utf8_to_gb18030` | `UTF8` | `GB18030` |
| `utf8_to_gbk` | `UTF8` | `GBK` |
| `utf8_to_iso_8859_1` | `UTF8` | `LATIN1` |
| `utf8_to_iso_8859_10` | `UTF8` | `LATIN6` |
| `utf8_to_iso_8859_13` | `UTF8` | `LATIN7` |
| `utf8_to_iso_8859_14` | `UTF8` | `LATIN8` |
| `utf8_to_iso_8859_15` | `UTF8` | `LATIN9` |
| `utf8_to_iso_8859_16` | `UTF8` | `LATIN10` |
| `utf8_to_iso_8859_2` | `UTF8` | `LATIN2` |
| `utf8_to_iso_8859_3` | `UTF8` | `LATIN3` |
| `utf8_to_iso_8859_4` | `UTF8` | `LATIN4` |
| `utf8_to_iso_8859_5` | `UTF8` | `ISO_8859_5` |
| `utf8_to_iso_8859_6` | `UTF8` | `ISO_8859_6` |
| `utf8_to_iso_8859_7` | `UTF8` | `ISO_8859_7` |
| `utf8_to_iso_8859_8` | `UTF8` | `ISO_8859_8` |
| `utf8_to_iso_8859_9` | `UTF8` | `LATIN5` |
| `utf8_to_johab` | `UTF8` | `JOHAB` |
| `utf8_to_koi8_r` | `UTF8` | `KOI8R` |
| `utf8_to_koi8_u` | `UTF8` | `KOI8U` |
| `utf8_to_sjis` | `UTF8` | `SJIS` |
| `utf8_to_windows_1258` | `UTF8` | `WIN1258` |
| `utf8_to_uhc` | `UTF8` | `UHC` |
| `utf8_to_windows_1250` | `UTF8` | `WIN1250` |
| `utf8_to_windows_1251` | `UTF8` | `WIN1251` |
| `utf8_to_windows_1252` | `UTF8` | `WIN1252` |
| `utf8_to_windows_1253` | `UTF8` | `WIN1253` |
| `utf8_to_windows_1254` | `UTF8` | `WIN1254` |
| `utf8_to_windows_1255` | `UTF8` | `WIN1255` |
| `utf8_to_windows_1256` | `UTF8` | `WIN1256` |
| `utf8_to_windows_1257` | `UTF8` | `WIN1257` |
| `utf8_to_windows_866` | `UTF8` | `WIN866` |
| `utf8_to_windows_874` | `UTF8` | `WIN874` |
| `windows_1250_to_iso_8859_2` | `WIN1250` | `LATIN2` |
| `windows_1250_to_mic` | `WIN1250` | `MULE_INTERNAL` |
| `windows_1250_to_utf8` | `WIN1250` | `UTF8` |
| `windows_1251_to_iso_8859_5` | `WIN1251` | `ISO_8859_5` |
| `windows_1251_to_koi8_r` | `WIN1251` | `KOI8R` |
| `windows_1251_to_mic` | `WIN1251` | `MULE_INTERNAL` |
| `windows_1251_to_utf8` | `WIN1251` | `UTF8` |
| `windows_1251_to_windows_866` | `WIN1251` | `WIN866` |
| `windows_1252_to_utf8` | `WIN1252` | `UTF8` |
| `windows_1256_to_utf8` | `WIN1256` | `UTF8` |
| `windows_866_to_iso_8859_5` | `WIN866` | `ISO_8859_5` |
| `windows_866_to_koi8_r` | `WIN866` | `KOI8R` |
| `windows_866_to_mic` | `WIN866` | `MULE_INTERNAL` |
| `windows_866_to_utf8` | `WIN866` | `UTF8` |
| `windows_866_to_windows_1251` | `WIN866` | `WIN1251` |
| `windows_874_to_utf8` | `WIN874` | `UTF8` |
| `euc_jis_2004_to_utf8` | `EUC_JIS_2004` | `UTF8` |
| `utf8_to_euc_jis_2004` | `UTF8` | `EUC_JIS_2004` |
| `shift_jis_2004_to_utf8` |`SHIFT_JIS_2004`| `UTF8` |
| `utf8_to_shift_jis_2004` | `UTF8` | `SHIFT_JIS_2004` |
| `euc_jis_2004_to_shift_jis_2004` | `EUC_JIS_2004` | `SHIFT_JIS_2004` |
| `shift_jis_2004_to_euc_jis_2004` |`SHIFT_JIS_2004`| `EUC_JIS_2004` |

[a] The conversion names follow a standard naming scheme: the official name of the source encoding with all non-alphanumeric characters replaced by underscores, followed by `_to_`, followed by the similarly processed destination encoding name. Therefore, these names sometimes deviate from the customary encoding names shown in [Table 24.1](multibyte.html#CHARSET-TABLE).
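The naming scheme for conversions can be written as a short transformation. In this hypothetical sketch, the "official" encoding names passed in (for example `ISO-8859-1` or `KOI8-R`) are assumptions. The lowercasing step is inferred from the names in Table 24.3 rather than stated by the scheme. Some entries, such as the `mic_*` conversions for `MULE_INTERNAL`, use abbreviations this sketch does not reproduce.

```python
# Hypothetical sketch of the conversion naming scheme described in the footnote.
import re


def conversion_name(source_official: str, dest_official: str) -> str:
    """Replace non-alphanumerics with '_', lowercase, and join with '_to_'."""
    def clean(name: str) -> str:
        return re.sub(r"[^A-Za-z0-9]", "_", name).lower()

    return f"{clean(source_official)}_to_{clean(dest_official)}"
```

For example, `conversion_name("ISO-8859-1", "UTF8")` yields `iso_8859_1_to_utf8`, the name listed for the `LATIN1` to `UTF8` conversion.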
### 24.3.5. Further Reading
These are good sources to start learning about various kinds of encoding systems.
* *CJKV Information Processing: Chinese, Japanese, Korean & Vietnamese Computing*: contains detailed explanations of `EUC_JP`, `EUC_CN`, `EUC_KR`, `EUC_TW`.
* [https://www.unicode.org/](https://www.unicode.org/): the web site of the Unicode Consortium.
* [RFC 3629](https://tools.ietf.org/html/rfc3629): UTF-8 (8-bit UCS/Unicode Transformation Format) is defined here.