## 10.2 Character Sets and Collations in MySQL

- [10.2.1 Character Set Repertoire](charset-repertoire.html)
- [10.2.2 UTF-8 for Metadata](charset-metadata.html)

MySQL Server supports multiple character sets, including several Unicode character sets. To display the available character sets, use the `INFORMATION_SCHEMA` [`CHARACTER_SETS`](information-schema-character-sets-table.html) table or the [`SHOW CHARACTER SET`](show-character-set.html) statement. A partial listing follows. For more complete information, see [Section 10.10, “Supported Character Sets and Collations”](charset-charsets.html).

```
mysql> SHOW CHARACTER SET;
+----------+---------------------------------+---------------------+--------+
| Charset  | Description                     | Default collation   | Maxlen |
+----------+---------------------------------+---------------------+--------+
| big5     | Big5 Traditional Chinese        | big5_chinese_ci     |      2 |
| binary   | Binary pseudo charset           | binary              |      1 |
...
| latin1   | cp1252 West European            | latin1_swedish_ci   |      1 |
...
| ucs2     | UCS-2 Unicode                   | ucs2_general_ci     |      2 |
...
| utf8mb3  | UTF-8 Unicode                   | utf8_general_ci     |      3 |
| utf8mb4  | UTF-8 Unicode                   | utf8mb4_0900_ai_ci  |      4 |
...
```

By default, the [`SHOW CHARACTER SET`](show-character-set.html) statement displays all available character sets. It takes an optional [`LIKE`](string-comparison-functions.html#operator_like) or `WHERE` clause that indicates which character set names to match. The following example shows some of the Unicode character sets (those based on Unicode Transformation Format):

```
mysql> SHOW CHARACTER SET LIKE 'utf%';
+---------+------------------+--------------------+--------+
| Charset | Description      | Default collation  | Maxlen |
+---------+------------------+--------------------+--------+
| utf16   | UTF-16 Unicode   | utf16_general_ci   |      4 |
| utf16le | UTF-16LE Unicode | utf16le_general_ci |      4 |
| utf32   | UTF-32 Unicode   | utf32_general_ci   |      4 |
| utf8mb3 | UTF-8 Unicode    | utf8_general_ci    |      3 |
| utf8mb4 | UTF-8 Unicode    | utf8mb4_0900_ai_ci |      4 |
+---------+------------------+--------------------+--------+
```

A given character set always has at least one collation, and most character sets have several. To list the collations for a character set, use the `INFORMATION_SCHEMA` [`COLLATIONS`](information-schema-collations-table.html) table or the [`SHOW COLLATION`](show-collation.html) statement.

By default, the [`SHOW COLLATION`](show-collation.html) statement displays all available collations. It takes an optional [`LIKE`](string-comparison-functions.html#operator_like) or `WHERE` clause that indicates which collation names to display. For example, to see the collations for the default character set, `utf8mb4`, use this statement:

```
mysql> SHOW COLLATION WHERE Charset = 'utf8mb4';
+----------------------------+---------+-----+---------+----------+---------+---------------+
| Collation                  | Charset | Id  | Default | Compiled | Sortlen | Pad_attribute |
+----------------------------+---------+-----+---------+----------+---------+---------------+
| utf8mb4_0900_ai_ci         | utf8mb4 | 255 | Yes     | Yes      |       0 | NO PAD        |
| utf8mb4_0900_as_ci         | utf8mb4 | 305 |         | Yes      |       0 | NO PAD        |
| utf8mb4_0900_as_cs         | utf8mb4 | 278 |         | Yes      |       0 | NO PAD        |
| utf8mb4_0900_bin           | utf8mb4 | 309 |         | Yes      |       1 | NO PAD        |
| utf8mb4_bin                | utf8mb4 |  46 |         | Yes      |       1 | PAD SPACE     |
| utf8mb4_croatian_ci        | utf8mb4 | 245 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_cs_0900_ai_ci      | utf8mb4 | 266 |         | Yes      |       0 | NO PAD        |
| utf8mb4_cs_0900_as_cs      | utf8mb4 | 289 |         | Yes      |       0 | NO PAD        |
| utf8mb4_czech_ci           | utf8mb4 | 234 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_danish_ci          | utf8mb4 | 235 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_da_0900_ai_ci      | utf8mb4 | 267 |         | Yes      |       0 | NO PAD        |
| utf8mb4_da_0900_as_cs      | utf8mb4 | 290 |         | Yes      |       0 | NO PAD        |
| utf8mb4_de_pb_0900_ai_ci   | utf8mb4 | 256 |         | Yes      |       0 | NO PAD        |
| utf8mb4_de_pb_0900_as_cs   | utf8mb4 | 279 |         | Yes      |       0 | NO PAD        |
| utf8mb4_eo_0900_ai_ci      | utf8mb4 | 273 |         | Yes      |       0 | NO PAD        |
| utf8mb4_eo_0900_as_cs      | utf8mb4 | 296 |         | Yes      |       0 | NO PAD        |
| utf8mb4_esperanto_ci       | utf8mb4 | 241 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_estonian_ci        | utf8mb4 | 230 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_es_0900_ai_ci      | utf8mb4 | 263 |         | Yes      |       0 | NO PAD        |
| utf8mb4_es_0900_as_cs      | utf8mb4 | 286 |         | Yes      |       0 | NO PAD        |
| utf8mb4_es_trad_0900_ai_ci | utf8mb4 | 270 |         | Yes      |       0 | NO PAD        |
| utf8mb4_es_trad_0900_as_cs | utf8mb4 | 293 |         | Yes      |       0 | NO PAD        |
| utf8mb4_et_0900_ai_ci      | utf8mb4 | 262 |         | Yes      |       0 | NO PAD        |
| utf8mb4_et_0900_as_cs      | utf8mb4 | 285 |         | Yes      |       0 | NO PAD        |
| utf8mb4_general_ci         | utf8mb4 |  45 |         | Yes      |       1 | PAD SPACE     |
| utf8mb4_german2_ci         | utf8mb4 | 244 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_hr_0900_ai_ci      | utf8mb4 | 275 |         | Yes      |       0 | NO PAD        |
| utf8mb4_hr_0900_as_cs      | utf8mb4 | 298 |         | Yes      |       0 | NO PAD        |
| utf8mb4_hungarian_ci       | utf8mb4 | 242 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_hu_0900_ai_ci      | utf8mb4 | 274 |         | Yes      |       0 | NO PAD        |
| utf8mb4_hu_0900_as_cs      | utf8mb4 | 297 |         | Yes      |       0 | NO PAD        |
| utf8mb4_icelandic_ci       | utf8mb4 | 225 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_is_0900_ai_ci      | utf8mb4 | 257 |         | Yes      |       0 | NO PAD        |
| utf8mb4_is_0900_as_cs      | utf8mb4 | 280 |         | Yes      |       0 | NO PAD        |
| utf8mb4_ja_0900_as_cs      | utf8mb4 | 303 |         | Yes      |       0 | NO PAD        |
| utf8mb4_ja_0900_as_cs_ks   | utf8mb4 | 304 |         | Yes      |      24 | NO PAD        |
| utf8mb4_latvian_ci         | utf8mb4 | 226 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_la_0900_ai_ci      | utf8mb4 | 271 |         | Yes      |       0 | NO PAD        |
| utf8mb4_la_0900_as_cs      | utf8mb4 | 294 |         | Yes      |       0 | NO PAD        |
| utf8mb4_lithuanian_ci      | utf8mb4 | 236 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_lt_0900_ai_ci      | utf8mb4 | 268 |         | Yes      |       0 | NO PAD        |
| utf8mb4_lt_0900_as_cs      | utf8mb4 | 291 |         | Yes      |       0 | NO PAD        |
| utf8mb4_lv_0900_ai_ci      | utf8mb4 | 258 |         | Yes      |       0 | NO PAD        |
| utf8mb4_lv_0900_as_cs      | utf8mb4 | 281 |         | Yes      |       0 | NO PAD        |
| utf8mb4_persian_ci         | utf8mb4 | 240 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_pl_0900_ai_ci      | utf8mb4 | 261 |         | Yes      |       0 | NO PAD        |
| utf8mb4_pl_0900_as_cs      | utf8mb4 | 284 |         | Yes      |       0 | NO PAD        |
| utf8mb4_polish_ci          | utf8mb4 | 229 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_romanian_ci        | utf8mb4 | 227 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_roman_ci           | utf8mb4 | 239 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_ro_0900_ai_ci      | utf8mb4 | 259 |         | Yes      |       0 | NO PAD        |
| utf8mb4_ro_0900_as_cs      | utf8mb4 | 282 |         | Yes      |       0 | NO PAD        |
| utf8mb4_ru_0900_ai_ci      | utf8mb4 | 306 |         | Yes      |       0 | NO PAD        |
| utf8mb4_ru_0900_as_cs      | utf8mb4 | 307 |         | Yes      |       0 | NO PAD        |
| utf8mb4_sinhala_ci         | utf8mb4 | 243 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_sk_0900_ai_ci      | utf8mb4 | 269 |         | Yes      |       0 | NO PAD        |
| utf8mb4_sk_0900_as_cs      | utf8mb4 | 292 |         | Yes      |       0 | NO PAD        |
| utf8mb4_slovak_ci          | utf8mb4 | 237 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_slovenian_ci       | utf8mb4 | 228 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_sl_0900_ai_ci      | utf8mb4 | 260 |         | Yes      |       0 | NO PAD        |
| utf8mb4_sl_0900_as_cs      | utf8mb4 | 283 |         | Yes      |       0 | NO PAD        |
| utf8mb4_spanish2_ci        | utf8mb4 | 238 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_spanish_ci         | utf8mb4 | 231 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_sv_0900_ai_ci      | utf8mb4 | 264 |         | Yes      |       0 | NO PAD        |
| utf8mb4_sv_0900_as_cs      | utf8mb4 | 287 |         | Yes      |       0 | NO PAD        |
| utf8mb4_swedish_ci         | utf8mb4 | 232 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_tr_0900_ai_ci      | utf8mb4 | 265 |         | Yes      |       0 | NO PAD        |
| utf8mb4_tr_0900_as_cs      | utf8mb4 | 288 |         | Yes      |       0 | NO PAD        |
| utf8mb4_turkish_ci         | utf8mb4 | 233 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_unicode_520_ci     | utf8mb4 | 246 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_unicode_ci         | utf8mb4 | 224 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_vietnamese_ci      | utf8mb4 | 247 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_vi_0900_ai_ci      | utf8mb4 | 277 |         | Yes      |       0 | NO PAD        |
| utf8mb4_vi_0900_as_cs      | utf8mb4 | 300 |         | Yes      |       0 | NO PAD        |
| utf8mb4_zh_0900_as_cs      | utf8mb4 | 308 |         | Yes      |       0 | NO PAD        |
+----------------------------+---------+-----+---------+----------+---------+---------------+
```

For more information about these collations, see [Section 10.10.1, “Unicode Character Sets”](charset-unicode-sets.html).

Collations have these general characteristics:

- Two different character sets cannot have the same collation.
- Each character set has a *default collation*. For example, the default collations for `utf8mb4` and `latin1` are `utf8mb4_0900_ai_ci` and `latin1_swedish_ci`, respectively. The `INFORMATION_SCHEMA` [`CHARACTER_SETS`](information-schema-character-sets-table.html) table and the [`SHOW CHARACTER SET`](show-character-set.html) statement indicate the default collation for each character set. The `INFORMATION_SCHEMA` [`COLLATIONS`](information-schema-collations-table.html) table and the [`SHOW COLLATION`](show-collation.html) statement have a column that indicates, for each collation, whether it is the default for its character set (`Yes` if so, empty if not).
- Collation names start with the name of the character set with which they are associated, generally followed by one or more suffixes that indicate other collation characteristics. For additional information about naming conventions, see [Section 10.3.1, “Collation Naming Conventions”](charset-collation-names.html).

When a character set has multiple collations, it might not be clear which collation is most suitable for a given application. To avoid choosing an inappropriate collation, perform some comparisons with representative data values to make sure that a given collation sorts values the way you expect.
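A comparison with representative values might look like the following sketch. The table name `collation_demo` and the sample values are hypothetical, chosen only for illustration, and the sketch assumes that the connection character set is `utf8mb4` so that the non-ASCII literal is stored as intended. Substitute values that resemble your application's data.

```sql
-- Hypothetical scratch table; the column uses the utf8mb4 character set.
CREATE TEMPORARY TABLE collation_demo (v VARCHAR(10)) CHARACTER SET utf8mb4;
INSERT INTO collation_demo VALUES ('a'), ('A'), ('ä'), ('b'), ('B');

-- Sort the same values under three candidate collations and compare the order.
SELECT v FROM collation_demo ORDER BY v COLLATE utf8mb4_0900_ai_ci;
SELECT v FROM collation_demo ORDER BY v COLLATE utf8mb4_0900_as_cs;
SELECT v FROM collation_demo ORDER BY v COLLATE utf8mb4_bin;

-- Check which values each collation treats as equal to 'a'.
SELECT v,
       v COLLATE utf8mb4_0900_ai_ci = 'a' AS eq_ai_ci,
       v COLLATE utf8mb4_0900_as_cs = 'a' AS eq_as_cs
FROM collation_demo;

DROP TEMPORARY TABLE collation_demo;
```

With an accent-insensitive, case-insensitive collation such as `utf8mb4_0900_ai_ci`, the values `a`, `A`, and `ä` should all compare equal to `'a'`, whereas an accent-sensitive, case-sensitive collation such as `utf8mb4_0900_as_cs` should match only `a`. Whether that behavior is desirable depends on the application.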