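This question already has a lot of answers, but Mathias Bynens pointed out that 'utf8mb4' should be used instead of 'utf8' for full UTF-8 support ('utf8' does not support 4-byte characters, so values containing them are truncated on insert). I consider this an important distinction. So here is yet another answer on how to set the default character set and collation. One that lets you insert a pile of poo (💩).
This works on MySQL 5.5.35.
Note that some of the settings may be optional. Since I'm not entirely sure I haven't forgotten something, I'll make this answer a community wiki.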
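To make the 3-byte vs 4-byte distinction concrete, here is a small Python sketch (not MySQL code) that checks each character's UTF-8 encoded length. MySQL's 'utf8' (an alias for utf8mb3) stores at most 3 bytes per character, so anything that needs 4 bytes, such as 💩 or 𝌆 (code points above U+FFFF), only fits in 'utf8mb4'. The helper names `utf8_byte_lengths` and `fits_mysql_utf8` are hypothetical, chosen for illustration.

```python
def utf8_byte_lengths(text: str) -> list[int]:
    """Return the UTF-8 encoded length, in bytes, of each character in text."""
    return [len(ch.encode("utf-8")) for ch in text]


def fits_mysql_utf8(text: str) -> bool:
    """True if every character needs at most 3 bytes, the per-character limit of MySQL's 'utf8'."""
    return all(n <= 3 for n in utf8_byte_lengths(text))


if __name__ == "__main__":
    # Plain ASCII, a 2-byte char, a 3-byte char, and two 4-byte chars.
    for sample in ["abc", "é", "€", "💩", "𝌆"]:
        print(sample, utf8_byte_lengths(sample), fits_mysql_utf8(sample))
```

Only the last two samples report `False`; those are exactly the characters that require `utf8mb4`.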
Old settings
mysql> SHOW VARIABLES LIKE 'char%'; SHOW VARIABLES LIKE 'collation%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | latin1                     |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
+----------------------+-------------------+
| Variable_name        | Value             |
+----------------------+-------------------+
| collation_connection | utf8_general_ci   |
| collation_database   | latin1_swedish_ci |
| collation_server     | latin1_swedish_ci |
+----------------------+-------------------+
3 rows in set (0.00 sec)
Config file (my.cnf)
# 💩 𝌆
# UTF-8 should be used instead of Latin1. Obviously.
# NOTE "utf8" in MySQL is NOT full UTF-8: http://mathiasbynens.be/notes/mysql-utf8mb4
[client]
default-character-set = utf8mb4
[mysqld]
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
[mysql]
default-character-set = utf8mb4
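If you want to double-check a config file before restarting the server, a minimal Python sketch using the standard `configparser` module could compare it against the options above. This assumes a simple file like the one shown; real my.cnf files can contain directives such as `!includedir` that `configparser` may not handle, and the `EXPECTED` table and `check_mycnf` helper are hypothetical.

```python
import configparser

# Hypothetical expectations mirroring the config above: option group -> {option: value}.
EXPECTED = {
    "client": {"default-character-set": "utf8mb4"},
    "mysqld": {
        "character-set-server": "utf8mb4",
        "collation-server": "utf8mb4_unicode_ci",
    },
    "mysql": {"default-character-set": "utf8mb4"},
}


def check_mycnf(text: str) -> list[str]:
    """Return a list of problems found; an empty list means every expected option is set."""
    # allow_no_value tolerates bare flags, strict=False tolerates repeated options,
    # interpolation=None keeps '%' in values from being interpreted.
    parser = configparser.ConfigParser(allow_no_value=True, strict=False, interpolation=None)
    parser.read_string(text)
    problems = []
    for section, options in EXPECTED.items():
        for key, want in options.items():
            # fallback=None covers both a missing option group and a missing option.
            got = parser.get(section, key, fallback=None)
            if got != want:
                problems.append(f"[{section}] {key} = {got!r}, expected {want!r}")
    return problems
```

For example, `check_mycnf(open(path).read())` returns an empty list for the config above; where `path` points depends on your installation.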
New settings
mysql> SHOW VARIABLES LIKE 'char%'; SHOW VARIABLES LIKE 'collation%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8mb4                    |
| character_set_connection | utf8mb4                    |
| character_set_database   | utf8mb4                    |
| character_set_filesystem | binary                     |
| character_set_results    | utf8mb4                    |
| character_set_server     | utf8mb4                    |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
+----------------------+--------------------+
| Variable_name        | Value              |
+----------------------+--------------------+
| collation_connection | utf8mb4_general_ci |
| collation_database   | utf8mb4_unicode_ci |
| collation_server     | utf8mb4_unicode_ci |
+----------------------+--------------------+
3 rows in set (0.00 sec)
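character_set_system is always utf8.
This won't affect existing tables; it only changes the defaults (used for new tables). The following ALTER statements can be used to convert an existing database and table (without the dump-and-restore workaround):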
ALTER DATABASE databasename CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
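If a database has many tables, one option is to generate these ALTER statements instead of typing them by hand. The Python sketch below only builds the SQL strings; the helper names are hypothetical, and the list of table names is assumed to come from somewhere like `SELECT TABLE_NAME FROM information_schema.TABLES WHERE TABLE_SCHEMA = 'databasename'`.

```python
def quote_ident(name: str) -> str:
    """Quote a MySQL identifier with backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def convert_statements(database: str, tables: list[str],
                       charset: str = "utf8mb4",
                       collation: str = "utf8mb4_unicode_ci") -> list[str]:
    """Build the ALTER DATABASE statement plus one ALTER TABLE ... CONVERT TO per table."""
    stmts = [f"ALTER DATABASE {quote_ident(database)} "
             f"CHARACTER SET {charset} COLLATE {collation};"]
    for table in tables:
        stmts.append(f"ALTER TABLE {quote_ident(table)} "
                     f"CONVERT TO CHARACTER SET {charset} COLLATE {collation};")
    return stmts


if __name__ == "__main__":
    for stmt in convert_statements("databasename", ["tablename"]):
        print(stmt)
```

The generated statements are the same two shown above, just with quoted identifiers so unusual table names don't break the SQL.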
Edit:
On a MySQL 5.0 server, character_set_client, character_set_connection, character_set_results and collation_connection remain at latin1. Issuing SET NAMES utf8 (utf8mb4 is not available in that version) sets them to utf8 as well.
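Caveat: if you have a utf8 table with an indexed column of type VARCHAR(255), it can in some cases not be converted, because the maximum key length is exceeded (Specified key was too long; max key length is 767 bytes.). If possible, reduce the column size from 255 to 191 (because 191 * 4 = 764 < 767 < 192 * 4 = 768). After that, the table can be converted.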
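The arithmetic behind the 191 figure can be written out as a tiny Python sketch: an index on VARCHAR(n) must reserve n times the maximum bytes per character, so the longest indexable column is the key limit divided by that width. The 767-byte default is taken from the error message above and may differ on other MySQL versions or row formats; the function names are hypothetical.

```python
def max_indexable_chars(max_key_bytes: int = 767, bytes_per_char: int = 4) -> int:
    """Longest VARCHAR(n) whose worst-case byte size still fits the index key limit."""
    return max_key_bytes // bytes_per_char


def fits_index(n_chars: int, bytes_per_char: int, max_key_bytes: int = 767) -> bool:
    """True if VARCHAR(n_chars) in a charset of bytes_per_char can be indexed within the limit."""
    return n_chars * bytes_per_char <= max_key_bytes


if __name__ == "__main__":
    print(max_indexable_chars(bytes_per_char=3))  # utf8 (3 bytes/char)
    print(max_indexable_chars(bytes_per_char=4))  # utf8mb4 (4 bytes/char)
```

This also explains why VARCHAR(255) indexed fine under utf8 (255 * 3 = 765 bytes) but fails after converting to utf8mb4 (255 * 4 = 1020 bytes).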
utf8mb4 is real UTF-8 with full Unicode support. See "How to support full Unicode in MySQL databases".