How to easily convert utf8 tables to utf8mb4 in MySQL 5.5



I have a database that now needs to support 4-byte characters (Chinese). Luckily, I already have MySQL 5.5 in production.

So I would just like to change every utf8_bin collation to utf8mb4_bin.

I believe this change brings no performance loss or gain, only a bit of storage overhead.
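Whether a value needs utf8mb4 depends on its UTF-8 byte length: MySQL's legacy utf8 stores at most 3 bytes per character, while characters outside the Basic Multilingual Plane (emoji, and the rarer CJK characters in Extension B and later) take 4 bytes. Common Chinese characters are 3 bytes. A small sketch for checking sample strings from a shell; the function names are hypothetical, and it assumes a POSIX shell with `wc`, `od` and `grep`:

```shell
# utf8_bytes STRING: number of bytes STRING occupies when encoded as UTF-8.
utf8_bytes() {
  printf '%s' "$1" | wc -c | tr -d '[:space:]'
}

# needs_utf8mb4 STRING: succeeds if STRING contains a 4-byte UTF-8 sequence
# (lead byte 0xF0-0xF4), i.e. a character MySQL's 3-byte utf8 cannot store.
needs_utf8mb4() {
  # od prints each byte as " xx"; look for a lead byte f0..f4
  printf '%s' "$1" | od -An -tx1 | grep -Eq '(^| )f[0-4]'
}
```

With UTF-8 input, `utf8_bytes '中'` prints 3 and `utf8_bytes '😀'` prints 4, so only the second one would fail in a utf8 column.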

Answers:



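From my guide "How to support full Unicode in MySQL databases", here are the queries you can run to update the character set and collation of a database, a table, or a column: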

For each database:

ALTER DATABASE
    database_name
    CHARACTER SET = utf8mb4
    COLLATE = utf8mb4_unicode_ci;

For each table:

ALTER TABLE
    table_name
    CONVERT TO CHARACTER SET utf8mb4
    COLLATE utf8mb4_unicode_ci;

For each column:

ALTER TABLE
    table_name
    CHANGE column_name column_name
    VARCHAR(191)
    CHARACTER SET utf8mb4
    COLLATE utf8mb4_unicode_ci;

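(Don't blindly copy-paste this! The exact statement depends on the column type, maximum length, and other properties. The line above is just an example for a VARCHAR column.)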
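When many columns need this, the per-column statement can be generated from each column's type and nullability (the next answer does the same with information_schema). A sketch with a hypothetical function name; as warned above, it does not carry over DEFAULT values, comments, or other attributes:

```shell
# change_column_sql TABLE COLUMN TYPE NULLABLE
# TYPE is the full column type, e.g. "VARCHAR(191)" or "TEXT".
# NULLABLE is "YES" or "NO", as in information_schema.COLUMNS.IS_NULLABLE.
# DEFAULT values, comments and other attributes are NOT preserved.
change_column_sql() {
  local null_clause="NOT NULL"
  [ "$4" = "YES" ] && null_clause="NULL"
  printf 'ALTER TABLE `%s` CHANGE `%s` `%s` %s CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci %s;\n' \
    "$1" "$2" "$2" "$3" "$null_clause"
}
```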

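Note, however, that you cannot fully automate the conversion from utf8 to utf8mb4. As described in step 4 of the guide mentioned above, you'll need to check the maximum length of columns and index keys, because the number you specify has a different meaning when utf8mb4 is used instead of utf8.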
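The length issue comes from the worst case per character: utf8 reserves 3 bytes and utf8mb4 reserves 4, while InnoDB (without large prefixes) limits an indexed column to 767 bytes and MyISAM limits keys to 1000 bytes. The longest fully indexable VARCHAR is the byte limit divided by the bytes per character, rounded down, which is where VARCHAR(191) comes from (191 × 4 = 764 ≤ 767). A minimal sketch; the function name is hypothetical:

```shell
# max_indexed_chars BYTE_LIMIT BYTES_PER_CHAR
# Longest VARCHAR length whose whole-column index stays within BYTE_LIMIT,
# assuming BYTES_PER_CHAR bytes per character (3 = utf8, 4 = utf8mb4).
# Shell arithmetic uses integer division, which rounds down.
max_indexed_chars() {
  echo $(( $1 / $2 ))
}

for limit in 767 1000; do
  echo "$limit bytes: utf8 -> VARCHAR($(max_indexed_chars "$limit" 3)), utf8mb4 -> VARCHAR($(max_indexed_chars "$limit" 4))"
done
# 767 bytes: utf8 -> VARCHAR(255), utf8mb4 -> VARCHAR(191)
# 1000 bytes: utf8 -> VARCHAR(333), utf8mb4 -> VARCHAR(250)
```

These are the same 191 and 250 limits that come up in the other answers below.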

Section 10.1.11 of the MySQL 5.5 Reference Manual has more information on this.



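I have a solution that converts the database and its tables by running a few commands. It also converts all columns of type varchar, text, tinytext, mediumtext, longtext and char. You should back up your database first, in case something goes wrong.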

Copy the following code into a file named preAlterTables.sql:

use information_schema;
SELECT concat("ALTER DATABASE `",table_schema,"` CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;") as _sql 
FROM `TABLES` where table_schema like "yourDbName" group by table_schema;
SELECT concat("ALTER TABLE `",table_schema,"`.`",table_name,"` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;") as _sql  
FROM `TABLES` where table_schema like "yourDbName" group by table_schema, table_name;
SELECT concat("ALTER TABLE `",table_schema,"`.`",table_name, "` CHANGE `",column_name,"` `",column_name,"` ",data_type,"(",character_maximum_length,") CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci",IF(is_nullable="YES"," NULL"," NOT NULL"),";") as _sql 
FROM `COLUMNS` where table_schema like "yourDbName" and data_type in ('varchar','char');
SELECT concat("ALTER TABLE `",table_schema,"`.`",table_name, "` CHANGE `",column_name,"` `",column_name,"` ",data_type," CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci",IF(is_nullable="YES"," NULL"," NOT NULL"),";") as _sql 
FROM `COLUMNS` where table_schema like "yourDbName" and data_type in ('text','tinytext','mediumtext','longtext');

Replace every occurrence of "yourDbName" with the database you want to convert. Then run:

mysql -uroot < preAlterTables.sql | egrep '^ALTER' > alterTables.sql

This generates a new file, alterTables.sql, containing all the queries needed to convert the database. Run the following command to start the conversion:

mysql -uroot < alterTables.sql
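The `egrep '^ALTER'` filter is needed because mysql in batch mode prints the column alias (`_sql`) as a header line before each result set. A sketch of the same filter as reusable functions (hypothetical names), using `grep -E`, the non-deprecated spelling of `egrep`, plus a count so you can review how many statements were generated before running them. Passing `-N` (`--skip-column-names`) to mysql is another way to drop the headers.

```shell
# keep_alter_statements: filter mysql batch output on stdin, keeping only
# the generated ALTER statements (drops the "_sql" header lines).
keep_alter_statements() {
  grep -E '^ALTER'
}

# count_statements FILE: number of ALTER statements in FILE.
count_statements() {
  grep -c '^ALTER' "$1"
}

# Usage:
#   mysql -uroot < preAlterTables.sql | keep_alter_statements > alterTables.sql
#   count_statements alterTables.sql
```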

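You can also make this work across several databases by changing the condition on table_schema. For example, table_schema like "wiki_%" converts all databases whose names start with wiki_. To convert all databases, replace the condition with table_type!='SYSTEM VIEW'.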
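One caveat about the `wiki_%` example: in LIKE, `_` matches any single character (and `%` any sequence), so `wiki_%` also matches a database named `wikipedia`. A sketch that escapes both wildcards in a prefix with a backslash, MySQL's default LIKE escape character; the function name is hypothetical, and prefixes that themselves contain backslashes are not handled:

```shell
# like_prefix_pattern PREFIX
# Prints a LIKE pattern matching names that start with PREFIX literally:
# "_" and "%" in PREFIX are escaped, then a trailing "%" is appended.
# Assumes PREFIX contains no backslashes.
like_prefix_pattern() {
  local escaped
  escaped=$(printf '%s\n' "$1" | sed 's/[_%]/\\&/g')
  printf '%s%%\n' "$escaped"
}
```

`like_prefix_pattern wiki_` prints `wiki\_%`, which can be used as `table_schema like 'wiki\_%'`; inside a MySQL string literal `\_` is kept as `\_`, so LIKE sees an escaped underscore.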

Possible problem: I had some varchar(255) columns that were part of MySQL keys. This causes the error:

ERROR 1071 (42000) at line 2229: Specified key was too long; max key length is 767 bytes

If this happens, you can simply make the column smaller, e.g. varchar(150), and rerun the command.

Note: this answer converts the database to utf8mb4_unicode_ci rather than utf8mb4_bin as asked in the question, but you can simply substitute it.


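Excellent scripting, just a few notes: current MariaDB installations ask for a password, so mysql -uroot -pThatrootPassWord < alterTables.sql will work. And as you already pointed out, utf8mb4_bin is what Nextcloud, among others, recommends.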
Julius

But utf8mb4_0900_ai_ci is now the default (in MySQL 8.0); see monolune.com/what-is-the-utf8mb4_0900_ai_ci-collation
Julius

I had to run "SET foreign_key_checks = 0;", then apply the changes, then run "SET foreign_key_checks = 1;".
dfrankow
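The workaround in the comment above can be scripted by wrapping the generated file. `SET foreign_key_checks` here changes the session value, so the wrapped output has to be sent through a single mysql invocation. A sketch with a hypothetical function name; while the checks are off, MySQL does not validate foreign keys, and turning them back on does not recheck existing rows.

```shell
# wrap_without_fk_checks FILE
# Prints FILE between "SET foreign_key_checks = 0;" and "... = 1;".
# The setting is per session: pipe the result into ONE mysql process.
wrap_without_fk_checks() {
  echo 'SET foreign_key_checks = 0;'
  cat -- "$1"
  echo 'SET foreign_key_checks = 1;'
}

# Usage:
#   wrap_without_fk_checks alterTables.sql | mysql -uroot -p
```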

Thanks, bro. This was the solution for changing everything to utf8mb4 in Redmine.
Luciano Fantuzzi


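I used the following shell script. It takes the database name as an argument and converts all tables to another character set and collation (given as further arguments, or the defaults defined in the script).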

#!/bin/bash

# mycollate.sh <database> [<charset> <collation>]
# changes MySQL/MariaDB charset and collation for one database - all tables and
# all columns in all tables

DB="$1"
CHARSET="${2:-utf8mb4}"
COLL="${3:-utf8mb4_general_ci}"

[ -n "$DB" ] || { echo "usage: $0 <database> [<charset> <collation>]" >&2; exit 1; }

echo "$DB"
echo "ALTER DATABASE \`$DB\` CHARACTER SET $CHARSET COLLATE $COLL;" | mysql

# -N (--skip-column-names) keeps the "Tables_in_..." header out of the loop
echo "USE \`$DB\`; SHOW TABLES;" | mysql -s -N | (
    while read -r TABLE; do
        echo "$DB.$TABLE"
        # CONVERT TO also converts every character column of the table
        echo "ALTER TABLE \`$TABLE\` CONVERT TO CHARACTER SET $CHARSET COLLATE $COLL;" | mysql "$DB"
    done
)


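I would write a script (in Perl or whatever) that uses information_schema (TABLES and COLUMNS) to walk through all the tables and do a MODIFY COLUMN on every CHAR/VARCHAR/TEXT field. I would collect all the MODIFYs for a table into a single ALTER; that is more efficient.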
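Collecting every change for a table into one ALTER can be sketched as follows; a statement that changes a column's character set typically rebuilds the table, so one combined statement avoids repeated copies. The function name and the argument format (column name, one space, then the full new definition) are assumptions for illustration:

```shell
# combine_modifies TABLE "COLUMN DEFINITION" ["COLUMN DEFINITION" ...]
# Joins the MODIFY clauses into a single ALTER TABLE statement.
# Each spec is the column name, a space, then its complete new definition.
combine_modifies() {
  local table="$1" sql sep spec
  shift
  sql="ALTER TABLE \`$table\`"
  sep=" "
  for spec in "$@"; do
    # ${spec%% *} = text before the first space (column name),
    # ${spec#* }  = text after it (the definition)
    sql="$sql${sep}MODIFY \`${spec%% *}\` ${spec#* }"
    sep=", "
  done
  printf '%s;\n' "$sql"
}

# Usage:
#   combine_modifies posts 'title VARCHAR(191) CHARACTER SET utf8mb4 NOT NULL' \
#                          'body TEXT CHARACTER SET utf8mb4'
```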

I think (but am not sure) that Raihan's suggestion only changes the table's default.



I ran into this situation; here is the approach I used to convert my database:

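  1. First, edit my.cnf so that the default connection between your application and MySQL uses utf8mb4 (compatible with utf8mb4_unicode_ci). Without this, characters such as emoji submitted by your application won't reach your tables with the correct bytes/encoding (unless your application's DB connection parameters specify a utf8mb4 connection).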

    Instructions are given here.

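  2. Execute the following SQL (there is no need to prepare SQL for changing individual columns; the ALTER TABLE ... CONVERT TO statements take care of that).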

    Before executing the code below, replace "DbName" with your actual database name.

    USE information_schema;
    
    SELECT concat("ALTER DATABASE `",table_schema,
                  "` CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;") as _sql
      FROM `TABLES`
     WHERE table_schema like "DbName"
     GROUP BY table_schema;
    
    SELECT concat("ALTER TABLE `",table_schema,"`.`",table_name,
                  "` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;") as _sql
      FROM `TABLES`
     WHERE table_schema like "DbName"
     GROUP BY table_schema, table_name;
    
  3. Collect the output of the SQL above, save it in a .sql file, and execute it.

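  4. If you get an error like "#1071 - Specified key was too long; max key length is 1000 bytes." together with the name of the offending table, it means that the index key on one of that table's columns (which was supposed to be converted to an mb4 string) would be too large (1000 bytes is MyISAM's key length limit). The VARCHAR column should therefore be <= 250 characters, so that its index key is at most 1000 bytes. Check the columns that have indexes; if one of them is a varchar longer than 250 (most likely 255), then: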

    • Step 1: Check the data in that column to make sure the longest string in it is <= 250 characters.

      Example query:

      select `id`,`username`, `email`,
             length(`username`) as l1,
             char_length(`username`) as l2,
             length(`email`) as l3,
             char_length(`email`) as l4
        from jos_users
       order by l4 Desc;
      
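    • Step 2: If the maximum character length of the data in the indexed column is <= 250, change the column length to 250. If that is not possible, drop the index on that column.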

    • Step 3: Then run the ALTER TABLE query for that table again; it should now convert the table to utf8mb4 successfully.
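Steps 1 and 2 can be generated per column. The check matters because shrinking a column below the length of stored data fails in strict SQL mode and truncates values with only a warning otherwise. A sketch with a hypothetical function name; `MODIFY` replaces the whole column definition, so any `NOT NULL`, `DEFAULT` or comment must be added back by hand:

```shell
# shrink_plan TABLE COLUMN NEW_LENGTH
# Line 1: counts values longer than NEW_LENGTH characters (must be 0).
# Line 2: shrinks the column; run it only if line 1 returned 0.
# NOT NULL / DEFAULT / COMMENT are not restated here; add them if needed.
shrink_plan() {
  local table="$1" column="$2" len="$3"
  printf 'SELECT COUNT(*) AS too_long FROM `%s` WHERE CHAR_LENGTH(`%s`) > %d;\n' "$table" "$column" "$len"
  printf 'ALTER TABLE `%s` MODIFY `%s` VARCHAR(%d) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;\n' "$table" "$column" "$len"
}

# Usage:
#   shrink_plan jos_users username 250
```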

Cheers!


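There is a way to use indexes on long VARCHARs of more than 191 characters. You must have DBA/SUPER USER privileges to do this: set the database parameters innodb_large_prefix: ON; innodb_file_format: Barracuda; innodb_file_format_max: Barracuda;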
洲鸿岭市


I wrote this guide: http://hanoian.com/content/index.php/24-automate-the-converting-a-mysql-database-character-set-to-utf8mb4

From my experience, changing only the database and the tables was not enough. I had to go into each table and change every text/mediumtext/varchar column.

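Luckily, I was able to write a script that reads the MySQL database's metadata, so it can loop through the tables and columns and change them automatically.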

Long indexes on MySQL 5.6:

You must have DBA/SUPER USER privileges to do the following. Set these database parameters:

innodb_large_prefix: ON
innodb_file_format: Barracuda
innodb_file_format_max: Barracuda

The answer to this question explains how to set the parameters above: https://stackoverflow.com/questions/35847015/mysql-change-innodb-large-prefix

Of course, this is also explained in my article.

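For MySQL 5.7 or later (from 5.7.7), innodb_large_prefix is ON by default and innodb_file_format also defaults to Barracuda. In MySQL 8.0 both variables were removed, and long index prefixes are always available for DYNAMIC and COMPRESSED tables.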
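On MySQL 5.6 the settings above can also be applied as SQL. A few details to keep in mind: long index prefixes (up to 3072 bytes, i.e. VARCHAR(768) in utf8mb4) only apply to tables using the DYNAMIC or COMPRESSED row format; in 5.6 those formats also need innodb_file_per_table; and `SET GLOBAL` does not survive a restart, so the same settings belong in my.cnf too. A sketch, with a hypothetical function name, that prints the statements for a list of tables:

```shell
# large_prefix_sql TABLE...
# MySQL 5.6 only (these variables no longer exist in 8.0). Needs SUPER.
# Prints the global settings, then switches each table to ROW_FORMAT=DYNAMIC,
# which is required for index prefixes longer than 767 bytes.
large_prefix_sql() {
  echo 'SET GLOBAL innodb_file_per_table = ON;'
  echo 'SET GLOBAL innodb_file_format = Barracuda;'
  echo 'SET GLOBAL innodb_file_format_max = Barracuda;'
  echo 'SET GLOBAL innodb_large_prefix = ON;'
  for t in "$@"; do
    printf 'ALTER TABLE `%s` ROW_FORMAT=DYNAMIC;\n' "$t"
  done
}

# Usage:
#   large_prefix_sql users posts | mysql -uroot -p mydb
```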



For anyone who runs into this problem, the best solution is to first modify the columns to a binary type, according to this table:

  1. CHAR => BINARY
  2. TEXT => BLOB
  3. TINYTEXT => TINYBLOB
  4. MEDIUMTEXT => MEDIUMBLOB
  5. LONGTEXT => LONGBLOB
  6. VARCHAR => VARBINARY

Then modify each column back to its original type, with the character set you want.

For example:

ALTER TABLE [TABLE_SCHEMA].[TABLE_NAME] MODIFY [COLUMN_NAME] LONGBLOB;
ALTER TABLE [TABLE_SCHEMA].[TABLE_NAME] MODIFY [COLUMN_NAME] VARCHAR(140) CHARACTER SET utf8mb4;

I tried this on several latin1 tables and it preserved all the diacritics.
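The type mapping and the two MODIFY statements can be generated per column. This sketch makes two assumptions that differ slightly from the table above: CHAR(n) is mapped to VARBINARY(n) rather than BINARY(n), because BINARY right-pads values with 0x00 bytes that would then remain in the converted CHAR column; and the source columns are assumed to be single-byte latin1, so n characters fit in n bytes. Function names are hypothetical.

```shell
# binary_type_for TYPE: binary counterpart of a character column type.
# CHAR(n) -> VARBINARY(n) on purpose (BINARY pads with 0x00 bytes).
binary_type_for() {
  local t
  t=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case $t in
    char\(*\))    echo "varbinary${t#char}" ;;
    varchar\(*\)) echo "varbinary${t#varchar}" ;;
    tinytext)     echo tinyblob ;;
    text)         echo blob ;;
    mediumtext)   echo mediumblob ;;
    longtext)     echo longblob ;;
    *)            return 1 ;;   # not a character type
  esac
}

# roundtrip_sql TABLE COLUMN TYPE: the two statements of the round trip.
# As in the example above, other attributes (NOT NULL, DEFAULT) are not kept.
roundtrip_sql() {
  local bin
  bin=$(binary_type_for "$3") || return 1
  printf 'ALTER TABLE `%s` MODIFY `%s` %s;\n' "$1" "$2" "$bin"
  printf 'ALTER TABLE `%s` MODIFY `%s` %s CHARACTER SET utf8mb4;\n' "$1" "$2" "$3"
}
```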

You can generate both statements for all such columns with this query:

SELECT
CONCAT('ALTER TABLE ', TABLE_SCHEMA,'.', TABLE_NAME,' MODIFY ', COLUMN_NAME,' ', REPLACE(COLUMN_TYPE, 'varchar', 'varbinary'),';'),
CONCAT('ALTER TABLE ', TABLE_SCHEMA,'.', TABLE_NAME,' MODIFY ', COLUMN_NAME,' ', COLUMN_TYPE,' CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;')
FROM information_schema.columns
WHERE TABLE_SCHEMA IN ('[TABLE_SCHEMA]')
AND COLUMN_TYPE LIKE 'varchar%'
AND (COLLATION_NAME IS NOT NULL AND COLLATION_NAME NOT LIKE 'utf%');


I made a script that does this more or less automatically:

<?php
/**
 * Requires php >= 5.5
 * 
 * Use this script to convert utf-8 data in utf-8 mysql tables stored via latin1 connection
 * This is a PHP port from: https://gist.github.com/njvack/6113127
 *
 * BACKUP YOUR DATABASE BEFORE YOU RUN THIS SCRIPT!
 *
 * Once the script ran over your databases, change your database connection charset to utf8mb4:
 *
 * $dsn = 'mysql:host=localhost;port=3306;charset=utf8mb4';
 * 
 * DON'T RUN THIS SCRIPT MORE THAN ONCE!
 *
 * @author hollodotme
 *
 * @author derclops since 2019-07-01
 *
 *         I have taken the liberty to adapt this script to also do the following:
 *
 *         - convert the database to utf8mb4
 *         - convert all tables to utf8mb4
 *         - actually then also convert the data to utf8mb4
 *
 */

header('Content-Type: text/plain; charset=utf-8');

$dsn      = 'mysql:host=localhost;port=3306;charset=utf8';
$user     = 'root';
$password = 'root';
$options  = [
    \PDO::ATTR_CURSOR                   => \PDO::CURSOR_FWDONLY,
    \PDO::MYSQL_ATTR_USE_BUFFERED_QUERY => true,
    \PDO::MYSQL_ATTR_INIT_COMMAND       => "SET CHARACTER SET latin1",
];


$dbManager = new \PDO( $dsn, $user, $password, $options );

$databasesToConvert = [ 'database1',/** database3, ... */ ];
$typesToConvert     = [ 'char', 'varchar', 'tinytext', 'mediumtext', 'text', 'longtext' ];

foreach ( $databasesToConvert as $database )
{
    echo $database, ":\n";
    echo str_repeat( '=', strlen( $database ) + 1 ), "\n";

    $dbManager->exec( "USE `{$database}`" );

    echo "converting database to correct locale too ... \n";

    $dbManager->exec("ALTER DATABASE `{$database}` CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci");


    $tablesStatement = $dbManager->query( "SHOW TABLES" );
    while ( ($table = $tablesStatement->fetchColumn()) )
    {
        echo "Table: {$table}:\n";
        echo str_repeat( '-', strlen( $table ) + 8 ), "\n";

        $columnsToConvert = [ ];

        $columsStatement = $dbManager->query( "DESCRIBE `{$table}`" );

        while ( ($tableInfo = $columsStatement->fetch( \PDO::FETCH_ASSOC )) )
        {
            $column = $tableInfo['Field'];
            echo ' * ' . $column . ': ' . $tableInfo['Type'];

            $type = preg_replace( "#\(\d+\)#", '', $tableInfo['Type'] );

            if ( in_array( $type, $typesToConvert ) )
            {
                echo " => must be converted\n";

                $columnsToConvert[] = $column;
            }
            else
            {
                echo " => not relevant\n";
            }
        }


        //convert table also!!!
        $convert = "ALTER TABLE `{$table}` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci";

        echo "\n", $convert, "\n";
        $dbManager->exec( $convert );
        $databaseErrors = $dbManager->errorInfo();
        if( !empty($databaseErrors[1]) ){
            echo "\n !!!!!!!!!!!!!!!!! ERROR OCCURRED ".print_r($databaseErrors, true)." \n";
            exit;
        }


        if ( !empty($columnsToConvert) )
        {
            $converts = array_map(
                function ( $column )
                {
                    //return "`{$column}` = IFNULL(CONVERT(CAST(CONVERT(`{$column}` USING latin1) AS binary) USING utf8mb4),`{$column}`)";
                    return "`{$column}` = CONVERT(BINARY(CONVERT(`{$column}` USING latin1)) USING utf8mb4)";
                },
                $columnsToConvert
            );

            $query = "UPDATE IGNORE `{$table}` SET " . join( ', ', $converts );

            //alternative
            // UPDATE feedback SET reply = CONVERT(BINARY(CONVERT(reply USING latin1)) USING utf8mb4) WHERE feedback_id = 15015;


            echo "\n", $query, "\n";


            $dbManager->exec( $query );

            $databaseErrors = $dbManager->errorInfo();
            if( !empty($databaseErrors[1]) ){
                echo "\n !!!!!!!!!!!!!!!!! ERROR OCCURRED ".print_r($databaseErrors, true)." \n";
                exit;
            }
        }

        echo "\n--\n";
    }

    echo "\n";
}
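The core of the script is the expression `CONVERT(BINARY(CONVERT(col USING latin1)) USING utf8mb4)`: it turns double-encoded text back into latin1 bytes and then reads those bytes as UTF-8. The same repair can be tried on a sample value outside the database with `iconv`, which helps confirm that a column really is double-encoded before running the UPDATE. A sketch with a hypothetical function name, assuming an `iconv` command is installed; iconv's ISO-8859-1 differs from MySQL's latin1 (which is cp1252) in the 0x80-0x9F range, so a target such as CP1252 may be needed for some strings.

```shell
# undo_latin1_misread [ENCODING]
# Reads double-encoded text on stdin (UTF-8 bytes that were once read as
# latin1 and re-encoded as UTF-8) and prints the original UTF-8 text.
# Same idea as CONVERT(BINARY(CONVERT(col USING latin1)) USING utf8mb4):
# encode the text back to single-byte latin1, then treat those bytes as UTF-8.
# ENCODING defaults to ISO-8859-1; CP1252 is closer to MySQL's latin1.
undo_latin1_misread() {
  iconv -f UTF-8 -t "${1:-ISO-8859-1}"
}

# Usage:
#   printf 'Ã©\n' | undo_latin1_misread    # prints é
```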
Licensed under cc by-sa 3.0 with attribution required.