No NULLs, yet invalid byte sequence for encoding "UTF8": 0x00

12

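I've spent the last 8 hours trying to import the output of mysqldump --compatible=postgresql into PostgreSQL 8.4.9. I've already read at least 20 different threads here and elsewhere about this specific problem, but found no answer that actually works.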

MySQL 5.1.52 data dump:

mysqldump -u root -p --compatible=postgresql --no-create-info --no-create-db --default-character-set=utf8 --skip-lock-tables rt3 > foo

PostgreSQL 8.4.9 server as the destination.

Loading the data with 'psql -U rt_user -f foo' reports the following (there are many of these; here is one example):

psql:foo:29: ERROR:  invalid byte sequence for encoding "UTF8": 0x00
HINT:  This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".

According to the following, there are no NULL (0x00) characters in the input file:

database-dumps:rcf-temp1# sed 's/\x0/ /g' < foo > nonulls
database-dumps:rcf-temp1# sum foo nonulls
04730 2545610 foo
04730 2545610 nonulls
database-dumps:rcf-temp1# rm nonulls

Likewise, another check with Perl shows no NULLs:

database-dumps:rcf-temp1# perl -ne '/\000/ and print;' foo
database-dumps:rcf-temp1#

As the "HINT" in the error suggests, I have tried every possible way of setting 'client_encoding' to 'UTF8'. The setting takes effect, but it does nothing to solve my problem.

database-dumps:rcf-temp1# psql -U rt_user --variable=client_encoding=utf-8 -c "SHOW client_encoding;" rt3
 client_encoding
-----------------
 UTF8
(1 row)

database-dumps:rcf-temp1#

Perfect, yet:

database-dumps:rcf-temp1# psql -U rt_user -f foo --variable=client_encoding=utf-8 rt3
...
psql:foo:29: ERROR:  invalid byte sequence for encoding "UTF8": 0x00
HINT:  This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
...

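Short of the proper "according to Hoyle" answer, which I would love to hear, and given that I really don't care about preserving any non-ASCII characters in this seldom-referenced data, what suggestions do you have?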

Update: I get the same error at import time with an ASCII-only version of the same dump file. Truly mind-boggling:

database-dumps:rcf-temp1# # convert any non-ASCII character to a space
database-dumps:rcf-temp1# perl -i.bk -pe 's/[^[:ascii:]]/ /g;' mysql5-dump.sql
database-dumps:rcf-temp1# sum mysql5-dump.sql mysql5-dump.sql.bk
41053 2545611 mysql5-dump.sql
50145 2545611 mysql5-dump.sql.bk
database-dumps:rcf-temp1# cmp mysql5-dump.sql mysql5-dump.sql.bk
mysql5-dump.sql mysql5-dump.sql.bk differ: byte 1304850, line 30
database-dumps:rcf-temp1# # GOOD!
database-dumps:rcf-temp1# psql -U postgres -f mysql5-dump.sql --variable=client_encoding=utf-8 rt3
...
INSERT 0 416
psql:mysql5-dump.sql:30: ERROR:  invalid byte sequence for encoding "UTF8": 0x00
HINT:  This error can also happen if the byte sequence does not match the encod.
INSERT 0 455
INSERT 0 424
INSERT 0 483
INSERT 0 447
INSERT 0 503
psql:mysql5-dump.sql:36: ERROR:  invalid byte sequence for encoding "UTF8": 0x00
HINT:  This error can also happen if the byte sequence does not match the encod.
INSERT 0 502
INSERT 0 507
INSERT 0 318
INSERT 0 284
psql:mysql5-dump.sql:41: ERROR:  invalid byte sequence for encoding "UTF8": 0x00
HINT:  This error can also happen if the byte sequence does not match the encod.
INSERT 0 382
INSERT 0 419
INSERT 0 247
psql:mysql5-dump.sql:45: ERROR:  invalid byte sequence for encoding "UTF8": 0x00
HINT:  This error can also happen if the byte sequence does not match the encod.
INSERT 0 267
INSERT 0 348
^C

One of the tables in question is defined as:

                                        Table "public.attachments"
     Column      |            Type             |                        Modifie
-----------------+-----------------------------+--------------------------------
 id              | integer                     | not null default nextval('atta)
 transactionid   | integer                     | not null
 parent          | integer                     | not null default 0
 messageid       | character varying(160)      |
 subject         | character varying(255)      |
 filename        | character varying(255)      |
 contenttype     | character varying(80)       |
 contentencoding | character varying(80)       |
 content         | text                        |
 headers         | text                        |
 creator         | integer                     | not null default 0
 created         | timestamp without time zone |
Indexes:
    "attachments_pkey" PRIMARY KEY, btree (id)
    "attachments1" btree (parent)
    "attachments2" btree (transactionid)
    "attachments3" btree (parent, transactionid)

I am not at liberty to change the type of any part of the DB schema. Doing so would likely break future upgrades of the software, etc.

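The likely problem column is 'content', of type 'text' (and perhaps similar columns in other tables). As I already know from earlier research, PostgreSQL does not allow NULL (0x00) in 'text' values. However, see above, where both sed and Perl find no NULL characters, and then further down, where I strip every non-ASCII character from the entire dump file and the import still chokes.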

mysql postgresql mysqldump

— jblaine

2

What does line 29 of the dump file look like? Something like head -29 foo | tail -1 | cat -v might be useful.

— mu is too short

What is the definition of the affected table, and what does the offending row look like?

— tscho, 2011

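It looks like about 1 MB of company data. I know where you're going with this, of course. Here is the end of that line of thought (pardon my French at the end of the gist/paste): gist.github.com/1525788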

— jblaine, 2011

tscho: As mentioned, the example error line is one of hundreds of such errors.

— jblaine

3

One or more of those character/text fields may have 0x00 in its content.

Try the following:

SELECT * FROM rt3 where some_text_field = 0x00 LIMIT 1;

If it returns any row(s), try updating those character/text fields with:

UPDATE rt3 SET some_text_field = '' WHERE some_text_field = 0x00;

Then try another mysqldump (and the PostgreSQL import).

— Farley Inglis

This helped me find my stray null characters, although I needed to use colname LIKE concat('%', 0x00, '%'). I found them in fields containing serialized PHP arrays.

— cimmanon

5

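I had the same problem using MySQL version 5.0.51 and Postgres version 9.3.4.0. I solved the 'invalid byte sequence for encoding "UTF8": 0x00' issue after seeing Daniel Vérité's comment that "mysqldump in postgresql mode will dump null bytes in strings as \0, so you probably want to search for that sequence of characters."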

Sure enough, a grep finally revealed the NULL characters.

grep \\\\0 dump.sql

I replaced the NULL characters using the following command:

sed -i BAK 's/\\0//g' dump.sql

Postgres was then able to load dump.sql successfully.
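One caveat with a plain substitution like the sed command above: it also rewrites a doubled backslash followed by a literal zero (\\0), which is an escaped backslash and not a NULL. A minimal Python sketch of a more careful strip is below. The function name strip_escaped_nuls is hypothetical, and it assumes the dump uses backslash escapes throughout and that backslashes only appear inside string literals.

```python
import re

# One escape pair at a time: a backslash followed by any single character.
# Scanning pairs left to right keeps "\\0" (an escaped backslash, then a
# literal zero) distinct from "\0" (an escaped NUL).
ESCAPE_PAIR = re.compile(r'\\(.)', re.DOTALL)


def strip_escaped_nuls(text: str) -> str:
    """Remove backslash-zero escapes; leave every other escape pair as is.

    Hypothetical helper, not part of mysqldump or psql.
    """
    return ESCAPE_PAIR.sub(
        lambda m: '' if m.group(1) == '0' else m.group(0),
        text,
    )


if __name__ == "__main__":
    # The first value holds an escaped NUL; the second holds an escaped
    # backslash followed by an ordinary zero and must survive unchanged.
    print(strip_escaped_nuls(r"VALUES ('ab\0cd', 'C:\\0day')"))
    # prints: VALUES ('abcd', 'C:\\0day')
```

To apply it to a whole dump, one would read the file as text, pass the contents through the function, and write the result to a new file, keeping the original as a backup.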

— 杰登斯

4

You can get this error without having any NULL byte or any non-ASCII character in the file. Example in a UTF8 database:

select E'ab\0cd';

This produces:

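ERROR:  invalid byte sequence for encoding "UTF8": 0x00
HINT:  This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".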

mysqldump in postgresql mode will dump null bytes in strings as \0, so you probably want to search for that sequence of characters.
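This is also why checks for a raw 0x00 byte come up empty: the file holds the two-character sequence backslash-zero, and the server only turns it into a NULL byte when it parses the string. As a rough illustration, the Python sketch below counts both forms separately. The name count_nul_forms is hypothetical, and the pattern assumes backslash escaping as produced by the dump.

```python
import re
from typing import Tuple

# A backslash-zero escape that is not itself escaped: no backslash directly
# before, then any number of backslash pairs, then backslash + "0".
ESCAPED_NUL = re.compile(rb'(?<!\\)(?:\\\\)*\\0')


def count_nul_forms(data: bytes) -> Tuple[int, int]:
    """Return (raw_nul_bytes, escaped_nul_sequences) found in data.

    Hypothetical helper for inspecting a dump; not part of any tool.
    """
    raw = data.count(b'\x00')
    escaped = len(ESCAPED_NUL.findall(data))
    return raw, escaped


if __name__ == "__main__":
    # The bytes contain a backslash followed by "0", not a real NUL byte.
    sample = b"INSERT INTO t VALUES ('ab\\0cd');\n"
    print(count_nul_forms(sample))  # prints (0, 1)
```

A dump that yields zero raw bytes but a nonzero escaped count is consistent with the behaviour described above.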

— Daniel Vérité

0

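I half remember a problem like this. I think I ended up migrating the schema, then dumping the data as CSV and loading it from the CSV file. I remember having to update the CSV file (with Unix tools such as sed or unixtodos) or use OpenOffice Calc (Excel) to fix some items that were failing in the import step. It could be as simple as opening the file and saving it again.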

— Adam