TypeError: 'str' does not support the buffer interface

267

plaintext = input("Please enter the text you want to compress")
filename = input("Please enter the desired filename")
with gzip.open(filename + ".gz", "wb") as outfile:
    outfile.write(plaintext)

The Python code above gives me the following error:

Traceback (most recent call last):
  File "C:/Users/Ankur Gupta/Desktop/Python_works/gzip_work1.py", line 33, in <module>
    compress_string()
  File "C:/Users/Ankur Gupta/Desktop/Python_works/gzip_work1.py", line 15, in compress_string
    outfile.write(plaintext)
  File "C:\Python32\lib\gzip.py", line 312, in write
    self.crc = zlib.crc32(data, self.crc) & 0xffffffff
TypeError: 'str' does not support the buffer interface

python string gzip

— Future King
source

1

@MikePennington: Please explain why compressing text is not useful?

— galinette

295

If you are using Python 3.x, then string is not the same type as in Python 2.x; you must convert it to bytes (encode it).

import gzip

plaintext = input("Please enter the text you want to compress")
filename = input("Please enter the desired filename")
with gzip.open(filename + ".gz", "wb") as outfile:
    # A file opened in binary mode accepts bytes, so encode the str first
    outfile.write(bytes(plaintext, 'UTF-8'))

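Also, do not use variable names like string or file, since those are the names of a module and a function.

EDIT @Tom:

Yes, non-ASCII text is also compressed and decompressed correctly. I use Polish letters with UTF-8 encoding: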

import gzip

plaintext = 'Polish text: ąćęłńóśźżĄĆĘŁŃÓŚŹŻ'
filename = 'foo.gz'
# Encode to UTF-8 bytes before writing in binary mode
with gzip.open(filename, 'wb') as outfile:
    outfile.write(bytes(plaintext, 'UTF-8'))
# Read the bytes back and decode them with the same encoding
with gzip.open(filename, 'r') as infile:
    outfile_content = infile.read().decode('UTF-8')
print(outfile_content)

— Michał Niklas
source

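It is strange that this fixes it; the original code worked for me under 3.1, and the sample code in the docs does not explicitly encode either. If you use it on non-ASCII text, can gunzip decompress it? I got an error.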

— Tom Zych

I typed my name in Unicode Hindi and it was compressed into gzip successfully. I am using Python 3.2.

— Future King

@Tom Zych: This probably has to do with the changes in 3.2: docs.python.org/dev/whatsnew/3.2.html#gzip-and-zipfile

— Skurmedel, 2011

I tested it with ActiveState Python 3.1 and 3.2. On my machine it works in both.

— Michał Niklas

1

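For file compression, you should always open your input in binary mode: you must be able to decompress the file later and get exactly the same contents. Converting to Unicode (str) first is unnecessary, and it risks decoding errors or mismatches between input and output.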
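
A minimal sketch of that binary-mode approach, for illustration only. The helper names `gzip_file` and `gunzip_bytes` are hypothetical and not part of the original comment; the point is that the bytes are copied as-is, so no encoding or decoding step can alter them.

```python
import gzip
import shutil


def gzip_file(src_path, dst_path):
    """Compress src_path into dst_path, copying raw bytes without decoding."""
    with open(src_path, "rb") as infile, gzip.open(dst_path, "wb") as outfile:
        # copyfileobj streams the data in chunks, so large files are fine
        shutil.copyfileobj(infile, outfile)


def gunzip_bytes(path):
    """Return the decompressed contents of a gzip file as bytes."""
    with gzip.open(path, "rb") as infile:
        return infile.read()
```

Because nothing is decoded, this also works for input that is not valid text in any particular encoding.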

— alexis

96

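There is an easier solution to this problem.

You just need to add a t to the mode so that it becomes wt. This causes Python to open the file as a text file rather than a binary file. Then everything will just work.

The complete program becomes: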

import gzip

plaintext = input("Please enter the text you want to compress")
filename = input("Please enter the desired filename")
# "wt" opens the gzip file in text mode, so str can be written directly
with gzip.open(filename + ".gz", "wt") as outfile:
    outfile.write(plaintext)
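
As a hedged variation on the program above: text mode in `gzip.open` needs Python 3.3 or later, and without an explicit `encoding` the platform default is used. The sketch below passes `encoding="utf-8"` on both sides so the round trip does not depend on the machine; the helper names `write_text_gz` and `read_text_gz` are hypothetical.

```python
import gzip


def write_text_gz(path, text, encoding="utf-8"):
    """Write a str to a gzip file in text mode with an explicit encoding."""
    with gzip.open(path, "wt", encoding=encoding) as outfile:
        outfile.write(text)


def read_text_gz(path, encoding="utf-8"):
    """Read a gzip file back as str, decoding with the same encoding."""
    with gzip.open(path, "rt", encoding=encoding) as infile:
        return infile.read()
```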

— username
source

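Does it also work on python2? Could this be a way to make code work on both python2 and python3?

— Loïc Faure-Lacroix

Wow, man, you are good! Thanks! Let me upvote you. This should be the accepted answer :))

— Loïc, 2015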

15

Adding "t" may have side effects. On Windows, files written in text mode have their newlines ("\n") converted to CRLF ("\r\n").
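
One possible way around that side effect, sketched under the assumption that Python 3.3+ is available: `gzip.open` accepts a `newline` argument in text mode, and `newline="\n"` turns off newline translation on write. The helper name `write_text_gz_lf` is hypothetical.

```python
import gzip


def write_text_gz_lf(path, text):
    """Write text to a gzip file, keeping "\\n" line endings on every platform."""
    # newline="\n" disables the "\n" -> os.linesep translation of text mode
    with gzip.open(path, "wt", encoding="utf-8", newline="\n") as outfile:
        outfile.write(text)
```

Reading the file back in binary mode ("rb") shows the exact bytes that were stored.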

— BitwiseMan '16

42

You cannot serialize a Python 3 "string" to bytes without an explicit conversion to some encoding.

outfile.write(plaintext.encode('utf-8'))

is probably what you want. This also works for both Python 2.x and 3.x.
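
One caveat when relying on `.encode()` in both versions: in Python 2, calling `.encode('utf-8')` on a byte string that already contains non-ASCII bytes first decodes it implicitly as ASCII and can raise `UnicodeDecodeError`. A small sketch of a guard, using a hypothetical helper named `to_bytes`:

```python
def to_bytes(data, encoding="utf-8"):
    """Return data as bytes, encoding it only if it is not bytes already."""
    if isinstance(data, bytes):
        # Python 3 bytes, or a Python 2 str (where bytes is an alias of str)
        return data
    return data.encode(encoding)
```

With this, `outfile.write(to_bytes(plaintext))` accepts either kind of input.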

— Andreas Jung
source

28

For Python 3.x you can convert your text to raw bytes with:

bytes("my data", "encoding")

For example:

bytes("attack at dawn", "utf-8")

The returned object will work with outfile.write.
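
A short sketch to illustrate, assuming Python 3.2 or later for `gzip.compress` and `gzip.decompress`: `bytes(text, "utf-8")` and `text.encode("utf-8")` give the same result, and the resulting bytes can be compressed in memory without opening a file.

```python
import gzip

text = "attack at dawn"

# Both spellings produce the same bytes object
raw = bytes(text, "utf-8")
same = text.encode("utf-8")

# Compress and decompress in memory, then decode back to str
compressed = gzip.compress(raw)
restored = gzip.decompress(compressed).decode("utf-8")
```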

— Skurmedel
source

9

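This problem commonly occurs when switching from py2 to py3. In py2, plaintext is both a string and a byte array type. In py3, plaintext is only a string, and the method outfile.write() actually takes a byte array when outfile is opened in binary mode, so an exception is raised. Change the input to plaintext.encode('utf-8') to fix the problem. Read on if this bothers you.

In py2, the declaration for file.write made it look as if you passed in a string: file.write(str). In fact you were passing in a byte array, and you should have read the declaration like this: file.write(bytes). If you read it that way the problem is simple: file.write(bytes) needs a bytes type, and in py3, to get bytes out of a str, you convert it: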

py3>> outfile.write(plaintext.encode('utf-8'))
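
A minimal sketch that shows the failure and the fix side by side, using an in-memory buffer so that no file is needed. The helper name `try_write` is hypothetical, and the exact wording of the error message differs between Python 3 versions, so only the exception type is relied on here.

```python
import gzip
import io


def try_write(payload):
    """Write payload to an in-memory gzip stream.

    Returns the TypeError message if the write is rejected, else None.
    """
    buffer = io.BytesIO()
    try:
        with gzip.GzipFile(fileobj=buffer, mode="wb") as outfile:
            outfile.write(payload)
    except TypeError as exc:
        return str(exc)
    return None


str_result = try_write("my string literal")                    # str is rejected
bytes_result = try_write("my string literal".encode("utf-8"))  # bytes are accepted
```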

Why did the py2 docs declare that file.write took a string? Well, in py2 the distinction in the declaration did not matter because:

py2>> str==bytes         #str and bytes aliased a single hybrid class in py2
True

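The str-bytes class of py2 has methods and constructors that make it behave like a string class in some ways and like a byte array class in others. Convenient for file.write, isn't it?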

py2>> plaintext='my string literal'
py2>> type(plaintext)
str                              #is it a string or is it a byte array? it's both!

py2>> outfile.write(plaintext)   #can use plaintext as a byte array

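Why did py3 break this nice system? Well, because in py2 basic string functions did not work for the rest of the world. Measure the length of a word with a non-ASCII character?

py2>> len('¡no')        #length of string=3, length of UTF-8 byte array=4, since with variable len encoding the non-ASCII chars = 2-4 bytes
4                       #always gives bytes.len not str.len

All this time you thought you were asking for the len of a string in py2, you were actually getting the length of the byte array from the encoding. That ambiguity is the fundamental problem with double-duty classes. Which version of any method call do you implement?

The good news is that py3 fixes this problem. It disentangles the str and bytes classes. The str class has string-like methods, and the separate bytes class has byte array methods: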

py3>> len('¡ok')       #string
3
py3>> len('¡ok'.encode('utf-8'))     #bytes
4

Hopefully knowing this helps demystify the issue and makes the migration pain a little easier to bear.

— Riaz Rizvi
source

4

>>> s = bytes("s","utf-8")
>>> print(s)
b's'
>>> s = s.decode("utf-8")
>>> print(s)
s

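Well, in case this is useful for getting rid of the annoying 'b' prefix. If anyone has a better idea, please suggest it or feel free to edit this answer. I am just a newbie.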

— Tapasit Suesasiton
source

You can also use s.encode('utf-8'), which is just as Pythonic as s.decode('utf-8'), instead of s = bytes("s", "utf-8").

— Hans Zimermann, 2015

4

For Django, in a django.test.TestCase unit test, I changed my Python 2 syntax:

def test_view(self):
    response = self.client.get(reverse('myview'))
    self.assertIn(str(self.obj.id), response.content)
    ...

to use the Python 3 .decode('utf8') syntax:

def test_view(self):
    response = self.client.get(reverse('myview'))
    self.assertIn(str(self.obj.id), response.content.decode('utf8'))
    ...
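
The same idea can be tried without a Django project. The sketch below uses plain `unittest` and a hypothetical `FakeResponse` class standing in for a Django test response (an assumption for illustration; the real object has many more attributes). It shows the two consistent options: compare str with str, or bytes with bytes.

```python
import unittest


class FakeResponse:
    """Hypothetical stand-in for a Django test response; content is bytes."""

    def __init__(self, content):
        self.content = content


class ContentTest(unittest.TestCase):
    def test_compare_as_text(self):
        response = FakeResponse(b"<p>object 42</p>")
        # Decode the bytes body so both operands are str
        self.assertIn(str(42), response.content.decode("utf-8"))

    def test_compare_as_bytes(self):
        response = FakeResponse(b"<p>object 42</p>")
        # Or encode the expected value so both operands are bytes
        self.assertIn(str(42).encode("utf-8"), response.content)
```

In a real Django test, `self.assertContains(response, text)` is another option that avoids handling the bytes body by hand.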

— Aaron Lelevier
source