Python: "decoding Unicode is not supported"

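I'm having an encoding problem in Python. I've tried different approaches, but I can't seem to find the best way to encode my output as UTF-8.

This is what I'm trying to do: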

result = unicode(google.searchGoogle(param), "utf-8").encode("utf-8")

searchGoogle returns the first Google result for param.

This is the error I get:

exceptions.TypeError: decoding Unicode is not supported

Does anyone know how I can make Python encode my output as UTF-8 so that I avoid this error?

— simonbs

It looks like google.searchGoogle(param) already returns unicode:

>>> unicode(u'foo', 'utf-8')

Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    unicode(u'foo', 'utf-8')
TypeError: decoding Unicode is not supported

So what you want is:

result = google.searchGoogle(param).encode("utf-8")

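As a side note, your code expects it to return a UTF-8 encoded string, so what was the point of decoding it (using unicode()) and then encoding it again (using .encode()) with the same encoding?

— yak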

To be honest, the unicode() call was just me fooling around, trying to understand what was going on. Thank you very much :-)

— simonbs, 2011

Now I sometimes get 'ascii' codec can't decode byte 0xc3 in position. Do you know why?

— simonbs, 2011

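In the line I suggested? That would mean searchGoogle() returned a str (byte string) containing a 0xC3 byte. Calling .encode() on a str makes Python first try to convert it to unicode (using the ascii codec), and that implicit decode is what fails. I don't know why searchGoogle() sometimes returns unicode and sometimes str. Maybe it depends on what you pass as param? Try to stick to one type.

— yak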
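
To make the implicit step above visible, here is a minimal sketch that performs the ASCII decode explicitly. The sample bytes are an assumption for illustration, not actual searchGoogle() output. In Python 2, calling .encode() on a str runs this decode behind the scenes; writing it out by hand behaves the same way on Python 2 and 3.

```python
# Hypothetical sample: UTF-8 bytes for u"café" (0xC3 0xA9 encodes "é").
raw = b"caf\xc3\xa9"

# What Python 2 does implicitly before str.encode(): decode with ASCII.
try:
    raw.decode("ascii")
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)  # mentions "byte 0xc3"

# The explicit route: decode with the real encoding, then encode.
text = raw.decode("utf-8")
result = text.encode("utf-8")
```

Decoding with the encoding the bytes are actually in avoids the error, and the round trip returns the original bytes.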

I wish there were a safe, simple way to convert to unicode.

— Eric Walker

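@EricWalker You could write a clumsy helper function such as def uors2u(object, encoding=..., errors=...) that returns its argument unchanged if it is already unicode and converts it if it is a str. But that is a code smell. You should convert all input to unicode as soon as you receive it from the outside (such as the filesystem), and convert it back to str only when you need to send it out again. There should be only one place where str is converted to unicode, so a helper function like the one I described shouldn't be needed.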

— Leonid
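
For reference, a rough sketch of the kind of helper described in the comment above. The comment leaves the defaults elided, so the "utf-8" and "strict" defaults here are my assumptions, as are the text_type/bytes_type names, which only exist so the sketch runs on both Python 2 and 3. The parameter is named obj rather than object to avoid shadowing the builtin.

```python
import sys

# Compatibility shim (assumed names): "unicode"/"str" on Python 2
# correspond to "str"/"bytes" on Python 3.
if sys.version_info[0] >= 3:
    text_type, bytes_type = str, bytes
else:
    text_type, bytes_type = unicode, str  # noqa: F821 (Python 2 only)


def uors2u(obj, encoding="utf-8", errors="strict"):
    """Return obj as text: unchanged if already text, decoded if bytes."""
    if isinstance(obj, text_type):
        return obj
    if isinstance(obj, bytes_type):
        return obj.decode(encoding, errors)
    raise TypeError("expected text or bytes, got %s" % type(obj).__name__)


already_text = uors2u(u"caf\xe9")
from_bytes = uors2u(b"caf\xc3\xa9")
```

As the comment says, needing this in many places suggests the decoding is not happening at a single boundary; it is more of a stopgap than a design.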