bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

222

...
soup = BeautifulSoup(html, "lxml")
File "/Library/Python/2.7/site-packages/bs4/__init__.py", line 152, in __init__
% ",".join(features))
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

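The above is the output in my terminal. I am on Mac OS 10.7.x. I have Python 2.7.1, and I followed this tutorial to get Beautiful Soup and lxml. Both installed successfully and work with a separate test file located here. In the Python script that causes this error, I have included this line:

from pageCrawler import comparePages

And in the pageCrawler file I have included the following two lines:

from bs4 import BeautifulSoup
from urllib2 import urlopen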

Any help in figuring out what the problem is and how to solve it would be much appreciated.

— username

1

See this answer: stackoverflow.com/questions/17766725/how-to-re-install-lxml

— Md. Mohsin, 2014

Is html a URL or the HTML content?

— tommy.carstensen

226

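I suspect this is related to the parser that BS uses to read the HTML. They document it here, but if you're like me (on OSX) you might be stuck with something that takes a bit of work:

You'll notice that in the BS4 documentation page above, they point out that by default BS4 uses the Python built-in HTML parser. Assuming you are on OSX, the Apple-bundled version of Python is 2.7.2, which is not lenient about character formatting. I hit the same problem, so I upgraded my version of Python to work around it. Doing this in a virtualenv minimizes disruption to other projects.

If that sounds like a pain, you can switch to the lxml parser: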

pip install lxml

And then try:

soup = BeautifulSoup(html, "lxml")

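Depending on your scenario, that might be good enough. I found this annoying enough to warrant upgrading my version of Python. With virtualenv, you can migrate your packages fairly easily.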
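As a rough illustration of the "switch parsers" idea above, the sketch below picks whichever parser is importable and falls back to the built-in one. It assumes Python 3 (the question itself is on Python 2.7, where `importlib.util.find_spec` is not available), and `pick_parser` is a hypothetical helper name, not part of Beautiful Soup.

```python
from importlib.util import find_spec


def pick_parser(preferred=("lxml", "html5lib")):
    """Return the first importable parser name, or the built-in fallback.

    Hypothetical helper for illustration; not a bs4 API.
    """
    for name in preferred:
        # find_spec returns None when a top-level module cannot be located.
        if find_spec(name) is not None:
            return name
    # "html.parser" ships with Python, so it needs no extra install.
    return "html.parser"


parser_name = pick_parser()
print("Using parser:", parser_name)
```

The result can then be passed as the second argument, for example `BeautifulSoup(html, pick_parser())`. Note that this only checks that the module can be located; a parser whose own dependencies are broken could still fail at import time.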

— James Errico

1

To test after the pip install:

python -c 'import requests ; from bs4 import BeautifulSoup ; r = requests.get("https://www.allrecipes.com/recipes/96/salad/") ; soup = BeautifulSoup(r.text, "lxml") '

— ViFI

In my virtual env, I needed to install requests, bs4, and lxml before BeautifulSoup would parse my web page content.

— noobninja

Pff! Crazy Mac. I don't know when I'll stop regretting my decision to buy a Mac!

— Iqra.

48

For basic out-of-the-box Python with bs4 installed, you can process your xml with

soup = BeautifulSoup(html, "html5lib")

But if you want to use formatter='xml', then you need to

pip3 install lxml

soup = BeautifulSoup(html, features="xml")

— Tim Seed

3

On a freshly started remote server, html5lib did not work out of the box for me. I still had to run pip install html5lib, after which everything worked.

— petercoles

Doesn't work for me:

bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: html5lib. Do you need to install a parser library?

It works if I change it to html.parser.

— 8bitjunkie

41

I prefer the built-in Python HTML parser: no install, no dependencies.

soup = BeautifulSoup(s, "html.parser")

— Ernst

@Ernst This one worked, while the previous one didn't. Thanks!

— adrCoder

14

I am using Python 3.6 and I had the same original error as in this post. After I ran the command:

python3 -m pip install lxml

it solved my problem.
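One plausible reason `python3 -m pip` helps is that it installs into the interpreter that actually runs the script, whereas a bare `pip` may belong to a different Python. The sketch below, assuming Python 3, prints which interpreter is running and whether it can see the relevant packages; `parser_report` is a hypothetical helper name.

```python
import sys
from importlib.util import find_spec


def parser_report(modules=("bs4", "lxml", "html5lib")):
    """Map each module name to True if this interpreter can locate it."""
    return {name: find_spec(name) is not None for name in modules}


# The interpreter path shows which Python the packages must be installed for.
print("Interpreter:", sys.executable)
for name, found in parser_report().items():
    print(f"{name}: {'found' if found else 'missing'}")
```

Running this with the same command used to launch the failing script (for example from inside the IDE) shows whether lxml is missing for that particular interpreter.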

— Bashar

In Docker it is also necessary to run apt install python-lxml

— Walter

14

Run these three commands to make sure that all the relevant packages are installed:

pip install bs4
pip install html5lib
pip install lxml

Then restart your Python IDE if needed.

That should take care of anything related to this issue.

— Pikamander2

1

This is the actual solution.

— John Stud

8

Instead of lxml, use html.parser. You can use this code:

soup = BeautifulSoup(html, 'html.parser')

— Yogesh

2

vendor.bs.bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: html.parser. Do you need to install a parser library?

— alex

4

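BeautifulSoup supports the Python HTML parser by default. If you want to use any other third-party Python parser, you need to install that external parser first (for example, lxml).

soup_object= BeautifulSoup(markup,"html.parser") #Python HTML parser

If you don't specify any parser as a parameter, you will get a warning that no parser was specified.

soup_object= BeautifulSoup(markup) #Warning

To use any other external parser, you need to install it and then specify it. For example: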

pip install lxml

soup_object= BeautifulSoup(markup,'lxml') # C dependent parser

External parsers have C and Python dependencies, which may bring some advantages and disadvantages.

— Projesh Bhoumik

3

I ran into the same problem. I found the reason was that I had an outdated Python six package.

>>> import html5lib
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/html5lib/__init__.py", line 16, in <module>
    from .html5parser import HTMLParser, parse, parseFragment
  File "/usr/local/lib/python2.7/site-packages/html5lib/html5parser.py", line 2, in <module>
    from six import with_metaclass, viewkeys, PY3
ImportError: cannot import name viewkeys

Upgrading the six package will solve the issue:

sudo pip install six==1.10.0
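To see which version of six (or of a parser) an interpreter currently has, something like the following sketch may help. It assumes Python 3.8 or newer for `importlib.metadata`, so it does not apply directly to the Python 2.7 environment in the traceback above; `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed for this interpreter.
        return None


for dist in ("six", "html5lib", "lxml", "beautifulsoup4"):
    print(dist, installed_version(dist))
```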

— Joyan

sudo pip install six==1.10.0

— pyd

2

Install the lxml parser in your Python environment.

pip install lxml

Your problem will be resolved. You can also use the built-in Python package in the same way:

soup = BeautifulSoup(s,  "html.parser")

Note: In Python 3, the "HTMLParser" module was renamed to "html.parser".
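For context, the renamed module can also be used directly, without Beautiful Soup. The minimal sketch below, assuming Python 3, subclasses the standard-library `HTMLParser` to collect link targets; the `LinkCollector` class and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser  # named "HTMLParser" in Python 2


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "a":
            for key, value in attrs:
                if key == "href" and value is not None:
                    self.links.append(value)


collector = LinkCollector()
collector.feed('<p><a href="/one">1</a> <a href="/two">2</a></p>')
print(collector.links)  # ['/one', '/two']
```

Because this module ships with Python, the "html.parser" option in Beautiful Soup normally needs no extra installation.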

— Shankar Vishnu

0

Some references show the first form below; use the second instead:

soup_object= BeautifulSoup(markup,'html-parser')
soup_object= BeautifulSoup(markup,'html.parser')

— AbhishekPakrashi

You should provide more detail in your answer.

— Michael

0

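The error occurs because of the parser you are using. In general, if you have an HTML file/code, you need to use html5lib (documentation can be found here); if you have an XML file/data, you need to use lxml (documentation can be found here). You can also use lxml for HTML files/code, but it sometimes gives the error above. So it is better to choose the package based on the type of data/file. You can also use the built-in html.parser module, but this sometimes does not work either.

For more details on when to use which package, see the details here.

— Pranav Bhendawade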

0

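A blank parser argument results in a warning, and the best available parser is used.
soup = BeautifulSoup(html)

--------------- // UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html5lib"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently. ------- /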

python --version Python 3.7.7

PyCharm 19.3.4 CE

— username