文章/答案/技术大牛

发布

社区首页 >问答首页 >我们可以在BeautifulSoup中使用xpath吗？

问我们可以在BeautifulSoup中使用xpath吗？
EN

Stack Overflow用户

提问于 2012-07-13 14:55:20

回答 9查看 219.1K关注 0票数 135

我正在使用BeautifulSoup抓取一个url，我有以下代码

import urllib
import urllib2
from BeautifulSoup import BeautifulSoup

url =  "http://www.example.com/servlet/av/ResultTemplate=AVResult.html"
req = urllib2.Request(url)
response = urllib2.urlopen(req)
the_page = response.read()
soup = BeautifulSoup(the_page)
soup.findAll('td',attrs={'class':'empformbody'})

现在在上面的代码中，我们可以使用

来获取标记和相关信息，但我想使用xpath。可以在BeautifulSoup中使用xpath吗？如果可能，谁能给我提供一个示例代码，以便它将是更有帮助的？

xpath

beautifulsoup

urllib

python

web-scraping

回答 9

Stack Overflow用户

回答已采纳

发布于 2012-07-13 15:31:41

不，BeautifulSoup本身不支持XPath表达式。

另一个库，

lxml

有吗？

支持XPath 1.0。它有一个

BeautifulSoup兼容模式

在那里它会尝试像Soup那样解析损坏的HTML。但是，

默认lxml HTML解析器

在解析损坏的HTML时也做得同样好，而且我相信速度会更快。

一旦将文档解析成lxml树，就可以使用

方法来搜索元素。

try:
    # Python 2
    from urllib2 import urlopen
except ImportError:
    from urllib.request import urlopen
from lxml import etree

url =  "http://www.example.com/servlet/av/ResultTemplate=AVResult.html"
response = urlopen(url)
htmlparser = etree.HTMLParser()
tree = etree.parse(response, htmlparser)
tree.xpath(xpathselector)

还有一个

专用

模块

具有附加功能。

请注意，在上面的示例中，我传递了

对象直接添加到

，因为让解析器直接从流中读取比先将响应读入大字符串更有效。方法来执行相同的操作。

库中，您想要设置

并传入

对象

启用透明传输解压后

import lxml.html
import requests

url =  "http://www.example.com/servlet/av/ResultTemplate=AVResult.html"
response = requests.get(url, stream=True)
response.raw.decode_content = True
tree = lxml.html.parse(response.raw)

您可能感兴趣的是

CSS选择器支持

；

类将CSS语句转换为XPath表达式，使您可以搜索

这就容易多了：

from lxml.cssselect import CSSSelector

td_empformbody = CSSSelector('td.empformbody')
for elem in td_empformbody(tree):
    # Do something with these table cells.

回到原点: BeautifulSoup本身

有吗？

有非常完整的

CSS选择器支持

for cell in soup.select('table#foobar td.empformbody'):
    # Do something with these table cells.

票数 202

Stack Overflow用户

发布于 2012-07-13 19:44:46

我可以确认在Beautiful Soup中没有XPath支持。

票数 163

Stack Overflow用户

发布于 2017-01-07 05:38:08

正如其他人所说，BeautifulSoup不支持xpath。从xpath获取内容的方法可能有很多，包括使用Selenium。然而，这里有一个在Python 2或3中工作的解决方案：

from lxml import html
import requests

page = requests.get('http://econpy.pythonanywhere.com/ex/001.html')
tree = html.fromstring(page.content)
#This will create a list of buyers:
buyers = tree.xpath('//div[@title="buyer-name"]/text()')
#This will create a list of prices
prices = tree.xpath('//span[@class="item-price"]/text()')

print('Buyers: ', buyers)
print('Prices: ', prices)

我用过

这个

作为参考。

票数 53

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/11465555

复制

相似问题

问我们可以在BeautifulSoup中使用xpath吗？
EN

回答 9

Stack Overflow用户

Stack Overflow用户

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问我们可以在BeautifulSoup中使用xpath吗？EN

回答 9

Stack Overflow用户

Stack Overflow用户

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问我们可以在BeautifulSoup中使用xpath吗？
EN