Is there a library that can give me the XPath of every node in an HTML page?
Posted on 2015-02-18 04:31:43
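In case this helps anyone else: if you are using Python with lxml, you first need a tree, and then you query that tree with the XPath expressions Dimitre listed above.
To get the tree, do the following: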
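import lxml
from lxml import html, etree

# Sample markup, deliberately malformed (unclosed tags, mismatched </h3>)
your_webpage_string = "<html><head><title>test<body><h1>page title</h3>"
# The lenient HTML parser repairs the markup while building the tree
bad_html = lxml.html.fromstring(your_webpage_string)
# Serialize the repaired tree so it can be re-parsed as XML
good_html = etree.tostring(bad_html, pretty_print=True).strip()
your_tree = etree.fromstring(good_html)
# xpath() returns a list of the matching Element objects
all_xpaths = your_tree.xpath('//*')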
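On the last line, replace '//*' with whatever XPath you want. Despite its name, all_xpaths
now holds the matching Element objects, not path strings. It is a list that looks like this: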
[<Element html at 0x7ff740b24b90>,
<Element head at 0x7ff740b24d88>,
<Element title at 0x7ff740b24dd0>,
<Element body at 0x7ff740b24e18>,
<Element h1 at 0x7ff740b24e60>]
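The list above contains elements, while the question asks for the XPath of each node. A minimal sketch of one way to get those path strings with lxml follows, using the `getpath()` method of the element tree. The sample markup and the names `page`, `root`, `tree` and `paths` are assumptions made for illustration and are not part of the original answer.

```python
from lxml import html

# Hypothetical, well-formed sample page used only for this illustration
page = (
    "<html><head><title>test</title></head>"
    "<body><h1>page title</h1><p>one</p><p>two</p></body></html>"
)
root = html.fromstring(page)

# getpath() is a method of the ElementTree wrapper, not of single elements
tree = root.getroottree()

# root.iter() walks the root and all descendants in document order;
# keeping only string tags skips comments and processing instructions
paths = [tree.getpath(el) for el in root.iter() if isinstance(el.tag, str)]

for p in paths:
    print(p)
```

Each string in `paths` is an absolute path, so passing it back to `tree.xpath(p)` should select the same element again. Repeated siblings get a positional index, such as `/html/body/p[2]`.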
https://stackoverflow.com/questions/5643323