I need to completely remove elements based on the content of an attribute, using Python's lxml. Example:
import lxml.etree as et
xml="""
<groceries>
<fruit state="rotten">apple</fruit>
<fruit state="fresh">pear</fruit>
<fruit state="fresh">starfruit</fruit>
<fruit state="rotten">mango</fruit>
<fruit state="fresh">peach</fruit>
</groceries>
"""
tree=et.fromstring(xml)
for bad in tree.xpath("//fruit[@state='rotten']"):
    pass  # remove this element from the tree
print(et.tostring(tree, pretty_print=True).decode())
I want it to print this:
<groceries>
<fruit state="fresh">pear</fruit>
<fruit state="fresh">starfruit</fruit>
<fruit state="fresh">peach</fruit>
</groceries>
Is there a way to do this without storing a temporary variable and manually printing into it, like this:
newxml="<groceries>\n"
for elt in tree.xpath("//fruit[@state='fresh']"):
    # encoding="unicode" makes tostring return str instead of bytes
    newxml+=et.tostring(elt, encoding="unicode")
newxml+="</groceries>"
Answered on 2011-11-02 22:22:54
Use the remove method of the XML element:
tree=et.fromstring(xml)
for bad in tree.xpath("//fruit[@state='rotten']"):
    bad.getparent().remove(bad)  # grab the element's parent and call remove directly on it
print(et.tostring(tree, pretty_print=True, xml_declaration=True).decode())
Compared with @Acorn's version, mine also works when the elements to remove are not direct children of the XML root node.
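To illustrate the point about nesting, here is a minimal sketch. The `<basket>` wrapper and the shortened fruit list are made up for this example and are not part of the question's data; the only lxml calls used are `fromstring`, `xpath`, `getparent`, `remove` and `iter`.

```python
import lxml.etree as et

# Hypothetical document: one rotten fruit sits two levels below the root,
# the other is a direct child of the root.
xml = """
<groceries>
  <basket>
    <fruit state="rotten">apple</fruit>
    <fruit state="fresh">pear</fruit>
  </basket>
  <fruit state="rotten">mango</fruit>
</groceries>
"""

tree = et.fromstring(xml)

# xpath() returns a plain list, so removing elements while looping is safe.
for bad in tree.xpath("//fruit[@state='rotten']"):
    # getparent() returns the direct parent, whatever the depth
    bad.getparent().remove(bad)

remaining = [fruit.text for fruit in tree.iter("fruit")]
print(remaining)
```

Calling `tree.remove(bad)` on the root instead would raise a `ValueError` for the apple, because it is a child of `<basket>` and not of `<groceries>`.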
Answered on 2018-12-02 00:33:23
As already mentioned, you can use the remove() method to delete (sub)elements from the tree:
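for bad in tree.xpath("//fruit[@state='rotten']"):
    bad.getparent().remove(bad)
However, this removes the element together with its tail, which is a problem if you are processing documents with mixed content:
<div><fruit state="rotten">avocado</fruit> Hello!</div>
becomes
<div></div>
which I guess is not always what you want :) I have written a helper function that removes only the element and keeps its tail: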
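def remove_element(el):
    parent = el.getparent()
    # el.tail may be None, so guard before calling strip()
    if el.tail and el.tail.strip():
        prev = el.getprevious()
        # compare with None: an element without children is falsy in lxml
        if prev is not None:
            prev.tail = (prev.tail or '') + el.tail
        else:
            parent.text = (parent.text or '') + el.tail
    parent.remove(el)

for bad in tree.xpath("//fruit[@state='rotten']"):
    remove_element(bad)
This way, the tail text is preserved:
<div> Hello!</div>
Answered on 2019-11-23 17:25:54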
You can also solve this with the html module from lxml:
from lxml import html
xml="""
<groceries>
<fruit state="rotten">apple</fruit>
<fruit state="fresh">pear</fruit>
<fruit state="fresh">starfruit</fruit>
<fruit state="rotten">mango</fruit>
<fruit state="fresh">peach</fruit>
</groceries>
"""
tree = html.fromstring(xml)
print("//BEFORE")
print(html.tostring(tree, pretty_print=True).decode("utf-8"))
for i in tree.xpath("//fruit[@state='rotten']"):
    i.drop_tree()
print("//AFTER")
print(html.tostring(tree, pretty_print=True).decode("utf-8"))
It should output the following:
//BEFORE
<groceries>
<fruit state="rotten">apple</fruit>
<fruit state="fresh">pear</fruit>
<fruit state="fresh">starfruit</fruit>
<fruit state="rotten">mango</fruit>
<fruit state="fresh">peach</fruit>
</groceries>
//AFTER
<groceries>
<fruit state="fresh">pear</fruit>
<fruit state="fresh">starfruit</fruit>
<fruit state="fresh">peach</fruit>
</groceries>
https://stackoverflow.com/questions/7981840
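A side note on the drop_tree() approach, as a small sketch: unlike a bare remove(), drop_tree() joins the removed element's tail text onto the previous sibling or the parent, so mixed content survives. The `<div>` snippet below is the illustrative one from the tail discussion in the earlier answer, not part of the question's data.

```python
from lxml import html

# Mixed content: the text " Hello!" is the tail of the <fruit> element.
root = html.fromstring('<div><fruit state="rotten">avocado</fruit> Hello!</div>')

for el in root.xpath("//fruit[@state='rotten']"):
    # drop_tree() removes the element and its children but keeps its tail
    el.drop_tree()

result = html.tostring(root, encoding="unicode")
print(result)
```

Note that drop_tree() is only available on elements created by lxml.html, not on plain lxml.etree elements.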