
<p>Consider running <code>xpath</code> at the <code>&lt;div&gt;</code> level, then reading the child <code>&lt;p&gt;</code> text and the <code>@n</code> attribute from each result separately. The code below uses a list comprehension to build a list of dictionaries holding the needed items. Note that the example XML was repaired by adding a root element and removing the extra <code>&lt;/p&gt;</code> closing tag:</p>
<pre class="lang-py prettyprint-override"><code>from lxml import etree

mystring='''\
&lt;root&gt;
    &lt;div n=&quot;0001&quot; type=&quot;doc&quot; xml:id=&quot;_3168060002&quot;&gt;
       &lt;p xml:id=&quot;_3168060003&quot;&gt;[car 1] Séquence préparatoire pour &lt;p xml:id=&quot;_3168060005&quot;&gt;a) la définition &lt;/p&gt;&lt;/p&gt;
    &lt;/div&gt;
    &lt;div n=&quot;0002&quot; type=&quot;doc&quot; xml:id=&quot;_3168060012&quot;&gt;
       &lt;p xml:id=&quot;_3168060003&quot;&gt;[blue] la voiture pour &lt;p xml:id=&quot;_3168060005&quot;&gt;a) la définition &lt;/p&gt;&lt;/p&gt;
    &lt;/div&gt;
&lt;/root&gt;'''

parser = etree.XMLParser(
    resolve_entities=False, strip_cdata=False, recover=True, ns_clean=True
)
XML_tree = etree.fromstring(mystring, parser=parser)

all_divs = XML_tree.xpath('.//div')
all_divs

div_dict = [
    {'div': div.find(&quot;p&quot;).text if div.find(&quot;p&quot;) is not None else None,
     'n': div.attrib[&quot;n&quot;]} 
    for div in all_divs
]
    
div_dict
# [{'div': '[car 1] Séquence préparatoire pour ', 'n': '0001'},
#  {'div': '[blue] la voiture pour ', 'n': '0002'}]
</code></pre>
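<p>If you would rather keep the original query that selects the <code>&lt;p&gt;</code> elements, lxml's <code>getparent()</code> gives you the enclosing <code>&lt;div&gt;</code>, and <code>itertext()</code> collects the text of the element together with its nested children. The following is only a minimal sketch: the sample XML, the variable names (<code>xml_text</code>, <code>rows</code>) and the shortened <code>xml:id</code> values are made up for illustration and assume well-formed input.</p>

```python
from lxml import etree

# Hypothetical sample with the same shape as the question's data
# (well-formed, with unique xml:id values chosen for illustration).
xml_text = '''\
<root>
    <div n="0001" type="doc" xml:id="d1">
        <p xml:id="p1">[car 1] Séquence préparatoire pour <p xml:id="p2">a) la définition </p></p>
    </div>
    <div n="0002" type="doc" xml:id="d2">
        <p xml:id="p3">[blue] la voiture pour <p xml:id="p4">a) la définition </p></p>
    </div>
</root>'''

tree = etree.fromstring(xml_text.encode("utf-8"))

rows = []
# Select only the <p> elements that are direct children of a <div n="...">.
for para in tree.xpath('.//div[@n]/p[@xml:id]'):
    parent = para.getparent()               # the enclosing <div> element
    full_text = ''.join(para.itertext())    # text of <p> plus any nested elements
    rows.append({
        'n': parent.get('n'),               # attribute read from the parent
        'text': ' '.join(full_text.split()) # collapse runs of whitespace
    })

print(rows)
```

<p>Each dictionary pairs the parent's <code>n</code> attribute with the full text of the paragraph, including the nested <code>&lt;p&gt;</code>.</p>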



<p>A simple alternative:</p>
<pre><code>for car in XML_tree.xpath('//div[@n]'):
    print(car.xpath('@n')[0], car.xpath('normalize-space(.//p[@xml:id]/text())'))
</code></pre>
<p>Output:</p>
<pre><code>0001 [car 1] Séquence préparatoire pour
0002 [blue] la voiture pour
</code></pre>
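<p>The same idea also works in the other direction: starting from each <code>&lt;p&gt;</code> and walking up with the XPath <code>parent::</code> axis. This is a hedged sketch rather than a definitive solution; the sample XML, the shortened <code>xml:id</code> values and the names <code>xml_text</code> and <code>pairs</code> are assumptions made for illustration.</p>

```python
from lxml import etree

# Hypothetical, well-formed sample shaped like the question's data.
xml_text = '''\
<root>
    <div n="0001" type="doc" xml:id="d1">
        <p xml:id="p1">[car 1] Séquence préparatoire pour <p xml:id="p2">a) la définition </p></p>
    </div>
    <div n="0002" type="doc" xml:id="d2">
        <p xml:id="p3">[blue] la voiture pour <p xml:id="p4">a) la définition </p></p>
    </div>
</root>'''

tree = etree.fromstring(xml_text.encode("utf-8"))

pairs = []
for para in tree.xpath('.//div[@n]/p[@xml:id]'):
    # parent::div/@n reads the attribute of the enclosing <div>;
    # string(...) converts the attribute node to a plain string.
    n_value = para.xpath('string(parent::div/@n)')
    # normalize-space(text()) uses the first direct text node of this <p>.
    own_text = para.xpath('normalize-space(text())')
    pairs.append((n_value, own_text))

for n_value, own_text in pairs:
    print(n_value, own_text)
```

<p>Use <code>normalize-space(.)</code> instead of <code>normalize-space(text())</code> if the text of the nested <code>&lt;p&gt;</code> should be included as well.</p>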


<p>I am parsing an XML file with this shape:</p>
<pre><code>from lxml import etree
mystring='''&lt;div n=&quot;0001&quot; type=&quot;doc&quot; xml:id=&quot;_3168060002&quot;&gt;
&lt;p xml:id=&quot;_3168060003&quot;&gt;[car 1] Séquence préparatoire pour &lt;p xml:id=&quot;_3168060005&quot;&gt;a) la définition &lt;/p&gt;&lt;/p&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div n=&quot;0002&quot; type=&quot;doc&quot; xml:id=&quot;_3168060012&quot;&gt;&lt;p xml:id=&quot;_3168060003&quot;&gt;[blue] la voiture pour &lt;p xml:id=&quot;_3168060005&quot;&gt;a) la définition &lt;/p&gt;&lt;/p&gt;&lt;/p&gt;&lt;/div&gt;'''
</code></pre>
<p>I would like to capture everything inside each <code>p</code> tag that follows a <code>div</code>, and also the <code>n</code> attribute of that <code>div</code>.
My parsing strategy is as follows:</p>
<pre><code>parser = etree.XMLParser(resolve_entities=False, strip_cdata=False, recover=True, ns_clean=True)
XML_tree = etree.fromstring(mystring.encode(), parser=parser)
paragraphs = './/div[@n]/p[@xml:id]'
xml_query = paragraphs
all_paras = XML_tree.xpath(xml_query)
for para in all_paras:
    print(para.tag)
</code></pre>
<p>It works, but I don't know how to extract both the full content of the <code>p</code> tag and the <code>n</code> attribute of the <code>div</code> at the same time, since the tag and attributes of each matched element are those of <code>p</code>, not <code>div</code>.</p>
<p>Any idea how I can access the attributes of an element's parent?</p>
<p>Thanks.</p>


Python: accessing the attributes of tags when parsing XML with XPath
