
Parsing XML with pandas

Stack Overflow user
Asked on 2018-07-24 13:18:27
Answers: 1 · Views: 48 · Followers: 0 · Votes: 0

I'm trying to parse an XML file and load the result into a pandas DataFrame.

<?xml version="1.0"?><results>
<header>
  <cloc_url>github.com/AlDanial/cloc</cloc_url>
  <cloc_version>1.74</cloc_version>
  <elapsed_seconds>0.940369129180908</elapsed_seconds>
  <n_files>124</n_files>
  <n_lines>8440</n_lines>
  <files_per_second>131.863112209998</files_per_second>
  <lines_per_second>8975.19892784178</lines_per_second>
  <report_file>/Users/hariomsingh/Desktop/ignitechute/Repo/ignite-chute-aem_cloc.xml</report_file>
</header>
<files>
  <file name="/Users/hariomsingh/Desktop/ignitechute/Repo/ignite-chute-aem/aem-parent/pom.xml" blank="13" comment="23" code="491"  language="Maven" />
  <file name="/Users/hariomsingh/Desktop/ignitechute/Repo/ignite-chute-aem/aem-core/aem-core-bundle/src/test/resources/assets.json" blank="0" comment="0" code="357"  language="JSON" />
  <file name="/Users/hariomsingh/Desktop/ignitechute/Repo/ignite-chute-aem/aem-core/aem-core-bundle/src/main/java/com/chute/aem/core/api/impl/UserServiceImpl.java" blank="26" comment="21" code="202"  language="Java" />

The output should look something like this:

file name                                 blank  comment language code
Repo/ignite-chute-aem/aem-parent/pom.xml"  "13"   "23"     Maven   491
<fullpath>/assets.json"                     "12"   "3"      c       432

So far I have only managed to write a few lines:

import pandas as pd
from xml.etree import ElementTree
tree = ElementTree.parse('/Users/hariomsingh/Desktop/individualxml/ignite-chute-aem_cloc.xml')
root = tree.getroot()

print(root)
print(tree.iter())

csv_data = []
fields =  ['file name','blank','comment', 'language', 'code']

1 Answer

Stack Overflow user

Accepted answer

Answered on 2018-07-24 14:14:56

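Assuming you can install beautifulsoup4 (i.e., pip3 install beautifulsoup4) and pandas (i.e., pip3 install pandas), the following code should do the job. Note that the 'lxml' parser passed to BeautifulSoup below also needs the lxml package (pip3 install lxml).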

from bs4 import BeautifulSoup as Soup
import pandas

xml = """
<?xml version="1.0"?><results>
<header>
  <cloc_url>github.com/AlDanial/cloc</cloc_url>
  <cloc_version>1.74</cloc_version>
  <elapsed_seconds>0.940369129180908</elapsed_seconds>
  <n_files>124</n_files>
  <n_lines>8440</n_lines>
  <files_per_second>131.863112209998</files_per_second>
  <lines_per_second>8975.19892784178</lines_per_second>
  <report_file>/Users/hariomsingh/Desktop/ignitechute/Repo/ignite-chute-aem_cloc.xml</report_file>
</header>
<files>
  <file name="/Users/hariomsingh/Desktop/ignitechute/Repo/ignite-chute-aem/aem-parent/pom.xml" blank="13" comment="23" code="491"  language="Maven" />
  <file name="/Users/hariomsingh/Desktop/ignitechute/Repo/ignite-chute-aem/aem-core/aem-core-bundle/src/test/resources/assets.json" blank="0" comment="0" code="357"  language="JSON" />
  <file name="/Users/hariomsingh/Desktop/ignitechute/Repo/ignite-chute-aem/aem-core/aem-core-bundle/src/main/java/com/chute/aem/core/api/impl/UserServiceImpl.java" blank="26" comment="21" code="202"  language="Java" />
"""

soup = Soup(xml, 'lxml')

records = []

# each <file> tag stores its data in attributes, so .attrs is already a row dict
for file_tag in soup.find_all('file'):
    records.append(file_tag.attrs)

data_table = pandas.DataFrame(records)

# this prints the table without the long file name to ease seeing all other fields
print(data_table.drop('name', axis=1))

# this prints just the names (or at least the bit that pandas prints by default)
print(data_table['name'])

# saving them to disk so you can see the entire table in excel or similar
data_table.to_csv('output.csv', index=False)
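
The same table can probably be built without BeautifulSoup, starting from the ElementTree code in the question. The following is only a minimal sketch under a few assumptions: `SAMPLE_XML` is a shortened copy of the report with the closing `</files></results>` tags added (the excerpt in the question is cut off before them), and `cloc_xml_to_frame` is a hypothetical helper name, not part of pandas or cloc. It also converts the counters to integers, since XML attribute values always arrive as strings.

```python
import io
from xml.etree import ElementTree

import pandas as pd

# Shortened sample. The closing </files></results> tags are an assumption:
# the excerpt in the question is truncated before them.
SAMPLE_XML = """<?xml version="1.0"?><results>
<files>
  <file name="/Users/hariomsingh/Desktop/ignitechute/Repo/ignite-chute-aem/aem-parent/pom.xml" blank="13" comment="23" code="491"  language="Maven" />
  <file name="/Users/hariomsingh/Desktop/ignitechute/Repo/ignite-chute-aem/aem-core/aem-core-bundle/src/test/resources/assets.json" blank="0" comment="0" code="357"  language="JSON" />
</files>
</results>
"""


def cloc_xml_to_frame(source):
    """Return a DataFrame with one row per <file> element (hypothetical helper)."""
    root = ElementTree.parse(source).getroot()
    # Every <file> element keeps its data in attributes, so .attrib is a ready-made row.
    rows = [dict(elem.attrib) for elem in root.iter("file")]
    frame = pd.DataFrame(rows, columns=["name", "blank", "comment", "language", "code"])
    # Attribute values are strings; convert the three counters to integers.
    count_cols = ["blank", "comment", "code"]
    frame[count_cols] = frame[count_cols].astype(int)
    return frame


# A file path works here as well as a file-like object.
df = cloc_xml_to_frame(io.StringIO(SAMPLE_XML))
print(df.drop(columns="name"))
```

With the real report, passing the path of the cloc XML file instead of the `StringIO` object should give one row per counted file.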
Votes: 1
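
On newer pandas versions (1.3 or later, which postdates this answer) there is also `pandas.read_xml`, which may make the manual loop unnecessary. This is a hedged sketch rather than a drop-in replacement: the sample string is shortened, its closing `</files></results>` tags are assumed, and `parser="etree"` is chosen here so that lxml is not required.

```python
import io

import pandas as pd

# Shortened sample; the closing tags are assumed because the original excerpt is truncated.
SAMPLE_XML = """<?xml version="1.0"?><results>
<files>
  <file name="/Users/hariomsingh/Desktop/ignitechute/Repo/ignite-chute-aem/aem-parent/pom.xml" blank="13" comment="23" code="491"  language="Maven" />
  <file name="/Users/hariomsingh/Desktop/ignitechute/Repo/ignite-chute-aem/aem-core/aem-core-bundle/src/test/resources/assets.json" blank="0" comment="0" code="357"  language="JSON" />
</files>
</results>
"""

# xpath selects every <file> element; attrs_only=True builds the columns
# from the element attributes only.
files = pd.read_xml(
    io.StringIO(SAMPLE_XML),
    xpath=".//file",
    attrs_only=True,
    parser="etree",
)

print(files[["blank", "comment", "language", "code"]])
```

Here pandas infers the column types itself, so the counters should come back as numbers rather than strings.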
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/51490944
