You can try something like this: write a generator that yields one dictionary per <code>CI_JOURNAL</code> element, holding that record's values from the XML file. Any value missing from the XML is stored as 'Blank' in the dictionary:

<pre><code>from bs4 import BeautifulSoup

data = """&lt;CI_INFO&gt;
  &lt;CI_JOURNAL&gt;
    &lt;CI_AUTHOR&gt;CAMPBELL D&lt;/CI_AUTHOR&gt;
    &lt;CI_VOLUME&gt;0079&lt;/CI_VOLUME&gt;
    &lt;CI_PAGE&gt;00034&lt;/CI_PAGE&gt;
    &lt;CI_YEAR&gt;2013&lt;/CI_YEAR&gt;
    &lt;CI_TITLE&gt; &lt;![CDATA[ ALASKA MAGAZINE FEB ]]&gt;&lt;/CI_TITLE&gt;
  &lt;/CI_JOURNAL&gt;

  &lt;CI_JOURNAL&gt;
    &lt;CI_AUTHOR&gt;BURKE CH&lt;/CI_AUTHOR&gt;
    &lt;CI_YEAR&gt;1961&lt;/CI_YEAR&gt;
    &lt;CI_TITLE&gt; &lt;![CDATA[ DOCTOR HAP ]]&gt; &lt;/CI_TITLE&gt;
  &lt;/CI_JOURNAL&gt;


  &lt;CI_JOURNAL&gt;
    &lt;CI_YEAR&gt;1905&lt;/CI_YEAR&gt;
    &lt;CI_TITLE&gt; &lt;![CDATA[ REPORT GOVERNOR ALAS ]]&gt;&lt;/CI_TITLE&gt;
  &lt;/CI_JOURNAL&gt;
&lt;/CI_INFO&gt;"""


def parse_data(soup):
    # Stripped text of the child tag `name`, or 'Blank' if that tag is missing
    _text = lambda tag, name: tag.find(name).text.strip() if tag.find(name) else 'Blank'
    for j in soup.select('CI_JOURNAL'):
        d = {}
        d['author'] = _text(j, 'CI_AUTHOR')
        d['vol'] = _text(j, 'CI_VOLUME')
        d['page'] = _text(j, 'CI_PAGE')
        d['year'] = _text(j, 'CI_YEAR')
        d['title'] = _text(j, 'CI_TITLE')
        yield d

# The 'xml' parser requires lxml to be installed
for info in parse_data(BeautifulSoup(data, 'xml')):
    print(info['author'])
    print(info['vol'])
    print(info['page'])
    print(info['year'])
    print(info['title'])
    print('-' * 80)
</code></pre>

This will print:

<pre><code>CAMPBELL D
0079
00034
2013
ALASKA MAGAZINE FEB
--------------------------------------------------------------------------------
BURKE CH
Blank
Blank
1961
DOCTOR HAP
--------------------------------------------------------------------------------
Blank
Blank
Blank
1905
REPORT GOVERNOR ALAS
--------------------------------------------------------------------------------
</code></pre>
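If installing BeautifulSoup and lxml is not an option, the same idea can be sketched with the standard library's <code>xml.etree.ElementTree</code>. This swaps in a different parser and is not the code above: <code>Element.findtext(match, default)</code> returns <code>default</code> only when the child tag is absent, so it plays the role of the <code>_text</code> helper. The <code>FIELDS</code> mapping and the <code>parse_data_stdlib</code> name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Same sample records as above, as raw XML
DATA = """<CI_INFO>
  <CI_JOURNAL>
    <CI_AUTHOR>CAMPBELL D</CI_AUTHOR>
    <CI_VOLUME>0079</CI_VOLUME>
    <CI_PAGE>00034</CI_PAGE>
    <CI_YEAR>2013</CI_YEAR>
    <CI_TITLE> <![CDATA[ ALASKA MAGAZINE FEB ]]></CI_TITLE>
  </CI_JOURNAL>
  <CI_JOURNAL>
    <CI_AUTHOR>BURKE CH</CI_AUTHOR>
    <CI_YEAR>1961</CI_YEAR>
    <CI_TITLE> <![CDATA[ DOCTOR HAP ]]> </CI_TITLE>
  </CI_JOURNAL>
  <CI_JOURNAL>
    <CI_YEAR>1905</CI_YEAR>
    <CI_TITLE> <![CDATA[ REPORT GOVERNOR ALAS ]]></CI_TITLE>
  </CI_JOURNAL>
</CI_INFO>"""

# Output key -> XML child tag (hypothetical mapping mirroring the keys used above)
FIELDS = {
    'author': 'CI_AUTHOR',
    'vol': 'CI_VOLUME',
    'page': 'CI_PAGE',
    'year': 'CI_YEAR',
    'title': 'CI_TITLE',
}


def parse_data_stdlib(xml_text, missing='Blank'):
    """Yield one dict per CI_JOURNAL element; absent child tags map to `missing`."""
    root = ET.fromstring(xml_text)
    for journal in root.iter('CI_JOURNAL'):
        # findtext() falls back to `missing` only when the child tag is absent;
        # CDATA content arrives as ordinary text, so strip() removes the padding.
        yield {key: journal.findtext(tag, default=missing).strip()
               for key, tag in FIELDS.items()}


for info in parse_data_stdlib(DATA):
    print(info)
```

With this sample data the second record gets 'Blank' for <code>vol</code> and <code>page</code>, as in the output above. A tag that is present but empty yields an empty string rather than 'Blank'.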

EDIT:

If you want each field collected into its own list (one list per column), you can do this:

<pre><code>author, vol, page, year, title = [], [], [], [], []
for d in parse_data(BeautifulSoup(data, 'xml')):
    author.append(d['author'])
    vol.append(d['vol'])
    page.append(d['page'])
    year.append(d['year'])
    title.append(d['title'])

print(author)
print(vol)
print(page)
print(year)
print(title)

This prints:

<pre><code>['CAMPBELL D', 'BURKE CH', 'Blank']
['0079', 'Blank', 'Blank']
['00034', 'Blank', 'Blank']
['2013', '1961', '1905']
['ALASKA MAGAZINE FEB', 'DOCTOR HAP', 'REPORT GOVERNOR ALAS']
</code></pre>
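The five <code>append</code> calls can also be collapsed into a single dict comprehension over the records, which keeps working if more fields are added later. In this sketch the <code>records</code> list is hard-coded from the output shown above so it runs without BeautifulSoup; in practice it would be <code>list(parse_data(BeautifulSoup(data, 'xml')))</code>. The <code>FIELDS</code> and <code>columns</code> names are illustrative.

```python
# Records as yielded by parse_data() for the sample data (hard-coded for illustration)
records = [
    {'author': 'CAMPBELL D', 'vol': '0079', 'page': '00034', 'year': '2013', 'title': 'ALASKA MAGAZINE FEB'},
    {'author': 'BURKE CH', 'vol': 'Blank', 'page': 'Blank', 'year': '1961', 'title': 'DOCTOR HAP'},
    {'author': 'Blank', 'vol': 'Blank', 'page': 'Blank', 'year': '1905', 'title': 'REPORT GOVERNOR ALAS'},
]

FIELDS = ['author', 'vol', 'page', 'year', 'title']

# One list per field, in record order; every list has len(records) entries,
# so index i in each list always refers to the same journal
columns = {field: [rec[field] for rec in records] for field in FIELDS}

for field in FIELDS:
    print(columns[field])
```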

EDIT:

For printing with <code>'\t'</code>, you can use this code:

<pre><code>print('&gt;\t' + str(author))
print('\t' + str(vol))
print('\t' + str(page))
print('\t' + str(year))
print('\t' + str(title))
</code></pre>

This will print:

<pre><code>&gt;   ['CAMPBELL D', 'BURKE CH', 'Blank']
    ['0079', 'Blank', 'Blank']
    ['00034', 'Blank', 'Blank']
    ['2013', '1961', '1905']
    ['ALASKA MAGAZINE FEB', 'DOCTOR HAP', 'REPORT GOVERNOR ALAS']
</code></pre>
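If the goal of the <code>'\t'</code> output is really a tab-separated table (one row per journal, one column per field) rather than tab-prefixed lists, the standard library's <code>csv</code> module can write it with <code>delimiter='\t'</code>. This is a different layout from the one above, offered as an assumption about the intent; the records are hard-coded from the earlier output and <code>to_tsv</code> is a hypothetical helper name.

```python
import csv
import io

# Records as yielded by parse_data() for the sample data (hard-coded for illustration)
records = [
    {'author': 'CAMPBELL D', 'vol': '0079', 'page': '00034', 'year': '2013', 'title': 'ALASKA MAGAZINE FEB'},
    {'author': 'BURKE CH', 'vol': 'Blank', 'page': 'Blank', 'year': '1961', 'title': 'DOCTOR HAP'},
    {'author': 'Blank', 'vol': 'Blank', 'page': 'Blank', 'year': '1905', 'title': 'REPORT GOVERNOR ALAS'},
]

FIELDS = ['author', 'vol', 'page', 'year', 'title']


def to_tsv(rows, fieldnames):
    """Return the rows as tab-separated text, preceded by a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames,
                            delimiter='\t', lineterminator='\n')
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


print(to_tsv(records, FIELDS), end='')
```

To write straight to a file instead, pass a file opened with <code>newline=''</code> in place of the <code>StringIO</code> buffer.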

Ignore the paragraphs below.

XML code, a formal recommendation from the World Wide Web Consortium (W3C), is similar to Hypertext Markup Language (HTML). Both XML and HTML contain markup symbols to describe page or file contents. HTML code describes Web page content (mainly text and graphic images) only in terms of how it is to be displayed and interacted with.

XML data is known as self-describing or self-defining, meaning that the structure of the data is embedded with the data, thus when the data arrives there is no need to pre-build the structure to store the data; it is dynamically understood within the XML. The XML format can be used by any individual or group of individuals or companies that want to share information in a consistent way. XML is actually a simpler and easier-to-use subset of the Standard Generalized Markup Language (SGML), which is the standard to create a document structure.

So far, I have used the code below to extract all 5 fields:

<pre><code>import requests
from bs4 import BeautifulSoup
import lxml

# `contents` holds the XML text (obtained elsewhere, not shown here)
soup = BeautifulSoup(contents, 'lxml')

# One list per field; a record that lacks a tag contributes nothing to that list
a = [tag.get_text() for tag in soup.select('cia')]
v = [tag.get_text() for tag in soup.select('civ')]
p = [tag.get_text() for tag in soup.select('cip')]
y = [tag.get_text() for tag in soup.select('ciy')]
t = [tag.get_text() for tag in soup.select('cit')]
print(a)
print(v)
print(p)
print(y)
print(t)
</code></pre>
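A minimal sketch of why the list-per-tag approach cannot produce 'Blank': each list only contains the tags that actually exist, so lists for optional fields come out shorter and stop lining up with the records. It uses the standard library's <code>xml.etree.ElementTree</code> and the <code>CI_*</code> tag names from the answer's sample data; treating those as equivalent to the <code>cia</code>/<code>civ</code> tags in the real file is an assumption.

```python
import xml.etree.ElementTree as ET

# Two records; the second has no CI_VOLUME
DATA = """<CI_INFO>
  <CI_JOURNAL><CI_AUTHOR>CAMPBELL D</CI_AUTHOR><CI_VOLUME>0079</CI_VOLUME></CI_JOURNAL>
  <CI_JOURNAL><CI_AUTHOR>BURKE CH</CI_AUTHOR></CI_JOURNAL>
</CI_INFO>"""

root = ET.fromstring(DATA)

# Collecting each tag across the whole document (like soup.select('civ'))
authors = [el.text for el in root.iter('CI_AUTHOR')]
volumes = [el.text for el in root.iter('CI_VOLUME')]
print(authors)  # two entries
print(volumes)  # one entry: the missing volume leaves no trace

# Looking the tag up inside each record keeps the lists aligned
volumes_aligned = [j.findtext('CI_VOLUME', default='Blank') for j in root.iter('CI_JOURNAL')]
print(volumes_aligned)
```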

Python: if the XML tag doesn't exist, I need to print 'Blank' along with the output
