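How can I extract the agency fees, bedrooms, and bathrooms information using Beautiful Soup in Python? This is the markup from the page I am scraping: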
<ul class="important-fields">
<li class="">
<span> Agency Fees: </span>
<strong> AED 5000 </strong>
</li>
<li class="">
<span> Bedrooms: </span>
<strong> Studio </strong>
</li>
<li class="">
<span> Bathrooms: </span>
<strong> 1 </strong>
</li>
<li>
</ul>

Posted on 2014-02-02 16:01:57
>>> from bs4 import BeautifulSoup
>>>
>>> html = '''
... <ul class="important-fields">
... <li class="">
... <span> Agency Fees: </span>
... <strong> AED 5000 </strong>
... </li>
... <li class="">
... <span> Bedrooms: </span>
... <strong> Studio </strong>
... </li>
... <li class="">
... <span> Bathrooms: </span>
... <strong> 1 </strong>
... </li>
... </ul>
... '''
>>>
>>> soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly to avoid the "no parser specified" warning
>>> spans = [x.text.strip() for x in soup.select('ul.important-fields li span')]
>>> strongs = [x.text.strip() for x in soup.select('ul.important-fields li strong')]
>>> spans
[u'Agency Fees:', u'Bedrooms:', u'Bathrooms:']
>>> strongs
[u'AED 5000', u'Studio', u'1']
>>> for name, value in zip(spans, strongs):
...     print('{} {}'.format(name, value))
...
Agency Fees: AED 5000
Bedrooms: Studio
Bathrooms: 1

Posted on 2014-02-02 15:54:57
You can use the lxml library in Python with XPath (http://www.w3schools.com/xpath/) to extract data from HTML; examples can be found in the lxml tutorial (http://lxml.de/tutorial.html).
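
To illustrate the XPath idea, here is a minimal sketch rather than the answerer's own code. It swaps in the standard library's xml.etree.ElementTree, which supports only a limited subset of XPath, so that it runs without installing anything. It assumes the markup is well-formed XML (the sample is wrapped in a hypothetical <div> and the stray <li> is left out); real pages often are not well-formed, and for those lxml's HTML parser would be the better fit. The `fields` dictionary is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Sample markup based on the question, wrapped in a <div> (an assumption
# for this sketch) so the XPath expression has a parent to search from.
html = '''
<div>
<ul class="important-fields">
<li class=""><span> Agency Fees: </span><strong> AED 5000 </strong></li>
<li class=""><span> Bedrooms: </span><strong> Studio </strong></li>
<li class=""><span> Bathrooms: </span><strong> 1 </strong></li>
</ul>
</div>
'''

root = ET.fromstring(html.strip())

# Select every <li> directly under the <ul class="important-fields">.
fields = {}
for li in root.findall(".//ul[@class='important-fields']/li"):
    # Each <li> holds a <span> label and a <strong> value.
    label = li.findtext('span', default='').strip().rstrip(':')
    value = li.findtext('strong', default='').strip()
    fields[label] = value

for label, value in fields.items():
    print('{}: {}'.format(label, value))
```

Reading the label and value from the same <li> keeps each pair together even if one item on the page is missing its <strong> element, which two separately collected lists would not guarantee.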
https://stackoverflow.com/questions/21512572