How do I retrieve the links of a webpage and copy the URL addresses of the links using Python?
Posted on 2009-07-03 18:53:56
Here is a short snippet using the SoupStrainer class in BeautifulSoup:
import httplib2
from bs4 import BeautifulSoup, SoupStrainer

http = httplib2.Http()
status, response = http.request('http://www.nytimes.com')

# Passing an explicit parser avoids bs4's "no parser specified" warning
for link in BeautifulSoup(response, 'html.parser', parse_only=SoupStrainer('a')):
    if link.has_attr('href'):
        print(link['href'])
The BeautifulSoup documentation is actually quite good and covers many typical scenarios:
https://www.crummy.com/software/BeautifulSoup/bs4/doc/
Edit: Note that I used the SoupStrainer class because it is a bit more efficient (in memory and speed) if you know in advance what you are going to parse.
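As an aside, in a Python 3 environment without BeautifulSoup installed, the standard library's html.parser can extract the same href attributes. This is a minimal sketch of that alternative, not the answer author's approach; the inline HTML string is a stand-in for a fetched page body:

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag seen while parsing."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == 'a':
            href = dict(attrs).get('href')
            if href is not None:
                self.links.append(href)

# Stand-in for the body of a fetched page.
html = '<a href="/one">1</a><p>text</p><a name="x">no href</a><a href="/two">2</a>'
collector = LinkCollector()
collector.feed(html)
print(collector.links)  # ['/one', '/two']
```

Anchors without an href (like the `<a name="x">` above) are skipped rather than raising an error.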
Posted on 2009-07-03 18:37:54
import urllib2
import BeautifulSoup

request = urllib2.Request("http://www.gpsbasecamp.com/national-parks")
response = urllib2.urlopen(request)
soup = BeautifulSoup.BeautifulSoup(response)
for a in soup.findAll('a'):
    # a.get() avoids a KeyError on anchors that have no href attribute
    if 'national-park' in (a.get('href') or ''):
        print 'found a url with national-park in the link'
Posted on 2014-02-07 22:17:08
The following code retrieves all of the links available on a webpage using urllib2 and BeautifulSoup4:
import urllib2
from bs4 import BeautifulSoup

url = urllib2.urlopen("http://www.espncricinfo.com/").read()
# An explicit parser avoids bs4's "no parser specified" warning
soup = BeautifulSoup(url, 'html.parser')
for line in soup.find_all('a'):
    print(line.get('href'))
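Note that hrefs collected this way are often relative (e.g. "/ci/content/..."). If absolute URLs are needed, the standard library's urllib.parse.urljoin can resolve them against the page's base URL; the sample hrefs below are hypothetical:

```python
from urllib.parse import urljoin

base = "http://www.espncricinfo.com/"
# Hypothetical hrefs as they might come out of soup.find_all('a')
hrefs = ["/ci/content/story/1.html", "http://other.example/x", "#top"]

# urljoin leaves already-absolute URLs alone and resolves the rest
absolute = [urljoin(base, h) for h in hrefs]
print(absolute)
```

Relative paths and fragments are resolved against the base, while fully qualified URLs pass through unchanged.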
https://stackoverflow.com/questions/1080411