Question: Using Beautiful Soup in Python 2

Stack Overflow user

Asked 2016-02-03 12:52:42

1 answer · 372 views · 0 followers · 1 vote

I'm trying to build a basic web crawler with Beautiful Soup in Python 2.7. Here is my code:

import re
import httplib
import urllib2
from urlparse import urlparse
from bs4 import BeautifulSoup

regex = re.compile(
        r'^(?:http|https)s?://' # http:// or https://
        r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|' #domain...
        r'localhost|' #localhost...
        r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})' # ...or ip
        r'(?::\d+)?' # optional port
        r'(?:/?|[/?]\S+)$', re.IGNORECASE)

def isValidUrl(url):
    if regex.match(url) is not None:
        return True;
    return False

def crawler(SeedUrl):
    tocrawl=[SeedUrl]
    crawled=[]
    while tocrawl:
        page=tocrawl.pop()
        print 'Crawled:'+page
        pagesource=urllib2.urlopen(page)
        s=pagesource.read()
        soup=BeautifulSoup.BeautifulSoup(s)
        links=soup.findAll('a',href=True)        
        if page not in crawled:
            for l in links:
                if isValidUrl(l['href']):
                    tocrawl.append(l['href'])
            crawled.append(page)   
    return crawled

crawler('https://www.google.co.in/?gfe_rd=cr&ei=SfWxVs65JK_v8we9zrj4AQ&gws_rd=ssl')
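The `isValidUrl` filter above only lets absolute URLs through, so relative links such as `/intl/en/about.html` are dropped rather than resolved against the current page (in Python 2, `urlparse.urljoin` is the usual way to resolve them). The scheme part `(?:http|https)s?` also accepts a malformed `httpss://`; `https?://` is presumably what was meant. The sketch below checks both points with the question's pattern and a variant in which only the scheme is changed; the names `HOST_AND_PATH`, `LOOSE`, `STRICT` and `is_valid_url` are illustrative, not part of the original code.

```python
import re

# Host, optional port and path: copied unchanged from the question's regex.
HOST_AND_PATH = (
    r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'  # domain...
    r'localhost|'                                     # ...or localhost
    r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'           # ...or IPv4 address
    r'(?::\d+)?'                                      # optional port
    r'(?:/?|[/?]\S+)$')                               # optional path/query

# The question's scheme prefix: as written it also matches "httpss://".
LOOSE = re.compile(r'^(?:http|https)s?://' + HOST_AND_PATH, re.IGNORECASE)

# Variant that accepts only "http://" or "https://".
STRICT = re.compile(r'^https?://' + HOST_AND_PATH, re.IGNORECASE)


def is_valid_url(url, pattern=LOOSE):
    """Return True if url is an absolute URL accepted by pattern."""
    return pattern.match(url) is not None
```

Relative hrefs fail both patterns, which is why the crawler never follows them.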

I get this error:

Crawled:rd=ssl
Traceback (most recent call last):
  File "web_crawler_python_2.py", line 38, in <module>
    crawler('rd=ssl')
  File "web_crawler_python_2.py", line 29, in crawler
    soup=BeautifulSoup.BeautifulSoup(s)
AttributeError: type object 'BeautifulSoup' has no attribute 'BeautifulSoup'

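I've tried many times but can't manage to debug it. Can anyone tell me what the problem is? (By the way, I know many websites don't allow crawling; I'm only doing this to learn.)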

Thanks, any help would be greatly appreciated.

Source of the code: Simple Web Crawler

python

beautifulsoup

web-crawler

1 Answer

Stack Overflow user

Accepted answer

Posted 2016-02-03 12:55:15

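The BeautifulSoup class has no attribute called BeautifulSoup, so I'm not sure why you are calling it that way. Here is the example from the documentation: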

from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, 'html.parser')

You need to replace:

BeautifulSoup.BeautifulSoup

with

BeautifulSoup
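The error comes from how the name is bound. `from bs4 import BeautifulSoup` puts the class itself in scope, so `BeautifulSoup.BeautifulSoup` asks that class for an attribute it does not have; the doubled, module-qualified spelling only works when the first name refers to a module (the older BeautifulSoup 3 package, imported with `import BeautifulSoup`, was laid out that way). Because bs4 may not be installed everywhere, the sketch below reproduces the same mistake with the standard library's `datetime`, which also has a module and a class with the same name. It is an analogy, not the bs4 code itself; the function names are illustrative.

```python
def module_import_style():
    import datetime                       # binds the *module*
    return datetime.datetime(2016, 2, 3)  # module.Class: works


def class_import_style():
    from datetime import datetime         # binds the *class* itself
    return datetime(2016, 2, 3)           # call the class directly: works


def doubled_name():
    from datetime import datetime         # binds the *class* itself
    try:
        return datetime.datetime(2016, 2, 3)  # Class.Class: no such attribute
    except AttributeError as exc:
        return 'AttributeError: %s' % exc
```

`doubled_name()` fails for the same reason as `BeautifulSoup.BeautifulSoup(s)` in the question: the class is being treated as if it were the module that contains it.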

Votes: 2
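With that one-line change the question's crawler gets past the `AttributeError`, but the loop still downloads a page before checking `crawled`, so repeated links are fetched again, and nothing bounds how long it runs. The sketch below is one hypothetical way to tighten it, not the original tutorial's code: it keeps the same depth-first `pop()` order, records a URL in `seen` when it is queued so no page is fetched twice, caps the number of fetches, and passes an explicit parser to BeautifulSoup as in the documentation example. `crawl`, `extract_links`, `fetch`, `get_links`, `is_valid` and `max_pages` are illustrative names.

```python
def extract_links(html):
    """Return the href of every <a> tag that has one (BeautifulSoup 4)."""
    # "from bs4 import BeautifulSoup" binds the class itself, so it is
    # called directly, not as BeautifulSoup.BeautifulSoup(...).
    from bs4 import BeautifulSoup
    soup = BeautifulSoup(html, 'html.parser')
    return [a['href'] for a in soup.find_all('a', href=True)]


def crawl(seed_url, fetch, get_links=extract_links,
          is_valid=lambda url: True, max_pages=10):
    """Depth-first crawl from seed_url, like the question's crawler().

    fetch(url) must return the page's HTML; get_links(html) must return
    a list of hrefs. A URL is added to `seen` when it is queued, so no
    page is fetched twice, and the crawl stops after max_pages fetches.
    """
    to_crawl = [seed_url]
    seen = set([seed_url])          # URLs already queued or crawled
    crawled = []
    while to_crawl and len(crawled) < max_pages:
        page = to_crawl.pop()       # pop() from the end: depth-first order
        html = fetch(page)
        crawled.append(page)
        for href in get_links(html):
            if is_valid(href) and href not in seen:
                seen.add(href)
                to_crawl.append(href)
    return crawled
```

In Python 2.7 it could be called as `crawl(seed, fetch=lambda url: urllib2.urlopen(url).read(), is_valid=isValidUrl)`, reusing the question's `isValidUrl`. Injecting `fetch` and `get_links` keeps the loop testable without network access.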

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/35177635
