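I downloaded the source code from here. I am trying to run the example from chapter 4 of Toby Segaran's book "Programming Collective Intelligence". My Python version is 2.7.2. I typed this code into the interpreter: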
import searchengine
pages=['http://en.wikipedia.org/wiki/Programming_language']
crawler = searchengine.crawler('searchindex.db')
crawler.crawl(pages)
and I get the message:
Could not open http://en.wikipedia.org/wiki/Programming_language
or sometimes the messages:
Indexing http://en.wikipedia.org/wiki/Programming_language
Could not parse page http://en.wikipedia.org/wiki/Programming_language
In short, the crawler does not index the page. What am I doing wrong?
Answered on 2013-12-24 16:45:56
In def separateWords(self,text), change the capital W to lowercase, and in gettextonly(self,soup), change v==Null to v==None. You also have to run the following steps first:
>>> crawler=searchengine.crawler('searchindex.db')
>>> crawler.createindextables()
>>> crawler=searchengine.crawler('searchindex.db')
Then try running page=['***'] and the remaining steps.
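To see why a typo like v==Null could show up only as "Could not parse page", here is a minimal sketch. It assumes the crawler wraps its indexing step in a broad try/except, which the generic message suggests but which I have not checked against the downloaded source. MiniIndexer, its index method, and the dict-based page are hypothetical stand-ins for illustration, not the real searchengine code, and the fixed version uses "is None", the idiomatic form of the v==None comparison.

```python
import re


class MiniIndexer(object):
    """Hypothetical stand-in for the crawler; not the real searchengine code."""

    def separatewords(self, text):
        # Split on runs of non-word characters and lowercase each word.
        return [w.lower() for w in re.split(r'\W+', text) if w]

    def gettextonly_broken(self, node):
        # Bug: Python has no name `Null`, so this line raises NameError at run time.
        if node.get('string') == Null:
            children = node.get('children', [])
            return ' '.join(self.gettextonly_broken(c) for c in children)
        return node['string'].strip()

    def gettextonly(self, node):
        # Fix: compare against None instead of the undefined name Null.
        text = node.get('string')
        if text is None:
            children = node.get('children', [])
            return ' '.join(self.gettextonly(c) for c in children)
        return text.strip()

    def index(self, node, extract):
        # A broad except turns any bug into the same generic message;
        # the exception type is appended here so the real cause is visible.
        try:
            return self.separatewords(extract(node))
        except Exception as exc:
            return 'Could not parse page (%s)' % type(exc).__name__


# A tiny fake page: a parent node with no text of its own and two text children.
page = {'string': None,
        'children': [{'string': 'Hello '}, {'string': 'World'}]}

indexer = MiniIndexer()
broken_result = indexer.index(page, indexer.gettextonly_broken)
fixed_result = indexer.index(page, indexer.gettextonly)
```

With the broken comparison, the index call returns the generic failure message with NameError attached; with the fix, it returns the list of lowercased words. When debugging the real crawler, temporarily printing the exception inside its except block should reveal the underlying error in the same way.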
https://stackoverflow.com/questions/15730091