When I run the following Python code:
from urllib.request import urlopen
from bs4 import BeautifulSoup
import datetime
import random
import pymysql
import re

conn = pymysql.connect(host='127.0.0.1', user='root', passwd='mypass',
                       db='mysql', charset='utf8')
cur = conn.cursor()
cur.execute('USE scraping')
random.seed(datetime.datetime.now())

def store(title, content):
    cur.execute("DROP TABLE IF EXISTS pages")
    sql = """CREATE TABLE pages (id BIGINT(7) NOT NULL AUTO_INCREMENT, title VARCHAR(200)
    , content VARCHAR(10000), created TIMESTAMP DEFAULT CURRENT_TIMESTAMP, PRIMARY KEY(id))"""
    cur.execute(sql)
    cur.execute("""INSERT INTO pages (title, content) VALUES ("%s", "%s")""", (title, content))
    cur.connection.commit()

def getLinks(articleUrl):
    html = urlopen('http://en.wikipedia.org'+articleUrl)
    bs = BeautifulSoup(html, 'html.parser')
    title = bs.find('h1').get_text()
    content = bs.find('div', {'id':'mw-content-text'}).find('p').get_text
    store(title, content)
    #return bs.find('div',{'id':'bodyContent'}).findAll('a',href=re.compile('^(/wiki/)((?!:).)*$'))

links = getLinks('/wiki/Kevin_Bacon')
#print(links)
I get this error message:
AttributeError: 'function' object has no attribute 'translate'
As far as I can tell, the failure happens at this line:
cur.execute("""INSERT INTO pages (title, content) VALUES ("%s", "%s")""", (title, content))
I tried to track the problem down by looking at these frames of the traceback:
- File "C:\Users\mypath\PycharmProjects\Scraper\venv\lib\site-packages\pymysql\converters.py", line 118, in escape_unicode
return u"'%s'" % _escape_unicode(value)
- File "C:\Users\mypath\PycharmProjects\Scraper\venv\lib\site-packages\pymysql\converters.py", line 73, in _escape_unicode
return value.translate(_escape_table)
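The last two frames show where it breaks: PyMySQL's escape_unicode calls value.translate(_escape_table), which only works when value is a str. The sketch below reproduces that failure without a database. `escape_like_pymysql` and `Paragraph` are hypothetical stand-ins, not PyMySQL or BeautifulSoup APIs, and the escape table is deliberately simplified.

```python
# Hypothetical, simplified stand-in for PyMySQL's string escaping; per the
# traceback, the real escape_unicode also calls value.translate(...).
_ESCAPE_TABLE = {ord("\\"): "\\\\", ord("'"): "\\'"}


def escape_like_pymysql(value):
    """Quote and escape a value the way a str encoder would."""
    return "'%s'" % value.translate(_ESCAPE_TABLE)


class Paragraph:
    """Hypothetical stand-in for a BeautifulSoup tag with a get_text() method."""

    def __init__(self, text):
        self._text = text

    def get_text(self):
        return self._text


p = Paragraph("Kevin Bacon's early life")

# Calling the method yields a str, which has .translate():
print(escape_like_pymysql(p.get_text()))  # 'Kevin Bacon\'s early life'

# Leaving off the parentheses passes the method object itself, which has no
# .translate(), so the escaper raises AttributeError:
try:
    escape_like_pymysql(p.get_text)
except AttributeError as exc:
    print("AttributeError:", exc)
```

In this sketch the message names a 'method' object, while the question's message says 'function'. The type name can differ depending on how the attribute is defined, but the cause is the same: a callable, not a string, reached the escaper.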
Any idea what might be causing this?
Posted on 2018-06-07 07:08:56
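You forgot the parentheses on the get_text function call. Without them, content is the method object itself rather than the paragraph's text. It should be: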
content = bs.find('div', {'id':'mw-content-text'}).find('p').get_text()
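Separately from the missing parentheses, the question's INSERT wraps each %s in double quotes. PyMySQL already quotes and escapes string parameters, so with the extra quotes the stored values would likely end up containing literal single quotes. Below is a minimal sketch of store with unquoted placeholders and an early type check. It assumes the pages table already exists (created once, not dropped on every call) and that `cur` is a DB-API cursor such as a PyMySQL cursor. Passing `cur` as an argument and adding the type check are illustrative changes, not part of the original answer.

```python
def store(cur, title, content):
    """Insert one row into the `pages` table.

    Assumes `cur` is a DB-API cursor (for example a PyMySQL cursor) whose
    `connection` attribute has `commit()`, and that `pages` already exists.
    """
    # Fail early with a readable message if, say, `get_text` was passed
    # without parentheses instead of the string it returns.
    for name, value in (("title", title), ("content", content)):
        if not isinstance(value, str):
            raise TypeError(f"{name} must be str, got {type(value).__name__}")
    # Leave %s unquoted: the driver adds the quotes when it escapes each value.
    cur.execute(
        "INSERT INTO pages (title, content) VALUES (%s, %s)",
        (title, content),
    )
    cur.connection.commit()
```

In getLinks this would be called as store(cur, title, content) after content has been assigned with get_text().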
https://stackoverflow.com/questions/50730264