So I tried to write a simple function that cleans up the text and summarizes it:
def getTextWaPo(url):
    page = urllib2.urlopen(url).read().decode('utf8')
    soup = BeautifulSoup(page, "lxml")
    text = ' '.join(map(lambda p: p.text, soup.find_all('article')))
    return text.encode('ascii', errors='replace').replace("?"," ")

But with this code I get the following error:
  File "Autosummarizer.py", line 12, in getTextWaPo
    return text.encode('ascii', errors='replace').replace("?"," ")
TypeError: a bytes-like object is required, not 'str'

line 12 ==> text = getTextWaPo(articleURL)

What should I do?
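Answered 2018-12-11 18:37:36
On line 12 you encode the data, so the result is a bytes object and replace() must be given bytes as well, as in replace(b"?", b" ").
The code would look like this: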
from urllib.request import urlopen
from bs4 import BeautifulSoup

def getTextWaPo(url):
    # Download the page and decode the raw bytes into a str
    page = urlopen(url).read().decode('utf8')
    soup = BeautifulSoup(page, "lxml")
    # Join the text of every <article> element into one string
    text = ' '.join(map(lambda p: p.text, soup.find_all('article')))
    # encode() returns bytes, so replace() needs bytes arguments
    return text.encode('ascii', errors='replace').replace(b"?", b" ")

getTextWaPo("https://stackoverflow.com/")

Answered 2018-12-11 18:31:33
You have to change the last line from return text.encode('ascii', errors='replace').replace("?"," ") to return text.encode('ascii', errors='replace').replace(b"?", b" "), because after encode() you are working with bytes, and bytes can only be replaced with other bytes.
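The difference between the two calls can be reproduced without any network access. The following is only a minimal sketch: the sample string and the variable names are made up for illustration and are not part of the original question.

```python
# Hypothetical sample text containing two non-ASCII characters.
text = "caf\u00e9 na\u00efve"

# str.encode() returns a bytes object; characters that cannot be
# represented in ASCII are replaced with b"?" when errors="replace".
encoded = text.encode("ascii", errors="replace")

# Passing str arguments to bytes.replace() raises the TypeError
# shown in the question.
try:
    encoded.replace("?", " ")
except TypeError as exc:
    error_message = str(exc)

# Passing bytes arguments works and yields bytes.
cleaned_bytes = encoded.replace(b"?", b" ")

# If a str is needed afterwards (for example to feed a summarizer),
# decode the cleaned bytes back to text.
cleaned_str = cleaned_bytes.decode("ascii")

print(error_message)
print(cleaned_bytes)
print(cleaned_str)
```

Note that this approach also turns any genuine question marks in the article into spaces, since they cannot be told apart from the placeholders once the text has been encoded.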
https://stackoverflow.com/questions/53728305