Q: Python issue with serializing the bytes type

Stack Overflow user

Asked on 2020-02-26 18:20:31

Answers: 1 · Views: 149 · Followers: 0 · Votes: 2

I am following a tutorial to build a simple web scraper for a static website, but I get the following error: TypeError: Object of type bytes is not JSON serializable

My code so far is as follows:

from bs4 import BeautifulSoup
import requests
import json

url = 'http://ethans_fake_twitter_site.surge.sh/'
response = requests.get(url, timeout=5)
content = BeautifulSoup(response.content, "html.parser")
tweetArr = []

for tweet in content.findAll('div', attrs = {'class': 'tweetcontainer'}):
    tweetObject = {
        "author": tweet.find('h2', attrs= {'class': 'author'}).text.encode('utf-8'),
        "date": tweet.find('h5', attrs= {'class': 'dateTime'}).text.encode('utf-8'),
        "content": tweet.find('p', attrs= {'class': 'content'}).text.encode('utf-8'),
        "likes": tweet.find('p', attrs= {'class': 'likes'}).text.encode('utf-8'),
        "shares": tweet.find('p', attrs= {'class': 'shares'}).text.encode('utf-8')
    }
    tweetArr.append(tweetObject)
with open('twitterData.json', 'w') as outfile:
    json.dump(tweetArr, outfile)

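The only cause I can think of is that the article uses an earlier version of Python, but the article is recent, so that should not be the case. The code does run and creates the JSON file, but the only data in it is "author":. I apologize if the answer is obvious to some of you, but I am just starting to learn.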

Here is the full error log:

(tutorial-env) C:\Users\afaal\Desktop\python\webscraper>python webscraper.py
Traceback (most recent call last):
  File "webscraper.py", line 20, in <module>
    json.dump(tweetArr, outfile)
  File "C:\Users\afaal\AppData\Local\Programs\Python\Python38\lib\json\__init__.py", line 179, in dump
    for chunk in iterable:
  File "C:\Users\afaal\AppData\Local\Programs\Python\Python38\lib\json\encoder.py", line 429, in _iterencode
    yield from _iterencode_list(o, _current_indent_level)
  File "C:\Users\afaal\AppData\Local\Programs\Python\Python38\lib\json\encoder.py", line 325, in _iterencode_list
    yield from chunks
  File "C:\Users\afaal\AppData\Local\Programs\Python\Python38\lib\json\encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "C:\Users\afaal\AppData\Local\Programs\Python\Python38\lib\json\encoder.py", line 438, in _iterencode
    o = _default(o)
  File "C:\Users\afaal\AppData\Local\Programs\Python\Python38\lib\json\encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type bytes is not JSON serializable
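A likely explanation for the file containing only the "author" key: json.dump writes its output to the file piece by piece, so whatever was emitted before the first bytes value was reached stays in the file when the TypeError is raised. The sketch below reproduces this without any scraping. The tweets list is a hypothetical stand-in for one scraped tweet, and io.StringIO stands in for twitterData.json.

```python
import io
import json

# Hypothetical stand-in for one scraped tweet; the bytes value mimics
# what .text.encode('utf-8') produces in the code above.
tweets = [{"author": "Ethan".encode("utf-8")}]

buffer = io.StringIO()  # in-memory stand-in for twitterData.json
error_message = None
try:
    json.dump(tweets, buffer)
except TypeError as exc:
    error_message = str(exc)

# Whatever json.dump managed to write before it hit the bytes value.
partial_output = buffer.getvalue()

print(error_message)
print(repr(partial_output))
```

The partial output ends right after the first key, which matches the truncated file described above and is not valid JSON on its own.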

python

web-scraping

1 Answer

Stack Overflow user

Posted on 2020-02-27 22:40:05

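OK, so I needed to remove everything after ".text", and I also just needed to Google "JSON serializable" (I had only tried searching for my specific TypeError and did not find anything conclusive). In case any amateurs like me run into the same problem, the correct code is as follows: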

from bs4 import BeautifulSoup
import requests
import json

url = 'http://ethans_fake_twitter_site.surge.sh/'
response = requests.get(url, timeout=5)
content = BeautifulSoup(response.content, "html.parser")
tweetArr = []

for tweet in content.findAll('div', attrs = {'class': 'tweetcontainer'}):
    tweetObject = {
        "author": tweet.find('h2', attrs= {'class': 'author'}).text,
        "date": tweet.find('h5', attrs= {'class': 'dateTime'}).text,
        "content": tweet.find('p', attrs= {'class': 'content'}).text,
        "likes": tweet.find('p', attrs= {'class': 'likes'}).text,
        "shares": tweet.find('p', attrs= {'class': 'shares'}).text
    }
    tweetArr.append(tweetObject)
with open('twitterData.json', 'w') as outfile:
    json.dump(tweetArr, outfile)

All credit goes to @juanpa.arrivillaga, thank you very much for clearing this all up!
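For anyone wondering why removing .encode('utf-8') fixes it: in Python 3, str.encode() returns a bytes object, and the json module only knows how to serialize str, numbers, booleans, None, lists and dicts. The .text of a BeautifulSoup tag is already a str, so it can be dumped directly. Here is a minimal sketch of the difference, using a made-up value instead of the scraped page:

```python
import json

text = "Ethan"                  # a str, like the .text of a BeautifulSoup tag
encoded = text.encode("utf-8")  # bytes: the type json refuses to serialize

try:
    json.dumps({"author": encoded})
    failed = False
except TypeError:
    failed = True

# Passing the str straight through works.
serialized = json.dumps({"author": text})

# If you genuinely start from bytes, decode them back to str first.
round_tripped = json.dumps({"author": encoded.decode("utf-8")})

print(failed, serialized, round_tripped)
```

The .encode('utf-8') pattern is common in older Python 2 tutorials, where str and bytes were the same type, which may be where the original code came from.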

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/60420188
