
Parsing JSON data with Beautiful Soup and output errors

Stack Overflow user
Asked 2019-06-10 07:25:45
1 answer, 75 views, 0 followers, 0 votes

When I run the following code, it produces the errors shown further down:

import requests
import json
from bs4 import BeautifulSoup

JSONDATA = requests.request("GET", "https://thisiscriminal.com/wp-json/criminal/v1/episodes?posts=1000000&page=1")
JSONDATA = JSONDATA.json()

for line in JSONDATA['posts']:
    soup = BeautifulSoup(line['episodeNumber'])
    soup = BeautifulSoup(line['title'])
    soup = BeautifulSoup(line['audioSource'])
    soup = BeautifulSoup(line['large'])
    soup = BeautifulSoup(line['long'])
    print soup.prettify()

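This produced the following errors (I have already tried the various lxml variations it suggests):

  • An lxml issue
  • A complaint about the .mp3 link. That should not matter, though, because the failure is in looking up the 'large' thumbnail. The equivalent fields for title, audioSource and so on do not produce the same error, and judging by the site's data 'large' looks like the right field?

Output errors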

python ./test2.py
./test2.py:14: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

The code that caused this warning is on line 14 of the file ./test2.py. To get rid of this warning, pass the additional argument 'features="lxml"' to the BeautifulSoup constructor.

  soup = BeautifulSoup("features=lxml")(line['episodeNumber'])
./test2.py:16: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

The code that caused this warning is on line 16 of the file     ./test2.py. To get rid of this warning, pass the additional argument 'features="lxml"' to the BeautifulSoup constructor.

  soup = BeautifulSoup(line['title'])
./test2.py:18: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

The code that caused this warning is on line 18 of the file ./test2.py. To get rid of this warning, pass the additional argument 'features="lxml"' to the BeautifulSoup constructor.

  soup = BeautifulSoup(line['audioSource'])

/home/leo/.local/lib/python2.7/site-packages/bs4/__init__.py:335: UserWarning: "https://dts.podtrac.com/redirect.mp3/dovetail.prxu.org/criminal/85cd4e4d-fa8b-4df2-8a8c-78ad0e800574/Episode_116_190504_audition_mix_neg18_part_1.mp3" looks like a URL. Beautiful Soup is not an HTTP client. You should probably use an HTTP client like requests to get the document behind the URL, and feed that document to Beautiful Soup.
  ' that document to Beautiful Soup.' % decoded_markup
Traceback (most recent call last):
  File "./test2.py", line 20, in <module>
    soup = BeautifulSoup(line['large'])
KeyError: 'large'

1 Answer

Stack Overflow user

Accepted answer

Answered 2019-06-10 07:45:44

If you only want to get the data out of the JSON, this works:

import pandas as pd
import requests

# Fetch the episode list and decode the JSON response body.
response = requests.get("https://thisiscriminal.com/wp-json/criminal/v1/episodes?posts=1000000&page=1")
JSONDATA = response.json()

# Flatten the list of posts into a DataFrame and save it as CSV.
# pd.json_normalize needs pandas 1.0 or later; older versions expose it as pd.io.json.json_normalize.
df = pd.json_normalize(JSONDATA['posts'])
df.to_csv('posts.csv')

The first warning appears because BeautifulSoup needs a parser to create a soup object, and none was named explicitly. If you do not have lxml, you can install it:

pip install lxml
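
To silence the parser warning, the parser name can be passed to the constructor explicitly, as the warning text itself suggests. The sketch below is a minimal illustration in Python 3 syntax, not the asker's script. `make_soup` is a hypothetical helper, and it defaults to the standard library's `html.parser` so that it runs without extra packages. Passing `"lxml"` should work the same way once lxml is installed.

```python
from bs4 import BeautifulSoup


def make_soup(markup, parser="html.parser"):
    """Build a soup object with an explicitly named parser.

    Naming the parser avoids the "No parser was explicitly specified" warning.
    """
    return BeautifulSoup(markup, features=parser)


# Hypothetical markup standing in for an HTML-bearing field of a post.
soup = make_soup("<p>Episode <b>116</b></p>")
print(soup.get_text())
```

Note that the suggested argument goes inside the same call as the markup, for example `BeautifulSoup(line['title'], features="lxml")`, not in a separate call.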

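The second warning is about passing a URL to create the soup object. That does not work because, as the warning says, Beautiful Soup is not an HTTP client and does not know how to download the link.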
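
One way to avoid the URL warning is to skip Beautiful Soup for fields that are already plain URLs and use those values directly. The following is only a sketch under assumptions: `looks_like_url` is a hypothetical helper, and the `post` dictionary is made-up data shaped like one entry of `JSONDATA['posts']`, with an example.com address in place of the real audio link.

```python
from urllib.parse import urlparse


def looks_like_url(value):
    """Return True when value is a string with an http(s) scheme and a host."""
    if not isinstance(value, str):
        return False
    parts = urlparse(value.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# Hypothetical record; field names follow the question, values are invented.
post = {
    "title": "Episode 116",
    "audioSource": "https://example.com/audio/episode_116.mp3",
}

for key, value in post.items():
    if looks_like_url(value):
        # A URL is already usable as-is; there is no markup to parse.
        print(key, "is a URL:", value)
    else:
        print(key, "is plain text or markup:", value)
```

If the document behind a link is actually needed, it would have to be fetched first (for example with requests) and the response text handed to Beautiful Soup.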

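Finally, your last error occurs because the JSON behind the link has no key named 'large'.

You need an exception block (try/except KeyError) there.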
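
The exception block mentioned above could look like the sketch below. The `posts` list is invented sample data, not the real API response, and `get_large` is a hypothetical helper name. It returns `None` when a record has no 'large' key so the loop can carry on.

```python
# Hypothetical records; the second one deliberately lacks a 'large' key.
posts = [
    {"title": "Episode 116", "large": "https://example.com/116-large.jpg"},
    {"title": "Episode 115"},
]


def get_large(post):
    """Return the 'large' value, or None when the key is missing."""
    try:
        return post["large"]
    except KeyError:
        return None


thumbnails = [get_large(p) for p in posts]
print(thumbnails)
```

For a simple lookup like this, `post.get("large")` gives the same result without an explicit try/except. The exception form is more useful when the missing key should also be logged or handled differently.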

Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/56519087
