
Question: Getting data from a <script> variable with BeautifulSoup

Stack Overflow user

Asked on 2021-09-22 12:39:45

Answers: 1 | Views: 203 | Followers: 0 | Votes: 0

URL: https://letterboxd.com/film/dogville/

I want to get the film's name and release year with BeautifulSoup.

import requests
from bs4 import BeautifulSoup
url = 'https://letterboxd.com/film/dogville/'
req = requests.get(url)
soup = BeautifulSoup(req.content, 'html.parser')
soup.find_all("script")[10]

Output:

<script>
    var filmData = { id: 51565, name: "Dogville", gwiId: 39220, releaseYear: "2003", posterURL: "/film/dogville/image-150/", path: "/film/dogville/" };



</script>

I managed to get the block, but I don't know how to extract name and releaseYear from it. How do I get them?

Tags: python, web-scraping, beautifulsoup

1 answer

Stack Overflow user

Accepted answer

Answered on 2021-09-22 13:48:13

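The problem is that bs4 is not a JavaScript parser. You have reached its limits and need something else (a JavaScript parser). A weaker workaround is to use the json module from the standard library to turn the dictionary-like string into a Python dictionary.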
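A minimal sketch of the json route, assuming the object literal has only bare identifier keys and double-quoted string or numeric values (true for the filmData snippet in the question). JavaScript allows unquoted keys but JSON does not, so the keys are quoted with a regex before calling json.loads. The regex is an illustration for this input, not a general JavaScript-to-JSON converter.

```python
import json
import re

# Sample text copied from the question's <script> block.
script_text = ('var filmData = { id: 51565, name: "Dogville", gwiId: 39220, '
               'releaseYear: "2003", posterURL: "/film/dogville/image-150/", '
               'path: "/film/dogville/" };')

# Keep everything from the first "{" to the last "}" (the object literal).
obj_text = script_text[script_text.index("{"): script_text.rindex("}") + 1]

# Wrap each bare key in double quotes so the text becomes valid JSON.
# Assumption: keys are plain identifiers that follow "{" or ",".
json_text = re.sub(r'([{,]\s*)([A-Za-z_$][\w$]*)\s*:', r'\1"\2":', obj_text)

film = json.loads(json_text)
print(film["name"], film["releaseYear"])
```

Quoting only the keys keeps the values untouched, so numbers like id stay ints after json.loads.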

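Once you have the string containing the JS code, you can either use a regex to extract the dictionary, or get at it with plain string methods, or some other way.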
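A sketch of the regex route that skips building a dictionary and pulls out only the two fields the question asks for. The helper name js_string_field is hypothetical, and the pattern assumes simple double-quoted values without escaped quotes inside them.

```python
import re

# Sample text copied from the question's <script> block.
script_text = ('var filmData = { id: 51565, name: "Dogville", gwiId: 39220, '
               'releaseYear: "2003", posterURL: "/film/dogville/image-150/", '
               'path: "/film/dogville/" };')


def js_string_field(text, key):
    """Hypothetical helper: return the "..." string value of `key`, or None.

    Assumes the value is a plain double-quoted string with no escaped quotes.
    """
    match = re.search(r'\b' + re.escape(key) + r'\s*:\s*"([^"]*)"', text)
    return match.group(1) if match else None


name = js_string_field(script_text, "name")
release_year = js_string_field(script_text, "releaseYear")
print(name, release_year)
```

This is enough when you only need a couple of string fields; numeric fields such as id would need a different pattern.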

Here it is done the other way, with plain string methods instead of a regex:

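...
import re

# locate the <script> whose text mentions filmData (or whatever finds the right block)
script_text = str(soup.find("script", string=re.compile("filmData")).string)

# here the template: what script_text looks like for this page
script_text = '    var filmData = { id: 51565, name: "Dogville", gwiId: 39220, releaseYear: "2003", posterURL: "/film/dogville/image-150/", path: "/film/dogville/" };'

# strip surrounding whitespace and drop the trailing ";"
script_text = script_text.strip()[:-1]
# substring starting after the 1st {
script_text = script_text[script_text.find('{')+1:]

# turn "key: value" into "key= value" (keyword-argument syntax);
# note: this also replaces any ":" inside the values
script_text = script_text.replace(':', '=')
# find index just past the closing }
i_close = len(script_text) - script_text[::-1].find('}')
# wrap everything before the } in a dict(...) call
script_text_d = 'dict(' + script_text[:i_close-1] + ')'
# evaluate the string (eval runs arbitrary code: only use it on text you trust)
script_text_d = eval(script_text_d)

print(script_text_d)
print(script_text_d['name'])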

Output:

{'id': 51565, 'name': 'Dogville', 'gwiId': 39220, 'releaseYear': '2003', 'posterURL': '/film/dogville/image-150/', 'path': '/film/dogville/'}
Dogville
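Because eval() executes whatever it is given, a more cautious variant of the last step (an alternative, not part of the original answer) parses the same dict(...) string with the standard ast module and accepts only literal values. This sketch assumes the string has exactly the shape built above.

```python
import ast

# The string the answer builds just before calling eval().
script_text_d = ('dict( id= 51565, name= "Dogville", gwiId= 39220, releaseYear= "2003", '
                 'posterURL= "/film/dogville/image-150/", path= "/film/dogville/" )')

# Parse the expression into a syntax tree without executing it.
call = ast.parse(script_text_d, mode="eval").body

# Only accept a plain dict(...) call made of keyword arguments.
if not (isinstance(call, ast.Call) and isinstance(call.func, ast.Name)
        and call.func.id == "dict" and not call.args):
    raise ValueError("expected dict(key=value, ...)")

# literal_eval each value node, so only literals (numbers, strings, ...) pass.
film = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
print(film["name"], film["releaseYear"])
```

Anything other than literals (a function call hidden in a value, for example) makes ast.literal_eval raise ValueError instead of running it.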

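Notes:

I chose to go through the built-in dict constructor to avoid extra work. With json.loads you could keep the {} form, but JSON requires every key to be a double-quoted string, so the keys would have to be quoted first. For anything more robust, use a JavaScript parser.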

Votes: 2

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/69284422
