Q: How can I delay the Python requests library so the data has time to populate?

Stack Overflow user

Asked 2018-06-03 05:08:26

Answers 2 · Views 2.2K · Followers 0 · Votes 0

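I'm trying to get data from a web page that uses .aspx. I can get all of the data except one value, which seems to take a little while to load after the HTML has loaded.

My code currently looks like this: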

import requests  # Requesting HTML
import bs4 as bs  # Parsing HTML

url_two = "https://www.walottery.com/Scratch/Explorer.aspx?id=1463"
r_two = requests.get(url_two)
soup = bs.BeautifulSoup(r_two.text, "lxml")  # the "lxml" parser requires the lxml package
print(soup.find("strong", {"class": "ticket-explorer-detail-info-printed"}))

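However, when I print it, the value is <strong class="ticket-explorer-detail-info-printed">N/A</strong>.

If you use "Inspect Element" on the page, you can see that the element changes from what I pasted above to this: <strong class="ticket-explorer-detail-info-printed">2,428,400</strong>.

How can I add a slight delay so that requests gives me the computed value instead of "N/A"?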

python

beautifulsoup

python-requests

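2 Answers

Stack Overflow user

Accepted answer

Posted 2018-06-03 05:37:21

The page is generated dynamically from JSON embedded in a script element in the HTML. requests only downloads that HTML and never runs the page's JavaScript, so adding a delay won't change what you get back. You can either extract the JSON and parse it to get the data you need, or use Selenium to render the JavaScript on the page. To extract the JSON: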

import json

import requests
from bs4 import BeautifulSoup

url = 'https://www.walottery.com/Scratch/Explorer.aspx?id=1463'
page = requests.get(url)
soup = BeautifulSoup(page.content, "html.parser")

# Find the script element containing the JSON the web page is dynamically generated from.
anchor = "WaLottery.Scratch.data = "
s = soup.find(lambda tag: tag.name == "script" and tag.string is not None and anchor in tag.string)
if s is None:
    raise RuntimeError("Could not find the script element containing the data")

# Extract the JSON: everything between "parse('" and the next "'),".
text = s.string
start = text.find("parse('") + len("parse('")
end = text.find("'),", start)
j = text[start:end]

# Load the JSON.
d = json.loads(j)

# Read the data from the JSON.
for game in d['Games']:
    print(game['Id'], game['TicketsPrinted'])

Output:

1503 3,232,300
1497 2,427,400
1496 2,585,600
1493 3,467,000
1491 2,169,350
1490 2,194,350
1489 3,862,600
1488 4,832,950
1486 1,801,975
1483 2,422,200
1482 2,410,200
1481 2,450,400
1480 1,802,100
1479 1,320,300
1478 1,822,000
1476 5,236,000
1475 3,496,200
1474 3,155,000
1473 2,127,300
1472 1,112,265
1470 2,350,250
1469 3,120,050
1468 955,800
1467 2,161,550
1466 1,339,400
1465 556,000
1464 2,213,350
1463 2,428,400
1462 2,419,600
1461 2,434,600
1460 2,591,900
1459 3,887,000
1458 3,468,500
1457 2,180,300
1456 2,110,100
1455 2,089,200
1454 543,235
1453 2,421,600
1452 2,418,200
1451 2,400,800
1450 3,127,050
1449 2,167,400
1448 2,379,950
1446 4,838,700
1445 1,233,550
1444 2,456,550
1442 1,770,425
1441 3,838,700
1440 13,647,500
1439 3,255,400
1433 2,859,400
1431 3,158,450
1422 3,332,500
1415 5,192,000
1410 1,836,575
1409 3,567,270
1405 2,409,500
1391 2,162,100
1379 2,467,725
1373 3,645,075

The one you were looking at is:

 1463 2,428,400
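If you really do want to wait for the value rather than read the embedded JSON, that has to happen in something that executes JavaScript, which requests does not. The Selenium route mentioned above loads the page in a real browser and supports an explicit wait. A minimal sketch, assuming Selenium 4.6+ and a local Chrome install; fetch_tickets_printed and is_populated are hypothetical helper names, and the CSS selector is built from the markup quoted in the question.

```python
# Assumes: pip install selenium (4.6+), plus a local Chrome install; Selenium
# Manager then locates a matching driver automatically.

# Text treated as "not loaded yet": empty, or the N/A placeholder from the question.
PLACEHOLDERS = {"", "N/A"}
# CSS selector built from the <strong> markup shown in the question.
SELECTOR = "strong.ticket-explorer-detail-info-printed"


def is_populated(text):
    """Return True once the element shows a real value instead of a placeholder."""
    return text.strip() not in PLACEHOLDERS


def fetch_tickets_printed(url, timeout=20):
    """Load url in Chrome and wait up to `timeout` seconds for the value to appear."""
    # Imported inside the function so is_populated works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Explicit wait: re-check the element until the page's JavaScript has
        # replaced the placeholder; raises TimeoutException after `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            lambda d: is_populated(d.find_element(By.CSS_SELECTOR, SELECTOR).text)
        )
        return driver.find_element(By.CSS_SELECTOR, SELECTOR).text
    finally:
        driver.quit()


if __name__ == "__main__":
    print(fetch_tickets_printed("https://www.walottery.com/Scratch/Explorer.aspx?id=1463"))
```

Unlike a fixed time.sleep, the explicit wait returns as soon as the value appears. Starting a full browser is much heavier than parsing the JSON, so this is mainly worth it when the data is not present in the page source at all.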

Votes 2

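Stack Overflow user

Posted 2018-06-03 05:35:44

All the data you need is embedded in a script tag in the HTML. Use BeautifulSoup to read the contents of the script tag, then parse the JSON; the ticket figures are in that JSON object.

Votes 1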
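The same idea can be sketched without third-party packages. This version swaps BeautifulSoup for the standard library's html.parser.HTMLParser and uses a regular expression instead of fixed string offsets to pull out the JSON. SAMPLE_HTML is a hypothetical, trimmed-down stand-in modeled on the anchor and parse('...') delimiters the accepted answer's code relies on; the live page's markup may differ, and ScriptCollector and tickets_printed are illustrative names.

```python
import json
import re
from html.parser import HTMLParser

ANCHOR = "WaLottery.Scratch.data = "
# Non-greedy capture of the JSON text passed to parse('...'), up to the first "'),".
PAYLOAD_RE = re.compile(r"parse\('(.*?)'\),", re.DOTALL)

# Hypothetical page fragment; the $.extend(...) wrapper is an assumed stand-in.
SAMPLE_HTML = """
<html><body>
<strong class="ticket-explorer-detail-info-printed">N/A</strong>
<script>
WaLottery.Scratch.data = $.extend(JSON.parse('{"Games": [{"Id": 1463, "TicketsPrinted": "2,428,400"}]}'), {});
</script>
</body></html>
"""


class ScriptCollector(HTMLParser):
    """Collect the raw text of every <script> element."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script text may arrive in several chunks, so append rather than assign.
        if self._in_script:
            self.scripts[-1] += data


def tickets_printed(html, game_id):
    """Return the TicketsPrinted string for game_id, or None if it is not found."""
    collector = ScriptCollector()
    collector.feed(html)
    for script in collector.scripts:
        if ANCHOR not in script:
            continue
        match = PAYLOAD_RE.search(script)
        if match is None:
            continue
        data = json.loads(match.group(1))
        for game in data["Games"]:
            if game["Id"] == game_id:
                return game["TicketsPrinted"]
    return None


if __name__ == "__main__":
    print(tickets_printed(SAMPLE_HTML, 1463))  # 2,428,400
```

The regex assumes the payload contains no JavaScript-only escape sequences (such as \') that json.loads would reject, so check the real script text before relying on it.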

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/50661184
