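I'm scraping a website with the code below. The site's data comes from an API and covers 10/2019 through 5/2020. The problem is that when I run the program and export the results to Excel, I only get data for 3/2020. I can't find anything in my code that would cause this, so I'm not sure whether it's something on the API side.
Here is the full URL: http://www.nhl.com/stats/skaters?aggregate=0&reportType=game&seasonFrom=20192020&seasonTo=20192020&dateFromSeason&gameType=2&filter=gamesPlayed,gte,1&sort=points,goals,assists&page=0&pageSize=50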
import requests
import pandas as pd
from time import sleep
from random import randint

url = 'https://api.nhle.com/stats/rest/en/skater/summary'
payload = {
    'isAggregate': 'false',
    'isGame': 'true',
    'start': 0,
    'limit': '50',
    'sort': '[{"property":"points","direction":"DESC"},{"property":"goals","direction":"DESC"},{"property":"assists","direction":"DESC"},{"property":"playerId","direction":"ASC"}]',
    'factCayenneExp': 'gamesPlayed>=0',
    'cayenneExp': 'seasonId<="20192020" and seasonId>="20192020" and gameTypeId=2',
}
df1 = pd.DataFrame()
for start in range(0, 100, 100):
    x = randint(3,10)
    sleep(x)
    print('loading:', start)
    payload['loading'] = start
    response = requests.get(url, params=payload)
    data = response.json()
    df1 = df1.append(data['data'], ignore_index=True)
print(df1)
df1.to_excel('NHL Player Game Logs.xlsx', sheet_name='2019-2020')

Posted on 2021-12-10 09:50:31
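Your loop stops once it reaches 100, and with a step of 100 that means it runs only once (for start = 0), so you only get the first 50 rows. I would a) change the limit to 100, since that is the maximum allowed, b) loop until all rows have been retrieved, and c) set the payload key "start" rather than "loading" (which is not a key in the payload). I also don't think you need the sleep: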
import requests
import pandas as pd
from time import sleep
from random import randint

url = 'https://api.nhle.com/stats/rest/en/skater/summary'
payload = {
    'isAggregate': 'false',
    'isGame': 'true',
    'start': '0',
    'limit': '100',
    'sort': '[{"property":"points","direction":"DESC"},{"property":"goals","direction":"DESC"},{"property":"assists","direction":"DESC"},{"property":"playerId","direction":"ASC"}]',
    'factCayenneExp': 'gamesPlayed>=0',
    'cayenneExp': 'seasonId<="20192020" and seasonId>="20192020" and gameTypeId=2',
}

rows = []
start = 0
total = 10000  # number of rows to collect (the run below returned 10,000)
while True:
    #x = randint(3,10)
    #sleep(x)
    print('loading:', start)
    payload['start'] = start  # request the next page of up to 100 rows
    data = requests.get(url, params=payload).json()['data']
    rows += data
    start += 100
    # Stop at the expected total, or early if a page comes back empty,
    # so the loop cannot run forever.
    if len(rows) >= total or not data:
        print(f'Got {len(rows):,} records.')
        break

df1 = pd.DataFrame(rows)
df1.to_excel('NHL Player Game Logs.xlsx', sheet_name='2019-2020')

Output:
print(df1)
assists evGoals evPoints ... skaterFullName teamAbbrev timeOnIcePerGame
0 3 0 2 ... Connor McDavid EDM 1075.0
1 0 3 3 ... Mika Zibanejad NYR 1496.0
2 1 3 4 ... Leon Draisaitl EDM 1077.0
3 2 2 3 ... Tony DeAngelo NYR 1209.0
4 2 1 2 ... Sebastian Aho CAR 931.0
... ... ... ... ... ... ...
9995 1 0 0 ... Torey Krug BOS 1582.0
9996 1 0 0 ... Torey Krug BOS 1086.0
9997 1 0 1 ... Torey Krug BOS 1116.0
9998 1 0 0 ... Torey Krug BOS 1338.0
9999 1 0 0 ... Torey Krug BOS 1196.0
[10000 rows x 29 columns]
https://stackoverflow.com/questions/70218386
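
The two points made in the answer can be checked without touching the network. The sketch below is only an illustration: `paginate`, `fake_fetch` and `fake_rows` are hypothetical names that appear in neither the question nor the answer, and `urlencode` from the standard library stands in for the query string that an HTTP client would build from the payload. It shows that `range(0, 100, 100)` yields a single offset, that writing to `'loading'` leaves `'start'` untouched, and one way to stop paging on a short page instead of relying on a hard-coded total.

```python
from urllib.parse import urlencode


def paginate(fetch_page, page_size=100, max_rows=10_000):
    """Collect rows page by page until the source runs dry or max_rows is hit.

    fetch_page(start, limit) is a hypothetical callable returning a list of rows.
    """
    rows = []
    start = 0
    while len(rows) < max_rows:
        page = fetch_page(start, page_size)
        rows.extend(page)
        # A short (or empty) page means there is nothing left to request.
        if len(page) < page_size:
            break
        start += page_size
    return rows[:max_rows]


# Why the original loop issued a single request: the range holds only 0.
starts = list(range(0, 100, 100))

# Why the offset never changed: the value went under a new 'loading' key,
# so 'start' kept its initial value in the query string.
payload = {'start': 0, 'limit': '50'}
payload['loading'] = 50
query = urlencode(payload)

# Stand-in for the API: 250 fake rows served in slices, no network needed.
fake_rows = [{'row': i} for i in range(250)]


def fake_fetch(start, limit):
    return fake_rows[start:start + limit]


all_rows = paginate(fake_fetch)
print(starts, query, len(all_rows))
```

To use this against the real endpoint, `fetch_page` could wrap the `requests.get(url, params=payload).json()['data']` call from the answer, setting `payload['start']` and `payload['limit']` from its arguments. Whether the API signals the end of the data with a short page is an assumption that should be checked against an actual response.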