
API limit when scraping

Stack Overflow user
Asked 2021-12-03 17:26:04
Answers: 1 · Views: 65 · Followers: 0 · Votes: 0

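I am scraping a website using the code below. The data on the site comes from an API and covers 10/2019 through 5/2020. The problem is that when I run the program and export the results to Excel, I only get data from 3/2020. I can't find anything in the code that would cause this, so I'm not sure whether it is something on the API side.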

Here is the full URL: http://www.nhl.com/stats/skaters?aggregate=0&reportType=game&seasonFrom=20192020&seasonTo=20192020&dateFromSeason&gameType=2&filter=gamesPlayed,gte,1&sort=points,goals,assists&page=0&pageSize=50

url = 'https://api.nhle.com/stats/rest/en/skater/summary'

payload = {
    'isAggregate': 'false',
    'isGame': 'true',
    'start': 0,
    'limit': '50',
    'sort': '[{"property":"points","direction":"DESC"},{"property":"goals","direction":"DESC"},{"property":"assists","direction":"DESC"},{"property":"playerId","direction":"ASC"}]',
    'factCayenneExp': 'gamesPlayed>=0',
    'cayenneExp': 'seasonId<="20192020" and seasonId>="20192020" and gameTypeId=2',
}

for start in range(0, 100, 100):
    x = randint(3,10)

    sleep(x)

    print('loading:', start)

    payload['loading'] = start

    response = requests.get(url, params=payload)

    data = response.json()

    df1 = df1.append(data['data'], ignore_index=True)

print(df1)

df1.to_excel('NHL Player Game Logs.xlsx', sheet_name='2019-2020')

1 Answer

Stack Overflow user

Answered 2021-12-10 09:50:31

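Your loop stops as soon as it reaches 100: range(0, 100, 100) only yields 0, so it runs once and you only get the first 50 rows. I would a) change the limit to 100, since that is the maximum allowed, b) loop until everything has been fetched, and c) set the payload key "start" rather than "loading" (which is not a key in the payload). I also don't think you need the sleep: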

import requests
import pandas as pd
from time import sleep      # only needed if the delay below is re-enabled
from random import randint  # only needed if the delay below is re-enabled

url = 'https://api.nhle.com/stats/rest/en/skater/summary'

payload = {
    'isAggregate': 'false',
    'isGame': 'true',
    'start': '0',
    'limit': '100',  # 100 is the maximum page size allowed
    'sort': '[{"property":"points","direction":"DESC"},{"property":"goals","direction":"DESC"},{"property":"assists","direction":"DESC"},{"property":"playerId","direction":"ASC"}]',
    'factCayenneExp': 'gamesPlayed>=0',
    'cayenneExp': 'seasonId<="20192020" and seasonId>="20192020" and gameTypeId=2',
}

rows = []
start = 0
total = 10000  # number of records expected for this query
while True:
    #x = randint(3,10)
    #sleep(x)

    print('loading:', start)
    payload['start'] = start  # the paging key is "start", not "loading"

    data = requests.get(url, params=payload).json()['data']

    rows += data
    start += 100

    # Stop at the expected total, or when a page comes back empty,
    # so the loop cannot run forever if fewer rows exist.
    if len(rows) >= total or not data:
        print(f'Got {len(rows):,} records.')
        break

df1 = pd.DataFrame(rows)
df1.to_excel('NHL Player Game Logs.xlsx', sheet_name='2019-2020')

Output:

print(df1)
      assists  evGoals  evPoints  ...  skaterFullName teamAbbrev  timeOnIcePerGame
0           3        0         2  ...  Connor McDavid        EDM            1075.0
1           0        3         3  ...  Mika Zibanejad        NYR            1496.0
2           1        3         4  ...  Leon Draisaitl        EDM            1077.0
3           2        2         3  ...   Tony DeAngelo        NYR            1209.0
4           2        1         2  ...   Sebastian Aho        CAR             931.0
      ...      ...       ...  ...             ...        ...               ...
9995        1        0         0  ...      Torey Krug        BOS            1582.0
9996        1        0         0  ...      Torey Krug        BOS            1086.0
9997        1        0         1  ...      Torey Krug        BOS            1116.0
9998        1        0         0  ...      Torey Krug        BOS            1338.0
9999        1        0         0  ...      Torey Krug        BOS            1196.0

[10000 rows x 29 columns]
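
If you would rather not hard-code the expected total, one possible variation is to keep requesting pages until a page comes back shorter than the limit. The following is only a sketch of that idea: fetch_all, fetch_page, max_rows, and the fake in-memory data are hypothetical names introduced for illustration, and it assumes the API returns a short or empty page once the rows run out.

```python
from typing import Callable, Dict, List


def fetch_all(fetch_page: Callable[[int, int], List[Dict]],
              limit: int = 100,
              max_rows: int = 10_000) -> List[Dict]:
    """Collect pages until a short/empty page arrives or max_rows is reached."""
    rows: List[Dict] = []
    start = 0
    while len(rows) < max_rows:
        page = fetch_page(start, limit)
        rows.extend(page)
        if len(page) < limit:  # short or empty page: nothing left to fetch
            break
        start += limit
    return rows[:max_rows]


# Stand-in for the real API call (hypothetical data, no network access).
fake_table = [{"playerId": i} for i in range(250)]


def fake_fetch(start: int, limit: int) -> List[Dict]:
    """Return one 'page' of the fake table, like a start/limit query would."""
    return fake_table[start:start + limit]


all_rows = fetch_all(fake_fetch, limit=100)
print(len(all_rows))  # 250
```

To use it against the real endpoint, fetch_page would be a small function that sets payload['start'] and payload['limit'], calls requests.get(url, params=payload), and returns the 'data' list from the JSON response.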
Votes: 0
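Original page content provided by Stack Overflow. Translation support for the Chinese version provided by Tencent Cloud's Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/70218386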
