Getting the next page when scraping an API

Stack Overflow user
Asked on 2018-07-04 08:34:24
2 answers · 115 views · 0 followers · 0 votes

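I am trying to use pagination and move on to the next page once the current page has been scraped. This is my first time scraping an API, so I am a bit lost, and I have not found anything helpful online.

Question: what do I need to do to get to the next page?

API endpoint: https://games.crossfit.com/competitions/api/v1/competitions/open/2018/leaderboards?division=2&region=0&scaled=0&sort=0&occupation=0&page=1

Code (what I have so far):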

Code language: python
import json

import pandas as pd
import requests

url = 'https://games.crossfit.com/competitions/api/v1/competitions/open/2018/leaderboards?division=1&region=0&scaled=0&sort=0&occupation=0&page=1'

nameList = []
genderList = []
regionList = []
gymList = []
ageList = []
heightList = []
weightList = []
ordList = []
overallList = []
overallScoreList = []

response = requests.get(url)
data = response.text
parsed = json.loads(data)

# Competition metadata
year = parsed['competition']['year']
comp = parsed['competition']['competitionType']
board = parsed['leaderboardRows']

# One row per athlete; the name `row` avoids shadowing the built-in all()
for row in board:
    entrant = row['entrant']
    name = entrant['competitorName']
    gender = entrant['gender']
    region = entrant['regionName']
    gym = entrant['affiliateName']
    age = entrant['age']
    overall = row['overallRank']
    overallS = row['overallScore']
    height = entrant['height']
    weight = entrant['weight']

    nameList.append(name)
    genderList.append(gender)
    regionList.append(region)
    gymList.append(gym)
    ageList.append(age)
    heightList.append(height)
    weightList.append(weight)
    overallList.append(overall)
    overallScoreList.append(overallS)

2 Answers

Stack Overflow user

Accepted answer

Answered on 2018-07-04 15:51:51

A quick and simple way to do it looks like this:

Code language: python
import requests

url = 'https://games.crossfit.com/competitions/api/v1/competitions/open/2018/leaderboards?division=1&region=0&scaled=0&sort=0&occupation=0&page={}'

for link in [url.format(page) for page in range(1,5)]:
    response = requests.get(link)
    for item in response.json()['leaderboardRows']:
        name = item['entrant']['competitorName']
        print(name)
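As a possible extension of the loop above, here is a minimal sketch that keeps all the fields the question reads instead of printing only the name. The helpers row_to_record and collect_records are hypothetical names introduced for this illustration. The keys are the ones used in the question's code, and the sketch assumes every row actually contains them.

```python
def row_to_record(row):
    """Flatten one leaderboard row into a plain dict.

    The keys read here are the ones used in the question's code;
    a KeyError is raised if the API omits any of them.
    """
    entrant = row["entrant"]
    return {
        "name": entrant["competitorName"],
        "gender": entrant["gender"],
        "region": entrant["regionName"],
        "gym": entrant["affiliateName"],
        "age": entrant["age"],
        "height": entrant["height"],
        "weight": entrant["weight"],
        "overall_rank": row["overallRank"],
        "overall_score": row["overallScore"],
    }


def collect_records(pages):
    """Flatten the rows of several decoded page payloads into one list."""
    return [
        row_to_record(row)
        for page in pages
        for row in page["leaderboardRows"]
    ]
```

Assuming the endpoint still responds as described, the pages could be gathered with `pages = [requests.get(url.format(page)).json() for page in range(1, 5)]`, and `pd.DataFrame(collect_records(pages))` would then give one table row per athlete instead of ten parallel lists.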
Votes: 1

Stack Overflow user

Answered on 2018-07-04 08:48:57

The CrossFit API provides all the necessary information in the pagination section of its response. It gives you something like this:

Code language: json
"pagination":
    {
        "currentPage":1,
        "totalPages":3440,
        "totalCompetitors":171977
    },

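To fetch a page other than page 1, change the GET parameter in the URL: write &page=2 instead of &page=1. It is better to build the URL with a function that accepts the relevant parameters; for example, url_for_page(20) would return https://games.crossfit.com/competitions/api/v1/competitions/open/2018/leaderboards?division=2&region=0&scaled=0&sort=0&occupation=0&page=20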
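One way this could look in Python is sketched below. url_for_page is the name suggested above, but its signature and defaults are assumptions based on the query string in the question. fetch_json and iter_leaderboard_rows are hypothetical helpers added for illustration. The loop assumes each response carries pagination.totalPages and leaderboardRows as shown in this thread, which the live API may no longer do.

```python
from urllib.parse import urlencode

BASE_URL = (
    "https://games.crossfit.com/competitions/api/v1/"
    "competitions/open/2018/leaderboards"
)


def url_for_page(page, division=2, region=0, scaled=0, sort=0, occupation=0):
    """Build the leaderboard URL for one page.

    Parameter names and defaults mirror the URL quoted in the question.
    """
    params = {
        "division": division,
        "region": region,
        "scaled": scaled,
        "sort": sort,
        "occupation": occupation,
        "page": page,
    }
    return BASE_URL + "?" + urlencode(params)


def fetch_json(url):
    """Default fetcher: download a URL and decode its JSON body."""
    # Imported here so the URL helper works without requests installed.
    import requests

    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.json()


def iter_leaderboard_rows(fetch=fetch_json, max_pages=None, **params):
    """Yield rows page by page until totalPages (or max_pages) is reached."""
    page = 1
    while True:
        payload = fetch(url_for_page(page, **params))
        yield from payload["leaderboardRows"]
        # int() covers the case where the API returns the count as a string.
        total_pages = int(payload["pagination"]["totalPages"])
        if page >= total_pages:
            break
        if max_pages is not None and page >= max_pages:
            break
        page += 1
```

A call such as `for row in iter_leaderboard_rows(max_pages=3, division=1): print(row['entrant']['competitorName'])` would print the names from the first three pages. With thousands of pages reported in totalPages, it is probably wise to keep max_pages small while testing and to pause between requests.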

I hope this helps.

Votes: 2
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/51164400
