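Looking for help! I found some code for a problem similar to my own. At a high level, I want to scrape multiple tables (for example "Per Game" and "Totals") from the same web page.
I'm not sure whether it matters, but I'm using JupyterLab for this. My Python knowledge is very limited (but I'm working hard to learn!), so I'm struggling to get what I want from either of these two sites: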
https://www.sports-reference.com/cbb/players/jaden-ivey-1.html
or
https://basketball.realgm.com/player/Jaden-Ivey/Summary/148740
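Essentially, the code below works for the fbref page, but when I replace the source link with either of the two sites above, I can't figure out how to get what I want.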
import requests
from bs4 import BeautifulSoup, Comment

url = 'https://fbref.com/en/comps/9/stats/Premier-League-Stats'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

# The table is shipped inside an HTML comment, so find the first Comment
# node after the wrapper element and parse its text as HTML.
# (string= is the current name for the older text= argument.)
comment = soup.select_one('#all_stats_standard').find_next(string=lambda x: isinstance(x, Comment))
table = BeautifulSoup(comment, 'html.parser')

# print some information from the table to screen:
for tr in table.select('tr:has(td)'):
    tds = [td.get_text(strip=True) for td in tr.select('td')]
    print('{:<30}{:<20}{:<10}'.format(tds[0], tds[3], tds[5]))
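I know there are similar questions on Stack Overflow, so if this is considered a duplicate request I'll look into it, but I need some further help because I'm new to this.
Thank you, Tim
Posted on 2022-06-21 18:45:43
You can conveniently use pandas to extract the data from these tables.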
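import pandas as pd

# read_html returns a list of DataFrames, one per <table> it finds;
# the slice keeps the first five.
dfs = pd.read_html('https://www.sports-reference.com/cbb/players/jaden-ivey-1.html')[0:5]
print(dfs)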
Output:
[ Season School Conf G GS MP FG ... STL BLK TOV PF PTS Unnamed: 27 SOS
0 2020-21 Purdue Big Ten 23 12 24.2 3.9 ... 0.7 0.7 1.3 1.7 11.1 NaN 11.23
1 2021-22 Purdue Big Ten 36 34 31.4 5.6 ... 0.9 0.6 2.6 1.8 17.3 NaN 8.23
2 Career Purdue NaN 59 46 28.6 4.9 ... 0.8 0.6 2.1 1.7 14.9 NaN 9.73
[3 rows x 29 columns], Season School Conf G GS MP FG FGA ... DRB TRB AST STL BLK TOV PF PTS
0 2020-21 Purdue Big Ten 19 10 23.3 3.5 9.2 ... 2.7 3.6 2.1 0.8 0.7 1.4 1.6 10.3
1 2021-22 Purdue Big Ten 19 17 32.6 5.5 12.8 ... 3.3 4.2 2.9 0.9 0.5 2.5 1.9 17.5
2 Career Purdue NaN 38 27 27.9 4.5 11.0 ... 3.0 3.9 2.5 0.9 0.6 1.9 1.8 13.9
[3 rows x 27 columns], Season School Conf G GS MP FG FGA ... DRB TRB AST STL BLK TOV PF PTS
0 2020-21 Purdue Big Ten 23 12 557 89 223 ... 57 76 43 17 16 31 39 256
1 2021-22 Purdue Big Ten 36 34 1132 203 441 ... 152 176 110 33 20 94 63 624
2 Career Purdue NaN 59 46 1689 292 664 ... 209 252 153 50 36 125 102 880
[3 rows x 27 columns], Season School Conf G GS MP FG FGA ... DRB TRB AST STL BLK TOV PF PTS
0 2020-21 Purdue Big Ten 19 10 442 66 174 ... 51 68 39 15 13 26 31 195
1 2021-22 Purdue Big Ten 19 17 620 104 244 ... 62 79 55 18 10 47 36 333
2 Career Purdue NaN 38 27 1062 170 418 ... 113 147 94 33 23 73 67 528
[3 rows x 27 columns], Season School Conf G GS MP FG ... TRB AST STL BLK TOV PF PTS
0 2020-21 Purdue Big Ten 23 12 557 6.4 ... 5.5 3.1 1.2 1.1 2.2 2.8 18.4
1 2021-22 Purdue Big Ten 36 34 1132 7.2 ... 6.2 3.9 1.2 0.7 3.3 2.2 22.0
2 Career Purdue NaN 59 46 1689 6.9 ... 6.0 3.6 1.2 0.9 3.0 2.4 20.8
[3 rows x 25 columns]]
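The list above is unlabeled, so it is not obvious which frame is "Per Game" and which is "Totals". One way to keep them apart is to key each table by its HTML id attribute. The sketch below does that with BeautifulSoup, and it also looks inside HTML comments, which is where the fbref page from the question keeps its table. The helper names table_to_frame and tables_by_id, the sample markup, and the ids per_game and totals are all made up for illustration; the real ids have to be read from each site's page source. It also assumes each table has a single header row, which will not hold for tables with stacked headers.

```python
import pandas as pd
from bs4 import BeautifulSoup, Comment


def table_to_frame(table):
    """Convert one <table> element into a DataFrame of strings.

    Assumes the first <tr> is the only header row.
    """
    rows = [
        [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        for tr in table.find_all("tr")
    ]
    header, body = rows[0], rows[1:]
    return pd.DataFrame(body, columns=header)


def tables_by_id(html):
    """Map each table's id attribute to a DataFrame.

    Covers tables in the visible markup and tables wrapped in HTML comments.
    """
    soup = BeautifulSoup(html, "html.parser")
    tables = list(soup.find_all("table", id=True))
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        if "<table" in comment:
            hidden = BeautifulSoup(comment, "html.parser")
            tables.extend(hidden.find_all("table", id=True))
    return {table["id"]: table_to_frame(table) for table in tables}


# Hypothetical sample page: the ids below are placeholders, not real site ids.
SAMPLE_HTML = """
<table id="per_game">
  <tr><th>Season</th><th>G</th><th>PTS</th></tr>
  <tr><td>2020-21</td><td>23</td><td>11.1</td></tr>
  <tr><td>2021-22</td><td>36</td><td>17.3</td></tr>
</table>
<!--
<table id="totals">
  <tr><th>Season</th><th>G</th><th>PTS</th></tr>
  <tr><td>2020-21</td><td>23</td><td>256</td></tr>
  <tr><td>2021-22</td><td>36</td><td>624</td></tr>
</table>
-->
"""

frames = tables_by_id(SAMPLE_HTML)
print(sorted(frames))      # the ids that were found
print(frames["totals"])    # the table that was hidden in a comment
```

For a live page, the same function could be fed requests.get(url).text instead of SAMPLE_HTML. All cell values come back as strings, so numeric columns would still need converting, for example with pd.to_numeric.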
https://stackoverflow.com/questions/72705462