Scraping a Sports Reference table

Stack Overflow user
Asked 2022-08-22 21:54:44
Answers: 2 | Views: 63 | Followers: 0 | Votes: 1

I tried the script below to scrape the table on a web page.

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://www.sports-reference.com/cfb/play-index/rivals.cgi?request=1&school_id=penn-state&opp_id=purdue'

headers = {'User-Agent': 
           'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36'}

pageTree = requests.get(url, headers=headers)
soup = BeautifulSoup(pageTree.content, 'html.parser')

# Expected to return the table body, but no table is found
soup.find('tbody')

However, the table cannot be pulled. Even a pd.read_html call does not work. Is there a reason for this?


2 Answers

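Stack Overflow user

Answered 2022-08-22 22:26:06

The table data you want sits inside an HTML comment. Once the comment delimiters are removed, pandas can extract the table on its own.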

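import pandas as pd
import requests
from io import StringIO
from bs4 import BeautifulSoup

url = 'https://www.sports-reference.com/cfb/play-index/rivals.cgi?request=1&school_id=penn-state&opp_id=purdue'
# Strip the comment delimiters so the hidden table becomes ordinary markup
res = requests.get(url).text.replace('<!--', '').replace('-->', '')

soup = BeautifulSoup(res, 'lxml')

# The results table lives inside the element with id "div_results"
table = soup.select_one('#div_results')

# Wrap the markup in StringIO; recent pandas versions deprecate literal HTML strings
df = pd.read_html(StringIO(str(table)))[0]
# The header has two levels; drop the top one
d = df.droplevel(0, axis=1)
print(d)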

Output:

 G        Date  Day           School Unnamed: 4_level_1     Opponent  ... Diff   W  L  T  Streak  Notes
0   19  2019-10-05  Sat  Penn State (12)                NaN       Purdue  ...   28  15  3  1     W 9    NaN
1   18  2016-10-29  Sat  Penn State (24)                  @       Purdue  ...   38  14  3  1     W 8    NaN
2   17  2013-11-16  Sat       Penn State                NaN       Purdue  ...   24  13  3  1     W 7    NaN
3   16  2012-11-03  Sat       Penn State                  @       Purdue  ...   25  12  3  1     W 6    NaN
4   15  2011-10-15  Sat       Penn State                NaN       Purdue  ...    5  11  3  1     W 5    NaN
5   14  2008-10-04  Sat   Penn State (6)                  @       Purdue  ...   14  10  3  1     W 4    NaN
6   13  2007-11-03  Sat       Penn State                NaN       Purdue  ...    7   9  3  1     W 3    NaN
7   12  2006-10-28  Sat       Penn State                  @       Purdue  ...   12   8  3  1     W 2    NaN
8   11  2005-10-29  Sat  Penn State (11)                NaN       Purdue  ...   18   7  3  1     W 1    NaN
9   10  2004-10-09  Sat       Penn State                NaN   Purdue (9)  ...   -7   6  3  1     L 2    NaN
10   9  2003-10-11  Sat       Penn State                  @  Purdue (18)  ...  -14   6  2  1     L 1    NaN
11   8  2000-09-30  Sat       Penn State                NaN  Purdue (22)  ...    2   6  1  1     W 6    NaN
12   7  1999-10-23  Sat   Penn State (2)                  @  Purdue (16)  ...    6   5  1  1     W 5    NaN
13   6  1998-10-17  Sat  Penn State (12)                NaN       Purdue  ...   18   4  1  1     W 4    NaN
14   5  1997-11-15  Sat   Penn State (6)                  @  Purdue (19)  ...   25   3  1  1     W 3    NaN
15   4  1996-10-12  Sat  Penn State (10)                NaN       Purdue  ...   17   2  1  1     W 2    NaN
16   3  1995-10-14  Sat  Penn State (20)                  @       Purdue  ...    3   1  1  1     W 1    NaN
17   2  1952-09-27  Sat       Penn State                NaN       Purdue  ...    0   0  1  1     T 1    NaN
18   1  1951-11-03  Sat       Penn State                  @       Purdue  ...  -28   0  1  0     L 1    NaN

[19 rows x 16 columns]
Votes: 3
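
To see the effect of stripping the comment delimiters without hitting the live site, here is a minimal offline sketch. The `SAMPLE_HTML` string and the `find_tbody` helper are made up for illustration and only mimic the situation described above (a table wrapped in `<!-- -->`); they are not the real page's markup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the table is wrapped in an HTML comment,
# similar to what the answer above describes for the real page.
SAMPLE_HTML = """
<div id="all_results">
<!--
<div id="div_results">
<table id="results">
<thead><tr><th>G</th><th>Date</th></tr></thead>
<tbody><tr><td>19</td><td>2019-10-05</td></tr></tbody>
</table>
</div>
-->
</div>
"""


def find_tbody(html):
    """Parse the markup and return the first <tbody> tag, or None."""
    return BeautifulSoup(html, "html.parser").find("tbody")


# With the comment intact, the parser sees one comment node and no table tags.
before = find_tbody(SAMPLE_HTML)

# After removing the delimiters, the same markup parses as a normal table.
uncommented = SAMPLE_HTML.replace("<!--", "").replace("-->", "")
after = find_tbody(uncommented)

cells = [td.get_text() for td in after.find_all("td")]
print(before is None, cells)
```

The blanket `replace` is simple but also exposes anything else the page author had commented out, so selecting a specific element afterwards (as the answer does with `#div_results`) keeps the result predictable.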

Stack Overflow user

Answered 2022-08-22 22:08:39

The <table> is stored inside an HTML comment (<!-- -->), so BeautifulSoup normally does not see it. To parse it, you can use the following example:

import requests
import pandas as pd
from io import StringIO
from bs4 import BeautifulSoup, Comment


url = "https://www.sports-reference.com/cfb/play-index/rivals.cgi?request=1&school_id=penn-state&opp_id=purdue"

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"
}

pageTree = requests.get(url, headers=headers)
soup = BeautifulSoup(pageTree.content, "html.parser")

# Collect only the comment nodes, then let pandas parse the tables inside them
comments = soup.find_all(string=lambda s: isinstance(s, Comment))
df = pd.read_html(StringIO("\n".join(comments)))[0]
# The header has two levels; drop the top one
df = df.droplevel(0, axis=1)
print(df)

Prints:

     G        Date  Day           School Unnamed: 4_level_1     Opponent     Conf Unnamed: 7_level_1  Pts  Opp  Diff   W  L  T Streak  Notes
0   19  2019-10-05  Sat  Penn State (12)                NaN       Purdue  Big Ten                  W   35    7    28  15  3  1    W 9    NaN
1   18  2016-10-29  Sat  Penn State (24)                  @       Purdue  Big Ten                  W   62   24    38  14  3  1    W 8    NaN
2   17  2013-11-16  Sat       Penn State                NaN       Purdue  Big Ten                  W   45   21    24  13  3  1    W 7    NaN
3   16  2012-11-03  Sat       Penn State                  @       Purdue  Big Ten                  W   34    9    25  12  3  1    W 6    NaN
4   15  2011-10-15  Sat       Penn State                NaN       Purdue  Big Ten                  W   23   18     5  11  3  1    W 5    NaN
5   14  2008-10-04  Sat   Penn State (6)                  @       Purdue  Big Ten                  W   20    6    14  10  3  1    W 4    NaN
6   13  2007-11-03  Sat       Penn State                NaN       Purdue  Big Ten                  W   26   19     7   9  3  1    W 3    NaN
7   12  2006-10-28  Sat       Penn State                  @       Purdue  Big Ten                  W   12    0    12   8  3  1    W 2    NaN
8   11  2005-10-29  Sat  Penn State (11)                NaN       Purdue  Big Ten                  W   33   15    18   7  3  1    W 1    NaN
9   10  2004-10-09  Sat       Penn State                NaN   Purdue (9)  Big Ten                  L   13   20    -7   6  3  1    L 2    NaN
10   9  2003-10-11  Sat       Penn State                  @  Purdue (18)  Big Ten                  L   14   28   -14   6  2  1    L 1    NaN
11   8  2000-09-30  Sat       Penn State                NaN  Purdue (22)  Big Ten                  W   22   20     2   6  1  1    W 6    NaN
12   7  1999-10-23  Sat   Penn State (2)                  @  Purdue (16)  Big Ten                  W   31   25     6   5  1  1    W 5    NaN
13   6  1998-10-17  Sat  Penn State (12)                NaN       Purdue  Big Ten                  W   31   13    18   4  1  1    W 4    NaN
14   5  1997-11-15  Sat   Penn State (6)                  @  Purdue (19)  Big Ten                  W   42   17    25   3  1  1    W 3    NaN
15   4  1996-10-12  Sat  Penn State (10)                NaN       Purdue  Big Ten                  W   31   14    17   2  1  1    W 2    NaN
16   3  1995-10-14  Sat  Penn State (20)                  @       Purdue  Big Ten                  W   26   23     3   1  1  1    W 1    NaN
17   2  1952-09-27  Sat       Penn State                NaN       Purdue  Western                  T   20   20     0   0  1  1    T 1    NaN
18   1  1951-11-03  Sat       Penn State                  @       Purdue  Western                  L    0   28   -28   0  1  0    L 1    NaN
Votes: 2
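
The comment-based approach can also be tried offline. The sketch below assumes a made-up `SAMPLE_HTML` string (not the real page) and shows one way to select only `Comment` nodes and re-parse their contents as markup, which avoids touching the rest of the document.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical markup: one visible paragraph and one table hidden in a comment.
SAMPLE_HTML = """
<div id="all_results">
<p>Visible text</p>
<!--
<table id="results">
<tbody><tr><td>18</td><td>2016-10-29</td></tr></tbody>
</table>
-->
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Keep only the nodes that are HTML comments; ordinary text nodes are skipped.
comments = soup.find_all(string=lambda s: isinstance(s, Comment))

# A comment's content is plain text to the first parse, so parse it again
# to turn the markup inside it into real tags.
tables = []
for comment in comments:
    inner = BeautifulSoup(str(comment), "html.parser")
    tables.extend(inner.find_all("table"))

cells = [td.get_text(strip=True) for td in tables[0].find_all("td")]
print(len(comments), cells)
```

Compared with replacing the delimiters in the raw text, this keeps the original document untouched and makes it explicit which tables came from comments.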
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/73451375
