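I'm stuck on this website. Over the past week I have been learning BeautifulSoup by writing small scripts, reading up on how to use it, and going through the official documentation. I have also worked through several tutorials and videos on parsing tables from websites. Using the soup.find() and soup.select() methods, I have successfully parsed table data from several websites.
For example, for the MLB stats site I used the following code: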
from urllib.request import urlopen as ureq
from bs4 import BeautifulSoup as bs

def connection(url):
    # Download the page and parse it with the built-in HTML parser
    uclient = ureq(url)
    page_html = uclient.read()
    uclient.close()
    soup = bs(page_html, "html.parser")
    return soup

soup = connection('https://baseballsavant.mlb.com/team/146')
table = soup.findAll("div", {"class": "table-savant"}) #<--using method soup.find()
#table = soup.select("div.table-savant") #<-----------------using method soup.select()
for n in range(len(table)):
    if (n==9): break
    content = table[n]
    columns = content.find("thead").find_all("th")
    column_names = [str(c.string).strip() for c in columns]
    table_rows = soup.findAll("tbody")[n].find_all("tr")
    l = []
    for tr in table_rows:
        td = tr.find_all("td")
        row = [str(cell.text).strip() for cell in td]
        l.append(row)
    print(l)

I then converted the results into data frames. However, there is one particular website where I cannot retrieve the table data. I tried printing the result of find():
def connection(url):
    uclient = ureq(url)
    page_html = uclient.read()
    uclient.close()
    soup = bs(page_html, "html.parser")
    return soup

soup = connection('https://baseballsavant.mlb.com/preview?game_pk=634607&game_date=2021-4-4')
table = soup.findAll("div", {"class": "table-savant"}) #<--using method soup.find()
print(table)
result: []

Using select():
table = soup.select("div.table-savant")
print(table)
result: []

Using select() with the element's CSS path:
table = soup.select('#preview > div:nth-of-type(1) > div:nth-of-type(2) > div:nth-of-type(3) > table:nth-of-type(1) > tbody:nth-of-type(2) > tr:nth-of-type(2) > td:nth-of-type(3)')
print(table)
result: []

I want to retrieve the player data, but I'm lost. Any suggestions would be greatly appreciated. Thanks.
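
One possible first check, sketched here under an assumption that is not confirmed anywhere above: the preview page may build its tables with JavaScript after it loads, in which case the HTML returned by urlopen would never contain a div with class table-savant, and every selector would return []. The snippet below only counts matching tags in a raw HTML string using the standard library. The names ClassCounter and count_tags_with_class and the two sample strings are hypothetical stand-ins, not content from the real pages.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags of one name that carry a given CSS class."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        # attrs is a list of (name, value) pairs; a value may be None
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class in classes:
            self.count += 1


def count_tags_with_class(html, tag, css_class):
    """Return how many <tag> elements in the raw HTML have css_class."""
    parser = ClassCounter(tag, css_class)
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical samples standing in for the two downloaded pages.
static_page = (
    '<div class="table-savant"><table><tbody>'
    "<tr><td>1</td></tr></tbody></table></div>"
)
scripted_page = (
    '<div id="preview"></div>'
    "<script>/* rows would be added here at runtime */</script>"
)

print(count_tags_with_class(static_page, "div", "table-savant"))    # 1
print(count_tags_with_class(scripted_page, "div", "table-savant"))  # 0
```

If the same count on the downloaded preview page (page_html decoded to a string) were 0, that would suggest the table markup is simply absent from the response rather than the selectors being wrong.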
Posted on 2021-03-29 19:18:57
https://stackoverflow.com/questions/66860107