Q: Why doesn't parsing tables with BeautifulSoup in Python work as expected on this website?

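Stack Overflow user

Asked 2021-03-29 19:04:29

1 answer, 69 views, 0 followers, 1 vote

I'm stuck on this website. Over the past week I have been learning BeautifulSoup by writing small scripts, reading up on how to use it, and going through the official documentation. I have also worked through tutorials and videos on how to parse tables from websites. Using the soup.find() and soup.select() methods, I have parsed table data from several websites: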

For example, for the MLB stats site, I used the following code:

from urllib.request import urlopen as ureq
from bs4 import BeautifulSoup as bs
    
def connection(url):
    uclient = ureq(url)
    page_html = uclient.read()
    uclient.close()
    soup = bs(page_html, "html.parser")
    return(soup)


soup = connection('https://baseballsavant.mlb.com/team/146')
   
table = soup.findAll("div", {"class": "table-savant"})  # using findAll()
#table = soup.select("div.table-savant")  # same result using select()

for n in range(len(table)):
    if n == 9:  # stop after the first nine tables
        break
    content = table[n]
    # Header cells give the column names.
    columns = content.find("thead").find_all("th")
    column_names = [str(c.string).strip() for c in columns]
    # Assumes the n-th <tbody> on the page belongs to the n-th table.
    table_rows = soup.findAll("tbody")[n].find_all("tr")
    l = []
    for tr in table_rows:
        td = tr.find_all("td")
        row = [str(cell.text).strip() for cell in td]
        l.append(row)
    print(l)
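
The header names and rows collected by the loop above can be turned into a DataFrame along these lines. This is a minimal sketch, assuming pandas is installed; the helper name `rows_to_frame` and the sample values are made up for illustration and do not come from the original post.

```python
import pandas as pd


def rows_to_frame(column_names, rows):
    """Build a DataFrame from scraped header names and row lists.

    Hypothetical helper: rows whose length differs from the header are
    skipped, since scraped tables often contain spacer or summary rows.
    """
    width = len(column_names)
    clean = [r for r in rows if len(r) == width]
    return pd.DataFrame(clean, columns=column_names)


# Made-up sample data standing in for `column_names` and `l` above.
columns = ["Season", "PA", "HR"]
rows = [["2020", "2210", "60"], ["2019", "6045", "146"], ["spacer"]]
df = rows_to_frame(columns, rows)
print(df)
```

All values stay strings here; converting numeric columns (for example with `pd.to_numeric`) would be a separate step.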

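I then convert the results to DataFrames. However, there is one particular website whose table data I cannot retrieve. I tried printing the content with find():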

def connection(url):
    uclient = ureq(url)
    page_html = uclient.read()
    uclient.close()
    soup = bs(page_html, "html.parser")
    return(soup) 

soup = connection('https://baseballsavant.mlb.com/preview?game_pk=634607&game_date=2021-4-4')
   
table = soup.findAll("div", {"class": "table-savant"})  #<--using method soup.find()
print(table)

result: []

Using select():

table = soup.select("div.table-savant") 
print(table)

result: []

Using select() with a full CSS path:

table = soup.select('#preview > div:nth-of-type(1) > div:nth-of-type(2) > div:nth-of-type(3) > table:nth-of-type(1) > tbody:nth-of-type(2) > tr:nth-of-type(2) > td:nth-of-type(3)')
print(table)
    
result: []

I want to retrieve the players' data, but I'm lost. Any suggestions would be greatly appreciated. Thanks.

beautifulsoup

python

dataframe

parsing

web-scraping

1 Answer

Stack Overflow user

Accepted answer

Answered 2021-03-29 19:18:57

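Problem: The page uses JavaScript to fetch and display its content, so you can't just use requests, urllib, or similar methods, because they only download the HTML and never execute the JavaScript code.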
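
One way to confirm this kind of diagnosis is to check whether the selector matches anything in the HTML as the server returns it, before any script has run. The following is a minimal sketch: the two markup snippets are made-up stand-ins for a server-rendered page and a script-rendered page (they are not copied from baseballsavant.mlb.com), and `count_matches` is a hypothetical helper.

```python
from bs4 import BeautifulSoup

# Made-up markup: a page that ships its table in the HTML...
STATIC_PAGE = """
<div class="table-savant"><table>
  <thead><tr><th>Player</th></tr></thead>
  <tbody><tr><td>A</td></tr></tbody>
</table></div>
"""

# ...and a page that ships an empty container plus a script
# that would build the tables only inside a browser.
DYNAMIC_PAGE = """
<div id="preview"></div>
<script>/* fetches data and builds the tables in the browser */</script>
"""


def count_matches(html, selector):
    """Return how many elements match `selector` in the raw, unrendered HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.select(selector))


static_hits = count_matches(STATIC_PAGE, "div.table-savant")
dynamic_hits = count_matches(DYNAMIC_PAGE, "div.table-savant")
print(static_hits, dynamic_hits)
```

A count of zero on the downloaded HTML, while the table is visible in the browser, suggests the content is built client-side.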

Solution: Use Selenium to load the page, then parse the content with BeautifulSoup.

Sample code:

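from bs4 import BeautifulSoup
from selenium import webdriver
url = 'https://baseballsavant.mlb.com/preview?game_pk=634607&game_date=2021-4-4'
d = webdriver.Chrome()
d.get(url)  # the browser executes the page's JavaScript
soup = BeautifulSoup(d.page_source, "html.parser")
d.quit()  # close the browser once the HTML has been captured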

To use webdriver.Chrome, you also need to download chromedriver and put the executable either in your project's folder or somewhere on your PATH.
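
Because the tables are filled in by script after the initial load, `page_source` read immediately after `get()` may still be missing them. The sketch below waits for the element first and then parses it. It assumes Selenium 4, a working Chrome and chromedriver setup, and that the rendered page uses the same `div.table-savant` containers as the team page in the question (this is not verified here). The helper names and the 15-second timeout are illustrative choices, not part of the original answer.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url, css_selector, timeout=15):
    """Load `url` in Chrome, wait until `css_selector` appears, return the HTML."""
    # Imported here so the parsing helper below works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Block until at least one matching element exists in the DOM.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return driver.page_source
    finally:
        driver.quit()


def parse_tables(html, css_selector="div.table-savant"):
    """Return one (column_names, rows) pair per matching container."""
    soup = BeautifulSoup(html, "html.parser")
    tables = []
    for container in soup.select(css_selector):
        thead = container.find("thead")
        tbody = container.find("tbody")
        if thead is None or tbody is None:
            continue  # skip containers without a complete table
        names = [th.get_text(strip=True) for th in thead.find_all("th")]
        rows = [
            [td.get_text(strip=True) for td in tr.find_all("td")]
            for tr in tbody.find_all("tr")
        ]
        tables.append((names, rows))
    return tables


# Made-up rendered markup so the parsing step can be tried offline.
SAMPLE = """
<div class="table-savant"><table>
<thead><tr><th>Player</th><th>HR</th></tr></thead>
<tbody><tr><td>A. Batter</td><td>12</td></tr></tbody>
</table></div>
"""
parsed = parse_tables(SAMPLE)
print(parsed)

# Real usage (needs a browser, so it is left commented out):
# html = fetch_rendered_html(url, "div.table-savant")
# print(parse_tables(html))
```

Keeping the browser step separate from the parsing step means the parsing logic can be checked against saved HTML without launching Chrome each time.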

Votes: 2

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/66860107
