
Scraping multiple tables from a dynamic web page with BeautifulSoup

Stack Overflow user
Asked on 2020-09-24 15:35:25
Answers: 2 | Views: 94 | Followers: 0 | Votes: 1

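I want to scrape multiple tables from a dynamic web page, https://racing.hkjc.com/racing/information/english/Horse/BTResult.aspx?Date=2020/09/18. I tried the following code but got the error below. I would like to get the output shown at the bottom.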

df = pd.DataFrame()
driver = webdriver.Chrome('/Users/alau/Downloads/chromedriver')
driver.get('https://racing.hkjc.com/racing/information/english/Horse/BTResult.aspx?Date=2020/09/18')
res = driver.execute_script('return document.documentElement.outerHTML')
time.sleep(3)
driver.quit()
soup = BeautifulSoup(res, 'lxml')
tables = soup.find_all('table', {'class':'bigborder'})
subheads = soup.find_all('td', {'class':'subheader'}).text.replace('\n','!')
def tableDataText(tables):       
    rows = []
    trs = tables.find_all('tr')
    headerow = [td.get_text(strip=True) for td in trs[0].find_all('th')] # header row
    if headerow: # if there is a header row include first
        rows.append(headerow)
        trs = trs[1:]
    for tr in trs: # for every table row
        rows.append([td.get_text(strip=True) for td in tr.find_all('td')]) # data row    
    return rows
result_table = tableDataText(bt_table)
df = pd.DataFrame(result_table[1:], columns=result_table[0])

AttributeError: ResultSet object has no attribute 'find_all'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?

Output

(image of the desired output not included)

2 Answers

Stack Overflow user

Answered on 2020-09-24 16:18:18

You have to send a POST request that carries the anti-bot cookie in order to get the HTML back in the response.

Here is how to do that with BeautifulSoup:

import pandas as pd
import requests
from bs4 import BeautifulSoup


cookies = {
    "BotMitigationCookie_9518109003995423458": "381951001600933518cRI6X6LoZp9tUD7Ls04ETZpx41s=",
}
url = "https://racing.hkjc.com/racing/information/english/Horse/BTResult.aspx?Date=2020/09/18"

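response = requests.post(url, cookies=cookies).text
# Keep the whole parsed document here; get_data() below selects the tables.
# (Calling find_all() at this point would leave a ResultSet, which has no find_all().)
soup = BeautifulSoup(response, "html.parser")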

columns = [
    "Horse", "Jockey", "Trainer", "Draw", "Gear", "LBW",
    "Running Position", "Time", "Result", "Comment",
]


def get_data():
    for table in soup.find_all("table", {"class": "bigborder"}):
        for tr in table.find_all("tr", {"bgcolor": "#eeeeee"}):
            yield [
                i.find("font").getText().strip().replace(";", "")
                for i in tr.find_all("td")
            ]


df = pd.DataFrame([table for table in get_data()], columns=columns)
df.to_csv("data.csv", index=False)

This saves the scraped rows to data.csv.
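As a side note on the AttributeError in the question: `find_all()` returns a list-like ResultSet, so a helper that expects a single table has to be called once per table. The following is only a minimal offline sketch of that idea. `SAMPLE_HTML`, the jockey names, and the helper `table_rows` are hypothetical stand-ins and not taken from the real page, which has more columns.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page source (assumption: two small tables).
SAMPLE_HTML = """
<table class="bigborder">
  <tr><th>Horse</th><th>Jockey</th></tr>
  <tr><td>LARSON (D199)</td><td>Rider A</td></tr>
</table>
<table class="bigborder">
  <tr><th>Horse</th><th>Jockey</th></tr>
  <tr><td>GOOD DAYS (A333)</td><td>Rider B</td></tr>
</table>
"""


def table_rows(table):
    """Return the cell text of every non-empty row in one <table> tag."""
    rows = []
    for tr in table.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
tables = soup.find_all("table", {"class": "bigborder"})  # list-like ResultSet

# Call the helper once per table instead of on the whole ResultSet.
all_rows = [table_rows(table) for table in tables]
print(all_rows)
```

Each entry of `all_rows` holds one table, with the header row first, so it can be turned into a DataFrame per table or flattened as needed.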

Votes: 0

Stack Overflow user

Answered on 2020-09-24 17:49:27

import pandas as pd
import requests

cookies = {
    'BotMitigationCookie_9518109003995423458': '343775001600940465b2KTzJpwY5pXpiVNIRRi97Z3ELk='
}


def main(url):
    r = requests.post(url, cookies=cookies)
    df = pd.read_html(r.content, header=0, attrs={'class': 'bigborder'})
    new = pd.concat(df, ignore_index=True)
    print(new)
    new.to_csv("data.csv", index=False)


main("https://racing.hkjc.com/racing/information/english/Horse/BTResult.aspx?Date=2020/09/18")

Output:
                       Horse  ...                                            Comment
0              LARSON (D199)  ...      Being freshened up; led all the way to score.        
1      PRIVATE ROCKET (C367)  ...         Sat behind the leader; ran on comfortably.        
2        WIND N GRASS (D197)  ...  Slightly slow to begin; made progress under a ...        
3      VOYAGE WARRIOR (C247)  ...           In 2nd position; slightly weakened late.        
4         BEAUTY RUSH (C475)  ...              Bounded on jumping; settled midfield.        
..                       ...  ...                                                ...        
59  BUNDLE OF DELIGHT (D236)  ...    Raced along the rail; ran on OK when persuaded.        
60          GOOD DAYS (A333)  ...              Hit the line well when clear at 300m.        
61   YOU HAVE MY WORD (V149)  ...  Well tested in the Straight; moved better than...        
62          PLIKCLONE (D003)  ...       Average to begin; raced under his own steam.        
63    REEVE'S MUNTJAC (C174)  ...  The stayer raced under his own steam to stretc...        

[64 rows x 10 columns]
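The same `read_html` plus `concat` pattern can be tried without network access. This is only a rough sketch under stated assumptions: `SAMPLE_HTML` and the jockey names are made up for illustration, and `pd.read_html` needs an HTML parser backend such as lxml to be installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the response body (assumption: two small tables).
SAMPLE_HTML = """
<table class="bigborder">
  <tr><th>Horse</th><th>Jockey</th></tr>
  <tr><td>LARSON (D199)</td><td>Rider A</td></tr>
</table>
<table class="bigborder">
  <tr><th>Horse</th><th>Jockey</th></tr>
  <tr><td>GOOD DAYS (A333)</td><td>Rider B</td></tr>
</table>
"""

# read_html returns one DataFrame per matching <table>.
frames = pd.read_html(StringIO(SAMPLE_HTML), header=0, attrs={"class": "bigborder"})

# Stack the per-table frames into a single frame with a fresh index.
combined = pd.concat(frames, ignore_index=True)
print(combined)
```

Because every table shares the same header, `pd.concat` lines the columns up by name and `ignore_index=True` renumbers the rows from 0.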
Votes: 0
Original content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/64041582