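System: Windows 10
IDE: MS VSCode
Language: Python 3.7.3
Library: pandas 1.0.1
Data source: https://hoopshype.com/salaries/#hh-tab-team-payroll
Dataset: Team payroll
I'm running into a problem when I try to convert str(table) into a DataFrame, and I feel like I'm missing something. The sample code below only ever puts the top row of data into the DataFrame. I think I'm missing some other conversion step.
Steps taken:
tutorials
Code: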
# import Libraries
import pandas as pd
import requests
from bs4 import BeautifulSoup
from tabulate import tabulate
# data visualization
import matplotlib.pyplot as plt
import seaborn as sns
# setting: For output purposes to show all columns
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 2000)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
# parsing the webpage
url = 'https://hoopshype.com/salaries/#hh-tab-team-payroll'
r = requests.get(url)
data = r.text
# create a BeautifulSoup object
soup = BeautifulSoup(r.content,'lxml')
soup.prettify
# team salaries by year = tsby
table = soup.find_all('table')[0]
nbatsby = pd.read_html(str(table))
# this is where I am stuck
df = pd.DataFrame(nbatsby)
df.head(100)
Posted on 2020-06-13 01:06:44
pandas has read_html, which can grab HTML tables and convert them directly into a DataFrame:
import pandas as pd
import requests
r = requests.get("https://hoopshype.com/salaries/#hh-tab-team-payroll")
data = pd.read_html(r.text, attrs = {'class': 'hh-salaries-ranking-table'})[0]
print(data)
Output:
    Unnamed: 0      Team       2019/20        2020/21      2021/22      2022/23      2023/24      2024/25
0          1.0  Portland  $XXX,XXX,XXX  $XXXX,XXX,XXX  $XX,XXX,XXX  $XX,XXX,XXX  $XX,XXX,XXX  $XX,XXX,XXX
1          2.0     Miami  $XXX,XXX,XXX    $XX,XXX,XXX  $XX,XXX,XXX  $XX,XXX,XXX           $0           $0
................................................................................................................
28        29.0   Indiana  $XXX,XXX,XXX   $XXX,XXX,XXX  $XX,XXX,XXX  $XX,XXX,XXX  $XX,XXX,XXX           $0
29        30.0  New York  $XXX,XXX,XXX    $XX,XXX,XXX  $XX,XXX,XXX           $0           $0           $0
https://stackoverflow.com/questions/62353990
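For anyone wondering why the indexing with [0] matters: pd.read_html returns a list of DataFrames (one per matching table), so the table you want is an element of that list and not something to wrap in pd.DataFrame again. Here is a minimal offline sketch of that behaviour. The HTML string, the "other-table" class, the team names and the figures are all made up for illustration; only the "hh-salaries-ranking-table" class name comes from the answer above. It assumes an HTML parser that pandas can use (lxml, or bs4 with html5lib) is installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the downloaded page: two tables, only one of
# which carries the class name used in the answer. All values are made up.
HTML = """
<table class="other-table">
  <tr><th>Note</th></tr>
  <tr><td>ignored</td></tr>
</table>
<table class="hh-salaries-ranking-table">
  <thead><tr><th>Team</th><th>2019/20</th></tr></thead>
  <tbody>
    <tr><td>Team A</td><td>$100</td></tr>
    <tr><td>Team B</td><td>$200</td></tr>
  </tbody>
</table>
"""

# read_html always returns a list, one DataFrame per table it finds.
tables = pd.read_html(StringIO(HTML))
print(type(tables), len(tables))

# attrs narrows the search to tables whose attributes match.
matching = pd.read_html(
    StringIO(HTML), attrs={"class": "hh-salaries-ranking-table"}
)

# Pick the DataFrame out of the list instead of calling pd.DataFrame on it.
df = matching[0]
print(df)
```

In the question's code the equivalent change would be to take nbatsby[0] as the DataFrame. Note that the salary columns come back as strings such as "$100", so they would still need cleaning before any plotting.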