最近,我从the following website上找到了一份大学橄榄球的名单,我得到了下面的输出……
Number Name Position Experience Height Weight Hometown / High School Previous School High School Hometown
0 0.0 Jonathan Sutherland S R-Sr. 5-11 203 Ottawa, Ontario, Canada / Episcopal (Va.) NaN Episcopal (Va.) Ottawa, Ontario, Canada
1 1.0 Jaquan Brisker S Sr.+ 6-1 204 Pittsburgh, Pa. / Gateway Lackawanna College Gateway Pittsburgh, Pa.
2 2.0 Keaton Ellis CB Jr. 5-11 192 State College, Pa. / State College Area NaN State College Area State College, Pa.
3 2.0 Ta'Quan Roberson QB R-So. 5-11 199 Orange, N.J. / DePaul Catholic NaN DePaul Catholic Orange, N.J.
4 3.0 Johnny Dixon CB Jr. 6-0 188 Tampa, Fla. / Chamberlain South Carolina Chamberlain Tampa, Fla.
... ... ... ... ... ... ... ... ... ... ...
104 NaN Jaylen Reed S Fr. 6-0 213 Detroit, Mich. / Martin Luther King High School NaN Martin Luther King High School Detroit, Mich.
105 NaN Jordan van den Berg DT So. 6-3 292 Johannesburg, South Africa / Providence Christ... Iowa Western Providence Christian Academy Johannesburg, South Africa
106 NaN Harrison Wallace III WR Fr. 6-1 187 Montgomery, Ala. / Pike Road High School NaN Pike Road High School Montgomery, Ala.
107 NaN Zakee Wheatley CB Fr. 6-2 190 Crofton, Md. / Archbishop Spalding High School NaN Archbishop Spalding High School Crofton, Md.
108 NaN Eric Wilson OL Sr.+ 6-4 320 Minnetrista, Minn. / Benilde-St. Margaret's Harvard Benilde-St. Margaret's Minnetrista, Minn.
109 rows × 10 columns对以下脚本执行...from命令。
roster = pd.read_html('https://gopsusports.com/sports/football/roster/2021')
roster_final = roster[2]
roster_final.columns = ['Number', 'Name', 'Position', 'Experience', 'Height', 'Weight', 'Hometown / High School', 'Previous School']
roster_final['High School'] = roster_final['Hometown / High School'].str.split('/').str[-1]
roster_final['Hometown'] = roster_final['Hometown / High School'].str.split('/').str[0]
roster_final我现在想做的是创建一个所谓的“奖学金网格”,其中花名册按职位和他们参与项目的时间进行了分解。奖学金网格的一个例子是from this webpage。
到目前为止,我已经用以下代码进行了尝试:
fifth_year = ['R-Sr.+', 'Sr.+', 'R-Sr.']
fourth_year = ['Sr.', 'R-Jr.']
third_year = ['Jr.', 'R-So.']
second_year = ['So.', 'R-Fr.']
first_year = ['Fr.']
roster_final.loc[(roster_final['Experience'] == 'R-Sr.+') | (roster_final['Experience'] == 'Sr.+') | (roster_final['Experience'] == 'R-Sr.'), 'Class'] = 'Fifth Year'
roster_final.loc[(roster_final['Experience'] == 'Sr.') | (roster_final['Experience'] == 'R-Jr.'), 'Class'] = 'Fourth Year'
roster_final.loc[(roster_final['Experience'] == 'Jr.') | (roster_final['Experience'] == 'R-So.'), 'Class'] = 'Third Year'
roster_final.loc[(roster_final['Experience'] == 'So.') | (roster_final['Experience'] == 'R-Fr.'), 'Class'] = 'Second Year'
roster_final.loc[(roster_final['Experience'] == 'Fr.'), 'Class'] = 'First Year'
# roster_final['Class'] = roster_final['Experience'].apply(lambda x: 'fifth_year' if x.isin(fifth_year))
year5 = roster_final[['Number', 'Name', 'Position', 'Experience', 'Class']][roster_final['Class']=='Fifth Year']
year4 = roster_final[['Number', 'Name', 'Position', 'Experience', 'Class']][roster_final['Class']=='Fourth Year']
year3 = roster_final[['Number', 'Name', 'Position', 'Experience', 'Class']][roster_final['Class']=='Third Year']
year2 = roster_final[['Number', 'Name', 'Position', 'Experience', 'Class']][roster_final['Class']=='Second Year']
year1 = roster_final[['Number', 'Name', 'Position', 'Experience', 'Class']][roster_final['Class']=='First Year']
scholarship_roster = roster_final.groupby("Position")
scholarship_roster = scholarship_roster["Name"].apply(list)
scholarship_roster = scholarship_roster.reset_index()
year5[year5['Position'] == 'OL']我可以使用groupbys和lists来分解第一个表,但我希望我的结果看起来与the aforementioned website.上的完全一样。如果有人可以帮助我获得这个输出,我将非常感激。
发布于 2021-07-23 07:55:48
下面的代码通常会回答这个问题吗?它应该会给你一个想法,或者至少是一些思考的食粮。
注意:
addYear(x)函数可能需要检查,因为我没有尝试获取每个字符串的正确年份,只是为了演示一下。
Year列的顺序是从第一到第五,但你提供的例子是左边的高年级学生(以及代表“大一新生”的单词)。但是,您可以轻松地颠倒代码第54行的顺序,并(如果需要)在查看时更改addYear函数中Year列中的值的措辞(请参阅上面的第一个项目符号)。代码:
import pandas as pd
import requests
#set how the names of the players will be joined in the final pivot. Pipe seems to work best
join_string = r' | '
# I could not get https://gopsusports.com/sports/football/roster/2021 to load but I saw this...
url = 'https://www.roarlionsroar.com/penn-state-football/roster'
page = requests.get(url)
table = pd.read_html(page.text, header = 1)
df_roster = table[0]
del df_roster['No.']
# df_roster['Class/Eligibility'].value_counts()
def addYear(x):
'''
df_roster['Class/Eligibility'].value_counts() returns the following:
FR/FR 35, SO/FR 19, SR/JR 16, JR/SO 14,
SO/SO 7, JR/JR 6, SR/SR 5, 5th/SR 2, GR/SR 1
'''
if x == 'FR/FR':
return 'First'
elif x == 'SO/FR':
return 'Second'
elif x == 'SR/JR':
return 'Fourth'
elif x == 'JR/SO':
return 'Third'
elif x == 'SO/SO':
return 'Second'
elif x == 'JR/JR':
return 'Third'
elif x == 'SR/SR':
return 'Fifth'
elif x == '5th/SR':
return 'Fifth'
elif x == 'GR/SR':
return 'Fifth'
else:
return 'Unknown'
df_roster['Year'] = df_roster['Class/Eligibility'].apply(addYear)
df_roster = df_roster[['Position', 'Year', 'Name']]
df_pivot = df_roster.pivot_table(index="Position", columns="Year", values="Name", \
margins = True, aggfunc=lambda x: join_string.join(x), fill_value='-')
df_pivot = df_pivot.reindex(['First', 'Second', 'Third', 'Fourth', 'Fifth'], axis=1)
df_pivot = df_pivot.reindex(['QB', 'RB', 'WR', 'TE', 'OL', 'DE', 'DT', 'LB', 'CB', 'S', 'K', 'P'], axis=0)
print(df_pivot)输出示例:

https://stackoverflow.com/questions/68491620
复制相似问题