Timetable: columns = hour, rows = weekday, data = subject
weekday x hour
Name       1                      2                      3                 4             5                 6                      7
Monday     Project                Project                Project           Data Science  Embedded Systems  Data Mining            Industrial Psychology
Tuesday    Project                Project                Project           Project       Data Science      Industrial Psychology  Embedded Systems
Wednesday  Data Science           Project                Project           Project       Project           Project                Project
Thursday   Data Mining            Industrial Psychology  Embedded Systems  Data Mining   Project           Project                Project
Friday     Industrial Psychology  Embedded Systems       Data Science      Data Mining   Project           Project                Project
Frequency table: rows = weekday, columns = subject, data = how often the subject occurs on that weekday
weekday x subject
Data       Data Mining  Data Science  Embedded Systems  Industrial Psychology  Project
Name
Friday               1             1                 1                      1        3
Monday               1             1                 1                      1        3
Thursday             2             0                 1                      1        3
Tuesday              0             1                 1                      1        4
Wednesday            0             1                 0                      0        6
Code
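import pandas as pd
from datetime import datetime

# Fragment from inside a class method, hence the self. attributes
self.start = datetime(2022, 1, 1)
self.end = datetime(2022, 3, 31)
self.file = 'timetable.csv'
self.sdf = pd.read_csv(self.file, header=0, index_col="Name")
# Count each subject per hour column (pd.Series.value_counts replaces the
# top-level pd.value_counts, which newer pandas versions deprecate)
self.subject_frequency = self.sdf.apply(pd.Series.value_counts).fillna(0)
print(self.subject_frequency.to_string())
self.subject_frequency["sum"] = self.subject_frequency.sum(axis=1)
# Count each subject per weekday: rows = weekday (Name), columns = subject (Data)
self.p = (self.sdf.melt(var_name='Freq', value_name='Data', ignore_index=False)
          .pivot_table('Freq', 'Name', 'Data', fill_value=0, aggfunc='count'))
print(self.p.to_string())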
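For reference, here is a minimal self-contained sketch that builds the same weekday x subject frequency table without the CSV file. The timetable is typed in from the table above. The names `timetable`, `long_form` and `freqtable` are illustrative and not from the original code, and `pd.crosstab` is swapped in for the melt/pivot_table chain.

```python
import pandas as pd

# Short aliases for the subjects in the timetable
P, DS, ES, DM, IP = ("Project", "Data Science", "Embedded Systems",
                     "Data Mining", "Industrial Psychology")

# Timetable from the question: one list of 7 hours per weekday,
# transposed so that rows = weekday and columns = hour (1-7)
timetable = pd.DataFrame(
    {
        "Monday":    [P, P, P, DS, ES, DM, IP],
        "Tuesday":   [P, P, P, P, DS, IP, ES],
        "Wednesday": [DS, P, P, P, P, P, P],
        "Thursday":  [DM, IP, ES, DM, P, P, P],
        "Friday":    [IP, ES, DS, DM, P, P, P],
    },
    index=range(1, 8),
).T.rename_axis("Name")

# Long form: one row per (weekday, hour) slot with its subject
long_form = timetable.reset_index().melt(
    id_vars="Name", var_name="hour", value_name="subject"
)

# Cross-tabulate weekday against subject; missing combinations become 0
freqtable = pd.crosstab(long_form["Name"], long_form["subject"])
print(freqtable.to_string())
```

`pd.crosstab` sorts both axes alphabetically, so the rows come out in the same Friday, Monday, Thursday, Tuesday, Wednesday order as the frequency table shown earlier.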
Required table
classes ...
Data Mining 32
Data Science 32
Embedded Systems 32
Industrial Psychology 32
Project 146
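More columns will be added later, such as current attendance, the percentage drop for each missed class, the percentage lost by taking a Monday or a Tuesday off, and so on, all to be subtracted from the attendance figure.
The final goal is to analyze which day is safe to take off and to monitor my percentage. If there is a better direction I could take, please let me know.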
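One possible way to put a number on "which day is safe", sketched under assumptions: take the term total for each subject over the business days in the date range, then express one missed weekday as a percentage of that total. The definition of "loss" used here and the names `day_names`, `totals` and `loss_pct` are my own illustration, not part of the original code, and public holidays are not accounted for.

```python
import pandas as pd

# Frequency table from the question (rows = weekday, columns = subject)
freqtable = pd.DataFrame(
    [[1, 1, 1, 1, 3],   # Friday
     [1, 1, 1, 1, 3],   # Monday
     [2, 0, 1, 1, 3],   # Thursday
     [0, 1, 1, 1, 4],   # Tuesday
     [0, 1, 0, 0, 6]],  # Wednesday
    index=["Friday", "Monday", "Thursday", "Tuesday", "Wednesday"],
    columns=["Data Mining", "Data Science", "Embedded Systems",
             "Industrial Psychology", "Project"],
)

# Weekday name of every business day (Mon-Fri) in the term
day_names = pd.bdate_range("2022-01-01", "2022-03-31").day_name()

# One frequency row per business day, summed into term totals per subject
totals = freqtable.loc[day_names].sum()

# Assumed definition of loss: classes held on one such weekday,
# as a percentage of that subject's term total
loss_pct = freqtable.div(totals, axis=1).mul(100).round(2)
print(loss_pct.to_string())
```

Under this definition, a weekday whose row holds small percentages across all subjects costs the least attendance per absence.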
Posted on 2022-03-06 08:54:57
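One possible approach is to use bdate_range as before, use weekday to get the weekday number (0-4) of each date, and map those numbers to the corresponding weekday names; then reindex the frequency table with the result. This gives a DataFrame in which each row corresponds to one business day between 2022-1-1 and 2022-3-31. Finally, sum finds the total for each class: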
out = (freqtable.reindex(pd.bdate_range('2022-1-1','2022-3-31').weekday
.map(dict(enumerate(['Monday','Tuesday','Wednesday','Thursday','Friday']))))
.sum()
.rename_axis(['classes']).reset_index(name='count'))
Output:
classes count
0 Data Mining 51
1 Data Science 51
2 Embedded Systems 51
3 Industrial Psychology 51
4 Project 244
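To try this without the CSV file, here is a self-contained sketch of the same reindex approach. The frequency table is typed in from the question, and the intermediate names `days`, `names` and `per_day` are illustrative additions.

```python
import pandas as pd

# Frequency table from the question (rows = weekday, columns = subject)
freqtable = pd.DataFrame(
    [[1, 1, 1, 1, 3],   # Friday
     [1, 1, 1, 1, 3],   # Monday
     [2, 0, 1, 1, 3],   # Thursday
     [0, 1, 1, 1, 4],   # Tuesday
     [0, 1, 0, 0, 6]],  # Wednesday
    index=["Friday", "Monday", "Thursday", "Tuesday", "Wednesday"],
    columns=["Data Mining", "Data Science", "Embedded Systems",
             "Industrial Psychology", "Project"],
)

# Business days (Mon-Fri) in the range; weekends are skipped
days = pd.bdate_range("2022-1-1", "2022-3-31")

# Weekday numbers 0-4 mapped to their names
names = days.weekday.map(
    dict(enumerate(["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]))
)

# Reindexing with repeated labels repeats the matching row once per date
per_day = freqtable.reindex(names)

# Column sums are the number of classes per subject over the whole range
out = per_day.sum().rename_axis("classes").reset_index(name="count")
print(out)
```

The range contains 64 business days (13 each of Monday through Thursday and 12 Fridays), which is where the totals of 51 and 244 come from.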
Posted on 2022-03-06 07:11:29
select_rows = [date.strftime("%A") for date in pd.bdate_range(self.start, self.end)]
r = self.p.loc[select_rows, :]
print(r.to_string())
print(r.sum())
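A possible variation, not taken from either answer: rather than repeating one row per date, count how often each weekday occurs in the range and use those counts as weights. This is a sketch that assumes the frequency table from the question; `day_counts` and `totals` are illustrative names.

```python
import pandas as pd

# Frequency table from the question (rows = weekday, columns = subject)
freqtable = pd.DataFrame(
    [[1, 1, 1, 1, 3],   # Friday
     [1, 1, 1, 1, 3],   # Monday
     [2, 0, 1, 1, 3],   # Thursday
     [0, 1, 1, 1, 4],   # Tuesday
     [0, 1, 0, 0, 6]],  # Wednesday
    index=["Friday", "Monday", "Thursday", "Tuesday", "Wednesday"],
    columns=["Data Mining", "Data Science", "Embedded Systems",
             "Industrial Psychology", "Project"],
)

# Number of occurrences of each weekday among the business days in the range
days = pd.bdate_range("2022-01-01", "2022-03-31")
day_counts = days.day_name().value_counts()

# Multiply each weekday row by its count (aligned on weekday name), then sum
totals = freqtable.mul(day_counts, axis=0).sum()
print(totals)
```

Note that bdate_range only skips weekends by default, so public holidays and other days without classes would still need to be removed from `days` before counting.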
Feel free to add simpler code; design suggestions are also appreciated!
https://stackoverflow.com/questions/71364481