I have a dataset like this:
import pandas as pd

df = pd.DataFrame({
    'scientist': ["Wendelaar Bonga", " Sjoerd E.", "Grätzel", " Michael", "Willett", "Walter C.",
                  "Kessler", "Ronald C.", "Witten, Edward", "Wang, Zhong Lin"],
    'SubjectField': ["Biomedical Engineering", "Inorganic & Nuclear Chemistry",
                     "Organic Chemistry", "Biomedical Engineering", "Developmental Biology",
                     "Mechanical Engineering & Transports", "Biomedical Engineering", "Microbiology",
                     "Cardiovascular System & Hematology", "Biomedical Engineering"],
})
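I want to count the number of scientists in each subject field and extract the subject fields that have more than 2 scientists. This is the code I use to count the scientists: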
number_of_scientists_in_fields = df.groupby(['SubjectField'])['scientist'].count()
How can I extract the subject fields that have more than 2 scientists?
Posted on 2020-12-28 01:54:43
Use value_counts, like this:
fields = df.value_counts('SubjectField').to_frame('count')
res = fields[fields['count'] > 2]
print(res)
Output:
                        count
SubjectField
Biomedical Engineering      4
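For pandas versions that predate DataFrame.value_counts, the same idea can be sketched with Series.value_counts on the column itself. This is only a minimal sketch using the data from the question; the names counts, min_scientists and popular are made up for illustration.

```python
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({
    "scientist": ["Wendelaar Bonga", " Sjoerd E.", "Grätzel", " Michael", "Willett",
                  "Walter C.", "Kessler", "Ronald C.", "Witten, Edward", "Wang, Zhong Lin"],
    "SubjectField": ["Biomedical Engineering", "Inorganic & Nuclear Chemistry",
                     "Organic Chemistry", "Biomedical Engineering", "Developmental Biology",
                     "Mechanical Engineering & Transports", "Biomedical Engineering",
                     "Microbiology", "Cardiovascular System & Hematology",
                     "Biomedical Engineering"],
})

# Count how many rows (scientists) fall into each subject field.
counts = df["SubjectField"].value_counts()

# Hypothetical threshold name; keep fields with strictly more than this many scientists.
min_scientists = 2
popular = counts[counts > min_scientists]
print(popular)
```

Changing min_scientists is enough to adjust the cutoff without touching the rest of the code.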
Posted on 2020-12-28 02:31:35
Another approach, probably not as good as Dani's, could be something like this:
> df2 = df[df.SubjectField.duplicated(keep=False)]
> df2.groupby('SubjectField').count()
                        scientist
SubjectField
Biomedical Engineering          4
Note, however, that this example keeps fields with 2 or more scientists (not strictly more than 2).
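To make that difference visible, here is a small sketch on made-up toy data (the frame toy and its values are hypothetical, not from the question) where one field has exactly two scientists. The duplicated approach keeps that field, while comparing the counts with > 2 drops it.

```python
import pandas as pd

# Hypothetical toy data: field "A" has 3 rows, "B" has 2, "C" has 1.
toy = pd.DataFrame({
    "scientist": ["s1", "s2", "s3", "s4", "s5", "s6"],
    "SubjectField": ["A", "A", "A", "B", "B", "C"],
})

# duplicated(keep=False) marks every row whose field occurs at least twice.
at_least_two = toy[toy["SubjectField"].duplicated(keep=False)]
print(at_least_two.groupby("SubjectField")["scientist"].count())  # fields A and B

# Comparing the per-field counts directly gives the strict "more than 2" result.
counts = toy.groupby("SubjectField")["scientist"].count()
more_than_two = counts[counts > 2]
print(more_than_two)  # field A only
```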
Posted on 2020-12-28 04:02:35
You can simply create a Series and then filter it with > 2:
In [2554]: x = df.groupby('SubjectField')['scientist'].count()
In [2559]: ans = x[x > 2]
In [2560]: ans
Out[2560]:
SubjectField
Biomedical Engineering 4
Name: scientist, dtype: int64
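If what you need is the field names themselves, or the original rows belonging to those fields, the filtered Series can be taken one step further. The sketch below reuses the question's data; field_names and rows are illustrative names, and the use of GroupBy.filter is an alternative I am suggesting, not part of the answer above.

```python
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({
    "scientist": ["Wendelaar Bonga", " Sjoerd E.", "Grätzel", " Michael", "Willett",
                  "Walter C.", "Kessler", "Ronald C.", "Witten, Edward", "Wang, Zhong Lin"],
    "SubjectField": ["Biomedical Engineering", "Inorganic & Nuclear Chemistry",
                     "Organic Chemistry", "Biomedical Engineering", "Developmental Biology",
                     "Mechanical Engineering & Transports", "Biomedical Engineering",
                     "Microbiology", "Cardiovascular System & Hematology",
                     "Biomedical Engineering"],
})

# Per-field counts, filtered to strictly more than 2 scientists.
counts = df.groupby("SubjectField")["scientist"].count()

# The index of the filtered Series holds the field names.
field_names = counts[counts > 2].index.tolist()
print(field_names)

# GroupBy.filter keeps the original rows of every group for which the function returns True.
rows = df.groupby("SubjectField").filter(lambda g: len(g) > 2)
print(rows)
```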
https://stackoverflow.com/questions/65468699