Hello, I have a DataFrame such as:
Groups COL1
G1 Seq:1
G1 Seq:2
G1 Seq_1
G1 Seq:4
G2 Seq_2
G2 Seq_3
G2 Seq_4
G3 Seq:5
G3 Seq:6
G4 Seq:7
G4 Seq_5
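I would like to count how many Groups contain only Seq: values, how many contain both Seq: and Seq_ values, and how many contain only Seq_ values.
Does anyone have an idea? I think I should use re.sub and then compute the totals for each Groups value in pandas?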
Answered on 2020-11-10 09:37:31
You can build a boolean mask with pd.Series.str.contains, then count the groups with GroupBy.all and GroupBy.any:
om = df['COL1'].str.contains(':')
one = om.groupby(df['Groups']).all().sum() # 1
two = om.groupby(df['Groups']).any().sum() - one # 2
# Subtract `one` because `any` also counts the groups where every value
# is True, and those groups are already counted in `one`.
three = (~om).groupby(df['Groups']).all().sum() # 1
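For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The DataFrame is rebuilt from the sample in the question, and the variable names only_colon, mixed and only_underscore are my own labels, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Groups": ["G1", "G1", "G1", "G1", "G2", "G2", "G2",
               "G3", "G3", "G4", "G4"],
    "COL1": ["Seq:1", "Seq:2", "Seq_1", "Seq:4", "Seq_2", "Seq_3", "Seq_4",
             "Seq:5", "Seq:6", "Seq:7", "Seq_5"],
})

# True where the value contains a colon (literal match, no regex needed).
om = df["COL1"].str.contains(":", regex=False)
grouped = om.groupby(df["Groups"])

# Groups where every row contains ":".
only_colon = int(grouped.all().sum())
# Groups with at least one ":" row, minus those where all rows have ":".
mixed = int(grouped.any().sum()) - only_colon
# Groups where no row contains ":".
only_underscore = int((~om).groupby(df["Groups"]).all().sum())

print(only_colon, mixed, only_underscore)  # prints: 1 2 1
```

Passing regex=False is optional here, since ":" has no special meaning in a regular expression, but it makes the literal match explicit.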
Answered on 2020-11-10 09:25:30
Use Series.str.contains to build a mask, then use numpy.setdiff1d to compare the Groups values against those selected by DataFrame.loc with the mask or with its inverse (~):
import numpy as np
m = df['COL1'].str.contains(':')
a = np.setdiff1d(df['Groups'], df.loc[~m, 'Groups']).tolist()
print (a)
['G3']
c = np.setdiff1d(df['Groups'], df.loc[m, 'Groups']).tolist()
print (c)
['G2']
b = np.setdiff1d(df.loc[~m, 'Groups'], c).tolist()
print (b)
['G1', 'G4']
And for the counts, get the lengths of the lists:
print (len(a))
print (len(b))
print (len(c))
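A runnable sketch of this set-based approach on the question's sample data is below. Note one deliberate variation that is mine and not from the answer above: the mixed groups are computed with numpy.intersect1d (groups appearing on both sides of the mask) instead of a second setdiff1d call. The names only_colon, mixed, only_underscore and counts are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Groups": ["G1", "G1", "G1", "G1", "G2", "G2", "G2",
               "G3", "G3", "G4", "G4"],
    "COL1": ["Seq:1", "Seq:2", "Seq_1", "Seq:4", "Seq_2", "Seq_3", "Seq_4",
             "Seq:5", "Seq:6", "Seq:7", "Seq_5"],
})

m = df["COL1"].str.contains(":", regex=False)
with_colon = df.loc[m, "Groups"]      # groups of rows that contain ":"
without_colon = df.loc[~m, "Groups"]  # groups of rows that do not

# Groups that never appear among the rows without ":".
only_colon = np.setdiff1d(df["Groups"], without_colon).tolist()
# Groups that never appear among the rows with ":".
only_underscore = np.setdiff1d(df["Groups"], with_colon).tolist()
# Groups that appear on both sides of the mask.
mixed = np.intersect1d(with_colon, without_colon).tolist()

counts = {
    "only_colon": len(only_colon),
    "mixed": len(mixed),
    "only_underscore": len(only_underscore),
}
print(only_colon, mixed, only_underscore)
print(counts)
```

Both setdiff1d and intersect1d return sorted unique values, so duplicated group labels in the inputs do not affect the result.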
https://stackoverflow.com/questions/64765989