Consider the following code:
import pandas as pd
# dfs
df_sample = pd.read_csv('...........')
array = ['' , '' , '' ....]
pattern = '|'.join(array)
# get all the rows
print(df_sample.COLUMN_NAME_XXX.str.contains(pattern))
How can I get the contents of the column instead of the True/False values I get now?
I ask because I keep getting output like this:
manipulations.py:17: UserWarning: This pattern has match groups. To actually get the groups, use str.extract.
print(df_sample.COLUMN_NAME_XXX.str.contains(pattern))
0 False
1 True
2 False
3 NaN
4 NaN
...
10942 False
10943 NaN
10944 NaN
10945 NaN
10946 NaN
Name: COLUMN_NAME_XXX, Length: 568743243, dtype: object
Posted on 2021-07-14 21:47:07
Try it with fillna():
m = df_sample.COLUMN_NAME_XXX.str.contains(pattern).fillna(False)
# Finally:
out = df_sample[m]
# OR
out = df_sample.loc[m]
Now, if you print out, you will get the filtered DataFrame.
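To illustrate the filtering above, here is a minimal sketch on a small made-up DataFrame. The sample values and the `keywords` list are assumptions for illustration, not the asker's data. It uses the `na` parameter of `str.contains`, which treats missing values as non-matches much like `.fillna(False)` does, and then selects the column so that you get its contents rather than a boolean mask.

```python
import pandas as pd

# Hypothetical sample data standing in for the CSV in the question
df_sample = pd.DataFrame(
    {"COLUMN_NAME_XXX": ["apple pie", "banana", None, "cherry tart", "plum"]}
)

keywords = ["apple", "cherry"]  # hypothetical search terms
pattern = "|".join(keywords)

# na=False makes missing values count as "no match"
mask = df_sample["COLUMN_NAME_XXX"].str.contains(pattern, na=False)

# Index with the mask to get the matching rows ...
matched_rows = df_sample[mask]
# ... or select only the column to get its matching values
matched_values = df_sample.loc[mask, "COLUMN_NAME_XXX"]

print(matched_values.tolist())
```

`matched_rows` keeps every column of the matching rows, while `matched_values` is a Series holding just the matching strings.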
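Posted on 2021-07-14 21:35:41
You should be able to pass that boolean Series straight back into the DataFrame's indexing operator, for example:
df_sample[df_sample.COLUMN_NAME_XXX.str.contains(pattern)]
This should return all rows that satisfy the condition inside the square brackets. Conditions can be chained using the following forms:
[(condition1) | (condition2)] #OR
[(condition1) & (condition2)] #AND
It seems to map the missing values to False automatically, but if it does not (pandas can instead raise a ValueError about masking with NA / NaN values), you can add that as an extra step by appending .fillna(value=False) to the boolean Series:
df_sample[df_sample.COLUMN_NAME_XXX.str.contains(pattern).fillna(value=False)]
https://stackoverflow.com/questions/68378957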
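The UserWarning quoted in the question is raised when the joined pattern contains parentheses, which the regex engine reads as capture groups. Below is a sketch of one possible way around it, assuming the search terms are meant to match literally: escape each term with `re.escape` before joining, and use `str.extract` with a single deliberate group if the matched text itself is wanted. The sample data and the `terms` list are hypothetical. The last line also shows two conditions combined with `&`.

```python
import re
import pandas as pd

# Hypothetical sample data; the real column comes from the asker's CSV
df_sample = pd.DataFrame(
    {"COLUMN_NAME_XXX": ["foo (bar) baz", "nothing here", None, "has qux"]}
)

terms = ["(bar)", "qux"]  # hypothetical; note the literal parentheses

# Escape each term so "(" and ")" match literally instead of forming a group
pattern = "|".join(re.escape(t) for t in terms)

# Boolean mask, with missing values treated as non-matches
mask = df_sample["COLUMN_NAME_XXX"].str.contains(pattern, na=False)
rows = df_sample[mask]

# One explicit capture group around the whole pattern returns the matched text
matched_text = df_sample["COLUMN_NAME_XXX"].str.extract(f"({pattern})", expand=False)

# Conditions combine with & (AND) or | (OR), each wrapped in parentheses
long_and_matching = df_sample[mask & (df_sample["COLUMN_NAME_XXX"].str.len() > 8)]

print(rows)
print(matched_text)
```

If the terms are intended as real regular expressions rather than literal text, skip the escaping and write any grouping as non-capturing, `(?:...)`, instead.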