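I have a large dataframe in which one column contains several variants of a word. I want to filter rows based on the specific word I am looking for. Sample data is shown below. Here, I want to keep the rows that have the word "create" in the "Resolution" column, not rows where it only appears as part of another word, such as "recreate" or "re-create".
Note: I am only looking for a regex solution that can be applied in str.contains.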
In [4]: df = pd.DataFrame({"Resolution": ["create profile", "recreate profile", "re-create profile", "created profile",
   ...:                                    "re-created profile", "closed outlook and recreated profile",
   ...:                                    "purged outlook processes and created new profile"],
   ...:                    "Product": ["Outlook", "Outlook", "Outlook", "Outlook", "Outlook", "Outlook", "Outlook"]})
In [5]: df
Out[5]:
                                         Resolution  Product
0                                    create profile  Outlook
1                                  recreate profile  Outlook
2                                 re-create profile  Outlook
3                                   created profile  Outlook
4                                re-created profile  Outlook
5              closed outlook and recreated profile  Outlook
6  purged outlook processes and created new profile  Outlook

My attempt:
I have been able to filter for "recreate" and "re-create" (the past tense does not matter):
In [13]: df[df.Resolution.str.contains("(?=.*recreate|re-create)(?=.*profile)")]
Out[13]:
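                             Resolution  Product
1                      recreate profile  Outlook
2                     re-create profile  Outlook
4                    re-created profile  Outlook
5  closed outlook and recreated profile  Outlook

Question: How can I modify the regex so that it keeps only the rows that contain "create" as a word of its own rather than as a substring? Like this: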
                                         Resolution  Product
0                                    create profile  Outlook
3                                   created profile  Outlook
6  purged outlook processes and created new profile  Outlook

Posted on 2019-06-05 07:28:24
Add ~ to invert the condition:
df = df[~df.Resolution.str.contains("(?=.*recreate|re-create)(?=.*profile)")]
print(df)
                                         Resolution  Product
0                                    create profile  Outlook
3                                   created profile  Outlook
6  purged outlook processes and created new profile  Outlook

https://stackoverflow.com/questions/56455922
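
Since the question asks specifically for a regex that works inside str.contains, here is a minimal sketch of one possible positive pattern. It is not part of the answer above; the pattern and the names `pattern`, `mask` and `filtered` are illustrative assumptions. The idea is that `\b` rejects "recreate" (there is no word boundary between "re" and "create"), while a negative lookbehind rejects a "create" that directly follows "re-".

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Resolution": [
        "create profile",
        "recreate profile",
        "re-create profile",
        "created profile",
        "re-created profile",
        "closed outlook and recreated profile",
        "purged outlook processes and created new profile",
    ],
    "Product": ["Outlook"] * 7,
})

# Illustrative pattern (an assumption, not the original answer's method):
#   \bcreate  -> "create" must start at a word boundary, so "recreate" fails
#   (?<!re-)  -> that "create" must not be directly preceded by "re-"
pattern = r"(?<!re-)\bcreate"

# str.contains searches anywhere in the string, like re.search.
mask = df["Resolution"].str.contains(pattern, regex=True)
filtered = df[mask]
print(filtered)
```

On the sample data this keeps rows 0, 3 and 6. Unlike the inverted filter, which also keeps any row that never mentions "create" at all, this pattern only keeps rows where "create" or "created" actually occurs. If the row must also mention "profile", a lookahead such as `(?=.*profile)` could be appended to the pattern.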