如何为Pandas dataframe实现“in”和“not in”

内容来源于 Stack Overflow,并遵循CC BY-SA 3.0许可协议进行翻译与使用

  • 回答 (2)
  • 关注 (0)
  • 查看 (6026)

如何实现SQL的等价物INNOT IN?

我有一个包含所需值的列表。下面是一个场景:

df = pd.DataFrame({'countries':['US','UK','Germany','China']})
countries = ['UK','China']

# pseudo-code:
df[df['countries'] not in countries]

我目前的做法如下:

df = pd.DataFrame({'countries':['US','UK','Germany','China']})
countries = pd.DataFrame({'countries':['UK','China'], 'matched':True})

# IN
df.merge(countries,how='inner',on='countries')

# NOT IN
not_in = df.merge(countries,how='left',on='countries')
not_in = not_in[pd.isnull(not_in['matched'])]

但这似乎是个可怕的传说。有人能改进吗?

提问于
用户回答回答于

“in”用途:something.isin(somewhere)

或“不在”:~something.isin(somewhere)

作为一个有效的例子:

>>> df
  countries
0        US
1        UK
2   Germany
3     China
>>> countries
['UK', 'China']
>>> df.countries.isin(countries)
0    False
1     True
2    False
3     True
Name: countries, dtype: bool
>>> df[df.countries.isin(countries)]
  countries
1        UK
3     China
>>> df[~df.countries.isin(countries)]
  countries
0        US
2   Germany
用户回答回答于

我通常对这样的行进行泛型过滤:

criterion = lambda row: row['countries'] not in countries
not_in = df[df.apply(criterion, axis=1)]

扫码关注云+社区

领取腾讯云代金券