pandas get rows which are NOT in other dataframe

Stack Overflow user

Asked 2015-03-06 23:10:28

16 answers, 322.2K views, 0 followers, 325 votes

I have two pandas DataFrames which have some rows in common.

Suppose dataframe2 is a subset of dataframe1.

How can I get the rows of dataframe1 which are not in dataframe2?

import pandas as pd

df1 = pd.DataFrame(data={'col1': [1, 2, 3, 4, 5], 'col2': [10, 11, 12, 13, 14]})
df2 = pd.DataFrame(data={'col1': [1, 2, 3], 'col2': [10, 11, 12]})

df1

   col1  col2
0     1    10
1     2    11
2     3    12
3     4    13
4     5    14

df2

   col1  col2
0     1    10
1     2    11
2     3    12

Expected result:

   col1  col2
3     4    13
4     5    14

python

pandas

dataframe

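16 Answers

Stack Overflow user

Accepted answer

Answered 2015-03-06 23:35:39

One method is to store the result of an inner merge of the two dfs, then select the rows of df1 whose column values do not appear in that common set: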

In [119]:

common = df1.merge(df2,on=['col1','col2'])
print(common)
df1[(~df1.col1.isin(common.col1))&(~df1.col2.isin(common.col2))]
   col1  col2
0     1    10
1     2    11
2     3    12
Out[119]:
   col1  col2
3     4    13
4     5    14

EDIT

Another method, as you've found, is to use isin, which produces NaN rows that you can then drop:

In [138]:

df1[~df1.isin(df2)].dropna()
Out[138]:
   col1  col2
3     4    13
4     5    14

However, if the rows of df2 do not line up with those of df1 in the same way, this won't work:

df2 = pd.DataFrame(data = {'col1' : [2, 3,4], 'col2' : [11, 12,13]})

This returns the entire df:

In [140]:

df1[~df1.isin(df2)].dropna()
Out[140]:
   col1  col2
0     1    10
1     2    11
2     3    12
3     4    13
4     5    14
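
The reason this fails appears to be that DataFrame.isin, when given another DataFrame, only reports a match where both the index label and the column label agree. The sketch below illustrates that under the same data; the names shifted and realigned are made up for illustration and are not part of the answer.

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': [10, 11, 12, 13, 14]})
shifted = pd.DataFrame({'col1': [2, 3, 4], 'col2': [11, 12, 13]})

# isin with a DataFrame argument aligns on index and column labels,
# so row 0 of df1 is compared with row 0 of `shifted`, and so on.
mask = df1.isin(shifted)
print(mask)

# Hypothetical fix-up: give `shifted` the index labels of the df1 rows it equals.
realigned = shifted.copy()
realigned.index = [1, 2, 3]
mask_realigned = df1.isin(realigned)
print(mask_realigned)
```

With the default 0, 1, 2 labels nothing lines up, so every cell is masked out and dropna() has nothing to drop. Once the labels agree, the three shared rows are detected.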

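237 votes

Stack Overflow user

Answered 2017-11-04 11:46:13

The currently selected solution produces incorrect results. To solve this problem correctly, we can perform a left join from df1 to df2, making sure to first take only the unique rows of df2.

First, we need to modify the original DataFrame by adding a row with the data 3, 10.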

df1 = pd.DataFrame(data = {'col1' : [1, 2, 3, 4, 5, 3], 
                           'col2' : [10, 11, 12, 13, 14, 10]}) 
df2 = pd.DataFrame(data = {'col1' : [1, 2, 3],
                           'col2' : [10, 11, 12]})

df1

   col1  col2
0     1    10
1     2    11
2     3    12
3     4    13
4     5    14
5     3    10

df2

   col1  col2
0     1    10
1     2    11
2     3    12

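Perform a left join, eliminating duplicates in df2 so that each row of df1 joins with at most one row of df2. Use the parameter indicator to return an extra column, _merge, indicating which table each row came from.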

df_all = df1.merge(df2.drop_duplicates(), on=['col1','col2'], 
                   how='left', indicator=True)
df_all

   col1  col2     _merge
0     1    10       both
1     2    11       both
2     3    12       both
3     4    13  left_only
4     5    14  left_only
5     3    10  left_only

Create a boolean condition:

df_all['_merge'] == 'left_only'

0    False
1    False
2    False
3     True
4     True
5     True
Name: _merge, dtype: bool
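
To finish the step above, the boolean condition can be used to filter the rows. The following is a minimal sketch, assuming both frames share the same column names; the helper name rows_not_in is hypothetical. Because merge returns a fresh 0..n-1 index, the mask is applied positionally to df1 so that its original index labels are kept.

```python
import pandas as pd

def rows_not_in(left, right):
    """Return rows of `left` that have no identical row in `right` (hypothetical helper)."""
    cols = list(left.columns)
    merged = left.merge(right[cols].drop_duplicates(), on=cols,
                        how='left', indicator=True)
    # A left merge against de-duplicated rows keeps the order and length of `left`,
    # so the mask can be applied by position to recover left's own index labels.
    mask = (merged['_merge'] == 'left_only').to_numpy()
    return left[mask]

df1 = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 3],
                    'col2': [10, 11, 12, 13, 14, 10]})
df2 = pd.DataFrame({'col1': [1, 2, 3],
                    'col2': [10, 11, 12]})

result = rows_not_in(df1, df2)
print(result)
```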

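Why other solutions are wrong

A few solutions make the same mistake: they only check that each value appears independently in each column, not together in the same row. Adding the last row, which is unique but has a value from each of df2's two columns, exposes the mistake: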

common = df1.merge(df2,on=['col1','col2'])
(~df1.col1.isin(common.col1))&(~df1.col2.isin(common.col2))
0    False
1    False
2    False
3     True
4     True
5    False
dtype: bool

This solution gets the same wrong result:

df1.isin(df2.to_dict('list')).all(axis=1)
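
By contrast, a check that treats each row as a single unit avoids the problem. One possible sketch, which is not taken from the answers here, builds a MultiIndex from the columns of each frame and tests membership on whole rows:

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 3],
                    'col2': [10, 11, 12, 13, 14, 10]})
df2 = pd.DataFrame({'col1': [1, 2, 3],
                    'col2': [10, 11, 12]})

cols = ['col1', 'col2']
# Each row becomes one (col1, col2) key, so (3, 10) only matches
# if df2 contains that exact pair in a single row.
keys1 = pd.MultiIndex.from_frame(df1[cols])
keys2 = pd.MultiIndex.from_frame(df2[cols])
row_in_df2 = keys1.isin(keys2)

result = df1[~row_in_df2]
print(result)
```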

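288 votes

Stack Overflow user

Answered 2017-06-02 07:56:55

Assuming that the indexes are consistent across the dataframes (without taking the actual column values into account):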

df1[~df1.index.isin(df2.index)]
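
This compares index labels only, so it matches a row-by-row comparison only when equal labels really do mean equal rows. A small sketch of that caveat follows; df2_other is a made-up frame whose values differ from df1 but whose labels overlap.

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': [10, 11, 12, 13, 14]})
df2 = pd.DataFrame({'col1': [1, 2, 3], 'col2': [10, 11, 12]})

# df2 shares index labels 0, 1, 2 with df1, so those rows are dropped.
by_index = df1[~df1.index.isin(df2.index)]
print(by_index)

# Hypothetical frame with different values but the same labels 0, 1, 2.
df2_other = pd.DataFrame({'col1': [7, 8, 9], 'col2': [70, 80, 90]})
# The same three rows are dropped, even though none of them appear in df2_other.
by_index_other = df1[~df1.index.isin(df2_other.index)]
print(by_index_other)
```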

99 votes

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/28901683
