基于列值组合获得所有单元格值的最佳方法是什么?
样本数据一:
Stock Name Price
0 AMD Advanced Micro Devices 100
1 GE General Electric Company 200
2 BAC Bank of America Corporation 300
3 AAPL Apple Inc. 500
4 MSFT Microsoft Corporation 1000
5 GOOGL Alphabet Inc. 2000
样本数据二:
Stock Name Price
0 AMD Advanced Micro Devices 100
1 GE General Electric Company 200
2 BAC Branch of America Corporation 300
3 AAPL Apple Inc. 500
4 MSFT Microsoft Corporation 1000
5 GOOGL Alphabet Inc. 2000
例如:我想使用(Stock和Name)作为键列,然后比较数据集。目的是将两个数据集之间的不匹配项与用作组合键的Stock+Name列打印出来。
我正在使用Pandas/Python3.7 3.7
样本输出:
美国银行300 .美国银行300分行
发布于 2018-11-20 13:45:43
也许,使用merge
+ query
实现完整的内部连接
df1.merge(df2, on='Stock').query('Name_x != Name_y')
Stock Name_x Price_x Name_y Price_y
2 BAC Bank of America Corporation 300 Branch of America Corporation 300
或者,一个与map
稍有不同的解决方案,您可以使用它获取股票符号:
m = df1.Stock.map(df2.set_index('Stock').Name).ne(df1.Name)
symbols = df1.loc[m, 'Stock']
print(symbols)
2 BAC
Name: Stock, dtype: object
然后按股票代码访问每个DataFrame行:
df1[df1.Stock.isin(symbols)]
Stock Name Price
2 BAC Bank of America Corporation 300
df2[df2.Stock.isin(symbols)]
Stock Name Price
2 BAC Branch of America Corporation 300
发布于 2018-11-20 13:52:33
如果它们位于两个数据流中,那么不附加条件地将它们合并到.concat
中是非常简单的。一旦它们被加入,这里有一种方法可以得到不匹配:
import pandas as pd
df1 = pd.DataFrame({
"Ticker_y": list("qwerty"),
"Name_y": list("asdfgh"),
"Ticker_x": list("qw3r7y"),
"Name_x": list("as6f8h")
})
mismatch = df1[(df1["Ticker_y"] != df1["Ticker_x"]) & (df1["Name_y"] != df1["Name_x"])]
最后一行只是说“df只满足这些条件”。
发布于 2018-11-20 20:23:54
我们可以使用isin
使用值序列进行测试,因为它确保DataFrame中的每个元素都包含在值中。
First DataFrame
>>> df1
Stock Name Price
0 AMD Advanced Micro Devices 100
1 GE General Electric Company 200
2 BAC Bank of America Corporation 300
3 APPL Apple Inc. 500
4 MSFT Microsoft Corporation 1000
5 GOOGL Alphabet Inc. 2000
第二DataFrame
>>> df2
Stock Name Price
0 AMD Advanced Micro Devices 100
1 GE General Electric Company 200
2 BAC Branch of America Corporation 300
3 APPL Apple Inc. 500
4 MSFT Microsoft Corporation 1000
5 GOOGL Alphabet Inc. 2000
你可以走了。
>>> df2[~df2.Name.isin(df1.Name.values)]
Stock Name Price
2 BAC Branch of America Corporation 300
或
>>> df1[~df1.Name.isin(df2.Name.values)]
Stock Name Price
2 BAC Bank of America Corporation 300
https://stackoverflow.com/questions/53402012
复制