首页
学习
活动
专区
工具
TVP
发布
社区首页 >问答首页 >Python pandas dataframe获取列值的所有组合?

Python pandas dataframe获取列值的所有组合?
EN

Stack Overflow用户
提问于 2018-08-02 03:16:42
回答 4查看 1.4K关注 0票数 3

我有一个熊猫数据帧,看起来像这样:

代码语言:javascript
复制
        colour   points
0         red         1
1      yellow        10
2       black        -3

然后,我尝试执行以下算法:

代码语言:javascript
复制
combos = []
points = []

for i1 in range(len(df)):
  for i2 in range(len(df)):

    colour_main      = df['colour'].values[i1]
    colour_secondary = df['colour'].values[i2]
    combo = colour_main + "_" + colour_secondary

    point1 = df['points'].values[i1]
    point2 = df['points'].values[i2]
    new_points = point1 + point2

    combos.append(combo)
    points.append(new_points)

df_new = pd.DataFrame({'colours': combos,
                       'points': points})

print(df_new)

我想得到所有的组合和求和点:

如果颜色用作主颜色,我想求和他的值,如果颜色用作次要颜色,我想求反方向值

示例:

代码语言:javascript
复制
red_yellow =  1 + (-10) = -9
red_black  =  1 + ( +3) =  4
black_red  = -3 + ( -1) = -4

我现在得到的输出是:

代码语言:javascript
复制
         colours  points
0        red_red       2
1     red_yellow      11
2      red_black      -2
3     yellow_red      11
4  yellow_yellow      20
5   yellow_black       7
6      black_red      -2
7   black_yellow       7
8    blac_kblack      -6

我正在寻找的输出:

代码语言:javascript
复制
red_yellow       -9
red_black         4
yellow_red        9
yellow_black     13
black_red        -4
black_yellow    -13

我不知道如何将我的逻辑应用到这段代码中,而且我打赌有一种更简单的方法可以在不执行两个循环的情况下获得所有组合,但目前,这是我唯一想到的事情。

我想:

当我们像red_red这样获得20个输入colours

  • remove副本时,
  • 获得当之无愧的性能
EN

回答 4

Stack Overflow用户

回答已采纳

发布于 2018-08-02 03:29:58

以下是几个替代方案的timeit比较。

代码语言:javascript
复制
| method             | ms per loop |
|--------------------+-------------|
| alt2               |        2.36 |
| using_concat       |        3.26 |
| using_double_merge |        22.4 |
| orig               |        22.6 |
| alt                |        45.8 |

使用IPython生成timeit结果

代码语言:javascript
复制
In [138]: df = make_df(20)

In [143]: %timeit alt2(df)
100 loops, best of 3: 2.36 ms per loop
In [140]: %timeit orig(df)
10 loops, best of 3: 22.6 ms per loop
In [142]: %timeit alt(df)
10 loops, best of 3: 45.8 ms per loop
In [169]: %timeit using_double_merge(df)
10 loops, best of 3: 22.4 ms per loop
In [170]: %timeit using_concat(df)
100 loops, best of 3: 3.26 ms per loop

代码语言:javascript
复制
import numpy as np
import pandas as pd


def alt(df):
    df['const'] = 1
    result = pd.merge(df, df, on='const', how='outer')
    result = result.loc[(result['colour_x'] != result['colour_y'])]

    result['color'] = result['colour_x'] + '_' + result['colour_y']
    result['points'] = result['points_x'] - result['points_y']
    result = result[['color', 'points']]
    return result

def alt2(df):
     points = np.add.outer(df['points'], -df['points'])
     color = pd.MultiIndex.from_product([df['colour'], df['colour']])
     mask = color.labels[0] != color.labels[1]
     color = color.map('_'.join)
     result = pd.DataFrame({'points':points.ravel(), 'color':color})
     result = result.loc[mask]
     return result

def orig(df):
    combos = []
    points = []

    for i1 in range(len(df)):
        for i2 in range(len(df)):
            colour_main = df['colour'].iloc[i1]
            colour_secondary = df['colour'].iloc[i2]
            if colour_main != colour_secondary:
                combo = colour_main + "_" + colour_secondary

                point1 = df['points'].values[i1]
                point2 = df['points'].values[i2]
                new_points = point1 - point2

                combos.append(combo)
                points.append(new_points)

    return pd.DataFrame({'color':combos, 'points':points})

def using_concat(df):
    """https://stackoverflow.com/a/51641085/190597 (RafaelC)"""
    d = df.set_index('colour').to_dict()['points']
    s = pd.Series(list(itertools.combinations(df.colour, 2)))
    s = pd.concat([s, s.transform(lambda k: k[::-1])])
    v = s.map(lambda k: d[k[0]] - d[k[1]])
    df2 = pd.DataFrame({'comb': s.str.get(0)+'_' + s.str.get(1), 'values': v})
    return df2

def using_double_merge(df):
    """https://stackoverflow.com/a/51641007/190597 (sacul)"""    
    new = (df.reindex(pd.MultiIndex.from_product([df.colour, df.colour]))
           .reset_index()
           .drop(['colour', 'points'], 1)
           .merge(df.set_index('colour'), left_on='level_0', right_index=True)
           .merge(df.set_index('colour'), left_on='level_1', right_index=True))

    new['points_y'] *= -1
    new['sum'] = new.sum(axis=1)
    new = new[new.level_0 != new.level_1].drop(['points_x', 'points_y'], 1)
    new['colours'] = new[['level_0', 'level_1']].apply(lambda x: '_'.join(x),1)
    return new[['colours', 'sum']]

def make_df(N):
    df = pd.DataFrame({'colour': np.arange(N), 
                       'points': np.random.randint(10, size=N)})
    df['colour'] = df['colour'].astype(str)
    return df

alt2的主要思想是使用np.add_outerdf['points']之外构造一个加法表

代码语言:javascript
复制
In [149]: points = np.add.outer(df['points'], -df['points'])
In [151]: points
Out[151]: 
array([[  0,  -9,   4],
       [  9,   0,  13],
       [ -4, -13,   0]])

ravel用于使数组成为一维的:

代码语言:javascript
复制
In [152]: points.ravel()
Out[152]: array([  0,  -9,   4,   9,   0,  13,  -4, -13,   0])

颜色组合是用pd.MultiIndex.from_product生成的

代码语言:javascript
复制
In [153]: color = pd.MultiIndex.from_product([df['colour'], df['colour']])
In [155]: color = color.map('_'.join)
In [156]: color
Out[156]: 
Index(['red_red', 'red_yellow', 'red_black', 'yellow_red', 'yellow_yellow',
       'yellow_black', 'black_red', 'black_yellow', 'black_black'],
      dtype='object')

生成掩码以删除重复项:

代码语言:javascript
复制
mask = color.labels[0] != color.labels[1]

然后从这些部分生成result

代码语言:javascript
复制
 result = pd.DataFrame({'points':points.ravel(), 'color':color})
 result = result.loc[mask]

在我的original answer, here中解释了alt背后的想法。

票数 4
EN

Stack Overflow用户

发布于 2018-08-02 03:31:01

这有点冗长,但可以得到您想要的输出:

代码语言:javascript
复制
new = (df.reindex(pd.MultiIndex.from_product([df.colour, df.colour]))
       .reset_index()
       .drop(['colour', 'points'], 1)
       .merge(df.set_index('colour'), left_on='level_0', right_index=True)
       .merge(df.set_index('colour'), left_on='level_1', right_index=True))


new['points_x'] *= -1

new['sum'] = new.sum(axis=1)

new = new[new.level_0 != new.level_1].drop(['points_x', 'points_y'], 1)

new['colours'] = new[['level_0', 'level_1']].apply(lambda x: '_'.join(x),1)


>>> new
  level_0 level_1  sum       colours
3  yellow     red   -9    yellow_red
6   black     red    4     black_red
1     red  yellow    9    red_yellow
7   black  yellow   13  black_yellow
2     red   black   -4     red_black
5  yellow   black  -13  yellow_black
票数 2
EN

Stack Overflow用户

发布于 2018-08-02 03:36:19

代码语言:javascript
复制
d = df.set_index('colour').to_dict()['points']
s = pd.Series(list(itertools.combinations(df.colour, 2)))
s = pd.concat([s, s.transform(lambda k: k[::-1])])
v = s.map(lambda k: d[k[0]] - d[k[1]])
df2= pd.DataFrame({'comb': s.str.get(0)+'_' + s.str.get(1), 'values': v})

    comb             values
0   red_yellow       -9
1   red_black        4
2   yellow_black     13
0   yellow_red       9
1   black_red        -4
2   black_yellow    -13
票数 2
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/51640790

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档