Python pandas数据帧获取列值的所有组合?

内容来源于 Stack Overflow,并遵循CC BY-SA 3.0许可协议进行翻译与使用

  • 回答 (2)
  • 关注 (0)
  • 查看 (107)

我有一个pandas数据框,如下所示:

        colour   points
0         red         1
1      yellow        10
2       black        -3

然后我试着做以下算法:

combos = []
points = []

for i1 in range(len(df)):
  for i2 in range(len(df)):

    colour_main      = df['colour'].values[i1]
    colour_secondary = df['colour'].values[i2]
    combo = colour_main + "_" + colour_secondary

    point1 = df['points'].values[i1]
    point2 = df['points'].values[i2]
    new_points = point1 + point2

    combos.append(combo)
    points.append(new_points)

df_new = pd.DataFrame({'colours': combos,
                       'points': points})

print(df_new)

我想获得所有组合和总和:

  • 如果颜色用作主要我想要总结他的价值
  • 如果颜色用作次要我想要相加的值

例:

red_yellow =  1 + (-10) = -9
red_black  =  1 + ( +3) =  4
black_red  = -3 + ( -1) = -4

我目前获得的输出:

         colours  points
0        red_red       2
1     red_yellow      11
2      red_black      -2
3     yellow_red      11
4  yellow_yellow      20
5   yellow_black       7
6      black_red      -2
7   black_yellow       7
8    blac_kblack      -6

我正在寻找的输出:

red_yellow       -9
red_black         4
yellow_red        9
yellow_black     13
black_red        -4
black_yellow    -13

我不知道如何将我的逻辑应用于此代码,我还打赌有一种更简单的方法可以在不进行两个循环的情况下获得所有组合,但是目前,这是我唯一想到的。

我想要:

  • 得到应得的产出
  • 在我们得到20种输入颜色的情况下改善性能
  • 删除像red_red这样的重复项
提问于
用户回答回答于

以下是timeit一些替代方案的比较。

| method             | ms per loop |
|--------------------+-------------|
| alt2               |        2.36 |
| using_concat       |        3.26 |
| using_double_merge |        22.4 |
| orig               |        22.6 |
| alt                |        45.8 |

timeit使用产生的结果IPython

In [138]: df = make_df(20)

In [143]: %timeit alt2(df)
100 loops, best of 3: 2.36 ms per loop
In [140]: %timeit orig(df)
10 loops, best of 3: 22.6 ms per loop
In [142]: %timeit alt(df)
10 loops, best of 3: 45.8 ms per loop
In [169]: %timeit using_double_merge(df)
10 loops, best of 3: 22.4 ms per loop
In [170]: %timeit using_concat(df)
100 loops, best of 3: 3.26 ms per loop
import numpy as np
import pandas as pd


def alt(df):
    df['const'] = 1
    result = pd.merge(df, df, on='const', how='outer')
    result = result.loc[(result['colour_x'] != result['colour_y'])]

    result['color'] = result['colour_x'] + '_' + result['colour_y']
    result['points'] = result['points_x'] - result['points_y']
    result = result[['color', 'points']]
    return result

def alt2(df):
     points = np.add.outer(df['points'], -df['points'])
     color = pd.MultiIndex.from_product([df['colour'], df['colour']])
     mask = color.labels[0] != color.labels[1]
     color = color.map('_'.join)
     result = pd.DataFrame({'points':points.ravel(), 'color':color})
     result = result.loc[mask]
     return result

def orig(df):
    combos = []
    points = []

    for i1 in range(len(df)):
        for i2 in range(len(df)):
            colour_main = df['colour'].iloc[i1]
            colour_secondary = df['colour'].iloc[i2]
            if colour_main != colour_secondary:
                combo = colour_main + "_" + colour_secondary

                point1 = df['points'].values[i1]
                point2 = df['points'].values[i2]
                new_points = point1 - point2

                combos.append(combo)
                points.append(new_points)

    return pd.DataFrame({'color':combos, 'points':points})

def using_concat(df):
    """https://stackoverflow.com/a/51641085/190597 (RafaelC)"""
    d = df.set_index('colour').to_dict()['points']
    s = pd.Series(list(itertools.combinations(df.colour, 2)))
    s = pd.concat([s, s.transform(lambda k: k[::-1])])
    v = s.map(lambda k: d[k[0]] - d[k[1]])
    df2 = pd.DataFrame({'comb': s.str.get(0)+'_' + s.str.get(1), 'values': v})
    return df2

def using_double_merge(df):
    """https://stackoverflow.com/a/51641007/190597 (sacul)"""    
    new = (df.reindex(pd.MultiIndex.from_product([df.colour, df.colour]))
           .reset_index()
           .drop(['colour', 'points'], 1)
           .merge(df.set_index('colour'), left_on='level_0', right_index=True)
           .merge(df.set_index('colour'), left_on='level_1', right_index=True))

    new['points_y'] *= -1
    new['sum'] = new.sum(axis=1)
    new = new[new.level_0 != new.level_1].drop(['points_x', 'points_y'], 1)
    new['colours'] = new[['level_0', 'level_1']].apply(lambda x: '_'.join(x),1)
    return new[['colours', 'sum']]

def make_df(N):
    df = pd.DataFrame({'colour': np.arange(N), 
                       'points': np.random.randint(10, size=N)})
    df['colour'] = df['colour'].astype(str)
    return df

主要思想alt2是用于np.add_outer构建一个附加表df['points']

In [149]: points = np.add.outer(df['points'], -df['points'])
In [151]: points
Out[151]: 
array([[  0,  -9,   4],
       [  9,   0,  13],
       [ -4, -13,   0]])

ravel 用于使数组1维:

In [152]: points.ravel()
Out[152]: array([  0,  -9,   4,   9,   0,  13,  -4, -13,   0])

并生成颜色组合pd.MultiIndex.from_product

In [153]: color = pd.MultiIndex.from_product([df['colour'], df['colour']])
In [155]: color = color.map('_'.join)
In [156]: color
Out[156]: 
Index(['red_red', 'red_yellow', 'red_black', 'yellow_red', 'yellow_yellow',
       'yellow_black', 'black_red', 'black_yellow', 'black_black'],
      dtype='object')

生成掩码以删除重复项:

mask = color.labels[0] != color.labels[1]

然后result从这些部分生成:

 result = pd.DataFrame({'points':points.ravel(), 'color':color})
 result = result.loc[mask]

alt我的原始答案在这里解释了背后的想法。

用户回答回答于

这有点啰嗦,但可以获得你想要的输出:

new = (df.reindex(pd.MultiIndex.from_product([df.colour, df.colour]))
       .reset_index()
       .drop(['colour', 'points'], 1)
       .merge(df.set_index('colour'), left_on='level_0', right_index=True)
       .merge(df.set_index('colour'), left_on='level_1', right_index=True))


new['points_x'] *= -1

new['sum'] = new.sum(axis=1)

new = new[new.level_0 != new.level_1].drop(['points_x', 'points_y'], 1)

new['colours'] = new[['level_0', 'level_1']].apply(lambda x: '_'.join(x),1)


>>> new
  level_0 level_1  sum       colours
3  yellow     red   -9    yellow_red
6   black     red    4     black_red
1     red  yellow    9    red_yellow
7   black  yellow   13  black_yellow
2     red   black   -4     red_black
5  yellow   black  -13  yellow_black

所属标签

可能回答问题的人

  • 天使的炫翼

    15 粉丝531 提问35 回答
  • 富有想象力的人

    2 粉丝0 提问26 回答
  • 旺仔小小鹿

    社区 · 运营 (已认证)

    48 粉丝0 提问26 回答
  • 发条丶魔灵1

    6 粉丝525 提问25 回答

扫码关注云+社区

领取腾讯云代金券