Q: Generate new rows in Pandas from permutations of other rows

Stack Overflow user

Asked 2020-02-10 14:10:10

2 answers, 177 views, 0 followers, 2 votes

I have a dataframe of race results (14 participants per race) that looks like this:

df = race_id  A0  B0  C0  A1  B1  C1  A2  B2  C2  ...  A13  B13  C13  WINNER
     1        2   3   0   9   1   3   4   5   1   ...  1    2    3    3
     2        1   5   2   7   3   2   8   6   0   ...  6    4    1    9
     .....

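I want to train a multinomial logistic regression model on this data. As the data stands, however, the model will be sensitive to the order in which the participants appear. For example, if the model is given the record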

race_id  A0  B0  C0  A1  B1  C1  A2  B2  C2  ...  A13  B13  C13  WINNER
3        9   1   3   2   3   0   4   5   1   ...  1    2    3    3

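which merely swaps the features of participant 0 with those of participant 1 in race 1, the model will output a different prediction for the winner even though the input is the same.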

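So, for every race in the data, I would like to generate 100 random permutations that keep the same winner, in order to train the model to be robust to permutations. How can I create these 100 sample permutations for this dataframe (while keeping each participant's A, B, C features together)?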

pandas

machine-learning

python


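2 Answers

Stack Overflow user

Answered 2020-02-10 16:03:39

Before we start: this is not a good way to model race results.

That said, if you want to do it this way, you need to permute and remap the column names, then combine the resulting permutations. First, build the list of participants dynamically by parsing the column names: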

participants = [col[1:] for col in df.columns if col.startswith('A')]
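To see what this comprehension returns, here is a minimal sketch on a made-up frame. The name `toy` and its values are purely illustrative, with 3 participants instead of 14. The point worth noticing is that the participant ids come back as strings, while `WINNER` holds integers, which matters for any later lookup.

```python
import pandas as pd

# Hypothetical toy frame: 3 participants instead of 14, made-up values
toy = pd.DataFrame(
    [[1, 2, 3, 0, 9, 1, 3, 4, 5, 1, 1]],
    columns=["race_id", "A0", "B0", "C0", "A1", "B1", "C1",
             "A2", "B2", "C2", "WINNER"],
)

# Same comprehension as above: strip the leading feature letter from the A columns
toy_participants = [col[1:] for col in toy.columns if col.startswith("A")]
print(toy_participants)  # ['0', '1', '2']

# The ids are strings, whereas WINNER is an integer column
print(type(toy_participants[0]).__name__, toy["WINNER"].dtype)
```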

Then iterate over permutations of these participants and apply the column-name remapping:

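import random

import pandas as pd

# All 14! orderings are far too many to enumerate, so draw random ones instead
N_PERMUTATIONS = 100

# Collect the permuted races, then concatenate once at the end
races = []
for _ in range(N_PERMUTATIONS):

  # Draw one random ordering of the participants
  permutation = random.sample(participants, len(participants))

  # Create the mapping of participants from the permutation
  mapping = dict(zip(participants, permutation))

  # From the participant mapping, create a column mapping.
  # Match the whole numeric suffix so that '3' does not also match 'A13'.
  columns = {col: col[0] + mapping[col[1:]]
             for col in df.columns if col[1:] in mapping}

  # Remap column names
  race = df.rename(columns=columns)

  # Reassign the winner based on the mapping
  # (WINNER holds integers, while the mapping keys are strings)
  race['WINNER'] = race['WINNER'].map(lambda w: int(mapping[str(w)]))

  # Collect the races
  races.append(race)

# concat aligns on column labels; restore the original column order afterwards
races = pd.concat(races, ignore_index=True)[df.columns]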
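If renaming columns for every permutation turns out to be slow, the same idea can be expressed on the underlying array. The following is only a sketch under stated assumptions, not part of the answer above: `permute_races` is a hypothetical helper, the feature columns are assumed to be laid out as A0, B0, C0, A1, B1, C1, ... between `race_id` and `WINNER`, and `WINNER` is assumed to hold the zero-based slot of the winning participant.

```python
import numpy as np
import pandas as pd


def permute_races(df, n_perm=100, n_feat=3, seed=0):
    """Return n_perm shuffled copies of every race in df (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    feat_cols = [c for c in df.columns if c not in ("race_id", "WINNER")]
    n_part = len(feat_cols) // n_feat

    # Shape (n_races, n_part, n_feat): one block of features per participant
    feats = df[feat_cols].to_numpy().reshape(len(df), n_part, n_feat)
    feats = np.repeat(feats, n_perm, axis=0)
    winners = np.repeat(df["WINNER"].to_numpy(), n_perm)

    # order[r, new_slot] = old slot of the participant placed at new_slot
    order = np.argsort(rng.random((len(feats), n_part)), axis=1)
    shuffled = np.take_along_axis(feats, order[:, :, None], axis=1)

    # The new winner is the position of the old winner inside each ordering
    # (assumes WINNER is a valid zero-based slot index)
    new_winners = np.argmax(order == winners[:, None], axis=1)

    out = pd.DataFrame(shuffled.reshape(len(feats), -1), columns=feat_cols)
    out.insert(0, "race_id", np.repeat(df["race_id"].to_numpy(), n_perm))
    out["WINNER"] = new_winners
    return out


# Made-up example: one race with 3 participants, participant 1 wins
toy = pd.DataFrame(
    [[1, 2, 3, 0, 9, 1, 3, 4, 5, 1, 1]],
    columns=["race_id", "A0", "B0", "C0", "A1", "B1", "C1",
             "A2", "B2", "C2", "WINNER"],
)
augmented = permute_races(toy, n_perm=5)
print(augmented)
```

Sorting a matrix of random numbers row by row gives an independent random ordering per output row, so whole participant blocks move together and the winner label follows its block. If `WINNER` in the real data is one-based or refers to something other than a slot index, the winner lookup would need adjusting.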

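1 vote

Stack Overflow user

Answered 2020-02-10 18:34:37

Here is an option that fills a dataframe with permutations of triplets, where df is the dataframe (I left out the mapping of the winner column; see the chunkwise implementation).

Note that rand_row is just a random row I created for the sake of the example. It is filled with integers from 1 to 9 (random.randrange(1, 10) excludes the upper bound), similar to the given dataframe, and the resulting row has 40 columns (1 for the race id plus 13*3 for the participants). You can of course change that: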

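import random

def chunkwise(t, size=2):
    # Split an iterable into consecutive tuples of `size` items
    it = iter(t)
    return zip(*[it] * size)

def fill(df, size):
    # Example row: 13 participants x 3 features, values drawn from 1 to 9
    rand_row = [random.randrange(1, 10) for _ in range(13 * 3)]
    triplets = list(chunkwise(rand_row, 3))
    for i in range(size):
        # Shuffle whole triplets so each participant's features stay together
        shuffled = random.sample(triplets, len(triplets))
        flattened = [item for triplet in shuffled for item in triplet]
        # df must already have 40 columns: the race id plus 13 * 3 features
        df.loc[i] = [i + 1] + flattened
    return df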
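The answer above leaves out the winner mapping. One way to add it, sketched here under assumptions, is to shuffle slot indices instead of the triplets themselves and then look up where the old winner landed. `shuffle_race` is a hypothetical helper, and `winner` is assumed to be the zero-based slot of the winning participant.

```python
import random


def shuffle_race(features, winner, size=3, rng=random):
    """Shuffle the participant chunks of one race.

    Hypothetical helper: `features` is the flat A0, B0, C0, A1, ... list and
    `winner` is assumed to be the zero-based slot of the winning participant.
    Returns (shuffled_features, new_winner_slot).
    """
    chunks = [tuple(features[i:i + size]) for i in range(0, len(features), size)]
    # order[new_slot] = old_slot
    order = rng.sample(range(len(chunks)), len(chunks))
    flattened = [item for old in order for item in chunks[old]]
    # The winner moves to wherever its old slot ended up in the new ordering
    return flattened, order.index(winner)


# Made-up race with 3 participants; participant 1 (features 9, 1, 3) wins
new_row, new_winner = shuffle_race([2, 3, 0, 9, 1, 3, 4, 5, 1], winner=1)
print(new_row, new_winner)
```

Calling this 100 times per race and appending each result (with the race id in front and the new winner at the end) would give the requested samples.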