Q: Custom function to get the position of values in a DataFrame

Stack Overflow user

Asked 2018-06-25 02:05:26

Answers: 1 · Views: 117 · Followers: 0 · Votes: 0

This question is similar to my previous one, but this time I have a DataFrame:

Index   Batch   Name    List Name
0        1      Jon     Adam
1           
2        2      Adam    Sam
3                       Chris
4        3      Voges   Jon
5           
6        4      Jon     Voges

I want to look up the batch number for each value in List Name, i.e. Adam, Sam, Chris, Jon and Voges. I want another DataFrame like the one below:

Index   Batch   Name    List Name   BatchNames
0        1      Jon     Adam        Adam(2)
1
2        2      Adam    Sam         Sam(2)
3                       Chris       Chris(2)
4        3      Voges   Jon         Jon(1,4)
5
6        4      Jon     Voges       Voges(3)

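I want to take each List Name value and search for its corresponding batch numbers in Name; for example, Jon appears in batches 1 and 4, and so on. But if a name in List Name does not exist in Name, it should take the batch number it is closest to: for example, Sam does not exist in Name, but it sits next to Batch 2, and the same goes for Chris. Basically, anything that falls between two batches belongs to the lower batch number. How can I write a custom function for this?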
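
For reference, the "anything between two batches belongs to the lower batch number" rule can be read as a forward fill of the Batch column. The snippet below is only a minimal sketch of that reading, assuming the blank cells are stored as NaN; the variable names `batch` and `filled` are illustrative.

```python
import numpy as np
import pandas as pd

# Batch column from the example, with blank rows stored as NaN (assumption).
batch = pd.Series([1, np.nan, 2, np.nan, 3, np.nan, 4], name="Batch")

# Each blank row inherits the batch number of the closest row above it.
filled = batch.ffill().astype(int)
print(filled.tolist())  # [1, 1, 2, 2, 3, 3, 4]
```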

python

dataframe

1 Answer

Stack Overflow user

Accepted answer

Answered 2018-06-25 05:31:37

I would do it like this:

import pandas as pd
import numpy as np

def custom_function(df):
    # Keep the original Batch column, then forward fill the batch number.
    # Assigning the result back avoids relying on an inplace call on df.Batch.
    df_Batch = df.Batch.copy()
    df['Batch'] = df['Batch'].ffill().astype(int)
    # Make a new dataframe where we first get batches for the Name column
    # and then add batches for the List Name column; there will be duplicates, so we keep the first entry.
    # pd.concat is used because Series.append was removed in pandas 2.0.
    a = pd.concat([df.groupby('Name').Batch.apply(tuple), df.groupby('List Name').Batch.apply(tuple)]).reset_index().groupby('index').first()
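    # Create a series which concatenates the List Name and its batch tuple, e.g. "Jon(1,4)".
    # It is named 'Batch' explicitly so that the merge below suffixes it to 'BatchNames'.
    b = pd.Series(a.index.astype(str) + a.Batch.astype(str), index=a.index, name='Batch').replace(',','', regex=True).replace(' ',',',regex=True)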
    # undo the forward fill (replace with original columns)
    df.Batch = df_Batch
    # join the series we just made to the dataframe
    return df.merge(b.to_frame().rename_axis('List Name'), how='left', on='List Name', suffixes=['', 'Names']).fillna('')
df = pd.DataFrame({'Batch':[1,np.nan,2,np.nan,3,np.nan,4], 'Name':['Jon',np.nan, 'Adam',np.nan, 'Voges',np.nan, 'Jon'], 'List Name':['Adam', np.nan, 'Sam', 'Chris', 'Jon', np.nan, 'Voges']})
# Out[122]: 
#    Batch   Name List Name
# 0    1.0    Jon      Adam
# 1    NaN    NaN       NaN
# 2    2.0   Adam       Sam
# 3    NaN    NaN     Chris
# 4    3.0  Voges       Jon
# 5    NaN    NaN       NaN
# 6    4.0    Jon     Voges
custom_function(df)
# Out[131]: 
#   Batch   Name List Name BatchNames
# 0     1    Jon      Adam    Adam(2)
# 1                                  
# 2     2   Adam       Sam     Sam(2)
# 3                  Chris   Chris(2)
# 4     3  Voges       Jon   Jon(1,4)
# 5                                  
# 6     4    Jon     Voges   Voges(3)
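
The same idea can also be written without the string replacements, which may be easier to follow. This is a sketch rather than the answer's method: `add_batch_names`, `by_name` and `label` are hypothetical names, and it assumes that a List Name value missing from Name should take the forward-filled batch of its own row. It also leaves blank Batch and Name cells as NaN and does not modify the input DataFrame.

```python
import numpy as np
import pandas as pd


def add_batch_names(df):
    """Return a copy of df with a BatchNames column such as 'Jon(1,4)'."""
    out = df.copy()
    # Rows without a batch number inherit the closest batch above them.
    filled = out["Batch"].ffill().astype(int)

    # Batches in which each name appears in the Name column.
    by_name = filled.groupby(out["Name"]).agg(list).to_dict()

    def label(name, fallback_batch):
        if pd.isna(name):
            return ""
        # Assumption: a name that never occurs in the Name column
        # falls back to the forward-filled batch of its own row.
        batches = by_name.get(name, [fallback_batch])
        return f"{name}({','.join(str(b) for b in batches)})"

    out["BatchNames"] = [label(n, b) for n, b in zip(out["List Name"], filled)]
    return out


df = pd.DataFrame({
    "Batch": [1, np.nan, 2, np.nan, 3, np.nan, 4],
    "Name": ["Jon", np.nan, "Adam", np.nan, "Voges", np.nan, "Jon"],
    "List Name": ["Adam", np.nan, "Sam", "Chris", "Jon", np.nan, "Voges"],
})
result = add_batch_names(df)
print(result["BatchNames"].tolist())
```

Building the label from a list of batch numbers avoids turning tuples into strings and patching the commas afterwards, so names that contain spaces or commas are left intact.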

Votes: 1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/51012626
