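```python
import pandas as pd

data = pd.DataFrame({'ratio': [0.25, 0.20, 0.45, 0.10],
                     'range': ['1-25', '26-50', '51-75', '76-100']})
degree = pd.DataFrame({'degree': [1, 2, 5, 10, 15, 13, 25, 24, 26, 27, 35, 40, 44, 50, 73, 80]})
```

I need to add a new column based on the interval conditions listed in the 'range' column. For example, if degree = 1, it falls in the range 1-25, so the new column should be 0.25. If degree = 26, it falls in the range 26-50, so the new column should be 0.20.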
I use the code below, which first bins the degrees into intervals with pd.cut and then uses np.select:
```python
import numpy as np

degree_bin = pd.interval_range(start=1, end=101, freq=25, closed='left')
degree['bin'] = pd.cut(degree['degree'], bins=degree_bin)
degree['bin'] = degree['bin'].astype('category')  # pd.cut already returns 'category' dtype; kept for clarity
choicelist = data['ratio'].tolist()
condlist = [degree['bin'] == pd.Interval(1, 26, closed='left'),
            degree['bin'] == pd.Interval(26, 51, closed='left'),
            degree['bin'] == pd.Interval(51, 76, closed='left'),
            degree['bin'] == pd.Interval(76, 101, closed='left'),
            ]
degree['f'] = np.select(condlist, choicelist)
```

But my full dataset has more than 80 intervals, spanning 1 to 2000 with a width of 25. How can I write the condlist more efficiently? I may need to adjust how I use pd.cut so that the generated 'bin' intervals match the ranges in data['range'].
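A minimal sketch of the pd.cut adjustment suggested in the question, assuming the ranges in data['range'] are sorted, contiguous, and written as "low-high" integers: build the bin edges directly from the range strings, call pd.cut with labels=False to get each degree's bin position, and look the ratio up by that position. This removes the hand-written condlist entirely, so it works unchanged for 80+ ranges. The names bounds, edges, codes and ratios are illustrative, not from the original post.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'ratio': [0.25, 0.20, 0.45, 0.10],
                     'range': ['1-25', '26-50', '51-75', '76-100']})
degree = pd.DataFrame({'degree': [1, 2, 5, 10, 15, 13, 25, 24, 26, 27, 35, 40, 44, 50, 73, 80]})

# Split "1-25" into integer lower/upper bounds (assumes every entry is "low-high")
bounds = data['range'].str.split('-', expand=True).astype(int)
lows, highs = bounds[0].to_numpy(), bounds[1].to_numpy()

# Left-closed edges [1, 26), [26, 51), ...: the lower bounds plus (last upper bound + 1).
# Assumes the ranges are sorted and contiguous, as in the example data.
edges = np.append(lows, highs[-1] + 1)

# labels=False returns the integer position of each degree's bin (NaN if outside all bins)
codes = pd.cut(degree['degree'], bins=edges, right=False, labels=False)

# Look up the ratio by bin position; degrees outside every range stay NaN
ratios = data['ratio'].to_numpy()
valid = codes.notna()
degree['f'] = np.nan
degree.loc[valid, 'f'] = ratios[codes[valid].astype(int)]
```

Unlike np.select, whose default is 0, this leaves out-of-range degrees as NaN. If you prefer to keep np.select, the condition list can also be generated from the intervals instead of written by hand, e.g. [degree['bin'] == iv for iv in degree_bin].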
Posted on 2021-01-03 23:31:59

I would use merge_asof:
```python
import pandas as pd

data = pd.DataFrame({'ratio': [0.25, 0.20, 0.45, 0.10],
                     'range': ['1-25', '26-50', '51-75', '76-100']})
degree = pd.DataFrame({'degree': [1, 2, 5, 10, 15, 13, 25, 24, 26, 27, 35, 40, 44, 50, 73, 80]})

# Split each range into its lower and upper bound; the lower bound becomes the join key
min_max = data['range'].str.split("-", n=1, expand=True)
min_max.columns = ['degree', 'max']
data['degree'] = min_max['degree'].astype(int)
degree['degree'] = degree['degree'].astype(int)

# merge_asof requires both frames to be sorted on the key
degree = degree.sort_values(by='degree')
# For each degree, take the row with the largest lower bound <= degree
pd.merge_asof(degree, data, on="degree", direction="backward")
```
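One caveat with direction="backward": merge_asof only compares against the lower bound, so a degree above the last range (for example 150 with this data) would still be matched to 0.10. A minimal sketch that also keeps the upper bound and blanks out such matches, assuming the same "low-high" format; the column names low and high and the variable result are illustrative. Both keys are cast to int64 explicitly so they share the same dtype, which merge_asof requires.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'ratio': [0.25, 0.20, 0.45, 0.10],
                     'range': ['1-25', '26-50', '51-75', '76-100']})
degree = pd.DataFrame({'degree': [1, 2, 5, 10, 15, 13, 25, 24, 26, 27, 35, 40, 44, 50, 73, 80]})

# Lower and upper bound of each range as int64 (same dtype as the left key)
bounds = data['range'].str.split('-', n=1, expand=True).astype('int64')
data['low'] = bounds[0]
data['high'] = bounds[1]
degree['degree'] = degree['degree'].astype('int64')

# Both sides must be sorted on their join keys
result = pd.merge_asof(
    degree.sort_values('degree'),
    data.sort_values('low'),
    left_on='degree', right_on='low', direction='backward',
)

# merge_asof only checked the lower bound; drop matches past the range's upper bound
result.loc[result['degree'] > result['high'], 'ratio'] = np.nan
```

Note that the output comes back sorted by degree rather than in the original row order; keep the original index in a column before sorting if you need to restore it.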

https://stackoverflow.com/questions/65555617