I have the following data, and I am trying to modify a slice of it by iterating over the columns with a for loop.
import pandas as pd

data = {'id':[12, 84, 156, 228, 300, 372, 444, 516, 588, 660, 732],
'age':['18-18', '22-22', '35-35', '33-33', '45-45', '40-40', '55-55', '60-60', '47-47', '25-25', '59-59'],
'height':['175-177', '165-167', '175-178', '165-168', '175-179', '165-169', '175-180', '165-170', '175-181', '165-171', '175-182'],
'weight':['65-70', '65-70', '80-85', '75-80', '90-95', '100-105', '80-85', '70-75', '70-75', '85-90', '90-95'],
'education':['10-12', '11-13', '12-14', '13-15', '14-16', '15-17', '16-18', '17-19', '18-20', '19-21', '20-22'],
'employment':['1-4', '8-11', '8-11', '4-7', '5-8', '5-8', '9-12', '15-18', '13-16', '12-15', '12-15'],
'country':['France-EU', 'Austria-EU', 'Netherland-EU', 'Italy-EU', 'Texas-US', 'California-US', 'Washington-US', 'Poland-EU', 'Spain-EU', 'Greece-EU', 'New York-US'],
'city':['Paris-FR', 'Vienna-AUS', 'Amsterdam-NL', 'Rome-ITA', 'Austin-TX', 'LA-CAL', 'Olympia-WAS', 'Warsaw-PL', 'Madrid-SPA', 'Athens-GR', 'Albany-NY']}
df = pd.DataFrame(data)
for col in df:
    if col == 'id':
        continue
    else:
        df.loc[df['employment'] == '12-15', col] = df[col].str.split('-').str[0]
But I ran into something odd: after the loop runs, it does not seem to have affected all of the columns. I expected this:
# Expected
pd.DataFrame({'id':[12, 84, 156, 228, 300, 372, 444, 516, 588, 660, 732],
'age':['18-18', '22-22', '35-35', '33-33', '45-45', '40-40', '55-55', '60-60', '47-47', '25', '59'],
'height':['175-177', '165-167', '175-178', '165-168', '175-179', '165-169', '175-180', '165-170', '175-181', '165', '175'],
'weight':['65-70', '65-70', '80-85', '75-80', '90-95', '100-105', '80-85', '70-75', '70-75', '85', '90'],
'education':['10-12', '11-13', '12-14', '13-15', '14-16', '15-17', '16-18', '17-19', '18-20', '19', '20'],
'employment':['1-4', '8-11', '8-11', '4-7', '5-8', '5-8', '9-12', '15-18', '13-16', '12', '12'],
'country':['France-EU', 'Austria-EU', 'Netherland-EU', 'Italy-EU', 'Texas-US', 'California-US', 'Washington-US', 'Poland-EU', 'Spain-EU', 'Greece', 'New York'],
'city':['Paris-FR', 'Vienna-AUS', 'Amsterdam-NL', 'Rome-ITA', 'Austin-TX', 'LA-CAL', 'Olympia-WAS', 'Warsaw-PL', 'Madrid-SPA', 'Athens', 'Albany']})
But what I actually got is:
pd.DataFrame({'id':[12, 84, 156, 228, 300, 372, 444, 516, 588, 660, 732],
'age':['18-18', '22-22', '35-35', '33-33', '45-45', '40-40', '55-55', '60-60', '47-47', '25', '59'],
'height':['175-177', '165-167', '175-178', '165-168', '175-179', '165-169', '175-180', '165-170', '175-181', '165', '175'],
'weight':['65-70', '65-70', '80-85', '75-80', '90-95', '100-105', '80-85', '70-75', '70-75', '85', '90'],
'education':['10-12', '11-13', '12-14', '13-15', '14-16', '15-17', '16-18', '17-19', '18-20', '19', '20'],
'employment':['1-4', '8-11', '8-11', '4-7', '5-8', '5-8', '9-12', '15-18', '13-16', '12', '12'],
'country':['France-EU', 'Austria-EU', 'Netherland-EU', 'Italy-EU', 'Texas-US', 'California-US', 'Washington-US', 'Poland-EU', 'Spain-EU', 'Greece-EU', 'New York-US'],
'city':['Paris-FR', 'Vienna-AUS', 'Amsterdam-NL', 'Rome-ITA', 'Austin-TX', 'LA-CAL', 'Olympia-WAS', 'Warsaw-PL', 'Madrid-SPA', 'Athens-GR', 'Albany-NY']})
Answered on 2022-10-01 07:31:46
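If you want to know what is wrong with your code, read on.
The problem is in the loop itself. It updates the 'employment' column before it reaches the last two columns. By the time 'country' and 'city' come up, 'employment' has already been rewritten to '12', so no row holds the value '12-15' any more and the condition matches nothing. Changing the order in which the loop visits the columns, so that 'employment' is updated last, solves this.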
lst_cols = list(df.columns)
lst_cols.remove('employment')
lst_cols = lst_cols + ['employment']
for col in lst_cols:
    if col == 'id':
        continue
    else:
        df.loc[df['employment'] == '12-15', col] = df[col].str.split('-').str[0]
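To see why the condition stops matching, here is a minimal sketch on a cut-down frame (three rows and two columns taken from the question's data; the variable names `matches_before` and `matches_after` are only for illustration). It applies the same update the loop applies to 'employment' and counts the matching rows before and after.

```python
import pandas as pd

# Trimmed version of the question's data: only the columns needed to show the effect.
df = pd.DataFrame({
    "employment": ["1-4", "12-15", "12-15"],
    "country": ["France-EU", "Greece-EU", "New York-US"],
})

# Before the loop touches 'employment', two rows match the condition.
matches_before = int((df["employment"] == "12-15").sum())

# This is the update the loop performs when col == 'employment':
# '12-15' is rewritten to '12' in the matching rows.
df.loc[df["employment"] == "12-15", "employment"] = df["employment"].str.split("-").str[0]

# The same condition, re-evaluated for the next column, now matches no rows,
# so any column visited after 'employment' is left unchanged.
matches_after = int((df["employment"] == "12-15").sum())

print(matches_before, matches_after)  # prints: 2 0
```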
Answered on 2022-10-01 15:36:32
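Get the index of the slice of the DataFrame you want to modify, assign it to a variable (so that it does not change during or after the loop), and use iloc instead of loc. This also works when the slice is selected by a condition on multiple columns.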
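# Capture the matching row labels once, so later updates cannot change the selection.
# Note: passing these labels to iloc works here because df has the default 0..n-1 index.
index = df.loc[(df['employment']=='12-15') & (df['weight']=='90-95')].index
for ind, col in enumerate(df.columns):
    if ind == 0:  # skip the 'id' column
        continue
    else:
        df.iloc[index, ind] = df.iloc[index, ind].str.split('-').str[0]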
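The same idea of fixing the selection before the loop can probably also be written with a boolean mask and loc, which does not depend on the index labels matching row positions. The following is a sketch under that assumption, not the answerer's code: it uses a trimmed three-row frame based on the question's data, and the name `mask` is introduced here for illustration.

```python
import pandas as pd

# Trimmed version of the question's data.
df = pd.DataFrame({
    "id": [12, 660, 732],
    "employment": ["1-4", "12-15", "12-15"],
    "country": ["France-EU", "Greece-EU", "New York-US"],
})

# Evaluate the condition once, before any column is modified.
mask = df["employment"] == "12-15"

# Visit every column except 'id'; the order no longer matters because
# the mask was computed up front and is not re-evaluated inside the loop.
for col in df.columns.drop("id"):
    df.loc[mask, col] = df.loc[mask, col].str.split("-").str[0]

print(df)
```

Because `mask` is computed before the loop, 'employment' can be updated at any point without affecting which rows the later columns select.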
https://stackoverflow.com/questions/73916399