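I have a DataFrame in which one column contains US states. I want to create a new column that bins the states by region, i.e. South, Southwest, etc. It looks like pd.cut is only meant for continuous variables, so binning that way doesn't seem to be an option. Is there a good way to create a column that is conditional on categorical data in another column?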
Posted on 2019-11-24 09:51:28
import pandas as pd

# Return the region for a row's state; unlisted states fall back to 'etc'.
def label_states(row):
    if row['state'] in ['Maine', 'New Hampshire', 'Vermont', 'Massachusetts', 'Rhode Island', 'Connecticut', 'New York', 'Pennsylvania', 'New Jersey']:
        return 'north-east'
    if row['state'] in ['Wisconsin', 'Michigan', 'Illinois', 'Indiana', 'Ohio', 'North Dakota', 'South Dakota', 'Nebraska', 'Kansas', 'Minnesota', 'Iowa', 'Missouri']:
        return 'midwest'
    if row['state'] in ['Delaware', 'Maryland', 'District of Columbia', 'Virginia', 'West Virginia', 'North Carolina', 'South Carolina', 'Georgia', 'Florida', 'Kentucky', 'Tennessee', 'Mississippi', 'Alabama', 'Oklahoma', 'Texas', 'Arkansas', 'Louisiana']:
        return 'south'
    return 'etc'

df = pd.DataFrame([{'state':"Illinois", 'data':"aaa"}, {'state':"Rhode Island",'data':"aba"}, {'state':"Georgia",'data':"aba"}, {'state':"Iowa",'data':"aba"}, {'state':"Connecticut",'data':"bbb"}, {'state':"Ohio",'data':"bbb"}])
# Apply the function to every row (axis=1) to build the new column.
df['label'] = df.apply(label_states, axis=1)
df
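
As a possible alternative to the row-wise apply, the same lookup can be sketched with a plain dict and Series.map, which avoids calling a Python function once per row. This is only a minimal sketch: the REGION_BY_STATE name and its abbreviated contents are assumptions for illustration, not a complete or authoritative list of states.

```python
import pandas as pd

# Hypothetical, abbreviated lookup table; extend it to cover every state you need.
REGION_BY_STATE = {
    'Rhode Island': 'north-east',
    'Connecticut': 'north-east',
    'Illinois': 'midwest',
    'Iowa': 'midwest',
    'Ohio': 'midwest',
    'Georgia': 'south',
}

df = pd.DataFrame({
    'state': ['Illinois', 'Rhode Island', 'Georgia', 'Iowa', 'Connecticut', 'Ohio', 'Oregon'],
    'data': ['aaa', 'aba', 'aba', 'aba', 'bbb', 'bbb', 'ccc'],
})

# Series.map looks each state up in the dict; states missing from the dict
# become NaN, which fillna then replaces with the 'etc' fallback.
df['label'] = df['state'].map(REGION_BY_STATE).fillna('etc')
```

Here 'Oregon' is deliberately left out of the dict to show the fallback; it ends up labelled 'etc'.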

Posted on 2019-11-23 18:59:35
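Suppose your df contains a State column (the US state code), possibly alongside other columns such as State Name.
Of course, it can contain more columns, and multiple rows for each state.
To add the region name (a new column), define a regions DataFrame containing the columns State (the US state code) and Region (the region name).
Then merge these DataFrames and save the result back under df:
df = df.merge(regions, on='State')
Part of the result is: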
    State Name   State  Region
0   Alabama      AL     Southeast
1   Arizona      AZ     Southwest
2   Arkansas     AR     South
3   California   CA     West
4   Colorado     CO     Southwest
5   Connecticut  CT     Northeast
6   Delaware     DE     Northeast
7   Florida      FL     Southeast
8   Georgia      GA     Southeast
9   Idaho        ID     Northwest
10  Illinois     IL     Central
11  Indiana      IN     Central
12  Iowa         IA     East North Central
13  Kansas       KS     South
14  Kentucky     KY     Central
15  Louisiana    LA     South

Of course, there are many variants of how US states are assigned to regions, so if you want to use a different one, define the regions DataFrame according to your own classification.
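
The merge approach might look roughly like the following. It is a sketch under stated assumptions: the sample rows (including the 'PR' code with no region) are made up for illustration, and the how='left' and validate='many_to_one' arguments are additions not present in the answer, chosen so that unmatched rows are kept and duplicate codes in regions are caught.

```python
import pandas as pd

# Hypothetical sample data; 'State' holds the two-letter code used as the merge key.
df = pd.DataFrame({
    'State Name': ['Alabama', 'Arizona', 'California', 'Puerto Rico'],
    'State': ['AL', 'AZ', 'CA', 'PR'],
})

# Lookup table with one row per state code; region names follow the result shown above.
regions = pd.DataFrame({
    'State': ['AL', 'AZ', 'CA'],
    'Region': ['Southeast', 'Southwest', 'West'],
})

# how='left' keeps rows whose code is missing from regions (Region becomes NaN);
# validate='many_to_one' raises if regions lists a state code more than once.
merged = df.merge(regions, on='State', how='left', validate='many_to_one')
```

With the default inner join used in the answer, the 'PR' row would be dropped silently; the left join keeps it with a missing Region, which makes gaps in the classification easier to spot.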
https://stackoverflow.com/questions/59004206