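I have a list of categorical columns in my DataFrame that I am trying to one-hot encode. I used the code below on each column individually, but I can't figure out how to iterate over my list of categorical columns to do the same thing. Does anyone know how?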
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

categoricals = ['bedrooms', 'bathrooms', 'floors', 'condition', 'grade',
                'yr_built']

# Encode a single column; df is the existing DataFrame
bedrooms = df[['bedrooms']]
# Note: sparse=False was renamed to sparse_output=False in scikit-learn 1.2
bed = OneHotEncoder(categories="auto", sparse=False, handle_unknown="ignore")
bed.fit(bedrooms)
bed_encoded = bed.transform(bedrooms)
bed_encoded = pd.DataFrame(
    bed_encoded,
    columns=bed.categories_[0],
    index=df.index
)
# Replace the original column with its one-hot columns
df.drop("bedrooms", axis=1, inplace=True)
df = pd.concat([df, bed_encoded], axis=1)
Posted on 2021-07-12 15:07:04
Method 1
First create the DataFrame. You can apply an ordinal encoder such as LabelEncoder first, and then one-hot encode the result.
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

categorical_cols = ['bedrooms', 'bathrooms', 'floors', 'condition', 'grade',
                    'yr_built']

# data is the DataFrame
# Label-encode each categorical column (the encoder is refit for every column)
le = LabelEncoder()
data[categorical_cols] = data[categorical_cols].apply(lambda col: le.fit_transform(col))

# One-hot encode the categorical columns.
# fit_transform returns a SciPy sparse matrix by default, so convert it to a dense array.
ohe = OneHotEncoder()
array_hot_encoded = ohe.fit_transform(data[categorical_cols]).toarray()
# Convert it to a DataFrame
data_hot_encoded = pd.DataFrame(array_hot_encoded, index=data.index)
# Keep only the columns that don't need to be encoded
data_numeric_cols = data.drop(columns=categorical_cols)
# Concatenate the two DataFrames
data_out = pd.concat([data_hot_encoded, data_numeric_cols], axis=1)
You can also use pd.factorize() to map categorical data to ordinal data.
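A small sketch of what that could look like, using made-up toy columns (the values below are illustrative, not from the original data). pd.factorize returns a pair: the integer codes and the unique values in order of first appearance.

```python
import pandas as pd

# Hypothetical toy data for illustration only.
data = pd.DataFrame({
    "condition": ["good", "fair", "good", "poor"],
    "grade": ["A", "B", "A", "A"],
})
categorical_cols = ["condition", "grade"]

# Look at a single column first: codes per row plus the distinct values.
codes, uniques = pd.factorize(data["condition"])
print(codes)
print(uniques)

# Replace each categorical column with its integer codes.
for col in categorical_cols:
    data[col] = pd.factorize(data[col])[0]
print(data)
```

Note that the codes reflect order of first appearance, not any meaningful ranking of the categories.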
Method 2
Use pd.get_dummies(), which one-hot encodes directly from the raw data (no need to convert to ordinal data first).
import pandas as pd
df = pd.get_dummies(data, columns = categorical_cols)
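To see what this produces, here is a short sketch on a toy frame (the data and the price column are hypothetical, for illustration only). Each listed column is replaced by one indicator column per distinct value, named column_value, and the remaining columns are kept as they are:

```python
import pandas as pd

# Hypothetical toy data; 'price' is a numeric column that should not be encoded.
data = pd.DataFrame({
    "bedrooms": [2, 3, 2],
    "floors": [1, 2, 1],
    "price": [100.0, 150.0, 110.0],
})
categorical_cols = ["bedrooms", "floors"]

# Only the columns named in `columns` are encoded, even if they hold integers.
df = pd.get_dummies(data, columns=categorical_cols)
print(df.columns.tolist())
```

Passing columns explicitly matters here because integer-valued columns such as bedrooms would otherwise be treated as numeric and left unencoded. The indicator dtype depends on the pandas version (bool from pandas 2.0 on); pass dtype=int if you want 0/1 integers.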
https://stackoverflow.com/questions/68349266