Q: How to one-hot encode a column of a DataFrame in Python

Stack Overflow user

Asked 2020-11-21 09:30:44

Answers 2 · Views 57 · Followers 0 · Votes 0

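I have a dataset that includes a categorical column for education level. The initial values were 0, nan, high school, graduate school, and university. I cleaned the data and converted it to the following values:

0 -> others, 1 -> high school, 2 -> graduate school, 3 -> university

All of these are in the same column. Now I want to one-hot encode this column into 4 columns.

I tried using scikit-learn, as shown below: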

onehot_encoder = OneHotEncoder()
onehot_encoded = onehot_encoder.fit_transform(df_csv['EDUCATION'])
print(onehot_encoded)

I got this error:

ValueError: Expected 2D array, got 1D array instead:
array=[3 3 3 ... 3 1 3].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

python

scikit-learn

one-hot-encoding

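2 Answers

Stack Overflow user

Accepted answer

Posted 2020-11-21 09:48:14

For your specific case, reshaping the underlying array to 2D (and requesting dense output with sparse=False, which scikit-learn 1.2 renamed to sparse_output) gives you the one-hot encoded array: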

import pandas as pd
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({'EDUCATION':['high school','high school','high school',
                                'university','university','university',
                                'graduate school', 'graduate school','graduate school',
                                'others','others','others']})

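# Dense output; on scikit-learn < 1.2 pass sparse=False instead
onehot_encoder = OneHotEncoder(sparse_output=False)
# reshape(-1, 1) turns the 1D column into the 2D shape the encoder expects
onehot_encoder.fit_transform(df['EDUCATION'].to_numpy().reshape(-1,1))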

>>>

array([[0., 1., 0., 0.],
       [0., 1., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 0., 1.],
       [0., 0., 0., 1.],
       [0., 0., 0., 1.],
       [1., 0., 0., 0.],
       [1., 0., 0., 0.],
       [1., 0., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 1., 0.],
       [0., 0., 1., 0.]])
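The columns of that array follow the encoder's sorted category order, which is easy to lose track of. As a minimal sketch (the shortened toy data and the names `encoder` and `encoded_df` are assumptions for illustration, not part of the original answer), selecting the column with double brackets already yields a 2D input, and `categories_` supplies the column labels:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Toy data, one row per category (illustrative only)
df = pd.DataFrame({'EDUCATION': ['high school', 'university',
                                 'graduate school', 'others']})

encoder = OneHotEncoder()
# df[['EDUCATION']] is a one-column DataFrame (2D), so no reshape is needed.
# The default output is a SciPy sparse matrix; toarray() makes it dense.
encoded = encoder.fit_transform(df[['EDUCATION']]).toarray()

# categories_[0] lists the categories of the first (only) feature, sorted
encoded_df = pd.DataFrame(encoded, columns=encoder.categories_[0], index=df.index)
print(encoded_df)
```

Calling `.toarray()` on the default sparse result sidesteps the `sparse` / `sparse_output` naming difference between scikit-learn versions.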

In my opinion, the most straightforward way is to use pandas.get_dummies:

pd.get_dummies(df['EDUCATION'])
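For the integer-coded column described in the question, one possible variation is sketched below. It assumes the 0 to 3 mapping from the question; the sample values and the names `labels`, `dummies`, and `df_encoded` are made up for illustration. The codes are mapped back to labels first so that the four new columns get readable names:

```python
import pandas as pd

# Hypothetical sample of the cleaned, integer-coded column
df_csv = pd.DataFrame({'EDUCATION': [3, 3, 1, 2, 0, 3]})

# Mapping taken from the question: 0 others, 1 high school, ...
labels = {0: 'others', 1: 'high school', 2: 'graduate school', 3: 'university'}

# One indicator column per category, named EDUCATION_<label>
dummies = pd.get_dummies(df_csv['EDUCATION'].map(labels),
                         prefix='EDUCATION', dtype=int)

# Keep the original column alongside the four indicator columns
df_encoded = df_csv.join(dummies)
print(df_encoded)
```

Note that get_dummies only creates columns for values present in the data, so a category that never occurs will not get a column.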

Votes 1

Stack Overflow user

Posted 2020-11-21 09:38:07

You need to set sparse to False (named sparse_output since scikit-learn 1.2) and pass a 2D array:

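import numpy as np
from sklearn.preprocessing import OneHotEncoder

# sparse_output replaces the older sparse argument (scikit-learn >= 1.2)
onehot_encoder = OneHotEncoder(sparse_output=False)
# [:, None] adds a second axis so the labels form a 2D column
y_train = np.random.randint(0,4,100)[:,None]
y_train = onehot_encoder.fit_transform(y_train)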

Alternatively, you can do it this way:

import numpy as np
from sklearn.preprocessing import LabelEncoder
# np_utils is gone from recent Keras releases; import to_categorical directly
from keras.utils import to_categorical

y_train = np.random.randint(0,4,100)
encoder = LabelEncoder()
encoder.fit(y_train)
encoded_y = encoder.transform(y_train)
y_train = to_categorical(encoded_y)
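If the labels are already integers in the range 0 to 3, as in the question, the same result can be sketched with NumPy alone by indexing an identity matrix. This is an illustrative alternative rather than part of the original answer; `num_classes` and `one_hot` are assumed names:

```python
import numpy as np

# Random integer labels in [0, 4), seeded for reproducibility
rng = np.random.default_rng(0)
y_train = rng.integers(0, 4, size=100)

num_classes = 4
# Row i of the identity matrix is the one-hot vector for class i,
# so indexing by the label array picks one row per sample
one_hot = np.eye(num_classes, dtype=np.float32)[y_train]
print(one_hot.shape)
```

This only works when the labels are contiguous integer codes starting at 0; arbitrary labels would still need a step such as LabelEncoder first.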

Votes 1

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/64938963
