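# [Academic] How to One-Hot Encode Sequence Data in Python

## Tutorial Overview

1. What is one-hot encoding?

2. Manual one-hot encoding

3. One-hot encoding with scikit-learn

4. One-hot encoding with Keras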

## What Is One-Hot Encoding?

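For example, take a sequence of labels with the values `red` and `green`:

`'red','red','green'`

Assigning `red` the integer 0 and `green` the integer 1 gives the integer encoding:

`0,0,1`

The one-hot encoding then represents each integer as a binary vector with a single 1 at that integer's index:

```
[1,0]
[1,0]
[0,1]
```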
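The mapping shown above can be sketched in plain Python. This is a minimal illustration, not a library routine: the variable names are made up for the example, and numbering labels in order of first appearance is an assumption that happens to reproduce the values shown (`red` as 0, `green` as 1).

```python
# Labels to encode (same values as the example above).
labels = ['red', 'red', 'green']

# Step 1: integer encoding. Assumption: labels are numbered in order of
# first appearance, which gives red -> 0 and green -> 1.
label_to_int = {}
for label in labels:
    if label not in label_to_int:
        label_to_int[label] = len(label_to_int)
integer_encoded = [label_to_int[label] for label in labels]
print(integer_encoded)  # [0, 0, 1]

# Step 2: one-hot encoding. Each integer becomes a vector of zeros with a
# single 1 at the position given by the integer.
num_classes = len(label_to_int)
onehot_encoded = [
    [1 if i == value else 0 for i in range(num_classes)]
    for value in integer_encoded
]
print(onehot_encoded)  # [[1, 0], [1, 0], [0, 1]]
```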

## Manual One-Hot Encoding

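In this example, the input is the following string of characters, drawn from an alphabet of the lowercase letters plus the space character:

`hello world`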

```python
from numpy import argmax
# define input string
data = 'hello world'
print(data)
# define universe of possible input values
alphabet = 'abcdefghijklmnopqrstuvwxyz '
# define a mapping of chars to integers
char_to_int = dict((c, i) for i, c in enumerate(alphabet))
int_to_char = dict((i, c) for i, c in enumerate(alphabet))
# integer encode input data
integer_encoded = [char_to_int[char] for char in data]
print(integer_encoded)
# one hot encode
onehot_encoded = list()
for value in integer_encoded:
    letter = [0 for _ in range(len(alphabet))]
    letter[value] = 1
    onehot_encoded.append(letter)
print(onehot_encoded)
# invert encoding
inverted = int_to_char[argmax(onehot_encoded[0])]
print(inverted)
```

Running the example prints the input string, its integer encoding, the one-hot vectors, and the decoded first character:

```
hello world

[7,4,11,11,14,26,22,14,17,11,3]

[[0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
 [0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
 [0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
 [0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
 [0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0],
 [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1],
 [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0],
 [0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0],
 [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0],
 [0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
 [0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]]

h
```
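The script above decodes only the first character. As a rough sketch of a full round trip, the two helper functions below encode an entire string and decode it back. The names `one_hot_encode` and `one_hot_decode` are hypothetical and not part of any library, and plain lists are used so that no NumPy import is needed.

```python
ALPHABET = 'abcdefghijklmnopqrstuvwxyz '

def one_hot_encode(text, alphabet=ALPHABET):
    """Return one binary vector per character of text."""
    char_to_int = {c: i for i, c in enumerate(alphabet)}
    vectors = []
    for char in text:
        if char not in char_to_int:
            raise ValueError(f'character {char!r} is not in the alphabet')
        vector = [0] * len(alphabet)
        vector[char_to_int[char]] = 1
        vectors.append(vector)
    return vectors

def one_hot_decode(vectors, alphabet=ALPHABET):
    """Map each vector back to the character at the index of its 1."""
    return ''.join(alphabet[vector.index(1)] for vector in vectors)

encoded = one_hot_encode('hello world')
decoded = one_hot_decode(encoded)
print(len(encoded), len(encoded[0]))  # 11 27
print(decoded)  # hello world
```

Raising a `ValueError` for characters outside the alphabet is a design choice of this sketch; the original script would fail with a `KeyError` in the same situation.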

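## One-Hot Encoding with scikit-learn

In this example, assume the data has the following three labels:

```
"cold"
"warm"
"hot"
```

An example sequence of 10 time steps might be:

`cold, cold, warm, cold, hot, hot, warm, cold, warm, hot`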

```python
from numpy import array
from numpy import argmax
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
# define example
data = ['cold', 'cold', 'warm', 'cold', 'hot', 'hot', 'warm', 'cold', 'warm', 'hot']
values = array(data)
print(values)
# integer encode
label_encoder = LabelEncoder()
integer_encoded = label_encoder.fit_transform(values)
print(integer_encoded)
# binary encode
# (.toarray() densifies the default sparse output, which avoids the
# `sparse=False` argument that newer scikit-learn releases renamed)
onehot_encoder = OneHotEncoder()
integer_encoded = integer_encoded.reshape(len(integer_encoded), 1)
onehot_encoded = onehot_encoder.fit_transform(integer_encoded).toarray()
print(onehot_encoded)
# invert first example
inverted = label_encoder.inverse_transform([argmax(onehot_encoded[0, :])])
print(inverted)
```

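Running the example prints the input sequence, the integer encoding, the one-hot encoding, and the inverted label of the first example:

```
['cold' 'cold' 'warm' 'cold' 'hot' 'hot' 'warm' 'cold' 'warm' 'hot']

[0 0 2 0 1 1 2 0 2 1]

[[1.  0.  0.]
 [1.  0.  0.]
 [0.  0.  1.]
 [1.  0.  0.]
 [0.  1.  0.]
 [0.  1.  0.]
 [0.  0.  1.]
 [1.  0.  0.]
 [0.  0.  1.]
 [0.  1.  0.]]

['cold']
```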
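The integer codes in the output (`cold` as 0, `hot` as 1, `warm` as 2) follow from `LabelEncoder` numbering the distinct labels in sorted order. The sketch below mimics that behaviour in plain Python to make the mapping explicit. It is an illustration under that assumption, not scikit-learn's actual implementation, and the variable names are made up for the example.

```python
data = ['cold', 'cold', 'warm', 'cold', 'hot',
        'hot', 'warm', 'cold', 'warm', 'hot']

# Distinct labels in sorted order: the position of each label is its code.
classes = sorted(set(data))
print(classes)  # ['cold', 'hot', 'warm']

label_to_int = {label: i for i, label in enumerate(classes)}
integer_encoded = [label_to_int[label] for label in data]
print(integer_encoded)  # [0, 0, 2, 0, 1, 1, 2, 0, 2, 1]

# Inverting a code is a lookup into the sorted class list.
print(classes[integer_encoded[0]])  # cold
```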

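## One-Hot Encoding with Keras

In this example there are four labels, the integers 0 to 3, and the following sequence of 10 numbers:

`data = [1, 3, 2, 0, 3, 2, 2, 1, 0, 1]`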

```python
from numpy import array
from numpy import argmax
from keras.utils import to_categorical
# define example
data = [1, 3, 2, 0, 3, 2, 2, 1, 0, 1]
data = array(data)
print(data)
# one hot encode
encoded = to_categorical(data)
print(encoded)
# invert encoding
inverted = argmax(encoded[0])
print(inverted)
```

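Running the example prints the input sequence, the one-hot encoding, and the inverted first value:

```
[1 3 2 0 3 2 2 1 0 1]

[[0.  1.  0.  0.]
 [0.  0.  0.  1.]
 [0.  0.  1.  0.]
 [1.  0.  0.  0.]
 [0.  0.  0.  1.]
 [0.  0.  1.  0.]
 [0.  0.  1.  0.]
 [0.  1.  0.  0.]
 [1.  0.  0.  0.]
 [0.  1.  0.  0.]]

1
```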
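`to_categorical` infers the width of each vector from the largest integer in the input, so a sequence that happens to lack the highest label would be encoded too narrowly. Keras accepts a `num_classes` argument to fix the width explicitly. The pure-Python function below is a hypothetical stand-in (the name `to_one_hot` is made up here) that sketches this behaviour without requiring Keras; it is not the library's code.

```python
def to_one_hot(values, num_classes=None):
    """Sketch of one-hot encoding with an optional fixed vector width."""
    if num_classes is None:
        # Infer the width from the data: largest integer plus one.
        num_classes = max(values) + 1
    return [
        [1.0 if i == v else 0.0 for i in range(num_classes)]
        for v in values
    ]

# The full sequence contains the label 3, so the inferred width is 4.
print(to_one_hot([1, 3, 2, 0])[0])  # [0.0, 1.0, 0.0, 0.0]

# A shorter sequence without label 3 is inferred as only 3 classes wide...
print(to_one_hot([1, 2, 0])[0])  # [0.0, 1.0, 0.0]

# ...unless the width is passed explicitly.
print(to_one_hot([1, 2, 0], num_classes=4)[0])  # [0.0, 1.0, 0.0, 0.0]
```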

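This tutorial covered:

- What integer encoding and one-hot encoding are, and why they are needed in machine learning.
- How to calculate an integer encoding and a one-hot encoding by hand in Python.
- How to use the scikit-learn and Keras libraries to automatically encode sequence data in Python.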
