# Machine Learning Notes: Feature Standardization

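Z-score standardization rescales a feature $x$ as

$$x^{*} = \frac{x - \mu}{\sigma}$$

where $\mu$ is the feature's mean and $\sigma$ its standard deviation.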
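As a quick check of the z-score formula, here is a tiny worked example in plain Python. It uses the sample standard deviation (`statistics.stdev`, denominator n - 1), which matches R's `sd()`; the helper name `z_scores` and the input values are made up for illustration.

```python
import statistics

def z_scores(values):
    """Standardize a list of numbers as (x - mean) / sd, using the sample sd."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)  # n - 1 denominator, like R's sd()
    return [(x - mu) / sigma for x in values]

data = [2.0, 4.0, 6.0]   # illustrative values: mean 4, sample sd 2
print(z_scores(data))    # [-1.0, 0.0, 1.0]
```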

Feature standardization in R, starting with 0-1 (min-max) normalization via `scales::rescale`:

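```r
library("caTools")   # sample.split()
library("scales")    # rescale()
data(iris)

# Stratified 80/20 split: class proportions of Species are preserved
split = sample.split(iris$Species, SplitRatio = .8)
train_data = subset(iris, split == TRUE)
test_data  = subset(iris, split == FALSE)

# 0-1 scaling of the four numeric columns (column 5 is Species), kept in new objects
train_scaled = train_data
train_scaled[,-5] = apply(train_data[,-5], 2, rescale, to = c(0,1))
# Scale the test set with the training min/max (not its own) to avoid leakage;
# test values may therefore fall slightly outside [0, 1]
train_min = apply(train_data[,-5], 2, min)
train_max = apply(train_data[,-5], 2, max)
test_scaled = test_data
test_scaled[,-5] = scale(test_data[,-5], center = train_min, scale = train_max - train_min)
```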

```r
range(train_data[,1])                                      # raw first column
range(apply(train_data[,-5],2,rescale,to = c(0,1))[,1])    # after 0-1 scaling
## [1] 4.3 7.7
## [1] 0 1
```
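As a worked example of the 0-1 rule, take the training range printed above for the first column (4.3 to 7.7) and a hypothetical sepal length of 6.0:

$$
x' = \frac{x - \min(x)}{\max(x) - \min(x)} = \frac{6.0 - 4.3}{7.7 - 4.3} = \frac{1.7}{3.4} = 0.5
$$

so 6.0 sits exactly halfway between the training minimum and maximum.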

```r
# A hand-written 0-1 (min-max) scaling function
scale1 = function(x){
  (x - min(x))/(max(x) - min(x))
}
range(apply(train_data[,-5],2,scale1)[,1])
## [1] 0 1
```
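`scale1` takes the minimum and maximum from whatever column it is given. For new data such as a test split, those two numbers should come from the training data instead. A minimal plain-Python sketch of that train-then-apply idea; the helper names `fit_min_max` and `apply_min_max`, the sample values, and the choice to map a constant column to 0.0 are all assumptions of this sketch.

```python
def fit_min_max(train_column):
    """Return the (min, max) learned from a training column."""
    return min(train_column), max(train_column)

def apply_min_max(column, lo, hi):
    """Map values to [0, 1] using a previously learned range.

    Values outside [lo, hi] (e.g. in a test split) land outside [0, 1].
    A constant training column (hi == lo) is mapped to 0.0 here, which is
    an arbitrary choice made to avoid dividing by zero.
    """
    span = hi - lo
    if span == 0:
        return [0.0 for _ in column]
    return [(x - lo) / span for x in column]

train = [4.3, 5.0, 7.7]   # illustrative values, not the real iris split
test = [4.0, 6.0, 7.9]

lo, hi = fit_min_max(train)
print(apply_min_max(train, lo, hi))  # first value 0.0, last value 1.0
print(apply_min_max(test, lo, hi))   # first value below 0, last value above 1
```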

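Z-score standardization

In R, z-score standardization can be done quickly with the built-in `scale` function, which centers each column on its mean and divides it by its standard deviation.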

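```r
# z-score standardization with base R's scale(), stored in a new object
train_z = train_data
train_z[,-5] = scale(train_data[,-5])

# Mean and standard deviation of the raw first column
mean(train_data[,1]); sd(train_data[,1])
## [1] 5.869167
## [1] 0.8259241

# After scaling: mean 0 (up to floating-point rounding) and sd 1
mean(scale(train_data[,-5])[,1]); sd(scale(train_data[,-5])[,1])
## [1] 0
## [1] 1

# A hand-written z-score function; sd() divides by n - 1, as scale() does
z_norm = function(x){
  (x - mean(x))/sd(x)
}

mean(apply(train_data[,-5],2,z_norm)[,1]); sd(apply(train_data[,-5],2,z_norm)[,1])
## [1] 0
## [1] 1
```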
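R's `sd()` (and therefore `scale()`) divides by n - 1, while scikit-learn's `StandardScaler` divides by n, so the two standard deviations differ by a factor of sqrt(n / (n - 1)). A small NumPy sketch with made-up numbers makes the relationship concrete:

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # illustrative values
n = x.size

sd_sample = x.std(ddof=1)      # n - 1 denominator, like R's sd() / scale()
sd_population = x.std(ddof=0)  # n denominator, like StandardScaler

# The two estimates differ by the constant factor sqrt(n / (n - 1))
print(sd_sample / sd_population, np.sqrt(n / (n - 1)))

# Each convention yields unit spread when measured with its own ddof
z_r_style = (x - x.mean()) / sd_sample
z_sklearn_style = (x - x.mean()) / sd_population
print(z_r_style.std(ddof=1), z_sklearn_style.std(ddof=0))  # both 1.0 up to rounding
```

For large samples the factor approaches 1, so the difference mostly matters on small datasets.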

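In Python, scikit-learn's `sklearn.preprocessing` module provides dedicated classes for both kinds of scaling: `MinMaxScaler` for 0-1 normalization and `StandardScaler` for z-score standardization. First, load the iris data and split it: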

```python
from sklearn import preprocessing
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np

iris = load_iris()  # dict-like Bunch with 'data' and 'target'
data = iris['data']
iris_data = pd.DataFrame(
    data = data,
    columns = ['sepal_length','sepal_width','petal_length','petal_width']
)
iris_data["Species"] = iris['target']
iris_data["Species"] = iris_data["Species"].map({0:"setosa",1:"versicolor",2:"virginica"})
# Features: every column except the last; label: the last column (Species)
x, y = iris_data.iloc[:,0:-1], iris_data.iloc[:,-1]
# Stratified 80/20 split, analogous to sample.split() in the R part
train_data, test_data, train_target, test_target = train_test_split(x, y, test_size = 0.2, stratify = y)
```
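Like `sample.split()` in the R part, `stratify = y` keeps the class proportions the same in both splits. Iris has 50 flowers per species, so an 80/20 stratified split should leave 40 per species for training and 10 for testing. A quick check that re-creates the data so it runs on its own; `random_state=0` is an addition of this sketch, for reproducibility.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
x = pd.DataFrame(iris['data'],
                 columns=['sepal_length', 'sepal_width', 'petal_length', 'petal_width'])
y = pd.Series(iris['target']).map({0: "setosa", 1: "versicolor", 2: "virginica"})

train_data, test_data, train_target, test_target = train_test_split(
    x, y, test_size=0.2, stratify=y, random_state=0
)

# Class counts per split: expected 40 per species in training, 10 in testing
print(train_target.value_counts().sort_index())
print(test_target.value_counts().sort_index())
```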

0-1 (min-max) normalization in Python

```python
min_max_scaler = preprocessing.MinMaxScaler()  # instantiate the 0-1 scaler
# Learn min/max on the training set only, then apply the same mapping to the test set
X_train_minmax = min_max_scaler.fit_transform(train_data.iloc[:,0:4].values)
X_test_minmax  = min_max_scaler.transform(test_data.iloc[:,0:4].values)

X_train_minmax[:,0].max() - X_train_minmax[:,0].min()
# 1.0
```
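After fitting, `MinMaxScaler` stores the learned per-column minimum and maximum in `data_min_` and `data_max_`, and `transform` applies the same (x - min)/(max - min) rule as `scale1`, using those training values; `inverse_transform` maps scaled data back. A small sketch on made-up single-feature data:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X_train = np.array([[4.3], [5.0], [7.7]])  # illustrative single-feature data
X_new = np.array([[6.0], [8.0]])

scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit(X_train)
print(scaler.data_min_, scaler.data_max_)  # [4.3] [7.7]

X_new_scaled = scaler.transform(X_new)
# Same rule as scale1, but with the training min/max
manual = (X_new - scaler.data_min_) / (scaler.data_max_ - scaler.data_min_)
print(np.allclose(X_new_scaled, manual))   # True
print(X_new_scaled[1, 0] > 1)              # True: 8.0 exceeds the training max

# inverse_transform recovers the original values (up to rounding)
print(np.allclose(scaler.inverse_transform(X_new_scaled), X_new))  # True
```

By default the scaler does not clip, which is why new values beyond the training range map outside [0, 1].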

Z-score standardization in Python

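```python
# Raw mean and standard deviation of the first feature (pandas .std() uses ddof=1)
print(train_data.iloc[:,0].mean())   # 5.861666666666664
print(train_data.iloc[:,0].std())    # 0.8416853174847874

sc_X = preprocessing.StandardScaler()  # instantiate the z-score scaler
# Learn mean/std on the training set only, then reuse them for the test set
X_train = sc_X.fit_transform(train_data.iloc[:,0:4].values)
X_test  = sc_X.transform(test_data.iloc[:,0:4].values)

# StandardScaler divides by the population std (ddof=0), as does NumPy's .std()
print(X_train[:,0].mean())  # -2.2907601741432396e-15, i.e. 0 up to rounding
print(X_train[:,0].std())   # 1.0
```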
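The fitted `StandardScaler` keeps the training statistics in `mean_` and `scale_` (the population standard deviation), and `transform` computes (x - mean_) / scale_ for any new data. A sketch on made-up single-feature data:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[2.0], [4.0], [4.0], [4.0], [5.0], [5.0], [7.0], [9.0]])  # illustrative
X_new = np.array([[3.0], [11.0]])

sc = StandardScaler()
sc.fit(X_train)
print(sc.mean_, sc.scale_)  # mean 5.0, scale 2.0

# transform() reuses the training statistics: (x - mean_) / scale_
print(sc.transform(X_new).ravel())  # values -1.0 and 3.0
print(np.allclose(sc.transform(X_new), (X_new - sc.mean_) / sc.scale_))  # True
```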

Reference: https://www.coursera.org/learn/machine-learning
