梯度消失、梯度爆炸以及Kaggle房价预测

1. 梯度消失和梯度爆炸
2. 考虑到环境因素的其他问题
3. Kaggle房价预测

梯度消失和梯度爆炸

，输出层

。为了便于讨论，不考虑偏差参数，且设所有隐藏层的激活函数为恒等映射（identity mapping）

。给定输入

，多层感知机的第

。此时，如果层数

（消失）和

（爆炸）的乘积。当层数较多时，梯度的计算也容易出现消失或爆炸。

随机初始化模型参数

（删去

image

Xavier随机初始化

，输出个数为

，Xavier随机初始化将使该层中权重参数的每个元素都随机采样于均匀分布

cat

cat

dog

dog

image

image

image

image

cat

cat

dog

dog

image

image

image

image

image

Kaggle 房价预测实战

%matplotlib inline
import torch
import torch.nn as nn
import numpy as np
import pandas as pd
import sys
sys.path.append("/home/kesci/input")
import d2lzh1981 as d2l
print(torch.__version__)
torch.set_default_tensor_type(torch.FloatTensor)
1.3.0

获取和读取数据集

test_data = pd.read_csv("/home/kesci/input/houseprices2807/house-prices-advanced-regression-techniques/test.csv")
train_data = pd.read_csv("/home/kesci/input/houseprices2807/house-prices-advanced-regression-techniques/train.csv")

train_data.shape
(1460, 81)

test_data.shape
(1459, 80)

train_data.iloc[0:4, [0, 1, 2, 3, -3, -2, -1]]

all_features = pd.concat((train_data.iloc[:, 1:-1], test_data.iloc[:, 1:]))

预处理数据

，标准差为

。那么，我们可以将该特征的每个值先减去

numeric_features = all_features.dtypes[all_features.dtypes != 'object'].index
all_features[numeric_features] = all_features[numeric_features].apply(
lambda x: (x - x.mean()) / (x.std()))
# 标准化后，每个数值特征的均值变为0，所以可以直接用0来替换缺失值
all_features[numeric_features] = all_features[numeric_features].fillna(0)

# dummy_na=True将缺失值也当作合法的特征值并为其创建指示特征
all_features = pd.get_dummies(all_features, dummy_na=True)
all_features.shape
(2919, 331)

n_train = train_data.shape[0]
train_features = torch.tensor(all_features[:n_train].values, dtype=torch.float)
test_features = torch.tensor(all_features[n_train:].values, dtype=torch.float)
train_labels = torch.tensor(train_data.SalePrice.values, dtype=torch.float).view(-1, 1)

训练模型

loss = torch.nn.MSELoss()

def get_net(feature_num):
net = nn.Linear(feature_num, 1)
for param in net.parameters():
nn.init.normal_(param, mean=0, std=0.01)
return net

，它的定义为

def log_rmse(net, features, labels):
# 将小于1的值设成1，使得取对数时数值更稳定
clipped_preds = torch.max(net(features), torch.tensor(1.0))
rmse = torch.sqrt(2 * loss(clipped_preds.log(), labels.log()).mean())
return rmse.item()

def train(net, train_features, train_labels, test_features, test_labels,
num_epochs, learning_rate, weight_decay, batch_size):
train_ls, test_ls = [], []
dataset = torch.utils.data.TensorDataset(train_features, train_labels)
net = net.float()
for epoch in range(num_epochs):
for X, y in train_iter:
l = loss(net(X.float()), y.float())
l.backward()
optimizer.step()
train_ls.append(log_rmse(net, train_features, train_labels))
if test_labels is not None:
test_ls.append(log_rmse(net, test_features, test_labels))
return train_ls, test_ls

K折交叉验证

def get_k_fold_data(k, i, X, y):
# 返回第i折交叉验证时所需要的训练和验证数据
assert k > 1
fold_size = X.shape[0] // k
X_train, y_train = None, None
for j in range(k):
idx = slice(j * fold_size, (j + 1) * fold_size)
X_part, y_part = X[idx, :], y[idx]
if j == i:
X_valid, y_valid = X_part, y_part
elif X_train is None:
X_train, y_train = X_part, y_part
else:
X_train = torch.cat((X_train, X_part), dim=0)
y_train = torch.cat((y_train, y_part), dim=0)
return X_train, y_train, X_valid, y_valid

def k_fold(k, X_train, y_train, num_epochs,
learning_rate, weight_decay, batch_size):
train_l_sum, valid_l_sum = 0, 0
for i in range(k):
data = get_k_fold_data(k, i, X_train, y_train)
net = get_net(X_train.shape[1])
train_ls, valid_ls = train(net, *data, num_epochs, learning_rate,
weight_decay, batch_size)
train_l_sum += train_ls[-1]
valid_l_sum += valid_ls[-1]
if i == 0:
d2l.semilogy(range(1, num_epochs + 1), train_ls, 'epochs', 'rmse',
range(1, num_epochs + 1), valid_ls,
['train', 'valid'])
print('fold %d, train rmse %f, valid rmse %f' % (i, train_ls[-1], valid_ls[-1]))
return train_l_sum / k, valid_l_sum / k

模型选择

k, num_epochs, lr, weight_decay, batch_size = 5, 100, 5, 0, 64
train_l, valid_l = k_fold(k, train_features, train_labels, num_epochs, lr, weight_decay, batch_size)
print('%d-fold validation: avg train rmse %f, avg valid rmse %f' % (k, train_l, valid_l))
fold 0, train rmse 0.241365, valid rmse 0.223083
fold 1, train rmse 0.229118, valid rmse 0.267488
fold 2, train rmse 0.232072, valid rmse 0.237995
fold 3, train rmse 0.238050, valid rmse 0.218671
fold 4, train rmse 0.231004, valid rmse 0.259185
5-fold validation: avg train rmse 0.234322, avg valid rmse 0.241284

预测并在Kaggle中提交结果

def train_and_pred(train_features, test_features, train_labels, test_data,
num_epochs, lr, weight_decay, batch_size):
net = get_net(train_features.shape[1])
train_ls, _ = train(net, train_features, train_labels, None, None,
num_epochs, lr, weight_decay, batch_size)
d2l.semilogy(range(1, num_epochs + 1), train_ls, 'epochs', 'rmse')
print('train rmse %f' % train_ls[-1])
preds = net(test_features).detach().numpy()
test_data['SalePrice'] = pd.Series(preds.reshape(1, -1)[0])
submission = pd.concat([test_data['Id'], test_data['SalePrice']], axis=1)
submission.to_csv('./submission.csv', index=False)
# sample_submission_data = pd.read_csv("../input/house-prices-advanced-regression-techniques/sample_submission.csv")

train_and_pred(train_features, test_features, train_labels, test_data, num_epochs, lr, weight_decay, batch_size)

0 条评论

• 机器学习（十二）交叉验证实例

假设有个未知模型具有一个或多个待定的参数，且有一个数据集能够反映该模型的特征属性（训练集）。

• 动手学深度学习(四) 过拟合欠拟合及其解决方案

在解释上述现象之前，我们需要区分训练误差（training error）和泛化误差（generalization error）。通俗来讲，前者指模型在训练数据集...

• 动手学深度学习(二) Softmax与分类模型

softmax运算符（softmax operator）解决了以上两个问题。它通过下式将输出值变换成值为正且和为1的概率分布：

• Pytorch实战Kaggle房价预测比赛

这是分享的第一个Kaggle比赛，也是Kaggle中难度最低的比赛之一，房价预测是一个回归问题，给出了房子的一些特征要求预测房子的价格。本文使用Pytorch构...

使用高斯核函数方式把数据维度扩展到无限维度进而得到一条粗壮的分界线。仔细看一下这个分割函数，其实就是一些Gaussian函数的线性组合，y就是增长的方向。 ...

使用高斯核函数方式把数据维度扩展到无限维度进而得到一条粗壮的分界线。仔细看一下这个分割函数，其实就是一些Gaussian函数的线性组合，y就是增长的方向。 ...

• 【机器学习】--模型评估指标之混淆矩阵，ROC曲线和AUC面积

实际上非常简单，精确率是针对我们预测结果而言的，它表示的是预测为正的样本中有多少是真正的正样本。那么预测为正就有两种可能了，一种就是把正类预测为正类(TP)，另...

• 《模式识别与智能计算》基于类中心的欧式距离法分类

算法过程： 1 选取某一样本 2 计算类中心 3 计算样本与每一类的类中心距离，这里采用欧式距离 4 循环计算待测样品和训练集中各类中心距离找出距离待测...

• 模型|利用Python语言做逻辑回归算法

问题是这些预测对于分类来说是不合理的，因为真实的概率必然在0到1之间。为了避免这个问题，我们必须使用一个函数对p(X)建模，该函数为X的所有值提供0到1之间的输...