# Machine Learning (3) -- Bayesian Statistics and Regularization

Contents

3. Bayesian statistics and Regularization.

3.1 Underfitting and overfitting.

3.2 Bayesian statistics and regularization.

3.3 Optimize Cost function by regularization.

3.3.1 Regularized linear regression.

3.3.2 Regularized logistic regression.

Keywords: underfitting, overfitting, regularization, Bayesian statistics

## 3.1 Underfitting and overfitting

Two common strategies for addressing overfitting:

1. Reduce the number of features
• Manually select features that are likely to generalize, and discard features that may be specific to the training set.
• Use a model selection algorithm.
2. Regularization
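As a quick illustration of why regularization helps (a NumPy sketch in Python rather than the post's MATLAB; `fit_poly` is a hypothetical helper, not from the original), an L2 penalty dramatically shrinks the coefficients of a high-degree polynomial fit, taming the wild oscillations characteristic of overfitting:

```python
import numpy as np

def fit_poly(x, y, degree, lam=0.0):
    """Least-squares polynomial fit with an optional L2 (ridge) penalty."""
    # Design matrix with columns x^0, x^1, ..., x^degree
    A = np.vander(x, degree + 1, increasing=True)
    # Ridge solution: (A^T A + lam * I)^-1 A^T y
    return np.linalg.solve(A.T @ A + lam * np.eye(degree + 1), A.T @ y)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 8)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(8)

w_ols = fit_poly(x, y, degree=7)             # unregularized: large coefficients
w_ridge = fit_poly(x, y, degree=7, lam=0.1)  # penalized: much smaller coefficients
```

With 8 points and a degree-7 polynomial, the unregularized fit interpolates the noise; the penalized coefficient vector has a far smaller norm and yields a smoother curve.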

## 3.2 Bayesian statistics and regularization

Treat the parameter θ as a random variable with a prior distribution p(θ). Given a training set S = {(x^(i), y^(i))}, i = 1, …, m, if we want to make a prediction on a new input, we can compute the posterior distribution of θ by Bayes' rule:

$$p(\theta \mid S) = \frac{p(S \mid \theta)\,p(\theta)}{p(S)} = \frac{\left(\prod_{i=1}^{m} p\big(y^{(i)} \mid x^{(i)}, \theta\big)\right) p(\theta)}{\int_{\theta}\left(\prod_{i=1}^{m} p\big(y^{(i)} \mid x^{(i)}, \theta\big)\right) p(\theta)\,d\theta}$$

Because the full posterior is usually intractable, a common approximation is the MAP (maximum a posteriori) estimate:

$$\theta_{\mathrm{MAP}} = \arg\max_{\theta} \prod_{i=1}^{m} p\big(y^{(i)} \mid x^{(i)}, \theta\big)\, p(\theta)$$

A common choice of prior is $\theta \sim \mathcal{N}(0, \tau^2 I)$ (other choices are of course possible). In practice, the Bayesian MAP estimate reduces overfitting better than maximum likelihood estimation. For example, Bayesian logistic regression can handle text-classification problems in which the number of features is far larger than the number of training examples.
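The link between the MAP estimate and the regularization of Section 3.3 can be made explicit (a standard derivation, assuming the Gaussian prior $\theta \sim \mathcal{N}(0, \tau^2 I)$): taking logarithms turns the MAP objective into maximum likelihood plus an L2 penalty,

```latex
\theta_{\mathrm{MAP}}
  = \arg\max_{\theta}\; \sum_{i=1}^{m} \log p\big(y^{(i)} \mid x^{(i)}, \theta\big)
      - \frac{1}{2\tau^{2}} \lVert \theta \rVert_{2}^{2}
  = \arg\min_{\theta}\; -\sum_{i=1}^{m} \log p\big(y^{(i)} \mid x^{(i)}, \theta\big)
      + \frac{1}{2\tau^{2}} \lVert \theta \rVert_{2}^{2}
```

so an L2 regularization weight λ corresponds (up to a constant factor) to a prior width $\tau^2 \propto 1/\lambda$: a tighter prior (smaller τ) means heavier regularization.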

## 3.3 Optimize Cost function by regularization

### 3.3.1 Regularized linear regression

For linear regression, the regularized cost function is

$$J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta\big(x^{(i)}\big) - y^{(i)}\right)^2 + \lambda \sum_{j=1}^{n} \theta_j^2\right]$$

(Note that the regularization term does not include θ₀.)

The value of λ must be chosen appropriately. If λ is too large (e.g. 10^10), all the θⱼ are driven toward 0, none of the features is effectively learned, and the model underfits. Choosing λ is discussed later; for now, assume a value between 0 and 10.
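The excluded-θ₀ convention also appears in the closed-form solution. A minimal NumPy sketch (Python rather than the post's MATLAB; illustrative, not from the original) of the standard regularized normal equation:

```python
import numpy as np

def ridge_normal_eq(X, y, lam):
    """Closed-form regularized linear regression:
    theta = (X'X + lam * L)^-1 X'y, where L is the identity matrix
    with its top-left entry zeroed so that theta_0 is not penalized."""
    n = X.shape[1]
    L = np.eye(n)
    L[0, 0] = 0.0  # do not regularize the intercept term
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)

# Tiny example: data generated by y = 1 + 2x, with an intercept column in X
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

theta_unreg = ridge_normal_eq(X, y, lam=0.0)   # recovers [1, 2] exactly
theta_reg = ridge_normal_eq(X, y, lam=10.0)    # slope shrunk toward 0
```

With λ = 10 the slope shrinks well below 2, while the intercept, which carries no penalty, is free to compensate.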

### 3.3.2 Regularized logistic regression

For logistic regression, the regularized cost function is

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta\big(x^{(i)}\big) + \big(1-y^{(i)}\big)\log\left(1-h_\theta\big(x^{(i)}\big)\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$

A MATLAB implementation of this cost function and its gradient is as follows:

```matlab
function [J, grad] = costFunctionReg(theta, X, y, lambda)
%COSTFUNCTIONREG Compute cost and gradient for logistic regression with regularization
%   J = COSTFUNCTIONREG(theta, X, y, lambda) computes the cost of using
%   theta as the parameter for regularized logistic regression and the
%   gradient of the cost w.r.t. to the parameters.

m = length(y);  % number of training examples
n = size(X, 2); % number of features (including the intercept column)

h = sigmoid(X * theta); % hypothesis: sigmoid of the linear combination

% Cost: cross-entropy plus the penalty term (theta(1) is not regularized)
J = sum((-y) .* log(h) - (1 - y) .* log(1 - h)) / m ...
    + lambda * sum(theta(2:n) .^ 2) / (2 * m);

% Gradient: the intercept term receives no regularization
grad = zeros(size(theta));
grad(1) = sum((h - y) .* X(:,1)) / m;
for i = 2:n
    grad(i) = sum((h - y) .* X(:,i)) / m + lambda * theta(i) / m;
end

end
```

The cost function can then be minimized with `fminunc`:

```matlab
% Initialize fitting parameters
initial_theta = zeros(size(X, 2), 1);

% Set regularization parameter lambda to 1 (you can vary this)
lambda = 1;

% Set options: supply the gradient and cap the iterations
options = optimset('GradObj', 'on', 'MaxIter', 400);

% Optimize
[theta, J, exit_flag] = ...
    fminunc(@(t)(costFunctionReg(t, X, y, lambda)), initial_theta, options);
```
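For readers cross-checking in Python, here is a NumPy translation of the cost function above (illustrative; the names are mine, not from the original post):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_function_reg(theta, X, y, lam):
    """Regularized logistic-regression cost and gradient,
    mirroring the MATLAB costFunctionReg above."""
    m = len(y)
    h = sigmoid(X @ theta)
    # Cross-entropy cost; theta[0] (the intercept) is not regularized
    J = (-(y @ np.log(h)) - (1 - y) @ np.log(1 - h)) / m \
        + lam * np.sum(theta[1:] ** 2) / (2 * m)
    grad = X.T @ (h - y) / m
    grad[1:] += lam * theta[1:] / m
    return J, grad

# Sanity check: at theta = 0, h = 0.5 everywhere, so J = log(2), about 0.693
X = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]])
y = np.array([1.0, 0.0, 1.0])
J, grad = cost_function_reg(np.zeros(2), X, y, lam=1.0)
```

The same convention applies in both versions: the intercept θ₀ receives no penalty in either the cost or the gradient.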
