# Basic Concepts of Machine Learning (2)

## Capacity

The ability to perform well on previously unobserved inputs is called generalization.

Capacity is the ability of the learner (also called the model) to discover a function from a family of functions, for example:

1. Linear predictor: y = wx + b
2. Quadratic predictor: y = w2x^2 + w1x + b
3. Degree-4 polynomial predictor: y = b + w1x + w2x^2 + w3x^3 + w4x^4

Capacity can be measured by the number of training examples {(Xi, Yi)} that the learner can always fit, no matter how the values of Xi and Yi are chosen.
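This measure can be illustrated with the polynomial predictors above: a degree-(n-1) polynomial has n free parameters and can fit any n training points with distinct inputs exactly. A minimal numpy sketch (not from the original notes):

```python
import numpy as np

# Any 5 training points (xi, yi) with distinct xi can be fit exactly
# by a degree-4 polynomial (5 parameters), so its capacity under the
# "always fit" measure is at least 5.
rng = np.random.default_rng(0)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])  # 5 distinct inputs
y = rng.normal(size=5)                   # arbitrary targets

coeffs = np.polyfit(x, y, deg=4)         # least-squares fit, degree 4
y_hat = np.polyval(coeffs, x)

print(np.allclose(y_hat, y))             # exact interpolation: True
```

The same code with `deg=1` (the linear predictor) would generally fail to reproduce 5 arbitrary targets, which is exactly the capacity gap between the two families.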

## Underfitting and Overfitting

Training: estimate the parameters of f from the training examples {(Xi, Yi)}.

During training, we define a `cost function` (also called an `objective function`); training amounts to optimizing this function, which yields the `training error`. To measure the model's generalization, we evaluate it on a separate set called the `test set`, which yields the `generalization error` (also called the `test error`). A good learner should:

1. Make the training error small.
2. Make the gap between training and test error small.
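The two goals can be made concrete with a small sketch (hypothetical data, numpy only): fit the linear predictor on a training split, then compare training and test error.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data from y = 2x + 1 + noise, split into train and test sets.
x = rng.uniform(-1, 1, size=100)
y = 2 * x + 1 + rng.normal(scale=0.1, size=100)
x_train, y_train = x[:80], y[:80]
x_test, y_test = x[80:], y[80:]

# Training: estimate w, b of the linear predictor y = w*x + b
# by minimizing the squared-error cost on the training set.
A = np.stack([x_train, np.ones_like(x_train)], axis=1)
w, b = np.linalg.lstsq(A, y_train, rcond=None)[0]

def mse(xs, ys):
    return np.mean((w * xs + b - ys) ** 2)

train_error = mse(x_train, y_train)  # goal 1: small training error
test_error = mse(x_test, y_test)     # goal 2: small train/test gap
print(train_error, test_error)
```

Here the model family matches the data-generating process, so both errors stay near the noise level and the gap is small.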

Underfitting means the learner fails to capture the regularities in the training examples. Overfitting means the learner fits the training examples well but generalizes poorly.

Underfitting occurs when the model is not able to obtain a sufficiently low error value on the training set. Overfitting occurs when the gap between the training error and test error is too large.

A model with too little capacity is likely to underfit; a model with too much capacity is likely to overfit.

Models with low capacity may struggle to fit the training set. Models with high capacity can overfit by memorizing properties of the training set that do not serve them well on the test set.
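Both regimes show up in a small experiment (a sketch with hypothetical quadratic data): a degree-1 polynomial underfits, the matching degree-2 polynomial fits well, and a degree-9 polynomial drives training error down while test error grows.

```python
import numpy as np

rng = np.random.default_rng(2)

# Noisy quadratic data: a small training set and a larger test set.
def make_data(n):
    x = rng.uniform(-1, 1, size=n)
    return x, x**2 + rng.normal(scale=0.05, size=n)

x_train, y_train = make_data(20)
x_test, y_test = make_data(200)

results = {}
for deg in (1, 2, 9):
    c = np.polyfit(x_train, y_train, deg)
    results[deg] = (
        np.mean((np.polyval(c, x_train) - y_train) ** 2),  # train MSE
        np.mean((np.polyval(c, x_test) - y_test) ** 2),    # test MSE
    )
    print(f"degree {deg}: train MSE {results[deg][0]:.4f}, "
          f"test MSE {results[deg][1]:.4f}")
```

The degree-1 model cannot drive training error down (underfitting); the degree-9 model fits the 20 training points better than the true quadratic does, but that extra fit is mostly noise and does not transfer to the test set (overfitting).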

Typical causes of underfitting include:

• Model is not rich enough.
• Difficult to find the global optimum of the objective function on the training set or easy to get stuck at local minimum.
• Limitation on the computation resources (not enough training iterations of an iterative optimization procedure).

Typical causes of overfitting include:

• The family of functions is too large (compared with the size of the training data) and it contains many functions which all fit the training data well.
• Without sufficient data, the learner cannot distinguish which one is most appropriate and would make an arbitrary choice among these apparently good solutions.
• In most cases, data is contaminated by noise. The learner with large capacity tends to describe random errors or noise instead of the underlying models of data (classes).

At the optimal capacity, the gap between training error and test error is smallest.

Common ways to reduce overfitting:

• Reduce the number of features.
• Reduce the number of independent parameters.
• Add regularization to the learner.
• Reduce the network size of deep models.
• Reduce the number of training iterations (early stopping).
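As an illustration of the regularization item, here is a minimal sketch of ridge (L2) regression on high-capacity polynomial features; the function names and data are hypothetical, not from the original notes.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Minimize ||X w - y||^2 + lam * ||w||^2 in closed form."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, size=15)
y = x**2 + rng.normal(scale=0.05, size=15)

# Degree-9 polynomial features: high capacity for only 15 examples.
X = np.stack([x**k for k in range(10)], axis=1)

w_plain = ridge_fit(X, y, lam=0.0)  # unregularized least squares
w_ridge = ridge_fit(X, y, lam=0.1)  # L2 penalty shrinks the weights

print(np.linalg.norm(w_plain), np.linalg.norm(w_ridge))
```

The penalty term makes large weights expensive, so the regularized solution always has a smaller (or equal) norm than the unregularized one; this constrains the effective capacity without removing features outright.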
