Structuring Machine Learning Projects

大龄老码农-昊然

Last modified 2021-05-14 17:41:04

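Keywords: order of optimization, evaluation metrics, dataset splits, improving model performance, error analysis, and more.

Order of optimization

When an algorithm runs into problems in its final application, how do we improve it? There are many options: collect more data; make the data distribution more reasonable (more diverse, covering more cases); train longer; use Adam instead of plain gradient descent; try a larger or a smaller network; try dropout; try L2 regularization; change the network architecture (activation functions, number of units per layer); and so on.

First, do well on the training set (for example, use a larger network, a better optimization algorithm, or train longer).

Second, do well on the dev set (use regularization, a larger training set, and so on).

Third, do well on the test set (use a larger dev set, and so on).

Finally, do well in real-world use (change the cost function being optimized, check the data distribution, and so on).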

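Evaluation metrics

Evaluation metrics show whether training is actually improving the model.

TP: predicted positive, actually positive

FP: predicted positive, actually negative

TN: predicted negative, actually negative

FN: predicted negative, actually positive

Precision P: TP / (TP + FP)

Recall R: TP / (TP + FN)

These two metrics usually trade off against each other: when one goes up, the other tends to go down.

The F1 score was introduced to balance the two.

F1 = 2 / (1 / P + 1 / R) = 2TP / (2TP + FN + FP)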

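A few more metrics:

Accuracy (ACC): correct predictions / all examples

TPR, the vertical axis of the ROC curve: TP / (TP + FN); ideally 1.

True positive rate: the fraction of actual positives that are predicted positive (the same quantity as recall).

FPR, the horizontal axis of the ROC curve: FP / (FP + TN)

False positive rate: the fraction of actual negatives that are predicted positive; ideally 0.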
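As a minimal sketch, the metrics can be computed directly from the four confusion-matrix counts. It uses the standard definitions: precision = TP / (TP + FP), recall = TPR = TP / (TP + FN), FPR = FP / (FP + TN). The function name and the example counts are illustrative assumptions, not part of the original notes.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # recall is also the TPR
    f1_denominator = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denominator if f1_denominator else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
        "tpr": recall,
        "fpr": fpr,
    }


# Hypothetical counts, chosen only for illustration.
m = classification_metrics(tp=8, fp=2, tn=85, fn=5)
```

With these counts precision is 8/10, recall is 8/13, and F1 is 16/23, which agrees with the harmonic-mean form 2 / (1/P + 1/R).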

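The single-number evaluation metrics above are optimizing metrics: they can always be pushed further. Other metrics are satisficing metrics: they only need to meet a threshold.

Running time is one example.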
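One way to combine the two kinds of metric, sketched here under assumptions, is to filter candidates by the threshold metric first and then pick the best remaining candidate by the metric being optimized. The candidate models, their scores, and the 100 ms limit are made up for illustration.

```python
def pick_model(candidates, max_runtime_ms):
    """Return the candidate with the best F1 among those within the runtime limit."""
    feasible = [c for c in candidates if c["runtime_ms"] <= max_runtime_ms]
    if not feasible:
        return None  # no candidate meets the threshold
    return max(feasible, key=lambda c: c["f1"])


# Hypothetical candidates; the numbers are not measurements.
candidates = [
    {"name": "A", "f1": 0.90, "runtime_ms": 80},
    {"name": "B", "f1": 0.92, "runtime_ms": 95},
    {"name": "C", "f1": 0.95, "runtime_ms": 1500},
]
best = pick_model(candidates, max_runtime_ms=100)
```

Model C has the highest F1 but misses the runtime threshold, so B is selected.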

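Splitting into training, dev, and test sets

On traditional, smaller datasets, common splits are 70% / 30% (train / test) or 60% / 20% / 20% (train / dev / test).

With large amounts of data, the dev and test sets only need to be big enough to tell us how well the system performs, so their share can shrink.

For example, with a million examples the split can be 98% / 1% / 1%.

The training set is used to fit the model.

The dev set is used to choose between models, tune hyperparameters, revise the model, and so on.

The test set is used to get an unbiased estimate of the model's performance, such as its accuracy.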
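A minimal sketch of the 98% / 1% / 1% split, assuming all examples come from one distribution and can be shuffled freely. The function name, the fixed seed, and the use of integers as stand-ins for examples are assumptions for illustration.

```python
import random


def split_dataset(examples, dev_frac=0.01, test_frac=0.01, seed=0):
    """Shuffle the examples and split them into train / dev / test lists."""
    items = list(examples)
    random.Random(seed).shuffle(items)  # fixed seed keeps the split reproducible
    n_dev = round(len(items) * dev_frac)
    n_test = round(len(items) * test_frac)
    dev = items[:n_dev]
    test = items[n_dev:n_dev + n_test]
    train = items[n_dev + n_test:]
    return train, dev, test


# One million placeholder examples, split 98% / 1% / 1%.
train, dev, test = split_dataset(range(1_000_000))
```

Passing dev_frac=0.2 and test_frac=0.2 gives the 60% / 20% / 20% split used for smaller datasets.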

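When to change the dev/test sets and metrics

During a project you may find that the target needs something new. For example, suppose you have only been looking at classification error, and the model with the lower classification error turns out to let many pornographic images through as the desired class. Then the target needs to be updated.

One option is to weight examples differently in the objective: pornographic images get a higher weight, so misclassifying them costs more.

Step 1: redefine the loss function.

Step 2: retrain the model so that it does well on the newly defined loss function.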
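The re-weighting idea can be sketched as a weighted classification error, where each example's mistake counts in proportion to its weight. The weight of 10 for the unacceptable image and the tiny label lists are assumptions for illustration; the notes do not give specific values.

```python
def weighted_error(y_true, y_pred, weights):
    """Weighted classification error: sum of weights of wrong predictions / sum of all weights."""
    total = sum(weights)
    wrong = sum(w for y, p, w in zip(y_true, y_pred, weights) if y != p)
    return wrong / total


# Hypothetical data: the third image is pornographic, so it gets weight 10.
y_true = [1, 0, 0, 1]
y_pred = [1, 0, 1, 0]
plain = weighted_error(y_true, y_pred, [1, 1, 1, 1])
weighted = weighted_error(y_true, y_pred, [1, 1, 10, 1])
```

Both versions see the same two mistakes, but the weighted error rises from 2/4 to 11/13 because one of the mistakes is on a heavily weighted example.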

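If performance on the dev/test sets differs from performance in the final deployment, the evaluation metric and the dev/test sets need to be updated.

Improving model performance

Knowing human-level performance tells you whether the next step should be to keep reducing bias (training error) or to reduce variance so that performance is more stable on unseen data. Human-level performance can be treated as a proxy for Bayes error; the gap between the model's training error and human-level error is the avoidable bias.

Error analysis

Error analysis points to where the algorithm can be improved: first find the cases the algorithm gets wrong, then analyze the causes, then improve the algorithm.

Collect the misclassified examples from the dev set.
Manually inspect why each one was misclassified and group them by cause. Use the resulting clues to decide what to do next, such as adding relevant training data.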
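The manual tally can be kept in a small script. This is a sketch under assumptions: each misclassified dev example has been tagged by hand with zero or more causes, and the tag names below are invented for illustration.

```python
from collections import Counter


def tally_error_causes(tagged_errors):
    """Return, for each cause, the fraction of misclassified examples tagged with it."""
    counts = Counter(tag for tags in tagged_errors for tag in tags)
    n = len(tagged_errors)
    return {tag: count / n for tag, count in counts.most_common()}


# Hypothetical hand-assigned tags, one set per misclassified dev example.
tagged_errors = [
    {"blurry"},
    {"dog"},
    {"blurry", "great cat"},
    {"blurry"},
    set(),  # no clear cause identified
]
shares = tally_error_causes(tagged_errors)
```

An example may carry several tags, so the fractions do not have to sum to 1. Each fraction is an upper bound on how much of the error a perfect fix for that cause could remove.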

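What if some training examples turn out to be mislabeled?

If the errors are random, they can be left alone.
If they are systematic, for example white dogs are consistently labeled as cats, and the same mistakes show up in the dev set, they need to be corrected by hand.

What should you watch for when fixing mislabeled data?

Apply the same process to the dev set and the test set, so that they keep coming from the same distribution.
When reviewing examples the algorithm got wrong, also consider reviewing examples it got right, to check whether some were right only for reasons unrelated to the algorithm.
If only the mislabeled data in the dev (and test) set is corrected, the training set and the dev set may end up coming from slightly different distributions. That is acceptable.

Build a first system quickly, then iterate

Set up the training set, dev set, and evaluation metric.
Quickly build a first simple system.
Use bias/variance analysis to decide what to do next.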

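Training and testing on different distributions

The training set and dev set come from different distributions

For example, the training set consists of high-resolution cat pictures downloaded from the web, while the dev set consists of lower-resolution pictures uploaded by users.

(Not recommended) Option 1: merge the training and dev data and re-split at random. Because the web images far outnumber the user images, the result is still mostly optimized for web images rather than user uploads, and the target has shifted.

(Recommended) Option 2: move part of the user-uploaded data into the training set and keep the dev set entirely user-uploaded. The advantage is that the target stays correct, namely accuracy on user-uploaded images. The disadvantage is that the training set and the dev set have different distributions.

Speech recognition works the same way. For a speech recognition system used in cars, the training set can be all kinds of speech corpora and the dev set can come from real usage; moving some of the real-usage data into the training set improves the system.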
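The recommended option can be sketched as follows: all web data goes into training, while the dev and test sets are drawn only from the target (user) data and the remaining user data is added to training. The dataset sizes and the tuple representation of examples are assumptions for illustration.

```python
import random


def split_mismatched(web_items, user_items, n_dev, n_test, seed=0):
    """Dev/test come only from user data; train = all web data + leftover user data."""
    user = list(user_items)
    random.Random(seed).shuffle(user)
    dev = user[:n_dev]
    test = user[n_dev:n_dev + n_test]
    train = list(web_items) + user[n_dev + n_test:]
    return train, dev, test


# Hypothetical sizes: 200,000 web images and 10,000 user uploads.
web = [("web", i) for i in range(200_000)]
user = [("user", i) for i in range(10_000)]
train, dev, test = split_mismatched(web, user, n_dev=2_500, n_test=2_500)
```

The dev and test sets contain only user uploads, so the optimization target stays on the data that matters, at the cost of a training distribution that differs from dev/test.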

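Bias and variance with mismatched data distributions

When the training set comes from a different distribution than the dev and test sets, the usual bias/variance analysis no longer applies directly. How should the problem be analyzed then?

To handle this, carve out a small part of the training set. This part has the same distribution as the training set but is held out from training, and it is used to check how well the algorithm really generalizes. It is called the training-dev set.

If performance on the training-dev set is close to performance on the training set, the algorithm generalizes well, but there is a data-mismatch problem, because the algorithm was never trained on data from the dev set's distribution. If performance on the training-dev set is instead close to performance on the dev set (and clearly worse than on the training set), the algorithm has a generalization problem.

This introduces a new type of problem to solve: data mismatch!

If dev-set performance and test-set performance differ a lot, the likely cause is that the algorithm has overfit the dev set: we just happened to pick an algorithm that does well on the dev set. Enlarging the dev set can address this.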
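The comparisons in this section amount to reading off four gaps between consecutive error rates. The sketch below assumes human-level error stands in for Bayes error; the function name and the error values (in percent) are hypothetical.

```python
def diagnose(human, train, train_dev, dev, test):
    """Return the four error gaps (in the same units as the inputs)."""
    return {
        "avoidable_bias": train - human,      # training error vs. human-level proxy
        "variance": train_dev - train,        # same distribution, unseen data
        "data_mismatch": dev - train_dev,     # different distribution
        "dev_overfit": test - dev,            # overfitting to the dev set
    }


# Hypothetical error rates in percent.
gaps = diagnose(human=0.5, train=2.0, train_dev=2.5, dev=9.0, test=9.5)
largest = max(gaps, key=gaps.get)
```

Here the largest gap is between the training-dev set and the dev set, which points to data mismatch rather than variance.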

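Addressing data mismatch

If there is a serious data-mismatch problem (caused by different distributions in the dev set and the training set), what can be done?

1. Manually examine how the training set differs from the dev/test sets, for example finding that in speech recognition the dev set has more background noise than the training set.

2. Make the training set more similar to the dev/test sets, either by processing the data (adding noise to the training set, synthesizing new training data, and so on) or by collecting more data. Be careful with synthesis: if there is only one hour of noise but tens of thousands of hours of training data, simply repeating that hour of noise and adding it to the training data may make the model overfit to that noise. Pay attention to where the training data comes from; the original sources must be diverse, otherwise the model easily overfits to a small subset of the whole space.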

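Transfer learning

Knowledge learned in one domain can be applied to another domain to improve performance.

A model trained to recognize cats can help improve performance on reading X-ray scans.

Remove the last layer and the weights feeding into it, then retrain on the new dataset.

If the new training set is small, train only the weights of the last layer (or last few layers) and keep the rest fixed.

If the new training set is large enough, train the whole network, using the earlier weights as the initial values. In that case the first round of training is called pre-training and the second is called fine-tuning.

Why does this work? The early layers of the network handle low-level features such as edge detection and curve detection, which are similar across tasks, so what is learned from another dataset carries over to the new one.

Alternatively, after removing the last layer and its weights, add one or several new layers, initialize them randomly, and train on the new dataset.

Transfer learning fits when the source problem has plenty of data (for example, a fairly general speech recognition task) while the target problem has little (for example, wake-word detection).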
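The mechanics of swapping the output layer can be sketched without any framework. This is a toy illustration, not a training loop: layers are plain dictionaries, the "trainable" flag is a made-up marker that a training loop would have to respect, and the layer sizes are arbitrary.

```python
import random


def make_layer(n_in, n_out, rng):
    """Create a randomly initialized dense layer (weights W, biases b)."""
    return {
        "W": [[rng.gauss(0.0, 0.01) for _ in range(n_in)] for _ in range(n_out)],
        "b": [0.0] * n_out,
        "trainable": True,
    }


def transfer(pretrained, n_new_outputs, fine_tune_all, seed=0):
    """Drop the last layer, reuse the rest, and attach a fresh output layer."""
    rng = random.Random(seed)
    # Reused layers start from the pretrained weights; they are frozen
    # unless the new dataset is large enough to fine-tune everything.
    kept = [dict(layer, trainable=fine_tune_all) for layer in pretrained[:-1]]
    # The new output layer takes the same inputs as the removed one.
    n_in = len(pretrained[-1]["W"][0])
    return kept + [make_layer(n_in, n_new_outputs, rng)]


rng = random.Random(1)
pretrained = [make_layer(4, 3, rng), make_layer(3, 5, rng)]  # hypothetical source network
small_data = transfer(pretrained, n_new_outputs=1, fine_tune_all=False)
large_data = transfer(pretrained, n_new_outputs=1, fine_tune_all=True)
```

With little target data only the new layer is marked trainable; with enough data every layer is, and the pretrained weights act as the initialization.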

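Multi-task learning

Transfer learning is sequential; multi-task learning is parallel (the network handles several tasks at once, and the tasks help each other).

For example, in autonomous driving a single model can be trained to detect pedestrians, vehicles, traffic lights, stop signs, and so on at the same time.

In multi-task learning, the label of each example is a multi-dimensional vector encoding several facts about the example, such as whether the image contains a pedestrian, a vehicle, a traffic light, or a stop sign (in single-task learning the label is a single value).

Unlike single-task learning, the multi-task loss sums the loss over every component of the label vector, and the total is used as the model's loss.

Multi-task learning differs from softmax. Softmax is still single-task learning: the single task has multiple classes, and the classes are mutually exclusive. In multi-task learning there are multiple labels, and several can be present at once.

Training one network to recognize several kinds of objects is often more effective than training several networks that each recognize one kind.

Some examples may have only part of their labels annotated, for example whether there is a pedestrian and whether there is a vehicle, but not traffic lights or stop signs. Such examples can still be used for training: when computing the loss, ignore the unlabeled entries of that example. Where the sum would normally run over 4 labels, for an example with missing labels it may run over only 2.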
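A minimal sketch of the per-example multi-task loss with missing labels. It assumes a logistic (binary cross-entropy) loss per label and uses None to mark an unlabeled entry; both choices, and the predicted probabilities, are assumptions for illustration.

```python
import math


def multitask_loss(y_true, y_prob):
    """Sum of per-label logistic losses, skipping labels marked None (unlabeled)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        if y is None:
            continue  # unlabeled entry contributes nothing to the loss
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total


# Labels: pedestrian, vehicle, traffic light, stop sign (hypothetical predictions).
y_prob = [0.9, 0.2, 0.7, 0.4]
full = multitask_loss([1, 0, 1, 0], y_prob)           # sums over 4 labels
partial = multitask_loss([1, 0, None, None], y_prob)  # sums over 2 labels
```

The partially labeled example still produces a gradient signal for the two labels it does have.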

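When multi-task learning helps

1. The tasks are closely related, so the shared low-level features are useful for the other tasks as well.

2. The total amount of data is much larger than the data for any single task. As with transfer learning, a typical case is that every task has about the same number of examples, so the total is much larger than any single task's share.

3. Multi-task learning is likely to do worse than single-task learning only when the network is too small.

Transfer learning is used more often than multi-task learning.

Whether to use end-to-end deep learning

Pros:

Lets the data speak; fewer hand-designed components.

Cons:

Needs a large amount of data; discards hand-designed components, and with them much human prior knowledge.

Key question: is there enough data to support end-to-end learning?

If there is not enough data, it is more effective to apply end-to-end learning to a single component of the system.

Structuring machine learning projects: an example

Problem statement: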

To help you practice strategies for machine learning, in this week we’ll present another scenario and ask how you would act. We think this “simulator” of working in a machine learning project will give a taste of what leading a machine learning project could be like!

You are employed by a startup building self-driving cars. You are in charge of detecting road signs (stop sign, pedestrian crossing sign, construction ahead sign) and traffic signals (red and green lights) in images. The goal is to recognize which of these objects appear in each image. As an example, the above image contains a pedestrian crossing sign and red traffic lights

Your 100,000 labeled images are taken using the front-facing camera of your car. This is also the distribution of data you care most about doing well on. You think you might be able to get a much larger dataset off the internet, that could be helpful for training even if the distribution of internet data is not the same.

Question 1

You are just getting started on this project. What is the first thing you do? Assume each of the steps below would take about an equal amount of time (a few days).

Spend a few days collecting more data using the front-facing camera of your car, to better understand how much data per unit time you can collect.

Spend a few days checking what is human-level performance for these tasks so that you can get an accurate estimate of Bayes error.

Spend a few days training a basic model and see what mistakes it makes.

Spend a few days getting the internet data, so that you understand better what data is available.

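Explanation: the general steps for building a deep learning system are:

Set up the dev/test sets and the optimizing metric (to fix the direction);

Quickly build a basic system;

Use bias/variance analysis and error analysis to prioritize the next steps.

In short, to build your own deep learning system, build a basic system quickly and iterate, rather than overthinking and starting with a very complex system that is hard to get going. Here that means spending a few days training a basic model and seeing what mistakes it makes.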

Question 2

Your goal is to detect road signs (stop sign, pedestrian crossing sign, construction ahead sign) and traffic signals (red and green lights) in images. The goal is to recognize which of these objects appear in each image. You plan to use a deep neural network with ReLU units in the hidden layers.

For the output layer, a softmax activation would be a good choice for the output layer because this is a multi-task learning problem. True/False?

True

False

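Explanation:

Softmax suits multi-class classification, where an image is assigned to exactly one class, such as cat, dog, or duck. In a multi-task (multi-label) problem, one image may contain both a cat and a dog. The two are different, so the statement is False.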
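The difference is easy to see numerically. In this sketch the logits are made up: softmax turns them into one distribution that sums to 1, so at most one class can exceed 0.5, while independent sigmoid outputs let several labels be "present" at once.

```python
import math


def softmax(logits):
    """Numerically stable softmax: outputs are positive and sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def sigmoid(z):
    """Independent per-label probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical logits for three labels.
logits = [3.0, 2.5, -4.0]
soft = softmax(logits)
sig = [sigmoid(z) for z in logits]
```

With these logits the sigmoid outputs mark the first two labels as present, whereas softmax forces them to compete for probability mass.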

Question 3

You are carrying out error analysis and counting up what errors the algorithm makes. Which of these datasets do you think you should manually go through and carefully examine, one image at a time?

10,000 images on which the algorithm made a mistake

10,000 randomly chosen images

500 images on which the algorithm made a mistake

500 randomly chosen images

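Explanation: collect error examples. From the dev (or test) set, take a small number of misclassified examples (around 100 is typical) and tally the error causes among them. Of the options here, that points to the 500 images on which the algorithm made a mistake.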

Question 4

After working on the data for several weeks, your team ends up with the following data:

100,000 labeled images taken using the front-facing camera of your car.

900,000 labeled images of roads downloaded from the internet.

Each image’s labels precisely indicate the presence of any specific road signs and traffic signals or combinations of them. For example, y^(i) = [1,0,0,1,0]^T means the image contains a stop sign and a red traffic light.

Because this is a multi-task learning problem, you need to have all your y^(i) vectors fully labeled. If one example is equal to [0,?,1,1,?]^T then the learning algorithm will not be able to use that example. True/False?

True

False

Explanation: when summing the loss function, sum only over the labels j that are 0 or 1, and skip the labels marked "?". The statement is False.

Question 5

The distribution of data you care about contains images from your car’s front-facing camera; which comes from a different distribution than the images you were able to find and download off the internet. How should you split the dataset into train/dev/test sets?

Choose the training set to be the 900,000 images from the internet along with 20,000 images from your car’s front-facing camera. The 80,000 remaining images will be split equally in dev and test sets.

Choose the training set to be the 900,000 images from the internet along with 80,000 images from your car’s front-facing camera. The 20,000 remaining images will be split equally in dev and test sets.

Mix all the 100,000 images with the 900,000 images you found online. Shuffle everything. Split the 1,000,000 images dataset into 600,000 for the training set, 200,000 for the dev set and 200,000 for the test set.

Mix all the 100,000 images with the 900,000 images you found online. Shuffle everything. Split the 1,000,000 images dataset into 980,000 for the training set, 10,000 for the dev set and 10,000 for the test set.

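Explanation: first make sure the dev and test sets consist entirely of target images, i.e. images from the car's camera (20,000 images is enough for them). Then put all the images downloaded from the internet into the training set, and add part of the real images. With this split:

Advantage: the dev set comes entirely from car-camera images, so it aims at the target;

Disadvantage: the training set and the dev/test sets come from different distributions.

In the long run, though, this split gives better system performance.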

Question 6

Assume you’ve finally chosen the following split of the data:

You also know that human-level error on the road sign and traffic signals classification task is around 0.5%. Which of the following are True? (Check all that apply).

You have a large avoidable-bias problem because your training error is quite a bit higher than the human-level error.

You have a large variance problem because your model is not generalizing well to data from the same training distribution but that it has never seen before.

Your algorithm overfits the dev set because the error of the dev and test sets are very close.

You have a large data-mismatch problem because your model does a lot better on the training-dev set than on the dev set

You have a large variance problem because your training error is quite higher than the human-level error.

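Explanation:

First, the training error is far above human-level error, so there is avoidable bias. Second, the large gap between the train-dev set and the dev set indicates data mismatch. Variance is small, because the train-dev error is close to the training error.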

Question 7

Based on table from the previous question, a friend thinks that the training data distribution is much easier than the dev/test distribution. What do you think?

Your friend is right. (I.e., Bayes error for the training data distribution is probably lower than for the dev/test distribution.)

Your friend is wrong. (I.e., Bayes error for the training data distribution is probably higher than for the dev/test distribution.)

There’s insufficient information to tell if your friend is right or wrong.

Question 8

You decide to focus on the dev set and check by hand what are the errors due to. Here is a table summarizing your discoveries:

In this table, 4.1%, 8.0%, etc. are a fraction of the total dev set (not just examples your algorithm mislabeled). I.e. about 8.0/14.3 = 56% of your errors are due to foggy pictures.

The results from this analysis implies that the team’s highest priority should be to bring more foggy pictures into the training set so as to address the 8.0% of errors in that category. True/False?

True because it is the largest category of errors. As discussed in lecture, we should prioritize the largest category of error to avoid wasting the team’s time.

True because it is greater than the other error categories added together (8.0 > 4.1+2.2+1.0).

False because this would depend on how easy it is to add this data and how much you think your team thinks it’ll help.

False because data augmentation (synthesizing foggy images by clean/non-foggy images) is more efficient.

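Explanation: the size of the error category is not the only consideration; how easy the fix is to implement, and how much it is expected to help, also matter.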

Question 9

You can buy a specially designed windshield wiper that helps wipe off some of the raindrops on the front-facing camera. Based on the table from the previous question, which of the following statements do you agree with?

2.2% would be a reasonable estimate of the maximum amount this windshield wiper could improve performance.

2.2% would be a reasonable estimate of the minimum amount this windshield wiper could improve performance.

2.2% would be a reasonable estimate of how much this windshield wiper will improve performance.

2.2% would be a reasonable estimate of how much this windshield wiper could worsen performance in the worst case.

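Explanation: in the best case the wiper solves the raindrop problem completely, improving performance by 2.2%. So 2.2% is a reasonable estimate of the maximum improvement.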
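The ceiling argument is plain arithmetic. The sketch below uses only the figures quoted in the quiz text (14.3% total dev error, 8.0% for foggy pictures, 2.2% for raindrops); the function name is an assumption.

```python
def best_case_after_fix(dev_error_pct, category_pct):
    """Return (lowest dev error reachable, share of all errors) for one error category."""
    best_case = round(dev_error_pct - category_pct, 1)
    share = category_pct / dev_error_pct
    return best_case, share


# Figures quoted in the quiz: total dev error and two error categories, in percent.
wiper_best, wiper_share = best_case_after_fix(14.3, 2.2)
fog_best, fog_share = best_case_after_fix(14.3, 8.0)
```

Even a perfect wiper leaves the dev error at 12.1%, since raindrops account for only about 15% of the errors, while foggy pictures account for about 56%.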

Question 10

You decide to use data augmentation to address foggy images. You find 1,000 pictures of fog off the internet, and “add” them to clean images to synthesize foggy days, like this:

Which of the following statements do you agree with?

So long as the synthesized fog looks realistic to the human eye, you can be confident that the synthesized data is accurately capturing the distribution of real foggy images, since human vision is very accurate for the problem you’re solving.

There is little risk of overfitting to the 1,000 pictures of fog so long as you are combining it with a much larger (>>1,000) set of clean/non-foggy images.

Adding synthesized images that look like real foggy pictures taken from the front-facing camera of your car to training dataset won’t help the model improve because it will introduce avoidable-bias.

Question 11

After working further on the problem, you’ve decided to correct the incorrectly labeled data on the dev set. Which of these statements do you agree with? (Check all that apply).

You should also correct the incorrectly labeled data in the test set, so that the dev and test sets continue to come from the same distribution

You should correct incorrectly labeled data in the training set as well so as to avoid your training set now being even more different from your dev set.

You should not correct the incorrectly labeled data in the test set, so that the dev and test sets continue to come from the same distribution

You should not correct incorrectly labeled data in the training set as well so as to avoid your training set now being even more different from your dev set.

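Explanation: the errors in the test set need to be corrected. As for the training set, deep learning algorithms are quite robust to random errors in it. As long as the mislabeled examples are random errors, for example the labeler slipped or pressed the wrong key, it is generally fine to leave them alone.

For this kind of error there is no need to spend a lot of time and effort on corrections: as long as the dataset is large enough, these random errors will not change the actual error much.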

Question 12

So far your algorithm only recognizes red and green traffic lights. One of your colleagues in the startup is starting to work on recognizing a yellow traffic light. (Some countries call it an orange light rather than a yellow light; we’ll use the US convention of calling it yellow.) Images containing yellow lights are quite rare, and she doesn’t have enough data to build a good model. She hopes you can help her out using transfer learning.

What do you tell your colleague?

She should try using weights pre-trained on your dataset, and fine-tuning further with the yellow-light dataset.

If she has (say) 10,000 images of yellow lights, randomly sample 10,000 images from your dataset and put your and her data together. This prevents your dataset from “swamping” the yellow lights dataset.

You cannot help her because the distribution of data you have is different from hers, and is also lacking the yellow label.

Recommend that she try multi-task learning instead of transfer learning using all the data.

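Explanation: yellow-light images are rare, so giving them appropriate extra weight would help. This is also the transfer-learning scenario: plenty of data for the source task and little for the target task, so pre-training on the existing dataset and fine-tuning on the yellow-light data is worth trying.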

Question 13

Another colleague wants to use microphones placed outside the car to better hear if there’re other vehicles around you. For example, if there is a police vehicle behind you, you would be able to hear their siren. However, they don’t have much to train this audio system. How can you help?

Transfer learning from your vision dataset could help your colleague get going faster. Multi-task learning seems significantly less promising.

Multi-task learning from your vision dataset could help your colleague get going faster. Transfer learning seems significantly less promising.

Either transfer learning or multi-task learning could help our colleague get going faster.

Neither transfer learning nor multi-task learning seems promising.

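Explanation:

When transfer learning makes sense:

- Task A and task B have the same input;

- Task A has far more data than task B (B is the task you care more about, and A has much more data than B);

- Low-level features learned from task A help task B.

When multi-task learning makes sense:

- The tasks being trained can share low-level features;

- Usually, the amount of data for each task is quite similar (for example, transfer learning goes from task A with 1,000,000 examples to task B with 1,000 examples; in multi-task learning, tasks A1, ..., An each have 1,000 examples, giving 1000n examples in total that jointly help training);

- A neural network large enough to do well on all the tasks can be trained.

In this question, one task is computer vision and the other is audio recognition. The two have little in common: neither the same input nor low-level features that can be shared. So neither transfer learning nor multi-task learning seems promising.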

Question 14

To recognize red and green lights, you have been using this approach:

(A) Input an image (x) to a neural network and have it directly learn a mapping to make a prediction as to whether there’s a red light and/or green light (y).

A teammate proposes a different, two-step approach:

(B) In this two-step approach, you would first (i) detect the traffic light in the image (if any), then (ii) determine the color of the illuminated lamp in the traffic light.

Between these two, Approach B is more of an end-to-end approach because it has distinct steps for the input end and the output end. True/False?

True

False

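Explanation:

Pros and cons of end-to-end learning:

Pros:

- End-to-end learning lets the data "speak" directly;

- Fewer hand-designed components are needed.

Cons:

- It needs a large amount of data;

- It excludes hand-designed components that might be useful.

The key question for applying end-to-end learning: is there enough data to learn a function complex enough to map x directly to y?

Approach B is clearly not end-to-end learning, so the statement is False.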

Question 15

Approach A (in the question above) tends to be more promising than approach B if you have a __ (fill in the blank).

Large training set

Multi-task learning problem.

Large bias problem.

Problem with a high Bayes error.

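Explanation: see Question 14.

Summary

1. Stay close to the real application scenario

This part is mainly about how to get better performance from a machine learning project in real use, and the basic principle is to stay as close as possible to the real application scenario. The first step is the textbook allocation during training: training, validation, and test sets. In practice there is a further issue, the distribution of those sets. The three distributions may not be exactly identical, but the basic principle is that they should match the problem the application needs to solve. At the very least, the test set must match the real scenario, so that even if training goes wrong, testing will reveal it. Beyond that, the training and validation sets should match the real scenario as closely as possible. When data is scarce, data from outside the real scenario can be added to the training set as a supplement, relying on validation and test sets that do match the real scenario to judge the final model.

The evaluation metric likewise has to measure what matters in the real application. Real applications can rarely be judged by plain classification accuracy alone. Training still works by optimizing a loss function, but the evaluation on the validation set should match the application's needs. For example, even in plain classification, the application may require different accuracy for different classes, so the class weights need to be built into the accuracy computation. Or, in image segmentation, the real scenario may require accurate segmentation for certain classes while a rough boundary is fine for the others; this has to be taken into account both in the loss and in the validation metric. Hard constraints are mentioned here as well. In my experience the main hard constraint is speed: our current hard requirement is 30 FPS on PCs, with both GPU and CPU, and embedded platforms cannot reach that yet.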

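2. Balancing bias and variance

Thinking about bias and variance means balancing the model's capacity to fit against the need to prevent overfitting. Whether the model has reached the limit of what the dataset allows is judged by its gap to Bayes error. The ideal error cannot be measured directly, so we assume that the best human performance is close to Bayes error. Once the model approaches or even exceeds human-level performance, its fitting capacity can be considered close to the limit of the dataset.

Of course, the assumption that the best human performance is close to the ideal error should be limited to simple tasks on unstructured data, because the main human advantages lie in understanding unstructured data and in very complex reasoning and induction. On structured data it is comparatively easy for machines to surpass humans. A prominent recent example is AlphaGo, whose older and newer versions both quickly surpassed the best human players, mainly because, first, the board is structured data that is easy to process and, second, the task is relatively easy to define mathematically and involves no very complex reasoning or induction.

Because unavoidable bias exists, once the bias is close to the known ideal it is time to start dealing with overfitting. This deserves particular attention in engineering work, to avoid wasting effort on things that cannot help.

On whether a closed test (testing on the training data) can provide an estimate of Bayes error, my view is that task and network complexity also need to be considered. "Understanding deep learning requires rethinking generalization" shows that a network can simply memorize a dataset. If a task is fairly easy and a very complex architecture is used, the accuracy obtained from a closed test may just be the result of memorization and may not represent the network's actual inference ability on the task.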

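3. Take error analysis and network diagnosis seriously

This is all dirty work; nobody wants to go through a dataset by hand. But examples counted as wrong are not necessarily wrong, and examples counted as correct are not necessarily correct, so when the dataset is small it may be necessary to scan the whole training set, or to sample part of it. Andrew's error-analysis table is interesting: it quantifies the weight of each specific problem, and it is worth adopting. So far I have only estimated rough counts rather than ticking off examples one by one like this.

On network diagnosis, my view is that a network is a statistical learner, so diagnosis should also be statistical: look at the distribution of gradients as they propagate, whether gradients reach the input layer, how gradients decay across layers, and whether different learning rates can compensate for different gradient magnitudes. Questions such as what specific weights mean, or whether the convolution filters match one's expectations, matter less. The network has its own internal logic, and it is normal for that to differ from human intuition. What matters most is that the trained result meets the application's requirements.

Compared with building networks or swapping models, the diagnosis and analysis above are dirty work that, human nature being what it is, nobody wants to do. So a good UI is key. In my view, such a system should include:

Basic learning-curve display.
Real-time or near-real-time view of gradient propagation during training.
Distribution of network weights.
Distribution and visualization of each layer's intermediate outputs.
Sampled display of data examples (correct, incorrect, low-confidence, and so on).
Fast model replacement and architecture fine-tuning.
Convenient fine-grained experiments, such as freezing or adjusting weights.
Dataset statistics, such as the class distribution and the per-channel distribution of image data.
Abstraction of hardware, networking, and storage.
A convenient extension interface, so that people training models can operate on the network directly with quick scripts.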

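4. Multi-task and end-to-end learning

My experience with multi-task learning is limited to the following:

Take care to balance the losses.
In particular, keep the task losses at the same order of magnitude, or let the loss of the most important task dominate.
Classification is generally the easiest task to train, so consider emphasizing the classification task first to form features, then adjusting the weights of the other task losses.

For end-to-end learning, we have only tried learning the steering-wheel angle from a vehicle's front-view images in a simulated environment. It works to some extent, but this is just a start: more data and further validation are needed, and the predicted angle is not yet smooth. The throttle model has neither speed as an input nor multi-frame data, so it cannot directly control speed.

References

For more related material and the sources of the reference articles, see my blog.

Original-content statement: this article is published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.

In case of infringement, contact cloudcommunity@tencent.com for removal.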
