
A General Understanding of Machine Learning Algorithms

How reinforcement learning differs from supervised and unsupervised learning:

In supervised learning, you typically have a dataset that contains not only the observations but also the answers some expert has given you. Your task is to get as close as possible to that expert's opinion. Of course, this means you need the expertise to gather the data in the first place; without it, the model is more or less useless. Second, the training examples are assumed to be independent of one another, which is what makes stochastic-gradient-style training convenient.
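As a minimal sketch of this setup (the linear model and the synthetic data below are assumptions for illustration, not from the original article), here is supervised training with stochastic gradient descent, where each step samples one example independently:

```python
import numpy as np

# Minimal sketch: supervised learning with stochastic gradient descent.
# The inputs X and the "expert" labels y are synthetic and assumed i.i.d.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                      # observations
true_w = np.array([2.0, -1.0, 0.5])                 # hidden "expert" rule
y = X @ true_w + rng.normal(scale=0.1, size=1000)   # expert-provided answers

w = np.zeros(3)
lr = 0.01
for step in range(5000):
    i = rng.integers(len(X))         # i.i.d. sampling is what justifies SGD
    err = X[i] @ w - y[i]            # prediction error on one example
    w -= lr * err * X[i]             # gradient step on the squared loss

print(w)  # should approach true_w
```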

Reinforcement learning (RL) needs none of these assumptions. There is no dataset; all you need is a system you can sample from (the problem environment). No expert tells you what to do: you try different actions yourself and learn from the feedback the system gives you. In an advertising scenario, for example, the feedback can be the number of clicks you have accumulated. This implies that an RL algorithm has to explore every possible action; if it never tries the optimal action, it risks never learning it. Another difficulty is that in RL the decision-maker affects its own observations, so you have to explore the state space very carefully. Otherwise, you may miss the answer to the problem altogether.
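The advertising example can be framed as a multi-armed bandit. Below is a minimal epsilon-greedy sketch (the three-ad setup and the click-through rates are made-up assumptions for illustration); setting EPSILON to 0 reproduces the failure mode described above, where the best ad may never be tried:

```python
import random

# Minimal sketch: ad selection as a multi-armed bandit with epsilon-greedy
# exploration. CLICK_PROB is hidden from the learner; it can only be sampled.
CLICK_PROB = [0.02, 0.05, 0.03]   # hidden click-through rate of each ad
EPSILON = 0.1                     # fraction of steps spent exploring

counts = [0, 0, 0]                # times each ad was shown
values = [0.0, 0.0, 0.0]          # running average reward (clicks) per ad

for t in range(100_000):
    if random.random() < EPSILON:                # explore: show a random ad
        a = random.randrange(3)
    else:                                        # exploit: show the best ad so far
        a = max(range(3), key=lambda i: values[i])
    reward = 1 if random.random() < CLICK_PROB[a] else 0  # did the user click?
    counts[a] += 1
    values[a] += (reward - values[a]) / counts[a]         # incremental mean

print(values)  # estimates should approach CLICK_PROB
```

Note how the learner's own choices determine which feedback it sees, which is exactly why exploration cannot be skipped.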

Unsupervised learning is quite different from RL. It has no expert guidance either, but it tries to do something else: instead of learning an optimal policy, it tries to describe the data, looking for some underlying structure in it. That is very different from RL's search for a policy. By analogy, RL teaches you to ride a bicycle, while unsupervised learning helps you understand how the bicycle is put together. Riding a bicycle may be easy, but understanding its structure is not, especially when the object is not a bicycle but a computer. Finally, and importantly, despite these differences there is no hard boundary between supervised learning, RL, and unsupervised learning. When you try to solve a real problem, you will usually end up combining all three to some degree.
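As a small illustration of "describing the data" (the two synthetic Gaussian blobs and the choice of k-means are assumptions for this sketch, not part of the original article), here is clustering that finds structure with no expert labels at all:

```python
import numpy as np

# Minimal sketch: unsupervised learning via k-means clustering.
# There are no labels; the algorithm only describes structure in the data.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0, 1, (100, 2)),    # blob around (0, 0)
                  rng.normal(5, 1, (100, 2))])   # blob around (5, 5)

k = 2
centers = data[rng.choice(len(data), k, replace=False)]  # random init
for _ in range(20):
    # assign each point to its nearest center
    labels = np.argmin(((data[:, None] - centers) ** 2).sum(-1), axis=1)
    # move each center to the mean of its assigned points
    centers = np.array([data[labels == i].mean(axis=0) for i in range(k)])

print(centers)  # should land near (0, 0) and (5, 5)
```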

RL is, more or less, the closest of these to general artificial intelligence, and it can be viewed as a superset of supervised and unsupervised learning.

