Basic Concepts of Machine Learning - 1

GavinZhou
Published 2018-01-02 16:02:03
Part of the column: 机器学习实践二三事

Learning algorithm

The algorithms in ML are all learning algorithms, so what exactly is a learning algorithm? The explanation given by Bengio, a leading figure in machine learning, is:

A machine learning algorithm is an algorithm that is able to learn from data.

As for what "learn" means here, Mitchell (1997) gives this definition:

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

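From this we can see:

A learning algorithm must be able to learn, from the given data, features that represent that data effectively.

So the basic components of an ML system are: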

  1. A learning algorithm
  2. Tasks
  3. Performance measure
  4. Experience
  5. Data
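Mitchell's definition can be made concrete with a toy experiment. The sketch below is only an illustration under made-up assumptions: the task T is separating two 1-D Gaussian classes, the experience E is a labelled training set of growing size, and the performance measure P is accuracy on a fixed test set. The function names and the data distribution are hypothetical and do not come from the original text.

```python
import random

def make_examples(n, rng):
    """Draw n labelled 1-D examples: class 0 ~ N(0, 1), class 1 ~ N(4, 1)."""
    examples = []
    for _ in range(n):
        label = rng.randint(0, 1)
        examples.append((rng.gauss(4.0 * label, 1.0), label))
    return examples

def learn_threshold(train):
    """Experience E: place the decision threshold midway between the class means."""
    means = []
    for label in (0, 1):
        values = [x for x, y in train if y == label]
        # Fall back to 0.0 if a class never appears in a tiny training set.
        means.append(sum(values) / len(values) if values else 0.0)
    return (means[0] + means[1]) / 2.0

def performance(threshold, test):
    """Performance measure P: fraction of test examples classified correctly."""
    correct = sum(1 for x, y in test if int(x > threshold) == y)
    return correct / len(test)

rng = random.Random(0)
test_set = make_examples(2000, rng)
for n in (2, 20, 500):
    threshold = learn_threshold(make_examples(n, rng))
    print(n, round(performance(threshold, test_set), 3))
```

With more training examples the estimated threshold tends to settle near the midpoint between the two classes, so P typically rises with E, although any single small sample can get lucky or unlucky.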

Task

The basic need that gave rise to ML is this: some tasks are too difficult to be solved with a fixed program.

Machine learning allows us to tackle tasks that are too difficult to solve with fixed programs written and designed by human beings.

So what is a task in ML? First we need to understand what a feature is, a term that comes up constantly in ML. Put simply:

A feature is a mathematical representation, extracted from some object or event, that can be expressed and measured quantitatively.
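As a rough illustration of extracting quantitative features from an object, the sketch below turns a raw string into a fixed-length vector of numbers. The three features chosen here (length, digit ratio, uppercase ratio) and the name `extract_features` are hypothetical, picked only to make the idea tangible.

```python
def extract_features(text):
    """Turn a raw string (the 'object') into a fixed-length numeric feature vector."""
    length = len(text)
    digits = sum(ch.isdigit() for ch in text)
    uppercase = sum(ch.isupper() for ch in text)
    # Ratios are guarded against empty input to avoid division by zero.
    return [
        float(length),
        digits / length if length else 0.0,
        uppercase / length if length else 0.0,
    ]

example = extract_features("WIN 1000 NOW")
print(example)
```

Every string, whatever its content, maps to the same three measurements, which is what lets a learning algorithm compare examples with one another.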

Features are usually expressed in matrix form.

As for the task, Bengio's explanation is:

Machine learning tasks are usually described in terms of how the machine learning system should process an example. An example is a collection of features that have been quantitatively measured from some object or event that we want the machine learning system to process.

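Bengio's description is a bit abstract. In practice, the task is simply the problem we need to solve, such as classifying images or clustering a given set of data. Common tasks include: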

  • Classification
  • Regression
  • Transcription
  • Machine translation
  • Semantic Segmentation
  • Object Detection
  • Denoising
  • ………….

There are many more, so I won't list them all.
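To make the difference between the first two tasks in the list concrete, here is a minimal sketch. Both functions process the same kind of example (a feature vector), but classification returns a discrete category while regression returns a real number. The linear form and the weights are assumptions chosen for illustration, not something the original text prescribes.

```python
def classify(features, weights, bias):
    """Classification: map a feature vector to one of k categories (here k = 2)."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0

def regress(features, weights, bias):
    """Regression: map the same kind of feature vector to a real number."""
    return sum(w * x for w, x in zip(weights, features)) + bias

x = [2.0, -1.0]
print(classify(x, [0.5, 1.0], -0.5), regress(x, [0.5, 1.0], -0.5))
```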

Performance Measure

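Different learning algorithms have different abilities, so we need a quantitative measure to evaluate them.

For classification, for example, the usual criterion for judging an algorithm is its accuracy or its error rate. In ML we care more about the model's generalization, that is, how well it performs on examples it has never seen.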

we care more about the performance of the model on new, previously unseen examples
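A minimal sketch of these two measures, together with a held-out split so that they are computed on unseen examples. The function names and the 25% test fraction are assumptions made for illustration.

```python
def accuracy(predictions, labels):
    """Fraction of examples for which the prediction equals the label."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equally long")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

def error_rate(predictions, labels):
    """Fraction of examples that are misclassified."""
    return 1.0 - accuracy(predictions, labels)

def split(examples, test_fraction=0.25):
    """Hold out the last part of the data so evaluation uses unseen examples."""
    cut = int(len(examples) * (1.0 - test_fraction))
    return examples[:cut], examples[cut:]

train, test = split(list(range(8)))
print(train, test)
print(accuracy([1, 0, 1, 1], [1, 1, 1, 0]), error_rate([1, 0, 1, 1], [1, 1, 1, 0]))
```

The model would be fitted on `train` only, and the accuracy reported on `test` is then an estimate of how it generalizes.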

In a concrete ML task, however, two kinds of difficulty sometimes arise:

  1. It is difficult to choose a performance measure that corresponds well to the desired behavior of the system.
  2. We know what quantity we would ideally like to measure, but measuring it is impractical.

In such cases, the usual approach is to:

  • design an alternative criterion
  • design a good approximation to the ideal criterion
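One common instance of an alternative criterion, offered here as an illustration rather than as the author's own example: the 0-1 classification error is what we would ideally measure, but it is piecewise constant and gives no signal about how confident a prediction is, so log loss is often optimized in its place. The function names below are hypothetical.

```python
import math

def zero_one_loss(label, prob_positive):
    """Ideal criterion: 1 if the thresholded prediction is wrong, else 0."""
    predicted = 1 if prob_positive >= 0.5 else 0
    return 0 if predicted == label else 1

def log_loss(label, prob_positive, eps=1e-12):
    """Alternative criterion: negative log-likelihood of the true label."""
    p = min(max(prob_positive, eps), 1.0 - eps)  # clip to keep log() finite
    return -math.log(p if label == 1 else 1.0 - p)

for p in (0.51, 0.9, 0.99):
    print(p, zero_one_loss(1, p), round(log_loss(1, p), 3))
```

All three predictions have the same 0-1 loss for a positive example, while the log loss keeps decreasing as the predicted probability moves toward the true label, which is what makes it usable as a training signal.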

Experience

Broadly speaking, learning algorithms in ML fall into two categories:

  • supervised
  • unsupervised

The boundary between the two is blurry. Most learning algorithms gain their experience from some dataset. So what is a dataset?

A dataset is a collection of many examples.

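A dataset is a collection of examples, such as the MNIST dataset of handwritten digits (0-9) or the multi-purpose VOC dataset. In computation, a dataset is usually represented as one large matrix.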
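A small sketch of that matrix view, assuming the common layout of one example per row and one feature per column. The numbers are made up for illustration.

```python
# Each row is one example, each column is one feature.
# The values are invented purely for illustration.
dataset = [
    [5.1, 3.5, 1.4],
    [4.9, 3.0, 1.4],
    [6.2, 3.4, 5.4],
    [5.9, 3.0, 5.1],
]

num_examples = len(dataset)
num_features = len(dataset[0])

def column(matrix, j):
    """Return feature j across all examples."""
    return [row[j] for row in matrix]

feature_means = [sum(column(dataset, j)) / num_examples for j in range(num_features)]
print((num_examples, num_features), feature_means)
```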

Unsupervised and supervised algorithms experience different kinds of datasets:

  • Unsupervised learning algorithms experience a dataset containing many features, then learn useful properties of the structure of this dataset.
  • Supervised learning algorithms experience a dataset containing features, but each example is also associated with a label or target.
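The difference shows up directly in what each learner is given. In the sketch below, the unsupervised function sees only the features and summarizes their structure (here just a per-feature mean, a deliberately trivial property), while the supervised function also receives a label for each example and learns one centroid per label. The data and the nearest-centroid rule are assumptions picked for brevity.

```python
features = [[1.0, 1.2], [0.8, 1.0], [4.0, 4.2], [4.2, 3.9]]
labels = [0, 0, 1, 1]  # only the supervised learner sees these

def mean_vector(rows):
    """Per-feature mean of a list of equal-length rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def fit_unsupervised(x):
    """Learn a property of the dataset's structure from features alone."""
    return mean_vector(x)

def fit_supervised(x, y):
    """Learn one centroid per label, using the label attached to each example."""
    centroids = {}
    for c in sorted(set(y)):
        centroids[c] = mean_vector([row for row, lab in zip(x, y) if lab == c])
    return centroids

def predict(centroids, row):
    """Assign the label whose centroid is closest to the given example."""
    def squared_distance(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda c: squared_distance(centroids[c]))

print(fit_unsupervised(features))
centroids = fit_supervised(features, labels)
print(predict(centroids, [1.0, 0.9]), predict(centroids, [3.8, 4.0]))
```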

That's all for now. To be continued in the next post.

Originally published: 2016-08-27
