
Mid-Level Visual Representations Improve Generalization and Sample Efficiency for Learning Active Tasks

https://perceptual.actor/

Abstract

One of the ultimate promises of computer vision is to help robotic agents perform active tasks, like delivering packages or doing household chores. However, the conventional approach to solving "vision" is to define a set of offline recognition problems (e.g. object detection) and solve those first. This approach faces a challenge from the recent rise of Deep Reinforcement Learning frameworks that learn active tasks from scratch using images as input. This poses a set of fundamental questions: what is the role of computer vision if everything can be learned from scratch? Could intermediate vision tasks actually be useful for performing arbitrary downstream active tasks?

We show that proper use of mid-level perception confers significant advantages over training from scratch. We implement a perception module as a set of mid-level visual representations and demonstrate that learning active tasks with mid-level features is significantly more sample-efficient than learning from scratch and able to generalize in situations where the from-scratch approach fails. However, we show that realizing these gains requires careful selection of the particular mid-level features for each downstream task. Finally, we put forth a simple and efficient perception module based on the results of our study, which can be adopted as a rather generic perception module for active frameworks.
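To make the perception-module setup concrete, here is a minimal sketch, assuming PyTorch, of the architecture the abstract describes: a frozen mid-level visual encoder whose features feed a small trainable policy, so the RL algorithm never learns from raw pixels. The `MidLevelEncoder` below is a hypothetical placeholder, not the authors' released code; the pretrained mid-level encoders (e.g. for surface normals or depth) from https://perceptual.actor/ would take its place.

```python
# Sketch of a frozen mid-level perception module feeding a trainable policy.
# The encoder is a stand-in for a pretrained mid-level vision network.
import torch
import torch.nn as nn


class MidLevelEncoder(nn.Module):
    """Placeholder for a pretrained mid-level vision encoder (kept frozen)."""

    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )

    def forward(self, rgb: torch.Tensor) -> torch.Tensor:
        return self.net(rgb)


class PolicyHead(nn.Module):
    """Small policy operating on mid-level features rather than raw pixels."""

    def __init__(self, feat_dim: int, num_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(),
            nn.Linear(64, num_actions),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.net(feats)


encoder = MidLevelEncoder(feat_dim=128)
encoder.requires_grad_(False)   # mid-level perception stays fixed
encoder.eval()

policy = PolicyHead(feat_dim=128, num_actions=4)  # only this part is trained by RL

obs = torch.rand(1, 3, 84, 84)  # one RGB observation from the agent
with torch.no_grad():
    feats = encoder(obs)
action_logits = policy(feats)   # handed to the RL algorithm of choice
```

Because only the small policy head is optimized, the downstream learner sees a compact, task-relevant input, which is the mechanism the paper credits for the sample-efficiency and generalization gains.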

We test three core hypotheses:

I. whether mid-level vision provides an advantage in terms of the sample efficiency of learning an active task (answer: yes)

II. whether mid-level vision provides an advantage towards generalization to unseen spaces (answer: yes)

III. whether a fixed mid-level vision feature could suffice, or a set of features is essential, to support arbitrary active tasks (answer: a set is essential).

Hypothesis I: Does mid-level vision provide an advantage in terms of sample efficiency when learning an active task?

Hypothesis II: Can mid-level vision features generalize better to unseen spaces?

Hypothesis III: Can a single feature support arbitrary downstream tasks, or is a set of features required?
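Since the study's answer to Hypothesis III is that a set of features is essential, the hedged sketch below (again with hypothetical placeholder encoders and illustrative task names, not the paper's actual selection procedure) shows one straightforward way to combine several frozen mid-level representations: run the observation through each encoder and concatenate the features before the policy.

```python
# Sketch of combining a *set* of frozen mid-level features for one policy.
# Task names and encoder internals are illustrative placeholders only.
import torch
import torch.nn as nn

feat_dim = 128
tasks = ["surface_normals", "depth", "semantic_segmentation"]  # illustrative choices

# Stand-ins for pretrained, frozen mid-level encoders (one per task).
encoders = nn.ModuleDict({
    t: nn.Sequential(nn.Conv2d(3, 16, 4, 2), nn.ReLU(),
                     nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                     nn.Linear(16, feat_dim))
    for t in tasks
})
encoders.requires_grad_(False)
encoders.eval()

# Only this head is trained on the downstream active task.
policy = nn.Sequential(nn.Linear(feat_dim * len(tasks), 64), nn.ReLU(),
                       nn.Linear(64, 4))

obs = torch.rand(1, 3, 84, 84)
with torch.no_grad():
    feats = torch.cat([encoders[t](obs) for t in tasks], dim=1)
action_logits = policy(feats)
```

Which mid-level tasks go into the set is exactly the selection problem the paper studies; the names above are not a recommendation, only a demonstration of the wiring.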
