Paper review: Deep hierarchies in the primate visual cortex: what can computer vision learn from them?

User 1908973
Published 2018-07-25 11:23:36
Included in the column: CreateAMind

Paper: Deep Hierarchies in the Primate Visual Cortex: What Can We Learn for Computer Vision?

It is one of the most comprehensive and thorough papers analyzing the mechanisms of vision.

I. A taxonomy of feature-learning approaches in machine learning, including deep learning

From deeplearningbook.

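1 Comparison of two different vision models

2 Visual signals and their relation to brain areas

  • The process and hierarchy of visual abstraction; the relative sizes shown correspond to the physical sizes of the actual brain areas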

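3 Retina and LGN

  • Rod cells are sensitive at low light levels; cone cells are sensitive at high light levels and to color
  • Fixation relies on the fovea, a region with a high density of photoreceptors, so eye-movement control is required (active and passive control; top-down or bottom-up)
  • Center-Surround Receptive Fields
  • LGN: Single-Opponent Cells
  • V1: Double-Opponent Cells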
  • 5-10 percent of V1 cells are dedicated color-coding cells
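The notes list center-surround receptive fields without giving a model. A common textbook approximation is a difference of Gaussians (DoG): a narrow excitatory center minus a broad inhibitory surround. The sketch below is only an illustration under that assumption; the kernel size and the two sigma values are arbitrary choices, not values from the paper.

```python
import math

def dog_kernel(size=9, sigma_center=1.0, sigma_surround=2.0):
    """Difference-of-Gaussians kernel (illustrative parameters, not from the paper)."""
    half = size // 2

    def gauss(x, y, sigma):
        return math.exp(-(x * x + y * y) / (2 * sigma * sigma)) / (2 * math.pi * sigma * sigma)

    return [[gauss(x, y, sigma_center) - gauss(x, y, sigma_surround)
             for x in range(-half, half + 1)]
            for y in range(-half, half + 1)]

def response(kernel, patch):
    """Dot product of the kernel with an image patch of the same size."""
    return sum(k * p for krow, prow in zip(kernel, patch) for k, p in zip(krow, prow))

kernel = dog_kernel()
size = len(kernel)
mid = size // 2

# Uniform illumination covers center and surround alike.
uniform = [[1.0] * size for _ in range(size)]
# A small bright spot covers only the center region.
spot = [[1.0 if abs(x - mid) <= 1 and abs(y - mid) <= 1 else 0.0
         for x in range(size)]
        for y in range(size)]

uniform_response = response(kernel, uniform)
spot_response = response(kernel, spot)
```

In this sketch the spot drives the unit much more strongly than uniform light does, because uniform light also activates the inhibitory surround. That is the local-contrast behavior the term "center-surround" refers to.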

4 V1 recognition capabilities and applications

  • Features: edges, bars, and gratings, i.e., linear oriented patterns.
  • Simple and complex cells.
  • Gabor wavelets are a reasonable approximation of simple cells
  • Gabor wavelets have also been very successful in applications: image compression, image retrieval, and face recognition.
  • Cells that are sensitive to the end of a bar or edge or the border of a grating. Applications: pose estimation, object recognition, stereo, structure from motion
  • Line endings, motion, color
  • Binocular disparity; applications: depth, gaze control, object grasping, and object recognition
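To make the "Gabor wavelets approximate simple cells" point concrete, here is a minimal sketch of a Gabor kernel and its response to gratings. The complex-cell part uses the standard energy model (sum of squared responses of two simple cells 90 degrees apart in phase); that model is an assumption added for illustration and is not described in these notes. All parameter values and function names are hypothetical.

```python
import math

def gabor_kernel(theta, phase, size=11, wavelength=4.0, sigma=2.5):
    """Gaussian envelope times a cosine carrier that varies along direction theta."""
    half = size // 2
    return [[math.exp(-(x * x + y * y) / (2 * sigma * sigma))
             * math.cos(2 * math.pi * (x * math.cos(theta) + y * math.sin(theta)) / wavelength + phase)
             for x in range(-half, half + 1)]
            for y in range(-half, half + 1)]

def grating(theta, phase, size=11, wavelength=4.0):
    """Sinusoidal grating patch with the same carrier as the kernel, without the envelope."""
    half = size // 2
    return [[math.cos(2 * math.pi * (x * math.cos(theta) + y * math.sin(theta)) / wavelength + phase)
             for x in range(-half, half + 1)]
            for y in range(-half, half + 1)]

def simple_cell(patch, theta, phase=0.0):
    """Linear response of a Gabor-shaped receptive field to the patch."""
    kernel = gabor_kernel(theta, phase)
    return sum(k * p for krow, prow in zip(kernel, patch) for k, p in zip(krow, prow))

def complex_cell(patch, theta):
    """Energy model (assumed here): squared even-phase plus squared odd-phase response."""
    even = simple_cell(patch, theta, 0.0)
    odd = simple_cell(patch, theta, math.pi / 2)
    return even * even + odd * odd

vertical = grating(theta=0.0, phase=0.0)            # bars at the preferred orientation
horizontal = grating(theta=math.pi / 2, phase=0.0)  # bars at the orthogonal orientation
shifted = grating(theta=0.0, phase=math.pi / 2)     # preferred orientation, shifted by a quarter cycle

preferred = simple_cell(vertical, theta=0.0)
orthogonal = simple_cell(horizontal, theta=0.0)
simple_shifted = simple_cell(shifted, theta=0.0)
energy_original = complex_cell(vertical, theta=0.0)
energy_shifted = complex_cell(shifted, theta=0.0)
```

The simple cell responds strongly only to the grating that matches both its orientation and its phase, while the energy response stays roughly the same when the grating is shifted. This is a small-scale version of the selectivity versus invariance trade-off discussed later in the notes.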

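5 The importance of detecting motion and temporal sequences

  • Note that spatiotemporal features such as motion have been demonstrated to be the first features developmentally present in humans for recognizing objects (even sooner than color and orientation); that is, spatiotemporal feature detection is the earliest to develop
  • Some neural networks already support motion detection (http://youtube.com/watch?v=HrVamNQHf-I), but current deep learning networks for object recognition generally lack support for motion detection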
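As a rough illustration of what a minimal motion detector can look like, the sketch below implements a one-dimensional correlation detector in the style of the classic Hassenstein-Reichardt model. This model is not mentioned in the notes or taken from the linked video; it is swapped in here only as a simple, well-known example, and the function names are hypothetical.

```python
def reichardt_response(frames, delay=1):
    """Summed output of delay-and-correlate units over neighboring pixel pairs.

    Positive values indicate rightward motion, negative values leftward motion.
    """
    total = 0.0
    for t in range(delay, len(frames)):
        prev, cur = frames[t - delay], frames[t]
        for i in range(len(cur) - 1):
            # Delayed left input times current right input, minus the mirror term.
            total += prev[i] * cur[i + 1] - prev[i + 1] * cur[i]
    return total

def moving_dot(width, steps, direction):
    """Frames of a single bright dot that moves one pixel per frame (direction is +1, -1, or 0)."""
    frames = []
    pos = 0 if direction > 0 else width - 1
    for _ in range(steps):
        frame = [0.0] * width
        frame[pos] = 1.0
        frames.append(frame)
        pos += direction
    return frames

rightward = reichardt_response(moving_dot(width=8, steps=6, direction=1))
leftward = reichardt_response(moving_dot(width=8, steps=6, direction=-1))
static = reichardt_response(moving_dot(width=8, steps=6, direction=0))
```

The sign of the output carries the direction of motion and a static dot produces no response, which a purely frame-by-frame object recognizer cannot express.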

6 Features recognized by V2

  • V2: orientation, color, and disparity (relative disparity in V2; absolute disparity in V1)
  • New features of V2: more sophisticated contour representation, including texture-defined contours, illusory contours, and contours with border ownership.
  • In 2000, Zhou et al. [205] found that 18 percent of the cells in V1 and more than 50 percent of the cells in V2 and V4 (along the ventral pathway) respond or code according to the direction of the owner of the boundary.

7 V4

  • V4 seems to combine input from the M as well as the P pathway
  • integrating lower level into higher level responses and increasing invariances
  • V4 cells respond to contours defined by differences in speed and/or direction of motion with an orientation selectivity that matches the selectivity to luminance-defined contours
  • hue is invariant to luminance
  • Curvature Selectivity

8 MT: motion detection

  • sensitive to disparity
  • mid-level representation of motion depth
  • eye movement control
  • MT Receptive fields are about 10 times larger than in V1
  • MT is retinotopically organized with motion and depth columns similar to orientation and ocular dominance columns in V1
  • MT cells are selective to higher order features of motion such as motion gradients, motion-defined edges, locally opposite motions, and motion-defined shapes

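9 Object recognition

  • TEO is responsible for medium complexity features and it integrates information about the shapes and relative positions of multiple contour elements
  • Recognized categories: color, shape, ...
  • The complexity of the recognized objects, the level of abstraction, and the level of integration are all higher than for lower-level information.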

10 Recognition, part 2: TE

  • A system capable of object recognition has to fulfill two seemingly conflicting requirements, i.e., selectivity and invariance.
  • On the one hand, neurons have to distinguish between different objects to provide information about object identity
  • On the other hand, this system also has to treat highly dissimilar retinal images of the same object as equivalent, and must therefore be insensitive to transformations in the retinal image that occur in natural vision (e.g., changes in position, illumination, retinal size, and so on).
  • TE exhibits size invariance, cue invariance, position invariance, and occlusion invariance: an object is recognized regardless of its apparent size at different distances; seeing only a part is enough to recognize the whole (as the saying goes, one spot seen through a tube reveals the leopard); an object is recognized across different poses and viewing angles; and an occluded object can still be recognized
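One common way computational models reconcile selectivity with invariance is to match a selective template at every position and then take the maximum over positions (max pooling, as used in HMAX-style models and convolutional networks). The sketch below shows this for position invariance in one dimension. It is an illustrative assumption about how such a mechanism could work, not a claim about how TE implements it, and the helper names are hypothetical.

```python
def template_match(signal, template, start):
    """Selective stage: dot product of the template with the window beginning at start."""
    window = signal[start:start + len(template)]
    return sum(s * t for s, t in zip(window, template))

def invariant_response(signal, template):
    """Invariant stage: maximum of the selective responses over all positions."""
    positions = range(len(signal) - len(template) + 1)
    return max(template_match(signal, template, p) for p in positions)

def place(pattern, position, length=10):
    """Embed a pattern at the given position in an otherwise empty signal."""
    signal = [0.0] * length
    signal[position:position + len(pattern)] = pattern
    return signal

template = [1.0, -1.0, 1.0]

same_object_left = invariant_response(place([1.0, -1.0, 1.0], 1), template)
same_object_right = invariant_response(place([1.0, -1.0, 1.0], 5), template)
other_object = invariant_response(place([1.0, 1.0, 1.0], 3), template)
```

The pooled response is identical for the same pattern at two positions (invariance) and lower for a different pattern (selectivity). Invariance to size, cue, and occlusion would need pooling over other transformations, which this sketch does not attempt.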

11 Motion and action

  • areas located in the dorsal stream are functionally related to different effectors:
  • LIP is involved in eye movements,
  • Medial Intraparietal Area (MIP) in arm movements,
  • AIP in hand movements (grasping), and
  • MST and VIP in body movements (self-motion)
  • Area MST is concerned with self-motion, both for movement of the head (or body) in space and movement of the eye in the head

12 Motion perception

  • The combination of retino-centric receptive fields and eye position modulation provides a population code in LIP that can represent the location of a stimulus in head-centric coordinates
  • That is, localization shifts from eye-centered (combined with sensing of eye position) to head-centered
  • VIP is likely to be involved in self-motion, control of head movements, and the encoding of near-extrapersonal (head-centered) space, which links tactile and visual fields.
  • The activity of MIP neurons mainly reflects the movement plan toward the target and not merely the location of the target or visual attention evoked by the target appearance
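The retino-centric to head-centric conversion can be pictured, in a deliberately simplified form, as adding the eye position to the retinal position of the stimulus. The sketch below assumes small angles so that positions in degrees add as plain vectors, and it ignores the population (gain-field) coding that LIP is described as using. The function names are hypothetical.

```python
def retinal_position(stimulus_head, eye_position):
    """Where a stimulus lands on the retina, given gaze direction (degrees, small-angle assumption)."""
    return (stimulus_head[0] - eye_position[0], stimulus_head[1] - eye_position[1])

def head_centric(retinal, eye_position):
    """Recover the head-centered location by adding the eye position back."""
    return (retinal[0] + eye_position[0], retinal[1] + eye_position[1])

stimulus = (10, -5)                      # fixed location relative to the head
gazes = [(0, 0), (5, 5), (-10, 2)]       # three different eye positions

retinal_images = [retinal_position(stimulus, gaze) for gaze in gazes]
recovered = [head_centric(r, gaze) for r, gaze in zip(retinal_images, gazes)]
```

The retinal position changes with every eye movement, but the recovered head-centered position stays fixed, which is the property a head-centric code provides.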

13 Grasping

  • Some AIP neurons respond during object fixation and grasping, but not during grasping in the dark (visual-dominant neurons);
  • other AIP neurons do not respond during object fixation but only when the object is grasped, even in the dark (motor-dominant neurons),
  • a third class of AIP neurons responds during object fixation and grasping and during grasping in the dark (visuo-motor neurons)
  • AIP are sensitive to the 2D and 3D features of the object and shape of the hand (in a light or dark environment) relevant for grasping

14 Chart summarizing the functions above

III. Brain areas organized by function

color

  • single-opponent cells in LGN establish the two color axes red-green and blue-yellow, thereby sharpening the wavelength tuning and achieving some invariance to luminance.
  • Double-opponent cells provide the means to take nearby colors into account for color contrast.
  • In V4, hue is encoded, which spans the full color space.
  • The final step is IT, where there exists an association of color with form
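The color pathway above can be sketched numerically: two opponent axes (red-green and blue-yellow) are formed from the inputs, and hue is the angle in the plane they span. The sketch below uses RGB values as a stand-in for cone signals and a simple choice of opponent weights; both are assumptions for illustration, not values from the paper.

```python
import math

def opponent_channels(r, g, b):
    """Red-green and blue-yellow opponent axes plus luminance (illustrative weights)."""
    red_green = r - g
    blue_yellow = b - (r + g) / 2.0
    luminance = (r + g + b) / 3.0
    return red_green, blue_yellow, luminance

def hue_angle(r, g, b):
    """Hue as the angle in the opponent plane; uniform scaling of the input leaves it unchanged."""
    red_green, blue_yellow, _ = opponent_channels(r, g, b)
    return math.atan2(blue_yellow, red_green)

bright = (0.8, 0.2, 0.1)
dim = tuple(0.5 * c for c in bright)   # same color at half the intensity
gray = (0.5, 0.5, 0.5)

hue_bright = hue_angle(*bright)
hue_dim = hue_angle(*dim)
```

Halving the intensity scales both opponent signals by the same factor, so the hue angle is unchanged while the luminance channel drops. A gray input gives zero on both opponent axes.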

shape

  • Dynamics, complexity, and the levels of abstraction of the various invariant features
  • Edges are the primary features used to represent objects, it seems.
  • In V1, they are defined as boundaries between dark and light or between different color hues;
  • in V2, contours may also be defined by texture boundaries and these cells respond to illusory contours;
  • in V4, contours may even be defined by differences in motion
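  • Possible solutions to the binding problem (which lines belong to the same object?) are tuning of cells to conjunctions of features, spatial attention, and temporal synchronization
  • 3D information: the levels of depth processing are absolute and relative disparity, and zero-, first-, and second-order disparity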
  • selectivity for zero-, first-, and second-order disparities can be measured
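One way to read the zero-, first-, and second-order terms is as the disparity value itself, its spatial gradient (a slanted surface), and the change of that gradient (a curved surface). The sketch below computes these with finite differences along a single row of disparity samples. The sample values and function names are made up for illustration.

```python
def first_difference(values):
    """Finite difference between neighboring samples."""
    return [b - a for a, b in zip(values, values[1:])]

def disparity_orders(disparities):
    """Return (zero-order, first-order, second-order) descriptions of a disparity profile."""
    gradient = first_difference(disparities)
    curvature = first_difference(gradient)
    return disparities, gradient, curvature

frontoparallel = [2.0, 2.0, 2.0, 2.0, 2.0]       # constant disparity: flat surface at one depth
slanted = [0.0, 1.0, 2.0, 3.0, 4.0]              # linear disparity: surface tilted in depth
curved = [float(x * x) for x in range(-2, 3)]    # quadratic disparity: curved surface

_, flat_gradient, flat_curvature = disparity_orders(frontoparallel)
_, slant_gradient, slant_curvature = disparity_orders(slanted)
_, curve_gradient, curve_curvature = disparity_orders(curved)
```

A flat surface has only a zero-order term, a slanted surface adds a constant first-order term, and only the curved surface has a nonzero second-order term.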

3D

  • The coding of 3D shape in AIP is
  • faster (shorter latencies),
  • coarser (less sensitivity to discontinuities in the surfaces),
  • less categorical, and
  • more boundary based (less influence of the surface information)
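  • compared to IT (the 3D coding properties of IT are the opposite of those above)
  • AIP: fast judgments, for escaping danger
  • IT: accurate recognition, judgment, and deliberation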

motion

  • analysis of motion in the primate visual system proceeds in a hierarchy from V1 (local spatiotemporal filtering) to MT (2D motion) to MST (self-motion, motion in world coordinates)
  • The representation shifts from one of motion in the visual field (V1, MT) to one of motion in the world and motion of oneself in the world (MST)
  • motion processing is combined with disparity (MT, MST), eye movement information (MST), and vestibular signals

Object recognition

  • Object recognition goes beyond simple 2D-shape perception in several aspects:
  • integration of different cues and modalities,
  • invariance to in-depth rotation and articulated movement, use of context (objects are recognized across different rotations and viewing angles, and while in motion).
  • It is also important to distinguish between-class discrimination (object categorization) and within-class discrimination of objects.
  • edges can be defined by luminance in V1, by textures in V2, and by differences in motion in V4
  • Viewpoint invariance:
  • a small fraction of IT neurons can exhibit some rotation invariance and speed of recognition of familiar objects does not depend on the rotation angle [79].
  • A particular case is face-sensitive neurons, which can show a rather large invariance to rotations in depth.
  • Representations of the same object under different angles are presumably combined into a rotation-invariant representation, much as simple cell responses might be combined into a complex cell response

The role of scene context in recognition

  • Context plays a major role in object recognition [124]
  • and can be of different nature—semantic, spatial configuration, or pose
  • and is, at least partially, provided by higher areas beyond IT
  • objects also help to recognize the context and context may be defined on a crude statistical level [124].

Support for action

  • visual information is combined with vestibular (in MST, VIP), auditory (in LIP), somatosensory (in VIP), and proprioceptive or motor feedback signals (MST and VIP for smooth eye movements, LIP for saccades, MST/VIP/7A/MIP for eye position)
  • LIP represents salience in the visual scene as a target signal for eye movements, MIP and AIP provide information for reaching (target signals) and grasping (shape signals). LIP and VIP provide information for the control of self-motion.

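IV. Animated demonstration of related functions

If the animation does not play, visit

  • http://homepages.inf.ed.ac.uk/jbednar/spinning.html

to view it.

V. Developmental timeline of visual functions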

Originally published 2016-10-23 on the CreateAMind WeChat official account.
