Paper walkthrough: Deep Hierarchies in the Primate Visual Cortex: What Can We Learn for Computer Vision?

Paper: Deep Hierarchies in the Primate Visual Cortex: What Can We Learn for Computer Vision?

It is one of the most comprehensive and thorough analyses of the mechanisms of vision.

I. A taxonomy of feature-learning approaches in machine learning, including deep learning

Source: deeplearningbook.

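1 Comparison of two different models of vision

2 Visual signals and their relationship to brain areas

  • The visual abstraction process and its levels; box sizes are proportional to the actual physical sizes of the brain areas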

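3 Retina and LGN

  • Rods are sensitive at low light levels; cones are sensitive at high light levels and to color
  • Fixation relies on the fovea, the region with a high density of photoreceptors, so eye-movement control is required (active or passive control; top-down or bottom-up)
  • Center-Surround Receptive Fields
  • LGN: Single-Opponent Cells
  • V1: Double-Opponent Cells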
  • 5-10 percent of V1 cells are dedicated color-coding cells
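The center-surround receptive fields listed above are commonly modeled as a difference of Gaussians (DoG). The sketch below uses that textbook model, which is an assumption here and not something taken from the paper; the function name `dog_kernel`, the kernel size, and the sigma values are all hypothetical. It illustrates that an ON-center unit responds to a small bright spot on its center but hardly at all to uniform illumination.

```python
import numpy as np

def dog_kernel(size=21, sigma_center=1.5, sigma_surround=3.0):
    """ON-center difference-of-Gaussians kernel (illustrative parameters)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    r2 = x**2 + y**2
    center = np.exp(-r2 / (2 * sigma_center**2))
    surround = np.exp(-r2 / (2 * sigma_surround**2))
    # Each Gaussian is normalized to unit sum, so the kernel sums to ~0.
    return center / center.sum() - surround / surround.sum()

kernel = dog_kernel()

# Stimulus 1: uniform illumination covering the whole receptive field.
uniform = np.ones_like(kernel)

# Stimulus 2: a small bright spot placed on the receptive-field center.
spot = np.zeros_like(kernel)
spot[8:13, 8:13] = 1.0

response_uniform = float((kernel * uniform).sum())
response_spot = float((kernel * spot).sum())
print("uniform field:", response_uniform)
print("centered spot:", response_spot)
```

Swapping the sign of the kernel gives an OFF-center unit under the same assumptions.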

4 V1: detected features and applications

  • feature : edges, bars, and gratings, i.e., linear oriented patterns.
  • simple and complex cells.
  • Gabor wavelets are a reasonable approximation of simple cells
  • Gabor wavelets have also been very successful in applications: image compression, image retrieval, and face recognition.
  • cells that are sensitive to the end of a bar or edge or the border of a grating. Applications: pose estimation, object recognition, stereo, structure from motion
  • line endings, motion, color
  • binocular disparity. Applications: depth, gaze control, object grasping, and object recognition
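The list above notes that Gabor wavelets approximate simple cells. A minimal sketch of that idea follows, together with the standard "energy model" of a complex cell (squared outputs of two Gabor filters 90 degrees apart in phase). The energy model, the helper names `gabor`, `grating`, `simple_cell`, and `complex_cell`, and all parameter values are assumptions made for illustration, not details from the paper.

```python
import numpy as np

def _grid(size):
    half = size // 2
    return np.mgrid[-half:half + 1, -half:half + 1]  # returns (y, x)

def gabor(size=31, wavelength=8.0, theta=0.0, phase=0.0, sigma=5.0):
    """Gabor patch: Gaussian envelope times an oriented cosine carrier."""
    y, x = _grid(size)
    xr = x * np.cos(theta) + y * np.sin(theta)  # axis of luminance modulation
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * xr / wavelength + phase)

def grating(size=31, wavelength=8.0, theta=0.0, phase=0.0):
    """Full-field sinusoidal grating used as a test stimulus."""
    y, x = _grid(size)
    xr = x * np.cos(theta) + y * np.sin(theta)
    return np.cos(2 * np.pi * xr / wavelength + phase)

def simple_cell(image, theta, phase=0.0):
    """Linear Gabor filter followed by half-wave rectification."""
    return max(0.0, float((gabor(theta=theta, phase=phase) * image).sum()))

def complex_cell(image, theta):
    """Energy model: summed squares of a quadrature pair of Gabor filters."""
    even = float((gabor(theta=theta, phase=0.0) * image).sum())
    odd = float((gabor(theta=theta, phase=np.pi / 2) * image).sum())
    return even**2 + odd**2

preferred = grating(theta=0.0)                 # matches the filter orientation
orthogonal = grating(theta=np.pi / 2)          # rotated by 90 degrees
shifted = grating(theta=0.0, phase=np.pi / 2)  # same orientation, shifted phase

print("simple, preferred :", simple_cell(preferred, 0.0))
print("simple, orthogonal:", simple_cell(orthogonal, 0.0))
print("simple, shifted   :", simple_cell(shifted, 0.0))
print("complex, preferred:", complex_cell(preferred, 0.0))
print("complex, shifted  :", complex_cell(shifted, 0.0))
```

In this sketch the simple cell is selective for both orientation and phase, while the complex cell keeps the orientation selectivity but is nearly unaffected by the phase shift.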

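5 The importance of detecting motion and temporal sequences

  • Note that spatiotemporal features such as motion have been demonstrated to be the first features developmentally present in humans for recognizing objects (even sooner than color and orientation); that is, spatiotemporal feature detection develops earliest.
  • Some neural networks already support motion detection (http://youtube.com/watch?v=HrVamNQHf-I), but current deep learning networks for object recognition generally lack support for motion detection.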

6 Features detected in V2

  • V2: orientation, color, and disparity (relative disparity in V2, absolute disparity in V1)
  • new feature of V2: more sophisticated contour representation, including texture-defined contours, illusory contours, and contours with border ownership.
  • In 2000, Zhou et al. [205] found that 18 percent of the cells in V1 and more than 50 percent of the cells in V2 and V4 (along the ventral pathway) respond or code according to the direction of the owner of the boundary.

7 V4

  • V4 seems to combine input from the M as well as the P pathway
  • integrating lower level into higher level responses and increasing invariances
  • V4 cells respond to contours defined by differences in speed and/or direction of motion with an orientation selectivity that matches the selectivity to luminance-defined contours
  • hue is invariant to luminance
  • Curvature Selectivity

8 MT: motion detection

  • sensitive to disparity
  • mid-level representation of motion depth
  • eye-movement control
  • MT Receptive fields are about 10 times larger than in V1
  • MT is retinotopically organized with motion and depth columns similar to orientation and ocular dominance columns in V1
  • MT cells are selective to higher order features of motion such as motion gradients, motion-defined edges, locally opposite motions, and motion-defined shapes

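9 Object recognition

  • TEO is responsible for medium complexity features and it integrates information about the shapes and relative positions of multiple contour elements
  • Categories recognized: color, shape, ...
  • The complexity of the recognized objects, the level of abstraction, and the level of integration are all higher than for lower-level information.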

10 Recognition, part 2: TE

  • A system capable of object recognition has to fulfill two seemingly conflicting requirements, i.e., selectivity and invariance.
  • On the one hand, neurons have to distinguish between different objects to provide information about object identity
  • On the other hand, this system also has to treat highly dissimilar retinal images of the same object as equivalent, and must therefore be insensitive to transformations in the retinal image that occur in natural vision (e.g., changes in position, illumination, retinal size, and so on).
  • Area TE shows size invariance, cue invariance, position invariance, and occlusion invariance: an object is recognized regardless of its apparent size, whether near or far; seeing only part of it is enough to recognize it (as the saying goes, one spot glimpsed through a tube reveals the leopard); it is recognized in different poses and from different angles; and it is recognized even when occluded.

11 Movement and action

  • areas located in the dorsal stream are functionally related to different effectors:
  • LIP is involved in eye movements,
  • Medial Intraparietal Area (MIP) in arm movements,
  • AIP in hand movements (grasping), and
  • MST and VIP in body movements (self-motion)
  • Area MST is concerned with self-motion, both for movement of the head (or body) in space and movement of the eye in the head

12 Motion perception

  • The combination of retino-centric receptive fields and eye position modulation provides a population code in LIP that can represent the location of a stimulus in head-centric coordinates.
  • That is, eye-centered position estimates are combined with eye-position signals to yield head-centered estimates.
  • VIP is likely to be involved in self-motion, control of head movements, and the encoding of near-extrapersonal (head centered) space which link tactile and visual fields.
  • The activity of MIP neurons mainly reflects the movement plan toward the target and not merely the location of the target or visual attention evoked by the target appearance.
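The step from retino-centric to head-centric coordinates described for LIP above can be illustrated by the underlying geometry alone. The sketch below is a deliberate simplification: it assumes small angles so that head-centric position is simply retinal position plus eye position, and it does not model the neural population code or gain modulation. The function name `head_centric` and the target and eye positions are hypothetical.

```python
import numpy as np

def head_centric(retinal_pos_deg, eye_pos_deg):
    """Head-centric position = retinal position + eye-in-head position.

    Both inputs are (azimuth, elevation) in degrees; small angles assumed.
    """
    retinal = np.asarray(retinal_pos_deg, dtype=float)
    eye = np.asarray(eye_pos_deg, dtype=float)
    return retinal + eye

# A hypothetical target that stays fixed relative to the head.
target_in_head = np.array([10.0, -5.0])

# Three hypothetical gaze directions (eye-in-head positions).
eye_positions = [np.array([0.0, 0.0]),
                 np.array([15.0, 0.0]),
                 np.array([-8.0, 4.0])]

recovered = []
for eye in eye_positions:
    # Where the target lands on the retina for this gaze direction.
    retinal = target_in_head - eye
    estimate = head_centric(retinal, eye)
    recovered.append(estimate)
    print("eye", eye, "retinal", retinal, "head-centric", estimate)
```

The retinal position changes with every eye movement, while the combined estimate stays fixed, which is the property the head-centric representation is meant to provide.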

13 Grasping

  • Some AIP neurons respond during object fixation and grasping, but not during grasping in the dark (visual-dominant neurons);
  • other AIP neurons do not respond during object fixation but only when the object is grasped, even in the dark (motor-dominant neurons),
  • a third class of AIP neurons responds during object fixation and grasping and during grasping in the dark (visuo-motor neurons).
  • AIP are sensitive to the 2D and 3D features of the object and shape of the hand (in a light or dark environment) relevant for grasping

14 Summary diagram of the functions above

III. Brain areas organized by function

color

  • single-opponent cells in LGN establish the two color axes red-green and blue-yellow, thereby sharpening the wavelength tuning and achieving some invariance to luminance.
  • Double-opponent cells provide the means to take nearby colors into account for color contrast.
  • V4, hue is encoded, which spans the full color space.
  • final step is IT, where there exists an association of color with form
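The color pipeline above (opponent axes, then a hue code that is invariant to luminance) can be sketched numerically. The block below is an illustration under strong assumptions: it uses RGB values as a stand-in for cone signals and a simple hand-picked opponent transform, and the names `opponent_channels` and `hue_angle` are hypothetical. It only shows why an angle in the red-green / blue-yellow plane does not change when overall intensity is scaled.

```python
import numpy as np

def opponent_channels(rgb):
    """Map an RGB triple to (red-green, blue-yellow, luminance).

    Simplified stand-in for single-opponent coding; real cells operate on
    cone signals, not RGB.
    """
    r, g, b = np.asarray(rgb, dtype=float)
    red_green = r - g
    blue_yellow = b - (r + g) / 2.0
    luminance = (r + g + b) / 3.0
    return red_green, blue_yellow, luminance

def hue_angle(rgb):
    """Hue as the angle (degrees) in the red-green / blue-yellow plane."""
    red_green, blue_yellow, _ = opponent_channels(rgb)
    return float(np.degrees(np.arctan2(blue_yellow, red_green)))

orange = np.array([0.8, 0.4, 0.1])
dim_orange = 0.5 * orange  # same color at half the intensity

print("hue      :", hue_angle(orange), hue_angle(dim_orange))
print("luminance:", opponent_channels(orange)[2], opponent_channels(dim_orange)[2])
```

Scaling the input scales both opponent channels by the same factor, so their ratio, and therefore the angle, is unchanged while the luminance channel halves.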

shape

  • Dynamics, complexity, and the abstraction levels of the various invariant features
  • Edges are the primary features used to represent objects, it seems.
  • In V1, they are defined as boundaries between dark and light or between different color hues;
  • in V2, contours may also be defined by texture boundaries and these cells respond to illusory contours;
  • in V4, contours may even be defined by differences in motion
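  • Possible solutions to the binding problem (which lines belong to the same object?) are tuning of cells to conjunctions of features, spatial attention, and temporal synchronization
  • Levels of 3D depth processing: absolute and relative disparity, and zero-, first-, and second-order disparity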
  • selectivity for zero-, first-, and second-order disparities can be measured
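The zero-, first-, and second-order disparities mentioned above are usually understood as the disparity itself (position in depth), its spatial gradient (surface slant), and the change of that gradient (surface curvature). The sketch below assumes that reading and uses one-dimensional disparity profiles with made-up values; `disparity_orders` is a hypothetical helper built on finite differences.

```python
import numpy as np

# Horizontal image position (arbitrary units) for a 1D disparity profile.
x = np.linspace(-1.0, 1.0, 201)

def disparity_orders(disparity, x):
    """Return (zero-order, first-order, second-order) disparity profiles."""
    first = np.gradient(disparity, x)   # disparity gradient: slant
    second = np.gradient(first, x)      # change of gradient: curvature
    return disparity, first, second

flat = np.full_like(x, 0.3)      # frontoparallel surface at a fixed depth
slanted = 0.3 + 0.5 * x          # planar surface slanted in depth
curved = 0.3 + 0.5 * x**2        # curved surface

inner = slice(5, -5)  # skip the borders, where finite differences are less accurate
for name, profile in [("flat", flat), ("slanted", slanted), ("curved", curved)]:
    zero, first, second = disparity_orders(profile, x)
    print(name,
          "zero:", round(float(zero[100]), 3),
          "first:", round(float(first[inner].mean()), 3),
          "second:", round(float(second[inner].mean()), 3))
```

Only the curved profile has a non-zero second-order term, and only the slanted one has a constant non-zero first-order term, which is what distinguishes the three cases.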

3d

  • The coding of 3D shape in AIP is
  • faster (shorter latencies),
  • coarser (less sensitivity to discontinuities in the surfaces),
  • less categorical, and
  • more boundary based (less influence of the surface information)
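  • compared to IT (3D shape coding in IT has the opposite characteristics)
  • AIP: fast judgments, for example to escape danger
  • IT: accurate recognition, judgment, and deliberation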

motion

  • analysis of motion in the primate visual system proceeds in a hierarchy from V1 (local spatiotemporal filtering) to MT (2D motion) to MST (self-motion, motion in world coordinates)
  • The representation shifts from one of motion in the visual field (V1, MT) to one of motion in the world and motion of oneself in the world (MST)
  • motion processing is combined with disparity (MT, MST), eye movement information (MST), and vestibular signals
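The first stage of the hierarchy above, local spatiotemporal filtering, can be illustrated with a Reichardt-style correlator, a classic textbook model of direction selectivity. It is used here as an assumption for illustration; the paper does not prescribe it. The function name `reichardt`, the delay, and the random test pattern are hypothetical.

```python
import numpy as np

def reichardt(left, right, delay):
    """Opponent delay-and-correlate motion detector for two neighboring sensors.

    Positive output indicates left-to-right motion, negative the reverse.
    np.roll delays a signal by `delay` samples (circularly, for simplicity).
    """
    left_delayed = np.roll(left, delay)
    right_delayed = np.roll(right, delay)
    return float(np.mean(left_delayed * right) - np.mean(right_delayed * left))

rng = np.random.default_rng(0)
pattern = rng.standard_normal(1000)  # luminance over time at one sensor
delay = 3                            # travel time between sensors, in samples

# Rightward motion: the pattern reaches the right sensor `delay` samples later.
rightward = reichardt(pattern, np.roll(pattern, delay), delay)

# Leftward motion: the pattern reaches the left sensor `delay` samples later.
leftward = reichardt(np.roll(pattern, delay), pattern, delay)

print("rightward stimulus:", rightward)
print("leftward stimulus :", leftward)
```

The sign of the output encodes direction, and the detector is tuned to the speed implied by its internal delay.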

object recognition

  • Object recognition goes beyond simple 2D-shape perception in several aspects:
  • integration of different cues and modalities,
  • invariance to in-depth rotation and articulated movement, use of context (objects are recognized across rotations, viewing angles, and movement).
  • It is also important to distinguish between-class discrimination (object categorization) and within-class discrimination of objects.
  • edges can be defined by luminance in V1, by textures in V2, and by differences in motion in V4
  • View-angle invariance:
  • a small fraction of IT neurons can exhibit some rotation invariance and speed of recognition of familiar objects does not depend on the rotation angle [79].
  • A particular case are face sensitive neurons that can show a rather large invariance to rotations in depth.
  • Representations of the same object under different angles are presumably combined into a rotation-invariant representation, much as simple cell responses might be combined into a complex cell response.
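The last point above, combining view-specific representations into a rotation-invariant one, can be sketched with a bank of view-tuned units pooled by a maximum. Gaussian view tuning, max pooling, the 30-degree spacing, and the names `view_tuned_responses` and `invariant_response` are all assumptions chosen for illustration; the text only says the representations are presumably combined.

```python
import numpy as np

def view_tuned_responses(view_deg, preferred_views_deg, width_deg=30.0):
    """Responses of units, each tuned to one viewing angle of the same object."""
    # Circular angular difference mapped to the range [-180, 180).
    diff = (view_deg - preferred_views_deg + 180.0) % 360.0 - 180.0
    return np.exp(-diff**2 / (2 * width_deg**2))

# Twelve hypothetical view-tuned units spaced 30 degrees apart.
preferred_views = np.arange(0.0, 360.0, 30.0)

def invariant_response(view_deg):
    """Pool over all view-tuned units with a max (one possible pooling rule)."""
    return float(view_tuned_responses(view_deg, preferred_views).max())

test_views = np.arange(0.0, 360.0, 5.0)
single_unit = np.array([view_tuned_responses(v, preferred_views)[0]
                        for v in test_views])
pooled = np.array([invariant_response(v) for v in test_views])

print("single unit range:", single_unit.min(), single_unit.max())
print("pooled unit range:", pooled.min(), pooled.max())
```

A single view-tuned unit falls silent for views far from its preferred angle, while the pooled response stays high across the full rotation, at the cost of no longer signaling which view was shown.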

the role of context in recognition

  • Context plays a major role in object recognition [124]
  • and can be of different nature—semantic, spatial configuration, or pose
  • and is, at least partially, provided by higher areas beyond IT
  • objects also help to recognize the context and context may be defined on a crude statistical level [124].

support for action

  • visual information is combined with vestibular (in MST, VIP), auditory (in LIP), somatosensory (in VIP), and proprioceptive or motor feedback signals (MST and VIP for smooth eye movements, LIP for saccades, MST/VIP/7A/MIP for eye position)
  • LIP represents salience in the visual scene as a target signal for eye movements, MIP and AIP provide information for reaching (target signals) and grasping (shape signals). LIP and VIP provide information for the control of self-motion.

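IV. Animated demonstration of the related functions

If the animation does not play, view it at:

  • http://homepages.inf.ed.ac.uk/jbednar/spinning.html

V. Developmental timeline of visual functions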

This article was shared from the WeChat public account CreateAMind (createamind). Author: zdx3578

Originally published: 2016-10-23