Paper: Deep Hierarchies in the Primate Visual Cortex: What Can We Learn for Computer Vision?



I. A taxonomy of feature-learning approaches in machine learning, including deep learning

from deeplearningbook.



  • The visual abstraction process and its hierarchy; relative sizes correspond to the actual physical sizes of the brain areas

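3 Retina and LGN

  • Rod cells are sensitive at low light levels; cone cells are sensitive at high light levels and to color
  • Fixation relies on the fovea, the region with a high density of photoreceptors, so eye-movement control is required (active or passive control; top-down or bottom-up)
  • Center-Surround Receptive Fields
  • LGN: Single-Opponent Cells
  • V1: Double-Opponent Cells
  • 5-10 percent of V1 cells are dedicated color-coding cells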
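
To make the center-surround idea concrete, here is a minimal sketch that models an ON-center receptive field as a difference of two Gaussians. The difference-of-Gaussians form, the kernel size, and both sigma values are assumptions chosen for illustration; they are not taken from the paper.

```python
import numpy as np

def dog_kernel(size=21, sigma_center=1.0, sigma_surround=3.0):
    """ON-center difference-of-Gaussians kernel; each Gaussian is normalized to sum to 1."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    r2 = x**2 + y**2
    center = np.exp(-r2 / (2 * sigma_center**2))
    surround = np.exp(-r2 / (2 * sigma_surround**2))
    return center / center.sum() - surround / surround.sum()

def response(kernel, stimulus):
    """Linear response: weighted sum of the stimulus under the receptive field."""
    return float((kernel * stimulus).sum())

kernel = dog_kernel()
half = kernel.shape[0] // 2
y, x = np.mgrid[-half:half + 1, -half:half + 1]
r = np.sqrt(x**2 + y**2)

uniform = np.ones_like(kernel)               # full-field illumination
spot = (r <= 1.5).astype(float)              # small spot covering only the center
ring = ((r >= 4) & (r <= 8)).astype(float)   # annulus covering only the surround

print(response(kernel, uniform), response(kernel, spot), response(kernel, ring))
```

Under these assumptions the model cell ignores uniform illumination, is excited by a spot on its center, and is suppressed by light falling only on the surround.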

4 V1 recognition capabilities and applications

  • Features: edges, bars, and gratings, i.e., linear oriented patterns.
  • Simple and complex cells.
  • Gabor wavelets are a reasonable approximation of simple cells
  • Gabor wavelets have also been very successful in applications: image compression, image retrieval, and face recognition.
  • Cells that are sensitive to the end of a bar or edge, or to the border of a grating. Applications: pose estimation, object recognition, stereo, structure from motion
  • Line endings, motion, color
  • Binocular disparity. Applications: depth, gaze control, object grasping, and object recognition
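
The Gabor approximation of simple cells can be sketched in a few lines. The code below builds a Gabor patch, uses it as a linear simple-cell model, and combines an even and an odd Gabor into a phase-invariant complex-cell response. The squared-sum ("energy") combination is a common modelling choice assumed here, not something the notes specify, and all parameter values and function names are illustrative.

```python
import numpy as np

def _grid(size):
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    return x, y

def gabor(size=31, wavelength=8.0, theta=0.0, sigma=4.0, phase=0.0):
    """Gaussian envelope times a cosine carrier that varies along direction theta."""
    x, y = _grid(size)
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * xr / wavelength + phase)

def grating(size=31, wavelength=8.0, theta=0.0, phase=0.0):
    """Sinusoidal test grating with the same conventions as gabor()."""
    x, y = _grid(size)
    xr = x * np.cos(theta) + y * np.sin(theta)
    return np.cos(2 * np.pi * xr / wavelength + phase)

def simple_cell(image, theta, phase=0.0):
    """Linear simple-cell model: dot product of the image with a Gabor patch."""
    return float((gabor(theta=theta, phase=phase) * image).sum())

def complex_cell(image, theta):
    """Energy model: squared responses of an even and an odd Gabor, summed."""
    even = simple_cell(image, theta, 0.0)
    odd = simple_cell(image, theta, -np.pi / 2)
    return even**2 + odd**2

preferred = grating(theta=0.0)
orthogonal = grating(theta=np.pi / 2)
shifted = grating(theta=0.0, phase=np.pi / 2)

print(complex_cell(preferred, 0.0), complex_cell(orthogonal, 0.0))
print(simple_cell(preferred, 0.0), simple_cell(shifted, 0.0))
```

In this sketch the simple cell responds strongly to a grating at its preferred orientation and phase but almost not at all when the phase is shifted by a quarter cycle, while the complex cell keeps its orientation selectivity and responds nearly equally at both phases.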

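5 The importance of motion and temporal-sequence detection

  • Note that spatiotemporal features such as motion have been demonstrated to be the first features developmentally present in humans for recognizing objects (even sooner than color and orientation). In short, spatiotemporal feature detection develops first.
  • Some neural networks already support motion detection (http://youtube.com/watch?v=HrVamNQHf-I), but current deep learning networks for object recognition generally lack support for motion detection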

6 Features recognized by V2

  • V2: orientation, color, and disparity (relative disparity in V2; absolute disparity in V1)
  • New features in V2: more sophisticated contour representation, including texture-defined contours, illusory contours, and contours with border ownership.
  • In 2000, Zhou et al. [205] found that 18 percent of the cells in V1 and more than 50 percent of the cells in V2 and V4 (along the ventral pathway) respond or code according to the direction of the owner of the boundary.

7 V4

  • V4 seems to combine input from the M as well as the P pathway
  • Integrates lower-level responses into higher-level responses and increases invariances
  • V4 cells respond to contours defined by differences in speed and/or direction of motion, with an orientation selectivity that matches the selectivity to luminance-defined contours
  • Hue coding is invariant to luminance
  • Curvature selectivity

8 MT: motion detection

  • Sensitive to disparity
  • Mid-level representation of motion and depth
  • Eye-movement control
  • MT receptive fields are about 10 times larger than in V1
  • MT is retinotopically organized, with motion and depth columns similar to the orientation and ocular dominance columns in V1
  • MT cells are selective to higher-order features of motion such as motion gradients, motion-defined edges, locally opposite motions, and motion-defined shapes

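9 Object recognition

  • TEO is responsible for medium-complexity features, and it integrates information about the shapes and relative positions of multiple contour elements
  • Recognized categories: color, shape...
  • The complexity of the recognized objects, the level of abstraction, and the level of integration are all higher than for lower-level information.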

10 Recognition, part 2: TE

  • A system capable of object recognition has to fulfill two seemingly conflicting requirements, i.e., selectivity and invariance.
  • On the one hand, neurons have to distinguish between different objects to provide information about object identity
  • On the other hand, this system also has to treat highly dissimilar retinal images of the same object as equivalent, and must therefore be insensitive to transformations in the retinal image that occur in natural vision (e.g., changes in position, illumination, retinal size, and so on).
  • Area TE exhibits size invariance, cue invariance, position invariance, and occlusion invariance: an object is recognized whether it is near or far, regardless of its apparent size; seeing only a part is enough to recognize the whole; it is recognized in different poses and from different angles; and it is recognized even when partly occluded

11 Motion and action

  • Areas located in the dorsal stream are functionally related to different effectors:
  • LIP is involved in eye movements,
  • Medial Intraparietal Area (MIP) in arm movements,
  • AIP in hand movements (grasping), and
  • MST and VIP in body movements (self-motion)
  • Area MST is concerned with self-motion, both for movement of the head (or body) in space and movement of the eye in the head

12 Motion perception

  • The combination of retino-centric receptive fields and eye position modulation provides a population code in LIP that can represent the location of a stimulus in head-centric coordinates
  • Location estimates thus go from eye-centered (combined with eye-position sensing) to head-centered
  • VIP is likely to be involved in self-motion, control of head movements, and the encoding of near-extrapersonal (head-centered) space, which links tactile and visual fields.
  • The activity of MIP neurons mainly reflects the movement plan toward the target and not merely the location of the target or visual attention evoked by the target appearance
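
The step from retino-centric receptive fields plus eye-position modulation to a head-centric location can be illustrated with a toy population code. In this sketch every hypothetical unit has a Gaussian retinal receptive field whose amplitude is scaled by an eye-position-dependent gain; the Gaussian gain shape, the grid of preferred values, the tuning width, and the center-of-mass readout are all simplifying assumptions, and recorded gain fields need not look like this.

```python
import numpy as np

centers = np.arange(-60.0, 61.0, 5.0)  # assumed preferred positions, in degrees
SIGMA = 8.0                            # assumed tuning width, in degrees

def tuning(value):
    """Gaussian tuning of every unit in the grid to a scalar value."""
    return np.exp(-(value - centers) ** 2 / (2 * SIGMA**2))

def population_response(retinal_pos, eye_pos):
    """Unit (i, j): retinal receptive field i, scaled by eye-position gain j."""
    return np.outer(tuning(retinal_pos), tuning(eye_pos))

def decode_head_centric(resp):
    """Center-of-mass readout, labeling unit (i, j) with centers[i] + centers[j]."""
    preferred = centers[:, None] + centers[None, :]
    return float((resp * preferred).sum() / resp.sum())

# Same head-centric location (-5 deg) seen with two different eye positions
a = population_response(retinal_pos=10.0, eye_pos=-15.0)
b = population_response(retinal_pos=-20.0, eye_pos=15.0)
print(decode_head_centric(a), decode_head_centric(b))
```

The two activity patterns differ unit by unit, yet the readout returns the same head-centric location for both, which is the sense in which the population as a whole, rather than any single unit, carries the head-centric code.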

13 Grasping

  • Some AIP neurons respond during object fixation and grasping, but not during grasping in the dark (visual-dominant neurons);
  • other AIP neurons do not respond during object fixation but only when the object is grasped, even in the dark (motor-dominant neurons);
  • a third class of AIP neurons responds during object fixation and grasping and during grasping in the dark (visuo-motor neurons).
  • AIP neurons are sensitive to the 2D and 3D features of the object and to the shape of the hand (in a light or dark environment) relevant for grasping


III. Brain areas organized by function


  • Single-opponent cells in LGN establish the two color axes, red-green and blue-yellow, thereby sharpening the wavelength tuning and achieving some invariance to luminance.
  • Double-opponent cells provide the means to take nearby colors into account for color contrast.
  • In V4, hue is encoded, which spans the full color space.
  • The final step is IT, where there exists an association of color with form
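
As a rough illustration of the two opponent axes and of a luminance-invariant hue code, the sketch below computes red-green and blue-yellow signals and reads hue as an angle in that plane. Using RGB values as a stand-in for cone responses, the specific channel weights, and the division by summed intensity are all assumptions made for the example.

```python
import numpy as np

def opponent_channels(rgb):
    """Red-green and blue-yellow signals, normalized by summed intensity."""
    r, g, b = (float(c) for c in rgb)
    luminance = r + g + b
    if luminance == 0.0:
        return 0.0, 0.0  # black: no chromatic signal
    red_green = (r - g) / luminance
    blue_yellow = (b - (r + g) / 2.0) / luminance
    return red_green, blue_yellow

def hue_angle(rgb):
    """Hue as the angle (degrees) of the color in the opponent plane."""
    red_green, blue_yellow = opponent_channels(rgb)
    return float(np.degrees(np.arctan2(blue_yellow, red_green)))

bright = (0.8, 0.2, 0.1)
dim = (0.4, 0.1, 0.05)   # same color at half the intensity
print(hue_angle(bright), hue_angle(dim))
print(opponent_channels((0.5, 0.5, 0.5)))
```

Halving the intensity leaves the hue angle unchanged, and a gray input produces no signal on either axis, which mirrors the luminance invariance described above.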


  • Dynamics, complexity, and the abstraction levels of the various invariant features
  • Edges appear to be the primary features used to represent objects.
  • In V1, they are defined as boundaries between dark and light or between different color hues;
  • in V2, contours may also be defined by texture boundaries, and these cells respond to illusory contours;
  • in V4, contours may even be defined by differences in motion
  • Possible solutions to the binding problem (which contours belong to the same object?) are tuning of cells to conjunctions of features, spatial attention, and temporal synchronization
  • Processing levels for 3D depth information: absolute and relative disparity, and zero-, first-, and second-order disparity
  • Selectivity for zero-, first-, and second-order disparities can be measured
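
One way to read the zero-, first-, and second-order terminology is as the disparity value itself, its spatial gradient (a slanted surface), and the change of that gradient (a curved surface). The sketch below computes these three quantities for a 1D disparity profile with finite differences; the profiles and the function name are made up for illustration.

```python
import numpy as np

def disparity_orders(disparity, spacing=1.0):
    """Return the disparity profile, its gradient, and its second derivative."""
    zero = np.asarray(disparity, dtype=float)
    first = np.gradient(zero, spacing)    # central differences in the interior
    second = np.gradient(first, spacing)
    return zero, first, second

x = np.linspace(-5.0, 5.0, 11)            # sample positions, spacing 1
flat = np.full_like(x, 2.0)               # frontoparallel surface
slanted = 2.0 + 0.5 * x                   # constant disparity gradient
curved = 2.0 + 0.5 * x + 0.1 * x**2       # constant disparity curvature

mid = len(x) // 2
for name, profile in [("flat", flat), ("slanted", slanted), ("curved", curved)]:
    zero, first, second = disparity_orders(profile)
    print(name, zero[mid], first[mid], second[mid])
```

All three surfaces share the same zero-order disparity at the middle sample; only the slanted and curved ones have a first-order term there, and only the curved one has a second-order term.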


  • Compared to IT, the coding of 3D shape in AIP is
  • faster (shorter latencies),
  • coarser (less sensitivity to discontinuities in the surfaces),
  • less categorical, and
  • more boundary based (less influence of the surface information).
  • 3D shape coding in IT has the opposite properties.
  • AIP: fast judgments, e.g., for escaping danger
  • IT: accurate recognition, judgment, and deliberation


  • Analysis of motion in the primate visual system proceeds in a hierarchy from V1 (local spatiotemporal filtering) to MT (2D motion) to MST (self-motion, motion in world coordinates)
  • The representation shifts from one of motion in the visual field (V1, MT) to one of motion in the world and motion of oneself in the world (MST)
  • Motion processing is combined with disparity (MT, MST), eye movement information (MST), and vestibular signals
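
For readers who want a feel for what a local motion estimate looks like computationally, here is a small gradient-based sketch for a 1D signal. It is a standard computer-vision technique (brightness constancy solved by least squares), swapped in for illustration; it is not a model of V1 or MT filtering taken from the paper, and the test pattern is made up.

```python
import numpy as np

def estimate_velocity(frame0, frame1):
    """Least-squares solution of I_t + v * I_x = 0 over a 1D signal (pixels per frame)."""
    frame0 = np.asarray(frame0, dtype=float)
    frame1 = np.asarray(frame1, dtype=float)
    i_x = np.gradient((frame0 + frame1) / 2.0)[1:-1]  # spatial derivative, edges dropped
    i_t = (frame1 - frame0)[1:-1]                     # temporal derivative
    return float(-(i_x * i_t).sum() / (i_x**2).sum())

x = np.arange(128)

def pattern(shift):
    """Sinusoidal luminance pattern displaced by `shift` pixels."""
    return np.sin(2 * np.pi * (x - shift) / 32.0)

v_right = estimate_velocity(pattern(0.0), pattern(0.5))
v_left = estimate_velocity(pattern(0.0), pattern(-0.5))
print(v_right, v_left)
```

The estimate recovers the sign and approximate size of a half-pixel shift in either direction. It only holds for small displacements, which is one reason a single local stage is not enough and later stages pool over larger regions.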


  • Object recognition goes beyond simple 2D-shape perception in several aspects:
  • integration of different cues and modalities,
  • invariance to in-depth rotation and articulated movement, and use of context; objects are recognized under different rotations and viewing angles and while in motion.
  • It is also important to distinguish between-class discrimination (object categorization) and within-class discrimination of objects.
  • Edges can be defined by luminance in V1, by textures in V2, and by differences in motion in V4
  • Rotation (viewpoint) invariance in recognition:
  • A small fraction of IT neurons can exhibit some rotation invariance, and the speed of recognition of familiar objects does not depend on the rotation angle [79].
  • A particular case is face-sensitive neurons, which can show a rather large invariance to rotations in depth.
  • Representations of the same object under different angles are presumably combined into a rotation-invariant representation, much as simple cell responses might be combined into a complex cell response
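
The idea of combining view-specific representations into a rotation-invariant one can be sketched with a pooling step. Below, a set of hypothetical view-tuned units each prefers one rotation angle, and an invariant unit takes the maximum over them. Max pooling, the 30-degree spacing of preferred views, and the tuning width are assumptions chosen for illustration, not details from the notes.

```python
import numpy as np

preferred_views = np.arange(0.0, 360.0, 30.0)  # assumed preferred angles, degrees
TUNING_WIDTH = 20.0                            # assumed tuning width, degrees

def view_tuned_responses(angle):
    """Gaussian tuning of each view-specific unit to the object's rotation angle."""
    diff = (angle - preferred_views + 180.0) % 360.0 - 180.0  # wrapped angular difference
    return np.exp(-diff**2 / (2 * TUNING_WIDTH**2))

def view_invariant_response(angle):
    """Pool the view-tuned units with a max, as one simple pooling choice."""
    return float(view_tuned_responses(angle).max())

angles = np.arange(0.0, 360.0, 1.0)
single_unit = np.array([view_tuned_responses(a)[0] for a in angles])
pooled = np.array([view_invariant_response(a) for a in angles])
print(single_unit.min(), single_unit.max())
print(pooled.min(), pooled.max())
```

A single view-tuned unit goes from full response to almost nothing as the object rotates, while the pooled unit stays high at every angle. The same pooling pattern, applied over position or phase instead of viewing angle, is the simple-to-complex-cell analogy drawn above.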


  • Context plays a major role in object recognition [124]
  • It can be of different kinds: semantic, spatial configuration, or pose
  • It is, at least partially, provided by higher areas beyond IT
  • Objects also help to recognize the context, and context may be defined on a crude statistical level [124].


  • visual information is combined with vestibular (in MST, VIP), auditory (in LIP), somatosensory (in VIP), and proprioceptive or motor feedback signals (MST and VIP for smooth eye movements, LIP for saccades, MST/VIP/7A/MIP for eye position)
  • LIP represents salience in the visual scene as a target signal for eye movements, MIP and AIP provide information for reaching (target signals) and grasping (shape signals). LIP and VIP provide information for the control of self-motion.

IV. Animated demonstrations of related functions


  • http://homepages.inf.ed.ac.uk/jbednar/spinning.html


V. Developmental timeline of visual functions

This article was shared from the WeChat public account CreateAMind (createamind). Author: zdx3578
