
Computer Vision arXiv Digest [9.2]

WeChat official account: arXiv Daily Academic Digest
Published 2021-09-16 15:10:05

Update! The H5 page now supports collapsible abstracts for a better reading experience! Click "Read the original" to visit arxivdaily.com, which covers CS, Physics, Mathematics, Economics, Statistics, Finance, Biology, and Electrical Engineering, with search, favorites, and more!

cs.CV: 38 papers today

Detection (1 paper)

【1】 Two-step Domain Adaptation for Mitosis Cell Detection in Histopathology Images
Link: https://arxiv.org/abs/2109.00109

Authors: Ramin Nateghi, Fattaneh Pourakpour
Affiliations: Electrical and Electronics Engineering Department, Shiraz University of Technology, Shiraz, Iran; Iranian Brain Mapping Lab, National Brain Mapping Laboratory, Tehran, Iran
Abstract: We propose a two-step domain shift-invariant mitosis cell detection method based on Faster RCNN and a convolutional neural network (CNN). We generate various domain-shifted versions of existing histopathology images using a stain augmentation technique, enabling our method to effectively learn various stain domains and achieve better generalization. The performance of our method is evaluated on the preliminary test data set of the MIDOG-2021 challenge. The experimental results demonstrate that the proposed mitosis detection method can achieve promising performance for domain-shifted histopathology images.
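The abstract does not specify the stain augmentation technique; a common way to realize one is to decompose each RGB pixel into stain concentrations with a fixed optical-density matrix and jitter those concentrations. A minimal sketch, assuming the standard Ruifrok-Johnston H&E reference vectors (the function name, matrix values, and jitter ranges are illustrative, not taken from the paper):

```python
import numpy as np

# Ruifrok-Johnston H&E reference stain vectors (rows: Hematoxylin, Eosin, residual).
# These are common defaults, not values from the paper.
STAIN_MATRIX = np.array([
    [0.650, 0.704, 0.286],   # Hematoxylin
    [0.072, 0.990, 0.105],   # Eosin
    [0.268, 0.570, 0.776],   # residual
])
STAIN_MATRIX /= np.linalg.norm(STAIN_MATRIX, axis=1, keepdims=True)

def stain_augment(rgb, alpha_range=0.05, beta_range=0.01, seed=None):
    """Randomly perturb per-stain concentrations of an H&E image (uint8, HxWx3)."""
    rng = np.random.default_rng(seed)
    od = -np.log((rgb.astype(np.float64) + 1.0) / 256.0)     # RGB -> optical density
    conc = od.reshape(-1, 3) @ np.linalg.inv(STAIN_MATRIX)   # OD -> stain concentrations
    alpha = rng.uniform(1 - alpha_range, 1 + alpha_range, size=3)  # multiplicative jitter
    beta = rng.uniform(-beta_range, beta_range, size=3)            # additive jitter
    od_aug = ((conc * alpha + beta) @ STAIN_MATRIX).reshape(rgb.shape)
    out = 256.0 * np.exp(-od_aug) - 1.0                      # optical density -> RGB
    return np.clip(out.round(), 0, 255).astype(np.uint8)

img = np.full((4, 4, 3), 180, dtype=np.uint8)  # toy pinkish patch
aug = stain_augment(img, seed=0)
print(aug.shape, aug.dtype)
```

With zero jitter the transform round-trips the image exactly, which makes the decomposition easy to sanity-check before training.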

Classification | Recognition (3 papers)

【1】 A Weakly-Supervised Surface Crack Segmentation Method using Localisation with a Classifier and Thresholding
Link: https://arxiv.org/abs/2109.00456

Authors: Jacob König, Mark Jenkins, Mike Mannion, Peter Barrie, Gordon Morison
Affiliations: School of Computing, Glasgow Caledonian University
Abstract: Surface cracks are a common sight on public infrastructure nowadays. Recent work has been addressing this problem by supporting structural maintenance measures using machine learning methods which segment surface cracks from their background so that they are easy to localize. However, a common issue with those methods is that, to create a well-functioning algorithm, the training data needs detailed annotations of the pixels that belong to cracks. Our work proposes a weakly supervised approach which leverages a CNN classifier to create surface crack segmentation maps. We use this classifier to create a rough crack localisation map via its class activation maps and a patch-based classification approach, and fuse this with a thresholding-based approach to segment the mostly darker crack pixels. The classifier assists in suppressing noise from the background regions, which are commonly highlighted as cracks, incorrectly, by standard thresholding methods. We focus on the ease of implementation of our method, and it is shown to perform well on several surface crack datasets, segmenting cracks efficiently even though the only data used for training were simple classification labels.
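The fusion of a coarse classifier localisation with intensity thresholding can be pictured with a toy example. This is a sketch assuming the CAM has already been patch-aggregated and normalised to [0, 1], and that crack pixels are among the darkest; the function name and thresholds are hypothetical:

```python
import numpy as np

def fuse_cam_and_threshold(image, cam, cam_thresh=0.5, dark_quantile=0.2):
    """Keep pixels that the classifier localises as crack-like (CAM high)
    AND that are among the darkest pixels of the image (threshold low)."""
    coarse = cam >= cam_thresh                          # classifier localisation
    dark = image <= np.quantile(image, dark_quantile)   # darkest pixels = crack candidates
    return coarse & dark

gray = np.array([[0.9, 0.9, 0.1],
                 [0.9, 0.05, 0.1],
                 [0.9, 0.9, 0.9]])
cam = np.array([[0.0, 0.2, 0.8],
                [0.1, 0.9, 0.7],
                [0.0, 0.1, 0.2]])
mask = fuse_cam_and_threshold(gray, cam)
print(mask.astype(int))
```

The AND-fusion is what suppresses dark background pixels that thresholding alone would mislabel: they fail the CAM test.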

【2】 BVMatch: Lidar-based Place Recognition Using Bird's-eye View Images
Link: https://arxiv.org/abs/2109.00317

Authors: Lun Luo, Si-Yuan Cao, Bin Han, Hui-Liang Shen, Junwei Li
Affiliations: Ningbo Research Institute
Abstract: Recognizing places using Lidar in large-scale environments is challenging due to the sparse nature of point cloud data. In this paper we present BVMatch, a Lidar-based frame-to-frame place recognition framework that is capable of estimating 2D relative poses. Based on the assumption that the ground area can be approximated as a plane, we uniformly discretize the ground area into grids and project 3D Lidar scans to bird's-eye view (BV) images. We further use a bank of Log-Gabor filters to build a maximum index map (MIM) that encodes the orientation information of the structures in the images. We analyze the orientation characteristics of the MIM theoretically and introduce a novel descriptor called bird's-eye view feature transform (BVFT). The proposed BVFT is insensitive to rotation and intensity variations of BV images. Leveraging the BVFT descriptors, we unify the Lidar place recognition and pose estimation tasks into the BVMatch framework. The experiments conducted on three large-scale datasets show that BVMatch outperforms the state-of-the-art methods in terms of both recall rate of place recognition and pose estimation accuracy.
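A maximum index map over a log-Gabor bank, as described above, can be sketched in a few lines: build oriented log-Gabor filters in the frequency domain, convolve via FFT, and record per pixel which orientation responds most strongly. The filter parameters below are illustrative defaults, not the paper's settings:

```python
import numpy as np

def log_gabor_bank(size, n_orient=6, f0=0.1, sigma_f=0.55, sigma_theta=0.4):
    """Build a small bank of oriented log-Gabor filters in the frequency domain."""
    fy, fx = np.meshgrid(np.fft.fftfreq(size), np.fft.fftfreq(size), indexing="ij")
    f = np.hypot(fx, fy)
    f[0, 0] = 1e-9                                   # avoid log(0) at DC
    radial = np.exp(-(np.log(f / f0) ** 2) / (2 * np.log(sigma_f) ** 2))
    theta = np.arctan2(fy, fx)
    bank = []
    for k in range(n_orient):
        t0 = k * np.pi / n_orient
        d = np.arctan2(np.sin(theta - t0), np.cos(theta - t0))  # wrapped angle difference
        bank.append(radial * np.exp(-(d ** 2) / (2 * sigma_theta ** 2)))
    return np.stack(bank)

def maximum_index_map(image, bank):
    """MIM: per pixel, the index of the orientation with the strongest response."""
    spectrum = np.fft.fft2(image)
    responses = np.abs(np.fft.ifft2(spectrum[None] * bank))  # one response per filter
    return responses.argmax(axis=0)

img = np.zeros((32, 32))
img[:, 16] = 1.0                                     # a vertical edge
mim = maximum_index_map(img, log_gabor_bank(32))
print(mim.shape)
```

The resulting integer map is what BVFT-style descriptors would then be computed on; how the paper normalises and samples it is not reproduced here.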

【3】 An Empirical Study on the Joint Impact of Feature Selection and Data Resampling on Imbalance Classification
Link: https://arxiv.org/abs/2109.00201

Authors: Chongsheng Zhang, Paolo Soda, Jingjun Bi, Gaojuan Fan, George Almpanidis, Salvador Garcia
Affiliations: School of Computer and Information Engineering, Henan University, China; Department of Engineering, University Campus Bio-Medico of Rome, Italy; Department of Computer Science and Artificial Intelligence, University of Granada, Spain
Note: 25 pages, 12 figures
Abstract: Real-world datasets often present different degrees of imbalanced (i.e., long-tailed or skewed) distributions. While the majority (a.k.a. head or frequent) classes have sufficient samples, the minority (a.k.a. tail or rare) classes can be under-represented by a rather limited number of samples. On one hand, data resampling is a common approach to tackling class imbalance. On the other hand, dimension reduction, which reduces the feature space, is a conventional machine learning technique for building stronger classification models on a dataset. However, the possible synergy between feature selection and data resampling for high-performance imbalance classification has rarely been investigated before. To address this issue, this paper carries out a comprehensive empirical study on the joint influence of feature selection and resampling on two-class imbalance classification. Specifically, we study the performance of two opposite pipelines for imbalance classification, i.e., applying feature selection before or after data resampling. We conduct a large number of experiments (9225 in total) on 52 publicly available datasets, using 9 feature selection methods, 6 resampling approaches for class imbalance learning, and 3 well-known classification algorithms. Experimental results show that there is no constant winner between the two pipelines; thus, both of them should be considered to derive the best-performing model for imbalance classification. We also find that the performance of an imbalance classification model depends on the classifier adopted, the ratio between the number of majority and minority samples (IR), as well as the ratio between the number of samples and features (SFR). Overall, this study should provide new reference value for researchers and practitioners in imbalance learning.
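The two opposite pipelines studied above can be sketched with simple stand-ins for the real methods (random oversampling in place of the six resampling approaches, a class-mean-difference ranking in place of the nine feature selection methods; all names are hypothetical):

```python
import numpy as np

def random_oversample(X, y, rng):
    """Duplicate minority samples until both classes have equal counts."""
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[counts.argmin()]
    idx = np.flatnonzero(y == minority)
    extra = rng.choice(idx, size=counts.max() - counts.min(), replace=True)
    return np.vstack([X, X[extra]]), np.concatenate([y, y[extra]])

def select_top_k(X, y, k):
    """Rank features by absolute class-mean difference and keep the top k."""
    score = np.abs(X[y == 0].mean(0) - X[y == 1].mean(0))
    return np.argsort(score)[::-1][:k]

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = (rng.random(100) < 0.2).astype(int)   # roughly 4:1 imbalance
X[y == 1, 0] += 2.0                       # feature 0 is the informative one

# Pipeline A: feature selection first, then resampling.
feats_a = select_top_k(X, y, k=2)
Xa, ya = random_oversample(X[:, feats_a], y, rng)

# Pipeline B: resampling first, then feature selection.
Xb_full, yb = random_oversample(X, y, rng)
feats_b = select_top_k(Xb_full, yb, k=2)

print(np.bincount(ya), feats_a[0], feats_b[0])
```

On this toy data both orders recover the informative feature, but with real selection/resampling methods the orders can diverge, which is exactly the comparison the paper runs at scale.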

Segmentation | Semantics (5 papers)

【1】 Joint Graph Learning and Matching for Semantic Feature Correspondence
Link: https://arxiv.org/abs/2109.00240

Authors: He Liu, Tao Wang, Yidong Li, Congyan Lang, Yi Jin, Haibin Ling
Affiliations: Stony Brook University
Abstract: In recent years, powered by learned discriminative representations via graph neural network (GNN) models, deep graph matching methods have made great progress in the task of matching semantic features. However, these methods usually rely on heuristically generated graph patterns, which may introduce unreliable relationships that hurt the matching performance. In this paper, we propose a joint graph learning and matching network, named GLAM, to explore reliable graph structures for boosting graph matching. GLAM adopts a pure attention-based framework for both graph learning and graph matching. Specifically, it employs two types of attention mechanisms, self-attention and cross-attention. The self-attention discovers the relationships between features and further updates feature representations over the learnt structures; the cross-attention computes cross-graph correlations between the two feature sets to be matched, for feature reconstruction. Moreover, the final matching solution is directly derived from the output of the cross-attention layer, without employing a specific matching decision module. The proposed method is evaluated on three popular visual matching benchmarks (Pascal VOC, Willow Object and SPair-71k), and it outperforms previous state-of-the-art graph matching methods by significant margins on all benchmarks. Furthermore, the graph patterns learnt by our model are validated to be able to remarkably enhance previous deep graph matching methods by replacing their handcrafted graph structures with the learnt ones.

【2】 Contrastive Multiview Coding with Electro-optics for SAR Semantic Segmentation
Link: https://arxiv.org/abs/2109.00120

Authors: Keumgang Cha, Junghoon Seo, Yeji Choi
Note: To appear in IEEE GRSL. DOI to be updated.
Abstract: In the training of deep learning models, how the model parameters are initialized greatly affects the model performance, sample efficiency, and convergence speed. Representation learning for model initialization has recently been actively studied in the remote sensing field. In particular, the appearance characteristics of imagery obtained using a synthetic aperture radar (SAR) sensor are quite different from those of general electro-optical (EO) images, and thus representation learning is even more important in the remote sensing domain. Motivated by contrastive multiview coding, we propose multi-modal representation learning for SAR semantic segmentation. Unlike previous studies, our method jointly uses EO imagery, SAR imagery, and a label mask. Several experiments show that our approach is superior to the existing methods in model performance, sample efficiency, and convergence speed.
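Contrastive multiview coding pulls the embeddings of paired views (here EO and SAR of the same scene) together with an InfoNCE-style objective. A minimal numpy sketch with random stand-in embeddings; the loss form is the generic one, not necessarily the paper's exact variant:

```python
import numpy as np

def info_nce(z1, z2, tau=0.1):
    """InfoNCE loss between two views; row i of z1 and z2 form a positive pair."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau                          # all-pairs cosine similarity
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                # positives sit on the diagonal

rng = np.random.default_rng(0)
eo = rng.normal(size=(8, 16))                 # stand-in EO embeddings
sar = eo + 0.01 * rng.normal(size=(8, 16))    # well-aligned SAR view
neg = rng.normal(size=(8, 16))                # unrelated embeddings
print(info_nce(eo, sar) < info_nce(eo, neg))  # aligned views give a lower loss
```

The label mask mentioned in the abstract would enter as a third view in the same scheme; that extension is not reproduced here.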

【3】 Looking at the whole picture: constrained unsupervised anomaly segmentation
Link: https://arxiv.org/abs/2109.00482

Authors: Julio Silva-Rodríguez, Valery Naranjo, Jose Dolz
Affiliations: Institute of Transport and Territory, Universitat Politècnica de València, Valencia, Spain; Institute of Research and Innovation in Bioengineering; LIVIA Laboratory, École de Technologie Supérieure (ETS), Montreal, Canada
Abstract: Current unsupervised anomaly localization approaches rely on generative models to learn the distribution of normal images, which is later used to identify potential anomalous regions derived from errors on the reconstructed images. However, a main limitation of nearly all prior literature is the need to employ anomalous images to set a class-specific threshold to locate the anomalies. This limits their usability in realistic scenarios, where only normal data is typically accessible. Despite this major drawback, only a handful of works have addressed this limitation, by integrating supervision on attention maps during training. In this work, we propose a novel formulation that does not require accessing images with abnormalities to define the threshold. Furthermore, and in contrast to very recent work, the proposed constraint is formulated in a more principled manner, leveraging well-known knowledge in constrained optimization. In particular, the equality constraint on the attention maps in prior work is replaced by an inequality constraint, which allows more flexibility. In addition, to address the limitations of penalty-based functions, we employ an extension of the popular log-barrier methods to handle the constraint. Comprehensive experiments on the popular BRATS'19 dataset demonstrate that the proposed approach substantially outperforms the relevant literature, establishing new state-of-the-art results for unsupervised lesion segmentation.
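The "extension of the popular log-barrier methods" is not spelled out in the abstract; one published formulation (the extended log-barrier of Kervadec et al.) keeps the standard barrier inside the feasible region and continues it linearly beyond, so violated constraints still get a finite, differentiable penalty. A sketch assuming that formulation:

```python
import numpy as np

def extended_log_barrier(z, t=5.0):
    """Penalty for an inequality constraint z <= 0: the standard log-barrier
    -log(-z)/t while z <= -1/t**2, continued linearly beyond that point so the
    penalty stays finite and differentiable even when the constraint is violated."""
    z = np.asarray(z, dtype=float)
    inside = z <= -1.0 / t**2
    barrier = -np.log(np.where(inside, -z, 1.0)) / t          # safe log argument
    linear = t * z - np.log(1.0 / t**2) / t + 1.0 / t          # matches the barrier at -1/t**2
    return np.where(inside, barrier, linear)

vals = extended_log_barrier(np.array([-1.0, -0.1, 0.0, 0.5]))
print(np.all(np.diff(vals) > 0))   # penalty grows as the constraint tightens
```

The switch point -1/t**2 is chosen so value and slope agree across the two pieces; increasing t over training tightens the approximation to a hard constraint.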

【4】 ImageTBAD: A 3D Computed Tomography Angiography Image Dataset for Automatic Segmentation of Type-B Aortic Dissection
Link: https://arxiv.org/abs/2109.00374

Authors: Zeyang Yao, Jiawei Zhang, Hailong Qiu, Tianchen Wang, Yiyu Shi, Jian Zhuang, Yuhao Dong, Meiping Huang, Xiaowei Xu
Affiliations: School of Medicine, South China University of Technology, Guangzhou, China; School of Computer Science, Fudan University, Shanghai, China; Department of Cardiovascular Surgery, Guangdong Provincial People's Hospital; Department of Computer Science and Engineering
Abstract: Type-B Aortic Dissection (TBAD) is one of the most serious cardiovascular events, characterized by a growing yearly incidence and severe disease prognosis. Currently, computed tomography angiography (CTA) has been widely adopted for the diagnosis and prognosis of TBAD. Accurate segmentation of the true lumen (TL), false lumen (FL), and false lumen thrombus (FLT) in CTA is crucial for the precise quantification of anatomical features. However, existing works focus only on TL and FL, without considering FLT. In this paper, we propose ImageTBAD, the first 3D computed tomography angiography (CTA) image dataset of TBAD with annotations of TL, FL, and FLT. The proposed dataset contains 100 TBAD CTA images, which is of decent size compared with existing medical imaging datasets. As FLT can appear almost anywhere along the aorta with irregular shapes, segmentation of FLT presents a wide class of segmentation problems where targets exist in a variety of positions with irregular shapes. We further propose a baseline method for automatic segmentation of TBAD. Results show that the baseline method can achieve results comparable with existing works on aorta and TL segmentation. However, the segmentation accuracy of FLT is only 52%, which leaves large room for improvement and also shows the challenge of our dataset. To facilitate further research on this challenging problem, our dataset and code are released to the public.

【5】 Uncertainty Quantified Deep Learning for Predicting Dice Coefficient of Digital Histopathology Image Segmentation
Link: https://arxiv.org/abs/2109.00115

Authors: Sambuddha Ghosal, Audrey Xie, Pratik Shah
Affiliations: Massachusetts Institute of Technology, Program in Media Arts and Sciences and Media Laboratory, Ames Street, Cambridge, MA, United States
Note: Submitted to the 2022 IEEE International Symposium on Biomedical Imaging (ISBI) scientific conference
Abstract: Deep learning models (DLMs) can achieve state-of-the-art performance in medical image segmentation and classification tasks. However, DLMs that do not provide feedback for their predictions, such as Dice coefficients (Dice), have limited deployment potential in real-world clinical settings. Uncertainty estimates can increase the trust in these automated systems by identifying predictions that need further review but remain computationally prohibitive to deploy. In this study, we use a DLM with randomly initialized weights and Monte Carlo dropout (MCD) to segment tumors from microscopic Hematoxylin and Eosin (H&E) dye-stained prostate core biopsy RGB images. We devise a novel approach that uses multiple clinical-region-based uncertainties from a single image (instead of the entire image) to predict the Dice of the DLM output via linear models. Image-level uncertainty maps were generated and showed correspondence between imperfect model segmentation and high levels of uncertainty associated with specific prostate tissue regions, with or without tumors. Results from this study suggest that linear models can learn coefficients of uncertainty-quantified deep learning and correlations (Spearman's correlation, p<0.05) to predict Dice scores of specific regions of medical images.
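The region-based recipe (MC-dropout samples, per-region uncertainty features, linear Dice predictor) can be mimicked end-to-end with random data. Everything below (the simulated dropout, the quadrant region layout, and the linear weights) is hypothetical, standing in for a trained segmentation model and a fitted regression:

```python
import numpy as np

rng = np.random.default_rng(1)

def mc_dropout_passes(logit_map, t=20, p=0.3):
    """Simulate T stochastic forward passes by randomly dropping activations,
    yielding T probability maps (a stand-in for MC dropout in a trained DLM)."""
    keep = rng.random((t, *logit_map.shape)) > p
    return 1.0 / (1.0 + np.exp(-logit_map * keep / (1.0 - p)))

probs = mc_dropout_passes(rng.normal(size=(16, 16)))
uncertainty = probs.std(axis=0)          # per-pixel predictive uncertainty

# Region-based summaries (quadrants here) become features of a linear Dice predictor.
quads = [uncertainty[:8, :8], uncertainty[:8, 8:], uncertainty[8:, :8], uncertainty[8:, 8:]]
features = np.array([q.mean() for q in quads])
coeffs = np.array([-0.5, -0.3, -0.2, -0.4])   # hypothetical learned weights
predicted_dice = 1.0 + features @ coeffs      # more uncertainty -> lower predicted Dice
print(uncertainty.shape)
```

In the paper the regions are clinically meaningful tissue areas rather than quadrants, and the linear coefficients are fit against ground-truth Dice scores.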

Semi-/Weakly-/Un-supervised | Active Learning | Uncertainty (2 papers)

【1】 EventPoint: Self-Supervised Local Descriptor Learning for Event Cameras
Link: https://arxiv.org/abs/2109.00210

Authors: Ze Huang, Songzhi Su, Henry Zhang, Kevin Sun
Abstract: We propose a method for extracting interest points and descriptors using self-supervised learning on frame-based event data, which is called EventPoint. Different from other feature extraction methods on event data, we train our model on a real event-format driving dataset, DSEC, with the proposed self-supervised learning method; the training process fully considers the characteristics of event data. To verify the effectiveness of our work, we conducted several complete evaluations: we emulated DART and carried out feature matching experiments on the N-Caltech101 dataset, and the results show that EventPoint performs better than DART; we used the Vid2e tool provided by UZH to convert Oxford RobotCar data into an event-based format and, combined with the provided INS information, carried out the global pose estimation experiment which is important in SLAM. As far as we know, this is the first work to carry out this challenging task. Sufficient experimental data show that EventPoint can obtain better results while running in real time on a CPU.

【2】 Spatio-temporal Self-Supervised Representation Learning for 3D Point Clouds
Link: https://arxiv.org/abs/2109.00179

Authors: Siyuan Huang, Yichen Xie, Song-Chun Zhu, Yixin Zhu
Affiliations: University of California, Los Angeles; Shanghai Jiao Tong University; Beijing Institute for General Artificial Intelligence; Peking University; Tsinghua University
Note: ICCV 2021
Abstract: To date, various 3D scene understanding tasks still lack practical and generalizable pre-trained models, primarily due to the intricate nature of 3D scene understanding tasks and the immense variations introduced by camera views, lighting, occlusions, etc. In this paper, we tackle this challenge by introducing a spatio-temporal representation learning (STRL) framework, capable of learning from unlabeled 3D point clouds in a self-supervised fashion. Inspired by how infants learn from visual data in the wild, we explore the rich spatio-temporal cues derived from the 3D data. Specifically, STRL takes two temporally-correlated frames from a 3D point cloud sequence as the input, transforms them with spatial data augmentation, and learns the invariant representation self-supervisedly. To corroborate the efficacy of STRL, we conduct extensive experiments on three types (synthetic, indoor, and outdoor) of datasets. Experimental results demonstrate that, compared with supervised learning methods, the learned self-supervised representation facilitates various models to attain comparable or even better performance while generalizing pre-trained models to downstream tasks, including 3D shape classification, 3D object detection, and 3D semantic segmentation. Moreover, the spatio-temporal contextual cues embedded in 3D point clouds significantly improve the learned representations.

Temporal | Action Recognition | Pose | Video | Motion Estimation (4 papers)

【1】 Memory Based Video Scene Parsing
Link: https://arxiv.org/abs/2109.00373

Authors: Zhenchao Jin, Dongdong Yu, Kai Su, Zehuan Yuan, Changhu Wang
Affiliations: University of Science and Technology of China; ByteDance
Note: Technical report for "The 1st Video Scene Parsing in the Wild Challenge Workshop". arXiv admin note: text overlap with arXiv:2108.11819
Abstract: Video scene parsing is a long-standing challenging task in computer vision, aiming to assign pre-defined semantic labels to the pixels of all frames in a given video. Compared with image semantic segmentation, this task pays more attention to studying how to adopt temporal information to obtain higher predictive accuracy. In this report, we introduce our solution for the 1st Video Scene Parsing in the Wild Challenge, which achieved a mIoU of 57.44 and obtained 2nd place (our team name is CharlesBLWX).

【2】 Category-Level Metric Scale Object Shape and Pose Estimation
Link: https://arxiv.org/abs/2109.00326

Authors: Taeyeop Lee, Byeong-Uk Lee, Myungchul Kim, In So Kweon
Note: IEEE Robotics and Automation Letters (RA-L). Preprint version. Accepted August 2021.
Abstract: Advances in deep learning recognition have led to accurate object detection with 2D images. However, these 2D perception methods are insufficient for complete 3D world information. Concurrently, advanced 3D shape estimation approaches focus on the shape itself, without considering metric scale. These methods cannot determine the accurate location and orientation of objects. To tackle this problem, we propose a framework that jointly estimates a metric scale shape and pose from a single RGB image. Our framework has two branches: the Metric Scale Object Shape branch (MSOS) and the Normalized Object Coordinate Space branch (NOCS). The MSOS branch estimates the metric scale shape observed in the camera coordinates. The NOCS branch predicts the normalized object coordinate space (NOCS) map and performs a similarity transformation with the depth map rendered from a predicted metric scale mesh to obtain 6D pose and size. Additionally, we introduce the Normalized Object Center Estimation (NOCE) to estimate the geometrically aligned distance from the camera to the object center. We validated our method on both synthetic and real-world datasets to evaluate category-level object pose and shape.
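The similarity transformation between a NOCS map and a rendered depth map amounts to estimating scale, rotation, and translation between two corresponded point sets. A sketch using Umeyama's closed-form alignment, a common choice for this step (the paper's exact solver is not stated):

```python
import numpy as np

def umeyama_similarity(src, dst):
    """Estimate scale s, rotation R, translation t with dst ~ s * R @ src + t
    (Umeyama's closed-form solution for corresponded 3xN point sets)."""
    mu_s, mu_d = src.mean(1, keepdims=True), dst.mean(1, keepdims=True)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd @ xs.T / src.shape[1]
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1                                  # keep a proper rotation
    R = U @ S @ Vt
    s = np.trace(np.diag(D) @ S) / (xs ** 2).sum() * src.shape[1]
    t = mu_d - s * R @ mu_s
    return s, R, t

rng = np.random.default_rng(0)
pts = rng.normal(size=(3, 50))                        # canonical (NOCS-like) points
angle = 0.3
R_true = np.array([[np.cos(angle), -np.sin(angle), 0],
                   [np.sin(angle),  np.cos(angle), 0],
                   [0, 0, 1]])
obs = 2.0 * R_true @ pts + np.array([[1.0], [0.5], [-2.0]])  # observed points
s, R, t = umeyama_similarity(pts, obs)
print(round(s, 6))  # → 2.0
```

On noise-free correspondences the similarity is recovered exactly; with predicted NOCS maps and rendered depth, the same solver is typically wrapped in an outlier-robust scheme such as RANSAC.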

【3】 Spatio-Temporal Perturbations for Video Attribution
Link: https://arxiv.org/abs/2109.00222

Authors: Zhenqiang Li, Weimin Wang, Zuoyue Li, Yifei Huang, Yoichi Sato
Affiliations: DUT-RU International School of Information Science and Engineering, Dalian University of Technology
Abstract: The attribution method provides a direction for interpreting opaque neural networks in a visual way, by identifying and visualizing the input regions/pixels that dominate the output of a network. Attribution for visually explaining video understanding networks is challenging because of the unique spatiotemporal dependencies existing in video inputs and the special 3D convolutional or recurrent structures of video understanding networks. However, most existing attribution methods focus on explaining networks that take a single image as input, and the few works specifically devised for video attribution fall short of dealing with the diversified structures of video understanding networks. In this paper, we investigate a generic perturbation-based attribution method that is compatible with diversified video understanding networks. Besides, we propose a novel regularization term to enhance the method by constraining the smoothness of its attribution results in both spatial and temporal dimensions. In order to assess the effectiveness of different video attribution methods without relying on manual judgement, we introduce reliable objective metrics, which are checked by a newly proposed reliability measurement. We verified the effectiveness of our method by both subjective and objective evaluation and comparison with multiple significant attribution methods.

【4】 An Integrated Framework for the Heterogeneous Spatio-Spectral-Temporal Fusion of Remote Sensing Images
Link: https://arxiv.org/abs/2109.00400

Authors: Menghui Jiang, Huanfeng Shen, Jie Li, Liangpei Zhang
Affiliations: School of Resource and Environmental Science, Wuhan University, P. R. China; School of Geodesy and Geomatics, Wuhan University, P. R. China; Collaborative Innovation Center of Geospatial Technology, Wuhan University, P. R. China
Abstract: Image fusion technology is widely used to fuse the complementary information between multi-source remote sensing images. Inspired by the frontier of deep learning, this paper first proposes a heterogeneous-integrated framework based on a novel deep residual cycle GAN. The proposed network consists of a forward fusion part and a backward degeneration feedback part. The forward part generates the desired fusion result from the various observations; the backward degeneration feedback part considers the imaging degradation process and regenerates the observations inversely from the fusion result. The proposed network can effectively fuse not only homogeneous but also heterogeneous information. In addition, for the first time, a heterogeneous-integrated fusion framework is proposed to simultaneously merge the complementary heterogeneous spatial, spectral, and temporal information of multi-source heterogeneous observations. The proposed heterogeneous-integrated framework also provides a uniform mode that can complete various fusion tasks, including heterogeneous spatio-spectral fusion, spatio-temporal fusion, and heterogeneous spatio-spectral-temporal fusion. Experiments are conducted for two challenging scenarios of land cover changes and thick cloud coverage. Images from many remote sensing satellites, including MODIS, Landsat-8, Sentinel-1, and Sentinel-2, are utilized in the experiments. Both qualitative and quantitative evaluations confirm the effectiveness of the proposed method.

Medical (1 paper)

【1】 The University of California San Francisco Preoperative Diffuse Glioma (UCSF-PDGM) MRI Dataset
Link: https://arxiv.org/abs/2109.00356

Authors: Evan Calabrese, Javier Villanueva-Meyer, Jeffrey Rudie, Andreas Rauschecker, Ujjwal Baid, Spyridon Bakas, John Mongan, Christopher Hess, Soonmee Cha
Affiliations: University of California San Francisco, Department of Radiology & Biomedical Imaging; University of Pennsylvania, Center for Biomedical Image Computing and Analytics (CBICA)
Note: 6 pages, 1 figure, 2 tables
Abstract: Here we present the University of California San Francisco Preoperative Diffuse Glioma MRI (UCSF-PDGM) dataset. The UCSF-PDGM dataset includes 500 subjects with histopathologically-proven diffuse gliomas who were imaged with a standardized 3 Tesla preoperative brain tumor MRI protocol featuring predominantly 3D imaging, as well as advanced diffusion and perfusion imaging techniques. The dataset also includes isocitrate dehydrogenase (IDH) mutation status for all cases and O6-methylguanine-DNA methyltransferase (MGMT) promoter methylation status for World Health Organization (WHO) grade III and IV gliomas. The UCSF-PDGM has been made publicly available in the hope that researchers around the world will use these data to continue to push the boundaries of AI applications for diffuse gliomas.

GAN | Adversarial | Attack | Generation (4 papers)

【1】 Memory-Free Generative Replay For Class-Incremental Learning
Link: https://arxiv.org/abs/2109.00328

作者:Xiaomeng Xin,Yiran Zhong,Yunzhong Hou,Jinjun Wang,Liang Zheng 机构:† Xi’an Jiaotong University, ‡Australian National University 摘要:基于正则化的方法有助于缓解课堂增量学习中的灾难性遗忘问题。由于没有旧的任务图像,他们通常认为,如果分类器在新图像上产生类似的输出,则旧知识会得到很好的保留。在本文中,我们发现它们的有效性在很大程度上取决于旧类的性质:它们在容易区分的类上工作得很好,但在更细粒度的类上可能会失败,例如男孩和女孩。实际上,这些方法将新数据投射到完全连接层中的权重向量所跨越的特征空间中,对应于旧类。在细粒度的旧类上得到的预测结果是相似的,因此新分类器将逐渐失去对这些类的辨别能力。为了解决这个问题,我们提出了一种无内存生成重放策略,通过直接从旧分类器生成具有代表性的旧图像,并结合新数据进行新分类器训练,来保留细粒度的旧类特征。为了解决生成样本的均匀化问题,我们还提出了一种使生成样本之间的Kullback-Leibler(KL)散度最大化的分集损失。我们的方法是最好的补充先验正则化为基础的方法被证明是有效的易于区分的旧类。我们在CUB-200-2011、加州理工学院-101、CIFAR-100和Tiny ImageNet上验证了上述设计和见解,并表明我们的策略优于现有的无内存方法,具有明显的优势。代码可在https://github.com/xmengxin/MFGR 摘要:Regularization-based methods are beneficial to alleviate the catastrophic forgetting problem in class-incremental learning. With the absence of old task images, they often assume that old knowledge is well preserved if the classifier produces similar output on new images. In this paper, we find that their effectiveness largely depends on the nature of old classes: they work well on classes that are easily distinguishable between each other but may fail on more fine-grained ones, e.g., boy and girl. In spirit, such methods project new data onto the feature space spanned by the weight vectors in the fully connected layer, corresponding to old classes. The resulting projections would be similar on fine-grained old classes, and as a consequence the new classifier will gradually lose the discriminative ability on these classes. To address this issue, we propose a memory-free generative replay strategy to preserve the fine-grained old classes characteristics by generating representative old images directly from the old classifier and combined with new data for new classifier training. To solve the homogenization problem of the generated samples, we also propose a diversity loss that maximizes Kullback Leibler (KL) divergence between generated samples. 
Our method is best complemented by prior regularization-based methods proved to be effective for easily distinguishable old classes. We validate the above design and insights on CUB-200-2011, Caltech-101, CIFAR-100 and Tiny ImageNet and show that our strategy outperforms existing memory-free methods with a clear margin. Code is available at https://github.com/xmengxin/MFGR
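摘要中的多样性损失通过最大化生成样本间的KL散度来抑制同质化。下面给出一个纯Python的假设性示意(对softmax输出计算成对KL;函数形式与采样方式均为示意,并非论文原实现;训练中最小化该损失即等价于最大化成对散度):

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl(p, q, eps=1e-12):
    # KL(p || q) between two discrete distributions
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def diversity_loss(batch_logits):
    """Negative mean pairwise KL over a batch: minimizing this value
    maximizes divergence between generated samples (anti-homogenization)."""
    probs = [softmax(l) for l in batch_logits]
    n = len(probs)
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(n):
            if i != j:
                total += kl(probs[i], probs[j])
                pairs += 1
    return -total / pairs

# identical samples give zero divergence; distinct samples give a lower loss
loss_same = diversity_loss([[1.0, 0.0], [1.0, 0.0]])
loss_diverse = diversity_loss([[4.0, 0.0], [0.0, 4.0]])
```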

【2】 Diverse Sample Generation: Pushing the Limit of Data-free Quantization 标题:多样化样本生成:突破无数据量化的极限 链接:https://arxiv.org/abs/2109.00212

作者:Haotong Qin,Yifu Ding,Xiangguo Zhang,Aoyu Li,Jiakai Wang,Xianglong Liu,Jiwen Lu 机构:Haotong Qin is with the State Key Laboratory of Software Development Environment and Shen Yuan Honors College, Beihang University; Yifu Ding and Xianglong Liu are with the State Key Laboratory of Software Development Environment 摘要:最近,生成式无数据量化作为一种实用方法出现,它在不访问真实数据的情况下将神经网络压缩到较低的比特宽度。它利用全精度网络的批量归一化(BN)统计信息生成数据以量化网络。然而,我们的研究表明,在实际应用中,完全受BN统计量约束的合成数据在分布和样本层面遭受严重的同质化,这导致量化网络的精度严重下降。本文提出了一种通用的多样化样本生成(DSG)方案,用于生成式无数据训练后量化和量化感知训练,以缓解有害的同质化。在我们的DSG中,我们首先放松BN层中特征的统计对齐,以松弛分布约束。然后,我们加强特定BN层对不同样本的损失影响,并在生成过程中抑制样本之间的相关性,分别从统计和空间角度使样本多样化。大量实验表明,对于大规模图像分类任务,我们的DSG可以在各种神经结构上始终优于现有的无数据量化方法,特别是在超低比特宽度下(例如,W4A4设置下22%的增益)。此外,我们的DSG带来的数据多样化在各种量化方法中带来了普遍增益,证明了多样性是用于无数据量化的高质量合成数据的一个重要特性。 摘要:Recently, generative data-free quantization emerges as a practical approach that compresses the neural network to low bit-width without access to real data. It generates data to quantize the network by utilizing the batch normalization (BN) statistics of its full-precision counterpart. However, our study shows that in practice, the synthetic data completely constrained by BN statistics suffers severe homogenization at distribution and sample level, which causes serious accuracy degradation of the quantized network. This paper presents a generic Diverse Sample Generation (DSG) scheme for the generative data-free post-training quantization and quantization-aware training, to mitigate the detrimental homogenization. In our DSG, we first slack the statistics alignment for features in the BN layer to relax the distribution constraint. Then we strengthen the loss impact of the specific BN layer for different samples and inhibit the correlation among samples in the generation process, to diversify samples from the statistical and spatial perspective, respectively.
Extensive experiments show that for large-scale image classification tasks, our DSG can consistently outperform existing data-free quantization methods on various neural architectures, especially under ultra-low bit-width (e.g., 22% gain under W4A4 setting). Moreover, data diversifying caused by our DSG brings a general gain in various quantization methods, demonstrating diversity is an important property of high-quality synthetic data for data-free quantization.
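文中"放松BN层统计对齐"的一种可能实现,是在对齐损失中加入松弛边界(margin),使落在边界内的偏差不受惩罚。以下为纯Python示意(margin 形式与超参数均为假设,并非论文原式):

```python
def slack_alignment_loss(feat_stats, bn_stats, delta=0.1):
    """Relaxed BN-statistics alignment: per-channel (mean, var) deviations
    inside a slack margin `delta` are not penalized, loosening the strict
    distribution constraint that drives sample homogenization."""
    loss = 0.0
    for (fm, fv), (bm, bv) in zip(feat_stats, bn_stats):
        loss += max(abs(fm - bm) - delta, 0.0) ** 2  # mean term with slack
        loss += max(abs(fv - bv) - delta, 0.0) ** 2  # variance term with slack
    return loss
```

当生成特征的统计量落入 BN 统计量附近的松弛区间时,损失为零,从而允许样本分布有一定自由度。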

【3】 Eyes Tell All: Irregular Pupil Shapes Reveal GAN-generated Faces 标题:眼睛说明一切:不规则的瞳孔形状揭示GAN生成的人脸 链接:https://arxiv.org/abs/2109.00162

作者:Hui Guo,Shu Hu,Xin Wang,Ming-Ching Chang,Siwei Lyu 机构:University at Albany, SUNY, USA., University at Buffalo, SUNY, USA., Keya Medical, Seattle, USA. 摘要:生成对抗网络(generative adversarial network,GAN)生成的高真实感人脸已被用作虚假社交媒体账户的头像,在视觉上很难与真实人脸区分。在这项工作中,我们证明了GAN生成的人脸可以通过不规则的瞳孔形状暴露。这种现象是由GAN模型中缺乏生理约束造成的。我们证明这种伪影广泛存在于高质量GAN生成的人脸中,并进一步描述了一种自动方法:从双眼提取瞳孔并分析其形状,以暴露GAN生成的人脸。定性和定量评估表明,该方法在区分GAN生成人脸时简单有效。 摘要:Generative adversarial network (GAN) generated high-realistic human faces have been used as profile images for fake social media accounts and are visually challenging to discern from real ones. In this work, we show that GAN-generated faces can be exposed via irregular pupil shapes. This phenomenon is caused by the lack of physiological constraints in the GAN models. We demonstrate that such artifacts exist widely in high-quality GAN-generated faces and further describe an automatic method to extract the pupils from two eyes and analyze their shapes for exposing the GAN-generated faces. Qualitative and quantitative evaluations of our method suggest its simplicity and effectiveness in distinguishing GAN-generated faces.
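论文通过分析瞳孔形状的不规则性来暴露GAN人脸。下面给出一个假设性的简化评分作为直观说明:用轮廓点到质心距离的变异系数衡量偏离圆形的程度(这只是示意性度量,并非论文采用的具体形状分析方法):

```python
import math

def pupil_irregularity(boundary):
    """Illustrative irregularity score: coefficient of variation of the
    distances from boundary points to their centroid. Close to 0 for a
    circular pupil, larger for irregular (GAN-artifact-like) contours."""
    n = len(boundary)
    cx = sum(x for x, _ in boundary) / n
    cy = sum(y for _, y in boundary) / n
    dists = [math.hypot(x - cx, y - cy) for x, y in boundary]
    mean = sum(dists) / n
    var = sum((d - mean) ** 2 for d in dists) / n
    return math.sqrt(var) / mean

# a circular contour vs. a contour with a 3-lobe radial distortion
thetas = [t / 64 * 2 * math.pi for t in range(64)]
circle = [(math.cos(t), math.sin(t)) for t in thetas]
wobbly = [((1 + 0.3 * math.sin(3 * t)) * math.cos(t),
           (1 + 0.3 * math.sin(3 * t)) * math.sin(t)) for t in thetas]
```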

【4】 DPA: Learning Robust Physical Adversarial Camouflages for Object Detectors 标题:DPA:学习对象检测器的健壮物理对抗伪装 链接:https://arxiv.org/abs/2109.00124

作者:Yexin Duan,Jialin Chen,Xingyu Zhou,Junhua Zou,Zhengyun He,Wu Zhang,Zhisong Pan 机构:Army Engineering University of PLA, Nanjing, China, The ,th Research Institute of China Electronics Technology Group Corporation, Nanjing, China 备注:14 pages, 12 figures 摘要:针对目标检测的对抗性攻击在现实世界中是可行的。然而,以前的大多数工作都试图学习应用于物体的“贴片”来愚弄检测器,而这些贴片在倾斜视角下会变得不那么有效甚至无效。为了解决这个问题,我们提出了密集建议攻击(Dense Proposals Attack,DPA),为检测器学习鲁棒的、物理的、有针对性的对抗伪装。这些伪装具有鲁棒性,因为在任意视点和不同照明条件下拍摄时它们仍然具有对抗性;具有物理性,因为它们在3D虚拟场景和真实世界中都能很好地发挥作用;具有针对性,因为它们会导致检测器将某个物体误识别为特定的目标类别。为了使生成的伪装在物理世界中具有鲁棒性,我们引入视点移动、照明和其他自然变换的组合来模拟物理现象。此外,为了改进攻击,DPA会攻击固定区域建议(region proposals)中的所有分类。此外,我们还使用Unity仿真引擎构建了一个虚拟3D场景,以公平且可复现地评估不同的物理攻击。 摘要:Adversarial attacks are feasible in the real world for object detection. However, most of the previous works have tried to learn "patches" applied to an object to fool detectors, which become less effective or even ineffective in squint view angles. To address this issue, we propose the Dense Proposals Attack (DPA) to learn robust, physical and targeted adversarial camouflages for detectors. The camouflages are robust because they remain adversarial when filmed under arbitrary viewpoint and different illumination conditions, physical because they function well both in the 3D virtual scene and the real world, and targeted because they can cause detectors to misidentify an object as a specific target class. In order to make the generated camouflages robust in the physical world, we introduce a combination of viewpoint shifts, lighting and other natural transformations to model the physical phenomena. In addition, to improve the attacks, DPA substantially attacks all the classifications in the fixed region proposals. Moreover, we build a virtual 3D scene using the Unity simulation engine to fairly and reproducibly evaluate different physical attacks.
Extensive experiments demonstrate that DPA outperforms the state-of-the-art methods significantly, and generalizes well to the real world, posing a potential threat to the security-critical computer vision systems.

人脸|人群计数(2篇)

【1】 Sparse to Dense Motion Transfer for Face Image Animation 标题:面向人脸图像动画的稀疏到密集运动传递 链接:https://arxiv.org/abs/2109.00471

作者:Ruiqi Zhao,Tianyi Wu,Guodong Guo 机构:Institute of Deep Learning, Baidu Research, Beijing, China, National Engineering Laboratory for Deep Learning Technology and Application, Beijing, China 备注:Accepted by ICCV 2021 Advances in Image Manipulation Workshop 摘要:基于单幅图像的人脸图像动画已经取得了显著进展。然而,当只有稀疏的地标可用作驱动信号时,这一任务仍然具有挑战性。给定一幅源人脸图像和一系列稀疏的人脸地标,我们的目标是生成一个模仿地标运动的人脸视频。我们提出了一种高效且有效的从稀疏地标到人脸图像的运动传递方法,并在一个统一的模型中结合全局和局部运动估计来忠实地传递运动。该模型可以学习从背景中分割运动前景,不仅能生成全局运动(如人脸的旋转和平移),还能生成细微的局部运动(如视线变化)。我们进一步改进了视频中的人脸地标检测。利用时间上对齐更好的地标序列进行训练,我们的方法可以生成时间上连贯、视觉质量更高的视频。实验表明,我们在同身份测试中取得了与最先进的图像驱动方法相当的结果,在跨身份测试中取得了更好的结果。 摘要:Face image animation from a single image has achieved remarkable progress. However, it remains challenging when only sparse landmarks are available as the driving signal. Given a source face image and a sequence of sparse face landmarks, our goal is to generate a video of the face imitating the motion of landmarks. We develop an efficient and effective method for motion transfer from sparse landmarks to the face image. We then combine global and local motion estimation in a unified model to faithfully transfer the motion. The model can learn to segment the moving foreground from the background and generate not only global motion, such as rotation and translation of the face, but also subtle local motion such as the gaze change. We further improve face landmark detection on videos. With temporally better aligned landmark sequences for training, our method can generate temporally coherent videos with higher visual quality. Experiments suggest we achieve results comparable to the state-of-the-art image driven method on the same identity testing and better results on cross identity testing.

【2】 Bio-inspired robot perception coupled with robot-modeled human perception 标题:仿生机器人感知与机器人建模的人类感知耦合 链接:https://arxiv.org/abs/2109.00097

作者:Tobias Fischer 备注:Paper accepted to the "Robotics: Science and Systems Pioneers Workshop 2021" 摘要:我的首要研究目标是为机器人提供感知能力,使其能够以类似人类的方式与人类互动。为了发展这些感知能力,我相信研究人类视觉系统的原理是有用的。我使用这些原理来开发新的计算机视觉算法,并在智能机器人系统中验证它们的有效性。我对这种方法很感兴趣,因为它提供了双重好处:揭示人类视觉系统固有的原理,以及将这些原理应用于其人工对应物。图1描述了我的研究。 摘要:My overarching research goal is to provide robots with perceptional abilities that allow interactions with humans in a human-like manner. To develop these perceptional abilities, I believe that it is useful to study the principles of the human visual system. I use these principles to develop new computer vision algorithms and validate their effectiveness in intelligent robotic systems. I am enthusiastic about this approach as it offers the dual benefit of uncovering principles inherent in the human visual system, as well as applying these principles to its artificial counterpart. Fig. 1 contains a depiction of my research.

图像视频检索|Re-id相关(1篇)

【1】 Efficient Person Search: An Anchor-Free Approach 标题:高效人物搜索:一种无锚方法 链接:https://arxiv.org/abs/2109.00211

作者:Yichao Yan,Jinpeng Li,Jie Qin,Shengcai Liao,Xiaokang Yang 机构:Yang, Fellow, IEEE 备注:arXiv admin note: substantial text overlap with arXiv:2103.11617 摘要:人物搜索的目的是从真实、未裁剪的图像中同时定位和识别查询人物。为了实现这一目标,最先进的模型通常会在两阶段检测器(如Faster R-CNN)上添加一个re-id分支。由于ROI-Align操作将re-id特征与相应的对象区域显式对齐,该管道获得了很好的精度,但同时,由于密集的对象锚,它引入了很高的计算开销。在这项工作中,我们通过引入以下专门设计,提出了一种无锚方法来高效地解决这一具有挑战性的任务。首先,我们选择一个无锚检测器(即FCOS)作为我们框架的原型。由于没有密集的对象锚,与现有的人物搜索模型相比,它的效率显著更高。其次,直接将这种无锚检测器用于人物搜索时,在学习鲁棒的re-id特征方面存在几个主要挑战,我们将其总结为不同层面(即尺度、区域和任务)的失配问题。为了解决这些问题,我们提出了一个对齐特征聚合模块,以生成更具判别力和鲁棒性的特征嵌入。相应地,我们将我们的模型命名为特征对齐的人物搜索网络(AlignPS)。第三,通过研究基于锚和无锚模型各自的优点,我们进一步用一个ROI-Align头增强AlignPS,这显著提高了re-id特征的鲁棒性,同时仍保持模型的高效性。在两个具有挑战性的基准(即CUHK-SYSU和PRW)上进行的大量实验表明,我们的框架实现了最先进或有竞争力的性能,同时显示出更高的效率。所有源代码、数据和训练模型均可从以下网址获得:https://github.com/daodaofr/alignps. 摘要:Person search aims to simultaneously localize and identify a query person from realistic, uncropped images. To achieve this goal, state-of-the-art models typically add a re-id branch upon two-stage detectors like Faster R-CNN. Owing to the ROI-Align operation, this pipeline yields promising accuracy as re-id features are explicitly aligned with the corresponding object regions, but in the meantime, it introduces high computational overhead due to dense object anchors. In this work, we present an anchor-free approach to efficiently tackling this challenging task, by introducing the following dedicated designs. First, we select an anchor-free detector (i.e., FCOS) as the prototype of our framework. Due to the lack of dense object anchors, it exhibits significantly higher efficiency compared with existing person search models. Second, when directly accommodating this anchor-free detector for person search, there exist several major challenges in learning robust re-id features, which we summarize as the misalignment issues in different levels (i.e., scale, region, and task).
To address these issues, we propose an aligned feature aggregation module to generate more discriminative and robust feature embeddings. Accordingly, we name our model as Feature-Aligned Person Search Network (AlignPS). Third, by investigating the advantages of both anchor-based and anchor-free models, we further augment AlignPS with an ROI-Align head, which significantly improves the robustness of re-id features while still keeping our model highly efficient. Extensive experiments conducted on two challenging benchmarks (i.e., CUHK-SYSU and PRW) demonstrate that our framework achieves state-of-the-art or competitive performance, while displaying higher efficiency. All the source codes, data, and trained models are available at: https://github.com/daodaofr/alignps.

点云|SLAM|雷达|激光|深度RGBD相关(3篇)

【1】 Point Cloud Pre-training by Mixing and Disentangling 标题:基于混合解缠的点云预训练 链接:https://arxiv.org/abs/2109.00452

作者:Chao Sun,Zhedong Zheng,Yi Yang 机构:Chao Sun is with School of Computer Science, Zhejiang University 摘要:大规模点云的标注仍然非常耗时,并且在许多实际任务中不可得。点云预训练是获得可扩展、可快速适应模型的一种潜在解决方案。因此,在本文中,我们研究了一种新的自监督学习方法,称为混合与解缠(Mixing and Disentangling,MD),用于点云预训练。顾名思义,我们探索如何将原始点云从混合点云中分离出来,并利用这一具有挑战性的任务作为模型训练的前置任务(pretext)优化目标。考虑到原始数据集中有限的训练数据远少于主流的ImageNet,混合过程可以高效地生成更多高质量的样本。我们构建了一个基线网络来验证我们的直觉,它只包含编码器和解码器两个模块。给定一个混合点云,首先预训练编码器以提取语义嵌入,然后利用一个实例自适应解码器根据嵌入对点云进行解缠。尽管简单,编码器在训练后天然能够捕获点云关键点,并且可以通过预训练-微调范式快速适应包括分类和分割在内的下游任务。在两个数据集上的大量实验表明,编码器+我们的方法(MD)显著优于从头训练的编码器,且收敛迅速。在消融研究中,我们进一步研究了每个组件的影响,并讨论了所提自监督学习策略的优势。我们希望这一在点云上的自监督学习尝试能够为减少深度学习模型对大规模标注数据的依赖、节省大量标注成本铺平道路。 摘要:The annotation for large-scale point clouds is still time-consuming and unavailable for many real-world tasks. Point cloud pre-training is one potential solution for obtaining a scalable model for fast adaptation. Therefore, in this paper, we investigate a new self-supervised learning approach, called Mixing and Disentangling (MD), for point cloud pre-training. As the name implies, we explore how to separate the original point cloud from the mixed point cloud, and leverage this challenging task as a pretext optimization objective for model training. Considering the limited training data in the original dataset, which is much less than prevailing ImageNet, the mixing process can efficiently generate more high-quality samples. We build one baseline network to verify our intuition, which simply contains two modules, encoder and decoder. Given a mixed point cloud, the encoder is first pre-trained to extract the semantic embedding. Then an instance-adaptive decoder is harnessed to disentangle the point clouds according to the embedding. Albeit simple, the encoder is inherently able to capture the point cloud keypoints after training and can be fast adapted to downstream tasks including classification and segmentation by the pre-training and fine-tuning paradigm.
Extensive experiments on two datasets show that the encoder + ours (MD) significantly surpasses that of the encoder trained from scratch and converges quickly. In ablation studies, we further study the effect of each component and discuss the advantages of the proposed self-supervised learning strategy. We hope this self-supervised learning attempt on point clouds can pave the way for reducing the deeply-learned model dependence on large-scale labeled data and saving a lot of annotation costs in the future.
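混合-解缠的前置任务数据构造可以粗略示意如下(纯Python;论文中的解缠由实例自适应解码器根据嵌入完成,此处用显式的来源标签仅作直观说明):

```python
import random

def mix_point_clouds(pc_a, pc_b, seed=0):
    """Build a Mixing-and-Disentangling style pretext sample: merge two
    point clouds and keep per-point provenance labels (0 = from pc_a,
    1 = from pc_b) that a disentangling model must recover."""
    tagged = [(p, 0) for p in pc_a] + [(p, 1) for p in pc_b]
    random.Random(seed).shuffle(tagged)  # destroy ordering cues
    points = [p for p, _ in tagged]
    labels = [l for _, l in tagged]
    return points, labels

points, labels = mix_point_clouds([(0.0, 0.0, 0.0)] * 4, [(1.0, 1.0, 1.0)] * 6)
```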

【2】 You Only Hypothesize Once: Point Cloud Registration with Rotation-equivariant Descriptors 标题:您只需假设一次:使用旋转等变描述符的点云配准 链接:https://arxiv.org/abs/2109.00182

作者:Haiping Wang,Yuan Liu,Zhen Dong,Wenping Wang,Bisheng Yang 机构:Wuhan University, The University of Hong Kong, Texas A&M University 备注:18 pages, 14 figures, 11 tables, Project page: this https URL 摘要:在本文中,我们提出了一种新的基于局部描述子的框架,称为You Only Hypothesize Once(YOHO),用于两个未对齐点云的配准。与大多数现有局部描述子依赖脆弱的局部参考框架来获得旋转不变性不同,所提出的描述子通过最近的群等变特征学习技术实现旋转不变性,从而对点密度和噪声具有更强的鲁棒性。同时,YOHO中的描述子还有一个旋转等变部分,这使我们能够仅从一个对应假设估计配准。这一性质缩小了可行变换的搜索空间,从而大大提高了YOHO的精度和效率。大量实验表明,YOHO在四个广泛使用的数据集(3DMatch/3DLoMatch数据集、ETH数据集和WHU-TLS数据集)上以更少的RANSAC迭代次数实现了优异的性能。更多详细信息请参见我们的项目页面:https://hpwang-whu.github.io/YOHO/. 摘要:In this paper, we propose a novel local descriptor-based framework, called You Only Hypothesize Once (YOHO), for the registration of two unaligned point clouds. In contrast to most existing local descriptors which rely on a fragile local reference frame to gain rotation invariance, the proposed descriptor achieves the rotation invariance by recent technologies of group equivariant feature learning, which brings more robustness to point density and noise. Meanwhile, the descriptor in YOHO also has a rotation equivariant part, which enables us to estimate the registration from just one correspondence hypothesis. Such property reduces the searching space for feasible transformations, thus greatly improves both the accuracy and the efficiency of YOHO. Extensive experiments show that YOHO achieves superior performances with much fewer needed RANSAC iterations on four widely-used datasets, the 3DMatch/3DLoMatch datasets, the ETH dataset and the WHU-TLS dataset. More details are shown in our project page: https://hpwang-whu.github.io/YOHO/.

【3】 CPFN: Cascaded Primitive Fitting Networks for High-Resolution Point Clouds 标题:CPFN:高分辨率点云的级联基元拟合网络 链接:https://arxiv.org/abs/2109.00113

作者:Eric-Tuan Lê,Minhyuk Sung,Duygu Ceylan,Radomir Mech,Tamy Boubekeur,Niloy J. Mitra 机构:University College London, KAIST, Adobe Research 备注:None 摘要:在计算机视觉和逆向工程中,将人造物体表示为基本原语的集合有着悠久的历史。在高分辨率点云扫描的情况下,挑战在于能够检测大型原语以及解释详细部分的原语。虽然经典的RANSAC方法需要特定于具体情况的参数调整,但最先进的网络受到其主干模块(如PointNet++)内存消耗的限制,因此无法检测到精细规模的原语。我们提出了级联原始拟合网络(CPFN),该网络依赖于自适应面片采样网络来汇集全局和局部原始检测网络的检测结果。作为一个关键的使能因素,我们提出了一个合并公式,可以在全局和局部范围内动态聚合原语。我们的评估表明,在高分辨率点云数据集上,CPFN将最先进的SPFN性能提高了13-14%,特别是将精细尺度基元的检测提高了20-22%。 摘要:Representing human-made objects as a collection of base primitives has a long history in computer vision and reverse engineering. In the case of high-resolution point cloud scans, the challenge is to be able to detect both large primitives as well as those explaining the detailed parts. While the classical RANSAC approach requires case-specific parameter tuning, state-of-the-art networks are limited by memory consumption of their backbone modules such as PointNet++, and hence fail to detect the fine-scale primitives. We present Cascaded Primitive Fitting Networks (CPFN) that relies on an adaptive patch sampling network to assemble detection results of global and local primitive detection networks. As a key enabler, we present a merging formulation that dynamically aggregates the primitives across global and local scales. Our evaluation demonstrates that CPFN improves the state-of-the-art SPFN performance by 13-14% on high-resolution point cloud datasets and specifically improves the detection of fine-scale primitives by 20-22%.

3D|3D重建等相关(2篇)

【1】 Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction 标题:3D中的常见对象:真实3D类别重构的大规模学习和评估 链接:https://arxiv.org/abs/2109.00512

作者:Jeremy Reizenstein,Roman Shapovalov,Philipp Henzler,Luca Sbordone,Patrick Labatut,David Novotny 机构:Facebook AI Research, University College London 备注:None 摘要:传统的学习3D对象类别的方法主要是在合成数据集上进行训练和评估,这是因为真实的3D注释类别中心数据不可用。我们的主要目标是通过收集与现有合成数据相似的真实数据,促进这一领域的进展。因此,这项工作的主要贡献是一个大型数据集,称为3D中的公共对象,具有对象类别的真实多视图图像,并使用摄影机姿势和地面真实3D点云进行注释。该数据集包含来自近19000个视频的150万帧,这些视频捕获了50个MS-COCO类别的对象,因此,就类别和对象的数量而言,该数据集远远大于备选数据集。我们利用这一新数据集对几种新的视图合成和以类别为中心的三维重建方法进行第一次大规模“野外”评估。最后,我们介绍了NerFormer——一种新的神经渲染方法,它利用强大的转换器在给定少量视图的情况下重建对象。CO3D数据集可在以下位置获得:https://github.com/facebookresearch/co3d . 摘要:Traditional approaches for learning 3D object categories have been predominantly trained and evaluated on synthetic datasets due to the unavailability of real 3D-annotated category-centric data. Our main goal is to facilitate advances in this field by collecting real-world data in a magnitude similar to the existing synthetic counterparts. The principal contribution of this work is thus a large-scale dataset, called Common Objects in 3D, with real multi-view images of object categories annotated with camera poses and ground truth 3D point clouds. The dataset contains a total of 1.5 million frames from nearly 19,000 videos capturing objects from 50 MS-COCO categories and, as such, it is significantly larger than alternatives both in terms of the number of categories and objects. We exploit this new dataset to conduct one of the first large-scale "in-the-wild" evaluations of several new-view-synthesis and category-centric 3D reconstruction methods. Finally, we contribute NerFormer - a novel neural rendering method that leverages the powerful Transformer to reconstruct an object given a small number of its views. The CO3D dataset is available at https://github.com/facebookresearch/co3d .

【2】 DensePose 3D: Lifting Canonical Surface Maps of Articulated Objects to the Third Dimension 标题:DensePose 3D:将关节对象的标准曲面贴图提升到三维 链接:https://arxiv.org/abs/2109.00033

作者:Roman Shapovalov,David Novotny,Benjamin Graham,Patrick Labatut,Andrea Vedaldi 机构:Facebook AI Research 备注:Accepted for ICCV 2021 摘要:我们解决人类和动物等关节对象的单目三维重建问题。我们提出了DensePose 3D,这是一种仅从2D图像标注中以弱监督方式学习此类重建的方法。这与以前使用参数化模型(如SMPL)的可变形重建方法形成了鲜明对比,后者需要在大型3D对象扫描数据集上预训练。由于不需要3D扫描,DensePose 3D可用于学习各种各样的关节类别,例如不同的动物物种。该方法以端到端的方式学习将给定的类别特定3D模板网格软划分为刚性部分,并联合学习一个预测各部分运动的单目重建网络,使这些部分正确地重投影到对象的2D DensePose式曲面标注上。通过将部分分配表示为Laplace-Beltrami算子平滑特征函数的组合,对物体到部分的分解进行正则化。与最先进的非刚性运动恢复结构(non-rigid structure-from-motion)基线相比,我们在人类和动物类别的合成与真实数据上均显示出显著改进。 摘要:We tackle the problem of monocular 3D reconstruction of articulated objects like humans and animals. We contribute DensePose 3D, a method that can learn such reconstructions in a weakly supervised fashion from 2D image annotations only. This is in stark contrast with previous deformable reconstruction methods that use parametric models such as SMPL pre-trained on a large dataset of 3D object scans. Because it does not require 3D scans, DensePose 3D can be used for learning a wide range of articulated categories such as different animal species. The method learns, in an end-to-end fashion, a soft partition of a given category-specific 3D template mesh into rigid parts together with a monocular reconstruction network that predicts the part motions such that they reproject correctly onto 2D DensePose-like surface annotations of the object. The decomposition of the object into parts is regularized by expressing part assignments as a combination of the smooth eigenfunctions of the Laplace-Beltrami operator. We show significant improvements compared to state-of-the-art non-rigid structure-from-motion baselines on both synthetic and real data on categories of humans and animals.

其他神经网络|深度学习|模型|建模(4篇)

【1】 Towards Learning a Vocabulary of Visual Concepts and Operators using Deep Neural Networks 标题:基于深度神经网络的视觉概念和运算符词汇学习研究 链接:https://arxiv.org/abs/2109.00479

作者:Sunil Kumar Vengalil,Neelam Sinha 机构:International Institute of Information Technology, Bangalore, India 摘要:深度神经网络已成为图像和视频识别、分割以及其他图像和视频相关任务等许多应用的默认选择。然而,这些模型的一个关键挑战是缺乏可解释性。生成可解释预测的这一需求促使研究界对训练好的模型进行各种分析。在本研究中,我们使用MNIST图像分析训练模型学习到的特征图,以获得更可解释的预测。我们的研究重点是推导一组基本元素(这里称为视觉概念),可用于从数据生成分布生成任意样本。我们从模型学习的特征图中导出这些基本元素,并通过从使用MNIST图像训练的变分自编码器中生成视觉概念来说明这一想法。我们通过添加约60,000张使用随机选择的视觉概念生成的新图像来扩充MNIST数据集的训练数据。由此,我们将重建损失(均方误差)从初始值120(无增强)降低到60(有增强)。我们的方法是朝最终目标迈出的第一步:训练出其预测、隐藏层特征和学习到的滤波器都能得到很好解释的深度神经网络模型。这样的模型部署到生产中后可以很容易地修改以适应新数据,而现有的深度学习模型则需要重新训练或微调,这一过程同样需要大量不易获得的数据样本,除非模型具有良好的可解释性。 摘要:Deep neural networks have become the default choice for many applications like image and video recognition, segmentation and other image and video related tasks. However, a critical challenge with these models is the lack of explainability. This requirement of generating explainable predictions has motivated the research community to perform various analyses on trained models. In this study, we analyze the learned feature maps of trained models using MNIST images for achieving more explainable predictions. Our study is focused on deriving a set of primitive elements, here called visual concepts, that can be used to generate any arbitrary sample from the data generating distribution. We derive the primitive elements from the feature maps learned by the model. We illustrate the idea by generating visual concepts from a Variational Autoencoder trained using MNIST images. We augment the training data of the MNIST dataset by adding about 60,000 new images generated with visual concepts chosen at random. With this we were able to reduce the reconstruction loss (mean square error) from an initial value of 120 without augmentation to 60 with augmentation. Our approach is a first step towards the final goal of achieving trained deep neural network models whose predictions, features in hidden layers and the learned filters can be well explained. Such a model when deployed in production can easily be modified to adapt to new data, whereas existing deep learning models need re-training or fine-tuning. This process again needs a huge number of data samples that are not easy to generate unless the model has good explainability.

【2】 A Protection Method of Trained CNN Model Using Feature Maps Transformed With Secret Key From Unauthorized Access 标题:一种基于密钥变换特征映射的训练CNN模型防止非授权访问的方法 链接:https://arxiv.org/abs/2109.00224

作者:MaungMaung AprilPyone,Hitoshi Kiya 机构:Tokyo Metropolitan University, Tokyo, Japan 备注:To appear in APSIPA 2021. arXiv admin note: text overlap with arXiv:2105.14756 摘要:本文提出了一种带密钥的卷积神经网络模型保护方法,使授权用户获得较高的分类准确率,而未授权用户获得较低的分类准确率。该方法将带密钥的分块变换应用于网络中的特征映射。当选择较大的密钥空间时,传统的基于密钥的模型保护方法无法保持较高的精度。相比之下,该方法不仅保持了与非保护精度几乎相同的精度,而且具有更大的密钥空间。在CIFAR-10数据集上进行了实验,结果表明,所提出的模型保护方法在分类精度、密钥空间以及对密钥估计攻击和微调攻击的鲁棒性方面优于以前的基于密钥的模型保护方法。 摘要:In this paper, we propose a model protection method for convolutional neural networks (CNNs) with a secret key so that authorized users get a high classification accuracy, and unauthorized users get a low classification accuracy. The proposed method applies a block-wise transformation with a secret key to feature maps in the network. Conventional key-based model protection methods cannot maintain a high accuracy when a large key space is selected. In contrast, the proposed method not only maintains almost the same accuracy as non-protected accuracy, but also has a larger key space. Experiments were carried out on the CIFAR-10 dataset, and results show that the proposed model protection method outperformed the previous key-based model protection methods in terms of classification accuracy, key space, and robustness against key estimation attacks and fine-tuning attacks.
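文中"带密钥的分块变换"可用如下纯Python草图直观说明:以密钥作为随机种子生成块置换,并作用于二维特征图(块大小、置换形式均为示意,并非论文的具体变换;要点在于同一密钥产生同一确定性变换,持钥者才能与训练好的后续层匹配):

```python
import random

def blockwise_transform(feature_map, key, block=2):
    """Shuffle fixed-size blocks of a 2D feature map with a permutation
    seeded by a secret key (illustrative sketch of key-based protection:
    the same key always yields the same deterministic transform)."""
    h, w = len(feature_map), len(feature_map[0])
    blocks = []
    for i in range(0, h, block):
        for j in range(0, w, block):
            blocks.append([(i + di, j + dj) for di in range(block) for dj in range(block)])
    order = list(range(len(blocks)))
    random.Random(key).shuffle(order)  # key-seeded, deterministic permutation
    out = [[0] * w for _ in range(h)]
    for dst_idx, src_idx in enumerate(order):
        for (di, dj), (si, sj) in zip(blocks[dst_idx], blocks[src_idx]):
            out[di][dj] = feature_map[si][sj]
    return out

fm = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
protected = blockwise_transform(fm, "secret-key")
```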

【3】 Problem Learning: Towards the Free Will of Machines 标题:问题学习:走向机器的自由意志 链接:https://arxiv.org/abs/2109.00177

作者:Yongfeng Zhang 机构:Department of Computer Science, Rutgers University, New Brunswick, NJ 备注:17 pages, 1 figure 摘要:机器智能管道通常由六个部分组成:问题、表示、模型、损失、优化器和度量。研究人员一直在努力使管道的许多组件实现自动化。然而,管道的一个关键组成部分——问题定义——在自动化方面仍然没有得到充分的探索。通常,它需要领域专家的大量努力来识别、定义和表述某一领域的重要问题。然而,自动发现某个领域的研究或应用问题是有益的,因为它有助于识别隐藏在数据中、领域专家尚不知晓的有效且潜在重要的问题,扩大我们在某个领域可以完成的任务范围,甚至激发全新的发现。本文描述了问题学习,其目标是学习从数据或机器与环境的交互中发现并定义有效且合乎伦理的问题。我们将问题学习形式化为在问题空间中识别有效且合乎伦理的问题,并介绍几种可能的问题学习方法。从更广的意义上讲,问题学习是一种通向智能机器自由意志的途径。目前,机器仍然局限于解决人类定义的问题,没有能力或灵活性自由探索人类甚至未知的各种可能问题。尽管许多机器学习技术已经被开发并集成到智能系统中,但它们仍然关注机器解决人类定义问题的手段而非目的。然而,提出好的问题有时甚至比解决问题更重要,因为一个好的问题有助于激发新的想法和获得更深的理解。本文还讨论了在负责任人工智能背景下问题学习的伦理含义。 摘要:A machine intelligence pipeline usually consists of six components: problem, representation, model, loss, optimizer and metric. Researchers have worked hard trying to automate many components of the pipeline. However, one key component of the pipeline--problem definition--is still left mostly unexplored in terms of automation. Usually, it requires extensive efforts from domain experts to identify, define and formulate important problems in an area. However, automatically discovering research or application problems for an area is beneficial since it helps to identify valid and potentially important problems hidden in data that are unknown to domain experts, expand the scope of tasks that we can do in an area, and even inspire completely new findings. This paper describes Problem Learning, which aims at learning to discover and define valid and ethical problems from data or from the machine's interaction with the environment. We formalize problem learning as the identification of valid and ethical problems in a problem space and introduce several possible approaches to problem learning. In a broader sense, problem learning is an approach towards the free will of intelligent machines.
Currently, machines are still limited to solving the problems defined by humans, without the ability or flexibility to freely explore various possible problems that are even unknown to humans. Though many machine learning techniques have been developed and integrated into intelligent systems, they still focus on the means rather than the purpose in that machines are still solving human defined problems. However, proposing good problems is sometimes even more important than solving problems, because a good problem can help to inspire new ideas and gain deeper understandings. The paper also discusses the ethical implications of problem learning under the background of Responsible AI.

【4】 Architecture Aware Latency Constrained Sparse Neural Networks 标题:架构感知延迟受限稀疏神经网络 链接:https://arxiv.org/abs/2109.00170

作者:Tianli Zhao,Qinghao Hu,Xiangyu He,Weixiang Xu,Jiaxing Wang,Cong Leng,Jian Cheng 摘要:深度神经网络的加速以满足特定的延迟约束对于其在移动设备上的部署至关重要。在本文中,我们设计了一个体系结构感知的延迟约束稀疏(ALCS)框架来修剪和加速CNN模型。考虑到现代移动计算体系结构,我们提出了单指令多数据(SIMD)结构的剪枝,以及一种新的稀疏卷积算法来提高计算效率。此外,我们建议使用分段线性插值来估计稀疏模型的运行时间。整个延迟约束剪枝任务是一个约束优化问题,可以用交替方向乘数法(ADMM)有效地求解。大量实验表明,我们的系统算法协同设计框架能够在资源受限的移动设备上实现更好的网络精度和延迟帕累托前沿。 摘要:Acceleration of deep neural networks to meet a specific latency constraint is essential for their deployment on mobile devices. In this paper, we design an architecture aware latency constrained sparse (ALCS) framework to prune and accelerate CNN models. Taking modern mobile computation architectures into consideration, we propose Single Instruction Multiple Data (SIMD)-structured pruning, along with a novel sparse convolution algorithm for efficient computation. Besides, we propose to estimate the run time of sparse models with piece-wise linear interpolation. The whole latency constrained pruning task is formulated as a constrained optimization problem that can be efficiently solved with Alternating Direction Method of Multipliers (ADMM). Extensive experiments show that our system-algorithm co-design framework can achieve much better Pareto frontier among network accuracy and latency on resource-constrained mobile devices.
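文中用分段线性插值估计稀疏模型的运行时间,其思路可示意如下:在若干实测的(稀疏率, 延迟)操作点之间线性插值(测量点与延迟数值均为虚构示例,仅用于说明插值本身):

```python
def interp_latency(sparsity, knots):
    """Estimate latency at a given sparsity by piece-wise linear
    interpolation between profiled (sparsity, latency) measurements;
    queries outside the measured range are clamped to the end points."""
    knots = sorted(knots)
    if sparsity <= knots[0][0]:
        return knots[0][1]
    for (x0, y0), (x1, y1) in zip(knots, knots[1:]):
        if sparsity <= x1:
            return y0 + (sparsity - x0) / (x1 - x0) * (y1 - y0)
    return knots[-1][1]

# hypothetical on-device measurements: latency (ms) at a few sparsity levels
measured = [(0.0, 100.0), (0.5, 60.0), (0.9, 20.0)]
```

这样的可微/可查询延迟模型便可作为 ADMM 约束优化中延迟约束的代理。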

其他(6篇)

【1】 EVReflex: Dense Time-to-Impact Prediction for Event-based Obstacle Avoidance 标题:EVReflex:基于事件避障的密集撞击时间预测 链接:https://arxiv.org/abs/2109.00405

作者:Celyn Walters,Simon Hadfield 备注:To be published in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2021 摘要:避障问题的广泛性催生了多种基于计算机视觉的方法。尽管这一方向很受关注,但它并不是一个已解决的问题。使用摄像机和深度传感器的传统计算机视觉技术通常关注静态场景,或者依赖于障碍物的先验信息。仿生传感器的最新发展使事件相机成为动态场景中一个引人注目的选择。尽管与基于帧的传感器相比,这些传感器具有许多优势,例如高动态范围和高时间分辨率,但基于事件的感知在很大程度上仍停留在二维。这通常导致解决方案依赖于启发式并局限于特定任务。我们表明,在执行避障时,事件与深度的融合克服了每种单独模态的失效情形。我们提出的方法融合事件相机和激光雷达数据流,在不事先了解场景几何或障碍物的情况下估计度量意义下的撞击时间。此外,我们还发布了一个大规模的基于事件的数据集,其中包含六个视觉流,涵盖700多个扫描场景。 摘要:The broad scope of obstacle avoidance has led to many kinds of computer vision-based approaches. Despite its popularity, it is not a solved problem. Traditional computer vision techniques using cameras and depth sensors often focus on static scenes, or rely on priors about the obstacles. Recent developments in bio-inspired sensors present event cameras as a compelling choice for dynamic scenes. Although these sensors have many advantages over their frame-based counterparts, such as high dynamic range and temporal resolution, event-based perception has largely remained in 2D. This often leads to solutions reliant on heuristics and specific to a particular task. We show that the fusion of events and depth overcomes the failure cases of each individual modality when performing obstacle avoidance. Our proposed approach unifies event camera and lidar streams to estimate metric time-to-impact without prior knowledge of the scene geometry or obstacles. In addition, we release an extensive event-based dataset with six visual streams spanning over 700 scanned scenes.

【2】 Seeing Implicit Neural Representations as Fourier Series 标题:将隐式神经表示视为傅立叶级数 链接:https://arxiv.org/abs/2109.00249

作者:Nuri Benbarka,Timon Höfer,Hamd ul-moqeet Riaz,Andreas Zell 机构:University of Tübingen, Wilhelm-Schickard-Institute for Computer Science, Sand, Tübingen 摘要:隐式神经表示(INR)使用多层感知机来表示低维问题域中的高频函数。最近,这些表示在与复杂3D对象和场景相关的任务上取得了最新成果。一个核心问题是高度细节化信号的表示,这可以通过使用具有周期激活函数的网络(SIREN)或对输入应用傅里叶映射来解决。这项工作分析了这两种方法之间的联系,并表明傅里叶映射感知器在结构上类似于单隐藏层的SIREN。此外,我们确定了先前提出的傅里叶映射与一般d维傅里叶级数之间的关系,从而得到了整数格映射。我们还修改了一种渐进式训练策略,使其适用于任意傅里叶映射,并表明它提高了插值任务的泛化能力。最后,我们在图像回归和新视图合成任务上比较了不同的映射。我们证实了先前的发现,即影响映射性能的主要因素是嵌入的大小及其元素的标准差。 摘要:Implicit Neural Representations (INR) use multilayer perceptrons to represent high-frequency functions in low-dimensional problem domains. Recently these representations achieved state-of-the-art results on tasks related to complex 3D objects and scenes. A core problem is the representation of highly detailed signals, which is tackled using networks with periodic activation functions (SIRENs) or applying Fourier mappings to the input. This work analyzes the connection between the two methods and shows that a Fourier mapped perceptron is structurally like one hidden layer SIREN. Furthermore, we identify the relationship between the previously proposed Fourier mapping and the general d-dimensional Fourier series, leading to an integer lattice mapping. Moreover, we modify a progressive training strategy to work on arbitrary Fourier mappings and show that it improves the generalization of the interpolation task. Lastly, we compare the different mappings on the image regression and novel view synthesis tasks. We confirm the previous finding that the main contributor to the mapping performance is the size of the embedding and standard deviation of its elements.
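文中指出傅里叶映射感知器在结构上等价于单隐藏层的正弦激活网络,原因是 cos t = sin(t + π/2),即余弦项可写成带相位偏置的正弦隐单元。下面用纯Python在标量输入上验证这一对应关系(频率取值仅为示例):

```python
import math

def fourier_mapping(x, freqs):
    """gamma(x) = [sin(2*pi*b*x), cos(2*pi*b*x)] for each frequency b."""
    out = []
    for b in freqs:
        out.append(math.sin(2 * math.pi * b * x))
        out.append(math.cos(2 * math.pi * b * x))
    return out

def sine_hidden_layer(x, freqs):
    """The same mapping written as one hidden layer with sine activation:
    weights 2*pi*b, biases 0 or pi/2 (since cos t = sin(t + pi/2))."""
    weights = [2 * math.pi * b for b in freqs for _ in (0, 1)]
    biases = [ph for _ in freqs for ph in (0.0, math.pi / 2)]
    return [math.sin(w * x + c) for w, c in zip(weights, biases)]
```

在这两种特征之上再接一个线性层,便分别得到"傅里叶映射感知器"与"单隐藏层正弦网络",两者逐项相等。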

【3】 Perceptually Optimized Deep High-Dynamic-Range Image Tone Mapping Link: https://arxiv.org/abs/2109.00180

Authors: Chenyang Le, Jiebin Yan, Yuming Fang, Kede Ma. Affiliations: School of Information Management, Jiangxi University of Finance and Economics, Nanchang, China; Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong. Abstract: We describe a deep high-dynamic-range (HDR) image tone mapping operator that is computationally efficient and perceptually optimized. We first decompose an HDR image into a normalized Laplacian pyramid, and use two deep neural networks (DNNs) to estimate the Laplacian pyramid of the desired tone-mapped image from the normalized representation. We then end-to-end optimize the entire method over a database of HDR images by minimizing the normalized Laplacian pyramid distance (NLPD), a recently proposed perceptual metric. Qualitative and quantitative experiments demonstrate that our method produces images with better visual quality, and runs the fastest among existing local tone mapping algorithms.
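The Laplacian pyramid decomposition underlying this method can be sketched in a few lines. This is a minimal NumPy illustration using 2×2 average pooling and nearest-neighbor upsampling as stand-ins for the usual blur-and-decimate filters; the paper's normalization step and the DNNs that predict the output pyramid are omitted:

```python
import numpy as np

def downsample(img):
    """2x2 average pooling (a stand-in for blur-and-decimate)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img, shape):
    """Nearest-neighbor upsampling to a target shape."""
    up = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)
    return up[:shape[0], :shape[1]]

def laplacian_pyramid(img, levels):
    """Band-pass differences at each scale, plus a low-pass residual."""
    pyramid, current = [], img
    for _ in range(levels - 1):
        down = downsample(current)
        pyramid.append(current - upsample(down, current.shape))
        current = down
    pyramid.append(current)  # low-pass residual
    return pyramid

def reconstruct(pyramid):
    """Invert the decomposition: upsample and add bands coarse-to-fine."""
    img = pyramid[-1]
    for band in reversed(pyramid[:-1]):
        img = band + upsample(img, band.shape)
    return img

hdr = np.random.default_rng(1).uniform(size=(64, 64))
pyr = laplacian_pyramid(hdr, 4)
assert np.allclose(reconstruct(pyr), hdr)  # the decomposition is invertible
```

Because the decomposition is exactly invertible, predicting the pyramid of the tone-mapped image (as the paper's DNNs do) is equivalent to predicting the image itself, while exposing detail at each scale separately.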

【4】 Implicit Behavioral Cloning Link: https://arxiv.org/abs/2109.00137

Authors: Pete Florence, Corey Lynch, Andy Zeng, Oscar Ramirez, Ayzaan Wahid, Laura Downs, Adrian Wong, Johnny Lee, Igor Mordatch, Jonathan Tompson. Affiliations: Robotics at Google. Abstract: We find that across a wide range of robot policy learning scenarios, treating supervised policy learning with an implicit model generally performs better, on average, than commonly used explicit models. We present extensive experiments on this finding, and we provide both intuitive insight and theoretical arguments distinguishing the properties of implicit models compared to their explicit counterparts, particularly with respect to approximating complex, potentially discontinuous and multi-valued (set-valued) functions. On robotic policy learning tasks we show that implicit behavioral cloning policies with energy-based models (EBM) often outperform common explicit (Mean Square Error, or Mixture Density) behavioral cloning policies, including on tasks with high-dimensional action spaces and visual image inputs. We find these policies provide competitive results or outperform state-of-the-art offline reinforcement learning methods on the challenging human-expert tasks from the D4RL benchmark suite, despite using no reward information. In the real world, robots with implicit policies can learn complex and remarkably subtle behaviors on contact-rich tasks from human demonstrations, including tasks with high combinatorial complexity and tasks requiring 1mm precision.
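The core idea of an implicit policy — selecting the action that minimizes a learned energy E(obs, action) rather than regressing the action directly — can be sketched with a derivative-free argmin over sampled candidates. The energy function below is a toy stand-in for a trained EBM (its "expert" rule, action ≈ -obs, is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

def energy(obs, action):
    """Toy energy standing in for a trained EBM: low when action matches
    a hypothetical 'expert' rule action = -obs."""
    return np.sum((action + obs) ** 2, axis=-1)

def implicit_policy(obs, n_samples=1024, low=-1.0, high=1.0):
    """Pick the candidate action minimizing the energy (sampling-based
    argmin inference, one common way to query an EBM policy)."""
    candidates = rng.uniform(low, high, size=(n_samples, obs.shape[-1]))
    scores = energy(obs, candidates)
    return candidates[np.argmin(scores)]

obs = np.array([0.3, -0.5])
action = implicit_policy(obs)
assert energy(obs, action[None])[0] < 0.05  # close to the energy minimum
```

Because the argmin is taken over the full action set, a single energy landscape can represent discontinuous or multi-valued expert behavior that a mean-squared-error regressor would average away.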

【5】 Working Memory Connections for LSTM Link: https://arxiv.org/abs/2109.00020

Authors: Federico Landi, Lorenzo Baraldi, Marcella Cornia, Rita Cucchiara. Affiliations: Department of Engineering "Enzo Ferrari", University of Modena and Reggio Emilia, Modena, Italy. Note: Accepted for publication in Neural Networks. Abstract: Recurrent Neural Networks with Long Short-Term Memory (LSTM) make use of gating mechanisms to mitigate exploding and vanishing gradients when learning long-term dependencies. For this reason, LSTMs and other gated RNNs are widely adopted, being the standard de facto for many sequence modeling tasks. Although the memory cell inside the LSTM contains essential information, it is not allowed to influence the gating mechanism directly. In this work, we improve the gate potential by including information coming from the internal cell state. The proposed modification, named Working Memory Connection, consists in adding a learnable nonlinear projection of the cell content into the network gates. This modification can fit into the classical LSTM gates without any assumption on the underlying task, being particularly effective when dealing with longer sequences. Previous research effort in this direction, which goes back to the early 2000s, could not bring a consistent improvement over vanilla LSTM. As part of this paper, we identify a key issue tied to previous connections that heavily limits their effectiveness, hence preventing a successful integration of the knowledge coming from the internal cell state. We show through extensive experimental evaluation that Working Memory Connections constantly improve the performance of LSTMs on a variety of tasks. Numerical results suggest that the cell state contains useful information that is worth including in the gate structure.
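The proposed gate modification can be sketched as a single LSTM step in which each gate additionally receives a learnable nonlinear projection of the cell state. This is an illustrative reading of the abstract, not the paper's exact parameterization: the weights are random, and feeding the old cell state to the input/forget gates and the updated state to the output gate follows the classic peephole convention.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(3)
n_in, n_hid = 4, 8

# Standard LSTM parameters plus, per gate, a projection V of the cell state
# (a sketch of the "Working Memory Connection"; V is unused for the candidate).
params = {g: (rng.normal(size=(n_hid, n_in)),    # W: input weights
              rng.normal(size=(n_hid, n_hid)),   # U: recurrent weights
              rng.normal(size=(n_hid, n_hid)),   # V: cell-state projection
              np.zeros(n_hid))                   # b: bias
          for g in ("i", "f", "o", "g")}

def lstm_step_wmc(x, h, c):
    def pre(gate, cell):
        W, U, V, b = params[gate]
        # tanh(V @ cell): learnable nonlinear projection of the cell content
        return W @ x + U @ h + np.tanh(V @ cell) + b
    i = sigmoid(pre("i", c))           # input gate sees the old cell state
    f = sigmoid(pre("f", c))           # forget gate sees the old cell state
    g = np.tanh(params["g"][0] @ x + params["g"][1] @ h + params["g"][3])
    c_new = f * c + i * g
    o = sigmoid(pre("o", c_new))       # output gate sees the updated state
    h_new = o * np.tanh(c_new)
    return h_new, c_new

x = rng.normal(size=n_in)
h, c = np.zeros(n_hid), np.zeros(n_hid)
h, c = lstm_step_wmc(x, h, c)
assert h.shape == (n_hid,) and np.all(np.abs(h) < 1.0)
```

The tanh wrapped around the projection keeps the cell contribution bounded, one plausible way to avoid the saturation problems that (per the abstract) limited earlier peephole-style connections.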

【6】 A survey on IQA (Image Quality Assessment) Link: https://arxiv.org/abs/2109.00347

Authors: Lanjiang Wang. Affiliations: University of Electronic Science and Technology of China. Abstract: Image quality assessment (IQA) is of increasing importance for image-based applications. Its purpose is to establish a model that can replace humans for accurately evaluating image quality. According to whether the reference image is complete and available, image quality evaluation can be divided into three categories: full-reference (FR), reduced-reference (RR), and no-reference (NR) image quality assessment. Due to the vigorous development of deep learning and the widespread attention of researchers, several no-reference image quality assessment methods based on deep learning have been proposed in recent years, and some have exceeded the performance of reduced-reference or even full-reference image quality assessment models. This article will review the concepts and metrics of image quality assessment and also video quality assessment, briefly introduce some methods of full-reference and reduced-reference image quality assessment, and focus on the no-reference image quality assessment methods based on deep learning. Then it introduces the commonly used synthetic and real-world databases. Finally, it summarizes the field and presents open challenges.

Machine translation, for reference only.

Originally published: 2021-09-02.

This article is shared from the WeChat public account arXiv每日学术速递.

