cs.CV: 60 papers today
Transformer (5 papers)
【1】 CMT: Convolutional Neural Networks Meet Vision Transformers
Authors: Jianyuan Guo, Kai Han, Han Wu, Chang Xu, Yehui Tang, Chunjing Xu, Yunhe Wang Affiliations: Noah’s Ark Lab, Huawei Technologies; School of Computer Science, University of Sydney Link: https://arxiv.org/abs/2107.06263 Abstract: Vision transformers have been successfully applied to image recognition tasks due to their ability to capture long-range dependencies within an image. However, there are still gaps in both performance and computational cost between transformers and existing convolutional neural networks (CNNs). In this paper, we aim to address this issue and develop a network that can outperform not only the canonical transformers, but also the high-performance convolutional models. We propose a new transformer based hybrid network by taking advantage of transformers to capture long-range dependencies, and of CNNs to model local features. Furthermore, we scale it to obtain a family of models, called CMTs, obtaining much better accuracy and efficiency than previous convolution and transformer based models. In particular, our CMT-S achieves 83.5% top-1 accuracy on ImageNet, while being 14x and 2x smaller on FLOPs than the existing DeiT and EfficientNet, respectively. The proposed CMT-S also generalizes well on CIFAR10 (99.2%), CIFAR100 (91.7%), Flowers (98.7%), and other challenging vision datasets such as COCO (44.3% mAP), with considerably less computational cost.
【2】 HAT: Hierarchical Aggregation Transformers for Person Re-identification
Authors: Guowen Zhang, Pingping Zhang, Jinqing Qi, Huchuan Lu Affiliations: Dalian University of Technology, Dalian, Liaoning, China Comments: This work has been accepted by ACM International Conference on Multimedia 2021. To our best knowledge, this work is the very first to take advantages of both CNNs and Transformers for image-based person Re-ID Link: https://arxiv.org/abs/2107.05946 Abstract: Recently, with the advance of deep Convolutional Neural Networks (CNNs), person Re-Identification (Re-ID) has witnessed great success in various applications. However, with limited receptive fields of CNNs, it is still challenging to extract discriminative representations in a global view for persons under non-overlapped cameras. Meanwhile, Transformers demonstrate strong abilities of modeling long-range dependencies for spatial and sequential data. In this work, we take advantages of both CNNs and Transformers, and propose a novel learning framework named Hierarchical Aggregation Transformer (HAT) for image-based person Re-ID with high performance. To achieve this goal, we first propose a Deeply Supervised Aggregation (DSA) to recurrently aggregate hierarchical features from CNN backbones. With multi-granularity supervisions, the DSA can enhance multi-scale features for person retrieval, which is very different from previous methods. Then, we introduce a Transformer-based Feature Calibration (TFC) to integrate low-level detail information as the global prior for high-level semantic information. The proposed TFC is inserted to each level of hierarchical features, resulting in great performance improvements. To our best knowledge, this work is the first to take advantages of both CNNs and Transformers for image-based person Re-ID. Comprehensive experiments on four large-scale Re-ID benchmarks demonstrate that our method shows better results than several state-of-the-art methods. The code is released at https://github.com/AI-Zhpp/HAT.
【3】 ST-DETR: Spatio-Temporal Object Traces Attention Detection Transformer
Authors: Eslam Mohamed, Ahmad El-Sallab Affiliations: Senior Machine Learning Engineer, Valeo R&D Cairo, Egypt, Ahmad El Sallab, Senior Expert AISenior Chief Engineer Comments: arXiv admin note: substantial text overlap with arXiv:2106.11401 Link: https://arxiv.org/abs/2107.05887 Abstract: We propose ST-DETR, a Spatio-Temporal Transformer-based architecture for object detection from a sequence of temporal frames. We treat the temporal frames as sequences in both space and time and employ the full attention mechanisms to take advantage of the features correlations over both dimensions. This treatment enables us to deal with frames sequence as temporal object features traces over every location in the space. We explore two possible approaches: the early spatial features aggregation over the temporal dimension, and the late temporal aggregation of object query spatial features. Moreover, we propose a novel Temporal Positional Embedding technique to encode the time sequence information. To evaluate our approach, we choose the Moving Object Detection (MOD) task, since it is a perfect candidate to showcase the importance of the temporal dimension. Results show a significant 5% mAP improvement on the KITTI MOD dataset over the 1-step spatial baseline.
【4】 Visual Parser: Representing Part-whole Hierarchies with Transformers
Authors: Shuyang Sun*, Xiaoyu Yue*, Song Bai, Philip Torr Affiliations: †University of Oxford, ‡ByteDance AI Lab Link: https://arxiv.org/abs/2107.05790 Abstract: Human vision is able to capture the part-whole hierarchical information from the entire scene. This paper presents the Visual Parser (ViP) that explicitly constructs such a hierarchy with transformers. ViP divides visual representations into two levels, the part level and the whole level. Information of each part represents a combination of several independent vectors within the whole. To model the representations of the two levels, we first encode the information from the whole into part vectors through an attention mechanism, then decode the global information within the part vectors back into the whole representation. By iteratively parsing the two levels with the proposed encoder-decoder interaction, the model can gradually refine the features on both levels. Experimental results demonstrate that ViP can achieve very competitive performance on three major tasks, i.e., classification, detection and instance segmentation. In particular, it can surpass the previous state-of-the-art CNN backbones by a large margin on object detection. The tiny model of the ViP family with $7.2\times$ fewer parameters and $10.9\times$ fewer FLOPS can perform comparably with the largest model ResNeXt-101-64$\times$4d of ResNe(X)t family. Visualization results also demonstrate that the learnt parts are highly informative of the predicting class, making ViP more explainable than previous fundamental architectures. Code is available at https://github.com/kevin-ssy/ViP.
【5】 Combiner: Full Attention Transformer with Sparse Computation Cost
Authors: Hongyu Ren, Hanjun Dai, Zihang Dai, Mengjiao Yang, Jure Leskovec, Dale Schuurmans, Bo Dai Affiliations: ‡University of Alberta Link: https://arxiv.org/abs/2107.05768 Abstract: Transformers provide a class of expressive architectures that are extremely effective for sequence modeling. However, the key limitation of transformers is their quadratic memory and time complexity $\mathcal{O}(L^2)$ with respect to the sequence length in attention layers, which restricts application in extremely long sequences. Most existing approaches leverage sparsity or low-rank assumptions in the attention matrix to reduce cost, but sacrifice expressiveness. Instead, we propose Combiner, which provides full attention capability in each attention head while maintaining low computation and memory complexity. The key idea is to treat the self-attention mechanism as a conditional expectation over embeddings at each location, and approximate the conditional distribution with a structured factorization. Each location can attend to all other locations, either via direct attention, or through indirect attention to abstractions, which are again conditional expectations of embeddings from corresponding local regions. We show that most sparse attention patterns used in existing sparse transformers are able to inspire the design of such factorization for full attention, resulting in the same sub-quadratic cost ($\mathcal{O}(L\log(L))$ or $\mathcal{O}(L\sqrt{L})$). Combiner is a drop-in replacement for attention layers in existing transformers and can be easily implemented in common frameworks. An experimental evaluation on both autoregressive and bidirectional sequence tasks demonstrates the effectiveness of this approach, yielding state-of-the-art results on several image and text modeling tasks.
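To give a feel for the complexity classes named in the Combiner abstract, the following rough sketch compares $L^2$, $L\log L$ and $L\sqrt{L}$ for one sequence length. It is plain arithmetic, not Combiner itself: constant factors are ignored, the base-2 logarithm is an arbitrary choice, and the function name `attention_cost` and the length 4096 are illustrative assumptions.

```python
import math


def attention_cost(seq_len: int) -> dict:
    """Rough per-layer operation counts for a sequence of length seq_len.

    Constants and hidden dimensions are ignored; only the growth terms
    from the abstract are evaluated. The log base is an assumption.
    """
    return {
        "full_quadratic": seq_len ** 2,
        "L_log_L": seq_len * math.log2(seq_len),
        "L_sqrt_L": seq_len * math.sqrt(seq_len),
    }


# Hypothetical sequence length, chosen only for illustration.
costs = attention_cost(4096)
for name, value in costs.items():
    print(f"{name}: {value:,.0f}")
```

At this length the quadratic term is 64 times larger than the $L\sqrt{L}$ term, which is why sub-quadratic attention matters for long sequences.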
Detection (5 papers)
【1】 Deep learning approaches to Earth Observation change detection
Authors: Antonio Di Pilato, Nicolò Taggio, Alexis Pompili, Michele Iacobellis, Adriano Di Florio, Davide Passarelli, Sergio Samarelli Affiliations: Dipartimento Interateneo di Fisica, Università degli Studi di Bari, Bari, Italy, Planetek Italia, Politecnico di Bari Link: https://arxiv.org/abs/2107.06132 Abstract: The interest for change detection in the field of remote sensing has increased in the last few years. Searching for changes in satellite images has many useful applications, ranging from land cover and land use analysis to anomaly detection. In particular, urban change detection provides an efficient tool to study urban spread and growth through several years of observation. At the same time, change detection is often a computationally challenging and time-consuming task, which requires innovative methods to guarantee optimal results with unquestionable value and within reasonable time. In this paper we present two different approaches to change detection (semantic segmentation and classification) that both exploit convolutional neural networks to achieve good results, which can be further refined and used in a post-processing workflow for a large variety of applications.
【2】 Bidirectional Regression for Arbitrary-Shaped Text Detection
Authors: Tao Sheng, Zhouhui Lian Affiliations: Wangxuan Institute of Computer Technology, Peking University, Beijing, China Comments: Accepted at ICDAR 2021, 15 pages Link: https://arxiv.org/abs/2107.06129 Abstract: Arbitrary-shaped text detection has recently attracted increasing interests and witnessed rapid development with the popularity of deep learning algorithms. Nevertheless, existing approaches often obtain inaccurate detection results, mainly due to the relatively weak ability to utilize context information and the inappropriate choice of offset references. This paper presents a novel text instance expression which integrates both foreground and background information into the pipeline, and naturally uses the pixels near text boundaries as the offset starts. Besides, a corresponding post-processing algorithm is also designed to sequentially combine the four prediction results and reconstruct the text instance accurately. We evaluate our method on several challenging scene text benchmarks, including both curved and multi-oriented text datasets. Experimental results demonstrate that the proposed approach obtains superior or competitive performance compared to other state-of-the-art methods, e.g., 83.4% F-score for Total-Text, 82.4% F-score for MSRA-TD500, etc.
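For readers unfamiliar with the F-score quoted in the text detection abstracts, here is a minimal sketch of the standard definition as the weighted harmonic mean of precision and recall. The function name `f_score` and the input values are illustrative assumptions and are not taken from the paper, which does not list its precision and recall in the abstract.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing is detected correctly
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Hypothetical detector: 90% of predicted boxes are correct,
# 60% of ground-truth text instances are found.
example = f_score(0.9, 0.6)
print(round(example, 2))
```

Because the harmonic mean is pulled toward the smaller value, a detector cannot reach a high F-score by trading recall away for precision or the reverse.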
【3】 CentripetalText: An Efficient Text Instance Representation for Scene Text Detection
Authors: Tao Sheng, Jie Chen, Zhouhui Lian Affiliations: Wangxuan Institute of Computer Technology, Peking University, Beijing, China Link: https://arxiv.org/abs/2107.05945 Abstract: Scene text detection remains a grand challenge due to the variation in text curvatures, orientations, and aspect ratios. One of the most intractable problems is how to represent text instances of arbitrary shapes. Although many state-of-the-art methods have been proposed to model irregular texts in a flexible manner, most of them lose simplicity and robustness. Their complicated post-processings and the regression under Dirac delta distribution undermine the detection performance and the generalization ability. In this paper, we propose an efficient text instance representation named CentripetalText (CT), which decomposes text instances into the combination of text kernels and centripetal shifts. Specifically, we utilize the centripetal shifts to implement the pixel aggregation, which guide the external text pixels to the internal text kernels. The relaxation operation is integrated into the dense regression for centripetal shifts, allowing the correct prediction in a range, not a specific value. The convenient reconstruction of the text contours and the tolerance of the prediction errors in our method guarantee the high detection accuracy and the fast inference speed respectively. Besides, we shrink our text detector into a proposal generation module, namely CentripetalText Proposal Network (CPN), replacing SPN in Mask TextSpotter v3 and producing more accurate proposals. To validate the effectiveness of our designs, we conduct experiments on several commonly used scene text benchmarks, including both curved and multi-oriented text datasets. For the task of scene text detection, our approach achieves superior or competitive performance compared to other existing methods, e.g., F-measure of 86.3% at 40.0 FPS on Total-Text, F-measure of 86.1% at 34.8 FPS on MSRA-TD500, etc. For the task of end-to-end scene text recognition, we outperform Mask TextSpotter v3 by 1.1% on Total-Text.
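The pixel aggregation idea in the CentripetalText abstract (shifts that guide external text pixels to internal text kernels) might look roughly like the sketch below. This is an assumption-laden illustration, not the authors' implementation: the function name `aggregate_pixels`, the list-of-lists data layout, and the rounding of shifted coordinates are all hypothetical choices.

```python
def aggregate_pixels(kernel_labels, text_mask, shifts):
    """Give each text pixel the id of the kernel its predicted shift points to.

    kernel_labels: 2D list of ints, 0 = background, k > 0 = kernel instance id
    text_mask:     2D list of bools, True where a pixel belongs to some text
    shifts:        2D list of (dy, dx) offsets predicted per pixel
    """
    height, width = len(kernel_labels), len(kernel_labels[0])
    result = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if not text_mask[y][x]:
                continue  # background pixels keep label 0
            if kernel_labels[y][x] > 0:
                result[y][x] = kernel_labels[y][x]  # already inside a kernel
                continue
            dy, dx = shifts[y][x]
            ty, tx = int(round(y + dy)), int(round(x + dx))
            if 0 <= ty < height and 0 <= tx < width:
                result[y][x] = kernel_labels[ty][tx]
    return result


# Toy 1 x 5 image: one kernel pixel in the middle, text pixels on both sides.
kernels = [[0, 0, 1, 0, 0]]
mask = [[False, True, True, True, False]]
offsets = [[(0, 0), (0, 1), (0, 0), (0, -1), (0, 0)]]
labels = aggregate_pixels(kernels, mask, offsets)
print(labels)
```

In this toy case both neighbouring text pixels shift onto the kernel and inherit its instance id, so a single pass over the image reconstructs the full text region.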
【4】 Automatic Seizure Detection Using the Pulse Transit Time
Authors: Eric Fiege, Salima Houta, Pinar Bisgin, Rainer Surges, Falk Howar Affiliations: Fraunhofer Institute for Software and Systems Engineering, Dortmund, Germany, Department of Epileptology, University of Bonn Medical Center, Bonn, Germany, TU Dortmund University Link: https://arxiv.org/abs/2107.05894 Abstract: Documentation of epileptic seizures plays an essential role in planning medical therapy. Solutions for automated epileptic seizure detection can help improve the current problem of incomplete and erroneous manual documentation of epileptic seizures. In recent years, a number of wearable sensors have been tested for this purpose. However, detecting seizures with subtle symptoms remains difficult and current solutions tend to have a high false alarm rate. Seizures can also affect the patient's arterial blood pressure, which has not yet been studied for detection with sensors. The pulse transit time (PTT) provides a noninvasive estimate of arterial blood pressure. It can be obtained by using two sensors, which measure the time differences between arrivals of the pulse waves. Due to separated time chips a clock drift emerges, which strongly influences the PTT. In this work, we present an algorithm which responds to alterations in the PTT, considering the clock drift and enabling the noninvasive monitoring of blood pressure alterations using separated sensors. Furthermore, we investigated whether seizures can be detected using the PTT. Our results indicate that using the algorithm, it is possible to detect seizures with a Random Forest. Using the PTT along with other signals in a multimodal approach, the detection of seizures with subtle symptoms could thereby be improved.
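As a rough illustration of why clock drift matters for the pulse transit time, the sketch below computes the PTT as the difference between pulse arrival times at two sensors after undoing a drift on the second sensor's clock. The linear drift model, the function name `pulse_transit_times`, and every number are assumptions made for this example; the abstract does not describe the authors' actual correction algorithm.

```python
def pulse_transit_times(proximal_arrivals, distal_arrivals,
                        offset=0.0, drift_rate=0.0):
    """PTT per heartbeat from two sensors with separate clocks.

    Assumed (hypothetical) clock model for the distal sensor:
        measured = true + offset + drift_rate * true
    so the true time is recovered as (measured - offset) / (1 + drift_rate).
    """
    corrected = [(t - offset) / (1.0 + drift_rate) for t in distal_arrivals]
    return [d - p for p, d in zip(proximal_arrivals, corrected)]


# Toy data: true PTT is 0.2 s on every beat, distal clock runs 1% fast
# and starts 0.5 s ahead.
proximal = [1.0, 2.0, 3.0]
true_distal = [1.2, 2.2, 3.2]
measured_distal = [t + 0.5 + 0.01 * t for t in true_distal]

uncorrected = pulse_transit_times(proximal, measured_distal)
corrected_ptt = pulse_transit_times(proximal, measured_distal,
                                    offset=0.5, drift_rate=0.01)
print(uncorrected, corrected_ptt)
```

Without correction the apparent PTT grows from beat to beat even though the true value is constant, which is the kind of artefact a drift-aware algorithm has to remove before PTT changes can be interpreted.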
【5】 Detecting when pre-trained nnU-Net models fail silently for Covid-19
Authors: Camila Gonzalez, Karol Gotkowski, Andreas Bucher, Ricarda Fischbach, Isabel Kaltenborn, Anirban Mukhopadhyay Affiliations: Darmstadt University of Technology, Karolinenpl. , Darmstadt, Germany, University Hospital Frankfurt, Theodor-Stern-Kai , Frankfurt am Main Link: https://arxiv.org/abs/2107.05975 Abstract: Automatic segmentation of lung lesions in computer tomography has the potential to ease the burden of clinicians during the Covid-19 pandemic. Yet predictive deep learning models are not trusted in the clinical routine due to failing silently in out-of-distribution (OOD) data. We propose a lightweight OOD detection method that exploits the Mahalanobis distance in the feature space. The proposed approach can be seamlessly integrated into state-of-the-art segmentation pipelines without requiring changes in model architecture or training procedure, and can therefore be used to assess the suitability of pre-trained models to new data. We validate our method with a patch-based nnU-Net architecture trained with a multi-institutional dataset and find that it effectively detects samples that the model segments incorrectly.
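The Mahalanobis distance mentioned in this abstract can be sketched in a few lines. The toy version below fits a Gaussian to two-dimensional training features and scores new samples by their distance to it; the restriction to two dimensions, the function names, and the sample values are assumptions for illustration only, and real feature maps from a segmentation network would be far higher-dimensional.

```python
import math


def fit_gaussian_2d(features):
    """Estimate mean and sample covariance of 2D feature vectors."""
    n = len(features)
    mean = (sum(f[0] for f in features) / n, sum(f[1] for f in features) / n)
    sxx = sum((f[0] - mean[0]) ** 2 for f in features) / (n - 1)
    syy = sum((f[1] - mean[1]) ** 2 for f in features) / (n - 1)
    sxy = sum((f[0] - mean[0]) * (f[1] - mean[1]) for f in features) / (n - 1)
    return mean, (sxx, sxy, syy)


def mahalanobis_2d(x, mean, cov):
    """Mahalanobis distance using the closed-form inverse of a 2x2 covariance."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy  # assumed non-zero for this sketch
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    squared = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(squared)


# Hypothetical in-distribution training features.
train = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
mean, cov = fit_gaussian_2d(train)

near = mahalanobis_2d((1.0, 1.0), mean, cov)  # at the training mean
far = mahalanobis_2d((5.0, 1.0), mean, cov)   # well outside the training data
print(near, far)
```

A sample would be flagged as out-of-distribution when its distance exceeds a threshold chosen on validation data; how that threshold is set is not stated in the abstract.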
Classification | Recognition (4 papers)
【1】 Per-Pixel Classification is Not All You Need for Semantic Segmentation
Authors: Bowen Cheng, Alexander G. Schwing, Alexander Kirillov Affiliations: Facebook AI Research (FAIR), University of Illinois at Urbana-Champaign (UIUC) Comments: Project page: this https URL Link: https://arxiv.org/abs/2107.06278 Abstract: Modern approaches typically formulate semantic segmentation as a per-pixel classification task, while instance-level segmentation is handled with an alternative mask classification. Our key insight: mask classification is sufficiently general to solve both semantic- and instance-level segmentation tasks in a unified manner using the exact same model, loss, and training procedure. Following this observation, we propose MaskFormer, a simple mask classification model which predicts a set of binary masks, each associated with a single global class label prediction. Overall, the proposed mask classification-based method simplifies the landscape of effective approaches to semantic and panoptic segmentation tasks and shows excellent empirical results. In particular, we observe that MaskFormer outperforms per-pixel classification baselines when the number of classes is large. Our mask classification-based method outperforms both current state-of-the-art semantic (55.6 mIoU on ADE20K) and panoptic segmentation (52.7 PQ on COCO) models.
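To make the difference between per-pixel and mask classification concrete, the sketch below shows one plausible way to turn a set of (mask, class distribution) predictions into a per-pixel semantic label map: each class score at a pixel is the sum over masks of class probability times mask probability. The abstract does not spell out the inference rule, so this combination, the function name `semantic_from_masks`, and the toy numbers are assumptions.

```python
def semantic_from_masks(class_probs, mask_probs):
    """Combine N mask predictions into a per-pixel class label map.

    class_probs: N lists of length K, class probabilities for each mask
    mask_probs:  N 2D lists (H x W), per-pixel membership probability per mask
    """
    num_classes = len(class_probs[0])
    height, width = len(mask_probs[0]), len(mask_probs[0][0])
    labels = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Score of class c at this pixel: sum_i p_i(c) * m_i[y][x]
            scores = [
                sum(cp[c] * mp[y][x] for cp, mp in zip(class_probs, mask_probs))
                for c in range(num_classes)
            ]
            labels[y][x] = max(range(num_classes), key=scores.__getitem__)
    return labels


# Two predicted masks, two classes, a 1 x 2 image (all values hypothetical).
class_probs = [[0.9, 0.1], [0.2, 0.8]]
mask_probs = [[[0.9, 0.1]], [[0.1, 0.9]]]
label_map = semantic_from_masks(class_probs, mask_probs)
print(label_map)
```

The class decision is made once per mask rather than once per pixel, which is the property the abstract credits for the gains when the number of classes is large.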
【2】 Cats, not CAT scans: a study of dataset similarity in transfer learning for 2D medical image classification
Authors: Irma van den Brandt, Floris Fok, Bas Mulders, Joaquin Vanschoren, Veronika Cheplygina Affiliations: ∗Eindhoven University of Technology, The Netherlands, †IT University of Copenhagen, Denmark Link: https://arxiv.org/abs/2107.05940 Abstract: Transfer learning is a commonly used strategy for medical image classification, especially via pretraining on source data and fine-tuning on target data. There is currently no consensus on how to choose appropriate source data, and in the literature we can find both evidence of favoring large natural image datasets such as ImageNet, and evidence of favoring more specialized medical datasets. In this paper we perform a systematic study with nine source datasets with natural or medical images, and three target medical datasets, all with 2D images. We find that ImageNet is the source leading to the highest performances, but also that larger datasets are not necessarily better. We also study different definitions of data similarity. We show that common intuitions about similarity may be inaccurate, and therefore not sufficient to predict an appropriate source a priori. Finally, we discuss several steps needed for further research in this field, especially with regard to other types (for example 3D) of medical images. Our experiments and pretrained models are available via https://www.github.com/vcheplygina/cats-scans
【3】 Region attention and graph embedding network for occlusion objective class-based micro-expression recognition
Authors: Qirong Mao, Ling Zhou, Wenming Zheng, Xiuyan Shao, Xiaohua Huang Affiliations: China and also with the School of Biological Science and Medical Engineering, Southeast University Link: https://arxiv.org/abs/2107.05904 Abstract: Micro-expression recognition (MER) has attracted lots of researchers' attention in a decade. However, occlusion will occur for MER in real-world scenarios. This paper deeply investigates an interesting but unexplored challenging issue in MER, i.e., occlusion MER. First, to research MER under real-world occlusion, synthetic occluded micro-expression databases are created by using various masks for the community. Second, to suppress the influence of occlusion, a Region-inspired Relation Reasoning Network (RRRN) is proposed to model relations between various facial regions. RRRN consists of a backbone network, the Region-Inspired (RI) module and the Relation Reasoning (RR) module. More specifically, the backbone network aims at extracting feature representations from different facial regions, the RI module computes an adaptive weight from the region itself based on attention mechanism with respect to the unobstructedness and importance for suppressing the influence of occlusion, and the RR module exploits the progressive interactions among these regions by performing graph convolutions. Experiments are conducted on handout-database evaluation and composite database evaluation tasks of MEGC 2018 protocol. Experimental results show that RRRN can significantly explore the importance of facial regions and capture the cooperative complementary relationship of facial regions for MER. The results also demonstrate RRRN outperforms the state-of-the-art approaches, especially on occlusion, and RRRN acts more robust to occlusion.
【4】 eProduct: A Million-Scale Visual Search Benchmark to Address Product Recognition Challenges
Authors: Jiangbo Yuan, An-Ti Chiang, Wen Tang, Antonio Haro Affiliations: Bay Inc., San Jose, CA, USA Comments: This paper was accepted at FGVC8 CVPR2021 as a competition paper (this https URL) Link: https://arxiv.org/abs/2107.05856 Abstract: Large-scale product recognition is one of the major applications of computer vision and machine learning in the e-commerce domain. Since the number of products is typically much larger than the number of categories of products, image-based product recognition is often cast as a visual search rather than a classification problem. It is also one of the instances of super fine-grained recognition, where there are many products with slight or subtle visual differences. It has always been a challenge to create a benchmark dataset for training and evaluation on various visual search solutions in a real-world setting. This motivated creation of eProduct, a dataset consisting of 2.5 million product images towards accelerating development in the areas of self-supervised learning, weakly-supervised learning, and multimodal learning, for fine-grained recognition. We present eProduct as a training set and an evaluation set, where the training set contains 1.3M+ listing images with titles and hierarchical category labels, for model development, and the evaluation set includes 10,000 query and 1.1 million index images for visual search evaluation. We will present eProduct's construction steps, provide analysis about its diversity and cover the performance of baseline models trained on it.
Segmentation | Semantics (3 papers)
【1】 Learning from Partially Overlapping Labels: Image Segmentation under Annotation Shift
Authors: Gregory Filbrandt, Konstantinos Kamnitsas, David Bernstein, Alexandra Taylor, Ben Glocker Affiliations: Department of Computing, Imperial College London, UK, Joint Dept of Physics, The Institute of Cancer Research and The Royal Marsden, Department of Radiotherapy, The Royal Marsden NHS Foundation Trust, UK, ∗ Equal contribution Link: https://arxiv.org/abs/2107.05938 Abstract: Scarcity of high quality annotated images remains a limiting factor for training accurate image segmentation models. While more and more annotated datasets become publicly available, the number of samples in each individual database is often small. Combining different databases to create larger amounts of training data is appealing yet challenging due to the heterogeneity as a result of differences in data acquisition and annotation processes, often yielding incompatible or even conflicting information. In this paper, we investigate and propose several strategies for learning from partially overlapping labels in the context of abdominal organ segmentation. We find that combining a semi-supervised approach with an adaptive cross entropy loss can successfully exploit heterogeneously annotated data and substantially improve segmentation accuracy compared to baseline and alternative approaches.
【2】 NucMM Dataset: 3D Neuronal Nuclei Instance Segmentation at Sub-Cubic Millimeter Scale
Authors: Zudi Lin, Donglai Wei, Mariela D. Petkova, Yuelong Wu, Zergham Ahmed, Krishna Swaroop K, Silin Zou, Nils Wendt, Jonathan Boulanger-Weill, Xueying Wang, Nagaraju Dhanyasi, Ignacio Arganda-Carreras, Florian Engert, Jeff Lichtman, Hanspeter Pfister Affiliations: Harvard University, NIT Karnataka, Technical University of Munich, Donostia, International Physics Center (DIPC), University of the Basque Country, (UPVEHU), Ikerbasque, Basque Foundation for Science Comments: The two first authors contributed equally. To be published in the proceedings of MICCAI 2021 Link: https://arxiv.org/abs/2107.05840 Abstract: Segmenting 3D cell nuclei from microscopy image volumes is critical for biological and clinical analysis, enabling the study of cellular expression patterns and cell lineages. However, current datasets for neuronal nuclei usually contain volumes smaller than $10^{-3}\ \mathrm{mm}^3$ with fewer than 500 instances per volume, unable to reveal the complexity in large brain regions and restrict the investigation of neuronal structures. In this paper, we have pushed the task forward to the sub-cubic millimeter scale and curated the NucMM dataset with two fully annotated volumes: one $0.1\ \mathrm{mm}^3$ electron microscopy (EM) volume containing nearly the entire zebrafish brain with around 170,000 nuclei; and one $0.25\ \mathrm{mm}^3$ micro-CT (uCT) volume containing part of a mouse visual cortex with about 7,000 nuclei. With two imaging modalities and significantly increased volume size and instance numbers, we discover a great diversity of neuronal nuclei in appearance and density, introducing new challenges to the field. We also perform a statistical analysis to illustrate those challenges quantitatively. To tackle the challenges, we propose a novel hybrid-representation learning model that combines the merits of foreground mask, contour map, and signed distance transform to produce high-quality 3D masks. The benchmark comparisons on the NucMM dataset show that our proposed method significantly outperforms state-of-the-art nuclei segmentation approaches. Code and data are available at https://connectomics-bazaar.github.io/proj/nucMM/index.html.
【3】 Detect and Locate: A Face Anti-Manipulation Approach with Semantic and Noise-level Supervision
Authors: Chenqi Kong, Baoliang Chen, Haoliang Li, Shiqi Wang, Anderson Rocha, Sam Kwong Comments: 12 pages, 10 figures Link: https://arxiv.org/abs/2107.05821 Abstract: The technological advancements of deep learning have enabled sophisticated face manipulation schemes, raising severe trust issues and security concerns in modern society. Generally speaking, detecting manipulated faces and locating the potentially altered regions are challenging tasks. Herein, we propose a conceptually simple but effective method to efficiently detect forged faces in an image while simultaneously locating the manipulated regions. The proposed scheme relies on a segmentation map that delivers meaningful high-level semantic clues about the image. Furthermore, a noise map is estimated, playing a complementary role in capturing low-level clues and subsequently empowering decision-making. Finally, the features from these two modules are combined to distinguish fake faces. Extensive experiments show that the proposed model achieves state-of-the-art detection accuracy and remarkable localization performance.
Zero/Few-Shot | Transfer | Domain Adaptation | Adaptive (6 papers)
【1】 Exploiting Image Translations via Ensemble Self-Supervised Learning for Unsupervised Domain Adaptation
Authors: Fabrizio J. Piva, Gijs Dubbelman Affiliations: Eindhoven University of Technology, Department of Electrical Engineering, Groene Loper, AZ Eindhoven, The Netherlands Comments: Manuscript under review at Computer Vision and Image Understanding (CVIU) journal Link: https://arxiv.org/abs/2107.06235 Abstract: We introduce an unsupervised domain adaptation (UDA) strategy that combines multiple image translations, ensemble learning and self-supervised learning in one coherent approach. We focus on one of the standard tasks of UDA in which a semantic segmentation model is trained on labeled synthetic data together with unlabeled real-world data, aiming to perform well on the latter. To exploit the advantage of using multiple image translations, we propose an ensemble learning approach, where three classifiers calculate their predictions by taking as input features of different image translations, making each classifier learn independently, with the purpose of combining their outputs by sparse Multinomial Logistic Regression. This regression layer, known as the meta-learner, helps to reduce the bias during pseudo-label generation when performing self-supervised learning and improves the generalizability of the model by taking into consideration the contribution of each classifier. We evaluate our method on the standard UDA benchmarks, i.e., adapting GTA V and Synthia to Cityscapes, and achieve state-of-the-art results in the mean intersection over union metric. Extensive ablation experiments are reported to highlight the advantageous properties of our proposed UDA strategy.
【2】 Domain-Irrelevant Representation Learning for Unsupervised Domain Generalization
Authors: Xingxuan Zhang, Linjun Zhou, Renzhe Xu, Peng Cui, Zheyan Shen, Haoxin Liu Affiliations: Department of Computer Science, Tsinghua University, Beijing, China Link: https://arxiv.org/abs/2107.06219 Abstract: Domain generalization (DG) aims to help models trained on a set of source domains generalize better on unseen target domains. The performance of current DG methods largely relies on sufficient labeled data, which, however, are usually costly or unavailable. Since unlabeled data are far more accessible, we seek to explore how unsupervised learning can help deep models generalize across domains. Specifically, we study a novel generalization problem called unsupervised domain generalization, which aims to learn generalizable models with unlabeled data. Furthermore, we propose a Domain-Irrelevant Unsupervised Learning (DIUL) method to cope with the significant and misleading heterogeneity within unlabeled data and severe distribution shifts between source and target data. Surprisingly, we observe that DIUL can not only counterbalance the scarcity of labeled data but also further strengthen the generalization ability of models when the labeled data are sufficient. As a pretraining approach, DIUL is superior to the ImageNet pretraining protocol even when the available data are unlabeled and far smaller in amount than ImageNet. Extensive experiments clearly demonstrate the effectiveness of our method compared with state-of-the-art unsupervised learning counterparts.
【3】 Deep Ranking with Adaptive Margin Triplet Loss
Authors: Mai Lan Ha, Volker Blanz Affiliations: University of Siegen Link: https://arxiv.org/abs/2107.06187 Abstract: We propose a simple modification from a fixed margin triplet loss to an adaptive margin triplet loss. While the original triplet loss is used widely in classification problems such as face recognition, face re-identification and fine-grained similarity, our proposed loss is well suited for rating datasets in which the ratings are continuous values. In contrast to the original triplet loss, where we have to sample data carefully, in our method we can generate triplets using the whole dataset, and the optimization can still converge without frequently running into a model collapsing issue. The adaptive margins only need to be computed once before the training, which is much less expensive than generating triplets after every epoch as in the fixed margin case. Besides substantially improved training stability (the proposed model never collapsed in our experiments, compared to a couple of times that the training collapsed on the existing triplet loss), we achieved slightly better performance than the original triplet loss on various rating datasets and network architectures.
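The difference between a fixed and an adaptive margin can be sketched in a few lines. The abstract does not state the margin formula, so the rule used here (margin proportional to how much closer the positive's rating is to the anchor's than the negative's) is purely an illustrative assumption, as are the names `adaptive_margin` and `scale`.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def fixed_margin_triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss with one global margin."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

def adaptive_margin(r_anchor, r_pos, r_neg, scale=1.0):
    """Hypothetical margin rule (an assumption, not taken from the paper):
    the larger the rating gap of the negative relative to the positive,
    the larger the required separation."""
    return scale * (abs(r_anchor - r_neg) - abs(r_anchor - r_pos))

def adaptive_margin_triplet_loss(anchor, positive, negative,
                                 r_anchor, r_pos, r_neg, scale=1.0):
    """Triplet loss whose margin depends only on the ratings, so all
    margins can be precomputed once before training."""
    margin = adaptive_margin(r_anchor, r_pos, r_neg, scale)
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Embeddings and continuous ratings for one triplet.
a, p, n = (0.0, 0.0), (1.0, 0.0), (3.0, 0.0)
fixed = fixed_margin_triplet_loss(a, p, n)                        # margin 0.2 is already satisfied
adaptive = adaptive_margin_triplet_loss(a, p, n, 5.0, 4.5, 2.0)   # margin 2.5 is not yet satisfied
```

Because the margin is a function of the ratings alone, it does not change as the embeddings are updated, which is consistent with the abstract's remark that margins are computed once before training.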
【4】 Fast Batch Nuclear-norm Maximization and Minimization for Robust Domain Adaptation
Authors: Shuhao Cui, Shuhui Wang, Junbao Zhuo, Liang Li, Qingming Huang, Qi Tian Comments: TPAMI under review. arXiv admin note: text overlap with arXiv:2003.12237 Link: https://arxiv.org/abs/2107.06154 Abstract: Due to the domain discrepancy in visual domain adaptation, the performance of a source model degrades when it encounters high data density near the decision boundary in the target domain. A common solution is to minimize the Shannon entropy to push the decision boundary away from the high-density area. However, entropy minimization also leads to a severe reduction of prediction diversity, and unfortunately brings harm to the domain adaptation. In this paper, we investigate the prediction discriminability and diversity by studying the structure of the classification output matrix of a randomly selected data batch. We find by theoretical analysis that the prediction discriminability and diversity could be separately measured by the Frobenius norm and rank of the batch output matrix. The nuclear norm is an upper bound of the former, and a convex approximation of the latter. Accordingly, we propose Batch Nuclear-norm Maximization and Minimization, which performs nuclear-norm maximization on the target output matrix to enhance the target prediction ability, and nuclear-norm minimization on the source batch output matrix to increase the applicability of the source domain knowledge. We further approximate the nuclear norm by the L_{1,2}-norm, and design multi-batch optimization for a stable solution on a large number of categories. The fast approximation method achieves O(n^2) computational complexity and better convergence properties. Experiments show that our method could boost the adaptation accuracy and robustness under three typical domain adaptation scenarios. The code is available at https://github.com/cuishuhao/BNM.
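The relationship between the Frobenius norm and the nuclear norm of a batch output matrix can be checked numerically. This is a minimal sketch that assumes a two-class problem, because for a two-column matrix the nuclear norm has the closed form sqrt(tr(G) + 2*sqrt(det(G))) with G = A^T A; the general case needs an SVD. The function names and the three toy batches are assumptions made for illustration and are not taken from the BNM code.

```python
import math

def gram_2col(batch):
    """Entries of G = A^T A for a batch of rows (p0, p1)."""
    a = sum(p0 * p0 for p0, _ in batch)
    b = sum(p0 * p1 for p0, p1 in batch)
    c = sum(p1 * p1 for _, p1 in batch)
    return a, b, c

def frobenius_norm(batch):
    a, _, c = gram_2col(batch)
    return math.sqrt(a + c)

def nuclear_norm_2col(batch):
    """Sum of the two singular values s1 + s2, using
    (s1 + s2)^2 = tr(G) + 2 * sqrt(det(G)). Valid for two columns only."""
    a, b, c = gram_2col(batch)
    det = max(0.0, a * c - b * b)  # clamp tiny negative rounding errors
    return math.sqrt(a + c + 2.0 * math.sqrt(det))

# Three batches of softmax outputs, four samples each.
confident_diverse = [(1.0, 0.0), (0.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
confident_collapsed = [(1.0, 0.0)] * 4   # every sample assigned to class 0
uncertain = [(0.5, 0.5)] * 4             # predictions on the decision boundary

norms = {name: (frobenius_norm(m), nuclear_norm_2col(m))
         for name, m in [("diverse", confident_diverse),
                         ("collapsed", confident_collapsed),
                         ("uncertain", uncertain)]}
```

In this toy setting the nuclear norm is largest for the batch that is both confident and diverse, whereas the Frobenius norm cannot tell the diverse batch from the collapsed one.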
【5】 Force-in-domain GAN inversion
Authors: Guangjie Leng, Yeku Zhu, Zhi-Qin John Xu Affiliations: Institute of Natural Sciences and School of Mathematical Sciences and MOE-LSC, Shanghai Jiao Tong University; Qing Yuan Research Institute, Shanghai Jiao Tong University Link: https://arxiv.org/abs/2107.06050 Abstract: Empirical works suggest that various semantics emerge in the latent space of Generative Adversarial Networks (GANs) when being trained to generate images. To perform real image editing, an accurate mapping from the real image to the latent space is required to leverage these learned semantics, which is important yet difficult. An in-domain GAN inversion approach was recently proposed to constrain the inverted code within the latent space by forcing the reconstructed image obtained from the inverted code to lie within the real image space. Empirically, we find that the inverted code produced by the in-domain GAN can deviate from the latent space significantly. To solve this problem, we propose a force-in-domain GAN based on the in-domain GAN, which utilizes a discriminator to force the inverted code within the latent space. The force-in-domain GAN can also be interpreted as a cycle-GAN with slight modification. Extensive experiments show that our force-in-domain GAN not only reconstructs the target image at the pixel level, but also aligns the inverted code well with the latent space for semantic editing.
【6】 Detect and Defense Against Adversarial Examples in Deep Learning using Natural Scene Statistics and Adaptive Denoising
Authors: Anouar Kherchouche, Sid Ahmed Fezza, Wassim Hamidouche Link: https://arxiv.org/abs/2107.05780 Abstract: Despite the enormous performance of deep neural networks (DNNs), recent studies have shown their vulnerability to adversarial examples (AEs), i.e., carefully perturbed inputs designed to fool the targeted DNN. Currently, the literature is rich with many effective attacks to craft such AEs. Meanwhile, many defense strategies have been developed to mitigate this vulnerability. However, the latter have shown effectiveness against specific attacks and do not generalize well to different attacks. In this paper, we propose a framework for defending DNN classifiers against adversarial samples. The proposed method is based on a two-stage framework involving a separate detector and a denoising block. The detector aims to detect AEs by characterizing them through the use of natural scene statistics (NSS), where we demonstrate that these statistical features are altered by the presence of adversarial perturbations. The denoiser is based on a block matching 3D (BM3D) filter fed by an optimum threshold value estimated by a convolutional neural network (CNN) to project the samples detected as AEs back into their data manifold. We conducted a complete evaluation on three standard datasets, namely MNIST, CIFAR-10 and Tiny-ImageNet. The experimental results show that the proposed defense method outperforms the state-of-the-art defense techniques by improving the robustness against a set of attacks under black-box, gray-box and white-box settings. The source code is available at: https://github.com/kherchouche-anouar/2DAE
Semi-/Weakly/Unsupervised | Active Learning | Uncertainty (3 papers)
【1】 Retrieve in Style: Unsupervised Facial Feature Transfer and Retrieval
Authors: Min Jin Chong, Wen-Sheng Chu, Abhishek Kumar Affiliations: University of Illinois at Urbana-Champaign; Google Research Comments: Code is here this https URL Link: https://arxiv.org/abs/2107.06256 Abstract: We present Retrieve in Style (RIS), an unsupervised framework for fine-grained facial feature transfer and retrieval on real images. Recent work shows that it is possible to learn a catalog that allows local semantic transfers of facial features on generated images by capitalizing on the disentanglement property of the StyleGAN latent space. RIS improves existing art on: 1) feature disentanglement, allowing for challenging transfers (i.e., hair and pose) that were not shown possible in SoTA methods; 2) eliminating the need for per-image hyperparameter tuning, and for computing a catalog over a large batch of images; 3) enabling face retrieval using the proposed facial features (e.g., eyes), which, to the best of our knowledge, makes this the first work to retrieve face images at the fine-grained level; 4) robustness and natural application to real images. Our qualitative and quantitative analyses show RIS achieves both high-fidelity feature transfers and accurate fine-grained retrievals on real images. We discuss the responsible application of RIS.
【2】 Kit-Net: Self-Supervised Learning to Kit Novel 3D Objects into Novel 3D Cavities
Authors: Shivin Devgon, Jeffrey Ichnowski, Michael Danielczuk, Daniel S. Brown, Ashwin Balakrishna, Shirin Joshi, Eduardo M. C. Rocha, Eugen Solowjow, Ken Goldberg Affiliations: The AUTOLAB at the University of California Link: https://arxiv.org/abs/2107.05789 Abstract: In industrial part kitting, 3D objects are inserted into cavities for transportation or subsequent assembly. Kitting is a critical step as it can decrease downstream processing and handling times and enable lower storage and shipping costs. We present Kit-Net, a framework for kitting previously unseen 3D objects into cavities given depth images of both the target cavity and an object held by a gripper in an unknown initial orientation. Kit-Net uses self-supervised deep learning and data augmentation to train a convolutional neural network (CNN) to robustly estimate 3D rotations between objects and matching concave or convex cavities using a large training dataset of simulated depth image pairs. Kit-Net then uses the trained CNN to implement a controller to orient and position novel objects for insertion into novel prismatic and conformal 3D cavities. Experiments in simulation suggest that Kit-Net can orient objects to have a 98.9% average intersection volume between the object mesh and that of the target cavity. Physical experiments with industrial objects succeed in 18% of trials using a baseline method and in 63% of trials with Kit-Net. Video, code, and data are available at https://github.com/BerkeleyAutomation/Kit-Net.
【3】 SoftHebb: Bayesian inference in unsupervised Hebbian soft winner-take-all networks
Authors: Timoleon Moraitis, Dmitry Toichkin, Yansong Chua, Qinghai Guo Affiliations: Huawei - Zurich Research Center, Zurich, Switzerland, Moscow, Russia, Laboratories, Huawei Technologies, Shenzhen, China Link: https://arxiv.org/abs/2107.05747 Abstract: State-of-the-art artificial neural networks (ANNs) require labelled data or feedback between layers, are often biologically implausible, and are vulnerable to adversarial attacks that humans are not susceptible to. On the other hand, Hebbian learning in winner-take-all (WTA) networks is unsupervised, feed-forward, and biologically plausible. However, an objective optimization theory for WTA networks has been missing, except under very limiting assumptions. Here we formally derive such a theory, based on biologically plausible but generic ANN elements. Through Hebbian learning, network parameters maintain a Bayesian generative model of the data. There is no supervisory loss function, but the network does minimize cross-entropy between its activations and the input distribution. The key is a "soft" WTA where there is no absolute "hard" winner neuron, and a specific type of Hebbian-like plasticity of weights and biases. We confirm our theory in practice, where, in handwritten digit (MNIST) recognition, our Hebbian algorithm, SoftHebb, minimizes cross-entropy without having access to it, and outperforms the more frequently used, hard-WTA-based method. Strikingly, it even outperforms supervised end-to-end backpropagation, under certain conditions. Specifically, in a two-layered network, SoftHebb outperforms backpropagation when the training dataset is only presented once, when the testing data is noisy, and under gradient-based adversarial attacks. Adversarial attacks that confuse SoftHebb are also confusing to the human eye. Finally, the model can generate interpolations of objects from its input distribution.
Medical (3 papers)
【1】 Scalable, Axiomatic Explanations of Deep Alzheimer's Diagnosis from Heterogeneous Data
Authors: Sebastian Pölsterl, Christina Aigner, Christian Wachinger Affiliations: Artificial Intelligence in Medical Imaging (AI-Med), Department of Child and Adolescent Psychiatry, Ludwig-Maximilians-Universität, Munich, Germany Comments: Accepted at 2021 International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) Link: https://arxiv.org/abs/2107.05997 Abstract: Deep Neural Networks (DNNs) have an enormous potential to learn from complex biomedical data. In particular, DNNs have been used to seamlessly fuse heterogeneous information from neuroanatomy, genetics, biomarkers, and neuropsychological tests for highly accurate Alzheimer's disease diagnosis. On the other hand, their black-box nature is still a barrier for the adoption of such a system in the clinic, where interpretability is absolutely essential. We propose Shapley Value Explanation of Heterogeneous Neural Networks (SVEHNN) for explaining the Alzheimer's diagnosis made by a DNN from the 3D point cloud of the neuroanatomy and tabular biomarkers. Our explanations are based on the Shapley value, which is the unique method that satisfies all fundamental axioms for local explanations previously established in the literature. Thus, SVEHNN has many desirable characteristics that previous work on interpretability for medical decision making lacks. To avoid the exponential time complexity of the Shapley value, we propose to transform a given DNN into a Lightweight Probabilistic Deep Network without re-training, thus achieving a complexity only quadratic in the number of features. In our experiments on synthetic and real data, we show that we can closely approximate the exact Shapley value with a dramatically reduced runtime and can reveal the hidden knowledge the network has learned from the data.
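The exponential cost mentioned in the abstract above comes from the definition of the Shapley value, which averages a feature's marginal contribution over all subsets of the other features. The sketch below computes exact Shapley values for a toy value function. It is not SVEHNN, and the value function `toy_value` is a made-up stand-in for a model's output when only some features are present.

```python
from itertools import combinations
from math import factorial

def shapley_values(n_features, value):
    """Exact Shapley values. `value` maps a frozenset of feature indices
    to a number. Evaluates on the order of 2^n subsets per feature."""
    n = n_features
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Weight of coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def toy_value(present):
    """Hypothetical model output: features 0 and 1 act additively,
    features 0 and 2 only contribute together (an interaction)."""
    out = 0.0
    if 0 in present:
        out += 2.0
    if 1 in present:
        out += 1.0
    if 0 in present and 2 in present:
        out += 3.0
    return out

phi = shapley_values(3, toy_value)
```

The interaction term is split equally between the two features that produce it, and the attributions sum to the difference between the full and the empty input (the efficiency axiom), which is one of the axioms the abstract alludes to.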
【2】 Attention based CNN-LSTM Network for Pulmonary Embolism Prediction on Chest Computed Tomography Pulmonary Angiograms
Authors: Sudhir Suman, Gagandeep Singh, Nicole Sakla, Rishabh Gattu, Jeremy Green, Tej Phatak, Dimitris Samaras, Prateek Prasanna Affiliations: Dept. of Electrical Engineering, Indian Institute of Technology, Bombay, India; Dept. of Radiology, Newark Beth Israel Medical Center, NJ, USA; Dept. of Computer Science, Stony Brook University, NY, USA Comments: This work will be presented at MICCAI 2021 Link: https://arxiv.org/abs/2107.06276 Abstract: With more than 60,000 deaths annually in the United States, Pulmonary Embolism (PE) is among the most fatal cardiovascular diseases. It is caused by an artery blockage in the lung; confirming its presence is time-consuming and is prone to over-diagnosis. The utilization of automated PE detection systems is critical for diagnostic accuracy and efficiency. In this study we propose a two-stage attention-based CNN-LSTM network for predicting PE, its associated type (chronic, acute) and corresponding location (left-sided, right-sided or central) on computed tomography (CT) examinations. We trained our model on the largest available public Computed Tomography Pulmonary Angiogram PE dataset (RSNA-STR Pulmonary Embolism CT (RSPECT) Dataset, N=7279 CT studies) and tested it on an in-house curated dataset of N=106 studies. Our framework mirrors the radiologic diagnostic process via a multi-slice approach so that the accuracy and pathologic sequela of true pulmonary emboli may be meticulously assessed, enabling physicians to better appraise the morbidity of a PE when present. Our proposed method outperformed a baseline CNN classifier and a single-stage CNN-LSTM network, achieving an AUC of 0.95 on the test set for detecting the presence of PE in the study.
【3】 A Survey of Applications of Artificial Intelligence for Myocardial Infarction Disease Diagnosis
Authors: Javad Hassannataj Joloudari, Sanaz Mojrian, Issa Nodehi, Amir Mashmool, Zeynab Kiani Zadegan, Sahar Khanjani Shirkharkolaie, Tahereh Tamadon, Samiyeh Khosravi, Mitra Akbari, Edris Hassannataj, Roohallah Alizadehsani, Danial Sharifrazi, Amir Mosavi Affiliations: Department of Computer Engineering, University of Birjand, Birjand, Iran; Department of Information Technology Engineering, Mazandaran University of Science and Technology, Babol, Iran; Department of Computer Engineering, University of Qom, Qom, Iran Comments: 18 pages, 7 figures Link: https://arxiv.org/abs/2107.06179 Abstract: Myocardial infarction disease (MID) is caused by the rapid progression of undiagnosed coronary artery disease (CAD), which injures heart cells by decreasing the blood flow to the cardiac muscles. MID is the leading cause of death in middle-aged and elderly subjects all over the world. In general, clinicians examine raw electrocardiogram (ECG) signals to identify MID, which is exhausting, time-consuming, and expensive. Artificial intelligence-based methods have been proposed to diagnose MID from ECG signals automatically. Hence, in this survey paper, artificial intelligence-based methods, including machine learning (ML) and deep learning (DL), are reviewed for MID diagnosis on ECG signals. The surveyed work shows that feature extraction and selection for ECG signals have to be handcrafted in the ML methods. In contrast, these tasks are explored automatically in the DL methods. To the best of our knowledge, Deep Convolutional Neural Network (DCNN) methods are in high demand for the early diagnosis of MID on ECG signals. Most researchers have tended to use DCNN methods, and no prior study has surveyed the use of artificial intelligence methods for MID diagnosis on ECG signals.
GAN | Adversarial | Attacks | Generation (4 papers)
【1】 Generative Adversarial Learning via Kernel Density Discrimination
Authors: Abdelhak Lemkhenter, Adam Bielski, Alp Eren Sari, Paolo Favaro Affiliations: Institute of Computer Science, University of Bern Link: https://arxiv.org/abs/2107.06197 Abstract: We introduce Kernel Density Discrimination GAN (KDD GAN), a novel method for generative adversarial learning. KDD GAN formulates the training as a likelihood ratio optimization problem where the data distributions are written explicitly via (local) Kernel Density Estimates (KDE). This is inspired by the recent progress in contrastive learning and its relation to KDE. We define the KDEs directly in feature space and forgo the requirement of invertibility of the kernel feature mappings. In our approach, features are no longer optimized for linear separability, as in the original GAN formulation, but for the more general discrimination of distributions in the feature space. We analyze the gradient of our loss with respect to the feature representation and show that it is better behaved than that of the original hinge loss. We perform experiments with the proposed KDE-based loss, used either as a training loss or a regularization term, on both CIFAR10 and scaled versions of ImageNet. We use BigGAN/SA-GAN as a backbone and baseline, since our focus is not to design the architecture of the networks. We show a boost of 10% to 40% in the quality of generated samples, measured by FID, compared to the baseline. Code will be made available.
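A likelihood ratio built from kernel density estimates in feature space can be sketched as follows. This is only an illustration of the general idea in the abstract above, not the KDD GAN loss: the Gaussian kernel, the bandwidth value, and the hand-written 2D "features" are all assumptions made here.

```python
import math

def gaussian_kde(x, samples, bandwidth=0.5):
    """Gaussian kernel density estimate at point x from a list of samples."""
    d = len(x)
    norm = (2.0 * math.pi * bandwidth ** 2) ** (d / 2.0)
    total = 0.0
    for s in samples:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, s))
        total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return total / (len(samples) * norm)

def log_density_ratio(x, real_feats, fake_feats, bandwidth=0.5, eps=1e-12):
    """log p_real(x) - log p_fake(x), both estimated with a KDE.
    Positive values mean x looks more like the real features."""
    p_real = gaussian_kde(x, real_feats, bandwidth)
    p_fake = gaussian_kde(x, fake_feats, bandwidth)
    return math.log(p_real + eps) - math.log(p_fake + eps)

# Hypothetical 2D features of real and generated samples.
real_feats = [(0.0, 0.0), (0.2, -0.1), (-0.1, 0.1)]
fake_feats = [(2.0, 2.0), (2.1, 1.9), (1.8, 2.2)]

near_real = log_density_ratio((0.1, 0.0), real_feats, fake_feats)
near_fake = log_density_ratio((2.0, 2.1), real_feats, fake_feats)
```

Unlike a linear discriminator, the ratio depends on the full shape of the two feature distributions, which matches the abstract's point that features need not be linearly separable.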
【2】 This Person (Probably) Exists. Identity Membership Attacks Against GAN Generated Faces
Authors: Ryan Webster, Julien Rabin, Loic Simon, Frederic Jurie Affiliations: University of Caen Normandie; ENSI Caen Link: https://arxiv.org/abs/2107.06018 Abstract: Recently, generative adversarial networks (GANs) have achieved stunning realism, fooling even human observers. Indeed, the popular tongue-in-cheek website http://thispersondoesnotexist.com taunts users with GAN-generated images that seem too real to believe. On the other hand, GANs do leak information about their training data, as evidenced by membership attacks recently demonstrated in the literature. In this work, we challenge the assumption that GAN faces really are novel creations, by constructing a successful membership attack of a new kind. Unlike previous works, our attack can accurately discern samples sharing the same identity as training samples without being the same samples. We demonstrate the interest of our attack across several popular face datasets and GAN training procedures. Notably, we show that even in the presence of significant dataset diversity, an over-represented person can pose a privacy concern.
【3】 EvoBA: An Evolution Strategy as a Strong Baseline for Black-Box Adversarial Attacks
Authors: Andrei Ilie, Marius Popescu, Alin Stefanescu Affiliations: University of Bucharest, Romania Link: https://arxiv.org/abs/2107.05754 Abstract: Recent work has shown how easily white-box adversarial attacks can be applied to state-of-the-art image classifiers. However, real-life scenarios more closely resemble black-box adversarial conditions, lacking transparency and usually imposing natural, hard constraints on the query budget. We propose EvoBA, a black-box adversarial attack based on a surprisingly simple evolutionary search strategy. EvoBA is query-efficient, minimizes $L_0$ adversarial perturbations, and does not require any form of training. EvoBA shows efficiency and efficacy through results that are in line with much more complex state-of-the-art black-box attacks such as AutoZOOM. It is more query-efficient than SimBA, a simple and powerful baseline black-box attack, and has a similar level of complexity. Therefore, we propose it both as a new strong baseline for black-box adversarial attacks and as a fast and general tool for gaining empirical insight into how robust image classifiers are with respect to $L_0$ adversarial perturbations. There exist fast and reliable $L_2$ black-box attacks, such as SimBA, and $L_{\infty}$ black-box attacks, such as DeepSearch. We propose EvoBA as a query-efficient $L_0$ black-box adversarial attack which, together with the aforementioned methods, can serve as a generic tool to assess the empirical robustness of image classifiers. The main advantages of such methods are that they run fast, are query-efficient, and can easily be integrated into image classifier development pipelines. While our attack minimises the $L_0$ adversarial perturbation, we also report $L_2$, and notice that we compare favorably to the state-of-the-art $L_2$ black-box attack, AutoZOOM, and to the $L_2$ strong baseline, SimBA.
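An evolutionary black-box search of the kind described above can be sketched on a toy problem. The abstract does not specify the mutation or selection rules, so everything below is an assumption: each generation mutates a few randomly chosen pixels of the current best candidate, queries the classifier once per child, and keeps the child that most reduces the probability of the true class. `toy_classifier` is a hypothetical stand-in for a real image classifier.

```python
import random

def toy_classifier(pixels):
    """Hypothetical black box: returns [P(class 0), P(class 1)]
    from the mean brightness of pixels in [0, 1]."""
    mean = sum(pixels) / len(pixels)
    return [1.0 - mean, mean]

def argmax(values):
    return max(range(len(values)), key=values.__getitem__)

def evolutionary_l0_attack(predict, image, true_label, pixels_per_child=2,
                           children=8, max_queries=400, seed=0):
    """Sketch of an evolutionary L0-style attack: only a few pixels change
    per mutation, and only output probabilities are used (no gradients)."""
    rng = random.Random(seed)
    parent = list(image)
    parent_probs = predict(parent)
    queries = 1
    while queries + children <= max_queries:
        best, best_probs = parent, parent_probs
        for _ in range(children):
            child = list(parent)
            for idx in rng.sample(range(len(child)), pixels_per_child):
                child[idx] = rng.random()  # overwrite a pixel with a random value
            probs = predict(child)
            queries += 1
            if probs[true_label] < best_probs[true_label]:
                best, best_probs = child, probs
        parent, parent_probs = best, best_probs  # keep the fittest candidate
        if argmax(parent_probs) != true_label:
            return parent, queries, True
    return parent, queries, False

image = [0.2] * 16  # a dark 16-pixel "image" that the toy model labels as class 0
adversarial, used, success = evolutionary_l0_attack(toy_classifier, image, true_label=0)
l0 = sum(1 for a, b in zip(image, adversarial) if a != b)  # number of changed pixels
```

The returned query count and the number of changed pixels correspond to the two quantities the abstract emphasizes: query efficiency and the $L_0$ size of the perturbation.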
【4】 Hidden Convexity of Wasserstein GANs: Interpretable Generative Models with Closed-Form Solutions
Authors: Arda Sahiner, Tolga Ergen, Batu Ozturkler, Burak Bartan, John Pauly, Morteza Mardani, Mert Pilanci Affiliation: Department of Electrical Engineering, Stanford University Comments: First two authors contributed equally to this work; 30 pages, 11 figures Link: https://arxiv.org/abs/2107.05680 Abstract: Generative Adversarial Networks (GANs) are commonly used for modeling complex distributions of data. Both the generators and discriminators of GANs are often modeled by neural networks, posing a non-transparent optimization problem which is non-convex and non-concave over the generator and discriminator, respectively. Such networks are often heuristically optimized with gradient descent-ascent (GDA), but it is unclear whether the optimization problem contains any saddle points, or whether heuristic methods can find them in practice. In this work, we analyze the training of Wasserstein GANs with two-layer neural network discriminators through the lens of convex duality, and for a variety of generators expose the conditions under which Wasserstein GANs can be solved exactly with convex optimization approaches, or can be represented as convex-concave games. Using this convex duality interpretation, we further demonstrate the impact of different activation functions of the discriminator. Our observations are verified with numerical results demonstrating the power of the convex interpretation, with applications in progressive training of convex architectures corresponding to linear generators and quadratic-activation discriminators for CelebA image generation. 
The code for our experiments is available at https://github.com/ardasahiner/ProCoGAN.
Attention (2 papers)
【1】 Locally Enhanced Self-Attention: Rethinking Self-Attention as Local and Context Terms
Authors: Chenglin Yang, Siyuan Qiao, Adam Kortylewski, Alan Yuille Affiliation: Johns Hopkins University Link: https://arxiv.org/abs/2107.05637 Abstract: Self-Attention has become prevalent in computer vision models. Inspired by fully connected Conditional Random Fields (CRFs), we decompose it into local and context terms. They correspond to the unary and binary terms in CRF and are implemented by attention mechanisms with projection matrices. We observe that the unary terms only make small contributions to the outputs, and meanwhile standard CNNs that rely solely on the unary terms achieve strong performance on a variety of tasks. Therefore, we propose Locally Enhanced Self-Attention (LESA), which enhances the unary term by incorporating it with convolutions, and utilizes a fusion module to dynamically couple the unary and binary operations. In our experiments, we replace the self-attention modules with LESA. The results on ImageNet and COCO show the superiority of LESA over convolution and self-attention baselines for the tasks of image recognition, object detection, and instance segmentation. The code is made publicly available.
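One way to read the local/context decomposition in the LESA abstract is sketched below for plain single-head attention: the attention weight a token places on itself gives the local (unary) term, and the weights on all other tokens give the context (binary) term. This is an illustrative assumption about the split; it omits the projection matrices, the convolutional enhancement, and the fusion module, none of which are specified in the abstract.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_terms(queries, keys, values):
    """Split single-head self-attention output into local and context terms.

    For token i with attention weights a_ij:
      local_i   = a_ii * v_i                (unary term)
      context_i = sum over j != i of a_ij * v_j   (binary term)
    Returns (local, context); their element-wise sum is the usual output.
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    dim = len(values[0])
    local, context = [], []
    for i, q in enumerate(queries):
        weights = softmax([dot(q, k) * scale for k in keys])
        local.append([weights[i] * values[i][d] for d in range(dim)])
        context.append([
            sum(weights[j] * values[j][d] for j in range(len(values)) if j != i)
            for d in range(dim)
        ])
    return local, context
```

With many tokens and roughly uniform weights, each `a_ii` is about 1/N, which is consistent with the abstract's observation that the unary term contributes little to the output.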
【2】 Attention-Guided Progressive Neural Texture Fusion for High Dynamic Range Image Restoration
Authors: Jie Chen, Zaifeng Yang, Tsz Nam Chan, Hui Li, Junhui Hou, Lap-Pui Chau Link: https://arxiv.org/abs/2107.06211 Abstract: High Dynamic Range (HDR) imaging via multi-exposure fusion is an important task for most modern imaging platforms. In spite of recent developments in both hardware and algorithm innovations, challenges remain over content association ambiguities caused by saturation, motion, and various artifacts introduced during multi-exposure fusion such as ghosting, noise, and blur. In this work, we propose an Attention-guided Progressive Neural Texture Fusion (APNT-Fusion) HDR restoration model which aims to address these issues within one framework. An efficient two-stream structure is proposed which separately focuses on texture feature transfer over saturated regions and multi-exposure tonal and texture feature fusion. A neural feature transfer mechanism is proposed which establishes spatial correspondence between different exposures based on multi-scale VGG features in the masked saturated HDR domain for discriminative contextual clues over the ambiguous image areas. A progressive texture blending module is designed to blend the encoded two-stream features in a multi-scale and progressive manner. 
In addition, we introduce several novel attention mechanisms, i.e., the motion attention module detects and suppresses the content discrepancies among the reference images; the saturation attention module facilitates differentiating the misalignment caused by saturation from those caused by motion; and the scale attention module ensures texture blending consistency between different coder/decoder scales. We carry out comprehensive qualitative and quantitative evaluations and ablation studies, which validate that these novel modules work coherently under the same framework and outperform state-of-the-art methods.
Face | Crowd Counting (1 paper)
【1】 Everybody Is Unique: Towards Unbiased Human Mesh Recovery
Authors: Ren Li, Meng Zheng, Srikrishna Karanam, Terrence Chen, Ziyan Wu Affiliation: United Imaging Intelligence, Cambridge MA Comments: 10 pages, 5 figures, 4 tables Link: https://arxiv.org/abs/2107.06239 Abstract: We consider the problem of obese human mesh recovery, i.e., fitting a parametric human mesh to images of obese people. Despite obese person mesh fitting being an important problem with numerous applications (e.g., healthcare), much recent progress in mesh recovery has been restricted to images of non-obese people. In this work, we identify this crucial gap in the current literature by presenting and discussing limitations of existing algorithms. Next, we present a simple baseline to address this problem that is scalable and can be easily used in conjunction with existing algorithms to improve their performance. Finally, we present a generalized human mesh optimization algorithm that substantially improves the performance of existing methods on both obese person images and community-standard benchmark datasets. A key innovation of this technique is that it does not rely on supervision from expensive-to-create mesh parameters. Instead, starting from widely and cheaply available 2D keypoint annotations, our method automatically generates mesh parameters that can in turn be used to re-train and fine-tune any existing mesh estimation algorithm. This way, we show our method acts as a drop-in to improve the performance of a wide variety of contemporary mesh estimation methods. 
We conduct extensive experiments on multiple datasets comprising both standard and obese person images and demonstrate the efficacy of our proposed techniques.
Tracking (1 paper)
【1】 Object Tracking and Geo-localization from Street Images
Authors: Daniel Wilson, Thayer Alshaabi, Colin Van Oort, Xiaohan Zhang, Jonathan Nelson, Safwan Wshah Comments: 28 pages, 7 figures, to be submitted to Elsevier Pattern Recognition Link: https://arxiv.org/abs/2107.06257 Abstract: Geo-localizing static objects from street images is challenging but also very important for road asset mapping and autonomous driving. In this paper we present a two-stage framework that detects and geolocalizes traffic signs from low frame rate street videos. Our proposed system uses a modified version of RetinaNet (GPS-RetinaNet), which predicts a positional offset for each sign relative to the camera, in addition to performing the standard classification and bounding box regression. Candidate sign detections from GPS-RetinaNet are condensed into geolocalized signs by our custom tracker, which consists of a learned metric network and a variant of the Hungarian Algorithm. Our metric network estimates the similarity between pairs of detections, then the Hungarian Algorithm matches detections across images using the similarity scores provided by the metric network. Our models were trained using an updated version of the ARTS dataset, which contains 25,544 images and 47,589 sign annotations \cite{arts}. The proposed dataset covers a diverse set of environments gathered from a broad selection of roads. 
Each annotation contains a sign class label, its geospatial location, an assembly label, a side-of-road indicator, and unique identifiers that aid in the evaluation. This dataset will support future progress in the field, and the proposed system demonstrates how to take advantage of some of the unique characteristics of a realistic geolocalization dataset.
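The matching step described in the abstract, pairing detections across images by similarity score, can be sketched as an assignment problem. The version below is a brute-force stand-in for the Hungarian Algorithm that returns the same optimal assignment on tiny inputs; the `similarity` matrix is a hypothetical output of a metric network, and the paper's actual variant of the algorithm is not reproduced here.

```python
from itertools import permutations

def match_detections(similarity):
    """Find the one-to-one pairing that maximizes total similarity.

    similarity[i][j] is an assumed score for pairing detection i in frame A
    with detection j in frame B. Brute force over permutations, so this only
    suits very small inputs and assumes len(A) <= len(B).
    Returns (pairs, total) where pairs is a list of (i, j) tuples.
    """
    n_a, n_b = len(similarity), len(similarity[0])
    best_total, best_pairs = float("-inf"), []
    for cols in permutations(range(n_b), n_a):
        total = sum(similarity[i][j] for i, j in enumerate(cols))
        if total > best_total:
            best_total, best_pairs = total, list(enumerate(cols))
    return best_pairs, best_total
```

For the scores `[[0.9, 0.8], [0.85, 0.1]]`, a greedy matcher would take the 0.9 pair first and be left with 0.1, while the optimal assignment pairs 0 with 1 and 1 with 0. This is the reason to use an assignment solver rather than greedy matching.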
Image and Video Retrieval | Re-ID (1 paper)
【1】 'CADSketchNet' -- An Annotated Sketch dataset for 3D CAD Model Retrieval with Deep Neural Networks
Authors: Bharadwaj Manda, Shubham Dhayarkar, Sai Mitheran, V. K. Viekash, Ramanathan Muthuganapathy Affiliation: Indian Institute of Technology Madras, National Institute of Technology Tiruchirappalli Comments: Computers & Graphics Journal, Special Section on 3DOR 2021 Link: https://arxiv.org/abs/2107.06212 Abstract: Ongoing advancements in the fields of 3D modelling and digital archiving have led to an outburst in the amount of data stored digitally. Consequently, several retrieval systems have been developed depending on the type of data stored in these databases. However, unlike text data or images, performing a search for 3D models is non-trivial. Among 3D models, retrieving 3D Engineering/CAD models or mechanical components is even more challenging due to the presence of holes, volumetric features, sharp edges, etc., which make CAD a domain unto itself. The research work presented in this paper aims at developing a dataset suitable for building a retrieval system for 3D CAD models based on deep learning. 3D CAD models from the available CAD databases are collected, and a dataset of computer-generated sketch data, termed 'CADSketchNet', has been prepared. Additionally, hand-drawn sketches of the components are also added to CADSketchNet. Using the sketch images from this dataset, the paper also aims at evaluating the performance of various retrieval systems or search engines for 3D CAD models that accept a sketch image as the input query. Many experimental models are constructed and tested on CADSketchNet. 
These experiments, together with the model architectures and the choice of similarity metrics, are reported along with the search results.
Representation Learning (1 paper)
【1】 On Designing Good Representation Learning Models
Authors: Qinglin Li, Bin Li, Jonathan M Garibaldi, Guoping Qiu Comments: 15 pages Link: https://arxiv.org/abs/2107.05948 Abstract: The goal of representation learning is different from the ultimate objective of machine learning such as decision making; it is therefore very difficult to establish clear and direct objectives for training representation learning models. It has been argued that a good representation should disentangle the underlying variation factors, yet how to translate this into training objectives remains unknown. This paper presents an attempt to establish direct training criteria and design principles for developing good representation learning models. We propose that a good representation learning model should be maximally expressive, i.e., capable of distinguishing the maximum number of input configurations. We formally define expressiveness and introduce the maximum expressiveness (MEXS) theorem of a general learning model. We propose to train a model by maximizing its expressiveness while at the same time incorporating general priors such as model smoothness. We present a conscience competitive learning algorithm which encourages the model to reach its MEXS while at the same time adhering to the model smoothness prior. We also introduce a label consistent training (LCT) technique to boost model smoothness by encouraging it to assign consistent labels to similar samples. 
We present extensive experimental results to show that our method can indeed design representation learning models capable of developing representations that are as good as or better than the state of the art. We also show that our technique is computationally efficient, robust against different parameter settings, and can work effectively on a variety of datasets.
Distillation | Knowledge Extraction (1 paper)
【1】 3D Parametric Wireframe Extraction Based on Distance Fields
Authors: Albert Matveev, Alexey Artemov, Denis Zorin, Evgeny Burnaev Affiliation: New York University, USA and Skolkovo Institute of Science and Technology Link: https://arxiv.org/abs/2107.06165 Abstract: We present a pipeline for parametric wireframe extraction from densely sampled point clouds. Our approach processes a scalar distance field that represents proximity to the nearest sharp feature curve. In intermediate stages, it detects corners, constructs curve segmentation, and builds a topological graph fitted to the wireframe. As an output, we produce parametric spline curves that can be edited and sampled arbitrarily. We evaluate our method on 50 complex 3D shapes and compare it to the novel deep learning-based technique, demonstrating superior quality.
Point Cloud | SLAM | Radar | Laser | Depth and RGB-D (1 paper)
【1】 PU-Flow: a Point Cloud Upsampling Network with Normalizing Flows
Authors: Aihua Mao, Zihui Du, Junhui Hou, Yaqi Duan, Yong-jin Liu, Ying He Affiliation: Hou is with the Department of Computer Science, City University of Hong Kong Link: https://arxiv.org/abs/2107.05893 Abstract: Point cloud upsampling aims to generate dense point clouds from given sparse ones, which is a challenging task due to the irregular and unordered nature of point sets. To address this issue, we present a novel deep learning-based model, called PU-Flow, which incorporates normalizing flows and feature interpolation techniques to produce dense points uniformly distributed on the underlying surface. Specifically, we formulate the upsampling process as point interpolation in a latent space, where the interpolation weights are adaptively learned from local geometric context, and exploit the invertible characteristics of normalizing flows to transform points between Euclidean and latent spaces. We evaluate PU-Flow on a wide range of 3D models with sharp features and high-frequency details. Qualitative and quantitative results show that our method outperforms state-of-the-art deep learning-based approaches in terms of reconstruction quality, proximity-to-surface accuracy, and computation efficiency.
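The idea of interpolating in a latent space and mapping back through an invertible transform can be illustrated with a toy example. Everything here is an assumption for illustration: a fixed element-wise `sinh`/`asinh` pair stands in for a learned normalizing flow, and a fixed `weight` stands in for the adaptively learned interpolation weights.

```python
import math

def to_latent(point):
    """Toy invertible map standing in for a learned normalizing flow."""
    return [math.sinh(c) for c in point]

def from_latent(latent):
    """Exact inverse of to_latent."""
    return [math.asinh(c) for c in latent]

def upsample_pair(p, q, weight=0.5):
    """Create one new point between p and q by interpolating in latent space."""
    zp, zq = to_latent(p), to_latent(q)
    z_new = [(1.0 - weight) * a + weight * b for a, b in zip(zp, zq)]
    return from_latent(z_new)
```

Invertibility is what makes this workable: any interpolated latent code maps back to exactly one point in Euclidean space, so no separate decoder has to be trained to undo the encoding.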
3D | 3D Reconstruction (1 paper)
【1】 Combining 3D Image and Tabular Data via the Dynamic Affine Feature Map Transform
Authors: Sebastian Pölsterl, Tom Nuno Wolf, Christian Wachinger Affiliation: Artificial Intelligence in Medical Imaging (AI-Med), Department of Child and Adolescent Psychiatry, Ludwig-Maximilians-Universität, Munich, Germany Comments: Accepted at 2021 International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) Link: https://arxiv.org/abs/2107.05990 Abstract: Prior work on diagnosing Alzheimer's disease from magnetic resonance images of the brain established that convolutional neural networks (CNNs) can leverage the high-dimensional image information for classifying patients. However, little research has focused on how these models can utilize the usually low-dimensional tabular information, such as patient demographics or laboratory measurements. We introduce the Dynamic Affine Feature Map Transform (DAFT), a general-purpose module for CNNs that dynamically rescales and shifts the feature maps of a convolutional layer, conditional on a patient's tabular clinical information. We show that DAFT is highly effective in combining 3D image and tabular information for diagnosis and time-to-dementia prediction, where it outperforms competing CNNs with a mean balanced accuracy of 0.622 and mean c-index of 0.748, respectively. Our extensive ablation study provides valuable insights into the architectural properties of DAFT. Our implementation is available at https://github.com/ai-med/DAFT.
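The "rescale and shift conditional on tabular data" operation can be sketched as a per-channel affine transform. This is a simplified illustration, not the DAFT implementation linked above: the weight matrices `w_scale` and `w_shift` are hypothetical stand-ins for a learned auxiliary network, and the sketch conditions only on the tabular vector.

```python
def affine_feature_transform(feature_maps, tabular, w_scale, w_shift):
    """Apply a per-channel scale and shift predicted from tabular data.

    feature_maps: list of C channels, each a flat list of voxel values.
    tabular: list of T clinical values for one patient.
    w_scale, w_shift: C x T weight matrices (hypothetical, normally learned).
    Returns alpha_c * F_c + beta_c for every channel c.
    """
    out = []
    for c, channel in enumerate(feature_maps):
        alpha = sum(w * t for w, t in zip(w_scale[c], tabular))
        beta = sum(w * t for w, t in zip(w_shift[c], tabular))
        out.append([alpha * v + beta for v in channel])
    return out
```

Since the scale and shift are shared across all voxels of a channel, the tabular data modulates which feature channels matter for a given patient without altering their spatial layout.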
Other Neural Networks | Deep Learning | Models | Modeling (8 papers)
【1】 Learning Aesthetic Layouts via Visual Guidance
Authors: Qingyuan Zheng, Zhuoru Li, Adam Bargteil Affiliation: University of Maryland Comments: 17 pages Link: https://arxiv.org/abs/2107.06262 Abstract: We explore computational approaches for visual guidance to aid in creating aesthetically pleasing art and graphic design. Our work complements and builds on previous work that developed models for how humans look at images. Our approach comprises three steps. First, we collected a dataset of art masterpieces and labeled the visual fixations with state-of-the-art vision models. Second, we clustered the visual guidance templates of the art masterpieces with unsupervised learning. Third, we developed a pipeline using generative adversarial networks to learn the principles of visual guidance and produce aesthetically pleasing layouts. We show that the aesthetic visual guidance principles can be learned and integrated into a high-dimensional model and can be queried by the features of graphic elements. We evaluate our approach by generating layouts on various drawings and graphic designs. Moreover, our model considers the color and structure of graphic elements when generating layouts. Consequently, we believe our tool, which generates multiple aesthetic layout options in seconds, can help artists create beautiful art and graphic designs.
【2】 Learning a Discriminant Latent Space with Neural Discriminant Analysis
Authors: Mai Lan Ha, Gianni Franchi, Emanuel Aldea, Volker Blanz Affiliation: Department of Computer Science, University of Siegen, Germany, U2IS, ENSTA Paris, Institut Polytechnique de Paris, France, Laboratoire SATIE, Paris-Saclay University Link: https://arxiv.org/abs/2107.06209 Abstract: Discriminative features play an important role in image and object classification and also in other fields of research such as semi-supervised learning, fine-grained classification, and out-of-distribution detection. Inspired by Linear Discriminant Analysis (LDA), we propose an optimization called Neural Discriminant Analysis (NDA) for Deep Convolutional Neural Networks (DCNNs). NDA transforms deep features to become more discriminative and, therefore, improves performance in various tasks. Our proposed optimization has two primary goals for inter- and intra-class variances. The first one is to minimize variances within each individual class. The second goal is to maximize pairwise distances between features coming from different classes. We evaluate our NDA optimization in different research fields: general supervised classification, fine-grained classification, semi-supervised learning, and out-of-distribution detection. We achieve performance improvements in all the fields compared to baseline methods that do not use NDA. Besides, using NDA, we also surpass the state of the art on the four tasks on various testing datasets.
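The two goals named in the abstract, small within-class variance and large pairwise distances between classes, can be written down as two measurable quantities. The sketch below computes both for a batch of features; how NDA actually combines, weights, or normalizes them is not stated in the abstract, so the function only returns the two terms and leaves the loss itself as an open choice.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def discriminant_terms(features, labels):
    """Return (intra, inter) for a batch of feature vectors.

    intra: mean squared distance of each feature to its own class mean.
    inter: mean squared distance over all pairs with different labels.
    A training objective would push intra down and inter up.
    """
    by_class = {}
    for f, y in zip(features, labels):
        by_class.setdefault(y, []).append(f)
    intra_sum = 0.0
    for members in by_class.values():
        dim = len(members[0])
        mean = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        intra_sum += sum(sq_dist(m, mean) for m in members)
    intra = intra_sum / len(features)
    pairs = [
        sq_dist(features[i], features[j])
        for i in range(len(features))
        for j in range(i + 1, len(features))
        if labels[i] != labels[j]
    ]
    inter = sum(pairs) / len(pairs) if pairs else 0.0
    return intra, inter
```

One simple, assumed way to turn these into a single loss would be `intra - lam * inter` with a hypothetical weight `lam`.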
【3】 Scalable Surface Reconstruction with Delaunay-Graph Neural Networks
Authors: Raphael Sulzer, Loic Landrieu, Renaud Marlet, Bruno Vallet Link: https://arxiv.org/abs/2107.06130 Abstract: We introduce a novel learning-based, visibility-aware, surface reconstruction method for large-scale, defect-laden point clouds. Our approach can cope with the scale and variety of point cloud defects encountered in real-life Multi-View Stereo (MVS) acquisitions. Our method relies on a 3D Delaunay tetrahedralization whose cells are classified as inside or outside the surface by a graph neural network and an energy model solvable with a graph cut. Our model, making use of both local geometric attributes and line-of-sight visibility information, is able to learn a visibility model from a small amount of synthetic training data and generalizes to real-life acquisitions. Combining the efficiency of deep learning methods and the scalability of energy-based models, our approach outperforms both learning-based and non-learning-based reconstruction algorithms on two publicly available reconstruction benchmarks.
【4】 MSR-Net: Multi-Scale Relighting Network for One-to-One Relighting
Authors: Sourya Dipta Das, Nisarg A. Shah, Saikat Dutta Affiliation: Jadavpur University, India, IIT Jodhpur, India, IIT Madras, India Comments: Workshop on Differentiable Vision, Graphics, and Physics in Machine Learning at NeurIPS 2020. arXiv admin note: text overlap with arXiv:2102.09242 Link: https://arxiv.org/abs/2107.06125 Abstract: Deep image relighting allows photo enhancement by illumination-specific retouching without human effort, and so it has attracted much interest lately. Most of the existing popular methods available for relighting are run-time intensive and memory inefficient. Keeping these issues in mind, we propose the use of a Stacked Deep Multi-Scale Hierarchical Network, which aggregates features from each image at different scales. Our solution is differentiable and robust for translating image illumination settings from input image to target image. Additionally, we show that a multi-step training approach with two different loss functions can significantly boost performance and achieve a high-quality reconstruction of a relighted image.
【5】 Using Causal Analysis for Conceptual Deep Learning Explanation
Authors: Sumedha Singla, Stephen Wallace, Sofia Triantafillou, Kayhan Batmanghelich Affiliation: Computer Science Department, University of Pittsburgh, USA, University of Pittsburgh School of Medicine, University of Pittsburgh, USA, Department of Biomedical Informatics, University of Pittsburgh, USA Comments: 10 pages, 6 figures Link: https://arxiv.org/abs/2107.06098 Abstract: Model explainability is essential for the creation of trustworthy Machine Learning models in healthcare. An ideal explanation resembles the decision-making process of a domain expert and is expressed using concepts or terminology that is meaningful to the clinicians. To provide such an explanation, we first associate the hidden units of the classifier to clinically relevant concepts. We take advantage of radiology reports accompanying the chest X-ray images to define concepts. We discover sparse associations between concepts and hidden units using a linear sparse logistic regression. To ensure that the identified units truly influence the classifier's outcome, we adopt tools from Causal Inference literature and, more specifically, mediation analysis through counterfactual interventions. Finally, we construct a low-depth decision tree to translate all the discovered concepts into a straightforward decision rule, expressed to the radiologist. We evaluated our approach on a large chest x-ray dataset, where our model produces a global explanation consistent with clinical knowledge.
【6】 A Novel Deep Learning Method for Thermal to Annotated Thermal-Optical Fused Images
Authors: Suranjan Goswami (IEEE Student Member), Satish Kumar Singh (Senior Member, IEEE), Bidyut B. Chaudhuri (Life Fellow, IEEE) Link: https://arxiv.org/abs/2107.05942 Abstract: Thermal images profile the passive radiation of objects and capture them in grayscale images. Such images have a very different distribution of data compared to optical colored images. We present here a work that produces a grayscale thermo-optical fused mask given a thermal input. This is a pioneering deep learning based work since, to the best of our knowledge, there exists no other work on thermal-optical grayscale fusion. Our method is also unique in the sense that the deep learning method we are proposing here works on the Discrete Wavelet Transform (DWT) domain instead of the gray level domain. As a part of this work, we also present a new and unique database for obtaining the region of interest in thermal images based on an existing thermal visual paired database, containing the Region of Interest on 5 different classes of data. Finally, we propose a simple, low-overhead statistical measure for identifying the region of interest in the fused images, which we call the Region of Fusion (RoF). Experiments on the database show encouraging results in identifying the region of interest in the fused images. We also show that they can be processed better in the mixed form rather than with only thermal images.
【7】 ReLLIE: Deep Reinforcement Learning for Customized Low-Light Image Enhancement
Authors: Rongkai Zhang, Lanqing Guo, Siyu Huang, Bihan Wen Affiliation: Nanyang Technological University Comments: Accepted by ACM MM 2021 Link: https://arxiv.org/abs/2107.05830 Abstract: Low-light image enhancement (LLIE) is a pervasive yet challenging problem, since: 1) low-light measurements may vary due to different imaging conditions in practice; 2) images can be enlightened subjectively according to diverse preferences by each individual. To tackle these two challenges, this paper presents a novel deep reinforcement learning based method, dubbed ReLLIE, for customized low-light enhancement. ReLLIE models LLIE as a Markov decision process, i.e., estimating the pixel-wise image-specific curves sequentially and recurrently. Given the reward computed from a set of carefully crafted non-reference loss functions, a lightweight network is proposed to estimate the curves for enlightening a low-light image input. As ReLLIE learns a policy instead of a one-to-one image translation, it can handle various low-light measurements and provide customized enhanced outputs by flexibly applying the policy a different number of times. Furthermore, ReLLIE can easily enhance real-world images with hybrid corruptions, e.g., noise, by using a plug-and-play denoiser. Extensive experiments on various benchmarks demonstrate the advantages of ReLLIE compared to the state-of-the-art methods.
【8】 AlterSGD: Finding Flat Minima for Continual Learning by Alternative Training
Authors: Zhongzhan Huang, Mingfu Liang, Senwei Liang, Wei He Affiliation: Tsinghua University, Northwestern University, Purdue University, Nanyang Technological University Link: https://arxiv.org/abs/2107.05804 Abstract: Deep neural networks suffer from catastrophic forgetting when learning multiple tasks sequentially, and a growing number of approaches have been proposed to mitigate this problem. Some of these methods achieved considerable performance by associating flat local minima with forgetting mitigation in continual learning. However, they inevitably need (1) tedious hyperparameter tuning, and (2) additional computational cost. To alleviate these problems, in this paper, we propose a simple yet effective optimization method, called AlterSGD, to search for flat minima in the loss landscape. In AlterSGD, we conduct gradient descent and ascent alternately when the network tends to converge at each session of learning new knowledge. Moreover, we theoretically prove that such a strategy can encourage the optimization to converge to flat minima. We verify AlterSGD on a continual learning benchmark for semantic segmentation, and the empirical results show that we can significantly mitigate forgetting and outperform the state-of-the-art methods by a large margin under challenging continual learning protocols.
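The alternating update described in the abstract can be sketched on a scalar parameter. The schedule below is an assumption for illustration only: a fixed number of plain descent steps stands in for "when the network tends to converge", and the descent and ascent steps share one learning rate, neither of which is specified in the abstract.

```python
def alter_sgd(grad_fn, w, lr, warmup_steps, alt_steps):
    """Toy alternating descent/ascent on a scalar parameter w.

    warmup_steps of ordinary gradient descent are followed by alt_steps
    updates that alternate descent (even steps) and ascent (odd steps).
    """
    for _ in range(warmup_steps):
        w = w - lr * grad_fn(w)
    for step in range(alt_steps):
        sign = -1.0 if step % 2 == 0 else 1.0
        w = w + sign * lr * grad_fn(w)
    return w
```

For example, with `grad_fn = lambda w: 2 * w` (the gradient of `w**2`), `lr = 0.25`, and a start at `w = 8.0`, two warmup steps give 4.0 and 2.0, and four alternating steps give 1.0, 1.5, 0.75, and 1.125. The ascent steps repeatedly push the iterate back up the loss surface, which is the mechanism the abstract credits with steering optimization toward flatter regions.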
Other (10 papers)
【1】 MINERVAS: Massive INterior EnviRonments VirtuAl Synthesis
Authors: Haocheng Ren, Hao Zhang, Jia Zheng, Jiaxiang Zheng, Rui Tang, Rui Wang, Hujun Bao Affiliations: State Key Lab of CAD&CG, Zhejiang University, Manycore Tech (Kujiale) Comments: The two first authors contribute equally. Project page: this https URL Link: https://arxiv.org/abs/2107.06149 Abstract: With the rapid development of data-driven techniques, data has played an essential role in various computer vision tasks. Many realistic and synthetic datasets have been proposed to address different problems. However, there are lots of unresolved challenges: (1) the creation of dataset is usually a tedious process with manual annotations, (2) most datasets are only designed for a single specific task, (3) the modification or randomization of the 3D scene is difficult, and (4) the release of commercial 3D data may encounter copyright issue. This paper presents MINERVAS, a Massive INterior EnviRonments VirtuAl Synthesis system, to facilitate the 3D scene modification and the 2D image synthesis for various vision tasks. In particular, we design a programmable pipeline with Domain-Specific Language, allowing users to (1) select scenes from the commercial indoor scene database, (2) synthesize scenes for different tasks with customized rules, and (3) render various imagery data, such as visual color, geometric structures, semantic label. Our system eases the difficulty of customizing massive numbers of scenes for different tasks and relieves users from manipulating fine-grained scene configurations by providing user-controllable randomness using multi-level samplers. Most importantly, it empowers users to access commercial scene databases with millions of indoor scenes and protects the copyright of core data assets, e.g., 3D CAD models. We demonstrate the validity and flexibility of our system by using our synthesized data to improve the performance on different kinds of computer vision tasks.
【2】 Teaching Agents how to Map: Spatial Reasoning for Multi-Object Navigation
Authors: Pierre Marza, Laetitia Matignon, Olivier Simonin, Christian Wolf Affiliations: LIRIS, UMR CNRS, Université de Lyon, INSA Lyon, Villeurbanne, France, Université de Lyon, Univ. Lyon, CITI Lab, INRIA Chroma team Link: https://arxiv.org/abs/2107.06011 Abstract: In the context of visual navigation, the capacity to map a novel environment is necessary for an agent to exploit its observation history in the considered place and efficiently reach known goals. This ability can be associated with spatial reasoning, where an agent is able to perceive spatial relationships and regularities, and discover object affordances. In classical Reinforcement Learning (RL) setups, this capacity is learned from reward alone. We introduce supplementary supervision in the form of auxiliary tasks designed to favor the emergence of spatial perception capabilities in agents trained for a goal-reaching downstream objective. We show that learning to estimate metrics quantifying the spatial relationships between an agent at a given location and a goal to reach has a high positive impact in Multi-Object Navigation settings. Our method significantly improves the performance of different baseline agents, that either build an explicit or implicit representation of the environment, even matching the performance of incomparable oracle agents taking ground-truth maps as input.
【3】 Towards Building a Food Knowledge Graph for Internet of Food
Authors: Weiqing Min, Chunlin Liu, Shuqiang Jiang Affiliations: The Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, University of Chinese Academy of Sciences, Beijing, China Comments: 18 pages, 2 figures Link: https://arxiv.org/abs/2107.05869 Abstract: Background: The deployment of various networks (e.g., Internet of Things (IoT) and mobile networks) and databases (e.g., nutrition tables and food compositional databases) in the food system generates massive information silos due to the well-known data harmonization problem. The food knowledge graph provides a unified and standardized conceptual terminology and their relationships in a structured form and thus can transform these information silos across the whole food system to a more reusable globally digitally connected Internet of Food, enabling every stage of the food system from farm-to-fork. Scope and approach: We review the evolution of food knowledge organization, from food classification, food ontology to food knowledge graphs. We then discuss the progress in food knowledge graphs from several representative applications. We finally discuss the main challenges and future directions. Key findings and conclusions: Our comprehensive summary of current research on food knowledge graphs shows that food knowledge graphs play an important role in food-oriented applications, including food search and Question Answering (QA), personalized dietary recommendation, food analysis and visualization, food traceability, and food machinery intelligent manufacturing. Future directions for food knowledge graphs cover several fields such as multimodal food knowledge graphs and food intelligence.
【4】 Dynamic Distribution of Edge Intelligence at the Node Level for Internet of Things
Authors: Hawzhin Mohammed, Tolulope A. Odetola, Nan Guo, Syed Rafay Hasan Affiliations: Department of Electrical and Computer Engineering, Tennessee Tech University, Cookeville, TN, Center for Manufacturing Research, Tennessee Tech University, Cookeville, TN Comments: 5 pages, 4 figures, and 4 tables Link: https://arxiv.org/abs/2107.05828 Abstract: In this paper, dynamic deployment of Convolutional Neural Network (CNN) architecture is proposed utilizing only IoT-level devices. By partitioning and pipelining the CNN, it horizontally distributes the computation load among resource-constrained devices (called horizontal collaboration), which in turn increases the throughput. Through partitioning, we can decrease the computation and energy consumption on individual IoT devices and increase the throughput without sacrificing accuracy. Also, by processing the data at the generation point, data privacy can be achieved. The results show that throughput can be increased by 1.55x to 1.75x for sharing the CNN into two and three resource-constrained devices, respectively.
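Why partitioning and pipelining a CNN raises throughput can be shown with simple arithmetic: in steady state a pipeline completes one input per slowest-stage time. The sketch below is illustrative only. The helper `pipeline_speedup`, the per-layer latencies, and the cut points are hypothetical, and it assumes identical devices and ignores communication cost between them, which the paper's measured 1.55x to 1.75x figures would include.

```python
def pipeline_speedup(layer_times, cut_points):
    """Throughput gain of a pipelined split over one device running all layers.

    cut_points lists the layer indices at which a new stage (device) begins.
    """
    bounds = [0, *cut_points, len(layer_times)]
    stage_times = [sum(layer_times[a:b]) for a, b in zip(bounds, bounds[1:])]
    # Steady-state throughput is limited by the slowest stage.
    return sum(layer_times) / max(stage_times)


layer_times = [4.0, 3.0, 2.0, 1.0]  # hypothetical per-layer latencies (ms)
two_devices = pipeline_speedup(layer_times, [1])       # stages [4] and [3, 2, 1]
three_devices = pipeline_speedup(layer_times, [1, 2])  # stages [4], [3], [2, 1]
```

With these made-up latencies the ideal gains are 10/6 for two devices and 10/4 for three; an unbalanced split or transfer overhead lowers the gain, which is why real speedups fall below the device count.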
【5】 Multitask Identity-Aware Image Steganography via Minimax Optimization
Authors: Jiabao Cui, Pengyi Zhang, Songyuan Li, Liangli Zheng, Cuizhu Bao, Jupeng Xia, Xi Li Affiliations: Zhejiang University Comments: Accepted to Transaction of Image Processing Link: https://arxiv.org/abs/2107.05819 Abstract: High-capacity image steganography, aimed at concealing a secret image in a cover image, is a technique to preserve sensitive data, e.g., faces and fingerprints. Previous methods focus on the security during transmission and subsequently run a risk of privacy leakage after the restoration of secret images at the receiving end. To address this issue, we propose a framework, called Multitask Identity-Aware Image Steganography (MIAIS), to achieve direct recognition on container images without restoring secret images. The key issue of the direct recognition is to preserve identity information of secret images into container images and make container images look similar to cover images at the same time. Thus, we introduce a simple content loss to preserve the identity information, and design a minimax optimization to deal with the contradictory aspects. We demonstrate that the robustness results can be transferred across different cover datasets. In order to be flexible for the secret image restoration in some cases, we incorporate an optional restoration network into our method, providing a multitask framework. The experiments under the multitask scenario show the effectiveness of our framework compared with other visual information hiding methods and state-of-the-art high-capacity image steganography methods.
【6】 Fast and Explicit Neural View Synthesis
Authors: Pengsheng Guo, Miguel Angel Bautista, Alex Colburn, Liang Yang, Daniel Ulbricht, Joshua M. Susskind, Qi Shan Affiliations: Apple Link: https://arxiv.org/abs/2107.05775 Abstract: We study the problem of novel view synthesis of a scene comprised of 3D objects. We propose a simple yet effective approach that is neither continuous nor implicit, challenging recent trends on view synthesis. We demonstrate that although continuous radiance field representations have gained a lot of attention due to their expressive power, our simple approach obtains comparable or even better novel view reconstruction quality comparing with state-of-the-art baselines while increasing rendering speed by over 400x. Our model is trained in a category-agnostic manner and does not require scene-specific optimization. Therefore, it is able to generalize novel view synthesis to object categories not seen during training. In addition, we show that with our simple formulation, we can use view synthesis as a self-supervision signal for efficient learning of 3D geometry without explicit 3D supervision.
【7】 Affect Expression Behaviour Analysis in the Wild using Consensual Collaborative Training
Authors: Darshan Gera, S Balasubramanian Link: https://arxiv.org/abs/2107.05736 Abstract: Facial expression recognition (FER) in the wild is crucial for building reliable human-computer interactive systems. However, annotations of large scale datasets in FER has been a key challenge as these datasets suffer from noise due to various factors like crowd sourcing, subjectivity of annotators, poor quality of images, automatic labelling based on key word search etc. Such noisy annotations impede the performance of FER due to the memorization ability of deep networks. During early learning stage, deep networks fit on clean data. Then, eventually, they start overfitting on noisy labels due to their memorization ability, which limits FER performance. This report presents Consensual Collaborative Training (CCT) framework used in our submission to expression recognition track of the Affective Behaviour Analysis in-the-wild (ABAW) 2021 competition. CCT co-trains three networks jointly using a convex combination of supervision loss and consistency loss, without making any assumption about the noise distribution. A dynamic transition mechanism is used to move from supervision loss in early learning to consistency loss for consensus of predictions among networks in the later stage. Co-training reduces overall error, and consistency loss prevents overfitting to noisy samples. The performance of the model is validated on challenging Aff-Wild2 dataset for categorical expression classification. Our code is made publicly available at https://github.com/1980x/ABAW2021DMACS.
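The convex combination of supervision and consistency losses described in the CCT abstract, together with a weight that shifts over training, can be sketched as follows. This is an assumption-laden illustration rather than the authors' code: the function names and the linear ramp are hypothetical (the abstract says only that a dynamic transition moves from supervision loss to consistency loss), and the three co-trained networks are not modeled.

```python
def consistency_weight(epoch, total_epochs):
    """Weight on the consistency loss: 0 at the start, 1 at the end.

    The linear ramp is an assumed schedule, chosen only for simplicity.
    """
    return min(1.0, max(0.0, epoch / total_epochs))


def cct_loss(supervision_loss, consistency_loss, lam):
    """Convex combination of the two losses, with lam in [0, 1]."""
    return (1.0 - lam) * supervision_loss + lam * consistency_loss


# Early training relies on the labels; later training relies on agreement.
early = cct_loss(2.0, 4.0, consistency_weight(0, 10))
late = cct_loss(2.0, 4.0, consistency_weight(10, 10))
```

Because the two weights always sum to one, the combined loss stays between the two individual losses at every epoch.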
【8】 Bayesian Atlas Building with Hierarchical Priors for Subject-specific Regularization
Authors: Jian Wang, Miaomiao Zhang Affiliations: Computer Science, University of Virginia, USA, Electrical and Computer Engineering, University of Virginia, USA Link: https://arxiv.org/abs/2107.05698 Abstract: This paper presents a novel hierarchical Bayesian model for unbiased atlas building with subject-specific regularizations of image registration. We develop an atlas construction process that automatically selects parameters to control the smoothness of diffeomorphic transformation according to individual image data. To achieve this, we introduce a hierarchical prior distribution on regularization parameters that allows multiple penalties on images with various degrees of geometric transformations. We then treat the regularization parameters as latent variables and integrate them out from the model by using the Monte Carlo Expectation Maximization (MCEM) algorithm. Another advantage of our algorithm is that it eliminates the need for manual parameter tuning, which can be tedious and infeasible. We demonstrate the effectiveness of our model on 3D brain MR images. Experimental results show that our model provides a sharper atlas compared to the current atlas building algorithms with single-penalty regularizations. Our code is publicly available at https://github.com/jw4hv/HierarchicalBayesianAtlasBuild.
【9】 DDCNet-Multires: Effective Receptive Field Guided Multiresolution CNN for Dense Prediction
Authors: Ali Salehi, Madhusudhanan Balasubramanian Affiliations: Tennessee, United States Comments: 27 pages, 10 figures, 2 tables. arXiv admin note: text overlap with arXiv:2107.04715 Link: https://arxiv.org/abs/2107.05634 Abstract: Dense optical flow estimation is challenging when there are large displacements in a scene with heterogeneous motion dynamics, occlusion, and scene homogeneity. Traditional approaches to handle these challenges include hierarchical and multiresolution processing methods. Learning-based optical flow methods typically use a multiresolution approach with image warping when a broad range of flow velocities and heterogeneous motion is present. Accuracy of such coarse-to-fine methods is affected by the ghosting artifacts when images are warped across multiple resolutions and by the vanishing problem in smaller scene extents with higher motion contrast. Previously, we devised strategies for building compact dense prediction networks guided by the effective receptive field (ERF) characteristics of the network (DDCNet). The DDCNet design was intentionally simple and compact allowing it to be used as a building block for designing more complex yet compact networks. In this work, we extend the DDCNet strategies to handle heterogeneous motion dynamics by cascading DDCNet based sub-nets with decreasing extents of their ERF. Our DDCNet with multiresolution capability (DDCNet-Multires) is compact without any specialized network layers. We evaluate the performance of the DDCNet-Multires network using standard optical flow benchmark datasets. Our experiments demonstrate that DDCNet-Multires improves over the DDCNet-B0 and -B1 and provides optical flow estimates with accuracy comparable to similar lightweight learning-based methods.
【10】 Lifting the Convex Conjugate in Lagrangian Relaxations: A Tractable Approach for Continuous Markov Random Fields
Authors: Hartmut Bauermeister, Emanuel Laude, Thomas Möllenhoff, Michael Moeller, Daniel Cremers Link: https://arxiv.org/abs/2107.06028 Abstract: Dual decomposition approaches in nonconvex optimization may suffer from a duality gap. This poses a challenge when applying them directly to nonconvex problems such as MAP-inference in a Markov random field (MRF) with continuous state spaces. To eliminate such gaps, this paper considers a reformulation of the original nonconvex task in the space of measures. This infinite-dimensional reformulation is then approximated by a semi-infinite one, which is obtained via a piecewise polynomial discretization in the dual. We provide a geometric intuition behind the primal problem induced by the dual discretization and draw connections to optimization over moment spaces. In contrast to existing discretizations which suffer from a grid bias, we show that a piecewise polynomial discretization better preserves the continuous nature of our problem. Invoking results from optimal transport theory and convex algebraic geometry we reduce the semi-infinite program to a finite one and provide a practical implementation based on semidefinite programming. We show, experimentally and in theory, that the approach successfully reduces the duality gap. To showcase the scalability of our approach, we apply it to the stereo matching problem between two images.