[Computer Vision Paper Digest] 2018-03-03

Amusi

Published on 2018-04-12 10:06:33

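Notice: This post is long. It contains 32 paper digests covering object detection, image segmentation, network optimization, facial expression recognition, SLAM, OCR, and other topics.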

[1]《The 2018 DAVIS Challenge on Video Object Segmentation》

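Abstract: The paper presents the 2018 DAVIS Challenge on Video Object Segmentation, a public competition designed specifically for the task of video object segmentation.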

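Note: This is a competition announcement rather than a substantive research paper. Experts, go sign up and compete solo, and congratulations in advance on your results! Note that the task is video object segmentation, haha, which is kind of interesting!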

arxiv：https://arxiv.org/abs/1803.00557

Challenge website: http://davischallenge.org/

[2]《LCR-Net++: Multi-person 2D and 3D Pose Detection in Natural Images》

CVPR 2017; IEEE Trans. PAMI 2018 (submitted)

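Abstract: The paper proposes an end-to-end framework for joint 2D and 3D human pose estimation in natural images. The key to the approach is the generation and scoring of a number of pose proposals per image, which makes it possible to predict the 2D and 3D poses of multiple people simultaneously. As a result, the approach does not need an approximate localization of the humans for initialization. The proposed Localization-Classification-Regression architecture is named LCR-Net.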

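Note: An earlier version of this work appeared at CVPR 2017 as 《LCR-Net: Localization-Classification-Regression for Human Pose》. Because the paper is so good, an extended version has been submitted to PAMI 2018 (its status is currently Submitted, not yet Accepted).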

arxiv：https://arxiv.org/abs/1803.00455

[3]《Graph Kernels based on High Order Graphlet Parsing and Hashing》

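Abstract: The paper proposes a new high-order stochastic graphlet embedding (SGE) that maps graphs into vector spaces. Its main contribution is a new stochastic search procedure that efficiently parses a given graph and extracts/samples graphlets of unlimited order.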

Note: A graph-based method, very hardcore!

arxiv：https://arxiv.org/abs/1803.00425

[4]《Yedroudj-Net: An efficient CNN for spatial steganalysis》

IEEE ICASSP 2018

Abstract: The paper proposes a CNN (Yedroudj-Net) that achieves state-of-the-art error probability on image steganalysis.

Note: Using a CNN for image steganalysis, that is, detecting whether an image carries a hidden message. Wow, impressive!

arxiv：https://arxiv.org/abs/1803.00407

[5]《DeepDefense: Training Deep Neural Networks with Improved Robustness》

CVPR 2018 submission version; work from Tsinghua University

Abstract: Despite the efficacy on a variety of computer vision tasks, deep neural networks (DNNs) are vulnerable to adversarial attacks, limiting their applications in security-critical systems. Recent works have shown the possibility of generating imperceptibly perturbed image inputs (a.k.a., adversarial examples) to fool well-trained DNN models into making arbitrary predictions. To address this problem, we propose a training recipe named DeepDefense. Our core idea is to integrate an adversarial perturbation-based regularizer into the classification objective, such that the obtained models learn to resist potential attacks, directly and precisely. The whole optimization problem is solved just like training a recursive network. (It is better to quote the original text directly here.)

Note: The goal of this paper is to make DNNs more robust to adversarial attacks. Very interesting!

arxiv：https://arxiv.org/abs/1803.00404

[6]《Unravelling Robustness of Deep Learning based Face Recognition Against Adversarial Attacks》

AAAI 2018

Abstract: The paper attempts to unravel three aspects of the robustness of DNNs for face recognition.

Note: Which three aspects? You will have to read the paper, of course!

arxiv：https://arxiv.org/abs/1803.00401

[7]《Calcium Removal From Cardiac CT Images Using Deep Convolutional Neural Network》

ISBI 2018

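Abstract: Coronary calcium causes beam hardening and blooming artifacts on cardiac computed tomography angiography (CTA) images, which lead to overestimation of lumen stenosis and reduction of diagnostic specificity. The paper presents a machine-learning method based on a multi-step inpainting process. It also proposes a new network configuration, called Dense-Unet, that achieves the best performance at low computational cost.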

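Note: Hmph, how would I know anything about deep learning for medicine! I don't even know what coronary calcium means! All I know is that U-Net is very important in image segmentation (especially in medical imaging)!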

arxiv：https://arxiv.org/abs/1803.00399

[8]《Satellite imagery analysis for operational damage assessment in Emergency situations》

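Abstract: When a major disaster strikes, the question arises of how to estimate the damage in time to support the decision-making and relief efforts of local authorities or humanitarian teams. This paper considers the application of machine learning and computer vision to remote sensing imagery in order to improve the time efficiency of assessing damaged buildings in disaster-affected areas. We propose a general workflow that can be useful in various disaster management applications, and demonstrate the reliability of the proposed approach by assessing the damage caused by the 2017 wildfires in California.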

Note: A very interesting application!

arxiv：https://arxiv.org/abs/1803.00397

[9]《Fast and robust misalignment correction of Fourier ptychographic microscopy》

Abstract：Fourier ptychographic microscopy (FPM) is a newly developed computational imaging technique that can provide gigapixel images with both high resolution (HR) and wide field of view (FOV). However, the position misalignment of the LED array induces a degradation of the reconstructed image, especially in the regions away from the optical axis. In this paper, we propose a robust and fast method to correct the LED misalignment of the FPM, termed as misalignment correction for the FPM (mcFPM). Although different regions in the FOV have different sensitivity to the LED misalignment, the experimental results show that the mcFPM is robust with respect to the elimination of each region. Compared with the state-of-the-art methods, the mcFPM is much faster.

Note: What is FPM? I had never heard of it. Wow, a very new research direction!

arxiv：https://arxiv.org/abs/1803.00395

[10]《Image Dataset for Visual Objects Classification in 3D Printing》

Workshop track - ICLR 2018

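Abstract: The rapid development of additive manufacturing (AM), also known as 3D printing, has brought potential risks and security issues along with significant benefits. To raise the security level of the 3D printing process, this research aims to detect and recognize illegal components using deep learning. In this work, we collected a dataset of 61,340 2D images (28×28 each) in 10 classes, including guns and other non-gun objects, corresponding to the projection results of the original 3D models. To validate the dataset, we train a convolutional neural network (CNN) model for gun classification, which achieves 98.16% classification accuracy.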

Note: What? Deep learning combined with 3D printing? Sparks really did fly!

arxiv：https://arxiv.org/abs/1803.00391

[11]《Poisson Image Denoising Using Best Linear Prediction: A Post-processing Framework》

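Abstract: The paper addresses the problem of denoising images contaminated with Poisson noise. It proposes a patch-based approach built on best linear prediction to estimate the underlying clean image.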

arxiv：https://arxiv.org/abs/1803.00389

[12]《Learning Filter Scale and Orientation In CNNs》

Abstract: Convolutional neural networks have many hyperparameters such as the filter size, number of filters, and pooling size, which require manual tuning. Though deep stacked structures are able to create multi-scale and hierarchical representations, manually fixed filter sizes limit the scale of representations that can be learned in a single convolutional layer. This paper introduces a new adaptive filter model that allows variable scale and orientation. The scale and orientation parameters of filters can be learned using back propagation. In this work, we employed a multivariate (2D) Gaussian as the envelope function and showed that it can grow, shrink, or rotate by updating its covariance matrix during back propagation training. The results demonstrate that the new model can effectively learn and produce filters of different scales and orientations in a single layer. (Forgive me for quoting the original directly; my translation would surely be terrible.)

arxiv：https://arxiv.org/abs/1803.00388
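
To make the "Gaussian envelope" idea in [12] concrete, here is a minimal pure-Python sketch. It assumes the envelope is exp(-0.5 * p^T Σ^{-1} p) with Σ = R(θ) diag(σx², σy²) R(θ)^T, and that it multiplies the raw filter weights elementwise. The function names and this parameterization are illustrative assumptions, not the paper's code, and the back-propagation step that would learn σx, σy and θ is left out.

```python
import math

def gaussian_envelope(size, sigma_x, sigma_y, theta):
    """Return a size x size grid of exp(-0.5 * p^T Sigma^{-1} p).

    Sigma = R(theta) * diag(sigma_x^2, sigma_y^2) * R(theta)^T, so changing
    sigma_x / sigma_y grows or shrinks the envelope and theta rotates it.
    (Hypothetical helper for illustration only.)
    """
    c, s = math.cos(theta), math.sin(theta)
    half = (size - 1) / 2.0
    envelope = []
    for row in range(size):
        y = row - half
        values = []
        for col in range(size):
            x = col - half
            # Express the pixel offset in the envelope's principal axes (R^T p).
            u = c * x + s * y
            v = -s * x + c * y
            values.append(math.exp(-0.5 * ((u / sigma_x) ** 2 + (v / sigma_y) ** 2)))
        envelope.append(values)
    return envelope

def apply_envelope(weights, envelope):
    """Mask raw filter weights with the envelope (elementwise product)."""
    return [[w * e for w, e in zip(w_row, e_row)]
            for w_row, e_row in zip(weights, envelope)]
```

With `theta = 0` and `sigma_x > sigma_y` the effective filter is wide and short; setting `theta = math.pi / 2` turns the same envelope into a tall and narrow one, which is the "rotate" behaviour the abstract describes.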

[13]《A General Pipeline for 3D Detection of Vehicles》

ICRA 2018

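Abstract: The paper proposes a flexible pipeline that takes any 2D object detection network and fuses it with a 3D point cloud, producing 3D information with minimal changes to the 2D detection network.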

arxiv：https://arxiv.org/abs/1803.00387

[14]《MAGAN: Aligning Biological Manifolds》

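Abstract: The paper proposes a new GAN, called Manifold-Aligning GAN (MAGAN), that aligns two manifolds so that related points in each measurement space are aligned together. We demonstrate applications of MAGAN in single-cell biology, integrating two different measurement types.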

Note: Could someone knowledgeable tell me what "manifold" means here?

arxiv：https://arxiv.org/abs/1803.00385

[15]《Fibres of Failure: Classifying errors in predictive processes》

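Abstract: We describe Fibres of Failure (FIFA), a method for classifying the failure modes of predictive processes using the MAPPER algorithm from topological data analysis. The method uses MAPPER to build a graphical model of the input data stratified by prediction error.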

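Note: The FIFA here is not the international football federation, so football fans should not get excited! I am not familiar with MAPPER either; interested readers can google it themselves.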

arxiv：https://arxiv.org/abs/1803.00384

[16]《Five-point Fundamental Matrix Estimation for Uncalibrated Cameras》

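Abstract: The paper aims to estimate the fundamental matrix between two views from five correspondences of rotation-invariant features in the two images. It first estimates a homography from three correspondences, assuming that they are co-planar and exploiting their rotation components. The fundamental matrix is then obtained from the homography and two additional point pairs in general position. Combined with a robust estimator such as Graph-Cut RANSAC, the proposed approach outperforms other state-of-the-art algorithms in both accuracy and the number of iterations required.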

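Note: Finally, an image processing paper (specifically, one that does not involve deep learning). It deals with the classic homography and RANSAC, very interesting! The paper says a homography can be estimated from just three feature correspondences, whereas the classic method needs at least four. That seems a bit odd; I will read the paper closely when I have time.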

arxiv：https://arxiv.org/abs/1803.00260
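
The second step in [16], going from a homography plus two extra point pairs to a fundamental matrix, can be sketched with the textbook plane-plus-parallax construction: for a point off the plane, the line through H·x and x' passes through the epipole e', two such lines intersect at e', and F = [e']× H. The code below is a minimal sketch of that classical construction in plain Python; it is not the paper's solver (in particular, the three-correspondence homography step is not shown), and all function names are made up for illustration.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def mat_vec(M, v):
    """3x3 matrix times 3-vector."""
    return [sum(M[i][k] * v[k] for k in range(3)) for i in range(3)]

def mat_mul(A, B):
    """3x3 matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def skew(v):
    """Skew-symmetric matrix [v]x such that [v]x w = v x w."""
    return [[0.0, -v[2], v[1]],
            [v[2], 0.0, -v[0]],
            [-v[1], v[0], 0.0]]

def fundamental_from_homography(H, pairs):
    """Estimate F (up to scale) from a plane homography and two point pairs.

    H maps view-1 points on some scene plane to view 2. `pairs` holds two
    (x, x_prime) correspondences in homogeneous coordinates that do NOT lie
    on that plane. Degenerate if both parallax lines coincide.
    """
    # Each off-plane pair gives a line through H*x and x' that contains the epipole.
    lines = [cross(mat_vec(H, x), x_prime) for x, x_prime in pairs]
    # The epipole in view 2 is the intersection of the two lines.
    epipole = cross(lines[0], lines[1])
    return mat_mul(skew(epipole), H)

def epipolar_residual(F, x, x_prime):
    """Algebraic epipolar error x'^T F x (zero for a perfect correspondence)."""
    Fx = mat_vec(F, x)
    return sum(x_prime[i] * Fx[i] for i in range(3))
```

In practice the two extra pairs would come from a RANSAC-style sampler, and `epipolar_residual` (or a geometric error derived from it) would be used to count inliers.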

[17]《Scalable Dense Non-rigid Structure-from-Motion: A Grassmannian Perspective》

CVPR 2018

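Abstract: This paper addresses the task of dense non-rigid structure-from-motion (NRSfM) using multiple images. By modeling the problem on a Grassmann manifold, it proposes a new dense NRSfM approach. Specifically, we assume that the complex non-rigid deformations lie on a union of local linear subspaces, both spatially and temporally. This naturally allows for a compact representation of the complex non-rigid deformation over frames.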

Note: Turn up the difficulty of SfM and you get NRSfM; today's dense NRSfM is impressive.

arxiv：https://arxiv.org/abs/1803.00233

[18]《DRUNET: A Dilated-Residual U-Net Deep Learning Network to Digitally Stain Optic Nerve Head Tissues in Optical Coherence Tomography Images》

Abstract: Given that the neural and connective tissues of the optic nerve head (ONH) exhibit complex morphological changes with the development and progression of glaucoma, their simultaneous isolation from optical coherence tomography (OCT) images may be of great interest for the clinical diagnosis and management of this pathology. A deep learning algorithm was designed and trained to digitally stain (i.e. highlight) 6 ONH tissue layers by capturing both the local (tissue texture) and contextual information (spatial arrangement of tissues). The overall dice coefficient (mean of all tissues) was 0.91 ± 0.05 when assessed against manual segmentations performed by an expert observer. We offer here a robust segmentation framework that could be extended for the automated parametric study of the ONH tissues. (Translating this one defeated me!)

Note: U-Net is back again!

arxiv：https://arxiv.org/abs/1803.00232
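
The Dice coefficient quoted in [18] (0.91 ± 0.05) is the standard overlap score 2|A∩B| / (|A| + |B|) between a predicted mask and a reference mask. Below is a minimal sketch for flattened binary masks; the function name and the convention of returning 1.0 when both masks are empty are my own assumptions, not something taken from the paper.

```python
def dice_coefficient(pred, truth):
    """Dice overlap between two flattened binary masks of equal length."""
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total
```

For a multi-layer segmentation like the one described, one would presumably compute this per tissue class and average the results, which matches the "mean of all tissues" wording in the abstract.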

[19]《Tongue image constitution recognition based on Complexity Perception method》

Abstract: The paper uses methods based on deep convolutional neural networks (DCNNs) for tongue area detection, tongue area calibration, and constitution classification.

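Note: A very down-to-earth deep learning application. Many doctors in China look at the tongue to judge a patient's condition, and now deep learning can be used to do the examination and assessment. OMG, thumbs up!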

arxiv：https://arxiv.org/abs/1803.00219

[20]《TSSD: Temporal Single-Shot Detector Based on Attention and LSTM for Robotic Intelligent Perception》

Submitted to IROS 2018

Abstract: In this paper, based on attention mechanism and convolutional long short-term memory (ConvLSTM), we propose a temporal single-shot detector (TSSD) for robotic vision. Distinct from previous methods, we take aim at temporally integrating pyramidal feature hierarchy using ConvLSTM, and design a novel structure including a high-level ConvLSTM unit as well as a low-level one (HL-LSTM) for multi-scale feature maps. Moreover, we develop a creative temporal analysis unit, namely ConvLSTM-based … (A hardcore paper, so I won't translate it.)

Note: Huh! What on earth is temporal object detection?

arxiv：https://arxiv.org/abs/1803.00197

[21]《Facial Expression Recognition Based on Complexity Perception Classification Algorithm》

Abstract: The paper proposes a simple yet effective CNN model for extracting facial features, together with a complexity perception classification (CPC) algorithm.

Note: Facial expression recognition (FER) is a fun research direction; I once watched my roommate show off by working on it.

arxiv：https://arxiv.org/abs/1803.00185

[22]《A Class-Incremental Learning Method Based on One Class Support Vector Machine》

Under review as a conference paper at ICPR 2018

Abstract: A method based on one class support vector machine (OCSVM) is proposed for class incremental learning. Several OCSVM models divide the input space into several parts. Then, the 1VS1 classifiers are constructed for the confuse part by using the support vectors. During the class incremental learning process, the OCSVM of the new class is trained at first. Then the support vectors of the old classes and the support vectors of the new class are reused to train 1VS1 classifiers for the confuse part. In order to bring more information to the certain support vectors, the support vectors are at the boundary of the distribution of samples as much as possible when the OCSVM is built. Compared with the traditional methods, the proposed method retains the original model and thus reduces memory consumption and training time cost. Various experiments on different datasets also verify the efficiency of the proposed method.

Note: SVMs can still be used in new ways, truly impressive! With everyone rushing into deep learning, classical machine learning looks all the more important.

arxiv：https://arxiv.org/abs/1803.00159

[23]《Ring loss: Convex Feature Normalization for Face Recognition》

CVPR 2018

Abstract: We motivate and present Ring loss, a simple and elegant feature normalization approach for deep networks designed to augment standard loss functions such as Softmax. We argue that deep feature normalization is an important aspect of supervised classification problems where we require the model to represent each class in a multi-class problem equally well. The direct approach to feature normalization through the hard normalization operation results in a non-convex formulation. Instead, Ring loss applies soft normalization, where it gradually learns to constrain the norm to the scaled unit circle while preserving convexity leading to more robust features. We apply Ring loss to large-scale face recognition problems and present results on LFW, the challenging protocols of IJB-A Janus, Janus CS3 (a superset of IJB-A Janus), Celebrity Frontal-Profile (CFP) and MegaFace with 1 million distractors. Ring loss outperforms strong baselines, matches state-of-the-art performance on IJB-A Janus and outperforms all other results on the challenging Janus CS3 thereby achieving state-of-the-art. We also outperform strong baselines in handling extremely low resolution face matching. (Not translating it doesn't mean I'm being lazy... honest.)

Note: Looks very impressive! Very impressive!

arxiv：https://arxiv.org/abs/1803.00130
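
As a rough illustration of the "soft normalization" in [23]: the commonly cited form of Ring loss penalizes the squared gap between each feature norm and a target radius R, i.e. (λ / 2m) Σ (‖F(x_i)‖₂ − R)², and is added to a primary loss such as Softmax. The sketch below computes only that penalty term in plain Python. Treating R as a fixed argument (in the paper it is a learned value) and the default λ are assumptions made to keep the example small.

```python
import math

def ring_loss(features, radius, lam=0.01):
    """Soft feature-norm penalty: lam / (2m) * sum((||f||_2 - radius)^2).

    `features` is a list of m feature vectors. `radius` is the target norm;
    it is passed in as a constant here purely for illustration.
    """
    m = len(features)
    total = 0.0
    for f in features:
        norm = math.sqrt(sum(v * v for v in f))
        total += (norm - radius) ** 2
    return lam / (2 * m) * total
```

The penalty is zero when every feature already sits on the ring of radius R and grows quadratically as norms drift away from it, which is what lets it nudge the norms gradually instead of hard-normalizing them.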

[24]《SalientDSO: Bringing Attention to Direct Sparse Odometry》

Abstract: We merge the successes of these two communities and present a way to incorporate semantic information in the form of visual saliency to Direct Sparse Odometry – a highly successful direct sparse VO algorithm. We also present a framework to filter the visual saliency based on scene parsing. Our framework, SalientDSO, relies on the widely successful deep learning based approaches for visual saliency and scene parsing which drives the feature selection for obtaining highly-accurate and robust VO even in the presence of as few as 40 point features per frame. We provide extensive quantitative evaluation of SalientDSO on the ICL-NUIM and TUM monoVO datasets and show that we outperform DSO and ORB-SLAM – two very popular state-of-the-art approaches in the literature. We also collect and publicly release a CVL-UMD dataset which contains two indoor cluttered sequences on which we show qualitative evaluations. To our knowledge this is the first paper to use visual saliency and scene parsing to drive the feature selection in direct VO. (A hardcore paper, so I won't translate it.)

Note: A SLAM paper, very interesting! It focuses on visual odometry (VO).

arxiv：https://arxiv.org/abs/1803.00127

[25]《Chinese Text in the Wild》

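Abstract: The paper presents a newly created dataset of Chinese text, with about 1 million Chinese characters annotated by experts in over 30,000 street view images.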

Note: Anything from Tsinghua University is bound to be top quality. Here is a plug for the dataset.

arxiv：https://arxiv.org/abs/1803.00085

Chinese text dataset: https://ctwdataset.github.io/

[26]《Joint Pixel and Feature-level Domain Adaptation in the Wild》

Submitted to CVPR 2018

Abstract: Recent developments in deep domain adaptation have allowed knowledge transfer from a labeled source domain to an unlabeled target domain at the level of intermediate features or input pixels. We propose that advantages may be derived by combining them, in the form of different insights that lead to a novel design and complementary properties that result in better performance. At the feature level, inspired by insights from semi-supervised learning in a domain adversarial neural network, we propose a novel regularization in the form of domain adversarial entropy minimization. Next, we posit that insights from computer vision are more amenable to injection at the pixel level and specifically address the key challenge of adaptation across different semantic levels. In particular, we use 3D geometry and image synthesization based on a generalized appearance flow to preserve identity across higher-level pose transformations, while using an attribute-conditioned CycleGAN to translate a single source into multiple target images that differ in lower-level properties such as lighting. We validate on a novel problem of car recognition in unlabeled surveillance images using labeled images from the web, handling explicitly specified, nameable factors of variation through pixel-level and implicit, unspecified factors through feature-level adaptation. Extensive experiments achieve state-of-the-art results, demonstrating the effectiveness of complementing feature and pixel-level information via our proposed domain adaptation method. (A hardcore paper, so I won't translate it.)

arxiv：https://arxiv.org/abs/1803.00068

[27]《Super-Efficient Spatially Adaptive Contrast Enhancement Algorithm for Superficial Vein Imaging》

ICIIS 2017

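Abstract: The paper presents a super-efficient spatially adaptive contrast enhancement algorithm for enhancing superficial vein images captured with infrared (IR) radiation.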

arxiv：https://arxiv.org/abs/1803.00039

[28]《Hardware-Efficient Guided Image Filtering For Multi-Label Problem》

Abstract: In this paper we propose a hardware-efficient Guided Filter (HGF), which solves the efficiency problem of multichannel guided image filtering and yields competent results when applying it to multi-label problems with synthesized polynomial multichannel guidance.

Note: Hmm, what is the Guided Filter (GF) problem?

arxiv：https://arxiv.org/abs/1803.00005
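
For readers who share the question in the note on [28]: the standard guided filter fits, in every local window, a linear model q = a·I + b that maps the guidance signal I to the input p, then averages the coefficients a and b over the windows that cover each sample. The sketch below is a minimal single-channel, 1D version in plain Python using simple box means. It illustrates the basic filter only; it is not the paper's hardware-efficient multichannel HGF, and the function names are made up for this example.

```python
def box_mean(values, radius):
    """Mean over a sliding window of half-width `radius`, clipped at the ends."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out

def guided_filter_1d(guide, src, radius, eps):
    """Single-channel 1D guided filter: smooth `src` while following `guide`."""
    mean_i = box_mean(guide, radius)
    mean_p = box_mean(src, radius)
    corr_ip = box_mean([g * s for g, s in zip(guide, src)], radius)
    corr_ii = box_mean([g * g for g in guide], radius)
    # Per-window linear model: a = cov(I, p) / (var(I) + eps), b = mean(p) - a * mean(I).
    a = [(cip - mi * mp) / (cii - mi * mi + eps)
         for cip, cii, mi, mp in zip(corr_ip, corr_ii, mean_i, mean_p)]
    b = [mp - ai * mi for ai, mi, mp in zip(a, mean_i, mean_p)]
    # Average the coefficients of all windows covering each sample.
    mean_a = box_mean(a, radius)
    mean_b = box_mean(b, radius)
    return [ma * g + mb for ma, mb, g in zip(mean_a, mean_b, guide)]
```

A larger `eps` pushes `a` towards zero, so the output approaches a plain box-filtered `src`; a small `eps` lets edges present in `guide` carry through to the output.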

[29]《Speeding Up the Bilateral Filter: A Joint Acceleration Way》

IEEE TRANSACTIONS ON IMAGE PROCESSING 2016

Abstract: Jointly employing five techniques: kernel truncation, best N-term approximation as well as previous 2-D box filtering, dimension promotion and shiftability property, we propose a unified framework to transform BF with arbitrary spatial and range kernels into a set of 3-D box filters that can be computed in linear time. To the best of our knowledge, our algorithm is the first method that can integrate all these acceleration techniques and therefore can draw upon one another’s strong point to overcome deficiencies.

Note: Bilateral filter, impressive!

arxiv：https://arxiv.org/abs/1803.00004
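
For context on what [29] is speeding up, here is a brute-force bilateral filter (BF) written directly from its definition: each output sample is a weighted average of its neighbours, where the weight is the product of a spatial kernel and a range (intensity-difference) kernel. This is a minimal 1D sketch that assumes Gaussian kernels for both; it is the slow baseline, not the paper's box-filter acceleration, and the function name is illustrative.

```python
import math

def bilateral_filter_1d(signal, radius, sigma_s, sigma_r):
    """Brute-force 1D bilateral filter with Gaussian spatial and range kernels."""
    n = len(signal)
    out = []
    for i in range(n):
        num = 0.0
        den = 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            # Weight falls off with distance (i - j) and with intensity difference.
            w = math.exp(-((i - j) ** 2) / (2.0 * sigma_s ** 2)
                         - ((signal[i] - signal[j]) ** 2) / (2.0 * sigma_r ** 2))
            num += w * signal[j]
            den += w
        out.append(num / den)
    return out
```

The cost here grows with the window size for every sample, which is why approaches that rewrite the filter as a fixed number of box filters, as the abstract describes, are attractive.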

[30]《Natural data structure extracted from neighborhood-similarity graphs》

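Abstract: The paper proposes a new method that encodes the neighborhood similarities of data points directly into a sparse graph. This non-iterative framework allows a transparent interpretation of the data without altering the original data dimension or metric.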

arxiv：https://arxiv.org/abs/1803.00500

[31]《Neural Networks Should Be Wide Enough to Learn Disconnected Decision Regions》

Abstract：In this paper we argue that sufficient width of a feedforward network is equally important by answering the simple question under which conditions the decision regions of a neural network are connected. It turns out that for a class of activation functions including leaky ReLU, neural networks having a pyramidal structure, that is no layer has more hidden units than the input dimension, produce necessarily connected decision regions. This implies that a sufficiently wide layer is necessary to produce disconnected decision regions.

Note: I can't follow it, so all I can do is admire it!

arxiv：https://arxiv.org/abs/1803.00094