1.Efficient end-to-end learning for quantizable representations（可量化表示的端到端高效学习）
作者：Yeonwoo Jeong,Hyun Oh Song
Accepted and to appear at ICML 2018
摘要：Embedding representation learning via neural networks is at the core foundation of modern similarity based search. While much effort has been put in developing algorithms for learning binary hamming code representations for search efficiency, this still requires a linear scan of the entire dataset per each query and trades off the search accuracy through binarization. To this end, we consider the problem of directly learning a quantizable embedding representation and the sparse binary hash code end-to-end which can be used to construct an efficient hash table not only providing significant search reduction in the number of data but also achieving the state of the art search accuracy outperforming previous state of the art deep metric learning methods. We also show that finding the optimal sparse binary hash code in a mini-batch can be computed exactly in polynomial time by solving a minimum cost flow problem. Our results on Cifar-100 and on ImageNet datasets show the state of the art search accuracy in precision@k and NMI metrics while providing up to 98X and 478X search speedup respectively over exhaustive linear search.
2.Nonlinear Metric Learning through Geodesic Polylinear Interpolation (ML-GPI)（通过测地线多线性插值的非线性度量学习(ML-GPI)）
作者：Zhewei Wang,Bibo Shi,Charles D. Smith,Jundong Liu
accepted to ICPR'2018
机构：Duke University，University of Kentucky，Ohio University
摘要：In this paper, we propose a nonlinear distance metric learning scheme based on the fusion of component linear metrics. Instead of merging displacements at each data point, our model calculates the velocities induced by the component transformations, via a geodesic interpolation on a Lie transfor- mation group. Such velocities are later summed up to produce a global transformation that is guaranteed to be diffeomorphic. Consequently, pair-wise distances computed this way conform to a smooth and spatially varying metric, which can greatly benefit k-NN classification. Experiments on synthetic and real datasets demonstrate the effectiveness of our model.
3.Online Deep Metric Learning（在线深度量学习）
作者：Wenbin Li,Jing Huo,Yinghuan Shi,Yang Gao,Lei Wang,Jiebo Luo
机构：Nanjing University，University of Rochester，University of Wollongong
摘要：Metric learning learns a metric function from training data to calculate the similarity or distance between samples. From the perspective of feature learning, metric learning essentially learns a new feature space by feature transformation (e.g., Mahalanobis distance metric). However, traditional metric learning algorithms are shallow, which just learn one metric space (feature transformation). Can we further learn a better metric space from the learnt metric space? In other words, can we learn metric progressively and nonlinearly like deep learning by just using the existing metric learning algorithms? To this end, we present a hierarchical metric learning scheme and implement an online deep metric learning framework, namely ODML. Specifically, we take one online metric learning algorithm as a metric layer, followed by a nonlinear layer (i.e., ReLU), and then stack these layers modelled after the deep learning. The proposed ODML enjoys some nice properties, indeed can learn metric progressively and performs superiorly on some datasets. Various experiments with different settings have been conducted to verify these properties of the proposed ODML.
4.End-to-end learning of keypoint detector and descriptor for pose invariant 3D matching（姿态不变三维匹配关键点检测器和描述子的端到端学习）
作者：Georgios Georgakis,Srikrishna Karanam,Ziyan Wu,Jan Ernst,Jana Kosecka
机构：George Mason University
摘要：Finding correspondences between images or 3D scans is at the heart of many computer vision and image retrieval applications and is often enabled by matching local keypoint descriptors. Various learning approaches have been applied in the past to different stages of the matching pipeline, considering detector, descriptor, or metric learning objectives. These objectives were typically addressed separately and most previous work has focused on image data. This paper proposes an end-to-end learning framework for keypoint detection and its representation (descriptor) for 3D depth maps or 3D scans, where the two can be jointly optimized towards task-specific objectives without a need for separate annotations. We employ a Siamese architecture augmented by a sampling layer and a novel score loss function which in turn affects the selection of region proposals. The positive and negative examples are obtained automatically by sampling corresponding region proposals based on their consistency with known 3D pose labels. Matching experiments with depth data on multiple benchmark datasets demonstrate the efficacy of the proposed approach, showing significant improvements over state-of-the-art methods.
5.Feasibility Based Large Margin Nearest Neighbor Metric Learning（基于可行性的大间隔最近邻度量学习）
作者：Babak Hosseini,Barbara Hammer
published in ESANN2018
摘要：Large margin nearest neighbor (LMNN) is a metric learner which optimizes the performance of the popular $k$NN classifier. However, its resulting metric relies on pre-selected target neighbors. In this paper, we address the feasibility of LMNN's optimization constraints regarding these target points, and introduce a mathematical measure to evaluate the size of the feasible region of the optimization problem. We enhance the optimization framework of LMNN by a weighting scheme which prefers data triplets which yield a larger feasible region. This increases the chances to obtain a good metric as the solution of LMNN's problem. We evaluate the performance of the resulting feasibility-based LMNN algorithm using synthetic and real datasets. The empirical results show an improved accuracy for different types of datasets in comparison to regular LMNN.
6.Scalable Angular Discriminative Deep Metric Learning for Face Recognition（基于可扩展角度判别深度度量学习的人脸识别）
作者：Bowen Wu,Huaming Wu,Monica M. Y. Zhang
机构：Nankai University，Tianjin University
摘要：With the development of deep learning, Deep Metric Learning (DML) has achieved great improvements in face recognition. Specifically, the widely used softmax loss in the training process often bring large intra-class variations, and feature normalization is only exploited in the testing process to compute the pair similarities. To bridge the gap, we impose the intra-class cosine similarity between the features and weight vectors in softmax loss larger than a margin in the training step, and extend it from four aspects. First, we explore the effect of a hard sample mining strategy. To alleviate the human labor of adjusting the margin hyper-parameter, a self-adaptive margin updating strategy is proposed. Then, a normalized version is given to take full advantage of the cosine similarity constraint. Furthermore, we enhance the former constraint to force the intra-class cosine similarity larger than the mean inter-class cosine similarity with a margin in the exponential feature projection space. Extensive experiments on Labeled Face in the Wild (LFW), Youtube Faces (YTF) and IARPA Janus Benchmark A (IJB-A) datasets demonstrate that the proposed methods outperform the mainstream DML methods and approach the state-of-the-art performance.
7.A Unified Framework for Domain Adaptation using Metric Learning on Manifolds（流形上度量学习的区域自适应统一框架）
作者：Sridhar Mahadevan,Bamdev Mishra,Shalini Ghosh
机构：University of Massachusetts，Stanford University
摘要：We present a novel framework for domain adaptation, whereby both geometric and statistical differences between a labeled source domain and unlabeled target domain can be integrated by exploiting the curved Riemannian geometry of statistical manifolds. Our approach is based on formulating transfer from source to target as a problem of geometric mean metric learning on manifolds. Specifically, we exploit the curved Riemannian manifold geometry of symmetric positive definite (SPD) covariance matrices. We exploit a simple but important observation that as the space of covariance matrices is both a Riemannian space as well as a homogeneous space, the shortest path geodesic between two covariances on the manifold can be computed analytically. Statistics on the SPD matrix manifold, such as the geometric mean of two matrices can be reduced to solving the well-known Riccati equation. We show how the Ricatti-based solution can be constrained to not only reduce the statistical differences between the source and target domains, such as aligning second order covariances and minimizing the maximum mean discrepancy, but also the underlying geometry of the source and target domains using diffusions on the underlying source and target manifolds. A key strength of our proposed approach is that it enables integrating multiple sources of variation between source and target in a unified way, by reducing the combined objective function to a nested set of Ricatti equations where the solution can be represented by a cascaded series of geometric mean computations. In addition to showing the theoretical optimality of our solution, we present detailed experiments using standard transfer learning testbeds from computer vision comparing our proposed algorithms to past work in domain adaptation, showing improved results over a large variety of previous methods.
8.Generative Model with Coordinate Metric Learning for Object Recognition Based on 3D Models（基于三维模型坐标度量学习生成模型的物体识别）
作者：Yida Wang,Weihong Deng
机构：University of Posts and Telecommunications
摘要：Given large amount of real photos for training, Convolutional neural network shows excellent performance on object recognition tasks. However, the process of collecting data is so tedious and the background are also limited which makes it hard to establish a perfect database. In this paper, our generative model trained with synthetic images rendered from 3D models reduces the workload of data collection and limitation of conditions. Our structure is composed of two sub-networks: semantic foreground object reconstruction network based on Bayesian inference and classification network based on multi-triplet cost function for avoiding over-fitting problem on monotone surface and fully utilizing pose information by establishing sphere-like distribution of descriptors in each category which is helpful for recognition on regular photos according to poses, lighting condition, background and category information of rendered images. Firstly, our conjugate structure called generative model with metric learning utilizing additional foreground object channels generated from Bayesian rendering as the joint of two sub-networks. Multi-triplet cost function based on poses for object recognition are used for metric learning which makes it possible training a category classifier purely based on synthetic data. Secondly, we design a coordinate training strategy with the help of adaptive noises acting as corruption on input images to help both sub-networks benefit from each other and avoid inharmonious parameter tuning due to different convergence speed of two sub-networks. Our structure achieves the state of the art accuracy of over 50\% on ShapeNet database with data migration obstacle from synthetic images to real photos. This pipeline makes it applicable to do recognition on real images only based on 3D models.
9.Blazingly Fast Video Object Segmentation with Pixel-Wise Metric Learning（基于像素级度量学习的视频目标快速分割）
作者：Yuhua Chen,Jordi Pont-Tuset,Alberto Montes,Luc Van Gool
Accepted to CVPR 2018
摘要：This paper tackles the problem of video object segmentation, given some user annotation which indicates the object of interest. The problem is formulated as pixel-wise retrieval in a learned embedding space: we embed pixels of the same object instance into the vicinity of each other, using a fully convolutional network trained by a modified triplet loss as the embedding model. Then the annotated pixels are set as reference and the rest of the pixels are classified using a nearest-neighbor approach. The proposed method supports different kinds of user input such as segmentation mask in the first frame (semi-supervised scenario), or a sparse set of clicked points (interactive scenario). In the semi-supervised scenario, we achieve results competitive with the state of the art but at a fraction of computation cost (275 milliseconds per frame). In the interactive scenario where the user is able to refine their input iteratively, the proposed method provides instant response to each input, and reaches comparable quality to competing methods with much less interaction.
10.Large Scale Local Online Similarity/Distance Learning Framework based on Passive/Aggressive
作者：Baida Hamdan,Davood Zabihzadeh,Monsefi Reza
机构：Ferdowsi University of Mashhad
摘要：Similarity/Distance measures play a key role in many machine learning, pattern recognition, and data mining algorithms, which leads to the emergence of metric learning field. Many metric learning algorithms learn a global distance function from data that satisfy the constraints of the problem. However, in many real-world datasets that the discrimination power of features varies in the different regions of input space, a global metric is often unable to capture the complexity of the task. To address this challenge, local metric learning methods are proposed that learn multiple metrics across the different regions of input space. Some advantages of these methods are high flexibility and the ability to learn a nonlinear mapping but typically achieves at the expense of higher time requirement and overfitting problem. To overcome these challenges, this research presents an online multiple metric learning framework. Each metric in the proposed framework is composed of a global and a local component learned simultaneously. Adding a global component to a local metric efficiently reduce the problem of overfitting. The proposed framework is also scalable with both sample size and the dimension of input data. To the best of our knowledge, this is the first local online similarity/distance learning framework based on PA (Passive/Aggressive). In addition, for scalability with the dimension of input data, DRP (Dual Random Projection) is extended for local online learning in the present work. It enables our methods to be run efficiently on high-dimensional datasets, while maintains their predictive performance. The proposed framework provides a straightforward local extension to any global online similarity/distance learning algorithm based on PA.
原文发布于微信公众号 - 专知（Quan_Zhuanzhi）