
Machine Learning Academic Digest [9.8]

By 公众号-arXiv每日学术速递 (the "arXiv Daily Academic Digest" WeChat official account)
Published 2021-09-16 16:46:38

Update! The H5 page now supports collapsible abstracts for a better reading experience. Click "Read the original" to visit arxivdaily.com, covering CS | physics | math | economics | statistics | finance | biology | electrical engineering, with search, bookmarking, and more!

cs.LG: 69 papers today

Graph (graph learning | graph neural networks | graph optimization, etc.) (2 papers)

【1】 HMSG: Heterogeneous Graph Neural Network based on Metapath Subgraph Learning
Link: https://arxiv.org/abs/2109.02868

Authors: Xinjun Cai, Jiaxing Shang, Fei Hao, Dajiang Liu, Linjiang Zheng
Affiliations: College of Computer Science, Chongqing University, Chongqing, China; Key Laboratory of Dependable Service Computing in Cyber Physical Society, Ministry of Education, Chongqing University, Chongqing, China
Note: 12 pages, 3 figures, 6 tables
Abstract: Many real-world data can be represented as heterogeneous graphs with different types of nodes and connections. Heterogeneous graph neural network models aim to embed nodes or subgraphs into low-dimensional vector spaces for various downstream tasks such as node classification and link prediction. Although several models have been proposed recently, they either only aggregate information from neighbors of the same type, or indiscriminately treat homogeneous and heterogeneous neighbors in the same way. Based on these observations, we propose a new heterogeneous graph neural network model named HMSG to comprehensively capture structural, semantic and attribute information from both homogeneous and heterogeneous neighbors. Specifically, we first decompose the heterogeneous graph into multiple metapath-based homogeneous and heterogeneous subgraphs, each associated with specific semantic and structural information. Message aggregation methods are then applied to each subgraph independently, so that information can be learned in a more targeted and efficient manner. Through a type-specific attribute transformation, node attributes can also be transferred among different types of nodes. Finally, we fuse the information from all subgraphs to obtain the complete representation. Extensive experiments on several datasets for node classification, node clustering and link prediction tasks show that HMSG outperforms state-of-the-art baselines on all evaluation metrics.

【2】 Graph Attention Layer Evolves Semantic Segmentation for Road Pothole Detection: A Benchmark and Algorithms
Link: https://arxiv.org/abs/2109.02711

Authors: Rui Fan, Hengli Wang, Yuan Wang, Ming Liu, Ioannis Pitas
Affiliations: The Hong Kong University of Science and Technology
Note: accepted as a regular paper to IEEE Transactions on Image Processing
Abstract: Existing road pothole detection approaches can be classified as computer vision-based or machine learning-based. The former typically employ 2-D image analysis/understanding or 3-D point cloud modeling and segmentation algorithms to detect road potholes from vision sensor data, while the latter generally address road pothole detection using convolutional neural networks (CNNs) in an end-to-end manner. However, road potholes are not necessarily ubiquitous, and it is challenging to prepare a large, well-annotated dataset for CNN training. In this regard, while computer vision-based methods were the mainstream research trend in the past decade, machine learning-based methods were merely discussed. Recently, we published the first stereo vision-based road pothole detection dataset and a novel disparity transformation algorithm, whereby the damaged and undamaged road areas can be highly distinguished. However, there are no benchmarks currently available for state-of-the-art (SoTA) CNNs trained using either disparity images or transformed disparity images. Therefore, in this paper, we first discuss the SoTA CNNs designed for semantic segmentation and evaluate their performance for road pothole detection with extensive experiments. Additionally, inspired by graph neural networks (GNNs), we propose a novel CNN layer, referred to as the graph attention layer (GAL), which can be easily deployed in any existing CNN to optimize image feature representations for semantic segmentation. Our experiments compare GAL-DeepLabv3+, our best-performing implementation, with nine SoTA CNNs on three modalities of training data: RGB images, disparity images, and transformed disparity images. The experimental results suggest that our proposed GAL-DeepLabv3+ achieves the best overall pothole detection accuracy on all training data modalities.
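For readers who want a concrete picture, below is a minimal PyTorch sketch of a graph-attention layer that can be dropped into an existing CNN, treating every spatial position of a feature map as a graph node. The abstract does not give GAL's exact formulation, so the global neighbourhood, single attention head, and learned residual gate here are assumptions rather than the paper's implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer2D(nn.Module):
    """Refines a CNN feature map by letting each spatial position (node)
    attend over all other positions, then adding a gated residual."""
    def __init__(self, channels: int):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (B, HW, C//8)
        k = self.key(x).flatten(2)                     # (B, C//8, HW)
        v = self.value(x).flatten(2)                   # (B, C, HW)
        attn = F.softmax(q @ k / (q.shape[-1] ** 0.5), dim=-1)  # (B, HW, HW)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return x + self.gamma * out  # drop-in: output shape equals input shape

# Usage: insert between backbone stages of, e.g., a DeepLabv3+ encoder.
feat = torch.randn(2, 64, 32, 32)
print(GraphAttentionLayer2D(64)(feat).shape)  # torch.Size([2, 64, 32, 32])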

Transformer (1 paper)

【1】 Sequential Diagnosis Prediction with Transformer and Ontological Representation
Link: https://arxiv.org/abs/2109.03069

Authors: Xueping Peng, Guodong Long, Tao Shen, Sen Wang, Jing Jiang
Affiliations: Australian Artificial Intelligence Institute, FEIT, University of Technology Sydney, Australia; School of Information Technology and Electrical Engineering, The University of Queensland, Australia
Note: 10 pages, 5 figures, accepted by IEEE ICDM 2021. arXiv admin note: text overlap with arXiv:2107.09288
Abstract: Sequential diagnosis prediction on the Electronic Health Record (EHR) has been proven crucial for predictive analytics in the medical domain. EHR data, the sequential records of a patient's interactions with healthcare systems, has numerous inherent characteristics of temporality, irregularity and data insufficiency. Some recent works train healthcare predictive models using the sequential information in EHR data, but they are vulnerable to irregular, temporal EHR data with admission/discharge states, and to insufficient data. To mitigate this, we propose an end-to-end robust transformer-based model called SETOR, which exploits neural ordinary differential equations to handle both the irregular intervals between a patient's visits (with admission timestamps) and the length of stay in each visit, alleviates the limitation of insufficient data by integrating medical ontology, and captures the dependencies between the patient's visits by employing multi-layer transformer blocks. Experiments conducted on two real-world healthcare datasets show that our sequential diagnosis prediction model SETOR not only achieves better predictive results than previous state-of-the-art approaches, irrespective of sufficient or insufficient training data, but also derives more interpretable embeddings of medical codes. The experimental code is available at the GitHub repository (https://github.com/Xueping/SETOR).

GAN | adversarial | attacks | generation (3 papers)

【1】 Brand Label Albedo Extraction of eCommerce Products using Generative Adversarial Network
Link: https://arxiv.org/abs/2109.02929

Authors: Suman Sapkota, Manish Juneja, Laurynas Keleras, Pranav Kotwal, Binod Bhattarai
Affiliations: Zeg.AI Pvt. Ltd, London, UK
Note: 5 pages, 5 figures
Abstract: In this paper we present our solution for extracting the albedo of branded labels on e-commerce products. To this end, we generate a large-scale photo-realistic synthetic dataset for albedo extraction and then train a generative model to translate images with diverse lighting conditions to albedo. We performed an extensive evaluation to test the generalisation of our method to in-the-wild images. From the experimental results, we observe that our solution generalises well compared to the existing method, both on unseen rendered images and on in-the-wild images.

【2】 Adversarial Parameter Defense by Multi-Step Risk Minimization
Link: https://arxiv.org/abs/2109.02889

Authors: Zhiyuan Zhang, Ruixuan Luo, Xuancheng Ren, Qi Su, Liangyou Li, Xu Sun
Affiliations: MOE Key Laboratory of Computational Linguistics, School of EECS, Peking University, Beijing, China; Center for Data Science, Peking University, Beijing, China; School of Foreign Languages, Peking University, Beijing, China
Abstract: Previous studies demonstrate DNNs' vulnerability to adversarial examples and show that adversarial training can establish a defense against them. In addition, recent studies show that deep neural networks also exhibit vulnerability to parameter corruptions. The vulnerability of model parameters is of crucial value to the study of model robustness and generalization. In this work, we introduce the concept of parameter corruption and propose to leverage loss-change indicators for measuring the flatness of the loss basin and the parameter robustness of neural network parameters. On this basis, we analyze parameter corruptions and propose the multi-step adversarial corruption algorithm. To enhance neural networks, we propose the adversarial parameter defense algorithm that minimizes the average risk of multiple adversarial parameter corruptions. Experimental results show that the proposed algorithm can improve both the parameter robustness and the accuracy of neural networks.
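The abstract describes multi-step adversarial parameter corruption as a probe of loss-basin flatness. Below is a minimal PyTorch sketch of such a probe under assumed details (a sign-based uphill step on the parameters, three steps, a fixed step size); the paper's defense would then minimize the average risk over several such corruptions.

import copy
import torch

def adversarial_parameter_corruption(model, loss_fn, data, target,
                                     steps: int = 3, eps: float = 1e-3):
    """Perturb a copy of the model's parameters uphill on the loss and
    return the resulting loss change (larger change = sharper basin)."""
    corrupted = copy.deepcopy(model)
    base_loss = loss_fn(model(data), target).item()
    for _ in range(steps):
        loss = loss_fn(corrupted(data), target)
        grads = torch.autograd.grad(loss, tuple(corrupted.parameters()))
        with torch.no_grad():
            for p, g in zip(corrupted.parameters(), grads):
                p.add_(eps * g.sign())   # one adversarial corruption step
    return loss_fn(corrupted(data), target).item() - base_loss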

【3】 Robustness and Generalization via Generative Adversarial Training
Link: https://arxiv.org/abs/2109.02765

Authors: Omid Poursaeed, Tianxing Jiang, Harry Yang, Serge Belongie, Ser-Nam Lim
Affiliations: Cornell University; Cornell Tech; Facebook AI
Note: ICCV 2021. arXiv admin note: substantial text overlap with arXiv:1911.09058
Abstract: While deep neural networks have achieved remarkable success in various computer vision tasks, they often fail to generalize to new domains and subtle variations of input images. Several defenses have been proposed to improve the robustness against these variations. However, current defenses can only withstand the specific attack used in training, and the models often remain vulnerable to other input variations. Moreover, these methods often degrade the model's performance on clean images and do not generalize to out-of-domain samples. In this paper we present Generative Adversarial Training, an approach that simultaneously improves the model's generalization to the test set and to out-of-domain samples as well as its robustness to unseen adversarial attacks. Instead of altering a low-level, pre-defined aspect of images, we generate a spectrum of low-level, mid-level and high-level changes using generative models with a disentangled latent space. Adversarial training with these examples enables the model to withstand a wide range of attacks by observing a variety of input alterations during training. We show that our approach not only improves the performance of the model on clean images and out-of-domain samples but also makes it robust against unforeseen attacks, outperforming prior work. We validate the effectiveness of our method by demonstrating results on various tasks such as classification, segmentation and object detection.

Semi-/weakly-/un-/fully-supervised | uncertainty | active learning (1 paper)

【1】 GANSER: A Self-supervised Data Augmentation Framework for EEG-based Emotion Recognition
Link: https://arxiv.org/abs/2109.03124

Authors: Ahi Zhang, Sheng-hua Zhong, Yan Liu
Affiliations: Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China; College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
Abstract: The data scarcity problem in Electroencephalography (EEG)-based affective computing makes it difficult to build effective models with high accuracy and stability using machine learning algorithms, especially deep learning models. Data augmentation has recently achieved considerable performance improvements for deep learning models: increased accuracy, stability, and reduced over-fitting. In this paper, we propose a novel data augmentation framework, namely Generative Adversarial Network-based Self-supervised Data Augmentation (GANSER). As the first framework to combine adversarial training with self-supervised learning for EEG-based emotion recognition, it can generate high-quality and high-diversity simulated EEG samples. In particular, we utilize adversarial training to learn an EEG generator and force the generated EEG signals to approximate the distribution of real samples, ensuring the quality of augmented samples. A transformation function is employed to mask parts of EEG signals and force the generator to synthesize potential EEG signals based on the remaining parts, producing a wide variety of samples. The masking possibility during transformation is introduced as prior knowledge to guide the extraction of distinguishable features from simulated EEG signals and to generalize the classifier to the augmented sample space. Finally, extensive experiments demonstrate that our proposed method improves emotion recognition performance and achieves state-of-the-art results.

Transfer | zero/few/one-shot | adaptation (5 papers)

【1】 On the Convergence of Decentralized Adaptive Gradient Methods
Link: https://arxiv.org/abs/2109.03194

Authors: Xiangyi Chen, Belhal Karimi, Weijie Zhao, Ping Li
Affiliations: Cognitive Computing Lab, Baidu Research, Bellevue, WA, USA
Abstract: Adaptive gradient methods, including Adam, AdaGrad, and their variants, have been very successful for training deep learning models such as neural networks. Meanwhile, given the need for distributed computing, distributed optimization algorithms are rapidly becoming a focal point. With the growth of computing power and the need to use machine learning models on mobile devices, the communication cost of distributed training algorithms requires careful consideration. In this paper, we introduce novel convergent decentralized adaptive gradient methods and rigorously incorporate adaptive gradient methods into decentralized training procedures. Specifically, we propose a general algorithmic framework that can convert existing adaptive gradient methods into their decentralized counterparts. In addition, we thoroughly analyze the convergence behavior of the proposed framework and show that if a given adaptive gradient method converges under some specific conditions, then its decentralized counterpart is also convergent. We illustrate the benefit of our generic decentralized framework on a prototype method, AMSGrad, both theoretically and numerically.
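A minimal NumPy sketch of the recipe the abstract describes, converting an adaptive method (here AMSGrad, the paper's prototype) into a decentralized one by pairing a local adaptive step with neighbour averaging through a doubly-stochastic mixing matrix W. The ordering of the two operations and the hyperparameters are assumptions, not the paper's exact scheme.

import numpy as np

def decentralized_amsgrad_round(grads, x, state, W, lr=1e-2,
                                b1=0.9, b2=0.999, eps=1e-8):
    """One communication round.
    x, grads: (n_nodes, dim) local iterates and local stochastic gradients.
    W: (n_nodes, n_nodes) doubly-stochastic mixing (gossip) matrix."""
    m, v, v_hat = state
    m = b1 * m + (1 - b1) * grads              # first moment
    v = b2 * v + (1 - b2) * grads ** 2         # second moment
    v_hat = np.maximum(v_hat, v)               # AMSGrad max-trick
    x_local = x - lr * m / (np.sqrt(v_hat) + eps)  # local adaptive step
    x_new = W @ x_local                        # average with neighbours
    return x_new, (m, v, v_hat)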

【2】 Few-shot Learning in Emotion Recognition of Spontaneous Speech Using a Siamese Neural Network with Adaptive Sample Pair Formation
Link: https://arxiv.org/abs/2109.02915

Authors: Kexin Feng, Theodora Chaspari
Affiliations: Human Bio-Behavioral Signals (HUBBS) Laboratory, Department of Computer Science & Engineering, Texas A&M University
Note: IEEE Transactions on Affective Computing, early access article. doi: 10.1109/TAFFC.2021.3109485
Abstract: Speech-based machine learning (ML) has been heralded as a promising solution for tracking prosodic and spectrotemporal patterns in real life that are indicative of emotional changes, providing a valuable window into one's cognitive and mental state. Yet the scarcity of labelled data in ambulatory studies prevents the reliable training of ML models, which usually rely on "data-hungry" distribution-based learning. Leveraging the abundance of labelled speech data from acted emotions, this paper proposes a few-shot learning approach for automatically recognizing emotion in spontaneous speech from a small number of labelled samples. Few-shot learning is implemented via a metric learning approach through a siamese neural network, which models the relative distance between samples rather than relying on learning absolute patterns of the corresponding distributions of each emotion. Results indicate the feasibility of the proposed metric learning for recognizing emotions from spontaneous speech on four datasets, even with a small number of labelled samples. They further demonstrate the superior performance of the proposed metric learning compared to commonly used adaptation methods, including network fine-tuning and adversarial learning. Findings from this work provide a foundation for the ambulatory tracking of human emotion in spontaneous speech, contributing to the real-life assessment of mental health degradation.
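A minimal PyTorch sketch of the metric-learning core: a siamese (twin) encoder scores the relative distance between two speech feature vectors instead of learning per-emotion distributions. The encoder sizes and the contrastive loss form are assumptions; the paper's adaptive sample pair formation is not shown.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseNet(nn.Module):
    def __init__(self, in_dim: int, emb_dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, emb_dim))

    def forward(self, a, b):
        # Both inputs share the same (twin) encoder weights.
        return self.encoder(a), self.encoder(b)

def contrastive_loss(za, zb, same: torch.Tensor, margin: float = 1.0):
    """same = 1 if the pair shares an emotion label, else 0."""
    d = F.pairwise_distance(za, zb)
    return (same * d.pow(2) +
            (1 - same) * F.relu(margin - d).pow(2)).mean()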

【3】 Self-adaptive deep neural network: Numerical approximation to functions and PDEs
Link: https://arxiv.org/abs/2109.02839

Authors: Zhiqiang Cai, Jingshuang Chen, Min Liu
Note: submitted to Journal of Computational Physics
Abstract: Designing an optimal deep neural network for a given task is important and challenging in many machine learning applications. To address this issue, we introduce a self-adaptive algorithm: the adaptive network enhancement (ANE) method, written as loops of the form train, estimate, and enhance. Starting with a small two-layer neural network (NN), the train step solves the optimization problem for the current NN; the estimate step computes a posteriori estimators/indicators using the solution at the current NN; the enhance step adds new neurons to the current NN. Novel network enhancement strategies based on the computed estimators/indicators are developed in this paper to determine how many new neurons should be added and when a new layer should be added to the current NN. The ANE method provides a natural process for obtaining a good initialization when training the current NN; in addition, we introduce an advanced procedure for initializing newly added neurons to obtain a better approximation. We demonstrate that the ANE method can automatically design a nearly minimal NN for learning functions exhibiting sharp transitional layers as well as discontinuous solutions of hyperbolic partial differential equations.

【4】 Few-shot Learning via Dependency Maximization and Instance Discriminant Analysis
Link: https://arxiv.org/abs/2109.02820

Authors: Zejiang Hou, Sun-Yuan Kung
Affiliations: Princeton University
Abstract: We study the few-shot learning (FSL) problem, where a model learns to recognize new objects with extremely few labeled training examples per category. Most previous FSL approaches resort to the meta-learning paradigm, where the model accumulates inductive bias by learning many training tasks so as to solve new, unseen few-shot tasks. In contrast, we propose a simple approach that exploits the unlabeled data accompanying the few-shot task to improve few-shot performance. First, we propose a Dependency Maximization method based on the Hilbert-Schmidt norm of the cross-covariance operator, which maximizes the statistical dependency between the embedded features of the unlabeled data and their label predictions, together with the supervised loss over the support set. We then use the obtained model to infer pseudo-labels for the unlabeled data. Furthermore, we propose an Instance Discriminant Analysis to evaluate the credibility of each pseudo-labeled example and select the most faithful ones into an augmented support set, with which the model is retrained as in the first step. We iterate the above process until the pseudo-labels for the unlabeled data become stable. Following the standard transductive and semi-supervised FSL settings, our experiments show that the proposed method outperforms previous state-of-the-art methods on four widely used benchmarks, including mini-ImageNet, tiered-ImageNet, CUB, and CIFAR-FS.
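A minimal PyTorch sketch of the Dependency Maximization term: a (biased) Hilbert-Schmidt Independence Criterion estimator between embedded features of unlabeled data and their label predictions, to be maximized alongside the supervised support-set loss. The Gaussian kernel and its bandwidth are assumptions, not the paper's exact choices.

import torch

def rbf_kernel(x, sigma: float = 1.0):
    d2 = torch.cdist(x, x).pow(2)
    return torch.exp(-d2 / (2 * sigma ** 2))

def hsic(x, y, sigma: float = 1.0):
    """Biased HSIC estimator between batches x (features) and y
    (soft label predictions); add -hsic(...) to the loss to maximize it."""
    n = x.shape[0]
    K, L = rbf_kernel(x, sigma), rbf_kernel(y, sigma)
    H = torch.eye(n) - torch.ones(n, n) / n   # centering matrix
    return torch.trace(K @ H @ L @ H) / (n - 1) ** 2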

【5】 Zero-Shot Open Set Detection by Extending CLIP
Link: https://arxiv.org/abs/2109.02748

Authors: Sepideh Esmaeilpour, Bing Liu, Eric Robertson, Lei Shu
Affiliations: University of Illinois at Chicago; PAR Government
Abstract: In a regular open set detection problem, samples of known classes (also called closed set classes) are used to train a special classifier. In testing, the classifier can (1) classify test samples of known classes into their respective classes and (2) detect samples that do not belong to any of the known classes (we say they belong to some unknown or open set classes). This paper studies the problem of zero-shot open set detection, which still performs the same two tasks in testing but involves no training except for the use of the given known class names. This paper proposes a novel yet simple method (called ZO-CLIP) to solve the problem. ZO-CLIP builds on top of recent advances in zero-shot classification through multi-modal representation learning. It first extends the pre-trained multi-modal model CLIP by training a text-based image description generator on top of CLIP. In testing, it uses the extended model to generate candidate unknown class names for each test sample and computes a confidence score based on both the known class names and the candidate unknown class names for zero-shot open set detection. Experimental results on 5 benchmark datasets for open set detection confirm that ZO-CLIP outperforms the baselines by a large margin.
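A minimal sketch of the ZO-CLIP scoring step using the public OpenAI CLIP package: embed prompts for the known class names plus candidate unknown names, softmax the image-text similarities, and read the probability mass falling on the candidates as an open-set score. The paper's text-based description generator is omitted here (candidates are passed in), and the prompt template and thresholding are assumptions.

import clip  # https://github.com/openai/CLIP
import torch

device = "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def open_set_score(image, known, candidates):
    """image: a PIL image; known/candidates: lists of class name strings."""
    prompts = [f"a photo of a {c}" for c in known + candidates]
    text = clip.tokenize(prompts).to(device)
    with torch.no_grad():
        img_f = model.encode_image(preprocess(image).unsqueeze(0).to(device))
        txt_f = model.encode_text(text)
        img_f = img_f / img_f.norm(dim=-1, keepdim=True)
        txt_f = txt_f / txt_f.norm(dim=-1, keepdim=True)
        probs = (100.0 * img_f @ txt_f.T).softmax(dim=-1).squeeze(0)
    return probs[len(known):].sum().item()  # high -> likely open-set sample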

Reinforcement learning (2 papers)

【1】 Optimizing Quantum Variational Circuits with Deep Reinforcement Learning
Link: https://arxiv.org/abs/2109.03188

Authors: Owen Lockwood
Affiliations: Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY
Abstract: Quantum Machine Learning (QML) is considered to be one of the most promising applications of near-term quantum devices. However, the optimization of quantum machine learning models presents numerous challenges arising from the imperfections of hardware and the fundamental obstacles in navigating an exponentially scaling Hilbert space. In this work, we evaluate the potential of contemporary methods in deep reinforcement learning to augment gradient-based optimization routines in quantum variational circuits. We find that reinforcement-learning-augmented optimizers consistently outperform gradient descent in noisy environments. All code and pretrained weights are available to replicate the results or deploy the models at https://github.com/lockwo/rl_qvc_opt.

【2】 Safe-Critical Modular Deep Reinforcement Learning with Temporal Logic through Gaussian Processes and Control Barrier Functions
Link: https://arxiv.org/abs/2109.02791

Authors: Mingyu Cai, Cristian-Ioan Vasile
Affiliations: Department of Mechanical Engineering, Lehigh University
Note: Under review
Abstract: Reinforcement learning (RL) is a promising approach that has seen limited success in real-world applications, because ensuring safe exploration and facilitating adequate exploitation are challenges when controlling robotic systems with unknown models and measurement uncertainties. Such a learning problem becomes even more intractable for complex tasks over continuous spaces (state-space and action-space). In this paper, we propose a learning-based control framework consisting of several aspects: (1) linear temporal logic (LTL) is leveraged to specify complex tasks over infinite horizons and is translated to a novel automaton structure; (2) we propose an innovative reward scheme for the RL agent with a formal guarantee that globally optimal policies maximize the probability of satisfying the LTL specifications; (3) based on a reward-shaping technique, we develop a modular policy-gradient architecture that utilizes the benefits of the automaton structure to decompose the overall task and improve the performance of the learned controllers; (4) by incorporating Gaussian Processes (GPs) to estimate the uncertain dynamic systems, we synthesize a model-based safeguard using Exponential Control Barrier Functions (ECBFs) to address problems with high-order relative degrees. In addition, we utilize the properties of LTL automata and ECBFs to construct a guiding process that further improves the efficiency of exploration. Finally, we demonstrate the effectiveness of the framework in several robotic environments and show that such an ECBF-based modular deep RL algorithm achieves near-perfect success rates and guards safety with high-probability confidence during training.

Meta-learning (1 paper)

【1】 Hyper Meta-Path Contrastive Learning for Multi-Behavior Recommendation
Link: https://arxiv.org/abs/2109.02859

Authors: Haoran Yang, Hongxu Chen, Lin Li, Philip S. Yu, Guandong Xu
Affiliations: School of Computer Science, University of Technology Sydney, Sydney, Australia; School of Computer and Artificial Intelligence, Wuhan University of Technology, Wuhan, China; Department of Computer Science, University of Illinois at Chicago, Chicago, U.S.A.
Note: Accepted by ICDM 2021 as a regular paper
Abstract: User purchase prediction with multi-behavior information remains a challenging problem for current recommendation systems. Various methods have been proposed to address it by leveraging the advantages of graph neural networks (GNNs) or multi-task learning. However, most existing works do not take the complex dependencies among a user's different behaviors into consideration. They utilize simple, fixed schemes, such as neighborhood information aggregation or mathematical calculations over vectors, to fuse the embeddings of different user behaviors into a unified embedding representing a user's behavioral patterns, which is then used in downstream recommendation tasks. To tackle this challenge, in this paper we first propose the concept of the hyper meta-path, used to construct hyper meta-paths or hyper meta-graphs that explicitly illustrate the dependencies among a user's different behaviors. How to obtain a unified embedding for a user from hyper meta-paths while avoiding the previously mentioned limitations is critical. Thanks to the recent success of graph contrastive learning, we leverage it to learn embeddings of user behavior patterns adaptively, instead of assigning a fixed scheme for understanding the dependencies among different behaviors. Coupled with hyper meta-paths, a new graph contrastive learning based framework, namely HMG-CR, is proposed, and it consistently and significantly outperforms all baselines in extensive comparison experiments.

Medical (4 papers)

【1】 Fruit-CoV: An Efficient Vision-based Framework for Speedy Detection and Diagnosis of SARS-CoV-2 Infections Through Recorded Cough Sounds
Link: https://arxiv.org/abs/2109.03219

Authors: Long H. Nguyen, Nhat Truong Pham, Van Huong Do, Liu Tai Nguyen, Thanh Tin Nguyen, Van Dung Do, Hai Nguyen, Ngoc Duy Nguyen
Affiliations: Division of Computational Mechatronics, Institute for Computational Science, Ton Duc Thang University, Ho Chi Minh City, Vietnam; ASICLAND, Suwon, South Korea; Human Computer Interaction Lab, Sejong University, Seoul, South Korea
Note: 4 pages
Abstract: SARS-CoV-2, colloquially known as COVID-19, had its initial outbreak in December 2019. The deadly virus has spread across the world, fueling the global pandemic since March 2020. In addition, a recent variant of SARS-CoV-2 named Delta is intractably contagious and responsible for more than four million deaths worldwide. Therefore, it is vital to have a self-testing service for SARS-CoV-2 at home. In this study, we introduce Fruit-CoV, a two-stage vision framework capable of detecting SARS-CoV-2 infections through recorded cough sounds. Specifically, we convert sounds into log-Mel spectrograms and use the EfficientNet-V2 network to extract their visual features in the first stage. In the second stage, we use 14 convolutional layers extracted from the large-scale Pretrained Audio Neural Networks for audio pattern recognition (PANNs) and the Wavegram-Log-Mel-CNN to aggregate feature representations of the log-Mel spectrograms. Finally, we use the combined features to train a binary classifier. We use a dataset provided by the AICovidVN 115M Challenge, which includes a total of 7371 recorded cough sounds collected throughout Vietnam, India, and Switzerland. Experimental results show that our proposed model achieves an AUC score of 92.8% and ranks first on the leaderboard of the AICovidVN Challenge. More importantly, our proposed framework can be integrated into a call center or a VoIP system to speed up the detection of SARS-CoV-2 infections through online/recorded cough sounds.
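A minimal sketch of the first stage's input pipeline: turning a recorded cough into a log-Mel spectrogram "image" that a vision backbone such as EfficientNet-V2 can consume. The sample rate, n_mels, and the 3-channel stacking are assumptions, not the paper's settings.

import librosa
import numpy as np

def cough_to_logmel(path: str, sr: int = 16000, n_mels: int = 128):
    y, sr = librosa.load(path, sr=sr)                  # waveform
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    logmel = librosa.power_to_db(mel, ref=np.max)      # (n_mels, frames)
    return np.stack([logmel] * 3)                      # pseudo-RGB for a CNN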

【2】 A Scalable AI Approach for Clinical Trial Cohort Optimization
Link: https://arxiv.org/abs/2109.02808

Authors: Xiong Liu, Cheng Shi, Uday Deore, Yingbo Wang, Myah Tran, Iya Khalil, Murthy Devarakonda
Affiliations: AI Innovation Center, Novartis, Cambridge, MA, USA; RWE Data Science, Novartis Pharma, East Hanover, NJ, USA; Global Drug Development, Novartis, East Hanover, NJ, USA; Global Drug Development, Novartis, Basel, Switzerland
Note: PharML 2021 (Machine Learning for Pharma and Healthcare Applications) at ECML PKDD 2021
Abstract: The FDA has been promoting enrollment practices that could enhance the diversity of clinical trial populations by broadening eligibility criteria. However, how to broaden eligibility remains a significant challenge. We propose an AI approach to Cohort Optimization (AICO) through transformer-based natural language processing of eligibility criteria and evaluation of the criteria using real-world data. The method can extract common eligibility-criteria variables from a large set of relevant trials and measure the generalizability of trial designs to real-world patients. It overcomes the scalability limits of existing manual methods and enables rapid simulation of eligibility-criteria design for a disease of interest. A case study on breast cancer trial design demonstrates the utility of the method in improving trial generalizability.

【3】 Backpropagation and fuzzy algorithm Modelling to Resolve Blood Supply Chain Issues in the Covid-19 Pandemic
Link: https://arxiv.org/abs/2109.02645

Authors: Aan Erlansari, Rusdi Effendi, Funny Farady C, Andang Wijanarko, Boko Susilo, Reza Hardiansyah
Affiliations: University of Bengkulu
Abstract: Blood stock shortages and uncertain demand have become a major problem for countries worldwide. This study therefore aims to provide a solution to the issues of blood distribution during the Covid-19 pandemic in Bengkulu, Indonesia. The backpropagation algorithm was used to improve the likelihood of discovering available and potential donors, and distance, age, and length of donation were measured to obtain the right person to donate blood when needed. The backpropagation network uses three inputs to classify eligible donors, namely age, body weight, and bias. In addition, the system's query functionality automatically evaluates the variables via the Fuzzy Tahani method while accessing the large underlying database.

【4】 Analysis of MRI Biomarkers for Brain Cancer Survival Prediction
Link: https://arxiv.org/abs/2109.02785

Authors: Subhashis Banerjee, Sushmita Mitra, Lawrence O. Hall
Affiliations: Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India; Department of Computer Science and Engineering, University of South Florida, Tampa, FL, United States
Abstract: Prediction of Overall Survival (OS) of brain cancer patients from multi-modal MRI is a challenging field of research. Most of the existing literature on survival prediction is based on radiomic features, which consider neither non-biological factors nor the functional neurological status of the patients. Besides, the selection of an appropriate survival cut-off and the presence of censored data create further problems. The application of deep learning models for OS prediction is also limited by the lack of large, annotated, publicly available datasets. In this scenario we analyse the potential of two novel neuroimaging feature families, extracted from brain parcellation atlases and spatial habitats, along with classical radiomic and geometric features, to study their combined predictive power for overall survival analysis. A cross-validation strategy with grid search is proposed to simultaneously select and evaluate the most predictive feature subset based on its predictive power. A Cox Proportional Hazards (CoxPH) model is employed for univariate feature selection, followed by the prediction of patient-specific survival functions by three multivariate parsimonious models, viz. Coxnet, Random Survival Forests (RSF) and Survival SVM (SSVM). The brain cancer MRI data used for this research was taken from two open-access collections, TCGA-GBM and TCGA-LGG, available from The Cancer Imaging Archive (TCIA). Corresponding survival data for each patient was downloaded from The Cancer Genome Atlas (TCGA). A high cross-validated $C$-index score of $0.82 \pm 0.10$ was achieved using RSF with the best 24 selected features. Age was found to be the most important biological predictor. There were 9, 6, 6 and 2 features selected from the parcellation, habitat, radiomic and region-based feature groups respectively.
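A minimal sketch of the univariate CoxPH screening step using the lifelines package: fit one Cox model per feature and rank features by concordance. The column names, the library choice, and the top-24 cutoff mirror the abstract only loosely and are assumptions.

import pandas as pd
from lifelines import CoxPHFitter

def univariate_cox_ranking(df: pd.DataFrame, features, k: int = 24):
    """df must contain 'time' (survival time) and 'event' (1 = death)
    columns; returns the k features with the highest concordance."""
    scores = {}
    for f in features:
        cph = CoxPHFitter()
        cph.fit(df[[f, "time", "event"]],
                duration_col="time", event_col="event")
        scores[f] = cph.concordance_index_   # per-feature C-index
    return sorted(scores, key=scores.get, reverse=True)[:k]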

Recommendation (1 paper)

【1】 Recommendation Fairness: From Static to Dynamic
Link: https://arxiv.org/abs/2109.03150

Authors: Dell Zhang, Jun Wang
Affiliations: ByteDance AI Lab, London, UK; University College London
Note: A position paper for the FAccTRec-2021 workshop
Abstract: Driven by the need to capture users' evolving interests and optimize their long-term experiences, more and more recommender systems have started to model recommendation as a Markov decision process and employ reinforcement learning to address the problem. Shouldn't research on the fairness of recommender systems follow the same trend, from static evaluation and one-shot intervention to dynamic monitoring and non-stop control? In this paper, we first portray the recent developments in recommender systems and then discuss how fairness could be baked into the reinforcement learning techniques for recommendation. Moreover, we argue that in order to make further progress in recommendation fairness, we may want to consider multi-agent (game-theoretic) optimization, multi-objective (Pareto) optimization, and simulation-based optimization, in the general framework of stochastic games.

Autonomous driving | vehicles | lane detection, etc. (1 paper)

【1】 OdoNet: Untethered Speed Aiding for Vehicle Navigation Without Hardware Wheeled Odometer
Link: https://arxiv.org/abs/2109.03091

Authors: Hailiang Tang, Xiaoji Niu, Tisheng Zhang, You Li, Jingnan Liu
Affiliations: Wuhan University, Wuhan, China; Collaborative Innovation Center of Geospatial Technology, Wuhan University
Note: 13 pages, 15 figures
Abstract: The odometer has been proven to significantly improve the accuracy of Global Navigation Satellite System / Inertial Navigation System (GNSS/INS) integrated vehicle navigation in GNSS-challenged environments. However, the odometer is inaccessible in many applications, especially for aftermarket devices. To apply forward speed aiding without a hardware wheeled odometer, we propose OdoNet, an untethered one-dimensional Convolutional Neural Network (CNN)-based pseudo-odometer model that learns from a single Inertial Measurement Unit (IMU) and can act as an alternative to the wheeled odometer. Dedicated experiments have been conducted to verify the feasibility and robustness of OdoNet. The results indicate that IMU individuality, vehicle loads, and road conditions have little impact on the robustness and precision of OdoNet, while IMU biases and mounting angles may notably degrade it. Thus, a data-cleaning procedure is added to effectively mitigate the impacts of IMU biases and mounting angles. Compared to the process using only the non-holonomic constraint (NHC), the positioning error is reduced by around 68% after employing the pseudo-odometer, while the reduction is around 74% for the hardware wheeled odometer. In conclusion, the proposed OdoNet can be employed as an untethered pseudo-odometer for vehicle navigation, efficiently improving the accuracy and reliability of positioning in GNSS-denied environments.
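A minimal PyTorch sketch of a 1-D CNN pseudo-odometer in the spirit of OdoNet, regressing forward speed from a window of 6-axis IMU samples. The layer sizes and the 200-sample window are assumptions, not the paper's architecture.

import torch
import torch.nn as nn

class PseudoOdometer(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(6, 32, kernel_size=7, padding=3), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(64, 1))   # predicted forward speed (m/s)

    def forward(self, imu):    # imu: (batch, 6 axes, window length)
        return self.net(imu).squeeze(-1)

speed = PseudoOdometer()(torch.randn(4, 6, 200))  # -> shape (4,)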

Point cloud | SLAM | radar | LiDAR | depth/RGB-D (1 paper)

【1】 Pano3D: A Holistic Benchmark and a Solid Baseline for 360^o Depth Estimation
Link: https://arxiv.org/abs/2109.02749

Authors: Georgios Albanis, Nikolaos Zioulis, Petros Drakoulis, Vasileios Gkitsas, Vladimiros Sterzentsenko, Federico Alvarez, Dimitrios Zarpalas, Petros Daras
Affiliations: Centre for Research and Technology Hellas, Thessaloniki, Greece; Universidad Politécnica de Madrid, Madrid, Spain
Note: Presented at the OmniCV CVPR 2021 workshop. Code, models, data and demo at this https URL
Abstract: Pano3D is a new benchmark for depth estimation from spherical panoramas. It aims to assess performance across all depth estimation traits: the primary trait, direct depth estimation performance targeting precision and accuracy, and the secondary traits, boundary preservation and smoothness. Moreover, Pano3D moves beyond typical intra-dataset evaluation to inter-dataset performance assessment. By disentangling the capacity to generalize to unseen data into different test splits, Pano3D represents a holistic benchmark for $360^o$ depth estimation. We use it as the basis for an extended analysis seeking to offer insights into classical choices for depth estimation. This results in a solid baseline for panoramic depth that follow-up works can build upon to steer future progress.

Inference | analysis | understanding | interpretation (4 papers)

【1】 ExCode-Mixed: Explainable Approaches towards Sentiment Analysis on Code-Mixed Data using BERT models
Link: https://arxiv.org/abs/2109.03200

Authors: Aman Priyanshu, Aleti Vardhan, Sudarshan Sivakumar, Supriti Vijay, Nipuna Chhabra
Affiliations: Manipal Institute of Technology
Note: 3 pages, 1 figure
Abstract: The increasing use of social media sites in countries like India has given rise to large volumes of code-mixed data. Sentiment analysis of this data can provide integral insights into people's perspectives and opinions. Developing robust explainability techniques that explain why models make their predictions becomes essential. In this paper, we propose an adequate methodology to integrate explainable approaches into code-mixed sentiment analysis.

【2】 Early ICU Mortality Prediction and Survival Analysis for Respiratory Failure
Link: https://arxiv.org/abs/2109.03048

Authors: Yilin Yin, Chun-An Chou
Affiliations: Department of Mechanical and Industrial Engineering, Northeastern University
Abstract: Respiratory failure is one of the major causes of death in critical care units. During the outbreak of COVID-19, critical care units experienced an extreme shortage of mechanical ventilation capacity because of respiratory-failure-related syndromes. To help with this, early mortality risk prediction for patients suffering respiratory failure can provide timely support for clinical treatment and resource management. In this study, we propose a dynamic modeling approach for early mortality risk prediction of respiratory failure patients based on the first 24 hours of ICU physiological data. Our proposed model is validated on the eICU Collaborative Research Database. We achieved high AUROC performance (80-83%) and significantly improved AUCPR by 4% on Day 5 after ICU admission, compared to state-of-the-art prediction models. In addition, we illustrate that the survival curves capture time-varying information for early ICU admission survival analysis.

【3】 Understanding Model Drift in a Large Cellular Network
Link: https://arxiv.org/abs/2109.03011

Authors: Shinan Liu, Francesco Bronzino, Paul Schmitt, Nick Feamster, Ricardo Borges, Hector Garcia Crespo, Brian Ward
Affiliations: University of Chicago; Université Savoie Mont Blanc; Information Sciences Institute; Verizon
Abstract: Operational networks increasingly use machine learning models for a variety of tasks, including detecting anomalies, inferring application performance, and forecasting demand. Accurate models are important, yet accuracy can degrade over time due to concept drift, whereby either the characteristics of the data change over time (data drift) or the relationship between the features and the target predictor changes over time (model drift). Drift is important to detect because changes in the properties of the underlying data, or in their relationship to the target prediction, can require model retraining, which can be time-consuming and expensive. Concept drift occurs in operational networks for a variety of reasons, ranging from software upgrades to seasonality to changes in user behavior. Yet, despite the prevalence of drift in networks, its extent and effects on prediction accuracy have not been extensively studied. This paper presents an initial exploration of concept drift in a large cellular network in the United States, for a major metropolitan area, in the context of demand forecasting. We find that concept drift arises largely due to data drift, and it appears across different key performance indicators (KPIs), models, training set sizes, and time intervals. We identify the sources of concept drift for the particular problem of forecasting downlink volume. Weekly and seasonal patterns introduce both high- and low-frequency model drift, while disasters and upgrades result in sudden drift due to exogenous shocks. Regions with high population density, lower traffic volumes, and higher speeds also tend to correlate with more concept drift. The features that contribute most significantly to concept drift are User Equipment (UE) downlink packets, UE uplink packets, and Real-time Transport Protocol (RTP) total received packets.

【4】 Prescriptive Process Monitoring Under Resource Constraints: A Causal Inference Approach
Link: https://arxiv.org/abs/2109.02894

Authors: Mahmoud Shoush, Marlon Dumas
Affiliations: University of Tartu, Tartu, Estonia
Abstract: Prescriptive process monitoring is a family of techniques for optimizing the performance of a business process by triggering interventions at runtime. Existing prescriptive process monitoring techniques assume that the number of interventions that may be triggered is unbounded. In practice, though, specific interventions consume resources with finite capacity. For example, in a loan origination process, an intervention may consist of preparing an alternative loan offer to increase the applicant's chances of taking a loan. This intervention requires a certain amount of time from a credit officer, and thus it is not possible to trigger it in all cases. This paper proposes a prescriptive process monitoring technique that triggers interventions to optimize a cost function under fixed resource constraints. The proposed technique relies on predictive modeling to identify cases that are likely to lead to a negative outcome, in combination with causal inference to estimate the effect of an intervention on the outcome of a case. These outputs are then used to allocate resources to interventions so as to maximize the cost function. A preliminary empirical evaluation suggests that the proposed approach produces a higher net gain than a purely predictive (non-causal) baseline.
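A minimal sketch of how the ingredients could fit together, with a two-model (T-learner) effect estimator standing in for the paper's causal component: estimate each running case's intervention uplift and greedily spend the limited resource capacity on the cases with the largest estimated gain. The estimator, the gain definition, and the greedy allocation are assumptions, not the paper's method.

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def allocate_interventions(X_treated, y_treated, X_control, y_control,
                           X_open, capacity):
    """Pick which open (running) cases get the scarce intervention.
    y_* = 1 when the historical case ended in a positive outcome."""
    m_t = GradientBoostingClassifier().fit(X_treated, y_treated)
    m_c = GradientBoostingClassifier().fit(X_control, y_control)
    # Estimated effect of intervening on each running case (uplift).
    uplift = (m_t.predict_proba(X_open)[:, 1]
              - m_c.predict_proba(X_open)[:, 1])
    return np.argsort(-uplift)[:capacity]  # indices of cases to treat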

Detection (4 papers)

【1】 BERT based classification system for detecting rumours on Twitter
Link: https://arxiv.org/abs/2109.02975

Authors: Rini Anggrainingsih, Ghulam Mubashar Hassan, Amitava Datta
Note: 10 pages, 5 figures, and 8 tables; submitted to IEEE Transactions on Computational Social Systems (under review)
Abstract: The role of social media in opinion formation has far-reaching implications in all spheres of society. Though social media provide platforms for expressing news and views, it is hard to control the quality of posts due to the sheer volume of posts on platforms like Twitter and Facebook. Misinformation and rumours have lasting effects on society, as they tend to influence people's opinions and may also motivate people to act irrationally. It is therefore very important to detect and remove rumours from these platforms. The only way to prevent the spread of rumours is through automatic detection and classification of social media posts. Our focus in this paper is the Twitter social medium, as it is relatively easy to collect data from Twitter. The majority of previous studies used supervised learning approaches to classify rumours on Twitter. These approaches rely on feature extraction to obtain both content and context features from the text of tweets to distinguish rumours from non-rumours. Manually extracting features, however, is time-consuming considering the volume of tweets. We propose a novel approach to deal with this problem by utilising sentence embedding with BERT to identify rumours on Twitter, rather than the usual feature extraction techniques. We use BERT sentence embedding to represent each tweet as a vector according to its contextual meaning, and classify those vectors into rumours or non-rumours using various supervised learning techniques. Our BERT based models improved accuracy by approximately 10% compared to previous methods.
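A minimal sketch of the pipeline shape: BERT-style sentence embeddings of tweets fed to a classical supervised classifier. The sentence-transformers checkpoint and the logistic-regression back-end below are assumptions standing in for the paper's BERT embedding and classifier choices.

from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # any BERT-style encoder

def train_rumour_classifier(tweets, labels):
    """labels: 1 = rumour, 0 = non-rumour."""
    X = encoder.encode(tweets)                     # one vector per tweet
    return LogisticRegression(max_iter=1000).fit(X, labels)

clf = train_rumour_classifier(
    ["BREAKING: aliens landed in London!!",
     "Council confirms road works start Monday."],
    [1, 0])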

【2】 FastAudio: A Learnable Audio Front-End for Spoof Speech Detection
Link: https://arxiv.org/abs/2109.02774

Authors: Quchen Fu, Zhongwei Teng, Jules White, Maria Powell, Douglas C. Schmidt
Abstract: Voice assistants, such as smart speakers, have exploded in popularity. It is currently estimated that smart speaker adoption has exceeded 35% of the US adult population. Manufacturers have integrated speaker identification technology, which attempts to determine the identity of the person speaking, to provide personalized services to different members of the same family. Speaker identification can also play an important role in controlling how a smart speaker is used. For example, it is not critical to correctly identify the user when playing music; however, when reading the user's email out loud, it is critical to verify that the speaker making the request is the authorized user. Speaker verification systems, which authenticate the speaker's identity, are therefore needed as a gatekeeper to protect against various spoofing attacks that aim to impersonate the enrolled user. This paper compares popular learnable front-ends, which learn representations of audio by joint training with downstream tasks (end-to-end). We categorize the front-ends by defining two generic architectures and then analyze the filtering stages of both types in terms of learning constraints. We propose replacing fixed filterbanks with a learnable layer that can better adapt to anti-spoofing tasks. The proposed FastAudio front-end is then tested with two popular back-ends to measure performance on the LA track of the ASVspoof 2019 dataset. The FastAudio front-end achieves a relative improvement of 27% when compared with fixed front-ends, outperforming all other learnable front-ends on this task.
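A minimal PyTorch sketch of the core replacement the abstract proposes: a learnable filterbank layer applied to a power spectrogram in place of a fixed Mel filterbank. Initialisation from Mel filters and the shape constraints the paper analyses are omitted; the sizes are assumptions.

import torch
import torch.nn as nn

class LearnableFilterbank(nn.Module):
    def __init__(self, n_fft_bins: int = 257, n_filters: int = 70):
        super().__init__()
        # One trainable filter per output channel, over FFT bins.
        self.fb = nn.Parameter(torch.rand(n_filters, n_fft_bins) * 0.1)

    def forward(self, spec):            # spec: (batch, fft_bins, frames)
        energies = torch.relu(self.fb) @ spec  # keep filters non-negative
        return torch.log(energies + 1e-6)      # log-compressed features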

【3】 gen2Out: Detecting and Ranking Generalized Anomalies
Link: https://arxiv.org/abs/2109.02704

Authors: Meng-Chieh Lee, Shubhranshu Shekhar, Christos Faloutsos, T. Noah Hutson, Leon Iasemidis
Affiliations: Carnegie Mellon University; Louisiana Tech University
Note: Under submission at the IEEE International Conference on Big Data (Big Data), December 2021
Abstract: In a cloud of m-dimensional data points, how would we spot, as well as rank, both single-point and group anomalies? We are the first to generalize anomaly detection along two dimensions. The first is that we handle both point anomalies and group anomalies under a unified view; we refer to them as generalized anomalies. The second is that gen2Out not only detects anomalies but also ranks them in order of suspiciousness. Detection and ranking of anomalies has numerous applications: for example, in EEG recordings of an epileptic patient, an anomaly may indicate a seizure; in computer network traffic data, it may signify a power failure or a DoS/DDoS attack. We start by setting some reasonable axioms; surprisingly, none of the earlier methods pass all of them. Our main contribution is the gen2Out algorithm, which has the following desirable properties: (a) Principled and sound anomaly scoring that obeys the axioms for detectors; (b) Doubly general, in that it detects as well as ranks generalized anomalies, both point and group anomalies; (c) Scalable, being fast with runtime linear in the input size; (d) Effective, as experiments on real-world epileptic recordings (200GB) demonstrate the effectiveness of gen2Out, as confirmed by clinicians. Experiments on 27 real-world benchmark datasets show that gen2Out detects ground-truth groups, matches or outperforms point-anomaly baseline algorithms on accuracy, has no competition for group anomalies, and requires about 2 minutes for 1 million data points on a stock machine.

【4】 OKSP: A Novel Deep Learning Automatic Event Detection Pipeline for Seismic Monitoring in Costa Rica 标题:OKSP:一种用于哥斯达黎加地震监测的新型深度学习自动事件检测管道 链接:https://arxiv.org/abs/2109.02723

作者:Leonardo van der Laat,Ronald J. L. Baldares,Esteban J. Chaves,Esteban Meneses 机构:Costa Rica High Technology Center, San José, Costa Rica; Costa Rica Institute of Technology; Volcanological and Seismological Observatory of Costa Rica, National University, Heredia, Costa Rica 摘要:小震级地震数量最多,但由于其振幅低、频率高,通常被非均匀噪声源所掩盖,因此最难稳健而准确地定位。它们揭示了地震周期中断层系统的应力状态和时空行为的关键信息,因此,对其进行完整刻画对于改进地震危险性评估至关重要。随着计算能力的提高,现代DL算法正在利用不断增长的地震学数据库,使科学家能够提高地震目录的完整性,系统地检测较小震级的地震,并减少主要由人为干预引起的误差。在这项工作中,我们介绍了OKSP,一种用于哥斯达黎加地震监测的新型自动地震探测管道。利用哥斯达黎加高科技中心的Kabre超级计算机,我们将OKSP应用于2019年6月26日发生在哥斯达黎加-巴拿马边界的阿穆埃勒斯港(Puerto Armuelles)6.5级地震之前一天和之后5天的数据,又发现了1100多次此前未被哥斯达黎加火山与地震观测台确认的地震。在这些事件中,共有23次震级低于1.0的地震发生在主震前一天至数小时之间,揭示了导致这一高产地震序列发生的破裂起始和地震相互作用。我们的观察结果表明,在研究期间,该模型的查全率为100%,查准率为82%,F1得分为0.90。这是哥斯达黎加首次尝试使用深度学习方法自动检测地震,并表明在不久的将来,地震监测程序将完全由人工智能算法执行。 摘要:Small magnitude earthquakes are the most abundant but the most difficult to locate robustly and well due to their low amplitudes and high frequencies usually obscured by heterogeneous noise sources. They highlight crucial information about the stress state and the spatio-temporal behavior of fault systems during the earthquake cycle, therefore, its full characterization is then crucial for improving earthquake hazard assessment. Modern DL algorithms along with the increasing computational power are exploiting the continuously growing seismological databases, allowing scientists to improve the completeness for earthquake catalogs, systematically detecting smaller magnitude earthquakes and reducing the errors introduced mainly by human intervention. In this work, we introduce OKSP, a novel automatic earthquake detection pipeline for seismic monitoring in Costa Rica. Using Kabre supercomputer from the Costa Rica High Technology Center, we applied OKSP to the day before and the first 5 days following the Puerto Armuelles, M6.5, earthquake that occurred on 26 June, 2019, along the Costa Rica-Panama border and found 1100 more earthquakes previously unidentified by the Volcanological and Seismological Observatory of Costa Rica. From these events, a total of 23 earthquakes with magnitudes below 1.0 occurred a day to hours prior to the mainshock, shedding light about the rupture initiation and earthquake interaction leading to the occurrence of this productive seismic sequence. Our observations show that for the study period, the model was 100% exhaustive and 82% precise, resulting in an F1 score of 0.90. This effort represents the very first attempt for automatically detecting earthquakes in Costa Rica using deep learning methods and demonstrates that, in the near future, earthquake monitoring routines will be carried out entirely by AI algorithms.

分类|识别(4篇)

【1】 ICCAD Special Session Paper: Quantum-Classical Hybrid Machine Learning for Image Classification 标题:ICCAD专题会议论文:用于图像分类的量子-经典混合机器学习 链接:https://arxiv.org/abs/2109.02862

作者:Mahabubul Alam,Satwik Kundu,Rasit Onur Topaloglu,Swaroop Ghosh 机构:School of Electrical Engineering and Computer Science, Penn State University, University Park, IBM Corporation 摘要:图像分类是传统深度学习(DL)的一个主要应用领域。量子机器学习(QML)有可能彻底改变图像分类。在任何典型的基于DL的图像分类中,我们使用卷积神经网络(CNN)从图像中提取特征,并使用多层感知器网络(MLP)创建实际的决策边界。一方面,QML模型在这两项任务中都很有用:参数化量子电路卷积(Quanvolution)可以从图像中提取丰富的特征;另一方面,量子神经网络(QNN)模型可以创建复杂的决策边界。因此,Quanvolution和QNN可用于创建用于图像分类的端到端QML模型。或者,我们也可以先用经典降维技术(如主成分分析(PCA)或卷积自动编码器(CAE))单独提取图像特征,再用提取的特征训练QNN。我们回顾了关于量子-经典混合ML图像分类模型的两种方案:一是量子卷积神经网络(Quanvolutional Neural Network),二是先用经典算法降维、再训练QNN。特别是,我们论证了在Quanvolution中使用可训练滤波器、以及对图像数据集采用基于CAE的特征提取(而非PCA等线性变换降维)的合理性。我们讨论了这些模型的各种设计选择、潜在机会和缺点。我们还发布了一个基于Python的框架,用于创建和探索具有各种设计选择的此类混合模型。 摘要:Image classification is a major application domain for conventional deep learning (DL). Quantum machine learning (QML) has the potential to revolutionize image classification. In any typical DL-based image classification, we use convolutional neural network (CNN) to extract features from the image and multi-layer perceptron network (MLP) to create the actual decision boundaries. On one hand, QML models can be useful in both of these tasks. Convolution with parameterized quantum circuits (Quanvolution) can extract rich features from the images. On the other hand, quantum neural network (QNN) models can create complex decision boundaries. Therefore, Quanvolution and QNN can be used to create an end-to-end QML model for image classification. Alternatively, we can extract image features separately using classical dimension reduction techniques such as, Principal Components Analysis (PCA) or Convolutional Autoencoder (CAE) and use the extracted features to train a QNN. We review two proposals on quantum-classical hybrid ML models for image classification namely, Quanvolutional Neural Network and dimension reduction using a classical algorithm followed by QNN. Particularly, we make a case for trainable filters in Quanvolution and CAE-based feature extraction for image datasets (instead of dimension reduction using linear transformations such as, PCA). We discuss various design choices, potential opportunities, and drawbacks of these models. We also release a Python-based framework to create and explore these hybrid models with a variety of design choices.
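下面用 PennyLane 给出 Quanvolution 的一个极简示意:用参数化量子电路扫过图像的 2x2 块,把各量子比特的期望值作为输出特征图;电路结构、角度编码与层数均为示意性假设,并非论文随附框架的实际实现:

```python
import pennylane as qml
import numpy as np

n_qubits = 4
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def quanv_circuit(pixels, weights):
    for i in range(n_qubits):
        qml.RY(np.pi * pixels[i], wires=i)          # 将 2x2 块的像素编码为旋转角
    qml.BasicEntanglerLayers(weights, wires=range(n_qubits))  # 可训练的"量子滤波器"
    return [qml.expval(qml.PauliZ(i)) for i in range(n_qubits)]

weights = np.random.uniform(0, 2 * np.pi, size=(1, n_qubits))  # 1 层可训练参数
image = np.random.rand(28, 28)                                 # 假设的输入图像
out = np.zeros((14, 14, n_qubits))                             # 步长为 2 的特征图
for r in range(0, 28, 2):
    for c in range(0, 28, 2):
        patch = image[r:r + 2, c:c + 2].ravel()
        out[r // 2, c // 2] = np.asarray(quanv_circuit(patch, weights))
```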

【2】 SS-BERT: Mitigating Identity Terms Bias in Toxic Comment Classification by Utilising the Notion of "Subjectivity" and "Identity Terms" 标题:SS-BERT:利用"主观性"与"身份术语"概念缓解有毒评论分类中的身份术语偏差 链接:https://arxiv.org/abs/2109.02691

作者:Zhixue Zhao,Ziqi Zhang,Frank Hopfgartner 机构:University of Sheffield, United Kingdom 备注:12 pages, 6 tables, 3 figures 摘要:有毒评论分类模型往往偏向于身份术语,即表征特定人群(如“穆斯林”和“黑人”)的术语。这种偏差通常体现在假阳性预测中,即带有身份术语的无毒评论被误判。在这项工作中,我们提出了一种利用评论的主观性水平和身份术语出现情况来解决有毒评论分类中此类偏差的新方法。我们假设,当评论针对以身份术语刻画的某个人群时,该评论有毒的可能性与其主观性水平(即评论传达个人情感和观点的程度)相关。在BERT模型的基础上,我们提出了一种能够利用这些特征的新结构,并在4个大小不同、代表不同社交媒体平台的数据集上全面评估了我们的模型。结果表明,我们的模型始终优于BERT以及一个以不同方式处理身份术语偏差的SOTA模型,F1最大提升分别为2.43%和1.91%。 摘要:Toxic comment classification models are often found biased toward identity terms which are terms characterizing a specific group of people such as "Muslim" and "black". Such bias is commonly reflected in false-positive predictions, i.e. non-toxic comments with identity terms. In this work, we propose a novel approach to tackle such bias in toxic comment classification, leveraging the notion of subjectivity level of a comment and the presence of identity terms. We hypothesize that when a comment is made about a group of people that is characterized by an identity term, the likelihood of that comment being toxic is associated with the subjectivity level of the comment, i.e. the extent to which the comment conveys personal feelings and opinions. Building upon the BERT model, we propose a new structure that is able to leverage these features, and thoroughly evaluate our model on 4 datasets of varying sizes and representing different social media platforms. The results show that our model can consistently outperform BERT and a SOTA model devised to address identity term bias in a different way, with a maximum improvement in F1 of 2.43% and 1.91% respectively.
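作为说明,下面给出"把主观性分数与身份术语指示特征拼接到 BERT 的 [CLS] 表示上"这一思路的 PyTorch 草图;模型结构与 SubjectivityAwareClassifier 等命名均为示意性假设,论文的实际结构以原文为准:

```python
import torch
import torch.nn as nn
from transformers import BertModel

class SubjectivityAwareClassifier(nn.Module):
    """示意:在 BERT 之上拼接主观性水平与身份术语特征(假设性结构)。"""
    def __init__(self, n_extra_features=2, n_classes=2):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.classifier = nn.Linear(self.bert.config.hidden_size + n_extra_features,
                                    n_classes)

    def forward(self, input_ids, attention_mask, extra_features):
        # extra_features: (batch, 2),例如 [主观性分数, 是否含身份术语]
        cls = self.bert(input_ids=input_ids,
                        attention_mask=attention_mask).last_hidden_state[:, 0]
        return self.classifier(torch.cat([cls, extra_features], dim=-1))
```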

【3】 Besov Function Approximation and Binary Classification on Low-Dimensional Manifolds Using Convolutional Residual Networks 标题:基于卷积残差网络的低维流形上的Besov函数逼近和二值分类 链接:https://arxiv.org/abs/2109.02832

作者:Hao Liu,Minshuo Chen,Tuo Zhao,Wenjing Liao 机构:Hao Liu is affiliated with the Department of Mathematics at Hong Kong Baptist University; Wenjing Liao is affiliated with the School of Mathematics at Georgia Tech; Minshuo Chen and Tuo Zhao are affiliated with the ISYE department at Georgia Tech 备注:None 摘要:现有的大多数深度神经网络统计理论,其样本复杂度都受到数据维数的诅咒,因此无法很好地解释深度学习在高维数据上的经验成功。为了弥补这一差距,我们建议利用真实世界数据集的低维几何结构。我们建立了卷积残差网络(ConvResNet)在函数逼近和二元分类统计估计方面的理论保证。具体地说,假设数据位于等距嵌入$\mathbb{R}^D$中的$d$维流形上,我们证明,如果网络结构选择得当,ConvResNet可以(1)以任意精度逼近流形上的Besov函数,(2)通过最小化经验逻辑斯蒂风险来学习分类器,其超额风险的阶为$n^{-\frac{s}{2s+2(s\vee d)}}$,其中$s$是光滑度参数。这意味着样本复杂度取决于内在维度$d$,而不是数据维度$D$。我们的结果表明,ConvResNet能够自适应于数据集的低维结构。 摘要:Most of existing statistical theories on deep neural networks have sample complexities cursed by the data dimension and therefore cannot well explain the empirical success of deep learning on high-dimensional data. To bridge this gap, we propose to exploit low-dimensional geometric structures of the real world data sets. We establish theoretical guarantees of convolutional residual networks (ConvResNet) in terms of function approximation and statistical estimation for binary classification. Specifically, given the data lying on a $d$-dimensional manifold isometrically embedded in $\mathbb{R}^D$, we prove that if the network architecture is properly chosen, ConvResNets can (1) approximate Besov functions on manifolds with arbitrary accuracy, and (2) learn a classifier by minimizing the empirical logistic risk, which gives an excess risk in the order of $n^{-\frac{s}{2s+2(s\vee d)}}$, where $s$ is a smoothness parameter. This implies that the sample complexity depends on the intrinsic dimension $d$, instead of the data dimension $D$. Our results demonstrate that ConvResNets are adaptive to low-dimensional structures of data sets.

【4】 Large-Scale System Identification Using a Randomized SVD 标题:基于随机奇异值分解的大系统辨识 链接:https://arxiv.org/abs/2109.02703

作者:Han Wang,James Anderson 机构:Columbia University 摘要:从输入/输出数据中学习动态系统是控制设计管道中的一项基本任务。在部分观测的情况下,辨识有两个部分:参数估计以学习马尔可夫参数,系统实现以获得状态空间模型。在这两个子问题中,隐式假设标准数值算法(如奇异值分解(SVD))可以轻松可靠地计算。当尝试将高维模型与数据相匹配时,例如在网络物理系统环境中,即使计算SVD也很难。在这项工作中,我们证明了使用随机方法获得的近似矩阵分解可以替代实现算法中的标准SVD,同时保持经典方法的非渐近(数据集大小)性能和鲁棒性保证。数值例子表明,对于大型系统模型,这是唯一能够生成模型的方法。 摘要:Learning a dynamical system from input/output data is a fundamental task in the control design pipeline. In the partially observed setting there are two components to identification: parameter estimation to learn the Markov parameters, and system realization to obtain a state space model. In both sub-problems it is implicitly assumed that standard numerical algorithms such as the singular value decomposition (SVD) can be easily and reliably computed. When trying to fit a high-dimensional model to data, for example in the cyber-physical system setting, even computing an SVD is intractable. In this work we show that an approximate matrix factorization obtained using randomized methods can replace the standard SVD in the realization algorithm while maintaining the non-asymptotic (in data-set size) performance and robustness guarantees of classical methods. Numerical examples illustrate that for large system models, this is the only method capable of producing a model.
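下面是随机 SVD 的标准流程(Halko 等人的"随机值域逼近 + 小矩阵精确 SVD")的 NumPy 示意,可帮助理解"用近似矩阵分解替换实现算法中的精确 SVD"这一思路;过采样数与幂迭代次数为常见默认取值,论文所用的具体变体与保证以原文为准:

```python
import numpy as np

def randomized_svd(A, k, n_oversample=10, n_iter=2):
    """随机 SVD:先用高斯随机投影近似 A 的值域,再在小矩阵上做精确 SVD。"""
    m, n = A.shape
    Omega = np.random.randn(n, k + n_oversample)  # 高斯测试矩阵
    Y = A @ Omega                                 # 采样 A 的值域
    for _ in range(n_iter):                       # 幂迭代:谱衰减慢时提高精度
        Y = A @ (A.T @ Y)
    Q, _ = np.linalg.qr(Y)                        # 值域的正交基
    B = Q.T @ A                                   # (k+p) x n 的小矩阵
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]

A = np.random.randn(2000, 500)                    # 假设的高维数据矩阵(如 Hankel 型)
U, s, Vt = randomized_svd(A, k=20)                # 替代实现算法中的精确 SVD
```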

表征(1篇)

【1】 Scale-invariant representation of machine learning 标题:机器学习的尺度不变表示 链接:https://arxiv.org/abs/2109.02914

作者:Sungyeop Lee,Junghyo Jo 机构:Department of Physics and Astronomy, Seoul National University, Seoul , Korea, Department of Physics Education and Center for Theoretical Physics and, Artificial Intelligence Institute, Seoul National University, Seoul , Korea 摘要:机器学习的成功源于其结构化的数据表示。相似的数据具有相近的表示,例如分类任务中的压缩编码,或聚类任务中涌现出的标签。我们观察到,在有监督和无监督学习中,内部表示的频率都遵循幂律。尺度不变的分布意味着机器学习在很大程度上压缩了频繁出现的典型数据,同时将许多非典型数据区分为异常值。在这项研究中,我们推导了幂律如何在机器学习中自然产生。在信息论的意义上,尺度不变表示对应于在保证预先指定学习精度的各种可能表示中不确定性最大的数据分组。 摘要:The success of machine learning stems from its structured data representation. Similar data have close representation as compressed codes for classification or emerged labels for clustering. We observe that the frequency of the internal representation follows power laws in both supervised and unsupervised learning. The scale-invariant distribution implies that machine learning largely compresses frequent typical data, and at the same time, differentiates many atypical data as outliers. In this study, we derive how the power laws can naturally arise in machine learning. In terms of information theory, the scale-invariant representation corresponds to a maximally uncertain data grouping among possible representations that guarantee pre-specified learning accuracy.
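作为示意,下面给出检验"内部表示频率服从幂律"的常见做法:统计各表示(如离散编码)出现的频次,在对数-对数坐标下做线性拟合并读取斜率;这只是标准的经验检验流程,并非论文的具体实验设置:

```python
import numpy as np

def powerlaw_slope(codes):
    """codes: 各样本的内部表示标识;返回 log-log 频次-秩拟合的斜率。"""
    _, counts = np.unique(codes, return_counts=True)
    freq = np.sort(counts)[::-1]                   # 按频次降序排列
    rank = np.arange(1, len(freq) + 1)
    slope, _ = np.polyfit(np.log(rank), np.log(freq), 1)
    return slope                                   # 幂律近似为直线,斜率即指数

codes = np.random.zipf(2.0, size=10000)            # 模拟频率服从幂律的表示
print(powerlaw_slope(codes))
```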

优化|敛散性(1篇)

【1】 COCO Denoiser: Using Co-Coercivity for Variance Reduction in Stochastic Convex Optimization 标题:COCO去噪器:在随机凸优化中利用共强制性降低方差 链接:https://arxiv.org/abs/2109.03207

作者:Manuel Madeira,Renato Negrinho,João Xavier,Pedro M. Q. Aguiar 机构:Instituto de Sistemas e Robótica, Instituto Superior Técnico, Lisboa, Portugal; Carnegie Mellon University, Pittsburgh PA, USA 备注:25 pages, 14 figures 摘要:随机优化的一阶方法具有毋庸置疑的重要性,部分原因在于它们在机器学习中的关键作用。对这些算法进行方差约简已成为一个重要的研究课题。与很少利用目标函数全局模型的常用方法不同,我们利用凸性和L-光滑性来改进随机梯度预言机输出的噪声估计。我们的方法名为COCO去噪器,它是在梯度之间的共强制性(co-coercivity)约束下,由噪声观测对多个函数梯度所做的联合最大似然估计。由此产生的估计是一个凸二次约束二次规划(QCQP)问题的解。虽然用内点法求解这个问题代价高昂,但我们利用其结构应用了一种加速的一阶算法,即快速对偶近端梯度法。除了对所提出的估计量进行解析刻画外,我们还从经验上证明,增加查询点的数量和接近程度可以得到更好的梯度估计。我们还将COCO应用于随机设置中,将其插入现有算法(如SGD、Adam或STRSAGA)中,即使在建模假设不匹配的情况下,其性能也优于相应的原始版本。 摘要:First-order methods for stochastic optimization have undeniable relevance, in part due to their pivotal role in machine learning. Variance reduction for these algorithms has become an important research topic. In contrast to common approaches, which rarely leverage global models of the objective function, we exploit convexity and L-smoothness to improve the noisy estimates outputted by the stochastic gradient oracle. Our method, named COCO denoiser, is the joint maximum likelihood estimator of multiple function gradients from their noisy observations, subject to co-coercivity constraints between them. The resulting estimate is the solution of a convex Quadratically Constrained Quadratic Problem. Although this problem is expensive to solve by interior point methods, we exploit its structure to apply an accelerated first-order algorithm, the Fast Dual Proximal Gradient method. Besides analytically characterizing the proposed estimator, we show empirically that increasing the number and proximity of the queried points leads to better gradient estimates. We also apply COCO in stochastic settings by plugging it in existing algorithms, such as SGD, Adam or STRSAGA, outperforming their vanilla versions, even in scenarios where our modelling assumptions are mismatched.
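作为示意,下面用 cvxpy 直接写出并求解 COCO 对应的凸 QCQP(高斯噪声下的联合最大似然即带约束的最小二乘),以说明共强制性约束的形式;论文为提高效率采用的是快速对偶近端梯度法而非通用求解器,以下函数名与数据均为假设:

```python
import cvxpy as cp
import numpy as np

def coco_denoise(X, G, L):
    """在共强制性约束下,对 k 个查询点 X[i] 处的噪声梯度 G[i] 做联合去噪。"""
    k, d = G.shape
    Theta = cp.Variable((k, d))
    constraints = [
        cp.sum_squares(Theta[i] - Theta[j])
        <= L * (Theta[i] - Theta[j]) @ (X[i] - X[j])   # L-光滑凸函数梯度的共强制性
        for i in range(k) for j in range(i + 1, k)
    ]
    prob = cp.Problem(cp.Minimize(cp.sum_squares(Theta - G)), constraints)
    prob.solve()
    return Theta.value

X = np.random.randn(5, 3)      # 查询点
G = np.random.randn(5, 3)      # 随机梯度预言机返回的噪声梯度
denoised = coco_denoise(X, G, L=10.0)
```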

预测|估计(3篇)

【1】 Predicting Mood Disorder Symptoms with Remotely Collected Videos Using an Interpretable Multimodal Dynamic Attention Fusion Network 标题:使用可解释的多模态动态注意融合网络利用远程采集的视频预测情绪障碍症状 链接:https://arxiv.org/abs/2109.03029

作者:Tathagata Banerjee,Matthew Kollada,Pablo Gersberg,Oscar Rodriguez,Jane Tiller,Andrew E Jaffe,John Reynders 机构:∗Equal contribution, BlackThorn Therapeutics, San Francisco, California, USA. 备注:8 pages, 3 figures, Published in the Computational Approaches to Mental Health Workshop of the International Conference on Machine Learning 2021, this https URL 摘要:我们开发了一种新颖且可解释的多模态分类方法,利用从智能手机应用程序收集的音频、视频和文本,识别抑郁、焦虑和快感缺乏等心境障碍症状。我们使用基于CNN的单模态编码器来学习每个模态的动态嵌入,然后通过Transformer编码器将它们组合起来。我们将这些方法应用于一个由智能手机应用程序采集的新数据集,涵盖3002名参与者,每人最多三次记录。与使用静态嵌入的现有方法相比,我们的方法具有更好的多模态分类性能。最后,我们使用SHapley加性解释(SHAP)对模型中可作为潜在数字标志物的重要特征进行优先排序。 摘要:We developed a novel, interpretable multimodal classification method to identify symptoms of mood disorders viz. depression, anxiety and anhedonia using audio, video and text collected from a smartphone application. We used CNN-based unimodal encoders to learn dynamic embeddings for each modality and then combined these through a transformer encoder. We applied these methods to a novel dataset - collected by a smartphone application - on 3002 participants across up to three recording sessions. Our method demonstrated better multimodal classification performance compared to existing methods that employed static embeddings. Lastly, we used SHapley Additive exPlanations (SHAP) to prioritize important features in our model that could serve as potential digital markers.
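下面用 PyTorch 给出"各模态动态嵌入经 Transformer 编码器融合"这一结构的极简草图;各模态编码器在此省略,维度、层数与池化方式均为示意性假设,并非论文的官方结构:

```python
import torch
import torch.nn as nn

class MultimodalFusion(nn.Module):
    """示意:拼接音频/视频/文本嵌入序列,交给 Transformer 编码器融合。"""
    def __init__(self, d_model=128, n_heads=4, n_layers=2, n_classes=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, audio_emb, video_emb, text_emb):
        tokens = torch.cat([audio_emb, video_emb, text_emb], dim=1)  # 沿时间维拼接
        fused = self.fusion(tokens).mean(dim=1)                      # 平均池化
        return self.head(fused)

model = MultimodalFusion()
logits = model(torch.randn(4, 10, 128), torch.randn(4, 10, 128),
               torch.randn(4, 20, 128))   # 假设的三路(CNN 编码后的)动态嵌入
```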

【2】 Individual Mobility Prediction via Attentive Marked Temporal Point Processes 标题:基于注意力标记时间点过程的个体流动性预测 链接:https://arxiv.org/abs/2109.02715

作者:Yuankai Wu,Zhanhong Cheng,Lijun Sun 机构:McGill University, Montreal, QC, Canada 摘要:个体流动性预测是交通需求管理和交通系统运行的一项重要任务。在位置序列建模和预测用户下一个位置方面存在大量工作;然而,很少有人关注下一次出行的预测,这取决于不同属性之间的强时空依赖性,包括出行开始时间$t$、起点$o$和目的地$d$。为了填补这一空白,本文提出了一种新的基于点过程的模型——注意标记时间点过程(AMTPP)——对人类的流动性进行建模,并以联合方式预测整个行程$(t,o,d)$。为了对历史旅行的影响进行编码,AMTPP采用了具有精心设计的位置嵌入的自我注意机制,以捕捉个人旅行行为的每日/每周周期性和规律性。鉴于人类行为中事件间时间的独特峰值性质,我们使用非对称对数-拉普拉斯混合分布精确模拟行程开始时间$t$的分布。此外,还开发了一个起点-终点(OD)矩阵学习块来模拟每个起点和终点对之间的关系。在两个大型地铁出行数据集上的实验结果表明,AMTPP具有优越的性能。 摘要:Individual mobility prediction is an essential task for transportation demand management and traffic system operation. There exist a large body of works on modeling location sequence and predicting the next location of users; however, little attention is paid to the prediction of the next trip, which is governed by the strong spatiotemporal dependencies between diverse attributes, including trip start time $t$, origin $o$, and destination $d$. To fill this gap, in this paper we propose a novel point process-based model -- Attentive Marked temporal point processes (AMTPP) -- to model human mobility and predict the whole trip $(t,o,d)$ in a joint manner. To encode the influence of history trips, AMTPP employs the self-attention mechanism with a carefully designed positional embedding to capture the daily/weekly periodicity and regularity in individual travel behavior. Given the unique peaked nature of inter-event time in human behavior, we use an asymmetric log-Laplace mixture distribution to precisely model the distribution of trip start time $t$. Furthermore, an origin-destination (OD) matrix learning block is developed to model the relationship between every origin and destination pair. Experimental results on two large metro trip datasets demonstrate the superior performance of AMTPP.

【3】 Improving Phenotype Prediction using Long-Range Spatio-Temporal Dynamics of Functional Connectivity 标题:利用功能连接性的长程时空动力学改进表型预测 链接:https://arxiv.org/abs/2109.03115

作者:Simon Dahan,Logan Z. J. Williams,Daniel Rueckert,Emma C. Robinson 机构:Robinson, Department of Biomedical Engineering, School of Biomedical Engineering and, Imaging Sciences, King’s College London, London, SE,EH, UK, Centre for the Developing Brain, Department of Perinatal Imaging and Health 备注:MLCN 2021 摘要:脑功能连通性(FC)的研究对于理解许多精神疾病的潜在机制非常重要。最近的许多分析采用图卷积网络来研究功能相关态之间的非线性相互作用。然而,尽管已知大脑激活模式在空间和时间上都是分层组织的,但许多方法都未能提取出强大的时空特征。为了克服这些挑战,并提高对长期功能动力学的理解,我们从基于骨架的动作识别领域提出了一种方法,旨在对跨空间和时间的交互进行建模。我们使用人类连接组项目(HCP)数据集对该方法进行性别分类和流体智能预测。为了解释功能组织的受试者地形变异性,我们使用多分辨率双回归(受试者特定)ICA节点对功能连接体进行建模。结果显示,性别分类的预测准确率为94.4%(与其他方法相比增加了6.2%),与流体智能的相关性相对于分别编码空间和时间的基线模型提高了0.325 vs 0.144。结果表明,大脑功能活动时空动态的显式编码可以提高未来预测行为和认知表型的准确性。 摘要:The study of functional brain connectivity (FC) is important for understanding the underlying mechanisms of many psychiatric disorders. Many recent analyses adopt graph convolutional networks, to study non-linear interactions between functionally-correlated states. However, although patterns of brain activation are known to be hierarchically organised in both space and time, many methods have failed to extract powerful spatio-temporal features. To overcome those challenges, and improve understanding of long-range functional dynamics, we translate an approach, from the domain of skeleton-based action recognition, designed to model interactions across space and time. We evaluate this approach using the Human Connectome Project (HCP) dataset on sex classification and fluid intelligence prediction. To account for subject topographic variability of functional organisation, we modelled functional connectomes using multi-resolution dual-regressed (subject-specific) ICA nodes. Results show a prediction accuracy of 94.4% for sex classification (an increase of 6.2% compared to other methods), and an improvement of correlation with fluid intelligence of 0.325 vs 0.144, relative to a baseline model that encodes space and time separately. Results suggest that explicit encoding of spatio-temporal dynamics of brain functional activity may improve the precision with which behavioural and cognitive phenotypes may be predicted in the future.

其他神经网络|深度学习|模型|建模(16篇)

【1】 Revisiting Recursive Least Squares for Training Deep Neural Networks 标题:重访递归最小二乘训练深度神经网络 链接:https://arxiv.org/abs/2109.03220

作者:Chunyuan Zhang,Qi Song,Hui Zhou,Yigui Ou,Hongyao Deng,Laurence Tianruo Yang 机构: Ou is with the School of Science 备注:12 pages,5 figures, IEEE Transactions on Neural Networks and Learning Systems under review 摘要:递归最小二乘(RLS)算法因其收敛速度快,曾被广泛用于训练小规模神经网络。然而,以往的RLS算法计算复杂度高,前提条件多,不适合训练深度神经网络。在本文中,为了克服这些缺点,我们提出了三种新的RLS优化算法,用于训练前馈神经网络、卷积神经网络和递归神经网络(包括长短期记忆网络),使用误差反向传播和我们的平均近似RLS方法,以及线性最小二乘损失函数相对于隐藏层线性输出的等效梯度。与以前的RLS优化算法相比,我们的算法简单而优雅。它们可以看作是一种改进的随机梯度下降(SGD)算法,它使用每一层的逆自相关矩阵作为自适应学习率。它们的时间和空间复杂性仅是SGD的几倍。它们只要求损耗函数是均方误差,输出层的激活函数是可逆的。事实上,我们的算法也可以与其他一阶优化算法结合使用,而不需要这两个前提条件。此外,我们还提出了两种改进的算法。最后,我们在MNIST、CIFAR-10和IMDB数据集上证明了它们与Adam算法相比的有效性,并通过实验研究了它们的超参数的影响。 摘要:Recursive least squares (RLS) algorithms were once widely used for training small-scale neural networks, due to their fast convergence. However, previous RLS algorithms are unsuitable for training deep neural networks (DNNs), since they have high computational complexity and too many preconditions. In this paper, to overcome these drawbacks, we propose three novel RLS optimization algorithms for training feedforward neural networks, convolutional neural networks and recurrent neural networks (including long short-term memory networks), by using the error backpropagation and our average-approximation RLS method, together with the equivalent gradients of the linear least squares loss function with respect to the linear outputs of hidden layers. Compared with previous RLS optimization algorithms, our algorithms are simple and elegant. They can be viewed as an improved stochastic gradient descent (SGD) algorithm, which uses the inverse autocorrelation matrix of each layer as the adaptive learning rate. Their time and space complexities are only several times those of SGD. They only require the loss function to be the mean squared error and the activation function of the output layer to be invertible. In fact, our algorithms can be also used in combination with other first-order optimization algorithms without requiring these two preconditions. In addition, we present two improved methods for our algorithms. Finally, we demonstrate their effectiveness compared to the Adam algorithm on MNIST, CIFAR-10 and IMDB datasets, and investigate the influences of their hyperparameters experimentally.
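下面给出经典 RLS 对单个线性输出单元的更新示意,其中逆自相关矩阵 P 正对应摘要中"逐层自适应学习率"的角色;这只是标准 RLS,论文的平均近似 RLS 及其向深层网络的推广以原文为准:

```python
import numpy as np

class RLSNeuron:
    """单个线性输出的经典 RLS 更新(示意)。"""
    def __init__(self, dim, lam=0.99, delta=1.0):
        self.w = np.zeros(dim)
        self.P = np.eye(dim) / delta           # 逆自相关矩阵的初始化
        self.lam = lam                         # 遗忘因子

    def update(self, x, y):
        e = y - self.w @ x                                  # 先验误差
        k = self.P @ x / (self.lam + x @ self.P @ x)        # 增益向量
        self.w += k * e                                     # 权重更新
        self.P = (self.P - np.outer(k, x) @ self.P) / self.lam
        return e

neuron = RLSNeuron(dim=8)
for _ in range(200):
    x = np.random.randn(8)
    y = x @ np.arange(8.0) + 0.01 * np.random.randn()       # 假设的线性目标
    neuron.update(x, y)                                     # 数步内即快速收敛
```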

【2】 Learning Fast Sample Re-weighting Without Reward Data 标题:学习无奖励数据的快速样本加权 链接:https://arxiv.org/abs/2109.03216

作者:Zizhao Zhang,Tomas Pfister 机构:Google Cloud AI 备注:ICCV2021 摘要:训练样本重新加权是解决数据偏差(如标签不平衡和损坏)的有效方法。最近的方法基于强化学习和元学习的框架,开发了基于学习的算法,结合模型训练来学习样本重新加权策略。然而,依赖于额外的无偏奖励数据限制了它们的普遍适用性。此外,现有的基于学习的样本重加权方法需要对模型和加权参数进行嵌套优化,这需要昂贵的二阶计算。本文针对这两个问题,提出了一种新的基于学习的快速样本重加权(FSR)方法,该方法不需要额外的奖励数据。该方法基于两个关键思想:从历史中学习建立代理奖励数据和特征共享以降低优化成本。我们的实验表明,与现有的标签噪声鲁棒性和长尾识别技术相比,该方法取得了具有竞争力的结果,同时显著提高了训练效率。源代码可在https://github.com/google-research/google-research/tree/master/ieg. 摘要:Training sample re-weighting is an effective approach for tackling data biases such as imbalanced and corrupted labels. Recent methods develop learning-based algorithms to learn sample re-weighting strategies jointly with model training based on the frameworks of reinforcement learning and meta learning. However, depending on additional unbiased reward data is limiting their general applicability. Furthermore, existing learning-based sample re-weighting methods require nested optimizations of models and weighting parameters, which requires expensive second-order computation. This paper addresses these two problems and presents a novel learning-based fast sample re-weighting (FSR) method that does not require additional reward data. The method is based on two key ideas: learning from history to build proxy reward data and feature sharing to reduce the optimization cost. Our experiments show the proposed method achieves competitive results compared to state of the arts on label noise robustness and long-tailed recognition, and does so while achieving significantly improved training efficiency. The source code is publicly available at https://github.com/google-research/google-research/tree/master/ieg.

【3】 Learning to Bid in Contextual First Price Auctions 标题:学习在上下文第一价格拍卖中投标 链接:https://arxiv.org/abs/2109.03173

作者:Ashwinkumar Badanidiyuru,Zhe Feng,Guru Guruganesh 机构:Google Research, Mountain View 摘要:在本文中,我们研究了在重复的上下文第一价格拍卖中如何出价的问题。我们考虑一个反复参与第一价格拍卖的单一投标人(学习者):在每个时刻 $t$,学习者观察到上下文 $x_t\in \mathbb{R}^d$,并基于历史信息和 $x_t$ 决定出价。我们假设所有其他投标人的最大出价服从结构化线性模型 $m_t = \alpha_0\cdot x_t + z_t$,其中 $\alpha_0\in \mathbb{R}^d$ 对学习者而言是未知的,$z_t$ 从具有对数凹密度函数 $f$ 的噪声分布 $\mathcal{F}$ 中随机采样。我们同时考虑\emph{二元反馈}(学习者只能观察她是否获胜)和\emph{完全信息反馈}(学习者可在每个时刻 $t$ 结束时观察 $m_t$)。对于二元反馈,当噪声分布 $\mathcal{F}$ 已知时,我们提出了一种投标算法,通过最大似然估计(MLE)方法实现至多 $\widetilde{O}(\sqrt{\log(d) T})$ 的遗憾。此外,我们将该算法推广到二元反馈且噪声分布未知、但属于某个参数化分布族的情形。对于噪声分布\emph{未知}的完全信息反馈,我们给出了一种遗憾至多为 $\widetilde{O}(\sqrt{dT})$ 的算法。我们的方法将对数凹密度函数的估计器与MLE方法相结合,以同时学习噪声分布 $\mathcal{F}$ 和线性权重 $\alpha_0$。我们还给出了一个下界结果:即使学习者获得完全信息反馈且 $\mathcal{F}$ 已知,一大类投标策略中的任何策略也必须承受至少 $\Omega(\sqrt{T})$ 的遗憾。 摘要:In this paper, we investigate the problem about how to bid in repeated contextual first price auctions. We consider a single bidder (learner) who repeatedly bids in the first price auctions: at each time $t$, the learner observes a context $x_t\in \mathbb{R}^d$ and decides the bid based on historical information and $x_t$. We assume a structured linear model of the maximum bid of all the others $m_t = \alpha_0\cdot x_t + z_t$, where $\alpha_0\in \mathbb{R}^d$ is unknown to the learner and $z_t$ is randomly sampled from a noise distribution $\mathcal{F}$ with log-concave density function $f$. We consider both \emph{binary feedback} (the learner can only observe whether she wins or not) and \emph{full information feedback} (the learner can observe $m_t$) at the end of each time $t$. For binary feedback, when the noise distribution $\mathcal{F}$ is known, we propose a bidding algorithm, by using maximum likelihood estimation (MLE) method to achieve at most $\widetilde{O}(\sqrt{\log(d) T})$ regret. Moreover, we generalize this algorithm to the setting with binary feedback and the noise distribution is unknown but belongs to a parametrized family of distributions. For the full information feedback with \emph{unknown} noise distribution, we provide an algorithm that achieves regret at most $\widetilde{O}(\sqrt{dT})$. Our approach combines an estimator for log-concave density functions and then MLE method to learn the noise distribution $\mathcal{F}$ and linear weight $\alpha_0$ simultaneously. We also provide a lower bound result such that any bidding policy in a broad class must achieve regret at least $\Omega(\sqrt{T})$, even when the learner receives the full information feedback and $\mathcal{F}$ is known.
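按摘要中的记号,可以把二元反馈下的 MLE 写得更具体(这一步是对摘要的直接展开,非论文原文;忽略平局):学习者出价 $b_t$ 且在 $b_t \ge m_t$ 时获胜,故获胜概率为 $F(b_t-\alpha_0\cdot x_t)$,其中 $F$ 是噪声 $z_t$ 的累积分布函数。记 $w_t\in\{0,1\}$ 表示是否获胜,则

$$\hat{\alpha} = \arg\max_{\alpha}\ \sum_{t=1}^{T}\Big[\, w_t \log F(b_t-\alpha\cdot x_t) + (1-w_t)\log\big(1-F(b_t-\alpha\cdot x_t)\big) \Big].$$

由于密度 $f$ 对数凹时 $F$ 与 $1-F$ 均对数凹,上述对数似然关于 $\alpha$ 是凹函数,MLE 因而可以高效求解。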

【4】 Regularized Learning in Banach Spaces 标题:Banach空间中的正则化学习 链接:https://arxiv.org/abs/2109.03159

作者:Liren Huang,Qi Ye 备注:30 pages, 1 figure 摘要:本文提出了一种研究广义数据正则化学习理论的不同方法,包括表示定理(representer theorems)和收敛定理。广义数据由线性泛函和实标量组成,用以表示局部模型的离散信息。作为经典机器学习的扩展,经验风险由广义数据和损失函数计算得到。依据正则化技术,通过在Banach空间上最小化正则化经验风险来逼近全局解。Banach空间被自适应地选取,以赋予广义输入数据紧致性,从而由弱*拓扑保证近似解的存在性和收敛性。 摘要:This article presents a different way to study the theory of regularized learning for generalized data including representer theorems and convergence theorems. The generalized data are composed of linear functionals and real scalars to represent the discrete information of the local models. By the extension of the classical machine learning, the empirical risks are computed by the generalized data and the loss functions. According to the techniques of regularization, the global solutions are approximated by minimizing the regularized empirical risks over the Banach spaces. The Banach spaces are adaptively chosen to endow the generalized input data with compactness such that the existence and convergence of the approximate solutions are guaranteed by the weak* topology.

【5】 NumGPT: Improving Numeracy Ability of Generative Pre-trained Models 标题:NumGPT:提高生成性预训练模型的数值能力 链接:https://arxiv.org/abs/2109.03137

作者:Zhihua Jin,Xin Jiang,Xingbo Wang,Qun Liu,Yong Wang,Xiaozhe Ren,Huamin Qu 机构: Department of Computer Science & Engineering, Hong Kong University of Science and Technology, Hong Kong, Huawei Noah's Ark Lab, School of Computing and Information Systems, Singapore Management University, Singapore 备注:8 pages, 3 figures 摘要:现有的生成式预训练语言模型(如GPT)侧重于对一般文本的语言结构和语义进行建模。然而,这些模型不考虑数字的数值性质,并且不能在数值推理任务(例如数学应用题和测量估计)上可靠地执行。在本文中,我们提出了NumGPT,一个显式建模文本中数字数值特性的生成式预训练模型。具体来说,它利用基于原型的数字嵌入对数字的尾数进行编码,并用一个单独的嵌入对数字的指数进行编码。我们设计了一个数值感知的损失函数,将数字融入NumGPT的预训练目标。我们在四个不同的数据集上进行了广泛的实验,以评估NumGPT的数值能力。实验结果表明,NumGPT在测量估计、数字比较、数学应用题和数量级分类等一系列数值推理任务上优于基线模型(如GPT和带DICE的GPT)。我们还进行了消融研究,以评估预训练和模型超参数对性能的影响。 摘要:Existing generative pre-trained language models (e.g., GPT) focus on modeling the language structure and semantics of general texts. However, those models do not consider the numerical properties of numbers and cannot perform robustly on numerical reasoning tasks (e.g., math word problems and measurement estimation). In this paper, we propose NumGPT, a generative pre-trained model that explicitly models the numerical properties of numbers in texts. Specifically, it leverages a prototype-based numeral embedding to encode the mantissa of the number and an individual embedding to encode the exponent of the number. A numeral-aware loss function is designed to integrate numerals into the pre-training objective of NumGPT. We conduct extensive experiments on four different datasets to evaluate the numeracy ability of NumGPT. The experiment results show that NumGPT outperforms baseline models (e.g., GPT and GPT with DICE) on a range of numerical reasoning tasks such as measurement estimation, number comparison, math word problems, and magnitude classification. Ablation studies are also conducted to evaluate the impact of pre-training and model hyperparameters on the performance.
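下面是"尾数嵌入 + 指数嵌入"思想的极简 PyTorch 示意。此处用二进制的 frexp 分解并以线性层近似尾数嵌入;论文实际采用基于原型的尾数嵌入,分解进制与细节也可能不同,以下命名与维度均为假设:

```python
import math
import torch
import torch.nn as nn

class NumeralEmbedding(nn.Module):
    """示意:把数字分解为尾数与指数并分别嵌入后相加(假设性实现)。"""
    def __init__(self, d_model=64, min_exp=-20, max_exp=20):
        super().__init__()
        self.min_exp = min_exp
        self.exp_emb = nn.Embedding(max_exp - min_exp + 1, d_model)  # 指数查表嵌入
        self.mantissa_proj = nn.Linear(1, d_model)                   # 尾数连续嵌入

    def forward(self, value: float):
        mantissa, exponent = math.frexp(value)    # value = mantissa * 2**exponent
        m = torch.tensor([[mantissa]])
        e = torch.tensor([exponent - self.min_exp])
        return self.mantissa_proj(m) + self.exp_emb(e)

emb = NumeralEmbedding()
vec = emb(3.14159)        # 该数字的数值感知嵌入,可并入模型的输入表示
```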

【6】 Optimizing model-agnostic Random Subspace ensembles 标题:优化模型不可知随机子空间集成 链接:https://arxiv.org/abs/2109.03099

作者:Vân Anh Huynh-Thu,Pierre Geurts 机构:Dept. of Electrical Engineering and Computer Science, University of Liège, Liège, Belgium 摘要:提出了一种用于监督学习的模型不可知集成方法。所提出的方法在(1)使用随机子空间方法的参数版本学习模型集合(其中特征子集根据贝努利分布进行采样)和(2)识别贝努利分布的参数以最小化集合模型的泛化误差之间交替进行。通过使用重要抽样方法,参数优化变得容易处理,该方法能够估计任何给定参数集的预期模型输出,而无需学习新模型。虽然随机化程度由标准随机子空间中的超参数控制,但在我们的参数版本中,它具有自动调整的优势。此外,模型不可知的特征重要性分数可以很容易地从训练的集合模型中得到。我们在模拟数据集和真实数据集上展示了该方法在预测和特征排序方面的良好性能。我们还表明,我们的方法可以成功地用于重建基因调控网络。 摘要:This paper presents a model-agnostic ensemble approach for supervised learning. The proposed approach alternates between (1) learning an ensemble of models using a parametric version of the Random Subspace approach, in which feature subsets are sampled according to Bernoulli distributions, and (2) identifying the parameters of the Bernoulli distributions that minimize the generalization error of the ensemble model. Parameter optimization is rendered tractable by using an importance sampling approach able to estimate the expected model output for any given parameter set, without the need to learn new models. While the degree of randomization is controlled by a hyper-parameter in standard Random Subspace, it has the advantage to be automatically tuned in our parametric version. Furthermore, model-agnostic feature importance scores can be easily derived from the trained ensemble model. We show the good performance of the proposed approach, both in terms of prediction and feature ranking, on simulated and real-world datasets. We also show that our approach can be successfully used for the reconstruction of gene regulatory networks.

【7】 Reconfigurable co-processor architecture with limited numerical precision to accelerate deep convolutional neural networks 标题:加速深卷积神经网络的有限数值精度可重构协处理器结构 链接:https://arxiv.org/abs/2109.03040

作者:Sasindu Wijeratne,Sandaruwan Jayaweera,Mahesh Dananjaya,Ajith Pasqual 机构:Dept. of Electronic and Telecommunication Engineering, University of Moratuwa, Sri Lanka 摘要:卷积神经网络(CNN)广泛应用于深度学习应用,例如视觉系统、机器人等。然而,现有的软件解决方案并不高效。因此,已经提出了许多硬件加速器来优化实现的性能、功率和资源利用率。在现有的解决方案中,基于现场可编程门阵列(FPGA)的体系结构提供了更好的成本-能源-性能权衡,以及可扩展性和最小化开发时间。在本文中,我们提出了一种与模型无关的可重构协同处理体系结构来加速CNN。我们的体系结构由并行乘法和累加(MAC)单元和缓存技术以及互连网络组成,以利用最大的数据并行性。与现有解决方案相比,我们为算术表示和运算引入了有限精度32位Q格式定点量化。因此,我们的体系结构实现了资源利用率的显著降低,并且具有竞争性的准确性。此外,我们还开发了一种汇编类型的微指令来访问协同处理结构,以管理分层并行性,从而重用有限的资源。最后,我们在Xilinx Virtex 7 FPGA上测试了高达9x9内核大小的体系结构,实现了高达226.2 GOp/S的吞吐量(3x3内核大小)。 摘要:Convolutional Neural Networks (CNNs) are widely used in deep learning applications, e.g. visual systems, robotics etc. However, existing software solutions are not efficient. Therefore, many hardware accelerators have been proposed optimizing performance, power and resource utilization of the implementation. Amongst existing solutions, Field Programmable Gate Array (FPGA) based architecture provides better cost-energy-performance trade-offs as well as scalability and minimizing development time. In this paper, we present a model-independent reconfigurable co-processing architecture to accelerate CNNs. Our architecture consists of parallel Multiply and Accumulate (MAC) units with caching techniques and interconnection networks to exploit maximum data parallelism. In contrast to existing solutions, we introduce limited precision 32 bit Q-format fixed point quantization for arithmetic representations and operations. As a result, our architecture achieved significant reduction in resource utilization with competitive accuracy. Furthermore, we developed an assembly-type microinstructions to access the co-processing fabric to manage layer-wise parallelism, thereby making re-use of limited resources. Finally, we have tested our architecture up to 9x9 kernel size on Xilinx Virtex 7 FPGA, achieving a throughput of up to 226.2 GOp/S for 3x3 kernel size.
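下面给出 32 位 Q 格式定点量化的最小示意。摘要未给出整数/小数位的具体划分,此处假设 Q15.16(1 位符号、15 位整数、16 位小数):

```python
import numpy as np

def to_q_format(x, int_bits=15, frac_bits=16):
    """把浮点数组量化为 32 位 Q 格式定点整数(假设 Q15.16 划分)。"""
    scale = 1 << frac_bits
    lo = -(1 << (int_bits + frac_bits))           # 可表示范围下界
    hi = (1 << (int_bits + frac_bits)) - 1        # 可表示范围上界
    return np.clip(np.round(x * scale), lo, hi).astype(np.int32)

def from_q_format(q, frac_bits=16):
    return q.astype(np.float64) / (1 << frac_bits)

w = np.random.randn(3, 3)                         # 假设的卷积核权重
w_q = to_q_format(w)                              # 送入 MAC 阵列的定点表示
print(np.abs(w - from_q_format(w_q)).max())       # 量化误差不超过 2**-17
```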

【8】 Semiparametric Bayesian Networks 标题:半参数贝叶斯网络 链接:https://arxiv.org/abs/2109.03008

作者:David Atienza,Concha Bielza,Pedro Larrañaga 机构:Universidad Polit´ecnica de Madrid, Departamento de Inteligencia Artificial, Boadilla, del Monte, Spain 备注:44 pages, 13 figures, 4 tables, submitted to Information Sciences 摘要:我们介绍了结合参数和非参数条件概率分布的半参数贝叶斯网络。他们的目标是结合这两个组件的优点:参数模型的有限复杂性和非参数模型的灵活性。我们证明了半参数贝叶斯网络推广了两种著名的贝叶斯网络:高斯贝叶斯网络和核密度估计贝叶斯网络。为此,我们考虑在半参数贝叶斯网络中需要的两种不同的条件概率分布。此外,我们还对两种著名的算法(贪婪爬山算法和PC算法)进行了改进,以从数据中学习半参数贝叶斯网络的结构。为了实现这一点,我们采用了基于交叉验证的评分函数。此外,使用验证数据集,我们应用早期停止标准以避免过度拟合。为了评估该算法的适用性,我们对混合线性和非线性函数采样的合成数据、高斯贝叶斯网络采样的多元正态数据、UCI存储库中的真实数据以及轴承退化数据进行了详尽的实验。作为该实验的结果,我们得出结论,所提出的算法准确地学习了参数和非参数分量的组合,同时实现了与最先进的方法所提供的性能相当的性能。 摘要:We introduce semiparametric Bayesian networks that combine parametric and nonparametric conditional probability distributions. Their aim is to incorporate the advantages of both components: the bounded complexity of parametric models and the flexibility of nonparametric ones. We demonstrate that semiparametric Bayesian networks generalize two well-known types of Bayesian networks: Gaussian Bayesian networks and kernel density estimation Bayesian networks. For this purpose, we consider two different conditional probability distributions required in a semiparametric Bayesian network. In addition, we present modifications of two well-known algorithms (greedy hill-climbing and PC) to learn the structure of a semiparametric Bayesian network from data. To realize this, we employ a score function based on cross-validation. In addition, using a validation dataset, we apply an early-stopping criterion to avoid overfitting. To evaluate the applicability of the proposed algorithm, we conduct an exhaustive experiment on synthetic data sampled by mixing linear and nonlinear functions, multivariate normal data sampled from Gaussian Bayesian networks, real data from the UCI repository, and bearings degradation data. As a result of this experiment, we conclude that the proposed algorithm accurately learns the combination of parametric and nonparametric components, while achieving a performance comparable with those provided by state-of-the-art methods.

【9】 Trojan Signatures in DNN Weights 标题:以DNN权重表示的特洛伊木马签名 链接:https://arxiv.org/abs/2109.02836

作者:Greg Fields,Mohammad Samragh,Mojan Javaheripi,Farinaz Koushanfar,Tara Javidi 机构:University of California San Diego 备注:8 pages, 13 figures 摘要:深度神经网络已被证明易受后门攻击或特洛伊木马攻击,其中对手在训练时在网络中嵌入触发器,使得模型正确分类所有标准输入,但对包含触发器的任何输入生成有针对性的错误分类。在本文中,我们提出了第一种超轻量和高效的特洛伊木马检测方法,该方法不需要访问训练/测试数据,不涉及任何昂贵的计算,并且不假设特洛伊木马触发的性质。我们的方法侧重于分析网络的最终线性层的权重。我们根据经验证明了这些权重的几个特征,这些特征在特洛伊木马网络中经常出现,但在良性网络中却没有。特别是,我们证明了与特洛伊木马目标类关联的权重分布与与其他类关联的权重明显不同。利用这一点,我们展示了我们所提出的检测方法对跨各种体系结构、数据集和触发器类型的最新攻击的有效性。 摘要:Deep neural networks have been shown to be vulnerable to backdoor, or trojan, attacks where an adversary has embedded a trigger in the network at training time such that the model correctly classifies all standard inputs, but generates a targeted, incorrect classification on any input which contains the trigger. In this paper, we present the first ultra light-weight and highly effective trojan detection method that does not require access to the training/test data, does not involve any expensive computations, and makes no assumptions on the nature of the trojan trigger. Our approach focuses on analysis of the weights of the final, linear layer of the network. We empirically demonstrate several characteristics of these weights that occur frequently in trojaned networks, but not in benign networks. In particular, we show that the distribution of the weights associated with the trojan target class is clearly distinguishable from the weights associated with other classes. Using this, we demonstrate the effectiveness of our proposed detection method against state-of-the-art attacks across a variety of architectures, datasets, and trigger types.
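下面的 NumPy 草图演示"检查最终线性层权重、找出分布明显异常的类别"这一思路;具体统计量(此处用每类权重均值的稳健 z 分数)为示意性假设,论文实际使用的检验以原文为准:

```python
import numpy as np

def trojan_suspicion_scores(W):
    """W: (n_classes, d) 的最终线性层权重;返回各类别的可疑度分数。"""
    stats = W.mean(axis=1)                         # 每个输出类的权重统计量
    med = np.median(stats)
    mad = np.median(np.abs(stats - med)) + 1e-12   # 中位数绝对偏差
    return np.abs(stats - med) / mad               # 分数越大越可疑

W = np.random.randn(10, 512) * 0.1
W[3] += 0.05                                       # 模拟木马目标类的权重偏移
print(trojan_suspicion_scores(W).argmax())         # 期望标记出类别 3
```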

【10】 Complementing Handcrafted Features with Raw Waveform Using a Light-weight Auxiliary Model 标题:使用轻量级辅助模型用原始波形补充手工制作的特征 链接:https://arxiv.org/abs/2109.02773

作者:Zhongwei Teng,Quchen Fu,Jules White,Maria Powell,Douglas C. Schmidt 摘要:音频处理的一个新兴趋势是从原始波形中捕获低级语音表示。这些表示方法在语音识别和语音分离等多种任务中显示了良好的效果。与手工制作的功能相比,通过反向传播学习语音功能在理论上为模型表示不同任务的数据提供了更大的灵活性。然而,实证研究的结果表明,在一些任务中,例如语音欺骗检测,手工制作的特征比学习的特征更具竞争力。本文提出了一种辅助Rawnet模型,用从原始波形中学习的特征来补充手工特征,而不是单独评估手工特征和原始波形。该方法的一个主要优点是,它可以在相对较低的计算成本下提高精度。使用ASVspoof 2019数据集对提议的辅助Rawnet模型进行了测试,该数据集的结果表明,重量轻的波形编码器可以潜在地提高基于手工特征的编码器的性能,以换取少量的额外计算工作。 摘要:An emerging trend in audio processing is capturing low-level speech representations from raw waveforms. These representations have shown promising results on a variety of tasks, such as speech recognition and speech separation. Compared to handcrafted features, learning speech features via backpropagation provides the model greater flexibility in how it represents data for different tasks theoretically. However, results from empirical study shows that, in some tasks, such as voice spoof detection, handcrafted features are more competitive than learned features. Instead of evaluating handcrafted features and raw waveforms independently, this paper proposes an Auxiliary Rawnet model to complement handcrafted features with features learned from raw waveforms. A key benefit of the approach is that it can improve accuracy at a relatively low computational cost. The proposed Auxiliary Rawnet model is tested using the ASVspoof 2019 dataset and the results from this dataset indicate that a light-weight waveform encoder can potentially boost the performance of handcrafted-features-based encoders in exchange for a small amount of additional computational work.

【11】 Training Deep Networks from Zero to Hero: avoiding pitfalls and going beyond 标题:从零到英雄训练深度网络:避免陷阱并走得更远 链接:https://arxiv.org/abs/2109.02752

作者:Moacir Antonelli Ponti,Fernando Pereira dos Santos,Leo Sampaio Ferraz Ribeiro,Gabriel Biscaro Cavallari 机构:ICMC – Universidade de São Paulo (USP) 备注:9 pgs 摘要:在真实世界数据上训练深度神经网络可能颇具挑战。当面对小数据集或特定应用时,即使使用迁移学习,把模型当作黑盒使用也可能导致泛化性差或结果不确定。本教程涵盖了改进模型的基本步骤以及较新的选项,特别(但不限于)针对监督学习。对于不如竞赛中那样准备充分的数据集,以及标注稀缺和/或数据量小的场景,本教程尤其有用。我们描述了数据准备、优化和迁移学习等基本流程,也介绍了较新的架构选择,如Transformer模块、替代性卷积层、激活函数、宽网络与深网络,以及课程学习、对比学习和自监督学习等训练方法。 摘要:Training deep neural networks may be challenging in real world data. Using models as black-boxes, even with transfer learning, can result in poor generalization or inconclusive results when it comes to small datasets or specific applications. This tutorial covers the basic steps as well as more recent options to improve models, in particular, but not restricted to, supervised learning. It can be particularly useful in datasets that are not as well-prepared as those in challenges, and also under scarce annotation and/or small data. We describe basic procedures: as data preparation, optimization and transfer learning, but also recent architectural choices such as use of transformer modules, alternative convolutional layers, activation functions, wide and deep networks, as well as training procedures including as curriculum, contrastive and self-supervised learning.

【12】 Machine Learning: Challenges, Limitations, and Compatibility for Audio Restoration Processes 标题:机器学习:音频恢复过程的挑战、限制和兼容性 链接:https://arxiv.org/abs/2109.02692

作者:Owen Casey,Rushit Dave,Naeem Seliya,Evelyn R Sowells Boone 机构:#Department of Computer Science, University of Wisconsin-Eau Claire, Garfield Ave, Eau Claire, WI , United States, Department of Computer Systems Technology, North Carolina A&T State University, Greensboro, NC , United States 备注:6 pages, 2 figures 摘要:本文探讨了机器学习网络在恢复降级和压缩语音音频方面的应用。该项目的目的是从语音数据中构建一个新的训练模型,以学习由有损压缩和分辨率损失导致的数据丢失引起的压缩伪影失真的特征,并使用SEGAN:语音增强生成对抗网络中提出的现有算法。然后,从该模型生成的生成器将用于恢复降级的语音音频。本文详细分析了使用不推荐使用的代码所带来的后续兼容性和操作问题,这些问题阻碍了经过训练的模型的成功开发。本文进一步探讨了当前机器学习的挑战、局限性和兼容性。 摘要:In this paper machine learning networks are explored for their use in restoring degraded and compressed speech audio. The project intent is to build a new trained model from voice data to learn features of compression artifacting distortion introduced by data loss from lossy compression and resolution loss with an existing algorithm presented in SEGAN: Speech Enhancement Generative Adversarial Network. The resulting generator from the model was then to be used to restore degraded speech audio. This paper details an examination of the subsequent compatibility and operational issues presented by working with deprecated code, which obstructed the trained model from successfully being developed. This paper further serves as an examination of the challenges, limitations, and compatibility in the current state of machine learning.

【13】 Deep Convolutional Neural Networks Predict Elasticity Tensors and their Bounds in Homogenization 标题:深卷积神经网络在均匀化中预测弹性张量及其界限 链接:https://arxiv.org/abs/2109.03020

作者:Bernhard Eidel 机构:DFG-Heisenberg-Fellow, Institute of Mechanics, Department Mechanical Engineering, University Siegen, Siegen, Paul-Bonatz-Str. ,-, Germany 摘要:在目前的工作中,三维卷积神经网络(CNN)被训练成将任意相分数的随机非均质两相材料与其弹性宏观刚度联系起来,从而取代显式均匀化模拟。为了减少由于未知边界条件(BCs)导致的合成复合材料真实刚度的不确定性,CNN预测周期性BC的刚度,上限通过运动均匀BC,下限通过应力均匀BC。这项工作描述了均匀化CNN的工作流程,从CNN设计的微观结构生成、卷积、非线性激活和池化操作、训练和验证以及反向传播到测试中的性能测量。其中,CNN不仅证明了标准测试集的预测精度,而且还证明了金刚石基涂层真实两相微观结构样品的预测精度。涵盖所有三种边界类型的CNN实际上与三种不同网络中的单独处理一样精确。该贡献的CNN通过刚度界限为单个快照样本提供了适当RVE大小的指标。此外,它们可以对合成微结构的整体进行有效弹性刚度的统计分析,而无需昂贵的模拟。 摘要:In the present work, 3D convolutional neural networks (CNNs) are trained to link random heterogeneous, two-phase materials of arbitrary phase fractions to their elastic macroscale stiffness thus replacing explicit homogenization simulations. In order to reduce the uncertainty of the true stiffness of the synthetic composites due to unknown boundary conditions (BCs), the CNNs predict beyond the stiffness for periodic BC the upper bound through kinematically uniform BC, and the lower bound through stress uniform BC. This work describes the workflow of the homogenization-CNN, from microstructure generation over the CNN design, the operations of convolution, nonlinear activation and pooling as well as training and validation along with backpropagation up to performance measurements in tests. Therein the CNNs demonstrate the predictive accuracy not only for the standard test set but also for samples of the real, two-phase microstructure of a diamond-based coating. The CNN that covers all three boundary types is virtually as accurate as the separate treatment in three different nets. The CNNs of this contribution provide through stiffness bounds an indicator of the proper RVE size for individual snapshot samples. Moreover, they enable statistical analyses for the effective elastic stiffness on ensembles of synthetical microstructures without costly simulations.

【14】 Instance-dependent Label-noise Learning under a Structural Causal Model 标题:结构因果模型下依赖实例的标签噪声学习 链接:https://arxiv.org/abs/2109.02986

作者:Yu Yao,Tongliang Liu,Mingming Gong,Bo Han,Gang Niu,Kun Zhang 机构:University of Sydney; University of Melbourne; Hong Kong Baptist University; RIKEN AIP; Carnegie Mellon University 摘要:标签噪声会使深度学习算法的性能退化,因为深度神经网络容易过拟合标签错误。令X和Y分别表示实例和干净标签。当Y是X的原因时(许多数据集,如SVHN和CIFAR,正是按此方式构建的),P(X)和P(Y|X)的分布相互纠缠。这意味着无监督实例有助于学习分类器,从而减轻标签噪声的副作用。然而,如何利用因果信息来处理标签噪声问题仍不明朗。在本文中,我们利用结构因果模型,提出了一种新的实例相关标签噪声学习的生成式方法。特别是,我们表明,对实例进行适当建模将有助于标签噪声转移矩阵的可识别性,从而得到更好的分类器。经验结果表明,我们的方法在合成和真实标签噪声数据集上都优于所有最先进的方法。 摘要:Label noise will degenerate the performance of deep learning algorithms because deep neural networks easily overfit label errors. Let X and Y denote the instance and clean label, respectively. When Y is a cause of X, according to which many datasets have been constructed, e.g., SVHN and CIFAR, the distributions of P(X) and P(Y|X) are entangled. This means that the unsupervised instances are helpful to learn the classifier and thus reduce the side effect of label noise. However, it remains elusive on how to exploit the causal information to handle the label noise problem. In this paper, by leveraging a structural causal model, we propose a novel generative approach for instance-dependent label-noise learning. In particular, we show that properly modeling the instances will contribute to the identifiability of the label noise transition matrix and thus lead to a better classifier. Empirically, our method outperforms all state-of-the-art methods on both synthetic and real-world label-noise datasets.

【15】 BioNetExplorer: Architecture-Space Exploration of Bio-Signal Processing Deep Neural Networks for Wearables 标题:BioNetExplorer:可穿戴设备生物信号处理深度神经网络的架构-空间探索 链接:https://arxiv.org/abs/2109.02909

作者:Bharath Srinivas Prabakaran,Asima Akhtar,Semeen Rehman,Osman Hasan,Muhammad Shafique 备注:None 摘要:在这项工作中,我们提出了BioNetExplorer框架来系统地生成和探索可穿戴设备中生物信号处理的多种DNN架构。我们的框架调整关键的神经架构参数,以搜索具有低硬件开销的嵌入式DNN,该DNN可部署在可穿戴边缘设备中,以分析生物信号数据并提取相关信息,如心律失常和癫痫发作。我们的框架还通过在探索阶段施加用户需求和硬件约束(存储、FLOPs等),使用遗传算法实现硬件感知的DNN架构搜索,从而限制探索的网络数量。此外,BioNetExplorer还可用于根据用户所需的输出类搜索DNN;例如,由于遗传倾向或预先存在的心脏病,用户可能需要特定的输出类别。与穷举搜索相比,使用遗传算法平均减少了9倍的搜索时间。我们成功地确定了帕累托最优设计,在质量损失小于0.5%的情况下,可将DNN的存储开销减少约30MB。为了实现低成本的嵌入式DNN,BioNetExplorer还采用了不同的模型压缩技术,在质量损失<0.2%的情况下,将网络的存储开销进一步降低了53倍。 摘要:In this work, we propose the BioNetExplorer framework to systematically generate and explore multiple DNN architectures for bio-signal processing in wearables. Our framework adapts key neural architecture parameters to search for an embedded DNN with a low hardware overhead, which can be deployed in wearable edge devices to analyse the bio-signal data and to extract the relevant information, such as arrhythmia and seizure. Our framework also enables hardware-aware DNN architecture search using genetic algorithms by imposing user requirements and hardware constraints (storage, FLOPs, etc.) during the exploration stage, thereby limiting the number of networks explored. Moreover, BioNetExplorer can also be used to search for DNNs based on the user-required output classes; for instance, a user might require a specific output class due to genetic predisposition or a pre-existing heart condition. The use of genetic algorithms reduces the exploration time, on average, by 9x, compared to exhaustive exploration. We are successful in identifying Pareto-optimal designs, which can reduce the storage overhead of the DNN by ~30MB for a quality loss of less than 0.5%. To enable low-cost embedded DNNs, BioNetExplorer also employs different model compression techniques to further reduce the storage overhead of the network by up to 53x for a quality loss of <0.2%.

【16】 Using Satellite Imagery and Machine Learning to Estimate the Livelihood Impact of Electricity Access 标题:利用卫星图像和机器学习评估供电对民生的影响 链接:https://arxiv.org/abs/2109.02890

作者:Nathan Ratledge,Gabe Cadamuro,Brandon de la Cuesta,Matthieu Stigler,Marshall Burke 机构:Emmett Interdisciplinary Program in Environment and Resources, Stanford University, Palo Alto, CA , USA, Atlas AI, Palo Alto, CA , USA, King Center for International Development, Stanford University, Palo Alto, CA , USA 摘要:在世界许多地区,关键经济成果的稀疏数据阻碍了公共政策的制定、目标确定和评估。我们展示了卫星图像和机器学习的进步如何帮助改善这些数据和推理挑战。在乌干达电网扩张的背景下,我们展示了如何将卫星图像和计算机视觉结合起来,开发适合于推断电力接入对生计的因果影响的地方级生计测量。然后,我们展示了基于ML的推理技术在应用于这些数据时,如何比传统的替代方案更可靠地估计电气化的因果影响。我们估计,电网接入将乌干达农村地区的村级资产财富提高了0.17个标准差,比我们研究期间未经处理地区的增长率翻了一番多。我们的研究结果为关键基础设施投资的影响提供了国家级证据,并为数据稀疏环境下的未来政策评估提供了一种低成本、可推广的方法。 摘要:In many regions of the world, sparse data on key economic outcomes inhibits the development, targeting, and evaluation of public policy. We demonstrate how advancements in satellite imagery and machine learning can help ameliorate these data and inference challenges. In the context of an expansion of the electrical grid across Uganda, we show how a combination of satellite imagery and computer vision can be used to develop local-level livelihood measurements appropriate for inferring the causal impact of electricity access on livelihoods. We then show how ML-based inference techniques deliver more reliable estimates of the causal impact of electrification than traditional alternatives when applied to these data. We estimate that grid access improves village-level asset wealth in rural Uganda by 0.17 standard deviations, more than doubling the growth rate over our study period relative to untreated areas. Our results provide country-scale evidence on the impact of a key infrastructure investment, and provide a low-cost, generalizable approach to future policy evaluation in data sparse environments.

其他(14篇)

【1】 Beyond Preserved Accuracy: Evaluating Loyalty and Robustness of BERT Compression 标题:超越保留的准确性:评估BERT压缩的忠诚度和稳健性 链接:https://arxiv.org/abs/2109.03228

作者:Canwen Xu,Wangchunshu Zhou,Tao Ge,Ke Xu,Julian McAuley,Furu Wei 机构:University of California, San Diego; Stanford University; Microsoft Research Asia; Beihang University 备注:Accepted to EMNLP 2021 (main conference) 摘要:最近关于预训练语言模型(如BERT)压缩的研究通常使用保留精度作为评估指标。在本文中,我们提出了两个新的度量——标签忠诚度和概率忠诚度,用于衡量压缩模型(即学生)对原始模型(即教师)的模仿程度。我们还探讨了压缩在对抗性攻击下对鲁棒性的影响。我们以忠诚度和鲁棒性为指标,对量化、剪枝、知识蒸馏和渐进式模块替换进行了基准测试。通过组合多种压缩技术,我们给出了一种实用策略,以实现更好的准确性、忠诚度和鲁棒性。 摘要:Recent studies on compression of pretrained language models (e.g., BERT) usually use preserved accuracy as the metric for evaluation. In this paper, we propose two new metrics, label loyalty and probability loyalty that measure how closely a compressed model (i.e., student) mimics the original model (i.e., teacher). We also explore the effect of compression with regard to robustness under adversarial attacks. We benchmark quantization, pruning, knowledge distillation and progressive module replacing with loyalty and robustness. By combining multiple compression techniques, we provide a practical strategy to achieve better accuracy, loyalty and robustness.
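下面给出两种忠诚度度量的一种自然实现示意:标签忠诚度取师生预测标签的一致率,概率忠诚度用输出分布的 JS 散度构造相似度;论文的精确定义以原文为准,以下函数名为假设:

```python
import torch
import torch.nn.functional as F

def label_loyalty(teacher_logits, student_logits):
    """师生预测标签的一致率(示意)。"""
    return (teacher_logits.argmax(-1) == student_logits.argmax(-1)).float().mean()

def probability_loyalty(teacher_logits, student_logits):
    """基于 JS 散度的输出分布相似度,取值在 [0, 1](示意)。"""
    p = F.softmax(teacher_logits, dim=-1)
    q = F.softmax(student_logits, dim=-1)
    m = 0.5 * (p + q)
    js = 0.5 * (p * (p / m).log()).sum(-1) + 0.5 * (q * (q / m).log()).sum(-1)
    return (1 - js / torch.log(torch.tensor(2.0))).mean()

t = torch.randn(32, 5)
s = t + 0.1 * torch.randn(32, 5)                   # 模拟一个高度忠诚的学生模型
print(label_loyalty(t, s), probability_loyalty(t, s))
```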

【2】 Robust Predictable Control 标题:鲁棒可预测控制 链接:https://arxiv.org/abs/2109.03214

作者:Benjamin Eysenbach,Ruslan Salakhutdinov,Sergey Levine 机构:Carnegie Mellon University, Google Brain, UC Berkeley 备注:Project site with videos and code: this https URL 摘要:当今强化学习(RL)算法面临的许多挑战,如鲁棒性、泛化、迁移和计算效率,都与压缩密切相关。先前的工作令人信服地论证了为什么最小化信息在有监督的学习环境中是有用的,但标准的RL算法缺乏明确的压缩机制。RL设置是唯一的,因为(1)其顺序性质允许代理使用过去的信息来避免查看未来的观察结果,(2)代理可以优化其行为,以选择决策需要少量比特的状态。我们利用这些特性提出了一种学习简单策略的方法(RPC)。该方法将信息瓶颈、基于模型的RL和位反向编码的思想结合到一个简单且理论上合理的算法中。我们的方法联合优化了潜在空间模型和策略,使其具有自一致性,从而使策略避免了模型不准确的状态。我们证明,我们的方法比以前的方法实现了更严格的压缩,实现了比标准信息瓶颈高5倍的回报。我们还证明了我们的方法学习的策略更加健壮,并且能够更好地推广到新任务。 摘要:Many of the challenges facing today's reinforcement learning (RL) algorithms, such as robustness, generalization, transfer, and computational efficiency are closely related to compression. Prior work has convincingly argued why minimizing information is useful in the supervised learning setting, but standard RL algorithms lack an explicit mechanism for compression. The RL setting is unique because (1) its sequential nature allows an agent to use past information to avoid looking at future observations and (2) the agent can optimize its behavior to prefer states where decision making requires few bits. We take advantage of these properties to propose a method (RPC) for learning simple policies. This method brings together ideas from information bottlenecks, model-based RL, and bits-back coding into a simple and theoretically-justified algorithm. Our method jointly optimizes a latent-space model and policy to be self-consistent, such that the policy avoids states where the model is inaccurate. We demonstrate that our method achieves much tighter compression than prior methods, achieving up to 5x higher reward than a standard information bottleneck. We also demonstrate that our method learns policies that are more robust and generalize better to new tasks.

【3】 PAUSE: Positive and Annealed Unlabeled Sentence Embedding 标题:暂停:积极的、退火式的无标记句子嵌入 链接:https://arxiv.org/abs/2109.03155

作者:Lele Cao,Emil Larsson,Vilhelm von Ehrenheim,Dhiana Deva Cavalcanti Rocha,Anna Martin,Sonja Horn 机构:Motherbrain, EQT Group, Stockholm, Sweden, Modulai, Stockholm, Sweden 备注:Accepted by EMNLP 2021 main conference as long paper (12 pages and 2 figures). For source code, see this https URL 摘要:句子嵌入是指将原始文本转换为数值向量表示的一组有效且通用的技术,可用于广泛的自然语言处理(NLP)应用。这些技术大多是有监督或无监督的。与无监督方法相比,有监督方法对优化目标的假设较少,通常获得更好的结果。然而,其训练需要大量标注的句子对,而这在许多工业场景中并不可得。为此,我们提出了一种通用的端到端方法——PAUSE(Positive and Annealed Unlabeled Sentence Embedding),能够从部分标注的数据集中学习高质量的句子嵌入。我们的实验表明,PAUSE仅使用一小部分标注句子对,就在各种基准任务上达到、有时甚至超过了最先进的结果。当应用于标注样本稀缺的实际工业用例时,PAUSE鼓励我们在无需大量人工标注工作的情况下扩展数据集。 摘要:Sentence embedding refers to a set of effective and versatile techniques for converting raw text into numerical vector representations that can be used in a wide range of natural language processing (NLP) applications. The majority of these techniques are either supervised or unsupervised. Compared to the unsupervised methods, the supervised ones make less assumptions about optimization objectives and usually achieve better results. However, the training requires a large amount of labeled sentence pairs, which is not available in many industrial scenarios. To that end, we propose a generic and end-to-end approach -- PAUSE (Positive and Annealed Unlabeled Sentence Embedding), capable of learning high-quality sentence embeddings from a partially labeled dataset. We experimentally show that PAUSE achieves, and sometimes surpasses, state-of-the-art results using only a small fraction of labeled sentence pairs on various benchmark tasks. When applied to a real industrial use case where labeled samples are scarce, PAUSE encourages us to extend our dataset without the liability of extensive manual annotation work.

【4】 PEEK: A Large Dataset of Learner Engagement with Educational Videos 标题:PEEK:学习者参与教育视频的大型数据集 链接:https://arxiv.org/abs/2109.03154

作者:Sahan Bulathwela,Maria Perez-Ortiz,Erik Novak,Emine Yilmaz,John Shawe-Taylor 机构:SHAWE-TAYLOR, Centre for Artificial Intelligence, University College London (UK) 备注:To be published at ORSUM '21: 4th Workshop on Online Recommender Systems and User Modeling at ACM RecSys 2021 摘要:与电子商务和娱乐相关的推荐人相比,教育推荐人受到的关注要少得多,尽管高效的智能导师在提高学习收益方面具有巨大潜力。推进这一研究方向的主要挑战之一是缺乏大型、公开可用的数据集。在这项工作中,我们发布了一个大型、新颖的学习者数据集,这些学习者在野外参与教育视频。该数据集名为“知识主题个性化教育参与PEEK”,是第一个公开的此类数据集。视频讲座与维基百科中与讲座内容相关的概念相关联,从而提供了一种直观的分类法。我们相信,与丰富的内容表示一致的细粒度学习者参与信号将为构建强大的个性化算法铺平道路,从而彻底改变教育和信息推荐系统。为了实现这一目标,我们1)从一个流行的视频讲座库构建一个新的数据集,2)确定一组基准算法来建模参与度,3)在PEEK数据集上进行大量实验以证明其价值。我们对该数据集的实验表明,它有望构建功能强大的信息推荐系统。数据集和支持代码是公开的。 摘要:Educational recommenders have received much less attention in comparison to e-commerce and entertainment-related recommenders, even though efficient intelligent tutors have great potential to improve learning gains. One of the main challenges in advancing this research direction is the scarcity of large, publicly available datasets. In this work, we release a large, novel dataset of learners engaging with educational videos in-the-wild. The dataset, named Personalised Educational Engagement with Knowledge Topics PEEK, is the first publicly available dataset of this nature. The video lectures have been associated with Wikipedia concepts related to the material of the lecture, thus providing a humanly intuitive taxonomy. We believe that granular learner engagement signals in unison with rich content representations will pave the way to building powerful personalization algorithms that will revolutionise educational and informational recommendation systems. Towards this goal, we 1) construct a novel dataset from a popular video lecture repository, 2) identify a set of benchmark algorithms to model engagement, and 3) run extensive experimentation on the PEEK dataset to demonstrate its value. Our experiments with the dataset show promise in building powerful informational recommender systems. The dataset and the support code is available publicly.

【5】 Efficient ADMM-based Algorithms for Convolutional Sparse Coding 标题:基于ADMM的高效卷积稀疏编码算法 链接:https://arxiv.org/abs/2109.02969

作者:Farshad G. Veshki,Sergiy A. Vorobyov 机构:Department of Signal Processing and Acoustics, Aalto University 摘要:卷积稀疏编码通过引入全局平移不变模型,改进了标准稀疏逼近。目前最高效的卷积稀疏编码方法均基于交替方向乘子法(ADMM)和卷积定理,它们之间唯一的主要区别在于如何处理卷积最小二乘拟合子问题。本文给出了该子问题的一种解法,提高了最先进算法的效率。我们还用同样的思路开发了一种高效的卷积字典学习方法。此外,我们提出了一种带近似误差约束的新型卷积稀疏编码算法。 摘要:Convolutional sparse coding improves on the standard sparse approximation by incorporating a global shift-invariant model. The most efficient convolutional sparse coding methods are based on the alternating direction method of multipliers and the convolution theorem. The only major difference between these methods is how they approach a convolutional least-squares fitting subproblem. This letter presents a solution to this subproblem, which improves the efficiency of the state-of-the-art algorithms. We also use the same approach for developing an efficient convolutional dictionary learning method. Furthermore, we propose a novel algorithm for convolutional sparse coding with a constraint on the approximation error.
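这类方法的效率瓶颈正是摘要中提到的卷积最小二乘子问题:借助卷积定理,它可以在频域逐频点化为秩一修正的线性方程组。下面用 NumPy 给出先前工作中常用的 Sherman-Morrison 逐频点闭式解的示意(单个信号、M 个滤波器;这是背景性的经典解法,并非本文新提出的解法):

import numpy as np

def solve_freq_subproblem(D_hat, b_hat, rho):
    # 逐频点求解 (D^H D + rho I) x = b,其中 D 在每个频点是 1xM 的行向量
    # D_hat, b_hat:形状 (M, N),分别为 M 个滤波器与右端项的 DFT
    Db = (D_hat * b_hat).sum(axis=0)                # 每个频点上的标量 D b
    denom = rho + (np.abs(D_hat) ** 2).sum(axis=0)  # rho + ||d(w)||^2
    x_hat = (b_hat - np.conj(D_hat) * (Db / denom)) / rho
    return x_hat                                    # 对每个滤波器做逆 FFT 即得空域解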

【6】 Countering Online Hate Speech: An NLP Perspective 标题:打击网上仇恨言论:NLP视角 链接:https://arxiv.org/abs/2109.02941

作者:Mudit Chaudhary,Chandni Saxena,Helen Meng 机构:The Chinese University of Hong Kong 备注:12 pages 摘要:从与新冠疫情、美国大选和世界各地抗议活动相关的新闻中,网络仇恨言论引起了所有人的关注。网络毒性(online toxicity)是网络仇恨行为的总称,网络仇恨言论便是其表现形式之一。仇恨言论是出于目标实体的身份或观点而对个人或群体发起的蓄意攻击。借助社交媒体的大众传播日益增多,进一步加剧了网络仇恨言论的有害后果。虽然已有大量研究利用自然语言处理(NLP)识别仇恨言论,但利用 NLP 对网络仇恨言论进行预防和干预的工作相对欠缺。本文提出了一个关于仇恨言论 NLP 对抗方法的整体概念框架,并对 NLP 对抗网络仇恨言论的最新进展进行了全面综述。文中按对抗技术的作用时机对其进行分类,并指出了该主题未来潜在的研究方向。 摘要:Online hate speech has caught everyone's attention from the news related to the COVID-19 pandemic, US elections, and worldwide protests. Online toxicity - an umbrella term for online hateful behavior, manifests itself in forms such as online hate speech. Hate speech is a deliberate attack directed towards an individual or a group motivated by the targeted entity's identity or opinions. The rising mass communication through social media further exacerbates the harmful consequences of online hate speech. While there has been significant research on hate-speech identification using Natural Language Processing (NLP), the work on utilizing NLP for prevention and intervention of online hate speech lacks relatively. This paper presents a holistic conceptual framework on hate-speech NLP countering methods along with a thorough survey on the current progress of NLP for countering online hate speech. It classifies the countering techniques based on their time of action, and identifies potential future research areas on this topic.

【7】 Fishr: Invariant Gradient Variances for Out-of-distribution Generalization 标题:Fishr:面向分布外泛化的不变梯度方差 链接:https://arxiv.org/abs/2109.02934

作者:Alexandre Rame,Corentin Dancette,Matthieu Cord 机构:Sorbonne Université, CNRS, LIP, Paris, France, Valeo.ai 备注:31 pages, 12 tables, 6 figures 摘要:学习在数据分布发生变化时仍能良好泛化的健壮模型,对实际应用至关重要。为此,人们对同时从多个训练域学习、并在这些域之间强制执行不同类型不变性的做法兴趣日增。然而,在公平的评估协议下,现有方法都未能表现出系统性的收益。在本文中,我们提出了一种新的学习方案,在损失函数的梯度空间中实现域不变性:具体而言,我们引入一个正则项,用以匹配各训练域之间域级的梯度方差。重要的是,我们的策略(名为 Fishr)与 Fisher 信息和损失的 Hessian 密切相关。我们证明,在学习过程中强制各域的梯度协方差相似,最终会使各域的损失地形在最终权重附近局部对齐。大量实验证明了 Fishr 在分布外泛化上的有效性。特别是,Fishr 在 DomainBed 基准上刷新了最先进水平,并显著优于经验风险最小化。代码发布于 https://github.com/alexrame/fishr。 摘要:Learning robust models that generalize well under changes in the data distribution is critical for real-world applications. To this end, there has been a growing surge of interest to learn simultaneously from multiple training domains - while enforcing different types of invariance across those domains. Yet, all existing approaches fail to show systematic benefits under fair evaluation protocols. In this paper, we propose a new learning scheme to enforce domain invariance in the space of the gradients of the loss function: specifically, we introduce a regularization term that matches the domain-level variances of gradients across training domains. Critically, our strategy, named Fishr, exhibits close relations with the Fisher Information and the Hessian of the loss. We show that forcing domain-level gradient covariances to be similar during the learning procedure eventually aligns the domain-level loss landscapes locally around the final weights. Extensive experiments demonstrate the effectiveness of Fishr for out-of-distribution generalization. In particular, Fishr improves the state of the art on the DomainBed benchmark and performs significantly better than Empirical Risk Minimization. The code is released at https://github.com/alexrame/fishr.
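摘要中的正则项可以直接写出来:对每个训练域计算逐样本梯度的(逐参数)方差,再惩罚各域方差与跨域均值的偏离。下面是一个 PyTorch 示意,假设逐样本梯度已用 BackPACK 之类的工具对分类头参数求得;具体细节(如协方差的计算范围、指数滑动平均等)请以论文开源代码为准。

import torch

def fishr_penalty(grads_per_domain):
    # grads_per_domain:长度为 K 的列表,第 k 项形状为 (n_k, p),
    # 即第 k 个域内各样本对分类头参数的梯度
    variances = [g.var(dim=0, unbiased=False) for g in grads_per_domain]
    v_mean = torch.stack(variances).mean(dim=0)  # 跨域平均的梯度方差
    return sum(((v - v_mean) ** 2).sum() for v in variances) / len(variances)

# 总损失(lam 为假设的权重超参数):loss = erm_loss + lam * fishr_penalty(grads)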

【8】 Refinement of Hottopixx and its Postprocessing 标题:Hottopixx 算法的改进及其后处理 链接:https://arxiv.org/abs/2109.02863

作者:Tomohiko Mizutani 备注:32 pages, 2 figures 摘要:Bittorf等人在NIPS 2012年提出的Hottopixx是一种在可分性假设下解决非负矩阵分解(NMF)问题的算法。可分离NMF在文档主题提取、高光谱图像分解等方面有着重要的应用。在此类应用中,算法对噪声的鲁棒性是成功的关键。Hottopixx已被证明对噪声具有鲁棒性,并且可以通过后处理进一步增强其鲁棒性。然而,也有一个缺点。Hottopixx及其后处理要求我们在运行之前估计要分解的矩阵中涉及的噪声级,因为它们将其用作输入数据的一部分。噪声级估计并非易事。在本文中,我们克服了这个缺点。我们提出了一种改进的Hottopixx及其后处理,在没有噪声级先验知识的情况下运行。结果表明,改进后的算法对噪声的鲁棒性几乎与原算法相同。 摘要:Hottopixx, proposed by Bittorf et al. at NIPS 2012, is an algorithm for solving nonnegative matrix factorization (NMF) problems under the separability assumption. Separable NMFs have important applications, such as topic extraction from documents and unmixing of hyperspectral images. In such applications, the robustness of the algorithm to noise is the key to the success. Hottopixx has been shown to be robust to noise, and its robustness can be further enhanced through postprocessing. However, there is a drawback. Hottopixx and its postprocessing require us to estimate the noise level involved in the matrix we want to factorize before running, since they use it as part of the input data. The noise-level estimation is not an easy task. In this paper, we overcome this drawback. We present a refinement of Hottopixx and its postprocessing that runs without prior knowledge of the noise level. We show that the refinement has almost the same robustness to noise as the original algorithm.
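在可分性假设下,NMF 归结为从数据矩阵的列中找出少数"锚点"列;Hottopixx 通过求解线性规划来完成这一选取,此处不展开。作为对照,下面给出另一种经典的可分 NMF 锚点选取算法——逐次投影算法(SPA)的 NumPy 示意,用以说明这一问题的基本形态(注意这并非本文所改进的 Hottopixx 本身):

import numpy as np

def spa(M, r):
    # 逐次投影算法:从 M 的列中贪心选出 r 个锚点列的索引
    R = M.astype(float).copy()
    anchors = []
    for _ in range(r):
        j = int(np.argmax((R ** 2).sum(axis=0)))  # 当前残差范数最大的列
        anchors.append(j)
        u = R[:, j] / np.linalg.norm(R[:, j])     # 归一化锚点方向
        R -= np.outer(u, u @ R)                   # 投影到该方向的正交补上
    return anchors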

【9】 ArGoT: A Glossary of Terms extracted from the arXiv 标题:ArGoT:从 arXiv 中提取的术语表 链接:https://arxiv.org/abs/2109.02801

作者:Luis Berlioz 机构:University of Pittsburgh, Pennsylvania, USA 备注:None 摘要:我们介绍 ArGoT,一个从 arXiv 网站所载文章中提取的数学术语数据集。这里的"术语"指文章中定义的任何数学概念。利用文章源代码中的标签以及其他流行数学网站上的示例,我们从 arXiv 数据中挖掘所有术语,并编纂出一份全面的数学术语表。随后,借助术语的定义和 arXiv 的元数据,可以将各术语组织成依赖关系图。通过双曲词嵌入和标准词嵌入,我们展示了这种结构如何反映在文本的向量表示中,以及它们如何捕捉数学概念间的蕴涵关系。该数据集是一项持续工作的一部分,旨在将自然语言数学文本与现有的、由经形式化验证命题构成的交互式定理证明器(ITP)库对齐。 摘要:We introduce ArGoT, a data set of mathematical terms extracted from the articles hosted on the arXiv website. A term is any mathematical concept defined in an article. Using labels in the article's source code and examples from other popular math websites, we mine all the terms in the arXiv data and compile a comprehensive vocabulary of mathematical terms. Each term can be then organized in a dependency graph by using the term's definitions and the arXiv's metadata. Using both hyperbolic and standard word embeddings, we demonstrate how this structure is reflected in the text's vector representation and how they capture relations of entailment in mathematical concepts. This data set is part of an ongoing effort to align natural mathematical text with existing Interactive Theorem Prover Libraries (ITPs) of formally verified statements.

【10】 Puzzle Solving without Search or Human Knowledge: An Unnatural Language Approach 标题:在没有搜索或人类知识的情况下解谜:一种非自然语言的方法 链接:https://arxiv.org/abs/2109.02797

作者:David Noever,Ryerson Burdick 机构:PeopleTec, Inc., Huntsville, AL; University of Maryland, College Park, MD 摘要:将生成式预训练 Transformer(GPT-2)用于学习以文本形式存档的游戏棋谱,为探索稀疏奖励博弈提供了一个模型环境。事实证明,Transformer 架构适合在描述迷宫、魔方和数独求解过程的已解文本档案上进行训练。该方法的价值在于:通过微调 Transformer,可以展示在不依赖人类启发式或领域专业知识指导的情况下习得的合理求解策略。这些游戏巨大的搜索空间($>10^{19}$)构成了一个谜题环境:中间步骤几乎没有奖励,只有最终一步才能解开挑战。 摘要:The application of Generative Pre-trained Transformer (GPT-2) to learn text-archived game notation provides a model environment for exploring sparse reward gameplay. The transformer architecture proves amenable to training on solved text archives describing mazes, Rubik's Cube, and Sudoku solvers. The method benefits from fine-tuning the transformer architecture to visualize plausible strategies derived outside any guidance from human heuristics or domain expertise. The large search space ($>10^{19}$) for the games provides a puzzle environment in which the solution has few intermediate rewards and a final move that solves the challenge.
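这一流程的骨架就是标准的因果语言模型微调:把已解出的游戏记录整理成纯文本,再对 GPT-2 继续训练。下面用 Hugging Face transformers 给出一个极简示意;其中文件名 solved_games.txt 与各超参数均为假设,论文实际使用的数据格式与训练配置请以原文为准。

from datasets import load_dataset
from transformers import (DataCollatorForLanguageModeling, GPT2LMHeadModel,
                          GPT2TokenizerFast, Trainer, TrainingArguments)

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token   # GPT-2 本身没有 pad token
model = GPT2LMHeadModel.from_pretrained("gpt2")

# 每行一条已解出的游戏记录(假设的文件名)
ds = load_dataset("text", data_files={"train": "solved_games.txt"})
ds = ds.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
            batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpt2-games", num_train_epochs=3,
                           per_device_train_batch_size=4),
    train_dataset=ds["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # 因果 LM
)
trainer.train()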

【11】 Bringing a Ruler Into the Black Box: Uncovering Feature Impact from Individual Conditional Expectation Plots 标题:把尺子带进黑箱:从个体条件期望图中揭示特征影响 链接:https://arxiv.org/abs/2109.02724

作者:Andrew Yeh,Anhthy Ngo 机构:The Wharton School of the University of Pennsylvania, Philadelphia PA, USA 备注:Accepted to Advances in Interpretable Machine Learning and Artificial Intelligence (AIMLAI) workshop in ECML PKDD 2021 摘要:随着机器学习系统日益普及,理解和解释这些模型的方法变得愈发重要。特别地,实践者通常既关心模型依赖哪些特征,也关心模型如何依赖它们,即特征对模型预测的影响。以往关于特征影响的工作,包括部分依赖图(PDP)和个体条件期望(ICE)图,主要集中于特征影响的可视化解释。我们提出"ICE 特征影响"(ICE feature impact),作为 ICE 图的自然扩展:它是一种与模型无关、与性能无关的特征影响度量,可直接从 ICE 图中提取,并可类比线性回归系数来解释。此外,我们引入了 ICE 特征影响的分布内变体,以调节分布外点的影响,并提出了刻画特征影响异质性与非线性的度量。最后,我们使用真实数据在多个任务中展示了 ICE 特征影响的实用性。 摘要:As machine learning systems become more ubiquitous, methods for understanding and interpreting these models become increasingly important. In particular, practitioners are often interested both in what features the model relies on and how the model relies on them--the feature's impact on model predictions. Prior work on feature impact including partial dependence plots (PDPs) and Individual Conditional Expectation (ICE) plots has focused on a visual interpretation of feature impact. We propose a natural extension to ICE plots with ICE feature impact, a model-agnostic, performance-agnostic feature impact metric drawn out from ICE plots that can be interpreted as a close analogy to linear regression coefficients. Additionally, we introduce an in-distribution variant of ICE feature impact to vary the influence of out-of-distribution points as well as heterogeneity and non-linearity measures to characterize feature impact. Lastly, we demonstrate ICE feature impact's utility in several tasks using real-world data.
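ICE 曲线的做法是:固定其余特征,让目标特征在一组网格值上取值,对每个样本分别画出预测曲线;一种可与回归系数类比的"特征影响"即可由曲线的局部斜率汇总得到。下面给出一个与模型无关的 NumPy 示意,其中的聚合方式(斜率绝对值取平均)仅为示例,论文中 ICE 特征影响的确切定义请以原文为准。

import numpy as np

def ice_feature_impact(predict, X, j, n_grid=20):
    # predict:接受 (n, d) 数组并返回 (n,) 预测值的黑箱模型
    grid = np.quantile(X[:, j], np.linspace(0.05, 0.95, n_grid))  # 假设网格值互不相同
    curves = []
    for v in grid:                      # 逐网格值替换第 j 列并预测
        Xv = X.copy()
        Xv[:, j] = v
        curves.append(predict(Xv))
    curves = np.stack(curves)           # (n_grid, n):每列是一条 ICE 曲线
    slopes = np.diff(curves, axis=0) / np.diff(grid)[:, None]
    return np.abs(slopes).mean()        # 示例性的聚合:平均绝对斜率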

【12】 Iterative Pseudo-Labeling with Deep Feature Annotation and Confidence-Based Sampling 标题:基于深度特征标注和置信度采样的迭代伪标注 链接:https://arxiv.org/abs/2109.02717

作者:Barbara C Benato,Alexandru C Telea,Alexandre X Falcão 机构:Laboratory of Image Data Science, Institute of Computing, University of Campinas; Department of Information and Computing Sciences, Utrecht University 摘要:当无法获得大型带标注数据集时,训练深度神经网络是一项挑战。对数据样本进行大规模人工标注耗时、昂贵且容易出错,尤其当标注需要专家完成时。为解决这一问题,将不确定标签(也称伪标签)传播到大量无监督样本并用于模型训练的技术受到了越来越多的关注。然而,这些技术仍然需要训练集中每类数百个有监督样本,外加一个含额外有监督样本的验证集来调优模型。我们改进了最新的迭代伪标注技术——深度特征标注(Deep Feature Annotation, DeepFA):通过选取置信度最高的无监督样本来迭代训练深度神经网络。我们基于置信度的采样策略每类仅依赖几十个带标注的训练样本,且无需验证集,从而大大减少了用户的数据标注工作量。我们首先确定基线(一个自训练的深度神经网络)的最佳配置,然后在不同置信度阈值下评估置信度版 DeepFA。在六个数据集上的实验表明,DeepFA 已优于自训练基线,而置信度版 DeepFA 又能显著优于原始 DeepFA 和基线。 摘要:Training deep neural networks is challenging when large and annotated datasets are unavailable. Extensive manual annotation of data samples is time-consuming, expensive, and error-prone, notably when it needs to be done by experts. To address this issue, increased attention has been devoted to techniques that propagate uncertain labels (also called pseudo labels) to large amounts of unsupervised samples and use them for training the model. However, these techniques still need hundreds of supervised samples per class in the training set and a validation set with extra supervised samples to tune the model. We improve a recent iterative pseudo-labeling technique, Deep Feature Annotation (DeepFA), by selecting the most confident unsupervised samples to iteratively train a deep neural network. Our confidence-based sampling strategy relies on only dozens of annotated training samples per class with no validation set, considerably reducing user effort in data annotation. We first ascertain the best configuration for the baseline -- a self-trained deep neural network -- and then evaluate our confidence DeepFA for different confidence thresholds. Experiments on six datasets show that DeepFA already outperforms the self-trained baseline, but confidence DeepFA can considerably outperform the original DeepFA and the baseline.
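其中"基于置信度的采样"一步可以非常简洁地实现:每轮只保留当前伪标签中置信度最高的样本去训练网络。下面是该筛选环节的 NumPy 示意;伪标签本身如何得到不在此展开,propagate_labels 与 retrain 仅为占位的假设函数。

import numpy as np

def select_confident(probs, threshold=0.9):
    # probs:(n, C) 的伪标签类别概率;返回高置信掩码及对应标签
    confidence = probs.max(axis=1)
    return confidence >= threshold, probs.argmax(axis=1)

# 迭代伪标注的骨架(示意):
# for t in range(n_iters):
#     probs = propagate_labels(features, labeled_idx)  # 假设的标签传播函数
#     keep, pseudo = select_confident(probs, 0.9)
#     retrain(model, X[keep], pseudo[keep])            # 假设的再训练函数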

【13】 On the Out-of-distribution Generalization of Probabilistic Image Modelling 标题:关于概率图像建模的分布外泛化 链接:https://arxiv.org/abs/2109.02639

作者:Mingtian Zhang,Andi Zhang,Steven McDonagh 机构:AI Center, University College London, Computer Science, University of Cambridge, Huawei Noah's Ark Lab 摘要:分布外(OOD)检测和无损压缩这两个问题,都可以通过在第一个数据集上训练概率模型、再在数据分布不同的第二个数据集上进行似然评估来解决。通过用似然来定义概率模型的泛化,我们表明,就图像模型而言,OOD 泛化能力由局部特征主导。这促使我们提出局部自回归模型,只对局部图像特征建模,以提升 OOD 性能。我们将所提模型应用于 OOD 检测任务,在不引入额外数据的情况下取得了最先进的无监督 OOD 检测性能。此外,我们用该模型构建了一个新的无损图像压缩器 NeLLoC(Neural Local Lossless Compressor),并报告了最先进的压缩率和模型大小。 摘要:Out-of-distribution (OOD) detection and lossless compression constitute two problems that can be solved by the training of probabilistic models on a first dataset with subsequent likelihood evaluation on a second dataset, where data distributions differ. By defining the generalization of probabilistic models in terms of likelihood we show that, in the case of image models, the OOD generalization ability is dominated by local features. This motivates our proposal of a Local Autoregressive model that exclusively models local image features towards improving OOD performance. We apply the proposed model to OOD detection tasks and achieve state-of-the-art unsupervised OOD detection performance without the introduction of additional data. Additionally, we employ our model to build a new lossless image compressor: NeLLoC (Neural Local Lossless Compressor) and report state-of-the-art compression rates and model size.
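"局部自回归"意味着每个像素的条件分布只依赖其附近的因果邻域,而 PixelCNN 式的掩码卷积正是实现这种依赖的标准构件:卷积核越小,感受野越局部。下面给出该构件的 PyTorch 示意(仅为通用构件,并非 NeLLoC 的完整实现)。

import torch
import torch.nn as nn

class MaskedConv2d(nn.Conv2d):
    # 因果掩码卷积:每个像素只能看到其左侧与上方的邻域
    def __init__(self, mask_type, *args, **kwargs):
        super().__init__(*args, **kwargs)
        assert mask_type in ("A", "B")
        self.register_buffer("mask", torch.ones_like(self.weight))
        _, _, kH, kW = self.weight.shape
        self.mask[:, :, kH // 2, kW // 2 + (mask_type == "B"):] = 0  # 同行右侧置零
        self.mask[:, :, kH // 2 + 1:, :] = 0                         # 下方各行置零

    def forward(self, x):
        self.weight.data *= self.mask
        return super().forward(x)

conv = MaskedConv2d("A", 3, 64, kernel_size=3, padding=1)  # 3x3 小核 => 仅依赖局部上下文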

【14】 Motion Artifact Reduction In Photoplethysmography For Reliable Signal Selection 标题:面向可靠信号选择的光电容积脉搏波(PPG)运动伪影抑制 链接:https://arxiv.org/abs/2109.02755

作者:Runyu Mao,Mackenzie Tweardy,Stephan W. Wegerich,Craig J. Goergen,George R. Wodicka,Fengqing Zhu 摘要:光电容积脉搏波描记(PPG)是一种无创且经济的人体生命体征提取技术。尽管 PPG 信号已广泛用于消费级和研究级腕戴设备以跟踪用户生理状态,但它对运动非常敏感,运动可能破坏信号质量。现有的运动伪影(MA)抑制技术要么使用合成噪声信号、要么使用高强度活动中采集的信号来开发和评估,二者都难以推广到真实生活场景。因此,在进行日常生活活动(ADL)的同时采集真实的 PPG 信号,对开发实用的信号去噪与分析方法很有价值。在这项工作中,我们提出一种自动的伪干净 PPG 生成流程,用于可靠的 PPG 信号选取:对每个含噪 PPG 片段,对应的伪干净 PPG 抑制了 MA,并保留了刻画心脏特征的丰富时间细节。实验结果表明,从 ADL 采集的伪干净 PPG 中有 71% 可视为高质量片段,其心率和呼吸率的 MAE 分别为 1.46 BPM 和 3.93 BrPM。因此,所提方法可以通过评估对应伪干净 PPG 信号的质量,来判定原始含噪 PPG 的可靠性。 摘要:Photoplethysmography (PPG) is a non-invasive and economical technique to extract vital signs of the human body. Although it has been widely used in consumer and research grade wrist devices to track a user's physiology, the PPG signal is very sensitive to motion which can corrupt the signal's quality. Existing Motion Artifact (MA) reduction techniques have been developed and evaluated using either synthetic noisy signals or signals collected during high-intensity activities - both of which are difficult to generalize for real-life scenarios. Therefore, it is valuable to collect realistic PPG signals while performing Activities of Daily Living (ADL) to develop practical signal denoising and analysis methods. In this work, we propose an automatic pseudo clean PPG generation process for reliable PPG signal selection. For each noisy PPG segment, the corresponding pseudo clean PPG reduces the MAs and contains rich temporal details depicting cardiac features. Our experimental results show that 71% of the pseudo clean PPG collected from ADL can be considered as high quality segment where the derived MAE of heart rate and respiration rate are 1.46 BPM and 3.93 BrPM, respectively. Therefore, our proposed method can determine the reliability of the raw noisy PPG by considering quality of the corresponding pseudo clean PPG signal.
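文中报告的心率/呼吸率 MAE,最终都要从(伪)干净的 PPG 片段中推出生理指标。下面给出从一段 PPG 估计心率的常规信号处理示意(带通滤波后取频谱主峰);这只是通用流程,并非论文的伪干净 PPG 生成方法本身。

import numpy as np
from scipy.signal import butter, filtfilt

def heart_rate_bpm(ppg, fs):
    # 0.7-3.0 Hz 约对应 42-180 BPM 的心率区间
    b, a = butter(2, [0.7, 3.0], btype="band", fs=fs)
    x = filtfilt(b, a, ppg)                              # 零相位带通滤波
    spec = np.abs(np.fft.rfft(x * np.hanning(len(x))))   # 加窗后取幅度谱
    f = np.fft.rfftfreq(len(x), d=1.0 / fs)
    band = (f >= 0.7) & (f <= 3.0)
    return 60.0 * f[band][np.argmax(spec[band])]         # 主峰频率换算为 BPM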

机器翻译,仅供参考
