
人工智能学术速递[6.17]

公众号-arXiv每日学术速递
发布2021-07-02 19:00:37

访问www.arxivdaily.com获取含摘要速递,涵盖CS|物理|数学|经济|统计|金融|生物|电气领域,更有搜索、收藏、发帖等功能!点击阅读原文即可访问

cs.AI人工智能,共计57篇

【1】 Amortized Synthesis of Constrained Configurations Using a Differentiable Surrogate 标题:使用可微分代理的约束构型摊销式综合

作者:Xingyuan Sun,Tianju Xue,Szymon M. Rusinkiewicz,Ryan P. Adams 备注:16 pages, 9 figures 链接:https://arxiv.org/abs/2106.09019 摘要:在设计、制造和控制问题中,我们经常面临综合任务:必须生成满足一组约束的对象或配置,同时使一个或多个目标函数最大化。综合问题的典型特征是一个物理过程,其中许多不同的实现都可能达成目标。这种多对一映射给前馈综合的监督学习带来了挑战,因为可行设计的集合可能具有复杂的结构。此外,许多物理模拟的不可微性妨碍了直接优化。我们用一个可视为自动编码器的两阶段神经网络架构来解决这两个问题。我们首先学习解码器:一个近似多对一物理实现过程的可微代理。然后我们学习编码器,它把目标映射为设计,同时使用固定的解码器来评估实现的质量。我们在两个案例研究上评估了该方法:增材制造中的挤出机路径规划和带约束的软体机器人逆运动学。我们将该方法与两种基线进行比较:使用学习到的代理直接优化设计,以及对综合问题进行监督学习。我们发现,该方法产生的解在质量上高于监督学习,与直接优化相比也具有竞争力,而计算成本则大大降低。 摘要:In design, fabrication, and control problems, we are often faced with the task of synthesis, in which we must generate an object or configuration that satisfies a set of constraints while maximizing one or more objective functions. The synthesis problem is typically characterized by a physical process in which many different realizations may achieve the goal. This many-to-one map presents challenges to the supervised learning of feed-forward synthesis, as the set of viable designs may have a complex structure. In addition, the non-differentiable nature of many physical simulations prevents direct optimization. We address both of these problems with a two-stage neural network architecture that we may consider to be an autoencoder. We first learn the decoder: a differentiable surrogate that approximates the many-to-one physical realization process. We then learn the encoder, which maps from goal to design, while using the fixed decoder to evaluate the quality of the realization. We evaluate the approach on two case studies: extruder path planning in additive manufacturing and constrained soft robot inverse kinematics. We compare our approach to direct optimization of design using the learned surrogate, and to supervised learning of the synthesis problem. We find that our approach produces higher quality solutions than supervised learning, while being competitive in quality with direct optimization, at a greatly reduced computational cost.
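
为直观说明“先训练可微代理(解码器)、再固定解码器训练编码器”的两阶段流程,下面给出一个极简的PyTorch风格示意草图;其中的网络结构、维度和随机数据均为演示用假设,并非论文的官方实现。

```python
import torch
import torch.nn as nn

# 阶段一:训练可微代理 surrogate,近似“设计 -> 实现结果”的多对一物理过程
surrogate = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 8))
opt_s = torch.optim.Adam(surrogate.parameters(), lr=1e-3)
for design, outcome in [(torch.randn(32, 16), torch.randn(32, 8)) for _ in range(100)]:
    loss = nn.functional.mse_loss(surrogate(design), outcome)  # 用(设计, 仿真结果)对拟合代理
    opt_s.zero_grad(); loss.backward(); opt_s.step()

# 阶段二:固定代理,训练编码器 encoder,把目标直接映射为设计
for p in surrogate.parameters():
    p.requires_grad_(False)
encoder = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 16))
opt_e = torch.optim.Adam(encoder.parameters(), lr=1e-3)
for goal in [torch.randn(32, 8) for _ in range(100)]:
    design = encoder(goal)                                   # 由目标前馈生成设计
    loss = nn.functional.mse_loss(surrogate(design), goal)   # 用固定的可微代理评估实现质量
    opt_e.zero_grad(); loss.backward(); opt_e.step()
```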

【2】 End-to-End Semi-Supervised Object Detection with Soft Teacher 标题:基于软教师的端到端半监督目标检测

作者:Mengde Xu,Zheng Zhang,Han Hu,Jianfeng Wang,Lijuan Wang,Fangyun Wei,Xiang Bai,Zicheng Liu 链接:https://arxiv.org/abs/2106.09018 摘要:与以往较为复杂的多阶段方法不同,本文提出了一种端到端的半监督目标检测方法。在课程式训练过程中,端到端训练逐渐提高伪标签的质量,而越来越准确的伪标签又反过来促进目标检测的训练。在此框架下,我们还提出了两种简单而有效的技术:软教师机制,即每个未标注边界框的分类损失由教师网络产生的分类分数加权;以及为框回归学习挑选可靠伪框的框抖动方法。在COCO基准上,在不同的标注比例(1%、5%和10%)下,该方法都大幅超越了以往的方法。此外,当标注数据量相对较大时,我们的方法同样表现良好。例如,通过利用COCO的123K张未标注图像,它可以将使用完整COCO训练集训练得到的40.9 mAP基线检测器提升+3.6 mAP,达到44.5 mAP。在最先进的基于Swin Transformer的目标检测器(test-dev上为58.9 mAP)上,该方法仍能将检测精度显著提高+1.5 mAP,达到60.4 mAP,并将实例分割精度提高+1.2 mAP,达到52.4 mAP,刷新了当前最优水平。 摘要:This paper presents an end-to-end semi-supervised object detection approach, in contrast to previous more complex multi-stage methods. The end-to-end training gradually improves pseudo label qualities during the curriculum, and the more and more accurate pseudo labels in turn benefit object detection training. We also propose two simple yet effective techniques within this framework: a soft teacher mechanism where the classification loss of each unlabeled bounding box is weighed by the classification score produced by the teacher network; a box jittering approach to select reliable pseudo boxes for the learning of box regression. On COCO benchmark, the proposed approach outperforms previous methods by a large margin under various labeling ratios, i.e. 1\%, 5\% and 10\%. Moreover, our approach proves to perform also well when the amount of labeled data is relatively large. For example, it can improve a 40.9 mAP baseline detector trained using the full COCO training set by +3.6 mAP, reaching 44.5 mAP, by leveraging the 123K unlabeled images of COCO. On the state-of-the-art Swin Transformer-based object detector (58.9 mAP on test-dev), it can still significantly improve the detection accuracy by +1.5 mAP, reaching 60.4 mAP, and improve the instance segmentation accuracy by +1.2 mAP, reaching 52.4 mAP, pushing the new state-of-the-art.
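
下面用一小段示意代码展示摘要所述“软教师”机制的核心思想,即用教师网络给出的分类分数对每个未标注框的分类损失加权;张量形状、权重归一化方式均为本文的假设,具体细节以原文为准。

```python
import torch
import torch.nn.functional as F

def soft_teacher_cls_loss(student_logits, pseudo_labels, teacher_scores):
    """student_logits: [N, C] 学生网络对未标注候选框的分类输出
    pseudo_labels:  [N]    由教师网络生成的伪标签类别
    teacher_scores: [N]    教师网络对相应框给出的分类置信度,按摘要所述用作损失权重"""
    per_box_loss = F.cross_entropy(student_logits, pseudo_labels, reduction="none")
    weights = teacher_scores / teacher_scores.sum().clamp(min=1e-6)  # 归一化方式为示意假设
    return (weights * per_box_loss).sum()
```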

【3】 An Intelligent Question Answering System based on Power Knowledge Graph 标题:一种基于电力知识图谱的智能问答系统

作者:Yachen Tang,Haiyun Han,Xianmao Yu,Jing Zhao,Guangyi Liu,Longfei Wei 备注:5 pages,6 figures, IEEE General Meeting 2020 链接:https://arxiv.org/abs/2106.09013 摘要:智能问答系统通过理解自然语言问题,从海量知识库中高效地搜索相关内容,并将答案直接返回给用户,从而准确地捕捉用户的搜索意图。由于IQA系统在数据搜索和推理方面可以节省大量的时间和人力,因此在数据科学和人工智能领域受到越来越多的关注。介绍了利用图形数据库和图形计算技术,从电力行业海量异构数据中提取领域知识图的方法。然后提出了一种基于电力知识图的智能问答系统,利用自然语言处理(NLP)方法提取自然询问的意图和约束,通过知识推理构造图形数据查询语句,并完成准确的知识搜索和分析,为用户提供直观的可视化。该方法充分结合了知识图和图计算的特点,实现了海量知识的高速多跳知识关联推理分析。本文的工作也为上下文感知智能问答提供了基础。 摘要:The intelligent question answering (IQA) system can accurately capture users' search intention by understanding the natural language questions, searching relevant content efficiently from a massive knowledge-base, and returning the answer directly to the user. Since the IQA system can save inestimable time and workforce in data search and reasoning, it has received more and more attention in data science and artificial intelligence. This article introduced a domain knowledge graph using the graph database and graph computing technologies from massive heterogeneous data in electric power. It then proposed an IQA system based on the electrical power knowledge graph to extract the intent and constraints of natural interrogation based on the natural language processing (NLP) method, to construct graph data query statements via knowledge reasoning, and to complete the accurate knowledge search and analysis to provide users with an intuitive visualization. This method thoroughly combined knowledge graph and graph computing characteristics, realized high-speed multi-hop knowledge correlation reasoning analysis in tremendous knowledge. The proposed work can also provide a basis for the context-aware intelligent question and answer.

【4】 End-to-End Spoken Language Understanding for Generalized Voice Assistants 标题:通用语音助理的端到端口语理解

作者:Michael Saxon,Samridhi Choudhary,Joseph P. McKenna,Athanasios Mouchtaris 备注:Accepted to Interspeech 2021; 5 pages, 2 tables, 1 figure 链接:https://arxiv.org/abs/2106.09009 摘要:端到端(E2E)口语理解(SLU)系统使用单一模型直接从语音中预测话语语义。这方面的工作主要集中在固定域中的目标任务,其中输出语义结构是先验的,输入语音的复杂性是有限的。在这项工作中,我们提出了我们的方法来开发一个E2E模型的广义SLU在商业语音助理(VAs)。我们提出了一个完全可微的、基于Transformer的、层次化的系统,可以在ASR和NLU两个层次上进行预训练。然后对转录和语义分类损失进行微调,以处理不同的意图和参数组合。这导致SLU系统在复杂的内部广义VA数据集上实现了显著的基线改进,准确率提高了43%,同时仍然满足流行的Fluent Speech Commands数据集上99%的准确率基准。我们在一个硬测试集上进一步评估了我们的模型,该测试集只包含训练中看不到的时隙参数,并展示了近20%的改进,显示了我们的方法在真正苛刻的VA场景中的有效性。 摘要:End-to-end (E2E) spoken language understanding (SLU) systems predict utterance semantics directly from speech using a single model. Previous work in this area has focused on targeted tasks in fixed domains, where the output semantic structure is assumed a priori and the input speech is of limited complexity. In this work we present our approach to developing an E2E model for generalized SLU in commercial voice assistants (VAs). We propose a fully differentiable, transformer-based, hierarchical system that can be pretrained at both the ASR and NLU levels. This is then fine-tuned on both transcription and semantic classification losses to handle a diverse set of intent and argument combinations. This leads to an SLU system that achieves significant improvements over baselines on a complex internal generalized VA dataset with a 43% improvement in accuracy, while still meeting the 99% accuracy benchmark on the popular Fluent Speech Commands dataset. We further evaluate our model on a hard test set, exclusively containing slot arguments unseen in training, and demonstrate a nearly 20% improvement, showing the efficacy of our approach in truly demanding VA scenarios.

【5】 Named Entity Recognition with Small Strongly Labeled and Large Weakly Labeled Data 标题:具有小的强标注数据和大的弱标注数据的命名实体识别

作者:Haoming Jiang,Danqing Zhang,Tianyu Cao,Bing Yin,Tuo Zhao 备注:The 59th Annual Meeting of the Association for Computational Linguistics (ACL 2021) 链接:https://arxiv.org/abs/2106.08977 摘要:弱监督在命名实体识别(NER)等许多自然语言处理任务中显示出了良好的效果。现有工作主要关注仅利用弱监督(即不使用任何人工标注)来训练深度NER模型,并表明仅使用弱标注数据即可获得不错的性能,但仍不及使用人工/强标注数据的全监督NER。在本文中,我们考虑一个更实际的场景:同时拥有少量强标注数据和大量弱标注数据。不幸的是,我们观察到,当在强标注与弱标注数据的简单混合或加权组合上训练深度NER模型时,弱标注数据不一定能改善模型性能,甚至会使其恶化(原因是弱标签中存在大量噪声)。为了解决这个问题,我们提出了一个新的多阶段计算框架NEEDLE,它包含三个基本要素:(1)弱标签补全;(2)噪声感知损失函数;(3)在强标注数据上的最终微调。通过在电子商务查询NER和生物医学NER上的实验,我们证明了NEEDLE能够有效抑制弱标签噪声,并优于现有方法。特别地,我们在3个生物医学NER数据集上取得了新的SOTA F1分数:BC5CDR-chem 93.74、BC5CDR-disease 90.69、NCBI-disease 92.28。 摘要:Weak supervision has shown promising results in many natural language processing tasks, such as Named Entity Recognition (NER). Existing work mainly focuses on learning deep NER models only with weak supervision, i.e., without any human annotation, and shows that by merely using weakly labeled data, one can achieve good performance, though still underperforms fully supervised NER with manually/strongly labeled data. In this paper, we consider a more practical scenario, where we have both a small amount of strongly labeled data and a large amount of weakly labeled data. Unfortunately, we observe that weakly labeled data does not necessarily improve, or even deteriorate the model performance (due to the extensive noise in the weak labels) when we train deep NER models over a simple or weighted combination of the strongly labeled and weakly labeled data. To address this issue, we propose a new multi-stage computational framework -- NEEDLE with three essential ingredients: (1) weak label completion, (2) noise-aware loss function, and (3) final fine-tuning over the strongly labeled data. Through experiments on E-commerce query NER and Biomedical NER, we demonstrate that NEEDLE can effectively suppress the noise of the weak labels and outperforms existing methods. In particular, we achieve new SOTA F1-scores on 3 Biomedical NER datasets: BC5CDR-chem 93.74, BC5CDR-disease 90.69, NCBI-disease 92.28.

【6】 A Predictive Coding Account for Chaotic Itinerancy 标题:混沌巡游的一种预测编码解释

作者:Louis Annabi,Alexandre Pitti,Mathias Quoy 链接:https://arxiv.org/abs/2106.08937 摘要:混沌巡游是动力系统中一种允许在多个稳定行为之间自主切换的现象,已经引起了神经机器人学研究的兴趣。在这项研究中,我们展示了一个实现预测编码的递归神经网络如何在存在输入噪声的情况下产生类似于混沌巡游的神经轨迹,从而把这一现象与预测编码理论联系起来。我们提出了两种利用该模型生成随机且与历史无关的吸引子切换轨迹的方案。 摘要:As a phenomenon in dynamical systems allowing autonomous switching between stable behaviors, chaotic itinerancy has gained interest in neurorobotics research. In this study, we draw a connection between this phenomenon and the predictive coding theory by showing how a recurrent neural network implementing predictive coding can generate neural trajectories similar to chaotic itinerancy in the presence of input noise. We propose two scenarios generating random and past-independent attractor switching trajectories using our model.

【7】 Development of Quantized DNN Library for Exact Hardware Emulation 标题:面向精确硬件仿真的量化DNN库的开发

作者:Masato Kiyama,Motoki Amagasaki,Masahiro Iida 链接:https://arxiv.org/abs/2106.08892 摘要:在AI芯片等边缘设备上运行深度神经网络(DNN)时,量化可以加快执行速度并节省功耗。为了研究量化的效果,我们需要先把32位浮点精度的DNN权值量化到某个位宽,再反量化回32位浮点精度后执行推理,这是因为DNN库只能处理浮点数。然而,这种仿真并不能提供与硬件完全一致的精度,而我们恰恰需要这种精确性来检测MAC运算中的溢出或验证边缘设备上的运算。我们开发了PyParch,这是一个以与硬件完全相同的行为执行量化DNN(QNN)的DNN库。在本文中,我们描述了PyParch的新设计方案和实现。评估结果表明,对于像YOLOv5这样大型且复杂的DNN,可以估计任意位宽QNN的精度,并能检测到溢出。我们评估了仿真时间的开销,发现与正常DNN执行时间相比,QNN和带溢出检测的QNN分别慢5.6倍和42倍。 摘要:Quantization is used to speed up execution time and save power when running Deep neural networks (DNNs) on edge devices like AI chips. To investigate the effect of quantization, we need performing inference after quantizing the weights of DNN with 32-bit floating-point precision by a some bit width, and then quantizing them back to 32-bit floating-point precision. This is because the DNN library can only handle floating-point numbers. However, the accuracy of the emulation does not provide accurate precision. We need accurate precision to detect overflow in MAC operations or to verify the operation on edge devices. We have developed PyParch, a DNN library that executes quantized DNNs (QNNs) with exactly the same behavior as hardware. In this paper, we describe a new proposal and implementation of PyParch. As a result of the evaluation, the accuracy of QNNs with arbitrary bit widths can be estimated for large and complex DNNs such as YOLOv5, and the overflow can be detected. We evaluated the overhead of the emulation time and found that it was 5.6 times slower for QNN and 42 times slower for QNN with overflow detection compared to the normal DNN execution time.
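
摘要中提到的常规做法是把权重量化到目标位宽后再反量化回32位浮点,交给只支持浮点的DNN库执行;下面给出这种“伪量化”仿真的一个常见实现示意(对称均匀量化,仅为说明概念,并非PyParch本身)。

```python
import numpy as np

def fake_quantize(w, bits=8):
    """把float32权重量化到给定位宽再反量化回float32,用于在浮点DNN库中近似量化效果。"""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax + 1e-12              # 对称均匀量化的缩放因子
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)   # 落在整数网格上的量化值
    return (q * scale).astype(np.float32)               # 反量化,供浮点库继续执行

w_q = fake_quantize(np.random.randn(64, 64).astype(np.float32), bits=6)
```

正如摘要所指出的,这种浮点仿真与真实硬件行为并不完全一致(例如无法暴露MAC溢出),这正是PyParch试图解决的问题。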

【8】 ModelDiff: Testing-Based DNN Similarity Comparison for Model Reuse Detection 标题:ModelDiff:基于测试的DNN相似度比较模型重用检测

作者:Yuanchun Li,Ziqi Zhang,Bingyan Liu,Ziyue Yang,Yunxin Liu 备注:ISSTA 2021 链接:https://arxiv.org/abs/2106.08890 摘要:深度学习模型的知识可能被迁移到学生模型中,从而导致知识产权侵权或漏洞传播。检测这种知识重用并非易事,因为可疑模型可能无法以白盒方式访问,且/或可能服务于不同的任务。本文提出了一种基于测试的深度学习模型相似性比较方法ModelDiff。我们不是直接比较两个模型的权重、激活或输出,而是在同一组测试输入上比较它们的行为模式。具体来说,模型的行为模式被表示为决策距离向量(DDV),其中每个元素是模型对一对输入的反应之间的距离。两个模型之间的知识相似度用其DDV之间的余弦相似度来度量。为了评估ModelDiff,我们创建了一个包含144对模型的基准,这些模型涵盖了最流行的模型重用方法,包括迁移学习、模型压缩和模型窃取。我们的方法在该基准上达到了91.7%的正确率,证明了使用ModelDiff进行模型重用检测的有效性。一项针对移动端深度学习应用的研究表明,ModelDiff在真实世界的模型上是可行的。 摘要:The knowledge of a deep learning model may be transferred to a student model, leading to intellectual property infringement or vulnerability propagation. Detecting such knowledge reuse is nontrivial because the suspect models may not be white-box accessible and/or may serve different tasks. In this paper, we propose ModelDiff, a testing-based approach to deep learning model similarity comparison. Instead of directly comparing the weights, activations, or outputs of two models, we compare their behavioral patterns on the same set of test inputs. Specifically, the behavioral pattern of a model is represented as a decision distance vector (DDV), in which each element is the distance between the model's reactions to a pair of inputs. The knowledge similarity between two models is measured with the cosine similarity between their DDVs. To evaluate ModelDiff, we created a benchmark that contains 144 pairs of models that cover most popular model reuse methods, including transfer learning, model compression, and model stealing. Our method achieved 91.7% correctness on the benchmark, which demonstrates the effectiveness of using ModelDiff for model reuse detection. A study on mobile deep learning apps has shown the feasibility of ModelDiff on real-world models.
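
下面的示意代码按摘要描述计算“决策距离向量(DDV)”并用其余弦相似度比较两个模型;其中输入两两配对的方式、单对距离的度量均为本文的简化假设,模型被假定为返回输出向量(如logits)的可调用对象。

```python
import numpy as np

def ddv(model, inputs):
    """对输入两两配对,取模型对每对输入反应之间的余弦距离,拼成决策距离向量。"""
    outs = np.stack([model(x) for x in inputs])                 # [N, D] 模型对每个输入的输出
    pairs = zip(outs[0::2], outs[1::2])                         # 简化假设:相邻样本配对
    return np.array([1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
                     for a, b in pairs])

def model_similarity(model_a, model_b, inputs):
    v1, v2 = ddv(model_a, inputs), ddv(model_b, inputs)
    # 两个模型的知识相似度 = 各自DDV之间的余弦相似度
    return float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8))
```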

【9】 Bandit Modeling of Map Selection in Counter-Strike: Global Offensive 标题:《反恐精英:全球攻势》中地图选择的多臂老虎机建模

作者:Guido Petri,Michael H. Stanley,Alec B. Hon,Alexander Dong,Peter Xenopoulos,Cláudio Silva 备注:6 pages, 3 figures, IJCAI-AISA 2021 链接:https://arxiv.org/abs/2106.08888 摘要:许多电子竞技项目在比赛开始前都会通过选取与禁用(pick and ban)流程来确定比赛的设置。在《反恐精英:全球攻势》(CSGO)比赛中,两支队伍首先对地图(即对战的虚拟场景)进行选取和禁用。队伍通常基于多种因素进行选禁,例如禁用自己不练习的地图,或根据近期表现来选取地图。我们引入一个上下文老虎机(contextual bandit)框架来处理CSGO中的地图选择问题,并研究队伍的选禁决策。基于包含3500多场CSGO比赛和25000多次地图选择决策的数据集,我们考察了该问题的不同建模方式、不同的上下文和不同的奖励指标。我们发现队伍在选取和禁用两方面的地图选择策略都是次优的。我们还定义了一种为禁用(ban)赋予奖励的方法(这在老虎机设定中尚未被探索),并发现加入禁用奖励可以提高模型性能。最后,我们测算出使用我们的模型可将队伍的预测地图胜率提高多达11%,并使实力相当的队伍的整体比赛胜率提高19.8%。 摘要:Many esports use a pick and ban process to define the parameters of a match before it starts. In Counter-Strike: Global Offensive (CSGO) matches, two teams first pick and ban maps, or virtual worlds, to play. Teams typically ban and pick maps based on a variety of factors, such as banning maps which they do not practice, or choosing maps based on the team's recent performance. We introduce a contextual bandit framework to tackle the problem of map selection in CSGO and to investigate teams' pick and ban decision-making. Using a data set of over 3,500 CSGO matches and over 25,000 map selection decisions, we consider different framings for the problem, different contexts, and different reward metrics. We find that teams have suboptimal map choice policies with respect to both picking and banning. We also define an approach for rewarding bans, which has not been explored in the bandit setting, and find that incorporating ban rewards improves model performance. Finally, we determine that usage of our model could improve teams' predicted map win probability by up to 11% and raise overall match win probabilities by 19.8% for evenly-matched teams.
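
为说明“上下文老虎机”在地图选禁中的基本形态,下面给出一个ε-贪心的最小骨架;上下文与奖励如何定义(如近期战绩、地图胜负)需按论文设定,这里的离散化上下文与更新规则只是示意性假设。

```python
import numpy as np

class EpsGreedyMapPicker:
    """极简的 ε-贪心上下文老虎机:把上下文离散化为键,估计各地图的平均回报。"""
    def __init__(self, maps, eps=0.1):
        self.maps, self.eps = maps, eps
        self.counts = {}   # (context, map) -> 选择次数
        self.values = {}   # (context, map) -> 平均回报估计

    def pick(self, context):
        if np.random.rand() < self.eps:                      # 以小概率探索
            return np.random.choice(self.maps)
        scores = [self.values.get((context, m), 0.0) for m in self.maps]
        return self.maps[int(np.argmax(scores))]             # 否则选估计回报最高的地图

    def update(self, context, chosen_map, reward):
        key = (context, chosen_map)
        n = self.counts.get(key, 0) + 1
        self.counts[key] = n
        old = self.values.get(key, 0.0)
        self.values[key] = old + (reward - old) / n          # 增量式均值更新

picker = EpsGreedyMapPicker(["mirage", "inferno", "dust2"])
picker.update("vs_team_A", "mirage", reward=1.0)             # 例如:赢下该地图记回报1
```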

【10】 Grounding Spatio-Temporal Language with Transformers 标题:用Transformer将时空语言接地

作者:Tristan Karch,Laetitia Teodorescu,Katja Hofmann,Clément Moulin-Frier,Pierre-Yves Oudeyer 备注:Contains main article and supplementaries 链接:https://arxiv.org/abs/2106.08858 摘要:语言是与外部世界的接口。为了让具身代理使用它,语言必须建立在其他感觉运动模式的基础上。虽然有大量的文献研究机器如何学习扎根语言,但如何学习时空语言概念的主题仍然是个未知数。为了在这个方向上取得进展,我们在这里介绍了一个新的时空语言基础任务,其目标是学习具体化代理的行为轨迹的时空描述的意义。这是通过训练真值函数来实现的,真值函数可以预测描述是否与给定的观测历史相匹配。描述包括过去时态和现在时态的时间扩展谓词,以及对场景中对象的时空引用。为了研究架构偏差在这个任务中的作用,我们训练了几个模型,包括多模态Transformer架构;后者实现了跨空间和跨时间的单词和对象之间不同的注意力计算。我们测试了两类泛化模型:1)对随机出现的句子进行泛化;2) 语法原语的泛化。我们观察到,在我们的Transformer的注意力计算中保持对象的同一性有助于总体上获得良好的泛化性能,并且在单个标记中总结对象轨迹对性能的影响很小。然后,我们将讨论如何为语言引导的自主体现代理打开新的视角。我们还发布了开放源码许可证下的代码,以及预先训练的模型和数据集,以鼓励更广泛的社区在未来建立和扩展我们的工作。 摘要:Language is an interface to the outside world. In order for embodied agents to use it, language must be grounded in other, sensorimotor modalities. While there is an extended literature studying how machines can learn grounded language, the topic of how to learn spatio-temporal linguistic concepts is still largely uncharted. To make progress in this direction, we here introduce a novel spatio-temporal language grounding task where the goal is to learn the meaning of spatio-temporal descriptions of behavioral traces of an embodied agent. This is achieved by training a truth function that predicts if a description matches a given history of observations. The descriptions involve time-extended predicates in past and present tense as well as spatio-temporal references to objects in the scene. To study the role of architectural biases in this task, we train several models including multimodal Transformer architectures; the latter implement different attention computations between words and objects across space and time. We test models on two classes of generalization: 1) generalization to randomly held-out sentences; 2) generalization to grammar primitives. We observe that maintaining object identity in the attention computation of our Transformers is instrumental to achieving good performance on generalization overall, and that summarizing object traces in a single token has little influence on performance. We then discuss how this opens new perspectives for language-guided autonomous embodied agents. We also release our code under open-source license as well as pretrained models and datasets to encourage the wider community to build upon and extend our work in the future.

【11】 Beyond Tikhonov: Faster Learning with Self-Concordant Losses via Iterative Regularization 标题:超越Tikhonov:通过迭代正则化在具有自协调损失的情况下更快地学习

作者:Gaspard Beugnot,Julien Mairal,Alessandro Rudi 链接:https://arxiv.org/abs/2106.08855 摘要:谱滤波理论是理解核学习统计特性的重要工具。对于最小二乘法,它允许导出各种正则化方案,这些方案产生比Tikhonov正则化更快的超额风险收敛速度。这通常是通过利用称为源和容量条件的经典假设来实现的,这些假设表征了学习任务的难度。为了理解从其他损失函数导出的估计量,Marteau-Ferey等人将Tikhonov正则化理论推广到广义自洽损失函数(GSC),其中包含例如logistic损失。本文进一步证明了迭代Tikhonov正则化方法能使GSC达到快速最优速率,它与优化中的近点法有着内在的联系,克服了经典Tikhonov正则化方法的局限性。 摘要:The theory of spectral filtering is a remarkable tool to understand the statistical properties of learning with kernels. For least squares, it allows to derive various regularization schemes that yield faster convergence rates of the excess risk than with Tikhonov regularization. This is typically achieved by leveraging classical assumptions called source and capacity conditions, which characterize the difficulty of the learning task. In order to understand estimators derived from other loss functions, Marteau-Ferey et al. have extended the theory of Tikhonov regularization to generalized self concordant loss functions (GSC), which contain, e.g., the logistic loss. In this paper, we go a step further and show that fast and optimal rates can be achieved for GSC by using the iterated Tikhonov regularization scheme, which is intrinsically related to the proximal point method in optimization, and overcomes the limitation of the classical Tikhonov regularization.
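
为帮助理解摘要中“迭代Tikhonov正则化与近端点方法的内在联系”,下面给出该迭代格式的一个常见示意写法(记号为通用约定,属本文附加说明,具体形式以原文为准):

```latex
% 迭代Tikhonov(近端点式)格式:从 \hat{f}_0 = 0 出发,每一步都把解“拉向”上一步的结果
\hat{f}_{t} = \arg\min_{f \in \mathcal{H}}
  \frac{1}{n}\sum_{i=1}^{n} \ell\big(y_i,\, f(x_i)\big)
  + \lambda \,\lVert f - \hat{f}_{t-1} \rVert_{\mathcal{H}}^{2},
\qquad t = 1, \dots, T.
```

当 T=1 且 \hat{f}_0 = 0 时,该格式退化为经典的Tikhonov正则化;按摘要所述,正是多次迭代使其能突破经典方案的速率限制。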

【12】 Algorithm to Compilation Codesign: An Integrated View of Neural Network Sparsity 标题:算法与编译的协同设计:神经网络稀疏性的整体视角

作者:Fu-Ming Guo,Austin Huang 链接:https://arxiv.org/abs/2106.08846 摘要:降低神经网络的计算开销、推理延迟和内存占用是剪枝与稀疏化研究经常引用的动机。然而,如何把这些收益真正落地,以及算法设计和正则化对运行时执行的端到端影响,通常没有得到深入研究。在这里,我们将结构化和非结构化剪枝应用于BERT语言模型Transformer块的注意力权重,同时扩展了TVM编译器中的块稀疏表示(BSR)算子。BSR算子的集成使得TVM运行时执行能够利用由模型正则化引起的结构化模式稀疏性。这种对剪枝算法的整体视角使我们能够研究建模决策与其对稀疏加速执行的直接影响之间的关系。我们的主要发现是:1)我们验证了结构化块稀疏正则化的性能收益必须借助对TVM的BSR扩展才能实现,相对于原生PyTorch可获得4倍加速,相对于标准TVM编译(无扩展BSR支持)可获得2.2倍加速;2)对于BERT注意力权重,在该CPU推理场景下,端到端最优的块稀疏形状不是方形块(如\cite{gray2017gpu}中那样),而是线性的32x1块;3)性能与块大小/形状之间的关系暗示了模型正则化参数如何与任务调度器优化相互作用,从而产生所观察到的端到端性能。 摘要:Reducing computation cost, inference latency, and memory footprint of neural networks are frequently cited as research motivations for pruning and sparsity. However, operationalizing those benefits and understanding the end-to-end effect of algorithm design and regularization on the runtime execution is not often examined in depth. Here we apply structured and unstructured pruning to attention weights of transformer blocks of the BERT language model, while also expanding block sparse representation (BSR) operations in the TVM compiler. Integration of BSR operations enables the TVM runtime execution to leverage structured pattern sparsity induced by model regularization. This integrated view of pruning algorithms enables us to study relationships between modeling decisions and their direct impact on sparsity-enhanced execution. Our main findings are: 1) we validate that performance benefits of structured sparsity block regularization must be enabled by the BSR augmentations to TVM, with 4x speedup relative to vanilla PyTorch and 2.2x speedup relative to standard TVM compilation (without expanded BSR support). 2) for BERT attention weights, the end-to-end optimal block sparsity shape in this CPU inference context is not a square block (as in \cite{gray2017gpu}) but rather a linear 32x1 block 3) the relationship between performance and block size / shape is suggestive of how model regularization parameters interact with task scheduler optimizations resulting in the observed end-to-end performance.
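
围绕摘要中“线性32x1块优于方形块”的发现,下面给出一个按32x1块做幅值剪枝、从而产生BSR友好稀疏模式的示意函数;按块L2范数保留的剪枝准则是常见做法的假设,并非论文实际采用的正则化方法。

```python
import numpy as np

def prune_blocks_32x1(weight, sparsity=0.9, block=(32, 1)):
    """按给定块形状(默认32x1)划分权重,保留块范数较大的部分,其余整块置零。"""
    bh, bw = block
    h, w = weight.shape
    assert h % bh == 0 and w % bw == 0
    blocks = weight.reshape(h // bh, bh, w // bw, bw)
    norms = np.sqrt((blocks ** 2).sum(axis=(1, 3)))          # 每个块的L2范数
    thresh = np.quantile(norms, sparsity)                    # 丢弃范数最小的 sparsity 比例的块
    mask = (norms > thresh)[:, None, :, None]
    return (blocks * mask).reshape(h, w)                     # 结构化稀疏后的权重,便于BSR存储

pruned = prune_blocks_32x1(np.random.randn(768, 768), sparsity=0.9)
```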

【13】 Costs and Benefits of Wasserstein Fair Regression 标题:瓦瑟斯坦公平回归的成本与收益

作者:Han Zhao 链接:https://arxiv.org/abs/2106.08812 摘要:在现实世界中,机器学习工具在高风险领域的应用通常被规定为公平的,即预测的目标应该满足一些关于受保护属性的等价性的定量概念。然而,公平性和准确性与实值目标之间的确切权衡尚不清楚。在本文中,我们通过提供一个关于任何公平回归器误差的下界来描述回归设置中统计奇偶性和准确性之间的内在权衡。我们的下界是尖锐的,算法无关,并允许一个简单的解释:当目标的时刻不同的群体,任何公平的算法必须作出一个大的错误,至少一个群体。我们进一步扩展了这个结果,给出了任意(近似)公平算法联合误差的一个下界,利用Wasserstein距离来度量近似的质量。另一方面,我们建立了个体公平性、准确性平价和瓦塞尔斯坦距离之间的第一个联系,证明了如果一个回归变量是个体公平的,它也近似地验证了准确性平价,其中差距由两组之间的瓦塞尔斯坦距离给出。受理论结果的启发,我们从表征学习的角度提出了一种公平回归的实用算法,并在真实数据集上进行了实验验证。 摘要:Real-world applications of machine learning tools in high-stakes domains are often regulated to be fair, in the sense that the predicted target should satisfy some quantitative notion of parity with respect to a protected attribute. However, the exact tradeoff between fairness and accuracy with a real-valued target is not clear. In this paper, we characterize the inherent tradeoff between statistical parity and accuracy in the regression setting by providing a lower bound on the error of any fair regressor. Our lower bound is sharp, algorithm-independent, and admits a simple interpretation: when the moments of the target differ between groups, any fair algorithm has to make a large error on at least one of the groups. We further extend this result to give a lower bound on the joint error of any (approximately) fair algorithm, using the Wasserstein distance to measure the quality of the approximation. On the upside, we establish the first connection between individual fairness, accuracy parity, and the Wasserstein distance by showing that if a regressor is individually fair, it also approximately verifies the accuracy parity, where the gap is given by the Wasserstein distance between the two groups. Inspired by our theoretical results, we develop a practical algorithm for fair regression through the lens of representation learning, and conduct experiments on a real-world dataset to corroborate our findings.

【14】 PRASEMap: A Probabilistic Reasoning and Semantic Embedding based Knowledge Graph Alignment System 标题:PRASEMap:一种基于概率推理和语义嵌入的知识图对齐系统

作者:Zhiyuan Qi,Ziheng Zhang,Jiaoyan Chen,Xi Chen,Yefeng Zheng 链接:https://arxiv.org/abs/2106.08801 摘要:知识图对齐的目的是寻找两个知识图之间的等价实体和关系(即映射)。现有的方法要么利用基于推理的技术,要么利用基于语义嵌入的技术,但很少有研究探讨它们的结合。在这个演示中,我们介绍了PRASEMap,一个无监督的KG对齐系统,它使用概率推理(PR)和语义嵌入(SE)技术迭代计算映射。PRASEMap可以作为SE模块支持各种基于嵌入的KG对齐方法,并且支持简单的人机交互,这还为用户提供了一个选项,可以将映射注释反馈给系统以获得更好的结果。该演示通过一个具有用户友好界面的独立Web应用程序展示了这些功能。 摘要:Knowledge Graph (KG) alignment aims at finding equivalent entities and relations (i.e., mappings) between two KGs. The existing approaches utilize either reasoning-based or semantic embedding-based techniques, but few studies explore their combination. In this demonstration, we present PRASEMap, an unsupervised KG alignment system that iteratively computes the Mappings with both Probabilistic Reasoning (PR) And Semantic Embedding (SE) techniques. PRASEMap can support various embedding-based KG alignment approaches as the SE module, and enables easy human computer interaction that additionally provides an option for users to feed the mapping annotations back to the system for better results. The demonstration showcases these features via a stand-alone Web application with user friendly interfaces.

【15】 Unsupervised Person Re-identification via Multi-Label Prediction and Classification based on Graph-Structural Insight 标题:基于图结构洞察力的多标签预测和分类的无监督人员再识别

作者:Jongmin Yu,Hyeontaek Oh 备注:submitted to ICCV 链接:https://arxiv.org/abs/2106.08798 摘要:提出了一种基于图结构洞察的多标签预测和分类的无监督人员再识别方法。我们的方法从人物图像中提取特征,并生成一个由特征和它们的成对相似性组成的图,分别作为节点和边。在图的基础上,提出了一种基于图结构的多标签预测(GSMLP)方法,该方法通过考虑节点间的成对相似性和相邻节点的分布来预测多标签。将GSMLP生成的多标签应用于所提出的选择性多标签分类(SMLC)算法。SMLC集成了硬样本挖掘方案和多标签分类。提出的GSMLP和SMLC算法在没有任何预标记数据集的情况下提高了无监督人员识别的性能。实验结果证明了该方法在无监督人员识别中的优越性。本文的源代码可在'https://github.com/uknownpioneer/GSMLP-SMLC.git'. 摘要:This paper addresses unsupervised person re-identification (Re-ID) using multi-label prediction and classification based on graph-structural insight. Our method extracts features from person images and produces a graph that consists of the features and a pairwise similarity of them as nodes and edges, respectively. Based on the graph, the proposed graph structure based multi-label prediction (GSMLP) method predicts multi-labels by considering the pairwise similarity and the adjacency node distribution of each node. The multi-labels created by GSMLP are applied to the proposed selective multi-label classification (SMLC) loss. SMLC integrates a hard-sample mining scheme and a multi-label classification. The proposed GSMLP and SMLC boost the performance of unsupervised person Re-ID without any pre-labelled dataset. Experimental results justify the superiority of the proposed method in unsupervised person Re-ID by producing state-of-the-art performance. The source code for this paper is publicly available on 'https://github.com/uknownpioneer/GSMLP-SMLC.git'.

【16】 Optical Tactile Sim-to-Real Policy Transfer via Real-to-Sim Tactile Image Translation 标题:通过真实到模拟触觉图像转换的光学触觉模拟到真实策略传输

作者:Alex Church,John Lloyd,Raia Hadsell,Nathan F. Lepora 链接:https://arxiv.org/abs/2106.08796 摘要:为了安全有效地从视觉和本体感知输入中获取通用和复杂的控制策略,仿真成为深度强化学习的关键。触觉信息虽然与环境相互作用有直接关系,但通常不被考虑。在这项工作中,我们提出了一套针对触觉机器人和强化学习的模拟环境。提出了一种简单、快速的光学触觉传感器模拟方法,将高分辨率的接触几何图形表示为深度图像。最近策略优化(PPO)用于学习所有考虑任务的成功策略。数据驱动的方法可以将真实触觉传感器的当前状态转换为相应的模拟深度图像。该策略在一个物理机器人的实时控制回路中实现,以演示在多个需要触觉的物理交互任务上的零拍sim-to-real策略转换。 摘要:Simulation has recently become key for deep reinforcement learning to safely and efficiently acquire general and complex control policies from visual and proprioceptive inputs. Tactile information is not usually considered despite its direct relation to environment interaction. In this work, we present a suite of simulated environments tailored towards tactile robotics and reinforcement learning. A simple and fast method of simulating optical tactile sensors is provided, where high-resolution contact geometry is represented as depth images. Proximal Policy Optimisation (PPO) is used to learn successful policies across all considered tasks. A data-driven approach enables translation of the current state of a real tactile sensor to corresponding simulated depth images. This policy is implemented within a real-time control loop on a physical robot to demonstrate zero-shot sim-to-real policy transfer on several physically-interactive tasks requiring a sense of touch.

【17】 SEOVER: Sentence-level Emotion Orientation Vector based Conversation Emotion Recognition Model 标题:SEOVER:基于语句级情感倾向向量的会话情感识别模型

作者:Zaijing Li,Fengxiao Tang,Tieyu Sun,Yusen Zhu,Ming Zhao 链接:https://arxiv.org/abs/2106.08785 摘要:在会话情感识别方面,近年来的研究主要集中在说话人关系的建模上,而忽略了话语情感倾向的作用,本文提出了一种新的句子级情感倾向向量表达范式来模拟句子向量间情感的潜在相关性。在此基础上,我们设计了一个情感识别模型,该模型从语言模型中提取句子级情感定向向量,并从对话情感分析模型中联合学习,提取句子级情感定向向量来识别说话人在会话中的情感定向。我们在两个基准数据集上进行了实验,并与五个基准模型进行了比较,实验结果表明我们的模型在所有的数据集上都有较好的性能。 摘要:For the task of conversation emotion recognition, recent works focus on speaker relationship modeling but ignore the role of utterance's emotional tendency.In this paper, we propose a new expression paradigm of sentence-level emotion orientation vector to model the potential correlation of emotions between sentence vectors. Based on it, we design an emotion recognition model, which extracts the sentence-level emotion orientation vectors from the language model and jointly learns from the dialogue sentiment analysis model and extracted sentence-level emotion orientation vectors to identify the speaker's emotional orientation during the conversation. We conduct experiments on two benchmark datasets and compare them with the five baseline models.The experimental results show that our model has better performance on all data sets.

【18】 Reinforcement Learning for Markovian Bandits: Is Posterior Sampling more Scalable than Optimism? 标题:马尔可夫多臂老虎机的强化学习:后验采样比乐观方法更具可扩展性吗?

作者:Nicolas Gast,Bruno Gaujal,Kimang Khun 链接:https://arxiv.org/abs/2106.08771 摘要:我们研究了带折扣的经典马尔可夫老虎机问题的学习算法。我们解释了如何改造PSRL[24]和UCRL2[2]以利用该问题的结构,得到的变体分别称为MB-PSRL和MB-UCRL2。虽然PSRL和UCRL2的朴素实现的遗憾界和运行时间都随老虎机数量呈指数增长,但我们证明MB-PSRL和MB-UCRL2的分幕式遗憾为$\tilde{O}(S\sqrt{nK})$,其中K是幕数,n是老虎机数量,S是每个老虎机的状态数(关于S、n和K的精确界在文中给出)。除相差一个$\sqrt{S}$因子外,这与我们在文中推导出的$\Omega(\sqrt{SnK})$下界相匹配。MB-PSRL的计算效率也很高:其运行时间与老虎机数量成线性关系。我们进一步证明,将UCRL2或UCBVI等经典非贝叶斯算法改造用于马尔可夫老虎机问题无法达到这种线性运行时间。最后,我们进行了数值实验,证实了MB-PSRL在遗憾和计算时间两方面都优于其他现有算法。 摘要:We study learning algorithms for the classical Markovian bandit problem with discount. We explain how to adapt PSRL [24] and UCRL2 [2] to exploit the problem structure. These variants are called MB-PSRL and MB-UCRL2. While the regret bound and runtime of vanilla implementations of PSRL and UCRL2 are exponential in the number of bandits, we show that the episodic regret of MB-PSRL and MB-UCRL2 is $\tilde{O}(S\sqrt{nK})$ where K is the number of episodes, n is the number of bandits and S is the number of states of each bandit (the exact bound in S, n and K is given in the paper). Up to a factor $\sqrt{S}$, this matches the lower bound of $\Omega(\sqrt{SnK})$ that we also derive in the paper. MB-PSRL is also computationally efficient: its runtime is linear in the number of bandits. We further show that this linear runtime cannot be achieved by adapting classical non-Bayesian algorithms such as UCRL2 or UCBVI to Markovian bandit problems. Finally, we perform numerical experiments that confirm that MB-PSRL outperforms other existing algorithms in practice, both in terms of regret and of computation time.

【19】 Knowledge-Adaptation Priors 标题:知识适应优先

作者:Mohammad Emtiyaz Khan,Siddharth Swaroop 链接:https://arxiv.org/abs/2106.08769 摘要:人类和动物有一种快速适应环境的自然能力,但机器学习模型在受到变化影响时,往往需要从头开始进行全面的再训练。我们提出知识适应先验(K-priors)来降低再训练的成本,通过快速准确地适应各种各样的任务和模型。这是通过结合权值和函数空间先验来重建过去的梯度而实现的,它恢复和概括了许多现有的,但似乎不相关的适应策略。用简单的一阶梯度方法进行训练,通常可以通过选择足够大的内存来恢复精确的再训练模型到任意精度。实证结果证实,适应可以是廉价和准确的,是一个有希望的替代再训练。 摘要:Humans and animals have a natural ability to quickly adapt to their surroundings, but machine-learning models, when subjected to changes, often require a complete retraining from scratch. We present Knowledge-adaptation priors (K-priors) to reduce the cost of retraining by enabling quick and accurate adaptation for a wide-variety of tasks and models. This is made possible by a combination of weight and function-space priors to reconstruct the gradients of the past, which recovers and generalizes many existing, but seemingly-unrelated, adaptation strategies. Training with simple first-order gradient methods can often recover the exact retrained model to an arbitrary accuracy by choosing a sufficiently large memory of the past data. Empirical results confirm that the adaptation can be cheap and accurate, and a promising alternative to retraining.

【20】 Towards Optimally Weighted Physics-Informed Neural Networks in Ocean Modelling 标题:海洋模拟中的最优加权物理信息神经网络

作者:Taco de Wolff,Hugo Carrillo,Luis Mart{í},Nayat Sanchez-Pi 链接:https://arxiv.org/abs/2106.08747 摘要:世界海洋的碳泵在地球生物圈和气候中起着至关重要的作用,促使人们更好地了解海洋在气候变化分析中的作用和影响。开发能够捕捉洋流和温度流复杂性的模型需要最先进的技术。这项工作探讨了利用物理信息神经网络(PINNs)求解与海洋建模有关的偏微分方程的好处;例如Burgers,wave和对流扩散方程。我们探讨在求解偏微分方程时,在PINNs中使用数据与物理模型的权衡。PINNs解释了对物理定律的偏离,以提高学习和泛化能力。我们观察了损失函数中数据和物理模型之间的相对权重如何影响训练结果,其中小数据集从添加的物理信息中受益更多。 摘要:The carbon pump of the world's ocean plays a vital role in the biosphere and climate of the earth, urging improved understanding of the functions and influences of the ocean for climate change analyses. State-of-the-art techniques are required to develop models that can capture the complexity of ocean currents and temperature flows. This work explores the benefits of using physics-informed neural networks (PINNs) for solving partial differential equations related to ocean modeling; such as the Burgers, wave, and advection-diffusion equations. We explore the trade-offs of using data vs. physical models in PINNs for solving partial differential equations. PINNs account for the deviation from physical laws in order to improve learning and generalization. We observed how the relative weight between the data and physical model in the loss function influence training results, where small data sets benefit more from the added physics information.
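
下面以一维平流方程为例,示意PINN损失中“数据项与物理残差项加权组合”的写法;方程、网络结构、观测数据与权重取值均为演示用假设,并非论文中的海洋建模设置。

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
c, w_data, w_phys = 1.0, 1.0, 0.1          # 平流速度与两项损失的权重(取值仅为示意)

x_d = torch.rand(64, 2)                    # 假想的观测点,列为 (x, t)
u_d = torch.sin(x_d[:, :1])                # 假想的观测值,仅用于演示

for _ in range(200):
    xt = torch.rand(256, 2, requires_grad=True)                 # 物理残差的配点
    u = net(xt)
    grads = torch.autograd.grad(u.sum(), xt, create_graph=True)[0]
    residual = grads[:, 1:2] + c * grads[:, 0:1]                 # u_t + c * u_x = 0 的残差
    loss = w_data * ((net(x_d) - u_d) ** 2).mean() + w_phys * (residual ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```

调整 w_data 与 w_phys 的相对大小,就对应摘要中讨论的数据与物理模型之间的权衡。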

【21】 Real-time Attacks Against Deep Reinforcement Learning Policies 标题:针对深度强化学习策略的实时攻击

作者:Buse G. A. Tekgul,Shelly Wang,Samuel Marchal,N. Asokan 备注:10 pages, 3 figures 链接:https://arxiv.org/abs/2106.08746 摘要:最近的研究发现,深度强化学习(DRL)策略容易受到对抗性例子的攻击。这些攻击通过扰乱代理所观察到的环境状态来误导DRL代理的策略。它们在原则上是可行的,但太慢,无法实时愚弄DRL策略。我们提出了一种新的攻击愚弄DRL策略,它既有效又高效,可以实时挂载。我们利用普遍对抗摄动(UAP)方法来计算有效摄动,而不依赖于它们所应用的单个输入。通过使用Atari 2600对策的广泛评估,我们证明了我们的技术是有效的,因为它完全降低了确定性和随机策略的性能(高达100%,即使扰动的$l\infty$界小到0.005)。我们还证明了我们的攻击是有效的,平均在线计算开销为0.027ms。与具有不同DRL策略的代理的响应时间(平均0.6毫秒)相比,它更快,并且比以前的攻击(平均2.7毫秒)快得多。此外,我们证明了已知的防御对普遍扰动是无效的。我们提出了一种有效的检测技术,它可以为基于普遍扰动的攻击的鲁棒防御奠定基础。 摘要:Recent work has discovered that deep reinforcement learning (DRL) policies are vulnerable to adversarial examples. These attacks mislead the policy of DRL agents by perturbing the state of the environment observed by agents. They are feasible in principle but too slow to fool DRL policies in real time. We propose a new attack to fool DRL policies that is both effective and efficient enough to be mounted in real time. We utilize the Universal Adversarial Perturbation (UAP) method to compute effective perturbations independent of the individual inputs to which they are applied. Via an extensive evaluation using Atari 2600 games, we show that our technique is effective, as it fully degrades the performance of both deterministic and stochastic policies (up to 100%, even when the $l_\infty$ bound on the perturbation is as small as 0.005). We also show that our attack is efficient, incurring an online computational cost of 0.027ms on average. It is faster compared to the response time (0.6ms on average) of agents with different DRL policies, and considerably faster than prior attacks (2.7ms on average). Furthermore, we demonstrate that known defenses are ineffective against universal perturbations. We propose an effective detection technique which can form the basis for robust defenses against attacks based on universal perturbations.
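
下面给出一个计算通用对抗扰动(UAP)并施加 $l_\infty$ 约束的简化梯度上升草图,用以说明“同一个扰动作用于所有观测状态”的思路;优化目标与步骤均为示意假设,并非论文的原始算法。

```python
import torch

def compute_uap(policy, states, eps=0.005, steps=100, lr=1e-3):
    """policy: 将观测批次映射为动作logits的nn.Module(假设);states: [N, obs_dim]。
    返回一个满足 ||delta||_inf <= eps 的通用扰动,使策略偏离其原本选择的动作。"""
    delta = torch.zeros_like(states[0], requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    with torch.no_grad():
        clean_actions = policy(states).argmax(dim=-1)            # 无扰动时选择的动作
    for _ in range(steps):
        logits = policy(states + delta)                          # 同一个delta作用于所有状态
        loss = -torch.nn.functional.cross_entropy(logits, clean_actions)  # 最大化与原动作的偏离
        opt.zero_grad(); loss.backward(); opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)                              # 投影回 l_inf 球内
    return delta.detach()
```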

【22】 AMA-GCN: Adaptive Multi-layer Aggregation Graph Convolutional Network for Disease Prediction 标题:AMA-GCN:用于疾病预测的自适应多层聚集图卷积网络

作者:Hao Chen,Fuzhen Zhuang,Li Xiao,Ling Ma,Haiyan Liu,Ruifang Zhang,Huiqin Jiang,Qing He 链接:https://arxiv.org/abs/2106.08732 摘要:近年来,图卷积网络(GCNs)已被证明是计算机辅助诊断(CADx)的有力手段。这种方法需要构建一个总体图来聚集结构信息,其中图的邻接矩阵表示节点之间的关系。到目前为止,这种邻接矩阵通常是基于表型信息手工定义的。本文提出了一种编码器,该编码器根据表型测度的空间分布自动选择合适的表型测度,并利用文本相似度感知机制计算节点间的边缘权值。编码器可以利用对最终结果有积极影响的表型测度自动构建群体图,进一步实现多模态信息的融合。此外,提出了一种基于多层聚合机制的图卷积网络结构。该结构可以在抑制过平滑的同时获得深层结构信息,增加同类型节点之间的相似度。在两个数据库上的实验结果表明,该方法能显著提高孤独症谱系障碍和乳腺癌的诊断准确率,表明该方法在利用多模态数据进行疾病预测方面具有普遍性。 摘要:Recently, Graph Convolutional Networks (GCNs) have proven to be a powerful mean for Computer Aided Diagnosis (CADx). This approach requires building a population graph to aggregate structural information, where the graph adjacency matrix represents the relationship between nodes. Until now, this adjacency matrix is usually defined manually based on phenotypic information. In this paper, we propose an encoder that automatically selects the appropriate phenotypic measures according to their spatial distribution, and uses the text similarity awareness mechanism to calculate the edge weights between nodes. The encoder can automatically construct the population graph using phenotypic measures which have a positive impact on the final results, and further realizes the fusion of multimodal information. In addition, a novel graph convolution network architecture using multi-layer aggregation mechanism is proposed. The structure can obtain deep structure information while suppressing over-smooth, and increase the similarity between the same type of nodes. Experimental results on two databases show that our method can significantly improve the diagnostic accuracy for Autism spectrum disorder and breast cancer, indicating its universality in leveraging multimodal data for disease prediction.

【23】 Mobile Augmented Reality: User Interfaces, Frameworks, and Intelligence 标题:移动增强现实:用户界面、框架和智能

作者:Jacky Cao,Kit-Yung Lam,Lik-Hang Lee,Xiaoli Liu,Pan Hui,Xiang Su 备注:This work is currently under review in an international journal 链接:https://arxiv.org/abs/2106.08710 摘要:移动增强现实(MAR)将计算机生成的虚拟对象与移动设备的物理环境相结合。MAR系统使用户能够与MAR设备(如智能手机和头戴式可穿戴设备)进行交互,实现从物理世界到数字实体混合世界的无缝过渡。这些MAR系统通过使用MAR设备提供对数字内容的通用访问来支持用户体验。在过去的20年中,已经开发了许多MAR系统,然而,MAR框架的研究和设计还没有从以用户为中心的设计角度进行系统的回顾。本文介绍了对现有MAR框架的初步研究(计数:37),并通过自顶向下的方法进一步讨论了MAR的最新研究:1)MAR应用;2) 适应用户移动和环境的MAR可视化技术;3) 系统评估MAR框架,包括支持的平台和相应的功能,如跟踪、特征提取和感知能力;支持MAR系统中智能操作的底层机器学习方法。最后,我们总结了新兴研究领域的发展,当前的最新进展,并讨论了重要的开放挑战和可能的理论和技术方向。这项调查旨在使研究人员和MAR系统开发人员都受益。 摘要:Mobile Augmented Reality (MAR) integrates computer-generated virtual objects with physical environments for mobile devices. MAR systems enable users to interact with MAR devices, such as smartphones and head-worn wearables, and performs seamless transitions from the physical world to a mixed world with digital entities. These MAR systems support user experiences by using MAR devices to provide universal accessibility to digital contents. Over the past 20 years, a number of MAR systems have been developed, however, the studies and design of MAR frameworks have not yet been systematically reviewed from the perspective of user-centric design. This article presents the first effort of surveying existing MAR frameworks (count: 37) and further discusses the latest studies on MAR through a top-down approach: 1) MAR applications; 2) MAR visualisation techniques adaptive to user mobility and contexts; 3) systematic evaluation of MAR frameworks including supported platforms and corresponding features such as tracking, feature extraction plus sensing capabilities; and 4) underlying machine learning approaches supporting intelligent operations within MAR systems. Finally, we summarise the development of emerging research fields, current state-of-the-art, and discuss the important open challenges and possible theoretical and technical directions. This survey aims to benefit both researchers and MAR system developers alike.

【24】 Detecting message modification attacks on the CAN bus with Temporal Convolutional Networks 标题:利用时态卷积网络检测CAN总线上的消息修改攻击

作者:Irina Chiscop,András Gazdag,Joost Bosman,Gergely Biczók 链接:https://arxiv.org/abs/2106.08692 摘要:多次攻击表明车内网络存在可利用的漏洞。确保现代汽车控制器局域网(CAN)的安全已成为汽车制造商的一项必要任务。一些攻击可能会向CAN网络中注入大量的假消息;然而,这种攻击相对容易被发现。在更复杂的攻击中,原始消息会被修改,使检测成为更复杂的问题。本文提出了一种基于机器学习的CAN网络入侵检测方法。我们专注于检测消息修改攻击,它不会改变通信的定时模式。我们提出的基于时间卷积网络的解决方案可以学习can信号的正常行为,并将其与恶意信号区分开来。该方法在两个包含不同攻击类型的公共数据集中的多个CAN总线消息id上进行了评估。性能结果表明,我们的轻量级方法优于最先进的无监督学习方法,在广泛的场景中实现了相似或更好的准确性,假阳性率显著降低。 摘要:Multiple attacks have shown that in-vehicle networks have vulnerabilities which can be exploited. Securing the Controller Area Network (CAN) for modern vehicles has become a necessary task for car manufacturers. Some attacks inject potentially large amount of fake messages into the CAN network; however, such attacks are relatively easy to detect. In more sophisticated attacks, the original messages are modified, making the de- tection a more complex problem. In this paper, we present a novel machine learning based intrusion detection method for CAN networks. We focus on detecting message modification attacks, which do not change the timing patterns of communications. Our proposed temporal convolutional network-based solution can learn the normal behavior of CAN signals and differentiate them from malicious ones. The method is evaluated on multiple CAN-bus message IDs from two public datasets including different types of attacks. Performance results show that our lightweight approach compares favorably to the state-of-the-art unsupervised learning approach, achieving similar or better accuracy for a wide range of scenarios with a significantly lower false positive rate.

【25】 Evaluating Gender Bias in Hindi-English Machine Translation 标题:评价印英机器翻译中的性别偏见

作者:Gauri Gupta,Krithika Ramesh,Sanjay Singh 链接:https://arxiv.org/abs/2106.08680 摘要:随着语言模型在现实世界中的应用越来越广泛,解决语言模型输出的公平性问题显得尤为重要。这些语言模型的嵌入词表示往往隐含着不必要的联想,在模型中形成社会偏见。像印地语这样的性别化语言的性质,对量化和减少偏见提出了另一个问题,因为句子中的单词形式根据主题的性别而发生变化。此外,在印度语的测量和借记系统领域中所做的工作很少。在我们的工作中,我们试图评估和量化印地语-英语机器翻译系统中的性别偏见。基于印地语的语法考虑,我们实现了现有TGBI度量的一个修改版本。我们还比较和对比了预先训练的嵌入和机器翻译模型学习的嵌入的多个度量的偏差测量结果。 摘要:With language models being deployed increasingly in the real world, it is essential to address the issue of the fairness of their outputs. The word embedding representations of these language models often implicitly draw unwanted associations that form a social bias within the model. The nature of gendered languages like Hindi, poses an additional problem to the quantification and mitigation of bias, owing to the change in the form of the words in the sentence, based on the gender of the subject. Additionally, there is sparse work done in the realm of measuring and debiasing systems for Indic languages. In our work, we attempt to evaluate and quantify the gender bias within a Hindi-English machine translation system. We implement a modified version of the existing TGBI metric based on the grammatical considerations for Hindi. We also compare and contrast the resulting bias measurements across multiple metrics for pre-trained embeddings and the ones learned by our machine translation model.

【26】 The Difficulty of Novelty Detection in Open-World Physical Domains: An Application to Angry Birds 标题:开放世界物理领域中新颖性检测的难度--以“愤怒的小鸟”为例

作者:Vimukthini Pinto,Cheng Xue,Chathura Nagoda Gamage,Jochen Renz 链接:https://arxiv.org/abs/2106.08670 摘要:在开放的世界环境中发现和应对新情况是人类认知的一项关键能力。当前的人工智能(AI)研究人员致力于开发能够在开放世界环境中运行的系统。新颖性检测是人工智能系统的一项重要功能。在一个开放的世界里,新奇事物以各种形式出现,发现它们的难度也各不相同。因此,为了准确评估人工智能系统的检测能力,有必要研究新发现的难度。在这篇论文中,我们提出了一种基于定性物理的方法来量化开放世界物理领域的新颖性检测的难度。我们在一个流行的物理模拟游戏《愤怒的小鸟》中应用了我们的方法。我们在《愤怒的小鸟》中用不同新奇感的人类玩家做了一个实验来验证我们的方法。结果表明,计算出的难度值与人体运动员的检测难度相吻合。 摘要:Detecting and responding to novel situations in open-world environments is a key capability of human cognition. Current artificial intelligence (AI) researchers strive to develop systems that can perform in open-world environments. Novelty detection is an important ability of such AI systems. In an open-world, novelties appear in various forms and the difficulty to detect them varies. Therefore, to accurately evaluate the detection capability of AI systems, it is necessary to investigate the difficulty to detect novelties. In this paper, we propose a qualitative physics-based method to quantify the difficulty of novelty detection focusing on open-world physical domains. We apply our method in a popular physics simulation game, Angry Birds. We conduct an experiment with human players with different novelties in Angry Birds to validate our method. Results indicate that the calculated difficulty values are in line with the detection difficulty of the human players.

【27】 A Dataset-Level Geometric Framework for Ensemble Classifiers 标题:一种面向集成分类器的数据集级几何框架

作者:Shengli Wu,Weimin Ding 备注:number of pages: 32 number of figures: 2 链接:https://arxiv.org/abs/2106.08658 摘要:集成分类器在人工智能和机器学习领域已经得到了广泛的研究。多数投票和加权多数投票是集成学习中两种常用的组合方案。然而,对它们的理解充其量是不完整的,有些性质甚至被误解。本文在数据集层次的几何框架下,形式化地给出了这两种方案的一组性质。两个关键因素,每一个组件基分类器的性能和每一对组件分类器之间的不相似性是由相同的度量-欧氏距离来评估的。因此,集成成为一个确定性问题,集成的性能可以通过一个公式直接计算。我们证明了几个有趣的定理,并解释了它们对集合的含义。特别地,我们比较和比较了组件分类器数目对这两种集成方案的影响。在使用其他指标(如准确度)时,也进行了实证研究以验证理论结果。我们相信,本文的结果对于我们理解这两种组合方案的基本性质以及总体上集成分类器的原理是非常有用的。这些结果也有助于我们研究集成分类器中的一些问题,如集成性能预测、选择少量的基分类器以获得高效的集成。 摘要:Ensemble classifiers have been investigated by many in the artificial intelligence and machine learning community. Majority voting and weighted majority voting are two commonly used combination schemes in ensemble learning. However, understanding of them is incomplete at best, with some properties even misunderstood. In this paper, we present a group of properties of these two schemes formally under a dataset-level geometric framework. Two key factors, every component base classifier's performance and dissimilarity between each pair of component classifiers are evaluated by the same metric - the Euclidean distance. Consequently, ensembling becomes a deterministic problem and the performance of an ensemble can be calculated directly by a formula. We prove several theorems of interest and explain their implications for ensembles. In particular, we compare and contrast the effect of the number of component classifiers on these two types of ensemble schemes. Empirical investigation is also conducted to verify the theoretical results when other metrics such as accuracy are used. We believe that the results from this paper are very useful for us to understand the fundamental properties of these two combination schemes and the principles of ensemble classifiers in general. The results are also helpful for us to investigate some issues in ensemble classifiers, such as ensemble performance prediction, selecting a small number of base classifiers to obtain efficient and effective ensembles.

【28】 Maxmin-Fair Ranking: Individual Fairness under Group-Fairness Constraints 标题:最大最小公平排序:群体公平约束下的个人公平

作者:David Garcia-Soriano,Francesco Bonchi 备注:In proceedings of KDD 2021 链接:https://arxiv.org/abs/2106.08652 摘要:我们研究了一个新的排序公平性问题,目的是最小化在执行组公平性约束时引入的个体不公平性。我们的建议是植根于分配最大最小公平理论,它使用随机化来最大化最差个人的期望满意度。我们设计了一个精确的多项式时间算法来寻找一般搜索问题(包括但不限于排名)的最大最小公平分布,并证明了我们的算法可以产生排名,在满足给定的群公平约束的同时,确保给个体带来最大可能的值。 摘要:We study a novel problem of fairness in ranking aimed at minimizing the amount of individual unfairness introduced when enforcing group-fairness constraints. Our proposal is rooted in the distributional maxmin fairness theory, which uses randomization to maximize the expected satisfaction of the worst-off individuals. We devise an exact polynomial-time algorithm to find maxmin-fair distributions of general search problems (including, but not limited to, ranking), and show that our algorithm can produce rankings which, while satisfying the given group-fairness constraints, ensure that the maximum possible value is brought to individuals.

【29】 Topic Classification on Spoken Documents Using Deep Acoustic and Linguistic Features 标题:基于深度声学和语言特征的口语文档主题分类

作者:Tan Liu,Wu Guo,Bin Gu 链接:https://arxiv.org/abs/2106.08637 摘要:基于语音文档的主题分类系统通常由两个模块组成:一个用于将语音转换为文本的自动语音识别(ASR)模块和一个用于从解码文本中预测主题类的文本主题分类(TTC)模块。本文将深层声学特征和语言学特征相结合,代替传统的ASR文本,对口语文档进行主题分类。更具体地说,首先训练以音素为输出单元的基于CTC的声学模型(AM),将训练后的AM中线性音素分类器前一层的输出作为语音文档的深层声学特征。此外,这些深层声学特征被馈送到一个音素到单词(P2W)模块以获得深层语言特征。最后,本文提出了一个局部多头部注意模块,将这两类深层特征融合到主题分类中。实验结果表明,该框架优于传统的ASR+TTC系统,ACC提高了3.13%。 摘要:Topic classification systems on spoken documents usually consist of two modules: an automatic speech recognition (ASR) module to convert speech into text and a text topic classification (TTC) module to predict the topic class from the decoded text. In this paper, instead of using the ASR transcripts, the fusion of deep acoustic and linguistic features is used for topic classification on spoken documents. More specifically, a conventional CTC-based acoustic model (AM) using phonemes as output units is first trained, and the outputs of the layer before the linear phoneme classifier in the trained AM are used as the deep acoustic features of spoken documents. Furthermore, these deep acoustic features are fed to a phoneme-to-word (P2W) module to obtain deep linguistic features. Finally, a local multi-head attention module is proposed to fuse these two types of deep features for topic classification. Experiments conducted on a subset selected from Switchboard corpus show that our proposed framework outperforms the conventional ASR+TTC systems and achieves a 3.13% improvement in ACC.

【30】 Structured DropConnect for Uncertainty Inference in Image Classification 标题:用于图像分类不确定性推理的结构化DropConnect

作者:Wenqing Zheng,Jiyang Xie,Weidong Liu,Zhanyu Ma 备注:5 pages,1 figures 链接:https://arxiv.org/abs/2106.08624 摘要:随着网络结构的复杂性,不确定性推理成为提高人工智能系统分类精度的重要任务。对于图像分类任务,我们提出了一个结构化DropConnect(SDC)框架,通过Dirichlet分布对深度神经网络的输出进行建模。在训练过程中,我们在完全连接的层中引入了一种权重的DropConnect策略。在测试中,我们将网络分割成若干个子网络,然后通过将其矩与这些子网络输出的均值和方差相匹配来模拟Dirichlet分布。最后利用估计的Dirichlet分布的熵进行不确定性推理。该框架在LeNet$5$和VGG$16$模型上实现,用于MNIST和CIFAR$10$数据集的误分类检测和分布外检测。实验结果表明,该方法的性能可以与其他不确定性推理方法相媲美。此外,SDC还可以很好地适应不同的网络结构,具有一定的泛化能力和研究前景。 摘要:With the complexity of the network structure, uncertainty inference has become an important task to improve the classification accuracy for artificial intelligence systems. For image classification tasks, we propose a structured DropConnect (SDC) framework to model the output of a deep neural network by a Dirichlet distribution. We introduce a DropConnect strategy on weights in the fully connected layers during training. In test, we split the network into several sub-networks, and then model the Dirichlet distribution by match its moments with the mean and variance of the outputs of these sub-networks. The entropy of the estimated Dirichlet distribution is finally utilized for uncertainty inference. In this paper, this framework is implemented on LeNet$5$ and VGG$16$ models for misclassification detection and out-of-distribution detection on MNIST and CIFAR-$10$ datasets. Experimental results show that the performance of the proposed SDC can be comparable to other uncertainty inference methods. Furthermore, the SDC is adapted well to different network structures with certain generalization capabilities and research prospects.
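
下面的示意函数展示“用子网络输出的均值和方差做矩匹配来估计Dirichlet分布,并以其熵作为不确定性”的计算方式;其中对浓度参数逐类求解再取平均只是为数值稳定所作的假设。

```python
import numpy as np
from scipy.stats import dirichlet

def dirichlet_uncertainty(probs):
    """probs: [S, K],S个子网络对同一样本的softmax输出。
    利用 mean_k = alpha_k/alpha_0、var_k = mean_k(1-mean_k)/(alpha_0+1) 做矩匹配。"""
    mean = probs.mean(axis=0)
    var = probs.var(axis=0) + 1e-8
    alpha0 = max(float(np.mean(mean * (1 - mean) / var - 1)), 1e-3)  # 逐类解出alpha_0后取平均
    alpha = np.clip(mean * alpha0, 1e-3, None)
    return alpha, dirichlet(alpha).entropy()    # 熵越大,预测不确定性越高

alpha, h = dirichlet_uncertainty(np.random.dirichlet(np.ones(10), size=8))
```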

【31】 PatchNet: Unsupervised Object Discovery based on Patch Embedding 标题:PatchNet:基于补丁嵌入的无监督对象发现

作者:Hankyu Moon,Heng Hao,Sima Didari,Jae Oh Woo,Patrick Bangert 链接:https://arxiv.org/abs/2106.08599 摘要:我们证明了通过自监督从少量图像(100到200)中训练随机抽样的面片可以发现频繁出现的物体。这种方法的关键是模式空间,一种表示给定图像数据的所有可能子图像的潜在模式空间。模式空间中的距离结构反映了频繁对象导致的模式共生现象。模式空间嵌入是通过最小化随机生成的相邻面片之间的对比损失来学习的。为了防止嵌入对背景的学习,我们采用基于颜色的目标显著性和背景相异性来调节对比度损失。学习的距离结构作为目标记忆,通过对随机抽取的样本样本进行模式向量聚类,简单地发现频繁目标。基于图像块的图像表示方法很自然地处理了位置和尺度不变性,这对多目标发现至关重要。该方法已经被证明非常有效,并成功地应用于从自然图像中发现多个人脸和人体。 摘要:We demonstrate that frequently appearing objects can be discovered by training randomly sampled patches from a small number of images (100 to 200) by self-supervision. Key to this approach is the pattern space, a latent space of patterns that represents all possible sub-images of the given image data. The distance structure in the pattern space captures the co-occurrence of patterns due to the frequent objects. The pattern space embedding is learned by minimizing the contrastive loss between randomly generated adjacent patches. To prevent the embedding from learning the background, we modulate the contrastive loss by color-based object saliency and background dissimilarity. The learned distance structure serves as object memory, and the frequent objects are simply discovered by clustering the pattern vectors from the random patches sampled for inference. Our image representation based on image patches naturally handles the position and scale invariance property that is crucial to multi-object discovery. The method has been proven surprisingly effective, and successfully applied to finding multiple human faces and bodies from natural images.

【32】 Unsupervised Lexical Acquisition of Relative Spatial Concepts Using Spoken User Utterances 标题:使用口语用户话语的相对空间概念的无监督词汇获取

作者:Rikunari Sagara,Ryo Taguchi,Akira Taniguchi,Tadahiro Taniguchi,Koosuke Hattori,Masahiro Hoguro,Taizo Umezaki 备注:27 pages, 12 figures, submitted to Advanced Robotics 链接:https://arxiv.org/abs/2106.08574 摘要:本文提出了一种基于用户口语的无监督空间概念词汇习得方法。具有灵活的语音对话系统的机器人必须能够像儿童一样通过与人类的互动获得特定于环境的语言表达及其意义。具体来说,相对空间概念(如前、右)在我们的日常生活中有着广泛的应用,但是当机器人学习相对空间概念时,哪个对象是参考对象并不明显。因此,我们提出了一种方法,让机器人在没有文字先验知识的情况下学习相关的空间概念。该方法采用概率模型来估计适当的参考对象和同时表示概念的分布。实验结果表明,在机器人不知道哪个被定位对象是参考对象的情况下,可以学习相对的空间概念和表示每个概念的音素序列。此外,我们还证明了该方法中的两个过程提高了概念的估计精度:通过类n-gram生成候选词序列和利用位置信息选择词序列。此外,我们还发现,即使候选参考对象的数量增加,参考对象的线索也能提高准确性。 摘要:This paper proposes methods for unsupervised lexical acquisition for relative spatial concepts using spoken user utterances. A robot with a flexible spoken dialog system must be able to acquire linguistic representation and its meaning specific to an environment through interactions with humans as children do. Specifically, relative spatial concepts (e.g., front and right) are widely used in our daily lives, however, it is not obvious which object is a reference object when a robot learns relative spatial concepts. Therefore, we propose methods by which a robot without prior knowledge of words can learn relative spatial concepts. The methods are formulated using a probabilistic model to estimate the proper reference objects and distributions representing concepts simultaneously. The experimental results show that relative spatial concepts and a phoneme sequence representing each concept can be learned under the condition that the robot does not know which located object is the reference object. Additionally, we show that two processes in the proposed method improve the estimation accuracy of the concepts: generating candidate word sequences by class n-gram and selecting word sequences using location information. Furthermore, we show that clues to reference objects improve accuracy even though the number of candidate reference objects increases.

【33】 TSO: Curriculum Generation using continuous optimization 标题:TSO:基于连续优化的课程生成

作者:Dipankar Sarkar,Mukur Gupta 备注:10 pages, along with all experiment details 链接:https://arxiv.org/abs/2106.08569 摘要:深度学习模型的训练带来了巨大的挑战,包括参数调整和训练数据排序。在课程学习中,为了优化训练数据的顺序,人们进行了大量的研究。最近的工作主要集中在使用复杂的强化学习技术来寻找最佳的数据排序策略,以最大限度地提高对给定网络的学习。本文提出了一种简单有效的基于连续优化的方法。我们称这种新方法为训练序列优化(TSO)。在我们提出的方法中有三个关键组成部分:(a)编码器网络将训练序列映射/嵌入到连续空间中(b) 预测网络使用策略的连续表示作为输入,并预测固定网络结构的精度(c) 解码器进一步将策略的连续表示映射到有序训练数据集。性能预测器和编码器使我们能够在连续空间中进行基于梯度的优化,以找到具有潜在更好精度的最优训练数据顺序的嵌入。实验结果表明,与使用CIFAR-100数据集的随机策略相比,我们生成的最优课程策略可以获得2AP,并且比现有的CL算法具有更好的性能。我们通过改变体系结构、数据集和样本大小进行了烧蚀研究,展示了我们的方法的健壮性。 摘要:The training of deep learning models poses vast challenges of including parameter tuning and ordering of training data. Significant research has been done in Curriculum learning for optimizing the sequence of training data. Recent works have focused on using complex reinforcement learning techniques to find the optimal data ordering strategy to maximize learning for a given network. In this paper, we present a simple and efficient technique based on continuous optimization. We call this new approach Training Sequence Optimization (TSO). There are three critical components in our proposed approach: (a) An encoder network maps/embeds training sequence into continuous space. (b) A predictor network uses the continuous representation of a strategy as input and predicts the accuracy for fixed network architecture. (c) A decoder further maps a continuous representation of a strategy to the ordered training dataset. The performance predictor and encoder enable us to perform gradient-based optimization in the continuous space to find the embedding of optimal training data ordering with potentially better accuracy. Experiments show that we can gain 2AP with our generated optimal curriculum strategy over the random strategy using the CIFAR-100 dataset and have better boosts than the state of the art CL algorithms. We do an ablation study varying the architecture, dataset and sample sizes showcasing our approach's robustness.

【34】 Adaptive Visibility Graph Neural Network and It's Application in Modulation Classification 标题:自适应可见度图神经网络及其在调制识别中的应用

作者:Qi Xuan,Kunfeng Qiu,Jinchao Zhou,Zhuangzhi Chen,Dongwei Xu,Shilian Zheng,Xiaoniu Yang 链接:https://arxiv.org/abs/2106.08564 摘要:我们的数字世界充满了捕捉许多复杂系统各个方面的时间序列和图形。传统上,处理这两种不同类型的数据有各自的方法,如递归神经网络(RNN)和图神经网络(GNN),而近年来,时间序列可以通过可视图(VG)等技术映射到图上,因此,研究人员可以使用图算法来挖掘时间序列中的知识。这种映射方法在时间序列和图形之间架起了一座桥梁,对分析各种真实的时间序列具有很大的潜力。然而,VG方法及其变种只是基于固定的规则,缺乏灵活性,极大地限制了其在现实中的应用。本文提出了一种自适应可见性图(AVG)算法,该算法能够将时间序列自适应地映射成图,并在此基础上利用GNN模型DiffPool作为分类器,进一步建立了AVGNet的端到端分类框架。然后采用AVGNet进行无线信号调制分类,这是无线通信领域的一个重要课题。仿真结果验证了AVGNet的性能优于一系列先进的深度学习方法,达到了该任务的最新水平。 摘要:Our digital world is full of time series and graphs which capture the various aspects of many complex systems. Traditionally, there are respective methods in processing these two different types of data, e.g., Recurrent Neural Network (RNN) and Graph Neural Network (GNN), while in recent years, time series could be mapped to graphs by using the techniques such as Visibility Graph (VG), so that researchers can use graph algorithms to mine the knowledge in time series. Such mapping methods establish a bridge between time series and graphs, and have high potential to facilitate the analysis of various real-world time series. However, the VG method and its variants are just based on fixed rules and thus lack of flexibility, largely limiting their application in reality. In this paper, we propose an Adaptive Visibility Graph (AVG) algorithm that can adaptively map time series into graphs, based on which we further establish an end-to-end classification framework AVGNet, by utilizing GNN model DiffPool as the classifier. We then adopt AVGNet for radio signal modulation classification which is an important task in the field of wireless communication. The simulations validate that AVGNet outperforms a series of advanced deep learning methods, achieving the state-of-the-art performance in this task.
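作为背景,下面给出经典的固定规则"自然可见性图"(VG)的一个纯Python实现示意:若两个采样点之间的连线高于它们之间的所有点,则在图中连边。论文提出的AVG正是对这类固定规则的自适应推广,此处的代码与示例信号仅用于说明时间序列到图的映射思想。

```python
# 经典"自然可见性图"(固定规则 VG):若两点间的连线高于其间所有点,则两点连边。
# 论文提出的 AVG 是对这类固定规则的自适应推广,此处仅演示时间序列到图的映射。
import networkx as nx

def natural_visibility_graph(series):
    n = len(series)
    g = nx.Graph()
    g.add_nodes_from(range(n))
    for a in range(n):
        for b in range(a + 1, n):
            ya, yb = series[a], series[b]
            # 可见性准则:对所有 a < c < b,须有 y_c < y_b + (y_a - y_b) * (b - c) / (b - a)
            if all(series[c] < yb + (ya - yb) * (b - c) / (b - a)
                   for c in range(a + 1, b)):
                g.add_edge(a, b)
    return g

signal = [0.1, 0.9, 0.3, 0.7, 0.2, 0.8, 0.4]   # 假设的一小段采样信号
print(natural_visibility_graph(signal).edges())
```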

【35】 Coreference-Aware Dialogue Summarization 标题:有共指意识的对话摘要

作者:Zhengyuan Liu,Ke Shi,Nancy F. Chen 备注:accepted for presentation at SIGDIAL-2021 链接:https://arxiv.org/abs/2106.08556 摘要:最近,通过神经方法总结会话的研究越来越受到重视,但要找到切实可行的解决方案仍然是一个挑战。这类挑战的例子包括对话中的非结构化信息交换、演讲者之间的非正式互动以及演讲者随着对话的发展而发生的动态角色变化。许多这样的挑战导致复杂的共指链接。因此,在这项工作中,我们研究了不同的方法来显式地将共指信息整合到神经抽象对话摘要模型中,以应对上述挑战。实验结果表明,所提出的方法达到了最先进的性能,这意味着它是有用的利用共指信息的对话摘要。事实正确性的评估结果表明,这种共指感知模型能够更好地跟踪对话者之间的信息流,并将准确的状态/动作与相应的对话者和人物提及联系起来。 摘要:Summarizing conversations via neural approaches has been gaining research traction lately, yet it is still challenging to obtain practical solutions. Examples of such challenges include unstructured information exchange in dialogues, informal interactions between speakers, and dynamic role changes of speakers as the dialogue evolves. Many of such challenges result in complex coreference links. Therefore, in this work, we investigate different approaches to explicitly incorporate coreference information in neural abstractive dialogue summarization models to tackle the aforementioned challenges. Experimental results show that the proposed approaches achieve state-of-the-art performance, implying it is useful to utilize coreference information in dialogue summarization. Evaluation results on factual correctness suggest such coreference-aware models are better at tracing the information flow among interlocutors and associating accurate status/actions with the corresponding interlocutors and person mentions.

【36】 Distilling Self-Knowledge From Contrastive Links to Classify Graph Nodes Without Passing Messages 标题:从对比链接中提取自知识实现不传递消息的图节点分类

作者:Yi Luo,Aiguo Chen,Ke Yan,Ling Tian 备注:9 pages, 2 figures, 4 tables 链接:https://arxiv.org/abs/2106.08541 摘要:目前,基于消息传递范式的图神经网络(GNNs)已成为图形数据学习的主流方法。这种模式中的模型必须花费额外的空间来查找具有邻接矩阵的相邻节点,并花费额外的时间来聚合来自相邻节点的多个消息。为了解决这个问题,我们开发了一种称为LinkDist的方法,该方法将连接节点对中的自知识提取到一个多层感知器(MLP)中,而不需要聚合消息。在8个真实数据集上的实验表明,基于LinkDist的MLP可以在不知道节点邻接的情况下预测节点的标签,但在半监督和全监督节点分类的情况下,其精度与GNNs相当。此外,LinkDist还得益于它的非消息传递范式,即我们可以通过对比的方式从任意采样的节点对中提取自我知识,从而进一步提高LinkDist的性能。 摘要:Nowadays, Graph Neural Networks (GNNs) following the Message Passing paradigm become the dominant way to learn on graphic data. Models in this paradigm have to spend extra space to look up adjacent nodes with adjacency matrices and extra time to aggregate multiple messages from adjacent nodes. To address this issue, we develop a method called LinkDist that distils self-knowledge from connected node pairs into a Multi-Layer Perceptron (MLP) without the need to aggregate messages. Experiment with 8 real-world datasets shows the MLP derived from LinkDist can predict the label of a node without knowing its adjacencies but achieve comparable accuracy against GNNs in the contexts of semi- and full-supervised node classification. Moreover, LinkDist benefits from its Non-Message Passing paradigm that we can also distil self-knowledge from arbitrarily sampled node pairs in a contrastive way to further boost the performance of LinkDist.

【37】 ECKPN: Explicit Class Knowledge Propagation Network for Transductive Few-shot Learning 标题:ECKPN:面向传导式Few-Shot学习的显式类知识传播网络

作者:Chaofan Chen,Xiaoshan Yang,Changsheng Xu,Xuhui Huang,Zhe Ma 备注:Accepted by CVPR2021 链接:https://arxiv.org/abs/2106.08523 摘要:近年来,基于transductive图的分类方法在Few-Shot分类中取得了很大的成功。然而,大多数现有的方法忽略了对类级知识的探索,而这些知识很容易被人类从少数几个样本中学习到。针对这一问题,本文提出了一种由比较、压缩和校正模块组成的显式类知识传播网络(ECKPN)。具体来说,我们首先使用比较模块来探索成对样本关系,以学习实例级图中的丰富样本表示。然后对实例级图进行压缩,生成类级图,有助于获取类级的视觉知识,便于对不同类之间的关系进行建模。其次,利用校正模块对类之间的关系进行显式刻画,得到更具区分性的类级知识表示。最后,将类级知识与实例级样本表示相结合,指导查询样本的推理。我们在四个小样本(few-shot)分类基准上进行了大量的实验,实验结果表明,所提出的ECKPN明显优于现有的方法。 摘要:Recently, the transductive graph-based methods have achieved great success in the few-shot classification task. However, most existing methods ignore exploring the class-level knowledge that can be easily learned by humans from just a handful of samples. In this paper, we propose an Explicit Class Knowledge Propagation Network (ECKPN), which is composed of the comparison, squeeze and calibration modules, to address this problem. Specifically, we first employ the comparison module to explore the pairwise sample relations to learn rich sample representations in the instance-level graph. Then, we squeeze the instance-level graph to generate the class-level graph, which can help obtain the class-level visual knowledge and facilitate modeling the relations of different classes. Next, the calibration module is adopted to characterize the relations of the classes explicitly to obtain the more discriminative class-level knowledge representations. Finally, we combine the class-level knowledge with the instance-level sample representations to guide the inference of the query samples. We conduct extensive experiments on four few-shot classification benchmarks, and the experimental results show that the proposed ECKPN significantly outperforms the state-of-the-art methods.

【38】 Optimizing Graph Transformer Networks with Graph-based Techniques 标题:用基于图的技术优化图Transformer网络

作者:Loc Hoang,Udit Agarwal,Gurbinder Gill,Roshan Dathathri,Abhik Seal,Brian Martin,Keshav Pingali 链接:https://arxiv.org/abs/2106.08500 摘要:图变换网络(GTN)是图卷积网络(GCN)的一种变体,其目标是异构图,其中节点和边具有相关的类型信息,可用于提高推理精度。GTN在图中学习重要的元路径,为这些元路径创建加权边,并在GCN中使用生成的图。目前,唯一可用的GTNs实现使用密集矩阵乘法来寻找元路径。不幸的是,这种方法的空间开销可能很大,因此实际上它只用于小型图。此外,基于矩阵的实现不够细粒度,无法使用基于随机游走的方法来优化元路径发现。在这篇论文中,我们提出了一个基于图的GTN元路径发现问题的公式和实现。这种基于图的公式比基于矩阵的方法有两个优点。首先,它比最初的GTN实现更节省空间,并且对于实际感兴趣的元路径大小更高效。其次,它允许我们实现一种采样方法,减少必须枚举的元路径的数量,从而允许实现用于更大的图和更大的元路径大小。实验结果表明,在元路径长度为4的情况下,我们的实现平均比原始GTN实现快6.5倍,在不影响GTN精度的情况下,我们的采样实现平均比这个实现快155倍。 摘要:Graph transformer networks (GTN) are a variant of graph convolutional networks (GCN) that are targeted to heterogeneous graphs in which nodes and edges have associated type information that can be exploited to improve inference accuracy. GTNs learn important metapaths in the graph, create weighted edges for these metapaths, and use the resulting graph in a GCN. Currently, the only available implementation of GTNs uses dense matrix multiplication to find metapaths. Unfortunately, the space overhead of this approach can be large, so in practice it is used only for small graphs. In addition, the matrix-based implementation is not fine-grained enough to use random-walk based methods to optimize metapath finding. In this paper, we present a graph-based formulation and implementation of the GTN metapath finding problem. This graph-based formulation has two advantages over the matrix-based approach. First, it is more space efficient than the original GTN implementation and more compute-efficient for metapath sizes of practical interest. Second, it permits us to implement a sampling method that reduces the number of metapaths that must be enumerated, allowing the implementation to be used for larger graphs and larger metapath sizes. Experimental results show that our implementation is $6.5\times$ faster than the original GTN implementation on average for a metapath length of 4, and our sampling implementation is $155\times$ faster on average than this implementation without compromising on the accuracy of the GTN.
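下面用一个很小的异构图演示"基于矩阵的元路径构造":某条元路径的邻接矩阵等于各边类型邻接矩阵按顺序连乘,这正是论文指出空间开销较大的做法(稀疏矩阵可部分缓解,论文进一步改用基于图和采样的方法)。其中的节点数、边类型和边均为虚构的示例数据。

```python
# 示意:基于矩阵的元路径构造。元路径的(加权)邻接矩阵等于各边类型邻接矩阵按顺序连乘,
# 这正是论文指出空间开销较大的做法;此处用稀疏矩阵演示,节点与边均为虚构示例。
import numpy as np
import scipy.sparse as sp

n = 6                                             # 假设的小型异构图,6 个节点
A = {                                             # 每种边类型一个稀疏邻接矩阵
    "writes": sp.csr_matrix((np.ones(3), ([0, 1, 2], [3, 4, 5])), shape=(n, n)),
    "cites":  sp.csr_matrix((np.ones(2), ([3, 4], [4, 5])), shape=(n, n)),
}

def metapath_adjacency(edge_types):
    """按给定边类型序列连乘邻接矩阵,得到该元路径的邻接矩阵。"""
    m = A[edge_types[0]]
    for t in edge_types[1:]:
        m = m @ A[t]                              # 稀疏矩阵乘法
    return m

apa = metapath_adjacency(["writes", "cites"])     # 元路径 writes -> cites
print(apa.toarray())                              # 非零项即为该元路径连接的节点对
```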

【39】 ICDAR 2021 Competition on Components Segmentation Task of Document Photos 标题:ICDAR 2021文档照片分量分割任务竞赛

作者:Celso A. M. Lopes Junior,Ricardo B. das Neves Junior,Byron L. D. Bezerra,Alejandro H. Toselli,Donato Impedovo 备注:This paper was accepted for ICDAR 2021 Conference 链接:https://arxiv.org/abs/2106.08499 摘要:本文描述了在第16届国际文档分析与识别会议(ICDAR 2021)背景下进行的文档照片组件分割任务的短期竞赛。本次比赛的目的是聚集身份证件图像处理领域的研究人员,并为他们提供一个合适的基准,以比较他们在文档图像组件分割任务上的技术。竞赛提出了三个挑战性任务,要求在提供的数据集上完成不同的分割目标。收集的数据来自几种类型的巴西身份证件,这些证件的个人信息被方便地替换。共有16名参与者,他们在部分或全部三项任务中获得的结果显示,所采用的指标的比率不同,比如Dice相似系数在0.06到0.99之间。参赛者采用不同的深度学习模型和多样的策略,以在每项任务中取得最佳效果。结果表明,目前用于解决其中一项任务(文档边界检测)的实用方法已经很好地建立起来。然而,对于另外两个挑战性任务(文本区域检测和手写签名检测),仍然需要研究和开发更健壮的方法来获得可接受的结果。 摘要:This paper describes the short-term competition on Components Segmentation Task of Document Photos that was prepared in the context of the 16th International Conference on Document Analysis and Recognition (ICDAR 2021). This competition aims to bring together researchers working on the filed of identification document image processing and provides them a suitable benchmark to compare their techniques on the component segmentation task of document images. Three challenge tasks were proposed entailing different segmentation assignments to be performed on a provided dataset. The collected data are from several types of Brazilian ID documents, whose personal information was conveniently replaced. There were 16 participants whose results obtained for some or all the three tasks show different rates for the adopted metrics, like Dice Similarity Coefficient ranging from 0.06 to 0.99. Different Deep Learning models were applied by the entrants with diverse strategies to achieve the best results in each of the tasks. Obtained results show that the current applied methods for solving one of the proposed tasks (document boundary detection) are already well stablished. However, for the other two challenge tasks (text zone and handwritten sign detection) research and development of more robust approaches are still required to achieve acceptable results.

【40】 Developing a Fidelity Evaluation Approach for Interpretable Machine Learning 标题:开发一种可解释机器学习的保真度评估方法

作者:Mythreyi Velmurugan,Chun Ouyang,Catarina Moreira,Renuka Sindhgatta 链接:https://arxiv.org/abs/2106.08492 摘要:尽管现代机器学习和深度学习方法允许进行复杂和深入的数据分析,但这些方法生成的预测模型通常非常复杂,并且缺乏透明度。可解释人工智能(XAI)方法用于提高这些复杂模型的可解释性,从而提高透明度。然而,这些可解释方法的内在适用性很难评估。特别是,评估解释对底层黑盒模型的保真度的方法需要进一步发展,特别是对于表格数据。在本文中,我们(a)提出了一个用于开发评估方法的三阶段流程;(b)将一种主要针对图像和文本数据的现有评估方法调整用于评估在表格数据上训练的模型;(c)用该评估方法评估两种流行的可解释方法。我们的评估表明,底层预测模型的内部机制、所用可解释方法的内部机制以及模型和数据复杂性都会影响解释保真度。鉴于解释保真度对上下文、使用的工具和数据非常敏感,我们无法清楚地确定任何特定的可解释方法优于另一种方法。 摘要:Although modern machine learning and deep learning methods allow for complex and in-depth data analytics, the predictive models generated by these methods are often highly complex, and lack transparency. Explainable AI (XAI) methods are used to improve the interpretability of these complex models, and in doing so improve transparency. However, the inherent fitness of these explainable methods can be hard to evaluate. In particular, methods to evaluate the fidelity of the explanation to the underlying black box require further development, especially for tabular data. In this paper, we (a) propose a three phase approach to developing an evaluation method; (b) adapt an existing evaluation method primarily for image and text data to evaluate models trained on tabular data; and (c) evaluate two popular explainable methods using this evaluation method. Our evaluations suggest that the internal mechanism of the underlying predictive model, the internal mechanism of the explainable method used and model and data complexity all affect explanation fidelity. Given that explanation fidelity is so sensitive to context and tools and data used, we could not clearly identify any specific explainable method as being superior to another.

【41】 Minimizing Communication while Maximizing Performance in Multi-Agent Reinforcement Learning 标题:多智能体强化学习中最小化通信同时最大化性能的研究

作者:Varun Kumar Vijay,Hassam Sheikh,Somdeb Majumdar,Mariano Phielipp 链接:https://arxiv.org/abs/2106.08482 摘要:在需要协调以实现共享目标的多代理任务中,代理间通信可以显著提高性能。已有的研究表明,利用多智能体强化学习和消息传递网络结构来学习智能体间的通信协议是可能的。然而,这些模型使用无约束的广播通信模型,其中一个代理在每一步都与所有其他代理通信,即使任务不需要它。在实际应用中,通信可能受到带宽、功率和网络容量等系统限制,因此可能需要减少发送的消息数量。在这项工作中,我们探索了一种在多任务学习中最小化交流同时最大化性能的简单方法:同时优化任务特定目标和交流惩罚。我们表明,可以使用REINFORCE和Gumbel-Softmax重参数化来优化这些目标。我们介绍了两种稳定训练的技术:50%训练和消息转发。仅在50%的回合(episode)上施加通信惩罚进行训练,可以防止我们的模型完全关闭其发出的消息。第二,重复以前收到的消息有助于模型保留信息,并进一步提高性能。使用这些技术,我们可以在不损失性能的情况下减少75%的通信量。 摘要:Inter-agent communication can significantly increase performance in multi-agent tasks that require co-ordination to achieve a shared goal. Prior work has shown that it is possible to learn inter-agent communication protocols using multi-agent reinforcement learning and message-passing network architectures. However, these models use an unconstrained broadcast communication model, in which an agent communicates with all other agents at every step, even when the task does not require it. In real-world applications, where communication may be limited by system constraints like bandwidth, power and network capacity, one might need to reduce the number of messages that are sent. In this work, we explore a simple method of minimizing communication while maximizing performance in multi-task learning: simultaneously optimizing a task-specific objective and a communication penalty. We show that the objectives can be optimized using Reinforce and the Gumbel-Softmax reparameterization. We introduce two techniques to stabilize training: 50% training and message forwarding. Training with the communication penalty on only 50% of the episodes prevents our models from turning off their outgoing messages. Second, repeating messages received previously helps models retain information, and further improves performance. With these techniques, we show that we can reduce communication by 75% with no loss of performance.
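下面是一个示意性草图,演示如何用Gumbel-Softmax学习"发送/不发送"的二元消息门控,并在损失中加入通信惩罚项;门控网络结构、占位的任务损失和0.1的惩罚权重都是为说明而做的假设,并非论文的完整训练流程。

```python
# 示意性草图:用 Gumbel-Softmax 学习"发送/不发送"的二元门控,并在损失中加入通信惩罚。
# 门控结构、占位任务损失与 0.1 的惩罚权重均为演示用的假设。
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedMessenger(nn.Module):
    def __init__(self, obs_dim=16, msg_dim=8):
        super().__init__()
        self.msg_net = nn.Linear(obs_dim, msg_dim)
        self.gate_net = nn.Linear(obs_dim, 2)            # 两个 logit:发送 / 不发送

    def forward(self, obs):
        logits = self.gate_net(obs)
        # hard=True 给出 0/1 门控,同时保留可反传的梯度;第 0 类视为"发送"
        gate = F.gumbel_softmax(logits, tau=1.0, hard=True)[..., :1]
        return gate * self.msg_net(obs), gate            # 不发送时消息被置零

agent = GatedMessenger()
obs = torch.randn(32, 16)                                # 假设的一个 batch 观测
msg, gate = agent(obs)

task_loss = msg.pow(2).mean()                            # 占位的任务损失,实际应来自 RL 目标
comm_penalty = gate.mean()                               # 平均发送比例作为通信代价
loss = task_loss + 0.1 * comm_penalty
loss.backward()
```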

【42】 Enabling AI and Robotic Coaches for Physical Rehabilitation Therapy: Iterative Design and Evaluation with Therapists and Post-Stroke Survivors 标题:为物理康复治疗启用人工智能和机器人教练:与治疗师和中风后幸存者的迭代设计和评估

作者:Min Hun Lee,Daniel P. Siewiorek,Asim Smailagic,Alexandre Bernardino,Sergi Bermúdez i Badia 链接:https://arxiv.org/abs/2106.08458 摘要:人工智能(AI)和机器人教练承诺通过社会互动提高患者对康复训练的参与度。虽然先前的工作探索了人工智能和机器人教练自动监控训练的潜力,但这些系统的部署仍然是一个挑战。以前的工作描述了缺乏利益相关者参与设计这类功能的主要原因之一。在这篇论文中,我们提出了我们的努力,以引出详细的设计规范,说明人工智能和机器人教练如何与患者互动,并以一种有效和可接受的方式指导患者的训练,4名治疗师和5名中风后幸存者。通过反复的问卷调查和访谈,我们发现中风后幸存者和治疗师都意识到人工智能和机器人教练在实现更系统的管理、提高自我效能和康复治疗动机方面的潜在益处。此外,我们的评估还揭示了几个实际问题(例如,认知障碍患者的交互可能存在困难、系统故障等)。我们讨论了利益相关者早期参与的价值,以及补充系统故障的交互技术,同时也支持个性化治疗课程,以便更好地部署人工智能和机器人训练教练。 摘要:Artificial intelligence (AI) and robotic coaches promise the improved engagement of patients on rehabilitation exercises through social interaction. While previous work explored the potential of automatically monitoring exercises for AI and robotic coaches, the deployment of these systems remains a challenge. Previous work described the lack of involving stakeholders to design such functionalities as one of the major causes. In this paper, we present our efforts on eliciting the detailed design specifications on how AI and robotic coaches could interact with and guide patient's exercises in an effective and acceptable way with four therapists and five post-stroke survivors. Through iterative questionnaires and interviews, we found that both post-stroke survivors and therapists appreciated the potential benefits of AI and robotic coaches to achieve more systematic management and improve their self-efficacy and motivation on rehabilitation therapy. In addition, our evaluation sheds light on several practical concerns (e.g. a possible difficulty with the interaction for people with cognitive impairment, system failures, etc.). We discuss the value of early involvement of stakeholders and interactive techniques that complement system failures, but also support a personalized therapy session for the better deployment of AI and robotic exercise coaches.

【43】 Faster than LASER -- Towards Stream Reasoning with Deep Neural Networks 标题:比LASER更快--用深度神经网络进行流推理

作者:João Ferreira,Diogo Lavado,Ricardo Gonçalves,Matthias Knorr,Ludwig Krippahl,João Leite 备注:Extended version of EPIA 21 paper 链接:https://arxiv.org/abs/2106.08457 摘要:随着物联网、社交网络、智能城市等领域可用数据的不断增加,智能体对这些数据进行实时处理和推理已成为基础。然而,由于此类数据产生的体量和速度,利用背景知识对带时间标注的数据进行推理可能具有挑战性;而在智能体需要发现潜在问题的场景中,这种复杂推理又是必要的,且无法仅靠简单的流处理技术完成。流推理机的目标是在推理和流处理之间架起一座桥梁,而LASER就是这样一种流推理机,专为分析数据流并在其上执行复杂推理而设计。它基于LARS(一种扩展答案集编程的基于规则的逻辑语言),与其他最先进的流推理系统相比具有更好的运行时结果。然而,在数据吞吐量很高时,即使是LASER也可能无法及时计算出答案。在本文中,我们研究了卷积神经网络和递归神经网络(这两种结构已被证明特别适合时间序列预测和分类)是否可以通过训练来近似LASER的推理,从而让智能体受益于其更高的处理速度。 摘要:With the constant increase of available data in various domains, such as the Internet of Things, Social Networks or Smart Cities, it has become fundamental that agents are able to process and reason with such data in real time. Whereas reasoning over time-annotated data with background knowledge may be challenging, due to the volume and velocity in which such data is being produced, such complex reasoning is necessary in scenarios where agents need to discover potential problems and this cannot be done with simple stream processing techniques. Stream Reasoners aim at bridging this gap between reasoning and stream processing and LASER is such a stream reasoner designed to analyse and perform complex reasoning over streams of data. It is based on LARS, a rule-based logical language extending Answer Set Programming, and it has shown better runtime results than other state-of-the-art stream reasoning systems. Nevertheless, for high levels of data throughput even LASER may be unable to compute answers in a timely fashion. In this paper, we study whether Convolutional and Recurrent Neural Networks, which have shown to be particularly well-suited for time series forecasting and classification, can be trained to approximate reasoning with LASER, so that agents can benefit from their high processing speed.

【44】 Deep Neural Networks for Approximating Stream Reasoning with C-SPARQL 标题:C-SPARQL逼近流推理的深度神经网络

作者:Ricardo Ferreira,Carolina Lopes,Ricardo Gonçalves,Matthias Knorr,Ludwig Krippahl,João Leite 备注:Accepted at the 20th EPIA Conference on Artificial Intelligence, EPIA 2021 链接:https://arxiv.org/abs/2106.08452 摘要:无论是报纸、博客和社交网络,还是监控系统,产生的信息量都在迅速增加。实时处理所有这些数据,同时考虑到有关问题领域的高级知识,这是一项挑战,但在及时评估潜在风险至关重要的情况下需要这样做。C-SPARQL是一种用于对RDF数据流进行连续查询的语言,是流推理中比较突出的方法之一,它提供了对动态数据的连续推理能力,而不仅仅是流处理。然而,已经证明,在存在大量数据的情况下,C-SPARQL可能无法及时回答查询,特别是当传入数据的频率高于用该数据进行推理所需的时间时。本文研究了C-SPARQL推理是否可以用递归神经网络和卷积神经网络来逼近,这两种神经网络结构已被证明非常适合于时间序列预测和时间序列分类,一旦网络经过训练,就可以利用它们更高的处理速度。我们考虑了各种不同类型的查询,并以高精度获得了总体积极的结果,同时通常将处理时间提高了几个数量级。 摘要:The amount of information produced, whether by newspapers, blogs and social networks, or by monitoring systems, is increasing rapidly. Processing all this data in real-time, while taking into consideration advanced knowledge about the problem domain, is challenging, but required in scenarios where assessing potential risks in a timely fashion is critical. C-SPARQL, a language for continuous queries over streams of RDF data, is one of the more prominent approaches in stream reasoning that provides such continuous inference capabilities over dynamic data that go beyond mere stream processing. However, it has been shown that, in the presence of huge amounts of data, C-SPARQL may not be able to answer queries in time, in particular when the frequency of incoming data is higher than the time required for reasoning with that data. In this paper, we investigate whether reasoning with C-SPARQL can be approximated using Recurrent Neural Networks and Convolutional Neural Networks, two neural network architectures that have been shown to be well-suited for time series forecasting and time series classification, to leverage on their higher processing speed once the network has been trained. We consider a variety of different kinds of queries and obtain overall positive results with high accuracies while improving processing time often by several orders of magnitude.

【45】 Machine learning-based analysis of hyperspectral images for automated sepsis diagnosis 标题:基于机器学习的高光谱图像分析在脓毒症自动诊断中的应用

作者:Maximilian Dietrich,Silvia Seidlitz,Nicholas Schreck,Manuel Wiesenfarth,Patrick Godau,Minu Tizabi,Jan Sellner,Sebastian Marx,Samuel Knödler,Michael M. Allers,Leonardo Ayala,Karsten Schmidt,Thorsten Brenner,Alexander Studier-Fischer,Felix Nickel,Beat P. Müller-Stich,Annette Kopp-Schneider,Markus A. Weigand,Lena Maier-Hein 备注:Maximilian Dietrich and Silvia Seidlitz contributed equally. Markus A. Weigand and Lena Maier-Hein contributed equally 链接:https://arxiv.org/abs/2106.08445 摘要:脓毒症是世界范围内导致死亡和严重疾病的主要原因。虽然用于早期诊断的可靠的生物标志物仍然缺失,但最近的研究表明,高光谱成像(HSI)有可能通过监测微循环改变来克服这一瓶颈。然而,基于HSI数据的基于机器学习的脓毒症自动诊断至今尚未被探索。鉴于文献中的这一差距,我们利用现有的数据集(1)研究基于HSI的脓毒症自动诊断是否可行,以及(2)提出一系列与基于HSI的组织分类相关的可能的混杂因素。虽然我们能够利用现有数据对脓毒症进行分类,准确度超过98%,但我们的研究还揭示了几个与受试者、治疗和成像相关的混杂因素,这些因素可能导致在患者组之间不平衡时高估算法性能。我们的结论是,进一步的前瞻性研究,仔细设计这些混杂因素,是必要的,以证实在这项研究中获得的初步结果。 摘要:Sepsis is a leading cause of mortality and critical illness worldwide. While robust biomarkers for early diagnosis are still missing, recent work indicates that hyperspectral imaging (HSI) has the potential to overcome this bottleneck by monitoring microcirculatory alterations. Automated machine learning-based diagnosis of sepsis based on HSI data, however, has not been explored to date. Given this gap in the literature, we leveraged an existing data set to (1) investigate whether HSI-based automated diagnosis of sepsis is possible and (2) put forth a list of possible confounders relevant for HSI-based tissue classification. While we were able to classify sepsis with an accuracy of over $98\,\%$ using the existing data, our research also revealed several subject-, therapy- and imaging-related confounders that may lead to an overestimation of algorithm performance when not balanced across the patient groups. We conclude that further prospective studies, carefully designed with respect to these confounders, are necessary to confirm the preliminary results obtained in this study.

【46】 Code to Comment Translation: A Comparative Study on Model Effectiveness & Errors 标题:代码到注释翻译:模型有效性与错误的比较研究

作者:Junayed Mahmud,Fahim Faisal,Raihan Islam Arnob,Antonios Anastasopoulos,Kevin Moran 备注:Accepted to the 2021 NLP4Prog Workshop co-located with The Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021) 链接:https://arxiv.org/abs/2106.08415 摘要:自动源代码摘要是一个流行的软件工程研究课题,其中机器翻译模型被用来将代码片段“翻译”成相关的自然语言描述。大多数此类模型的评估都是使用基于自动参考的度量进行的。然而,考虑到编程语言和自然语言之间相对较大的语义差距,我们认为这一研究方向将受益于对当前最先进模型的各种错误模式的定性研究。因此,在这项工作中,我们对最近提出的三种源代码摘要模型进行了定量和定性的比较。在我们的定量评估中,我们比较了基于平滑BLEU-4、METEOR和ROUGE-L机器翻译度量的模型,在我们的定性评估中,我们对模型在与地面真相字幕进行比较时犯下的最常见错误执行手动开放编码。我们的研究揭示了基于度量的性能和模型预测错误之间的关系的新见解,这种关系建立在一种可用于推动未来研究工作的经验衍生错误分类法的基础上 摘要:Automated source code summarization is a popular software engineering research topic wherein machine translation models are employed to "translate" code snippets into relevant natural language descriptions. Most evaluations of such models are conducted using automatic reference-based metrics. However, given the relatively large semantic gap between programming languages and natural language, we argue that this line of research would benefit from a qualitative investigation into the various error modes of current state-of-the-art models. Therefore, in this work, we perform both a quantitative and qualitative comparison of three recently proposed source code summarization models. In our quantitative evaluation, we compare the models based on the smoothed BLEU-4, METEOR, and ROUGE-L machine translation metrics, and in our qualitative evaluation, we perform a manual open-coding of the most common errors committed by the models when compared to ground truth captions. Our investigation reveals new insights into the relationship between metric-based performance and model prediction errors grounded in an empirically derived error taxonomy that can be used to drive future research efforts

【47】 On the Sample Complexity and Metastability of Heavy-tailed Policy Search in Continuous Control 标题:连续控制中重尾策略搜索的样本复杂性和亚稳性

作者:Amrit Singh Bedi,Anjaly Parayil,Junyu Zhang,Mengdi Wang,Alec Koppel 链接:https://arxiv.org/abs/2106.08414 摘要:强化学习是一种交互式决策的框架,它不需要系统动力学模型,而是在不同的时间顺序地揭示激励机制。由于其可扩展到连续空间,我们将重点放在策略搜索上,其中一个迭代地改进带有随机策略梯度(PG)更新的参数化策略。在表马尔可夫决策问题(MDPs)中,通过不断的探索和适当的参数化,可以得到全局最优解。相比之下,在连续空间中,非凸性带来了一个病态的挑战,现有的收敛结果大多局限于平稳性或任意的局部极值。为了缩小这一差距,我们通过策略参数化(由尾部指数参数alpha定义的较重尾部分布定义)在连续空间中进行持续探索,这增加了状态空间中跳跃的可能性。这样做会使PG所共有的分数函数的光滑性条件失效。因此,我们建立了收敛到平稳性的速度如何依赖于策略的尾部指数α、Holder连续性参数、可积性条件和本文首次引入的探索容差参数。进一步,我们通过一个适当定义的马尔可夫链的退出和过渡时间分析来刻画局部极大值集对尾部指数的依赖性,确定与较重尾部的Levy过程相关的策略收敛到较宽的峰值。这一现象提高了监督学习对扰动的稳定性,我们也证实了这一点,特别是当短视和远视的激励机制失调时,策略搜索的性能也得到了改善。 摘要:Reinforcement learning is a framework for interactive decision-making with incentives sequentially revealed across time without a system dynamics model. Due to its scaling to continuous spaces, we focus on policy search where one iteratively improves a parameterized policy with stochastic policy gradient (PG) updates. In tabular Markov Decision Problems (MDPs), under persistent exploration and suitable parameterization, global optimality may be obtained. By contrast, in continuous space, the non-convexity poses a pathological challenge as evidenced by existing convergence results being mostly limited to stationarity or arbitrary local extrema. To close this gap, we step towards persistent exploration in continuous space through policy parameterizations defined by distributions of heavier tails defined by tail-index parameter alpha, which increases the likelihood of jumping in state space. Doing so invalidates smoothness conditions of the score function common to PG. Thus, we establish how the convergence rate to stationarity depends on the policy's tail index alpha, a Holder continuity parameter, integrability conditions, and an exploration tolerance parameter introduced here for the first time. Further, we characterize the dependence of the set of local maxima on the tail index through an exit and transition time analysis of a suitably defined Markov chain, identifying that policies associated with Levy Processes of a heavier tail converge to wider peaks. This phenomenon yields improved stability to perturbations in supervised learning, which we corroborate also manifests in improved performance of policy search, especially when myopic and farsighted incentives are misaligned.
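下面给出一个示意性草图:用重尾分布参数化连续动作策略,并做一步REINFORCE式的策略梯度更新。这里以Student-t分布为例(其自由度起到与尾部指数类似的控尾作用),分布选择、网络结构和占位的回报均为说明用的假设,不代表论文的具体设定。

```python
# 示意:用重尾分布(此处以 Student-t 为例,其自由度与尾部指数作用类似)参数化连续动作策略,
# 并做一步 REINFORCE 式更新。分布选择、网络结构与占位回报均为说明用的假设。
import torch
import torch.nn as nn
from torch.distributions import StudentT

class HeavyTailedPolicy(nn.Module):
    def __init__(self, obs_dim=4, act_dim=2, df=2.0):
        super().__init__()
        self.mu = nn.Linear(obs_dim, act_dim)
        self.log_scale = nn.Parameter(torch.zeros(act_dim))
        self.df = df                                  # 自由度越小,尾部越重,越容易大步探索

    def dist(self, obs):
        return StudentT(self.df, loc=self.mu(obs), scale=self.log_scale.exp())

policy = HeavyTailedPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)

obs = torch.randn(64, 4)                              # 假设的一批状态
d = policy.dist(obs)
actions = d.sample()
returns = torch.randn(64, 1)                          # 占位回报,实际应由环境 rollout 得到

log_prob = d.log_prob(actions).sum(-1, keepdim=True)  # REINFORCE 损失:-E[R * log pi(a|s)]
loss = -(returns * log_prob).mean()
opt.zero_grad()
loss.backward()
opt.step()
```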

【48】 Robust Reinforcement Learning Under Minimax Regret for Green Security 标题:面向绿色安全的极小极大遗憾下的鲁棒强化学习

作者:Lily Xu,Andrew Perrault,Fei Fang,Haipeng Chen,Milind Tambe 备注:Accepted at the Conference on Uncertainty in Artificial Intelligence (UAI) 2021. 11 pages, 5 figures 链接:https://arxiv.org/abs/2106.08413 摘要:绿色安全领域的特点是防御者在偷猎者、非法伐木者和非法渔民的敌对行为不确定的情况下计划巡逻。重要的是,巡逻对敌方未来行为的威慑作用使得巡逻计划成为一个连续的决策问题。因此,本文主要研究基于minimax后悔准则的绿色安全鲁棒序贯巡逻规划问题。我们将问题描述为防御者和控制对抗行为参数值的自然人之间的博弈,并设计了一个算法镜像来寻找鲁棒策略。MIRROR使用了两个基于强化学习的oracle来解决一个受限的游戏,考虑了有限的防御策略和参数值。我们对真实世界的偷猎数据进行评估。 摘要:Green security domains feature defenders who plan patrols in the face of uncertainty about the adversarial behavior of poachers, illegal loggers, and illegal fishers. Importantly, the deterrence effect of patrols on adversaries' future behavior makes patrol planning a sequential decision-making problem. Therefore, we focus on robust sequential patrol planning for green security following the minimax regret criterion, which has not been considered in the literature. We formulate the problem as a game between the defender and nature who controls the parameter values of the adversarial behavior and design an algorithm MIRROR to find a robust policy. MIRROR uses two reinforcement learning-based oracles and solves a restricted game considering limited defender strategies and parameter values. We evaluate MIRROR on real-world poaching data.

【49】 Benchmark dataset of memes with text transcriptions for automatic detection of multi-modal misogynistic content 标题:具有文本转录的模因基准数据集,用于自动检测多模式厌女症内容

作者:Francesca Gasparini,Giulia Rizzi,Aurora Saibene,Elisabetta Fersini 链接:https://arxiv.org/abs/2106.08409 摘要:在本文中,我们提出了一个基准数据集作为一个项目的一部分,自动识别网上内容中的厌女症,其中特别侧重于模因。这里描述的基准由800个模因组成,这些模因来自最流行的社交媒体平台,如Facebook、Twitter、Instagram和Reddit,以及专门收集和创建模因的咨询网站。为了收集厌女迷因,考虑到对妇女仇恨的不同表现形式,如身体羞辱、陈规定型、客观化和暴力,以提及厌女内容的特定关键词作为搜索标准。同时,没有厌恶女性内容的模因也被从相同的网络资源中手动下载。在所有收集到的模因中,三位领域专家选择了一个800个模因的数据集,这些模因在厌女症和非厌女症之间平衡。该数据集已通过众包平台进行验证,涉及60名受试者的标签过程,以便为每个实例收集三个评估。专家和众包平台还收集了另外两个二元标签,用于评价厌恶女性的模因,涉及攻击性和讽刺性。最后,对于每一个模因,文本都被手动转录。因此,提供的数据集由800个模因、专家给出的标签和众包验证获得的标签以及转录文本组成。这些数据可以用来解决依靠文本和视觉线索自动检测网络上厌恶女性的内容的问题,面对网络性别歧视和技术促进的暴力等每天都在增长的现象。 摘要:In this paper we present a benchmark dataset generated as part of a project for automatic identification of misogyny within online content, which focuses in particular on memes. The benchmark here described is composed of 800 memes collected from the most popular social media platforms, such as Facebook, Twitter, Instagram and Reddit, and consulting websites dedicated to collection and creation of memes. To gather misogynistic memes, specific keywords that refer to misogynistic content have been considered as search criterion, considering different manifestations of hatred against women, such as body shaming, stereotyping, objectification and violence. In parallel, memes with no misogynist content have been manually downloaded from the same web sources. Among all the collected memes, three domain experts have selected a dataset of 800 memes equally balanced between misogynistic and non-misogynistic ones. This dataset has been validated through a crowdsourcing platform, involving 60 subjects for the labelling process, in order to collect three evaluations for each instance. Two further binary labels have been collected from both the experts and the crowdsourcing platform, for memes evaluated as misogynistic, concerning aggressiveness and irony. Finally for each meme, the text has been manually transcribed. The dataset provided is thus composed of the 800 memes, the labels given by the experts and those obtained by the crowdsourcing validation, and the transcribed texts. This data can be used to approach the problem of automatic detection of misogynistic content on the Web relying on both textual and visual cues, facing phenomenons that are growing every day such as cybersexism and technology-facilitated violence.

【50】 On the Objective Evaluation of Post Hoc Explainers 标题:论事后解释器的客观评估

作者:Zachariah Carmichael,Walter J. Scheirer 备注:14 pages, 4 figures. Under review 链接:https://arxiv.org/abs/2106.08376 摘要:数据驱动模型的许多应用要求决策的透明度,特别是在医疗、刑事司法和其他高风险环境中。机器学习研究的现代趋势导致算法越来越复杂,以至于被认为是黑匣子。为了减少决策的不透明性,人们提出了以人类可理解的方式分析此类模型内部工作的方法。这些事后技术被描述为是通用的解释者——能够忠实地用算法洞察力增强决策。不幸的是,关于什么是“好的”解释,人们几乎没有达成一致意见。此外,现有的解释性评价方法是从主观或代理手段中衍生出来的。在这项工作中,我们提出了一个框架,以评估事后解释的地面真相,这是直接来自一个模型的加法结构。通过对数千个合成任务和多个真实任务中流行的解释者进行评估,我们证明了该框架在理解解释者方面的有效性。该框架揭示了解释可能是准确的,但错误地认为个别特征的重要性。 摘要:Many applications of data-driven models demand transparency of decisions, especially in health care, criminal justice, and other high-stakes environments. Modern trends in machine learning research have led to algorithms that are increasingly intricate to the degree that they are considered to be black boxes. In an effort to reduce the opacity of decisions, methods have been proposed to construe the inner workings of such models in a human-comprehensible manner. These post hoc techniques are described as being universal explainers - capable of faithfully augmenting decisions with algorithmic insight. Unfortunately, there is little agreement about what constitutes a "good" explanation. Moreover, current methods of explanation evaluation are derived from either subjective or proxy means. In this work, we propose a framework for the evaluation of post hoc explainers on ground truth that is directly derived from the additive structure of a model. We demonstrate the efficacy of the framework in understanding explainers by evaluating popular explainers on thousands of synthetic and several real-world tasks. The framework unveils that explanations may be accurate but misattribute the importance of individual features.
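下面用一个极简的数值例子说明"从模型的可加结构直接导出解释的真值"这一思想:当模型是已知的线性(可加)模型时,每个特征的真实贡献就是 w_i * x_i,可以把任意事后解释器(此处用最简单的遮挡法代替)给出的归因与之对比来度量保真度;具体的保真度指标选择是此处为演示所做的假设。

```python
# 示意:当模型本身是可加的(此处用线性模型),每个特征的真实贡献 w_i * x_i 已知,
# 可把任意事后解释器的归因与之对比来度量保真度;这里用最简单的遮挡法充当解释器。
import numpy as np

rng = np.random.default_rng(0)
w = np.array([3.0, -2.0, 0.0, 1.0])            # 已知的可加模型:f(x) = w . x
f = lambda X: X @ w

x = rng.normal(size=4)
ground_truth = w * x                            # 可加结构直接给出的真实特征贡献

def occlusion_attribution(f, x, baseline=None):
    """把每个特征替换为基线值,用输出变化作为该特征的归因(常见的事后解释器之一)。"""
    baseline = np.zeros_like(x) if baseline is None else baseline
    attrs = np.empty_like(x)
    for i in range(len(x)):
        x_masked = x.copy()
        x_masked[i] = baseline[i]
        attrs[i] = f(x[None])[0] - f(x_masked[None])[0]
    return attrs

attrs = occlusion_attribution(f, x)
fidelity_err = np.abs(attrs - ground_truth).mean()   # 保真度指标的具体选择是此处的假设
print(ground_truth, attrs, fidelity_err)
```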

【51】 Rinascimento: searching the behaviour space of Splendor 标题:Rinascimento:搜索Splendor的行为空间

作者:Ivan Bravi,Simon Lucas 备注:11 pages, 6 figures 链接:https://arxiv.org/abs/2106.08371 摘要:与面向性能的游戏对弈相比,将人工智能(AI)用于游戏试玩测试(play-testing)在游戏AI的主要应用中仍处于边缘地位。试玩测试的主要目的之一是收集游戏过程的数据,突出游戏设计的优缺点,为游戏设计者改进设计提供有用的见解。使用AI代理有可能大大加快这一过程。本研究的目的是用一种通用方法来绘制游戏的行为空间(BSpace)。我们利用MAP-Elites算法搜索Rinascimento AI智能体的超参数空间,并将其映射到由若干行为度量定义的BSpace。该方法能够在Splendor的原始游戏设计及其两个变体中突出典型行为和退化行为。特别是,与基于经典分数奖励信号的代理相比,使用事件值函数通常能显著提高对BSpace的覆盖率。 摘要:The use of Artificial Intelligence (AI) for play-testing is still on the sidelines of main applications of AI in games compared to performance-oriented game-playing. One of the main purposes of play-testing a game is gathering data on the gameplay, highlighting good and bad features of the design of the game, providing useful insight to the game designers for improving the design. Using AI agents has the potential of speeding the process dramatically. The purpose of this research is to map the behavioural space (BSpace) of a game by using a general method. Using the MAP-Elites algorithm we search the hyperparameter space Rinascimento AI agents and map it to the BSpace defined by several behavioural metrics. This methodology was able to highlight both exemplary and degenerated behaviours in the original game design of Splendor and two variations. In particular, the use of event-value functions has generally shown a remarkable improvement in the coverage of the BSpace compared to agents based on classic score-based reward signals.
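下面是一个极简的MAP-Elites主循环示意:按离散化后的行为描述子把归档划分成网格,每个格子只保留表现最好的精英个体。其中的评估函数、行为描述子和超参数都是占位假设;在论文中它们分别对应运行Rinascimento智能体得到的性能与行为度量。

```python
# 极简 MAP-Elites 主循环:按离散化的行为描述子分格,每格只保留表现最好的精英。
# 评估函数、行为描述子与各超参数均为占位假设。
import random

def evaluate(params):
    """占位评估:返回 (性能, 二维行为描述子)。实际应运行若干局游戏并统计行为指标。"""
    perf = -sum((p - 0.5) ** 2 for p in params)
    return perf, (params[0], params[1])

def to_cell(behaviour, bins=10):
    return tuple(min(int(b * bins), bins - 1) for b in behaviour)

archive = {}                                      # cell -> (perf, params)
for _ in range(2000):
    if archive and random.random() < 0.9:         # 变异已有精英
        _, parent = random.choice(list(archive.values()))
        child = [min(max(p + random.gauss(0, 0.05), 0.0), 1.0) for p in parent]
    else:                                         # 随机采样新解
        child = [random.random() for _ in range(4)]
    perf, behaviour = evaluate(child)
    cell = to_cell(behaviour)
    if cell not in archive or perf > archive[cell][0]:
        archive[cell] = (perf, child)

print(f"覆盖的行为格数: {len(archive)} / 100")
```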

【52】 Predicting Unreliable Predictions by Shattering a Neural Network 标题:通过粉碎神经网络来预测不可靠的预测

作者:Xu Ji,Razvan Pascanu,Devon Hjelm,Andrea Vedaldi,Balaji Lakshminarayanan,Yoshua Bengio 链接:https://arxiv.org/abs/2106.08365 摘要:分段线性神经网络可以分为子函数,每个子函数都有自己的激活模式、域和经验误差。整个网络的经验误差可以写成子函数经验误差的期望。构造子函数经验误差的推广界表明,子函数在表示空间中被训练样本包围的越密集,其预测就越可靠。此外,它还表明,激活区较少的模型泛化效果更好,而抽象知识程度较高的模型泛化效果更好,其他条件都相同。我们不仅提出了一个推理子函数误差界的理论框架,而且提出了一种近似计算子函数误差界的实用方法,用于预测网络不能成功推广到哪些样本。我们测试了我们的方法对错误分类和分布外样本的检测,发现它在这两种情况下都有竞争力。简而言之,一些网络激活模式比其他模式具有更高的可靠性,并且可以使用子功能错误界限来识别这些模式。 摘要:Piecewise linear neural networks can be split into subfunctions, each with its own activation pattern, domain, and empirical error. Empirical error for the full network can be written as an expectation over empirical error of subfunctions. Constructing a generalization bound on subfunction empirical error indicates that the more densely a subfunction is surrounded by training samples in representation space, the more reliable its predictions are. Further, it suggests that models with fewer activation regions generalize better, and models that abstract knowledge to a greater degree generalize better, all else equal. We propose not only a theoretical framework to reason about subfunction error bounds but also a pragmatic way of approximately evaluating it, which we apply to predicting which samples the network will not successfully generalize to. We test our method on detection of misclassification and out-of-distribution samples, finding that it performs competitively in both cases. In short, some network activation patterns are associated with higher reliability than others, and these can be identified using subfunction error bounds.
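下面的小例子演示"子函数=激活模式"的划分方式:对一个ReLU网络,把隐藏层激活前符号模式相同的样本归为同一个线性子函数,并统计每个子函数上的经验误差;网络权重、数据和标签均为随机占位,仅用于说明概念。

```python
# 示意:对一个 ReLU 网络,激活模式相同的样本属于同一个线性子函数;
# 按激活模式分组并统计每组的经验误差。权重、数据与标签均为随机占位。
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 8)), rng.normal(size=8)
W2, b2 = rng.normal(size=(8, 1)), rng.normal(size=1)

X = rng.normal(size=(500, 2))
y = (X[:, :1] * X[:, 1:] > 0).astype(float)        # 占位的二分类标签

h_pre = X @ W1 + b1
pattern = h_pre > 0                                # 每个样本的 ReLU 激活模式(布尔向量)
logits = np.maximum(h_pre, 0) @ W2 + b2
err = ((logits > 0).astype(float) != y).astype(float).ravel()

groups = {}                                        # 激活模式 -> 样本下标列表
for i, p in enumerate(map(tuple, pattern)):
    groups.setdefault(p, []).append(i)

for p, idx in sorted(groups.items(), key=lambda kv: -len(kv[1]))[:5]:
    print(f"子函数样本数 {len(idx):3d}  经验误差 {err[idx].mean():.2f}")
```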

【53】 Surgical task expertise detected by a self-organizing neural network map 标题:用自组织神经网络映射检测外科任务专业知识

作者:Birgitta Dresp-Langley,Rongrong Liu,John M. Wandeto 备注:Conference on Automation in Medical Engineering AUTOMED21, University Hospital Basel, Switzerland, 2021, June 8-9 链接:https://arxiv.org/abs/2106.08995 摘要:使用为内窥镜手术设计的机器人控制装置对专家和新手的双手仿真器任务性能的个人握力分析允许定义基准标准,以区分新手或实习外科医生的技能和真正的专家任务技能。在一个真正的专家和一个执行机器人辅助手术模拟器任务的新手中,握力的变异性揭示了作为任务专长函数的统计显著性差异。在这里,我们展示了通过自组织神经网络图(SOM)的输出度量来预测局部握力的技能特异性差异,该自组织神经网络图(SOM)具有生物启发的功能结构,映射了灵长类动物大脑中体感神经网络的功能连接。 摘要:Individual grip force profiling of bimanual simulator task performance of experts and novices using a robotic control device designed for endoscopic surgery permits defining benchmark criteria that tell true expert task skills from the skills of novices or trainee surgeons. Grip force variability in a true expert and a complete novice executing a robot assisted surgical simulator task reveal statistically significant differences as a function of task expertise. Here we show that the skill specific differences in local grip forces are predicted by the output metric of a Self Organizing neural network Map (SOM) with a bio inspired functional architecture that maps the functional connectivity of somatosensory neural networks in the primate brain.
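下面给出一个极简的自组织映射(SOM)训练循环示意,网格大小、学习率与邻域宽度的衰减方式以及输入数据都是演示用的假设;论文中输入应为握力特征,并以SOM的输出度量(例如量化误差)区分专家与新手。

```python
# 极简自组织映射(SOM)训练循环:网格大小、学习率、邻域宽度与输入数据均为演示用的假设。
import numpy as np

rng = np.random.default_rng(0)
grid_h, grid_w, dim = 8, 8, 3                       # 8x8 网格,3 维输入(例如三路握力采样)
weights = rng.random((grid_h, grid_w, dim))
coords = np.stack(np.meshgrid(np.arange(grid_h), np.arange(grid_w), indexing="ij"), axis=-1)

data = rng.random((2000, dim))                      # 占位数据,实际应为握力特征向量

for t, x in enumerate(data):
    lr = 0.5 * np.exp(-t / 1000)                    # 学习率与邻域宽度随时间衰减
    sigma = 3.0 * np.exp(-t / 1000)
    dists = np.linalg.norm(weights - x, axis=-1)
    bmu = np.unravel_index(dists.argmin(), dists.shape)      # 最佳匹配单元
    grid_dist2 = ((coords - np.array(bmu)) ** 2).sum(-1)
    h = np.exp(-grid_dist2 / (2 * sigma ** 2))[..., None]    # 高斯邻域函数
    weights += lr * h * (x - weights)

# 量化误差可作为类似原文"输出度量"的一种指标
qe = np.mean([np.linalg.norm(weights - x, axis=-1).min() for x in data])
print("quantization error:", qe)
```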

【54】 Nonequilibrium thermodynamics of self-supervised learning 标题:自我监督学习的非平衡热力学

作者:Domingos S. P. Salazar 备注:6 pages, 1 figure 链接:https://arxiv.org/abs/2106.08981 摘要:基于能量模型的自监督学习(SSL)与平衡热力学有着直观的联系,因为softmax层将能量映射到概率,是Gibbs分布。然而,SSL以何种方式是热力学过程?我们证明了一些SSL范式表现为一个热力学复合系统,由表示和自标记与非平衡储层接触而形成。此外,该系统还受到绝热膨胀和等容加热等常规热力学循环的影响,产生了广义吉布斯系综。在这幅图中,我们展示了学习被视为一个恶魔,它通过反馈测量来从系统中提取负功。作为应用程序,我们研究了一些使用这种思想的SSL算法。 摘要:Self-supervised learning (SSL) of energy based models has an intuitive relation to equilibrium thermodynamics because the softmax layer, mapping energies to probabilities, is a Gibbs distribution. However, in what way SSL is a thermodynamic process? We show that some SSL paradigms behave as a thermodynamic composite system formed by representations and self-labels in contact with a nonequilibrium reservoir. Moreover, this system is subjected to usual thermodynamic cycles, such as adiabatic expansion and isochoric heating, resulting in a generalized Gibbs ensemble (GGE). In this picture, we show that learning is seen as a demon that operates in cycles using feedback measurements to extract negative work from the system. As applications, we examine some SSL algorithms using this idea.
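下面用几行代码演示文中提到的对应关系:softmax层把能量映射为概率,形式上就是温度为T的吉布斯分布 p_i = exp(-E_i/T)/Z;能量取值和温度均为任意示例,高温极限下分布趋于均匀(熵增大),可类比文中的"加热"操作。

```python
# 数值演示:softmax 把能量映射为概率,即温度为 T 的吉布斯分布 p_i = exp(-E_i/T) / Z。
# 能量取值与温度均为任意示例。
import numpy as np

def gibbs(energies, T=1.0):
    z = np.exp(-np.asarray(energies) / T)
    return z / z.sum()

E = [1.0, 2.0, 4.0]
print(gibbs(E, T=1.0))    # 低温:概率集中在低能量类别
print(gibbs(E, T=10.0))   # 高温:分布趋于均匀,熵增大
```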

【55】 Directed Graph Embeddings in Pseudo-Riemannian Manifolds 标题:伪黎曼流形中的有向图嵌入

作者:Aaron Sim,Maciej Wiatrak,Angus Brayne,Páidí Creed,Saee Paliwal 备注:Accepted at ICML 2021 链接:https://arxiv.org/abs/2106.08678 摘要:图表示学习算法的归纳偏差通常被编码在其嵌入空间的背景几何中。在本文中,我们证明了一般有向图可以由一个嵌入模型有效地表示,该嵌入模型由三个部分组成:伪黎曼度量结构、非平凡全局拓扑和一个在嵌入空间中显式地包含一个优先方向的唯一似然函数。通过将该方法应用于自然语言应用和生物学中一系列合成的和真实的有向图的链接预测任务,证明了该方法的表示能力。特别地,我们证明了低维圆柱Minkowski和anti-de-Sitter时空可以产生与高维曲黎曼流形相等或更好的图表示。 摘要:The inductive biases of graph representation learning algorithms are often encoded in the background geometry of their embedding space. In this paper, we show that general directed graphs can be effectively represented by an embedding model that combines three components: a pseudo-Riemannian metric structure, a non-trivial global topology, and a unique likelihood function that explicitly incorporates a preferred direction in embedding space. We demonstrate the representational capabilities of this method by applying it to the task of link prediction on a series of synthetic and real directed graphs from natural language applications and biology. In particular, we show that low-dimensional cylindrical Minkowski and anti-de Sitter spacetimes can produce equal or better graph representations than curved Riemannian manifolds of higher dimensions.

【56】 Lorenz System State Stability Identification using Neural Networks 标题:基于神经网络的Lorenz系统状态稳定性辨识

作者:Megha Subramanian,Ramakrishna Tipireddy,Samrat Chatterjee 链接:https://arxiv.org/abs/2106.08489 摘要:非线性动力学系统,如Lorenz63方程,在本质上是混沌的,对初始条件敏感。结果,初始条件中的一个小扰动在几个时间步之后导致状态轨迹的偏差。精确识别系统状态所需的算法和计算资源取决于解是否处于过渡区。我们把过渡区和非过渡区分别称为不稳定区和稳定区。我们将一个系统状态标记为稳定的,如果它的最近过去和未来状态位于同一个系统中。然而,在给定的时间步长下,我们不知道系统是处于稳定区域还是不稳定区域。本文提出并训练一个前馈(多层感知器)神经网络,将Lorenz系统的状态分为稳定和不稳定两类。我们把这个任务作为一个有监督学习问题,在Lorenz系统上训练神经网络,该系统的状态被标记为稳定或不稳定。然后,我们测试神经网络模型的能力,以确定稳定和不稳定的状态对一个不同的洛伦兹系统,这是产生不同的初始条件。我们还评估了不匹配情况下的分类性能,即当训练和验证数据的初始条件从不同的时间间隔取样时。我们证明了某些规范化方案可以极大地提高神经网络的性能,特别是在这些不匹配的情况下。本文提出的分类框架可以作为更大范围的序贯决策框架的预处理器,在序贯决策框架中,基于观测到的稳定或不稳定状态进行决策。 摘要:Nonlinear dynamical systems such as Lorenz63 equations are known to be chaotic in nature and sensitive to initial conditions. As a result, a small perturbation in the initial conditions results in deviation in state trajectory after a few time steps. The algorithms and computational resources needed to accurately identify the system states vary depending on whether the solution is in transition region or not. We refer to the transition and non-transition regions as unstable and stable regions respectively. We label a system state to be stable if it's immediate past and future states reside in the same regime. However, at a given time step we don't have the prior knowledge about whether system is in stable or unstable region. In this paper, we develop and train a feed forward (multi-layer perceptron) Neural Network to classify the system states of a Lorenz system as stable and unstable. We pose this task as a supervised learning problem where we train the neural network on Lorenz system which have states labeled as stable or unstable. We then test the ability of the neural network models to identify the stable and unstable states on a different Lorenz system that is generated using different initial conditions. We also evaluate the classification performance in the mismatched case i.e., when the initial conditions for training and validation data are sampled from different intervals. We show that certain normalization schemes can greatly improve the performance of neural networks in especially these mismatched scenarios. The classification framework developed in the paper can be a preprocessor for a larger context of sequential decision making framework where the decision making is performed based on observed stable or unstable states.
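下面是一个数据生成流程的示意:用scipy积分Lorenz63轨迹,并用"前后相邻状态是否与当前状态处于同一侧翼(以x的符号区分)"这一简化启发式打上稳定/不稳定标签,得到可直接送入MLP的监督数据。这里的标注规则是对论文描述的简化假设,积分区间与步长也是任意选择。

```python
# 示意:积分 Lorenz63 轨迹,并用"前后相邻状态是否与当前状态处于同一侧翼(x 的符号)"
# 这一简化启发式打稳定/不稳定标签;标注规则是对论文描述的简化假设。
import numpy as np
from scipy.integrate import solve_ivp

def lorenz(t, s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = s
    return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]

sol = solve_ivp(lorenz, (0, 40), [1.0, 1.0, 1.0], t_eval=np.arange(0, 40, 0.01))
states = sol.y.T                                   # (N, 3) 的状态轨迹

regime = np.sign(states[:, 0])                     # 以 x 的符号粗略区分吸引子的两翼
stable = (regime[1:-1] == regime[:-2]) & (regime[1:-1] == regime[2:])
X, y = states[1:-1], stable.astype(int)            # 可直接作为 MLP 的监督训练数据
print("稳定状态比例:", y.mean())
```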

【57】 Model Predictive Control with and without Terminal Weight: Stability and Algorithms 标题:有终端权和无终端权的模型预测控制:稳定性和算法

作者:Wen-Hua Chen 链接:https://arxiv.org/abs/2011.14193 摘要:本文提出了一种用于模型预测控制(MPC)的稳定性分析工具。有限视界无终端权的MPC稳定性分析是一个长期存在的开放性问题。利用修正值函数作为Lyapunov函数的候选函数,利用最优性原理,建立了这类广泛应用的MPC算法的稳定性条件。提出了一种新的无终端权的稳定保证MPC算法。通过设计一个新的由提前一步代价值函数定义的子级集,给出了检验该算法递归可行性和稳定性的条件。新的稳定性条件和导出的MPC克服了现有基于终端权值的MPC框架中存在的困难,包括需要寻找合适的终端权值和可能由于终端权值不合适而导致的性能下降。这项工作进一步扩展到MPC与终端重量的完整性。数值算例表明了该方法的有效性,而现有的稳定性分析方法要么不适用,要么结果相当保守。结果表明,所提出的工具提供了许多实现稳定性的机制:调整状态和/或控制权重、延长视界长度以及在优化过程中对第一或第二状态添加简单的额外约束。 摘要:This paper presents stability analysis tools for model predictive control (MPC) with and without terminal weight. Stability analysis of MPC with a limited horizon but without terminal weight is a long-standing open problem. By using a modified value function as an Lyapunov function candidate and the principle of optimality, this paper establishes stability conditions for this type of widely spread MPC algorithms. A new stability guaranteed MPC algorithm without terminal weight (MPCS) is presented. With the help of designing a new sublevel set defined by the value function of one-step ahead stage cost, conditions for checking its recursive feasibility and stability of the proposed MPC algorithm are presented. The new stability condition and the derived MPCS overcome the difficulties arising in the existing terminal weight based MPC framework, including the need of searching a suitable terminal weight and possible poor performance caused by an inappropriate terminal weight. This work is further extended to MPC with a terminal weight for the completeness. Numerical examples are presented to demonstrate the effectiveness of the proposed tool, whereas the existing stability analysis tools are either not applicable or lead to quite conservative results. It shows that the proposed tools offer a number of mechanisms to achieve stability: adjusting state and/or control weights, extending the length of horizon, and adding a simple extra constraint on the first or second state in the optimisation.
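下面给出一个不带终端权重的有限时域MPC的极简数值示意:对一个离散线性系统,每一步在预测时域内最小化阶段代价之和(不含终端项),只执行首个控制量后滚动前移。系统矩阵、权重、时域长度和求解器选择均为演示用的假设,与论文提出的稳定性分析工具没有直接对应。

```python
# 极简示意:离散线性系统上不带终端权重的有限时域 MPC(滚动优化)。
# 系统矩阵、权重、时域与求解器均为演示用的假设。
import numpy as np
from scipy.optimize import minimize

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Q, R, N = np.diag([1.0, 0.1]), np.array([[0.01]]), 15     # 阶段代价权重与预测时域

def rollout_cost(u_seq, x0):
    """J = sum_{k=0}^{N-1} (x_k^T Q x_k + u_k^T R u_k),不含终端权重项。"""
    x, cost = x0.copy(), 0.0
    for u in u_seq.reshape(N, 1):
        cost += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u
    return cost

x = np.array([2.0, 0.0])
for step in range(60):                                    # 滚动时域:每步只执行首个控制量
    res = minimize(rollout_cost, np.zeros(N), args=(x,), method="L-BFGS-B")
    x = A @ x + B @ np.array([res.x[0]])
print("终端状态:", x)
```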
