
人工智能学术速递[12.10]

公众号-arXiv每日学术速递
发布2021-12-10 17:02:47


cs.AI人工智能,共计36篇

【1】 PTR: A Benchmark for Part-based Conceptual, Relational, and Physical Reasoning 标题:PTR:基于部件的概念推理、关系推理与物理推理基准 链接:https://arxiv.org/abs/2112.05136

作者:Yining Hong,Li Yi,Joshua B. Tenenbaum,Antonio Torralba,Chuang Gan 机构:UCLA, Stanford University, MIT BCS, CBMM, CSAIL, MIT CSAIL, MIT-IBM Watson AI Lab 备注:NeurIPS 2021. Project page: this http URL 摘要:人类视觉感知的一个关键能力,是将视觉场景解析为单个物体、并进一步解析为物体部件,从而形成部分-整体层次结构。这种复合结构可以归纳出丰富的语义概念和关系,从而在视觉信号的解释和组织以及视觉感知和推理的泛化中发挥重要作用。然而,现有的视觉推理基准主要关注物体而非部件。由于概念更细粒度、几何关系更丰富、物理更复杂,基于完整部分-整体层次结构的视觉推理比以物体为中心的推理更具挑战性。因此,为了更好地支持基于部件的概念、关系和物理推理,我们引入了一个新的大规模诊断性视觉推理数据集PTR。PTR包含约70k张RGBD合成图像,带有物体级和部件级的真值标注,涵盖语义实例分割、颜色属性、空间与几何关系以及某些物理属性(如稳定性)。这些图像与700k个机器生成的问题配对,涵盖多种推理类型,使其成为视觉推理模型的良好测试平台。我们在该数据集上考察了几种最先进的视觉推理模型,并观察到在人类可以轻松推断出正确答案的情况下,它们仍会犯许多令人惊讶的错误。我们相信该数据集将为基于部件的推理开辟新的机会。 摘要:A critical aspect of human visual perception is the ability to parse visual scenes into individual objects and further into object parts, forming part-whole hierarchies. Such composite structures could induce a rich set of semantic concepts and relations, thus playing an important role in the interpretation and organization of visual signals as well as for the generalization of visual perception and reasoning. However, existing visual reasoning benchmarks mostly focus on objects rather than parts. Visual reasoning based on the full part-whole hierarchy is much more challenging than object-centric reasoning due to finer-grained concepts, richer geometry relations, and more complex physics. Therefore, to better serve for part-based conceptual, relational and physical reasoning, we introduce a new large-scale diagnostic visual reasoning dataset named PTR. PTR contains around 70k RGBD synthetic images with ground truth object and part level annotations regarding semantic instance segmentation, color attributes, spatial and geometric relationships, and certain physical properties such as stability. These images are paired with 700k machine-generated questions covering various types of reasoning types, making them a good testbed for visual reasoning models. 
We examine several state-of-the-art visual reasoning models on this dataset and observe that they still make many surprising mistakes in situations where humans can easily infer the correct answer. We believe this dataset will open up new opportunities for part-based reasoning.

【2】 Neural Descriptor Fields: SE(3)-Equivariant Object Representations for Manipulation 标题:神经描述符场:用于操作的SE(3)-等变对象表示 链接:https://arxiv.org/abs/2112.05124

作者:Anthony Simeonov,Yilun Du,Andrea Tagliasacchi,Joshua B. Tenenbaum,Alberto Rodriguez,Pulkit Agrawal,Vincent Sitzmann 机构:Massachusetts Institute of Technology, Google Research, University of Toronto 备注:Website: this https URL First two authors contributed equally (order determined by coin flip), last two authors equal advising 摘要:我们提出了神经描述符场(NDF),一种通过类别级描述符对物体与目标(如机器人夹爪或用于悬挂的挂架)之间的点和相对姿势进行编码的对象表示。我们将此表示用于物体操作:给定一次任务演示,我们希望在同一类别的新物体实例上重复相同的任务。我们提出通过搜索(经由优化)其描述符与演示中观察到的描述符相匹配的姿势来实现这一目标。NDF通过不依赖专家标注关键点的3D自编码任务,以自监督方式方便地进行训练。此外,NDF是SE(3)-等变的,保证其性能能够泛化到所有可能的3D物体平移和旋转。我们在仿真和真实机器人上演示了从少量(5-10次)演示中学习操作任务。我们的性能可泛化到不同的物体实例和6自由度物体姿势,并显著优于最近一个依赖2D描述符的基线。项目网站:https://yilundu.github.io/ndf/。 摘要:We present Neural Descriptor Fields (NDFs), an object representation that encodes both points and relative poses between an object and a target (such as a robot gripper or a rack used for hanging) via category-level descriptors. We employ this representation for object manipulation, where given a task demonstration, we want to repeat the same task on a new object instance from the same category. We propose to achieve this objective by searching (via optimization) for the pose whose descriptor matches that observed in the demonstration. NDFs are conveniently trained in a self-supervised fashion via a 3D auto-encoding task that does not rely on expert-labeled keypoints. Further, NDFs are SE(3)-equivariant, guaranteeing performance that generalizes across all possible 3D object translations and rotations. We demonstrate learning of manipulation tasks from few (5-10) demonstrations both in simulation and on a real robot. Our performance generalizes across both object instances and 6-DoF object poses, and significantly outperforms a recent baseline that relies on 2D descriptors. 
Project website: https://yilundu.github.io/ndf/.
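NDF通过优化求解描述符与演示相匹配的姿势。下面用一个极简的二维平移网格搜索示意这一思路;纯属假设性示例,`descriptor`用固定的随机线性投影代替真实的神经描述符场,并非论文实现(论文在SE(3)上做连续优化):

```python
import numpy as np

def descriptor(points):
    """神经描述符场的占位实现:此处仅为固定的随机线性投影(假设性示例)。"""
    rng = np.random.default_rng(0)
    W = rng.standard_normal((2, 8))
    return points @ W

def find_pose(demo_desc, query_pts, candidates):
    """在候选平移中搜索使描述符与演示中观察到的描述符最匹配的姿势。"""
    best_t, best_err = None, np.inf
    for t in candidates:
        err = np.linalg.norm(descriptor(query_pts + t) - demo_desc)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

query = np.array([[0.0, 0.0], [1.0, 0.0]])
true_t = np.array([0.5, -0.25])
demo_desc = descriptor(query + true_t)  # 演示中记录下的描述符
grid = [np.array([x, y]) for x in np.linspace(-1, 1, 9) for y in np.linspace(-1, 1, 9)]
best = find_pose(demo_desc, query, grid)  # 应当恢复出 true_t
```

这里只示意"以描述符匹配定义姿势目标"的思想;真实系统在完整的SE(3)姿势空间上用梯度优化而非网格搜索。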

【3】 Extending the WILDS Benchmark for Unsupervised Adaptation 标题:扩展用于无监督适应的WILDS基准 链接:https://arxiv.org/abs/2112.05090

作者:Shiori Sagawa,Pang Wei Koh,Tony Lee,Irena Gao,Sang Michael Xie,Kendrick Shen,Ananya Kumar,Weihua Hu,Michihiro Yasunaga,Henrik Marklund,Sara Beery,Etienne David,Ian Stavness,Wei Guo,Jure Leskovec,Kate Saenko,Tatsunori Hashimoto,Sergey Levine,Chelsea Finn,Percy Liang

【4】 Locally Shifted Attention With Early Global Integration 标题:具有早期全局整合的局部平移注意力 链接:https://arxiv.org/abs/2112.05080

作者:Shelly Sheynin,Sagie Benaim,Adam Polyak,Lior Wolf 机构:Tel Aviv University, Facebook AI Research

【5】 Learning Transferable Motor Skills with Hierarchical Latent Mixture Policies 标题:用分层潜在混合策略学习可迁移运动技能 链接:https://arxiv.org/abs/2112.05062

作者:Dushyant Rao,Fereshteh Sadeghi,Leonard Hasenclever,Markus Wulfmeier,Martina Zambelli,Giulia Vezzani,Dhruva Tirumala,Yusuf Aytar,Josh Merel,Nicolas Heess,Raia Hadsell 机构:DeepMind, London, UK 摘要:对于在现实世界中运行的机器人,需要学习可有效迁移并适应多种任务和场景的可重用行为。我们提出了一种利用层次混合潜变量模型从数据中学习抽象运动技能的方法。与现有工作相比,我们的方法利用离散和连续潜变量的三级层次结构,在捕获一组高级行为的同时,允许其执行方式存在差异。我们在操作领域证明,该方法可以有效地将离线数据聚类为不同的可执行行为,同时保留连续潜变量模型的灵活性。所得技能可以在新任务、未见过的物体上迁移和微调,并可从基于状态的策略迁移到基于视觉的策略,与现有的基于技能和基于模仿的方法相比,具有更好的样本效率和渐近性能。我们进一步分析了技能如何以及何时最有益:它们鼓励定向探索,以覆盖与任务相关的状态空间的大片区域,因此在具有挑战性的稀疏奖励设置中最为有效。 摘要:For robots operating in the real world, it is desirable to learn reusable behaviours that can effectively be transferred and adapted to numerous tasks and scenarios. We propose an approach to learn abstract motor skills from data using a hierarchical mixture latent variable model. In contrast to existing work, our method exploits a three-level hierarchy of both discrete and continuous latent variables, to capture a set of high-level behaviours while allowing for variance in how they are executed. We demonstrate in manipulation domains that the method can effectively cluster offline data into distinct, executable behaviours, while retaining the flexibility of a continuous latent variable model. The resulting skills can be transferred and fine-tuned on new tasks, unseen objects, and from state to vision-based policies, yielding better sample efficiency and asymptotic performance compared to existing skill- and imitation-based methods. We further analyse how and when the skills are most beneficial: they encourage directed exploration to cover large regions of the state space relevant to the task, making them most effective in challenging sparse-reward settings.

【6】 End-to-End Learning of Joint Geometric and Probabilistic Constellation Shaping 标题:联合几何和概率星座成形的端到端学习 链接:https://arxiv.org/abs/2112.05050

作者:Vahid Aref,Mathieu Chagnon 机构:Nokia, Magirusstr. , Stuttgart, Germany 备注:Will be presented at OFC 2022 (invited talk)

【7】 Wikidated 1.0: An Evolving Knowledge Graph Dataset of Wikidata's Revision History 标题:Wikidated 1.0:维基数据修订历史的演化知识图数据集 链接:https://arxiv.org/abs/2112.05003

作者:Lukas Schmelzeisen,Corina Dima,Steffen Staab 机构: University of Stuttgart, Germany, University of Southampton, United Kingdom 备注:None 摘要:Wikidata是目前公开提供的最大的通用知识库。它由数千名志愿者编辑合作编辑,因此自2012年成立以来有了很大的发展。在本文中,我们介绍了Wikidated 1.0,这是Wikidata完整修订历史的数据集,它将Wikidata修订之间的更改编码为RDF三元组的删除和添加集。据我们所知,它构成了演化知识图的第一个大型数据集,这是语义Web社区中最近出现的一个研究课题。我们介绍了从Wikidata转储生成Wikidated 1.0的方法,讨论了其实现和局限性,并展示了数据集的统计特征。 摘要:Wikidata is the largest general-interest knowledge base that is openly available. It is collaboratively edited by thousands of volunteer editors and has thus evolved considerably since its inception in 2012. In this paper, we present Wikidated 1.0, a dataset of Wikidata's full revision history, which encodes changes between Wikidata revisions as sets of deletions and additions of RDF triples. To the best of our knowledge, it constitutes the first large dataset of an evolving knowledge graph, a recently emerging research subject in the Semantic Web community. We introduce the methodology for generating Wikidated 1.0 from dumps of Wikidata, discuss its implementation and limitations, and present statistical characteristics of the dataset.
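Wikidated将相邻两个修订版本之间的变化编码为RDF三元组的删除集与新增集。这一核心思想可以用如下极简代码示意(示例三元组与函数名均为假设,并非数据集的真实序列化格式):

```python
def revision_delta(old_triples, new_triples):
    """将两个修订版本之间的变化编码为被删除与新增的三元组集合。"""
    old, new = set(old_triples), set(new_triples)
    return {"deleted": old - new, "added": new - old}

# 两个假想的修订版本,三元组形如 (主语, 谓词, 宾语)
r1 = {("Q42", "P31", "Q5"), ("Q42", "P106", "Q36180")}
r2 = {("Q42", "P31", "Q5"), ("Q42", "P106", "Q214917")}
delta = revision_delta(r1, r2)
# delta["deleted"] == {("Q42", "P106", "Q36180")}
# delta["added"]   == {("Q42", "P106", "Q214917")}
```

以增量(删除集+新增集)而非整图快照存储修订历史,使得任意版本都可以由初始版本加一串增量重放得到。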

【8】 Smart Support for Mission Success 标题:为任务成功提供智能支持 链接:https://arxiv.org/abs/2112.04957

作者:Juliette Mattioli,Pierre-Olivier Robic 机构: Thales, France, Thales Global Services, France 备注:8 pages, 2 figures 摘要:今天的战场环境复杂、动态且不确定,需要高效的保障以确保任务成功。这依赖于适当的保障战略,以提供能够完成任务的受保障装备。在系统和组织都很复杂的国防环境中,采用整体方法本身就具有挑战性,部队和保障机构需要依赖高效的决策支持系统。后勤、战备和可持续性是资产管理的关键因素,而资产管理可受益于人工智能,以达到"智能服务"水平,尤其依赖于预测性和规定性方法以及对作战资源的有效管理。智能保障能力可以通过适当的指标进行监控,并通过多准则决策支持和知识管理系统加以改进。根据信息和目标方面的作战环境,不同的人工智能范式(数据驱动的AI、基于知识的AI)都适用,甚至可以通过混合AI加以组合。 摘要:Today's battlefield environment is complex, dynamic and uncertain, and requires efficient support to ensure mission success. This relies on a proper support strategy to provide supported equipment able to fulfill the mission. In the context of defense where both systems and organization are complex, having a holistic approach is challenging by nature, forces and support agencies need to rely on an efficient decision support system. Logistics, readiness and sustainability are critical factors for asset management, which can benefit from AI to reach "Smart In Service" level relying especially on predictive and prescriptive approaches and on effective management of operational resources. Smart Support capacities can be then monitored by appropriate metrics and improved by multi-criteria decision support and knowledge management system. Depending on the operational context in terms of information and the objective, different AI paradigms (data-driven AI, knowledge-based AI) are suitable even a combination through hybrid AI.

【9】 Machine Learning for Utility Prediction in Argument-Based Computational Persuasion 标题:基于论证的计算说服中效用预测的机器学习 链接:https://arxiv.org/abs/2112.04953

作者:Ivan Donadello,Anthony Hunter,Stefano Teso,Mauro Dragoni 机构: Free University of Bozen-Bolzano, Italy, University College London, United Kingdom, Fondazione Bruno Kessler, Italy, University of Trento, Italy 摘要:自动说服系统(APS)旨在通过对话说服用户相信某件事,对话中会交换论证和反驳。为了最大限度地提高APS成功说服用户的可能性,它可以确定一个全局策略,使APS无论用户提出什么论证,都能在对话的每个阶段选择最佳论证。然而,在医疗保健等实际应用中,对话结果的效用对APS和用户来说不太可能相同,也不太可能恰好相反。为了应对这种情况,双方(Bi-party)决策理论利用扩展式博弈进行论证。这带来了我们在本文中要解决的新问题:(1)我们如何使用机器学习(ML)方法来预测不同用户亚群体的效用函数?(2)我们如何从已学得的效用函数中为新用户确定最佳的一个?为此,我们开发了两种ML方法EAI和EDS,它们利用来自用户的信息来预测其效用。EAI仅限于固定数量的信息,而EDS可以选择最能识别用户亚群体的信息。我们在仿真环境和一个关于健康饮食习惯的现实案例研究中评估EAI和EDS。两种情况下的结果都很有希望,但EDS在预测有用的效用函数方面更有效。 摘要:Automated persuasion systems (APS) aim to persuade a user to believe something by entering into a dialogue in which arguments and counterarguments are exchanged. To maximize the probability that an APS is successful in persuading a user, it can identify a global policy that will allow it to select the best arguments it presents at each stage of the dialogue whatever arguments the user presents. However, in real applications, such as for healthcare, it is unlikely the utility of the outcome of the dialogue will be the same, or the exact opposite, for the APS and user. In order to deal with this situation, games in extended form have been harnessed for argumentation in Bi-party Decision Theory. This opens new problems that we address in this paper: (1) How can we use Machine Learning (ML) methods to predict utility functions for different subpopulations of users? and (2) How can we identify for a new user the best utility function from amongst those that we have learned? To this extent, we develop two ML methods, EAI and EDS, that leverage information coming from the users to predict their utilities. EAI is restricted to a fixed amount of information, whereas EDS can choose the information that best detects the subpopulations of a user. 
We evaluate EAI and EDS in a simulation setting and in a realistic case study concerning healthy eating habits. Results are promising in both cases, but EDS is more effective at predicting useful utility functions.

【10】 DVHN: A Deep Hashing Framework for Large-scale Vehicle Re-identification 标题:DVHN:一种面向大规模车辆重识别的深度哈希框架 链接:https://arxiv.org/abs/2112.04937

作者:Yongbiao Chen,Sheng Zhang,Fangxin Liu,Chenggang Wu,Kaicheng Guo,Zhengwei Qi 机构:∗School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China, †University of Southern California, Los Angeles, USA 摘要:在本文中,我们首次尝试研究深度哈希学习与车辆重识别的结合。我们提出了一个基于深度哈希的车辆重识别框架,称为DVHN,该框架在保持最近邻搜索精度的同时,大幅减少内存使用并提高检索效率。具体地说,DVHN通过联合优化特征学习网络和哈希码生成模块,直接为每幅图像学习离散紧凑的二进制哈希码。我们直接将卷积神经网络的输出约束为离散二进制码,并确保所学的二进制码对于分类是最优的。为了优化该深度离散哈希框架,我们进一步提出了一种交替最小化方法来学习保持相似性的二进制哈希码。在两个被广泛研究的车辆重识别数据集VehicleID和VeRi上的大量实验证明了我们的方法相对于最先进的深度哈希方法的优越性。2048位的DVHN在VehicleID(800)数据集上,mAP和Rank@1分别可取得13.94%和10.21%的精度提升;在VeRi数据集上,Rank@1和mAP分别获得35.45%和32.72%的性能提升。 摘要:In this paper, we make the very first attempt to investigate the integration of deep hash learning with vehicle re-identification. We propose a deep hash-based vehicle re-identification framework, dubbed DVHN, which substantially reduces memory usage and promotes retrieval efficiency while reserving nearest neighbor search accuracy. Concretely, DVHN directly learns discrete compact binary hash codes for each image by jointly optimizing the feature learning network and the hash code generating module. Specifically, we directly constrain the output from the convolutional neural network to be discrete binary codes and ensure the learned binary codes are optimal for classification. To optimize the deep discrete hashing framework, we further propose an alternating minimization method for learning binary similarity-preserved hashing codes. Extensive experiments on two widely-studied vehicle re-identification datasets, VehicleID and VeRi, have demonstrated the superiority of our method against the state-of-the-art deep hash methods. DVHN of 2048 bits can achieve 13.94% and 10.21% accuracy improvement in terms of mAP and Rank@1 for the VehicleID (800) dataset. 
For VeRi, we achieve 35.45% and 32.72% performance gains for Rank@1 and mAP, respectively.
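深度哈希检索的通用套路是:将网络输出经符号函数二值化为哈希码,再按汉明距离排序完成检索。以下为该通用机制的示意代码(仅说明哈希检索本身,并非DVHN的训练流程,其离散约束与交替最小化此处从略):

```python
import numpy as np

def binarize(features):
    """按符号将实值特征二值化为 {-1, +1} 哈希码。"""
    return np.where(np.asarray(features) >= 0, 1, -1)

def hamming_rank(query_code, gallery_codes):
    """按与查询码的汉明距离对图库条目从近到远排序。"""
    dists = np.sum(query_code != gallery_codes, axis=1)
    return np.argsort(dists, kind="stable")

gallery = binarize([[0.3, -1.2, 0.5, 0.1],
                    [-0.4, 0.2, -0.7, 0.9],
                    [1.1, 0.8, -0.2, -0.5]])
query = binarize([1.0, 0.9, -0.1, -0.3])
ranking = hamming_rank(query, gallery)  # 第 2 号条目的码与查询码完全一致,排在首位
```

二进制码上的汉明距离可用位运算高效计算,这正是哈希检索相对实值最近邻搜索节省内存与算力的来源。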

【11】 JueWu-MC: Playing Minecraft with Sample-efficient Hierarchical Reinforcement Learning 标题:JueWu-MC:用样本高效的分层强化学习玩《我的世界》 链接:https://arxiv.org/abs/2112.04907

作者:Zichuan Lin,Junyou Li,Jianing Shi,Deheng Ye,Qiang Fu,Wei Yang 机构:Tencent AI Lab, Shenzhen, China 备注:The champion solution of NeurIPS 2021 MineRL research competition ( this https URL ) 摘要:由于部分可观察性、高维视觉感知和延迟奖励的复合挑战,在Minecraft等开放世界游戏中学习理性行为仍然是强化学习(RL)研究的难题。为了解决这个问题,我们提出了JueWu-MC,一种样本高效的分层RL方法,配备表征学习和模仿学习来处理感知和探索。具体地说,我们的方法包括两级层次结构:高级控制器学习控制选项(option)的策略,低级工作器学习解决每个子任务。为了促进子任务的学习,我们提出了一组技术的组合,包括1)捕捉动作和表征之间潜在关系的动作感知表征学习,2)用于高效探索的基于判别器的自我模仿学习,3)集成行为克隆与一致性过滤,以增强策略的稳健性。大量实验表明,JueWu-MC显著提高了样本效率,并大幅优于一系列基线方法。值得注意的是,我们赢得了NeurIPS MineRL 2021研究竞赛冠军,并取得了有史以来最高的性能得分。 摘要:Learning rational behaviors in open-world games like Minecraft remains to be challenging for Reinforcement Learning (RL) research due to the compound challenge of partial observability, high-dimensional visual perception and delayed reward. To address this, we propose JueWu-MC, a sample-efficient hierarchical RL approach equipped with representation learning and imitation learning to deal with perception and exploration. Specifically, our approach includes two levels of hierarchy, where the high-level controller learns a policy to control over options and the low-level workers learn to solve each sub-task. To boost the learning of sub-tasks, we propose a combination of techniques including 1) action-aware representation learning which captures underlying relations between action and representation, 2) discriminator-based self-imitation learning for efficient exploration, and 3) ensemble behavior cloning with consistency filtering for policy robustness. Extensive experiments show that JueWu-MC significantly improves sample efficiency and outperforms a set of baselines by a large margin. Notably, we won the championship of the NeurIPS MineRL 2021 research competition and achieved the highest performance score ever.

【12】 i-SpaSP: Structured Neural Pruning via Sparse Signal Recovery 标题:i-SpaSP:基于稀疏信号恢复的结构化神经剪枝 链接:https://arxiv.org/abs/2112.04905

作者:Cameron R. Wolfe,Anastasios Kyrillidis 机构:Department of Computer Science, Rice University, Houston, TX, USA. 备注:27 pages, 4 figures 摘要:我们提出了一种新的神经网络结构化剪枝算法——迭代稀疏结构化剪枝算法,称为i-SpaSP。受稀疏信号恢复思想的启发,i-SpaSP通过迭代识别网络中对剪枝和密集网络输出之间的残差贡献最大的一组重要参数组(例如,滤波器或神经元),然后基于更小的预定义剪枝比对这些组进行阈值化来运行。对于具有ReLU激活的两层和多层网络结构,我们展示了由i-SpaSP修剪引起的误差以多项式形式衰减,其中该多项式的次数根据稠密网络隐藏表示的稀疏性变得任意大。在我们的实验中,i-SpaSP在各种数据集(即MNIST和ImageNet)和体系结构(即前馈网络、ResNet34和MobileNetV2)上进行评估,结果表明,i-SpaSP可以发现高性能子网络,并将可证明基线方法的修剪效率提高几个数量级。简单地说,i-SpaSP易于通过自动微分实现,获得了很强的经验结果,具有理论上的收敛保证,并且是高效的,因此,它是为数不多的计算高效、实用且可证明的修剪算法之一。 摘要:We propose a novel, structured pruning algorithm for neural networks -- the iterative, Sparse Structured Pruning algorithm, dubbed as i-SpaSP. Inspired by ideas from sparse signal recovery, i-SpaSP operates by iteratively identifying a larger set of important parameter groups (e.g., filters or neurons) within a network that contribute most to the residual between pruned and dense network output, then thresholding these groups based on a smaller, pre-defined pruning ratio. For both two-layer and multi-layer network architectures with ReLU activations, we show the error induced by pruning with i-SpaSP decays polynomially, where the degree of this polynomial becomes arbitrarily large based on the sparsity of the dense network's hidden representations. In our experiments, i-SpaSP is evaluated across a variety of datasets (i.e., MNIST and ImageNet) and architectures (i.e., feed forward networks, ResNet34, and MobileNetV2), where it is shown to discover high-performing sub-networks and improve upon the pruning efficiency of provable baseline methodologies by several orders of magnitude. Put simply, i-SpaSP is easy to implement with automatic differentiation, achieves strong empirical results, comes with theoretical convergence guarantees, and is efficient, thus distinguishing itself as one of the few computationally efficient, practical, and provable pruning algorithms.
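i-SpaSP的核心操作是按"对输出的贡献"选出最重要的参数组,再做阈值化保留。下面是一个单步、单层的极度简化示意(按各隐藏神经元对层输出的贡献幅值选取top-k;论文的方法是针对剪枝与稠密网络输出之间残差的迭代稀疏恢复过程,此处仅示意"按贡献选组"这一步):

```python
import numpy as np

def prune_neurons(W, x, keep_ratio):
    """按各隐藏神经元在输入 x 上的输出贡献幅值,保留贡献最大的一部分。"""
    contrib = np.abs(W @ x)                      # 每个神经元的输出幅值
    k = max(1, int(len(contrib) * keep_ratio))   # 保留的神经元数量
    keep = np.argsort(contrib)[-k:]              # 贡献最大的 k 个神经元的下标
    mask = np.zeros(len(contrib), dtype=bool)
    mask[keep] = True
    return mask

W = np.array([[5.0, 0.0, 0.0],    # 4 个隐藏神经元 × 3 维输入
              [0.0, 0.1, 0.0],
              [0.0, 0.0, 0.2],
              [0.01, 0.0, 0.0]])
x = np.ones(3)
mask = prune_neurons(W, x, 0.5)   # 保留贡献最大的神经元 0 和 2
```

结构化剪枝删去整行/整列(整滤波器),因此剪完的网络无需稀疏算子即可直接加速,这是它区别于非结构化权重剪枝之处。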

【13】 Assessing Fairness in the Presence of Missing Data 标题:在存在丢失数据的情况下评估公平性 链接:https://arxiv.org/abs/2112.04899

作者:Yiliang Zhang,Qi Long 机构:University of Pennsylvania, Philadelphia, PA , USA 摘要:缺失数据非常普遍,在实际数据分析中带来了严峻的挑战。虽然有越来越多的关于完全观察数据分析中的公平性的文献,但关于不完整数据分析中公平性的研究却很少。在实践中,处理缺失数据的一种流行分析方法是仅使用一组完整的案例,即所有特征均已完全观察到的观测值来训练预测算法。然而,根据缺失数据机制的不同,完整案例的分布和完整数据的分布可能会有很大的不同。当目标是在不存在缺失值的完整数据域中开发公平算法时,在完整案例域中公平的算法可能会对完整数据域中的某些边缘化群体表现出不成比例的偏见。为了填补这一重大空白,我们研究了仅使用完整案例评估的任意模型在完整数据域中的公平性估计问题。我们提供了公平性估计误差的上界和下界,并进行了数值实验来评估我们的理论结果。我们的工作提供了第一个已知的不完全数据分析中公平性保证的理论结果。 摘要:Missing data are prevalent and present daunting challenges in real data analysis. While there is a growing body of literature on fairness in analysis of fully observed data, there has been little theoretical work on investigating fairness in analysis of incomplete data. In practice, a popular analytical approach for dealing with missing data is to use only the set of complete cases, i.e., observations with all features fully observed to train a prediction algorithm. However, depending on the missing data mechanism, the distribution of complete cases and the distribution of the complete data may be substantially different. When the goal is to develop a fair algorithm in the complete data domain where there are no missing values, an algorithm that is fair in the complete case domain may show disproportionate bias towards some marginalized groups in the complete data domain. To fill this significant gap, we study the problem of estimating fairness in the complete data domain for an arbitrary model evaluated merely using complete cases. We provide upper and lower bounds on the fairness estimation error and conduct numerical experiments to assess our theoretical results. Our work provides the first known theoretical results on fairness guarantee in analysis of incomplete data.
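仅用完整案例评估公平性为何会产生偏差,可用一个玩具例子说明(纯示意性质,并非论文的估计方法或上下界):当缺失更集中于某一群体的某类样本时,完整案例上的统计均等差会严重偏离完整数据上的真实值。

```python
import numpy as np

def parity_gap(pred, group):
    """统计均等差:|P(pred=1 | g=0) - P(pred=1 | g=1)|。"""
    pred, group = np.asarray(pred), np.asarray(group)
    return abs(pred[group == 0].mean() - pred[group == 1].mean())

pred  = np.array([1, 1, 0, 0, 1, 0, 1, 0])           # 模型预测
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])           # 敏感属性
observed = np.array([True, True, True, True,          # 缺失集中在群体 1 的负例上
                     True, False, True, False])
full_gap = parity_gap(pred, group)                    # 完整数据上的差距:0.0
cc_gap = parity_gap(pred[observed], group[observed])  # 完整案例上的差距:0.5
```

完整数据上两群体的正例率同为0.5(完全公平),而完整案例上被评估为相差0.5——这正是摘要所说"在完整案例域公平的算法在完整数据域可能表现出不成比例的偏见"的反向体现。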

【14】 Multi-Task Learning on Networks 标题:网络环境下的多任务学习 链接:https://arxiv.org/abs/2112.04891

作者:Andrea Ponti 备注:94 pages, 53 figures, 8 tables 摘要:多任务学习(MTL)范式可以追溯到Caruana(1997)的一篇早期论文,其中认为可以使用多个任务的数据,以获得比独立学习每个任务更好的性能。具有冲突目标的MTL解决方案需要对目标之间的权衡进行建模,这通常超出了简单线性组合所能实现的范围。一种理论上有原则且计算上有效的策略,是寻找不被其他解所支配的解,正如帕累托分析中所述。多任务学习环境中出现的多目标优化问题具有特定的特点,需要专门的方法。对这些特点的分析和一种新计算方法的提出是这项工作的重点。多目标进化算法(MOEA)可以很容易地纳入支配的概念,因此也包括帕累托分析。MOEA的主要缺点是在函数求值方面样本效率较低,其关键原因是大多数进化方法不使用模型来近似目标函数。贝叶斯优化则采用一种完全不同的、基于代理模型(如高斯过程)的方法。在本论文中,输入空间中的解表示为概率分布,封装了函数求值中包含的知识。在这一以Wasserstein距离为度量的概率分布空间中,可以设计一种新算法MOEA/WST,其中模型并非直接建立在目标函数上,而是建立在一个中间信息空间中,输入空间的对象在该空间中被映射为直方图。计算结果表明,MOEA/WST提供的样本效率和Pareto集质量明显优于标准MOEA。 摘要:The multi-task learning (MTL) paradigm can be traced back to an early paper of Caruana (1997) in which it was argued that data from multiple tasks can be used with the aim to obtain a better performance over learning each task independently. A solution of MTL with conflicting objectives requires modelling the trade-off among them which is generally beyond what a straight linear combination can achieve. A theoretically principled and computationally effective strategy is finding solutions which are not dominated by others as it is addressed in the Pareto analysis. Multi-objective optimization problems arising in the multi-task learning context have specific features and require adhoc methods. The analysis of these features and the proposal of a new computational approach represent the focus of this work. Multi-objective evolutionary algorithms (MOEAs) can easily include the concept of dominance and therefore the Pareto analysis. The major drawback of MOEAs is a low sample efficiency with respect to function evaluations. The key reason for this drawback is that most of the evolutionary approaches do not use models for approximating the objective function. Bayesian Optimization takes a radically different approach based on a surrogate model, such as a Gaussian Process. 
In this thesis the solutions in the Input Space are represented as probability distributions encapsulating the knowledge contained in the function evaluations. In this space of probability distributions, endowed with the metric given by the Wasserstein distance, a new algorithm MOEA/WST can be designed in which the model is not directly on the objective function but in an intermediate Information Space where the objects from the input space are mapped into histograms. Computational results show that the sample efficiency and the quality of the Pareto set provided by MOEA/WST are significantly better than in the standard MOEA.
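MOEA/WST把输入空间中的解表示为直方图,并以Wasserstein距离作为该空间的度量。一维情形下,两个直方图之间的W1距离可由其CDF之差的积分得到,示意如下(通用公式的假设性实现,非论文代码):

```python
import numpy as np

def wasserstein_1d(p, q, support):
    """共享一维支撑上两个直方图之间的 W1 距离:|CDF_p - CDF_q| 的积分。"""
    p = np.asarray(p, float) / np.sum(p)          # 归一化为概率分布
    q = np.asarray(q, float) / np.sum(q)
    cdf_diff = np.abs(np.cumsum(p) - np.cumsum(q))
    widths = np.diff(np.asarray(support, float))  # 相邻支撑点之间的间隔
    return float(np.sum(cdf_diff[:-1] * widths))

# 集中在 0 与集中在 2 的两个点质量,其 W1 距离恰为搬运距离 2
d = wasserstein_1d([1, 0, 0], [0, 0, 1], [0.0, 1.0, 2.0])
```

与逐桶比较的距离(如欧氏距离)不同,Wasserstein距离考虑了支撑的几何:把质量从0搬到2比搬到1更"远",这正是它适合比较直方图形状的原因。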

【15】 Artificial Intelligence and Design of Experiments for Assessing Security of Electricity Supply: A Review and Strategic Outlook 标题:人工智能与供电安全评估实验设计:回顾与战略展望 链接:https://arxiv.org/abs/2112.04889

作者:Jan Priesmann,Justin Münch,Elias Ridha,Thomas Spiegel,Marius Reich,Mario Adam,Lars Nolting,Aaron Praktiknjo 机构:E.ON Energy Research Center, School of Business and Economics, RWTH Aachen University, Germany; University of Applied Sciences Duesseldorf, Germany 摘要:评估能源转型和能源市场自由化对资源充足性的影响是一项日益重要且艰巨的任务。能源系统日益增加的复杂性要求有足够的能源系统建模方法,从而导致计算需求的增加。此外,随着复杂性的增加,不确定性也随之增加,同样需要进行概率评估和情景分析。为了充分有效地满足这些不同的需求,需要数据科学领域的新方法来加速当前的方法。通过系统的文献综述,我们希望缩小三个学科(1)供电安全评估、(2)人工智能和(3)实验设计之间的差距。为此,我们对选定的应用领域和方法进行了大规模的定量综述,并对不同学科之间的关系进行了综合。在其他发现中,我们确定了使用人工智能方法对复杂的供电安全模型进行元建模,以及基于人工智能的方法在预测储能调度和(非)可用性方面的应用,这些是尚未得到充分覆盖的有前景的应用领域。最后,我们得出了一个新的方法管线,以充分有效地应对当前和即将到来的供电安全评估挑战。 摘要:Assessing the effects of the energy transition and liberalization of energy markets on resource adequacy is an increasingly important and demanding task. The rising complexity in energy systems requires adequate methods for energy system modeling leading to increased computational requirements. Furthermore, with complexity, uncertainty increases likewise calling for probabilistic assessments and scenario analyses. To adequately and efficiently address these various requirements, new methods from the field of data science are needed to accelerate current methods. With our systematic literature review, we want to close the gap between the three disciplines (1) assessment of security of electricity supply, (2) artificial intelligence, and (3) design of experiments. For this, we conduct a large-scale quantitative review on selected fields of application and methods and make a synthesis that relates the different disciplines to each other. Among other findings, we identify metamodeling of complex security of electricity supply models using AI methods and applications of AI-based methods for forecasts of storage dispatch and (non-)availabilities as promising fields of application that have not sufficiently been covered, yet. 
We end with deriving a new methodological pipeline for adequately and efficiently addressing the present and upcoming challenges in the assessment of security of electricity supply.

【16】 KGE-CL: Contrastive Learning of Knowledge Graph Embeddings 标题:KGE-CL:知识图嵌入的对比学习 链接:https://arxiv.org/abs/2112.04871

作者:Wentao Xu,Zhiping Luo,Weiqing Liu,Jiang Bian,Jian Yin,Tie-Yan Liu 机构:School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China, Microsoft Research Asia, Beijing, China, School of Artificial Intelligence, Sun Yat-sen University, Zhuhai, China 摘要:学习知识图的嵌入在人工智能中至关重要,并可惠及推荐和问答等各种下游应用。近年来,针对知识图嵌入已有大量研究工作。然而,以往大多数知识图嵌入方法都忽略了不同三元组中相关实体以及实体-关系对之间的语义相似性,因为它们使用评分函数分别优化每个三元组。针对这一问题,我们提出了一个简单而高效的知识图嵌入对比学习框架,它可以缩短不同三元组中相关实体及实体-关系对的语义距离,从而提高知识图嵌入的表达能力。我们在三个标准知识图基准上评估了所提方法。值得注意的是,我们的方法可以产生一些新的最先进结果:在WN18RR数据集上达到51.2%的MRR和46.8%的Hits@1,在YAGO3-10数据集上达到59.1%的MRR和51.8%的Hits@1。 摘要:Learning the embeddings of knowledge graphs is vital in artificial intelligence, and can benefit various downstream applications, such as recommendation and question answering. In recent years, many research efforts have been proposed for knowledge graph embedding. However, most previous knowledge graph embedding methods ignore the semantic similarity between the related entities and entity-relation couples in different triples since they separately optimize each triple with the scoring function. To address this problem, we propose a simple yet efficient contrastive learning framework for knowledge graph embeddings, which can shorten the semantic distance of the related entities and entity-relation couples in different triples and thus improve the expressiveness of knowledge graph embeddings. We evaluate our proposed method on three standard knowledge graph benchmarks. It is noteworthy that our method can yield some new state-of-the-art results, achieving 51.2% MRR, 46.8% Hits@1 on the WN18RR dataset, and 59.1% MRR, 51.8% Hits@1 on the YAGO3-10 dataset.
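对比学习"拉近相关样本、推远无关样本"的机制可以用一个InfoNCE式损失来示意(通用对比损失的草图,并非论文中针对实体与实体-关系对设计的具体损失):

```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE 式对比损失:正样本与锚点越相似、负样本越不相似,损失越小。"""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    logits = np.array([cos(anchor, positive)] + [cos(anchor, n) for n in negatives])
    logits = logits / temperature
    logits -= logits.max()                        # 数值稳定
    probs = np.exp(logits) / np.exp(logits).sum()
    return -float(np.log(probs[0]))               # 正样本位于第 0 位

anchor = np.array([1.0, 0.0])
low  = info_nce(anchor, np.array([1.0, 0.0]), [np.array([0.0, 1.0])])  # 正样本相关
high = info_nce(anchor, np.array([0.0, 1.0]), [np.array([1.0, 0.0])])  # 正样本无关
```

最小化该损失会把相关实体的嵌入在语义空间中拉近,这正是摘要中"缩短相关实体及实体-关系对的语义距离"的一般机制。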

【17】 Siamese Attribute-missing Graph Auto-encoder 标题:孪生属性缺失图自编码器 链接:https://arxiv.org/abs/2112.04842

作者:Wenxuan Tu,Sihang Zhou,Yue Liu,Xinwang Liu 机构:National University of Defense Technology 备注:under review 摘要:属性缺失图上的图表示学习(GRL)是一个常见但具有挑战性的问题,近年来受到广泛关注。我们观察到现有文献:1)将属性嵌入与结构嵌入的学习割裂开来,因此未能充分利用这两类信息;2)对潜空间变量施加了过于严格的分布假设,导致特征表示的判别性不足。基于在两个信息源之间引入密切信息交互的思想,我们在本文中提出了孪生属性缺失图自编码器(SAGA)。具体而言,我们采用了三种策略。首先,我们通过引入孪生网络结构共享两个过程学习到的参数,将属性嵌入和结构嵌入纠缠在一起,使网络训练受益于更丰富多样的信息。其次,我们引入K近邻(KNN)与结构约束增强的学习机制,通过过滤不可靠连接来提高缺失属性潜在特征的质量。第三,我们手动屏蔽多个邻接矩阵上的连接,并强制结构信息嵌入子网络恢复真实邻接矩阵,从而使所得网络能够有选择地利用更高阶的判别特征来完成数据补全。 摘要:Graph representation learning (GRL) on attribute-missing graphs, which is a common yet challenging problem, has recently attracted considerable attention. We observe that existing literature: 1) isolates the learning of attribute and structure embedding thus fails to take full advantages of the two types of information; 2) imposes too strict distribution assumption on the latent space variables, leading to less discriminative feature representations. In this paper, based on the idea of introducing intimate information interaction between the two information sources, we propose our Siamese Attribute-missing Graph Auto-encoder (SAGA). Specifically, three strategies have been conducted. First, we entangle the attribute embedding and structure embedding by introducing a siamese network structure to share the parameters learned by both processes, which allows the network training to benefit from more abundant and diverse information. Second, we introduce a K-nearest neighbor (KNN) and structural constraint enhanced learning mechanism to improve the quality of latent features of the missing attributes by filtering unreliable connections. Third, we manually mask the connections on multiple adjacent matrices and force the structural information embedding sub-network to recover the true adjacent matrix, thus enforcing the resulting network to be able to selectively exploit more high-order discriminative features for data completion. 
Extensive experiments on six benchmark datasets demonstrate the superiority of our SAGA against the state-of-the-art methods.

【18】 Explainability of the Implications of Supervised and Unsupervised Face Image Quality Estimations Through Activation Map Variation Analyses in Face Recognition Models 标题:通过人脸识别模型中的激活图变化分析,解释有监督与无监督人脸图像质量估计的影响 链接:https://arxiv.org/abs/2112.04827

作者:Biying Fu,Naser Damer 机构:Fraunhofer Institute for Computer Graphics Research IGD, Darmstadt, Germany, Department of Computer Science, TU Darmstadt, Darmstadt, Germany 备注:accepted at the IEEE Winter Conference on Applications of Computer Vision Workshops, WACV Workshops 2022 摘要:为无监督或基于统计的人脸图像质量评估(FIQA)方法推导可解释性是一个挑战。在这项工作中,我们提出了一套新的可解释性工具,用于推导不同FIQA决策的依据及其对人脸识别(FR)性能的影响。我们的分析基于FR模型处理具有不同FIQA决策的样本时的行为,从而避免将工具的部署局限于特定的FIQA方法。由此得到的可解释性工具可用于任何FIQA方法和任何基于CNN的FR解决方案,利用激活映射来展示从人脸嵌入导出的网络激活。为了避免FR模型中低质量和高质量图像的一般空间激活映射之间区分度低的问题,我们通过分析具有不同质量决策的图像集的FR激活图的变化,在更高阶的导出空间中构建我们的解释工具。我们在四种FIQA方法上演示这些工具并分析结果,给出FIQA方法间与方法内的分析。 摘要:It is challenging to derive explainability for unsupervised or statistical-based face image quality assessment (FIQA) methods. In this work, we propose a novel set of explainability tools to derive reasoning for different FIQA decisions and their face recognition (FR) performance implications. We avoid limiting the deployment of our tools to certain FIQA methods by basing our analyses on the behavior of FR models when processing samples with different FIQA decisions. This leads to explainability tools that can be applied for any FIQA method with any CNN-based FR solution using activation mapping to exhibit the network's activation derived from the face embedding. To avoid the low discrimination between the general spatial activation mapping of low and high-quality images in FR models, we build our explainability tools in a higher derivative space by analyzing the variation of the FR activation maps of image sets with different quality decisions. We demonstrate our tools and analyze the findings on four FIQA methods, by presenting inter and intra-FIQA method analyses. 
Our proposed tools and the analyses based on them point out, among other conclusions, that high-quality images typically cause consistent low activation on the areas outside of the central face region, while low-quality images, despite general low activation, have high variations of activation in such areas. Our explainability tools also extend to analyzing single images where we show that low-quality images tend to have an FR model spatial activation that strongly differs from what is expected from a high-quality image where this difference also tends to appear more in areas outside of the central face region and does correspond to issues like extreme poses and facial occlusions. The implementation of the proposed tools is accessible here [link].
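文中"在更高阶的导出空间中分析"指的是比较不同质量图像集合的激活图逐位置方差,而非原始激活图本身。该统计量很简单,示意如下(假设性示例,激活图用常数数组代替真实FR模型输出):

```python
import numpy as np

def activation_variation(maps):
    """对一组空间激活图 (n_images, H, W) 逐位置求方差。"""
    return np.asarray(maps, float).var(axis=0)

# 高质量集合:各图的激活图彼此一致 -> 逐位置方差为 0
consistent = np.stack([np.ones((4, 4))] * 5)
# 低质量集合:激活图差异大 -> 方差明显大于 0
varied = np.stack([np.zeros((4, 4)), np.ones((4, 4))])
v_hi = activation_variation(consistent)
v_lo = activation_variation(varied)
```

这与摘要的结论一致:高质量图像在中央面部区域之外通常产生一致的低激活(方差小),而低质量图像在这些区域的激活变化很大(方差大)。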

【19】 Complexity assessments for decidable fragments of Set Theory. III: A quadratic reduction of constraints over nested sets to Boolean formulae Link: https://arxiv.org/abs/2112.04797

Authors: Domenico Cantone, Andrea De Domenico, Pietro Maugeri, Eugenio G. Omodeo
Affiliations: Dept. of Mathematics and Computer Science, University of Catania, Italy; Scuola Superiore di Catania, University of Catania, Italy; School of Business and Economics, Vrije Universiteit Amsterdam, Netherlands
Abstract: As a contribution to quantitative set-theoretic inferencing, a translation is proposed of conjunctions of literals of the forms $x=y\setminus z$, $x \neq y\setminus z$, and $z =\{x\}$, where $x,y,z$ stand for variables ranging over the von Neumann universe of sets, into unquantified Boolean formulae of a rather simple conjunctive normal form. The formulae in the target language involve variables ranging over a Boolean ring of sets, along with a difference operator and relators designating equality, non-disjointness and inclusion. Moreover, the result of each translation is a conjunction of literals of the forms $x=y\setminus z$, $x\neq y\setminus z$ and of implications whose antecedents are isolated literals and whose consequents are either inclusions (strict or non-strict) between variables, or equalities between variables. Besides reflecting a simple and natural semantics, which ensures satisfiability preservation, the proposed translation has quadratic algorithmic time complexity, and bridges two languages both of which are known to have an NP-complete satisfiability problem.

【20】 VMAgent: Scheduling Simulator for Reinforcement Learning Link: https://arxiv.org/abs/2112.04785

Authors: Junjie Sheng, Shengliang Cai, Haochuan Cui, Wenhao Li, Yun Hua, Bo Jin, Wenli Zhou, Yiqiu Hu, Lei Zhu, Qian Peng, Hongyuan Zha, Xiangfeng Wang
Abstract: A novel simulator called VMAgent is introduced to help RL researchers better explore new methods, especially for virtual machine scheduling. VMAgent is inspired by practical virtual machine (VM) scheduling tasks and provides an efficient simulation platform that can reflect the real situations of cloud computing. Three scenarios (fading, recovering, and expansion) are distilled from practical cloud computing and correspond to many reinforcement learning challenges (high-dimensional state and action spaces, high non-stationarity, and life-long demand). VMAgent provides flexible configurations for RL researchers to design customized scheduling environments considering different problem features. From the VM scheduling perspective, VMAgent also helps to explore better learning-based scheduling solutions.
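Schedulers of this kind typically expose a gym-style reset/step interface. The toy environment below is a minimal sketch of what a configurable VM-scheduling environment might look like; the class and method names are hypothetical illustrations, not VMAgent's actual API.

```python
class ToyVMSchedulingEnv:
    """Minimal gym-style VM-scheduling environment (hypothetical API,
    for illustration only; not VMAgent's real interface)."""

    def __init__(self, num_servers=4, cpu_per_server=8):
        self.num_servers = num_servers
        self.capacity = cpu_per_server
        self.reset()

    def reset(self):
        # State: free CPU cores on each server.
        self.free = [self.capacity] * self.num_servers
        return tuple(self.free)

    def step(self, action, demand):
        # action: index of the server chosen for the incoming VM request.
        if 0 <= action < self.num_servers and self.free[action] >= demand:
            self.free[action] -= demand
            reward = 1.0   # request placed successfully
        else:
            reward = -1.0  # request rejected (not enough free CPU)
        return tuple(self.free), reward
```

In this shape, "flexible configuration" amounts to choosing constructor parameters (number of servers, capacities, request streams) per scenario, e.g. shrinking capacity over time for a fading scenario.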

【21】 Co-evolutionary hybrid intelligence Link: https://arxiv.org/abs/2112.04751

Authors: Kirill Krinkin, Yulia Shichkina, Andrey Ignatyev
Affiliations: Alexander Popov International Innovation Institute for AI, Cybersecurity and Communications, SPbETU "LETI", St. Petersburg, Russia; MGIMO University, Center for Global IT Cooperation (CGITC), Moscow, Russia
Note: 4 pages
Abstract: Artificial intelligence is one of the drivers of modern technological development. The current approach to the development of intelligent systems is data-centric. It has several limitations: it is fundamentally impossible to collect data for modeling complex objects and processes; training neural networks requires huge computational and energy resources; solutions are not explainable. The article discusses an alternative approach to the development of artificial intelligence systems based on human-machine hybridization and their co-evolution.

【22】 LipSound2: Self-Supervised Pre-Training for Lip-to-Speech Reconstruction and Lip Reading Link: https://arxiv.org/abs/2112.04748

Authors: Leyuan Qu, Cornelius Weber, Stefan Wermter
Affiliations: Department of Informatics, University of Hamburg
Note: submitted to IEEE Transactions on Neural Networks and Learning Systems
Abstract: The aim of this work is to investigate the impact of crossmodal self-supervised pre-training for speech reconstruction (video-to-audio) by leveraging the natural co-occurrence of audio and visual streams in videos. We propose LipSound2, which consists of an encoder-decoder architecture and a location-aware attention mechanism to map face image sequences to mel-scale spectrograms directly, without requiring any human annotations. The proposed LipSound2 model is first pre-trained on $\sim$2400h of multi-lingual (e.g. English and German) audio-visual data (VoxCeleb2). To verify the generalizability of the proposed method, we then fine-tune the pre-trained model on domain-specific datasets (GRID, TCD-TIMIT) for English speech reconstruction and achieve a significant improvement in speech quality and intelligibility compared to previous approaches in speaker-dependent and -independent settings. In addition to English, we conduct Chinese speech reconstruction on the CMLR dataset to verify the impact on transferability. Lastly, we train a cascaded lip reading (video-to-text) system by fine-tuning the generated audios on a pre-trained speech recognition system and achieve state-of-the-art performance on both English and Chinese benchmark datasets.

【23】 Learning multiple gaits of quadruped robot using hierarchical reinforcement learning Link: https://arxiv.org/abs/2112.04741

Authors: Yunho Kim, Bukun Son, Dongjun Lee
Affiliations: Department of Mechanical Engineering, Seoul National University
Abstract: There is growing interest in learning a velocity-command tracking controller for quadruped robots using reinforcement learning, due to its robustness and scalability. However, a single policy, trained end-to-end, usually shows a single gait regardless of the command velocity. This can be a suboptimal solution, considering that for quadruped animals an optimal gait exists for each velocity. In this work, we propose a hierarchical controller for a quadruped robot that can generate multiple gaits (i.e. pace, trot, bound) while tracking a velocity command. Our controller is composed of two policies, working as a central pattern generator and a local feedback controller respectively, and is trained with hierarchical reinforcement learning. Experimental results show (1) the existence of an optimal gait for each specific velocity range, and (2) the efficiency of our hierarchical controller compared to a controller composed of a single policy, which usually shows a single gait. Code is publicly available.

【24】 From Good to Best: Two-Stage Training for Cross-lingual Machine Reading Comprehension Link: https://arxiv.org/abs/2112.04735

Authors: Nuo Chen, Linjun Shou, Min Gong, Jian Pei, Daxin Jiang
Affiliations: ADSPLAB, School of ECE, Peking University, Shenzhen, China; NLP Group, Microsoft STCA; School of Computing Science, Simon Fraser University
Abstract: Cross-lingual Machine Reading Comprehension (xMRC) is challenging due to the lack of training data in low-resource languages. Recent approaches use training data only in a resource-rich language like English to fine-tune large-scale cross-lingual pre-trained language models. Due to the big difference between languages, a model fine-tuned only on a source language may not perform well for target languages. Interestingly, we observe that while the top-1 results predicted by previous approaches may often fail to hit the ground-truth answers, the correct answers are often contained in the top-k predicted results. Based on this observation, we develop a two-stage approach to enhance the model performance. The first stage targets recall: we design a hard-learning (HL) algorithm to maximize the likelihood that the top-k predictions contain the accurate answer. The second stage focuses on precision: an answer-aware contrastive learning (AA-CL) mechanism is developed to learn the fine difference between the accurate answer and other candidates. Our extensive experiments show that our model significantly outperforms a series of strong baselines on two cross-lingual MRC benchmark datasets.

【25】 Explainable AI for B5G/6G: Technical Aspects, Use Cases, and Research Challenges Link: https://arxiv.org/abs/2112.04698

Authors: Shen Wang, M. Atif Qureshi, Luis Miralles-Pechuán, Thien Huynh-The, Thippa Reddy Gadekallu, Madhusanka Liyanage

【26】 CWS-PResUNet: Music Source Separation with Channel-wise Subband Phase-aware ResUNet Link: https://arxiv.org/abs/2112.04685

Authors: Haohe Liu, Qiuqiang Kong, Jiafeng Liu
Affiliations: Sound, Audio, and Music Intelligence (SAMI) Group, ByteDance; The Ohio State University
Note: Published at the MDX Workshop @ ISMIR 2021

【27】 DualFormer: Local-Global Stratified Transformer for Efficient Video Recognition Link: https://arxiv.org/abs/2112.04674

Authors: Yuxuan Liang, Pan Zhou, Roger Zimmermann, Shuicheng Yan
Affiliations: Sea AI Lab; National University of Singapore
Note: Preprint

【28】 Enhancing Food Intake Tracking in Long-Term Care with Automated Food Imaging and Nutrient Intake Tracking (AFINI-T) Technology Link: https://arxiv.org/abs/2112.04608

Authors: Kaylen J. Pfisterer, Robert Amelard, Jennifer Boger, Audrey G. Chung, Heather H. Keller, Alexander Wong
Affiliations: University of Waterloo, Systems Design Engineering, Waterloo, ON, Canada; Waterloo AI Institute, Waterloo, ON, Canada; Schlegel-UW Research Institute for Aging, Waterloo, Canada
Note: Key words: automatic segmentation, convolutional neural network, deep learning, food intake tracking, volume estimation, malnutrition prevention, long-term care, hospital
Abstract: Half of long-term care (LTC) residents are malnourished, increasing hospitalization, mortality, and morbidity, with lower quality of life. Current tracking methods are subjective and time-consuming. This paper presents the automated food imaging and nutrient intake tracking (AFINI-T) technology designed for LTC. We propose a novel convolutional autoencoder for food classification, trained on an augmented UNIMIB2016 dataset and tested on our simulated LTC food intake dataset (12 meal scenarios; up to 15 classes each; top-1 classification accuracy: 88.9%; mean intake error: -0.4 mL $\pm$ 36.7 mL). Nutrient intake estimation by volume was strongly linearly correlated with nutrient estimates from mass ($r^2$ 0.92 to 0.99), with good agreement between methods ($\sigma$ = -2.7 to -0.01; zero within each of the limits of agreement). The AFINI-T approach is a deep-learning powered computational nutrient sensing system that may provide a novel means for more accurately and objectively tracking LTC resident food intake to support malnutrition tracking and prevention strategies.

【29】 Prediction of Adverse Biological Effects of Chemicals Using Knowledge Graph Embeddings Link: https://arxiv.org/abs/2112.04605

Authors: Erik B. Myklebust, Ernesto Jiménez-Ruiz, Jiaoyan Chen, Raoul Wolf, Knut Erik Tollefsen
Affiliations: Norwegian Institute for Water Research, Oslo, Norway; SIRIUS, University of Oslo, Oslo, Norway; City, University of London, London, United Kingdom; University of Oxford, Oxford, United Kingdom; Norwegian University of Life Sciences, Ås, Norway
Note: accepted for publication in the Semantic Web Journal
Abstract: We have created a knowledge graph based on major data sources used in ecotoxicological risk assessment. We have applied this knowledge graph to an important task in risk assessment, namely chemical effect prediction. We have evaluated nine knowledge graph embedding models from a selection of geometric, decomposition, and convolutional models on this prediction task. We show that using knowledge graph embeddings can increase the accuracy of effect prediction with neural networks. Furthermore, we have implemented a fine-tuning architecture which adapts the knowledge graph embeddings to the effect prediction task and leads to better performance. Finally, we evaluate certain characteristics of the knowledge graph embedding models to shed light on the individual model performance.
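As a concrete instance of the geometric family of embedding models evaluated in work like this, a TransE-style scorer treats a (head, relation, tail) triple as plausible when head + relation lands near tail in embedding space. The sketch below shows that generic scoring rule only; it is not the authors' implementation, and the triple semantics (e.g. chemical, relation, effect) is illustrative.

```python
import numpy as np

def transe_score(h, r, t):
    """TransE-style plausibility score for a knowledge-graph triple:
    -||h + r - t||_1. Scores closer to zero mean the triple
    (e.g. chemical --affects--> species) better fits the learned geometry."""
    return -float(np.sum(np.abs(h + r - t)))
```

In an effect-prediction pipeline, such scores (or the learned entity vectors themselves) can be fed to a downstream neural classifier, which is the kind of coupling the fine-tuning architecture above adapts end-to-end.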

【30】 Refined Commonsense Knowledge from Large-Scale Web Contents Link: https://arxiv.org/abs/2112.04596

Authors: Tuan-Phong Nguyen, Simon Razniewski, Julien Romero, Gerhard Weikum
Note: a substantial extension of the WWW paper (arXiv:2011.00905); arXiv admin note: substantial text overlap with arXiv:2011.00905
Abstract: Commonsense knowledge (CSK) about concepts and their properties is useful for AI applications. Prior works like ConceptNet, COMET and others compiled large CSK collections, but are restricted in their expressiveness to subject-predicate-object (SPO) triples with simple concepts for S and strings for P and O. This paper presents a method, called ASCENT++, to automatically build a large-scale knowledge base (KB) of CSK assertions, with refined expressiveness and both better precision and recall than prior works. ASCENT++ goes beyond SPO triples by capturing composite concepts with subgroups and aspects, and by refining assertions with semantic facets. The latter is important to express the temporal and spatial validity of assertions and further qualifiers. ASCENT++ combines open information extraction with judicious cleaning and ranking by typicality and saliency scores. For high coverage, our method taps into the large-scale crawl C4 with broad web contents. The evaluation with human judgements shows the superior quality of the ASCENT++ KB, and an extrinsic evaluation for QA-support tasks underlines the benefits of ASCENT++. A web interface, data and code can be accessed at https://www.mpi-inf.mpg.de/ascentpp.

【31】 Application of Artificial Intelligence and Machine Learning in Libraries: A Systematic Review Link: https://arxiv.org/abs/2112.04573

Authors: Rajesh Kumar Das, Mohammad Sharif Ul Islam
Abstract: As the concept and implementation of cutting-edge technologies like artificial intelligence and machine learning have become relevant, academics, researchers and information professionals are engaging in research in this area. The objective of this systematic literature review is to provide a synthesis of empirical studies exploring the application of artificial intelligence and machine learning in libraries. To achieve the objectives of the study, a systematic literature review was conducted based on the original guidelines proposed by Kitchenham et al. (2009). Data was collected from the Web of Science, Scopus, LISA and LISTA databases. Following a rigorous, established selection process, a total of thirty-two articles were finally selected, reviewed and analyzed to summarize the AI and ML domains and techniques most often used in libraries. Findings show that the current state of AI and ML research relevant to the LIS domain mainly focuses on theoretical works. However, some researchers also emphasize implementation projects or case studies. This study will provide a panoramic view of AI and ML in libraries for researchers, practitioners and educators, furthering more technology-oriented approaches and anticipating future innovation pathways.

【32】 PATO: Producibility-Aware Topology Optimization using Deep Learning for Metal Additive Manufacturing Link: https://arxiv.org/abs/2112.04552

Authors: Naresh S. Iyer, Amir M. Mirzendehdel, Sathyanarayanan Raghavan, Yang Jiao, Erva Ulu, Morad Behandish, Saigopal Nelaturi, Dean M. Robinson
Affiliations: GE Research (GER), Niskayuna, NY, United States; Palo Alto Research Center (PARC), Palo Alto, CA, United States
Abstract: In this paper, we propose PATO, a producibility-aware topology optimization (TO) framework to help efficiently explore the design space of components fabricated using metal additive manufacturing (AM), while ensuring manufacturability with respect to cracking. Specifically, parts fabricated through Laser Powder Bed Fusion are prone to defects such as warpage or cracking due to high residual stress values generated from the steep thermal gradients produced during the build process. Maturing the design for such parts and planning their fabrication can span months to years, often involving multiple handoffs between design and manufacturing engineers. PATO is based on the a priori discovery of crack-free designs, so that the optimized part can be built defect-free at the outset. To ensure that the design is crack-free during optimization, producibility is explicitly encoded within the standard formulation of TO, using a crack index. Multiple crack indices are explored and, using experimental validation, the maximum shear strain index (MSSI) is shown to be an accurate crack index. Simulating the build process is a coupled, multi-physics computation, and incorporating it in the TO loop can be computationally prohibitive. We leverage current advances in deep convolutional neural networks and present a high-fidelity surrogate model based on an attention-based U-Net architecture to predict the MSSI values as a spatially varying field over the part's domain. Further, we employ automatic differentiation to directly compute the gradient of maximum MSSI with respect to the input design variables and augment it with the performance-based sensitivity field to optimize the design while considering the trade-off between weight, manufacturability, and functionality. We demonstrate the effectiveness of the proposed method through benchmark studies in 3D as well as experimental validation.

【33】 Deep Q-Learning Market Makers in a Multi-Agent Simulated Stock Market Link: https://arxiv.org/abs/2112.04494

Authors: Oscar Fernández Vicente, Fernando Fernández Rebollo, Francisco Javier García Polo
Affiliations: Universidad Carlos III de Madrid, Madrid, Spain
Note: presented at the 2nd ACM International Conference on AI in Finance
Abstract: Market makers play a key role in financial markets by providing liquidity. They usually fill order books with buy and sell limit orders in order to provide traders alternative price levels to operate. This paper focuses precisely on the study of these market makers' strategies from an agent-based perspective. In particular, we propose the application of Reinforcement Learning (RL) for the creation of intelligent market makers in simulated stock markets. This research analyzes how RL market maker agents behave in non-competitive scenarios (only one RL market maker learning at a time) and competitive scenarios (multiple RL market makers learning at the same time), and how they adapt their strategies in a Sim2Real scope, with interesting results. Furthermore, it covers the application of policy transfer between different experiments, describing the impact of competing environments on RL agents' performance. RL and deep RL techniques are proven to be profitable market maker approaches, leading to a better understanding of their behavior in stock markets.
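The learning core behind deep Q-learning agents like these is the standard temporal-difference update. A minimal tabular sketch is shown below; it is generic Q-learning, not the paper's agent, and the hard part for market making (encoding order-book state and quote actions) is deliberately omitted.

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step: move Q[s, a] toward the TD target
    r + gamma * max_a' Q[s_next, a']. Deep Q-learning replaces the table
    with a neural network regressed onto the same target."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q
```

For a market maker, s would encode order-book features (spread, inventory, volatility), a would pick the quote placement, and r would be the mark-to-market profit and loss of the step.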

【34】 Provable Continual Learning via Sketched Jacobian Approximations Link: https://arxiv.org/abs/2112.05095

Authors: Reinhard Heckel
Affiliations: Dept. of Electrical and Computer Engineering, Technical University of Munich; Dept. of Electrical and Computer Engineering, Rice University
Abstract: An important problem in machine learning is the ability to learn tasks in a sequential manner. If trained with standard first-order methods, most models forget previously learned tasks when trained on a new task, which is often referred to as catastrophic forgetting. A popular approach to overcome forgetting is to regularize the loss function by penalizing models that perform poorly on previous tasks. For example, elastic weight consolidation (EWC) regularizes with a quadratic form involving a diagonal matrix built based on past data. While EWC works very well for some setups, we show that, even under otherwise ideal conditions, it can provably suffer catastrophic forgetting if the diagonal matrix is a poor approximation of the Hessian matrix of previous tasks. We propose a simple approach to overcome this: regularizing training of a new task with sketches of the Jacobian matrix of past data. This provably enables overcoming catastrophic forgetting for linear models and for wide neural networks, at the cost of memory. The overarching goal of this paper is to provide insights on when regularization-based continual learning algorithms work and under what memory costs.
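Schematically, the two regularizers contrasted in the abstract can be written side by side. This is a hedged sketch of the general idea, not the paper's code; the sketch matrix S (and its row count) is an illustrative choice supplied by the caller.

```python
import numpy as np

def ewc_penalty(theta, theta_star, fisher_diag, lam=1.0):
    """EWC: quadratic penalty (lam/2) * sum_i F_i * (theta_i - theta_i*)^2,
    where fisher_diag is a diagonal approximation built from past data."""
    return 0.5 * lam * float(np.sum(fisher_diag * (theta - theta_star) ** 2))

def sketched_jacobian_penalty(theta, theta_star, J, S, lam=1.0):
    """Sketched-Jacobian idea (schematic): penalize ||S J (theta - theta*)||^2,
    with S an m x n sketch of the past-data Jacobian J. Unlike the diagonal
    in EWC, S J retains off-diagonal curvature, at the memory cost of storing
    the m x p matrix S J."""
    r = (S @ J) @ (theta - theta_star)
    return 0.5 * lam * float(r @ r)
```

With S the identity (no compression), the second penalty recovers the full quadratic form J^T J; shrinking m trades fidelity for memory, which is exactly the trade-off the abstract highlights.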

【35】 Enhancing Column Generation by a Machine-Learning-Based Pricing Heuristic for Graph Coloring Link: https://arxiv.org/abs/2112.04906

Authors: Yunzhuang Shen, Yuan Sun, Xiaodong Li, Andrew Eberhard, Andreas Ernst
Affiliations: School of Computing Technologies, RMIT University, Australia; School of Computing and Information Systems, University of Melbourne, Australia; School of Science, RMIT University, Australia; School of Mathematics, Monash University, Australia
Note: machine learning for column generation and branch-and-price; accepted to AAAI 2022
Abstract: Column Generation (CG) is an effective method for solving large-scale optimization problems. CG starts by solving a sub-problem with a subset of columns (i.e., variables) and gradually includes new columns that can improve the solution of the current subproblem. The new columns are generated as needed by repeatedly solving a pricing problem, which is often NP-hard and is a bottleneck of the CG approach. To tackle this, we propose a Machine-Learning-based Pricing Heuristic (MLPH) that can generate many high-quality columns efficiently. In each iteration of CG, our MLPH leverages an ML model to predict the optimal solution of the pricing problem, which is then used to guide a sampling method to efficiently generate multiple high-quality columns. Using the graph coloring problem, we empirically show that MLPH significantly enhances CG as compared to six state-of-the-art methods, and the improvement in CG can lead to substantially better performance of the branch-and-price exact method.
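The CG loop that such a pricing heuristic plugs into has a standard shape. The skeleton below is a generic sketch with illustrative callback names (not the paper's code); a learned heuristic like MLPH would play the role of solve_pricing.

```python
def column_generation(solve_restricted_master, solve_pricing,
                      initial_columns, max_iters=1000, tol=1e-9):
    """Generic column-generation loop (schematic).
    solve_restricted_master(columns) -> (solution, duals)
    solve_pricing(duals) -> (new_columns, best_reduced_cost)
    Terminates when pricing finds no column with negative reduced cost."""
    columns = list(initial_columns)
    solution = None
    for _ in range(max_iters):
        solution, duals = solve_restricted_master(columns)
        new_cols, reduced_cost = solve_pricing(duals)
        if not new_cols or reduced_cost >= -tol:
            break  # current restricted master is optimal
        columns.extend(new_cols)
    return solution, columns
```

For graph coloring, each column is an independent set of vertices and the pricing problem is a maximum-weight independent set over the dual prices, which is where the NP-hard bottleneck, and hence the ML speedup, sits.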

【36】 One-dimensional Deep Low-rank and Sparse Network for Accelerated MRI Link: https://arxiv.org/abs/2112.04721

Authors: Zi Wang, Chen Qian, Di Guo, Hongwei Sun, Rushuai Li, Bo Zhao, Xiaobo Qu
Note: 16 pages
Abstract: Deep learning has shown astonishing performance in accelerated magnetic resonance imaging (MRI). Most state-of-the-art deep learning reconstructions adopt powerful convolutional neural networks and perform 2D convolution, since many magnetic resonance images or their corresponding k-space are 2D. In this work, we present a new approach that explores 1D convolution, making the deep network much easier to train and generalize. We further integrate the 1D convolution into the proposed deep network, named One-dimensional Deep Low-rank and Sparse network (ODLS), which unrolls the iteration procedure of a low-rank and sparse reconstruction model. Extensive results on in vivo knee and brain datasets demonstrate that the proposed ODLS is very suitable for the case of limited training subjects and provides better reconstruction performance than state-of-the-art methods, both visually and quantitatively. Additionally, ODLS shows nice robustness to different undersampling scenarios and to some mismatches between the training and test data. In summary, our work demonstrates that the 1D deep learning scheme is memory-efficient and robust in fast MRI.
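The parameter saving behind the 1D idea is easy to see in a toy form: a k-tap 1D kernel applied along one axis of a 2D array has k weights, versus k*k for a 2D kernel of the same extent. The sketch below only illustrates this row-wise 1D convolution; it is not the ODLS architecture.

```python
import numpy as np

def conv1d_rows(x, kernel):
    """Apply one shared 1D kernel along each row of a 2D array ('valid' mode).
    Illustrates the 1D-convolution idea: len(kernel) weights instead of
    len(kernel)**2 for a 2D kernel of the same extent."""
    k = len(kernel)
    h, w = x.shape
    out = np.empty((h, w - k + 1))
    for i in range(h):
        for j in range(w - k + 1):
            out[i, j] = float(np.dot(x[i, j:j + k], kernel))
    return out
```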

Machine translation, for reference only.

This article is shared from the WeChat official account arXiv每日学术速递 as part of the Tencent Cloud self-media sharing plan. Originally published: 2021-12-10.



