
Artificial Intelligence Academic Digest [12.23]

Author: 公众号-arXiv每日学术速递 (WeChat official account "arXiv Daily Academic Digest")
Published: 2021-12-27 17:06:42

cs.AI (Artificial Intelligence), 38 papers in total.

【1】 Spatio-Temporal CNN baseline method for the Sports Video Task of MediaEval 2021 benchmark Link: https://arxiv.org/abs/2112.12074

Authors: Pierre-Etienne Martin Affiliations: CCP Department, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany Abstract: This paper presents the baseline method proposed for the Sports Video task part of the MediaEval 2021 benchmark. This task proposes stroke detection and stroke classification subtasks. This baseline addresses both subtasks. The spatio-temporal CNN architecture and the training process of the model are tailored according to the addressed subtask. The method has the purpose of helping the participants to solve the task and is not meant to reach state-of-the-art performance. Still, for the detection task, the baseline is performing better than the other participants, which stresses the difficulty of such a task.

【2】 Hierarchical Cross-Modality Semantic Correlation Learning Model for Multimodal Summarization Link: https://arxiv.org/abs/2112.12072

Authors: Litian Zhang, Xiaoming Zhang, Junshu Pan, Feiran Huang Affiliations: School of Cyber Science and Technology, Beihang University, Beijing, China; College of Cyber Security, Jinan University, Guangzhou, China Note: Accepted by AAAI 2022 Abstract: Multimodal summarization with multimodal output (MSMO) generates a summary with both textual and visual content. Multimodal news report contains heterogeneous contents, which makes MSMO nontrivial. Moreover, it is observed that different modalities of data in the news report correlate hierarchically. Traditional MSMO methods indistinguishably handle different modalities of data by learning a representation for the whole data, which is not directly adaptable to the heterogeneous contents and hierarchical correlation. In this paper, we propose a hierarchical cross-modality semantic correlation learning model (HCSCL) to learn the intra- and inter-modal correlation existing in the multimodal data. HCSCL adopts a graph network to encode the intra-modal correlation. Then, a hierarchical fusion framework is proposed to learn the hierarchical correlation between text and images. Furthermore, we construct a new dataset with relevant image annotation and image object label information to provide the supervision information for the learning procedure. Extensive experiments on the dataset show that HCSCL significantly outperforms the baseline methods in automatic summarization metrics and fine-grained diversity tests.

【3】 VoiceMoji: A Novel On-Device Pipeline for Seamless Emoji Insertion in Dictation Link: https://arxiv.org/abs/2112.12028

Authors: Sumit Kumar, Harichandana B S S, Himanshu Arora Affiliations: Samsung R&D Institute, Bangalore, India Note: Accepted at IEEE INDICON 2021, 19-21 December 2021, India Abstract: Most of the speech recognition systems recover only words in the speech and fail to capture emotions. Users have to manually add emoji(s) in text for adding tone and making communication fun. Though there is much work done on punctuation addition on transcribed speech, the area of emotion addition is untouched. In this paper, we propose a novel on-device pipeline to enrich the voice input experience. It involves, given a blob of transcribed text, intelligently processing and identifying structure where emoji insertion makes sense. Moreover, it includes semantic text analysis to predict emoji for each of the sub-parts for which we propose a novel architecture Attention-based Char Aware (ACA) LSTM which handles Out-Of-Vocabulary (OOV) words as well. All these tasks are executed completely on-device and hence can aid on-device dictation systems. To the best of our knowledge, this is the first work that shows how to add emoji(s) in the transcribed text. We demonstrate that our components achieve comparable results to previous neural approaches for punctuation addition and emoji prediction with 80% fewer parameters. Overall, our proposed model has a very small memory footprint of a mere 4MB to suit on-device deployment.

【4】 Evaluating the Robustness of Deep Reinforcement Learning for Autonomous and Adversarial Policies in a Multi-agent Urban Driving Environment Link: https://arxiv.org/abs/2112.11947

Authors: Aizaz Sharif, Dusica Marijan Affiliations: Simula Research Laboratory, Norway Abstract: Deep reinforcement learning is actively used for training autonomous driving agents in a vision-based urban simulated environment. Due to the large availability of various reinforcement learning algorithms, we are still unsure of which one works better while training autonomous cars in single-agent as well as multi-agent driving environments. A comparison of deep reinforcement learning in vision-based autonomous driving will open up the possibilities for training better autonomous car policies. Also, autonomous cars trained on deep reinforcement learning-based algorithms are known for being vulnerable to adversarial attacks, and we have less information on which algorithms would act as a good adversarial agent. In this work, we provide a systematic evaluation and comparative analysis of 6 deep reinforcement learning algorithms for autonomous and adversarial driving in a four-way intersection scenario. Specifically, we first train autonomous cars using state-of-the-art deep reinforcement learning algorithms. Second, we test driving capabilities of the trained autonomous policies in single-agent as well as multi-agent scenarios. Lastly, we use the same deep reinforcement learning algorithms to train adversarial driving agents, in order to test the driving performance of autonomous cars and look for possible collision and offroad driving scenarios. We perform experiments by using vision-only high fidelity urban driving simulated environments.

【5】 Faster Convergence in Multi-Objective Optimization Algorithms Based on Decomposition Link: https://arxiv.org/abs/2112.11939

Authors: Yuri Lavinas, Marcelo Ladeira, Claus Aranha Affiliations: School of Systems and Information Engineering, University of Tsukuba, Japan; Department of Computer Science, University of Brasilia, Brazil Abstract: The Resource Allocation approach (RA) improves the performance of MOEA/D by maintaining a big population and updating few solutions each generation. However, most of the studies on RA generally focused on the properties of different Resource Allocation metrics. Thus, it is still uncertain what the main factors are that lead to increments in performance of MOEA/D with RA. This study investigates the effects of MOEA/D with the Partial Update Strategy in an extensive set of MOPs to generate insights into correspondences of MOEA/D with the Partial Update and MOEA/D with small population size and big population size. Our work undertakes an in-depth analysis of the populational dynamics behaviour considering their final approximation Pareto sets, anytime hypervolume performance, attained regions and number of unique non-dominated solutions. Our results indicate that MOEA/D with Partial Update progresses with the search as fast as MOEA/D with small population size and explores the search space as MOEA/D with big population size. MOEA/D with Partial Update can mitigate common problems related to population size choice with better convergence speed in most MOPs, as shown by the hypervolume results, the number of unique non-dominated solutions, the anytime performance, and the Empirical Attainment Function.

【6】 Adversarial Deep Reinforcement Learning for Trustworthy Autonomous Driving Policies Link: https://arxiv.org/abs/2112.11937

Authors: Aizaz Sharif, Dusica Marijan Affiliations: Simula Research Laboratory, Norway Abstract: Deep reinforcement learning is widely used to train autonomous cars in a simulated environment. Still, autonomous cars are well known for being vulnerable when exposed to adversarial attacks. This raises the question of whether we can train the adversary as a driving agent for finding failure scenarios in autonomous cars, and then retrain autonomous cars with new adversarial inputs to improve their robustness. In this work, we first train and compare adversarial car policy on two custom reward functions to test the driving control decision of autonomous cars in a multi-agent setting. Second, we verify that adversarial examples can be used not only for finding unwanted autonomous driving behavior, but also for helping autonomous driving cars in improving their deep reinforcement learning policies. By using a high fidelity urban driving simulation environment and vision-based driving agents, we demonstrate that the autonomous cars retrained using the adversary player noticeably increase the performance of their driving policies in terms of reducing collision and offroad steering errors.

【7】 ALP: Data Augmentation using Lexicalized PCFGs for Few-Shot Text Classification Link: https://arxiv.org/abs/2112.11916

Authors: Hazel Kim, Daecheol Woo, Seong Joon Oh, Jeong-Won Cha, Yo-Sub Han Affiliations: Yonsei University, Seoul, Republic of Korea; NAVER AI Lab; Changwon National University, Changwon, Republic of Korea Note: Accepted to AAAI 2022 Abstract: Data augmentation has been an important ingredient for boosting performances of learned models. Prior data augmentation methods for few-shot text classification have led to great performance boosts. However, they have not been designed to capture the intricate compositional structure of natural language. As a result, they fail to generate samples with plausible and diverse sentence structures. Motivated by this, we present the data Augmentation using Lexicalized Probabilistic context-free grammars (ALP) that generates augmented samples with diverse syntactic structures with plausible grammar. The lexicalized PCFG parse trees consider both the constituents and dependencies to produce a syntactic frame that maximizes a variety of word choices in a syntactically preservable manner without specific domain experts. Experiments on few-shot text classification tasks demonstrate that ALP enhances many state-of-the-art classification methods. As a second contribution, we delve into the train-val splitting methodologies when a data augmentation method comes into play. We argue empirically that the traditional splitting of training and validation sets is sub-optimal compared to our novel augmentation-based splitting strategies that further expand the training split with the same number of labeled data. Taken together, our contributions on the data augmentation strategies yield a strong training recipe for few-shot text classification tasks.
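To make the grammar-based augmentation idea concrete, here is a minimal, hedged sketch of sampling sentences from a toy PCFG in Python. The grammar, symbols and probabilities are invented for illustration only and are far simpler than the lexicalized PCFGs that ALP derives from parse trees; this is not the authors' implementation.

    import random

    # Toy PCFG: each non-terminal maps to a list of (production, probability).
    # The grammar and probabilities below are illustrative only.
    PCFG = {
        "S":  [(["NP", "VP"], 1.0)],
        "NP": [(["the", "N"], 0.7), (["a", "N"], 0.3)],
        "VP": [(["V", "NP"], 0.6), (["V"], 0.4)],
        "N":  [(["movie"], 0.5), (["review"], 0.5)],
        "V":  [(["impressed"], 0.5), (["bored"], 0.5)],
    }

    def sample(symbol="S"):
        """Recursively expand a symbol by sampling productions by probability."""
        if symbol not in PCFG:              # terminal word
            return [symbol]
        productions, probs = zip(*PCFG[symbol])
        chosen = random.choices(productions, weights=probs, k=1)[0]
        return [word for part in chosen for word in sample(part)]

    if __name__ == "__main__":
        for _ in range(3):
            print(" ".join(sample()))       # e.g. "the movie bored a review"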

【8】 Automatic Product Copywriting for E-Commerce Link: https://arxiv.org/abs/2112.11915

Authors: Xueying Zhang, Yanyan Zou, Hainan Zhang, Jing Zhou, Shiliang Diao, Jiajia Chen, Zhuoye Ding, Zhen He, Xueqi He, Yun Xiao, Bo Long, Han Yu, Lingfei Wu Affiliations: JD.COM Silicon Valley Research Center; Nanyang Technological University Note: Accepted by AAAI 2022/IAAI 2022 under the track of "Highly Innovative Applications of AI" Abstract: Product copywriting is a critical component of e-commerce recommendation platforms. It aims to attract users' interest and improve user experience by highlighting product characteristics with textual descriptions. In this paper, we report our experience deploying the proposed Automatic Product Copywriting Generation (APCG) system into the JD.com e-commerce product recommendation platform. It consists of two main components: 1) natural language generation, which is built from a transformer-pointer network and a pre-trained sequence-to-sequence model based on millions of training data from our in-house platform; and 2) copywriting quality control, which is based on both automatic evaluation and human screening. For selected domains, the models are trained and updated daily with the updated training data. In addition, the model is also used as a real-time writing assistant tool on our live broadcast platform. The APCG system has been deployed in JD.com since Feb 2021. By Sep 2021, it has generated 2.53 million product descriptions, and improved the overall averaged click-through rate (CTR) and the Conversion Rate (CVR) by 4.22% and 3.61%, compared to baselines, respectively on a year-on-year basis. The accumulated Gross Merchandise Volume (GMV) made by our system is improved by 213.42%, compared to the number in Feb 2021.

【9】 Assisted Text Annotation Using Active Learning to Achieve High Quality with Little Effort Link: https://arxiv.org/abs/2112.11914

Authors: Franziska Weeber, Felix Hamborg, Karsten Donnay, Bela Gipp Affiliations: University of Konstanz, Germany; Heidelberg Academy of Sciences and Humanities, Germany; University of Zurich, Switzerland; University of Wuppertal, Germany Abstract: Large amounts of annotated data have become more important than ever, especially since the rise of deep learning techniques. However, manual annotations are costly. We propose a tool that enables researchers to create large, high-quality, annotated datasets with only a few manual annotations, thus strongly reducing annotation cost and effort. For this purpose, we combine an active learning (AL) approach with a pre-trained language model to semi-automatically identify annotation categories in the given text documents. To highlight our research direction's potential, we evaluate the approach on the task of identifying frames in news articles. Our preliminary results show that employing AL strongly reduces the number of annotations for correct classification of even these complex and subtle frames. On the framing dataset, the AL approach needs only 16.3% of the annotations to reach the same performance as a model trained on the full dataset.
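As a generic, hedged illustration of the annotation loop described above (not the authors' tool), the sketch below runs pool-based uncertainty sampling with a scikit-learn classifier; the feature matrix and labels are synthetic placeholders standing in for pre-trained language-model representations of the news sentences and for human-provided frame labels.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 64))                               # placeholder text features
    y = (X[:, 0] + 0.1 * rng.normal(size=1000) > 0).astype(int)   # placeholder labels

    # Small seed set containing both classes; the rest of the data forms the pool.
    labeled = list(np.where(y == 0)[0][:5]) + list(np.where(y == 1)[0][:5])
    pool = [i for i in range(len(X)) if i not in labeled]

    for _ in range(10):
        clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
        proba = clf.predict_proba(X[pool])
        # Uncertainty sampling: query the example whose top-class probability is lowest.
        query = pool[int(np.argmin(proba.max(axis=1)))]
        labeled.append(query)        # in practice a human annotator supplies y[query]
        pool.remove(query)

    print("accuracy after AL rounds:", clf.score(X, y))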

【10】 Towards Interactive Language Modeling Link: https://arxiv.org/abs/2112.11911

Authors: Maartje ter Hoeve, Evgeny Kharitonov, Dieuwke Hupkes, Emmanuel Dupoux Affiliations: University of Amsterdam; Meta AI Labs; EHESS Abstract: Interaction between caregivers and children plays a critical role in human language acquisition and development. Given this observation, it is remarkable that explicit interaction plays little to no role in artificial language modeling -- which also targets the acquisition of human language, yet by artificial models. Moreover, an interactive approach to language modeling has the potential to make language models substantially more versatile and to considerably impact downstream applications. Motivated by these considerations, we pioneer the space of interactive language modeling. As a first contribution we present a road map in which we detail the steps that need to be taken towards interactive language modeling. We then lead by example and take the first steps on this road map, showing the initial feasibility of our approach. As such, this work aims to be the start of a larger research agenda on interactive language modeling.

【11】 Few-shot Multi-hop Question Answering over Knowledge Base Link: https://arxiv.org/abs/2112.11909

Authors: Fan Meihao Abstract: Previous work on Chinese Knowledge Base Question Answering has been restricted due to the lack of complex Chinese semantic parsing datasets and the exponential growth of the search space with the length of relation paths. This paper proposes an efficient pipeline method equipped with a pre-trained language model and a strategy to construct artificial training samples, which needs only a small amount of data but performs well on the open-domain complex Chinese Question Answering task. Besides, by adopting a Beam Search algorithm based on a language model marking scores for candidate query tuples, we decelerate the growth of relation paths when generating multi-hop query paths. Finally, we evaluate our model on the CCKS2019 Complex Question Answering via Knowledge Base task and achieve an F1-score of 62.55% on the test dataset. Moreover, when training with only 10% of the data, our model can still achieve an F1-score of 58.54%. The results show the capability of our model to process the KBQA task and its advantage in few-shot learning.
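The abstract's key mechanism is a beam search over candidate query tuples scored by a language model. A minimal, hedged sketch of that control flow follows; expand_relations and score_with_lm are hypothetical stand-ins for the paper's knowledge-base expansion and language-model scoring components, and the toy KB and scorer are invented for illustration.

    def beam_search_paths(question, start_entity, expand_relations, score_with_lm,
                          beam_size=5, max_hops=2):
        """Grow relation paths hop by hop, keeping only the top-scoring tuples."""
        beam = [([start_entity], 0.0)]            # (path, accumulated score)
        for _ in range(max_hops):
            candidates = []
            for path, score in beam:
                for relation, tail in expand_relations(path[-1]):
                    new_path = path + [relation, tail]
                    candidates.append((new_path, score + score_with_lm(question, new_path)))
            if not candidates:
                break
            candidates.sort(key=lambda item: item[1], reverse=True)
            beam = candidates[:beam_size]          # pruning decelerates path growth
        return beam

    # Tiny toy KB and token-overlap "scorer", for illustration only.
    KB = {"Beijing": [("capital_of", "China")], "China": [("located_in", "Asia")]}
    toy_scorer = lambda q, path: sum(tok.lower() in q.lower()
                                     for tok in " ".join(path).replace("_", " ").split())
    best = beam_search_paths("Which continent is China located in?", "Beijing",
                             expand_relations=lambda e: KB.get(e, []),
                             score_with_lm=toy_scorer)
    print(best[0])   # highest-scoring two-hop path and its score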

【12】 Multimodal Analysis of memes for sentiment extraction Link: https://arxiv.org/abs/2112.11850

Authors: Nayan Varma Alluri, Neeli Dheeraj Krishna Affiliations: Department of Computer Science and Engineering, PES University, Bangalore, India Note: 5 pages Abstract: Memes are one of the most ubiquitous forms of social media communication. The study and processing of memes, which are intrinsically multimedia, is a popular topic right now. The study presented in this research is based on the Memotion dataset, which involves categorising memes based on irony, comedy, motivation, and overall-sentiment. Three separate innovative transformer-based techniques have been developed, and their outcomes have been thoroughly reviewed. The best algorithm achieved a macro F1 score of 0.633 for humour classification, 0.55 for motivation classification, 0.61 for sarcasm classification, and 0.575 for overall sentiment of the meme out of all our techniques.

【13】 Bottom-up approaches for multi-person pose estimation and it's applications: A brief review Link: https://arxiv.org/abs/2112.11834

Authors: Milan Kresović, Thong Duy Nguyen Affiliations: Norwegian University of Science and Technology, Norway Note: 13 pages, 11 figures Abstract: Human Pose Estimation (HPE) is one of the fundamental problems in computer vision. It has applications ranging from virtual reality, human behavior analysis, video surveillance, anomaly detection, self-driving to medical assistance. The main objective of HPE is to obtain the person's posture from the given input. Among different paradigms for HPE, one paradigm is called bottom-up multi-person pose estimation. In the bottom-up approach, initially, all the key points of the targets are detected, and later in the optimization stage, the detected key points are associated with the corresponding targets. This review paper discussed the recent advancements in bottom-up approaches for the HPE and listed the possible high-quality datasets used to train the models. Additionally, a discussion of the prominent bottom-up approaches and their quantitative results on the standard performance matrices are given. Finally, the limitations of the existing methods are highlighted, and guidelines of the future research directions are given.

【14】 Lifting Symmetry Breaking Constraints with Inductive Logic Programming Link: https://arxiv.org/abs/2112.11806

Authors: Alice Tarzariol, Martin Gebser, Konstantin Schekotihin Abstract: Efficient omission of symmetric solution candidates is essential for combinatorial problem-solving. Most of the existing approaches are instance-specific and focus on the automatic computation of Symmetry Breaking Constraints (SBCs) for each given problem instance. However, the application of such approaches to large-scale instances or advanced problem encodings might be problematic since the computed SBCs are propositional and, therefore, can neither be meaningfully interpreted nor transferred to other instances. As a result, a time-consuming recomputation of SBCs must be done before every invocation of a solver. To overcome these limitations, we introduce a new model-oriented approach for Answer Set Programming that lifts the SBCs of small problem instances into a set of interpretable first-order constraints using the Inductive Logic Programming paradigm. Experiments demonstrate the ability of our framework to learn general constraints from instance-specific SBCs for a collection of combinatorial problems. The obtained results indicate that our approach significantly outperforms a state-of-the-art instance-specific method as well as the direct application of a solver.

【15】 Neural-Symbolic Integration for Interactive Learning and Conceptual Grounding Link: https://arxiv.org/abs/2112.11805

Authors: Benedikt Wagner, Artur d'Avila Garcez Affiliations: Department of Computer Science, City, University of London, London, UK Abstract: We propose neural-symbolic integration for abstract concept explanation and interactive learning. Neural-symbolic integration and explanation allow users and domain-experts to learn about the data-driven decision making process of large neural models. The models are queried using a symbolic logic language. Interaction with the user then confirms or rejects a revision of the neural model using logic-based constraints that can be distilled into the model architecture. The approach is illustrated using the Logic Tensor Network framework alongside Concept Activation Vectors and applied to a Convolutional Neural Network.

【16】 Shape Fragments Link: https://arxiv.org/abs/2112.11796

Authors: Thomas Delva, Anastasia Dimou, Maxime Jakubowski, Jan Van den Bussche Affiliations: IDLab, Ghent University, imec, Ghent, Belgium; Dept. Computer Science, KU Leuven, Leuven, Belgium; DSI, Hasselt University, Hasselt, Belgium Abstract: In constraint languages for RDF graphs, such as ShEx and SHACL, constraints on nodes and their properties in RDF graphs are known as "shapes". Schemas in these languages list the various shapes that certain targeted nodes must satisfy for the graph to conform to the schema. Using SHACL, we propose in this paper a novel use of shapes, by which a set of shapes is used to extract a subgraph from an RDF graph, the so-called shape fragment. Our proposed mechanism fits in the framework of Linked Data Fragments. In this paper, (i) we define our extraction mechanism formally, building on recently proposed SHACL formalizations; (ii) we establish correctness properties, which relate shape fragments to notions of provenance for database queries; (iii) we compare shape fragments with SPARQL queries; (iv) we discuss implementation options; and (v) we present initial experiments demonstrating that shape fragments are a feasible new idea.

【17】 Class-aware Sounding Objects Localization via Audiovisual Correspondence Link: https://arxiv.org/abs/2112.11749

Authors: Di Hu, Yake Wei, Rui Qian, Weiyao Lin, Ruihua Song, Ji-Rong Wen Affiliations: The Chinese University of Hong Kong Note: Accepted by TPAMI 2021. Code: this https URL Abstract: Audiovisual scenes are pervasive in our daily life. It is commonplace for humans to discriminatively localize different sounding objects but quite challenging for machines to achieve class-aware sounding objects localization without category annotations, i.e., localizing the sounding object and recognizing its category. To address this problem, we propose a two-stage step-by-step learning framework to localize and recognize sounding objects in complex audiovisual scenarios using only the correspondence between audio and vision. First, we propose to determine the sounding area via coarse-grained audiovisual correspondence in the single source cases. Then visual features in the sounding area are leveraged as candidate object representations to establish a category-representation object dictionary for expressive visual character extraction. We generate class-aware object localization maps in cocktail-party scenarios and use audiovisual correspondence to suppress silent areas by referring to this dictionary. Finally, we employ category-level audiovisual consistency as the supervision to achieve fine-grained audio and sounding object distribution alignment. Experiments on both realistic and synthesized videos show that our model is superior in localizing and recognizing objects as well as filtering out silent ones. We also transfer the learned audiovisual network into the unsupervised object detection task, obtaining reasonable performance.
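A common way to implement learning from audio-visual correspondence alone is an InfoNCE-style contrastive loss that pulls matched audio and video clips together in a shared embedding space. The sketch below shows that generic formulation as a hedged illustration; the embeddings are random placeholders and this is not necessarily the exact correspondence loss used in the paper.

    import torch
    import torch.nn.functional as F

    def audiovisual_correspondence_loss(audio_emb, visual_emb, temperature=0.07):
        """Contrastive loss: the i-th audio clip should match the i-th video clip."""
        a = F.normalize(audio_emb, dim=1)
        v = F.normalize(visual_emb, dim=1)
        logits = a @ v.t() / temperature              # (batch, batch) similarity matrix
        targets = torch.arange(a.size(0), device=a.device)
        # Symmetric cross-entropy: audio-to-video and video-to-audio retrieval.
        return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

    audio = torch.randn(16, 128)    # placeholder clip-level audio embeddings
    video = torch.randn(16, 128)    # placeholder clip-level visual embeddings
    print(audiovisual_correspondence_loss(audio, video).item())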

【18】 A Survey of Natural Language Generation Link: https://arxiv.org/abs/2112.11739

Authors: Chenhe Dong, Yinghui Li, Haifan Gong, Miaoxin Chen, Junxin Li, Ying Shen, Min Yang Affiliations: Sun Yat-Sen University; Tsinghua University Note: 36 pages, 4 tables; under review Abstract: This paper offers a comprehensive review of the research on Natural Language Generation (NLG) over the past two decades, especially in relation to data-to-text generation and text-to-text generation deep learning methods, as well as new applications of NLG technology. This survey aims to (a) give the latest synthesis of deep learning research on the NLG core tasks, as well as the architectures adopted in the field; (b) detail meticulously and comprehensively various NLG tasks and datasets, and draw attention to the challenges in NLG evaluation, focusing on different evaluation methods and their relationships; (c) highlight some future emphasis and relatively recent research issues that arise due to the increasing synergy between NLG and other artificial intelligence areas, such as computer vision, text and computational creativity.

【19】 Investigating Neighborhood Modeling and Asymmetry Preservation in Digraph Representation Learning Link: https://arxiv.org/abs/2112.11734

Authors: Honglu Zhou, Advith Chegu, Samuel Sohn, Mubbasir Kapadia Affiliations: Department of Computer Science, Rutgers University, Piscataway, NJ, USA Abstract: Graph Neural Networks (GNNs) traditionally exhibit poor performance for directed graphs (digraphs) due to notable challenges in 1) modeling neighborhoods and 2) preserving asymmetry. In this paper, we address these challenges in traditional GNNs by leveraging hyperbolic collaborative learning from multi-ordered and partitioned neighborhoods, and regularizers inspired by socio-psychological factors. Our resulting formalism, Digraph Hyperbolic Network (D-HYPR) learns node representations in hyperbolic space to avoid structural and semantic distortion of real-world digraphs. We conduct comprehensive experimentation on 4 tasks: link prediction, node classification, sign prediction, and embedding visualization. D-HYPR statistically significantly outperforms the current state of the art on a majority of tasks and datasets, while achieving competitive performance otherwise. Our code and data will be available.
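For readers unfamiliar with hyperbolic representation learning, the standard Poincaré-ball distance below (textbook material, not the authors' code) illustrates the geometry D-HYPR embeds nodes in: distances grow rapidly near the boundary of the unit ball, which suits hierarchy-like digraph structure.

    import numpy as np

    def poincare_distance(u, v, eps=1e-9):
        """Geodesic distance between two points inside the unit Poincare ball."""
        u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
        sq_u, sq_v = np.sum(u * u), np.sum(v * v)
        sq_diff = np.sum((u - v) ** 2)
        x = 1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v) + eps)
        return np.arccosh(x)

    # Points near the origin are close; points near the boundary are far apart.
    print(poincare_distance([0.10, 0.0], [0.0, 0.10]))
    print(poincare_distance([0.95, 0.0], [0.0, 0.95]))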

【20】 Graph augmented Deep Reinforcement Learning in the GameRLand3D environment Link: https://arxiv.org/abs/2112.11731

Authors: Edward Beeching, Maxim Peter, Philippe Marcotte, Jilles Debangoye, Olivier Simonin, Joshua Romoff, Christian Wolf Affiliations: Ubisoft La Forge, Montreal; INRIA Chroma team, CITI Laboratory, INSA-Lyon, France; Université de Lyon, INSA-Lyon, LIRIS, CNRS, France Abstract: We address planning and navigation in challenging 3D video games featuring maps with disconnected regions reachable by agents using special actions. In this setting, classical symbolic planners are not applicable or difficult to adapt. We introduce a hybrid technique combining a low level policy trained with reinforcement learning and a graph based high level classical planner. In addition to providing human-interpretable paths, the approach improves the generalization performance of an end-to-end approach in unseen maps, where it achieves a 20% absolute increase in success rate over a recurrent end-to-end agent on a point to point navigation task in yet unseen large-scale maps of size 1km x 1km. In an in-depth experimental study, we quantify the limitations of end-to-end Deep RL approaches in vast environments and we also introduce "GameRLand3D", a new benchmark and soon-to-be-released environment that can generate complex procedural 3D maps for navigation tasks.

【21】 Hybrid Curriculum Learning for Emotion Recognition in Conversation Link: https://arxiv.org/abs/2112.11718

Authors: Lin Yang, Yi Shen, Yue Mao, Longjun Cai Affiliations: Alibaba Group, Beijing, China Note: Accepted by AAAI 2022 Abstract: Emotion recognition in conversation (ERC) aims to detect the emotion label for each utterance. Motivated by recent studies which have proven that feeding training examples in a meaningful order rather than considering them randomly can boost the performance of models, we propose an ERC-oriented hybrid curriculum learning framework. Our framework consists of two curricula: (1) conversation-level curriculum (CC); and (2) utterance-level curriculum (UC). In CC, we construct a difficulty measurer based on "emotion shift" frequency within a conversation, then the conversations are scheduled in an "easy to hard" schema according to the difficulty score returned by the difficulty measurer. For UC, it is implemented from an emotion-similarity perspective, which progressively strengthens the model's ability in identifying the confusing emotions. With the proposed model-agnostic hybrid curriculum learning strategy, we observe significant performance boosts over a wide range of existing ERC models and we are able to achieve new state-of-the-art results on four public ERC datasets.
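A hedged sketch of the conversation-level difficulty measurer described above, using an invented data layout rather than the paper's, could look like the following: conversations with fewer emotion shifts are scheduled first.

    def emotion_shift_ratio(conversation):
        """Fraction of adjacent utterance pairs whose emotion label changes."""
        labels = [utt["emotion"] for utt in conversation]
        if len(labels) < 2:
            return 0.0
        shifts = sum(a != b for a, b in zip(labels, labels[1:]))
        return shifts / (len(labels) - 1)

    def easy_to_hard(conversations):
        """Schedule conversations from few emotion shifts (easy) to many (hard)."""
        return sorted(conversations, key=emotion_shift_ratio)

    # Toy example with two conversations (emotion labels are placeholders).
    convs = [
        [{"emotion": "joy"}, {"emotion": "anger"}, {"emotion": "joy"}],   # many shifts
        [{"emotion": "sad"}, {"emotion": "sad"}, {"emotion": "sad"}],     # no shifts
    ]
    print([emotion_shift_ratio(c) for c in easy_to_hard(convs)])   # [0.0, 1.0]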

【22】 Maximum Entropy Population Based Training for Zero-Shot Human-AI Coordination Link: https://arxiv.org/abs/2112.11701

Authors: Rui Zhao, Jinming Song, Hu Haifeng, Yang Gao, Yi Wu, Zhongqian Sun, Yang Wei Affiliations: Tencent AI Lab; Tsinghua University Note: Accepted by NeurIPS Cooperative AI Workshop, 2021; link: this https URL Abstract: An AI agent should be able to coordinate with humans to solve tasks. We consider the problem of training a Reinforcement Learning (RL) agent without using any human data, i.e., in a zero-shot setting, to make it capable of collaborating with humans. Standard RL agents learn through self-play. Unfortunately, these agents only know how to collaborate with themselves and normally do not perform well with unseen partners, such as humans. The methodology of how to train a robust agent in a zero-shot fashion is still subject to research. Motivated from the maximum entropy RL, we derive a centralized population entropy objective to facilitate learning of a diverse population of agents, which is later used to train a robust agent to collaborate with unseen partners. The proposed method shows its effectiveness compared to baseline methods, including self-play PPO, the standard Population-Based Training (PBT), and trajectory diversity-based PBT, in the popular Overcooked game environment. We also conduct online experiments with real humans and further demonstrate the efficacy of the method in the real world. A supplementary video showing experimental results is available at https://youtu.be/Xh-FKD0AAKE.
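As a hedged, simplified numerical illustration of why a centralized population entropy objective encourages diversity (not the authors' derivation or training code), the snippet below computes the entropy of the population-averaged action distribution, which is high only when agents spread their probability mass differently.

    import numpy as np

    def population_entropy(policies):
        """Entropy of the mean action distribution across a population of policies.

        policies: array of shape (n_agents, n_actions), each row a probability vector.
        """
        mean_policy = np.clip(np.mean(policies, axis=0), 1e-12, 1.0)
        return float(-np.sum(mean_policy * np.log(mean_policy)))

    identical = np.array([[0.9, 0.1, 0.0]] * 4)              # agents behave the same
    diverse = np.array([[0.9, 0.1, 0.0], [0.1, 0.9, 0.0],
                        [0.0, 0.1, 0.9], [0.3, 0.4, 0.3]])   # agents behave differently
    print(population_entropy(identical) < population_entropy(diverse))   # True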

【23】 Domain Adaptation with Pre-trained Transformers for Query Focused Abstractive Text Summarization Link: https://arxiv.org/abs/2112.11670

Authors: Md Tahmid Rahman Laskar, Enamul Hoque, Jimmy Xiangji Huang Affiliations: Dialpad Canada Inc.; Information Retrieval & Knowledge Management Research Lab, York University; School of Information Technology, York University, Toronto, Ontario, Canada Note: The final version will be published in the Computational Linguistics journal Abstract: The Query Focused Text Summarization (QFTS) task aims at building systems that generate the summary of the text document(s) based on the given query. A key challenge in addressing this task is the lack of large labeled data for training the summarization model. In this paper, we address this challenge by exploring a series of domain adaptation techniques. Given the recent success of pre-trained transformer models in a wide range of natural language processing tasks, we utilize such models to generate abstractive summaries for the QFTS task for both single-document and multi-document scenarios. For domain adaptation, we apply a variety of techniques using pre-trained transformer-based summarization models including transfer learning, weakly supervised learning, and distant supervision. Extensive experiments on six datasets show that our proposed approach is very effective in generating abstractive summaries for the QFTS task while setting a new state-of-the-art result in several datasets across a set of automatic and human evaluation metrics.

【24】 Joint-training on Symbiosis Networks for Deep Nueral Machine Translation models Link: https://arxiv.org/abs/2112.11642

Authors: Zhengzhe Yu, Jiaxin Guo, Minghan Wang, Daimeng Wei, Hengchao Shang, Zongyao Li, Zhanglin Wu, Yuxia Wang, Yimeng Chen, Chang Su, Min Zhang, Lizhi Lei, Shimin Tao, Hao Yang Affiliations: Huawei Translation Services Center, Beijing, China; The University of Melbourne, Melbourne, Australia Abstract: Deep encoders have been proven to be effective in improving neural machine translation (NMT) systems, but it reaches the upper bound of translation quality when the number of encoder layers exceeds 18. Worse still, deeper networks consume a lot of memory, making it impossible to train efficiently. In this paper, we present Symbiosis Networks, which include a full network as the Symbiosis Main Network (M-Net) and another shared sub-network with the same structure but less layers as the Symbiotic Sub Network (S-Net). We adopt Symbiosis Networks on Transformer-deep (m-n) architecture and define a particular regularization loss $\mathcal{L}_{\tau}$ between the M-Net and S-Net in NMT. We apply joint-training on the Symbiosis Networks and aim to improve the M-Net performance. Our proposed training strategy improves Transformer-deep (12-6) by 0.61, 0.49 and 0.69 BLEU over the baselines under classic training on WMT'14 EN->DE, DE->EN and EN->FR tasks. Furthermore, our Transformer-deep (12-6) even outperforms classic Transformer-deep (18-6).
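The following PyTorch sketch shows one plausible reading of the M-Net/S-Net setup: a deep stack whose first layers are shared as the sub-network, plus a consistency regularizer standing in for $\mathcal{L}_{\tau}$. The layer sizes, the MSE form of the regularizer and its weight are assumptions made for illustration, not the paper's exact formulation.

    import torch
    import torch.nn as nn

    class SymbiosisEncoder(nn.Module):
        """A deep stack whose first `sub_layers` layers double as the S-Net."""
        def __init__(self, dim=64, main_layers=12, sub_layers=6):
            super().__init__()
            self.layers = nn.ModuleList(
                [nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(main_layers)])
            self.sub_layers = sub_layers

        def forward(self, x):
            s_out = None
            for i, layer in enumerate(self.layers):
                x = layer(x)
                if i + 1 == self.sub_layers:
                    s_out = x                      # S-Net output (shared shallow stack)
            return x, s_out                        # (M-Net output, S-Net output)

    model = SymbiosisEncoder()
    x = torch.randn(8, 64)
    m_out, s_out = model(x)
    task_loss = m_out.pow(2).mean()                             # placeholder for the NMT loss
    reg_loss = nn.functional.mse_loss(s_out, m_out.detach())    # stand-in for L_tau
    loss = task_loss + 0.1 * reg_loss
    loss.backward()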

【25】 Self-Distillation Mixup Training for Non-autoregressive Neural Machine Translation Link: https://arxiv.org/abs/2112.11640

Authors: Jiaxin Guo, Minghan Wang, Daimeng Wei, Hengchao Shang, Yuxia Wang, Zongyao Li, Zhengzhe Yu, Zhanglin Wu, Yimeng Chen, Chang Su, Min Zhang, Lizhi Lei, Shimin Tao, Hao Yang Affiliations: Huawei Translation Services Center, Beijing, China; The University of Melbourne, Melbourne, Australia Abstract: Recently, non-autoregressive (NAT) models predict outputs in parallel, achieving substantial improvements in generation speed compared to autoregressive (AT) models. While performing worse on raw data, most NAT models are trained as student models on distilled data generated by AT teacher models, which is known as sequence-level Knowledge Distillation. An effective training strategy to improve the performance of AT models is Self-Distillation Mixup (SDM) Training, which pre-trains a model on raw data, generates distilled data by the pre-trained model itself and finally re-trains a model on the combination of raw data and distilled data. In this work, we aim to view SDM for NAT models, but find directly adopting SDM to NAT models gains no improvements in terms of translation quality. Through careful analysis, we observe the invalidation is correlated to Modeling Diversity and Confirmation Bias between the AT teacher model and the NAT student models. Based on these findings, we propose an enhanced strategy named SDMRT by adding two stages to classic SDM: one is Pre-Rerank on self-distilled data, the other is Fine-Tune on Filtered teacher-distilled data. Our results outperform baselines by 0.6 to 1.2 BLEU on multiple NAT models. As another bonus, for Iterative Refinement NAT models, our methods can outperform baselines within half iteration number, which means 2X acceleration.

【26】 Diformer: Directional Transformer for Neural Machine Translation Link: https://arxiv.org/abs/2112.11632

Authors: Minghan Wang, Jiaxin Guo, Yuxia Wang, Daimeng Wei, Hengchao Shang, Chang Su, Yimeng Chen, Yinglu Li, Min Zhang, Shimin Tao, Hao Yang Affiliations: Huawei Translation Services Center, Beijing, China; The University of Melbourne, Melbourne, Australia Abstract: Autoregressive (AR) and Non-autoregressive (NAR) models have their own superiority on the performance and latency, combining them into one model may take advantage of both. Current combination frameworks focus more on the integration of multiple decoding paradigms with a unified generative model, e.g. Masked Language Model. However, the generalization can be harmful to the performance due to the gap between training objective and inference. In this paper, we aim to close the gap by preserving the original objective of AR and NAR under a unified framework. Specifically, we propose the Directional Transformer (Diformer) by jointly modelling AR and NAR into three generation directions (left-to-right, right-to-left and straight) with a newly introduced direction variable, which works by controlling the prediction of each token to have specific dependencies under that direction. The unification achieved by direction successfully preserves the original dependency assumption used in AR and NAR, retaining both generalization and performance. Experiments on 4 WMT benchmarks demonstrate that Diformer outperforms current united-modelling works with more than 1.5 BLEU points for both AR and NAR decoding, and is also competitive to the state-of-the-art independent AR and NAR models.

【27】 MOSAIC: Mobile Segmentation via decoding Aggregated Information and encoded Context Link: https://arxiv.org/abs/2112.11623

Authors: Weijun Wang, Andrew Howard Affiliations: Google Research Abstract: We present a next-generation neural network architecture, MOSAIC, for efficient and accurate semantic image segmentation on mobile devices. MOSAIC is designed using commonly supported neural operations by diverse mobile hardware platforms for flexible deployment across various mobile platforms. With a simple asymmetric encoder-decoder structure which consists of an efficient multi-scale context encoder and a light-weight hybrid decoder to recover spatial details from aggregated information, MOSAIC achieves new state-of-the-art performance while balancing accuracy and computational cost. Deployed on top of a tailored feature extraction backbone based on a searched classification network, MOSAIC achieves a 5% absolute accuracy gain surpassing the current industry standard MLPerf models and state-of-the-art architectures.

【28】 An Alternate Policy Gradient Estimator for Softmax Policies Link: https://arxiv.org/abs/2112.11622

Authors: Shivam Garg, Samuele Tosatto, Yangchen Pan, Martha White, A. Rupam Mahmood Affiliations: University of Alberta; Alberta Machine Intelligence Institute (Amii) Abstract: Policy gradient (PG) estimators for softmax policies are ineffective with sub-optimally saturated initialization, which happens when the density concentrates on a sub-optimal action. Sub-optimal policy saturation may arise from bad policy initialization or sudden changes in the environment that occur after the policy has already converged, and softmax PG estimators require a large number of updates to recover an effective policy. This severe issue causes high sample inefficiency and poor adaptability to new situations. To mitigate this problem, we propose a novel policy gradient estimator for softmax policies that utilizes the bias in the critic estimate and the noise present in the reward signal to escape the saturated regions of the policy parameter space. Our analysis and experiments, conducted on bandits and classical MDP benchmarking tasks, show that our estimator is more robust to policy saturation.
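To make the saturation problem concrete, here is a hedged sketch of the standard (vanilla) softmax policy gradient estimator on a small bandit, started from a saturated initialization. It reproduces the baseline behaviour the paper improves upon; the proposed alternate estimator itself is not shown, and all numbers are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    true_rewards = np.array([1.0, 0.0, 0.0])     # action 0 is optimal
    theta = np.array([-8.0, 8.0, 0.0])           # saturated on the sub-optimal action 1

    def softmax(z):
        z = z - z.max()
        e = np.exp(z)
        return e / e.sum()

    for _ in range(2000):
        pi = softmax(theta)
        a = rng.choice(3, p=pi)
        r = true_rewards[a] + 0.1 * rng.normal()
        # Vanilla softmax PG (REINFORCE): grad log pi(a) = onehot(a) - pi.
        grad_log_pi = -pi
        grad_log_pi[a] += 1.0
        theta += 0.1 * r * grad_log_pi

    print(softmax(theta))   # probability mass moves away from action 1 only very slowly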

【29】 Explainable Artificial Intelligence for Autonomous Driving: A Comprehensive Overview and Field Guide for Future Research Directions Link: https://arxiv.org/abs/2112.11561

Authors: Shahin Atakishiyev, Mohammad Salameh, Hengshuai Yao, Randy Goebel Affiliations: University of Alberta and Huawei Technologies Canada Co. Note: 18 pages Abstract: Autonomous driving has achieved a significant milestone in research and development over the last decade. There is increasing interest in the field as the deployment of self-operating vehicles on roads promises safer and more ecologically friendly transportation systems. With the rise of computationally powerful artificial intelligence (AI) techniques, autonomous vehicles can sense their environment with high precision, make safe real-time decisions, and operate more reliably without human interventions. However, intelligent decision-making in autonomous cars is not generally understandable by humans in the current state of the art, and such deficiency hinders this technology from being socially acceptable. Hence, aside from making safe real-time decisions, the AI systems of autonomous vehicles also need to explain how these decisions are constructed in order to be regulatory compliant across many jurisdictions. Our study sheds a comprehensive light on developing explainable artificial intelligence (XAI) approaches for autonomous vehicles. In particular, we make the following contributions. First, we provide a thorough overview of the present gaps with respect to explanations in the state-of-the-art autonomous vehicle industry. We then show the taxonomy of explanations and explanation receivers in this field. Thirdly, we propose a framework for an architecture of end-to-end autonomous driving systems and justify the role of XAI in both debugging and regulating such systems. Finally, as future research directions, we provide a field guide on XAI approaches for autonomous driving that can improve operational safety and transparency towards achieving public approval by regulators, manufacturers, and all engaged stakeholders.

【30】 Decompose the Sounds and Pixels, Recompose the Events Link: https://arxiv.org/abs/2112.11547

Authors: Varshanth R. Rao, Md Ibrahim Khalil, Haoda Li, Peng Dai, Juwei Lu Affiliations: Huawei Noah's Ark Lab; University of Waterloo, Canada; University of Toronto, Canada Note: Accepted at AAAI 2022 Abstract: In this paper, we propose a framework centering around a novel architecture called the Event Decomposition Recomposition Network (EDRNet) to tackle the Audio-Visual Event (AVE) localization problem in the supervised and weakly supervised settings. AVEs in the real world exhibit common unravelling patterns (termed as Event Progress Checkpoints (EPC)), which humans can perceive through the cooperation of their auditory and visual senses. Unlike earlier methods which attempt to recognize entire event sequences, the EDRNet models EPCs and inter-EPC relationships using stacked temporal convolutions. Based on the postulation that EPC representations are theoretically consistent for an event category, we introduce the State Machine Based Video Fusion, a novel augmentation technique that blends source videos using different EPC template sequences. Additionally, we design a new loss function called the Land-Shore-Sea loss to compactify continuous foreground and background representations. Lastly, to alleviate the issue of confusing events during weak supervision, we propose a prediction stabilization method called Bag to Instance Label Correction. Experiments on the AVE dataset show that our collective framework outperforms the state-of-the-art by a sizable margin.

【31】 Sentence Embeddings and High-speed Similarity Search for Fast Computer Assisted Annotation of Legal Documents Link: https://arxiv.org/abs/2112.11494

Authors: Hannes Westermann, Jaromir Savelka, Vern R. Walker, Kevin D. Ashley, Karim Benyekhlef Affiliations: Cyberjustice Laboratory, Faculté de droit, Université de Montréal; School of Computer Science, Carnegie Mellon University; LLT Lab, Maurice A. Deane School of Law, Hofstra University; School of Computing and Information, University of Pittsburgh Abstract: Human-performed annotation of sentences in legal documents is an important prerequisite to many machine learning based systems supporting legal tasks. Typically, the annotation is done sequentially, sentence by sentence, which is often time consuming and, hence, expensive. In this paper, we introduce a proof-of-concept system for annotating sentences "laterally." The approach is based on the observation that sentences that are similar in meaning often have the same label in terms of a particular type system. We use this observation in allowing annotators to quickly view and annotate sentences that are semantically similar to a given sentence, across an entire corpus of documents. Here, we present the interface of the system and empirically evaluate the approach. The experiments show that lateral annotation has the potential to make the annotation process quicker and more consistent.
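The core operation of such a tool is: given one freshly annotated sentence, retrieve the semantically closest sentences across the corpus so they can be labelled in one pass. A hedged sketch with placeholder embeddings and plain cosine similarity (not the authors' system or their embedding model) is below.

    import numpy as np

    def top_similar(query_vec, corpus_vecs, k=5):
        """Indices of the k corpus sentences most similar to the query (cosine)."""
        q = query_vec / (np.linalg.norm(query_vec) + 1e-12)
        c = corpus_vecs / (np.linalg.norm(corpus_vecs, axis=1, keepdims=True) + 1e-12)
        sims = c @ q
        return np.argsort(-sims)[:k], sims

    rng = np.random.default_rng(0)
    corpus_embeddings = rng.normal(size=(10000, 384))   # placeholder sentence embeddings
    query = corpus_embeddings[42]                        # the sentence just annotated
    idx, sims = top_similar(query, corpus_embeddings, k=5)
    # The annotator reviews these candidates and applies the same label where it fits.
    print(idx, sims[idx])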

【32】 Translating Human Mobility Forecasting through Natural Language Generation Link: https://arxiv.org/abs/2112.11481

Authors: Hao Xue, Flora D. Salim, Yongli Ren, Charles L. A. Clarke Affiliations: School of Computing Technologies, RMIT University, Melbourne, Victoria, Australia; University of Waterloo, Waterloo, Ontario, Canada Note: Accepted at WSDM 2022 Abstract: Existing human mobility forecasting models follow the standard design of the time-series prediction model which takes a series of numerical values as input to generate a numerical value as a prediction. Although treating this as a regression problem seems straightforward, incorporating various contextual information such as the semantic category information of each Place-of-Interest (POI) is a necessary step, and often the bottleneck, in designing an effective mobility prediction model. As opposed to the typical approach, we treat forecasting as a translation problem and propose a novel forecasting through a language generation pipeline. The paper aims to address the human mobility forecasting problem as a language translation task in a sequence-to-sequence manner. A mobility-to-language template is first introduced to describe the numerical mobility data as natural language sentences. The core intuition of the human mobility forecasting translation task is to convert the input mobility description sentences into a future mobility description from which the prediction target can be obtained. Under this pipeline, a two-branch network, SHIFT (Translating Human Mobility Forecasting), is designed. Specifically, it consists of one main branch for language generation and one auxiliary branch to directly learn mobility patterns. During the training, we develop a momentum mode for better connecting and training the two branches. Extensive experiments on three real-world datasets demonstrate that the proposed SHIFT is effective and presents a new revolutionary approach to forecasting human mobility.
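A hedged toy version of the mobility-to-language template idea, with invented wording and fields rather than the paper's actual template, could look like this:

    def mobility_to_sentence(place, category, visits):
        """Describe a week of visit counts as a natural-language prompt."""
        days = ", ".join(str(v) for v in visits)
        return (f"From Monday to Sunday, there were {days} visits to {place}, "
                f"which is a {category}. How many visits will there be next Monday?")

    def sentence_to_prediction(generated_sentence):
        """Parse the predicted number back out of the generated answer sentence."""
        digits = [tok for tok in generated_sentence.split() if tok.strip(".,").isdigit()]
        return int(digits[0]) if digits else None

    prompt = mobility_to_sentence("Cafe A", "coffee shop", [12, 15, 14, 18, 30, 41, 38])
    print(prompt)
    print(sentence_to_prediction("There will be 16 visits next Monday."))   # 16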

【33】 Towards a Science of Human-AI Decision Making: A Survey of Empirical Studies 标题:走向人类-人工智能决策科学:实证研究综述 链接:https://arxiv.org/abs/2112.11471

作者:Vivian Lai,Chacha Chen,Q. Vera Liao,Alison Smith-Renner,Chenhao Tan 机构: University of Colorado Boulder, University of Chicago 备注:36 pages, 2 figures, see this https URL for website 摘要:随着人工智能系统显示出越来越强的预测性能,它们在许多领域的应用也越来越多。然而,在刑事司法和医疗保健等高风险领域,出于安全、道德和法律方面的考虑,全自动化通常是不可取的,但全手动方法可能不准确且耗时。因此,研究界越来越有兴趣利用人工智能辅助人类决策。除了为此目的开发人工智能技术外,人类-人工智能决策这一新兴领域还必须采用实证方法,以形成对人类如何与人工智能互动并与人工智能一起做出决策的基本理解。为了号召并帮助组织面向“理解和改进人类-人工智能决策”这门科学的研究工作,我们调查了关于这一主题、以人为被试的实证研究的最新文献。我们总结了100多篇论文在三个重要方面做出的研究设计选择:(1)决策任务,(2)人工智能模型和人工智能辅助元素,以及(3)评估指标。对于每个方面,我们总结当前趋势,讨论该领域当前实践中的差距,并为未来的研究提出建议。我们的调查强调需要开发通用框架来刻画人类-人工智能决策的设计与研究空间,以便研究人员能够在研究设计中做出严格的选择,研究社区能够在彼此的工作基础上发展,产生可推广的科学知识。我们还希望这项调查将成为HCI和AI社区合作的桥梁,共同塑造人类-人工智能决策的经验科学和计算技术。 摘要:As AI systems demonstrate increasingly strong predictive performance, their adoption has grown in numerous domains. However, in high-stakes domains such as criminal justice and healthcare, full automation is often not desirable due to safety, ethical, and legal concerns, yet fully manual approaches can be inaccurate and time consuming. As a result, there is growing interest in the research community to augment human decision making with AI assistance. Besides developing AI technologies for this purpose, the emerging field of human-AI decision making must embrace empirical approaches to form a foundational understanding of how humans interact and work with AI to make decisions. To invite and help structure research efforts towards a science of understanding and improving human-AI decision making, we survey recent literature of empirical human-subject studies on this topic. We summarize the study design choices made in over 100 papers in three important aspects: (1) decision tasks, (2) AI models and AI assistance elements, and (3) evaluation metrics. For each aspect, we summarize current trends, discuss gaps in current practices of the field, and make a list of recommendations for future research. Our survey highlights the need to develop common frameworks to account for the design and research spaces of human-AI decision making, so that researchers can make rigorous choices in study design, and the research community can build on each other's work and produce generalizable scientific knowledge. We also hope this survey will serve as a bridge for HCI and AI communities to work together to mutually shape the empirical science and computational technologies for human-AI decision making.

【34】 Deep Reinforcement Learning for Optimal Power Flow with Renewables Using Spatial-Temporal Graph Information 标题:基于时空图信息的含可再生能源最优潮流的深度强化学习 链接:https://arxiv.org/abs/2112.11461

作者:Jinhao Li,Ruichang Zhang,Hao Wang,Zhi Liu,Hongyang Lai,Yanru Zhang 机构:School of Computer Science and Engineering, University of Electronic Science and Technology of China 备注:18 pages, 14 figures 摘要:可再生能源(RER)已越来越多地融入现代电力系统,特别是在大型配电网(DNs)中。在本文中,我们提出了一种基于深度强化学习(DRL)的方法,在RER渗透率较高的DNs中动态搜索最佳运行点,即最优潮流(OPF)。考虑到RER引起的不确定性和电压波动问题,我们将OPF转化为一个多目标优化(MOO)问题。为了解决MOO问题,我们开发了一种利用配电网图结构信息的新型DRL算法。具体而言,我们采用最先进的DRL算法,即深度确定性策略梯度(DDPG),来学习OPF的最优策略。由于DN中的潮流再分配是一个连续的过程,其中节点在时间和空间视图中是自相关和相互关联的,为了充分利用DNs的图结构信息,我们开发了一个基于多粒度注意的时空图卷积网络(MG-ASTGCN)用于时空图信息提取,为其顺序DDPG做准备。我们在改进的IEEE 33、69和118母线径向配电系统(RDS)中验证了我们提出的基于DRL的方法,并表明我们基于DRL的方法优于其他基准算法。我们的实验结果还表明,MG-ASTGCN可以显著加快DDPG的训练过程,并提高DDPG在OPF中重新分配潮流的能力。所提出的基于DRL的方法还提高了DNs在存在节点故障时的稳定性,特别是对于大规模DNs。 摘要:Renewable energy resources (RERs) have been increasingly integrated into modern power systems, especially in large-scale distribution networks (DNs). In this paper, we propose a deep reinforcement learning (DRL)-based approach to dynamically search for the optimal operation point, i.e., optimal power flow (OPF), in DNs with a high uptake of RERs. Considering uncertainties and voltage fluctuation issues caused by RERs, we formulate OPF into a multi-objective optimization (MOO) problem. To solve the MOO problem, we develop a novel DRL algorithm leveraging the graphical information of the distribution network. Specifically, we employ the state-of-the-art DRL algorithm, i.e., deep deterministic policy gradient (DDPG), to learn an optimal strategy for OPF. Since power flow reallocation in the DN is a consecutive process, where nodes are self-correlated and interrelated in temporal and spatial views, to make full use of DNs' graphical information, we develop a multi-grained attention-based spatial-temporal graph convolution network (MG-ASTGCN) for spatial-temporal graph information extraction, preparing for its sequential DDPG. We validate our proposed DRL-based approach in modified IEEE 33, 69, and 118-bus radial distribution systems (RDSs) and show that our DRL-based approach outperforms other benchmark algorithms. Our experimental results also reveal that MG-ASTGCN can significantly accelerate the DDPG training process and improve DDPG's capability in reallocating power flow for OPF. The proposed DRL-based approach also promotes DNs' stability in the presence of node faults, especially for large-scale DNs.
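
为帮助理解“将OPF表述为多目标优化并交给DDPG学习”这一思路,下面给出一个把发电成本与电压偏差惩罚加权合成为标量回报的示意函数(权重与惩罚形式均为假设,并非论文中的具体目标函数):

```python
import numpy as np

def opf_reward(gen_cost, bus_voltages, v_ref=1.0, v_tol=0.05,
               w_cost=1.0, w_volt=10.0):
    """多目标OPF的标量化回报:发电成本越低、母线电压越接近额定值,回报越高。"""
    # 对超出允许偏差范围的电压部分施加平方惩罚(标幺值)
    violation = np.clip(np.abs(bus_voltages - v_ref) - v_tol, 0.0, None)
    voltage_penalty = np.sum(violation ** 2)
    return -(w_cost * gen_cost + w_volt * voltage_penalty)

# 某一时刻:发电成本 320,三条母线电压(标幺值)
print(opf_reward(320.0, np.array([0.98, 1.02, 0.93])))
```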

【35】 Activity-based and agent-based Transport model of Melbourne (AToM): an open multi-modal transport simulation model for Greater Melbourne 标题:墨尔本基于活动和基于Agent的交通模型(AToM):面向大墨尔本地区的开放多模式交通仿真模型 链接:https://arxiv.org/abs/2112.12071

作者:Afshin Jafari,Dhirendra Singh,Alan Both,Mahsa Abdollahyar,Lucy Gunn,Steve Pemberton,Billie Giles-Corti 机构:School of Global, Urban and Social Studies, RMIT University, School of Computing Technologies, RMIT University, Data, CSIRO 摘要:近年来,基于Agent和基于活动的交通系统仿真模型受到了广泛关注。然而,很少有研究在全市尺度上细致刻画步行、骑自行车等主动交通方式,而占主导地位的机动化交通方式往往才是主要关注对象。本文介绍了一个开放的工作流,用于创建多模式的基于Agent和基于活动的交通仿真模型,重点关注大墨尔本地区,并包括针对驾车、公共交通、骑自行车和步行四种主要出行方式的方式选择校准过程。生成并用作仿真模型输入的合成人口基于2016年人口普查刻画了墨尔本人口,其日常活动和出行则基于维多利亚州2016-18年出行调查数据。仿真模型所用的道路网络包括所有可由上述出行方式通行的公共道路。我们在方式分担率、道路流量、出行时间和出行距离等方面,将仿真模型的输出与真实世界的观测结果进行了比较。通过这些比较,我们表明该模型适用于研究出行者的方式选择和道路使用行为。 摘要:Agent-based and activity-based models for simulating transportation systems have attracted significant attention in recent years. Few studies, however, include a detailed representation of active modes of transportation - such as walking and cycling - at a city-wide level, where dominating motorised modes are often of primary concern. This paper presents an open workflow for creating a multi-modal agent-based and activity-based transport simulation model, focusing on Greater Melbourne, and including the process of mode choice calibration for the four main travel modes of driving, public transport, cycling and walking. The synthetic population generated and used as an input for the simulation model represented Melbourne's population based on Census 2016, with daily activities and trips based on the Victoria's 2016-18 travel survey data. The road network used in the simulation model includes all public roads accessible via the included travel modes. We compared the output of the simulation model with observations from the real world in terms of mode share, road volume, travel time, and travel distance. Through these comparisons, we showed that our model is suitable for studying mode choice and road usage behaviour of travellers.
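
校验环节中“方式分担率”的对比可以用很简单的代码表达。下面是一个示意片段(出行记录与观测分担率均为虚构数据),统计仿真结果中各出行方式的占比并与观测值对比:

```python
from collections import Counter

# 仿真输出的逐次出行方式(示例数据)
sim_trips = ["car", "car", "walk", "pt", "bike", "car", "walk", "pt", "car", "car"]
observed_share = {"car": 0.62, "pt": 0.18, "walk": 0.14, "bike": 0.06}  # 虚构观测值

counts = Counter(sim_trips)
total = sum(counts.values())
for mode, obs in observed_share.items():
    sim = counts.get(mode, 0) / total
    print(f"{mode:>4}: simulated {sim:.2f} vs observed {obs:.2f} "
          f"(diff {sim - obs:+.2f})")
```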

【36】 Variational Quantum Soft Actor-Critic 标题:变分量子软演员-批评家 链接:https://arxiv.org/abs/2112.11921

作者:Qingfeng Lan 机构:Department of Computing Science, University of Alberta, Edmonton, Canada 备注:A course project paper 摘要:量子计算在解决诸如整数分解和西蒙问题等具体问题方面具有优越的优势。对于机器学习中更一般的任务,通过应用变分量子电路,近年来提出了越来越多的量子算法,特别是在有监督学习和无监督学习中。然而,在强化学习方面几乎没有做过什么工作,可以说强化学习更重要、更具挑战性。量子强化学习的前期工作主要集中于离散控制任务,其中动作空间是离散的。在这项工作中,我们开发了一种基于软参与者批评的量子强化学习算法——连续控制的最新方法之一。具体来说,我们使用了一个混合量子经典策略网络,该网络由一个变分量子电路和一个经典人工神经网络组成。在一个标准的强化学习基准测试中,我们表明,这个量子版本的软演员批评家与原始的软演员批评家相当,使用更少的可调参数。此外,我们还分析了不同超参数和策略网络结构的影响,指出了结构设计对量子强化学习的重要性。 摘要:Quantum computing has a superior advantage in tackling specific problems, such as integer factorization and Simon's problem. For more general tasks in machine learning, by applying variational quantum circuits, more and more quantum algorithms have been proposed recently, especially in supervised learning and unsupervised learning. However, little work has been done in reinforcement learning, arguably more important and challenging. Previous work in quantum reinforcement learning mainly focuses on discrete control tasks where the action space is discrete. In this work, we develop a quantum reinforcement learning algorithm based on soft actor-critic -- one of the state-of-the-art methods for continuous control. Specifically, we use a hybrid quantum-classical policy network consisting of a variational quantum circuit and a classical artificial neural network. Tested in a standard reinforcement learning benchmark, we show that this quantum version of soft actor-critic is comparable with the original soft actor-critic, using much less adjustable parameters. Furthermore, we analyze the effect of different hyper-parameters and policy network architectures, pointing out the importance of architecture design for quantum reinforcement learning.
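
下面用 NumPy 给出“混合量子-经典策略网络”的一个最小化示意(每个量子比特仅做单比特 RY 旋转、不含纠缠门,因此可用解析式 ⟨Z⟩ = cos(编码角 + 可训练角) 精确模拟;真实的变分量子电路以及论文中的具体结构要复杂得多,此处仅用于说明“量子层输出交给经典层映射为动作”的思路):

```python
import numpy as np

rng = np.random.default_rng(0)

def quantum_layer(state, theta):
    """把状态各维编码为 RY 角度,再叠加可训练的 RY 角度;
    对 RY(phi)|0> 测量 Z 的期望恰为 cos(phi),故可直接解析模拟。"""
    return np.cos(state + theta)

def hybrid_policy(state, theta, W, b):
    """变分量子层的期望值经经典线性层映射为动作均值(tanh 压缩到 [-1, 1])。"""
    z = quantum_layer(state, theta)
    return np.tanh(W @ z + b)

state_dim, action_dim = 3, 1
theta = rng.normal(size=state_dim)            # 量子电路的可训练参数
W = rng.normal(size=(action_dim, state_dim))  # 经典输出层权重
b = np.zeros(action_dim)

state = np.array([0.2, -0.5, 1.0])
print("action mean:", hybrid_policy(state, theta, W, b))
```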

【37】 Robust learning of data anomalies with analytically-solvable entropic outlier sparsification 标题:基于解析可解熵离群点稀疏的数据异常鲁棒学习 链接:https://arxiv.org/abs/2112.11768

作者:Illia Horenko 机构:Università della Svizzera italiana (USI), Institute of Computing, Via G. Buffi, Lugano, Switzerland 备注:9 pages, 1 figure 摘要:熵离群点稀疏化(EOS)是一种稳健的计算策略,用于在一大类学习方法中检测数据异常,既包括无监督问题(如在以高斯为主的数据中检测非高斯离群点),也包括带错误标签数据的监督学习。EOS的核心是推导出香农熵正则化下(加权)期望误差最小化问题的解析闭式解。与计算代价随数据维数多项式增长的常见正则化策略不同,所得到的闭式解被证明只带来与样本量成线性关系、且与数据维数无关的额外迭代开销。所得到的分析结果还解释了为什么在使用平方欧氏距离时,球对称高斯混合(在许多流行的数据分析算法中被启发式地使用)是非参数概率分布的最优选择,兼顾了期望误差最小、最大熵/无偏性以及线性的代价缩放。我们在合成问题以及生物医学中部分错误标记的监督分类问题上,将EOS的性能与一系列常用工具进行了比较。 摘要:Entropic Outlier Sparsification (EOS) is proposed as a robust computational strategy for the detection of data anomalies in a broad class of learning methods, including the unsupervised problems (like detection of non-Gaussian outliers in mostly-Gaussian data) and in the supervised learning with mislabeled data. EOS dwells on the derived analytic closed-form solution of the (weighted) expected error minimization problem subject to the Shannon entropy regularization. In contrast to common regularization strategies requiring computational costs that scale polynomial with the data dimension, identified closed-form solution is proven to impose additional iteration costs that depend linearly on statistics size and are independent of data dimension. Obtained analytic results also explain why the mixtures of spherically-symmetric Gaussians - used heuristically in many popular data analysis algorithms - represent an optimal choice for the non-parametric probability distributions when working with squared Euclidean distances, combining expected error minimality, maximal entropy/unbiasedness, and a linear cost scaling. The performance of EOS is compared to a range of commonly-used tools on synthetic problems and on partially-mislabeled supervised classification problems from biomedicine.
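
作为示意:带香农熵正则的加权期望误差最小化问题 min_w Σ_i w_i e_i + ε Σ_i w_i log w_i(约束 Σ_i w_i = 1)的解析解是 Gibbs/softmax 形式的权重 w_i ∝ exp(-e_i/ε)。下面的片段按这一通用形式计算权重(仅为对此类闭式解的一般性演示,具体公式与迭代方案以论文为准),可以看到误差很大的样本(疑似离群点)的权重被自动“稀疏化”:

```python
import numpy as np

def entropic_weights(errors, eps):
    """w_i ∝ exp(-e_i / eps):eps 越小,对高误差样本的抑制越强。"""
    logits = -np.asarray(errors, dtype=float) / eps
    logits -= logits.max()          # 数值稳定化
    w = np.exp(logits)
    return w / w.sum()

errors = np.array([0.1, 0.2, 0.15, 5.0])   # 最后一个样本疑似离群点
print(entropic_weights(errors, eps=0.5).round(4))
```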

【38】 The Phonetic Footprint of Parkinson's Disease 标题:帕金森病的语音足迹 链接:https://arxiv.org/abs/2112.11514

作者:Philipp Klumpp,Tomás Arias-Vergara,Juan Camilo Vásquez-Correa,Paula Andrea Pérez-Toro,Juan Rafael Orozco-Arroyave,Anton Batliner,Elmar Nöth 机构:Friedrich-Alexander-University Erlangen-Nuremberg, Pattern Recognition Lab, Martensstrasse, Erlangen, Germany, Universidad de Antioquia, Medellín, Colombia, Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg 备注:None 摘要:帕金森病(Parkinson's disease,PD)是最常见的神经退行性疾病之一,对患者的精细运动技能有重要影响。言语产生过程中不同发音器官之间复杂的协同配合,以及所需肌肉张力的实现,都会变得越来越困难,从而导致构音障碍。在受影响的个体中,经常可以观察到元音不稳定、发音含糊和语速变慢等特征模式,以往研究曾分析这些模式以判断PD的存在和进展。在这项工作中,我们使用仅在健康语音数据上训练的音素识别器,来研究PD如何影响患者的语音足迹。尽管我们的系统此前从未见过任何病理性语音,我们仍重新发现了以往研究中描述过的许多模式。此外,我们证明神经网络的中间激活可以作为特征向量,编码与个体疾病状态相关的信息。我们还能够将专家评定的说话人可懂度与音素预测的平均置信度直接关联起来。我们的结果支持这样的假设:训练能够分析PD语音的系统不一定需要病理数据。 摘要:As one of the most prevalent neurodegenerative disorders, Parkinson's disease (PD) has a significant impact on the fine motor skills of patients. The complex interplay of different articulators during speech production and realization of required muscle tension become increasingly difficult, thus leading to a dysarthric speech. Characteristic patterns such as vowel instability, slurred pronunciation and slow speech can often be observed in the affected individuals and were analyzed in previous studies to determine the presence and progression of PD. In this work, we used a phonetic recognizer trained exclusively on healthy speech data to investigate how PD affected the phonetic footprint of patients. We rediscovered numerous patterns that had been described in previous contributions although our system had never seen any pathological speech previously. Furthermore, we could show that intermediate activations from the neural network could serve as feature vectors encoding information related to the disease state of individuals. We were also able to directly correlate the expert-rated intelligibility of a speaker with the mean confidence of phonetic predictions. Our results support the assumption that pathological data is not necessarily required to train systems that are capable of analyzing PD speech.
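
文中“专家评定的可懂度与音素预测平均置信度直接相关”这一结论,落到代码上就是一个简单的相关系数计算。下面是一个示意片段(各说话人的置信度与可懂度评分均为虚构数据):

```python
import numpy as np

# 每位说话人:音素识别置信度序列(虚构)与专家可懂度评分(虚构,0-100)
confidences = {
    "spk1": [0.92, 0.88, 0.95], "spk2": [0.71, 0.65, 0.70],
    "spk3": [0.55, 0.60, 0.58], "spk4": [0.83, 0.80, 0.86],
}
intelligibility = {"spk1": 95, "spk2": 70, "spk3": 52, "spk4": 84}

speakers = sorted(confidences)
mean_conf = np.array([np.mean(confidences[s]) for s in speakers])
ratings = np.array([intelligibility[s] for s in speakers], dtype=float)

r = np.corrcoef(mean_conf, ratings)[0, 1]   # 皮尔逊相关系数
print(f"Pearson r = {r:.3f}")
```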

机器翻译,仅供参考
