
人工智能学术速递[6.25]

公众号-arXiv每日学术速递
发布2021-07-02 17:44:32


cs.AI人工智能,共计45篇

【1】 Video Swin Transformer 标题:视频Swin Transformer

作者:Ze Liu,Jia Ning,Yue Cao,Yixuan Wei,Zheng Zhang,Stephen Lin,Han Hu 机构:Microsoft Research Asia, University of Science and Technology of China, Huazhong University of Science and Technology, Tsinghua University 链接:https://arxiv.org/abs/2106.13230 摘要:视觉界正在见证从CNN到Transformer的建模转变:纯Transformer架构已在主要视频识别基准上取得最高精度。这些视频模型都建立在Transformer层之上,此类层在空间和时间维度上全局连接各图像块(patch)。本文主张在视频Transformer中引入局部性的归纳偏置,与以往即便采用时空分解也要全局计算自注意力的方法相比,能取得更好的速度-精度折衷。所提视频架构的局部性通过改造为图像域设计的Swin Transformer来实现,同时继续利用预训练图像模型的能力。我们的方法在广泛的视频识别基准上取得了最先进的精度,包括动作识别(Kinetics-400上top-1精度84.9,Kinetics-600上top-1精度86.1,而预训练数据约少20倍、模型规模约小3倍)和时间建模(Something-Something v2上top-1精度69.6)。代码和模型将在 https://github.com/SwinTransformer/Video-Swin-Transformer 公开。 摘要:The vision community is witnessing a modeling shift from CNNs to Transformers, where pure Transformer architectures have attained top accuracy on the major video recognition benchmarks. These video models are all built on Transformer layers that globally connect patches across the spatial and temporal dimensions. In this paper, we instead advocate an inductive bias of locality in video Transformers, which leads to a better speed-accuracy trade-off compared to previous approaches which compute self-attention globally even with spatial-temporal factorization. The locality of the proposed video architecture is realized by adapting the Swin Transformer designed for the image domain, while continuing to leverage the power of pre-trained image models. Our approach achieves state-of-the-art accuracy on a broad range of video recognition benchmarks, including on action recognition (84.9 top-1 accuracy on Kinetics-400 and 86.1 top-1 accuracy on Kinetics-600 with ~20x less pre-training data and ~3x smaller model size) and temporal modeling (69.6 top-1 accuracy on Something-Something v2). The code and models will be made publicly available at https://github.com/SwinTransformer/Video-Swin-Transformer.
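窗口化自注意力是上文"局部性归纳偏置"的核心。下面用 NumPy 给出把视频特征图划分为互不重叠 3D 局部窗口的极简示意(非官方实现,张量形状与窗口大小均为假设的示例值):

```python
import numpy as np

def window_partition_3d(video, window_size):
    """将 (T, H, W, C) 的视频特征图划分为互不重叠的 3D 局部窗口。
    返回形状 (num_windows, wt*wh*ww, C);自注意力只需在每个窗口内部计算,
    这正是视频 Swin Transformer 引入局部性的方式(示意实现)。"""
    T, H, W, C = video.shape
    wt, wh, ww = window_size
    assert T % wt == 0 and H % wh == 0 and W % ww == 0
    x = video.reshape(T // wt, wt, H // wh, wh, W // ww, ww, C)
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    return x.reshape(-1, wt * wh * ww, C)

video = np.arange(8 * 8 * 8 * 3, dtype=float).reshape(8, 8, 8, 3)
windows = window_partition_3d(video, (2, 4, 4))
# 8/2 * 8/4 * 8/4 = 16 个窗口,每个窗口含 2*4*4 = 32 个 patch
```

注意力的代价由此从全局的 O((THW)²) 降为与窗口大小成正比;论文中还需配合移位窗口等机制,此处从略。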

【2】 Model-Based Reinforcement Learning via Latent-Space Collocation 标题:通过潜在空间配点法实现基于模型的强化学习

作者:Oleh Rybkin,Chuning Zhu,Anusha Nagabandi,Kostas Daniilidis,Igor Mordatch,Sergey Levine 机构:University of Pennsylvania, Covariant, Google AI, UC Berkeley 备注:International Conference on Machine Learning (ICML), 2021. Videos and code at this https URL 链接:https://arxiv.org/abs/2106.13229 摘要:在只利用原始高维观测(如图像)的情况下对未来进行规划,可以为自主智能体提供广泛的能力。直接规划未来动作的基于视觉模型的强化学习(RL)方法,在只需短时程推理的任务上取得了令人印象深刻的效果,然而这些方法在时间扩展的任务上表现不佳。我们认为,通过规划状态序列而不仅仅是动作来解决长时程任务更容易,因为动作的效果会随时间大幅复合,更难优化。为此,我们借鉴最优控制文献中在长时程任务上表现良好的配点法(collocation)思想,并利用学到的潜在状态空间模型将其适配到基于图像的设定。由此得到的潜在配点方法(LatCo)优化潜在状态的轨迹,在奖励稀疏、目标长远的任务上改进了以往为基于视觉模型的RL提出的打靶(shooting)方法。视频和代码见 https://orybkin.github.io/latco/。 摘要:The ability to plan into the future while utilizing only raw high-dimensional observations, such as images, can provide autonomous agents with broad capabilities. Visual model-based reinforcement learning (RL) methods that plan future actions directly have shown impressive results on tasks that require only short-horizon reasoning, however, these methods struggle on temporally extended tasks. We argue that it is easier to solve long-horizon tasks by planning sequences of states rather than just actions, as the effects of actions greatly compound over time and are harder to optimize. To achieve this, we draw on the idea of collocation, which has shown good results on long-horizon tasks in optimal control literature, and adapt it to the image-based setting by utilizing learned latent state space models. The resulting latent collocation method (LatCo) optimizes trajectories of latent states, which improves over previously proposed shooting methods for visual model-based RL on tasks with sparse rewards and long-term goals. Videos and code at https://orybkin.github.io/latco/.
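配点法与打靶法的区别在于:把整条状态轨迹也当作优化变量,用罚项软约束其满足动力学。下面是一个与 LatCo 原实现无关的极简草图,假设一维线性动力学 s' = s + a,目标是让末端状态到达 goal:

```python
import numpy as np

# 配点法示意:同时优化状态轨迹 s 和动作序列 a,
# 以罚项 lam * (s[t+1] - s[t] - a[t])^2 软约束动力学(假设的玩具动力学)。
T, goal, lam, lr = 5, 1.0, 1.0, 0.05
s = np.zeros(T + 1)   # s[0] 固定为初始状态 0
a = np.zeros(T)

for _ in range(5000):
    r = s[1:] - s[:-1] - a            # 动力学残差
    grad_a = -2 * lam * r             # dL/da[t]
    grad_s = np.zeros(T + 1)
    grad_s[1:] += 2 * lam * r         # s[t+1] 在残差 r[t] 中的梯度
    grad_s[:-1] -= 2 * lam * r        # s[t] 在残差 r[t] 中的梯度
    grad_s[T] += 2 * (s[T] - goal)    # 终端目标项 (s[T] - goal)^2
    grad_s[0] = 0.0                   # 初始状态不参与优化
    s -= lr * grad_s
    a -= lr * grad_a
```

收敛后动力学残差趋近于零且 s[T] ≈ goal;真实方法在学到的潜在空间中做同样的事,并配合更复杂的约束处理。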

【3】 Towards Understanding and Mitigating Social Biases in Language Models 标题:理解与缓解语言模型中的社会偏见

作者:Paul Pu Liang,Chiyu Wu,Louis-Philippe Morency,Ruslan Salakhutdinov 机构: 20 19; 1Carnegie Mellon University 备注:ICML 2021, code available at this https URL 链接:https://arxiv.org/abs/2106.13219 摘要:随着机器学习方法在医疗保健、法律体系和社会科学等现实环境中的应用,认识到它们如何在这些敏感的决策过程中形成社会偏见和定型观念至关重要。在这样的现实世界部署中,大规模的预训练语言模型(LMs)在表现不受欢迎的代表性偏见方面具有潜在的危险性,这些偏见是由陈规定型观念造成的,这些陈规定型观念传播涉及性别、种族、宗教和其他社会结构的负面概括。作为提高LMs公平性的一个步骤,在提出新的基准和度量标准之前,我们仔细定义了代表性偏差的几个来源。利用这些工具,我们提出了减轻文本生成过程中社会偏见的步骤。我们的实验结果和人的评价表明,在保留高保真文本生成的关键上下文信息的同时,有效地减少了偏见,从而推动了性能公平帕累托前沿。 摘要:As machine learning methods are deployed in real-world settings such as healthcare, legal systems, and social science, it is crucial to recognize how they shape social biases and stereotypes in these sensitive decision-making processes. Among such real-world deployments are large-scale pretrained language models (LMs) that can be potentially dangerous in manifesting undesirable representational biases - harmful biases resulting from stereotyping that propagate negative generalizations involving gender, race, religion, and other social constructs. As a step towards improving the fairness of LMs, we carefully define several sources of representational biases before proposing new benchmarks and metrics to measure them. With these tools, we propose steps towards mitigating social biases during text generation. Our empirical results and human evaluation demonstrate effectiveness in mitigating bias while retaining crucial contextual information for high-fidelity text generation, thereby pushing forward the performance-fairness Pareto frontier.

【4】 Learning Language and Multimodal Privacy-Preserving Markers of Mood from Mobile Data 标题:从移动数据中学习语言与多模态的隐私保护情绪标记

作者:Paul Pu Liang,Terrance Liu,Anna Cai,Michal Muszynski,Ryo Ishii,Nicholas Allen,Randy Auerbach,David Brent,Ruslan Salakhutdinov,Louis-Philippe Morency 机构:Carnegie Mellon University ,University of Oregon, Columbia University ,University of Pittsburgh 备注:ACL 2021. arXiv admin note: substantial text overlap with arXiv:2012.02359 链接:https://arxiv.org/abs/2106.13213 摘要:即使在普遍享有先进医疗服务的国家,精神健康状况仍然诊断不足。从易于收集的数据中准确有效地预测情绪的能力对于心理健康障碍的早期发现、干预和治疗有着重要的意义。一个有希望帮助监测人类行为的数据源是智能手机的日常使用。但是,必须注意总结行为,不要通过个人(例如,个人可识别信息)或受保护(例如,种族、性别)属性识别用户。在这篇论文中,我们研究了日常情绪的行为标记,使用了一个最新的数据集移动行为的青少年人群中的高风险自杀行为。使用计算模型,我们发现语言和移动类型文本的多模态表示(跨类型字符、单词、按键时间和应用程序使用)可以预测日常情绪。然而,我们发现训练用来预测情绪的模型通常也会在其中间表示中捕获私人用户身份。为了解决这个问题,我们评估了在保持预测性的同时混淆用户身份的方法。通过将多模态表示与隐私保护学习相结合,我们能够推进性能隐私前沿。 摘要:Mental health conditions remain underdiagnosed even in countries with common access to advanced medical care. The ability to accurately and efficiently predict mood from easily collectible data has several important implications for the early detection, intervention, and treatment of mental health disorders. One promising data source to help monitor human behavior is daily smartphone usage. However, care must be taken to summarize behaviors without identifying the user through personal (e.g., personally identifiable information) or protected (e.g., race, gender) attributes. In this paper, we study behavioral markers of daily mood using a recent dataset of mobile behaviors from adolescent populations at high risk of suicidal behaviors. Using computational models, we find that language and multimodal representations of mobile typed text (spanning typed characters, words, keystroke timings, and app usage) are predictive of daily mood. However, we find that models trained to predict mood often also capture private user identities in their intermediate representations. To tackle this problem, we evaluate approaches that obfuscate user identity while remaining predictive. 
By combining multimodal representations with privacy-preserving learning, we are able to push forward the performance-privacy frontier.

【5】 CCC/Code 8.7: Applying AI in the Fight Against Modern Slavery 标题:CCC/Code 8.7:将人工智能应用于打击现代奴隶制

作者:Nadya Bliss,Mark Briers,Alice Eckstein,James Goulding,Daniel P. Lopresti,Anjali Mazumder,Gavin Smith 备注:A Computing Community Consortium (CCC) workshop report, 24 pages 链接:https://arxiv.org/abs/2106.13186 摘要:在任何一天,数以千万计的人都会发现自己陷入现代奴隶制的困境。“人口贩运”、“人口贩运”和“现代奴役”这三个词有时可以互换使用,以指性贩运和强迫劳动。人口贩运是指人口贩子通过使用武力、欺诈和/或胁迫强迫他人提供劳动或服务。人口贩运的利益攸关方范围广泛,这是一个重大挑战。直接利益相关者包括执法部门、非政府组织和国际非政府组织、企业、地方/规划政府当局和幸存者。从一个非常高的层次来看,所有利益相关者共享一个丰富的交互网络,该网络产生并消耗大量的信息。为了打击贩运而有效利用这些信息,同时又遵守社区隐私和道德标准,这是一个令人生畏的问题。在帮助我们的同时,加强对人口监测的技术也会损害基本人权。2020年3月初,计算社区联盟(CCC)与Code8.7倡议合作,召集了50多名计算研究社区成员以及反奴隶制从业者和幸存者,制定了研究路线图。其主要目标是探索如何将人工智能的长期研究应用于打击人口贩运。根据2019年2月在联合国总部举行的启动代码8.7会议,本次研讨会的重点是将美国人工智能研究20年社区路线图(AI路线图)中概述的宏伟目标与实现联合国可持续发展目标8.7(消除现代奴隶制)的关键挑战联系起来。 摘要:On any given day, tens of millions of people find themselves trapped in instances of modern slavery. The terms "human trafficking," "trafficking in persons," and "modern slavery" are sometimes used interchangeably to refer to both sex trafficking and forced labor. Human trafficking occurs when a trafficker compels someone to provide labor or services through the use of force, fraud, and/or coercion. The wide range of stakeholders in human trafficking presents major challenges. Direct stakeholders are law enforcement, NGOs and INGOs, businesses, local/planning government authorities, and survivors. Viewed from a very high level, all stakeholders share in a rich network of interactions that produce and consume enormous amounts of information. The problems of making efficient use of such information for the purposes of fighting trafficking while at the same time adhering to community standards of privacy and ethics are formidable. At the same time they help us, technologies that increase surveillance of populations can also undermine basic human rights. 
In early March 2020, the Computing Community Consortium (CCC), in collaboration with the Code 8.7 Initiative, brought together over fifty members of the computing research community along with anti-slavery practitioners and survivors to lay out a research roadmap. The primary goal was to explore ways in which long-range research in artificial intelligence (AI) could be applied to the fight against human trafficking. Building on the kickoff Code 8.7 conference held at the headquarters of the United Nations in February 2019, the focus for this workshop was to link the ambitious goals outlined in the A 20-Year Community Roadmap for Artificial Intelligence Research in the US (AI Roadmap) to challenges vital in achieving the UN's Sustainable Development Goal Target 8.7, the elimination of modern slavery.

【6】 Real-time Spatio-temporal Event Detection on Geotagged Social Media 标题:地理标记社交媒体上的实时时空事件检测

作者:Yasmeen George,Shanika Karunasekera,Aaron Harwood,Kwan Hui Lim 机构:Correspondence:, School of Computing and, Information Systems, The, University of Melbourne, Australia, Full list of author information is, available at the end of the, article 备注:Accepted to Journal of Big Data 链接:https://arxiv.org/abs/2106.13121 摘要:挖掘社交媒体数据流的一个关键挑战是识别特定地区或全球范围内一群人积极讨论的事件。此类事件对于事故、抗议、选举或突发新闻的预警很有用。然而,事件列表以及事件时间和空间的分辨率都不是固定的,也不是预先知道的。在这项工作中,我们提出了一个在线时空事件检测系统使用社会媒体,能够检测事件在不同的时间和空间分辨率。首先,针对事件空间分辨率未知的问题,利用四叉树方法,根据社交媒体数据的密度将地理空间划分为多尺度区域。然后,采用泊松分布的统计无监督方法和平滑方法来突出显示具有意外社会职位密度的区域。此外,通过以连续的时间间隔合并在同一区域中发生的事件来精确地估计事件持续时间。引入了后处理阶段来过滤垃圾邮件、假事件或错误事件。最后,我们通过使用社交媒体实体来评估检测到的事件的完整性和准确性,从而合并了简单的语义。该方法使用不同的社交媒体数据集(Twitter和Flickr)对墨尔本、伦敦、巴黎和纽约的不同城市进行评估。为了验证该方法的有效性,我们将我们的结果与基于固定地理空间分割和聚类的两种基线算法进行了比较。对于性能评估,我们手动计算查全率和查准率。我们还提出了一种新的质量度量方法,称为强度指数(strength index),它可以自动度量所报告事件的准确性。 摘要:A key challenge in mining social media data streams is to identify events which are actively discussed by a group of people in a specific local or global area. Such events are useful for early warning for accident, protest, election or breaking news. However, neither the list of events nor the resolution of both event time and space is fixed or known beforehand. In this work, we propose an online spatio-temporal event detection system using social media that is able to detect events at different time and space resolutions. First, to address the challenge related to the unknown spatial resolution of events, a quad-tree method is exploited in order to split the geographical space into multiscale regions based on the density of social media data. Then, a statistical unsupervised approach is performed that involves Poisson distribution and a smoothing method for highlighting regions with unexpected density of social posts. Further, event duration is precisely estimated by merging events happening in the same region at consecutive time intervals. 
A post processing stage is introduced to filter out events that are spam, fake or wrong. Finally, we incorporate simple semantics by using social media entities to assess the integrity, and accuracy of detected events. The proposed method is evaluated using different social media datasets: Twitter and Flickr for different cities: Melbourne, London, Paris and New York. To verify the effectiveness of the proposed method, we compare our results with two baseline algorithms based on fixed split of geographical space and clustering method. For performance evaluation, we manually compute recall and precision. We also propose a new quality measure named strength index, which automatically measures how accurate the reported event is.
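上文事件检测的两个关键步骤——按帖子密度四分地理空间的 quad-tree,以及用泊松分布给"出乎意料的密度"打分——可以用纯 Python 勾勒如下(坐标、容量阈值均为虚构示例,并非论文原实现):

```python
import math

def split_quadtree(points, bbox, capacity=4, min_size=1.0):
    """按点密度递归四分空间:区域内帖子数超过 capacity 且未到最小边长时一分为四。
    采用半开区间划分,落在最外侧右/上边界上的点需另行处理。"""
    x0, y0, x1, y1 = bbox
    if len(points) <= capacity or (x1 - x0) <= min_size:
        return [(bbox, points)]
    mx, my = (x0 + x1) / 2, (y0 + y1) / 2
    leaves = []
    for quad in [(x0, y0, mx, my), (mx, y0, x1, my),
                 (x0, my, mx, y1), (mx, my, x1, y1)]:
        qx0, qy0, qx1, qy1 = quad
        sub = [p for p in points if qx0 <= p[0] < qx1 and qy0 <= p[1] < qy1]
        leaves += split_quadtree(sub, quad, capacity, min_size)
    return leaves

def poisson_sf(k, lam):
    """P(X >= k), X ~ Poisson(lam):该值越小,区域内的帖子密度越"出乎意料"。"""
    return 1.0 - sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))

# 虚构数据:12 条帖子密集聚在 (2, 2) 附近,另有 3 条散落各处
dense = [(1.9 + 0.01 * i, 2.05) for i in range(12)]
sparse = [(10.0, 10.0), (14.0, 3.0), (5.0, 12.0)]
leaves = split_quadtree(dense + sparse, (0.0, 0.0, 16.0, 16.0))
```

若某叶子区域的历史平均帖子数为 0.5,而当前时间片观测到 10 条,则 poisson_sf(10, 0.5) 远小于任何常用显著性阈值,该区域即被标记为候选事件。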

【7】 The Option Keyboard: Combining Skills in Reinforcement Learning 标题:选项键盘:强化学习中的组合技能

作者:André Barreto,Diana Borsa,Shaobo Hou,Gheorghe Comanici,Eser Aygün,Philippe Hamel,Daniel Toyama,Jonathan Hunt,Shibl Mourad,David Silver,Doina Precup 机构:DeepMind 备注:Published at NeurIPS 2019 链接:https://arxiv.org/abs/2106.13105 摘要:在解决长期存在的复杂强化学习问题时,结合已知技能创造新技能的能力可能至关重要。我们认为一种结合技能的稳健方法是在伪奖励(或“累积量”)空间中定义和操纵技能。基于这个前提,我们提出了一个框架,结合技能使用的形式主义的选择。我们证明了每个确定性选项都可以明确地表示为扩展域中定义的累积量。在此基础上,结合前人关于迁移学习的研究成果,我们将展示如何逼近累积量是已知期权累积量线性组合的期权。这意味着,一旦我们学习了与一组累积量相关的选项,我们就可以在不涉及任何学习的情况下,即时合成由它们的任何线性组合所诱导的选项。我们描述了这个框架如何为抽象操作与基本技能组合相对应的环境提供层次化接口。我们证明了我们的方法在资源管理问题和导航任务涉及四足模拟机器人的实际好处。 摘要:The ability to combine known skills to create new ones may be crucial in the solution of complex reinforcement learning problems that unfold over extended periods. We argue that a robust way of combining skills is to define and manipulate them in the space of pseudo-rewards (or "cumulants"). Based on this premise, we propose a framework for combining skills using the formalism of options. We show that every deterministic option can be unambiguously represented as a cumulant defined in an extended domain. Building on this insight and on previous results on transfer learning, we show how to approximate options whose cumulants are linear combinations of the cumulants of known options. This means that, once we have learned options associated with a set of cumulants, we can instantaneously synthesise options induced by any linear combination of them, without any learning involved. We describe how this framework provides a hierarchical interface to the environment whose abstract actions correspond to combinations of basic skills. We demonstrate the practical benefits of our approach in a resource management problem and a navigation task involving a quadrupedal simulated robot.
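文中"累积量的线性组合 ⇒ 无需再学习即可合成新选项"这一步,可以用后继特征(successor features)的恒等式 Q_w(s,a) = ψ(s,a)·w 直观演示。下面的草图用随机数充当"已学好"的 ψ,只展示线性合成本身,省略了论文中的 GPI 等细节:

```python
import numpy as np

# 假设已为两个基本技能(累积量 e1、e2)学到后继特征 psi(s, a) ∈ R^2;
# 这里用随机数代替真实学习结果,仅作示意。
rng = np.random.default_rng(0)
psi = rng.uniform(size=(5, 3, 2))     # 5 个状态, 3 个动作, 2 维累积量

def synthesize_option(psi, w):
    """给定累积量权重 w,不经任何再学习,直接合成新选项的贪心策略。"""
    q = psi @ w                        # Q_w(s, a) = psi(s, a) · w
    return q.argmax(axis=1), q

policy_a, _ = synthesize_option(psi, np.array([1.0, 0.0]))    # 只要技能 1
policy_mix, q = synthesize_option(psi, np.array([0.5, 0.5]))  # 两技能各半
```

由于 Q 对 w 是线性的,任何累积量组合的价值都可即时得到,这正是"选项键盘"可以按需拼出抽象动作的原因。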

【8】 Pre-training transformer-based framework on large-scale pediatric claims data for downstream population-specific tasks 标题:在大规模儿科理赔数据上预训练基于Transformer的框架,用于下游特定人群任务

作者:Xianlong Zeng,Simon Lin,Chang Liu 机构:Ohio University, Electrical Engineering and Computer Science, Athens Ohio, USA, Nationwide Children’s Hospital, Research Information Solutions and Innovation, Columbus Ohio, USA 链接:https://arxiv.org/abs/2106.13095 摘要:在过去的十年中,电子健康档案(EHR)的采用已变得普遍,这提供了深入的基于数据的研究。通过对大量医疗数据的学习,人们建立了各种数据驱动模型来预测不同医疗任务的未来事件,如自动诊断和心脏病发作预测。虽然EHR是丰富的,但是满足特定学习标准的群体却很少,这使得训练数据饥渴的深度学习模型成为一个挑战。本研究提出了Claim-Pre-Training(Claim-PT)框架,这是一个通用的预训练模型,首先对整个儿科索赔数据集进行训练,然后对每个特定人群的任务进行区分性微调。医学事件的语义可以在预训练阶段获取,通过任务感知的微调阶段完成有效的知识转移。微调过程需要在不改变模型结构的情况下进行最小的参数修改,这缓解了数据不足的问题,并有助于在小患者队列中充分训练深度学习模型。我们在一个真实世界的索赔数据集上进行了实验,这个数据集有超过一百万的患者记录。在两个下游任务上的实验结果证明了该方法的有效性:我们的通用任务无关预训练框架优于定制的任务特定模型,与基线相比,模型性能提高了10%以上。此外,我们的框架显示了巨大的推广潜力,可以将学到的知识从一个机构转移到另一个机构,为未来跨机构的医疗模式预训练铺平了道路。 摘要:The adoption of electronic health records (EHR) has become universal during the past decade, which has afforded in-depth data-based research. By learning from the large amount of healthcare data, various data-driven models have been built to predict future events for different medical tasks, such as auto diagnosis and heart-attack prediction. Although EHR is abundant, the population that satisfies specific criteria for learning population-specific tasks is scarce, making it challenging to train data-hungry deep learning models. This study presents the Claim Pre-Training (Claim-PT) framework, a generic pre-training model that first trains on the entire pediatric claims dataset, followed by a discriminative fine-tuning on each population-specific task. The semantic meaning of medical events can be captured in the pre-training stage, and the effective knowledge transfer is completed through the task-aware fine-tuning stage. The fine-tuning process requires minimal parameter modification without changing the model architecture, which mitigates the data scarcity issue and helps train the deep learning model adequately on small patient cohorts. 
We conducted experiments on a real-world claims dataset with more than one million patient records. Experimental results on two downstream tasks demonstrated the effectiveness of our method: our general task-agnostic pre-training framework outperformed tailored task-specific models, achieving more than 10\% higher in model performance as compared to baselines. In addition, our framework showed a great generalizability potential to transfer learned knowledge from one institution to another, paving the way for future healthcare model pre-training across institutions.

【9】 Human-in-the-loop model explanation via verbatim boundary identification in generated neighborhoods 标题:基于生成邻域的逐字边界识别的人在环模型解释

作者:Xianlong Zeng,Fanghao Song,Zhongen Li,Krerkkiat Chusap,Chang Liu 链接:https://arxiv.org/abs/2106.13093 摘要:机器学习模型的黑匣子特性限制了它们在案例关键型应用中的使用,引发了导致信任危机的忠诚和道德问题。缓解这个问题的一个可能方法是理解(预测失误)决策是如何从决策边界中划分出来的。本文提出了一种人在回路的方法来解释机器学习模型使用逐字邻域表现。与现有的大多数可解释人工智能(XAI)系统提供命中或未命中的近似解释不同,该方法生成了给定实例的局部决策边界,使人类智能能够推断模型行为。该方法分为三个阶段:1)邻域生成阶段,根据给定的样本生成实例;2) 分类阶段,对生成的实例进行分类,划分出局部决策边界,描述模型行为;3)人在环阶段,即人对感兴趣的邻域进行提炼和探索。在生成阶段,使用生成模型来生成给定实例周围合理的合成邻居。在分类阶段之后,分类的邻居实例提供了对模型行为的多方面理解。在人在环阶段提供了三个干预点,使人能够利用自己的智能来解释模型行为。在两个数据集上进行了多个实验,实验结果表明我们提出的方法有助于提高人们对复杂机器学习模型的理解。 摘要:The black-box nature of machine learning models limits their use in case-critical applications, raising faithful and ethical concerns that lead to trust crises. One possible way to mitigate this issue is to understand how a (mispredicted) decision is carved out from the decision boundary. This paper presents a human-in-the-loop approach to explain machine learning models using verbatim neighborhood manifestation. Contrary to most of the current eXplainable Artificial Intelligence (XAI) systems, which provide hit-or-miss approximate explanations, our approach generates the local decision boundary of the given instance and enables human intelligence to conclude the model behavior. Our method can be divided into three stages: 1) a neighborhood generation stage, which generates instances based on the given sample; 2) a classification stage, which yields classifications on the generated instances to carve out the local decision boundary and delineate the model behavior; and 3) a human-in-the-loop stage, which involves human to refine and explore the neighborhood of interest. In the generation stage, a generative model is used to generate the plausible synthetic neighbors around the given instance. After the classification stage, the classified neighbor instances provide a multifaceted understanding of the model behavior. 
Three intervention points are provided in the human-in-the-loop stage, enabling humans to leverage their own intelligence to interpret the model behavior. Several experiments on two datasets are conducted, and the experimental results demonstrate the potential of our proposed approach for boosting human understanding of the complex machine learning model.
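上述三阶段流水线的前两步(邻域生成、分类)可以用一个极简例子说明。这里用单位圆内/外充当待解释的"黑盒"模型,在样本周围采样高斯邻居并打标签(模型与参数均为虚构示意,非论文原实现):

```python
import random

def model(x):
    """待解释的"黑盒"分类器:这里用单位圆内/外代替(仅为示意)。"""
    return 1 if x[0] ** 2 + x[1] ** 2 < 1.0 else 0

def generate_neighbors(x, n=500, sigma=0.3, seed=0):
    """邻域生成阶段:在给定实例周围采样合理的合成邻居。"""
    rng = random.Random(seed)
    return [(x[0] + rng.gauss(0, sigma), x[1] + rng.gauss(0, sigma)) for _ in range(n)]

instance = (0.9, 0.0)                   # 一个紧贴决策边界的样本
neighbors = generate_neighbors(instance)
labels = [model(p) for p in neighbors]  # 分类阶段:给邻居打标签
```

两类标签都会出现,说明该实例的局部决策边界就穿过邻域;人在环阶段即在此基础上细化感兴趣的邻域。论文中邻居由生成模型产生,而非简单的高斯扰动。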

【10】 Privacy Threats Analysis to Secure Federated Learning 标题:面向安全联邦学习的隐私威胁分析

作者:Yuchen Li,Yifan Bao,Liyao Xiang,Junhan Liu,Cen Chen,Li Wang,Xinbing Wang 机构:Wang, Senior Member, IEEE 链接:https://arxiv.org/abs/2106.13076 摘要:联合学习是一种新兴的机器学习技术,它可以跨多个分散的参与方训练模型。它以保护隐私而闻名,因为数据永远不会离开计算设备,最近的方法通过隐藏以加密方式传输的消息来进一步增强其隐私性。然而,我们发现,尽管做出了努力,联邦学习仍然存在隐私威胁,因为它在不同方面具有互动性。分析了工业级安全计算联邦学习框架中的隐私威胁,揭示了线性回归、logistic回归和决策树等典型机器学习模型中广泛存在的隐私威胁。对于线性回归和logistic回归,我们通过理论分析表明,攻击者有可能在信息很少的情况下反转受害者的全部私人输入。对于决策树模型,我们发起一个攻击来推断受害者的私有输入的范围。所有攻击都在流行的联邦学习框架和真实数据集上进行评估。 摘要:Federated learning is emerging as a machine learning technique that trains a model across multiple decentralized parties. It is renowned for preserving privacy as the data never leaves the computational devices, and recent approaches further enhance its privacy by hiding messages transferred in encryption. However, we found that despite the efforts, federated learning remains privacy-threatening, due to its interactive nature across different parties. In this paper, we analyze the privacy threats in industrial-level federated learning frameworks with secure computation, and reveal such threats widely exist in typical machine learning models such as linear regression, logistic regression and decision tree. For the linear and logistic regression, we show through theoretical analysis that it is possible for the attacker to invert the entire private input of the victim, given very few information. For the decision tree model, we launch an attack to infer the range of victim's private inputs. All attacks are evaluated on popular federated learning frameworks and real-world datasets.
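以线性回归为例,"攻击者可在信息很少的情况下反推受害者全部私有输入"的最简情形可以直接演算出来:单样本均方误差的梯度满足 ∇w = r·x、∇b = r,两者相除即还原 x(数值纯属虚构,仅为示意;真实联邦学习框架中梯度通常经过加密或聚合,论文分析的正是这些防护下仍存在的泄露):

```python
import numpy as np

# 受害者的私有样本与当前模型参数(虚构数值)
x_true = np.array([0.7, -1.3, 2.1])
y_true = 0.5
w, b = np.array([0.2, 0.1, -0.4]), 0.3

# 受害者上传的单样本梯度(线性回归,均方误差)
residual = w @ x_true + b - y_true
grad_w = residual * x_true
grad_b = residual

# 攻击者:只要 grad_b != 0,一次除法即可还原全部私有输入
x_recovered = grad_w / grad_b
```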

【11】 Fea2Fea: Exploring Structural Feature Correlations via Graph Neural Networks 标题:Fea2Fea:基于图神经网络的结构特征相关性研究

作者:Jiaqing Xie,Rex Ying 机构: Exploring Structural FeatureCorrelations via Graph Neural NetworksJiaqing Xie 1 and Rex Ying 2 1 University of Edinburgh s 200 1696, uk 2 Stanford University rexying 备注:Submitted to ECML-PKDD 2021 Graph Embedding and Mining(GEM) workshop 链接:https://arxiv.org/abs/2106.13061 摘要:结构特征是图形数据集的重要特征。然而,虽然已有一些基于协方差的特征相关性分析方法,但对于基于图神经网络模型的图的结构特征相关性研究还没有相关的研究。本文在低维空间引入图特征对特征(Fea2Fea)预测管道,探讨了基于图神经网络的结构特征关联的一些初步结果。结果表明,某些构造特征之间存在着很高的相关性。将冗余特征与初始节点特征相结合,通过图神经网络过滤,提高了在某些图数据集中的分类精度。我们比较了两种连接方法在特征间嵌入连接上的差异,结果表明最简单的方法是最好的。对合成几何图进行了推广,证明了两个结构特征之间预测困难的结果。 摘要:Structural features are important features in graph datasets. However, although there are some correlation analysis of features based on covariance, there is no relevant research on exploring structural feature correlation on graphs with graph neural network based models. In this paper, we introduce graph feature to feature (Fea2Fea) prediction pipelines in a low dimensional space to explore some preliminary results on structural feature correlation, which is based on graph neural network. The results show that there exists high correlation between some of the structural features. A redundant feature combination with initial node features, which is filtered by graph neural network has improved its classification accuracy in some graph datasets. We compare the difference between concatenation methods on connecting embeddings between features and show that the simplest is the best. We generalize on the synthetic geometric graphs and certify the results on prediction difficulty between two structural features.
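"结构特征之间存在高相关"这一点,在小图上就能直接算出来。下面用 NumPy 计算两个常见结构特征(度、节点参与的三角形数)并求皮尔逊相关(邻接矩阵为虚构示例,与论文的 GNN 管道无关,仅说明"结构特征相关性"指什么):

```python
import numpy as np

# 一个 6 节点无向图的邻接矩阵(虚构示例)
A = np.array([[0, 1, 1, 1, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [1, 0, 1, 0, 1, 0],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 0, 1, 0]], dtype=float)

degree = A.sum(axis=1)               # 结构特征 1:节点的度
triangles = np.diag(A @ A @ A) / 2   # 结构特征 2:节点参与的三角形数 = diag(A^3)/2
corr = np.corrcoef(degree, triangles)[0, 1]
```

在这个小图上两个特征呈明显正相关;Fea2Fea 的做法是用 GNN 在低维空间里系统地预测并量化这类特征间的可推断性。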

【12】 Kemeny ranking is NP-hard for 2-dimensional Euclidean preferences 标题:二维欧氏偏好的Kemeny排序是NP难的

作者:Bruno Escoffier,Olivier Spanjaard,Magdalena Tydrichová 机构:Sorbonne Université, CNRS, LIP6, Paris, France; Institut Universitaire de France, Paris, France 链接:https://arxiv.org/abs/2106.13054 摘要:假设选民的偏好具有某种共同结构,是在社会选择问题中规避NP难结果的标准方法。虽然Kemeny排序问题在一般情形下是NP难的,但已知当偏好是一维欧氏时该问题变得容易。本文证明,对于d>=2的d维欧氏偏好,Kemeny排序问题是NP难的。我们指出,该结果对Slater排序问题同样成立。 摘要:The assumption that voters' preferences share some common structure is a standard way to circumvent NP-hardness results in social choice problems. While the Kemeny ranking problem is NP-hard in the general case, it is known to become easy if the preferences are 1-dimensional Euclidean. In this note, we prove that the Kemeny ranking problem is NP-hard for d-dimensional Euclidean preferences with d>=2. We note that this result also holds for the Slater ranking problem.
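Kemeny排序的定义本身很简单:找一条与所有选票的 Kendall tau 距离之和最小的全序。下面给出候选项很少时可行的穷举实现(与 NP 难结论一致,复杂度为 O(m!);选票数据为虚构示例):

```python
from itertools import permutations

def kendall_tau(r1, r2):
    """两条全序之间的成对分歧数(Kendall tau 距离)。"""
    pos = {c: i for i, c in enumerate(r2)}
    return sum(1 for i in range(len(r1)) for j in range(i + 1, len(r1))
               if pos[r1[i]] > pos[r1[j]])

def kemeny(votes):
    """穷举所有排列,最小化到各选票的 Kendall tau 距离之和。"""
    cands = votes[0]
    return min(permutations(cands),
               key=lambda r: sum(kendall_tau(r, v) for v in votes))

votes = [('a', 'b', 'c'), ('a', 'b', 'c'), ('b', 'a', 'c')]
best = kemeny(votes)   # 共识排序
```

论文的结论意味着:即使选民偏好可以嵌入到二维欧氏空间,也不能指望比这种指数级搜索本质上更快的通用算法(除非 P=NP)。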

【13】 Autonomous Driving Strategies at Intersections: Scenarios, State-of-the-Art, and Future Outlooks 标题:交叉口自动驾驶策略:情景、现状和未来展望

作者:Lianzhen Wei,Zirui Li,Jianwei Gong,Cheng Gong,Jiachen Li 机构:of Mechanical Engineering, Beijing Institute of Technology, Beijing, China., Civil Engineering and Geosciences, Delft University of Technology, Stevinweg , CN Delft, The Netherlands., perform self-learning to minimize the delay at intersections [,]. 链接:https://arxiv.org/abs/2106.13052 摘要:由于交叉口场景的复杂性和动态性,交叉口自动驾驶策略一直是智能交通系统研究的难点和热点。本文简要总结了目前最先进的交叉口自动驾驶策略。首先,我们列举和分析了常见的交叉口场景类型、相应的仿真平台以及相关的数据集。其次,在回顾前人研究的基础上,总结了现有自主驾驶策略的特点,并对其进行了分类。最后指出了现有自主驾驶策略存在的问题,并提出了几点有价值的研究展望。 摘要:Due to the complex and dynamic character of intersection scenarios, the autonomous driving strategy at intersections has been a difficult problem and a hot point in the research of intelligent transportation systems in recent years. This paper gives a brief summary of state-of-the-art autonomous driving strategies at intersections. Firstly, we enumerate and analyze common types of intersection scenarios, corresponding simulation platforms, as well as related datasets. Secondly, by reviewing previous studies, we have summarized characteristics of existing autonomous driving strategies and classified them into several categories. Finally, we point out problems of the existing autonomous driving strategies and put forward several valuable research outlooks.

【14】 Mix and Mask Actor-Critic Methods 标题:混合与掩码的Actor-Critic方法

作者:Dom Huh 机构:Department of Computer Science, University of California, Davis, CA, USA 链接:https://arxiv.org/abs/2106.13037 摘要:参与者-批评家方法的共享特征空间旨在捕获策略和值函数所使用的广义潜在表示,以期获得更稳定和更有效的样本优化。然而,这种范式在实践中提出了许多挑战,因为生成共享表示的参数必须学习两个不同的目标,从而导致相互竞争的更新和学习扰动。在本文中,我们提出了一个新的特征共享框架来解决这些困难,通过引入混合和掩码机制和分布式规模化技术。这些机制动态地耦合和解耦策略函数和价值函数之间可变的关联潜在特征,而分布尺度化则用概率的观点标准化了这两个目标。从我们的实验结果来看,与使用独立网络和具有共享主干的网络的替代方法相比,我们证明了显著的性能改进。 摘要:Shared feature spaces for actor-critic methods aims to capture generalized latent representations to be used by the policy and value function with the hopes for a more stable and sample-efficient optimization. However, such a paradigm present a number of challenges in practice, as parameters generating a shared representation must learn off two distinct objectives, resulting in competing updates and learning perturbations. In this paper, we present a novel feature-sharing framework to address these difficulties by introducing the mix and mask mechanisms and the distributional scalarization technique. These mechanisms behaves dynamically to couple and decouple connected latent features variably between the policy and value function, while the distributional scalarization standardizes the two objectives using a probabilistic standpoint. From our experimental results, we demonstrate significant performance improvements compared to alternative methods using separate networks and networks with a shared backbone.

【15】 Symmetric Wasserstein Autoencoders 标题:对称Wasserstein自动编码器

作者:Sun Sun,Hongyu Guo 机构:National Research Council Canada, Ottawa, ON., K,A ,R, Canada 备注:Accepted by UAI2021 链接:https://arxiv.org/abs/2106.13024 摘要:利用优化传输的框架,我们引入了一个新的具有可学习先验的生成式自动编码器家族,称为对称Wasserstein自动编码器(SWAEs)。我们提出对称匹配的联合分布的观测数据和潜在的代表性所引起的编码器和解码器。由此产生的算法联合优化了数据和潜在空间中的建模损失,数据空间中的损失导致了去噪效果。该算法通过对数据的对称处理和隐式表示,隐式地保留了数据在隐式空间中的局部结构。为了进一步提高潜在表征的质量,我们在目标中加入了重建损失,这对生成和重建都有很大的好处。我们的经验表明,在分类、重构和生成方面,SWAEs优于最先进的生成式自动编码器。 摘要:Leveraging the framework of Optimal Transport, we introduce a new family of generative autoencoders with a learnable prior, called Symmetric Wasserstein Autoencoders (SWAEs). We propose to symmetrically match the joint distributions of the observed data and the latent representation induced by the encoder and the decoder. The resulting algorithm jointly optimizes the modelling losses in both the data and the latent spaces with the loss in the data space leading to the denoising effect. With the symmetric treatment of the data and the latent representation, the algorithm implicitly preserves the local structure of the data in the latent space. To further improve the quality of the latent representation, we incorporate a reconstruction loss into the objective, which significantly benefits both the generation and reconstruction. We empirically show the superior performance of SWAEs over the state-of-the-art generative autoencoders in terms of classification, reconstruction, and generation.

【16】 Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting 标题:Autoformer:用于长期序列预测的带自相关机制的分解Transformer

作者:Haixu Wu,Jiehui Xu,Jianmin Wang,Mingsheng Long 机构:School of Software, BNRist, Tsinghua University, China 链接:https://arxiv.org/abs/2106.13008 摘要:延长预测时间是极端天气预警和长期能源消耗规划等实际应用的关键需求。本文研究了时间序列的长期预测问题。先前基于Transformer的模型采用各种自我注意机制来发现长程依赖关系。然而,长期未来复杂的时间模式阻碍了模型找到可靠的依赖关系。同时,为了提高长串效率,Transformer必须采用点式自关注的稀疏形式,造成了信息利用的瓶颈。针对这些挑战,我们提出了一种具有自相关机制的新型分解结构Autoformer。我们超越了级数分解的预处理约定,将其更新为深度模型的基本内部块。这种设计使Autoformer具有复杂时间序列的渐进分解能力。进一步,受随机过程理论的启发,设计了基于序列周期性的自相关机制,在子序列层次上进行相关性发现和表示聚合。自相关在效率和准确性上都优于自我注意。在长期预测中,Autoformer的精度达到了最先进的水平,在六个基准上相对提高了38%,涵盖了五个实际应用:能源、交通、经济、天气和疾病。 摘要:Extending the forecasting time is a critical demand for real applications, such as extreme weather early warning and long-term energy consumption planning. This paper studies the \textit{long-term forecasting} problem of time series. Prior Transformer-based models adopt various self-attention mechanisms to discover the long-range dependencies. However, intricate temporal patterns of the long-term future prohibit the model from finding reliable dependencies. Also, Transformers have to adopt the sparse versions of point-wise self-attentions for long series efficiency, resulting in the information utilization bottleneck. Towards these challenges, we propose Autoformer as a novel decomposition architecture with an Auto-Correlation mechanism. We go beyond the pre-processing convention of series decomposition and renovate it as a basic inner block of deep models. This design empowers Autoformer with progressive decomposition capacities for complex time series. Further, inspired by the stochastic process theory, we design the Auto-Correlation mechanism based on the series periodicity, which conducts the dependencies discovery and representation aggregation at the sub-series level. Auto-Correlation outperforms self-attention in both efficiency and accuracy. 
In long-term forecasting, Autoformer yields state-of-the-art accuracy, with a 38% relative improvement on six benchmarks, covering five practical applications: energy, traffic, economics, weather and disease.

【17】 Bayesian Optimization with High-Dimensional Outputs 标题:高维输出的贝叶斯优化

作者:Wesley J. Maddox,Maximilian Balandat,Andrew Gordon Wilson,Eytan Bakshy 机构:New York University, Facebook 链接:https://arxiv.org/abs/2106.12997 摘要:贝叶斯优化是一种样本有效的黑盒优化过程,通常适用于具有少量独立目标的问题。然而,在实践中,我们经常希望优化在许多相关结果(或“任务”)上定义的目标。例如,科学家们可能希望优化密集网格中的基站网络覆盖范围。类似地,工程师们可能会寻求通过约束或鲁棒优化来平衡机器人在数十种不同环境中的性能。然而,高斯过程(GP)模型通常被用作多任务贝叶斯优化的概率替代模型,其结果数量的伸缩性较差,极大地限制了其适用性。我们设计了一种有效的精确多任务GP抽样技术,将协方差矩阵中的Kronecker结构与Matheron的恒等式相结合,使得我们能够使用具有成千上万个相关输出的精确多任务GP模型进行贝叶斯优化。通过这样做,与现有方法相比,我们在样本效率方面取得了实质性的改进,这些方法只对结果的聚合函数进行建模。我们演示了如何在科学和工程领域的一系列任务中打开贝叶斯优化的一类新应用,包括优化具有65000多个输出的光学干涉仪的干涉图。 摘要:Bayesian Optimization is a sample-efficient black-box optimization procedure that is typically applied to problems with a small number of independent objectives. However, in practice we often wish to optimize objectives defined over many correlated outcomes (or ``tasks"). For example, scientists may want to optimize the coverage of a cell tower network across a dense grid of locations. Similarly, engineers may seek to balance the performance of a robot across dozens of different environments via constrained or robust optimization. However, the Gaussian Process (GP) models typically used as probabilistic surrogates for multi-task Bayesian Optimization scale poorly with the number of outcomes, greatly limiting applicability. We devise an efficient technique for exact multi-task GP sampling that combines exploiting Kronecker structure in the covariance matrices with Matheron's identity, allowing us to perform Bayesian Optimization using exact multi-task GP models with tens of thousands of correlated outputs. In doing so, we achieve substantial improvements in sample efficiency compared to existing approaches that only model aggregate functions of the outcomes. We demonstrate how this unlocks a new class of applications for Bayesian Optimization across a range of tasks in science and engineering, including optimizing interference patterns of an optical interferometer with more than 65,000 outputs.
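文中"利用协方差矩阵的 Kronecker 结构"之所以能把上万个相关输出变得可行,核心是恒等式 (A ⊗ B) vec(X) = vec(A X Bᵀ)(按行主序展开):它把一次对巨型 Kronecker 矩阵的乘法化为两次小矩阵乘法。下面的 NumPy 草图演示该恒等式本身,矩阵规模与取值均为虚构,Matheron 恒等式等采样细节从略:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))      # "任务间"协方差因子(示意)
B = rng.normal(size=(50, 50))    # "输入点间"协方差因子(示意)
X = rng.normal(size=(4, 50))

# 朴素做法:显式构造 200x200 的 Kronecker 大矩阵再做矩阵-向量乘
naive = np.kron(A, B) @ X.ravel()

# 利用 Kronecker 结构:(A ⊗ B) vec(X) = vec(A X Bᵀ),从不构造大矩阵
fast = (A @ X @ B.T).ravel()
```

当任务数与输入点数都很大时,右侧做法把存储从 O(n²m²) 降到 O(n² + m²),计算量也相应大幅下降。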

【18】 RikoNet: A Novel Anime Recommendation Engine 标题:RikoNet:一种新颖的动漫推荐引擎

作者:Badal Soni,Debangan Thakuria,Nilutpal Nath,Navarun Das,Bhaskarananda Boro 机构:Received: DD Month YEAR Accepted: DD Month YEAR 备注:19 pages 链接:https://arxiv.org/abs/2106.12970 摘要:动漫在今天很受欢迎,尤其是在年轻一代中。随着各种类型节目的出现,越来越多的人越来越被娱乐业这个利基领域所吸引。由于动漫最近获得了主流的关注,我们对于用户的喜好和观看习惯还没有足够的信息。因此,为这种相对晦涩的娱乐媒体构建推荐引擎是一项艰巨的任务。在这个尝试中,我们建立了一个新的混合推荐系统,既可以作为推荐系统,也可以作为探索新的动漫类型和标题的手段。我们分析了该领域的总体趋势和用户的观看习惯,提出了有效的解决方案。我们的解决方案使用深度自动编码器来预测评级和生成嵌入。接下来,我们使用动画标题的嵌入来形成集群。这些簇形成了具有相似性的动画搜索空间,用于查找与用户喜欢或不喜欢的动画相似的动画。这种方法,结合预测评级,形成了新的混合滤波器。在本文中,我们演示了这一思想,并将我们实现的模型的性能与现有的最新技术进行了比较。 摘要:Anime is quite well-received today, especially among the younger generations. With many genres of available shows, more and more people are increasingly getting attracted to this niche section of the entertainment industry. As anime has recently garnered mainstream attention, we have insufficient information regarding users' penchant and watching habits. Therefore, it is an uphill task to build a recommendation engine for this relatively obscure entertainment medium. In this attempt, we have built a novel hybrid recommendation system that could act both as a recommendation system and as a means of exploring new anime genres and titles. We have analyzed the general trends in this field and the users' watching habits for coming up with our efficacious solution. Our solution employs deep autoencoders for the tasks of predicting ratings and generating embeddings. Following this, we formed clusters using the embeddings of the anime titles. These clusters form the search space for anime with similarities and are used to find anime similar to the ones liked and disliked by the user. This method, combined with the predicted ratings, forms the novel hybrid filter. In this article, we have demonstrated this idea and compared the performance of our implemented model with the existing state-of-the-art techniques.
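论文用深度自编码器学出嵌入后再聚类检索相似动漫;下面仅用余弦相似度近邻作为极简替代示意,嵌入与标题均为本文虚构的玩具数据,并非 RikoNet 的实现。

```python
import numpy as np

def most_similar(embeddings, titles, query, k=2):
    """返回与 query 的嵌入余弦相似度最高的 k 个标题。"""
    E = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    q = E[titles.index(query)]
    sims = E @ q
    order = np.argsort(-sims)
    return [titles[i] for i in order if titles[i] != query][:k]

titles = ["Title A", "Title B", "Title C", "Title D"]
embeddings = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
print(most_similar(embeddings, titles, "Title A"))  # ['Title B', 'Title D']
```

在论文的混合过滤器里,这种"相似作品检索"会与自编码器预测的评分结合给出最终推荐。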

【19】 Modelling Art Interpretation and Meaning. A Data Model for Describing Iconology and Iconography 标题:艺术诠释与意义的建模。一种描述图像学(Iconology)与图像志(Iconography)的数据模型

作者:S. Baroncini,M. Daquino,F. Tomasi 机构: A. WARBURG, The renewal of pagan antiquity. Contributions to the cultural history of the European Renaissance, Getty Research Institute for the, History of Art and the Humanities, Los Angeles ,. 备注:16 pages, 7 figures 链接:https://arxiv.org/abs/2106.12967 摘要:图像学(iconology)是艺术史的一个分支,研究艺术作品相对于其社会文化背景的意义。目前,一些跨学科研究领域借助接近图像学的理论框架,运用数据科学方法和语义网技术开展定量艺术史研究。然而,尽管图像志(iconography)研究最近已在本体中得到处理,但对图像学研究相关方面的完整描述仍然缺失。在本文中,我们对从文献中选取的11个案例研究进行了初步研究,并展望了用于扩展现有本体的新术语。我们按照一个通用的评估方法来验证新术语,并结合这样一个扩展本体将为数字艺术史社区带来的机会讨论了我们的结果。 摘要:Iconology is a branch of art history that investigates the meaning of artworks in relation to their social and cultural background. Nowadays, several interdisciplinary research fields leverage theoretical frameworks close to iconology to pursue quantitative Art History with data science methods and Semantic Web technologies. However, while Iconographic studies have been recently addressed in ontologies, a complete description of aspects relevant to iconological studies is still missing. In this article, we present a preliminary study on eleven case studies selected from the literature and we envision new terms for extending existing ontologies. We validate new terms according to a common evaluation method and we discuss our results in the light of the opportunities that such an extended ontology would arise in the community of Digital Art History.

【20】 AIT-QA: Question Answering Dataset over Complex Tables in the Airline Industry 标题:AIT-QA:航空业复杂表格上的问答数据集

作者:Yannis Katsis,Saneem Chemmengath,Vishwajeet Kumar,Samarth Bharadwaj,Mustafa Canim,Michael Glass,Alfio Gliozzo,Feifei Pan,Jaydeep Sen,Karthik Sankaranarayanan,Soumen Chakrabarti 机构:IBM Research ,Rensselaer Polytechnic Institute ,IIT Bombay 链接:https://arxiv.org/abs/2106.12944 摘要:Transformer的最新进展使表问答(Table QA)系统能够在WikiTableQuestions和WikiSQL等开放域数据集上实现高精度和SOTA结果。这类Transformer经常在诸如Wikipedia之类的开放域内容上进行预训练,从而能够有效地编码表QA数据集中来自Wikipedia的问题和相应表格。然而,Wikipedia中的web表在布局上明显是扁平的,第一行是唯一的列标题。这种布局有助于表的关系视图,其中每一行都是一个元组。然而,特定领域的业务或科学文档中的表通常具有更复杂的布局,包括分层的行和列标题,此外还具有来自该领域的专门词汇术语。为了解决这个问题,我们引入了特定领域的表QA数据集AIT-QA(航空业表QA)。该数据集由515个问题组成,这些问题是由人工注释者在116个表格上编写的,这些表格摘自主要航空公司2017-2019财年提交给美国证券交易委员会(SEC)的公开文件(公开网址:https://www.sec.gov/edgar.shtml)。我们还提供有关问题性质的注释,标记那些需要层次化标题、特定领域术语和释义形式的问题。我们对三种基于Transformer的SOTA表QA方法TaPAS(端到端)、TaBERT(基于语义分析)和RCI(基于行-列编码)的零样本基线评估清楚地暴露了这些方法在这种实际设置中的局限性,其最佳准确率仅为51.8%(RCI)。我们还介绍了实用的表预处理步骤,用于将这些复杂的表透视并投影到适合SOTA表QA模型的布局中。 摘要:Recent advances in transformers have enabled Table Question Answering (Table QA) systems to achieve high accuracy and SOTA results on open domain datasets like WikiTableQuestions and WikiSQL. Such transformers are frequently pre-trained on open-domain content such as Wikipedia, where they effectively encode questions and corresponding tables from Wikipedia as seen in Table QA dataset. However, web tables in Wikipedia are notably flat in their layout, with the first row as the sole column header. The layout lends to a relational view of tables where each row is a tuple. Whereas, tables in domain-specific business or scientific documents often have a much more complex layout, including hierarchical row and column headers, in addition to having specialized vocabulary terms from that domain. To address this problem, we introduce the domain-specific Table QA dataset AIT-QA (Airline Industry Table QA). The dataset consists of 515 questions authored by human annotators on 116 tables extracted from public U.S.
SEC filings (publicly available at: https://www.sec.gov/edgar.shtml) of major airline companies for the fiscal years 2017-2019. We also provide annotations pertaining to the nature of questions, marking those that require hierarchical headers, domain-specific terminology, and paraphrased forms. Our zero-shot baseline evaluation of three transformer-based SOTA Table QA methods - TaPAS (end-to-end), TaBERT (semantic parsing-based), and RCI (row-column encoding-based) - clearly exposes the limitation of these methods in this practical setting, with the best accuracy at just 51.8\% (RCI). We also present pragmatic table preprocessing steps used to pivot and project these complex tables into a layout suitable for the SOTA Table QA models.

【21】 MatchVIE: Exploiting Match Relevancy between Entities for Visual Information Extraction 标题:MatchVIE:利用实体间的匹配相关性进行视觉信息提取

作者:Guozhi Tang,Lele Xie,Lianwen Jin,Jiapeng Wang,Jingdong Chen,Zhen Xu,Qianying Wang,Yaqiang Wu,Hui Li 机构:School of Electronic and Information Engineering, South China University of Technology, China, Guangdong Artificial Intelligence and Digital Economy Laboratory (Pazhou Lab), Guangzhou, China, Ant Group, China, Lenovo Research, China 备注:accepted by IJCAI 2021 链接:https://arxiv.org/abs/2106.12940 摘要:视觉信息提取(Visual Information Extraction,VIE)任务旨在从各种文档图像(如发票和采购收据)中提取关键信息。以往的方法大多将VIE任务简单地看作序列标注问题或分类问题,需要模型通过引入字体、颜色、布局等多模态特征来仔细识别各种语义。但是,简单地引入多模态特征,在面对数值型语义类别或一些有歧义的文本时往往效果不佳。针对这一问题,本文提出了一种基于图神经网络的键值匹配模型(MatchVIE)。该方法通过基于相关性评价的键值匹配,绕过了对各种语义的识别,只关注实体间的强相关性。此外,我们还引入了一种简单而有效的运算Num2Vec来解决编码值的不稳定性问题,使模型收敛更加平稳。综合实验表明,本文提出的MatchVIE的性能明显优于以往的方法。 摘要:Visual Information Extraction (VIE) task aims to extract key information from multifarious document images (e.g., invoices and purchase receipts). Most previous methods treat the VIE task simply as a sequence labeling problem or classification problem, which requires models to carefully identify each kind of semantics by introducing multimodal features, such as font, color, layout. But simply introducing multimodal features couldn't work well when faced with numeric semantic categories or some ambiguous texts. To address this issue, in this paper we propose a novel key-value matching model based on a graph neural network for VIE (MatchVIE). Through key-value matching based on relevancy evaluation, the proposed MatchVIE can bypass the recognitions to various semantics, and simply focuses on the strong relevancy between entities. Besides, we introduce a simple but effective operation, Num2Vec, to tackle the instability of encoded values, which helps model converge more smoothly. Comprehensive experiments demonstrate that the proposed MatchVIE can significantly outperform previous methods.
Notably, to the best of our knowledge, MatchVIE may be the first attempt to tackle the VIE task by modeling the relevancy between keys and values and it is a good complement to the existing methods.

【22】 Optimizing piano practice with a utility-based scaffold 标题:基于效用的支架优化钢琴练习

作者:Alexandra Moringen,Sören Rüttgers,Luisa Zintgraf,Jason Friedman,Helge Ritter 机构:University Bielefeld, University of Oxford, Tel Aviv University 链接:https://arxiv.org/abs/2106.12937 摘要:学习弹钢琴的一个典型部分是通过一系列的练习单元进行的,这些练习单元集中在技巧的各个方面,例如手的协调、正确的姿势或正确的时机。理想情况下,对一种特定的练习方法的关注,应该使学习者在学习钢琴方面取得最大的进步。因为我们每个人的学习方式不同,而且可能的钢琴练习任务和方法有很多选择,练习任务的设置应该动态地适应人类的学习者。然而,有一个人类教师指导个人实践并不总是可行的,因为它是耗时的,昂贵的,并不总是可用的。相反,我们建议在实践方法的空间上进行优化,即所谓的实践模式。提出的优化过程考虑了个体学习者的技能和他们的学习历史。在这项工作中,我们提出了一个模型框架,通过选择具有最高预期效用(即钢琴演奏技能的提高)的练习模式来引导人类学习者通过学习过程。为此,我们提出了一个基于高斯过程的人类学习者效用模型,并以模拟人类学习者为例,举例说明了该模型的训练及其在实践支架中的应用。 摘要:A typical part of learning to play the piano is the progression through a series of practice units that focus on individual dimensions of the skill, such as hand coordination, correct posture, or correct timing. Ideally, a focus on a particular practice method should be made in a way to maximize the learner's progress in learning to play the piano. Because we each learn differently, and because there are many choices for possible piano practice tasks and methods, the set of practice tasks should be dynamically adapted to the human learner. However, having a human teacher guide individual practice is not always feasible since it is time consuming, expensive, and not always available. Instead, we suggest to optimize in the space of practice methods, the so-called practice modes. The proposed optimization process takes into account the skills of the individual learner and their history of learning. In this work we present a modeling framework to guide the human learner through the learning process by choosing practice modes that have the highest expected utility (i.e., improvement in piano playing skill). To this end, we propose a human learner utility model based on a Gaussian process, and exemplify the model training and its application for practice scaffolding on an example of simulated human learners.
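论文的核心是"选取预期效用最高的练习模式"。下面用一个 RBF 核高斯过程回归的后验均值作为效用模型的极简示意:数据(两种假设的练习模式在不同技能水平下的进步量)与参数均为本文虚构,并非论文的实际模型。

```python
import numpy as np

def gp_posterior_mean(X, y, Xs, ls=1.0, noise=1e-2):
    """RBF 核 GP 回归的后验均值(效用模型的极简替身)。"""
    def k(a, b):
        return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)
    K = k(X, X) + noise * np.eye(len(X))
    return k(Xs, X) @ np.linalg.solve(K, y)

# 假设的数据:两种练习模式在不同技能水平下观察到的进步量
skill = np.array([0.0, 0.5, 1.0])
gain = {
    "timing": np.array([0.30, 0.20, 0.05]),   # 对新手帮助大
    "posture": np.array([0.05, 0.15, 0.25]),  # 对高手帮助大
}
learner = np.array([0.9])                      # 当前学习者的技能水平
expected = {m: gp_posterior_mean(skill, y, learner)[0] for m, y in gain.items()}
best = max(expected, key=expected.get)
print(best)  # posture
```

对技能水平 0.9 的学习者,模型会选出预期进步更大的 "posture" 模式;论文在此思路上进一步结合了学习者的学习历史。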

【23】 Non-Autoregressive TTS with Explicit Duration Modelling for Low-Resource Highly Expressive Speech 标题:面向低资源高表现力语音、带显式时长建模的非自回归TTS

作者:Raahil Shah,Kamil Pokora,Abdelhamid Ezzerg,Viacheslav Klimkov,Goeric Huybrechts,Bartosz Putrycz,Daniel Korzekwa,Thomas Merritt 机构:Amazon Text-to-Speech Research 备注:6 pages, 5 figures. Accepted to Speech Synthesis Workshop (SSW) 2021 链接:https://arxiv.org/abs/2106.12896 摘要:虽然最近的神经文本到语音(TTS)方法产生高质量的语音,但它们通常需要目标说话人的大量录音。在以前的工作中,提出了一种三步方法来生成高质量的TTS,同时大大减少了训练所需的数据量。然而,我们观察到,当使用这种方法时,高表达声音的自然度水平会出现天花板效应。在本文中,我们提出了一种方法来建立高表现力的TTS语音,只需15分钟的语音数据从目标发言者。与目前最先进的方法相比,我们提出的改进方案在语音自然度和说话人相似性方面与录音的差距分别缩小了23.3%和16.3%。此外,我们仅使用15分钟的目标说话人数据来匹配基于Tacotron2的完整数据模型(约10小时)的自然度和说话人相似性,而在30分钟或更长时间内,我们的性能明显优于它。提出了以下改进:1)由基于注意的自回归TTS模型改为非自回归模型,用外部持续时间模型代替注意;2)增加一个基于条件生成对抗网络(cGAN)的微调步骤。 摘要:Whilst recent neural text-to-speech (TTS) approaches produce high-quality speech, they typically require a large amount of recordings from the target speaker. In previous work, a 3-step method was proposed to generate high-quality TTS while greatly reducing the amount of data required for training. However, we have observed a ceiling effect in the level of naturalness achievable for highly expressive voices when using this approach. In this paper, we present a method for building highly expressive TTS voices with as little as 15 minutes of speech data from the target speaker. Compared to the current state-of-the-art approach, our proposed improvements close the gap to recordings by 23.3% for naturalness of speech and by 16.3% for speaker similarity. Further, we match the naturalness and speaker similarity of a Tacotron2-based full-data (~10 hours) model using only 15 minutes of target speaker data, whereas with 30 minutes or more, we significantly outperform it. The following improvements are proposed: 1) changing from an autoregressive, attention-based TTS model to a non-autoregressive model replacing attention with an external duration model and 2) an additional Conditional Generative Adversarial Network (cGAN) based fine-tuning step.

【24】 rSoccer: A Framework for Studying Reinforcement Learning in Small and Very Small Size Robot Soccer 标题:rSoccer:一种研究小型和超小型机器人足球强化学习的框架

作者:Felipe B. Martins,Mateus G. Machado,Hansenclever F. Bassani,Pedro H. M. Braga,Edna S. Barros 机构:Centro de Inform´atica - Universidade Federal de Pernambuco, Av. Jornalista Anibal, Fernandes, sn - CDU ,.,-, Recife, PE, Brazil. 链接:https://arxiv.org/abs/2106.12895 摘要:强化学习是一个活跃的研究领域,在机器人学中有着广泛的应用,而RoboCup竞赛是研究和评价强化学习方法的一个有趣的环境。将强化学习应用于机器人学的一个已知困难是需要大量的经验样本,即使用模拟环境训练代理,然后将学习转移到现实世界(sim-to-real)是一条可行的路径。本文介绍了一个用于IEEE超小型足球和小型联赛的开放源代码模拟器,该模拟器针对强化学习实验进行了优化。我们还提出了一个框架,用于创建OpenAI健身房环境和一组基准任务,用于评估单agent和多agent机器人足球技能。然后,我们展示了两种最先进的强化学习方法的学习能力,以及它们在该框架中引入的特定场景中的局限性。我们相信,这将使更多的团队更容易在这些类别的竞争中使用端到端强化学习方法,并进一步发展这一研究领域。 摘要:Reinforcement learning is an active research area with a vast number of applications in robotics, and the RoboCup competition is an interesting environment for studying and evaluating reinforcement learning methods. A known difficulty in applying reinforcement learning to robotics is the high number of experience samples required, being the use of simulated environments for training the agents followed by transfer learning to real-world (sim-to-real) a viable path. This article introduces an open-source simulator for the IEEE Very Small Size Soccer and the Small Size League optimized for reinforcement learning experiments. We also propose a framework for creating OpenAI Gym environments with a set of benchmarks tasks for evaluating single-agent and multi-agent robot soccer skills. We then demonstrate the learning capabilities of two state-of-the-art reinforcement learning methods as well as their limitations in certain scenarios introduced in this framework. We believe this will make it easier for more teams to compete in these categories using end-to-end reinforcement learning approaches and further develop this research area.
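论文提供的是 OpenAI Gym 风格的环境接口(reset/step)。下面是一个不依赖 gym 的极简示意环境:一维机器人向球移动,奖励为负距离。环境名称、动力学与奖励均为本文虚构,并非 rSoccer 的实际 API。

```python
class GoToBallEnv:
    """Gym 风格环境的极简示意(假设性示例,非 rSoccer 实现)。"""
    def __init__(self, ball_pos=5.0):
        self.ball_pos = ball_pos
        self.pos = 0.0
        self.steps = 0

    def reset(self):
        self.pos, self.steps = 0.0, 0
        return self._obs()

    def step(self, action):
        self.pos += max(-1.0, min(1.0, action))  # 限幅的速度指令
        self.steps += 1
        dist = abs(self.ball_pos - self.pos)
        done = dist < 0.5 or self.steps >= 50
        return self._obs(), -dist, done, {}

    def _obs(self):
        return (self.pos, self.ball_pos)

env = GoToBallEnv()
obs = env.reset()
done = False
while not done:
    action = 1.0 if obs[1] > obs[0] else -1.0  # 朴素策略:朝球移动
    obs, reward, done, info = env.step(action)
print(env.steps)  # 5
```

真实的 rSoccer 环境在此接口之上提供了物理仿真与多智能体基准任务。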

【25】 InFlow: Robust outlier detection utilizing Normalizing Flows 标题:InFlow:利用归一化流进行稳健的离群值检测

作者:Nishant Kumar,Pia Hanfeld,Michael Hecht,Michael Bussmann,Stefan Gumhold,Nico Hoffmannn 机构:HZDR, Dresden, Germany, CASUS, Görlitz, Germany, CGV, TU Dresden, Nico Hoffmann 链接:https://arxiv.org/abs/2106.12894 摘要:规范化流是一种重要的深度生成模型,它提供了易于处理的概率分布和高效的密度估计。然而,众所周知,由于它们直接在潜在空间中编码输入表示的局部特征,它们在检测分布外(OOD)输入时会失败。在本文中,我们通过证明流一旦用注意机制加以扩展就可以可靠地检测包括对抗性攻击在内的离群值,解决了规范化流的过度自信问题。我们的方法不需要离群数据进行训练,我们通过在不同的实验设置中报告最先进的性能来展示我们的OOD检测方法的效率。代码位于https://github.com/ComputationalRadiationPhysics/InFlow . 摘要:Normalizing flows are prominent deep generative models that provide tractable probability distributions and efficient density estimation. However, they are well known to fail while detecting Out-of-Distribution (OOD) inputs as they directly encode the local features of the input representations in their latent space. In this paper, we solve this overconfidence issue of normalizing flows by demonstrating that flows, if extended by an attention mechanism, can reliably detect outliers including adversarial attacks. Our approach does not require outlier data for training and we showcase the efficiency of our method for OOD detection by reporting state-of-the-art performance in diverse experimental settings. Code available at https://github.com/ComputationalRadiationPhysics/InFlow .
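规范化流"易于处理的密度估计"来自变量替换公式。下面以最简单的一维仿射流为例演示该公式;论文提出的注意力扩展不在此示意之内。

```python
import numpy as np

def affine_flow_logpdf(x, scale, shift):
    """x = scale * z + shift,z ~ N(0, 1) 时的 log p_X(x):
    log p_X(x) = log p_Z(z) - log|scale|(变量替换公式)。"""
    z = (x - shift) / scale
    log_pz = -0.5 * (z ** 2 + np.log(2 * np.pi))
    return log_pz - np.log(np.abs(scale))

x = np.array([0.0, 2.0, 4.0])
print(affine_flow_logpdf(x, scale=2.0, shift=2.0))  # 即 N(2, 4) 的对数密度
```

深度流模型把许多这样的可逆变换层层复合,对数雅可比行列式逐层累加;OOD 检测的难点正是这些密度值在分布外输入上可能仍然过高。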

【26】 Encoding Involutory Invariance in Neural Networks 标题:神经网络中对合不变性的编码

作者:Anwesh Bhattacharya,Marios Mattheakis,Pavlos Protopapas 机构:Department of Physics, Deparment of CS&IS, Birla Institute of Technology & Science Pilani, Pilani, Rajasthan, India - , John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts , United States 备注:19 pages, 12 figures 链接:https://arxiv.org/abs/2106.12891 摘要:在某些情况下,神经网络(NN)的训练数据遵循基本的物理对称性。然而,除非嵌入到网络结构中,否则不能保证NNs服从潜在的对称性。在这项工作中,我们探索了一种特殊的对称性,其中函数对于奇偶校验$p=\pm1$的对合线性/仿射变换是不变的。我们发展数学定理,并提出神经网络架构,以确保不变性和普遍逼近性质。数值实验表明,在考虑外加对称性的情况下,该模型的性能优于基线网络。对于具有固有水平/垂直反射对称性的数据集,我们还提出了一种适用于卷积神经网络分类任务的方法。 摘要:In certain situations, Neural Networks (NN) are trained upon data that obey underlying physical symmetries. However, it is not guaranteed that NNs will obey the underlying symmetry unless embedded in the network structure. In this work, we explore a special kind of symmetry where functions are invariant with respect to involutory linear/affine transformations up to parity $p=\pm 1$. We develop mathematical theorems and propose NN architectures that ensure invariance and universal approximation properties. Numerical experiments indicate that the proposed models outperform baseline networks while respecting the imposed symmetry. An adaption of our technique to convolutional NN classification tasks for datasets with inherent horizontal/vertical reflection symmetry has also been proposed.
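保证对合对称性的一个标准做法是对网络输出做对称化:令 f(x) = (g(x) + p·g(Sx))/2,则对任意对合 S(即 S(Sx)=x)自动有 f(Sx) = p·f(x)。以下是该构造的极简演示(g 只是任意函数替身,未必是论文的网络结构):

```python
import numpy as np

def symmetrize(g, S, p):
    """包装 g,使结果对对合 S 满足 f(S x) = p * f(x)。"""
    def f(x):
        return 0.5 * (g(x) + p * g(S(x)))
    return f

g = lambda x: x[0] + x[0] * x[1] + x[1] ** 2   # 任意"网络"替身
S = lambda x: -x                                # 反射,是一个对合
f_even = symmetrize(g, S, p=+1)
f_odd = symmetrize(g, S, p=-1)
x = np.array([1.3, -0.7])
print(f_even(S(x)) - f_even(x))   # 0.0:不变
print(f_odd(S(x)) + f_odd(x))     # 0.0:反不变
```

验证很直接:f(Sx) = (g(Sx) + p·g(x))/2 = p·(g(x) + p·g(Sx))/2 = p·f(x),因为 p² = 1。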

【27】 A Near-Optimal Algorithm for Debiasing Trained Machine Learning Models 标题:一种对已训练机器学习模型去偏的近似最优算法

作者:Ibrahim Alabdulmohsin,Mario Lucic 机构:Google Research, Brain Team, Zürich, Switzerland 备注:21 pages, 5 figures 链接:https://arxiv.org/abs/2106.12887 摘要:我们提出了一种可扩展的后处理算法,用于对已训练的模型(包括深度神经网络(DNN))进行去偏,并通过对其超额贝叶斯风险的上界证明该算法是近似最优的。我们在经典算法和现代DNN体系结构的标准基准数据集上验证了它的优势,结果表明它优于以往的后处理方法,同时与处理中(in-processing)去偏方法表现相当。此外,我们还证明了所提出的算法对于大规模训练的模型特别有效,在这种场景下后处理是一种自然而实用的选择。 摘要:We present a scalable post-processing algorithm for debiasing trained models, including deep neural networks (DNNs), which we prove to be near-optimal by bounding its excess Bayes risk. We empirically validate its advantages on standard benchmark datasets across both classical algorithms as well as modern DNN architectures and demonstrate that it outperforms previous post-processing methods while performing on par with in-processing. In addition, we show that the proposed algorithm is particularly effective for models trained at scale where post-processing is a natural and practical choice.
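后处理去偏的基本思路是:不改动已训练模型,只调整其输出的决策规则。下面用"按组选取阈值使各组正例率一致"作一个极简替身示意——这不是论文中以超额贝叶斯风险为界的算法,只展示后处理这一类做法:

```python
import numpy as np

def group_thresholds(scores, groups, target_rate):
    """为每个组挑选一个决策阈值,使各组正例率都等于 target_rate
    (后处理去偏的简化替身)。"""
    thresholds = {}
    for g in set(groups):
        s = np.sort(scores[groups == g])
        k = int(round((1 - target_rate) * len(s)))
        k = min(max(k, 0), len(s) - 1)
        thresholds[g] = s[k]
    return thresholds

rng = np.random.default_rng(0)
scores = np.concatenate([rng.uniform(0.3, 1.0, 100), rng.uniform(0.0, 0.7, 100)])
groups = np.array(["a"] * 100 + ["b"] * 100)
th = group_thresholds(scores, groups, target_rate=0.5)
for g in ("a", "b"):
    rate = np.mean(scores[groups == g] >= th[g])
    print(g, round(rate, 2))  # 两组正例率均为 0.5
```

即使组 "a" 的原始得分整体偏高,按组阈值也能把两组的正例率拉平;论文进一步给出了在此类后处理规则中近似最优的选择。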

【28】 DCoM: A Deep Column Mapper for Semantic Data Type Detection 标题:DCoM:一种用于语义数据类型检测的深度列映射器

作者:Subhadip Maji,Swapna Sourav Rout,Sudeep Choudhary 机构:Optum Global Solutions, Bangalore, India , Senior Data Scientist 备注:9 pages, 2 figures, 7 tables 链接:https://arxiv.org/abs/2106.12871 摘要:语义数据类型检测是数据科学中一项非常重要的任务,可以实现数据的自动清洗、模式匹配、数据发现、语义数据类型规范化和敏感数据识别。现有的方法包括基于正则表达式或基于字典查找的方法,这些方法对脏数据和看不见的数据不具有鲁棒性,并且只能预测非常少的语义数据类型。现有的机器学习方法从数据中提取大量的工程特征,建立logistic回归、随机森林或前馈神经网络。在本文中,我们引入了DCoM(一种基于多输入NLP的深度神经网络)来检测语义数据类型,它不是从数据中提取大量的特征,而是将列(或实例)的原始值作为文本输入到模型中。我们在从VizNet语料库中提取的686765个数据列上训练DCoM,这些数据列包含78种不同的语义数据类型。在同一数据集上,DCoM的性能优于其他当代结果,具有相当大的优势。 摘要:Detection of semantic data types is a very crucial task in data science for automated data cleaning, schema matching, data discovery, semantic data type normalization and sensitive data identification. Existing methods include regular expression-based or dictionary lookup-based methods that are not robust to dirty as well unseen data and are limited to a very less number of semantic data types to predict. Existing Machine Learning methods extract large number of engineered features from data and build logistic regression, random forest or feedforward neural network for this purpose. In this paper, we introduce DCoM, a collection of multi-input NLP-based deep neural networks to detect semantic data types where instead of extracting large number of features from the data, we feed the raw values of columns (or instances) to the model as texts. We train DCoM on 686,765 data columns extracted from VizNet corpus with 78 different semantic data types. DCoM outperforms other contemporary results with a quite significant margin on the same dataset.
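DCoM 的关键设计是"把一列的原始取值当作文本直接喂给模型"而不是手工造特征。下面用字符频率直方图 + 最近原型这一极度简化的替身演示这种框架;参考列、类型标签与相似度度量均为本文虚构,DCoM 实际使用的是深度 NLP 网络。

```python
from collections import Counter

def char_profile(values):
    """把一列的原始取值当作文本:统计字符频率作为粗糙特征。"""
    c = Counter("".join(str(v).lower() for v in values))
    total = sum(c.values())
    return {ch: n / total for ch, n in c.items()}

def similarity(p, q):
    """两个字符频率分布的直方图交集。"""
    return sum(min(p.get(ch, 0.0), q.get(ch, 0.0)) for ch in set(p) | set(q))

# 带标签的参考列(训练数据的玩具替身)
profiles = {
    "date": char_profile(["2021-06-24", "2019-01-02", "2020-12-31"]),
    "email": char_profile(["a@x.com", "bob@mail.org", "c.d@y.net"]),
}

def detect_type(values):
    p = char_profile(values)
    return max(profiles, key=lambda label: similarity(p, profiles[label]))

print(detect_type(["1999-05-17", "2003-11-30"]))   # date
print(detect_type(["jane@foo.io", "tim@bar.co"]))  # email
```

即便是这样粗糙的字符统计也能区分日期与邮箱,说明原始取值本身携带了类型信号;DCoM 用深度模型把这一信号扩展到 78 种类型。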

【29】 Awareness Logic: Kripke Lattices as a Middle Ground between Syntactic and Semantic Models 标题:意识逻辑:作为句法模型和语义模型中间地带的克里普克格子

作者:Gaia Belardinelli,Rasmus K. Rendsvig 机构:Center for Information and Bubble Studies, University of Copenhagen 备注:arXiv admin note: substantial text overlap with arXiv:2012.12982 链接:https://arxiv.org/abs/2106.12868 摘要:有关意识建模的文献既包括无语法的框架,也包括基于语法的框架。Heifetz,Meier\&Schipper(HMS)提出了一个无语法的意识格模型。虽然他们的格方法优雅而直观,但它排除了依赖形式语言来归纳格这一简单选择,并且没有明确区分不确定性和无意识。与此相反,最突出的基于语法的方案,即Fagin-Halpern(FH)模型,刻画了这一区别并提供了一种简单的意识表示,但缺乏格结构的直观性。在这里,我们结合这两种方法,给出了一个由原子子集包含关系诱导的Kripke模型格,其中不确定性(uncertainty)与无意识(unawareness)是分开的。通过定义HMS和FH模型之间保持显式知识语言公式可满足性的变换,我们证明了我们的模型与HMS和FH模型都等价,并借助我们与HMS的结果获得了完备性。最后,我们证明了当意识由命题决定时,Kripke格模型相对于一般意识逻辑(Logic of General Awareness)的语言也与FH模型等价,而FH模型最初正是针对该语言提出的。 摘要:The literature on awareness modeling includes both syntax-free and syntax-based frameworks. Heifetz, Meier \& Schipper (HMS) propose a lattice model of awareness that is syntax-free. While their lattice approach is elegant and intuitive, it precludes the simple option of relying on formal language to induce lattices, and does not explicitly distinguish uncertainty from unawareness. Contra this, the most prominent syntax-based solution, the Fagin-Halpern (FH) model, accounts for this distinction and offers a simple representation of awareness, but lacks the intuitiveness of the lattice structure. Here, we combine these two approaches by providing a lattice of Kripke models, induced by atom subset inclusion, in which uncertainty and unawareness are separate. We show our model equivalent to both HMS and FH models by defining transformations between them which preserve satisfaction of formulas of a language for explicit knowledge, and obtain completeness through our and HMS' results. Lastly, we prove that the Kripke lattice model can be shown equivalent to the FH model (when awareness is propositionally determined) also with respect to the language of the Logic of General Awareness, for which the FH model was originally proposed.
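"由原子子集包含关系诱导的格"可以用很少的代码搭出骨架:节点是原子集合的子集,序关系是包含。下面只演示格结构本身,不含论文中每个节点所携带的受限 Kripke 模型:

```python
from itertools import combinations

def awareness_lattice(atoms):
    """以原子子集为节点、按包含关系排序的意识格(仅骨架示意)。"""
    nodes = [frozenset(c) for r in range(len(atoms) + 1)
             for c in combinations(sorted(atoms), r)]
    below = {s: [t for t in nodes if t <= s] for s in nodes}
    return nodes, below

nodes, below = awareness_lattice({"p", "q"})
top = frozenset({"p", "q"})
print(len(nodes))        # 4:{}, {p}, {q}, {p,q}
print(len(below[top]))   # 4:顶元之下包含全部限制
```

在论文的构造中,每个节点对应把完整 Kripke 模型限制到该原子子集上所得的模型,无意识即体现为智能体"身处"更低的格节点。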

【30】 Extraction of common conceptual components from multiple ontologies 标题:从多个本体中提取公共概念组件

作者:Luigi Asprino,Valentina Anita Carriero,Valentina Presutti 机构:Department of Classical Philology and Italian Studies, University of Bologna, Via Zamboni , Bologna, Italy, Department of Computer Science and Engineering, Mura Anteo Zamboni , Bologna, Italy, Department of Modern Languages, Literatures, and Culture 链接:https://arxiv.org/abs/2106.12831 摘要:我们描述了一种从领域本体中识别和提取概念组件的新方法,所提取的概念组件可用于理解和比较这些本体。该方法分别应用于文化遗产领域和会议领域的两个本体语料库。结果质量良好,并通过人工检查以及与本体对齐评估倡议(OAEI)的数据集和工具性能的相关性分析进行了评估。 摘要:We describe a novel method for identifying and extracting conceptual components from domain ontologies, which are used to understand and compare them. The method is applied to two corpora of ontologies in the Cultural Heritage and Conference domain, respectively. The results, which show good quality, are evaluated by manual inspection and by correlation with datasets and tool performance from the ontology alignment evaluation initiative.

【31】 A comprehensive empirical analysis on cross-domain semantic enrichment for detection of depressive language 标题:跨领域语义丰富检测抑郁性语言的综合实证分析

作者:Nawshad Farruque,Randy Goebel,Osmar Zaiane 机构:Department of Computing Science, University of Alberta, Alberta, T,G ,E, Canada 备注:This is an extension over ECML-PKDD, 2019 paper "Augmenting Semantic Representation of Depressive Language: from Forums to Microblogs", with more embedding mapping/augmentation methods and data ablation tests. These experiments were done in the year 2019 链接:https://arxiv.org/abs/2106.12797 摘要:我们分析了在注释数据稀少的情况下,例如,在tweet的抑郁语言检测中,为学习任务设计的单词嵌入特征表示的创建过程。我们首先从一个大的通用数据集中预先训练的丰富的单词嵌入开始,然后通过一个简单的非线性映射机制从一个更小更具体的领域数据集中学习嵌入来增强它。我们还尝试了其他一些更复杂的映射方法,包括基于自动编码器和基于自定义损失函数的方法,这些方法通过逐渐学习接近语义相似的单词和远离语义不同的单词来学习嵌入表示。我们的强化表示更好地捕捉了抑郁症领域的语义,因为它结合了从特定领域学习到的语义和从一般语言中获得的单词覆盖率。我们还提出了一个简单的词袋模型,众所周知的情感和心理语言学词汇,以及一个一般的预先训练的词嵌入表示的比较性能分析。当使用多种不同的机器学习方法(包括抑郁Tweets识别任务中的深度学习模型)作为特征表示时,我们发现我们的增强词嵌入表示比其他方法获得了显著更好的F1分数,特别是当应用于高质量的数据集时。此外,我们还提供了一些数据消融试验,证实了我们的增强技术的有效性。 摘要:We analyze the process of creating word embedding feature representations designed for a learning task when annotated data is scarce, for example, in depressive language detection from Tweets. We start with a rich word embedding pre-trained from a large general dataset, which is then augmented with embeddings learned from a much smaller and more specific domain dataset through a simple non-linear mapping mechanism. We also experimented with several other more sophisticated methods of such mapping including, several auto-encoder based and custom loss-function based methods that learn embedding representations through gradually learning to be close to the words of similar semantics and distant to dissimilar semantics. Our strengthened representations better capture the semantics of the depression domain, as it combines the semantics learned from the specific domain coupled with word coverage from the general language. 
We also present a comparative performance analyses of our word embedding representations with a simple bag-of-words model, well known sentiment and psycholinguistic lexicons, and a general pre-trained word embedding. When used as feature representations for several different machine learning methods, including deep learning models in a depressive Tweets identification task, we show that our augmented word embedding representations achieve a significantly better F1 score than the others, specially when applied to a high quality dataset. Also, we present several data ablation tests which confirm the efficacy of our augmentation techniques.

【32】 Online Verification of Deep Neural Networks under Domain or Weight Shift 标题:域或权值漂移条件下的深度神经网络在线验证

作者:Tianhao Wei,Changliu Liu 机构:Carnegie Mellon University 链接:https://arxiv.org/abs/2106.12732 摘要:虽然神经网络的应用非常广泛,但在实际应用中,正式验证神经网络的安全性和鲁棒性仍然是一个挑战。现有的方法都是在使用前对网络进行验证,仅限于相对简单的规范和固定的网络。这些方法还不能应用于具有复杂和/或动态变化的规范和网络的实际问题。为了有效地处理动态变化的规范和网络,当这些变化发生时,需要在线执行验证。然而,在线运行现有的验证算法仍然是一个挑战。我们的关键见解是,我们可以利用这些更改的时间依赖性来加速验证过程,例如,通过使用以前的验证结果热启动新的在线验证。本文提出了一种新的可扩展在线验证框架,解决了规范和/或网络动态变化的实际验证问题,即域转移和权值转移。我们提出三种技术(分支管理、扰动容限分析和增量计算)来加速深层神经网络的在线验证。实验结果表明,我们的在线验证算法比现有的验证算法快两个数量级,可以扩展到实际应用中。 摘要:Although neural networks are widely used, it remains challenging to formally verify the safety and robustness of neural networks in real-world applications. Existing methods are designed to verify the network before use, which is limited to relatively simple specifications and fixed networks. These methods are not ready to be applied to real-world problems with complex and/or dynamically changing specifications and networks. To effectively handle dynamically changing specifications and networks, the verification needs to be performed online when these changes take place. However, it is still challenging to run existing verification algorithms online. Our key insight is that we can leverage the temporal dependencies of these changes to accelerate the verification process, e.g., by warm starting new online verification using previous verified results. This paper establishes a novel framework for scalable online verification to solve real-world verification problems with dynamically changing specifications and/or networks, known as domain shift and weight shift respectively. We propose three types of techniques (branch management, perturbation tolerance analysis, and incremental computation) to accelerate the online verification of deep neural networks. Experiment results show that our online verification algorithm is up to two orders of magnitude faster than existing verification algorithms, and thus can scale to real-world applications.
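此类验证器的底层操作之一是把输入区间逐层传过网络(区间界传播)。下面演示单层 ReLU 的界传播;这只是验证器使用的一种基本步骤,并非论文提出的在线加速算法(分支管理、扰动容限分析、增量计算)本身:

```python
import numpy as np

def ibp_layer(lo, hi, W, b):
    """把输入盒 [lo, hi] 传过 x -> relu(W x + b),返回输出上下界。"""
    mid, rad = (lo + hi) / 2, (hi - lo) / 2
    new_mid = W @ mid + b
    new_rad = np.abs(W) @ rad          # 半径经 |W| 放大
    return (np.maximum(new_mid - new_rad, 0),
            np.maximum(new_mid + new_rad, 0))

W = np.array([[1.0, -1.0], [0.5, 2.0]])
b = np.array([0.0, -1.0])
lo, hi = np.array([0.0, 0.0]), np.array([0.1, 0.1])
lo_out, hi_out = ibp_layer(lo, hi, W, b)
print(lo_out, hi_out)  # 第二个神经元的上界为 0:该盒内它恒被 ReLU 关闭
```

当规约或权重发生小幅漂移时,这类界往往只需小幅更新,这正是论文用热启动做在线加速的直觉来源。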

【33】 Sparse Flows: Pruning Continuous-depth Models 标题:稀疏流:修剪连续深度模型

作者:Lucas Liebenwein,Ramin Hasani,Alexander Amini,Daniela Rus 机构:MIT CSAIL 链接:https://arxiv.org/abs/2106.12718 摘要:连续深度学习体系结构可以学习灵活的概率模型,用于预测建模(如神经常微分方程)和生成建模(如连续规范化流)。在这项工作中,我们设计了一个框架,通过剪除这些连续深度模型的网络结构来解读它们的内部动态。我们的实验结果表明,剪枝可以提高生成建模中神经ODE的泛化能力。此外,剪枝能够找到最小而高效的神经ODE表示,与原始网络相比参数最多可减少98\%,且不损失准确性。最后,我们证明了应用剪枝可以获得关于如何设计更好的神经ODE的有见地的信息。我们希望我们的结果能为进一步研究现代连续深度模型的性能-大小权衡提供动力。 摘要:Continuous deep learning architectures enable learning of flexible probabilistic models for predictive modeling as neural ordinary differential equations (ODEs), and for generative modeling as continuous normalizing flows. In this work, we design a framework to decipher the internal dynamics of these continuous depth models by pruning their network architectures. Our empirical results suggest that pruning improves generalization for neural ODEs in generative modeling. Moreover, pruning finds minimal and efficient neural ODE representations with up to 98\% less parameters compared to the original network, without loss of accuracy. Finally, we show that by applying pruning we can obtain insightful information about the design of better neural ODEs. We hope our results will invigorate further research into the performance-size trade-offs of modern continuous-depth models.
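剪枝中最常见的准则之一是按权重幅值置零。下面是该准则的极简实现,仅作示意——论文针对的是连续深度模型,其具体剪枝框架比这复杂:

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """把幅值最小的 sparsity 比例的权重置零。"""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

W = np.array([[0.01, -0.8], [0.3, -0.02]])
print(magnitude_prune(W, sparsity=0.5))  # 幅值最小的两个权重被置零
```

对稀疏度 0.5,上例中 0.01 与 -0.02 被置零,保留 -0.8 与 0.3;论文报告这类剪枝在神经 ODE 上最多可去掉 98% 的参数而不损失精度。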

【34】 Fairness via Representation Neutralization 标题:通过表征中和实现公平

作者:Mengnan Du,Subhabrata Mukherjee,Guanchu Wang,Ruixiang Tang,Ahmed Hassan Awadallah,Xia Hu 机构:Texas A&M University,Microsoft Research 链接:https://arxiv.org/abs/2106.12674 摘要:现有的DNN模型偏差缓解方法主要致力于学习去偏的编码器。这一过程不仅需要大量关于敏感属性的实例级标注,而且不能保证所有与公平性相关的敏感信息都已从编码器中去除。为了解决这些局限性,我们探讨了如下研究问题:即使以有偏表示作为输入,我们是否可以仅通过对分类头去偏来降低DNN模型的歧视性?为此,我们提出了一种新的缓解技术,即面向公平性的表征中和(RNF),它仅通过对DNN模型中特定于任务的分类头去偏来实现公平性。为此,我们利用具有相同真实标签但敏感属性不同的样本,并用它们的中和表示来训练DNN模型的分类头。RNF的关键思想是阻止分类头捕获编码器表示中的公平性敏感信息与特定类标签之间的虚假相关性。为了解决无法访问敏感属性标注的低资源场景,我们利用一个偏差放大模型为敏感属性生成代理标注。在多个基准数据集上的实验结果表明,我们的RNF框架可以有效降低DNN模型的歧视性,且任务相关性能的退化最小。 摘要:Existing bias mitigation methods for DNN models primarily work on learning debiased encoders. This process not only requires a lot of instance-level annotations for sensitive attributes, it also does not guarantee that all fairness sensitive information has been removed from the encoder. To address these limitations, we explore the following research question: Can we reduce the discrimination of DNN models by only debiasing the classification head, even with biased representations as inputs? To this end, we propose a new mitigation technique, namely, Representation Neutralization for Fairness (RNF) that achieves fairness by debiasing only the task-specific classification head of DNN models. To this end, we leverage samples with the same ground-truth label but different sensitive attributes, and use their neutralized representations to train the classification head of the DNN model. The key idea of RNF is to discourage the classification head from capturing spurious correlation between fairness sensitive information in encoder representations with specific class labels. To address low-resource settings with no access to sensitive attribute annotations, we leverage a bias-amplified model to generate proxy annotations for sensitive attributes.
Experimental results over several benchmark datasets demonstrate our RNF framework to effectively reduce discrimination of DNN models with minimal degradation in task-specific performance.
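"表征中和"这一步可以用一行代码示意:对标签相同但敏感属性不同的两个样本,取其编码器表示的均值,使与敏感属性相关的方向相互抵消。以下为玩具演示,向量为本文虚构,并非论文完整训练流程:

```python
import numpy as np

def neutralize(z_a, z_b):
    """平均一对"同标签、不同敏感属性"样本的表示,
    让分类头看不到敏感属性方向。"""
    return 0.5 * (z_a + z_b)

# 玩具编码器输出:最后一维与敏感属性相关
z_male = np.array([0.9, 0.2, +1.0])
z_female = np.array([0.8, 0.3, -1.0])
z = neutralize(z_male, z_female)
print(z)  # 第三维(敏感方向)被抵消为 0
```

RNF 用这类中和表示训练分类头,从而切断敏感信息与类标签之间的虚假相关。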

【35】 Charformer: Fast Character Transformers via Gradient-based Subword Tokenization 标题:Charformer:基于梯度的子词切分实现的快速字符Transformer

作者:Yi Tay,Vinh Q. Tran,Sebastian Ruder,Jai Gupta,Hyung Won Chung,Dara Bahri,Zhen Qin,Simon Baumgartner,Cong Yu,Donald Metzler 机构:Google Research and DeepMind† 链接:https://arxiv.org/abs/2106.12672 摘要:自然语言处理中最先进的模型依赖于独立、僵化的子词切分算法,这限制了它们的泛化能力和对新环境的适应能力。在本文中,我们提出了一种新的模型归纳偏置,将子词切分作为模型的一部分进行端到端学习。为此,我们引入了一个软性的、基于梯度的子词切分模块(GBST),它以数据驱动的方式从字符中自动学习潜在的子词表示。具体地说,GBST枚举候选子词块,并学习使用块评分网络以位置方式对它们进行评分。此外,我们还介绍了Charformer,这是一种集成了GBST并在字节级别上运行的深度Transformer模型。通过对英语GLUE、多语言和含噪文本数据集的大量实验,我们发现Charformer优于一系列有竞争力的字节级基线,同时总体上与基于子词的模型表现相当,有时甚至更优。此外,Charformer速度很快,将普通字节级和子词级Transformer的速度提高了28%-100%,同时保持了有竞争力的质量。我们相信这项工作为完全端到端训练的高性能无标记化(token-free)模型铺平了道路。 摘要:State-of-the-art models in natural language processing rely on separate rigid subword tokenization algorithms, which limit their generalization ability and adaptation to new settings. In this paper, we propose a new model inductive bias that learns a subword tokenization end-to-end as part of the model. To this end, we introduce a soft gradient-based subword tokenization module (GBST) that automatically learns latent subword representations from characters in a data-driven fashion. Concretely, GBST enumerates candidate subword blocks and learns to score them in a position-wise fashion using a block scoring network. We additionally introduce Charformer, a deep Transformer model that integrates GBST and operates on the byte level. Via extensive experiments on English GLUE, multilingual, and noisy text datasets, we show that Charformer outperforms a series of competitive byte-level baselines while generally performing on par and sometimes outperforming subword-based models. Additionally, Charformer is fast, improving the speed of both vanilla byte-level and subword-level Transformers by 28%-100% while maintaining competitive quality. We believe this work paves the way for highly performant token-free models that are trained completely end-to-end.
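"枚举候选子词块并按位置软加权"的机制可以这样示意:每个位置对若干块尺寸打分,输出为各尺寸块表示(这里简化为窗口均值)的 softmax 加权和。这是对 GBST 思想的极度简化,省略了真实模块中的下采样与打分网络:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gbst_mix(char_embed, block_scores, sizes=(1, 2, 4)):
    """GBST 式的位置级软混合:每个位置的输出是其所属各尺寸块表示的
    softmax 加权和(块表示取窗口均值,极度简化)。"""
    seq, _ = char_embed.shape
    reps = np.stack([
        np.stack([char_embed[(i // b) * b:(i // b) * b + b].mean(axis=0)
                  for i in range(seq)])
        for b in sizes
    ], axis=1)                               # (seq, n_sizes, dim)
    w = softmax(block_scores)[..., None]     # (seq, n_sizes, 1)
    return (w * reps).sum(axis=1)            # (seq, dim)

rng = np.random.default_rng(0)
chars = rng.normal(size=(8, 4))              # 8 个字符、4 维嵌入
scores = rng.normal(size=(8, 3))             # 每个位置对 3 种块尺寸的打分
print(gbst_mix(chars, scores).shape)         # (8, 4)
```

由于整个混合是 softmax 加权和,块打分可以随任务损失一起端到端训练——这正是"可微切分"的含义。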

【36】 Human Activity Recognition using Continuous Wavelet Transform and Convolutional Neural Networks 标题:基于连续小波变换和卷积神经网络的人体活动识别

作者:Anna Nedorubova,Alena Kadyrova,Aleksey Khlyupin 机构:Center for Engineering and Technology of MIPT, Moscow Institute of Physics and Technology, Institutskiy Pereulok, Dolgoprudny, Moscow, Russia, Key words: Human activity recognition, convolutional neural network, residual neural networks 链接:https://arxiv.org/abs/2106.12666 摘要:世界上有相当多的人因健康原因不得不长期处于监护之下,包括糖尿病患者或其他慢性病患者、老年人和残疾人。这些人群可能面临较高的风险,可能发生危及生命的跌倒或晕厥。由于资源有限,相当一部分处于风险中的人无法得到必要的监测,因而面临过度的危险。目前,这一问题通常通过应用人类活动识别(Human Activity Recognition,HAR)方法来解决。HAR是一个有前景且快速发展的数据科学领域,有着广泛的应用领域,如医疗保健、体育、安全等。然而,现有识别技术在准确性方面明显不足,因此本文提出了一种高精度的人类活动分类方法。我们提出了一个新的工作流程来解决HAR问题,并在由加速度计信号组成的UniMiB SHAR数据集上对其进行了评估。该模型基于连续小波变换(CWT)和卷积神经网络(CNN)。小波变换在时域和频域中定位信号特征,然后由CNN提取这些特征并识别活动。另外值得注意的是,CWT将一维加速度计信号转换为二维图像,由于二维网络具有显著更高的预测能力,因此能够获得更好的结果。在工作过程中,我们构建了卷积神经网络,并改变了空间轴数、层数、每层神经元数、图像大小、母小波类型、母小波零矩阶数等模型参数;我们还应用了带有残差块的模型,这带来了显著更高的指标值。最后,我们成功达到了99.26%的准确率,对这一问题而言是一个可观的表现。 摘要:Quite a few people in the world have to stay under permanent surveillance for health reasons; they include diabetic people or people with some other chronic conditions, the elderly and the disabled. These groups may face heightened risk of having life-threatening falls or of being struck by a syncope. Due to limited availability of resources a substantial part of people at risk can not receive necessary monitoring and thus are exposed to excessive danger. Nowadays, this problem is usually solved via applying Human Activity Recognition (HAR) methods. HAR is a perspective and fast-paced Data Science field, which has a wide range of application areas such as healthcare, sport, security etc. However, the current recognition techniques are markedly lacking in accuracy, hence, the present paper suggests a highly accurate method for human activity classification. We propose a new workflow to address the HAR problem and evaluate it on the UniMiB SHAR dataset, which consists of the accelerometer signals.
The model we suggest is based on continuous wavelet transform (CWT) and convolutional neural networks (CNNs). Wavelet transform localizes signal features both in time and frequency domains and after that a CNN extracts these features and recognizes activity. It is also worth noting that CWT converts the 1D accelerometer signal into 2D images and thus enables better results, as 2D networks have a significantly higher predictive capacity. In the course of the work we build a convolutional neural network and vary such model parameters as the number of spatial axes, number of layers, number of neurons in each layer, image size, type of mother wavelet, the order of zero moment of the mother wavelet etc. Besides, we also apply models with residual blocks, which resulted in significantly higher metric values. Finally, we succeeded in reaching 99.26% accuracy, which is a worthy performance for this problem.
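摘要中"CWT把一维加速度计信号转成二维图像"这一步可以用如下纯Python示意来理解:对每个小波宽度,用Ricker母小波与信号做相关,得到的二维尺度图(scalogram)即可作为CNN的输入图像。这只是机制示意(假设使用Ricker小波,信号为虚构数据),并非论文的实际流程。

```python
import math

def ricker(points, width):
    # Ricker("墨西哥帽")母小波在 points 个采样点上的取值。
    out = []
    for i in range(points):
        t = i - (points - 1) / 2.0
        norm = 2.0 / (math.sqrt(3.0 * width) * math.pi ** 0.25)
        out.append(norm * (1 - (t / width) ** 2)
                   * math.exp(-(t ** 2) / (2 * width ** 2)))
    return out

def cwt(signal, widths):
    """返回二维尺度图:每个小波宽度对应一行小波相关系数。
    这张二维"图像"就是后续CNN的输入。"""
    rows = []
    for w in widths:
        wav = ricker(min(10 * int(w), len(signal)), w)
        half = len(wav) // 2
        row = []
        for i in range(len(signal)):
            acc = 0.0
            for j, wv in enumerate(wav):
                k = i + j - half
                if 0 <= k < len(signal):
                    acc += signal[k] * wv
            row.append(acc)
        rows.append(row)
    return rows

# 模拟加速度计信号:平坦基线上的一段振荡(如一次动作)。
sig = [math.sin(2 * math.pi * t / 8.0) if 20 <= t < 60 else 0.0
       for t in range(100)]
scalogram = cwt(sig, widths=[1, 2, 4, 8])
```

尺度图在振荡区间内响应明显、静止区间内接近零,这正是CNN据以区分不同活动的时频特征。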

【37】 Reimagining GNN Explanations with ideas from Tabular Data 标题:借鉴表格数据方法重新构想GNN解释

作者:Anjali Singh,Shamanth R Nayak K,Balaji Ganesan 机构:Manipal Institute of Technology 备注:4 pages, 8 figures, XAI Workshop at ICML 2021 链接:https://arxiv.org/abs/2106.12665 摘要:与基于表格数据训练的神经模型和决策树模型所能获得的解释相比,图神经网络的可解释性技术还有很长的路要走。我们利用一个横跨图和表格数据的任务,即实体匹配(Entity Matching),评述了GNN模型解释中缺失的可解释性关键方面。 摘要:Explainability techniques for Graph Neural Networks still have a long way to go compared to explanations available for both neural and decision tree-based models trained on tabular data. Using a task that straddles both graphs and tabular data, namely Entity Matching, we comment on key aspects of explainability that are missing in GNN model explanations.

【38】 Transformer-based unsupervised patient representation learning based on medical claims for risk stratification and analysis 标题:基于医疗理赔数据的Transformer无监督患者表征学习,用于风险分层与分析

作者:Xianlong Zeng,Simon Lin,Chang Liu 机构:Electrical Engineering and, Computer Science, Ohio University, Athens Ohio USA, Research Information Solutions, and Innovation, Nationwide Children's Hospital, Columbus Ohio USA 链接:https://arxiv.org/abs/2106.12658 摘要:索赔数据包含医疗代码、服务信息和发生的支出,是估计个人健康状况和医疗风险水平的良好资源。在这项研究中,我们开发了基于Transformer的多模态自动编码器(TMAE),这是一个无监督的学习框架,可以通过编码索赔数据中有意义的信息来学习有效的患者表示。TMAE的动机是医疗保健的实际需要,将患者分为不同的风险水平,以改善医疗服务的提供和管理。与以前的方法相比,TMAE能够1)对住院患者、门诊患者和药物申请进行集体建模,2)处理医疗事件之间不规则的时间间隔,3)缓解罕见医疗代码的稀疏性问题,4)合并医疗支出信息。我们使用一个包含600000多名患者的真实儿科索赔数据集对TMAE进行了训练,并在两个聚类任务中比较了其与各种方法的性能。实验结果表明,TMAE具有优于所有基线的性能。多个下游应用程序也说明了我们的框架的有效性。有希望的结果证实,TMAE框架可扩展到大型索赔数据,并且能够生成有效的患者嵌入,用于风险分层和分析。 摘要:The claims data, containing medical codes, services information, and incurred expenditure, can be a good resource for estimating an individual's health condition and medical risk level. In this study, we developed Transformer-based Multimodal AutoEncoder (TMAE), an unsupervised learning framework that can learn efficient patient representation by encoding meaningful information from the claims data. TMAE is motivated by the practical needs in healthcare to stratify patients into different risk levels for improving care delivery and management. Compared to previous approaches, TMAE is able to 1) model inpatient, outpatient, and medication claims collectively, 2) handle irregular time intervals between medical events, 3) alleviate the sparsity issue of the rare medical codes, and 4) incorporate medical expenditure information. We trained TMAE using a real-world pediatric claims dataset containing more than 600,000 patients and compared its performance with various approaches in two clustering tasks. Experimental results demonstrate that TMAE has superior performance compared to all baselines. Multiple downstream applications are also conducted to illustrate the effectiveness of our framework. 
The promising results confirm that the TMAE framework is scalable to large claims data and is able to generate efficient patient embeddings for risk stratification and analysis.
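TMAE要处理"医疗事件之间不规则的时间间隔",一个常见做法是把就诊间隔离散化为特殊token插入序列,使Transformer能够感知时间信息。下面是一个假设性的示意(分桶边界与token名称均为虚构,论文的实际方案可能不同):

```python
def gap_token(days):
    """把两次就诊之间的天数映射为离散的时间间隔token(分桶边界为假设)。"""
    for edge, name in [(1, "<same-day>"), (7, "<within-week>"),
                       (30, "<within-month>"), (365, "<within-year>")]:
        if days < edge:
            return name
    return "<over-year>"

def tokenize_visits(visits):
    """visits: 按日期排序的 (天数, [医疗代码]) 列表 -> 扁平的token序列。
    在相邻两次就诊的代码之间插入时间间隔token。"""
    tokens, prev = [], None
    for day, codes in visits:
        if prev is not None:
            tokens.append(gap_token(day - prev))
        tokens.extend(codes)
        prev = day
    return tokens

# 虚构的三次就诊:第0天、第3天、第200天(代码为示意的ICD编码)。
stream = tokenize_visits([(0, ["E11.9"]), (3, ["E11.9", "I10"]), (200, ["Z00.0"])])
```

这样得到的token流可直接送入Transformer编码器;稀有代码的稀疏性问题则需另行处理(如代码分组),此处未涉及。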

【39】 Handwritten Digit Recognition using Machine and Deep Learning Algorithms 标题:基于机器和深度学习算法的手写体数字识别

作者:Samay Pashine,Ritik Dixit,Rishika Kushwah 机构:Computer Science and Engineering, Acropolis Institute of Technology & Research, Indore, India 备注:None 链接:https://arxiv.org/abs/2106.12614 摘要:人类对机器的依赖程度从未如此之高,从照片中的物体分类到为无声电影添加声音,一切都可以借助深度学习和机器学习算法完成。同样,手写文本识别是一个重要的研究和开发领域,有许多可能的应用方向。手写识别(HWR),也称为手写文本识别(HTR),是计算机接收和解释来自纸质文档、照片、触摸屏和其他设备的可理解手写输入的能力[1]。在本文中,我们使用支持向量机(SVM)、多层感知器(MLP)和卷积神经网络(CNN)模型,利用MNIST数据集进行手写体数字识别。我们的主要目标是比较上述模型的准确率及其执行时间,以获得最佳的数字识别模型。 摘要:The reliance of humans over machines has never been so high such that from object classification in photographs to adding sound to silent movies everything can be performed with the help of deep learning and machine learning algorithms. Likewise, handwritten text recognition is one of the significant areas of research and development with a streaming number of possibilities that could be attained. Handwriting recognition (HWR), also known as Handwritten Text Recognition (HTR), is the ability of a computer to receive and interpret intelligible handwritten input from sources such as paper documents, photographs, touch-screens and other devices [1]. In this paper, we have performed handwritten digit recognition with the help of the MNIST dataset using Support Vector Machine (SVM), Multi-Layer Perceptron (MLP) and Convolutional Neural Network (CNN) models. Our main objective is to compare the accuracy of the models stated above along with their execution time to get the best possible model for digit recognition.
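该文的核心是"在同一数据划分上同时比较各模型的准确率与运行时间"。下面用纯Python在一个玩具二分类数据上演示这种比较流程(用最近质心和1-近邻代替文中的SVM/MLP/CNN,数据与模型均为示意,并非论文实验):

```python
import random
import time

random.seed(0)

def make_data(n):
    # 玩具数据:中心分别在(0,0)和(2,2)的两个高斯簇,代替真实的数字特征。
    data = []
    for _ in range(n):
        label = random.randint(0, 1)
        data.append(([random.gauss(label * 2.0, 0.5),
                      random.gauss(label * 2.0, 0.5)], label))
    return data

def centroid_predict(train, x):
    # 最近质心分类器:取各类样本均值,返回距离最近的类别。
    cents = {}
    for feats, y in train:
        c = cents.setdefault(y, [0.0, 0.0, 0])
        c[0] += feats[0]; c[1] += feats[1]; c[2] += 1
    return min(cents,
               key=lambda y: (x[0] - cents[y][0] / cents[y][2]) ** 2
                           + (x[1] - cents[y][1] / cents[y][2]) ** 2)

def knn_predict(train, x):
    # 1-近邻分类器。
    return min(train,
               key=lambda t: (t[0][0] - x[0]) ** 2 + (t[0][1] - x[1]) ** 2)[1]

def evaluate(predict, train, test):
    # 与文中做法一致:同一测试集上同时记录准确率与执行时间。
    t0 = time.perf_counter()
    acc = sum(predict(train, x) == y for x, y in test) / len(test)
    return acc, time.perf_counter() - t0

train, test = make_data(200), make_data(100)
results = {name: evaluate(fn, train, test)
           for name, fn in [("centroid", centroid_predict), ("1-nn", knn_predict)]}
```

换成真实的SVM/MLP/CNN时,只需替换predict函数并改用MNIST数据,比较框架保持不变。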

【40】 Adversarial Examples in Multi-Layer Random ReLU Networks 标题:多层随机ReLU网络中的对抗性样例

作者:Peter L. Bartlett,Sébastien Bubeck,Yeshwanth Cherapanamjeri 机构:Department of Electrical Engineering and Computer Science, Department of Statistics, UC Berkeley, Microsoft Research Redmond 链接:https://arxiv.org/abs/2106.12611 摘要:我们考虑了具有独立高斯参数的ReLU网络中的对抗性例子现象。对于深度恒定且宽度范围较大的网络(例如,如果每一层的宽度与任何其他层的宽度是多项式,就足够了),输入向量的小扰动会导致输出的大变化。这推广了Daniely和Schacham(2020)关于宽度迅速减小网络的结果,以及Bubeck等人(2021)关于双层网络的结果。证明表明,在这些网络中出现对抗性例子是因为它们计算的函数非常接近线性。网络中的瓶颈层起着关键作用:网络中某个点的最小宽度决定了计算到该点的映射的规模和灵敏度。主要结果是对于具有常数深度的网络,但是我们也证明了对于这类结果,深度上的一些约束是必要的,因为有适当的深度网络,以常数概率计算接近常数的函数。 摘要:We consider the phenomenon of adversarial examples in ReLU networks with independent gaussian parameters. For networks of constant depth and with a large range of widths (for instance, it suffices if the width of each layer is polynomial in that of any other layer), small perturbations of input vectors lead to large changes of outputs. This generalizes results of Daniely and Schacham (2020) for networks of rapidly decreasing width and of Bubeck et al (2021) for two-layer networks. The proof shows that adversarial examples arise in these networks because the functions that they compute are very close to linear. Bottleneck layers in the network play a key role: the minimal width up to some point in the network determines scales and sensitivities of mappings computed up to that point. The main result is for networks with constant depth, but we also show that some constraint on depth is necessary for a result of this kind, because there are suitably deep networks that, with constant probability, compute a function that is close to constant.
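论文的关键观察是:随机高斯参数的ReLU网络计算的函数非常接近线性,因此沿梯度方向的微小扰动引起的输出变化远大于同样大小的随机方向扰动。下面用纯Python的两层随机ReLU网络做一个数值示意(网络规模与扰动大小为任意选取,仅演示现象):

```python
import math
import random

random.seed(1)
d, m = 50, 200  # 输入维数、隐藏层宽度

# 独立高斯参数,对应论文设定(按1/sqrt(维数)缩放方差)。
W = [[random.gauss(0, 1 / math.sqrt(d)) for _ in range(d)] for _ in range(m)]
v = [random.gauss(0, 1 / math.sqrt(m)) for _ in range(m)]

def f(x):
    # 两层ReLU网络:f(x) = sum_i v_i * relu(w_i . x)
    return sum(vi * max(0.0, sum(wij * xj for wij, xj in zip(wi, x)))
               for vi, wi in zip(v, W))

def grad(x):
    # 梯度:只有被激活(w_i . x > 0)的单元贡献 v_i * w_i。
    g = [0.0] * d
    for vi, wi in zip(v, W):
        if sum(wij * xj for wij, xj in zip(wi, x)) > 0:
            for j in range(d):
                g[j] += vi * wi[j]
    return g

x = [random.gauss(0, 1) for _ in range(d)]
g = grad(x)
gnorm = math.sqrt(sum(gi * gi for gi in g))
eps = 0.5

# 沿梯度方向的扰动 vs 同样范数的随机方向扰动。
x_adv = [xj + eps * gj / gnorm for xj, gj in zip(x, g)]
r = [random.gauss(0, 1) for _ in range(d)]
rnorm = math.sqrt(sum(ri * ri for ri in r))
x_rand = [xj + eps * rj / rnorm for xj, rj in zip(x, r)]

adv_change = abs(f(x_adv) - f(x))
rand_change = abs(f(x_rand) - f(x))
```

典型地,adv_change 比 rand_change 大约大 sqrt(d) 倍,这正是"函数接近线性、梯度方向即对抗方向"的体现。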

【41】 DP-SGD vs PATE: Which Has Less Disparate Impact on Model Accuracy? 标题:DP-SGD与PATE:哪个对模型精度的差异性影响更小?

作者:Archit Uniyal,Rakshit Naidu,Sasikanth Kotti,Sahib Singh,Patrik Joslin Kenfack,Fatemehsadat Mireshghallah,Andrew Trask 机构:Panjab University, OpenMined, Manipal Institute of Technology, Carnegie Mellon University, IIT Jodhpur, Ford Motor Company, Innopolis University, University of California San Diego, University of Oxford 备注:4 pages, 3 images 链接:https://arxiv.org/abs/2106.12576 摘要:差分隐私深度学习的最新进展表明,差分隐私的应用,特别是DP-SGD算法,对总体中的不同子群体具有差异性影响:与代表性充分的群体相比,代表性不足的子群体(少数群体)的模型效用显著下降。在这项工作中,我们的目标是在公平性方面比较PATE(另一种使用差分隐私训练深度学习模型的机制)与DP-SGD。我们表明,PATE确实也有差异性影响,然而,其严重程度远低于DP-SGD。我们从这一观察中得出了一些见解,指出在实现更好的公平性-隐私权衡方面哪些方向可能是有前景的。 摘要:Recent advances in differentially private deep learning have demonstrated that application of differential privacy, specifically the DP-SGD algorithm, has a disparate impact on different sub-groups in the population, which leads to a significantly higher drop in model utility for sub-populations that are under-represented (minorities), compared to well-represented ones. In this work, we aim to compare PATE, another mechanism for training deep learning models using differential privacy, with DP-SGD in terms of fairness. We show that PATE does have a disparate impact too, however, it is much less severe than DP-SGD. We draw insights from this observation on what might be promising directions in achieving better fairness-privacy trade-offs.
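作为背景,DP-SGD的单步聚合机制可以示意如下:对每个样本的梯度按L2范数裁剪、求和、加高斯噪声后取平均。这只是机制草图(省略隐私预算核算;示例中把噪声系数设为0,以便给出确定性的数值检查):

```python
import math
import random

random.seed(0)

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_mult=1.0):
    """一步DP-SGD聚合:逐样本裁剪到clip_norm(L2),求和,
    加上标准差为 noise_mult*clip_norm 的高斯噪声,再取平均。"""
    d = len(per_example_grads[0])
    total = [0.0] * d
    for g in per_example_grads:
        norm = math.sqrt(sum(gi * gi for gi in g))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for j in range(d):
            total[j] += g[j] * scale
    sigma = noise_mult * clip_norm
    n = len(per_example_grads)
    return [(total[j] + random.gauss(0, sigma)) / n for j in range(d)]

# 三个样本的二维梯度;noise_mult=0 使结果可精确验证。
grads = [[3.0, 4.0], [0.1, 0.2], [-6.0, 8.0]]
noisy_avg = dp_sgd_step(grads, clip_norm=1.0, noise_mult=0.0)
```

裁剪限制了任一样本对更新的影响,噪声提供隐私保证;对少数群体的差异性影响正是源于裁剪和噪声对其(通常更大的)梯度信号的压制。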

【42】 Coherent, super resolved radar beamforming using self-supervised learning 标题:基于自监督学习的相干超分辨雷达波束形成

作者:Itai Orr,Moshik Cohen,Harel Damari,Meir Halachmi,Zeev Zalevsky 机构:Bar Ilan University, Ramat-Gan, Israel, Wisense Technologies Ltd., Tel Aviv, Israel 备注:28 pages 10 figures 链接:https://arxiv.org/abs/2106.13085 摘要:为满足自动驾驶汽车的高标准需求和相关法规,需要高分辨率的汽车雷达传感器。然而,目前的雷达系统角分辨率有限,形成了技术空白。业界和学术界通过增加物理通道数来提高角分辨率的趋势,同时也增加了系统复杂性,需要敏感的校准过程,降低了对硬件故障的鲁棒性,并导致更高的成本。我们提出了一种替代方法,称为自监督雷达信号重建(R2-S2),它在不增加物理通道数目的情况下显著提高了给定雷达阵列的角分辨率。R2-S2是一族算法,它们以复数距离-多普勒雷达数据作为深度神经网络(DNN)的输入,并使用在多个数据表示空间中运算的损失函数进行自监督训练。在晴天和雨天条件下,利用在城市和公路环境中采集的真实数据集,验证了角分辨率提高了4倍。 摘要:High resolution automotive radar sensors are required in order to meet the high bar of autonomous vehicles needs and regulations. However, current radar systems are limited in their angular resolution causing a technological gap. An industry and academic trend to improve angular resolution by increasing the number of physical channels, also increases system complexity, requires sensitive calibration processes, lowers robustness to hardware malfunctions and drives higher costs. We offer an alternative approach, named Radar signal Reconstruction using Self Supervision (R2-S2), which significantly improves the angular resolution of a given radar array without increasing the number of physical channels. R2-S2 is a family of algorithms which use a Deep Neural Network (DNN) with complex range-Doppler radar data as input and trained in a self-supervised method using a loss function which operates in multiple data representation spaces. Improvement of 4x in angular resolution was demonstrated using a real-world dataset collected in urban and highway environments during clear and rainy weather conditions.

【43】 A Declarative Goal-oriented Framework for Smart Environments with LPaaS 标题:一种基于LPaaS的面向目标的声明式智能环境框架

作者:Giuseppe Bisicchia,Stefano Forti,Antonio Brogi 机构:Department of Computer Science University of Pisa, Italy 链接:https://arxiv.org/abs/2106.13083 摘要:物联网智能环境旨在通过自动调整环境参数(如温度、室内光线)和通过自我管理的网络物理系统实现节能,从而改善我们的日常生活。然而,商业解决方案只允许在这些参数上设置简单的目标,不考虑在不同的用户和/或系统管理员之间调解冲突的目标,并且在不同的物联网垂直领域具有有限的兼容性。在本文中,我们提出了一个声明性框架来表示智能环境、用户设置目标和可定制的中介策略,以协调包含多个物联网系统的不同目标。该框架的一个开源Prolog原型展示在两个栩栩如生的激励示例上。 摘要:Smart environments powered by the Internet of Things aim at improving our daily lives by automatically tuning ambient parameters (e.g. temperature, interior light) and by achieving energy savings through self-managing cyber-physical systems. Commercial solutions, however, only permit setting simple target goals on those parameters and do not consider mediating conflicting goals among different users and/or system administrators, and feature limited compatibility across different IoT verticals. In this article, we propose a declarative framework to represent smart environments, user-set goals and customisable mediation policies to reconcile contrasting goals encompassing multiple IoT systems. An open-source Prolog prototype of the framework is showcased over two lifelike motivating examples.
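文中"在不同用户和/或系统管理员之间调解相互冲突的目标"的思路,可以用下面的Python小例子示意(论文的原型实际上用Prolog以声明式规则表达这些调解策略;这里的策略名称与加权方式均为假设):

```python
def mediate(goals, policy="weighted-average"):
    """对同一环境参数上相互冲突的数值目标做调解。
    goals: 由用户/管理员设定的 (目标值, 权重) 列表。"""
    if policy == "weighted-average":
        # 按权重加权平均:权重可表示优先级(如管理员权重更高)。
        total_w = sum(w for _, w in goals)
        return sum(v * w for v, w in goals) / total_w
    if policy == "most-constrained":
        # 只采纳权重(优先级)最高的目标。
        return max(goals, key=lambda g: g[1])[0]
    raise ValueError(policy)

# 两位用户与一位管理员对目标温度(摄氏度)意见不一。
temp = mediate([(19.0, 1.0), (23.0, 1.0), (21.0, 2.0)])
```

可定制的调解策略就体现在policy的选择上:同一组目标在不同策略下可得到不同的环境设定值。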

【44】 Fast, high-fidelity Lyman α forests with convolutional neural networks 标题:基于卷积神经网络的快速高保真莱曼α森林

作者:Peter Harrington,Mustafa Mustafa,Max Dornfest,Benjamin Horowitz,Zarija Lukić 机构:Lawrence Berkeley National Laboratory, Cyclotron Road, Berkeley, CA , USA, Department of Astronomy, Princeton University, Princeton, NJ, USA 备注:10 pages, 6 figures 链接:https://arxiv.org/abs/2106.12662 摘要:全物理宇宙学模拟是研究宇宙结构形成和演化的有力工具,但需要极大的计算资源。在这里,我们训练一个卷积神经网络来使用一个更便宜的N体模拟来重建与Lyman-$\alpha$(Ly$\alpha$)森林相关尺度上的重子流体动力学变量(密度、温度和速度),使用Nyx模拟的数据。我们表明,我们的方法能够以$\sim$20kpc的分辨率快速估计这些场,并且比现有的近似方法更准确地捕获Ly$\alpha$森林的统计信息。因为我们的模型是完全卷积的,所以我们可以在更小的模拟箱上训练,并在更大的模拟箱上部署,从而节省大量的计算量。此外,由于我们的方法产生了一个近似的水动力场而不是直接的Ly$\alpha$通量,它不局限于电离背景或平均传输通量的特定选择。 摘要:Full-physics cosmological simulations are powerful tools for studying the formation and evolution of structure in the universe but require extreme computational resources. Here, we train a convolutional neural network to use a cheaper N-body-only simulation to reconstruct the baryon hydrodynamic variables (density, temperature, and velocity) on scales relevant to the Lyman-$\alpha$ (Ly$\alpha$) forest, using data from Nyx simulations. We show that our method enables rapid estimation of these fields at a resolution of $\sim$20kpc, and captures the statistics of the Ly$\alpha$ forest with much greater accuracy than existing approximations. Because our model is fully-convolutional, we can train on smaller simulation boxes and deploy on much larger ones, enabling substantial computational savings. Furthermore, as our method produces an approximation for the hydrodynamic fields instead of Ly$\alpha$ flux directly, it is not limited to a particular choice of ionizing background or mean transmitted flux.

【45】 Multiclass Disease Predictions Based on Integrated Clinical and Genomics Datasets 标题:基于综合临床和基因组数据集的多类疾病预测

作者:Moeez M. Subhani,Ashiq Anjum 机构:College of Engineering and Technology, University of Derby, Derby, England 备注:None 链接:https://arxiv.org/abs/2006.07879 摘要:在生物信息学中,通过计算方法使用临床数据进行临床预测很常见。然而,利用基因组数据集的信息进行临床预测在研究中并不多见。精准医学研究需要来自所有可用数据集的信息来提供智能的临床解决方案。在本文中,我们尝试建立一个同时使用临床和基因组数据集信息的预测模型。我们使用机器学习方法演示了基于临床和基因组组合数据集的多类疾病预测。我们使用临床(ClinVar)数据集和基因组(基因表达)数据集创建了一个集成数据集,并使用基于实例的学习器对其进行训练以预测临床疾病。我们使用了一种新颖而简单的多类分类方法,其输出类的数量高达75个。我们使用主成分分析进行特征选择。该分类器在集成数据集上预测疾病的准确率为73%。与其他分类模型相比,结果一致且具有竞争力。结果表明,基因组信息可以可靠地纳入用于临床预测的数据集,并有望在临床诊断和精准医学中发挥重要价值。 摘要:Clinical predictions using clinical data by computational methods are common in bioinformatics. However, clinical predictions using information from genomics datasets as well is not a frequently observed phenomenon in research. Precision medicine research requires information from all available datasets to provide intelligent clinical solutions. In this paper, we have attempted to create a prediction model which uses information from both clinical and genomics datasets. We have demonstrated multiclass disease predictions based on combined clinical and genomics datasets using machine learning methods. We have created an integrated dataset, using a clinical (ClinVar) and a genomics (gene expression) dataset, and trained it using an instance-based learner to predict clinical diseases. We have used an innovative but simple way for multiclass classification, where the number of output classes is as high as 75. We have used Principal Component Analysis for feature selection. The classifier predicted diseases with 73% accuracy on the integrated dataset. The results were consistent and competitive when compared with other classification models. The results show that genomics information can be reliably included in datasets for clinical predictions and it can prove to be valuable in clinical diagnostics and precision medicine.
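文中的流水线(主成分分析降维 + 基于实例的学习器)可以用如下纯Python示意:幂迭代求协方差的首主成分,把样本投影到该方向后用1-近邻分类。数据为玩具数据,仅演示流程,并非论文的实际数据或实现:

```python
import math
import random

random.seed(2)

def top_component(X, iters=200):
    # 幂迭代求首主成分:反复计算 v <- C^T (C v) 并归一化。
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    C = [[row[j] - mean[j] for j in range(d)] for row in X]  # 中心化数据
    v = [random.gauss(0, 1) for _ in range(d)]
    for _ in range(iters):
        w = [sum(cj * vj for cj, vj in zip(row, v)) for row in C]
        v = [sum(C[i][j] * w[i] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(vj * vj for vj in v))
        v = [vj / norm for vj in v]
    return mean, v

def project(row, mean, v):
    # 把样本投影到首主成分方向,得到一维特征。
    return sum((rj - mj) * vj for rj, mj, vj in zip(row, mean, v))

def one_nn(train, z):
    # 基于实例的学习器:1-近邻。
    return min(train, key=lambda t: (t[0] - z) ** 2)[1]

# 玩具数据:类别信息沿第一个维度分布,其余4维是噪声。
data = [([random.gauss(y * 3.0, 0.5)] + [random.gauss(0, 0.5) for _ in range(4)], y)
        for y in (0, 1) for _ in range(30)]
X = [row for row, _ in data]
mean, pc = top_component(X)
train = [(project(row, mean, pc), y) for row, y in data]
pred = one_nn(train, project([3.0, 0.0, 0.0, 0.0, 0.0], mean, pc))
```

真实设置中会保留多个主成分,并在75个疾病类别上做多类1-近邻,但"先投影、再按实例距离分类"的结构相同。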

本文参与 腾讯云自媒体分享计划,分享自微信公众号。
原始发表:2021-06-25,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 arXiv每日学术速递 微信公众号。
