
Finance / Speech / Audio Processing Academic Digest [6.30]

Author: WeChat official account arXiv每日学术速递 (arXiv Daily Academic Digest)
Published: 2021-07-02 17:26:34

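Visit www.arxivdaily.com for the digest with abstracts, covering CS, physics, mathematics, economics, statistics, finance, biology, and electrical engineering, with search, bookmarking, and posting features.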

q-fin (Finance): 10 papers

cs.SD (Speech): 7 papers

eess.AS (Audio Processing): 8 papers

1. q-fin (Finance):

【1】 The Ecological System of Innovation: A New Architectural Framework for a Functional Evidence-Based Platform for Science and Innovation Policy

Authors: Robert M Yawson
Affiliation: Humphrey Institute, University of Minnesota
Link: https://arxiv.org/abs/2106.15479
Abstract: Models of innovation, for the most part, do not include a comprehensive and end-to-end view. Most innovation policy attention seems to be focused on the capacity to innovate and on input factors such as R&D investment, scientific institutions, human resources and capital. Such inputs frequently serve as proxies for innovativeness and are correlated with intermediate outputs such as patent counts and outcomes such as GDP per capita. While this kind of analysis is generally indicative of innovative behaviour, it is less useful in terms of discriminating causality and what drives successful strategy or public policy interventions. This situation has led to the development of new frameworks for the innovation system led by National Science and Technology Policy Centres across the globe. These new models of innovation are variously referred to as the National Innovation Ecosystem. There is, however, a fundamental question that needs to be answered: what elements should an innovation policy include, and how should such policies be implemented? This paper attempts to answer this question.

【2】 ESGM: ESG scores and the Missing pillar

Authors: Özge Sahin, Karoline Bax, Sandra Paterlini, Claudia Czado
Affiliations: Department of Mathematics, Technical University of Munich, Boltzmannstraße, Garching, Germany; Department of Economics and Management, University of Trento, Via Inama, Trento, Italy
Link: https://arxiv.org/abs/2106.15466
Abstract: Environmental, social, and governance (ESG) scores measure companies' activities concerning sustainability and societal impact and are organized on three pillars: Environmental (E-), Social (S-), and Governance (G-). Different approaches have been proposed to compute ESG scores for companies, which typically rely on the aggregation of many different sources of information. These complementary non-financial ESG scores should provide information about the ESG performance and risks of different companies. However, the extent of missing information makes the reliability of ESG scores questionable. To account for the missing information in the underlying ESG pillars, we introduce a new pillar, the so-called Missing (M-) pillar, and propose an optimization approach to compute new ESG (ESGM) scores, which should also be related to the company riskiness. As a result, the ESGM scores allow for incorporating the extent of missing information and establishing some meaningful relationship with respect to the riskiness of the companies under consideration. Interesting insights into the current limitations of ESG scoring methodology are discussed.

【3】 The Variance Gamma++ Process and Applications to Energy Markets

Authors: Gardini, M., Sabino, P., Sasso, E.
Comments: 39 pages, 5 figures, 7 tables
Link: https://arxiv.org/abs/2106.15452
Abstract: The purpose of this article is to introduce a new Lévy process, termed the Variance Gamma++ process, to model the dynamics of assets in illiquid markets. Such a process has the mathematical tractability of the Variance Gamma process and is obtained by applying the self-decomposability of the gamma law. Compared to the Variance Gamma model, it has an additional parameter representing the measure of the trading activity. We give a full characterization of the Variance Gamma++ process in terms of its characteristic triplet, characteristic function and transition density. In addition, we provide efficient path simulation algorithms, both forward and backward in time. We also obtain an efficient "integral-free" explicit pricing formula for European options. These results are instrumental in applying Fourier-based option pricing and maximum likelihood techniques for parameter estimation. We then apply our model to illiquid markets, namely to the calibration of European power future market data. We accordingly evaluate exotic derivatives using the Monte Carlo method, compare these values to those obtained using the Variance Gamma process, and give an economic interpretation of the obtained results. Finally, we illustrate an extension to the multivariate framework.
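As background for the abstract above, the following is a minimal sketch of forward path simulation for the classical Variance Gamma process (a Brownian motion with drift evaluated at a gamma time change), not the Variance Gamma++ process the paper introduces. The function name `simulate_vg_path` and the parameter values are illustrative assumptions.

```python
import numpy as np

def simulate_vg_path(theta, sigma, nu, horizon, n_steps, rng):
    """Simulate one classical Variance Gamma path on a uniform time grid.

    X_t = theta * G_t + sigma * W(G_t), where G is a gamma subordinator
    with mean rate 1 and variance rate nu (an assumed parametrization).
    """
    dt = horizon / n_steps
    # Gamma time-change increments: mean dt, variance nu * dt.
    dg = rng.gamma(shape=dt / nu, scale=nu, size=n_steps)
    # Conditional on the time change, each increment is Gaussian.
    dx = theta * dg + sigma * np.sqrt(dg) * rng.standard_normal(n_steps)
    # Prepend the starting value X_0 = 0.
    return np.concatenate(([0.0], np.cumsum(dx)))

rng = np.random.default_rng(0)
path = simulate_vg_path(theta=-0.1, sigma=0.2, nu=0.5,
                        horizon=1.0, n_steps=250, rng=rng)
```

The extra trading-activity parameter of the Variance Gamma++ model, and the backward-in-time simulation the abstract mentions, are not covered by this sketch.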

【4】 Effect of Labour Income on the Optimal Bankruptcy Problem

Authors: Guodong Ding, Daniele Marazzina
Link: https://arxiv.org/abs/2106.15426
Abstract: In this paper we deal with the optimal bankruptcy problem for an agent who can optimally allocate her consumption rate, the amount of capital invested in the risky asset, and her leisure time. In our framework, the agent is endowed with an initial debt, and she is required to repay her debt continuously. On declaring bankruptcy, the debt repayment is exempted at the cost of a wealth shrinkage. We implement the duality method to solve the problem analytically and conduct a sensitivity analysis with respect to the cost and benefit parameters of bankruptcy. Introducing the flexible leisure/working rate, and therefore the labour income, into the bankruptcy model, we investigate its effect on the optimal strategies.

【5】 What Explains Gender Gap in Unpaid Household and Care Work in India?

Authors: Athary Janiso, Prakash Kumar Shukla, Bheemeshwar Reddy A
Affiliation: Department of Economics and Finance, Birla Institute of Technology and Science-Pilani
Link: https://arxiv.org/abs/2106.15376
Abstract: Women bear a disproportionate burden of unpaid household and care work across the world and especially in South Asia. However, due to the unavailability of nationally representative data on time use, a systematic analysis of the gender gap in unpaid household and care work has not been made in the context of India. The present paper, using the recent Time Use Survey (2019) data, addresses two concerns. First, it examines the socioeconomic and demographic factors associated with variation in the time spent on unpaid household and care work among men and women. Second, it analyses how much of the gender gap in the time allocated to unpaid work can be explained by the difference in socioeconomic and demographic factors of men and women by employing the Oaxaca-Blinder technique. The findings show that women spend much more time than men on unpaid household and care work. We find that marital status, employment, and having additional females in the household are the most important predictors of time devoted to unpaid household work among women. The decomposition results reveal that differences in socioeconomic and demographic factors between men and women do not explain most of the gender gap in unpaid household work.
Our results suggest that it is not the rational economic choices that determine the division of household chores between women and men; instead, unobserved gender norms and practices govern the allocation of unpaid work within Indian households.
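The Oaxaca-Blinder technique named in the abstract above splits a gap in group means into a part explained by differences in characteristics and an unexplained part. The sketch below shows one common two-fold variant on synthetic data, assuming ordinary least squares and the second group's coefficients as the reference; the function name, variable names, and data are hypothetical and are not taken from the paper.

```python
import numpy as np

def oaxaca_blinder_twofold(X_a, y_a, X_b, y_b):
    """Two-fold decomposition of mean(y_a) - mean(y_b).

    Uses group b's OLS coefficients as the reference (one of several
    possible conventions).
    """
    # An intercept column makes fitted means equal sample means.
    Xa = np.column_stack([np.ones(len(y_a)), X_a])
    Xb = np.column_stack([np.ones(len(y_b)), X_b])
    beta_a, *_ = np.linalg.lstsq(Xa, y_a, rcond=None)
    beta_b, *_ = np.linalg.lstsq(Xb, y_b, rcond=None)
    mean_a, mean_b = Xa.mean(axis=0), Xb.mean(axis=0)
    explained = (mean_a - mean_b) @ beta_b      # differences in characteristics
    unexplained = mean_a @ (beta_a - beta_b)    # differences in coefficients
    return explained, unexplained

# Synthetic illustration: hours of unpaid work for two groups.
rng = np.random.default_rng(1)
X_women = rng.normal(0.0, 1.0, size=(500, 2))
X_men = rng.normal(0.3, 1.0, size=(500, 2))
y_women = 5.0 + X_women @ np.array([0.5, -0.2]) + rng.normal(0, 0.5, 500)
y_men = 1.0 + X_men @ np.array([0.4, -0.1]) + rng.normal(0, 0.5, 500)

explained, unexplained = oaxaca_blinder_twofold(X_women, y_women, X_men, y_men)
```

By construction the two parts sum to the raw gap in means. In this synthetic setup most of the gap sits in the unexplained part, which is the pattern the abstract reports for unpaid household work.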

【6】 Exploring the trilemma of cost-efficient, equitable and publicly acceptable onshore wind expansion planning

Authors: Jann Michael Weinand, Russell McKenna, Heidi Heinrichs, Michael Roth, Detlef Stolten, Wolf Fichtner
Affiliations: Chair of Energy Economics, Karlsruhe Institute for Technology, Germany; Chair of Energy Transition, School of Engineering, University of Aberdeen, King's College, United Kingdom
Link: https://arxiv.org/abs/2106.15198
Abstract: Onshore wind development has historically focused on cost-efficiency, which may lead to inequitable turbine distributions and public resistance due to landscape impacts. Using a multi-criteria planning approach, we show how onshore wind capacity targets can be achieved by 2050 in a cost-efficient, equitable and publicly acceptable way. For the case study of Germany, we build on the existing turbine stock and use open data on technically feasible turbine locations and scenicness of landscapes to plan the optimal expansion. The analysis shows that while the trade-off between cost-efficiency and public acceptance is rather weak with about 15% higher costs or scenicness, an equitable distribution has a large impact on these criteria. Although the onshore wind capacity per inhabitant could be distributed about 220% more equitably through the expansion, equity would severely limit planning flexibility by 2050. Our analysis assists stakeholders in resolving the onshore wind expansion trilemma.

【7】 Empirical Framework for Cournot Oligopoly with Private Information

Authors: Gaurab Aryal, Federico Zincenko
Link: https://arxiv.org/abs/2106.15035
Abstract: We propose an empirical framework for Cournot oligopoly with private information about costs. First, considering a linear demand with a random intercept, we characterize the Bayesian Cournot-Nash equilibrium and determine its testable implications. Then we establish nonparametric identification of the joint distribution of demand and technology shock and firm-specific cost distributions. Finally, we propose a likelihood-based estimation method and apply it to the global crude oil market. Using counterfactuals, we also quantify the effect of firms sharing information about their costs on consumer welfare. We also extend the model to include either firm-specific conduct parameters, nonlinear demand, or selective entry.
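To make the equilibrium concept in the abstract above concrete, here is a sketch of the textbook linear Bayesian Cournot game, under simplifying assumptions that are not taken from the paper: inverse demand P = a - b*Q with a known intercept a, risk-neutral firms, constant marginal costs drawn independently with a common mean, and an interior solution. Each firm's first-order condition gives q_i = (a - c_i - b*(N-1)*E[q]) / (2b), and taking expectations yields E[q] = (a - E[c]) / (b*(N+1)).

```python
def bayesian_cournot_quantity(cost, intercept, slope, n_firms, mean_cost):
    """Equilibrium output of a firm with private marginal cost `cost`.

    Assumes linear demand, i.i.d. costs with mean `mean_cost`, and an
    interior equilibrium (no check that the quantity is non-negative).
    """
    own_term = (intercept - cost) / (2.0 * slope)
    rivals_term = (n_firms - 1) * (intercept - mean_cost) / (
        2.0 * slope * (n_firms + 1)
    )
    return own_term - rivals_term

def best_response(cost, intercept, slope, n_firms, expected_rival_quantity):
    """Profit-maximizing output given the expected output of each rival."""
    return (intercept - cost - slope * (n_firms - 1) * expected_rival_quantity) / (
        2.0 * slope
    )

# A low-cost firm (cost 10) facing rivals whose costs average 20.
q_low = bayesian_cournot_quantity(10.0, 100.0, 1.0, 3, 20.0)
# Expected output of a rival equals the output of a mean-cost firm.
q_mean = bayesian_cournot_quantity(20.0, 100.0, 1.0, 3, 20.0)
```

A firm whose cost equals the mean produces the familiar complete-information symmetric Cournot quantity (a - c) / (b*(N+1)), and `q_low` coincides with the best response to `q_mean`, which is the fixed-point property that defines the equilibrium.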

【8】 An expert survey to assess the current status and future challenges of energy system analysis

Authors: Fabian Scheller, Frauke Wiese, Jann Michael Weinand, Dominik Franjo Dominković, Russell McKenna
Affiliations: Department of Technology, Management and Economics, Technical University of Denmark, Akademivej, Universität Flensburg, Flensburg, Production (IIP), Karlsruhe Institute of Technology, Karlsruhe, Germany
Link: https://arxiv.org/abs/2106.15518
Abstract: Decision support systems like computer-aided energy system analysis (ESA) are considered one of the main pillars for developing sustainable and reliable energy transformation strategies. Although today's diverse tools can already support decision-makers in a variety of research questions, further developments are still necessary. Intending to identify opportunities and challenges in the field, we classify modelling capabilities (32), methodologies (15), implementation issues (15) and management issues (7) from an extensive literature review. Based on a quantitative expert survey of energy system modellers (N=61) mainly working with simulation and optimisation models, the status of development and the complexity of realisation of those modelling topics are assessed. While the rated items are considered to be more complex than actually represented, no significant outliers are determinable, showing that there is no consensus about particular aspects of ESA that are lacking development.
Nevertheless, a classification of the items in terms of a specially defined modelling strategy matrix identifies capabilities like land-use planning patterns, equity and distributional effects and endogenous technological learning as "low hanging fruits" for enhancement, as well as a large number of complex topics that are already well implemented. The remaining "tough nuts" regarding modelling capabilities include non-energy sector and social behaviour interaction effects. In general, the optimisation and simulation models differ in their respective strengths, justifying the existence of both. While methods were generally rated as quite well developed, combinatorial optimisation approaches, as well as machine learning, are identified as important research methods to be developed further for ESA.

【9】 Numerical approximation of singular Forward-Backward SDEs

Authors: Jean-François Chassagneux, Mohan Yang
Link: https://arxiv.org/abs/2106.15496
Abstract: In this work, we study the numerical approximation of a class of singular fully coupled forward-backward stochastic differential equations (FBSDEs). These equations have a degenerate forward component and a non-smooth terminal condition. They are used, for example, in the modeling of carbon markets [9] and are linked to scalar conservation laws perturbed by a diffusion. Classical FBSDE methods fail to capture the correct entropy solution to the associated quasi-linear PDE. We introduce a splitting approach that circumvents this difficulty by treating differently the numerical approximation of the diffusion part and the non-linear transport part. Under the structural condition guaranteeing the well-posedness of the singular FBSDEs [8], we show that the splitting method is convergent with rate 1/2. We implement the splitting scheme by combining non-linear regression based on deep neural networks with conservative finite difference schemes. The numerical tests show very good results in a possibly high-dimensional framework.

【10】 Scale analysis for on-demand ridepooling systems and comparison with public transport

Authors: Andres Fielbaum, Alejandro Tirachini, Javier Alonso-Mora
Affiliations: Universidad de Chile; Instituto Sistemas Complejos de Ingeniería
Comments: 30 pages, 12 figures, submitted to ITEA 2021
Link: https://arxiv.org/abs/2106.15270
Abstract: On-demand ridepooling (ODRP) can become a powerful alternative to reduce congestion and emissions, if it attracts private car users rather than public transport users. Therefore, it is crucial to identify the strategic phenomena that determine when ODRP systems can run efficiently, and understand when they could be integrated into a public transport network. In this paper, we analyze the performance of an ODRP system operated in a zone covered by a single public transport line. The fleet of low-capacity vehicles is endogenously adapted to the demand. Considering both users' and operators' costs, we identify two sources of scale economies: when demand grows, the average cost is reduced due to an equivalent of the Mohring Effect (that is present in public transport), and due to matching the users in more compatible groups when they are assigned to the vehicles, which we call the Better-matching Effect. A counter-balance force, called the Flex-route Effect, is observed when the vehicle loads increase and users face longer detours. We find a specific demand range in which this last effect dominates the others, imposing diseconomies of scale when only users' costs are considered.
Such a phenomenon is not observed in public transport systems based on fixed routes. However, when considering both users' and operators' costs, scale economies prevail. We compare the ODRP results against public transport, for a feeder line and a circular line with homogeneous demand. We show that ODRP is more competitive when users share a common destination (the feeder line) and when the demand is low, although scale effects suggest that ODRP can also play a role when the demand is high. Relaxing door-to-door vehicle requirements to allow short walks, is shown to be crucial for ODRP to become a viable alternative for both human-driven and automated vehicles, if the ODRP must serve all requests.

2. cs.SD (Speech):

【1】 Sounds of COVID-19: exploring realistic performance of audio-based digital testing

Authors: Jing Han, Tong Xia, Dimitris Spathis, Erika Bondareva, Chloë Brown, Jagmohan Chauhan, Ting Dang, Andreas Grammenos, Apinan Hasthanasombat, Andres Floto, Pietro Cicuta, Cecilia Mascolo
Affiliations: Department of Computer Science and Technology, University of Cambridge, UK; Department of Medicine, University of Cambridge, UK; Department of Physics, University of Cambridge, UK; ECS, University of Southampton, UK
Link: https://arxiv.org/abs/2106.15523
Abstract: Researchers have been battling with the question of how we can identify Coronavirus disease (COVID-19) cases efficiently, affordably and at scale. Recent work has shown how audio-based approaches, which collect respiratory audio data (cough, breathing and voice), can be used for testing; however, there is a lack of exploration of how biases and methodological decisions impact these tools' performance in practice. In this paper, we explore the realistic performance of audio-based digital testing of COVID-19. To investigate this, we collected a large crowdsourced respiratory audio dataset through a mobile app, alongside recent COVID-19 test results and symptoms intended as a ground truth. Within the collected dataset, we selected 5,240 samples from 2,478 participants and split them into different participant-independent sets for model development and validation. Among these, we controlled for potential confounding factors (such as demographics and language).
The unbiased model takes features extracted from breathing, coughs, and voice signals as predictors and yields an AUC-ROC of 0.71 (95% CI: 0.65-0.77). We further explore different unbalanced distributions to show how biases and participant splits affect performance. Finally, we discuss how the realistic model presented could be integrated in clinical practice to realize continuous, ubiquitous, sustainable and affordable testing at population scale.
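The abstract above reports an AUC-ROC together with a 95% confidence interval. As an illustration of how such a pair can be obtained, the sketch below computes AUC as a pairwise ranking statistic and a percentile bootstrap interval over samples. The abstract does not say how the paper's interval was computed, so the bootstrap choice and the names `auc_roc` and `bootstrap_auc_ci` are assumptions.

```python
import numpy as np

def auc_roc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count half."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = rng.integers(0, n, size=n)
        # AUC is undefined when a resample contains only one class.
        if labels[idx].all() or not labels[idx].any():
            continue
        stats.append(auc_roc(labels[idx], scores[idx]))
    lower, upper = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return lower, upper

example_auc = auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
ci_low, ci_high = bootstrap_auc_ci([0, 0, 1, 1, 0, 1], [0.1, 0.4, 0.35, 0.8, 0.2, 0.7])
```

When one participant contributes several recordings, as in the dataset described above, resampling whole participants instead of individual samples would respect that grouping; this sketch resamples individual samples for brevity.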

【2】 Rethinking End-to-End Evaluation of Decomposable Tasks: A Case Study on Spoken Language Understanding

Authors: Siddhant Arora, Alissa Ostapenko, Vijay Viswanathan, Siddharth Dalmia, Florian Metze, Shinji Watanabe, Alan W Black
Affiliation: Language Technologies Institute, Carnegie Mellon University, USA
Comments: INTERSPEECH 2021
Link: https://arxiv.org/abs/2106.15065
Abstract: Decomposable tasks are complex and comprise a hierarchy of sub-tasks. Spoken intent prediction, for example, combines automatic speech recognition and natural language understanding. Existing benchmarks, however, typically hold out examples for only the surface-level sub-task. As a result, models with similar performance on these benchmarks may have unobserved performance differences on the other sub-tasks. To allow insightful comparisons between competitive end-to-end architectures, we propose a framework to construct robust test sets using coordinate ascent over sub-task specific utility functions. Given a dataset for a decomposable task, our method optimally creates a test set for each sub-task to individually assess sub-components of the end-to-end model. Using spoken language understanding as a case study, we generate new splits for the Fluent Speech Commands and Snips SmartLights datasets. Each split has two test sets: one with held-out utterances assessing natural language understanding abilities, and one with held-out speakers to test speech processing skills. Our splits identify performance gaps up to 10% between end-to-end systems that were within 1% of each other on the original test sets.
These performance gaps allow more realistic and actionable comparisons between different architectures, driving future model development. We release our splits and tools for the community.
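The idea of two diagnostic test sets, one with held-out speakers and one with held-out utterances, can be sketched as a simple partition. The code below assumes the held-out speakers and utterances have already been chosen; it does not implement the coordinate-ascent selection the abstract describes, and the function name and toy records are hypothetical.

```python
def split_by_heldout(records, test_speakers, test_utterances):
    """Partition (speaker, utterance) records into a training set and two test sets.

    unseen_speaker: held-out speaker saying an utterance seen in training.
    unseen_utterance: training speaker saying a held-out utterance.
    """
    train, unseen_speaker, unseen_utterance = [], [], []
    for speaker, utterance in records:
        new_speaker = speaker in test_speakers
        new_utterance = utterance in test_utterances
        if new_speaker and not new_utterance:
            unseen_speaker.append((speaker, utterance))
        elif new_utterance and not new_speaker:
            unseen_utterance.append((speaker, utterance))
        elif not new_speaker and not new_utterance:
            train.append((speaker, utterance))
        # Records new on both axes are dropped so each test set isolates one factor.
    return train, unseen_speaker, unseen_utterance

records = [
    ("s1", "turn on the lights"),
    ("s1", "turn off the lights"),
    ("s2", "turn on the lights"),
    ("s3", "turn on the lights"),
    ("s2", "dim the lamp"),
    ("s3", "dim the lamp"),
]
train, unseen_speaker, unseen_utterance = split_by_heldout(
    records, test_speakers={"s3"}, test_utterances={"dim the lamp"}
)
```

Dropping records that are new on both axes is a design choice of this sketch: it keeps each test set tied to a single sub-task, at the cost of discarding some data.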

【3】 A Survey on Neural Speech Synthesis

Authors: Xu Tan, Tao Qin, Frank Soong, Tie-Yan Liu
Affiliation: Microsoft Research Asia
Comments: A comprehensive survey on TTS, 63 pages, 18 tables, 7 figures, 447 references
Link: https://arxiv.org/abs/2106.15561
Abstract: Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural speech given text, is a hot research topic in the speech, language, and machine learning communities and has broad applications in industry. With the development of deep learning and artificial intelligence, neural network-based TTS has significantly improved the quality of synthesized speech in recent years. In this paper, we conduct a comprehensive survey on neural TTS, aiming to provide a good understanding of current research and future trends. We focus on the key components in neural TTS, including text analysis, acoustic models and vocoders, and several advanced topics, including fast TTS, low-resource TTS, robust TTS, expressive TTS, and adaptive TTS. We further summarize resources related to TTS (e.g., datasets, open-source implementations) and discuss future research directions. This survey can serve both academic researchers and industry practitioners working on TTS.

【4】 N-Singer: A Non-Autoregressive Korean Singing Voice Synthesis System for Pronunciation Enhancement

Authors: Gyeong-Hoon Lee, Tae-Woo Kim, Hanbin Bae, Min-Ji Lee, Young-Ik Kim, Hoon-Young Cho
Affiliation: Speech AI Lab., AI Center, NCSOFT, Republic of Korea
Comments: Accepted to INTERSPEECH 2021
Link: https://arxiv.org/abs/2106.15205
Abstract: Recently, end-to-end Korean singing voice systems have been designed to generate realistic singing voices. However, these systems still suffer from a lack of robustness in terms of pronunciation accuracy. In this paper, we propose N-Singer, a non-autoregressive Korean singing voice system, to synthesize accurate and pronounced Korean singing voices in parallel. N-Singer consists of a Transformer-based mel-generator, a convolutional network-based postnet, and voicing-aware discriminators. It contributes in the following ways. First, for accurate pronunciation, N-Singer separately models linguistic and pitch information without other acoustic features. Second, to achieve improved mel-spectrograms, N-Singer uses a combination of Transformer-based modules and convolutional network-based modules. Third, in adversarial training, voicing-aware conditional discriminators are used to capture the harmonic features of voiced segments and noise components of unvoiced segments. The experimental results show that N-Singer can synthesize a natural singing voice in parallel with more accurate pronunciation than the baseline model.

【5】 DCASE 2021 Task 3: Spectrotemporally-aligned Features for Polyphonic Sound Event Localization and Detection

Authors: Thi Ngoc Tho Nguyen, Karn Watcharasupat, Ngoc Khanh Nguyen, Douglas L. Jones, Woon Seng Gan
Affiliations: School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore; Dept. of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, IL, USA
Comments: 5 pages, Technical Report for DCASE 2021 Challenge Task 3
Link: https://arxiv.org/abs/2106.15190
Abstract: Sound event localization and detection consists of two subtasks: sound event detection and direction-of-arrival estimation. While sound event detection mainly relies on time-frequency patterns to distinguish different sound classes, direction-of-arrival estimation uses magnitude or phase differences between microphones to estimate source directions. Therefore, it is often difficult to train these two subtasks jointly. We propose a novel feature called spatial cue-augmented log-spectrogram (SALSA) with exact time-frequency mapping between the signal power and the source direction-of-arrival. The feature includes multichannel log-spectrograms stacked along with the estimated direct-to-reverberant ratio and a normalized version of the principal eigenvector of the spatial covariance matrix at each time-frequency bin on the spectrograms. Experimental results on the DCASE 2021 dataset for sound event localization and detection with directional interference showed that the deep learning-based models trained on this new feature outperformed the DCASE challenge baseline by a large margin.
We combined several models with slightly different architectures that were trained on the new feature to further improve the system performances for the DCASE sound event localization and detection challenge.

【6】 GANSpeech: Adversarial Training for High-Fidelity Multi-Speaker Speech Synthesis

Authors: Jinhyeok Yang, Jae-Sung Bae, Taejun Bak, Youngik Kim, Hoon-Young Cho
Affiliation: Speech AI Lab, NCSOFT, Republic of Korea
Comments: Accepted to INTERSPEECH 2021
Link: https://arxiv.org/abs/2106.15153
Abstract: Recent advances in neural multi-speaker text-to-speech (TTS) models have enabled the generation of reasonably good speech quality with a single model and made it possible to synthesize the speech of a speaker with limited training data. Fine-tuning the multi-speaker model on the target speaker's data can achieve better quality; however, there still exists a gap compared to real speech samples, and the model becomes speaker-dependent. In this work, we propose GANSpeech, a high-fidelity multi-speaker TTS model that applies the adversarial training method to a non-autoregressive multi-speaker TTS model. In addition, we propose simple but efficient automatic scaling methods for the feature matching loss used in adversarial training. In the subjective listening tests, GANSpeech significantly outperformed the baseline multi-speaker FastSpeech and FastSpeech2 models, and showed a better MOS score than the speaker-specific fine-tuned FastSpeech2.

【7】 FastPitchFormant: Source-filter based Decomposed Modeling for Speech Synthesis

Authors: Taejun Bak, Jae-Sung Bae, Hanbin Bae, Young-Ik Kim, Hoon-Young Cho
Affiliation: Speech AI Lab, NCSOFT, Republic of Korea
Comments: Accepted to INTERSPEECH 2021
Link: https://arxiv.org/abs/2106.15123
Abstract: Methods for modeling and controlling prosody with acoustic features have been proposed for neural text-to-speech (TTS) models. Prosodic speech can be generated by conditioning on acoustic features. However, synthesized speech with a large pitch-shift scale suffers from audio quality degradation and deformation of speaker characteristics. To address this problem, we propose a feed-forward Transformer-based TTS model that is designed based on the source-filter theory. This model, called FastPitchFormant, has a unique structure that handles text and acoustic features in parallel. By modeling each feature separately, the tendency of the model to learn the relationship between the two features can be mitigated.

3. eess.AS (Audio Processing):

【1】 A Survey on Neural Speech Synthesis

Authors: Xu Tan, Tao Qin, Frank Soong, Tie-Yan Liu
Affiliation: Microsoft Research Asia
Comments: A comprehensive survey on TTS, 63 pages, 18 tables, 7 figures, 447 references
Link: https://arxiv.org/abs/2106.15561
Abstract: Cross-listed paper; the abstract is identical to entry 【3】 under cs.SD above.

【2】 N-Singer: A Non-Autoregressive Korean Singing Voice Synthesis System for Pronunciation Enhancement

Authors: Gyeong-Hoon Lee, Tae-Woo Kim, Hanbin Bae, Min-Ji Lee, Young-Ik Kim, Hoon-Young Cho
Affiliation: Speech AI Lab., AI Center, NCSOFT, Republic of Korea
Comments: Accepted to INTERSPEECH 2021
Link: https://arxiv.org/abs/2106.15205
Abstract: Recently, end-to-end Korean singing voice systems have been designed to generate realistic singing voices. However, these systems still suffer from a lack of robustness in terms of pronunciation accuracy. In this paper, we propose N-Singer, a non-autoregressive Korean singing voice system, to synthesize accurate and pronounced Korean singing voices in parallel. N-Singer consists of a Transformer-based mel-generator, a convolutional network-based postnet, and voicing-aware discriminators. It can contribute in the following ways. First, for accurate pronunciation, N-Singer separately models linguistic and pitch information without other acoustic features. Second, to achieve improved mel-spectrograms, N-Singer uses a combination of Transformer-based modules and convolutional network-based modules. Third, in adversarial training, voicing-aware conditional discriminators are used to capture the harmonic features of voiced segments and noise components of unvoiced segments. The experimental results prove that N-Singer can synthesize a natural singing voice in parallel with a more accurate pronunciation than the baseline model.

【3】 DCASE 2021 Task 3: Spectrotemporally-aligned Features for Polyphonic Sound Event Localization and Detection

Authors: Thi Ngoc Tho Nguyen, Karn Watcharasupat, Ngoc Khanh Nguyen, Douglas L. Jones, Woon Seng Gan
Affiliation: School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore; Dept. of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, IL, USA
Comments: 5 pages, Technical Report for DCASE 2021 Challenge Task 3
Link: https://arxiv.org/abs/2106.15190
Abstract: Sound event localization and detection consists of two subtasks which are sound event detection and direction-of-arrival estimation. While sound event detection mainly relies on time-frequency patterns to distinguish different sound classes, direction-of-arrival estimation uses magnitude or phase differences between microphones to estimate source directions. Therefore, it is often difficult to jointly train these two subtasks simultaneously. We propose a novel feature called spatial cue-augmented log-spectrogram (SALSA) with exact time-frequency mapping between the signal power and the source direction-of-arrival. The feature includes multichannel log-spectrograms stacked along with the estimated direct-to-reverberant ratio and a normalized version of the principal eigenvector of the spatial covariance matrix at each time-frequency bin on the spectrograms. Experimental results on the DCASE 2021 dataset for sound event localization and detection with directional interference showed that the deep learning-based models trained on this new feature outperformed the DCASE challenge baseline by a large margin. We combined several models with slightly different architectures that were trained on the new feature to further improve the system performances for the DCASE sound event localization and detection challenge.

【4】 GANSpeech: Adversarial Training for High-Fidelity Multi-Speaker Speech Synthesis

Authors: Jinhyeok Yang, Jae-Sung Bae, Taejun Bak, Youngik Kim, Hoon-Young Cho
Affiliation: Speech AI Lab, NCSOFT, Republic of Korea
Comments: Accepted to INTERSPEECH 2021
Link: https://arxiv.org/abs/2106.15153
Abstract: Recent advances in neural multi-speaker text-to-speech (TTS) models have enabled the generation of reasonably good speech quality with a single model and made it possible to synthesize the speech of a speaker with limited training data. Fine-tuning to the target speaker data with the multi-speaker model can achieve better quality, however, there still exists a gap compared to the real speech sample and the model depends on the speaker. In this work, we propose GANSpeech, which is a high-fidelity multi-speaker TTS model that adopts the adversarial training method to a non-autoregressive multi-speaker TTS model. In addition, we propose simple but efficient automatic scaling methods for feature matching loss used in adversarial training. In the subjective listening tests, GANSpeech significantly outperformed the baseline multi-speaker FastSpeech and FastSpeech2 models, and showed a better MOS score than the speaker-specific fine-tuned FastSpeech2.

【5】 Hierarchical Context-Aware Transformers for Non-Autoregressive Text to Speech

Authors: Jae-Sung Bae, Tae-Jun Bak, Young-Sun Joo, Hoon-Young Cho
Affiliation: Speech AI Lab, NCSOFT, Seongnam, Republic of Korea
Comments: Accepted to INTERSPEECH 2021
Link: https://arxiv.org/abs/2106.15144
Abstract: In this paper, we propose methods for improving the modeling performance of a Transformer-based non-autoregressive text-to-speech (TNA-TTS) model. Although the text encoder and audio decoder handle different types and lengths of data (i.e., text and audio), the TNA-TTS models are not designed considering these variations. Therefore, to improve the modeling performance of the TNA-TTS model we propose a hierarchical Transformer structure-based text encoder and audio decoder that are designed to accommodate the characteristics of each module. For the text encoder, we constrain each self-attention layer so the encoder focuses on a text sequence from the local to the global scope. Conversely, the audio decoder constrains its self-attention layers to focus in the reverse direction, i.e., from global to local scope. Additionally, we further improve the pitch modeling accuracy of the audio decoder by providing sentence and word-level pitch as conditions. Various objective and subjective evaluations verified that the proposed method outperformed the baseline TNA-TTS.
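Editor's illustration (not from the paper): one way to picture "local to global" self-attention constraints is a banded attention mask whose window widens with layer depth, with the decoder using the same schedule in reverse. The sketch below assumes a hypothetical doubling schedule and hypothetical function names; the abstract does not state how the authors actually set the attention ranges.

```python
def banded_attention_mask(seq_len, window):
    """Return mask[i][j] == True when position i may attend to position j."""
    return [[abs(i - j) <= window for j in range(seq_len)] for i in range(seq_len)]


def local_to_global_windows(num_layers, seq_len):
    """Hypothetical schedule: the window doubles per layer, capped at full context."""
    return [min(2 ** layer, seq_len - 1) for layer in range(num_layers)]


# Text encoder: early layers see neighbours only, later layers see everything.
encoder_windows = local_to_global_windows(num_layers=4, seq_len=8)
# Audio decoder: the same schedule reversed, i.e. global first, local last.
decoder_windows = list(reversed(encoder_windows))

encoder_masks = [banded_attention_mask(8, w) for w in encoder_windows]
decoder_masks = [banded_attention_mask(8, w) for w in decoder_windows]
```

In a real model these boolean masks would be passed to each self-attention layer so that disallowed positions receive no attention weight.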

【6】 FastPitchFormant: Source-filter based Decomposed Modeling for Speech Synthesis

Authors: Taejun Bak, Jae-Sung Bae, Hanbin Bae, Young-Ik Kim, Hoon-Young Cho
Affiliation: Speech AI Lab, NCSOFT, Republic of Korea
Comments: Accepted to INTERSPEECH 2021
Link: https://arxiv.org/abs/2106.15123
Abstract: Methods for modeling and controlling prosody with acoustic features have been proposed for neural text-to-speech (TTS) models. Prosodic speech can be generated by conditioning acoustic features. However, synthesized speech with a large pitch-shift scale suffers from audio quality degradation, and speaker characteristics deformation. To address this problem, we propose a feed-forward Transformer based TTS model that is designed based on the source-filter theory. This model, called FastPitchFormant, has a unique structure that handles text and acoustic features in parallel. With modeling each feature separately, the tendency that the model learns the relationship between two features can be mitigated.

【7】 Sounds of COVID-19: exploring realistic performance of audio-based digital testing

Authors: Jing Han, Tong Xia, Dimitris Spathis, Erika Bondareva, Chloë Brown, Jagmohan Chauhan, Ting Dang, Andreas Grammenos, Apinan Hasthanasombat, Andres Floto, Pietro Cicuta, Cecilia Mascolo
Affiliation: Department of Computer Science and Technology, University of Cambridge, UK; Department of Medicine, University of Cambridge, UK; Department of Physics, University of Cambridge, UK; ECS, University of Southampton, UK
Link: https://arxiv.org/abs/2106.15523
Abstract: Researchers have been battling with the question of how we can identify Coronavirus disease (COVID-19) cases efficiently, affordably and at scale. Recent work has shown how audio based approaches, which collect respiratory audio data (cough, breathing and voice) can be used for testing, however there is a lack of exploration of how biases and methodological decisions impact these tools' performance in practice. In this paper, we explore the realistic performance of audio-based digital testing of COVID-19. To investigate this, we collected a large crowdsourced respiratory audio dataset through a mobile app, alongside recent COVID-19 test result and symptoms intended as a ground truth. Within the collected dataset, we selected 5,240 samples from 2,478 participants and split them into different participant-independent sets for model development and validation. Among these, we controlled for potential confounding factors (such as demographics and language). The unbiased model takes features extracted from breathing, coughs, and voice signals as predictors and yields an AUC-ROC of 0.71 (95% CI: 0.65-0.77). We further explore different unbalanced distributions to show how biases and participant splits affect performance. Finally, we discuss how the realistic model presented could be integrated in clinical practice to realize continuous, ubiquitous, sustainable and affordable testing at population scale.
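Editor's illustration (not from the paper): the two evaluation ideas named in the abstract, participant-independent splits and AUC-ROC, can be sketched in a few lines. The data layout, function names, and the 20% test fraction are assumptions made for this example; the authors' actual pipeline, including how they control for confounders and compute the confidence interval, is not described in the abstract.

```python
import random


def participant_independent_split(samples, test_fraction=0.2, seed=0):
    """Split (participant_id, score, label) tuples so no participant is in both sets."""
    participants = sorted({pid for pid, _, _ in samples})
    rng = random.Random(seed)
    rng.shuffle(participants)
    n_test = max(1, round(len(participants) * test_fraction))
    test_ids = set(participants[:n_test])
    train = [s for s in samples if s[0] not in test_ids]
    test = [s for s in samples if s[0] in test_ids]
    return train, test


def auc_roc(scores, labels):
    """Probability that a random positive scores above a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Splitting by participant rather than by sample matters because several recordings from the same person in both sets would let a model score well by recognizing the speaker instead of the condition.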

【8】 Rethinking End-to-End Evaluation of Decomposable Tasks: A Case Study on Spoken Language Understanding

Authors: Siddhant Arora, Alissa Ostapenko, Vijay Viswanathan, Siddharth Dalmia, Florian Metze, Shinji Watanabe, Alan W Black
Affiliation: Language Technologies Institute, Carnegie Mellon University, USA
Comments: INTERSPEECH 2021
Link: https://arxiv.org/abs/2106.15065
Abstract: Decomposable tasks are complex and comprise of a hierarchy of sub-tasks. Spoken intent prediction, for example, combines automatic speech recognition and natural language understanding. Existing benchmarks, however, typically hold out examples for only the surface-level sub-task. As a result, models with similar performance on these benchmarks may have unobserved performance differences on the other sub-tasks. To allow insightful comparisons between competitive end-to-end architectures, we propose a framework to construct robust test sets using coordinate ascent over sub-task specific utility functions. Given a dataset for a decomposable task, our method optimally creates a test set for each sub-task to individually assess sub-components of the end-to-end model. Using spoken language understanding as a case study, we generate new splits for the Fluent Speech Commands and Snips SmartLights datasets. Each split has two test sets: one with held-out utterances assessing natural language understanding abilities, and one with held-out speakers to test speech processing skills. Our splits identify performance gaps up to 10% between end-to-end systems that were within 1% of each other on the original test sets. These performance gaps allow more realistic and actionable comparisons between different architectures, driving future model development. We release our splits and tools for the community.
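Editor's illustration (not from the paper): the idea of two sub-task test sets, one with held-out speakers and one with held-out utterances, can be shown with a plain partition. This is a deliberately simplified sketch; it takes the held-out speakers and transcripts as given, whereas the paper selects them by coordinate ascent over utility functions. The field names `speaker` and `transcript` are assumptions.

```python
def held_out_test_sets(examples, held_out_speakers, held_out_transcripts):
    """Partition examples into train, held-out-speaker test, and held-out-utterance test.

    Each example is a dict with 'speaker' and 'transcript' keys (assumed layout).
    """
    train, speaker_test, utterance_test = [], [], []
    for ex in examples:
        new_speaker = ex["speaker"] in held_out_speakers
        new_text = ex["transcript"] in held_out_transcripts
        if new_speaker and not new_text:
            # Unseen voice, seen wording: probes speech processing.
            speaker_test.append(ex)
        elif new_text and not new_speaker:
            # Seen voice, unseen wording: probes language understanding.
            utterance_test.append(ex)
        elif not new_speaker and not new_text:
            train.append(ex)
        # Examples that are new on both axes are dropped in this sketch,
        # so that each test set isolates a single sub-task.
    return train, speaker_test, utterance_test
```

Keeping the two axes separate is what lets a gap on one test set be attributed to a specific sub-component of an end-to-end model.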

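Originally published: 2021-06-30. Shared from the WeChat official account arXiv每日学术速递 as part of the Tencent Cloud self-media sharing program. To report infringement, contact cloudcommunity@tencent.com for removal.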
