
Finance / Speech / Audio Processing arXiv Digest [7.9]

WeChat official account: arXiv每日学术速递 (arXiv Daily Academic Digest)
Published 2021-07-27 10:42:59
Included in the column: arXiv每日学术速递

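Visit www.arxivdaily.com for the digest with abstracts, covering CS, physics, mathematics, economics, statistics, finance, biology, and electrical engineering, with search, bookmarking, and posting features.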

q-fin Finance: 8 papers

cs.SD Speech: 5 papers

eess.AS Audio Processing: 5 papers

1. q-fin Finance:

【1】 On the Selection of Loss Severity Distributions to Model Operational Risk

Authors: Daniel Hadley, Harry Joe, Natalia Nolde
Affiliation: University of British Columbia
Link: https://arxiv.org/abs/2107.03979
Abstract: Accurate modeling of operational risk is important for a bank and the finance industry as a whole to prepare for potentially catastrophic losses. One approach to modeling operational risk is the loss distribution approach, which requires a bank to group operational losses into risk categories and select a loss frequency and severity distribution for each category. This approach estimates the annual operational loss distribution, and a bank must set aside capital, called regulatory capital, equal to the 0.999 quantile of this estimated distribution. In practice, this approach may produce unstable regulatory capital calculations from year to year as selected loss severity distribution families change. This paper presents truncation probability estimates for loss severity data and a consistent quantile scoring function on annual loss data as useful severity distribution selection criteria that may lead to more stable regulatory capital. Additionally, the Sinh-arcSinh distribution is another flexible candidate family for modeling loss severities that can be easily estimated using the maximum likelihood approach. Finally, we recommend that loss frequencies below the minimum reporting threshold be collected so that loss severity data can be treated as censored data.
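As a rough illustration of the loss distribution approach described in this abstract, the sketch below simulates annual losses as a random number of events, each with a random severity, and reads off the 0.999 quantile as a regulatory capital estimate. The Poisson frequency, the lognormal severity, and every parameter value are assumptions chosen for illustration only; they are not the families or estimates used in the paper.

```python
import numpy as np


def simulate_annual_losses(rng, n_years, freq_mean, sev_mu, sev_sigma):
    """Simulate total annual losses for one risk category.

    Assumed model (illustrative only): Poisson event counts per year and
    lognormal loss severities, all independent.
    """
    counts = rng.poisson(freq_mean, size=n_years)
    severities = rng.lognormal(sev_mu, sev_sigma, size=counts.sum())
    # Map each simulated loss to the year it occurred in, then sum per year.
    year_index = np.repeat(np.arange(n_years), counts)
    return np.bincount(year_index, weights=severities, minlength=n_years)


rng = np.random.default_rng(0)
annual = simulate_annual_losses(
    rng, n_years=100_000, freq_mean=20.0, sev_mu=10.0, sev_sigma=2.0
)

# Regulatory capital is the 0.999 quantile of the annual loss distribution.
regulatory_capital = np.quantile(annual, 0.999)
print(f"Estimated 0.999 quantile: {regulatory_capital:,.0f}")
```

Because the 0.999 quantile sits deep in the tail, swapping the severity family in a sketch like this can move the estimate considerably, which is the instability the abstract refers to.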

【2】 Public preferences for marine plastic litter reductions across Europe

Authors: Salma Khedr, Katrin Rehdanz, Roy Brouwer, Hanna Dijkstra, Sem Duijndam, Pieter van Beukering, Ikechukwu C. Okoli
Affiliations: Department of Economics, Kiel University, Kiel, Germany; Department of Economics and the Water Institute, University of Waterloo, Waterloo, Ontario, Canada
Link: https://arxiv.org/abs/2107.03957
Abstract: Plastic pollution is one of the most challenging problems affecting the marine environment of our time. Based on a unique dataset covering four European seas and eight European countries, this paper adds to the limited empirical evidence base related to the societal welfare effects of marine litter management. We use a discrete choice experiment to elicit public willingness-to-pay (WTP) for macro and micro plastic removal to achieve Good Environmental Status across European seas as required by the European Marine Strategy Framework Directive. Using a common valuation design and following best-practice guidelines, we draw meaningful comparisons between countries, seas and policy contexts. European citizens have strong preferences to improve the environmental status of the marine environment by removing both micro and macro plastic litter, favouring a pan-European approach. However, public WTP estimates differ significantly across European countries and seas. We explain why and discuss implications for policymaking.

【3】 Measuring Financial Time Series Similarity With a View to Identifying Profitable Stock Market Opportunities

Authors: Rian Dolphin, Barry Smyth, Yang Xu, Ruihai Dong
Affiliations: School of Computer Science, University College Dublin, Dublin, Ireland; Insight Centre for Data Analytics, University College Dublin, Dublin, Ireland; School of Economics and Management, Beihang University, Beijing, China
Comments: 15 pages. Accepted for presentation at the International Conference on Case-Based Reasoning 2021 (ICCBR)
Link: https://arxiv.org/abs/2107.03926
Abstract: Forecasting stock returns is a challenging problem due to the highly stochastic nature of the market and the vast array of factors and events that can influence trading volume and prices. Nevertheless, it has proven to be an attractive target for machine learning research because of the potential for even modest levels of prediction accuracy to deliver significant benefits. In this paper, we describe a case-based reasoning approach to predicting stock market returns using only historical pricing data. We argue that one of the impediments for case-based stock prediction has been the lack of a suitable similarity metric when it comes to identifying similar pricing histories as the basis for a future prediction -- traditional Euclidean and correlation based approaches are not effective for a variety of reasons -- and in this regard, a key contribution of this work is the development of a novel similarity metric for comparing historical pricing data. We demonstrate the benefits of this metric and the case-based approach in a real-world application in comparison to a variety of conventional benchmarks.

【4】 Financial Markets and the Phase Transition between Water and Steam

Authors: Christof Schmidhuber
Affiliation: Zurich University of Applied Sciences
Comments: 32 pages, 7 figures
Link: https://arxiv.org/abs/2107.03857
Abstract: We present a lattice gas model of financial markets to explain previous empirical observations of the interplay of trends and reversion. The shares of an asset are modeled by gas molecules that are distributed across a hidden social network of investors. Neighbors in the network tend to align their positions due to herding behavior. The model is equivalent to the Ising model on this network, with the magnetization in the role of the deviation of the asset price from its value. For N independent assets, it generalizes to an O(N) vector model. In efficient markets, the system is driven to its critical temperature. There, it is characterized by long-range correlations and universal critical exponents, in analogy with the second-order phase transition between water and steam. Using the renormalization group, we show that these critical exponents imply predictions for the auto-correlations of financial market returns. For a simple network topology, consistency with observation implies a fractal dimension of the network of 3.3 and a correlation time of 10 years. While the simple model agrees well with market data on long time scales, it cannot explain the observed market trends over time horizons from one month to one year. In a next step, the approach should therefore be extended to other models of critical dynamics and to general network topologies. It opens the door for indirectly measuring universal properties of the hidden social network of investors from the observable interplay of trends and reversion.
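To make the Ising analogy in this abstract more concrete, here is a minimal Metropolis simulation of an Ising model on a fixed neighbor structure, tracking the magnetization that the abstract identifies with the deviation of price from value. This is a generic textbook sketch, not the author's model: the periodic square lattice standing in for the investor network, the coupling J = 1, the temperature, and all function names are assumptions made for illustration.

```python
import numpy as np


def square_lattice_neighbors(side):
    """Neighbor table for a periodic side x side grid (a stand-in network)."""
    idx = np.arange(side * side).reshape(side, side)
    nbrs = np.stack(
        [np.roll(idx, 1, 0), np.roll(idx, -1, 0),
         np.roll(idx, 1, 1), np.roll(idx, -1, 1)],
        axis=-1,
    )
    return nbrs.reshape(side * side, 4)


def metropolis_ising(neighbors, temperature, n_sweeps, rng):
    """Run single-spin-flip Metropolis updates; return magnetization per sweep."""
    n = len(neighbors)
    spins = rng.choice(np.array([-1, 1]), size=n)
    magnetization = []
    for _ in range(n_sweeps):
        for _ in range(n):
            i = rng.integers(n)
            # Energy change of flipping spin i with coupling J = 1.
            delta_e = 2.0 * spins[i] * spins[neighbors[i]].sum()
            if delta_e <= 0 or rng.random() < np.exp(-delta_e / temperature):
                spins[i] = -spins[i]
        magnetization.append(spins.mean())
    return np.array(magnetization)


rng = np.random.default_rng(0)
m = metropolis_ising(square_lattice_neighbors(16), temperature=2.3,
                     n_sweeps=200, rng=rng)
print("mean |magnetization| over the run:", np.abs(m).mean())
```

Replacing `square_lattice_neighbors` with the neighbor table of another graph would be the natural way to experiment with different network topologies in a sketch like this.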

【5】 Limited intelligence and performance-based compensation: An agent-based model of the hidden action problem

Authors: Patrick Reinwald, Stephan Leitner, Friederike Wall
Affiliation: University of Klagenfurt, Universitätsstraße, Klagenfurt, Austria
Link: https://arxiv.org/abs/2107.03764
Abstract: Models of economic decision makers often include idealized assumptions, such as rationality, perfect foresight, and access to all relevant pieces of information. These assumptions often assure the models' internal validity, but, at the same time, might limit the models' power to explain empirical phenomena. This paper is particularly concerned with the model of the hidden action problem, which proposes an optimal performance-based sharing rule for situations in which a principal assigns a task to an agent, and the action taken to carry out this task is not observable by the principal. We follow the agentization approach and introduce an agent-based version of the hidden action problem, in which some of the idealized assumptions about the principal and the agent are relaxed so that they only have limited information access, are endowed with the ability to gain information, and store it in and retrieve it from their (limited) memory. We follow an evolutionary approach and analyze how the principal's and the agent's decisions affect the sharing rule, task performance, and their utility over time. The results indicate that the optimal sharing rule does not emerge. The principal's utility is relatively robust to variations in intelligence, while the agent's utility is highly sensitive to limitations in intelligence. The principal's behavior appears to be driven by opportunism, as she withholds a premium from the agent to assure the optimal utility for herself.

【6】 Numerical approximation of hybrid Poisson-jump Ait-Sahalia-type interest rate model with delay

Authors: Emmanuel Coffie
Affiliation: Department of Mathematics and Statistics, University of Strathclyde, Glasgow, U.K.
Comments: arXiv admin note: text overlap with arXiv:2103.07651
Link: https://arxiv.org/abs/2107.03712
Abstract: While the original Ait-Sahalia interest rate model has found considerable use as a model for describing the time series evolution of interest rates, it may not possess adequate specifications to explain responses of interest rates to empirical phenomena such as volatility 'skews' and 'smiles', jump behaviour, market regulatory lapses, economic crises, and financial clashes, among others. The aim of this paper is to propose a modified version of this model by incorporating additional features to collectively describe these empirical phenomena adequately. Moreover, due to the lack of a closed-form solution to the proposed model, we employ several new truncated EM techniques to numerically study this model and justify the scheme within a Monte Carlo framework to compute some financial quantities such as a bond and a path-dependent barrier option.

【7】 Inference and forecasting for continuous-time integer-valued trawl processes and their use in financial economics

Authors: Mikkel Bennedsen, Asger Lunde, Neil Shephard, Almut E. D. Veraart
Link: https://arxiv.org/abs/2107.03674
Abstract: This paper develops likelihood-based methods for estimation, inference, model selection, and forecasting of continuous-time integer-valued trawl processes. The full likelihood of integer-valued trawl processes is, in general, highly intractable, motivating the use of composite likelihood methods, where we consider the pairwise likelihood in lieu of the full likelihood. Maximizing the pairwise likelihood of the data yields an estimator of the parameter vector of the model, and we prove consistency and asymptotic normality of this estimator. The same methods allow us to develop probabilistic forecasting methods, which can be used to construct the predictive distribution of integer-valued time series. In a simulation study, we document good finite sample performance of the likelihood-based estimator and the associated model selection procedure. Lastly, the methods are illustrated in an application to modelling and forecasting financial bid-ask spread data, where we find that it is beneficial to carefully model both the marginal distribution and the autocorrelation structure of the data. We argue that integer-valued trawl processes are especially well-suited in such situations.

【8】 Adaptive Stress Testing for Adversarial Learning in a Financial Environment

Authors: Khalid El-Awady
Link: https://arxiv.org/abs/2107.03577
Abstract: We demonstrate the use of Adaptive Stress Testing to detect and address potential vulnerabilities in a financial environment. We develop a simplified model for credit card fraud detection that utilizes a linear regression classifier based on historical payment transaction data coupled with business rules. We then apply the reinforcement learning model known as Adaptive Stress Testing to train an agent, which can be thought of as a potential fraudster, to find the most likely path to system failure -- successfully defrauding the system. We show the connection between this most likely failure path and the limits of the classifier and discuss how the fraud detection system's business rules can be further augmented to mitigate these failure modes.

2. cs.SD Speech:

【1】 Multilingual Speech Evaluation: Case Studies on English, Malay and Tamil

Authors: Huayun Zhang, Ke Shi, Nancy F. Chen
Affiliation: Institute for Infocomm Research, A*STAR, Singapore
Comments: Accepted at INTERSPEECH 2021
Link: https://arxiv.org/abs/2107.03675
Abstract: Speech evaluation is an essential component in computer-assisted language learning (CALL). While speech evaluation on English has been popular, automatic speech scoring on low-resource languages remains challenging. Work in this area has focused on monolingual-specific designs and handcrafted features stemming from resource-rich languages like English. Such approaches are often difficult to generalize to other languages, especially if we also want to consider suprasegmental qualities such as rhythm. In this work, we examine three different languages that possess distinct rhythm patterns: English (stress-timed), Malay (syllable-timed), and Tamil (mora-timed). We exploit robust feature representations inspired by music processing and vector representation learning. Empirical validations show consistent gains for all three languages when predicting pronunciation, rhythm and intonation performance.

【2】 BumbleBee: A Transformer for Music

Authors: Lucas Fenaux, Maria Juliana Quintero
Affiliation: University of Toronto
Comments: 8 pages, 3 figures
Link: https://arxiv.org/abs/2107.03443
Abstract: We will introduce BumbleBee, a transformer model that will generate MIDI music data. We will tackle the issue of transformers applied to long sequences by implementing a longformer generative model that uses dilating sliding windows to compute the attention layers. We will compare our results to those of the music transformer and Long Short-Term Memory (LSTM) to benchmark our results. This analysis will be performed using piano MIDI files, in particular, the JSB Chorales dataset that has already been used for other research works (Huang et al., 2018).
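The dilated sliding-window attention mentioned in this abstract can be pictured as a boolean mask over query and key positions. The sketch below builds such a mask with NumPy. It is an illustration of the general idea only, not the BumbleBee implementation: the function name, the `half_window` and `dilation` parameters, and the causal option (added here because the model is generative) are all assumptions.

```python
import numpy as np


def dilated_sliding_window_mask(seq_len, half_window, dilation=1, causal=True):
    """Return a (seq_len, seq_len) boolean mask; True means "may attend".

    Position i may attend to position j when their distance is a multiple of
    `dilation` and spans at most `half_window` dilated steps. With
    `causal=True`, only current and earlier positions are allowed.
    """
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    offset = i - j
    mask = (np.abs(offset) <= half_window * dilation) & (offset % dilation == 0)
    if causal:
        mask &= offset >= 0
    return mask


mask = dilated_sliding_window_mask(seq_len=8, half_window=2, dilation=2)
# Row 6 attends to positions 2, 4 and 6: two dilated steps back plus itself.
print(np.flatnonzero(mask[6]))
```

Each row has at most `half_window + 1` allowed positions in the causal case, so the attention cost grows linearly with sequence length instead of quadratically, while dilation widens the span each position can see.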

【3】 Comparing Supervised Models And Learned Speech Representations For Classifying Intelligibility Of Disordered Speech On Selected Phrases

Authors: Subhashini Venugopalan, Joel Shor, Manoj Plakal, Jimmy Tobin, Katrin Tomanek, Jordan R. Green, Michael P. Brenner
Affiliations: Google Research; MGH Institute of Health Professions, USA; Harvard University, USA
Comments: Accepted at INTERSPEECH 2021
Link: https://arxiv.org/abs/2107.03985
Abstract: Automatic classification of disordered speech can provide an objective tool for identifying the presence and severity of speech impairment. Classification approaches can also help identify hard-to-recognize speech samples to teach ASR systems about the variable manifestations of impaired speech. Here, we develop and compare different deep learning techniques to classify the intelligibility of disordered speech on selected phrases. We collected samples from a diverse set of 661 speakers with a variety of self-reported disorders speaking 29 words or phrases, which were rated by speech-language pathologists for their overall intelligibility using a five-point Likert scale. We then evaluated classifiers developed using 3 approaches: (1) a convolutional neural network (CNN) trained for the task, (2) classifiers trained on non-semantic speech representations from CNNs that used an unsupervised objective [1], and (3) classifiers trained on the acoustic (encoder) embeddings from an ASR system trained on typical speech [2]. We found that the ASR encoder's embeddings considerably outperform the other two on detecting and classifying disordered speech. Further analysis shows that the ASR embeddings cluster speech by the spoken phrase, while the non-semantic embeddings cluster speech by speaker. Also, longer phrases are more indicative of intelligibility deficits than single words.

【4】 Expressive Voice Conversion: A Joint Framework for Speaker Identity and Emotional Style Transfer

Authors: Zongyang Du, Berrak Sisman, Kun Zhou, Haizhou Li
Affiliations: Singapore University of Technology and Design, Singapore; National University of Singapore, Singapore
Comments: Submitted to ASRU 2021
Link: https://arxiv.org/abs/2107.03748
Abstract: Traditional voice conversion (VC) has focused on speaker identity conversion for speech with a neutral expression. We note that emotional expression plays an essential role in daily communication, and the emotional style of speech can be speaker-dependent. In this paper, we study the technique to jointly convert the speaker identity and speaker-dependent emotional style, which is called expressive voice conversion. We propose a StarGAN-based framework to learn a many-to-many mapping across different speakers that takes into account speaker-dependent emotional style without the need for parallel data. To achieve this, we condition the generator on an emotional style encoding derived from a pre-trained speech emotion recognition (SER) model. The experiments validate the effectiveness of our proposed framework in both objective and subjective evaluations. To the best of our knowledge, this is the first study on expressive voice conversion.

【5】 Heavily Augmented Sound Event Detection utilizing Weak Predictions

Authors: Hyeonuk Nam, Byeong-Yun Ko, Gyeong-Tae Lee, Seong-Hu Kim, Won-Ho Jung, Sang-Min Choi, Yong-Hwa Park
Affiliation: Department of Mechanical Engineering, Korea Advanced Institute of Science and Technology, Daehak-ro, Yuseong-gu, Daejeon, South Korea
Comments: Won 3rd place on IEEE DCASE 2021 Task 4
Link: https://arxiv.org/abs/2107.03649
Abstract: The performance of Sound Event Detection (SED) systems is greatly limited by the difficulty of generating large strongly labeled datasets. In this work, we used two main approaches to overcome the lack of strongly labeled data. First, we applied heavy data augmentation on input features. The data augmentation methods used include not only conventional methods used in speech/audio domains but also our proposed method named FilterAugment. Second, we propose two methods to utilize weak predictions to enhance weakly supervised SED performance. As a result, we obtained the best PSDS1 of 0.4336 and best PSDS2 of 0.8161 on the DESED real validation dataset. This work was submitted to DCASE 2021 Task 4 and ranked 3rd.

3. eess.AS Audio Processing:

The five papers in this section are the same as those listed under cs.SD above, where the authors, affiliations, comments, and abstracts are given.

【1】 Comparing Supervised Models And Learned Speech Representations For Classifying Intelligibility Of Disordered Speech On Selected Phrases
Link: https://arxiv.org/abs/2107.03985

【2】 Expressive Voice Conversion: A Joint Framework for Speaker Identity and Emotional Style Transfer
Link: https://arxiv.org/abs/2107.03748

【3】 Heavily Augmented Sound Event Detection utilizing Weak Predictions
Link: https://arxiv.org/abs/2107.03649

【4】 Multilingual Speech Evaluation: Case Studies on English, Malay and Tamil
Link: https://arxiv.org/abs/2107.03675

【5】 BumbleBee: A Transformer for Music
Link: https://arxiv.org/abs/2107.03443

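Shared from the WeChat official account arXiv每日学术速递 through the Tencent Cloud self-media syndication program.
Originally published: 2021-07-09. To report infringement, contact cloudcommunity@tencent.com.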
