Quantitative Finance / Sound / Audio and Speech Processing arXiv Digest [9.1]

WeChat official account: arXiv Daily Academic Digest

Published 2021-09-16 14:54:26

Included in the column: arXiv Daily Academic Digest

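Update: the H5 page now supports collapsible abstracts. Visit arxivdaily.com, which covers CS, physics, mathematics, economics, statistics, finance, biology, and electrical engineering, and offers search and bookmarking.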

q-fin (Quantitative Finance): 4 papers

cs.SD (Sound): 3 papers

eess.AS (Audio and Speech Processing): 5 papers

1. q-fin (Quantitative Finance):

【1】 Is happiness u-shaped in age everywhere? A methodological reconsideration for Europe. Link: https://arxiv.org/abs/2108.13671

Author: David Bartram. Affiliation: University of Leicester, Leicester, United Kingdom. Comments: 17 pages, 4 tables, 2 figures; submitted to National Institute Economic Review.

Abstract: A recent contribution to research on age and well-being (Blanchflower 2021) found that the impact of age on happiness is "u-shaped" virtually everywhere: happiness declines towards middle age and subsequently rises, in almost all countries. This paper evaluates that finding for European countries, considering whether it is robust to alternative methodological approaches. The analysis here excludes control variables that are affected by age (noting that those variables are not themselves antecedents of age) and uses data from the entire adult age range (rather than using data only from respondents younger than 70). I also explore the relationship via models that do not impose a quadratic functional form. The paper shows that these alternative approaches do not lead us to perceive a u-shape "everywhere": u-shapes are evident for some countries, but for others the pattern is quite different.

【2】 On the link between monetary and star-shaped risk measures. Link: https://arxiv.org/abs/2108.13500

Authors: Marlon Moresco, Marcelo Brutti Righi.

Abstract: Recently, Castagnoli et al. (2021) introduced the class of star-shaped risk measures as a generalization of convex and coherent ones, proving that they can be represented as the pointwise minimum of a family of convex risk measures. Concomitantly, Jia et al. (2020) proved a similar representation result for monetary risk measures, which are more general than star-shaped ones. This raises the question of how the two classes are connected. In this letter, we provide an answer by casting light on the importance of the acceptability of 0, which is linked to the property of normalization. We then show that, under mild conditions, a monetary risk measure is only a translation away from star-shapedness.

【3】 A Tutorial on Time-Dependent Cohort State-Transition Models in R using a Cost-Effectiveness Analysis Example. Link: https://arxiv.org/abs/2108.13552

Authors: Fernando Alarid-Escudero, Eline M. Krijkamp, Eva A. Enns, Alan Yang, M. G. Myriam Hunink, Petros Pechlivanoglou, Hawre Jalal. Comments: 41 pages, 12 figures. arXiv admin note: text overlap with arXiv:2001.07824.

Abstract: This tutorial shows how to implement time-dependent cohort state-transition models (cSTMs) to conduct cost-effectiveness analyses (CEA) in R, where transition probabilities and rewards vary by time. We account for two types of time dependency: time since the start of the simulation (simulation-time dependency) and time spent in a health state (state residence dependency). We illustrate how to conduct a CEA of multiple strategies based on a time-dependent cSTM using a previously published cSTM, including probabilistic sensitivity analyses. We also demonstrate how to compute various epidemiological outcomes of interest from the outputs generated from the cSTM, such as survival probability and disease prevalence. We present both the mathematical notation and the R code to execute the calculations. This tutorial builds upon an introductory tutorial that introduces time-independent cSTMs using a CEA example in R. We provide an up-to-date public code repository for broader implementation.
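To make the idea of simulation-time dependency concrete, here is a minimal sketch of a cohort trace whose transition matrix changes each cycle. It is written in Python rather than the tutorial's R, and it is not the authors' model: the three states, every parameter value, and the names `transition_matrix`, `run_cohort`, and `rr_sick` are assumptions chosen only for illustration. State-residence dependency is not shown.

```python
import numpy as np

STATES = ("Healthy", "Sick", "Dead")  # hypothetical three-state model

def transition_matrix(cycle, p_sick=0.15, base_mort=0.01, mort_growth=1.05, rr_sick=3.0):
    """Transition matrix for one cycle; mortality depends on simulation time.

    All parameter values are illustrative assumptions, not taken from the tutorial.
    """
    p_hd = min(base_mort * mort_growth ** cycle, 1.0)  # Healthy -> Dead, grows per cycle
    p_sd = min(rr_sick * p_hd, 1.0)                    # Sick -> Dead, scaled multiplier
    return np.array([
        [(1 - p_hd) * (1 - p_sick), (1 - p_hd) * p_sick, p_hd],
        [0.0, 1 - p_sd, p_sd],
        [0.0, 0.0, 1.0],  # Dead is absorbing
    ])

def run_cohort(n_cycles):
    """Return the (n_cycles + 1, 3) cohort trace, starting with everyone Healthy."""
    trace = np.zeros((n_cycles + 1, len(STATES)))
    trace[0] = [1.0, 0.0, 0.0]
    for t in range(n_cycles):
        trace[t + 1] = trace[t] @ transition_matrix(t)
    return trace

trace = run_cohort(n_cycles=40)

# Epidemiological outcomes derived from the trace.
survival = 1.0 - trace[:, 2]             # proportion alive at each cycle
prevalence = trace[:, 1] / survival      # share of the living who are Sick

# Discounted expected cost per person (hypothetical per-cycle state costs, 3% discount).
state_costs = np.array([500.0, 4000.0, 0.0])
discount = 1.0 / (1.0 + 0.03) ** np.arange(trace.shape[0])
total_cost = float((trace @ state_costs) @ discount)
```

Comparing strategies in a CEA would amount to running such a trace once per strategy with different transition parameters or costs and comparing the resulting totals.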

【4】 Submission Fees in Risk-Taking Contests. Link: https://arxiv.org/abs/2108.13506

Author: Mark Whitmeyer.

Abstract: This paper investigates stochastic continuous-time contests with a twist: the designer requires that contest participants incur some cost to submit their entries. When the designer wishes to maximize the (expected) performance of the top performer, a strictly positive submission fee is optimal. When the designer wishes to maximize total (expected) performance, either the highest submission fee or the lowest submission fee is optimal.

2. cs.SD (Sound):

【1】 Self-Supervised Learning Based Domain Adaptation for Robust Speaker Verification. Link: https://arxiv.org/abs/2108.13843

Authors: Zhengyang Chen, Shuai Wang, Yanmin Qian. Affiliation: MoE Key Lab of Artificial Intelligence, AI Institute, SpeechLab, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China. Comments: Published in ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

Abstract: Large performance degradation is often observed for speaker verification systems when applied to a new domain dataset. Given an unlabeled target-domain dataset, unsupervised domain adaptation (UDA) methods, which usually leverage adversarial training strategies, are commonly used to bridge the performance gap caused by the domain mismatch. However, such an adversarial training strategy only uses the distribution information of target-domain data and cannot ensure a performance improvement on the target domain. In this paper, we incorporate a self-supervised learning strategy into the unsupervised domain adaptation system and propose a self-supervised learning based domain adaptation approach (SSDA). Compared to the traditional UDA method, the new SSDA training strategy can fully leverage the potential label information from the target domain and adapt the speaker discrimination ability from the source domain simultaneously. We evaluated the proposed approach on the VoxCeleb (labeled source domain) and CnCeleb (unlabeled target domain) datasets, and the best SSDA system obtains 10.2% Equal Error Rate (EER) on the CnCeleb dataset without using any speaker labels on CnCeleb, which also achieves state-of-the-art results on this corpus.
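The Equal Error Rate quoted above is the operating point at which the false-acceptance rate equals the false-rejection rate. The following is a minimal sketch of one common way to estimate it from verification scores; the function name `equal_error_rate`, the threshold sweep, and the synthetic scores are assumptions for illustration and do not reproduce the paper's evaluation pipeline.

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Estimate EER by sweeping the decision threshold over all observed scores.

    A trial is accepted when its score is >= the threshold.
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target, nontarget]))
    # False rejection: same-speaker trials scored below the threshold.
    frr = np.array([(target < th).mean() for th in thresholds])
    # False acceptance: different-speaker trials scored at or above the threshold.
    far = np.array([(nontarget >= th).mean() for th in thresholds])
    idx = np.argmin(np.abs(far - frr))  # threshold where the two rates are closest
    return float((far[idx] + frr[idx]) / 2.0)

# Synthetic scores (hypothetical): same-speaker trials score higher on average.
rng = np.random.default_rng(0)
same_speaker = rng.normal(loc=1.0, scale=1.0, size=2000)
diff_speaker = rng.normal(loc=-1.0, scale=1.0, size=2000)
eer = equal_error_rate(same_speaker, diff_speaker)
print(f"EER = {100 * eer:.2f}%")
```

Averaging the two rates at the closest threshold is a simple approximation; toolkits often interpolate between neighboring thresholds instead.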

【2】 Adversarial Example Devastation and Detection on Speech Recognition System by Adding Random Noise. Link: https://arxiv.org/abs/2108.13562

Authors: Mingyu Dong, Diqun Yan, Yongkang Gong, Rangding Wang. Comments: 20 pages, 5 figures.

Abstract: Automatic speech recognition (ASR) systems based on deep neural networks are easily attacked by adversarial examples because of the vulnerability of neural networks, a hot topic in recent years. Adversarial examples harm the ASR system; in particular, if a command-dependent ASR goes wrong, the consequences can be serious. To improve the robustness and security of ASR systems, defense methods against adversarial examples must be proposed. Based on this idea, we propose an algorithm for the devastation and detection of adversarial examples that can attack current advanced ASR systems. We choose advanced text-dependent and command-dependent ASR systems as our target systems. Adversarial examples are generated by the OPT on text-dependent ASR and by the GA-based algorithm on command-dependent ASR. The main idea of our method is input transformation of the adversarial examples. Different random intensities and kinds of noise are added to the adversarial examples to devastate the perturbation previously added to the normal examples. The experimental results show that the method performs well. For the devastation of examples, the original speech similarity before and after adding noise can reach 99.68%, the similarity of the adversarial examples can reach 0%, and the detection rate of the adversarial examples can reach 94%.
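The input-transformation idea in this abstract can be sketched as follows: transcribe the audio, add random noise, transcribe again, and flag the input when the two transcripts diverge. This is only an illustration under stated assumptions, not the authors' algorithm. The `transcribe` argument is a hypothetical callable standing in for any ASR system, and the white Gaussian noise, the 20 dB SNR, the character-level similarity from `difflib`, and the 0.5 threshold are all arbitrary choices made for the example.

```python
import difflib
import numpy as np

def add_noise(waveform, snr_db, rng):
    """Add white Gaussian noise at the requested signal-to-noise ratio (in dB)."""
    waveform = np.asarray(waveform, dtype=float)
    signal_power = np.mean(waveform ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=waveform.shape)
    return waveform + noise

def transcript_similarity(a, b):
    """Character-level similarity in [0, 1] between two transcripts."""
    return difflib.SequenceMatcher(None, a, b).ratio()

def looks_adversarial(waveform, transcribe, snr_db=20.0, threshold=0.5, seed=0):
    """Flag the input if its transcript changes a lot once noise is added.

    `transcribe` is a hypothetical callable mapping a waveform to text.
    """
    rng = np.random.default_rng(seed)
    before = transcribe(waveform)
    after = transcribe(add_noise(waveform, snr_db, rng))
    return transcript_similarity(before, after) < threshold
```

The intuition is that benign speech should keep nearly the same transcript under mild noise, whereas a carefully optimized perturbation tends to stop working once it is disturbed.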

【3】 Music Demixing Challenge at ISMIR 2021. Link: https://arxiv.org/abs/2108.13559

Authors: Yuki Mitsufuji, Giorgio Fabbro, Stefan Uhlich, Fabian-Robert Stöter. Affiliations: Sony Group Corporation, Japan; Sony Europe B.V., Germany; INRIA, France.

Abstract: Music source separation has been intensively studied in the last decade, and tremendous progress has been observed with the advent of deep learning. Evaluation campaigns such as MIREX or SiSEC connected state-of-the-art models and corresponding papers, which can help researchers integrate the best practices into their models. In recent years, however, it has become increasingly difficult to measure real-world performance, as the music separation community had to rely on a limited amount of test data and was biased towards specific genres and mixing styles. To address these issues, we designed the Music Demixing (MDX) Challenge on a crowd-based machine learning competition platform where the task is to separate stereo songs into four instrument stems (Vocals, Drums, Bass, Other). The main differences compared with past challenges are that 1) the competition is designed to more easily allow machine learning practitioners from other disciplines to participate, and 2) evaluation is done on a hidden test set created by music professionals dedicated exclusively to the challenge to assure the transparency of the challenge, i.e., the test set is not included in the training set. In this paper, we provide the details of the datasets, baselines, evaluation metrics, evaluation results, and technical challenges for future competitions.

3. eess.AS (Audio and Speech Processing):

【1】 Neural Sequence-to-Sequence Speech Synthesis Using a Hidden Semi-Markov Model Based Structured Attention Mechanism. Link: https://arxiv.org/abs/2108.13985

Authors: Yoshihiko Nankaku, Kenta Sumiya, Takenori Yoshimura, Shinji Takaki, Kei Hashimoto, Keiichiro Oura, Keiichi Tokuda. Affiliation: Nagoya Institute of Technology, Japan. Comments: 5 pages, 3 figures.

Abstract: This paper proposes a novel Sequence-to-Sequence (Seq2Seq) model integrating the structure of Hidden Semi-Markov Models (HSMMs) into its attention mechanism. In speech synthesis, it has been shown that methods based on Seq2Seq models using deep neural networks can synthesize high-quality speech under the appropriate conditions. However, several essential problems still remain, i.e., the need for large amounts of training data due to excessive degrees of freedom in alignment (the mapping function between two sequences), and the difficulty of handling duration due to the lack of explicit duration modeling. The proposed method defines a generative model to realize the simultaneous optimization of alignments and model parameters based on the Variational Auto-Encoder (VAE) framework, and provides monotonic alignments and explicit duration modeling based on the structure of the HSMM. The proposed method can be regarded as an integration of Hidden Markov Model (HMM) based speech synthesis and deep learning based speech synthesis using Seq2Seq models, incorporating the benefits of both. Subjective evaluation experiments showed that the proposed method obtained higher mean opinion scores than Tacotron 2 on a relatively small amount of training data.
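As a rough illustration of what "monotonic alignment with explicit duration modeling" means, the sketch below turns one duration per input state into a hard, left-to-right alignment matrix and uses it to pick an encoder vector for every output frame. It is not the paper's method, which learns alignments and parameters jointly in a VAE framework. The shifted-Poisson duration sampling, the mean durations, the 8-dimensional encoder states, and the name `alignment_from_durations` are assumptions made only for this example.

```python
import numpy as np

def alignment_from_durations(durations):
    """Build a hard monotonic alignment matrix of shape (n_frames, n_states).

    Row t has a single 1 in the column of the state that emits frame t.
    """
    durations = np.asarray(durations, dtype=int)
    # State index for each output frame, e.g. durations [2, 1] -> [0, 0, 1].
    state_of_frame = np.repeat(np.arange(len(durations)), durations)
    alignment = np.zeros((len(state_of_frame), len(durations)))
    alignment[np.arange(len(state_of_frame)), state_of_frame] = 1.0
    return alignment

rng = np.random.default_rng(0)

# Hypothetical mean duration (in frames) for each of four input states.
mean_durations = np.array([3.0, 5.0, 2.0, 4.0])
# Shifted Poisson so that every state lasts at least one frame.
durations = 1 + rng.poisson(mean_durations - 1.0)

alignment = alignment_from_durations(durations)
encoder_states = rng.normal(size=(len(durations), 8))  # stand-in encoder outputs
context = alignment @ encoder_states                    # one context vector per frame
```

Because each state occupies a contiguous block of frames, the alignment can never skip backwards or repeat an earlier state, which is the constraint unconstrained soft attention lacks.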

【2】 Maximum F1-score training for end-to-end mispronunciation detection and diagnosis of L2 English speech. Link: https://arxiv.org/abs/2108.13816

Authors: Bi-Cheng Yan, Shao-Wei Fan Jiang, Fu-An Chao, Berlin Chen. Affiliation: National Taiwan Normal University, Taiwan. Comments: 6 pages, 4 figures.

Abstract: End-to-end (E2E) neural models are increasingly attracting attention as a promising modeling approach for mispronunciation detection and diagnosis (MDD). Typically, these models are trained by optimizing a cross-entropy criterion, which corresponds to improving the log-likelihood of the training data. However, there is a discrepancy between the objectives of model training and the MDD evaluation, since the performance of an MDD model is commonly evaluated in terms of F1-score instead of word error rate (WER). In view of this, in this paper we explore the use of a discriminative objective function for training E2E MDD models, which aims to maximize the expected F1-score directly. To further facilitate maximum F1-score training, we randomly perturb fractions of the labels of phonetically confusing pairs in the training utterances of L2 (second language) learners to generate artificial pronunciation error patterns for data augmentation. A series of experiments conducted on the L2-ARCTIC dataset show that our proposed method can yield considerable performance improvements in relation to some state-of-the-art E2E MDD approaches and the conventional GOP method.
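The quantity being maximized can be illustrated with a small sketch: compute the F1-score of each candidate hypothesis against the reference mispronunciation flags, then average those scores under the model's normalized hypothesis probabilities. The abstract does not say how the expectation is approximated, so the use of a short candidate list, the binary per-phone flags, and the names `f1_score` and `expected_f1` are assumptions for illustration only.

```python
import numpy as np

def f1_score(predicted, reference):
    """F1 for binary mispronunciation flags (1 = flagged as mispronounced)."""
    predicted = np.asarray(predicted, dtype=bool)
    reference = np.asarray(reference, dtype=bool)
    tp = np.sum(predicted & reference)    # correctly flagged errors
    fp = np.sum(predicted & ~reference)   # correct phones flagged as errors
    fn = np.sum(~predicted & reference)   # missed errors
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom > 0 else 1.0

def expected_f1(hypotheses, log_probs, reference):
    """Probability-weighted average F1 over a list of candidate hypotheses."""
    log_probs = np.asarray(log_probs, dtype=float)
    weights = np.exp(log_probs - log_probs.max())  # stable softmax over candidates
    weights /= weights.sum()
    scores = np.array([f1_score(h, reference) for h in hypotheses])
    return float(weights @ scores)

# Hypothetical example: four phones, the first and third are mispronounced.
reference = [1, 0, 1, 0]
hypotheses = [[1, 0, 1, 0], [1, 1, 0, 0]]
log_probs = np.log([0.5, 0.5])
value = expected_f1(hypotheses, log_probs, reference)
```

In training, the negative of such an expectation would serve as the loss, so that probability mass shifts toward hypotheses with higher F1; that step needs a differentiable framework and is omitted here.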

【3】 Music Demixing Challenge at ISMIR 2021. Link: https://arxiv.org/abs/2108.13559

Authors: Yuki Mitsufuji, Giorgio Fabbro, Stefan Uhlich, Fabian-Robert Stöter. Cross-listed with cs.SD; the affiliations and abstract appear under entry 【3】 of section 2 (cs.SD) above.

【4】 Self-Supervised Learning Based Domain Adaptation for Robust Speaker Verification. Link: https://arxiv.org/abs/2108.13843

Authors: Zhengyang Chen, Shuai Wang, Yanmin Qian. Cross-listed with cs.SD; the affiliation, comments, and abstract appear under entry 【1】 of section 2 (cs.SD) above.

【5】 Adversarial Example Devastation and Detection on Speech Recognition System by Adding Random Noise. Link: https://arxiv.org/abs/2108.13562

Authors: Mingyu Dong, Diqun Yan, Yongkang Gong, Rangding Wang. Cross-listed with cs.SD; the comments and abstract appear under entry 【2】 of section 2 (cs.SD) above.

Machine translation; for reference only.


Originally published: 2021-09-01.
