Finance / Speech / Audio Processing arXiv Daily Digest [9.2]

Source: WeChat official account arXiv每日学术速递 (arXiv Daily Academic Digest)

Published: 2021-09-16 14:56:47

Included in the column: arXiv每日学术速递

q-fin (Finance): 10 papers

cs.SD (Speech): 7 papers

eess.AS (Audio Processing): 6 papers

1. q-fin (Finance):

【1】 The Potential of Sufficiency Measures to Achieve a Fully Renewable Energy System -- A case study for Germany
Link: https://arxiv.org/abs/2109.00453

Authors: Elmar Zozmann, Mirjam Helena Eerma, Dylan Manning, Gro Lill Økland, Citlali Rodriguez del Angel, Paul E. Seifert, Johanna Winkler, Alfredo Zamora Blaumann, Leonard Göke, Mario Kendziorski, Christian von Hirschhausen
Abstract: The paper provides energy system-wide estimates of the effects sufficiency measures in different sectors can have on energy supply and system costs. In distinction to energy efficiency, we define sufficiency as behavioral changes to reduce useful energy without significantly reducing utility, for example by adjusting thermostats. By reducing demand, sufficiency measures are a potentially decisive but seldom considered factor to support the transformation towards a decarbonized energy system. Therefore, this paper addresses the following question: What is the potential of sufficiency measures and what is their impact on the supply side of a 100% renewable energy system? For this purpose, an extensive literature review is conducted to obtain estimates for the effects of different sufficiency measures on final energy demand in Germany. Afterwards, the impact of these measures on the supply side and system costs is quantified using a bottom-up planning model of a renewable energy system. Results indicate that final energy could be reduced by up to 20.5% and, as a result, cost reductions of between 11.3% and 25.6% are conceivable. The greatest potential for sufficiency measures was identified in the heating sector.

【2】 Decentralized Payment Clearing using Blockchain and Optimal Bidding
Link: https://arxiv.org/abs/2109.00446

Authors: Hamed Amini, Maxim Bichuch, Zachary Feinstein
Comments: 30 pages
Abstract: In this paper, we construct a decentralized clearing mechanism which endogenously and automatically provides a claims resolution procedure. This mechanism can be used to clear a network of obligations through blockchain. In particular, we investigate default contagion in a network of smart contracts cleared through blockchain. In so doing, we provide an algorithm which constructs the blockchain so as to guarantee the payments can be verified and the miners earn a fee. We additionally consider the special case in which the blocks have unbounded capacity to provide a simple equilibrium clearing condition for the terminal net worths; existence and uniqueness are proven for this system. Finally, we consider the optimal bidding strategies for each firm in the network so that all firms are utility maximizers with respect to their terminal wealths. We first look for mixed Nash equilibrium bidding strategies, and then also consider Pareto optimal bidding strategies. The implications of these strategies, and more broadly blockchain, on systemic risk are considered.

【3】 Closed-form portfolio optimization under GARCH models
Link: https://arxiv.org/abs/2109.00433

Authors: Marcos Escobar-Anel, Maximilian Gollart, Rudi Zagst
Affiliations: Department of Statistical and Actuarial Sciences, University of Western Ontario, London, ON, Canada; Department of Mathematics, Technical University of Munich, Munich, Germany
Abstract: This paper develops the first closed-form optimal portfolio allocation formula for a spot asset whose variance follows a GARCH(1,1) process. We consider an investor with constant relative risk aversion (CRRA) utility who wants to maximize the expected utility from terminal wealth under a Heston and Nandi (2000) GARCH (HN-GARCH) model. We obtain closed formulas for the optimal investment strategy, the value function and the optimal terminal wealth. We find the optimal strategy is independent of the development of the risky asset, and the solution converges to that of a continuous-time Heston stochastic volatility model, albeit under additional conditions. For a daily trading scenario, the optimal solutions are quite robust to variations in the parameters, while the numerical wealth equivalent loss (WEL) analysis shows good performance of the Heston solution, with a quite inferior performance of the Merton solution.
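
Illustrative sketch (not taken from the paper): for readers unfamiliar with the HN-GARCH model named in the abstract above, the snippet below simulates its standard variance recursion, h_{t+1} = ω + β h_t + α (z_t − γ √h_t)², with log-returns r + λ h_t + √h_t z_t. All function names and parameter values are hypothetical choices for illustration, and the paper's closed-form portfolio strategy is not reproduced here.

```python
import math
import random


def hn_garch_step(h, z, omega, alpha, beta, gamma):
    """One step of the Heston-Nandi GARCH(1,1) variance recursion."""
    return omega + beta * h + alpha * (z - gamma * math.sqrt(h)) ** 2


def long_run_variance(omega, alpha, beta, gamma):
    """Stationary mean of h, valid only when beta + alpha * gamma**2 < 1."""
    persistence = beta + alpha * gamma ** 2
    if persistence >= 1.0:
        raise ValueError("variance process is not stationary")
    return (omega + alpha) / (1.0 - persistence)


def simulate_log_returns(n, h0, r, lam, omega, alpha, beta, gamma, seed=0):
    """Simulate n log-returns and the conditional variances that drove them."""
    rng = random.Random(seed)
    h = h0
    returns, variances = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)  # standard normal shock
        returns.append(r + lam * h + math.sqrt(h) * z)
        variances.append(h)
        h = hn_garch_step(h, z, omega, alpha, beta, gamma)
    return returns, variances


# Hypothetical daily-scale parameters, chosen only so the process is stationary.
PARAMS = dict(omega=1e-6, alpha=1e-6, beta=0.9, gamma=100.0)
returns, variances = simulate_log_returns(250, h0=1e-4, r=0.0, lam=2.0, **PARAMS)
```

Because ω > 0 and the squared term is nonnegative, the simulated variance stays strictly positive without any clipping, which is one reason this recursion is convenient to work with.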

【4】 Multiple-prior valuation of cash flows subject to capital requirements
Link: https://arxiv.org/abs/2109.00306

Authors: Hampus Engsner, Filip Lindskog, Julie Thoegersen
Comments: 1 figure
Abstract: We study market-consistent valuation of liability cash flows motivated by current regulatory frameworks for the insurance industry. Building on the theory of multiple-prior optimal stopping, we propose a valuation functional with sound economic properties that applies to any liability cash flow. Whereas a replicable cash flow is assigned the market value of the replicating portfolio, a cash flow that is not fully replicable is assigned a value which is the sum of the market value of a replicating portfolio and a positive margin. The margin is a direct consequence of considering a hypothetical transfer of the liability cash flow from an insurance company to an empty corporate entity set up with the sole purpose of managing the liability run-off, subject to repeated capital requirements, and considering the valuation of this entity from the owner's perspective, taking model uncertainty into account. Aiming for applicability, we consider a detailed insurance application and explain how the optimisation problems over sets of probability measures can be cast as simpler optimisation problems over parameter sets corresponding to parameterised density processes appearing in applications.

【5】 Nota Sobre Algumas Interpretacoes da Teoria de Tributacao Otima (English: Note on Some Interpretations of the Theory of Optimal Taxation)
Link: https://arxiv.org/abs/2109.00297

Authors: Jose Ricardo Bezerra Nogueira
Comments: in Portuguese
Abstract: This note discusses some aspects of interpretations of the theory of optimal taxation presented in recent works on the Brazilian tax system.

【6】 Multi Anchor Point Shrinkage for the Sample Covariance Matrix (Extended Version)
Link: https://arxiv.org/abs/2109.00148

Authors: Hubeyb Gurdogan, Alec Kercheval
Comments: 60 pages, 6 figures
Abstract: Portfolio managers faced with limited sample sizes must use factor models to estimate the covariance matrix of a high-dimensional returns vector. For the simplest one-factor market model, success rests on the quality of the estimated leading eigenvector "beta". When only the returns themselves are observed, the practitioner has available the "PCA" estimate equal to the leading eigenvector of the sample covariance matrix. This estimator performs poorly in various ways. To address this problem in the high-dimension, limited sample size asymptotic regime and in the context of estimating the minimum variance portfolio, Goldberg, Papanicolaou, and Shkolnik developed a shrinkage method (the "GPS estimator") that improves the PCA estimator of beta by shrinking it toward a constant target unit vector. In this paper we continue their work to develop a more general framework of shrinkage targets that allows the practitioner to make use of further information to improve the estimator. Examples include sector separation of stock betas, and recent information from prior estimates. We prove some precise statements and illustrate the resulting improvements over the GPS estimator with some numerical experiments.
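
Illustrative sketch (not taken from the paper): the abstract above describes shrinking an estimated leading eigenvector toward a constant target unit vector. The snippet below shows only the geometric idea, a convex combination followed by renormalization. The function names and the free `weight` parameter are hypothetical; the GPS estimator chooses its shrinkage amount from the data, and that formula is not reproduced here.

```python
import math


def normalize(v):
    """Rescale v to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def constant_unit_vector(p):
    """The p-dimensional unit vector with all entries equal."""
    return [1.0 / math.sqrt(p)] * p


def shrink_toward_target(b, target, weight):
    """Move unit vector b toward the target; weight=0 keeps b, weight=1 gives the target.

    Assumes b is oriented so that its inner product with the target is
    nonnegative (an eigenvector's sign is arbitrary, so flip it first if needed).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    mixed = [(1.0 - weight) * bi + weight * ti for bi, ti in zip(b, target)]
    return normalize(mixed)


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


# Hypothetical PCA estimate of "beta" for three assets.
pca_beta = [0.6, 0.8, 0.0]
target = constant_unit_vector(3)
shrunk = shrink_toward_target(pca_beta, target, weight=0.5)
```

After shrinkage the vector is still unit length, and its cosine with the constant target is larger than that of the raw PCA estimate.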

【7】 Evaluation of the importance of criteria for the selection of cryptocurrencies
Link: https://arxiv.org/abs/2109.00130

Authors: Natalia A. Van Heerden, Juan B. Cabral, Nadia Luczywo
Affiliations: Universidad Blas Pascal, Córdoba, Argentina; Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas, CONICET–UNR, Argentina; Instituto de Astronomía Teórica y Experimental, CONICET–UNC, Argentina
Abstract: In recent years, cryptocurrencies have gone from an obscure niche to a prominent place, with investment in these assets becoming increasingly popular. However, cryptocurrencies carry a high risk due to their high volatility. In this paper, criteria based on historical cryptocurrency data are defined in order to characterize returns and risks in different ways, in short time windows (7 and 15 days); then, the importance of the criteria is analyzed by various methods and their impact is evaluated. Finally, future work is planned to use the knowledge obtained to select investment portfolios by applying multi-criteria methods.

【8】 Proceedings of KDD 2020 Workshop on Data-driven Humanitarian Mapping: Harnessing Human-Machine Intelligence for High-Stake Public Policy and Resilience Planning
Link: https://arxiv.org/abs/2109.00435

Authors: Snehalkumar S. Gaikwad, Shankar Iyer, Dalton Lunga, Yu-Ru Lin
Comments: The proceedings of the 1st Data-driven Humanitarian Mapping workshop at the 26th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. August 24th, 2020
Abstract: Humanitarian challenges, including natural disasters, food insecurity, climate change, racial and gender violence, environmental crises, the COVID-19 coronavirus pandemic, human rights violations, and forced displacements, disproportionately impact vulnerable communities worldwide. According to UN OCHA, 235 million people will require humanitarian assistance in 2021. Despite these growing perils, there remains a notable paucity of data science research to scientifically inform equitable public policy decisions for improving the livelihood of at-risk populations. Scattered data science efforts exist to address these challenges, but they remain isolated from practice and prone to algorithmic harms concerning lack of privacy, fairness, interpretability, accountability, transparency, and ethics. Biases in data-driven methods carry the risk of amplifying inequalities in high-stakes policy decisions that impact the livelihood of millions of people. Consequently, proclaimed benefits of data-driven innovations remain inaccessible to policymakers, practitioners, and marginalized communities at the core of humanitarian actions and global development. To help fill this gap, we propose the Data-driven Humanitarian Mapping Research Program, which focuses on developing novel data science methodologies that harness human-machine intelligence for high-stakes public policy and resilience planning.

【9】 Proceedings of KDD 2021 Workshop on Data-driven Humanitarian Mapping: Harnessing Human-Machine Intelligence for High-Stake Public Policy and Resilience Planning
Link: https://arxiv.org/abs/2109.00100

Authors: Snehalkumar S. Gaikwad, Shankar Iyer, Dalton Lunga, Elizabeth Bondi
Comments: The proceedings of the 2nd Data-driven Humanitarian Mapping workshop at the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. August 15th, 2021
Abstract: Humanitarian challenges, including natural disasters, food insecurity, climate change, racial and gender violence, environmental crises, the COVID-19 coronavirus pandemic, human rights violations, and forced displacements, disproportionately impact vulnerable communities worldwide. According to UN OCHA, 235 million people will require humanitarian assistance in 2021. Despite these growing perils, there remains a notable paucity of data science research to scientifically inform equitable public policy decisions for improving the livelihood of at-risk populations. Scattered data science efforts exist to address these challenges, but they remain isolated from practice and prone to algorithmic harms concerning lack of privacy, fairness, interpretability, accountability, transparency, and ethics. Biases in data-driven methods carry the risk of amplifying inequalities in high-stakes policy decisions that impact the livelihood of millions of people. Consequently, proclaimed benefits of data-driven innovations remain inaccessible to policymakers, practitioners, and marginalized communities at the core of humanitarian actions and global development. To help fill this gap, we propose the Data-driven Humanitarian Mapping Research Program, which focuses on developing novel data science methodologies that harness human-machine intelligence for high-stakes public policy and resilience planning.

【10】 Controlled Measure-Valued Martingales: a Viscosity Solution Approach
Link: https://arxiv.org/abs/2109.00064

Authors: Alexander M. G. Cox, Sigrid Källblad, Martin Larsson, Sara Svaluto-Ferro
Affiliations: Department of Mathematics, KTH Royal Institute of Technology
Comments: 49 pages
Abstract: We consider a class of stochastic control problems where the state process is a probability measure-valued process satisfying an additional martingale condition on its dynamics, called measure-valued martingales (MVMs). We establish the 'classical' results of stochastic control for these problems: specifically, we prove that the value function for the problem can be characterised as the unique solution to the Hamilton-Jacobi-Bellman equation in the sense of viscosity solutions. In order to prove this result, we exploit structural properties of the MVM processes. Our results also include an appropriate version of Itô's lemma for controlled MVMs. We also show how problems of this type arise in a number of applications, including model-independent derivatives pricing, the optimal Skorokhod embedding problem, and two-player games with asymmetric information.

2. cs.SD (Speech):

【1】 Mean absorption estimation from room impulse responses using virtually supervised learning
Link: https://arxiv.org/abs/2109.00393

Authors: Cédric Foy, Antoine Deleforge, Diego Di Carlo
Affiliations: UMRAE, Cerema, Univ. Gustave Eiffel, Ifsttar, Strasbourg, France; Université de Lorraine, CNRS, Inria, LORIA, Nancy, France; Univ Rennes, Inria, CNRS, IRISA, France
Abstract: In the context of building acoustics and the acoustic diagnosis of an existing room, this paper introduces and investigates a new approach to estimate mean absorption coefficients solely from a room impulse response (RIR). This inverse problem is tackled via virtually-supervised learning, namely, the RIR-to-absorption mapping is implicitly learned by regression on a simulated dataset using artificial neural networks. We focus on simple models based on well-understood architectures. The critical choices of geometric, acoustic and simulation parameters used to train the models are extensively discussed and studied, while keeping in mind conditions that are representative of the field of building acoustics. Estimation errors from the learned neural models are compared to those obtained with classical formulas that require knowledge of the room's geometry and reverberation times. Extensive comparisons made on a variety of simulated test sets highlight different conditions under which the learned models can overcome the well-known limitations of the diffuse sound field hypothesis underlying these formulas. Results obtained on real RIRs measured in an acoustically configurable room show that at 1 kHz and above, the proposed approach performs comparably to classical models when reverberation times can be reliably estimated, and continues to work even when they cannot.
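
Illustrative sketch (not taken from the paper): the abstract above compares against "classical formulas" that need the room geometry and reverberation time but does not name them. The snippet below assumes these are of the Sabine and Eyring type, both of which rest on the diffuse sound field hypothesis, and inverts them for the mean absorption coefficient. The room dimensions, the RT60 value, and all function names are hypothetical.

```python
import math

# Approximate Sabine constant in s/m for air at room temperature.
SABINE_CONSTANT = 0.161


def shoebox_volume_and_surface(length, width, height):
    """Volume (m^3) and total wall surface (m^2) of a rectangular room."""
    volume = length * width * height
    surface = 2.0 * (length * width + length * height + width * height)
    return volume, surface


def mean_absorption_sabine(volume, surface, rt60):
    """Invert Sabine's formula RT60 = 0.161 * V / (S * alpha) for alpha."""
    return SABINE_CONSTANT * volume / (surface * rt60)


def mean_absorption_eyring(volume, surface, rt60):
    """Invert Eyring's formula RT60 = 0.161 * V / (-S * ln(1 - alpha)) for alpha."""
    return 1.0 - math.exp(-SABINE_CONSTANT * volume / (surface * rt60))


# Hypothetical 5 m x 4 m x 3 m room with a measured RT60 of 0.5 s.
volume, surface = shoebox_volume_and_surface(5.0, 4.0, 3.0)
alpha_sabine = mean_absorption_sabine(volume, surface, rt60=0.5)
alpha_eyring = mean_absorption_eyring(volume, surface, rt60=0.5)
```

For the same inputs the Eyring estimate is always below the Sabine estimate, and the gap widens as absorption grows.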

【2】 Benchmarking and challenges in security and privacy for voice biometrics
Link: https://arxiv.org/abs/2109.00281

Authors: Jean-Francois Bonastre, Hector Delgado, Nicholas Evans, Tomi Kinnunen, Kong Aik Lee, Xuechen Liu, Andreas Nautsch, Paul-Gauthier Noe, Jose Patino, Md Sahidullah, Brij Mohan Lal Srivastava, Massimiliano Todisco, Natalia Tomashenko, Emmanuel Vincent, Xin Wang, Junichi Yamagishi
Affiliations: ASVspoof and VoicePrivacy organising committees
Comments: Submitted to the symposium of the ISCA Security & Privacy in Speech Communications (SPSC) special interest group
Abstract: For many decades, research in speech technologies has focused upon improving reliability. With this now meeting user expectations for a range of diverse applications, speech technology is today omnipresent. As a result, a focus on security and privacy has now come to the fore. Here, the research effort is in its relative infancy and progress calls for greater, multidisciplinary collaboration with security, privacy, legal and ethical experts among others. Such collaboration is now underway. To help catalyse the efforts, this paper provides a high-level overview of some related research. It targets the non-speech audience and describes the benchmarking methodology that has spearheaded progress in traditional research and which now drives recent security and privacy initiatives related to voice biometrics. We describe the ASVspoof challenge, relating to the development of spoofing countermeasures, and the VoicePrivacy initiative, which promotes research in anonymisation for privacy preservation.

【3】 Embedding and Beamforming: All-neural Causal Beamformer for Multichannel Speech Enhancement
Link: https://arxiv.org/abs/2109.00265

Authors: Andong Li, Wenzhe Liu, Chengshi Zheng, Xiaodong Li
Affiliations: Key Laboratory of Noise and Vibration Research, Institute of Acoustics, Chinese Academy; University of Chinese Academy of Sciences, Beijing, China
Comments: Submitted to ICASSP 2022, first version
Abstract: The spatial covariance matrix has been considered to be significant for beamformers. Standing upon the intersection of traditional beamformers and deep neural networks, we propose a causal neural beamformer paradigm called Embedding and Beamforming, and two core modules are designed accordingly, namely EM and BM. For EM, instead of estimating the spatial covariance matrix explicitly, a 3-D embedding tensor is learned with the network, where both spectral and spatial discriminative information can be represented. For BM, a network is directly leveraged to derive the beamforming weights so as to implement the filter-and-sum operation. To further improve the speech quality, a post-processing module is introduced to suppress the residual noise. Based on the DNS-Challenge dataset, we conduct experiments for multichannel speech enhancement and the results show that the proposed system outperforms previous advanced baselines by a large margin in multiple evaluation metrics.
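
Illustrative sketch (not taken from the paper): the abstract above says the BM module outputs beamforming weights that are then applied with a filter-and-sum operation. The snippet below shows that final step for one time frame in the frequency domain, using the common convention y[f] = Σ_m conj(w[m][f]) · x[m][f]. The weights here are hand-picked hypothetical values rather than network outputs, and the paper may use a different conjugation convention.

```python
def filter_and_sum(weights, mixture):
    """Apply per-channel, per-frequency complex weights and sum over channels.

    weights[m][f] and mixture[m][f] are complex values for microphone m and
    frequency bin f of a single time frame. Returns one complex value per bin.
    """
    n_bins = len(mixture[0])
    return [
        sum(w_m[f].conjugate() * x_m[f] for w_m, x_m in zip(weights, mixture))
        for f in range(n_bins)
    ]


# Two microphones, two frequency bins (hypothetical values).
weights = [[0.5 + 0j, 1j], [0.5 + 0j, 0j]]
mixture = [[1 + 1j, 1j], [1 - 1j, 2 + 0j]]
enhanced = filter_and_sum(weights, mixture)
```

In the first bin the weights simply average the two channels; in the second bin only the first microphone contributes.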

【4】 A Separable Temporal Convolution Neural Network with Attention for Small-Footprint Keyword Spotting
Link: https://arxiv.org/abs/2109.00260

Authors: Shenghua Hu, Jing Wang, Yujun Wang, Lidong Yang, Wenjing Yang
Affiliations: School of Information and Electronics, Beijing Institute of Technology, Beijing, China; Xiaomi Inc., Beijing, China; School of Information Engineering, Inner Mongolia University of Science and Technology, Baotou
Comments: arXiv admin note: text overlap with arXiv:2108.12146
Abstract: Keyword spotting (KWS) on mobile devices generally requires a small memory footprint. However, most current models still maintain a large number of parameters in order to ensure good performance. To solve this problem, this paper proposes a separable temporal convolution neural network with attention that has a small number of parameters. By combining temporal convolution with an attention mechanism, a model with few parameters (32.2K) is implemented while maintaining high performance. The proposed model achieves 95.7% accuracy on the Google Speech Commands dataset, which is close to the performance of Res15 (239K), the state-of-the-art model in KWS at present.

【5】 Prior Distribution Design for Music Bleeding-Sound Reduction Based on Nonnegative Matrix Factorization
Link: https://arxiv.org/abs/2109.00237

Authors: Yusaku Mizobuchi, Daichi Kitamura, Tomohiko Nakamura, Hiroshi Saruwatari, Yu Takahashi, Kazunobu Kondo
Affiliations: National Institute of Technology, Kagawa College, Kagawa, Japan; The University of Tokyo, Tokyo, Japan; Yamaha Corporation, Shizuoka, Japan
Comments: Accepted and will be presented at APSIPA2021
Abstract: When we place microphones close to a sound source near other sources in audio recording, the obtained audio signal includes undesired sound from the other sources, which is often called cross-talk or bleeding sound. For many audio applications including onstage sound reinforcement and sound editing after a live performance, it is important to reduce the bleeding sound in each recorded signal. However, since microphones are spatially apart from each other in this situation, typical phase-aware blind source separation (BSS) methods cannot be used. We propose a phase-insensitive method for blind bleeding-sound reduction. This method is based on time-channel nonnegative matrix factorization, which is a BSS method using only amplitude spectrograms. With the proposed method, we introduce a gamma-distribution-based prior for the leakage levels of bleeding sounds. Its optimization can be interpreted as maximum a posteriori estimation. The experimental results of music bleeding-sound reduction indicate that the proposed method is more effective for bleeding-sound reduction of music signals than other BSS methods.

【6】 CTAL: Pre-training Cross-modal Transformer for Audio-and-Language Representations
Link: https://arxiv.org/abs/2109.00181

Authors: Hang Li, Yu Kang, Tianqiao Liu, Wenbiao Ding, Zitao Liu
Affiliations: TAL Education Group, Beijing, China
Comments: The 2021 Conference on Empirical Methods in Natural Language Processing
Abstract: Existing audio-language task-specific predictive approaches focus on building complicated late-fusion mechanisms. However, these models face the challenges of overfitting with limited labels and low model generalization abilities. In this paper, we present a Cross-modal Transformer for Audio-and-Language, i.e., CTAL, which aims to learn the intra-modality and inter-modality connections between audio and language through two proxy tasks on a large number of audio-and-language pairs: masked language modeling and masked cross-modal acoustic modeling. After fine-tuning our pre-trained model on multiple downstream audio-and-language tasks, we observe significant improvements across various tasks, such as emotion classification, sentiment analysis, and speaker verification. On this basis, we further propose a specially designed fusion mechanism that can be used in the fine-tuning phase, which allows our pre-trained model to achieve better performance. Lastly, we present detailed ablation studies to show that both our novel cross-modality fusion component and audio-language pre-training methods significantly contribute to the promising results.

【7】 Automatic non-invasive Cough Detection based on Accelerometer and Audio Signals
Link: https://arxiv.org/abs/2109.00103

Authors: Madhurananda Pahar, Igor Miranda, Andreas Diacon, Thomas Niesler
Affiliations: Department of Electrical and Electronic Engineering, Stellenbosch University, South Africa
Comments: arXiv admin note: text overlap with arXiv:2102.04997
Abstract: We present an automatic non-invasive way of detecting cough events based on both accelerometer and audio signals. The acceleration signals are captured by a smartphone firmly attached to the patient's bed, using its integrated accelerometer. The audio signals are captured simultaneously by the same smartphone using an external microphone. We have compiled a manually-annotated dataset containing such simultaneously-captured acceleration and audio signals for approximately 6000 cough and 68000 non-cough events from 14 adult male patients in a tuberculosis clinic. LR, SVM and MLP are evaluated as baseline classifiers and compared with deep architectures such as CNN, LSTM, and Resnet50 using a leave-one-out cross-validation scheme. We find that the studied classifiers can use either acceleration or audio signals to distinguish between coughing and other activities including sneezing, throat-clearing, and movement on the bed with high accuracy. However, in all cases, the deep neural networks outperform the shallow classifiers by a clear margin and the Resnet50 offers the best performance by achieving an AUC exceeding 0.98 and 0.99 for acceleration and audio signals respectively. While audio-based classification consistently offers a better performance than acceleration-based classification, we observe that the difference is very small for the best systems. Since the acceleration signal requires less processing power, since the need to record audio is sidestepped and thus privacy is inherently secured, and since the recording device is attached to the bed and not worn, an accelerometer-based highly accurate non-invasive cough detector may represent a more convenient and readily accepted method in long-term cough monitoring.
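
Illustrative sketch (not taken from the paper): the abstract above mentions a leave-one-out cross-validation scheme on data from 14 patients. The snippet below assumes, without confirmation from the abstract, that one patient is held out per fold, and shows how such splits could be generated. The function name and the toy patient labels are hypothetical.

```python
def leave_one_group_out(groups):
    """Yield (held_out, train_indices, test_indices) for each distinct group.

    groups[i] is the group label (here, a patient ID) of event i. Every event
    of the held-out group goes to the test set, so no patient contributes to
    both training and testing within a fold.
    """
    for held_out in sorted(set(groups)):
        train_idx = [i for i, g in enumerate(groups) if g != held_out]
        test_idx = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train_idx, test_idx


# Five events recorded from three hypothetical patients.
patient_of_event = ["p1", "p1", "p2", "p3", "p3"]
folds = list(leave_one_group_out(patient_of_event))
```

Splitting by patient rather than by event avoids evaluating a classifier on coughs from a person it has already seen during training.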

3. eess.AS (Audio Processing):