Finance / Speech / Audio Processing Academic Digest [6.29]

Author: WeChat public account arXiv每日学术速递 (arXiv Daily Academic Digest)
Published 2021-07-02 17:29:20

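Visit www.arxivdaily.com for the digest with abstracts, covering CS, physics, mathematics, economics, statistics, finance, biology, and electrical engineering, with search, bookmarking, and posting features.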

q-fin Finance: 9 papers

cs.SD Speech: 5 papers

eess.AS Audio Processing: 6 papers

1. q-fin Finance:

【1】 On Stochastic PDEs for the pricing of derivatives in a multi-dimensional diffusion framework

Authors: Kaustav Das, Ivan Guo, Grégoire Loeper Affiliation: School of Mathematics, Monash University Link: https://arxiv.org/abs/2106.14870 Abstract: In a multi-dimensional diffusion framework, the price of a financial derivative can be expressed as an iterated conditional expectation, where the inner conditional expectation conditions on the future of an auxiliary process that enters into the dynamics for the spot. Inspired by results from non-linear filtering theory, we show that this inner conditional expectation solves a backward SPDE (a so-called "conditional Feynman-Kac formula"), thereby establishing a connection between SPDE and derivative pricing theory. The benefits of this representation are potentially significant and of both theoretical and practical interest. In particular, this representation leads to an alternative class of so-called mixed Monte-Carlo / PDE numerical methods.

【2】 Risk contributions of lambda quantiles

Authors: Akif Ince, Ilaria Peri, Silvana Pesenti Affiliation: Department of Economics, University of London Link: https://arxiv.org/abs/2106.14824 Abstract: Risk contributions of portfolios form an indispensable part of risk adjusted performance measurement. The risk contribution of a portfolio, e.g., in the Euler or Aumann-Shapley framework, is given by the partial derivatives of a risk measure applied to the portfolio return in direction of the asset weights. For risk measures that are not positively homogeneous of degree 1, however, known capital allocation principles do not apply. We study the class of lambda quantile risk measures, that includes the well-known Value-at-Risk as a special case, but for which no known allocation rule is applicable. We prove differentiability and derive explicit formulae of the derivatives of lambda quantiles with respect to their portfolio composition, that is their risk contribution. For this purpose, we define lambda quantiles on the space of portfolio compositions and consider generic (also non-linear) portfolio operators. We further derive the Euler decomposition of lambda quantiles for generic portfolios and show that lambda quantiles are homogeneous in the space of portfolio compositions, with a homogeneity degree that depends on the portfolio composition and the lambda function. This result is in stark contrast to the positive homogeneity properties of risk measures defined on the space of random variables which admit a constant homogeneity degree. We introduce a generalised version of Euler contributions and Euler allocation rule, which are compatible with risk measures of any homogeneity degree and non-linear portfolios. We further provide financial interpretations of the homogeneity degree of lambda quantiles and introduce the notion of event-specific homogeneity of portfolio operators.

【3】 UNISWAP: Impermanent Loss and Risk Profile of a Liquidity Provider

Authors: Andreas A. Aigner, Gurvinder Dhaliwal Affiliation: TradeFlags, Vienna, Austria; Fuel Ventures, London, United Kingdom Comments: 16 pages, 8 Figures, 1 Table Link: https://arxiv.org/abs/2106.14404 Abstract: Uniswap is a decentralized exchange (DEX) and was first launched on November 2, 2018 on the Ethereum mainnet [1] and is part of an Ecosystem of products in Decentralized Finance (DeFi). It replaces a traditional order book type of trading common on centralized exchanges (CEX) with a deterministic model that swaps currencies (or tokens/assets) along a fixed price function determined by the amount of currencies supplied by the liquidity providers. Liquidity providers can be regarded as investors in the decentralized exchange and earn fixed commissions per trade. They lock up funds in liquidity pools for distinct pairs of currencies allowing market participants to swap them using the fixed price function. Liquidity providers take on market risk as a liquidity provider in exchange for earning commissions on each trade. Here we analyze the risk profile of a liquidity provider and the so called impermanent (unrealized) loss in particular. We provide an improved version of the commonly denoted impermanent loss function for Uniswap v2 on the semi-infinite domain. The differences between Uniswap v2 and v3 are also discussed.
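As a rough illustration of the impermanent loss mentioned in this abstract, the sketch below uses the commonly cited constant-product form, 2*sqrt(r)/(1+r) - 1, where r is the ratio of the current price to the price at deposit. It is not the improved function proposed in the paper, fees are ignored, and the function names are made up for this example.

```python
import math


def impermanent_loss(price_ratio):
    """Value of a constant-product LP position relative to simply holding, minus 1.

    price_ratio is the current price divided by the price at deposit.
    Fees are ignored (an assumption of this sketch).
    """
    if price_ratio <= 0:
        raise ValueError("price_ratio must be positive")
    return 2.0 * math.sqrt(price_ratio) / (1.0 + price_ratio) - 1.0


def pool_reserves(k, price):
    """Reserves (x, y) of a pool with x * y = k whose marginal price y / x equals price."""
    x = math.sqrt(k / price)
    y = math.sqrt(k * price)
    return x, y


# Cross-check the closed form against the reserves directly.
x0, y0 = pool_reserves(100.0, 1.0)   # deposit at price 1
x1, y1 = pool_reserves(100.0, 4.0)   # price moves to 4
hold_value = x0 * 4.0 + y0           # value of the original tokens at the new price
lp_value = x1 * 4.0 + y1             # value of the pool position at the new price
loss_from_reserves = lp_value / hold_value - 1.0
```

With these numbers both routes give a loss of 20 percent relative to holding, and the loss is zero only when the price returns to its level at deposit.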

【4】 Bitcoin, Currencies, and Bubbles

Authors: Nassim Nicholas Taleb Affiliation: Universa Investments; Tandon School of Engineering, New York University Comments: Accepted in Quantitative Finance Link: https://arxiv.org/abs/2106.14204 Abstract: We apply quantitative finance methods and economic arguments to cryptocurrencies in general and bitcoin in particular -- as there are about 10,000 cryptocurrencies, we focus (unless otherwise specified) on the most discussed crypto of those that claim to hew to the original protocol (Nakamoto, 2009) and the one with, by far, the largest market capitalization. In its current version, in spite of the hype, bitcoin failed to satisfy the notion of "currency without government" (it proved to not even be a currency at all), can be neither a short nor long term store of value (its expected value is no higher than 0), cannot operate as a reliable inflation hedge, and, worst of all, does not constitute, not even remotely, a safe haven for one's investments, a shield against government tyranny, nor a tail protection vehicle for catastrophic episodes. Furthermore, there appears to be an underlying conflation between the success of a payment mechanism (as a decentralized mode of exchange), which so far has failed, and the speculative variations in the price of a zero-sum asset with massive negative externalities. Going through monetary history, we also show how a true numeraire must be one of minimum variance with respect to an arbitrary basket of goods and services, how gold and silver lost their inflation hedge status during the Hunt brothers squeeze in the late 1970s and what would be required from a true inflation hedged store of value.

【5】 Hierarchical contagions in the interdependent financial network

Authors: William A. Barnett, Xue Wang, Hai-Chuan Xu, Wei-Xing Zhou Affiliation: Department of Economics, University of Kansas, Lawrence, USA; Center for Financial Stability, New York, USA; Institute of Chinese Financial Studies, Southwestern University of Finance and Economics, Chengdu, China Comments: 13 pages, 3 figures Link: https://arxiv.org/abs/2106.14168 Abstract: We model hierarchical cascades of failures among banks linked through an interdependent network. The interaction among banks include not only direct cross-holding, but also indirect dependency by holding mutual assets outside the banking system. Using data extracted from the European Banking Authority, we present the interdependency network composed of 48 banks and 21 asset classes. Since interbank exposures are not public, we first reconstruct the asset/liability cross-holding network using the aggregated claims. For the robustness, we employ three reconstruction methods, called $\textit{Anan}$, $\textit{Ha\l{}a}$ and $\textit{Maxe}$. Then we combine the external portfolio holdings of each bank to compute the interdependency matrix. The interdependency network is much denser than the direct cross-holding network, showing the complex latent interaction among banks. Finally, we perform macroprudential stress tests for the European banking system, using the adverse scenario in EBA stress test as the initial shock. For different reconstructed networks, we illustrate the hierarchical cascades and show that the failure hierarchies are roughly the same except for a few banks, reflecting the overlapping portfolio holding accounts for the majority of defaults. Understanding the interdependency network and the hierarchy of the cascades should help to improve policy intervention and implement rescue strategy.

【6】 Optimal investment and proportional reinsurance in a regime-switching market model under forward preferences

Authors: Katia Colaneri, Alessandra Cretarola, Benedetta Salterini Affiliation: Department of Mathematics and Computer Science, University of Perugia Comments: 32 pages, 6 figures, 1 table Link: https://arxiv.org/abs/2106.13888 Abstract: In this paper we study the optimal investment and reinsurance problem of an insurance company whose investment preferences are described via a forward dynamic exponential utility in a regime-switching market model. Financial and actuarial frameworks are dependent since stock prices and insurance claims vary according to a common factor given by a continuous time finite state Markov chain. We construct the value function and we prove that it is a forward dynamic utility. Then, we characterize the investment strategy and the optimal proportional level of reinsurance. We also perform numerical experiments and provide sensitivity analyses with respect to some model parameters.

【7】 Rational Pricing of Leveraged ETF Expense Ratios

Authors: Alex Garivaltis Comments: 40 pages, 13 figures Link: https://arxiv.org/abs/2106.14820 Abstract: This paper studies the general relationship between the gearing ratio of a Leveraged ETF and its corresponding expense ratio, viz., the investment management fees that are charged for the provision of this levered financial service. It must not be possible for an investor to combine two or more LETFs in such a way that his (continuously-rebalanced) LETF portfolio can match the gearing ratio of a given, professionally managed product and, at the same time, enjoy lower weighted-average expenses than the existing LETF. Given a finite set of LETFs that exist in the marketplace, I give necessary and sufficient conditions for these products to be undominated in the price-gearing plane. In a beautiful application of the duality theorem of linear programming, I prove a kind of two-fund theorem for LETFs: given a target gearing ratio for the investor, the cheapest way to achieve it is to combine (uniquely) the two nearest undominated LETF products that bracket it on the leverage axis. This also happens to be the implementation that has the lowest annual turnover. For the writer's enjoyment, we supply a second proof of the Main Theorem on LETFs that is based on Carathéodory's theorem in convex geometry. Thus, say, a triple-leveraged ("UltraPro") exchange-traded product should never be mixed with cash, if the investor is able to trade in the underlying index. In terms of financial innovation, our two-fund theorem for LETFs implies that the introduction of new, undominated 2.5x products would increase the welfare of all investors whose preferred gearing ratios lie between 2x ("Ultra") and 3x ("UltraPro"). Similarly for a 1.5x product.
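The two-fund statement in this abstract can be pictured with a small calculation: pick the two products whose gearing ratios bracket the target, weight them so the blended gearing hits the target, and read off the weighted-average expense. The sketch below assumes every product passed in is already undominated (it does not test for that), and the gearing and expense figures are invented for illustration, not taken from the paper.

```python
def cheapest_bracket(products, target):
    """Blend the two nearest products that bracket the target gearing.

    products: list of (gearing, expense_ratio) pairs with distinct gearings,
              all assumed undominated (not checked here).
    Returns (lower_gearing, upper_gearing, weight_on_lower, blended_expense).
    """
    ordered = sorted(products)
    for (g_lo, e_lo), (g_hi, e_hi) in zip(ordered, ordered[1:]):
        if g_lo <= target <= g_hi:
            # Solve w * g_lo + (1 - w) * g_hi = target for the weight w.
            w_lo = (g_hi - target) / (g_hi - g_lo)
            expense = w_lo * e_lo + (1.0 - w_lo) * e_hi
            return g_lo, g_hi, w_lo, expense
    raise ValueError("target gearing is not bracketed by the product set")


# Hypothetical 1x, 2x and 3x products with expenses convex in gearing.
example_products = [(1.0, 0.0010), (2.0, 0.0050), (3.0, 0.0100)]
g_lo, g_hi, w_lo, blended = cheapest_bracket(example_products, 2.5)
```

For a 2.5x target the sketch selects the 2x and 3x products in equal weights. A new undominated 2.5x product would only help such an investor if its expense ratio were below the blended figure.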

【8】 Inheritances, social classes, and wealth distribution

Authors: Pedro Patrício, Nuno A. M. Araújo Affiliation: Universidade de Lisboa Comments: 7 pages, 7 figures Link: https://arxiv.org/abs/2106.14758 Abstract: We consider a simple theoretical model to investigate the impact of inheritances on the wealth distribution. Wealth is described as a finite resource, which remains constant over different generations and is divided equally among offspring. All other sources of wealth are neglected. We consider different societies characterized by a different offspring probability distribution. We find that, if the population remains constant, the society reaches a stationary wealth distribution. We show that inequality emerges every time the number of children per family is not always the same. For realistic offspring distributions from developed countries, the model predicts a Gini coefficient of $G\approx 0.3$. If we divide the society into wealth classes and set the probability of getting married to depend on the distance between classes, the stationary wealth distribution crosses over from an exponential to a power-law regime as the number of wealth classes and the level of class distinction increase.
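To make the Gini coefficient and the equal-split rule in this abstract concrete, here is a minimal sketch. It is a deliberate simplification rather than the authors' model: marriage and wealth classes are left out, each family simply divides its wealth equally among its children, and the function names are hypothetical.

```python
def gini(wealth):
    """Gini coefficient of a list of non-negative wealth values."""
    values = sorted(wealth)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("need at least one value and positive total wealth")
    # Rank-weighted sum over the sorted values (ranks start at 1).
    weighted = sum((rank + 1) * v for rank, v in enumerate(values))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


def split_inheritance(wealth, offspring_counts):
    """One generation step: each family's wealth is divided equally among its children."""
    next_generation = []
    for w, k in zip(wealth, offspring_counts):
        if k < 1:
            raise ValueError("this sketch assumes every family has at least one child")
        next_generation.extend([w / k] * k)
    return next_generation


equal_families = split_inheritance([6.0, 6.0], [2, 2])    # same number of children
unequal_families = split_inheritance([6.0, 6.0], [1, 3])  # different numbers of children
```

Starting from two equally wealthy families, the Gini coefficient stays at zero when both have two children and rises to 0.25 when one has a single child and the other has three, which mirrors the abstract's point that inequality appears as soon as family sizes differ.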

【9】 On the Design of an Insurance Mechanism for Reliability Differentiation in Electricity Markets

Authors: Farhad Billimoria, Filiberto Fele, Iacopo Savelli, Thomas Morstyn, Malcolm McCulloch Affiliation: Department of Engineering Science, University of Oxford Comments: 11 pages, 8 figures Link: https://arxiv.org/abs/2106.14351 Abstract: Securing an adequate supply of dispatchable resources is critical for keeping a power system reliable under high penetrations of variable generation. Traditional resource adequacy mechanisms are poorly suited to exploiting the growing flexibility and heterogeneity of load enabled by advancements in distributed resource and control technology. To address these challenges this paper develops a resource adequacy mechanism for the electricity sector utilising insurance risk management frameworks that is adapted to a future with variable generation and flexible demand. The proposed design introduces a central insurance scheme with prudential requirements that align diverse consumer reliability preferences with the financial objectives of an insurer-of-last-resort. We illustrate the benefits of the scheme in (i) differentiating load by usage to enable better management of the system during times of extreme scarcity, (ii) incentivising incremental investment in generation infrastructure that is aligned with consumer reliability preferences and (iii) improving overall reliability outcomes for consumers.

2. cs.SD Speech:

【1】 Sparsely Overlapped Speech Training in the Time Domain: Joint Learning of Target Speech Separation and Personal VAD Benefits

Authors: Qingjian Lin, Lin Yang, Xuyang Wang, Luyuan Xie, Chen Jia, Junjie Wang Affiliation: AI Lab, Lenovo Research, Beijing, China Comments: Rejected by Interspeech 2021. Plan to commit to ICASSP 2022 Link: https://arxiv.org/abs/2106.14371 Abstract: Target speech separation is the process of filtering a certain speaker's voice out of speech mixtures according to the additional speaker identity information provided. Recent works have made considerable improvement by processing signals in the time domain directly. The majority of them take fully overlapped speech mixtures for training. However, since most real-life conversations occur randomly and are sparsely overlapped, we argue that training with different overlap ratio data benefits. To do so, an unavoidable problem is that the popularly used SI-SNR loss has no definition for silent sources. This paper proposes the weighted SI-SNR loss, together with the joint learning of target speech separation and personal VAD. The weighted SI-SNR loss imposes a weight factor that is proportional to the target speaker's duration and returns zero when the target speaker is absent. Meanwhile, the personal VAD generates masks and sets non-target speech to silence. Experiments show that our proposed method outperforms the baseline by 1.73 dB in terms of SDR on fully overlapped speech, as well as by 4.17 dB and 0.9 dB on sparsely overlapped speech of clean and noisy conditions. Besides, with slight degradation in performance, our model could reduce the time costs in inference.
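The weighted SI-SNR idea described in this abstract can be sketched in a few lines. The SI-SNR part follows the usual scale-invariant definition. The weighting is an assumption made for illustration (the fraction of frames in which the target speaker is active), since the abstract only states that the weight is proportional to the target speaker's duration and that the loss is zero when the speaker is absent. All names here are hypothetical, and plain Python lists stand in for audio tensors.

```python
import math


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-noise ratio in dB between two equal-length signals."""
    n = len(target)
    mean_e = sum(estimate) / n
    mean_t = sum(target) / n
    e = [x - mean_e for x in estimate]   # remove the mean of each signal
    t = [x - mean_t for x in target]
    dot = sum(a * b for a, b in zip(e, t))
    t_energy = sum(b * b for b in t) + eps
    projection = [dot / t_energy * b for b in t]          # part of e along the target
    noise = [a - p for a, p in zip(e, projection)]        # everything else
    ratio = (sum(p * p for p in projection) + eps) / (sum(x * x for x in noise) + eps)
    return 10.0 * math.log10(ratio)


def weighted_si_snr_loss(estimate, target, active_frames, total_frames):
    """Negative SI-SNR scaled by the target speaker's share of the segment.

    Assumed weighting: active_frames / total_frames. Returns 0.0 when the
    target speaker is absent, where SI-SNR itself is undefined.
    """
    if active_frames == 0:
        return 0.0
    weight = active_frames / total_frames
    return -weight * si_snr(estimate, target)


target_signal = [1.0, -1.0, 1.0, -1.0]
estimated_signal = [0.9, -1.1, 1.2, -0.8]
full_loss = weighted_si_snr_loss(estimated_signal, target_signal, 100, 100)
half_loss = weighted_si_snr_loss(estimated_signal, target_signal, 50, 100)
silent_loss = weighted_si_snr_loss([0.01, -0.02, 0.0, 0.01], [0.0] * 4, 0, 100)
```

Returning zero for an absent speaker sidesteps the undefined ratio that arises when the reference signal is all zeros, which is the problem the abstract attributes to the plain SI-SNR loss.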

【2】 Query-graph with Cross-gating Attention Model for Text-to-Audio Grounding

Authors: Haoyu Tang, Jihua Zhu, Qinghai Zheng, Zhiyong Cheng Affiliation: School of Software Engineering, Xi'an Jiaotong University Comments: 10 pages Link: https://arxiv.org/abs/2106.14136 Abstract: In this paper, we address the text-to-audio grounding issue, namely, grounding the segments of the sound event described by a natural language query in the untrimmed audio. This is a newly proposed but challenging audio-language task, since it requires to not only precisely localize all the on- and off-sets of the desired segments in the audio, but to perform comprehensive acoustic and linguistic understandings and reason the multimodal interactions between the audio and query. To tackle those problems, the existing method treats the query holistically as a single unit by a global query representation, which fails to highlight the keywords that contain rich semantics. Besides, this method has not fully exploited interactions between the query and audio. Moreover, since the audio and queries are arbitrary and variable in length, many meaningless parts of them are not filtered out in this method, which hinders the grounding of the desired segments. To this end, we propose a novel Query Graph with Cross-gating Attention (QGCA) model, which models the comprehensive relations between the words in query through a novel query graph. Besides, to capture the fine-grained interactions between audio and query, a cross-modal attention module that assigns higher weights to the keywords is introduced to generate the snippet-specific query representations. Finally, we also design a cross-gating module to emphasize the crucial parts as well as weaken the irrelevant ones in the audio and query. We extensively evaluate the proposed QGCA model on the public Audiogrounding dataset with significant improvements over several state-of-the-art methods. Moreover, further ablation study shows the consistent effectiveness of different modules in the proposed QGCA model.

【3】 Transflower: probabilistic autoregressive dance generation with multimodal attention

Authors: Guillermo Valle-Pérez, Gustav Eje Henter, Jonas Beskow, André Holzapfel, Pierre-Yves Oudeyer, Simon Alexanderson Affiliation: KTH Royal Institute of Technology Link: https://arxiv.org/abs/2106.13871 Abstract: Dance requires skillful composition of complex movements that follow rhythmic, tonal and timbral features of music. Formally, generating dance conditioned on a piece of music can be expressed as a problem of modelling a high-dimensional continuous motion signal, conditioned on an audio signal. In this work we make two contributions to tackle this problem. First, we present a novel probabilistic autoregressive architecture that models the distribution over future poses with a normalizing flow conditioned on previous poses as well as music context, using a multimodal transformer encoder. Second, we introduce the currently largest 3D dance-motion dataset, obtained with a variety of motion-capture technologies, and including both professional and casual dancers. Using this dataset, we compare our new model against two baselines, via objective metrics and a user study, and show that both the ability to model a probability distribution, as well as being able to attend over a large motion and music context are necessary to produce interesting, diverse, and realistic dance that matches the music.

【4】 Use of Variational Inference in Music Emotion Recognition

Authors: Nathalie Deziderio, Hugo Tremonte de Carvalho Affiliation: Rio de Janeiro, Brasil Link: https://arxiv.org/abs/2106.14323 Abstract: This work was developed aiming to employ Statistical techniques to the field of Music Emotion Recognition, a well-recognized area within the Signal Processing world, but hardly explored from the statistical point of view. Here, we opened several possibilities within the field, applying modern Bayesian Statistics techniques and developing efficient algorithms, focusing on the applicability of the results obtained. Although the motivation for this project was the development of an emotion-based music recommendation system, its main contribution is a highly adaptable multivariate model that can be useful interpreting any database where there is an interest in applying regularization in an efficient manner. Broadly speaking, we will explore what role a sound theoretical statistical analysis can play in the modeling of an algorithm that is able to understand a well-known database and what can be gained with this kind of approach.

【5】 An Audio Envelope Generator Derived from Industrial Process Control

Authors: Ashwin Pillay Affiliation: University of Mumbai, Mumbai, India Link: https://arxiv.org/abs/2106.13966 Abstract: Audio envelopes serve a crucial role in ensuring the versatility of synthesizers in producing timbres. To this end, the Attack, Decay, Release and Sustain (ADSR) envelope generator and its derivatives have been established as a mainstay in modern music. However, there may be merit in exploring alternate techniques to produce envelopes that could not only resemble ADSR but also be used to create novel timbres. Consequently, an attempt is made in this research to formulate the framework of a new envelope generator by redefining the Proportional-Integral-Derivative (PID) algorithm used in feedback-based process control. Additionally, a detailed analysis is made on the modes of operation and the nature of envelopes thus generated to establish it as a potential harbinger of distinctive styles of music.
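One way to picture a PID-driven envelope is a discrete control loop in which the gate signal is the setpoint and the envelope level is the controlled quantity. The sketch below is a generic textbook PID loop under assumptions made for this example (the controller output is treated as the rate of change of the level, and the level is clamped to [0, 1]). It is not the formulation from the paper, and the gains and names are hypothetical.

```python
def pid_envelope(gate, kp, ki, kd, dt=1.0):
    """Generate an envelope by letting a PID loop track the gate signal.

    gate: sequence of setpoints, e.g. 1.0 while a key is held and 0.0 after release.
    The controller output is applied as the rate of change of the level
    (an assumption of this sketch), and the level is clamped to [0, 1].
    """
    level = 0.0
    integral = 0.0
    prev_error = 0.0
    envelope = []
    for setpoint in gate:
        error = setpoint - level
        integral += error * dt
        derivative = (error - prev_error) / dt
        level += (kp * error + ki * integral + kd * derivative) * dt
        level = min(1.0, max(0.0, level))
        prev_error = error
        envelope.append(level)
    return envelope


# Key held for 20 steps, then released for 20 steps, proportional term only.
gate_signal = [1.0] * 20 + [0.0] * 20
env = pid_envelope(gate_signal, kp=0.5, ki=0.0, kd=0.0)
```

With only the proportional gain the result is an exponential attack followed by an exponential release. Adding integral or derivative gain would be expected to produce overshoot or ringing, which is presumably where shapes unlike ADSR come from.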

3. eess.AS Audio Processing:

【1】 Mobile Microphone Array Speech Detection and Localization in Diverse Everyday Environments

Authors: Pasi Pertilä, Emre Cakir, Aapo Hakala, Eemi Fagerlund, Tuomas Virtanen, Archontis Politis, Antti Eronen Affiliation: Tampere University, Finland; Nokia Technologies Oy Comments: to be published in the proceedings of the 29th European Signal Processing Conference, EUSIPCO 2021 Link: https://arxiv.org/abs/2106.14787 Abstract: Joint sound event localization and detection (SELD) is an integral part of developing context awareness into communication interfaces of mobile robots, smartphones, and home assistants. For example, an automatic audio focus for video capture on a mobile phone requires robust detection of relevant acoustic events around the device and their direction. Existing SELD approaches have been evaluated using material produced in controlled indoor environments, or the audio is simulated by mixing isolated sounds to different spatial locations. This paper studies SELD of speech in diverse everyday environments, where the audio corresponds to typical usage scenarios of handheld mobile devices. In order to allow weighting the relative importance of localization vs. detection, we will propose a two-stage hierarchical system, where the first stage is to detect the target events, and the second stage is to localize them. The proposed method utilizes convolutional recurrent neural network (CRNN) and is evaluated on a database of manually annotated microphone array recordings from various acoustic conditions. The array is embedded in a contemporary mobile phone form factor. The obtained results show good speech detection and localization accuracy of the proposed method in contrast to a non-hierarchical flat classification model.

【2】 An Audio Envelope Generator Derived from Industrial Process Control

Authors: Ashwin Pillay Affiliation: University of Mumbai, Mumbai, India Link: https://arxiv.org/abs/2106.13966 Abstract: also listed under cs.SD above as 【5】, where the abstract appears.

【3】 Sparsely Overlapped Speech Training in the Time Domain: Joint Learning of Target Speech Separation and Personal VAD Benefits

Authors: Qingjian Lin, Lin Yang, Xuyang Wang, Luyuan Xie, Chen Jia, Junjie Wang Affiliation: AI Lab, Lenovo Research, Beijing, China Comments: Rejected by Interspeech 2021. Plan to commit to ICASSP 2022 Link: https://arxiv.org/abs/2106.14371 Abstract: also listed under cs.SD above as 【1】, where the abstract appears.

【4】 Use of Variational Inference in Music Emotion Recognition

Authors: Nathalie Deziderio, Hugo Tremonte de Carvalho Affiliation: Rio de Janeiro, Brasil Link: https://arxiv.org/abs/2106.14323 Abstract: also listed under cs.SD above as 【4】, where the abstract appears.

【5】 Query-graph with Cross-gating Attention Model for Text-to-Audio Grounding

Authors: Haoyu Tang, Jihua Zhu, Qinghai Zheng, Zhiyong Cheng
Affiliation: School of Software Engineering, Xi’an Jiaotong University (Qinghai Zheng)
Comments: 10 pages
Link: https://arxiv.org/abs/2106.14136
Abstract: In this paper, we address the text-to-audio grounding issue, namely, grounding the segments of the sound event described by a natural language query in the untrimmed audio. This is a newly proposed but challenging audio-language task, since it requires not only precisely localizing all the onsets and offsets of the desired segments in the audio, but also performing comprehensive acoustic and linguistic understanding and reasoning about the multimodal interactions between the audio and query. To tackle those problems, the existing method treats the query holistically as a single unit by a global query representation, which fails to highlight the keywords that contain rich semantics. Besides, this method has not fully exploited interactions between the query and audio. Moreover, since the audio and queries are arbitrary and variable in length, many meaningless parts of them are not filtered out in this method, which hinders the grounding of the desired segments. To this end, we propose a novel Query Graph with Cross-gating Attention (QGCA) model, which models the comprehensive relations between the words in the query through a novel query graph. Besides, to capture the fine-grained interactions between audio and query, a cross-modal attention module that assigns higher weights to the keywords is introduced to generate the snippet-specific query representations. Finally, we also design a cross-gating module to emphasize the crucial parts as well as weaken the irrelevant ones in the audio and query. We extensively evaluate the proposed QGCA model on the public Audiogrounding dataset with significant improvements over several state-of-the-art methods. Moreover, a further ablation study shows the consistent effectiveness of different modules in the proposed QGCA model.
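
The cross-gating step mentioned in this abstract can be sketched in a generic form. The abstract does not give the QGCA formulation, so the version below is an assumption: each modality is multiplied element-wise by a sigmoid gate computed from a linear map of the other modality. The names `cross_gate`, `W_q2a`, and `W_a2q` are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def cross_gate(audio, query, W_q2a, W_a2q):
    """Gate each modality with a sigmoid of a linear map of the other.

    audio: feature vector of one audio snippet
    query: feature vector of the query
    W_q2a: hypothetical weights mapping query features to audio gates
    W_a2q: hypothetical weights mapping audio features to query gates
    """
    gate_for_audio = [sigmoid(z) for z in matvec(W_q2a, query)]
    gate_for_query = [sigmoid(z) for z in matvec(W_a2q, audio)]
    # Gates near 1 keep a feature; gates near 0 suppress it.
    gated_audio = [a * g for a, g in zip(audio, gate_for_audio)]
    gated_query = [q * g for q, g in zip(query, gate_for_query)]
    return gated_audio, gated_query
```

In a trained model the weight matrices would be learned, so that audio features irrelevant to the query receive small gates and vice versa.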

【6】 Transflower: probabilistic autoregressive dance generation with multimodal attention

Authors: Guillermo Valle-Pérez, Gustav Eje Henter, Jonas Beskow, André Holzapfel, Pierre-Yves Oudeyer, Simon Alexanderson
Affiliation: KTH Royal Institute of Technology
Link: https://arxiv.org/abs/2106.13871
Abstract: Dance requires skillful composition of complex movements that follow rhythmic, tonal and timbral features of music. Formally, generating dance conditioned on a piece of music can be expressed as a problem of modelling a high-dimensional continuous motion signal, conditioned on an audio signal. In this work we make two contributions to tackle this problem. First, we present a novel probabilistic autoregressive architecture that models the distribution over future poses with a normalizing flow conditioned on previous poses as well as music context, using a multimodal transformer encoder. Second, we introduce the currently largest 3D dance-motion dataset, obtained with a variety of motion-capture technologies, and including both professional and casual dancers. Using this dataset, we compare our new model against two baselines, via objective metrics and a user study, and show that both the ability to model a probability distribution and the ability to attend over a large motion and music context are necessary to produce interesting, diverse, and realistic dance that matches the music.

Shared from the WeChat official account arXiv每日学术速递. Originally published 2021-06-29; to report infringement, contact cloudcommunity@tencent.com.
