Statistics arXiv Daily Digest [7.7]

WeChat official account: arXiv Daily Academic Digest

Published 2021-07-27 10:26:35

stat (Statistics), 38 papers in total

【1】 A provable two-stage algorithm for penalized hazards regression

Authors: Jianqing Fan, Wenyan Gong, Qiang Sun
Affiliation: Department of Operations Research and Financial Engineering, Princeton University
Comments: 42 pages
Link: https://arxiv.org/abs/2107.02730
Abstract: From an optimizer's perspective, achieving the global optimum for a general nonconvex problem is often provably NP-hard using the classical worst-case analysis. In the case of Cox's proportional hazards model, by taking its statistical model structures into account, we identify local strong convexity near the global optimum, motivated by which we propose to use two convex programs to optimize the folded-concave penalized Cox's proportional hazards regression. Theoretically, we investigate the statistical and computational tradeoffs of the proposed algorithm and establish the strong oracle property of the resulting estimators. Numerical studies and real data analysis lend further support to our algorithm and theory.
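Editorial illustration for entry 【1】: the abstract does not say which folded-concave penalty is used or how the two convex programs are built. As a hedged sketch only, the snippet below assumes the SCAD penalty and shows how its derivative could turn a first-stage estimate into per-coefficient weights for a second, weighted-L1 stage. The function names `scad_derivative` and `second_stage_weights` are hypothetical and do not come from the paper.

```python
def scad_derivative(t, lam, a=3.7):
    """Derivative of the SCAD penalty evaluated at |t|, for lam > 0 and a > 2.

    Equals lam on [0, lam], decays linearly on (lam, a*lam], and is 0 beyond a*lam.
    """
    t = abs(t)
    if t <= lam:
        return lam
    return max(a * lam - t, 0.0) / (a - 1.0)


def second_stage_weights(beta_init, lam, a=3.7):
    """Assumed reweighting step: one L1 weight per first-stage coefficient.

    Large first-stage coefficients get weight 0 (no shrinkage);
    small ones keep the full lasso weight lam.
    """
    return [scad_derivative(b, lam, a) for b in beta_init]


if __name__ == "__main__":
    print(second_stage_weights([0.0, 0.5, 2.0, 5.0], lam=1.0))
```

Under this assumption, coefficients that the first stage estimates as clearly nonzero are left unpenalized in the second stage, which is the intuition usually given for oracle-type behaviour of folded-concave penalties.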

【2】 Distributed Adaptive Huber Regression

Authors: Jiyu Luo, Qiang Sun, Wenxin Zhou
Affiliation: Department of Statistical Sciences, University of Toronto
Comments: 29 pages
Link: https://arxiv.org/abs/2107.02726
Abstract: Distributed data naturally arise in scenarios involving multiple sources of observations, each stored at a different location. Directly pooling all the data together is often prohibited due to limited bandwidth and storage, or due to privacy protocols. This paper introduces a new robust distributed algorithm for fitting linear regressions when data are subject to heavy-tailed and/or asymmetric errors with finite second moments. The algorithm only communicates gradient information at each iteration and therefore is communication-efficient. Statistically, the resulting estimator achieves the centralized nonasymptotic error bound as if all the data were pooled together and came from a distribution with sub-Gaussian tails. Under a finite $(2+\delta)$-th moment condition, we derive a Berry-Esseen bound for the distributed estimator, based on which we construct robust confidence intervals. Numerical studies further confirm that compared with extant distributed methods, the proposed methods achieve near-optimal accuracy with low variability and better coverage with tighter confidence width.
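Editorial illustration for entry 【2】: the abstract says that only gradient information is communicated at each iteration. The sketch below is a simplified stand-in, not the paper's algorithm: it runs plain gradient descent on the Huber loss, with each simulated machine returning only its local gradient sum and sample count. The fixed robustification parameter `tau`, the step size, the synthetic data generator `make_shards`, and all function names are assumptions made for illustration.

```python
import random


def huber_grad(residual, tau):
    """Derivative of the Huber loss in the residual: the residual clipped to [-tau, tau]."""
    return max(-tau, min(tau, residual))


def local_gradient(X, y, beta, tau):
    """Computed on one machine: sum of Huber-loss gradients and the local sample count."""
    grad = [0.0] * len(beta)
    for xi, yi in zip(X, y):
        r = yi - sum(b * x for b, x in zip(beta, xi))
        g = huber_grad(r, tau)
        for j, x in enumerate(xi):
            grad[j] -= g * x
    return grad, len(y)


def distributed_huber(shards, dim, tau=1.5, step=0.1, iters=500):
    """Gradient descent in which the center only ever sees per-machine gradients."""
    beta = [0.0] * dim
    for _ in range(iters):
        total, n = [0.0] * dim, 0
        for X, y in shards:  # each pass stands in for one round of communication
            g, m = local_gradient(X, y, beta, tau)
            total = [t + gj for t, gj in zip(total, g)]
            n += m
        beta = [b - step * t / n for b, t in zip(beta, total)]
    return beta


def make_shards(seed=0, machines=3, n_per_machine=200):
    """Synthetic data y = 1 + 2x + noise, with occasional large symmetric outliers."""
    rng = random.Random(seed)
    shards = []
    for _ in range(machines):
        X, y = [], []
        for _ in range(n_per_machine):
            x = rng.uniform(-1.0, 1.0)
            noise = rng.gauss(0.0, 0.5)
            if rng.random() < 0.05:
                noise += rng.choice([-10.0, 10.0])
            X.append([1.0, x])  # intercept column plus one covariate
            y.append(1.0 + 2.0 * x + noise)
        shards.append((X, y))
    return shards


if __name__ == "__main__":
    print(distributed_huber(make_shards(), dim=2))
```

Clipping the residual bounds the influence of any single outlier on the aggregated gradient, which is why the raw observations never need to leave their machine in this toy setup.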

【3】 Fast, universal estimation of latent variable models using extended variational approximations

Authors: Pekka Korhonen, Francis K. C. Hui, Jenni Niku, Sara Taskinen
Affiliations: Department of Mathematics and Statistics, University of Jyväskylä, Finland; Research School of Finance, Actuarial Studies & Statistics, Australian National University, Australia
Link: https://arxiv.org/abs/2107.02627
Abstract: Generalized linear latent variable models (GLLVMs) are a class of methods for analyzing multi-response data which has garnered considerable popularity in recent years, for example, in the analysis of multivariate abundance data in ecology. One of the main features of GLLVMs is their capacity to handle a variety of response types, such as (overdispersed) counts, binomial responses, (semi-)continuous, and proportions data. On the other hand, the introduction of underlying latent variables presents some major computational challenges, as the resulting marginal likelihood function involves an intractable integral for non-normally distributed responses. This has spurred research into approximation methods to overcome this integral, with a recent and particularly computationally scalable one being that of variational approximations (VA). However, research into the use of VA for GLLVMs and related models has been hampered by the fact that closed-form approximations have only been obtained for certain pairs of response distributions and link functions. In this article, we propose an extended variational approximations (EVA) approach which widens the set of VA-applicable GLLVMs drastically. EVA draws inspiration from the underlying idea of Laplace approximations: by replacing the complete-data likelihood function with its second order Taylor approximation about the mean of the variational distribution, we can obtain a closed-form approximation to the marginal likelihood of the GLLVM for any response type and link function. Through simulation studies and an application to a testate amoebae data set in ecology, we demonstrate how EVA results in a universal approach to fitting GLLVMs, which remains competitive in terms of estimation and inferential performance relative to both standard VA and a Laplace approximation approach, while being computationally more scalable than both in practice.

【4】 Inference for Low-Rank Models

Authors: Victor Chernozhukov, Christian Hansen, Yuan Liao, Yinchu Zhu
Link: https://arxiv.org/abs/2107.02602
Abstract: This paper studies inference in linear models whose parameter of interest is a high-dimensional matrix. We focus on the case where the high-dimensional matrix parameter is well-approximated by a "spiked low-rank matrix" whose rank grows slowly compared to its dimensions and whose nonzero singular values diverge to infinity. We show that this framework covers a broad class of latent-variable models which can accommodate matrix completion problems, factor models, varying coefficient models, principal components analysis with missing data, and heterogeneous treatment effects. For inference, we propose a new "rotation-debiasing" method for product parameters initially estimated using nuclear norm penalization. We present general high-level results under which our procedure provides asymptotically normal estimators. We then present low-level conditions under which we verify the high-level conditions in a treatment effects example.

【5】 A nonBayesian view of Hempel's paradox of the ravens

Author: Yudi Pawitan
Affiliation: Department of Medical Epidemiology and Biostatistics, Karolinska Institutet
Comments: 10 pages, 0 figures
Link: https://arxiv.org/abs/2107.02522
Abstract: In Hempel's paradox of the ravens, seeing a red pencil is considered as supporting evidence that all ravens are black. Also known as the Paradox of Confirmation, the paradox and its many resolutions indicate that we cannot underestimate the logical and statistical elements needed in the assessment of evidence in support of a hypothesis. Most of the previous analyses of the paradox are within the Bayesian framework. These analyses and Hempel himself generally accept the paradoxical conclusion; it feels paradoxical supposedly because the amount of evidence is extremely small. Here I describe a nonBayesian analysis of various statistical models with an accompanying likelihood-based reasoning. The analysis shows that the paradox seems paradoxical because there are natural models where observing a red pencil has no relevance to the color of ravens. In general the value of the evidence depends crucially on the sampling scheme and on the assumption about the underlying parameters of the relevant model.

【6】 T-LoHo: A Bayesian Regularization Model for Structured Sparsity and Smoothness on Graphs

Authors: Changwoo J. Lee, Zhao Tang Luo, Huiyan Sang
Affiliation: Department of Statistics, Texas A&M University
Link: https://arxiv.org/abs/2107.02510
Abstract: Many modern complex data can be represented as a graph. In models dealing with graph-structured data, multivariate parameters are not just sparse but have structured sparsity and smoothness in the sense that both zero and non-zero parameters tend to cluster together. We propose a new prior for high dimensional parameters with graphical relations, referred to as a Tree-based Low-rank Horseshoe (T-LoHo) model, that generalizes the popular univariate Bayesian horseshoe shrinkage prior to the multivariate setting to detect structured sparsity and smoothness simultaneously. The prior can be embedded in many hierarchical high dimensional models. To illustrate its utility, we apply it to regularize a Bayesian high-dimensional regression problem where the regression coefficients are linked on a graph. The resulting clusters have flexible shapes and satisfy the cluster contiguity constraint with respect to the graph. We design an efficient Markov chain Monte Carlo algorithm that delivers full Bayesian inference with uncertainty measures for model parameters including the number of clusters. We offer theoretical investigations of the clustering effects and posterior concentration results. Finally, we illustrate the performance of the model with simulation studies and real data applications such as anomaly detection in road networks. The results indicate substantial improvements over other competing methods such as sparse fused lasso.

【7】 InfoNCE is a variational autoencoder

Author: Laurence Aitchison
Affiliation: Department of Computer Science, University of Bristol, Bristol, UK
Link: https://arxiv.org/abs/2107.02495
Abstract: We show that a popular self-supervised learning method, InfoNCE, is a special case of a new family of unsupervised learning methods, the self-supervised variational autoencoder (SSVAE). SSVAEs circumvent the usual VAE requirement to reconstruct the data by using a carefully chosen implicit decoder. The InfoNCE objective was motivated as a simplified parametric mutual information estimator. Under one choice of prior, the SSVAE objective (i.e. the ELBO) is exactly equal to the mutual information (up to constants). Under an alternative choice of prior, the SSVAE objective is exactly equal to the simplified parametric mutual information estimator used in InfoNCE (up to constants). Importantly, the use of simplified parametric mutual information estimators is believed to be critical to obtain good high-level representations, and the SSVAE framework naturally provides a principled justification for using prior information to choose these estimators.
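Editorial illustration for entry 【7】: for readers unfamiliar with the objective discussed in this abstract, the sketch below computes the InfoNCE loss in its commonly used form, a cross-entropy for picking the positive candidate out of a batch. It says nothing about the SSVAE construction itself. The score-matrix layout (positives on the diagonal) and the function name `info_nce` are assumptions chosen for illustration.

```python
import math


def info_nce(scores):
    """Mean InfoNCE loss for a square score matrix.

    scores[i][j] is a critic score between anchor i and candidate j,
    with the positive pair for anchor i assumed to sit at j == i.
    """
    total = 0.0
    for i, row in enumerate(scores):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(s - m) for s in row))
        total += log_z - row[i]  # negative log-softmax of the positive
    return total / len(scores)


if __name__ == "__main__":
    uninformative = [[0.0, 0.0], [0.0, 0.0]]
    informative = [[10.0, 0.0], [0.0, 10.0]]
    print(info_nce(uninformative), info_nce(informative))
```

With n candidates per anchor, log(n) minus this loss is the quantity usually read as a lower bound on mutual information, so an uninformative critic gives a bound of zero.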

【8】 Midwifery Learning and Forecasting: Predicting Content Demand with User-Generated Logs

Authors: Anna Guitart, Ana Fernández del Río, África Periáñez
Affiliations: benshi.ai, Barcelona, Spain; Lauren Bellhouse, Maternity Foundation, Copenhagen, Denmark
Link: https://arxiv.org/abs/2107.02480
Abstract: Every day, 800 women and 6,700 newborns die from complications related to pregnancy or childbirth. A well-trained midwife can prevent most of these maternal and newborn deaths. Data science models together with logs generated by users of online learning applications for midwives can help to improve their learning competencies. The goal is to use these rich behavioral data to push digital learning towards personalized content and to provide an adaptive learning journey. In this work, we evaluate various forecasting methods to determine the interest of future users in the different kinds of content available in the app, broken down by profession and region.

【9】 Implicit Variational Conditional Sampling with Normalizing Flows

Authors: Vincent Moens, Aivar Sootla, Haitham Bou Ammar, Jun Wang
Affiliations: Huawei R&D UK; University College London
Link: https://arxiv.org/abs/2107.02474
Abstract: We present a method for conditional sampling with normalizing flows when only part of an observation is available. We rely on the following fact: if the flow's domain can be partitioned in such a way that the flow restrictions to subdomains keep the bijectivity property, a lower bound to the conditioning variable log-probability can be derived. Simulation from the variational conditional flow then amounts to solving an equality constraint. Our contribution is three-fold: a) we provide detailed insights on the choice of variational distributions; b) we propose how to partition the input space of the flow to preserve the bijectivity property; c) we propose a set of methods to optimise the variational distribution in specific cases. Through extensive experiments, we show that our sampling method can be applied with success to invertible residual networks for inference and classification.

【10】 Goodness-of-fit testing for Hölder continuous densities under local differential privacy

Authors: Amandine Dubois, Thomas Berrett, Cristina Butucea
Affiliations: CREST, ENSAI, Campus de Ker-Lann, Rue Blaise Pascal; Department of Statistics, University of Warwick, Coventry; CREST, ENSAE, Institut Polytechnique de Paris, avenue Henry Le Chatelier
Link: https://arxiv.org/abs/2107.02439
Abstract: We address the problem of goodness-of-fit testing for Hölder continuous densities under local differential privacy constraints. We study minimax separation rates when only non-interactive privacy mechanisms are allowed to be used and when both non-interactive and sequentially interactive mechanisms can be used for privatisation. We propose privacy mechanisms and associated testing procedures whose analysis enables us to obtain upper bounds on the minimax rates. These results are complemented with lower bounds. By comparing these bounds, we show that the proposed privacy mechanisms and tests are optimal up to at most a logarithmic factor for several choices of $f_0$ including densities from uniform, normal, Beta, Cauchy, Pareto, exponential distributions. In particular, we observe that the results are deteriorated in the private setting compared to the non-private one. Moreover, we show that sequentially interactive mechanisms improve upon the results obtained when considering only non-interactive privacy mechanisms.

【11】 Testing for the Presence of Structural Change and Spatial Heterogeneity

Authors: Ruby Anne E. Lemence, Erniel B. Barrios
Affiliations: Bangko Sentral ng Pilipinas; Professor, School of Statistics, University of the Philippines Diliman
Link: https://arxiv.org/abs/2107.02417
Abstract: In a spatial-temporal model, structural change and/or spatial heterogeneity can easily affect estimation of parameters. Following the spatial-temporal model in [1], we develop a nonparametric procedure for testing the presence of structural change and spatial heterogeneity using bootstrap techniques and the forward search algorithm. The time series bootstrap can filter the effect of temporary structural change in the construction of a confidence interval for the temporal parameter. The forward search will also facilitate the construction of a robust confidence interval for the spatial parameter. These confidence intervals are then used in deciding on the null hypothesis that there is no structural change/spatial heterogeneity. Simulation studies illustrate the ability of the proposed test procedure in detecting the presence of structural change and spatial heterogeneity under certain conditions.

【12】 Face masks, vaccination rates and low crowding drive the demand for the London Underground during the COVID-19 pandemic

Authors: Prateek Bansal, Roselinde Kessels, Rico Krueger, Daniel J Graham
Affiliations: Transport Strategy Centre, Imperial College London, UK; Department of Data Analytics and Digitalization, Maastricht University, the Netherlands; Transport and Mobility Laboratory, École Polytechnique Fédérale de Lausanne, Switzerland
Link: https://arxiv.org/abs/2107.02394
Abstract: The COVID-19 pandemic has drastically impacted people's travel behaviour and out-of-home activity participation. While countermeasures are being eased with increasing vaccination rates, the demand for public transport remains uncertain. To investigate user preferences to travel by London Underground during the pandemic, we conducted a stated choice experiment among its pre-pandemic users (N=961). We analysed the collected data using multinomial and mixed logit models. Our analysis provides insights into the sensitivity of the demand for the London Underground with respect to travel attributes (crowding density and travel time), the epidemic situation (confirmed new COVID-19 cases), and interventions (vaccination rates and mandatory face masks). Mandatory face masks and higher vaccination rates are the top two drivers of travel demand for the London Underground during COVID-19. The positive impact of vaccination rates on the Underground demand increases with crowding density, and the positive effect of mandatory face masks decreases with travel time. Mixed logit reveals substantial preference heterogeneity. For instance, while the average effect of mandatory face masks is positive, preferences of around 20% of the pre-pandemic users to travel by the Underground are negatively affected. The estimated demand sensitivities are relevant for supply-demand management in transit systems and the calibration of advanced epidemiological models.

【13】 Asymptotics of Network Embeddings Learned via Subsampling

Authors: Andrew Davison, Morgane Austern
Affiliations: Department of Statistics, Columbia University, New York, NY, USA; Harvard University, Cambridge, MA, USA
Comments: 98 pages, 3 figures, 1 table
Link: https://arxiv.org/abs/2107.02363
Abstract: Network data are ubiquitous in modern machine learning, with tasks of interest including node classification, node clustering and link prediction. A frequent approach begins by learning a Euclidean embedding of the network, to which algorithms developed for vector-valued data are applied. For large networks, embeddings are learned using stochastic gradient methods where the sub-sampling scheme can be freely chosen. Despite the strong empirical performance of such methods, they are not well understood theoretically. Our work encapsulates representation methods using a subsampling approach, such as node2vec, into a single unifying framework. We prove, under the assumption that the graph is exchangeable, that the distribution of the learned embedding vectors asymptotically decouples. Moreover, we characterize the asymptotic distribution and provide rates of convergence, in terms of the latent parameters, which include the choice of loss function and the embedding dimension. This provides a theoretical foundation to understand what the embedding vectors represent and how well these methods perform on downstream tasks. Notably, we observe that typically used loss functions may lead to shortcomings, such as a lack of Fisher consistency.

【14】 Hierarchical clustered multiclass discriminant analysis via cross-validation

Authors: Kei Hirose, Kanta Miura, Atori Koie
Affiliations: Institute of Mathematics for Industry, Kyushu University, Motooka, Nishi-ku, Fukuoka, Japan; RIKEN Center for Advanced Intelligence Project, Nihonbashi, Chuo-ku, Tokyo, Japan; Nissan Motor Co., Ltd., Morinosatoaoyama, Atsugi, Kanagawa, Japan
Comments: 26 pages, 8 figures
Link: https://arxiv.org/abs/2107.02324
Abstract: Linear discriminant analysis (LDA) is a well-known method for multiclass classification and dimensionality reduction. However, in general, ordinary LDA does not achieve high prediction accuracy when observations in some classes are difficult to classify. This study proposes a novel cluster-based LDA method that significantly improves the prediction accuracy. We adopt hierarchical clustering, and the dissimilarity measure of two clusters is defined by the cross-validation (CV) value. Therefore, clusters are constructed such that the misclassification error rate is minimized. Our approach involves a heavy computational load because the CV value must be computed at each step of the hierarchical clustering algorithm. To address this issue, we develop a regression formulation for LDA and construct an efficient algorithm that computes an approximate value of the CV. The performance of the proposed method is investigated by applying it to both artificial and real datasets. Our proposed method provides high prediction accuracy with fast computation from both numerical and theoretical viewpoints.

【15】 Optimal Estimation of Brownian Penalized Regression Coefficients

Authors: Paramahansa Pramanik, Alan M. Polansky
Affiliations: Department of Mathematics and Statistics, University of South Alabama, Mobile, AL, USA; Department of Statistics and Actuarial Science, Northern Illinois University, DeKalb, IL, USA
Comments: 27 pages, 0 figures
Link: https://arxiv.org/abs/2107.02291
Abstract: In this paper we introduce a new methodology to determine an optimal coefficient of penalized functional regression. We assume the dependent and independent variables and the regression coefficients are functions of time, and that the error dynamics follow a stochastic differential equation. First we construct our objective function as a time-dependent residual sum of squares and then minimize it with respect to regression coefficients subject to different error dynamics such as LASSO, group LASSO, fused LASSO and cubic smoothing spline. Then we use a Feynman-type path integral approach to determine a Schrödinger-type equation which has the entire information of the system. Using first order conditions with respect to these coefficients gives us a closed-form solution for them.

【16】 Near-optimal inference in adaptive linear regression

Authors: Koulik Khamaru, Yash Deshpande, Lester Mackey, Martin J. Wainwright
Affiliations: Department of Statistics, UC Berkeley; Department of Electrical Engineering and Computer Sciences, UC Berkeley; Voleon Group; Microsoft Research
Comments: 45 pages, 7 figures
Link: https://arxiv.org/abs/2107.02266
Abstract: When data is collected in an adaptive manner, even simple methods like ordinary least squares can exhibit non-normal asymptotic behavior. As an undesirable consequence, hypothesis tests and confidence intervals based on asymptotic normality can lead to erroneous results. We propose an online debiasing estimator to correct these distributional anomalies in least squares estimation. Our proposed method takes advantage of the covariance structure present in the dataset and provides sharper estimates in directions for which more information has accrued. We establish an asymptotic normality property for our proposed online debiasing estimator under mild conditions on the data collection process, and provide asymptotically exact confidence intervals. We additionally prove a minimax lower bound for the adaptive linear regression problem, thereby providing a baseline by which to compare estimators. There are various conditions under which our proposed estimator achieves the minimax lower bound up to logarithmic factors. We demonstrate the usefulness of our theory via applications to multi-armed bandits, autoregressive time series estimation, and active learning with exploration.

【17】 Causal Inference with Corrupted Data: Measurement Error, Missing Values, Discretization, and Differential Privacy

Authors: Anish Agarwal, Rahul Singh
Affiliation: MIT
Comments: 99 pages
Link: https://arxiv.org/abs/2107.02780
Abstract: Even the most carefully curated economic data sets have variables that are noisy, missing, discretized, or privatized. The standard workflow for empirical research involves data cleaning followed by data analysis that typically ignores the bias and variance consequences of data cleaning. We formulate a semiparametric model for causal inference with corrupted data to encompass both data cleaning and data analysis. We propose a new end-to-end procedure for data cleaning, estimation, and inference with data cleaning-adjusted confidence intervals. We prove root-n consistency, Gaussian approximation, and semiparametric efficiency for our estimator of the causal parameter by finite sample arguments. Our key assumption is that the true covariates are approximately low rank. In our analysis, we provide nonasymptotic theoretical contributions to matrix completion, statistical learning, and semiparametric statistics. We verify the coverage of the data cleaning-adjusted confidence intervals in simulations.

【18】 Counterfactual Explanations in Sequential Decision Making Under Uncertainty

Authors: Stratis Tsirtsis, Abir De, Manuel Gomez-Rodriguez
Affiliation: Max Planck Institute for Software Systems
Comments: To appear at the ICML 2021 workshop on Interpretable Machine Learning in Healthcare
Link: https://arxiv.org/abs/2107.02776
Abstract: Methods to find counterfactual explanations have predominantly focused on one-step decision making processes. In this work, we initiate the development of methods to find counterfactual explanations for decision making processes in which multiple, dependent actions are taken sequentially over time. We start by formally characterizing a sequence of actions and states using finite horizon Markov decision processes and the Gumbel-Max structural causal model. Building upon this characterization, we formally state the problem of finding counterfactual explanations for sequential decision making processes. In our problem formulation, the counterfactual explanation specifies an alternative sequence of actions differing in at most k actions from the observed sequence that could have led the observed process realization to a better outcome. Then, we introduce a polynomial time algorithm based on dynamic programming to build a counterfactual policy that is guaranteed to always provide the optimal counterfactual explanation on every possible realization of the counterfactual environment dynamics. We validate our algorithm using both synthetic and real data from cognitive behavioral therapy and show that the counterfactual explanations our algorithm finds can provide valuable insights to enhance sequential decision making under uncertainty.
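Editorial illustration for entry 【18】: the abstract relies on the Gumbel-Max structural causal model. The sketch below shows only the underlying Gumbel-Max trick, in which a categorical outcome is the argmax of log-probabilities plus independent Gumbel noise, and how reusing the same noise under modified probabilities gives a counterfactual outcome. It does not implement the paper's dynamic programming algorithm, nor posterior sampling of the noise given an observation. The names `sample_gumbel` and `gumbel_max` are hypothetical.

```python
import math
import random


def sample_gumbel(rng):
    """Draw one standard Gumbel variate by inverting its CDF."""
    u = rng.random()
    while u == 0.0:  # random() can return 0.0, which would give log(0)
        u = rng.random()
    return -math.log(-math.log(u))


def gumbel_max(log_probs, noise):
    """Structural equation: the outcome is the index maximizing log-probability plus noise."""
    scores = [lp + g for lp, g in zip(log_probs, noise)]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    rng = random.Random(0)
    factual = [math.log(p) for p in (0.7, 0.2, 0.1)]
    counterfactual = [math.log(p) for p in (0.2, 0.7, 0.1)]
    noise = [sample_gumbel(rng) for _ in factual]  # exogenous noise, shared by both worlds
    print(gumbel_max(factual, noise), gumbel_max(counterfactual, noise))
```

Because the noise is held fixed, raising the probability of the outcome that actually occurred while leaving the other log-probabilities unchanged can never change that outcome, a property often cited as the reason for choosing this model.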

【19】 Dueling Bandits with Team Comparisons

Authors: Lee Cohen, Ulrike Schmidt-Kraepelin, Yishay Mansour
Affiliations: Blavatnik School of Computer Science, Tel Aviv University; Technische Universität Berlin
Link: https://arxiv.org/abs/2107.02738
Abstract: We introduce the dueling teams problem, a new online-learning setting in which the learner observes noisy comparisons of disjoint pairs of $k$-sized teams from a universe of $n$ players. The goal of the learner is to minimize the number of duels required to identify, with high probability, a Condorcet winning team, i.e., a team which wins against any other disjoint team (with probability at least $1/2$). Noisy comparisons are linked to a total order on the teams. We formalize our model by building upon the dueling bandits setting (Yue et al. 2012) and provide several algorithms, both for stochastic and deterministic settings. For the stochastic setting, we provide a reduction to the classical dueling bandits setting, yielding an algorithm that identifies a Condorcet winning team within $\mathcal{O}((n + k \log (k)) \frac{\max(\log\log n, \log k)}{\Delta^2})$ duels, where $\Delta$ is a gap parameter. For deterministic feedback, we additionally present a gap-independent algorithm that identifies a Condorcet winning team within $\mathcal{O}(nk\log(k)+k^5)$ duels.

【20】 Provable Lipschitz Certification for Generative Models 标题：生成模型的可证明Lipschitz证明

作者：Matt Jordan,Alexandros G. Dimakis 机构：University of Texas at Austin 备注：Accepted into ICML 2021 链接：https://arxiv.org/abs/2107.02732 摘要：提出了一种生成模型Lipschitz常数上界的可扩展方法。我们将这个量与给定生成模型的向量雅可比积集上的最大范数联系起来。我们使用zonotopes通过分层凸近似来逼近这个集合。我们的方法推广和改进了以前使用zonotope变换器的工作，并扩展到了大输出维神经网络的Lipschitz估计。这在小型网络上提供了高效且紧致的界，并且可以扩展到VAE和DCGAN架构上的生成模型。摘要：We present a scalable technique for upper bounding the Lipschitz constant of generative models. We relate this quantity to the maximal norm over the set of attainable vector-Jacobian products of a given generative model. We approximate this set by layerwise convex approximations using zonotopes. Our approach generalizes and improves upon prior work using zonotope transformers, and we extend it to Lipschitz estimation of neural networks with large output dimension. This provides efficient and tight bounds on small networks and can scale to generative models on VAE and DCGAN architectures.
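To make "upper bounding the Lipschitz constant" concrete, here is a minimal sketch of the coarsest textbook bound: for a bias-free ReLU network, the Lipschitz constant with respect to the infinity-norm is at most the product of the layers' induced infinity-norms. This is not the zonotope method described in the abstract; the network, the weights and all function names below are illustrative assumptions.

```python
def inf_norm(W):
    """Induced infinity-norm of a matrix given as a list of rows (max absolute row sum)."""
    return max(sum(abs(w) for w in row) for row in W)


def naive_lipschitz_bound(weights):
    """Product of per-layer infinity-norms.

    Valid for feed-forward networks whose activations are 1-Lipschitz
    (e.g. ReLU); usually far from tight.
    """
    bound = 1.0
    for W in weights:
        bound *= inf_norm(W)
    return bound


def relu_net(weights, x):
    """Bias-free ReLU network (hypothetical toy model); no ReLU after the last layer."""
    for i, W in enumerate(weights):
        x = [sum(w * xi for w, xi in zip(row, x)) for row in W]
        if i < len(weights) - 1:
            x = [max(0.0, v) for v in x]
    return x


# Toy 2-2-1 network with made-up weights.
weights = [[[1.0, -2.0], [0.5, 0.5]], [[1.0, 1.0]]]
bound = naive_lipschitz_bound(weights)
```

Any pair of inputs must satisfy `|f(x) - f(y)| <= bound * max_i |x_i - y_i|`; a certification method is useful to the extent that it returns a smaller valid constant than this product.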

【21】 AdaRL: What, Where, and How to Adapt in Transfer Reinforcement Learning 标题：AdaRL：在迁移强化学习中适应什么、在哪里和如何适应

作者：Biwei Huang,Fan Feng,Chaochao Lu,Sara Magliacane,Kun Zhang 机构：Carnegie Mellon University, City University of Hong Kong, University of Cambridge, University of Amsterdam, MIT-IBM Watson AI Lab 链接：https://arxiv.org/abs/2107.02729 摘要：强化学习（RL）中的大多数方法都是数据饥渴的，并且特定于固定的环境。在本文中，我们提出了一个原则性的自适应RL框架AdaRL，它能够可靠地适应跨域的变化。具体地说，我们为系统中变量之间的结构关系构建了一个生成环境模型，并以一种紧凑的方式嵌入了变化，这为定位变化是什么、在哪里以及如何适应变化提供了一个清晰的、可解释的图像。基于环境模型，我们描述了一个最小的表示集，包括领域特定的因素和领域共享状态表示，足以实现可靠和低成本的传输。此外，我们还表明，通过显式地利用紧凑的表示来编码更改，我们可以只使用少量样本来调整策略，而无需在目标域中进一步优化策略。我们通过一系列实验来说明AdaRL的有效性，这些实验允许Cartpole和Atari游戏的不同组件发生变化。摘要：Most approaches in reinforcement learning (RL) are data-hungry and specific to fixed environments. In this paper, we propose a principled framework for adaptive RL, called AdaRL, that adapts reliably to changes across domains. Specifically, we construct a generative environment model for the structural relationships among variables in the system and embed the changes in a compact way, which provides a clear and interpretable picture for locating what and where the changes are and how to adapt. Based on the environment model, we characterize a minimal set of representations, including both domain-specific factors and domain-shared state representations, that suffice for reliable and low-cost transfer. Moreover, we show that by explicitly leveraging a compact representation to encode changes, we can adapt the policy with only a few samples without further policy optimization in the target domain. We illustrate the efficacy of AdaRL through a series of experiments that allow for changes in different components of Cartpole and Atari games.

【22】 A Unified Off-Policy Evaluation Approach for General Value Function 标题：一般价值函数的统一离策略（off-policy）评估方法

作者：Tengyu Xu,Zhuoran Yang,Zhaoran Wang,Yingbin Liang 机构：The Ohio State University, Princeton University, Northwestern University 备注：submitted for publication 链接：https://arxiv.org/abs/2107.02711 摘要：一般值函数（GVF）是强化学习（RL）中一种既能表示{\em预测性}又能表示{\em回顾性}知识的有力工具。在实践中，往往需要对多个相互关联的全球价值函数与预先收集的政策外样本进行联合评估。在文献中，梯度时间差（GTD）学习方法被用来评估非策略环境下的GVFs，但是这种方法即使函数近似类具有足够的表达性，也可能会产生较大的估计误差。此外，在函数逼近的情况下，以往的工作都没有正式建立对地真值GVF的收敛保证。在本文中，我们通过一类带因果滤波的GVF来解决这两个问题，它涵盖了RL的广泛应用，如报酬方差、价值梯度、异常检测中的成本、平稳分布梯度、，我们提出了一种新的非策略GVFs求值算法GenTD，并证明GenTD学习多个相关的多维GVFs的效率与学习单个规范标量值函数的效率相同。我们进一步证明了与GTD不同的是，只要函数逼近能力足够大，GenTD所学习的GVFs就可以保证收敛到地面真值GVFs。据我们所知，GenTD是第一个具有全局最优保证的策略GVF评估算法。摘要：General Value Function (GVF) is a powerful tool to represent both the {\em predictive} and {\em retrospective} knowledge in reinforcement learning (RL). In practice, often multiple interrelated GVFs need to be evaluated jointly with pre-collected off-policy samples. In the literature, the gradient temporal difference (GTD) learning method has been adopted to evaluate GVFs in the off-policy setting, but such an approach may suffer from a large estimation error even if the function approximation class is sufficiently expressive. Moreover, none of the previous work have formally established the convergence guarantee to the ground truth GVFs under the function approximation settings. In this paper, we address both issues through the lens of a class of GVFs with causal filtering, which cover a wide range of RL applications such as reward variance, value gradient, cost in anomaly detection, stationary distribution gradient, etc. We propose a new algorithm called GenTD for off-policy GVFs evaluation and show that GenTD learns multiple interrelated multi-dimensional GVFs as efficiently as a single canonical scalar value function. We further show that unlike GTD, the learned GVFs by GenTD are guaranteed to converge to the ground truth GVFs as long as the function approximation power is sufficiently large. 
To the best of our knowledge, GenTD is the first off-policy GVF evaluation algorithm with a global optimality guarantee.

【23】 Using Localized Twitter Activity for Red Tide Impact Assessment 标题：利用本地化Twitter活动进行赤潮影响评估

作者：A. Skripnikov,N. Wagner,J. Shafer,M. Beck,E. Sherwood,M. Burke 机构：New College of Florida, Division of Natural Sciences, Sarasota, FL, USA; Science and Environment Council of Southwest Florida, Sarasota, FL, USA; Tampa Bay Estuary Program, St. Petersburg, FL, USA 备注：40 pages,11 figures (27 image files though), 5 tables, submitted to "Harmful Algae" 链接：https://arxiv.org/abs/2107.02677 摘要：甲藻Karenia brevis（K. brevis）的赤潮爆发会产生有毒的海岸条件，可能影响海洋生物和人类健康，同时也影响当地经济。在2017-2019年佛罗里达州极端赤潮事件期间，居民和游客转向社交媒体平台，既接收与灾害相关的信息，又交流自己的情感和经历。这是自社交媒体广泛使用以来的第一次重大赤潮事件，因此提供了独特的群众来源的赤潮影响报告。我们评估了Twitter上赤潮话题活动的时空准确性，考虑了tweet情绪和用户类型（如媒体、公民），并将tweet活动与报道的赤潮情况（如K. brevis细胞计数、死鱼数量和当地海滩上的呼吸道刺激）进行了比较。分析是在多个层面上进行的，涉及地区（例如，整个墨西哥湾沿岸、县级、市级、邮政编码制表区）和时间频率（例如，每天、每三天、每周），结果显示当地人均推特活动与在该地区观察到的实际赤潮情况之间存在很强的相关性。此外，还观察到与受影响沿海地区的距离和相关推特的人均计数之间存在关联。研究结果表明，Twitter是赤潮在当地影响和发展的可靠代表，有可能成为更有效评估和更协调地实时应对灾害的工具之一。摘要：Red tide blooms of the dinoflagellate Karenia brevis (K. brevis) produce toxic coastal conditions that can impact marine organisms and human health, while also affecting local economies. During the extreme Florida red tide event of 2017-2019, residents and visitors turned to social media platforms to both receive disaster-related information and communicate their own sentiments and experiences. This was the first major red tide event since the ubiquitous use of social media, thus providing unique crowd-sourced reporting of red tide impacts. We evaluated the spatial and temporal accuracy of red tide topic activity on Twitter, taking tweet sentiments and user types (e.g., media, citizens) into consideration, and compared tweet activity with reported red tide conditions, such as K. brevis cell counts, levels of dead fish and respiratory irritation on local beaches. The analysis was done on multiple levels with respect to both locality (e.g., entire Gulf coast, county-level, city-level, zip code tabulation areas) and temporal frequencies (e.g., daily, every three days, weekly), resulting in strong correlations between local per-capita Twitter activity and the actual red tide conditions observed in the area. Moreover, an association was observed between proximity to the affected coastal areas and per-capita counts for relevant tweets. Results show that Twitter is a reliable proxy of the red tide's local impacts and development over time, which can potentially be used as one of the tools for more efficient assessment and a more coordinated response to the disaster in real time.

【24】 Galerkin--Chebyshev approximation of Gaussian random fields on compact Riemannian manifolds 标题：紧致黎曼流形上高斯随机场的Galerkin-Chebyshev逼近

作者：Annika Lang,Mike Pereira 机构：Department of Mathematical Sciences, Chalmers University of Technology & University of Gothenburg, S-412 96 Göteborg 备注：34 pages, 5 figures 链接：https://arxiv.org/abs/2107.02667 摘要：介绍了紧黎曼流形上一类高斯随机场的一种新的数值逼近方法。这类随机场的特征是流形上的Laplace-Beltrami算子。利用切比雪夫级数将伽辽金近似与多项式近似相结合。这种所谓的伽辽金-切比雪夫近似格式给出了流形上高斯随机场的有效和通用的采样算法。给出了Galerkin逼近的强收敛阶、弱收敛阶和Galerkin-Chebyshev逼近的强收敛阶，并通过数值实验加以验证。摘要：A new numerical approximation method for a class of Gaussian random fields on compact Riemannian manifolds is introduced. This class of random fields is characterized by the Laplace--Beltrami operator on the manifold. A Galerkin approximation is combined with a polynomial approximation using Chebyshev series. This so-called Galerkin--Chebyshev approximation scheme yields efficient and generic sampling algorithms for Gaussian random fields on manifolds. Strong and weak orders of convergence for the Galerkin approximation and strong convergence orders for the Galerkin--Chebyshev approximation are shown and confirmed through numerical experiments.

【25】 On Generalization of Graph Autoencoders with Adversarial Training 标题：关于对抗性训练的图形自动编码器的泛化

作者：Tianjin Huang,Yulong Pei,Vlado Menkovski,Mykola Pechenizkiy 机构：Department of Mathematics and Computer Science, Eindhoven University of Technology, Eindhoven, the Netherlands 备注：ECML 2021 Accepted 链接：https://arxiv.org/abs/2107.02658 摘要：对抗性训练是一种提高模型抗干扰能力的方法。这样的方法已经被证明能产生具有更一般化的特征表示的模型。然而，在图形数据模型的对抗性训练方面的工作还很有限。在本文中，我们提出这样一个问题：对抗训练是否能提高图表示的泛化能力？我们用两种强大的节点嵌入方法：图自动编码器（GAE）和变分图自动编码器（VGAE）来描述L2和L1版本的对抗性训练。我们对GAE和VGAE的三个主要应用，即链路预测、节点聚类、图异常检测进行了大量的实验，证明了L2和L1对抗训练都能提高GAE和VGAE的泛化能力。摘要：Adversarial training is an approach for increasing a model's resilience against adversarial perturbations. Such approaches have been demonstrated to result in models with feature representations that generalize better. However, limited work has been done on adversarial training of models on graph data. In this paper, we ask the following question: does adversarial training improve the generalization of graph representations? We formulate L2 and L1 versions of adversarial training in two powerful node embedding methods: graph autoencoder (GAE) and variational graph autoencoder (VGAE). We conduct extensive experiments on three main applications of GAE and VGAE, i.e., link prediction, node clustering, and graph anomaly detection, and demonstrate that both L2 and L1 adversarial training boost the generalization of GAE and VGAE.

【26】 Automatic size and pose homogenization with spatial transformer network to improve and accelerate pediatric segmentation 标题：利用空间变换网络进行自动大小和姿势同质化以改进和加速儿科分割

作者：Giammarco La Barbera,Pietro Gori,Haithem Boussaid,Bruno Belucci,Alessandro Delmonte,Jeanne Goulin,Sabine Sarnacki,Laurence Rouet,Isabelle Bloch 机构：- LTCI, Telecom Paris, Institut Polytechnique de Paris, France, - Philips Research Paris, Suresnes, France, - IMAG, Imagine Institute, Universite de Paris, France 备注：None 链接：https://arxiv.org/abs/2107.02655 摘要：由于体位和大小的高度异质性以及可用数据的有限性，儿科图像的分割对于深度学习方法来说是一个挑战。在这项工作中，我们提出了一种新的CNN架构，由于使用了空间变换网络（STN），该架构具有姿态和尺度不变性。我们的结构由三个连续的模块组成，这些模块在训练期间一起被估计：（i）一个回归模块来估计相似矩阵，以将输入图像归一化为参考图像(ii）一个可微模块，用于寻找要分割的感兴趣区域(iii）基于流行的UNet架构的分割模块，用于描绘对象。不同于原始的UNet，它努力学习一个复杂的映射，包括姿势和比例的变化，从一个有限的训练数据集，我们的分割模块学习一个简单的映射集中在图像的规格化姿势和大小。此外，通过STN使用自动边界框检测可以节省时间，特别是内存，同时保持相似的性能。我们在儿科腹部CT扫描仪上测试了该方法在肾脏和肾脏肿瘤分割中的应用。结果表明，与标准数据增强（33h）相比，估计的STN大小和姿势均匀化加速了分割（25h），同时获得了相似的肾脏质量（Dice评分的88.01%），并改善了肾脏肿瘤的轮廓（从85.52%提高到87.12%）。摘要：Due to a high heterogeneity in pose and size and to a limited number of available data, segmentation of pediatric images is challenging for deep learning methods. In this work, we propose a new CNN architecture that is pose and scale invariant thanks to the use of Spatial Transformer Network (STN). Our architecture is composed of three sequential modules that are estimated together during training: (i) a regression module to estimate a similarity matrix to normalize the input image to a reference one; (ii) a differentiable module to find the region of interest to segment; (iii) a segmentation module, based on the popular UNet architecture, to delineate the object. Unlike the original UNet, which strives to learn a complex mapping, including pose and scale variations, from a finite training dataset, our segmentation module learns a simpler mapping focusing on images with normalized pose and size. Furthermore, the use of an automatic bounding box detection through STN allows saving time and especially memory, while keeping similar performance. We test the proposed method in kidney and renal tumor segmentation on abdominal pediatric CT scanners. 
Results indicate that the estimated STN homogenization of size and pose accelerates the segmentation (25h), compared to standard data-augmentation (33h), while obtaining a similar quality for the kidney (88.01\% of Dice score) and improving the renal tumor delineation (from 85.52\% to 87.12\%).

【27】 The Hyperspherical Geometry of Community Detection: Modularity as a Distance 标题：社区检测的超球面几何：作为距离的模块性

作者：Martijn Gösgens,Remco van der Hofstad,Nelly Litvak 机构：Eindhoven University of Technology, Eindhoven, Netherlands, University of Twente, Enschede, Netherlands 链接：https://arxiv.org/abs/2107.02645 摘要：Louvain算法是目前最流行的社区检测方法之一。该算法通过最大化称为模块化的数量来发现社区。在这项工作中，我们描述了簇的度量空间，其中簇由顶点对索引的二元向量来描述。我们将此几何推广到一个超球面上，证明了最大化模块化等价于最小化聚类向量集合上某个模块化向量的角距离。这种等价性允许我们将Louvain算法看作是一种最近邻搜索，它近似地最小化了到这个模块化向量的距离。通过用一个不同的向量替换这个模块化向量，可以得到许多可选的社区检测方法。我们探索这个更广泛的类，并将其与现有的基于模块化的方法进行比较。我们的实验表明，这些方法可能优于基于模块化的方法。例如，与顶点邻域相比，当社区较大时，基于公共邻域数的向量优于现有的社区检测方法。虽然目前的工作重点是在网络中的社区检测，所提出的方法可以适用于任何聚类问题的配对相似性数据是可用的。摘要：The Louvain algorithm is currently one of the most popular community detection methods. This algorithm finds communities by maximizing a quantity called modularity. In this work, we describe a metric space of clusterings, where clusterings are described by a binary vector indexed by the vertex-pairs. We extend this geometry to a hypersphere and prove that maximizing modularity is equivalent to minimizing the angular distance to some modularity vector over the set of clustering vectors. This equivalence allows us to view the Louvain algorithm as a nearest-neighbor search that approximately minimizes the distance to this modularity vector. By replacing this modularity vector by a different vector, many alternative community detection methods can be obtained. We explore this wider class and compare it to existing modularity-based methods. Our experiments show that these alternatives may outperform modularity-based methods. For example, when communities are large compared to vertex neighborhoods, a vector based on numbers of common neighbors outperforms existing community detection methods. While the focus of the present work is community detection in networks, the proposed methodology can be applied to any clustering problem where pair-wise similarity data is available.
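As a small illustration of the two objects named in this abstract, the sketch below computes the standard (Newman) modularity of a clustering and builds a binary clustering vector indexed by vertex pairs. It assumes a simple, undirected, unweighted graph given as an edge list; the function names and the toy graph are illustrative, and the hyperspherical embedding itself is not reproduced here.

```python
from collections import Counter
from itertools import combinations


def modularity(edges, labels):
    """Newman modularity: sum over communities of L_c/m - (d_c/(2m))^2.

    edges  : list of (u, v) pairs, each undirected edge listed once
    labels : dict mapping node -> community label
    """
    m = len(edges)
    degree = Counter()
    internal = Counter()          # L_c: edges with both endpoints in community c
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if labels[u] == labels[v]:
            internal[labels[u]] += 1
    total_degree = Counter()      # d_c: sum of degrees inside community c
    for node, d in degree.items():
        total_degree[labels[node]] += d
    return sum(internal[c] / m - (total_degree[c] / (2 * m)) ** 2
               for c in total_degree)


def clustering_vector(labels):
    """Binary vector over vertex pairs (i < j): 1 if both are in the same cluster."""
    nodes = sorted(labels)
    return [1 if labels[i] == labels[j] else 0 for i, j in combinations(nodes, 2)]


# Two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {v: "a" for v in range(6)}

q_two = modularity(edges, two_groups)   # 5/14
q_one = modularity(edges, one_group)    # 0 for the trivial clustering
vec = clustering_vector(two_groups)
```

Splitting the graph into its two triangles scores higher than lumping everything together, which is the preference that modularity maximization encodes.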

【28】 The global migration network of sex-workers 标题：性工作者的全球移民网络

作者：Luis E C Rocha,Petter Holme,Claudio D G Linhares 机构：Ghent University, Dept of Economics, Ghent, Belgium, Ghent University, Dept of Physics and Astronomy, Ghent, Belgium, Tokyo Institute of Technology, Tokyo, Japan, University of S˜ao Paulo, Institute of Mathematics and Computer Sciences, S˜ao Carlos, Brazil 备注：Comments and feedback welcomed. Two tables and 6 figures including SI 链接：https://arxiv.org/abs/2107.02633 摘要：各国社会和经济环境的差异鼓励人们移民，以寻求更好的生活条件，包括工作机会、更高的工资、安全和福利。然而，由于不良的记录、隐私问题和居住状况，量化全球移民具有挑战性。这对于参与污名化、无管制或非法活动的某些类别的移民来说尤其重要。护送服务或高端卖淫是吸引全世界工人的高薪活动。本文运用网络方法研究性工作者的国际迁移模式。利用广泛的国际在线护送服务广告目录和个人护送信息，我们重建了一个移民流动网络，其中的节点代表来源国或目的地国。这些线路代表了两国之间的直达路线。性工作者的移徙网络显示出与一般人口移徙不同的结构模式。该网络包含一个强大的核心，在这个核心中，经常观察到一组高收入欧洲国家之间的相互移徙，然而欧洲被划分为不同的网络社区，与非欧洲国家有着特定的联系。我们发现国家之间存在非互惠关系，其中一些国家主要提供服务，而另一些国家则吸引工人。人均国内生产总值是一个很好的指标，反映了一个国家对外来工人的吸引力和服务率，但与移民的可能性无关。与在本国工作相比，移民的平均经济收益为15.9%。只有来自77%国家的性工作者在移民方面有经济收益，而平均收益随着原籍国的GDPc而减少。我们的研究结果显示，高端性工作者的迁移受到经济、地理和文化方面的制约。摘要：Differences in the social and economic environment across countries encourage humans to migrate in search of better living conditions, including job opportunities, higher salaries, security and welfare. Quantifying global migration is, however, challenging because of poor recording, privacy issues and residence status. This is particularly critical for some classes of migrants involved in stigmatised, unregulated or illegal activities. Escorting services or high-end prostitution are well-paid activities that attract workers all around the world. In this paper, we study international migration patterns of sex-workers by using network methods. Using an extensive international online advertisement directory of escorting services and information about individual escorts, we reconstruct a migrant flow network where nodes represent either origin or destination countries. The links represent the direct routes between two countries. 
The migration network of sex-workers shows structural patterns that differ from those of the migration of the general population. The network contains a strong core where mutual migration is often observed between a group of high-income European countries, yet Europe is split into different network communities with specific ties to non-European countries. We find non-reciprocal relations between countries, with some mostly supplying workers while others attract them. The GDP per capita is a good indicator of country attractiveness for incoming workers and service rates but is unrelated to the probability of emigration. The median financial gain of migrating, in comparison to working in the home country, is 15.9%. Sex-workers from only 77% of the countries gain financially from migrating, and average gains decrease with the GDPc of the country of origin. Our results show that high-end sex-worker migration is regulated by economic, geographic and cultural aspects.

【29】 Tactile Sensing with a Tendon-Driven Soft Robotic Finger 标题：肌腱驱动软机器人手指的触觉传感

作者：Chang Cheng,Yadong Yan,Mingjun Guan,Jianan Zhang,Yu Wang 机构：School of Biological Sci. and Medical Engr., Beihang University, Beijing, China, Dept. of Math. and Computer Sci., Colorado College, Colorado, USA 备注：6 pages, 10 figures, submitted to ICCMA 2021 链接：https://arxiv.org/abs/2107.02546 摘要：提出了一种新型的机器人手指触觉传感机构。受哺乳动物本体感觉机制的启发，该方法从附着在手指肌腱上的应变传感器推断触觉信息。我们进行了实验来测试所提出的结构的触觉感知能力，并且我们的结果表明这种方法能够在外展和屈曲接触中触诊纹理和刚度。在系统交叉验证下，该系统的纹理和刚度识别准确率分别达到100%和99.7%，验证了该方法的可行性。此外，我们使用统计工具来确定提取的各种特征的重要性，以便进行分类。摘要：In this paper, a novel tactile sensing mechanism for soft robotic fingers is proposed. Inspired by the proprioception mechanism found in mammals, the proposed approach infers tactile information from a strain sensor attached on the finger's tendon. We perform experiments to test the tactile sensing capabilities of the proposed structures, and our results indicate this method is capable of palpating texture and stiffness in both abduction and flexion contact. Under systematic cross validation, the proposed system achieved 100% and 99.7% accuracy in texture and stiffness discrimination respectively, which validate the viability of this approach. Furthermore, we use statistics tools to determine the significance of various features extracted for classification.

【30】 Approximations to ultimate ruin probabilities with a Wienner process perturbation 标题：具有Wienner过程扰动的最终破产概率的逼近

作者：Yacine Koucha,Alfredo D. Egidio dos Reis 机构：Brunel University London, UB,PH Uxbridge, United Kingdom, Universidade de Lisboa, ISEG, -, Lisboa, Portugal 备注：Master dissertation work, 18 pages, 4 figures, 8 numerical tables 链接：https://arxiv.org/abs/2107.02537 摘要：在本文中，我们通过在复合泊松过程中加入一个维纳过程，将经典的Cram′er-Lundberg集体风险理论模型改为一个扰动模型，该过程可用于考虑保费收入的不确定性、利率波动和投保人数量的变化。我们的研究是一篇硕士论文的一部分，我们的目的是对扰动风险模型的无限时间破产概率作一个简要的综述，并提出一些新的近似方法。本文提出了四种不同的摄动风险模型的逼近方法。第一种方法基于对最大总损失分布的上下迭代逼近。第二种方法依赖于四矩指数devylder近似。第三种方法是基于Renyi和De Vylder近似的一阶Pad′e近似。最后一种方法是二阶Pad&e-Ramsay近似。它们是通过拟合索赔额分布的一、二、三或四个矩而产生的，这大大推广了近似方法。我们使用轻尾和重尾分布的组合对个人索赔额的近似精度进行了测试。我们评估了最终破产概率，并给出了指数、gamma和混合指数索赔分布的数值结果，证明了这四种方法的高精度。分析和数值方法被用来强调我们的发现的实际意义。摘要：In this paper, we adapt the classic Cram\'er-Lundberg collective risk theory model to a perturbed model by adding a Wiener process to the compound Poisson process, which can be used to incorporate premium income uncertainty, interest rate fluctuations and changes in the number of policyholders. Our study is part of a Master dissertation, our aim is to make a short overview and present additionally some new approximation methods for the infinite time ruin probabilities for the perturbed risk model. We present four different approximation methods for the perturbed risk model. The first method is based on iterative upper and lower approximations to the maximal aggregate loss distribution. The second method relies on a four-moment exponential De Vylder approximation. The third method is based on the first-order Pad\'e approximation of the Renyi and De Vylder approximations. The last method is the second order Pad\'e-Ramsay approximation. These are generated by fitting one, two, three or four moments of the claim amount distribution, which greatly generalizes the approximations. We test the precision of approximations using a combination of light and heavy tailed distributions for the individual claim amount. 
We assess the ultimate ruin probability and present numerical results for the exponential, gamma, and mixed exponential claim distributions, demonstrating the high accuracy of these four methods. Analytical and numerical methods are used to highlight the practical implications of our findings.
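For orientation, the sketch below covers only the classical, unperturbed Cramér-Lundberg model (no Wiener term), which is the starting point the abstract adapts. It implements the well-known closed form for exponential claims and the classical three-moment De Vylder approximation; it is not the paper's four-moment variant or its Padé-based methods. Function and parameter names are illustrative assumptions.

```python
import math


def ruin_prob_exponential(u, theta, mean_claim):
    """Exact ultimate ruin probability for exponential claims.

    u          : initial surplus
    theta      : premium safety loading (> 0)
    mean_claim : mean of the exponential claim size
    """
    return math.exp(-theta * u / ((1.0 + theta) * mean_claim)) / (1.0 + theta)


def ruin_prob_de_vylder(u, theta, m1, m2, m3):
    """Classical De Vylder approximation.

    Replaces the risk process by one with exponential claims whose first
    three moments match; m1, m2, m3 are the raw claim-size moments.
    """
    beta = 3.0 * m2 / m3                                  # rate of the surrogate exponential claims
    theta_t = 2.0 * m1 * m3 * theta / (3.0 * m2 ** 2)     # adjusted safety loading
    return math.exp(-beta * theta_t * u / (1.0 + theta_t)) / (1.0 + theta_t)


# Sanity check: exponential claims with mean 2 have raw moments 2, 8, 48,
# so the approximation should reproduce the exact formula.
exact = ruin_prob_exponential(10.0, 0.25, 2.0)
approx = ruin_prob_de_vylder(10.0, 0.25, 2.0, 8.0, 48.0)
```

Because the surrogate process is itself exponential, the approximation is exact when the true claims are exponential, which makes this case a convenient correctness check before trying gamma or mixed exponential claims.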

【31】 Intrinsic uncertainties and where to find them 标题：内在不确定性及其在哪里找到

作者：Francesco Farina,Lawrence Phillips,Nicola J Richmond 备注：Presented at the ICML 2021 Workshop on Uncertainty and Robustness in Deep Learning 链接：https://arxiv.org/abs/2107.02526 摘要：我们介绍了一个不确定性估计的框架，它描述并扩展了许多现有的方法。我们将经典训练中涉及的典型超参数看作随机变量，并将其边缘化，以捕获参数空间中的各种不确定性来源。我们从标准基准数据集的实际角度出发，研究边缘化的哪些形式和组合最有用。此外，我们还讨论了一些边缘化如何产生可靠的不确定性估计，而无需进行广泛的超参数调整和/或大规模集成。摘要：We introduce a framework for uncertainty estimation that both describes and extends many existing methods. We consider typical hyperparameters involved in classical training as random variables and marginalise them out to capture various sources of uncertainty in the parameter space. We investigate which forms and combinations of marginalisation are most useful from a practical point of view on standard benchmarking data sets. Moreover, we discuss how some marginalisations may produce reliable estimates of uncertainty without the need for extensive hyperparameter tuning and/or large-scale ensembling.

【32】 EVARS-GPR: EVent-triggered Augmented Refitting of Gaussian Process Regression for Seasonal Data 标题：EVARS-GPR：季节性数据高斯过程回归的事件触发增广修正

作者：Florian Haselbeck,Dominik G. Grimm 机构：Technical University of Munich, TUM Campus Straubing for Biotechnology and Sustainability, Bioinformatics, Straubing, Germany; Weihenstephan-Triesdorf University of Applied Sciences, Bioinformatics, Straubing, Germany 链接：https://arxiv.org/abs/2107.02463 摘要：时间序列预测是一个应用日益广泛的领域。然而，随着时间的推移，由于内部或外部的影响，系统行为的变化是具有挑战性的。因此，以前学习的预测模型的预测可能不再有用了。在本文中，我们提出了事件触发的季节数据高斯过程回归（EVARS-GPR）的增广修正，这是一种新的在线算法，能够处理季节数据目标变量尺度的突然变化。为此，EVARS-GPR将在线变化点检测与使用变化点之前样本的数据增强重新调整预测模型相结合。模拟数据实验表明，EVARS-GPR适用于大范围的输出尺度变化。与具有相似计算资源消耗的方法相比，EVARS-GPR在不同的实际数据集上的RMSE平均降低了20.8%。此外，我们还证明了我们的算法相对于所有具有周期性调整策略的比较伙伴，平均运行时间减少了6倍。综上所述，我们提出了一个计算效率高的季节性时间序列的在线预测算法，并在模拟和真实数据上演示了它的功能。所有代码都可在GitHub上公开获取：https://github.com/grimmlab/evars-gpr 。摘要：Time series forecasting is a growing domain with diverse applications. However, changes of the system behavior over time due to internal or external influences are challenging. Therefore, predictions of a previously learned forecasting model might not be useful anymore. In this paper, we present EVent-triggered Augmented Refitting of Gaussian Process Regression for Seasonal Data (EVARS-GPR), a novel online algorithm that is able to handle sudden shifts in the target variable scale of seasonal data. For this purpose, EVARS-GPR combines online change point detection with a refitting of the prediction model using data augmentation for samples prior to a change point. Our experiments on simulated data show that EVARS-GPR is applicable for a wide range of output scale changes. EVARS-GPR has on average a 20.8% lower RMSE on different real-world datasets compared to methods with a similar computational resource consumption. Furthermore, we show that our algorithm leads to a six-fold reduction of the averaged runtime in relation to all comparison partners with a periodical refitting strategy. In summary, we present a computationally efficient online forecasting algorithm for seasonal time series with changes of the target variable scale and demonstrate its functionality on simulated as well as real-world data. All code is publicly available on GitHub: https://github.com/grimmlab/evars-gpr.

【33】 Deep Network Approximation With Accuracy Independent of Number of Neurons 标题：精度与神经元数目无关的深度网络逼近

作者：Zuowei Shen,Haizhao Yang,Shijun Zhang 机构：Department of Mathematics, Purdue University, Department of Mathematics, National University of Singapore 链接：https://arxiv.org/abs/2107.02397 摘要：本文提出了一种简单的前馈神经网络，它以固定的有限个神经元实现对所有连续函数的万有逼近性。这些神经网络很简单，因为它们设计了一个简单的可计算的连续激活函数$\sigma$，利用三角波函数和软符号函数。我们证明了宽度为$36d(2d+1)$、深度为$11$的$\sigma$激活网络可以在任意小的误差范围内逼近$d$维超立方体上的任意连续函数。因此，对于有监督学习及其相关的回归问题，由这些大小不小于$36d(2d+1)\times 11$的网络生成的假设空间在连续函数空间中是稠密的。此外，当存在$\mathbb{R}^d$的成对不相交闭有界子集使得同一类的样本位于同一子集中时，图像和信号分类产生的分类函数位于由$\sigma$激活的网络生成的假设空间中，网络的宽度为$36d(2d+1)$，深度为$12$。摘要：This paper develops simple feed-forward neural networks that achieve the universal approximation property for all continuous functions with a fixed finite number of neurons. These neural networks are simple because they are designed with a simple and computable continuous activation function $\sigma$ leveraging a triangular-wave function and a softsign function. We prove that $\sigma$-activated networks with width $36d(2d+1)$ and depth $11$ can approximate any continuous function on a $d$-dimensional hypercube within an arbitrarily small error. Hence, for supervised learning and its related regression problems, the hypothesis space generated by these networks with a size not smaller than $36d(2d+1)\times 11$ is dense in the space of continuous functions. Furthermore, classification functions arising from image and signal classification are in the hypothesis space generated by $\sigma$-activated networks with width $36d(2d+1)$ and depth $12$, when there exist pairwise disjoint closed bounded subsets of $\mathbb{R}^d$ such that the samples of the same class are located in the same subset.
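The abstract says only that $\sigma$ leverages a triangular-wave function and a softsign function. The sketch below shows those two building blocks and one plausible way to glue them into a continuous activation (triangular wave for non-negative inputs, softsign for negative inputs). That piecewise combination is an assumption made here for illustration, as are all function names; the width helper simply evaluates the $36d(2d+1)$ expression quoted in the abstract.

```python
import math


def triangle_wave(x):
    """Period-2 triangular wave with values in [0, 1]: 0 at even integers, 1 at odd ones."""
    return abs(x - 2.0 * math.floor((x + 1.0) / 2.0))


def softsign(x):
    """Softsign function x / (1 + |x|), with values in (-1, 1)."""
    return x / (1.0 + abs(x))


def sigma(x):
    """Assumed piecewise combination; continuous at 0 because both pieces vanish there."""
    return triangle_wave(x) if x >= 0 else softsign(x)


def network_width(d):
    """Width 36 d (2d + 1) stated in the abstract for input dimension d."""
    return 36 * d * (2 * d + 1)


widths = {d: network_width(d) for d in (1, 2, 10)}
```

Note how quickly the stated width grows with the input dimension: the neuron count is fixed for a given $d$, yet quadratic in $d$.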

【34】 A Short Note on the Relationship of Information Gain and Eluder Dimension 标题：关于信息增益与Eluder维数关系的一点注记

作者：Kaixuan Huang,Sham M. Kakade,Jason D. Lee,Qi Lei 机构：Princeton University, University of Washington, Microsoft Research 链接：https://arxiv.org/abs/2107.02377 摘要：逃逸维和信息增益是bandit和强化学习中常用的两种复杂度度量方法。Eluder维数最初是作为函数类的一般复杂性度量而提出的，但是已知它很小的常见例子是函数空间（向量空间）。在这些情况下，主要的工具上限的逃避维数是椭圆势引理。有趣的是，椭圆势引理在分析线性bandits/强化学习及其非参数推广，即信息增益方面也有显著的特点。我们证明了这不是巧合——对于再生核希尔BERT空间，逃逸维数和信息增益在精确意义上是等价的。摘要：Eluder dimension and information gain are two widely used methods of complexity measures in bandit and reinforcement learning. Eluder dimension was originally proposed as a general complexity measure of function classes, but the common examples of where it is known to be small are function spaces (vector spaces). In these cases, the primary tool to upper bound the eluder dimension is the elliptic potential lemma. Interestingly, the elliptic potential lemma also features prominently in the analysis of linear bandits/reinforcement learning and their nonparametric generalization, the information gain. We show that this is not a coincidence -- eluder dimension and information gain are equivalent in a precise sense for reproducing kernel Hilbert spaces.
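As a concrete handle on one of the two quantities, the sketch below evaluates the information gain of a fixed set of points from their kernel matrix, using the definition that is standard in the Gaussian process bandit literature, $\tfrac{1}{2}\log\det(I + K/\lambda)$. Whether the note uses exactly this normalization is an assumption; the pure-Python Cholesky helper and all names are illustrative.

```python
import math


def log_det_spd(A):
    """Log-determinant of a symmetric positive definite matrix via Cholesky."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return 2.0 * sum(math.log(L[i][i]) for i in range(n))


def information_gain(K, noise_var=1.0):
    """0.5 * log det(I + K / noise_var) for a kernel (Gram) matrix K."""
    n = len(K)
    M = [[(1.0 if i == j else 0.0) + K[i][j] / noise_var for j in range(n)]
         for i in range(n)]
    return 0.5 * log_det_spd(M)


# Two orthogonal points versus the same point observed twice (linear kernel).
gain_orthogonal = information_gain([[1.0, 0.0], [0.0, 1.0]])   # 0.5 * log 4
gain_repeated = information_gain([[1.0, 1.0], [1.0, 1.0]])     # 0.5 * log 3
```

Repeating a point yields less information than probing a new direction, which is the same intuition behind eluder dimension: both count how long new observations can keep being "surprising".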

【35】 Clustering Structure of Microstructure Measures 标题：微观结构测度的聚类结构

作者：Liao Zhu,Ningning Sun,Martin T. Wells 机构： Cornell University, ‡Department of Computer Science 链接：https://arxiv.org/abs/2107.02283 摘要：本文建立了市场微观结构特征测度在股票收益预测中的聚类模型。在10秒的时间频率内，我们研究了不同测度的聚类结构，以找出预测的最佳测度。通过这种方法，我们可以用有限的预测器进行更精确的预测，从而消除了噪声，使模型更易于解释。摘要：This paper builds the clustering model of measures of market microstructure features which are popular in predicting the stock returns. In a 10-second time frequency, we study the clustering structure of different measures to find out the best ones for predicting. In this way, we can predict more accurately with a limited number of predictors, which removes the noise and makes the model more interpretable.

【36】 Efficient First-Order Contextual Bandits: Prediction, Allocation, and Triangular Discrimination 标题：高效的一阶上下文赌博机：预测、分配和三角判别

作者：Dylan J. Foster,Akshay Krishnamurthy 机构：Microsoft Research, New England, Microsoft Research, NYC 链接：https://arxiv.org/abs/2107.02237 摘要：在统计学习、在线学习和其他领域，一个反复出现的主题是，对于低噪声问题，更快的收敛速度是可能的，通常通过最佳假设的性能来量化；这种结果被称为一阶或小损失担保。虽然一阶保证在统计和在线学习中得到了相对较好的理解，但适应上下文盗贼（更广泛地说，决策）中的低噪声带来了主要的算法挑战。在COLT 2017的一个公开问题中，Agarwal、Krishnamurthy、Langford、Luo和Schapire提出了一个问题，即一阶保证是否对有背景的强盗都是可能的，如果可能的话，是否可以通过有效的算法来实现。我们给出了一个解决这个问题的方法，提供了一个从上下文盗贼到具有对数（或交叉熵）损失的在线回归的最优和有效的约简。我们的算法简单实用，易于容纳丰富的函数类，并且不需要超出可实现性的分布假设。在一个大规模的实证评估中，我们发现我们的方法通常优于可比的非一阶方法。在技术方面，我们证明了对数损失和一个称为三角判别的信息论量在获得一阶保证方面起着基础性作用，并且我们将这一观察结果与Foster和Rakhlin的回归oracle约化框架的新改进结合起来。三角判别法的使用甚至对经典的统计学习模型也产生了新的结果，我们预期它将得到更广泛的应用。摘要：A recurring theme in statistical learning, online learning, and beyond is that faster convergence rates are possible for problems with low noise, often quantified by the performance of the best hypothesis; such results are known as first-order or small-loss guarantees. While first-order guarantees are relatively well understood in statistical and online learning, adapting to low noise in contextual bandits (and more broadly, decision making) presents major algorithmic challenges. In a COLT 2017 open problem, Agarwal, Krishnamurthy, Langford, Luo, and Schapire asked whether first-order guarantees are even possible for contextual bandits and -- if so -- whether they can be attained by efficient algorithms. We give a resolution to this question by providing an optimal and efficient reduction from contextual bandits to online regression with the logarithmic (or, cross-entropy) loss. Our algorithm is simple and practical, readily accommodates rich function classes, and requires no distributional assumptions beyond realizability. In a large-scale empirical evaluation, we find that our approach typically outperforms comparable non-first-order methods. 
On the technical side, we show that the logarithmic loss and an information-theoretic quantity called the triangular discrimination play a fundamental role in obtaining first-order guarantees, and we combine this observation with new refinements to the regression oracle reduction framework of Foster and Rakhlin. The use of triangular discrimination yields novel results even for the classical statistical learning model, and we anticipate that it will find broader use.
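The triangular discrimination mentioned here has a short standard definition for discrete distributions, $D_\Delta(p, q) = \sum_i (p_i - q_i)^2 / (p_i + q_i)$. The sketch below computes it next to the squared Hellinger distance so the well-known sandwich $2H^2 \le D_\Delta \le 4H^2$ can be checked numerically. Function names and the example distributions are illustrative; how the paper plugs this quantity into its regret analysis is not reproduced.

```python
import math


def triangular_discrimination(p, q):
    """Sum of (p_i - q_i)^2 / (p_i + q_i), skipping coordinates where both are zero."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)


def hellinger_squared(p, q):
    """Squared Hellinger distance, 0.5 * sum (sqrt(p_i) - sqrt(q_i))^2."""
    return 0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))


p = [0.5, 0.5]
q = [1.0, 0.0]
d_tri = triangular_discrimination(p, q)   # 1/6 + 1/2 = 2/3
h2 = hellinger_squared(p, q)
```

The denominator `p_i + q_i` is what makes the quantity sensitive to small probabilities, in contrast to a plain squared difference.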

【37】 End-to-End Weak Supervision 标题：端到端弱监督

作者：Salva Rühling Cachay,Benedikt Boecking,Artur Dubrawski 机构：Technical University of Darmstadt, Carnegie Mellon University 链接：https://arxiv.org/abs/2107.02233 摘要：聚合多个弱监督源（WS）可以通过替换繁琐的人工收集基本事实标签来缓解许多机器学习应用程序中普遍存在的数据标签瓶颈。然而，目前不使用任何标记训练数据的最新方法需要两个独立的建模步骤：基于WS源学习概率潜变量模型——做出实践中很少成立的假设——然后进行下游模型训练。重要的是，建模的第一步不考虑下游模型的性能。为了解决这些问题，我们提出了一种端到端的直接学习下游模型的方法，通过最大化其与通过使用神经网络重新参数化以前的概率后验概率而生成的概率标签的一致性。我们的结果显示，在下游测试集的终端模型性能方面，以及在弱监督源之间的依赖性方面，性能比以前的工作有了改进。摘要：Aggregating multiple sources of weak supervision (WS) can ease the data-labeling bottleneck prevalent in many machine learning applications, by replacing the tedious manual collection of ground truth labels. Current state of the art approaches that do not use any labeled training data, however, require two separate modeling steps: Learning a probabilistic latent variable model based on the WS sources -- making assumptions that rarely hold in practice -- followed by downstream model training. Importantly, the first step of modeling does not consider the performance of the downstream model. To address these caveats we propose an end-to-end approach for directly learning the downstream model by maximizing its agreement with probabilistic labels generated by reparameterizing previous probabilistic posteriors with a neural network. Our results show improved performance over prior work in terms of end model performance on downstream test sets, as well as in terms of improved robustness to dependencies among weak supervision sources.

【38】 Featurized Density Ratio Estimation 标题：特征密度比估计

作者：Kristy Choi,Madeline Liao,Stefano Ermon 机构：Computer Science Department, Stanford University 备注：First two authors contributed equally 链接：https://arxiv.org/abs/2107.02212 摘要：密度比估计是无监督机器学习工具箱中的一项重要技术。然而，对于复杂的高维数据，这种比率很难估计，特别是当感兴趣的密度相差很大时。在我们的工作中，我们建议利用一个可逆的生成模型将两个分布映射到一个共同的特征空间，然后再进行估计。这种特征化使得潜在空间中的密度更加接近，避免了病理场景中输入空间中的学习密度比可能任意不准确的情况。同时，特征映射的可逆性保证了在特征空间中计算的比率与在输入空间中计算的比率相等。在经验上，我们证明了我们的方法在各种下游任务中的有效性，这些任务需要获得精确的密度比，例如互信息估计、深层生成模型中的有针对性的抽样以及数据扩充的分类。摘要：Density ratio estimation serves as an important technique in the unsupervised machine learning toolbox. However, such ratios are difficult to estimate for complex, high-dimensional data, particularly when the densities of interest are sufficiently different. In our work, we propose to leverage an invertible generative model to map the two distributions into a common feature space prior to estimation. This featurization brings the densities closer together in latent space, sidestepping pathological scenarios where the learned density ratios in input space can be arbitrarily inaccurate. At the same time, the invertibility of our feature map guarantees that the ratios computed in feature space are equivalent to those in input space. Empirically, we demonstrate the efficacy of our approach in a variety of downstream tasks that require access to accurate density ratios such as mutual information estimation, targeted sampling in deep generative models, and classification with data augmentation.
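The claim that an invertible feature map leaves density ratios unchanged follows from the change-of-variables formula, since the Jacobian factor appears in both numerator and denominator and cancels. The sketch below checks this numerically in one dimension with two Gaussians and a hand-picked affine map; the map, the distributions and the function names are illustrative assumptions, not the learned invertible generative model from the paper.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a univariate normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def feature_map(x):
    """Invertible affine map z = 2x + 3 (stand-in for a learned invertible model)."""
    return 2.0 * x + 3.0


def ratio_input_space(x):
    """p(x) / q(x) with p = N(0, 1) and q = N(1, 1)."""
    return normal_pdf(x, 0.0, 1.0) / normal_pdf(x, 1.0, 1.0)


def ratio_feature_space(z):
    """Ratio of the pushed-forward densities: p' = N(3, 2^2), q' = N(5, 2^2)."""
    return normal_pdf(z, 3.0, 2.0) / normal_pdf(z, 5.0, 2.0)


points = [-2.0, -0.5, 0.0, 0.7, 3.0]
pairs = [(ratio_input_space(x), ratio_feature_space(feature_map(x))) for x in points]
```

Each pair agrees up to floating-point error, so the ratio can be estimated wherever the two densities are easiest to compare.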

本文参与腾讯云自媒体同步曝光计划，分享自微信公众号。

原始发表：2021-07-07，如有侵权请联系 cloudcommunity@tencent.com 删除
