
统计学学术速递[7.20]

公众号-arXiv每日学术速递
发布2021-07-27 11:07:00
文章被收录于专栏:arXiv每日学术速递

访问www.arxivdaily.com获取含摘要速递,涵盖CS|物理|数学|经济|统计|金融|生物|电气领域,更有搜索、收藏、发帖等功能!点击阅读原文即可访问

stat统计学,共计55篇

【1】 Heavy-tailed phase-type distributions: A unified approach 标题:重尾相型分布:一种统一的方法

作者:Martin Bladt,Jorge Yslas 链接:https://arxiv.org/abs/2107.09023 摘要:重尾相位型随机变量具有数学上可处理的分布,并且由于其在隐马尔可夫结构方面的解释,在概念上对物理现象建模具有吸引力。最近三个扩展的正规相型分布引起了模型,允许重尾:离散或连续缩放;分数时间半马尔可夫扩展;以及潜在马尔可夫过程的非均匀时间变化。本文提出了重尾相型分布的统一理论,这三种方法都是特例。我们的主要目标是为重尾相位类型分布提供有用的模型,但我们的规范也捕获了任何其他尾部行为。我们提供了相关的新示例,并展示了现有方法是如何自然嵌入的。随后,受一元结构的启发,提出了两个多元扩展,一元结构可视为脆弱性模型的矩阵形式。我们为所有模型提供了完全显式的EM算法,并用合成数据和实际数据进行了说明。 摘要:Heavy-tailed phase-type random variables have mathematically tractable distributions and are conceptually attractive to model physical phenomena due to their interpretation in terms of a hidden Markov structure. Three recent extensions of regular phase-type distributions give rise to models which allow for heavy tails: discrete- or continuous-scaling; fractional-time semi-Markov extensions; and inhomogeneous time-change of the underlying Markov process. In this paper, we present a unifying theory for heavy-tailed phase-type distributions for which all three approaches are particular cases. Our main objective is to provide useful models for heavy-tailed phase-type distributions, but any other tail behavior is also captured by our specification. We provide relevant new examples and also show how existing approaches are naturally embedded. Subsequently, two multivariate extensions are presented, inspired by the univariate construction which can be considered as a matrix version of a frailty model. We provide fully explicit EM-algorithms for all models and illustrate them using synthetic and real-life data.

【2】 How balance and sample size impact bias in the estimation of causal treatment effects: A simulation study 标题:平衡和样本量如何影响因果处理效果估计中的偏差:一项模拟研究

作者:Andreas Markoulidakis,Peter Holmans,Philip Pallmann,Monica Busse,Beth Ann Griffin

机构:School of Medicine, Cardiff University 链接:https://arxiv.org/abs/2107.09009 摘要:观察性研究通常用于了解暴露和结果之间的关系。然而,除非使用统计技术来处理不同暴露组之间混杂因素的不平衡,否则它们不允许得出关于因果关系的结论。倾向评分和平衡加权(PSBW)是一类有用的技术,旨在通过加权使各暴露组在观察到的混杂因素上趋于相似,从而减少组间不平衡。尽管有大量可用的方法来估计PSBW,但对于什么才算充分的平衡几乎没有指导,并且除非若干条件成立,否则不能保证对因果治疗效果的无偏和稳健估计。准确的推断要求:1.已知治疗分配机制;2.已知基线协变量与结果之间的关系;3.加权后达到基线协变量的充分平衡;4.已知一组用于控制混杂偏差的适当协变量;5.有足够大的样本量。在本文中,我们使用不同大小的模拟数据来研究这五个因素对统计推断的影响。我们的研究结果提供了证据表明,最大Kolmogorov-Smirnov统计量是评估基线协变量平衡的合适统计度量(而不是许多应用中使用的平均标准化均值差),并且0.1是判定平衡可接受的合适阈值。最后,我们建议每个治疗组每个混杂因素需要60-80次观察,以获得对因果治疗效果的可靠和无偏估计。 摘要:Observational studies are often used to understand relationships between exposures and outcomes. They do not, however, allow conclusions about causal relationships to be drawn unless statistical techniques are used to account for the imbalance of confounders across exposure groups. Propensity score and balance weighting (PSBW) are useful techniques that aim to reduce the imbalances between exposure groups by weighting the groups to look alike on the observed confounders. Despite the plethora of available methods to estimate PSBW, there is little guidance on what one defines as adequate balance, and unbiased and robust estimation of the causal treatment effect is not guaranteed unless several conditions hold. Accurate inference requires that 1. the treatment allocation mechanism is known, 2. the relationship between the baseline covariates and the outcome is known, 3. adequate balance of baseline covariates is achieved post-weighting, 4. a proper set of covariates to control for confounding bias is known, and 5. a large enough sample size is available. In this article, we use simulated data of various sizes to investigate the influence of these five factors on statistical inference. Our findings provide evidence that the maximum Kolmogorov-Smirnov statistic is the proper statistical measure to assess balance on the baseline covariates, in contrast to the mean standardised mean difference used in many applications, and 0.1 is a suitable threshold to consider as acceptable balance. Finally, we recommend that 60-80 observations, per confounder per treatment group, are required to obtain a reliable and unbiased estimation of the causal treatment effect.
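
下面给出一个示意性的小例子(并非该论文的代码;数据生成过程、逻辑回归倾向评分模型以及0.1的参考阈值均为演示用的假设),说明如何在逆概率加权后,用每个基线协变量的加权Kolmogorov-Smirnov统计量的最大值来度量平衡:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, p = 2000, 3
X = rng.normal(size=(n, p))                                   # 基线协变量
ps_true = 1 / (1 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1])))
t = rng.binomial(1, ps_true)                                  # 暴露/处理指示

# 估计倾向评分并构造逆概率权重(IPW)
ps = LogisticRegression().fit(X, t).predict_proba(X)[:, 1]
w = np.where(t == 1, 1 / ps, 1 / (1 - ps))

def weighted_ks(x, t, w):
    """两组加权经验分布函数之差的最大值(加权KS统计量)。"""
    grid = np.sort(np.unique(x))
    cdf1 = np.array([np.sum(w[(t == 1) & (x <= g)]) for g in grid]) / np.sum(w[t == 1])
    cdf0 = np.array([np.sum(w[(t == 0) & (x <= g)]) for g in grid]) / np.sum(w[t == 0])
    return np.max(np.abs(cdf1 - cdf0))

ks_per_cov = [weighted_ks(X[:, j], t, w) for j in range(p)]
print("加权后各协变量的KS统计量:", np.round(ks_per_cov, 3))
print("最大KS统计量:", round(max(ks_per_cov), 3), "(摘要建议以0.1作为可接受平衡的参考阈值)")
```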

【3】 Mind the Income Gap: Behavior of Inequality Estimators from Complex Survey Small Samples 标题:关注收入差距:复杂调查小样本中不平等估计量的行为

作者:Silvia De Nicolò,Maria Rosaria Ferrante,Silvia Pacei 机构:Department of Statistical Sciences, University of Padova; Department of Statistical Sciences "P. Fortunati", University of Bologna 备注:29 pages, 5 figures, 5 tables 链接:https://arxiv.org/abs/2107.08950 摘要:收入不平等的度量在小样本中存在偏差,通常导致低估。在考察了偏差的性质之后,我们在考虑复杂调查设计的前提下,为一大类由基尼指数、广义熵族和Atkinson族组成的不平等测度提出了一个偏差校正框架。该方法基于泰勒展开和广义线性化方法,不需要对收入分布作任何参数假设,因而非常灵活。我们使用EU-SILC调查的数据对所建议的校正进行了基于设计的性能评估,结果显示所有测度的偏差都明显减小。随后给出了一个bootstrap方差估计方案并进行了分布分析,以全面概述小样本中不平等估计量的行为。关于估计量分布的结果表明,随着样本量减小,正偏度和尖峰厚尾(leptokurtosis)增强,证实了经典渐近结果在小样本中不适用,并提示需要发展替代的推断方法。 摘要:Income inequality measures are biased in small samples leading generally to an underestimation. After investigating the nature of the bias, we propose a bias-correction framework for a large class of inequality measures comprising Gini Index, Generalized Entropy and Atkinson families by accounting for complex survey designs. The proposed methodology is based on Taylor's expansions and Generalized Linearization Method, and does not require any parametric assumption on income distribution, being very flexible. Design-based performance evaluation of the suggested correction has been carried out using data taken from EU-SILC survey. Results show a noticeable bias reduction for all measures. A bootstrap variance estimation proposal and a distributional analysis follow in order to provide a comprehensive overview of the behavior of inequality estimators in small samples. Results about estimators distributions show increasing positive skewness and leptokurtosis at decreasing sample sizes, confirming the non-applicability of classical asymptotic results in small samples and suggesting the development of alternative methods of inference.
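
下面用一个与论文无关的小模拟(对数正态总体、样本量取值均为演示用的假设)示意摘要所述现象:基尼指数在小样本中系统性低估:

```python
import numpy as np

rng = np.random.default_rng(1)

def gini(x):
    """样本基尼指数(基于排序的标准公式)。"""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    return 2 * np.sum(np.arange(1, n + 1) * x) / (n * np.sum(x)) - (n + 1) / n

population = rng.lognormal(mean=0.0, sigma=1.0, size=200_000)  # 假设的收入总体
g_pop = gini(population)                                       # 近似"真"基尼指数

for n in (10, 30, 100, 500):
    est = [gini(rng.choice(population, size=n, replace=False)) for _ in range(500)]
    print(f"n={n:4d}  小样本平均估计={np.mean(est):.3f}   总体值≈{g_pop:.3f}")
```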

【4】 Parametric estimation for functional autoregressive processes on the sphere 标题:球面上泛函自回归过程的参数估计

作者:Alessia Caponera,Claudio Durastanti 备注:17 pages 链接:https://arxiv.org/abs/2107.08900 摘要:本文的目的是定义一个非线性最小二乘估计的谱参数的一阶球面自回归过程的参数设置。进一步研究了它的渐近性质,如弱相合性和渐近正态性。 摘要:The aim of this paper is to define a nonlinear least squares estimator for the spectral parameters of a spherical autoregressive process of order 1 in a parametric setting. Furthermore, we investigate on its asymptotic properties, such as weak consistency and asymptotic normality.

【5】 A Proposed Hybrid Effect Size Plus p-Value Criterion: A Replication of Goodman et al. (2019) 标题:一种结合效应量与p值的混合判定标准:对Goodman等人(2019)的重复研究

作者:Robin Tim Dreher,Leona Hoffmann,Arne Kramer-Sunderbrink,Peter Pütz,Robin Werner 机构:Peter P¨utz∗, Department of Economics, Bielefeld University 链接:https://arxiv.org/abs/2107.08860 摘要:在最近的一项模拟研究中,Goodman等人(2019年)比较了几种方法在粗零假设情况下的I型和II型错误率性能,其中包括实际上等同于点零假设的所有值。他们提出了一个混合决策标准,只有在同时获得较小的$p$值和足够大的影响大小时,才声明结果“显著”。我们在R中使用我们自己的软件代码成功地复制了结果,并讨论了一种维持预定义假阳性率的附加决策方法。我们证实了混合决策准则在人们可以检查的设置中具有相对较低的错误率,但指出研究人员不容易控制错误发现率。我们的分析很容易访问和定制的网站https://github.com/drehero/goodman-replication. 摘要:In a recent simulation study, Goodman et al. (2019) compare several methods with regard to their performance of type I and type II error rates in case of a thick null hypothesis that includes all values that are practically equivalent to the point null hypothesis. They propose a hybrid decision criterion only declaring a result "significant" if both a small $p$-value and a sufficiently large effect size are obtained. We successfully replicate the results using our own software code in R and discuss an additional decision method that maintains a pre-defined false positive rate. We confirm that the hybrid decision criterion has comparably low error rates in settings one can check for but point out that the false discovery rate cannot be easily controlled by the researcher. Our analyses are readily accessible and customizable on https://github.com/drehero/goodman-replication.
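
下面是摘要所述混合判定标准的一个示意实现(并非原文代码;显著性水平alpha=0.05与效应量阈值d_min=0.2为演示用的假设取值):只有当p值足够小且Cohen's d足够大时才判定"显著":

```python
import numpy as np
from scipy import stats

def hybrid_decision(x, y, alpha=0.05, d_min=0.2):
    """仅当 p < alpha 且 |Cohen's d| > d_min 时判定"显著"。"""
    t_stat, p_val = stats.ttest_ind(x, y)
    pooled_sd = np.sqrt(((len(x) - 1) * np.var(x, ddof=1) +
                         (len(y) - 1) * np.var(y, ddof=1)) / (len(x) + len(y) - 2))
    d = (np.mean(x) - np.mean(y)) / pooled_sd
    return {"p": p_val, "cohens_d": d,
            "significant": (p_val < alpha) and (abs(d) > d_min)}

rng = np.random.default_rng(2)
# 两组均值差很小:大样本下p值可能很小,但效应量不足,混合标准不会判定显著
x = rng.normal(0.00, 1.0, 5000)
y = rng.normal(0.05, 1.0, 5000)
print(hybrid_decision(x, y))
```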

【6】 The Future will be Different than Today: Model Evaluation Considerations when Developing Translational Clinical Biomarker 标题:未来将不同于今天:开发转化临床生物标志物时的模型评估考量

作者:Yichen Lu,Jane Fridlyand,Tiffany Tang,Ting Qi,Noah Simon,Ning Leng 机构:Tiffany Tang∗, Genentech Inc., South San Francisco, CA, USA, Department of Biostatistics, University of Washington, Seattle, WA, USA 备注:Paper has 4 pages, 2 figures. Appendix are supplementary at the end 链接:https://arxiv.org/abs/2107.08787 摘要:发现转化生物标志物(translational biomarker)是未来个性化医疗的核心。我们观察到识别稳健生物标志物面临显著挑战:一些在某一场景中表现出色的生物标志物,在新的试验中(例如不同人群、不同适应症)往往表现不佳。随着临床试验领域的快速发展(如检测方法、疾病定义的变化),新试验很可能在许多方面不同于既有试验,在开发生物标志物时应当考虑这种异质性。为此,我们建议在评估生物标志物时将异质性纳入考量。在本文中,我们提出一种评估策略,用leave-one-study-out(LOSO)代替传统的交叉验证(CV)方法,以考虑用于构建和测试生物标志物的各项试验之间的潜在异质性。为了比较K-fold与LOSO交叉验证在估计生物标志物效应量方面的表现,我们利用了临床试验和模拟研究的数据。在我们的评估中,LOSO交叉验证对未来表现给出了更客观的估计。该结论在不同的评价指标和不同的统计方法下均成立。 摘要:Finding translational biomarkers stands center stage of the future of personalized medicine in healthcare. We observed notable challenges in identifying robust biomarkers as some with great performance in one scenario often fail to perform well in new trials (e.g. different population, indications). With rapid development in the clinical trial world (e.g. assay, disease definition), new trials very likely differ from legacy ones in many perspectives and in development of biomarkers this heterogeneity should be considered. In response, we recommend considering building in the heterogeneity when evaluating biomarkers. In this paper, we present one evaluation strategy by using leave-one-study-out (LOSO) in place of conventional cross-validation (cv) methods to account for the potential heterogeneity across trials used for building and testing the biomarkers. To demonstrate the performance of K-fold vs LOSO cv in estimating the effect size of biomarkers, we leveraged data from clinical trials and simulation studies. In our assessment, LOSO cv provided a more objective estimate of the future performance. This conclusion remained true across different evaluation metrics and different statistical methods.
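
下面用一个假设的多研究合成数据集示意K-fold与leave-one-study-out(LOSO)交叉验证的差别(并非论文代码;各研究间系数漂移的数据生成方式与Ridge模型均为假设):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(3)
n_per_study, n_studies = 100, 5
X, y, study = [], [], []
for s in range(n_studies):
    beta = 1.0 + 0.5 * rng.normal()            # 各研究的效应存在异质性(假设)
    Xs = rng.normal(size=(n_per_study, 4))
    ys = beta * Xs[:, 0] + rng.normal(size=n_per_study)
    X.append(Xs); y.append(ys); study.extend([s] * n_per_study)
X, y, study = np.vstack(X), np.concatenate(y), np.array(study)

model = Ridge(alpha=1.0)
kfold_r2 = cross_val_score(model, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0))
loso_r2 = cross_val_score(model, X, y, cv=LeaveOneGroupOut(), groups=study)
print("K-fold 平均R^2:", kfold_r2.mean().round(3))
print("LOSO   平均R^2:", loso_r2.mean().round(3), "(通常更保守,更接近在新试验上的表现)")
```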

【7】 Assessing competitive balance in the English Premier League for over forty seasons using a stochastic block model 标题:用随机块模型评估英超联赛四十多个赛季的竞争平衡

作者:Francesca Basini,Vasiliki Tsouli,Ioannis Ntzoufras,Nial Friel 机构:Department of Mathematics, University of Warwick, UK, Department of Statistics, Athens University of Economics and Business, Greece, School of Mathematics and Statistics, University College Dublin, Dublin , Ireland 备注:31 pages. Submitted for publication 链接:https://arxiv.org/abs/2107.08732 摘要:在任何职业体育联赛中,竞争平衡都是一个理想的特征,它概括了这样一个概念,即比赛的结果是不可预测的,而不是一些比赛的结果比其他比赛的结果更可预测的不平衡联赛,例如,当一个明显的强队与弱队比赛时。在这篇论文中,我们发展了一个基于模型的聚类方法来评估联盟中球队之间的平衡。我们提出了一个新的贝叶斯模型,将足球赛季的结果表示为一个密集的网络,其中节点由球队识别,分类边表示每场比赛的结果。由此产生的随机块模型有助于团队的概率聚类,以评估是否存在竞争不平衡的联盟。然后一个关键问题是评估集群或区块数量的不确定性,从而估计团队到区块的划分或分配。为了做到这一点,我们开发了一个MCMC算法,它允许联合估计块的数量和将团队分配给块。我们将我们的模式应用于英格兰超级联赛的每个赛季,从1978/79美元到2019/20美元。这项分析的一个关键发现是,有证据表明,从一个合理平衡的联赛到一个发生在2000年初左右的两级联赛的结构变化。 摘要:Competitive balance is a desirable feature in any professional sports league and encapsulates the notion that there is unpredictability in the outcome of games as opposed to an imbalanced league in which the outcome of some games are more predictable than others, for example, when an apparent strong team plays against a weak team. In this paper, we develop a model-based clustering approach to provide an assessment of the balance between teams in a league. We propose a novel Bayesian model to represent the results of a football season as a dense network with nodes identified by teams and categorical edges representing the outcome of each game. The resulting stochastic block model facilitates the probabilistic clustering of teams to assess whether there are competitive imbalances in a league. A key question then is to assess the uncertainty around the number of clusters or blocks and consequently estimation of the partition or allocation of teams to blocks. To do this, we develop an MCMC algorithm that allows the joint estimation of the number of blocks and the allocation of teams to blocks. We apply our model to each season in the English premier league from $1978/79$ to $2019/20$. A key finding of this analysis is evidence which suggests a structural change from a reasonably balanced league to a two-tier league which occurred around the early 2000's.

【8】 Estimation of high-dimensional change-points under a group sparsity structure 标题:群稀疏结构下高维变点的估计

作者:Hanqing Cai,Tengyao Wang 机构:University College London 备注:25 pages, 6 figures 链接:https://arxiv.org/abs/2107.08724 摘要:变化点是以高维数据流形式观察到的“大数据”的常规特征。在许多这样的数据流中,组件序列具有组结构,并且很自然地假设变化只发生在所有组中的一小部分中。我们提出了一种新的变点方法,称为groupInspect,它利用组稀疏性结构来估计投影方向,从而在整个分量序列中聚集信息,从而成功地估计序列平均结构中的变点。我们证明了当所有群的大小都是可比的时,在对数因子下,估计的投影方向是minimax最优的。此外,我们的理论为变点位置估计的收敛速度提供了有力的保证。数值研究证明了groupInspect在各种环境下的竞争性能,并通过一个实际数据验证了该方法的实用性。 摘要:Change-points are a routine feature of 'big data' observed in the form of high-dimensional data streams. In many such data streams, the component series possess group structures and it is natural to assume that changes only occur in a small number of all groups. We propose a new change point procedure, called 'groupInspect', that exploits the group sparsity structure to estimate a projection direction so as to aggregate information across the component series to successfully estimate the change-point in the mean structure of the series. We prove that the estimated projection direction is minimax optimal, up to logarithmic factors, when all group sizes are of comparable order. Moreover, our theory provide strong guarantees on the rate of convergence of the change-point location estimator. Numerical studies demonstrates the competitive performance of groupInspect in a wide range of settings and a real data example confirms the practical usefulness of our procedure.

【9】 Van Trees inequality, group equivariance, and estimation of principal subspaces 标题:Van Trees不等式、群等差与主子空间的估计

作者:Martin Wahl 机构:Humboldt-Universität zu Berlin 备注:20 pages 链接:https://arxiv.org/abs/2107.08723 摘要:建立了主子空间估计的非渐近下界。作为应用,我们得到了主成分分析的超额风险和矩阵去噪问题的新结果。 摘要:We establish non-asymptotic lower bounds for the estimation of principal subspaces. As applications, we obtain new results for the excess risk of principal component analysis and the matrix denoising problem.

【10】 Equivariant Manifold Flows 标题:等变流形流

作者:Isay Katsman,Aaron Lou,Derek Lim,Qingxuan Jiang,Ser-Nam Lim,Christopher De Sa 机构:Cornell University, Facebook AI 备注:Preprint 链接:https://arxiv.org/abs/2107.08596 摘要:对流形上的分布进行易于处理的建模一直是自然科学中的一个重要目标。最近的工作集中在开发通用机器学习模型来学习这种分布。然而,对于许多应用,这些分布必须尊重流形对称性——这是大多数以前的模型所忽略的特性。本文为利用等变流形流学习任意流形上的对称不变分布奠定了理论基础。在量子场论的背景下,我们用它来学习$SU(n)$上的规范不变密度,证明了我们的方法的实用性。 摘要:Tractably modelling distributions over manifolds has long been an important goal in the natural sciences. Recent work has focused on developing general machine learning models to learn such distributions. However, for many applications these distributions must respect manifold symmetries -- a trait which most previous models disregard. In this paper, we lay the theoretical foundations for learning symmetry-invariant distributions on arbitrary manifolds via equivariant manifold flows. We demonstrate the utility of our approach by using it to learn gauge invariant densities over $SU(n)$ in the context of quantum field theory.

【11】 High-Dimensional Simulation Optimization via Brownian Fields and Sparse Grids 标题:基于布朗场和稀疏网格的高维仿真优化

作者:Liang Ding,Rui Tuo,Xiaowei Zhang 机构:Department of Industrial and Systems Engineering, Texas A&M University, College Station, TX , U.S. 备注:Main body: 36 pages, 7 figures, 2 tables. Supplemental material: 32 pages, 1 figure 链接:https://arxiv.org/abs/2107.08595 摘要:高维模拟优化是众所周知的挑战。提出了一种新的采样算法,该算法能收敛到全局最优解,且受到维数灾难的影响最小。该算法分为两个阶段。首先,采用稀疏网格实验设计,通过核岭回归和布朗场核逼近响应面。第二,我们遵循预期的改进策略——通过关键的修改来提高算法的采样效率——从稀疏网格的下一层进行迭代采样。在响应面光滑性和模拟噪声较弱的条件下,分别建立了无噪声和有噪声模拟样本的收敛速度上界。这些上界率在可行集的维数上只会有轻微的退化,如果已知目标函数具有高阶光滑性,则上界率可以得到改善。大量的数值实验表明,该算法在实际应用中明显优于典型的算法。 摘要:High-dimensional simulation optimization is notoriously challenging. We propose a new sampling algorithm that converges to a global optimal solution and suffers minimally from the curse of dimensionality. The algorithm consists of two stages. First, we take samples following a sparse grid experimental design and approximate the response surface via kernel ridge regression with a Brownian field kernel. Second, we follow the expected improvement strategy -- with critical modifications that boost the algorithm's sample efficiency -- to iteratively sample from the next level of the sparse grid. Under mild conditions on the smoothness of the response surface and the simulation noise, we establish upper bounds on the convergence rate for both noise-free and noisy simulation samples. These upper rates deteriorate only slightly in the dimension of the feasible set, and they can be improved if the objective function is known be of a higher-order smoothness. Extensive numerical experiments demonstrate that the proposed algorithm dramatically outperforms typical alternatives in practice.

【12】 Nonparametric Finite Mixture Models with Possible Shape Constraints: A Cubic Newton Approach 标题:具有可能形状约束的非参数有限混合模型:三次牛顿方法

作者:Haoyue Wang,Shibal Ibrahim,Rahul Mazumder 机构:†MIT Department of Electrical Engineering and Computer Science (email, ‡MIT Sloan School of Management 备注:31 pages, 6 figures 链接:https://arxiv.org/abs/2107.08535 摘要:我们探讨了非参数有限混合模型混合比例的极大似然估计的计算方面——这是一个统计中具有旧根的凸优化问题,也是现代数据分析工具箱的关键成员。受形状约束推理中的问题的启发,我们考虑具有附加凸多面体约束的这个问题的结构化变型。我们提出了一种新的三次正则化牛顿法来解决这个问题,并为我们的算法提供了新的最坏情况和局部计算保证。我们将Nesterov和Polyak的早期工作扩展到具有多面体约束的自洽目标的情况,如本文所考虑的情况。提出了一种求解三次正则化牛顿子问题的Frank-Wolfe方法;并为线性优化预言导出有效解,这些预言可能是独立的。在无形状约束的高斯混合情形下,我们得到了有限混合问题逼近无限维Kiefer-Wolfowitz极大似然估计的界。在合成数据集和真实数据集上的实验表明,我们提出的算法比现有的基准测试具有更好的运行时和可伸缩性。 摘要:We explore computational aspects of maximum likelihood estimation of the mixture proportions of a nonparametric finite mixture model -- a convex optimization problem with old roots in statistics and a key member of the modern data analysis toolkit. Motivated by problems in shape constrained inference, we consider structured variants of this problem with additional convex polyhedral constraints. We propose a new cubic regularized Newton method for this problem and present novel worst-case and local computational guarantees for our algorithm. We extend earlier work by Nesterov and Polyak to the case of a self-concordant objective with polyhedral constraints, such as the ones considered herein. We propose a Frank-Wolfe method to solve the cubic regularized Newton subproblem; and derive efficient solutions for the linear optimization oracles that may be of independent interest. In the particular case of Gaussian mixtures without shape constraints, we derive bounds on how well the finite mixture problem approximates the infinite-dimensional Kiefer-Wolfowitz maximum likelihood estimator. Experiments on synthetic and real datasets suggest that our proposed algorithms exhibit improved runtimes and scalability features over existing benchmarks.

【13】 Sparse group variable selection for gene-environment interactions in the longitudinal study 标题:纵向研究中基因-环境交互作用的稀疏群体变量选择

作者:Fei Zhou,Xi Lu,Jie Ren,Kun Fan,Shuangge Ma,Cen Wu 机构:Wu,∗, Department of Statistics, Kansas State University, Manhattan, KS, Department of Biostatistics and Health Data Sciences, Indiana University School of, Medicine, Indianapolis, IN, School of Public Health, Yale University, New Haven, CT 链接:https://arxiv.org/abs/2107.08533 摘要:高维纵向数据的惩罚变量选择作为解释重复测量之间的相关性以及为改进识别和预测性能提供额外和必要的信息而受到广泛关注。尽管取得了成功,但在纵向研究中,惩罚方法在调节结构性稀疏性方面的潜力还远未被充分理解。在这篇文章中,我们发展了一种稀疏群惩罚方法来进行重复测量表型下的双水平基因-环境(G$\times$E)互作研究。在二次推理函数(QIF)框架下,该方法可以同时识别群体和个体层面的主效应和交互效应。仿真研究表明,该方法优于主要竞争对手。在儿童哮喘管理项目(CAMP)哮喘数据的个案研究中,我们以高维SNP数据为遗传因素,以纵向特征,1秒用力呼气量(FEV1)为表型进行G$\times$E研究。我们的方法改进了主效应和交互效应的预测和识别,具有重要的意义。 摘要:Penalized variable selection for high dimensional longitudinal data has received much attention as accounting for the correlation among repeated measurements and providing additional and essential information for improved identification and prediction performance. Despite the success, in longitudinal studies the potential of penalization methods is far from fully understood for accommodating structured sparsity. In this article, we develop a sparse group penalization method to conduct the bi-level gene-environment (G$\times$E) interaction study under the repeatedly measured phenotype. Within the quadratic inference function (QIF) framework, the proposed method can achieve simultaneous identification of main and interaction effects on both the group and individual level. Simulation studies have shown that the proposed method outperforms major competitors. In the case study of asthma data from the Childhood Asthma Management Program (CAMP), we conduct G$\times$E study by using high dimensional SNP data as the Genetic factor and the longitudinal trait, forced expiratory volume in one second (FEV1), as phenotype. Our method leads to improved prediction and identification of main and interaction effects with important implications.

【14】 Regression model selection via log-likelihood ratio and constrained minimum criterion 标题:基于对数似然比和约束最小准则的回归模型选择

作者:Min Tsao 机构:Department of Mathematics and Statistics, University of Victoria, British Columbia, Canada V,W ,Y 备注:23 pages 链接:https://arxiv.org/abs/2107.08529 摘要:虽然对数似然法在模型选择中有着广泛的应用,但对数似然比在这方面的应用却很少。我们发展了一种基于对数似然比的回归模型选择方法,通过关注似然比检验认为合理的模型集。结果表明,当样本量较大且检验的显著性水平较小时,集合中最小模型为真模型的概率较大;因此,我们选择这个最小的模型。显著性级别用作此方法的参数。在仿真研究中,我们考虑了该参数的三个层次,并将该方法与AKAIKE信息准则和贝叶斯信息准则进行比较,以证明其对不同样本大小的优良精度和适应性。我们也应用这种方法来选择一个南非心脏病数据集的logistic回归模型。 摘要:Although the log-likelihood is widely used in model selection, the log-likelihood ratio has had few applications in this area. We develop a log-likelihood ratio based method for selecting regression models by focusing on the set of models deemed plausible by the likelihood ratio test. We show that when the sample size is large and the significance level of the test is small, there is a high probability that the smallest model in the set is the true model; thus, we select this smallest model. The significance level serves as a parameter for this method. We consider three levels of this parameter in a simulation study and compare this method with the Akaike Information Criterion and Bayesian Information Criterion to demonstrate its excellent accuracy and adaptability to different sample sizes. We also apply this method to select a logistic regression model for a South African heart disease dataset.
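
下面是该思路的一个粗略示意(并非论文的约束最小准则实现;候选模型限定为嵌套线性模型,显著性水平0.05为假设):保留似然比检验未被拒绝的模型集合,并从中选取最小的模型:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, p = 300, 6
X = rng.normal(size=(n, p))
y = 2 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)   # 真模型只含前两个变量

def gauss_loglik(y, X_sub):
    """带截距的最小二乘拟合的高斯对数似然。"""
    Xd = np.column_stack([np.ones(len(y)), X_sub])
    resid = y - Xd @ np.linalg.lstsq(Xd, y, rcond=None)[0]
    sigma2 = np.mean(resid ** 2)
    return -0.5 * len(y) * (np.log(2 * np.pi * sigma2) + 1)

full_ll = gauss_loglik(y, X)
alpha = 0.05
plausible = []
for k in range(p + 1):                                 # 嵌套子模型:前k个变量
    lr = 2 * (full_ll - gauss_loglik(y, X[:, :k]))
    p_val = stats.chi2.sf(lr, df=p - k) if p > k else 1.0
    if p_val > alpha:                                  # 似然比检验未拒绝 => 模型"合理"
        plausible.append(k)
print("合理模型(变量个数):", plausible, " 选取其中最小者:", min(plausible))
```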

【15】 Compressed Monte Carlo with application in particle filtering 标题:压缩蒙特卡罗及其在粒子滤波中的应用

作者:Luca Martino,Víctor Elvira 机构:⊤ Dep. of Signal Processing, Universidad Rey Juan Carlos (URJC) and Universidad Carlos III de Madrid (UC,M), ∗ IMT Lille Douai, Cit´e Scientifique, Rue Guglielmo Marconi, Villeneuve dAscq , (France) 备注:None 链接:https://arxiv.org/abs/2107.08459 摘要:近年来,贝叶斯模型在信号处理、统计和机器学习等领域得到了广泛的应用。贝叶斯推理需要对包含后验分布的复杂积分进行逼近。为此,蒙特卡罗(MC)方法,如马尔可夫链蒙特卡罗和重要抽样算法,经常被采用。在这项工作中,我们介绍了一种压缩MC(C-MC)方案的理论和实践来压缩随机样本中包含的统计信息。在其基本版本中,C-MC与分层技术(一种用于方差缩减目的的著名方法)密切相关。本文还提出了确定性C-MC方案,该方案具有很好的性能。压缩问题与应用于不同滤波技术的矩匹配方法密切相关,通常称为高斯求积规则或sigma点方法。当需要与中央处理器进行廉价快速的通信时,C-MC可以应用于分布式贝叶斯推理框架。此外,C-MC在粒子滤波和自适应is算法中也很有用,本文介绍了三种新的算法。六个数值结果证实了所引入格式的优点,优于相应的基准方法。还提供了相关代码。 摘要:Bayesian models have become very popular over the last years in several fields such as signal processing, statistics, and machine learning. Bayesian inference requires the approximation of complicated integrals involving posterior distributions. For this purpose, Monte Carlo (MC) methods, such as Markov Chain Monte Carlo and importance sampling algorithms, are often employed. In this work, we introduce the theory and practice of a Compressed MC (C-MC) scheme to compress the statistical information contained in a set of random samples. In its basic version, C-MC is strictly related to the stratification technique, a well-known method used for variance reduction purposes. Deterministic C-MC schemes are also presented, which provide very good performance. The compression problem is strictly related to the moment matching approach applied in different filtering techniques, usually called as Gaussian quadrature rules or sigma-point methods. C-MC can be employed in a distributed Bayesian inference framework when cheap and fast communications with a central processor are required. Furthermore, C-MC is useful within particle filtering and adaptive IS algorithms, as shown by three novel schemes introduced in this work. Six numerical results confirm the benefits of the introduced schemes, outperforming the corresponding benchmark methods. A related code is also provided.
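
C-MC与分层(stratification)方差缩减密切相关。下面用一个与论文无关的极简示例(被积函数exp(u)与层数均为假设)演示分层抽样相对普通蒙特卡罗的方差优势:

```python
import numpy as np

rng = np.random.default_rng(5)
f = lambda u: np.exp(u)            # 估计 E[f(U)], U~Uniform(0,1), 真值为 e-1

def plain_mc(n):
    return f(rng.uniform(size=n)).mean()

def stratified_mc(n):
    # 将[0,1]均分为n层,每层各抽一个点
    u = (np.arange(n) + rng.uniform(size=n)) / n
    return f(u).mean()

reps = 2000
plain = [plain_mc(100) for _ in range(reps)]
strat = [stratified_mc(100) for _ in range(reps)]
print("真值:", np.e - 1)
print("普通MC方差:", np.var(plain))
print("分层MC方差:", np.var(strat))
```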

【16】 Gibbs sampling for mixtures in order of appearance: the ordered allocation sampler 标题:按出现顺序对混合物进行Gibbs抽样:有序分配采样器

作者:Pierpaolo De Blasi,María F. Gil-Leyva 机构:University of Torino and Collegio Carlo Alberto, Italy, Mar´ıa F. Gil–Leyva, Bocconi University, Milan, Italy 备注:36 pages, 14 figures, 2 tables 链接:https://arxiv.org/abs/2107.08380 摘要:混合模型的Gibbs抽样方法是基于数据扩充方案,该方案考虑了数据中不可观测的分区。条件采样器依赖于分配变量,这些变量用一个混合成分来标识每个观测值。众所周知,它们在无限混合物中混合缓慢,需要某种形式的截断,无论是确定性的还是随机的。在具有随机组分数的混合物中,探索不同维度的参数空间也具有挑战性。我们通过在混合分布的指导下,以可交换序列的随机出现顺序来表达混合组分来解决这些问题。我们导出了一个简单易行的采样器,用于混合具有可处理的大小偏差有序权重的分布。在无限混合物中,不需要任何形式的截断。对于具有随机维数的有限混合体,通过一个分块变元得到了组分个数的简单更新,从而减轻了通过Metropolis-Hasting步骤进行跨维移动的挑战。此外,该模型的潜在聚类结构采用有序划分的方式进行加密,并以最小元素顺序对块进行标记,从而缓解了标签切换问题。通过仿真研究,说明了该采样器具有良好的混合性能。 摘要:Gibbs sampling methods for mixture models are based on data augmentation schemes that account for the unobserved partition in the data. Conditional samplers rely on allocation variables that identify each observation with a mixture component. They are known to suffer from slow mixing in infinite mixtures, where some form of truncation, either deterministic or random, is required. In mixtures with random number of components, the exploration of parameter spaces of different dimensions can also be challenging. We tackle these issues by expressing the mixture components in the random order of appearance in an exchangeable sequence directed by the mixing distribution. We derive a sampler that is straightforward to implement for mixing distributions with tractable size-biased ordered weights. In infinite mixtures, no form of truncation is necessary. As for finite mixtures with random dimension, a simple updating of the number of components is obtained by a blocking argument, thus, easing challenges found in trans-dimensional moves via Metropolis-Hasting steps. Additionally, the latent clustering structure of the model is encrypted by means of an ordered partition with blocks labelled in the least element order, which mitigates the label-switching problem. We illustrate through a simulation study the good mixing performance of the new sampler.

【17】 Best Subset Selection: Statistical Computing Meets Quantum Computing 标题:最佳子集选择:统计计算遇见量子计算

作者:Wenxuan Zhong,Yuan Ke,Ye Wang,Yongkai Chen,Jinyang Chen,Ping Ma 机构:Department of Statistics, University of Georgia 链接:https://arxiv.org/abs/2107.08359 摘要:随着量子计算机的迅速发展,量子算法得到了广泛研究,但处理统计问题的量子算法仍然缺乏。本文提出了一种新的非预言(non-oracular)量子自适应搜索(QAS)方法来求解最佳子集选择问题。QAS的表现与朴素的最佳子集选择方法几乎相同,但将其计算复杂度从$O(D)$降低到$O(\sqrt{D}\log_2 D)$,其中$D=2^p$是$p$个协变量上所有子集的总数。与现有的量子搜索算法不同,QAS不需要真实解状态的预言信息,因此适用于各种带随机观测的统计学习问题。理论上,我们证明了QAS在$O(\log_2 D)$次迭代内达到任意成功概率$q\in(0.5,1)$。当基础回归模型是线性时,我们提出了一种比经典方法更快的量子线性预测方法。我们进一步引入一种混合量子-经典策略,以避开现有量子计算系统的容量瓶颈,并通过多数表决提升QAS的成功概率。理论分析以及在量子和经典计算机上的大量实验验证了该策略的有效性。 摘要:With the rapid development of quantum computers, quantum algorithms have been studied extensively. However, quantum algorithms tackling statistical problems are still lacking. In this paper, we propose a novel non-oracular quantum adaptive search (QAS) method for the best subset selection problems. QAS performs almost identically to the naive best subset selection method but reduces its computational complexity from $O(D)$ to $O(\sqrt{D}\log_2D)$, where $D=2^p$ is the total number of subsets over $p$ covariates. Unlike existing quantum search algorithms, QAS does not require the oracle information of the true solution state and hence is applicable to various statistical learning problems with random observations. Theoretically, we prove QAS attains any arbitrary success probability $q \in (0.5, 1)$ within $O(\log_2D)$ iterations. When the underlying regression model is linear, we propose a quantum linear prediction method that is faster than its classical counterpart. We further introduce a hybrid quantum-classical strategy to avoid the capacity bottleneck of existing quantum computing systems and boost the success probability of QAS by majority voting. The effectiveness of this strategy is justified by both theoretical analysis and extensive empirical experiments on quantum and classical computers.

【18】 Assessing Mediational Processes in Parallel Bilinear Spline Growth Curve Models in the Framework of Individual Measurement Occasions 标题:在个体测量场合框架下评估平行双线性样条增长曲线模型的中介过程

作者:Jin Liu,Robert A. Perera 机构:Department of Biostatistics, Virginia Commonwealth University 备注:Draft version 1.1, 07/17/2021. This paper has not been peer reviewed. Please do not copy or cite without author's permission 链接:https://arxiv.org/abs/2107.08338 摘要:多个现有的研究已经发展了具有非线性函数形式的多元增长模型,以探索两个纵向记录随时间关联的联合发展。然而,多重重复的结果并不一定是同步的。因此,研究两个重复变量在不同场合之间的关联是有意义的,例如,一个变量的短期变化如何影响另一个变量的长期变化。这种分析的一个统计工具是纵向中介模型。在这项研究中,我们扩展了具有线性轨迹的潜在增长中介模型(Cheong et al.,2003),并开发了两个模型来评估中介过程,其中双线性样条(即线性分段)增长模型用于捕捉变化模式。我们将中介过程定义为基线协变量或影响中介变量变化的协变量变化,进而影响结果的变化。我们通过仿真研究提出了所提出的模型。我们的模拟研究表明,所提出的中介模型可以提供无偏和准确的点估计与目标覆盖概率的95%置信区间。为了说明建模过程,我们分析了从K年级到5年级的多个学科的纵向经验记录,包括阅读、数学和科学考试成绩。实证分析表明,该模型可以估计协变量对结果变化的直接和间接影响。通过对现实数据的分析,我们也为实证研究者提供了一套可行的建议。我们还为所提出的模型提供了相应的代码。 摘要:Multiple existing studies have developed multivariate growth models with nonlinear functional forms to explore joint development where two longitudinal records are associated over time. However, multiple repeated outcomes are not necessarily synchronous. Accordingly, it is of interest to investigate an association between two repeated variables on different occasions, for example, how a short-term change of one variable affects a long-term change of the other(s). One statistical tool for such analyses is longitudinal mediation models. In this study, we extend latent growth mediation models with linear trajectories (Cheong et al., 2003) and develop two models to evaluate mediational processes where the bilinear spline (i.e., the linear-linear piecewise) growth model is utilized to capture the change patterns. We define the mediational process as either the baseline covariate or the change of covariate influencing the change of the mediator, which, in turn, affects the change of the outcome. We present the proposed models by simulation studies. Our simulation studies demonstrate that the proposed mediational models can provide unbiased and accurate point estimates with target coverage probabilities with a 95% confidence interval. To illustrate modeling procedures, we analyze empirical longitudinal records of multiple disciplinary subjects, including reading, mathematics, and science test scores, from Grade K to Grade 5. The empirical analyses demonstrate that the proposed model can estimate covariates' direct and indirect effects on the change of the outcome. Through the real-world data analyses, we also provide a set of feasible recommendations for empirical researchers. We also provide the corresponding code for the proposed models.

【19】 Calibrating the scan statistic with size-dependent critical values: heuristics, methodology and computation 标题:用依赖于大小的临界值校准扫描统计:启发式、方法论和计算

作者:Guenther Walther 机构:Department of Statistics, Serra Mall, Stanford University, Stanford, CA 链接:https://arxiv.org/abs/2107.08296 摘要:众所周知,可变窗口大小的扫描统计量有利于空间范围小的信号的检测,而空间范围大的信号则有相应的功率损失。最近的研究结果表明,这种损失并非不可避免:使用依赖于窗口大小的临界值可以同时对所有信号大小进行最佳检测,因此不知道正确的窗口大小和使用可变窗口大小进行扫描不会产生巨大的代价。本文综述了这种与尺寸有关的临界值的启发式算法和方法,它们在包括多元情况在内的各种情况下的应用,以及计算扫描统计的快速算法的最新结果。 摘要:It is known that the scan statistic with variable window size favors the detection of signals with small spatial extent and there is a corresponding loss of power for signals with large spatial extent. Recent results have shown that this loss is not inevitable: Using critical values that depend on the size of the window allows optimal detection for all signal sizes simultaneously, so there is no substantial price to pay for not knowing the correct window size and for scanning with a variable window size. This paper gives a review of the heuristics and methodology for such size-dependent critical values, their applications to various settings including the multivariate case, and recent results about fast algorithms for computing scan statistics.
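
下面是一维高斯序列上可变窗口扫描统计量的一个示意实现(并非综述中的具体算法;其中sqrt(2*log(e*n/h))形式的窗口相关罚项只是文献中一类常见取法,数据与窗口宽度均为假设):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 500
x = rng.normal(size=n)
x[200:220] += 1.2                                   # 在一小段窗口内植入信号

best = None
for h in (5, 10, 20, 50, 100):                      # 不同窗口宽度
    sums = np.convolve(x, np.ones(h), mode="valid")
    z = sums / np.sqrt(h)                           # 标准化窗口和
    penalty = np.sqrt(2 * np.log(np.e * n / h))     # 随窗口大小变化的罚项(示意)
    idx = int(np.argmax(z))
    score = z[idx] - penalty
    if best is None or score > best[0]:
        best = (score, h, idx)
print("最优窗口宽度:", best[1], " 起点:", best[2], " 罚后得分:", round(best[0], 2))
```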

【20】 A Reproducing Kernel Hilbert Space Approach to Functional Calibration of Computer Models 标题:计算机模型功能校准的再生核-希尔BERT空间方法

作者:Rui Tuo,Shiyuan He,Arash Pourhabib,Yu Ding,Jianhua Z. Huang 链接:https://arxiv.org/abs/2107.08288 摘要:本文提出了一个函数定标问题的频数解法,它允许计算机模型中定标参数的值随物理系统中控制变量的值而变化。功能校准的需求是由工程应用引起的,在工程应用中,使用恒定的校准参数会导致计算机模型和物理实验输出之间的严重不匹配。利用再生核Hilbert空间(RKHS)对最优定标函数进行建模,定义为定标参数与控制变量之间的函数关系。这个最佳校准函数是通过带有RKHS范数惩罚的惩罚最小二乘法和使用物理数据来估计的。不确定性量化程序也被开发用于此类估计。从预测一致性和估计最优定标函数的一致性两个方面为该方法提供了理论保证。与现有的参数函数校正方法和最新的贝叶斯方法相比,该方法在预测和不确定度量化方面表现出更强的鲁棒性。 摘要:This paper develops a frequentist solution to the functional calibration problem, where the value of a calibration parameter in a computer model is allowed to vary with the value of control variables in the physical system. The need of functional calibration is motivated by engineering applications where using a constant calibration parameter results in a significant mismatch between outputs from the computer model and the physical experiment. Reproducing kernel Hilbert spaces (RKHS) are used to model the optimal calibration function, defined as the functional relationship between the calibration parameter and control variables that gives the best prediction. This optimal calibration function is estimated through penalized least squares with an RKHS-norm penalty and using physical data. An uncertainty quantification procedure is also developed for such estimates. Theoretical guarantees of the proposed method are provided in terms of prediction consistency and consistency of estimating the optimal calibration function. The proposed method is tested using both real and synthetic data and exhibits more robust performance in prediction and uncertainty quantification than the existing parametric functional calibration method and a state-of-art Bayesian method.

【21】 Subset-of-Data Variational Inference for Deep Gaussian-Processes Regression 标题:深高斯过程回归的数据子集变分推断

作者:Ayush Jain,P. K. Srijith,Mohammad Emtiyaz Khan 机构:Department of Computer Science and Engineering , Indian Institute of Technology Hyderabad, India, RIKEN Center for AI Project , Tokyo, Japan 备注:Accepted in the 37th Conference on Uncertainty in Artificial Intelligence (UAI 2021) 链接:https://arxiv.org/abs/2107.08265 摘要:深高斯过程(Deep Gaussian process,DGPs)是高斯过程的多层、灵活的扩展,但其训练仍然具有挑战性。稀疏近似简化了训练,但通常需要对大量诱导输入及其跨层位置进行优化。在本文中,我们通过将位置设置为固定的数据子集并从变分分布中抽取诱导输入来简化训练。这减少了可训练的参数和计算成本,而没有显著的性能下降,正如我们在回归问题上的经验结果所证明的那样。我们的修改简化和稳定DGP训练,同时使之适用于设置诱导输入的采样方案。 摘要:Deep Gaussian Processes (DGPs) are multi-layer, flexible extensions of Gaussian processes but their training remains challenging. Sparse approximations simplify the training but often require optimization over a large number of inducing inputs and their locations across layers. In this paper, we simplify the training by setting the locations to a fixed subset of data and sampling the inducing inputs from a variational distribution. This reduces the trainable parameters and computation cost without significant performance degradations, as demonstrated by our empirical results on regression problems. Our modifications simplify and stabilize DGP training while making it amenable to sampling schemes for setting the inducing inputs.

【22】 Minimising quantifier variance under prior probability shift 标题:先验概率漂移下的量词方差最小化

作者:Dirk Tasche 备注:7 pages 链接:https://arxiv.org/abs/2107.08209 摘要:对于先验概率移位下的二元患病率量化问题,我们确定了极大似然估计量的渐近方差。我们发现,在测试数据集分布下,类标签对特征的回归是Brier分数的函数。这一观察结果表明,在训练数据集上优化基础分类器的精度有助于减少测试数据集上相关量词的方差。因此,我们也指出了基础分类器的训练准则,这意味着训练和测试数据集上的Brier分数都要优化。 摘要:For the binary prevalence quantification problem under prior probability shift, we determine the asymptotic variance of the maximum likelihood estimator. We find that it is a function of the Brier score for the regression of the class label against the features under the test data set distribution. This observation suggests that optimising the accuracy of a base classifier on the training data set helps to reduce the variance of the related quantifier on the test data set. Therefore, we also point out training criteria for the base classifier that imply optimisation of both of the Brier scores on the training and the test data sets.

【23】 Model Uncertainty and Correctability for Directed Graphical Models 标题:有向图模型的模型不确定性和可修正性

作者:Panagiota Birmpa,Jinchao Feng,Markos A. Katsoulakis,Luc Rey-Bellet 机构:REY-BELLET§ 链接:https://arxiv.org/abs/2107.08179 摘要:概率图形模型是概率建模、机器学习和人工智能的基本工具。它们允许我们以一种自然的方式集成专家知识、物理建模、异构和相关数据以及感兴趣的数量。正是由于这个原因,图形模型的模块化结构中固有着多种模型不确定性来源。在本文中,我们发展了信息论,稳健的不确定性量化方法和非参数有向图模型应力测试,以评估多源模型不确定性对感兴趣量的影响和通过图的传播。这些方法允许我们对不确定性的不同来源进行排序,并通过针对感兴趣的数量确定其最有影响的部分来修正图形模型。因此,从机器学习的角度来看,我们提供了一种数学上严格的可纠正性方法,该方法可以保证在控制模型其他部分的过程中产生的潜在新错误的同时,系统地选择改进图形模型的组件。我们在两个物理化学的例子中展示了我们的方法,即量子尺度的化学动力学和材料筛选,以提高燃料电池的效率。 摘要:Probabilistic graphical models are a fundamental tool in probabilistic modeling, machine learning and artificial intelligence. They allow us to integrate in a natural way expert knowledge, physical modeling, heterogeneous and correlated data and quantities of interest. For exactly this reason, multiple sources of model uncertainty are inherent within the modular structure of the graphical model. In this paper we develop information-theoretic, robust uncertainty quantification methods and non-parametric stress tests for directed graphical models to assess the effect and the propagation through the graph of multi-sourced model uncertainties to quantities of interest. These methods allow us to rank the different sources of uncertainty and correct the graphical model by targeting its most impactful components with respect to the quantities of interest. Thus, from a machine learning perspective, we provide a mathematically rigorous approach to correctability that guarantees a systematic selection for improvement of components of a graphical model while controlling potential new errors created in the process in other parts of the model. We demonstrate our methods in two physico-chemical examples, namely quantum scale-informed chemical kinetics and materials screening to improve the efficiency of fuel cells.

【24】 Mediated Uncoupled Learning: Learning Functions without Direct Input-output Correspondences 标题:中介解耦学习:没有直接输入-输出对应的学习函数

作者:Ikko Yamane,Junya Honda,Florian Yger,Masashi Sugiyama 机构:Université Paris-Dauphine, PSL Research University; Kyoto University; The University of Tokyo 备注:ICML 2021 version with correction to Figure 1 链接:https://arxiv.org/abs/2107.08135 摘要:当我们拥有输入$X$和输出$Y$的成对训练数据时,普通的监督学习是有用的。然而,这种成对数据在实践中往往难以收集。在本文中,我们考虑在没有$(X,Y)$配对数据的情形下从$X$预测$Y$的任务:我们拥有两个独立的数据集,其中$X$和$Y$各自与某个中介变量$U$一同被观测到,即$S_X=\{(X_i,U_i)\}$和$S_Y=\{(U'_j,Y'_j)\}$。一种朴素的做法是先用$S_X$从$X$预测$U$,再用$S_Y$从$U$预测$Y$,但我们证明这在统计上并不一致。此外,实践中预测$U$可能比预测$Y$更困难,例如当$U$维度更高时。为克服这一困难,我们提出一种避免预测$U$的新方法:先用$S_Y$训练$h(U)$使其近似$Y$,再用$S_X$训练$f(X)$去预测$h(U)$,从而直接学习$Y=f(X)$。我们证明了该方法的统计一致性和误差界,并通过实验验证了其实用性。 摘要:Ordinary supervised learning is useful when we have paired training data of input $X$ and output $Y$. However, such paired data can be difficult to collect in practice. In this paper, we consider the task of predicting $Y$ from $X$ when we have no paired data of them, but we have two separate, independent datasets of $X$ and $Y$ each observed with some mediating variable $U$, that is, we have two datasets $S_X = \{(X_i, U_i)\}$ and $S_Y = \{(U'_j, Y'_j)\}$. A naive approach is to predict $U$ from $X$ using $S_X$ and then $Y$ from $U$ using $S_Y$, but we show that this is not statistically consistent. Moreover, predicting $U$ can be more difficult than predicting $Y$ in practice, e.g., when $U$ has higher dimensionality. To circumvent the difficulty, we propose a new method that avoids predicting $U$ but directly learns $Y = f(X)$ by training $f(X)$ with $S_{X}$ to predict $h(U)$ which is trained with $S_{Y}$ to approximate $Y$. We prove statistical consistency and error bounds of our method and experimentally confirm its practical usefulness.
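
下面用合成数据示意摘要所述的两阶段做法(并非作者实现;数据生成方式与GradientBoostingRegressor模型均为演示用的假设):先用$S_Y$训练$h(U)\approx Y$,再用$S_X$训练$f(X)$去拟合$h(U)$,并与朴素两步法对比:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(7)

def make_data(n):
    X = rng.uniform(-2, 2, size=(n, 1))
    U = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)     # 中介变量
    Y = U ** 2 + 0.1 * rng.normal(size=n)               # 结果
    return X, U, Y

X_a, U_a, _ = make_data(2000)        # S_X = {(X_i, U_i)}
_, U_b, Y_b = make_data(2000)        # S_Y = {(U'_j, Y'_j)}
X_te, _, Y_te = make_data(1000)      # 测试集仅用于评估

# 第一阶段:在 S_Y 上训练 h(U) ≈ Y
h = GradientBoostingRegressor().fit(U_b.reshape(-1, 1), Y_b)
# 第二阶段:在 S_X 上训练 f(X) 去预测 h(U)
f = GradientBoostingRegressor().fit(X_a, h.predict(U_a.reshape(-1, 1)))

# 作为对照的朴素两步法:先预测 U,再预测 Y
g = GradientBoostingRegressor().fit(X_a, U_a)
naive_pred = h.predict(g.predict(X_te).reshape(-1, 1))

print("两阶段法 MSE:", np.mean((f.predict(X_te) - Y_te) ** 2).round(4))
print("朴素两步法 MSE:", np.mean((naive_pred - Y_te) ** 2).round(4))
```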

【25】 Non-Parametric Manifold Learning 标题:非参数流形学习

作者:Dena Asta 机构:Department of Statistics, Ohio State University, Columbus, OH, USA 链接:https://arxiv.org/abs/2107.08089 摘要:基于Laplace-Beltrami算子的图Laplacian估计,我们引入了流形距离的估计。在欧氏空间的一个未知紧致黎曼子流形上,我们证明了基于从一个远离零的光滑密度中抽取的点的等分布样本,对于适当的图拉普拉斯算子的选择,估计是一致的。估计量类似于,事实上它的收敛性来自于,一个特殊的情况下的Kontorovic对偶重新制定的Wasserstein距离称为康内斯的距离公式。 摘要:We introduce an estimator for manifold distances based on graph Laplacian estimates of the Laplace-Beltrami operator. We show that the estimator is consistent for suitable choices of graph Laplacians in the literature, based on an equidistributed sample of points drawn from a smooth density bounded away from zero on an unknown compact Riemannian submanifold of Euclidean space. The estimator resembles, and in fact its convergence properties are derived from, a special case of the Kontorovic dual reformulation of Wasserstein distance known as Connes' Distance Formula.

【26】 Kpop: A kernel balancing approach for reducing specification assumptions in survey weighting 标题:KPop:一种减少调查权重中规范假设的核平衡方法

作者:Erin Hartman,Chad Hazlett,Ciara Sterbenz 链接:https://arxiv.org/abs/2107.08075 摘要:随着应答率的急剧下降,研究人员和民调人员只能得到高度不具代表性的样本,依靠构建的权重使这些样本能够代表期望的目标人群。尽管实践者使用有价值的专家知识来选择变量,$X$必须进行调整,但他们很少为这些变量与响应过程或结果相关的特定函数形式辩护。不幸的是,通常使用的校准权重——使样本中的加权平均值$X$等于总体的加权平均值——只有在$X$的线性函数未解释的结果部分和响应过程是独立的时,才能确保正确的调整。为了减轻这种函数形式的依赖,我们描述了核平衡的人口加权(kpop)。这种方法将设计矩阵$\mathbf{X}$替换为内核矩阵$\mathbf{K}$,对$\mathbf{X}$的高阶信息进行编码。然后找到权重,使抽样单元中$\mathbf{K}$的加权平均行近似等于目标总体的加权平均行。这在$X$的各种平滑函数上产生了良好的校准,而不依赖于用户显式地指定这些函数。我们描述了该方法,并以2016年美国总统大选的投票数据为例进行了说明。 摘要:With the precipitous decline in response rates, researchers and pollsters have been left with highly non-representative samples, relying on constructed weights to make these samples representative of the desired target population. Though practitioners employ valuable expert knowledge to choose what variables, $X$ must be adjusted for, they rarely defend particular functional forms relating these variables to the response process or the outcome. Unfortunately, commonly-used calibration weights -- which make the weighted mean $X$ in the sample equal that of the population -- only ensure correct adjustment when the portion of the outcome and the response process left unexplained by linear functions of $X$ are independent. To alleviate this functional form dependency, we describe kernel balancing for population weighting (kpop). This approach replaces the design matrix $\mathbf{X}$ with a kernel matrix, $\mathbf{K}$ encoding high-order information about $\mathbf{X}$. Weights are then found to make the weighted average row of $\mathbf{K}$ among sampled units approximately equal that of the target population. This produces good calibration on a wide range of smooth functions of $X$, without relying on the user to explicitly specify those functions. We describe the method and illustrate it by application to polling data from the 2016 U.S. presidential election.

【27】 Accounting for spatial confounding in epidemiological studies with individual-level exposures: An exposure-penalized spline approach 标题:个人水平暴露的流行病学研究中空间混杂的解释:暴露惩罚样条法

作者:Jennifer F. Bobb,Maricela F. Cruz,Stephen J. Mooney,Adam Drewnowski,David Arterburn,Andrea J. Cook 机构:Author Affiliations:, . Kaiser Permanente Washington Health Research Institute, Seattle, WA, . University of Washington Department of Biostatistics, Seattle, WA, . University of Washington Department of Epidemiology, Seattle, WA, Summary: 备注:30 pages, 5 figures, supplemental material 链接:https://arxiv.org/abs/2107.08072 摘要:在存在不可测量的空间混杂的情况下,空间模型实际上可能会增加(而不是减少)偏差,从而导致如何在实践中应用这些模型的不确定性。通过仿真和在大数据电子病历研究中的应用,我们评估了空间建模方法。然而,纯空间暴露(如建筑环境)的偏倚风险很高,我们发现空间聚集的个体水平暴露(如吸烟状态)增加偏倚的可能性非常有限。我们还提出了一种新的曝光惩罚样条方法,选择空间平滑度来解释曝光中的空间变化。这种方法似乎有希望有效地减少空间混淆偏见。 摘要:In the presence of unmeasured spatial confounding, spatial models may actually increase (rather than decrease) bias, leading to uncertainty as to how they should be applied in practice. We evaluated spatial modeling approaches through simulation and application to a big data electronic health record study. Whereas the risk of bias was high for purely spatial exposures (e.g., built environment), we found very limited potential for increased bias for individual-level exposures that cluster spatially (e.g., smoking status). We also proposed a novel exposure-penalized spline approach that selects the degree of spatial smoothing to explain spatial variability in the exposure. This approach appeared promising for efficiently reducing spatial confounding bias.

【28】 Moving towards practical user-friendly synthesis: Scalable synthetic data methods for large confidential administrative databases using saturated count models 标题:走向实用的用户友好合成:使用饱和计数模型的大型机密管理数据库的可伸缩合成数据方法

作者:James Jackson,Robin Mitra,Brian Francis,Iain Dove 机构:†Lancaster University, Lancaster, UK, ⋆Cardiff University, Cardiff, UK, ‡Office for National Statistics, Titchfield, UK 备注:37 pages, 6 figures 链接:https://arxiv.org/abs/2107.08062 摘要:近三十年来,统计信息披露控制的综合数据方法不断发展;这些方法适应了不同的数据类型,但主要是在调查数据集的范围内。行政数据库的某些特征——有时仅仅是它们所包含的记录的数量——从综合的角度提出了挑战,因此需要特别注意。本文通过对饱和模型的拟合,提出了一种方法,使管理数据库不仅可以快速合成,而且可以以一种在其他技术中固有的不可行的方式将风险和效用形式化。本文探讨了如何利用负二项式和泊松逆高斯两种参数统计模型所提供的灵活性来保护被调查者尤其是uniques在合成数据中的隐私。最后通过综合一个数据库进行了实证分析,为英语学校普查提供了一个很好的代表。 摘要:Over the past three decades, synthetic data methods for statistical disclosure control have continually developed; methods have adapted to account for different data types, but mainly within the domain of survey data sets. Certain characteristics of administrative databases - sometimes just the sheer volume of records of which they are comprised - present challenges from a synthesis perspective and thus require special attention. This paper, through the fitting of saturated models, presents a way in which administrative databases can not only be synthesized quickly, but also allows risk and utility to be formalised in a manner inherently unfeasible in other techniques. The paper explores how the flexibility afforded by two-parameter count models (the negative binomial and Poisson-inverse Gaussian) can be utilised to protect respondents' - especially uniques' - privacy in synthetic data. Finally an empirical example is carried out through the synthesis of a database which can be viewed as a good representative to the English School Census.

【29】 Just Train Twice: Improving Group Robustness without Training Group Information 标题:只需训练两次:无需训练组信息即可提高组健壮性

作者:Evan Zheran Liu,Behzad Haghgoo,Annie S. Chen,Aditi Raghunathan,Pang Wei Koh,Shiori Sagawa,Percy Liang,Chelsea Finn 机构:Stanford University 备注:International Conference on Machine Learning (ICML), 2021 链接:https://arxiv.org/abs/2107.09044 摘要:通过经验风险最小化(ERM)的标准训练可以产生平均精度高但在某些群体中精度低的模型,特别是在输入和标签之间存在虚假相关性的情况下。以往实现高最差群体精度的方法,如群体分布鲁棒优化(group distributionally robust optimization, group DRO)需要对每个训练点进行昂贵的群体注释,而不使用此类群体注释的方法通常只能达到不令人满意的最差群体精度。在本文中,我们提出了一个简单的两阶段方法JTT:首先训练一个标准的ERM模型,然后训练第二个模型,对第一个模型错误分类的训练样本加大权重。直观地说,这相当于上调了标准ERM模型表现较差的群体中样本的权重,从而提高最差群体的性能。在四个具有虚假相关性的图像分类和自然语言处理任务上平均,JTT缩小了标准ERM和组DRO之间最差群体精度差距的75%,同时只需要在一个小的验证集上进行群体注释以调整超参数。 摘要:Standard training via empirical risk minimization (ERM) can produce models that achieve high accuracy on average but low accuracy on certain groups, especially in the presence of spurious correlations between the input and label. Prior approaches that achieve high worst-group accuracy, like group distributionally robust optimization (group DRO) require expensive group annotations for each training point, whereas approaches that do not use such group annotations typically achieve unsatisfactory worst-group accuracy. In this paper, we propose a simple two-stage approach, JTT, that first trains a standard ERM model for several epochs, and then trains a second model that upweights the training examples that the first model misclassified. Intuitively, this upweights examples from groups on which standard ERM models perform poorly, leading to improved worst-group performance. Averaged over four image classification and natural language processing tasks with spurious correlations, JTT closes 75% of the gap in worst-group accuracy between standard ERM and group DRO, while only requiring group annotations on a small validation set in order to tune hyperparameters.
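
下面是JTT两阶段思路的极简示意(并非论文代码;用逻辑回归代替深度网络,虚假相关的数据生成与上权系数lambda_up=5均为假设):先做标准ERM训练,再对第一阶段分错的训练样本加大权重重新训练:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(8)
n = 4000
spurious = rng.binomial(1, 0.5, n)                                    # 虚假特征
label = np.where(rng.uniform(size=n) < 0.9, spurious, 1 - spurious)  # 与标签强相关但非因果
core = label + 0.8 * rng.normal(size=n)                               # 真正有用但噪声较大的特征
X = np.column_stack([core, spurious])

# 第一阶段:标准ERM
erm = LogisticRegression().fit(X, label)
wrong = erm.predict(X) != label            # 错误集合(多来自虚假特征与标签不一致的少数群体)

# 第二阶段:对错误样本上权后重新训练
lambda_up = 5.0
weights = np.where(wrong, lambda_up, 1.0)
jtt = LogisticRegression().fit(X, label, sample_weight=weights)

print("ERM 系数(core, spurious):", np.round(erm.coef_[0], 2))
print("JTT 系数(core, spurious):", np.round(jtt.coef_[0], 2))
```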

【30】 Structured Stochastic Gradient MCMC 标题:结构化随机梯度MCMC

作者:Antonios Alexos,Alex Boyd,Stephan Mandt 机构:Department of Computer Science, University of California, Irvine, Department of Statistics 链接:https://arxiv.org/abs/2107.09028 摘要:随机梯度马尔可夫链蒙特卡罗(SGMCMC)被认为是贝叶斯神经网络等大规模模型中贝叶斯推理的金标准。由于实践者在这些模型中面临着速度与精度的权衡,变分推理(VI)通常是更好的选择。不幸的是,VI对后验概率的因式分解和函数形式做出了强有力的假设。在这项工作中,我们提出了一个新的非参数变分近似,它不需要假设近似后验函数的形式,并允许实践者指定算法应该尊重或打破的确切依赖关系。该方法依赖于一种新的Langevin型算法,该算法基于一个修正的能量函数,其中部分潜在变量在马尔可夫链早期迭代的样本上平均。通过这种方式,可以以受控的方式打破统计依赖关系,从而使链更快地混合。此方案可以以“退出”方式进一步修改,从而实现更大的可伸缩性。通过在ResNet-20体系结构上实现该方案,我们得到了比完整的SGMCMC更好的预测概率和更大的有效样本量。 摘要:Stochastic gradient Markov chain Monte Carlo (SGMCMC) is considered the gold standard for Bayesian inference in large-scale models, such as Bayesian neural networks. Since practitioners face speed versus accuracy tradeoffs in these models, variational inference (VI) is often the preferable option. Unfortunately, VI makes strong assumptions on both the factorization and functional form of the posterior. In this work, we propose a new non-parametric variational approximation that makes no assumptions about the approximate posterior's functional form and allows practitioners to specify the exact dependencies the algorithm should respect or break. The approach relies on a new Langevin-type algorithm that operates on a modified energy function, where parts of the latent variables are averaged over samples from earlier iterations of the Markov chain. This way, statistical dependencies can be broken in a controlled way, allowing the chain to mix faster. This scheme can be further modified in a ''dropout'' manner, leading to even more scalability. By implementing the scheme on a ResNet-20 architecture, we obtain better predictive likelihoods and larger effective sample sizes than full SGMCMC.

【31】 Causal Inference Struggles with Agency on Online Platforms 标题:在线平台上的因果推论与代理的斗争

作者:Smitha Milli,Luca Belli,Moritz Hardt 机构:UC Berkeley, Twitter 链接:https://arxiv.org/abs/2107.08995 摘要:在线平台定期进行随机实验,以了解平台的变化如何影响各种感兴趣的结果。然而,在网络平台上进行的实验,除其他问题外,还因缺乏有意义的监督和用户同意而受到批评。由于平台给予用户更大的代理权,因此有可能进行观察性研究,其中用户可以自主选择接受感兴趣的治疗,以替代平台控制用户是否接受治疗的实验。在这篇论文中,我们在Twitter上进行了四次大规模的研究内比较,旨在评估来自在线平台上用户自我选择的观察研究的有效性。在研究内比较中,观察研究的治疗效果是基于他们如何有效地复制同一目标人群的随机实验结果来评估的。我们在控制可能的混杂变量的同时,检验组均值估计量、精确匹配、回归调整和治疗加权的逆概率的朴素差异。在所有情况下,所有的观测估计在从类似的随机实验中恢复地面真值估计方面表现不佳。在除一例外的所有情况下,观察估计与随机估计具有相反的符号。我们的结果表明,来自用户自我选择的观察性研究是在线平台上随机实验的一个很差的选择。在讨论我们的结果时,我们假设“第22条军规”,这表明因果推理在这些环境中的成功可能与最初为用户提供更大代理的动机不一致。 摘要:Online platforms regularly conduct randomized experiments to understand how changes to the platform causally affect various outcomes of interest. However, experimentation on online platforms has been criticized for having, among other issues, a lack of meaningful oversight and user consent. As platforms give users greater agency, it becomes possible to conduct observational studies in which users self-select into the treatment of interest as an alternative to experiments in which the platform controls whether the user receives treatment or not. In this paper, we conduct four large-scale within-study comparisons on Twitter aimed at assessing the effectiveness of observational studies derived from user self-selection on online platforms. In a within-study comparison, treatment effects from an observational study are assessed based on how effectively they replicate results from a randomized experiment with the same target population. We test the naive difference in group means estimator, exact matching, regression adjustment, and inverse probability of treatment weighting while controlling for plausible confounding variables. In all cases, all observational estimates perform poorly at recovering the ground-truth estimate from the analogous randomized experiments. In all cases except one, the observational estimates have the opposite sign of the randomized estimate. Our results suggest that observational studies derived from user self-selection are a poor alternative to randomized experimentation on online platforms. In discussing our results, we postulate "Catch-22"s that suggest that the success of causal inference in these settings may be at odds with the original motivations for providing users with greater agency.

【32】 Over-Parameterization and Generalization in Audio Classification 标题:音频分类中的过参数化和泛化

作者:Khaled Koutini,Hamid Eghbal-zadeh,Florian Henkel,Jan Schlüter,Gerhard Widmer 机构: 1Institute of Computational Perception 备注:Presented at the ICML 2021 Workshop on Overparameterization: Pitfalls & Opportunities 链接:https://arxiv.org/abs/2107.08933 摘要:卷积神经网络(CNNs)在机器视觉、机器听觉、自然语言处理等领域中一直占据主导地位。在机器听觉中,cnn通常表现出很好的泛化能力,但它对所使用的特定录音设备非常敏感,这已被公认为声学场景分类(DCASE)领域的一个重要问题。在这项研究中,我们探讨了过度参数化的声学场景分类模型之间的关系,以及由此产生的泛化能力。具体来说,我们测试了不同条件下CNNs在宽度和深度上的伸缩性。我们的结果表明,增加宽度可以提高对不可见设备的泛化,即使不增加参数的数量。 摘要:Convolutional Neural Networks (CNNs) have been dominating classification tasks in various domains, such as machine vision, machine listening, and natural language processing. In machine listening, while generally exhibiting very good generalization capabilities, CNNs are sensitive to the specific audio recording device used, which has been recognized as a substantial problem in the acoustic scene classification (DCASE) community. In this study, we investigate the relationship between over-parameterization of acoustic scene classification models, and their resulting generalization abilities. Specifically, we test scaling CNNs in width and depth, under different conditions. Our results indicate that increasing width improves generalization to unseen devices, even without an increase in the number of parameters.

【33】 Introducing a Family of Synthetic Datasets for Research on Bias in Machine Learning 标题:引入一族合成数据集研究机器学习中的偏差

作者:William Blanzeisky,Pádraig Cunningham,Kenneth Kennedy 机构:School of Computer Science, University College Dublin, Credit Scoring Consultant 链接:https://arxiv.org/abs/2107.08928 摘要:机器学习中的偏倚研究进展的一个重要障碍是相关数据集的可用性。考虑到这些数据的敏感性,这种情况不太可能改变太多。因此,综合数据在本研究中具有一定的作用。在这篇短文中,我们提出了这样一个家庭的合成数据集。我们提供了一个数据概述,描述了偏差水平是如何变化的,并给出了一个简单的数据实验例子。 摘要:A significant impediment to progress in research on bias in machine learning (ML) is the availability of relevant datasets. This situation is unlikely to change much given the sensitivity of such data. For this reason, there is a role for synthetic data in this research. In this short paper, we present one such family of synthetic data sets. We provide an overview of the data, describe how the level of bias can be varied, and present a simple example of an experiment on the data.

【34】 Epistemic Neural Networks 标题:认知神经网络

作者:Ian Osband,Zheng Wen,Mohammad Asghari,Morteza Ibrahimi,Xiyuan Lu,Benjamin Van Roy 机构:DeepMind 链接:https://arxiv.org/abs/2107.08924 摘要:我们引入认知神经网络(epistemic neural network, ENN)作为深度学习中不确定性建模的接口。所有现有的不确定性建模方法都可以表示为ENN,而任何ENN都可以等同于一个贝叶斯神经网络。然而,这一新视角为未来研究提供了几个有希望的方向。以往的工作为神经网络开发概率推理工具;我们反过来问:“哪些神经网络适合作为概率推理的工具?”我们为ENN的进展提出了一个清晰而简单的度量:相对于目标分布的KL散度。我们开发了一个基于神经网络高斯过程推理的计算测试平台,并在 https://github.com/deepmind/enn 发布了代码作为基准。我们评估了深度学习中几种典型的不确定性建模方法,发现它们的性能差异很大。我们分析了这些结果的敏感性,并表明该度量与序贯决策问题中的性能高度相关。最后,我们指出新的ENN架构有望在统计质量和计算成本两方面带来改进。 摘要:We introduce the \textit{epistemic neural network} (ENN) as an interface for uncertainty modeling in deep learning. All existing approaches to uncertainty modeling can be expressed as ENNs, and any ENN can be identified with a Bayesian neural network. However, this new perspective provides several promising directions for future research. Where prior work has developed probabilistic inference tools for neural networks; we ask instead, `which neural networks are suitable as tools for probabilistic inference?'. We propose a clear and simple metric for progress in ENNs: the KL-divergence with respect to a target distribution. We develop a computational testbed based on inference in a neural network Gaussian process and release our code as a benchmark at \url{https://github.com/deepmind/enn}. We evaluate several canonical approaches to uncertainty modeling in deep learning, and find they vary greatly in their performance. We provide insight to the sensitivity of these results and show that our metric is highly correlated with performance in sequential decision problems. Finally, we provide indications that new ENN architectures can improve performance in both the statistical quality and computational cost.

【35】 Test-optional Policies: Overcoming Strategic Behavior and Informational Gaps 标题:测试-可选策略:克服战略行为和信息差距

作者:Zhi Liu,Nikhil Garg 机构:Operations Research and Information Engineering, Cornell University, Cornell Tech and Technion 链接:https://arxiv.org/abs/2107.08922 摘要:由于Covid-19疫情,超过500所美国高校在招生中转为"考试可选",并承诺不会因申请人不提交考试成绩而惩罚他们;这是重新思考考试在大学招生中作用这一长期趋势的一部分。然而,目前尚不清楚一所大学如何(以及能否)既利用提交分数者的考试成绩,又不惩罚不提交者,也不清楚这一承诺究竟意味着什么。我们将这些问题形式化,并研究大学如何克服可选考试带来的两个挑战:策略性申请者(strategic applicants,即考试分数低的人可以假装没有参加考试)和信息差距(informational gaps,即学校对提交分数者掌握的信息多于未提交者)。我们发现,当且仅当大学能够利用关于谁有条件参加考试的信息并愿意随机录取时,它们确实可以做到这一点。 摘要:Due to the Covid-19 pandemic, more than 500 US-based colleges and universities went "test-optional" for admissions and promised that they would not penalize applicants for not submitting test scores, part of a longer trend to rethink the role of testing in college admissions. However, it remains unclear how (and whether) a college can simultaneously use test scores for those who submit them, while not penalizing those who do not--and what that promise even means. We formalize these questions, and study how a college can overcome two challenges with optional testing: $\textit{strategic applicants}$ (when those with low test scores can pretend to not have taken the test), and $\textit{informational gaps}$ (it has more information on those who submit a test score than those who do not). We find that colleges can indeed do so, if and only if they are able to use information on who has test access and are willing to randomize admissions.

【36】 Integrated shape-sensitive functional metrics 标题:集成的形状敏感功能度量

作者:Sami Helander,Petra Laketa,Pauliina Ilmonen,Stanislav Nagy,Germain Van Bever,Lauri Viitasaari 机构:Aalto University School of Science, Finland, Charles University, Czech Republic, University of Namur, Belgium, Aalto University School of Business, Finland 链接:https://arxiv.org/abs/2107.08917 摘要:本文提出了一种新的积分球(伪)度量,它在一般函数空间中给出了介于所选起始(伪)度量d与L_p距离之间的中间量。将d选为Hausdorff距离或Fréchet距离,我们引入了这些基于上确界的度量的积分化、形状敏感版本。新的度量允许在函数型设定中进行更精细的分析,这是直接应用非积分版本所无法做到的。此外,收敛的离散近似使计算在实践中可行。 摘要:This paper develops a new integrated ball (pseudo)metric which provides an intermediary between a chosen starting (pseudo)metric d and the L_p distance in general function spaces. Selecting d as the Hausdorff or Fr\'echet distances, we introduce integrated shape-sensitive versions of these supremum-based metrics. The new metrics allow for finer analyses in functional settings, not attainable applying the non-integrated versions directly. Moreover, convergent discrete approximations make computations feasible in practice.
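作为一个粗略的示意(这并非论文的精确构造,子区间的划分方式是此处自行假设的),下面的 Python 片段在离散网格上计算两条曲线图像之间的 Hausdorff 距离,并在一系列嵌套子区间上取平均,以体现"把基于上确界的距离积分化"的思路。

```python
import numpy as np
from scipy.spatial.distance import cdist

def hausdorff(A, B):
    """Hausdorff distance between two finite point sets A, B (n x d arrays)."""
    D = cdist(A, B)
    return max(D.min(axis=1).max(), D.min(axis=0).max())

def integrated_hausdorff(t, f, g, n_windows=20):
    """Illustrative 'integrated' version: average the Hausdorff distance of the
    graphs of f and g restricted to nested sub-intervals [t[0], t[k]].
    This windowing scheme is a simplifying assumption, not the paper's definition."""
    vals = []
    for k in np.linspace(2, len(t), n_windows, dtype=int):
        A = np.column_stack([t[:k], f[:k]])
        B = np.column_stack([t[:k], g[:k]])
        vals.append(hausdorff(A, B))
    return float(np.mean(vals))

# toy usage: two shifted sine curves on a discrete grid
t = np.linspace(0, 1, 200)
f = np.sin(2 * np.pi * t)
g = np.sin(2 * np.pi * t + 0.3)
print(integrated_hausdorff(t, f, g))
```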

【37】 Reasoning-Modulated Representations 标题:推理调制的表示法

作者:Petar Veličković,Matko Bošnjak,Thomas Kipf,Alexander Lerchner,Raia Hadsell,Razvan Pascanu,Charles Blundell 备注:ICML 2021 Workshop on Self-Supervised Learning for Reasoning and Perception (Spotlight Talk). 7 pages, 3 figures 链接:https://arxiv.org/abs/2107.08881 摘要:神经网络利用强大的内部表示,以推广。学习它们是很困难的,而且通常需要一个覆盖数据分布的大型训练集。我们研究一个共同的环境,在那里我们的任务不是完全不透明的。事实上,我们经常可以获得有关底层系统的信息(例如,观测必须遵守某些物理定律),任何“表-拉”神经网络都需要从头开始重新学习,从而降低数据效率。我们将这些资讯整合到预先训练的推理模组中,并探讨其在不同的像素自我监督学习环境中形成所发现的表征的作用。我们的方法为一类新的数据高效表示学习铺平了道路。 摘要:Neural networks leverage robust internal representations in order to generalise. Learning them is difficult, and often requires a large training set that covers the data distribution densely. We study a common setting where our task is not purely opaque. Indeed, very often we may have access to information about the underlying system (e.g. that observations must obey certain laws of physics) that any "tabula rasa" neural network would need to re-learn from scratch, penalising data efficiency. We incorporate this information into a pre-trained reasoning module, and investigate its role in shaping the discovered representations in diverse self-supervised learning settings from pixels. Our approach paves the way for a new class of data-efficient representation learning.

【38】 Boost-R: Gradient Boosted Trees for Recurrence Data 标题:Boost-R:递归数据的梯度增强树

作者:Xiao Liu,Rong Pan 机构:Department of Industrial Engineering, University of Arkansas, School of Computing, Informatics, Decision Systems Engineering 链接:https://arxiv.org/abs/2107.08784 摘要:重现(复发)事件数据来源于可靠性、网络安全、医疗保健、在线零售等多个学科领域。本文研究一种基于加性树的方法Boost-R(Boosting for Recurrence Data),用于处理同时具有静态和动态特征的复发事件数据。Boost-R构造一个梯度提升加性树集成来估计复发事件过程的累积强度函数:通过最小化观测累积强度与预测累积强度之间的正则化L2距离,向集成中加入新的树。与传统回归树不同,Boost-R在每棵树的叶子上构造一个随时间变化的函数;来自多棵树的这些函数之和即为累积强度的集成估计。当异质总体中存在隐藏的子总体时,基于树的方法的分而治之特性颇具吸引力;回归树的非参数特性则有助于避免对事件过程与特征之间复杂交互作用的参数化假设。我们通过全面的数值算例考察了Boost-R的关键性质和优点。Boost-R的数据集和计算机代码已在GitHub上提供。据我们所知,Boost-R是第一个基于梯度提升加性树、用于建模同时带有静态和动态特征信息的大规模复发事件数据的方法。 摘要:Recurrence data arise from multi-disciplinary domains spanning reliability, cyber security, healthcare, online retailing, etc. This paper investigates an additive-tree-based approach, known as Boost-R (Boosting for Recurrence Data), for recurrent event data with both static and dynamic features. Boost-R constructs an ensemble of gradient boosted additive trees to estimate the cumulative intensity function of the recurrent event process, where a new tree is added to the ensemble by minimizing the regularized L2 distance between the observed and predicted cumulative intensity. Unlike conventional regression trees, a time-dependent function is constructed by Boost-R on each tree leaf. The sum of these functions, from multiple trees, yields the ensemble estimator of the cumulative intensity. The divide-and-conquer nature of tree-based methods is appealing when hidden sub-populations exist within a heterogeneous population. The non-parametric nature of regression trees helps to avoid parametric assumptions on the complex interactions between event processes and features. Critical insights and advantages of Boost-R are investigated through comprehensive numerical examples. Datasets and computer code of Boost-R are made available on GitHub. To our best knowledge, Boost-R is the first gradient boosted additive-tree-based approach for modeling large-scale recurrent event data with both static and dynamic feature information.
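下面是一个示意性的 Python 片段(自拟的简化假设,函数名均为虚构),用类似 L2 提升的循环拟合观测累积事件数与当前预测之间的残差,以说明"用梯度提升树集成估计累积强度"的思路;真实的 Boost-R 会在叶子上构造随时间变化的函数,这里未予实现。

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost_cumulative_intensity(X, t, N_obs, n_trees=50, lr=0.1, max_depth=3):
    """Illustrative gradient-boosting loop for a cumulative intensity surface.
    X: (n, p) static features, t: (n,) observation times,
    N_obs: (n,) observed cumulative event counts up to time t.
    Each tree is fit to the residual of the current prediction (an L2-style
    boosting step); the real Boost-R instead grows time-dependent leaf functions."""
    Z = np.column_stack([X, t])          # include time as a splitting feature
    pred = np.zeros_like(N_obs, dtype=float)
    trees = []
    for _ in range(n_trees):
        residual = N_obs - pred
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(Z, residual)
        pred += lr * tree.predict(Z)
        trees.append(tree)
    return trees, pred

# toy usage with synthetic recurrence data (made-up generating process)
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
t = rng.uniform(0, 5, size=500)
N_obs = np.random.default_rng(2).poisson(lam=np.exp(0.3 * X[:, 0]) * t)
trees, fitted = boost_cumulative_intensity(X, t, N_obs)
print(fitted[:5])
```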

【39】 Experimental Investigation and Evaluation of Model-based Hyperparameter Optimization 标题:基于模型的超参数优化实验研究与评价

作者:Eva Bartz,Martin Zaefferer,Olaf Mersmann,Thomas Bartz-Beielstein 机构:Bartz & Bartz GmbH, Goebenstr. , Gummersbach, IDE+A, TH K¨oln, Steinm¨ullerallee 链接:https://arxiv.org/abs/2107.08761 摘要:机器学习算法,如随机森林或xgboost,正变得越来越重要,并越来越多地纳入生产过程,以实现全面的数字化,如果可能的话,自动化的过程。这些算法的超参数需要适当的设置,这可以称为超参数整定或优化。基于可调性的概念,本文综述了目前流行的机器学习算法的理论和实践结果。本文对六种相关机器学习算法的30个超参数进行了实验分析。特别是,它提供了(i)一个重要的超参数调查,(ii)两个参数调整研究,和(iii)一个广泛的全局参数调整研究,以及(iv)一个新的方法,基于共识排序,分析多个算法的结果。将R包mlr作为机器学习模型的统一接口。R包SPOT用于执行实际的调优(优化)。所有附加代码都与本文一起提供。 摘要:Machine learning algorithms such as random forests or xgboost are gaining more importance and are increasingly incorporated into production processes in order to enable comprehensive digitization and, if possible, automation of processes. Hyperparameters of these algorithms used have to be set appropriately, which can be referred to as hyperparameter tuning or optimization. Based on the concept of tunability, this article presents an overview of theoretical and practical results for popular machine learning algorithms. This overview is accompanied by an experimental analysis of 30 hyperparameters from six relevant machine learning algorithms. In particular, it provides (i) a survey of important hyperparameters, (ii) two parameter tuning studies, and (iii) one extensive global parameter tuning study, as well as (iv) a new way, based on consensus ranking, to analyze results from multiple algorithms. The R package mlr is used as a uniform interface to the machine learning models. The R package SPOT is used to perform the actual tuning (optimization). All additional code is provided together with this paper.
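下面用一个极简的 Python 片段示意摘要中"基于共识排序汇总多个算法结果"的思路(实现细节为自拟假设,与论文使用的 R 包 mlr/SPOT 无关):对每个算法按超参数重要性排名,再对名次取平均得到共识排序。

```python
import numpy as np

def consensus_ranking(importance_by_algorithm):
    """importance_by_algorithm: dict algorithm -> dict hyperparameter -> importance score.
    Returns hyperparameters sorted by their average rank across algorithms
    (a simple Borda-style consensus; the paper's exact scheme may differ)."""
    params = sorted({p for scores in importance_by_algorithm.values() for p in scores})
    ranks = {p: [] for p in params}
    for scores in importance_by_algorithm.values():
        order = sorted(params, key=lambda p: -scores.get(p, 0.0))
        for r, p in enumerate(order, start=1):
            ranks[p].append(r)
    return sorted(params, key=lambda p: np.mean(ranks[p]))

# toy usage with made-up importance scores for two algorithms
importance = {
    "xgboost": {"learning_rate": 0.9, "max_depth": 0.6, "subsample": 0.2},
    "random_forest": {"max_depth": 0.8, "subsample": 0.1, "learning_rate": 0.0},
}
print(consensus_ranking(importance))
```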

【40】 Path Integrals for the Attribution of Model Uncertainties 标题:模型不确定性属性的路径积分

作者:Iker Perez,Piotr Skalski,Alec Barns-Graham,Jason Wong,David Sutton 机构:Featurespace Research, Cambridge, United Kingdom 链接:https://arxiv.org/abs/2107.08756 摘要:在贝叶斯机器学习应用中,对模型不确定性进行解释至关重要。通常,这要求把预测不确定性有意义地归因于图像、文本或类别数组中的源特征。然而,流行的归因方法是专门针对分类和回归分数设计的。为了解释不确定性,最新的替代方案通常获取反事实特征向量并进行直接比较。本文利用路径积分对贝叶斯可微模型中的不确定性进行归因。我们提出了一种新算法,它依赖于连接特征向量与其反事实对应物的分布内(in-distribution)曲线,并保留了可解释性方法的理想性质。我们在不同分辨率的基准图像数据集上验证了该方法,并表明与现有替代方案相比,它显著简化了可解释性分析。 摘要:Enabling interpretations of model uncertainties is of key importance in Bayesian machine learning applications. Often, this requires to meaningfully attribute predictive uncertainties to source features in an image, text or categorical array. However, popular attribution methods are particularly designed for classification and regression scores. In order to explain uncertainties, state of the art alternatives commonly procure counterfactual feature vectors, and proceed by making direct comparisons. In this paper, we leverage path integrals to attribute uncertainties in Bayesian differentiable models. We present a novel algorithm that relies on in-distribution curves connecting a feature vector to some counterfactual counterpart, and we retain desirable properties of interpretability methods. We validate our approach on benchmark image data sets with varying resolution, and show that it significantly simplifies interpretability over the existing alternatives.
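下面是一个示意性的 Python 片段(自拟的简化假设):沿从反事实基线到输入特征向量的直线路径,对预测不确定性的梯度做数值积分以得到各特征的归因;论文实际使用的是分布内曲线而非直线路径,这里仅用于说明路径积分归因的基本形式。

```python
import numpy as np

def path_attribution(grad_fn, x, x_baseline, n_steps=50):
    """Integrated-gradients-style attribution of a scalar uncertainty measure.
    grad_fn: x -> gradient of the uncertainty w.r.t. the features.
    A straight-line path is an illustrative simplification of the paper's
    in-distribution curves."""
    alphas = np.linspace(0.0, 1.0, n_steps)
    grads = np.stack([grad_fn(x_baseline + a * (x - x_baseline)) for a in alphas])
    return (x - x_baseline) * grads.mean(axis=0)   # Riemann approximation of the path integral

# toy usage: quadratic "uncertainty" u(z) = 0.5 z'Az with known gradient (made-up)
A = np.diag([1.0, 4.0])
u = lambda z: 0.5 * z @ A @ z
du = lambda z: A @ z
x = np.array([1.0, 1.0])
x0 = np.zeros(2)
attr = path_attribution(du, x, x0)
print(attr, attr.sum(), u(x) - u(x0))   # attributions approximately sum to the uncertainty gap
```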

【41】 Improved Learning Rates for Stochastic Optimization: Two Theoretical Viewpoints 标题:随机优化的改进学习率:两种理论观点

作者:Shaojie Li,Yong Liu 机构:Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China 链接:https://arxiv.org/abs/2107.08686 摘要:随机优化的泛化性能在机器学习中占有核心地位。本文研究两种流行的随机优化方法,即经验风险最小化(ERM)和随机梯度下降(SGD)的超额风险表现,并致力于改进其学习速率。尽管已有大量关于ERM和SGD在监督学习中泛化性的分析,但目前的理论认识要么在凸学习中依赖较强的假设(如强凸性条件),要么在非凸学习中只能给出较慢的速率且研究较少。基于这些问题,我们的目标是在凸学习中于更温和的假设下给出改进的速率,并在非凸学习中得到更快的速率。值得注意的是,我们的分析涵盖了两种流行的理论视角:稳定性与一致收敛。具体地说,在稳定性框架下,对于凸学习中假设更温和的ERM和SGD,我们给出了关于样本量$n$的$\mathcal{O}(1/n)$阶高概率速率;在非凸学习中,我们同样给出$\mathcal{O}(1/n)$阶的高概率速率,而非仅在期望意义下成立。此外,在一致收敛框架下,这类学习速率进一步提升到更快的$\mathcal{O}(1/n^2)$阶。据我们所知,对于ERM和SGD,本文给出的学习速率均达到当前最优水平。 摘要:Generalization performance of stochastic optimization stands a central place in machine learning. In this paper, we investigate the excess risk performance and towards improved learning rates for two popular approaches of stochastic optimization: empirical risk minimization (ERM) and stochastic gradient descent (SGD). Although there exists plentiful generalization analysis of ERM and SGD for supervised learning, current theoretical understandings of ERM and SGD are either have stronger assumptions in convex learning, e.g., strong convexity condition, or show slow rates and less studied in nonconvex learning. Motivated by these problems, we aim to provide improved rates under milder assumptions in convex learning and derive faster rates in nonconvex learning. It is notable that our analysis span two popular theoretical viewpoints: stability and uniform convergence. To be specific, in stability regime, we present high probability rates of order $\mathcal{O} (1/n)$ w.r.t. the sample size $n$ for ERM and SGD with milder assumptions in convex learning and similar high probability rates of order $\mathcal{O} (1/n)$ in nonconvex learning, rather than in expectation. Furthermore, this type of learning rate is improved to faster order $\mathcal{O} (1/n^2)$ in uniform convergence regime. To the best of our knowledge, for ERM and SGD, the learning rates presented in this paper are all state-of-the-art.

【42】 Non-asymptotic estimates for TUSLA algorithm for non-convex learning with applications to neural networks with ReLU activation function 标题:非凸学习Tusla算法的非渐近估计及其在RELU激活函数神经网络中的应用

作者:Dong-Young Lim,Ariel Neufeld,Sotirios Sabanis,Ying Zhang 机构:Financial supports by The Alan Turing Institute 链接:https://arxiv.org/abs/2107.08649 摘要:我们考虑目标函数的随机梯度具有超线性增长且不连续的非凸随机优化问题。在这样的设定下,我们对Lovas等人(2021)提出的驯服未调整随机Langevin算法(TUSLA)进行了非渐近分析。特别地,我们在Wasserstein-1和Wasserstein-2距离下建立了TUSLA算法的非渐近误差界。后一结果使我们能够进一步得到期望超额风险的非渐近估计。为了说明主要结果的适用性,我们考虑一个带ReLU激活函数神经网络的迁移学习例子,这是机器学习中的一个关键范式。针对该例子的数值实验支持了我们的理论结论。因此,在这一设定下,我们从理论和数值两方面证明了TUSLA算法可以求解带ReLU激活函数的神经网络优化问题。此外,我们还给出了若干合成算例的仿真结果:在这些例子中,由于随机梯度的超线性增长和不连续性,ADAM、AMSGrad、RMSProp和(原始)SGD等流行算法可能无法找到目标函数的极小点,而TUSLA算法能快速收敛到最优解。 摘要:We consider non-convex stochastic optimization problems where the objective functions have super-linearly growing and discontinuous stochastic gradients. In such a setting, we provide a non-asymptotic analysis for the tamed unadjusted stochastic Langevin algorithm (TUSLA) introduced in Lovas et al. (2021). In particular, we establish non-asymptotic error bounds for the TUSLA algorithm in Wasserstein-1 and Wasserstein-2 distances. The latter result enables us to further derive non-asymptotic estimates for the expected excess risk. To illustrate the applicability of the main results, we consider an example from transfer learning with ReLU neural networks, which represents a key paradigm in machine learning. Numerical experiments are presented for the aforementioned example which supports our theoretical findings. Hence, in this setting, we demonstrate both theoretically and numerically that the TUSLA algorithm can solve the optimization problem involving neural networks with ReLU activation function. Besides, we provide simulation results for synthetic examples where popular algorithms, e.g. ADAM, AMSGrad, RMSProp, and (vanilla) SGD, may fail to find the minimizer of the objective functions due to the super-linear growth and the discontinuity of the corresponding stochastic gradient, while the TUSLA algorithm converges rapidly to the optimal solution.
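下面给出一个示意性的 Python 片段,演示"驯服(tamed)Langevin 更新"的一般形式:将随机梯度除以一个随其范数增长的因子,以控制超线性增长的漂移项;具体的驯服因子与步长设置是此处的假设,未必与 TUSLA 原文一致。

```python
import numpy as np

def tamed_langevin_step(theta, grad, step, beta, rng):
    """One illustrative 'tamed' unadjusted Langevin update.
    The raw stochastic gradient is divided by (1 + sqrt(step) * ||grad||) so the
    drift stays bounded even when the gradient grows super-linearly; the exact
    taming factor used by TUSLA may differ from this sketch."""
    tamed = grad / (1.0 + np.sqrt(step) * np.linalg.norm(grad))
    noise = rng.normal(size=theta.shape)
    return theta - step * tamed + np.sqrt(2.0 * step / beta) * noise

# toy usage on f(x) = ||x||^4, whose gradient grows super-linearly
rng = np.random.default_rng(0)
theta = np.array([3.0, -2.0])
for _ in range(2000):
    grad = 4.0 * np.linalg.norm(theta) ** 2 * theta
    theta = tamed_langevin_step(theta, grad, step=1e-2, beta=1e3, rng=rng)
print(theta)   # should end up near the minimiser at the origin
```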

【43】 Transport away your problems: Calibrating stochastic simulations with optimal transport 标题:解决您的问题:使用最优传输来校准随机模拟

作者:Chris Pollard,Philipp Windischhofer 机构:Department of Physics, University of Oxford, Keble Road, Oxford, United Kingdom 链接:https://arxiv.org/abs/2107.08648 摘要:随机模拟器是许多科学分支中不可缺少的工具。通常基于第一性原理,他们提供一系列样本,其分布隐含地定义了一个概率度量来描述感兴趣的现象。然而,这些模拟器的保真度并不总是足以满足所有的科学目的,因此需要构建特别的校正来“校准”模拟,并确保其输出是真实的真实再现。在本文中,我们利用运输理论中的方法,以系统的方式构建这样的修正。我们使用一个神经网络来计算对模拟器产生的单个样本的最小修改,从而使得到的分布得到适当的校准。我们在实验粒子物理的背景下说明了这种方法及其优点,其中校准随机模拟器的需求尤为突出。 摘要:Stochastic simulators are an indispensable tool in many branches of science. Often based on first principles, they deliver a series of samples whose distribution implicitly defines a probability measure to describe the phenomena of interest. However, the fidelity of these simulators is not always sufficient for all scientific purposes, necessitating the construction of ad-hoc corrections to "calibrate" the simulation and ensure that its output is a faithful representation of reality. In this paper, we leverage methods from transportation theory to construct such corrections in a systematic way. We use a neural network to compute minimal modifications to the individual samples produced by the simulator such that the resulting distribution becomes properly calibrated. We illustrate the method and its benefits in the context of experimental particle physics, where the need for calibrated stochastic simulators is particularly pronounced.
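下面用一个一维的 Python 玩具例子说明"用最优传输校准随机模拟器"的思路:在一维情形下,最优传输映射就是单调的分位数匹配,因此可以直接算出把模拟样本推到数据分布上的逐样本最小修正;论文中实际是用神经网络学习这种修正,这里只是一个简化的替代示意。

```python
import numpy as np

def ot_calibrate_1d(sim_samples, data_samples):
    """In one dimension, the optimal-transport map between two empirical
    distributions is monotone quantile matching. Returns per-sample corrections
    that push simulator samples onto the data distribution. This is a simplified
    stand-in for the paper's neural-network correction."""
    order = np.argsort(sim_samples)
    levels = (np.arange(len(sim_samples)) + 0.5) / len(sim_samples)
    targets = np.quantile(data_samples, levels)
    corrected = np.empty_like(sim_samples, dtype=float)
    corrected[order] = targets           # k-th smallest sim sample goes to the k-th data quantile
    return corrected - sim_samples       # minimal per-sample modification

# toy usage: a mis-calibrated Gaussian simulator vs. "real" data (made-up)
rng = np.random.default_rng(0)
sim = rng.normal(0.0, 1.0, size=5000)    # simulator output
data = rng.normal(0.5, 1.3, size=5000)   # observed reality
delta = ot_calibrate_1d(sim, data)
print((sim + delta).mean(), (sim + delta).std())   # roughly 0.5 and 1.3
```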

【44】 A Topological Perspective on Causal Inference 标题:因果推理的拓扑学视角

作者:Duligur Ibeling,Thomas Icard 机构:Department of Computer Science, Stanford University, Stanford, CA , Department of Philosophy 备注:ICML 2021 NACI workshop 链接:https://arxiv.org/abs/2107.08558 摘要:本文通过在结构因果模型(SCM)的一般空间上定义一系列拓扑,提出了因果推理的拓扑学习理论视角。作为该框架的一个示例,我们证明了一个拓扑因果层次定理,表明无实质性假设的因果推理只在SCM空间的一个贫集(meager set)上是可能的。由于弱拓扑中的开集与统计上可验证的假设之间存在已知的对应关系,我们的结果表明,足以支持有效因果推断的归纳假设在原则上是统计不可验证的。与统计推断中的无免费午餐定理类似,本文的结果阐明了因果推断中实质性假设的不可避免性。我们的拓扑方法的另一个好处是,它可以很容易地处理具有无穷多个变量的SCM。最后,我们指出,该框架可能有助于探索和评估各种替代性因果归纳假设这一建设性工作。 摘要:This paper presents a topological learning-theoretic perspective on causal inference by introducing a series of topologies defined on general spaces of structural causal models (SCMs). As an illustration of the framework we prove a topological causal hierarchy theorem, showing that substantive assumption-free causal inference is possible only in a meager set of SCMs. Thanks to a known correspondence between open sets in the weak topology and statistically verifiable hypotheses, our results show that inductive assumptions sufficient to license valid causal inferences are statistically unverifiable in principle. Similar to no-free-lunch theorems for statistical inference, the present results clarify the inevitability of substantial assumptions for causal inference. An additional benefit of our topological approach is that it easily accommodates SCMs with infinitely many variables. We finally suggest that the framework may be helpful for the positive project of exploring and assessing alternative causal-inductive assumptions.

【45】 Decoupling Shrinkage and Selection for the Bayesian Quantile Regression 标题:贝叶斯分位数回归的解耦收缩与选择

作者:David Kohns,Tibor Szendrei 机构:Department of Economics, Heriot-Watt University, and 备注:First Draft: 18/07/2021 链接:https://arxiv.org/abs/2107.08498 摘要:本文将针对连续先验的"解耦收缩与稀疏性"思想推广到贝叶斯分位数回归(BQR)。该过程分为两步:第一步,我们用最先进的连续先验对分位数回归后验进行收缩;第二步,我们用自适应套索的一个高效变体,即信号自适应变量选择(SAVS)算法,对后验进行稀疏化。我们提出了SAVS的一个新变体,它通过在高维下有效的分位数特定损失函数来自动选择惩罚强度。大规模模拟表明,无论数据中真实的稀疏程度如何,与未稀疏化的回归后验相比,我们的选择过程都能减少偏差。我们将该两步方法应用于一个高维在险增长(growth-at-risk, GaR)分析,在给出可解释的分位数特定变量选择结果的同时,保留了未稀疏化后验的预测精度。我们的方法可用于向决策者说明哪些变量驱动了宏观经济的下行风险。 摘要:This paper extends the idea of decoupling shrinkage and sparsity for continuous priors to Bayesian Quantile Regression (BQR). The procedure follows two steps: In the first step, we shrink the quantile regression posterior through state of the art continuous priors and in the second step, we sparsify the posterior through an efficient variant of the adaptive lasso, the signal adaptive variable selection (SAVS) algorithm. We propose a new variant of the SAVS which automates the choice of penalisation through quantile specific loss-functions that are valid in high dimensions. We show in large scale simulations that our selection procedure decreases bias irrespective of the true underlying degree of sparsity in the data, compared to the un-sparsified regression posterior. We apply our two-step approach to a high dimensional growth-at-risk (GaR) exercise. The prediction accuracy of the un-sparsified posterior is retained while yielding interpretable quantile specific variable selection results. Our procedure can be used to communicate to policymakers which variables drive downside risk to the macro economy.
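下面是一个示意性的 Python 片段,给出 SAVS 稀疏化步骤的一种常见形式(对收缩步骤得到的后验均值系数做带系数特定惩罚的软阈值);论文提出的分位数特定惩罚是其新贡献,这里采用的惩罚 mu_j = 1/beta_j^2 只是文献中常见的选择,属于此处的假设。

```python
import numpy as np

def savs(beta_post_mean, X):
    """Signal Adaptive Variable Selection (illustrative form).
    beta_post_mean: posterior-mean coefficients from the shrinkage step,
    X: design matrix. Each coefficient is soft-thresholded with the
    coefficient-specific penalty mu_j = 1/beta_j^2 (a commonly used choice;
    the paper's quantile-specific penalty may differ)."""
    col_norm2 = np.sum(X ** 2, axis=0)
    mu = 1.0 / (beta_post_mean ** 2 + 1e-12)
    shrunk = np.abs(beta_post_mean) * col_norm2 - mu
    return np.sign(beta_post_mean) * np.maximum(shrunk, 0.0) / col_norm2

# toy usage with made-up posterior means
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
beta_hat = np.array([1.5, -0.8, 0.02, 0.01, 0.6])
print(savs(beta_hat, X))   # tiny coefficients are set exactly to zero
```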

【46】 GoTube: Scalable Stochastic Verification of Continuous-Depth Models 标题:GoTube:连续深度模型的可扩展随机验证

作者:Sophie Gruenbacher,Mathias Lechner,Ramin Hasani,Daniela Rus,Thomas A. Henzinger,Scott Smolka,Radu Grosu 机构:Stony Brook University 备注:17 Pages 链接:https://arxiv.org/abs/2107.08467 摘要:我们引入了一种新的随机验证算法,该算法对任何被表述为连续深度模型的时间连续过程的行为鲁棒性进行形式化量化。该算法在给定时间范围内求解一组全局优化(Go)问题,构造一个紧致包络管(Tube),其中包含从某个初始状态球出发的所有过程执行。我们将该算法称为GoTube。由其构造方式,GoTube可保证该包络管在期望的概率下是保守的。GoTube在JAX中实现,并经过优化以扩展到复杂的连续深度模型。与先进的时间连续神经网络可达性分析工具相比,GoTube可证明不会在时间步之间累积过度近似误差,并避免了符号方法固有的包裹效应(wrapping effect)。在大量实验中,我们发现GoTube在初始球大小、速度、时间范围、任务完成度和可扩展性等方面明显优于最先进的验证工具。GoTube运行稳定,其可处理的时间范围远超以往工作,达到了当前最先进水平。 摘要:We introduce a new stochastic verification algorithm that formally quantifies the behavioral robustness of any time-continuous process formulated as a continuous-depth model. The algorithm solves a set of global optimization (Go) problems over a given time horizon to construct a tight enclosure (Tube) of the set of all process executions starting from a ball of initial states. We call our algorithm GoTube. Through its construction, GoTube ensures that the bounding tube is conservative up to a desired probability. GoTube is implemented in JAX and optimized to scale to complex continuous-depth models. Compared to advanced reachability analysis tools for time-continuous neural networks, GoTube provably does not accumulate over-approximation errors between time steps and avoids the infamous wrapping effect inherent in symbolic techniques. We show that GoTube substantially outperforms state-of-the-art verification tools in terms of the size of the initial ball, speed, time-horizon, task completion, and scalability, on a large set of experiments. GoTube is stable and sets the state-of-the-art for its ability to scale up to time horizons well beyond what has been possible before.

【47】 Compressed particle methods for expensive models with application in Astronomy and Remote Sensing 标题:昂贵模型的压缩质点方法及其在天文遥感中的应用

作者:Luca Martino,Víctor Elvira,Javier López-Santiago,Gustau Camps-Valls 备注:published in IEEE Transactions on Aerospace and Electronic Systems 链接:https://arxiv.org/abs/2107.08465 摘要:在许多推理问题中,往往需要对复杂而昂贵的模型进行评估。在这种背景下,贝叶斯方法在过去的几年中已经在许多领域中非常流行,以获得参数反演、模型选择或不确定性量化。贝叶斯推理需要对复杂的积分进行近似,这些积分涉及(通常代价高昂的)后验分布。通常,这种近似是通过蒙特卡罗(MC)方法得到的。为了降低相应技术的计算成本,通常采用代理模型(也称为仿真器)。另一种方法是所谓的近似贝叶斯计算(ABC)方案。ABC不需要评估昂贵的模型,但需要根据该模型模拟人工数据的能力。此外,在作业成本法中,还需要在真实数据和人工数据之间选择合适的距离。在这项工作中,我们介绍了一种新的方法,其中昂贵的模型只评估在一些精心挑选的样本。这些节点的选择基于所谓的压缩蒙特卡罗(CMC)方案。我们提供了支持新算法的理论结果,并在几个数值实验中给出了该方法性能的经验证据。其中两个是天文学和卫星遥感的实际应用。 摘要:In many inference problems, the evaluation of complex and costly models is often required. In this context, Bayesian methods have become very popular in several fields over the last years, in order to obtain parameter inversion, model selection or uncertainty quantification. Bayesian inference requires the approximation of complicated integrals involving (often costly) posterior distributions. Generally, this approximation is obtained by means of Monte Carlo (MC) methods. In order to reduce the computational cost of the corresponding technique, surrogate models (also called emulators) are often employed. Another alternative approach is the so-called Approximate Bayesian Computation (ABC) scheme. ABC does not require the evaluation of the costly model but the ability to simulate artificial data according to that model. Moreover, in ABC, the choice of a suitable distance between real and artificial data is also required. In this work, we introduce a novel approach where the expensive model is evaluated only in some well-chosen samples. The selection of these nodes is based on the so-called compressed Monte Carlo (CMC) scheme. We provide theoretical results supporting the novel algorithms and give empirical evidence of the performance of the proposed method in several numerical experiments. Two of them are real-world applications in astronomy and satellite remote sensing.

【48】 Differentially Private Bayesian Neural Networks on Accuracy, Privacy and Reliability 标题:差分私有贝叶斯神经网络对精度、保密性和可靠性的影响

作者:Qiyiwen Zhang,Zhiqi Bu,Kan Chen,Qi Long 机构:University of Pennsylvania 链接:https://arxiv.org/abs/2107.08461 摘要:贝叶斯神经网络(BNN)允许预测中的不确定性量化,与常规神经网络相比具有优势,而常规神经网络尚未在差分隐私(DP)框架中进行探索。我们通过利用贝叶斯深度学习和隐私会计的最新发展来填补这一重要空白,从而对BNN中隐私和准确性之间的权衡提供更精确的分析。我们提出了三种不同的DP-bnn来描述同一网络结构的权重不确定性,即DP-SGLD(通过噪声梯度法)、DP-BBP(通过改变感兴趣的参数)和DP-MC-Dropout(通过模型结构)。有趣的是,我们在DP-SGD和DP-SGLD之间展示了一个新的等价性,这意味着一些非贝叶斯DP训练自然允许不确定性量化。然而,学习率和批量大小等超参数在DP-SGD和DP-SGLD中可能产生不同甚至相反的影响。在隐私保证、预测准确性、不确定性量化、校准、计算速度和对网络结构的可推广性等方面,对DP-BNNs进行了大量的实验比较。因此,我们观察到隐私和可靠性之间的一个新的权衡。与非DP和非贝叶斯方法相比,DP-SGLD在强隐私保证下具有显著的准确性,显示了DP-BNN在实际任务中的巨大潜力。 摘要:Bayesian neural network (BNN) allows for uncertainty quantification in prediction, offering an advantage over regular neural networks that has not been explored in the differential privacy (DP) framework. We fill this important gap by leveraging recent development in Bayesian deep learning and privacy accounting to offer a more precise analysis of the trade-off between privacy and accuracy in BNN. We propose three DP-BNNs that characterize the weight uncertainty for the same network architecture in distinct ways, namely DP-SGLD (via the noisy gradient method), DP-BBP (via changing the parameters of interest) and DP-MC Dropout (via the model architecture). Interestingly, we show a new equivalence between DP-SGD and DP-SGLD, implying that some non-Bayesian DP training naturally allows for uncertainty quantification. However, the hyperparameters such as learning rate and batch size, can have different or even opposite effects in DP-SGD and DP-SGLD. Extensive experiments are conducted to compare DP-BNNs, in terms of privacy guarantee, prediction accuracy, uncertainty quantification, calibration, computation speed, and generalizability to network architecture. As a result, we observe a new tradeoff between the privacy and the reliability. When compared to non-DP and non-Bayesian approaches, DP-SGLD is remarkably accurate under strong privacy guarantee, demonstrating the great potential of DP-BNN in real-world tasks.
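下面用一个示意性的 Python 片段展示"带噪声梯度法"的基本更新步骤(逐样本梯度裁剪后加高斯噪声,即 DP-SGD/DP-SGLD 一类方法的核心机制);函数名与超参数取值均为自拟,仅用于说明机制,并非论文的完整实现或隐私会计。

```python
import numpy as np

def dp_noisy_gradient_step(theta, per_example_grads, lr, clip_norm, noise_mult, rng):
    """One DP-SGD/DP-SGLD-style update: clip each per-example gradient to
    clip_norm, average, add calibrated Gaussian noise, then take a gradient step.
    Hyperparameter choices here are illustrative only."""
    n = per_example_grads.shape[0]
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads / np.maximum(1.0, norms / clip_norm)
    noisy_mean = clipped.mean(axis=0) + rng.normal(
        scale=noise_mult * clip_norm / n, size=theta.shape)
    return theta - lr * noisy_mean

# toy usage: private-style training of a linear regression on made-up data
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=64)
theta = np.zeros(3)
for _ in range(500):
    residual = X @ theta - y
    grads = residual[:, None] * X            # per-example gradients of the squared loss / 2
    theta = dp_noisy_gradient_step(theta, grads, lr=0.1, clip_norm=1.0, noise_mult=1.0, rng=rng)
print(theta)
```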

【49】 A Theory of PAC Learnability of Partial Concept Classes 标题:部分概念类的PAC可学习性理论

作者:Noga Alon,Steve Hanneke,Ron Holzman,Shay Moran 机构:Princeton University, Toyota Technological Institute at Chicago, Technion and Google Research 链接:https://arxiv.org/abs/2107.08444 摘要:我们扩展了PAC学习的理论,使得我们能够对各种各样的学习任务进行建模,其中的数据满足简化学习过程的特殊属性。例如,数据与决策边界的距离以零为界的任务。基本和简单的想法是考虑部分概念:这些是可以在空间的某些部分上未定义的函数。当学习部分概念时,我们假设只有在定义部分概念的点上才支持源分布。这样,人们就可以自然地表达对数据的假设,例如躺在低维表面上或边缘条件。与此相反,传统的PAC理论是否能够表达这样的假设并不十分清楚。事实上,我们展示了易于学习的部分概念类,这是传统PAC理论无法捕捉的。这也解决了Attias、Kontorovich和Mansour 2019提出的一个问题。我们刻画了部分概念类的PAC可学习性,并揭示了一个与经典概念类根本不同的算法景观。例如,在经典的PAC模型中,学习归结为经验风险最小化(ERM)。与此形成鲜明对比的是,ERM原理不能解释部分概念类的可学习性。事实上,我们演示了非常容易学习的类,但是任何学习它们的算法都必须使用一个具有无界VC维的假设空间。我们还发现,样本压缩猜想在这种情况下失败。因此,这一理论的特点是不能用传统的方法来表示和解决问题。我们认为这是一个证据,它可能提供了关于现实场景中学习性本质的见解,而经典理论无法解释。 摘要:We extend the theory of PAC learning in a way which allows to model a rich variety of learning tasks where the data satisfy special properties that ease the learning process. For example, tasks where the distance of the data from the decision boundary is bounded away from zero. The basic and simple idea is to consider partial concepts: these are functions that can be undefined on certain parts of the space. When learning a partial concept, we assume that the source distribution is supported only on points where the partial concept is defined. This way, one can naturally express assumptions on the data such as lying on a lower dimensional surface or margin conditions. In contrast, it is not at all clear that such assumptions can be expressed by the traditional PAC theory. In fact we exhibit easy-to-learn partial concept classes which provably cannot be captured by the traditional PAC theory. This also resolves a question posed by Attias, Kontorovich, and Mansour 2019. We characterize PAC learnability of partial concept classes and reveal an algorithmic landscape which is fundamentally different than the classical one. For example, in the classical PAC model, learning boils down to Empirical Risk Minimization (ERM). In stark contrast, we show that the ERM principle fails in explaining learnability of partial concept classes. In fact, we demonstrate classes that are incredibly easy to learn, but such that any algorithm that learns them must use an hypothesis space with unbounded VC dimension. We also find that the sample compression conjecture fails in this setting. Thus, this theory features problems that cannot be represented nor solved in the traditional way. We view this as evidence that it might provide insights on the nature of learnability in realistic scenarios which the classical theory fails to explain.

【50】 Top-label calibration 标题:顶标校准

作者:Chirag Gupta,Aaditya K. Ramdas 机构:Carnegie Mellon University 备注:33 pages, 15 figures 链接:https://arxiv.org/abs/2107.08353 摘要:我们研究了多类分类的事后校正问题,重点是直方图分块。许多工作都集中在关于预测类(或“顶标签”)置信度的校准上。我们发现,置信度校准的流行概念[Guo et al.,2017]不够强大——存在未以任何有意义的方式校准但完全置信校准的预测因子。我们提出了一个密切相关(但微妙不同)的概念,顶部标签校准,它准确地捕捉了信心校准的直观性和简单性,但解决了它的缺点。本文提出了一种直方图分块(HB)算法,该算法将顶标多类校正问题简化为二值化问题,证明了该算法在没有分布假设的情况下具有清晰的理论保证,并对其实际性能进行了系统的研究。一些预测任务需要更严格的多类校正概念,如类校正或规范校正。我们形式化了相应于这些目标的HB算法。在使用深度神经网络的实验中,我们发现我们的HB原则版本通常比温度标度更好,无论是顶级标签还是类级校准。这项工作的代码将在网站上公开https://github.com/aigen/df-posthoc-calibration. 摘要:We study the problem of post-hoc calibration for multiclass classification, with an emphasis on histogram binning. Multiple works have focused on calibration with respect to the confidence of just the predicted class (or 'top-label'). We find that the popular notion of confidence calibration [Guo et al., 2017] is not sufficiently strong -- there exist predictors that are not calibrated in any meaningful way but are perfectly confidence calibrated. We propose a closely related (but subtly different) notion, top-label calibration, that accurately captures the intuition and simplicity of confidence calibration, but addresses its drawbacks. We formalize a histogram binning (HB) algorithm that reduces top-label multiclass calibration to the binary case, prove that it has clean theoretical guarantees without distributional assumptions, and perform a methodical study of its practical performance. Some prediction tasks require stricter notions of multiclass calibration such as class-wise or canonical calibration. We formalize appropriate HB algorithms corresponding to each of these goals. In experiments with deep neural nets, we find that our principled versions of HB are often better than temperature scaling, for both top-label and class-wise calibration. Code for this work will be made publicly available at https://github.com/aigen/df-posthoc-calibration.
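下面给出顶标(top-label)直方图分箱校准的一个简化 Python 示意(自拟实现,细节可能与论文算法不同):在校准集上按(预测类别, 置信度分箱)统计经验准确率,再用它替换新预测的置信度。

```python
import numpy as np

def fit_top_label_hb(confidences, pred_classes, labels, n_bins=10, n_classes=None):
    """Histogram binning per predicted class (top-label calibration, simplified).
    bin_acc[c, b] = empirical accuracy of calibration-set predictions of class c
    whose confidence falls in bin b."""
    if n_classes is None:
        n_classes = int(pred_classes.max()) + 1
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bins = np.clip(np.digitize(confidences, edges) - 1, 0, n_bins - 1)
    bin_acc = np.full((n_classes, n_bins), np.nan)
    for c in range(n_classes):
        for b in range(n_bins):
            mask = (pred_classes == c) & (bins == b)
            if mask.any():
                bin_acc[c, b] = (labels[mask] == c).mean()
    return edges, bin_acc

def apply_top_label_hb(confidences, pred_classes, edges, bin_acc):
    bins = np.clip(np.digitize(confidences, edges) - 1, 0, bin_acc.shape[1] - 1)
    out = bin_acc[pred_classes, bins]
    return np.where(np.isnan(out), confidences, out)   # fall back to raw confidence in empty bins

# toy usage with made-up calibration data (3 classes)
rng = np.random.default_rng(0)
conf = rng.uniform(0.3, 1.0, 1000)
pred = rng.integers(0, 3, 1000)
true = np.where(rng.random(1000) < conf * 0.8, pred, rng.integers(0, 3, 1000))
edges, table = fit_top_label_hb(conf, pred, true)
print(apply_top_label_hb(np.array([0.95, 0.55]), np.array([0, 2]), edges, table))
```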

【51】 Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses 标题:对抗性MDP的政策优化:通过扩大奖金来改进探索

作者:Haipeng Luo,Chen-Yu Wei,Chung-Wei Lee 机构:University of Southern California 链接:https://arxiv.org/abs/2107.08346 摘要:策略优化是强化学习中广泛应用的一种方法。然而,由于策略优化本身的局部搜索性质,其全局最优性的理论保证往往依赖于对马尔可夫决策过程(MDPs)的额外假设,以回避全局探索的挑战。为了消除对这类假设的依赖,本文提出了一个通用方案:在策略更新中加入扩张奖励(dilated bonuses)以促进全局探索。为了展示该技术的威力与通用性,我们将其应用于多个具有对抗性损失和bandit反馈的回合制MDP设定,改进并推广了已有最优结果。具体地说,在表格型情形下,我们得到$\widetilde{\mathcal{O}}(\sqrt{T})$遗憾(其中$T$为回合数),改进了Shani等人(2020)的$\widetilde{\mathcal{O}}({T}^{2/3})$遗憾界。当状态数为无穷时,在状态-动作值函数关于某些低维特征呈线性的假设下,借助模拟器可得到$\widetilde{\mathcal{O}}({T}^{2/3})$遗憾,与Neu和Olkhovskaya(2020)的结果一致,且重要的是不再需要他们算法所依赖的探索性策略。当模拟器不可用时,我们进一步考虑线性MDP设定,得到$\widetilde{\mathcal{O}}({T}^{14/15})$遗憾,这是线性MDP在对抗性损失与bandit反馈下的首个结果。 摘要:Policy optimization is a widely-used method in reinforcement learning. Due to its local-search nature, however, theoretical guarantees on global optimality often rely on extra assumptions on the Markov Decision Processes (MDPs) that bypass the challenge of global exploration. To eliminate the need of such assumptions, in this work, we develop a general solution that adds dilated bonuses to the policy update to facilitate global exploration. To showcase the power and generality of this technique, we apply it to several episodic MDP settings with adversarial losses and bandit feedback, improving and generalizing the state-of-the-art. Specifically, in the tabular case, we obtain $\widetilde{\mathcal{O}}(\sqrt{T})$ regret where $T$ is the number of episodes, improving the $\widetilde{\mathcal{O}}({T}^{2/3})$ regret bound by Shani et al. (2020). When the number of states is infinite, under the assumption that the state-action values are linear in some low-dimensional features, we obtain $\widetilde{\mathcal{O}}({T}^{2/3})$ regret with the help of a simulator, matching the result of Neu and Olkhovskaya (2020) while importantly removing the need of an exploratory policy that their algorithm requires. When a simulator is unavailable, we further consider a linear MDP setting and obtain $\widetilde{\mathcal{O}}({T}^{14/15})$ regret, which is the first result for linear MDPs with adversarial losses and bandit feedback.

【52】 STRODE: Stochastic Boundary Ordinary Differential Equation 标题:STRODE:随机边界常微分方程

作者:Hengguan Huang,Hongfu Liu,Hao Wang,Chang Xiao,Ye Wang 机构:National University of Singapore, Rutgers University 备注:Accepted at ICML 2021; typos corrected 链接:https://arxiv.org/abs/2107.08273 摘要:从连续获得的感觉输入中感知时间,植根于个体有机体的日常行为。然而,大多数时间序列建模算法无法直接从视觉或音频输入中学习随机事件时刻的动态,因而在训练过程中需要时序标注,而这类标注在实际应用中通常无法获得。例如,神经科学中关于后测知觉(postdiction)的观点意味着,存在可变的时间范围,传入的感觉输入可以在其中影响更早的知觉,但在自动语音识别(ASR)等实际应用中,这类时间范围大多没有标注。在本文中,我们提出了一种概率常微分方程(ODE),称为随机边界ODE(STRODE),它无需任何时序标注即可同时学习时间序列数据的事件时刻及其动态。STRODE允许利用微分方程从后验点过程中进行高效且可解析的采样。我们进一步为STRODE的学习提供了理论保证。实证结果表明,我们的方法能够成功推断时间序列数据的事件时刻。在合成数据集和真实数据集上,与现有最先进方法相比,我们的方法取得了相当或更优的性能。 摘要:Perception of time from sequentially acquired sensory inputs is rooted in everyday behaviors of individual organisms. Yet, most algorithms for time-series modeling fail to learn dynamics of random event timings directly from visual or audio inputs, requiring timing annotations during training that are usually unavailable for real-world applications. For instance, neuroscience perspectives on postdiction imply that there exist variable temporal ranges within which the incoming sensory inputs can affect the earlier perception, but such temporal ranges are mostly unannotated for real applications such as automatic speech recognition (ASR). In this paper, we present a probabilistic ordinary differential equation (ODE), called STochastic boundaRy ODE (STRODE), that learns both the timings and the dynamics of time series data without requiring any timing annotations during training. STRODE allows the usage of differential equations to sample from the posterior point processes, efficiently and analytically. We further provide theoretical guarantees on the learning of STRODE. Our empirical results show that our approach successfully infers event timings of time series data. Our method achieves competitive or superior performances compared to existing state-of-the-art methods for both synthetic and real-world datasets.

【53】 Sparse Bayesian Learning with Diagonal Quasi-Newton Method For Large Scale Classification 标题:基于对角拟牛顿方法的稀疏贝叶斯学习在大规模分类中的应用

作者:Jiahua Luo,Chi-Man Vong,Jie Du 机构:C 备注:11 pages,5 figures 链接:https://arxiv.org/abs/2107.08195 摘要:稀疏贝叶斯学习(SBL)构造了一个极为稀疏的概率模型,其泛化能力极具竞争力。然而,SBL需要对一个大协方差矩阵求逆来更新正则化先验,复杂度为O(M^3)(M为特征维数),这给实际应用带来了困难。SBL存在三个问题:1)在某些情况下,对协方差矩阵求逆可能得到奇异解,从而阻碍SBL收敛;2)对高维特征空间或大数据量问题的可扩展性差;3)对于大规模数据,SBL容易出现内存溢出。本文针对这些问题提出了一种新的用于SBL的对角拟牛顿(DQN)方法,称为DQN-SBL,其中忽略了大协方差矩阵的求逆,从而将复杂度和内存开销降低到O(M)。我们在非线性分类和线性特征选择任务上,使用不同规模的多种基准数据集对DQN-SBL进行了全面评估。实验结果验证了DQN-SBL在模型极为稀疏的同时具有有竞争力的泛化能力,并能很好地扩展到大规模问题。 摘要:Sparse Bayesian Learning (SBL) constructs an extremely sparse probabilistic model with very competitive generalization. However, SBL needs to invert a big covariance matrix with complexity O(M^3 ) (M: feature size) for updating the regularization priors, making it difficult for practical use. There are three issues in SBL: 1) Inverting the covariance matrix may obtain singular solutions in some cases, which hinders SBL from convergence; 2) Poor scalability to problems with high dimensional feature space or large data size; 3) SBL easily suffers from memory overflow for large-scale data. This paper addresses these issues with a newly proposed diagonal Quasi-Newton (DQN) method for SBL called DQN-SBL where the inversion of big covariance matrix is ignored so that the complexity and memory storage are reduced to O(M). The DQN-SBL is thoroughly evaluated on non-linear classifiers and linear feature selection using various benchmark datasets of different sizes. Experimental results verify that DQN-SBL receives competitive generalization with a very sparse model and scales well to large-scale problems.

【54】 Markov Blanket Discovery using Minimum Message Length 标题:使用最小消息长度的马尔可夫毯子发现

作者:Yang Li,Kevin B Korb,Lloyd Allison 机构:School of Computing, The Australian National University, Canberra, ACT, Australia, Monash University, Clayton, VIC, Australia 链接:https://arxiv.org/abs/2107.08140 摘要:因果发现旨在从数据中自动学习因果贝叶斯网络,自其出现以来一直受到广泛关注。随着可从互联网获取大规模数据集,人们对扩展到超大数据集的兴趣与日俱增。一种做法是先利用马尔可夫毯(MB)发现来并行化搜索,再将各个MB组合成全局因果模型。我们开发并探索了三种基于最小消息长度(MML)的MB发现新方法,并与现有最佳方法(无论其最初是为MB发现还是为特征选择而提出)进行了实证比较。我们最好的MML方法始终具有竞争力,并具备一些有利的特性。 摘要:Causal discovery automates the learning of causal Bayesian networks from data and has been of active interest from their beginning. With the sourcing of large data sets off the internet, interest in scaling up to very large data sets has grown. One approach to this is to parallelize search using Markov Blanket (MB) discovery as a first step, followed by a process of combining MBs in a global causal model. We develop and explore three new methods of MB discovery using Minimum Message Length (MML) and compare them empirically to the best existing methods, whether developed specifically as MB discovery or as feature selection. Our best MML method is consistently competitive and has some advantageous features.

【55】 Hamiltonian Monte Carlo for Regression with High-Dimensional Categorical Data 标题:高维分类数据回归的哈密顿蒙特卡罗方法

作者:Szymon Sacher,Laura Battaglia,Stephen Hansen 机构:Columbia University, Barcelona Graduate School of Economics, Imperial College London 备注:30 pages 6 figures, 3 tables 链接:https://arxiv.org/abs/2107.08112 摘要:潜变量模型在经济学中越来越多地被用于文本和调查等高维分类数据。所得到的低维表示通常被直接代入下游计量经济模型,而忽略上游模型的统计结构,这给有效推断带来了严重挑战。我们展示了借助并行化自动微分实现的哈密顿蒙特卡罗(HMC)如何为这一问题提供计算高效、易于编写且统计上稳健的解决方案。通过一系列应用,我们表明对整合结构进行建模会对推断产生不可忽视的影响,且在整合模型中,HMC的推断性能明显优于现有方法。 摘要:Latent variable models are becoming increasingly popular in economics for high-dimensional categorical data such as text and surveys. Often the resulting low-dimensional representations are plugged into downstream econometric models that ignore the statistical structure of the upstream model, which presents serious challenges for valid inference. We show how Hamiltonian Monte Carlo (HMC) implemented with parallelized automatic differentiation provides a computationally efficient, easy-to-code, and statistically robust solution for this problem. Via a series of applications, we show that modeling integrated structure can non-trivially affect inference and that HMC appears to markedly outperform current approaches to inference in integrated models.
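下面是一个极简的 Python 示意片段,给出 HMC 的一次转移(蛙跳积分加 Metropolis 接受步);论文强调用并行化自动微分获得梯度,这里为保持自包含,用一个手工给出梯度的二维高斯目标作替身,数值均为自拟。

```python
import numpy as np

def hmc_step(q, log_prob, grad_log_prob, step=0.1, n_leapfrog=20, rng=None):
    """One Hamiltonian Monte Carlo transition (leapfrog integrator + MH accept).
    log_prob / grad_log_prob are supplied by hand here; in practice they would
    come from automatic differentiation of the latent-variable model."""
    rng = rng or np.random.default_rng()
    p = rng.normal(size=q.shape)
    q_new, p_new = q.copy(), p.copy()
    p_new += 0.5 * step * grad_log_prob(q_new)
    for _ in range(n_leapfrog - 1):
        q_new += step * p_new
        p_new += step * grad_log_prob(q_new)
    q_new += step * p_new
    p_new += 0.5 * step * grad_log_prob(q_new)
    log_accept = (log_prob(q_new) - 0.5 * p_new @ p_new) - (log_prob(q) - 0.5 * p @ p)
    return q_new if np.log(rng.random()) < log_accept else q

# toy usage: sample from a 2D correlated Gaussian (stand-in for a real posterior)
cov_inv = np.linalg.inv(np.array([[1.0, 0.6], [0.6, 1.0]]))
logp = lambda q: -0.5 * q @ cov_inv @ q
glogp = lambda q: -cov_inv @ q
rng = np.random.default_rng(0)
q = np.zeros(2)
samples = []
for _ in range(2000):
    q = hmc_step(q, logp, glogp, rng=rng)
    samples.append(q)
print(np.cov(np.array(samples).T))   # should roughly recover the target covariance
```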

本文参与 腾讯云自媒体同步曝光计划,分享自微信公众号。
原始发表:2021-07-20,如有侵权请联系 cloudcommunity@tencent.com 删除
