Statistics Academic Digest [7.12]

Official account: arXiv每日学术速递 (arXiv Daily Academic Digest)
Published 2021-07-27 10:45:36
Included in the column: arXiv每日学术速递

stat (Statistics): 32 papers in total

【1】 Statistical Estimation and Nonlinear Filtering in Environmental Pollution

Authors: Qizhu Liang, Jie Xiong, Xingqiu Zhao
Affiliation: Southern University of Science and Technology
Link: https://arxiv.org/abs/2107.04592
Abstract: This paper studies a nonlinear filtering problem over an infinite time interval. The signal to be estimated is driven by a stochastic partial differential equation involving unknown parameters. Based on discrete observations, strongly consistent estimators of the parameters are first derived. With the optimal filter given by Bayes' formula, the uniqueness of the invariant measure for the signal-filter pair is verified. The paper then establishes an approximation to the optimal filter, showing that the pathwise average distance, per unit time, of the computed approximating filter from the optimal filter converges to zero in probability. Finally, simulation results are presented.

【2】 The Bayesian Learning Rule

Authors: Mohammad Emtiyaz Khan, Håvard Rue
Affiliations: RIKEN Center for AI Project, Tokyo, Japan; CEMSE Division, KAUST, Thuwal, Saudi Arabia
Link: https://arxiv.org/abs/2107.04562
Abstract: We show that many machine-learning algorithms are specific instances of a single algorithm called the Bayesian learning rule. The rule, derived from Bayesian principles, yields a wide range of algorithms from fields such as optimization, deep learning, and graphical models. This includes classical algorithms such as ridge regression, Newton's method, and the Kalman filter, as well as modern deep-learning algorithms such as stochastic-gradient descent, RMSprop, and Dropout. The key idea in deriving such algorithms is to approximate the posterior using candidate distributions estimated by using natural gradients. Different candidate distributions result in different algorithms, and further approximations to natural gradients give rise to variants of those algorithms. Our work not only unifies, generalizes, and improves existing algorithms, but also helps us design new ones.

【3】 Fast compression of MCMC output

Authors: Nicolas Chopin, Gabriel Ducrocq
Link: https://arxiv.org/abs/2107.04552
Abstract: We propose cube thinning, a novel method for compressing the output of an MCMC (Markov chain Monte Carlo) algorithm when control variates are available. It amounts to resampling the initial MCMC sample (according to weights derived from control variates), while imposing equality constraints on averages of these control variates, using the cube method of [1]. Its main advantage is that its CPU cost is linear in N, the original sample size, and is constant in M, the required size for the compressed sample. This compares favourably to Stein thinning [2], which has complexity $\mathcal{O}(NM^2)$, and which requires the availability of the gradient of the target log-density (which automatically implies the availability of control variates). Our numerical experiments suggest that cube thinning is also competitive in terms of statistical error.

【4】 Higher Order Imprecise Probabilities and Statistical Testing

Authors: Justus Hibshman, Tim Weninger
Affiliation: University of Notre Dame
Link: https://arxiv.org/abs/2107.04542
Abstract: We generalize standard credal set models for imprecise probabilities to include higher order credal sets -- confidences about confidences. In doing so, we specify how an agent's higher order confidences (credal sets) update upon observing an event. Our model begins to address standard issues with imprecise probability models, like Dilation and Belief Inertia. We conjecture that when higher order credal sets contain all possible probability functions, then in the limiting case the highest order confidences converge to form a uniform distribution over the first order credal set, where we define uniformity in terms of the statistical distance metric (total variation distance). Finite simulation supports the conjecture. We further suggest that this convergence presents the total-variation-uniform distribution as a natural, privileged prior for statistical hypothesis testing.

【5】 Joint Modeling of Longitudinal and Survival Data with Censored Single-index Varying Coefficient Models

Author: Jizi Shangguan
Affiliation: George Washington University, Washington DC
Link: https://arxiv.org/abs/2107.04496
Abstract: In medical and biological research, longitudinal data and survival data are commonly seen. Traditional statistical models mostly deal with only one of the two data types, such as linear mixed models for longitudinal data and Cox models for survival data, and do not adjust for the association between these two data types. It is desirable to have a joint modeling approach that accommodates both data types and the dependency between them. In this paper, we extend traditional single-index models to a new joint modeling approach, replacing the single-index component with a varying coefficient component to deal with longitudinal outcomes, and accommodating the random censoring problem in survival analysis by nonparametric synthetic data regression for the link function. Numerical experiments are conducted to evaluate the finite sample performance.

【6】 Hypothetical estimands in clinical trials: a unification of causal inference and missing data methods

Authors: Camila Olarte Parra, Rhian M. Daniel, Jonathan W. Bartlett
Affiliations: Department of Mathematical Sciences, University of Bath, Division of Population Medicine, Cardiff University, and
Notes: 44 pages, 12 figures
Link: https://arxiv.org/abs/2107.04392
Abstract: The ICH E9 addendum introduces the term intercurrent event to refer to events that happen after randomisation and that can either preclude observation of the outcome of interest or affect its interpretation. It proposes five strategies for handling intercurrent events to form an estimand but does not suggest statistical methods for estimation. In this paper we focus on the hypothetical strategy, where the treatment effect is defined under the hypothetical scenario in which the intercurrent event is prevented. For its estimation, we consider causal inference and missing data methods. We establish that certain 'causal inference estimators' are identical to certain 'missing data estimators'. These links may help those familiar with one set of methods but not the other. Moreover, using potential outcome notation allows us to state more clearly the assumptions on which missing data methods rely to estimate hypothetical estimands. This helps to indicate whether estimating a hypothetical estimand is reasonable, and what data should be used in the analysis. We show that hypothetical estimands can be estimated by exploiting data after intercurrent event occurrence, which is typically not used. We also present Monte Carlo simulations that illustrate the implementation and performance of the methods in different settings.

【7】 Continual Learning in the Teacher-Student Setup: Impact of Task Similarity

Authors: Sebastian Lee, Sebastian Goldt, Andrew Saxe
Affiliations: UK; International School of Advanced Studies (SISSA), Italy; Department of Experimental Psychology, University of Oxford
Link: https://arxiv.org/abs/2107.04384
Abstract: Continual learning, the ability to learn many tasks in sequence, is critical for artificial learning systems. Yet standard training methods for deep networks often suffer from catastrophic forgetting, where learning new tasks erases knowledge of earlier tasks. While catastrophic forgetting labels the problem, the theoretical reasons for interference between tasks remain unclear. Here, we attempt to narrow this gap between theory and practice by studying continual learning in the teacher-student setup. We extend previous analytical work on two-layer networks in the teacher-student setup to multiple teachers. Using each teacher to represent a different task, we investigate how the relationship between teachers affects the amount of forgetting and transfer exhibited by the student when the task switches. In line with recent work, we find that when tasks depend on similar features, intermediate task similarity leads to greatest forgetting. However, feature similarity is only one way in which tasks may be related. The teacher-student approach allows us to disentangle task similarity at the level of readouts (hidden-to-output weights) and features (input-to-hidden weights). We find a complex interplay between both types of similarity, initial transfer/forgetting rates, maximum transfer/forgetting, and long-term transfer/forgetting. Together, these results help illuminate the diverse factors contributing to catastrophic forgetting.

【8】 A Bayesian Semiparametric Vector Multiplicative Error Model

Authors: Nicola Donelli, Stefano Peluso, Antonietta Mira
Link: https://arxiv.org/abs/2107.04354
Abstract: Interactions among multiple time series of positive random variables are crucial in diverse financial applications, from spillover effects to volatility interdependence. A popular model in this setting is the vector Multiplicative Error Model (vMEM), which imposes a linear iterative structure on the dynamics of the conditional mean, perturbed by a multiplicative innovation term. A main limitation of vMEM, however, is its restrictive assumption on the distribution of the random innovation term. A Bayesian semiparametric approach that models the innovation vector as an infinite location-scale mixture of multidimensional kernels with support on the positive orthant is used to address this major shortcoming of vMEM. Computational complications arising from the constraints to the positive orthant are avoided through the formulation of a slice sampler on the parameter-extended unconstrained version of the model. The method is applied to simulated and real data, and a flexible specification is obtained that outperforms the classical ones in terms of fitting and predictive power.
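As a reminder of what "a linear iterative structure on the conditional mean, perturbed by a multiplicative innovation" looks like, a first-order vMEM is commonly written as below. The symbols $x_t$, $\mu_t$, $\varepsilon_t$, $\omega$, $A$, $B$ are notation assumed here for illustration; the abstract does not give the paper's exact specification.

```latex
x_t = \mu_t \odot \varepsilon_t,
\qquad
\mu_t = \omega + A\,x_{t-1} + B\,\mu_{t-1},
\qquad
\mathbb{E}\left[\varepsilon_t \mid \mathcal{F}_{t-1}\right] = \mathbf{1}
```

Here $\odot$ is the element-wise product, so each component of $x_t$ stays positive whenever $\mu_t$ and $\varepsilon_t$ are positive, which is why the innovation distribution must be supported on the positive orthant.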

【9】 Generalization of the Change of Variables Formula with Applications to Residual Flows

Authors: Niklas Koenen, Marvin N. Wright, Peter Maaß, Jens Behrmann
Affiliations: University of Bremen, Germany; Leibniz Institute for Prevention Research and Epidemiology – BIPS
Link: https://arxiv.org/abs/2107.04346
Abstract: Normalizing flows leverage the Change of Variables Formula (CVF) to define flexible density models. Yet, the requirement of smooth transformations (diffeomorphisms) in the CVF poses a significant challenge in the construction of these models. To enlarge the design space of flows, we introduce $\mathcal{L}$-diffeomorphisms as generalized transformations which may violate these requirements on zero Lebesgue-measure sets. This relaxation allows, for example, the use of non-smooth activation functions such as ReLU. Finally, we apply the obtained results to planar, radial, and contractive residual flows.
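For reference, the standard change of variables formula that normalizing flows rely on can be written as follows, assuming a diffeomorphism $f$ that maps data $x$ to a latent variable $z = f(x)$ with base density $p_Z$. The symbols $f$, $p_Z$ and $J_f$ (the Jacobian of $f$) are notation chosen here, not taken from the paper.

```latex
p_X(x) = p_Z\bigl(f(x)\bigr)\,\bigl|\det J_f(x)\bigr|,
\qquad
\log p_X(x) = \log p_Z\bigl(f(x)\bigr) + \log\bigl|\det J_f(x)\bigr|
```

The formula needs $J_f(x)$ to exist and be invertible, which is where a non-smooth activation such as ReLU causes trouble at its kink; relaxing the requirement on sets of Lebesgue measure zero is the generalization the abstract describes.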

【10】 Parsimonious Hidden Markov Models for Matrix-Variate Longitudinal Data

Authors: Tomarchio Salvatore D., Punzo Antonio, Maruotti Antonello
Link: https://arxiv.org/abs/2107.04330
Abstract: Hidden Markov models (HMMs) have been extensively used in the univariate and multivariate literature. However, there has been an increased interest in the analysis of matrix-variate data over recent years. In this manuscript we introduce HMMs for matrix-variate longitudinal data, by assuming a matrix normal distribution in each hidden state. Such data are arranged in a four-way array. To address possible overparameterization issues, we consider the spectral decomposition of the covariance matrices, leading to a total of 98 HMMs. An expectation-conditional maximization algorithm is discussed for parameter estimation. The proposed models are first investigated on simulated data, in terms of parameter recovery, computational times and model selection. Then, they are fitted to a four-way real data set concerning the unemployment rates of the Italian provinces, evaluated by gender and age classes, over the last 16 years.

【11】 Prediction of butt rot volume in Norway spruce forest stands using harvester, remotely sensed and environmental data

Authors: Janne Räty, Johannes Breidenbach, Marius Hauglin, Rasmus Astrup
Affiliation: Norwegian Institute of Bioeconomy Research (NIBIO), Høgskoleveien , Ås, Norway
Notes: 22 pages, 6 figures, 3 tables
Link: https://arxiv.org/abs/2107.04316
Abstract: Butt rot (BR) damages associated with Norway spruce (Picea abies [L.] Karst.) account for considerable economic losses in timber production across the northern hemisphere. While information on BR damages is critical for optimal decision-making in forest management, maps of BR damages are typically lacking in forest information systems. We predicted timber volume damaged by BR at the stand level in Norway using harvester information on 186,026 stems (clear-cuts), remotely sensed data, and environmental data (e.g. climate and terrain characteristics). We utilized random forest (RF) models with two sets of predictor variables: (1) predictor variables available after harvest (theoretical case) and (2) predictor variables available prior to harvest (mapping case). We found that forest attributes characterizing the maturity of forest, such as remote sensing-based height, harvested timber volume and quadratic mean diameter at breast height, were among the most important predictor variables. Remotely sensed predictor variables obtained from airborne laser scanning data and Sentinel-2 imagery were more important than the environmental variables. The theoretical case with a leave-stand-out cross-validation achieved an RMSE of 11.4 $\mathrm{m}^3\,\mathrm{ha}^{-1}$ (pseudo $R^2$: 0.66), whereas the mapping case resulted in a pseudo $R^2$ of 0.60. When the spatially distinct k-means clusters of harvested forest stands were used as units in the cross-validation, the RMSE value and pseudo $R^2$ associated with the mapping case were 15.6 $\mathrm{m}^3\,\mathrm{ha}^{-1}$ and 0.37, respectively. This indicates that knowledge about the BR status of spatially close stands is of high importance for obtaining satisfactory error rates in the mapping of BR damages.
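The two accuracy measures quoted above can be sketched as follows. This is a minimal illustration, assuming pseudo $R^2$ is defined as one minus the ratio of the residual to the total sum of squares; the abstract does not state the exact definition used, and the function names `rmse` and `pseudo_r2` are hypothetical.

```python
import numpy as np


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))


def pseudo_r2(observed, predicted):
    """Assumed definition: 1 - SSE / SST, with SST taken around the observed mean."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    sse = np.sum((observed - predicted) ** 2)
    sst = np.sum((observed - observed.mean()) ** 2)
    return float(1.0 - sse / sst)
```

In a leave-stand-out or cluster-wise cross-validation, `predicted` would hold the out-of-fold predictions for the held-out stands, so both measures reflect performance on stands the model has not seen.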

【12】 Two Sample Test for Extrinsic Antimeans on Planar Kendall Shape Spaces with an Application to Medical Imaging

Authors: Aaid Algahtani, Vic Patrangenaru
Link: https://arxiv.org/abs/2107.04230
Abstract: In this paper one develops nonparametric inference procedures for comparing two extrinsic antimeans on compact manifolds. Based on recent central limit theorems for extrinsic sample antimeans w.r.t. an arbitrary embedding of a compact manifold in a Euclidean space, one derives an asymptotic chi-square test for the equality of two extrinsic antimeans. Applications are given to distributions on the complex projective space $\mathbb{C}P^{k-2}$ w.r.t. the Veronese-Whitney embedding, which is a submanifold representation of the Kendall planar shape space. Two medical imaging analysis applications are also given.

【13】 From Many to One: Consensus Inference in a MIP

Authors: Noel Cressie, Michael Bertolacci, Andrew Zammit-Mangion
Affiliation: School of Mathematics and Applied Statistics, University of Wollongong, Australia
Link: https://arxiv.org/abs/2107.04208
Abstract: A Model Intercomparison Project (MIP) consists of teams who each estimate the same underlying quantity (e.g., temperature projections to the year 2070), and the spread of the estimates indicates their uncertainty. It recognizes that a community of scientists will not agree completely but that there is value in looking for a consensus and information in the range of disagreement. A simple average of the teams' outputs gives a consensus estimate, but it does not recognize that some outputs are more variable than others. Statistical analysis of variance (ANOVA) models offer a way to obtain a weighted consensus estimate of outputs with a variance that is the smallest possible and hence the tightest possible 'one-sigma' and 'two-sigma' intervals. Modulo dependence between MIP outputs, the ANOVA approach weights a team's output inversely proportional to its variation. When external verification data are available for evaluating the fidelity of each MIP output, ANOVA weights can also provide a prior distribution for Bayesian Model Averaging to yield a consensus estimate. We use a MIP of carbon dioxide flux inversions to illustrate the ANOVA-based weighting and subsequent consensus inferences.
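The weighting idea can be sketched in a few lines. This is a minimal illustration of generic inverse-variance weighting, assuming the teams' outputs are independent and their variances are known; it is not the paper's ANOVA model, and the function name `inverse_variance_consensus` is hypothetical.

```python
import numpy as np


def inverse_variance_consensus(estimates, variances):
    """Combine independent estimates with weights proportional to 1 / variance.

    Returns the consensus estimate, its variance, and the normalized weights.
    Assumes independence between outputs (an assumption, see lead-in).
    """
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    precisions = 1.0 / variances
    weights = precisions / precisions.sum()
    consensus = float(np.dot(weights, estimates))
    # Variance of the weighted mean under independence.
    consensus_variance = float(1.0 / precisions.sum())
    return consensus, consensus_variance, weights
```

With equal variances this reduces to the simple average; a team whose output is three times as variable as another's receives one third of that team's weight, and the consensus variance is never larger than the smallest individual variance.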

【14】 Diagonal Nonlinear Transformations Preserve Structure in Covariance and Precision Matrices

Authors: Rebecca E Morrison, Ricardo Baptista, Estelle L Basor
Link: https://arxiv.org/abs/2107.04136
Abstract: For a multivariate normal distribution, the sparsity of the covariance and precision matrices encodes complete information about independence and conditional independence properties. For general distributions, the covariance and precision matrices reveal correlations and so-called partial correlations between variables, but these do not, in general, have any correspondence with respect to independence properties. In this paper, we prove that, for a certain class of non-Gaussian distributions, these correspondences still hold, exactly for the covariance and approximately for the precision. The distributions -- sometimes referred to as "nonparanormal" -- are given by diagonal transformations of multivariate normal random variables. We provide several analytic and numerical examples illustrating these results.

【15】 Many Objective Bayesian Optimization

Authors: Lucia Asencio Martín, Eduardo C. Garrido-Merchán
Affiliation: Universidad Autónoma de Madrid
Notes: arXiv admin note: text overlap with arXiv:2101.08061
Link: https://arxiv.org/abs/2107.04126
Abstract: Some real problems require the evaluation of expensive and noisy objective functions. Moreover, the analytical expression of these objective functions may be unknown. These functions are known as black boxes; examples include estimating the generalization error of a machine learning algorithm and computing its prediction time in terms of its hyper-parameters. Multi-objective Bayesian optimization (MOBO) is a set of methods that has been successfully applied to the simultaneous optimization of black boxes. Concretely, BO methods rely on a probabilistic model of the objective functions, typically a Gaussian process (GP). This model generates a predictive distribution of the objectives. However, MOBO methods have problems when the number of objectives in a multi-objective optimization problem is 3 or more, which is the many objective setting. In particular, the BO process is more costly as more objectives are considered, computing the quality of the solution via the hyper-volume is also more costly and, most importantly, we have to evaluate every objective function, wasting expensive computational, economic or other resources. However, as more objectives are involved in the optimization problem, it is highly probable that some of them are redundant and do not add information about the problem solution. A measure that represents how similar GP predictive distributions are is proposed. We also propose a many objective Bayesian optimization algorithm that uses this metric to determine whether two objectives are redundant. The algorithm stops evaluating one of them if the similarity is found, saving resources without hurting the performance of the multi-objective BO algorithm. We show empirical evidence of the effectiveness of the metric and the algorithm in a set of toy, synthetic, benchmark and real experiments.

【16】 OCDE: Odds Conditional Density Estimator

Authors: Alex Akira Okuno, Felipe Maia Polo
Affiliations: Department of Statistics, Institute of Mathematics and Statistics, University of São Paulo; University of Michigan, USA; Advanced Institute for Artificial Intelligence (AI2)
Link: https://arxiv.org/abs/2107.04118
Abstract: Conditional density estimation (CDE) models can be useful for many statistical applications, especially because the full conditional density is estimated instead of traditional regression point estimates, revealing more information about the uncertainty of the random variable of interest. In this paper, we propose a new methodology called Odds Conditional Density Estimator (OCDE) to estimate conditional densities in a supervised learning scheme. The main idea is that it is very difficult to estimate $p_{x,y}$ and $p_{x}$ in order to estimate the conditional density $p_{y|x}$, but by introducing an instrumental distribution, we transform the CDE problem into a problem of odds estimation, or similarly, training a binary probabilistic classifier. We demonstrate how OCDE works using simulated data and then test its performance against other known state-of-the-art CDE methods on real data. Overall, OCDE is competitive with these methods on real datasets.
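One way such a reduction to odds estimation can work is sketched below; this is an illustration under stated assumptions, not necessarily the construction used in the paper. Assume an instrumental density $g(y)$ chosen by the user, a label $S = 1$ for observed pairs $(x, y) \sim p_{x,y}$, a label $S = 0$ for artificial pairs in which $y$ is drawn from $g$ independently of $x$, and equal numbers of both kinds of pairs.

```latex
\frac{P(S = 1 \mid x, y)}{P(S = 0 \mid x, y)}
  = \frac{p_{x,y}(x, y)}{p_{x}(x)\, g(y)}
  = \frac{p_{y|x}(y \mid x)}{g(y)}
\quad\Longrightarrow\quad
p_{y|x}(y \mid x) = g(y)\,\frac{P(S = 1 \mid x, y)}{P(S = 0 \mid x, y)}
```

Under these assumptions, any binary probabilistic classifier that estimates $P(S = 1 \mid x, y)$ yields an estimate of the conditional density after multiplying its odds by the known $g(y)$.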

【17】 Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning

Authors: Barna Pasztor, Ilija Bogunovic, Andreas Krause
Affiliation: ETH Zürich
Notes: 28 pages, 2 figures, Preprint, Submitted to NeurIPS 2021
Link: https://arxiv.org/abs/2107.04050
Abstract: Learning in multi-agent systems is highly challenging due to the inherent complexity introduced by agents' interactions. We tackle systems with a huge population of interacting agents (e.g., swarms) via Mean-Field Control (MFC). MFC considers an asymptotically infinite population of identical agents that aim to collaboratively maximize the collective reward. Specifically, we consider the case of unknown system dynamics where the goal is to simultaneously optimize for the rewards and learn from experience. We propose an efficient model-based reinforcement learning algorithm $\text{M}^3\text{-UCRL}$ that runs in episodes and provably solves this problem. $\text{M}^3\text{-UCRL}$ uses upper-confidence bounds to balance exploration and exploitation during policy learning. Our main theoretical contributions are the first general regret bounds for model-based RL for MFC, obtained via a novel mean-field type analysis. $\text{M}^3\text{-UCRL}$ can be instantiated with different models such as neural networks or Gaussian Processes, and effectively combined with neural network policy learning. We empirically demonstrate the convergence of $\text{M}^3\text{-UCRL}$ on the swarm motion problem of controlling an infinite population of agents seeking to maximize location-dependent reward and avoid congested areas.

【18】 Entropy, Information, and the Updating of Probabilities

Author: Ariel Caticha
Affiliation: Department of Physics, University at Albany-SUNY, Albany, NY, USA
Notes: 28 pages. Invited paper to appear in Entropy in the special volume "Statistical Foundations of Entropy", ed. by P. Jizba and J. Korbel. arXiv admin note: text overlap with arXiv:1412.5644
Link: https://arxiv.org/abs/2107.04529
Abstract: This paper is a review of a particular approach to the method of maximum entropy as a general framework for inference. The discussion emphasizes the pragmatic elements in the derivation. An epistemic notion of information is defined in terms of its relation to the Bayesian beliefs of ideally rational agents. The method of updating from a prior to a posterior probability distribution is designed through an eliminative induction process. The logarithmic relative entropy is singled out as the unique tool for updating that (a) is of universal applicability; (b) recognizes the value of prior information; and (c) recognizes the privileged role played by the notion of independence in science. The resulting framework -- the ME method -- can handle arbitrary priors and arbitrary constraints. It includes MaxEnt and Bayes' rule as special cases and, therefore, it unifies entropic and Bayesian methods into a single general inference scheme. The ME method goes beyond the mere selection of a single posterior and also addresses the question of how much less probable other distributions might be, which provides a direct bridge to the theories of fluctuations and large deviations.

【19】 Online Adaptation to Label Distribution Shift

Authors: Ruihan Wu, Chuan Guo, Yi Su, Kilian Q. Weinberger
Affiliations: Cornell University, Facebook AI Research
Link: https://arxiv.org/abs/2107.04520
Abstract: Machine learning models often encounter distribution shifts when deployed in the real world. In this paper, we focus on adaptation to label distribution shift in the online setting, where the test-time label distribution is continually changing and the model must dynamically adapt to it without observing the true label. Leveraging a novel analysis, we show that the lack of true labels does not hinder estimation of the expected test loss, which enables the reduction of online label shift adaptation to conventional online learning. Informed by this observation, we propose adaptation algorithms inspired by classical online learning techniques such as Follow The Leader (FTL) and Online Gradient Descent (OGD) and derive their regret bounds. We empirically verify our findings under both simulated and real world label distribution shifts and show that OGD is particularly effective and robust to a variety of challenging label shift scenarios.
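To make "adapting to a changed label distribution" concrete, the sketch below shows the standard prior re-weighting of classifier outputs under the label shift assumption (class-conditional feature distributions unchanged, only class frequencies change). It is not the paper's FTL or OGD procedure: here the test-time prior is simply taken as given, whereas the paper estimates and updates it online. The function name `reweight_for_label_shift` is hypothetical.

```python
import numpy as np


def reweight_for_label_shift(probs, train_prior, test_prior):
    """Adjust predicted class probabilities for a new label prior.

    probs:       array of shape (n_samples, n_classes), outputs of a classifier
                 trained under `train_prior`.
    train_prior: class frequencies in the training data.
    test_prior:  assumed class frequencies at test time (taken as known here).
    """
    probs = np.asarray(probs, dtype=float)
    ratio = np.asarray(test_prior, dtype=float) / np.asarray(train_prior, dtype=float)
    adjusted = probs * ratio  # broadcasts the per-class ratio over samples
    return adjusted / adjusted.sum(axis=-1, keepdims=True)
```

If the test prior equals the training prior the predictions are unchanged; otherwise probability mass moves toward the classes that have become more frequent.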

【20】 Optimal Gradient-based Algorithms for Non-concave Bandit Optimization

Authors: Baihe Huang, Kaixuan Huang, Sham M. Kakade, Jason D. Lee, Qi Lei, Runzhe Wang, Jiaqi Yang
Affiliations: Peking University, Princeton University, University of Washington, Microsoft Research, Tsinghua University
Link: https://arxiv.org/abs/2107.04518
Abstract: Bandit problems with linear or concave reward have been extensively studied, but relatively few works have studied bandits with non-concave reward. This work considers a large family of bandit problems where the unknown underlying reward function is non-concave, including the low-rank generalized linear bandit problems and two-layer neural network with polynomial activation bandit problem. For the low-rank generalized linear bandit problem, we provide a minimax-optimal algorithm in the dimension, refuting both conjectures in [LMT21, JWWN19]. Our algorithms are based on a unified zeroth-order optimization paradigm that applies in great generality and attains optimal rates in several structured polynomial settings (in the dimension). We further demonstrate the applicability of our algorithms in RL in the generative model setting, resulting in improved sample complexity over prior approaches. Finally, we show that the standard optimistic algorithms (e.g., UCB) are sub-optimal by dimension factors. In the neural net setting (with polynomial activation functions) with noiseless reward, we provide a bandit algorithm with sample complexity equal to the intrinsic algebraic dimension. Again, we show that optimistic approaches have worse sample complexity, polynomial in the extrinsic dimension (which could be exponentially worse in the polynomial degree).

【21】 Staged tree models with toric structure

Authors: Christiane Görgen, Aida Maraj, Lisa Nicklasson
Affiliations: Max Planck Institute for Mathematics in the Sciences
Link: https://arxiv.org/abs/2107.04516
Abstract: A staged tree model is a discrete statistical model encoding relationships between events. These models are realised by directed trees with coloured vertices. In algebro-geometric terms, the model consists of points inside a toric variety. For certain trees, called balanced, the model is in fact the intersection of the toric variety and the probability simplex. This gives the model a straightforward description, and has computational advantages. In this paper we show that the class of staged tree models with a toric structure extends far outside of the balanced case, if we allow a change of coordinates. It is an open problem whether all staged tree models have toric structure.

【22】 Batch Inverse-Variance Weighting: Deep Heteroscedastic Regression

Authors: Vincent Mai, Waleed Khamies, Liam Paull
Affiliations: Robotics and Embodied AI Lab, Mila - Quebec Institute of Artificial Intelligence, Université de Montréal, Canada; Canada CIFAR AI Chair
Comments: Accepted at the Uncertainty in Deep Learning (UDL) workshop at ICML 2021
Link: https://arxiv.org/abs/2107.04497
Abstract: Heteroscedastic regression is the task of supervised learning where each label is subject to noise from a different distribution. This noise can be caused by the labelling process, and negatively impacts the performance of the learning algorithm as it violates the i.i.d. assumptions. In many situations however, the labelling process is able to estimate the variance of such distribution for each label, which can be used as additional information to mitigate this impact. We adapt an inverse-variance weighted mean square error, based on the Gauss-Markov theorem, for parameter optimization on neural networks. We introduce Batch Inverse-Variance, a loss function which is robust to near-ground truth samples, and allows control of the effective learning rate. Our experimental results show that BIV significantly improves the performance of the networks on two noisy datasets, compared to L2 loss, inverse-variance weighting, as well as a filtering-based baseline.
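
The abstract does not give the loss formula, so the following is only a minimal NumPy sketch of one way a batch-normalized inverse-variance weighted squared error could look. It assumes weights of the form 1 / (variance + eps), normalized by their sum over the mini-batch; the function name `biv_loss`, the parameter `eps` and its default value are illustrative assumptions.

```python
import numpy as np

def biv_loss(pred, target, label_var, eps=0.1):
    """Inverse-variance weighted squared error, normalized over the batch.

    label_var: per-label noise variance supplied by the labelling process.
    eps: assumed stabilizer that bounds the weight of near-noiseless labels.
    """
    pred, target, label_var = (np.asarray(a, dtype=float) for a in (pred, target, label_var))
    weights = 1.0 / (label_var + eps)
    # Dividing by the sum of weights keeps the loss scale independent of the noise level.
    return float(np.sum(weights * (pred - target) ** 2) / np.sum(weights))
```

With equal variances the expression reduces to the ordinary mean squared error, while a label with very large variance contributes almost nothing to the batch loss.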

【23】 Bayesian Error-in-Variables Models for the Identification of Power Networks

Authors: Jean-Sébastien Brouillon, Emanuele Fabbiani, Pulkit Nahata, Florian Dörfler, Giancarlo Ferrari-Trecate
Link: https://arxiv.org/abs/2107.04480
Abstract: The increasing integration of intermittent renewable generation, especially at the distribution level, necessitates advanced planning and optimisation methodologies contingent on the knowledge of the grid, specifically the admittance matrix capturing the topology and line parameters of an electric network. However, a reliable estimate of the admittance matrix may either be missing or quickly become obsolete for temporally varying grids. In this work, we propose a data-driven identification method utilising voltage and current measurements collected from micro-PMUs. More precisely, we first present a maximum likelihood approach and then move towards a Bayesian framework, leveraging the principles of maximum a posteriori estimation. In contrast with most existing contributions, our approach not only factors in measurement noise on both voltage and current data, but is also capable of exploiting available a priori information such as sparsity patterns and known line parameters. Simulations conducted on benchmark cases demonstrate that, compared to other algorithms, our method can achieve significantly greater accuracy.

【24】 Identifying latent shared mobility preference segments in low-income communities: ride-hailing, fixed-route bus, and mobility-on-demand transit

Authors: Xinyi Wang, Xiang Yan, Xilei Zhao, Zhuoxuan Cao
Affiliations: School of Civil and Environmental Engineering, Georgia Institute of Technology, Atlantic Drive, Atlanta, GA
Link: https://arxiv.org/abs/2107.04412
Abstract: Concepts of Mobility-on-Demand (MOD) and Mobility as a Service (MaaS), which feature the integration of various shared-use mobility options, have gained widespread popularity in recent years. While these concepts promise great benefits to travelers, their heavy reliance on technology raises equity concerns as socially disadvantaged population groups can be left out in an era of on-demand mobility. This paper investigates the potential uptake of MOD transit services (integrated fixed-route and on-demand services) among travelers living in low-income communities. Specifically, we analyze people's latent attitude towards three shared-use mobility services, including ride-hailing services, fixed-route transit, and MOD transit. We conduct a latent class cluster analysis of 825 survey respondents sampled from low-income neighborhoods in Detroit and Ypsilanti, Michigan. We identified three latent segments: shared-mode enthusiast, shared-mode opponent, and fixed-route transit loyalist. People from the shared-mode enthusiast segment often use ride-hailing services and live in areas with poor transit access, and they are likely to be the early adopters of MOD transit services. The shared-mode opponent segment mainly includes vehicle owners who lack interest in shared mobility options. The fixed-route transit loyalist segment includes a considerable share of low-income individuals who face technological barriers to using MOD transit. We also find that males, college graduates, car owners, people with a mobile data plan, and people living in poor-transit-access areas have a higher level of preferences for MOD transit services. We conclude with policy recommendations for developing more accessible and equitable MOD transit services.

【25】 Block Alternating Bregman Majorization Minimization with Extrapolation

Authors: Le Thi Khanh Hien, Duy Nhat Phan, Nicolas Gillis, Masoud Ahookhosh, Panagiotis Patrinos
Affiliations: Carnegie Mellon University
Link: https://arxiv.org/abs/2107.04395
Abstract: In this paper, we consider a class of nonsmooth nonconvex optimization problems whose objective is the sum of a block relative smooth function and a proper and lower semicontinuous block separable function. Although the analysis of block proximal gradient (BPG) methods for the class of block $L$-smooth functions has been successfully extended to Bregman BPG methods that deal with the class of block relative smooth functions, accelerated Bregman BPG methods are scarce and challenging to design. Taking our inspiration from Nesterov-type acceleration and the majorization-minimization scheme, we propose a block alternating Bregman Majorization-Minimization framework with Extrapolation (BMME). We prove subsequential convergence of BMME to a first-order stationary point under mild assumptions, and study its global convergence under stronger conditions. We illustrate the effectiveness of BMME on the penalized orthogonal nonnegative matrix factorization problem.

【26】 Specialists Outperform Generalists in Ensemble Classification

Authors: Sascha Meyen, Frieder Göppert, Helen Alber, Ulrike von Luxburg, Volker H. Franz
Affiliations: Department of Computer Science, University of Tübingen, Tübingen, Germany; Max Planck Institute for Intelligent Systems, Tübingen, Germany
Link: https://arxiv.org/abs/2107.04381
Abstract: Consider an ensemble of $k$ individual classifiers whose accuracies are known. Upon receiving a test point, each of the classifiers outputs a predicted label and a confidence in its prediction for this particular test point. In this paper, we address the question of whether we can determine the accuracy of the ensemble. Surprisingly, even when classifiers are combined in the statistically optimal way in this setting, the accuracy of the resulting ensemble classifier cannot be computed from the accuracies of the individual classifiers, as would be the case in the standard setting of confidence weighted majority voting. We prove tight upper and lower bounds on the ensemble accuracy. We explicitly construct the individual classifiers that attain the upper and lower bounds: specialists and generalists. Our theoretical results have very practical consequences: (1) If we use ensemble methods and have the choice to construct our individual (independent) classifiers from scratch, then we should aim for specialist classifiers rather than generalists. (2) Our bounds can be used to determine how many classifiers are at least required to achieve a desired ensemble accuracy. Finally, we improve our bounds by considering the mutual information between the true label and the individual classifier's output.
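
To make the combination step concrete, here is a small sketch of confidence-weighted voting for a binary problem, under the textbook assumptions of independent classifiers, calibrated confidences and equal class priors. In that case each vote is weighted by the log-odds of its confidence. The function name and the {-1, +1} label encoding are illustrative choices, not taken from the paper.

```python
import math

def combine_votes(predictions, confidences):
    """Combine binary votes, weighting each by the log-odds of its confidence.

    predictions: labels in {-1, +1}, one per classifier.
    confidences: probability that each prediction is correct, strictly in (0, 1).
    """
    score = sum(p * math.log(c / (1.0 - c)) for p, c in zip(predictions, confidences))
    return 1 if score >= 0 else -1
```

For example, `combine_votes([1, -1, -1], [0.9, 0.6, 0.6])` returns 1: one confident vote outweighs two weak ones, whereas an unweighted majority vote would pick -1.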

【27】 Multi-headed Neural Ensemble Search

Authors: Ashwin Raaghav Narayanan, Arber Zela, Tonmoy Saikia, Thomas Brox, Frank Hutter
Affiliations: University of Freiburg; Bosch Center for Artificial Intelligence
Comments: 8 pages, 12 figures, 3 tables
Link: https://arxiv.org/abs/2107.04369
Abstract: Ensembles of CNN models trained with different seeds (also known as Deep Ensembles) are known to achieve superior performance over a single copy of the CNN. Neural Ensemble Search (NES) can further boost performance by adding architectural diversity. However, the scope of NES remains prohibitive under limited computational resources. In this work, we extend NES to multi-headed ensembles, which consist of a shared backbone attached to multiple prediction heads. Unlike Deep Ensembles, these multi-headed ensembles can be trained end to end, which enables us to leverage one-shot NAS methods to optimize an ensemble objective. With extensive empirical evaluations, we demonstrate that multi-headed ensemble search finds robust ensembles 3 times faster, while having comparable performance to other ensemble search methods, in both predictive performance and uncertainty calibration.

【28】 Structured Hammerstein-Wiener Model Learning for Model Predictive Control

Authors: Ryuta Moriyasu, Taro Ikeda, Sho Kawaguchi, Kenji Kashima
Affiliations: Graduate School of Informatics (Kashima)
Link: https://arxiv.org/abs/2107.04247
Abstract: This paper aims to improve the reliability of optimal control using models constructed by machine learning methods. Optimal control problems based on such models are generally non-convex and difficult to solve online. In this paper, we propose a model that combines the Hammerstein-Wiener model with input convex neural networks, which have recently been proposed in the field of machine learning. An important feature of the proposed model is that resulting optimal control problems are effectively solvable exploiting their convexity and partial linearity while retaining flexible modeling ability. The practical usefulness of the method is examined through its application to the modeling and control of an engine airpath system.
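
For readers unfamiliar with the model class, a Hammerstein-Wiener model places a linear dynamic block between a static input nonlinearity and a static output nonlinearity. The sketch below simulates a scalar instance; the coefficients `a` and `b` and the choices of `tanh` and `exp` are placeholders for illustration only, whereas the paper learns the nonlinearities with input convex neural networks.

```python
import numpy as np

def simulate_hammerstein_wiener(u, a=0.8, b=0.2, f=np.tanh, g=np.exp):
    """Simulate y_t = g(x_t), x_t = a * x_{t-1} + b * f(u_t), starting from x = 0.

    f: static input nonlinearity (Hammerstein part).
    g: static output nonlinearity (Wiener part).
    """
    x = 0.0
    outputs = []
    for u_t in u:
        x = a * x + b * f(u_t)   # linear dynamics driven by the transformed input
        outputs.append(g(x))     # static map from state to output
    return np.array(outputs)
```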

【29】 On the Variance of the Fisher Information for Deep Learning

Authors: Alexander Soen, Ke Sun
Affiliations: The Australian National University, Canberra, Australia; CSIRO’s Data, Sydney, Australia
Link: https://arxiv.org/abs/2107.04205
Abstract: The Fisher information matrix (FIM) has been applied to the realm of deep learning. It is closely related to the loss landscape, the variance of the parameters, second order optimization, and deep learning theory. The exact FIM is either unavailable in closed form or too expensive to compute. In practice, it is almost always estimated based on empirical samples. We investigate two such estimators based on two equivalent representations of the FIM. They are both unbiased and consistent with respect to the underlying "true" FIM. Their estimation quality is characterized by their variance given in closed form. We bound their variances and analyze how the parametric structure of a deep neural network can impact the variance. We discuss the meaning of this variance measure and our bounds in the context of deep learning.
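
The abstract does not spell out the two representations. Assuming they are the usual ones (the expected squared score and the negative expected Hessian of the log-likelihood), the toy sketch below compares the two sample estimators for a unit-variance Gaussian with unknown mean, where the true Fisher information equals 1. The function name and the toy model are illustrative assumptions.

```python
import numpy as np

def fim_estimates(theta, n, seed=0):
    """Two Monte Carlo estimates of the Fisher information of N(theta, 1) w.r.t. theta."""
    rng = np.random.default_rng(seed)
    x = rng.normal(loc=theta, scale=1.0, size=n)
    score = x - theta            # d/dtheta log p(x | theta)
    neg_hessian = np.ones(n)     # -d^2/dtheta^2 log p(x | theta), constant for this model
    return float(np.mean(score ** 2)), float(np.mean(neg_hessian))
```

In this toy model the Hessian-based estimate is exact for any sample size, while the score-based estimate fluctuates around 1. This shows in miniature that two unbiased estimators of the same quantity can differ in variance.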

【30】 MCMC Variational Inference via Uncorrected Hamiltonian Annealing

Authors: Tomas Geffner, Justin Domke
Affiliations: College of Information and Computer Science, University of Massachusetts, Amherst, Amherst, MA
Link: https://arxiv.org/abs/2107.04150
Abstract: Given an unnormalized target distribution we want to obtain approximate samples from it and a tight lower bound on its (log) normalization constant log Z. Annealed Importance Sampling (AIS) with Hamiltonian MCMC is a powerful method that can be used to do this. Its main drawback is that it uses non-differentiable transition kernels, which makes tuning its many parameters hard. We propose a framework to use an AIS-like procedure with Uncorrected Hamiltonian MCMC, called Uncorrected Hamiltonian Annealing. Our method leads to tight and differentiable lower bounds on log Z. We show empirically that our method yields better performances than other competing approaches, and that the ability to tune its parameters using reparameterization gradients may lead to large performance improvements.
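
To show what a lower bound on log Z means in the simplest case, the sketch below estimates the plain Jensen (variational) bound E_q[log p̃(x) - log q(x)] ≤ log Z with a zero-mean Gaussian proposal q. This is deliberately not AIS or Uncorrected Hamiltonian Annealing, only the baseline bound that such methods aim to tighten; the function name and arguments are hypothetical.

```python
import numpy as np

def log_z_lower_bound(log_target, scale, n=100_000, seed=0):
    """Monte Carlo estimate of E_q[log p~(x) - log q(x)] for q = N(0, scale^2).

    log_target: vectorized log of the unnormalized target density.
    By Jensen's inequality the expectation is at most log Z.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, scale, size=n)
    log_q = -0.5 * (x / scale) ** 2 - np.log(scale) - 0.5 * np.log(2.0 * np.pi)
    return float(np.mean(log_target(x) - log_q))
```

For the target exp(-x²/2), log Z = 0.5·log(2π). A proposal with `scale=1.0` matches the target and recovers log Z, while a mismatched proposal such as `scale=1.5` gives a strictly smaller expected value.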

【31】 Ensembles of Randomized NNs for Pattern-based Time Series Forecasting

Authors: Grzegorz Dudek, Paweł Pełka
Affiliations: Electrical Engineering Faculty, Częstochowa University of Technology, Częstochowa, Poland
Comments: arXiv admin note: text overlap with arXiv:2107.01705
Link: https://arxiv.org/abs/2107.04091
Abstract: In this work, we propose an ensemble forecasting approach based on randomized neural networks. Improved randomized learning streamlines the fitting abilities of individual learners by generating network parameters in accordance with the data and target function features. A pattern-based representation of time series makes the proposed approach suitable for forecasting time series with multiple seasonality. We propose six strategies for controlling the diversity of ensemble members. Case studies conducted on four real-world forecasting problems verified the effectiveness and superior performance of the proposed ensemble forecasting approach. It outperformed statistical models as well as state-of-the-art machine learning models in terms of forecasting accuracy. The proposed approach has several advantages: fast and easy training, simple architecture, ease of implementation, high accuracy and the ability to deal with nonstationarity and multiple seasonality in time series.

【32】 Scaling Gaussian Processes with Derivative Information Using Variational Inference

Authors: Misha Padidar, Xinran Zhu, Leo Huang, Jacob R. Gardner, David Bindel
Affiliations: Cornell University, University of Pennsylvania
Link: https://arxiv.org/abs/2107.04061
Abstract: Gaussian processes with derivative information are useful in many settings where derivative information is available, including numerous Bayesian optimization and regression tasks that arise in the natural sciences. Incorporating derivative observations, however, comes with a dominating $O(N^3D^3)$ computational cost when training on $N$ points in $D$ input dimensions. This is intractable for even moderately sized problems. While recent work has addressed this intractability in the low-$D$ setting, the high-$N$, high-$D$ setting is still unexplored and of great value, particularly as machine learning problems increasingly become high dimensional. In this paper, we introduce methods to achieve fully scalable Gaussian process regression with derivatives using variational inference. Analogous to the use of inducing values to sparsify the labels of a training set, we introduce the concept of inducing directional derivatives to sparsify the partial derivative information of a training set. This enables us to construct a variational posterior that incorporates derivative information but whose size depends neither on the full dataset size $N$ nor the full dimensionality $D$. We demonstrate the full scalability of our approach on a variety of tasks, ranging from a high dimensional stellarator fusion regression task to training graph convolutional neural networks on Pubmed using Bayesian optimization. Surprisingly, we find that our approach can improve regression performance even in settings where only label data is available.
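
The $O(N^3D^3)$ figure follows from the size of the joint kernel matrix: each training point contributes one function value and $D$ partial derivatives, so the matrix has $N(D+1)$ rows, and a dense factorization scales with the cube of that. The helper below does this arithmetic; its name and the 8-byte (float64) entry size are illustrative assumptions.

```python
def derivative_gp_cost(n_points, n_dims, bytes_per_entry=8):
    """Return (matrix side, dense storage in bytes, cubic operation count) for a GP
    trained on function values plus all partial derivatives."""
    size = n_points * (n_dims + 1)          # one value + n_dims derivatives per point
    return size, size * size * bytes_per_entry, size ** 3
```

For N = 1000 points in D = 10 dimensions the matrix already has 11,000 rows and takes about 968 MB in dense float64 storage, which illustrates why moderately sized problems become expensive.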

Originally published 2021-07-12 on the WeChat public account arXiv每日学术速递; shared through the Tencent Cloud self-media syndication program.