
stat (Statistics): 27 papers in total
【1】 Stochastic Processes Under Linear Differential Constraints: Application to Gaussian Process Regression for the 3 Dimensional Free Space Wave Equation Link: https://arxiv.org/abs/2111.12035
Authors: Iain Henderson, Pascal Noble, Olivier Roustant Abstract: Let $P$ be a linear differential operator over $\mathcal{D} \subset \mathbb{R}^d$ and $U = (U_x)_{x \in \mathcal{D}}$ a second order stochastic process. In the first part of this article, we prove a new simple necessary and sufficient condition for all the trajectories of $U$ to verify the partial differential equation (PDE) $P(U) = 0$. This condition is formulated in terms of the covariance kernel of $U$. The novelty of this result is that the equality $P(U) = 0$ is understood in the sense of distributions, which is a functional analysis framework particularly adapted to the study of PDEs. This theorem provides precious insights during the second part of this article, which is dedicated to performing "physically informed" machine learning on data that is a solution to the homogeneous 3 dimensional free space wave equation. We perform Gaussian Process Regression (GPR) on this data, which is a kernel based machine learning technique. To do so, we model the solution of this PDE as a trajectory drawn from a well-chosen Gaussian process (GP). We obtain explicit formulas for the covariance kernel of the corresponding stochastic process; this kernel can then be used for GPR. We explore two particular cases: radial symmetry and the point source. In the case of radial symmetry, we derive "fast to compute" GPR formulas; in the case of the point source, we show a direct link between GPR and the classical triangulation method for point source localization used e.g. in GPS systems. We also show that this use of GPR can be interpreted as a new answer to the ill-posed inverse problem of reconstructing initial conditions for the wave equation with finite dimensional data, and also provides a way of estimating physical parameters from this data as in [Raissi et al., 2017]. We finish by showcasing this physically informed GPR on a number of practical examples.
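For readers who want to see the generic machinery behind 【1】, the sketch below implements plain Gaussian process regression with a squared-exponential kernel in numpy. It is only a baseline illustration: the paper's contribution is the wave-equation-constrained covariance kernel, which is not reproduced here, and the kernel hyperparameters and noise level are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel k(x, x') = s^2 exp(-||x - x'||^2 / (2 l^2))."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def gpr_posterior(X_train, y_train, X_test, noise=1e-2, **kernel_args):
    """Posterior mean and covariance of a zero-mean GP conditioned on noisy observations."""
    K = rbf_kernel(X_train, X_train, **kernel_args) + noise * np.eye(len(X_train))
    Ks = rbf_kernel(X_train, X_test, **kernel_args)
    Kss = rbf_kernel(X_test, X_test, **kernel_args)
    L = np.linalg.cholesky(K)                                   # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))   # K^{-1} y
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    cov = Kss - v.T @ v
    return mean, cov

# Toy usage: noisy observations of sin(x) on [0, 2*pi].
rng = np.random.default_rng(0)
X_train = rng.uniform(0, 2 * np.pi, size=(20, 1))
y_train = np.sin(X_train[:, 0]) + 0.05 * rng.standard_normal(20)
X_test = np.linspace(0, 2 * np.pi, 100)[:, None]
mean, cov = gpr_posterior(X_train, y_train, X_test, noise=0.05 ** 2)
```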
【2】 Measurement That Matches Theory: Theory-Driven Identification in IRT Models Link: https://arxiv.org/abs/2111.11979
Authors: Marco Morucci, Margaret Foster, Kaitlyn Webster, So Jin Lee, David Siegel Abstract: Measurement bridges theory and empirics. Without measures that appropriately capture theoretical concepts, description will fail to represent reality and true causal inference will be impossible. Yet, the social sciences traffic in complex concepts and their measurement is difficult. Item Response Theory (IRT) models reduce variation in multiple variables to continuous variation along one or more latent dimensions intended to capture key theoretical concepts. Unfortunately, those latent dimensions have no intrinsic conceptual meaning. Partial solutions to that problem include limiting the number of dimensions to one or assigning meaning post-analysis, but either can lead to potential bias and a lack of reliability across data sources. We propose, detail, and validate a semi-supervised approach employing Bayesian Item Response Theory on multiple latent dimensions and binary data. Our approach, which we validate on simulated and real data, yields conceptually meaningful latent dimensions that are reliable across different data sources without additional exogenous assumptions.
【3】 Tree density estimation Link: https://arxiv.org/abs/2111.11971
Authors: László Györfi, Aryeh Kontorovich, Roi Weiss Affiliations: Department of Computer Science, Ben-Gurion University of the Negev, Beer Sheva, Israel; Ariel University, Shomron, Israel Abstract: We study the problem of density estimation for a random vector ${\boldsymbol X}$ in $\mathbb R^d$ with probability density $f(\boldsymbol x)$. For a spanning tree $T$ defined on the vertex set $\{1,\dots ,d\}$, the tree density $f_{T}$ is a product of bivariate conditional densities. The optimal spanning tree $T^*$ is the spanning tree $T$, for which the Kullback-Leibler divergence of $f$ and $f_{T}$ is the smallest. From i.i.d. data we identify the optimal tree $T^*$ and computationally efficiently construct a tree density estimate $f_n$ such that, without any regularity conditions on the density $f$, one has that $\lim_{n\to \infty} \int |f_n(\boldsymbol x)-f_{T^*}(\boldsymbol x)|d\boldsymbol x=0$ a.s. For Lipschitz continuous $f$ with bounded support, $\mathbb E\{ \int |f_n(\boldsymbol x)-f_{T^*}(\boldsymbol x)|d\boldsymbol x\}=O(n^{-1/4})$.
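The optimal-spanning-tree step in 【3】 is closely related to the classical Chow-Liu construction: estimate pairwise mutual information and take a maximum-weight spanning tree. A hedged sketch of that classical idea (with crude histogram mutual-information estimates, not the paper's estimator or its convergence rates) is below.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def mutual_information(x, y, bins=10):
    """Plug-in mutual information estimate from a 2D histogram."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    px, py = pxy.sum(1, keepdims=True), pxy.sum(0, keepdims=True)
    mask = pxy > 0
    return float((pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])).sum())

def chow_liu_tree(X, bins=10):
    """Edges of the maximum mutual-information spanning tree over the d coordinates."""
    d = X.shape[1]
    W = np.zeros((d, d))
    for i in range(d):
        for j in range(i + 1, d):
            W[i, j] = mutual_information(X[:, i], X[:, j], bins)
    # minimum_spanning_tree minimizes, so negate the weights to maximize total MI.
    mst = minimum_spanning_tree(-W).tocoo()
    return list(zip(mst.row.tolist(), mst.col.tolist()))

rng = np.random.default_rng(0)
z = rng.standard_normal(2000)
X = np.column_stack([z, z + 0.3 * rng.standard_normal(2000), rng.standard_normal(2000)])
print(chow_liu_tree(X))  # the strongly dependent pair (0, 1) should appear as an edge
```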
【4】 Is Shapley Explanation for a model unique? Link: https://arxiv.org/abs/2111.11946
Authors: Harsh Kumar, Jithu Chandran Abstract: Shapley value has recently become a popular way to explain the predictions of complex and simple machine learning models. This paper discusses the factors that influence Shapley value. In particular, we explore the relationship between the distribution of a feature and its Shapley value. We extend our analysis by discussing the difference that arises in Shapley explanation for different predicted outcomes from the same model. Our assessment is that the Shapley value for a particular feature not only depends on its expected mean but also on other moments, such as variance, and that there are disagreements in the baseline prediction, in signs, and in the most important feature across different outcomes such as probability, log odds, and the binary decision generated using the same linear probability model (logit/probit). These disagreements not only appear in local explainability but also affect the global feature importance. We conclude that there is no unique Shapley explanation for a given model. It varies with the model outcome (probability/log-odds/binary decision such as accept vs reject) and hence with the model application.
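For a handful of features, Shapley values can be computed exactly by enumerating coalitions, which makes the kind of disagreement discussed in 【4】 easy to reproduce. The sketch below is a generic illustration in which "absent" features are replaced by background means (a common but assumption-laden convention), not the authors' pipeline; the toy logit model and baseline are made up for illustration.

```python
import itertools
import math
import numpy as np

def exact_shapley(predict, x, background):
    """Exact Shapley values of predict(x), filling features outside the coalition
    with background values (an interventional-style approximation)."""
    d = len(x)
    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for size in range(d):
            for S in itertools.combinations(others, size):
                w = math.factorial(size) * math.factorial(d - size - 1) / math.factorial(d)
                x_S = background.copy(); x_S[list(S)] = x[list(S)]   # coalition S only
                x_Si = x_S.copy();       x_Si[i] = x[i]              # coalition S plus feature i
                phi[i] += w * (predict(x_Si) - predict(x_S))
    return phi

# Toy logistic model p(x) = sigmoid(w.x + b), with a zero background point.
w, b = np.array([1.5, -2.0, 0.5]), 0.1
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
x, bg = np.array([1.0, 0.5, -1.0]), np.zeros(3)
phi_prob = exact_shapley(lambda v: sigmoid(w @ v + b), x, bg)   # probability scale
phi_logit = exact_shapley(lambda v: float(w @ v + b), x, bg)    # log-odds scale
print(phi_prob, phi_logit)
```

Running it shows that the probability-scale and log-odds-scale attributions of the same model generally differ, which is one of the disagreements the abstract highlights.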
【5】 Trimming Stability Selection increases variable selection robustness Link: https://arxiv.org/abs/2111.11818
Authors: Tino Werner Affiliations: Institute for Mathematics, Carl von Ossietzky University Oldenburg Abstract: Contamination can severely distort an estimator unless the estimation procedure is suitably robust. This is a well-known issue and has been addressed in Robust Statistics, however, the relation of contamination and distorted variable selection has been rarely considered in literature. As for variable selection, many methods for sparse model selection have been proposed, including Stability Selection which is a meta-algorithm based on some variable selection algorithm in order to immunize against particular data configurations. We introduce the variable selection breakdown point that quantifies the number of cases resp. cells that have to be contaminated in order to let no relevant variable be detected. We show that particular outlier configurations can completely mislead model selection and argue why even cell-wise robust methods cannot fix this problem. We combine the variable selection breakdown point with resampling, resulting in the Stability Selection breakdown point that quantifies the robustness of Stability Selection. We propose a trimmed Stability Selection which only aggregates the models with the lowest in-sample losses so that, heuristically, models computed on heavily contaminated resamples should be trimmed away. We provide a short simulation study that reveals both the potential of our approach as well as the fragility of variable selection, even for an extremely small cell-wise contamination rate.
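As background for 【5】, a plain (untrimmed) stability-selection loop looks like the sketch below: subsample the data repeatedly, run a Lasso on each subsample, and record selection frequencies. The paper's trimmed variant would additionally discard the resamples with the largest in-sample losses before aggregating; the base learner, penalty level, and 70% threshold here are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

def stability_selection(X, y, alpha=0.1, n_resamples=100, frac=0.5, seed=0):
    """Selection frequency of each feature across Lasso fits on random subsamples."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_resamples):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        coef = Lasso(alpha=alpha, max_iter=10000).fit(X[idx], y[idx]).coef_
        counts += np.abs(coef) > 1e-8
    return counts / n_resamples

# Toy data: only the first 3 of 20 features are truly relevant.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 20))
y = X[:, 0] + 0.8 * X[:, 1] - 0.5 * X[:, 2] + 0.3 * rng.standard_normal(200)
freq = stability_selection(X, y)
selected = np.where(freq >= 0.7)[0]   # keep features selected in at least 70% of resamples
print(selected, freq[selected])
```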
【6】 A Global Two-stage Algorithm for Non-convex Penalized High-dimensional Linear Regression Problems Link: https://arxiv.org/abs/2111.11801
Authors: Peili Li, Min Liu, Zhou Yu Affiliations: School of Mathematics and Statistics, Wuhan University Abstract: By the asymptotic oracle property, non-convex penalties represented by the minimax concave penalty (MCP) and the smoothly clipped absolute deviation (SCAD) have attracted much attention in high-dimensional data analysis, and have been widely used in signal processing, image restoration, matrix estimation, etc. However, in view of their non-convex and non-smooth characteristics, they are computationally challenging. Almost all existing algorithms converge locally, and the proper selection of initial values is crucial. Therefore, in actual operation, they often combine a warm-starting technique to meet the rigid requirement that the initial value must be sufficiently close to the optimal solution of the corresponding problem. In this paper, based on the DC (difference of convex functions) property of the MCP and SCAD penalties, we aim to design a global two-stage algorithm for high-dimensional least squares linear regression problems. A key idea for making the proposed algorithm efficient is to use the primal dual active set with continuation (PDASC) method, which is equivalent to the semi-smooth Newton (SSN) method, to solve the corresponding sub-problems. Theoretically, we not only prove the global convergence of the proposed algorithm, but also verify that the generated iterative sequence converges to a d-stationary point. In terms of computational performance, extensive simulation and real-data studies show that the proposed algorithm is superior to the latest SSN method and the classic coordinate descent (CD) algorithm for solving non-convex penalized high-dimensional linear regression problems.
【7】 Trimmed Harrell-Davis quantile estimator based on the highest density interval of the given width Link: https://arxiv.org/abs/2111.11776
Authors: Andrey Akinshin Note: 11 pages, 6 figures; the paper source code is available at this https URL Abstract: Traditional quantile estimators that are based on one or two order statistics are a common way to estimate distribution quantiles based on the given samples. These estimators are robust, but their statistical efficiency is not always good enough. A more efficient alternative is the Harrell-Davis quantile estimator which uses a weighted sum of all order statistics. Whereas this approach provides more accurate estimations for the light-tailed distributions, it's not robust. To be able to customize the trade-off between statistical efficiency and robustness, we could consider a trimmed modification of the Harrell-Davis quantile estimator. In this approach, we discard order statistics with low weights according to the highest density interval of the beta distribution.
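The estimator in 【7】 is concrete enough to sketch directly: the classical Harrell-Davis estimator weights the order statistics by increments of a Beta((n+1)p, (n+1)(1-p)) CDF, and the trimmed variant keeps only the Beta mass inside a highest-density interval and renormalizes. The interval width below is a fixed illustrative choice, and the grid-search HDI is a simplification rather than the paper's exact procedure.

```python
import numpy as np
from scipy.stats import beta

def harrell_davis(sample, p):
    """Classical Harrell-Davis estimator: a Beta-weighted sum of order statistics."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    a, b = (n + 1) * p, (n + 1) * (1 - p)
    w = np.diff(beta.cdf(np.arange(n + 1) / n, a, b))   # w_i = I_{i/n}(a,b) - I_{(i-1)/n}(a,b)
    return float(w @ x)

def trimmed_harrell_davis(sample, p, width=0.5, grid=2001):
    """Trimmed variant: keep only the Beta mass inside a highest-density interval of the
    given width (located here by a simple grid search) and renormalize the weights."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    a, b = (n + 1) * p, (n + 1) * (1 - p)
    lo = np.linspace(0.0, 1.0 - width, grid)
    mass = beta.cdf(lo + width, a, b) - beta.cdf(lo, a, b)
    L = lo[np.argmax(mass)]                              # left end of the chosen interval
    cdf = np.clip(beta.cdf(np.arange(n + 1) / n, a, b),
                  beta.cdf(L, a, b), beta.cdf(L + width, a, b))
    w = np.diff(cdf)
    return float(w @ x / w.sum())

sample = np.random.default_rng(0).exponential(size=30)
print(harrell_davis(sample, 0.5), trimmed_harrell_davis(sample, 0.5))
```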
【8】 MARS via LASSO Link: https://arxiv.org/abs/2111.11694
Authors: Dohyeong Ki, Billy Fang, Adityanand Guntuboyina Affiliations: University of California Berkeley, Evans Hall, Berkeley, CA; Google LLC, Mountain View, CA Abstract: MARS is a popular method for nonparametric regression introduced by Friedman in 1991. MARS fits simple nonlinear and non-additive functions to regression data. We propose and study a natural LASSO variant of the MARS method. Our method is based on least squares estimation over a convex class of functions obtained by considering infinite-dimensional linear combinations of functions in the MARS basis and imposing a variation based complexity constraint. We show that our estimator can be computed via finite-dimensional convex optimization and that it is naturally connected to nonparametric function estimation techniques based on smoothness constraints. Under a simple design assumption, we prove that our estimator achieves a rate of convergence that depends only logarithmically on dimension and thus avoids the usual curse of dimensionality to some extent. We implement our method with a cross-validation scheme for the selection of the involved tuning parameter and show that it has favorable performance compared to the usual MARS method in simulation and real data settings.
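A heavily simplified, finite-dimensional analogue of the idea in 【8】 is to generate MARS-style hinge features at sample-based knots and let an l1 penalty play the role of the variation constraint. The sketch below (first-order terms only, scikit-learn LassoCV, quantile knots) is an assumption-laden illustration, not the infinite-dimensional estimator analyzed in the paper.

```python
import numpy as np
from sklearn.linear_model import LassoCV

def hinge_features(X, knots_per_dim=10):
    """First-order MARS-style basis: hinge pairs max(x_j - t, 0), max(t - x_j, 0) at empirical knots."""
    n, d = X.shape
    cols = []
    for j in range(d):
        for t in np.quantile(X[:, j], np.linspace(0.05, 0.95, knots_per_dim)):
            cols.append(np.maximum(X[:, j] - t, 0.0))
            cols.append(np.maximum(t - X[:, j], 0.0))
    return np.column_stack(cols)

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(400, 3))
y = np.maximum(X[:, 0] - 0.5, 0) - 2 * np.maximum(-X[:, 1], 0) + 0.1 * rng.standard_normal(400)

Phi = hinge_features(X)
fit = LassoCV(cv=5).fit(Phi, y)   # the l1 penalty stands in for the variation constraint
print("selected basis functions:", int(np.sum(np.abs(fit.coef_) > 1e-8)))
```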
【9】 RIO: Rotation-equivariance supervised learning of robust inertial odometry Link: https://arxiv.org/abs/2111.11676
Authors: Caifa Zhou, Xiya Cao, Dandan Zeng, Yongliang Wang Affiliations: Riemann Lab, Huawei Technologies Co. Ltd Note: 12 pages, 17 figures, 2 tables Abstract: This paper introduces rotation-equivariance as a self-supervisor to train inertial odometry models. We demonstrate that the self-supervised scheme provides a powerful supervisory signal at the training phase as well as at the inference stage. It reduces the reliance on massive amounts of labeled data for training a robust model and makes it possible to update the model using various unlabeled data. Further, we propose adaptive Test-Time Training (TTT) based on uncertainty estimations in order to enhance the generalizability of the inertial odometry to various unseen data. We show in experiments that the Rotation-equivariance-supervised Inertial Odometry (RIO) trained with 30% of the data achieves on-par performance with a model trained with the whole database. Adaptive TTT improves model performance in all cases and yields improvements of more than 25% under several scenarios.
【10】 Isolation forests: looking beyond tree depth Link: https://arxiv.org/abs/2111.11639
Authors: David Cortes Abstract: The isolation forest algorithm for outlier detection exploits a simple yet effective observation: if taking some multivariate data and making uniformly random cuts across the feature space recursively, it will take fewer such random cuts for an outlier to be left alone in a given subspace as compared to regular observations. The original idea proposed an outlier score based on the tree depth (number of random cuts) required for isolation, but experiments here show that using information about the size of the feature space taken and the number of points assigned to it can result in improved results in many situations without any modification to the tree structure, especially in the presence of categorical features.
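For reference, the stock depth-based isolation forest that 【10】 extends is available in scikit-learn and can be used as below; the alternative scorings based on subspace size and point counts proposed in the paper are not part of this standard implementation.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X_inliers = rng.standard_normal((500, 2))
X_outliers = rng.uniform(-6, 6, size=(20, 2))
X = np.vstack([X_inliers, X_outliers])

# Standard depth-based isolation forest; contamination sets the decision threshold.
iso = IsolationForest(n_estimators=200, contamination=0.04, random_state=0).fit(X)
scores = -iso.score_samples(X)   # higher values indicate more anomalous points
labels = iso.predict(X)          # -1 for outliers, +1 for inliers
print("flagged outliers:", int((labels == -1).sum()))
```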
【11】 Combining chains of Bayesian models with Markov melding Link: https://arxiv.org/abs/2111.11566
Authors: Andrew A. Manderson, Robert J. B. Goudie Note: 32 pages, 14 figures Abstract: A challenge for practitioners of Bayesian inference is specifying a model that incorporates multiple relevant, heterogeneous data. It may be easier to instead specify distinct submodels for each source of data, then join the submodels together. We consider chains of submodels, where submodels directly relate to their neighbours via common quantities which may be parameters or deterministic functions thereof. We propose chained Markov melding, an extension of Markov melding, a generic method to combine chains of submodels into a joint model. One challenge we address is appropriately capturing the prior dependence between common quantities within a submodel, whilst also reconciling differences in priors for the same common quantity between two adjacent submodels. Estimating the posterior of the resulting overall joint model is also challenging, so we describe a sampler that uses the chain structure to incorporate information contained in the submodels in multiple stages, possibly in parallel. We demonstrate our methodology using two examples. The first example considers an ecological integrated population model, where multiple data are required to accurately estimate population immigration and reproduction rates. We also consider a joint longitudinal and time-to-event model with uncertain, submodel-derived event times. Chained Markov melding is a conceptually appealing approach to integrating submodels in these settings.
【12】 Identification of vaccine effects when exposure status is unknown Link: https://arxiv.org/abs/2111.11548
Authors: Mats J. Stensrud, Louisa H. Smith Affiliations: Department of Mathematics, École Polytechnique Fédérale de Lausanne, Switzerland; Roux Institute, Northeastern University, Portland, Maine, USA Abstract: Results from randomized controlled trials (RCTs) help determine vaccination strategies and related public health policies. However, defining and identifying estimands that can guide policies in infectious disease settings is difficult, even in an RCT. The effects of vaccination critically depend on characteristics of the population of interest, such as the prevalence of infection, the number of vaccinated, and social behaviors. To mitigate the dependence on such characteristics, estimands (and study designs) that require conditioning or intervening on exposure to the infectious agent have been advocated. But a fundamental problem for both RCTs and observational studies is that exposure status is often unavailable or difficult to measure, which has made it impossible to apply existing methodology to study vaccine effects that account for exposure status. In this work, we present new results on this type of vaccine effects. Under plausible conditions, we show that point identification of certain relative effects is possible even when the exposure status is unknown. Furthermore, we derive sharp bounds on the corresponding absolute effects. We apply these results to estimate the effects of the ChAdOx1 nCoV-19 vaccine on SARS-CoV-2 disease (COVID-19) conditional on post-vaccine exposure to the virus, using data from a large RCT.
【13】 Depth Without the Magic: Inductive Bias of Natural Gradient Descent Link: https://arxiv.org/abs/2111.11542
Authors: Anna Kerekes, Anna Mészáros, Ferenc Huszár Affiliations: University of Cambridge (Computer Laboratory), UK; Eötvös Loránd University, Hungary Abstract: In gradient descent, changing how we parametrize the model can lead to drastically different optimization trajectories, giving rise to a surprising range of meaningful inductive biases: identifying sparse classifiers or reconstructing low-rank matrices without explicit regularization. This implicit regularization has been hypothesised to be a contributing factor to good generalization in deep learning. However, natural gradient descent is approximately invariant to reparameterization: it always follows the same trajectory and finds the same optimum. The question naturally arises: what happens if we eliminate the role of parameterization, which solution will be found, what new properties occur? We characterize the behaviour of natural gradient flow in deep linear networks for separable classification under logistic loss and deep matrix factorization. Some of our findings extend to nonlinear neural networks with sufficient but finite over-parametrization. We demonstrate that there exist learning problems where natural gradient descent fails to generalize, while gradient descent with the right architecture performs well.
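To make the contrast in 【13】 concrete, here is a hedged sketch of natural gradient descent on ordinary logistic regression, preconditioning the gradient with a damped Fisher information matrix. It illustrates the update rule only; the deep-linear-network analysis in the paper is not reproduced, and the damping and step size are illustrative choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def natural_gradient_logreg(X, y, steps=200, lr=1.0, damping=1e-4):
    """Natural gradient descent: theta <- theta - lr * F^{-1} grad, with F the Fisher matrix."""
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(steps):
        p = sigmoid(X @ theta)
        grad = X.T @ (p - y) / n                       # gradient of the mean log-loss
        F = (X * (p * (1 - p))[:, None]).T @ X / n     # empirical Fisher information
        theta -= lr * np.linalg.solve(F + damping * np.eye(d), grad)
    return theta

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))
true_theta = np.array([2.0, -1.0, 0.5])
y = (rng.uniform(size=500) < sigmoid(X @ true_theta)).astype(float)
print(natural_gradient_logreg(X, y))
```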
【14】 Bayesian Robust Learning in Chain Graph Models for Integrative Pharmacogenomics Link: https://arxiv.org/abs/2111.11529
Authors: Moumita Chakraborty, Veerabhadran Baladandayuthapani, Anindya Bhadra, Min Jin Ha Affiliations: Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX; Department of Biostatistics, University of Michigan, Ann Arbor, MI; Department of Statistics, Purdue University, West Lafayette, IN Note: 35 pages, 5 figures; supplementary material follows after the main document Abstract: Integrative analysis of multi-level pharmacogenomic data for modeling dependencies across various biological domains is crucial for developing genomic-testing based treatments. Chain graphs characterize conditional dependence structures of such multi-level data where variables are naturally partitioned into multiple ordered layers, consisting of both directed and undirected edges. Existing literature mostly focuses on Gaussian chain graphs, which are ill-suited for non-normal distributions with heavy-tailed marginals, potentially leading to inaccurate inferences. We propose a Bayesian robust chain graph model (RCGM) based on random transformations of marginals using Gaussian scale mixtures to account for node-level non-normality in continuous multivariate data. This flexible modeling strategy facilitates identification of conditional sign dependencies among non-normal nodes while still being able to infer conditional dependencies among normal nodes. In simulations, we demonstrate that RCGM outperforms existing Gaussian chain graph inference methods in data generated from various non-normal mechanisms. We apply our method to genomic, transcriptomic and proteomic data to understand underlying biological processes holistically for drug response and resistance in lung cancer cell lines. Our analysis reveals inter- and intra-platform dependencies of key signaling pathways to monotherapies of icotinib, erlotinib and osimertinib among other drugs, along with shared patterns of molecular mechanisms behind drug actions.
【15】 Approximate Bayesian Computation via Classification Link: https://arxiv.org/abs/2111.11507
Authors: Yuexi Wang, Tetsuya Kaji, Veronika Ročková Affiliations: Booth School of Business, University of Chicago Abstract: Approximate Bayesian Computation (ABC) enables statistical inference in complex models whose likelihoods are difficult to calculate but easy to simulate from. ABC constructs a kernel-type approximation to the posterior distribution through an accept/reject mechanism which compares summary statistics of real and simulated data. To obviate the need for summary statistics, we directly compare empirical distributions with a Kullback-Leibler (KL) divergence estimator obtained via classification. In particular, we blend flexible machine learning classifiers within ABC to automate fake/real data comparisons. We consider the traditional accept/reject kernel as well as an exponential weighting scheme which does not require the ABC acceptance threshold. Our theoretical results show that the rate at which our ABC posterior distributions concentrate around the true parameter depends on the estimation error of the classifier. We derive limiting posterior shape results and find that, with a properly scaled exponential kernel, asymptotic normality holds. We demonstrate the usefulness of our approach on simulated examples as well as real data in the context of stock volatility estimation.
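For context on 【15】, a bare-bones rejection-ABC loop is sketched below for a Gaussian mean. It accepts draws whose summary statistics land within a tolerance of the observed ones, which is exactly the step the paper replaces with a classifier-based KL-divergence comparison of full empirical distributions; the prior, summaries, and tolerance are illustrative choices.

```python
import numpy as np

def rejection_abc(x_obs, simulate, prior_sample, summary, eps, n_draws=20000, seed=0):
    """Keep prior draws whose simulated summaries fall within eps of the observed ones."""
    rng = np.random.default_rng(seed)
    s_obs = summary(x_obs)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        s_sim = summary(simulate(theta, rng, len(x_obs)))
        if np.linalg.norm(s_sim - s_obs) < eps:
            accepted.append(theta)
    return np.array(accepted)

# Toy model: x ~ N(theta, 1), theta ~ N(0, 5^2); summary = (sample mean, sample sd).
rng = np.random.default_rng(1)
x_obs = rng.normal(2.0, 1.0, size=100)
post = rejection_abc(
    x_obs,
    simulate=lambda th, r, n: r.normal(th, 1.0, size=n),
    prior_sample=lambda r: r.normal(0.0, 5.0),
    summary=lambda x: np.array([x.mean(), x.std()]),
    eps=0.2,
)
print(len(post), post.mean() if len(post) else None)
```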
【16】 A Free Lunch from the Noise: Provable and Practical Exploration for Representation Learning Link: https://arxiv.org/abs/2111.11485
Authors: Tongzheng Ren, Tianjun Zhang, Csaba Szepesvári, Bo Dai Affiliations: UT Austin & Google Brain; UC Berkeley; University of Alberta & DeepMind Note: The first two authors contributed equally Abstract: Representation learning lies at the heart of the empirical success of deep learning for dealing with the curse of dimensionality. However, the power of representation learning has not been fully exploited yet in reinforcement learning (RL), due to (i) the trade-off between expressiveness and tractability, and (ii) the coupling between exploration and representation learning. In this paper, we first reveal the fact that under some noise assumption in the stochastic control model, we can obtain the linear spectral feature of its corresponding Markov transition operator in closed-form for free. Based on this observation, we propose Spectral Dynamics Embedding (SPEDE), which breaks the trade-off and completes optimistic exploration for representation learning by exploiting the structure of the noise. We provide rigorous theoretical analysis of SPEDE, and demonstrate the practical superior performance over the existing state-of-the-art empirical algorithms on several benchmarks.
【17】 Semantic-Aware Collaborative Deep Reinforcement Learning Over Wireless Cellular Networks Link: https://arxiv.org/abs/2111.12064
Authors: Fatemeh Lotfi, Omid Semiari, Walid Saad Affiliations: Department of Electrical and Computer Engineering, University of Colorado Abstract: Collaborative deep reinforcement learning (CDRL) algorithms in which multiple agents can coordinate over a wireless network are a promising approach to enable future intelligent and autonomous systems that rely on real-time decision-making in complex dynamic environments. Nonetheless, in practical scenarios, CDRL faces many challenges due to the heterogeneity of agents and their learning tasks, different environments, time constraints of the learning, and resource limitations of wireless networks. To address these challenges, in this paper, a novel semantic-aware CDRL method is proposed to enable a group of heterogeneous untrained agents with semantically-linked DRL tasks to collaborate efficiently across a resource-constrained wireless cellular network. To this end, a new heterogeneous federated DRL (HFDRL) algorithm is proposed to select the best subset of semantically relevant DRL agents for collaboration. The proposed approach then jointly optimizes the training loss and wireless bandwidth allocation for the cooperating selected agents in order to train each agent within the time limit of its real-time task. Simulation results show the superior performance of the proposed algorithm compared to state-of-the-art baselines.
【18】 Depth induces scale-averaging in overparameterized linear Bayesian neural networks Link: https://arxiv.org/abs/2111.11954
Authors: Jacob A. Zavatone-Veth, Cengiz Pehlevan Affiliations: Department of Physics, Harvard University, Cambridge, MA, United States; John A. Paulson School of Engineering and Applied Sciences Note: 8 pages, no figures Abstract: Inference in deep Bayesian neural networks is only fully understood in the infinite-width limit, where the posterior flexibility afforded by increased depth washes out and the posterior predictive collapses to a shallow Gaussian process. Here, we interpret finite deep linear Bayesian neural networks as data-dependent scale mixtures of Gaussian process predictors across output channels. We leverage this observation to study representation learning in these networks, allowing us to connect limiting results obtained in previous studies within a unified framework. In total, these results advance our analytical understanding of how depth affects inference in a simple class of Bayesian neural networks.
【19】 Uncertainty estimation under model misspecification in neural network regression Link: https://arxiv.org/abs/2111.11763
Authors: Maria R. Cervera, Rafael Dätwyler, Francesco D'Angelo, Hamza Keurti, Benjamin F. Grewe, Christian Henning Affiliations: Institute of Neuroinformatics, University of Zürich and ETH Zürich, Switzerland; Max Planck ETH Center for Learning Systems; Institute of Theoretical Computer Science, ETH Zürich, Switzerland Note: Published at the NeurIPS 2021 workshop "Your Model Is Wrong: Robustness and Misspecification in Probabilistic Modeling" Abstract: Although neural networks are powerful function approximators, the underlying modelling assumptions ultimately define the likelihood and thus the hypothesis class they are parameterizing. In classification, these assumptions are minimal as the commonly employed softmax is capable of representing any categorical distribution. In regression, however, restrictive assumptions on the type of continuous distribution to be realized are typically placed, like the dominant choice of training via mean-squared error and its underlying Gaussianity assumption. Recently, modelling advances allow to be agnostic to the type of continuous distribution to be modelled, granting regression the flexibility of classification models. While past studies stress the benefit of such flexible regression models in terms of performance, here we study the effect of the model choice on uncertainty estimation. We highlight that under model misspecification, aleatoric uncertainty is not properly captured, and that a Bayesian treatment of a misspecified model leads to unreliable epistemic uncertainty estimates. Overall, our study provides an overview on how modelling choices in regression may influence uncertainty estimation and thus any downstream decision making process.
【20】 Importance sampling approach to chance-constrained DC optimal power flow Link: https://arxiv.org/abs/2111.11729
Authors: Aleksander Lukashevich, Vyacheslav Gorchakov, Petr Vorobev, Deepjyoti Deka, Yury Maximov Affiliations: Center for Energy Systems and Technology, Skolkovo Institute of Science and Technology, Skolkovo Innovation Center, Moscow Region, Russia; Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico, USA Abstract: Despite significant economic and ecological effects, a higher level of renewable energy generation leads to increased uncertainty and variability in power injections, thus compromising grid reliability. In order to improve power grid security, we investigate a joint chance-constrained (CC) direct current (DC) optimal power flow (OPF) problem. The problem aims to find economically optimal power generation while guaranteeing that all power generation, line flows, and voltages simultaneously remain within their bounds with a pre-defined probability. Unfortunately, the problem is computationally intractable even if the distribution of renewables fluctuations is specified. Moreover, existing approximate solutions to the joint CC OPF problem are overly conservative, and therefore have less value for the operational practice. This paper proposes an importance sampling approach to the CC DC OPF problem, which yields better complexity and accuracy than current state-of-the-art methods. The algorithm efficiently reduces the number of scenarios by generating and using only the most important of them, thus enabling real-time solutions for test cases with up to several hundred buses.
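A generic illustration of why importance sampling helps with the small violation probabilities arising in 【20】: estimate P(xi > t) for a Gaussian fluctuation both by naive Monte Carlo and by sampling from a shifted proposal with likelihood-ratio weights. The hand-picked shift below is an assumption; the paper constructs the proposal from the structure of the OPF constraints.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
threshold = 3.5                       # constraint: violation if xi > threshold, xi ~ N(0, 1)
true_p = 1.0 - norm.cdf(threshold)

n = 20000
# Naive Monte Carlo: almost no samples violate, so the estimate is very noisy.
xi = rng.standard_normal(n)
p_mc = np.mean(xi > threshold)

# Importance sampling: draw from a proposal centred near the violation region
# and reweight each sample by the likelihood ratio p(xi) / q(xi).
mu_q = threshold                      # hand-picked shift (an illustrative assumption)
xi_q = rng.normal(mu_q, 1.0, size=n)
weights = norm.pdf(xi_q) / norm.pdf(xi_q, loc=mu_q)
p_is = np.mean((xi_q > threshold) * weights)

print(f"true {true_p:.2e}  MC {p_mc:.2e}  IS {p_is:.2e}")
```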
【21】 A Contextual Latent Space Model: Subsequence Modulation in Melodic Sequence Link: https://arxiv.org/abs/2111.11703
Authors: Taketo Akama Affiliations: Sony Computer Science Laboratories, Tokyo, Japan Note: 22nd International Society for Music Information Retrieval Conference (ISMIR), 2021; 8 pages Abstract: Some generative models for sequences such as music and text allow us to edit only subsequences, given surrounding context sequences, which plays an important part in steering generation interactively. However, editing subsequences mainly involves randomly resampling subsequences from a possible generation space. We propose a contextual latent space model (CLSM) in order for users to be able to explore subsequence generation with a sense of direction in the generation space, e.g., interpolation, as well as exploring variations -- semantically similar possible subsequences. A context-informed prior and decoder constitute the generative model of CLSM, and a context position-informed encoder is the inference model. In experiments, we use a monophonic symbolic music dataset, demonstrating that our contextual latent space is smoother in interpolation than baselines, and the quality of generated samples is superior to baseline models. The generation examples are available online.
【22】 Multi-task manifold learning for small sample size datasets Link: https://arxiv.org/abs/2111.11655
Authors: Hideaki Ishibashi, Kazushi Higa, Tetsuo Furukawa Affiliations: Kyushu Institute of Technology, Hibikino, Wakamatsu-ku, Kitakyushu, Japan; Horiba Ltd., Kyoto, Japan Note: 22 pages, 15 figures Abstract: In this study, we develop a method for multi-task manifold learning. The method aims to improve the performance of manifold learning for multiple tasks, particularly when each task has a small number of samples. Furthermore, the method also aims to generate new samples for new tasks, in addition to new samples for existing tasks. In the proposed method, we use two different types of information transfer: instance transfer and model transfer. For instance transfer, datasets are merged among similar tasks, whereas for model transfer, the manifold models are averaged among similar tasks. For this purpose, the proposed method consists of a set of generative manifold models corresponding to the tasks, which are integrated into a general model of a fiber bundle. We applied the proposed method to artificial datasets and face image sets, and the results showed that the method was able to estimate the manifolds, even for a tiny number of samples.
【23】 FLIX: A Simple and Communication-Efficient Alternative to Local Methods in Federated Learning Link: https://arxiv.org/abs/2111.11556
Authors: Elnur Gasanov, Ahmed Khaled, Samuel Horváth, Peter Richtárik Affiliations: KAUST, Saudi Arabia Abstract: Federated Learning (FL) is an increasingly popular machine learning paradigm in which multiple nodes try to collaboratively learn under privacy, communication and multiple heterogeneity constraints. A persistent problem in federated learning is that it is not clear what the optimization objective should be: the standard average risk minimization of supervised learning is inadequate in handling several major constraints specific to federated learning, such as communication adaptivity and personalization control. We identify several key desiderata in frameworks for federated learning and introduce a new framework, FLIX, that takes into account the unique challenges brought by federated learning. FLIX has a standard finite-sum form, which enables practitioners to tap into the immense wealth of existing (potentially non-local) methods for distributed optimization. Through a smart initialization that does not require any communication, FLIX does not require the use of local steps but is still provably capable of performing dissimilarity regularization on par with local methods. We give several algorithms for solving the FLIX formulation efficiently under communication constraints. Finally, we corroborate our theoretical results with extensive experimentation.
【24】 Dynamic Regret for Strongly Adaptive Methods and Optimality of Online KRR Link: https://arxiv.org/abs/2111.11550
Authors: Dheeraj Baby, Hilaf Hasson, Yuyang Wang Affiliations: University of California, Santa Barbara; Amazon Research Abstract: We consider the framework of non-stationary Online Convex Optimization where a learner seeks to control its dynamic regret against an arbitrary sequence of comparators. When the loss functions are strongly convex or exp-concave, we demonstrate that Strongly Adaptive (SA) algorithms can be viewed as a principled way of controlling dynamic regret in terms of path variation $V_T$ of the comparator sequence. Specifically, we show that SA algorithms enjoy $\tilde O(\sqrt{TV_T} \vee \log T)$ and $\tilde O(\sqrt{dTV_T} \vee d\log T)$ dynamic regret for strongly convex and exp-concave losses respectively without apriori knowledge of $V_T$. The versatility of the principled approach is further demonstrated by the novel results in the setting of learning against bounded linear predictors and online regression with Gaussian kernels. Under a related setting, the second component of the paper addresses an open question posed by Zhdanov and Kalnishkan (2010) that concerns online kernel regression with squared error losses. We derive a new lower bound on a certain penalized regret which establishes the near minimax optimality of online Kernel Ridge Regression (KRR). Our lower bound can be viewed as an RKHS extension to the lower bound derived in Vovk (2001) for online linear regression in finite dimensions.
【25】 Bootstrap Your Flow Link: https://arxiv.org/abs/2111.11510
Authors: Laurence Illing Midgley, Vincent Stimper, Gregor N. C. Simm, José Miguel Hernández-Lobato Affiliations: Department of Engineering, University of Cambridge Abstract: Normalising flows are flexible, parameterized distributions that can be used to approximate expectations from intractable distributions via importance sampling. However, current flow-based approaches are limited on challenging targets where they either suffer from mode seeking behaviour or high variance in the training loss, or rely on samples from the target distribution, which may not be available. To address these challenges, we combine flows with annealed importance sampling (AIS), while using the $\alpha$-divergence as our objective, in a novel training procedure, FAB (Flow AIS Bootstrap). Thereby, the flow and AIS improve each other in a bootstrapping manner. We demonstrate that FAB can be used to produce accurate approximations to complex target distributions, including Boltzmann distributions, in problems where previous flow-based methods fail.
【26】 Graph Neural Networks with Parallel Neighborhood Aggregations for Graph Classification Link: https://arxiv.org/abs/2111.11482
Authors: Siddhant Doshi, Sundeep Prabhakar Chepuri Abstract: We focus on graph classification using a graph neural network (GNN) model that precomputes the node features using a bank of neighborhood aggregation graph operators arranged in parallel. These GNN models have a natural advantage of reduced training and inference time due to the precomputations but are also fundamentally different from popular GNN variants that update node features through a sequential neighborhood aggregation procedure during training. We provide theoretical conditions under which generic GNN models with parallel neighborhood aggregations (PA-GNNs, in short) are provably as powerful as the well-known Weisfeiler-Lehman (WL) graph isomorphism test in discriminating non-isomorphic graphs. Although PA-GNN models do not have an apparent relationship with the WL test, we show that the graph embeddings obtained from these two methods are injectively related. We then propose a specialized PA-GNN model, called SPIN, which obeys the developed conditions. We demonstrate via numerical experiments that the developed model achieves state-of-the-art performance on many diverse real-world datasets while maintaining the discriminative power of the WL test and the computational advantage of preprocessing graphs before the training process.
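The precompute-then-classify idea in 【26】 can be illustrated, in a much-simplified node-classification setting rather than the paper's SPIN architecture for graph classification, by stacking powers of a normalized adjacency applied to the node features and training a linear classifier on the concatenation. Everything in the sketch (the toy graph, the two-hop aggregation bank, the logistic head) is an illustrative assumption.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def precompute_aggregations(A, X, hops=2):
    """Parallel neighborhood aggregations: concatenate [X, S X, S^2 X, ...] for the
    symmetrically normalized adjacency S = D^{-1/2}(A + I)D^{-1/2}, computed once up front."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(1))
    S = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    feats, H = [X], X
    for _ in range(hops):
        H = S @ H
        feats.append(H)
    return np.hstack(feats)

# Toy graph: two 10-node cliques joined by one edge; labels follow the cliques.
n = 20
A = np.zeros((n, n))
A[:10, :10] = 1; A[10:, 10:] = 1; np.fill_diagonal(A, 0); A[9, 10] = A[10, 9] = 1
X = np.random.default_rng(0).standard_normal((n, 4))
y = np.array([0] * 10 + [1] * 10)

Z = precompute_aggregations(A, X, hops=2)       # features computed before any training
clf = LogisticRegression(max_iter=1000).fit(Z, y)
print("train accuracy:", clf.score(Z, y))
```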
【27】 Gradient flows on graphons: existence, convergence, continuity equations Link: https://arxiv.org/abs/2111.09459
Authors: Sewoong Oh, Soumik Pal, Raghav Somani, Raghav Tripathi Affiliations: Department of Mathematics, University of Washington; Allen School of Computer Science & Engineering, University of Washington Note: 40 pages, 2 figures Abstract: Wasserstein gradient flows on probability measures have found a host of applications in various optimization problems. They typically arise as the continuum limit of exchangeable particle systems evolving by some mean-field interaction involving a gradient-type potential. However, in many problems, such as in multi-layer neural networks, the so-called particles are edge weights on large graphs whose nodes are exchangeable. Such large graphs are known to converge to continuum limits called graphons as their size grows to infinity. We show that the Euclidean gradient flow of a suitable function of the edge-weights converges to a novel continuum limit given by a curve on the space of graphons that can be appropriately described as a gradient flow or, more technically, a curve of maximal slope. Several natural functions on graphons, such as homomorphism functions and the scalar entropy, are covered by our set-up, and the examples have been worked out in detail.