Statistics Academic Digest [7.16]

WeChat Official Account: arXiv Daily Academic Digest

Published 2021-07-27 10:59:01

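Visit www.arxivdaily.com for digests with abstracts covering CS | Physics | Mathematics | Economics | Statistics | Finance | Biology | Electrical Engineering, plus search, bookmarking, and posting features.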

stat (Statistics): 41 papers in total

【1】 Mid-flight Forecasting for CPA Lines in Online Advertising

Authors: Hao He, Tian Zhou, Lihua Ren, Niklas Karlsson, Aaron Flores
Comments: 41st International Symposium on Forecasting, June 27-30, 2021
Link: https://arxiv.org/abs/2107.07494
Abstract: For the Verizon Media Demand Side Platform (DSP), forecasting of ad campaign performance not only feeds key information to the optimization server, allowing the system to operate in a high-performance mode, but also produces actionable insights for advertisers. In this paper, the forecasting problem for CPA lines in the middle of the flight is investigated by taking the bidding mechanism into account. The proposed methodology generates relationships between various key performance metrics and optimization signals. It can also be used to estimate the sensitivity of ad campaign performance metrics to adjustments of the optimization signal, which is important to the design of a campaign management system. The relationship between advertiser spend and effective Cost Per Action (eCPA) is also characterized, which serves as guidance to advertisers for mid-flight line adjustment. Several practical issues in implementation, such as downsampling of the dataset, are also discussed in the paper. Finally, the forecasting results are validated against actual deliveries and demonstrate promising accuracy.

【2】 Personalized and Reliable Decision Sets: Enhancing Interpretability in Clinical Decision Support Systems

Authors: Francisco Valente, Simão Paredes, Jorge Henriques
Affiliations: Centre for Informatics and Systems of the University of Coimbra, Portugal; Coimbra Institute of Engineering (ISEC)
Comments: Accepted to the ICML 2021 Workshop on Interpretable Machine Learning in Healthcare
Link: https://arxiv.org/abs/2107.07483
Abstract: In this study, we present a novel clinical decision support system and discuss its interpretability-related properties. It combines a decision set of rules with a machine learning scheme to offer global and local interpretability. More specifically, machine learning is used to predict the likelihood of each rule being correct for a particular patient, which may also contribute to better predictive performance. Moreover, the reliability of individual predictions is also analyzed, contributing to further personalized interpretability. The combination of these elements may be crucial to obtaining the clinical stakeholders' trust, leading to a better assessment of patients' conditions and improved decision-making by physicians.

【3】 A comparison of nonlinear extensions to the ensemble Kalman filter: Gaussian Anamorphosis and Two-Step Ensemble Filters

Author: Ian Grooms
Affiliation: Department of Applied Mathematics, University of Colorado, Boulder, CO, USA
Comments: 23 pages, 4 figures
Link: https://arxiv.org/abs/2107.07475
Abstract: This paper reviews two nonlinear, non-Gaussian extensions of the Ensemble Kalman Filter (EnKF): Gaussian anamorphosis (GA) methods and two-step updates, of which the rank histogram filter (RHF) is a prototypical example. GA-EnKF methods apply univariate transforms to the state and observation variables to make their distribution more Gaussian before applying an EnKF. The two-step methods use a scalar Bayesian update for the first step, followed by linear regression for the second step. The connection of the two-step framework to the full Bayesian problem is made, which opens the door to more advanced two-step methods in the full Bayesian setting. A new method for the first part of the two-step framework is proposed, with a similar form to the RHF but a different motivation, called the 'improved RHF' (iRHF). A suite of experiments with the Lorenz-'96 model demonstrates situations where the GA-EnKF methods are similar to the EnKF and situations where they outperform it. The experiments also strongly support the accuracy of the RHF and iRHF filters for nonlinear and non-Gaussian observations; these methods uniformly beat the EnKF and GA-EnKF methods in the experiments reported here. The new iRHF method is more accurate than the RHF only at small ensemble sizes in the experiments reported here.
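A minimal sketch of the two-step structure described above, assuming a single scalar observation of one state component, a perturbed-observation (stochastic) EnKF rule for the scalar first step, and least-squares regression for the second step. The function name `enkf_two_step_update` and its arguments are hypothetical; the RHF and iRHF would replace only the scalar first step, which is not implemented here.

```python
import numpy as np


def enkf_two_step_update(X, obs_index, y, obs_var, rng):
    """Update an ensemble with one scalar observation in two steps (sketch).

    X         : (N, n) ensemble, one member per row
    obs_index : index of the directly observed state component
    y         : observed value
    obs_var   : observation-error variance
    rng       : numpy Generator used for the observation perturbations
    """
    X = np.asarray(X, dtype=float)
    N = X.shape[0]
    h = X[:, obs_index]                          # prior ensemble in observation space

    # Step 1: scalar update of the observed variable (stochastic EnKF rule).
    prior_var = h.var(ddof=1)
    gain = prior_var / (prior_var + obs_var)
    y_pert = y + rng.normal(0.0, np.sqrt(obs_var), size=N)
    dh = gain * (y_pert - h)                     # observation-space increments

    # Step 2: regress the increments onto every state variable.
    hc = h - h.mean()
    Xc = X - X.mean(axis=0)
    beta = (Xc * hc[:, None]).sum(axis=0) / (hc @ hc)   # least-squares slopes
    return X + dh[:, None] * beta[None, :]
```

Because the regression slope of the observed variable on itself is exactly 1, its updated ensemble equals the scalar-step result; unobserved variables move in proportion to their sample covariance with the observed one.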

【4】 Multi-label Chaining with Imprecise Probabilities

Authors: Yonatan Carlos Carranza Alarcón, Sébastien Destercke
Affiliation: Sorbonne Universités, Université de Technologie de Compiègne
Link: https://arxiv.org/abs/2107.07443
Abstract: We present two different strategies to extend the classical multi-label chaining approach to handle imprecise probability estimates. These estimates use convex sets of distributions (or credal sets) to describe our uncertainty rather than a single precise distribution. The main reasons for using such estimates are (1) to make cautious predictions (or no decision at all) when high uncertainty is detected in the chaining, and (2) to make better precise predictions by avoiding biases caused by early decisions in the chaining. Through the use of the naive credal classifier, we propose efficient procedures with theoretical justifications to solve both strategies. Our experimental results on missing labels, which investigate how reliable these predictions are in both approaches, indicate that our approaches produce relevant cautiousness on those hard-to-predict instances where the precise models fail.
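The naive credal classifier is built on the imprecise Dirichlet model (IDM), which replaces a single estimated probability by an interval. A minimal sketch for one binary label, assuming the IDM hyperparameter `s = 2` and a simple rule that abstains whenever the interval contains 0.5; the counts would come from training instances matching the current features (and, in a chain, the previously predicted labels). The function names are hypothetical, and neither of the paper's two chaining strategies is reproduced here.

```python
def idm_interval(count, total, s=2.0):
    """Lower and upper probability of an event under the imprecise Dirichlet model."""
    return count / (total + s), (count + s) / (total + s)


def cautious_label(count_positive, total, s=2.0):
    """Return 1 or 0 when the IDM interval is decisive, or None to abstain."""
    low, up = idm_interval(count_positive, total, s)
    if low > 0.5:          # even the most pessimistic estimate favours label 1
        return 1
    if up < 0.5:           # even the most optimistic estimate favours label 0
        return 0
    return None            # interval straddles 0.5: make no decision
```

For example, `cautious_label(5, 10)` returns `None`: the interval [5/12, 7/12] contains 0.5, so no decision is made for that label.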

【5】 FastSHAP: Real-Time Shapley Value Estimation

Authors: Neil Jethani, Mukund Sudarshan, Ian Covert, Su-In Lee, Rajesh Ranganath
Affiliations: New York University; University of Washington
Comments: 20 pages, 10 figures, 3 tables
Link: https://arxiv.org/abs/2107.07436
Abstract: Shapley values are widely used to explain black-box models, but they are costly to calculate because they require many model evaluations. We introduce FastSHAP, a method for estimating Shapley values in a single forward pass using a learned explainer model. FastSHAP amortizes the cost of explaining many inputs via a learning approach inspired by the Shapley value's weighted least squares characterization, and it can be trained using standard stochastic gradient optimization. We compare FastSHAP to existing estimation approaches, revealing that it generates high-quality explanations with orders-of-magnitude speedup.
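For reference, the quantity FastSHAP approximates can be computed exactly for a handful of features by enumerating all coalitions. A sketch, assuming that "removing" a feature means replacing it with a fixed baseline value (one common convention; FastSHAP's own value function may differ). The function name `exact_shapley` is hypothetical. The cost grows as 2^d model evaluations, which is the expense that an amortized explainer avoids.

```python
from itertools import combinations
from math import factorial

import numpy as np


def exact_shapley(f, x, baseline):
    """Exact Shapley values of f at x, with absent features set to baseline."""
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    d = len(x)

    def value(subset):
        # Features in `subset` take their values from x, the rest from baseline.
        mask = np.zeros(d, dtype=bool)
        for j in subset:
            mask[j] = True
        return f(np.where(mask, x, baseline))

    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for k in range(d):
            weight = factorial(k) * factorial(d - k - 1) / factorial(d)
            for S in combinations(others, k):
                phi[i] += weight * (value(S + (i,)) - value(S))
    return phi
```

For a linear model the result reduces to w_i (x_i - baseline_i), which gives a quick sanity check.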

【6】 Statistical inference using Regularized M-estimation in the reproducing kernel Hilbert space for handling missing data

Authors: Hengfang Wang, Jae Kwang Kim
Comments: arXiv admin note: text overlap with arXiv:2102.00058
Link: https://arxiv.org/abs/2107.07371
Abstract: Imputation and propensity score weighting are two popular techniques for handling missing data. We address these problems using regularized M-estimation techniques in the reproducing kernel Hilbert space. Specifically, we first use kernel ridge regression to develop imputation for handling item nonresponse. While this nonparametric approach is potentially promising for imputation, its statistical properties have not been investigated in the literature. Under some conditions on the order of the tuning parameter, we first establish the root-$n$ consistency of the kernel ridge regression imputation estimator and show that it achieves the lower bound of the semiparametric asymptotic variance. A nonparametric propensity score estimator using the reproducing kernel Hilbert space is also developed through a novel application of the maximum entropy method to density ratio function estimation. We show that the resulting propensity score estimator is asymptotically equivalent to the kernel ridge regression imputation estimator. Results from a limited simulation study are also presented to confirm our theory. The proposed method is applied to analyze air pollution data measured in Beijing, China.
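A minimal sketch of kernel ridge regression (KRR) imputation for estimating a population mean when some outcomes are missing, assuming a Gaussian kernel, a one-dimensional covariate, and fixed values for the bandwidth and the ridge penalty `lam` (the paper instead states conditions on the order of the tuning parameter). The function names are hypothetical.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth):
    """Gaussian kernel matrix between 1-D point sets a and b."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-d2 / (2.0 * bandwidth ** 2))


def krr_imputed_mean(x, y, observed, lam=1e-3, bandwidth=0.5):
    """Estimate E[Y] by imputing missing outcomes with kernel ridge regression.

    x        : (n,) fully observed covariate
    y        : (n,) outcome; entries where `observed` is False are ignored
    observed : (n,) boolean response indicator
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xo, yo = x[observed], y[observed]
    m = len(xo)
    K = gaussian_kernel(xo, xo, bandwidth)
    # Minimiser of (1/m) * sum (y - f)^2 + lam * ||f||^2 over the RKHS.
    alpha = np.linalg.solve(K + m * lam * np.eye(m), yo)
    y_hat = gaussian_kernel(x, xo, bandwidth) @ alpha     # predictions at all x
    y_imp = np.where(observed, y, y_hat)                  # keep observed, impute missing
    return float(y_imp.mean())
```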

【7】 The Completion of Covariance Kernels

Authors: Kartik G. Waghmare, Victor M. Panaretos
Link: https://arxiv.org/abs/2107.07350
Abstract: We consider the problem of positive-semidefinite continuation: extending a partially specified covariance kernel from a subdomain $\Omega$ of a domain $I\times I$ to a covariance kernel on the entire domain $I\times I$. For a broad class of domains $\Omega$ called serrated domains, we are able to present a complete theory. Namely, we demonstrate that a canonical completion always exists and can be explicitly constructed. We characterise all possible completions as suitable perturbations of the canonical completion, and determine necessary and sufficient conditions for a unique completion to exist. We interpret the canonical completion via the graphical model structure it induces on the associated Gaussian process. Furthermore, we show how the estimation of the canonical completion reduces to the solution of a system of linear statistical inverse problems in the space of Hilbert-Schmidt operators, and derive rates of convergence under standard source conditions. We conclude by providing extensions of our theory to more general forms of domains.

【8】 A unified framework for bandit multiple testing

Authors: Ziyu Xu, Ruodu Wang, Aaditya Ramdas
Affiliations: Departments of Statistics and Machine Learning, Carnegie Mellon University; Department of Statistics and Actuarial Science, University of Waterloo
Comments: 37 pages, 6 figures
Link: https://arxiv.org/abs/2107.07322
Abstract: In bandit multiple hypothesis testing, each arm corresponds to a different null hypothesis that we wish to test, and the goal is to design adaptive algorithms that correctly identify a large set of interesting arms (true discoveries) while mistakenly identifying only a few uninteresting ones (false discoveries). One common metric in non-bandit multiple testing is the false discovery rate (FDR). We propose a unified, modular framework for bandit FDR control that emphasizes the decoupling of exploration and summarization of evidence. We utilize the powerful martingale-based concept of "e-processes" to ensure FDR control for arbitrary composite nulls, exploration rules and stopping times in generic problem settings. In particular, valid FDR control holds even if the reward distributions of the arms are dependent, multiple arms are queried simultaneously, and multiple (cooperating or competing) agents query arms, covering combinatorial semi-bandit type settings as well. Prior work has considered in great detail the setting where each arm's reward distribution is independent and sub-Gaussian, and a single arm is queried at each step.
Our framework recovers matching sample complexity guarantees in this special case, and performs comparably or better in practice. For other settings, sample complexities will depend on the finer details of the problem (composite nulls being tested, exploration algorithm, data dependence structure, stopping rule) and we do not explore these; our contribution is to show that the FDR guarantee is clean and entirely agnostic to these details.
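The abstract does not spell out how e-processes are turned into discoveries. One standard procedure that controls the FDR from e-values under arbitrary dependence is e-BH; a sketch follows, assuming each e-value is the value of an arm's e-process at the stopping time. Whether the paper uses exactly this rule is an assumption here, and the function name `e_bh` is hypothetical.

```python
import numpy as np


def e_bh(e_values, alpha):
    """e-BH: reject the k* hypotheses with the largest e-values, where
    k* = max{k : e_(k) >= K / (alpha * k)} and e_(k) is the k-th largest."""
    e = np.asarray(e_values, dtype=float)
    K = len(e)
    order = np.argsort(-e)                   # indices by decreasing e-value
    ks = np.arange(1, K + 1)
    passes = e[order] >= K / (alpha * ks)
    if not passes.any():
        return []
    k_star = ks[passes].max()                # largest k meeting its threshold
    return sorted(order[:k_star].tolist())
```

With `e_values = [100, 30, 1, 0.5]` and `alpha = 0.1`, the thresholds are 40/k, so only the two largest e-values pass and arms 0 and 1 are reported.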

【9】 Nonparametric Statistical Inference via Metric Distribution Function in Metric Spaces

Authors: Xueqin Wang, Jin Zhu, Wenliang Pan, Junhao Zhu, Heping Zhang
Affiliations: for the Alzheimer's Disease Neuroimaging Initiative; University of Science and Technology of China; Sun Yat-Sen University; Yale University
Link: https://arxiv.org/abs/2107.07317
Abstract: The distribution function is essential in statistical inference, and it is connected with samples to form a directed closed loop by the correspondence theorem in measure theory and the Glivenko-Cantelli and Donsker properties. This connection creates a paradigm for statistical inference. However, existing distribution functions are defined in Euclidean spaces and are no longer convenient for rapidly evolving data objects of a complex nature. It is imperative to develop the concept of the distribution function in a more general space to meet emerging needs. Linearity allows us to use hypercubes to define the distribution function in a Euclidean space, but without linearity in a metric space, we must work with the metric to investigate the probability measure. We introduce a class of metric distribution functions through the metric between random objects and a fixed location in metric spaces. We overcome this challenging step by proving the correspondence theorem and the Glivenko-Cantelli theorem for metric distribution functions in metric spaces, which lay the foundation for conducting rational statistical inference for metric space-valued data.
Then, we develop a homogeneity test and a mutual independence test for non-Euclidean random objects, and present comprehensive empirical evidence to support the performance of our proposed methods.
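A sketch of an empirical metric distribution function, assuming the ball-based form F(u, v) = P(d(u, X) <= d(u, v)), i.e. the probability mass of the closed ball centred at u whose radius is the distance from u to v; the exact definition is given in the paper, not the abstract. The function name is hypothetical, and `metric` can be any distance on the data objects, which is what lets the idea work without linearity.

```python
import numpy as np


def empirical_mdf(sample, u, v, metric):
    """Fraction of sample points lying in the closed ball centred at u
    with radius metric(u, v)."""
    radius = metric(u, v)
    return float(np.mean([metric(u, x) <= radius for x in sample]))
```

For instance, on the real line with `metric = lambda a, b: abs(a - b)`, the sample `[0, 1, 2, 3]` gives `empirical_mdf(sample, 0, 2, metric) == 0.75`.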

【10】 The Taxicab Sampler: MCMC for Discrete Spaces with Application to Tree Models

Authors: Vincent Geels, Matthew Pratola, Radu Herbei
Link: https://arxiv.org/abs/2107.07313
Abstract: Motivated by the problem of exploring discrete but very complex state spaces in Bayesian models, we propose a novel Markov chain Monte Carlo search algorithm: the taxicab sampler. We describe the construction of this sampler and discuss how its interpretation and usage differ from those of standard Metropolis-Hastings as well as the closely related Hamming ball sampler. The proposed taxicab sampling algorithm is then shown to substantially improve computation time relative to a naïve Metropolis-Hastings search in a motivating Bayesian regression tree count model, in which we leverage the discrete state space assumption to construct a novel likelihood function that allows for flexibly describing different mean-variance relationships while preserving parameter interpretability compared to existing likelihood functions for count data.
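For contrast with the taxicab sampler, whose proposal mechanism is not described in the abstract and is not implemented here, this is a minimal sketch of a naive random-walk Metropolis-Hastings search on a discrete space: the nonnegative integers with +/-1 proposals and an illustrative Poisson target. The target, the function name `mh_poisson`, and its arguments are assumptions. Local one-step moves like these can mix slowly on large, complex discrete spaces.

```python
import math
import random


def mh_poisson(lam, n_iter, x0=0, seed=0):
    """Random-walk Metropolis-Hastings on {0, 1, 2, ...} targeting Poisson(lam)."""
    rng = random.Random(seed)

    def log_target(k):
        # Unnormalised log pmf of Poisson(lam); -inf outside the support.
        if k < 0:
            return -math.inf
        return k * math.log(lam) - math.lgamma(k + 1)

    x, samples = x0, []
    for _ in range(n_iter):
        prop = x + rng.choice((-1, 1))                    # symmetric proposal
        log_ratio = log_target(prop) - log_target(x)
        if rng.random() < math.exp(min(0.0, log_ratio)):  # MH acceptance step
            x = prop
        samples.append(x)
    return samples
```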

【11】 Covariate adjustment in randomised trials: canonical link functions protect against model mis-specification

Authors: Ian R. White, Tim P. Morris, Elizabeth Williamson
Affiliations: MRC Clinical Trials Unit at UCL, Institute of Clinical Trials and Methodology, High Holborn, London, UK; Department of Medical Statistics, LSHTM, UK
Comments: 10 pages, 1 figure
Link: https://arxiv.org/abs/2107.07278
Abstract: Covariate adjustment has the potential to increase power in the analysis of randomised trials, but mis-specification of the adjustment model could cause error. We explore what error is possible when the adjustment model omits a covariate-by-randomised-treatment interaction, in a setting where the covariate is perfectly balanced between randomised treatments. We use mathematical arguments and analyses of single hypothetical data sets. We show that analysis by a generalised linear model with the canonical link function leads to no error under the null; that is, if the treatment effect is truly zero under the adjusted model, then it is also zero under the unadjusted model. However, using non-canonical link functions does not give this property and leads to potentially important error under the null. The error is present even in large samples and hence constitutes bias. We conclude that covariate adjustment analyses of randomised trials should avoid non-canonical links. If a marginal risk difference is the target of estimation, then it should not be estimated using an identity link; alternative preferable methods include standardisation and inverse probability of treatment weighting.
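A standard property closely related to this result: with a canonical link, the GLM score equations are X^T (y - mu_hat) = 0, so when the design includes an intercept and the treatment indicator, the mean fitted value in each arm equals the observed mean in that arm, even if the model omits a real interaction. A minimal numpy sketch with logistic regression fitted by Newton-Raphson; the simulated data-generating values and the name `fit_logistic` are hypothetical, and this is an illustration rather than the paper's argument.

```python
import numpy as np


def fit_logistic(X, y, n_iter=50):
    """Logistic regression (canonical logit link) fitted by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - mu)                              # score: X^T (y - mu)
        hess = X.T @ (X * (mu * (1.0 - mu))[:, None])      # Fisher information
        beta = beta + np.linalg.solve(hess, grad)
    return beta


# Hypothetical trial: randomised treatment t, covariate z, true t-by-z interaction.
rng = np.random.default_rng(0)
n = 2000
z = rng.normal(size=n)
t = rng.integers(0, 2, size=n)
p_true = 1.0 / (1.0 + np.exp(-(-0.5 + z + 0.8 * t * z)))
y = rng.binomial(1, p_true).astype(float)

# Adjusted model that omits the interaction term.
X = np.column_stack([np.ones(n), t, z])
mu_hat = 1.0 / (1.0 + np.exp(-X @ fit_logistic(X, y)))
for arm in (0, 1):
    # Observed and mean fitted outcome agree within each arm (up to rounding).
    print(arm, y[t == arm].mean(), mu_hat[t == arm].mean())
```

The same balance does not follow from the score equations of a non-canonical link, whose weights differ across observations.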

【12】 Nonparametric, tuning-free estimation of S-shaped functions

Authors: Oliver Y. Feng, Yining Chen, Qiyang Han, Raymond J. Carroll, Richard J. Samworth
Affiliations: Statistical Laboratory, University of Cambridge; Department of Statistics, London School of Economics and Political Science; Department of Statistics, Rutgers University; Department of Statistics, Texas A&M University
Comments: 79 pages, 10 figures
Link: https://arxiv.org/abs/2107.07257
Abstract: We consider the nonparametric estimation of an S-shaped regression function. The least squares estimator provides a very natural, tuning-free approach, but results in a non-convex optimisation problem, since the inflection point is unknown. We show that the estimator may nevertheless be regarded as a projection onto a finite union of convex cones, which allows us to propose a mixed primal-dual bases algorithm for its efficient, sequential computation. After developing a projection framework that demonstrates the consistency and robustness to misspecification of the estimator, our main theoretical results provide sharp oracle inequalities that yield worst-case and adaptive risk bounds for the estimation of the regression function, as well as a rate of convergence for the estimation of the inflection point. These results reveal not only that the estimator achieves the minimax optimal rate of convergence for both the estimation of the regression function and its inflection point (up to a logarithmic factor in the latter case), but also that it is able to achieve an almost-parametric rate when the true regression function is piecewise affine with not too many affine pieces.
Simulations and a real data application to air pollution modelling also confirm the desirable finite-sample properties of the estimator, and our algorithm is implemented in the R package Sshaped.

【13】 Statistical modeling of corneal OCT speckle. A distributional model-free approach

Authors: Marcela Niemczyk, D. Robert Iskander
Link: https://arxiv.org/abs/2107.07256
Abstract: In biomedical optics, it is often of interest to statistically model the amplitude of the speckle using distributional models whose parameters act as biomarkers. In this paper, a paradigm shift is advocated in which a distributional model-free approach is used. Specifically, a range of distances, evaluated in different domains, between an empirical nonparametric distribution of the normalized speckle amplitude sample and the benchmark Rayleigh distribution is considered. Using OCT images from phantoms, two ex-vivo experiments with porcine corneas and an in-vivo experiment with human corneas, evidence is provided that the distributional model-free approach, despite its simplicity, could lead to better results than the best-fitted (among a range of considered models) distributional model. In conclusion, in practice the distributional model-free approach should be considered the first choice for speckle modeling before a distribution-based approach is utilized.
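A sketch of one such distance, assuming the Kolmogorov-Smirnov distance in the CDF domain and normalization of the amplitudes to unit mean square, so that the matching Rayleigh CDF is F(a) = 1 - exp(-a^2). Which distances, domains, and normalization the paper actually uses is not stated in the abstract; these choices and the function name `ks_to_rayleigh` are assumptions.

```python
import numpy as np


def ks_to_rayleigh(amplitudes):
    """KS distance between the empirical CDF of the normalized amplitudes
    and the Rayleigh CDF with unit mean square."""
    a = np.sort(np.asarray(amplitudes, dtype=float))
    a = a / np.sqrt(np.mean(a ** 2))           # normalize so that mean(A^2) = 1
    n = len(a)
    F = 1.0 - np.exp(-a ** 2)                  # Rayleigh CDF with 2 * sigma^2 = 1
    above = np.arange(1, n + 1) / n - F        # ECDF just after each point minus F
    below = F - np.arange(0, n) / n            # F minus ECDF just before each point
    return float(max(above.max(), below.max()))
```

Because of the normalization, the distance does not depend on the Rayleigh scale, so it can be compared across images with different overall intensity.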

【14】 Estimation of spatially varying parameters with application to hyperbolic SPDEs

Author: David Angwenyi
Affiliation: Masinde Muliro University of Science and Technology
Funding: This work was funded by the DAAD and the Kenya National Research Fund (NRF)
Link: https://arxiv.org/abs/2107.07246
Abstract: More often than not, we encounter problems with varying parameters as opposed to static ones. In this paper, we treat the estimation of parameters that vary with space. We use the Metropolis-Hastings algorithm as a selection criterion for the maximum filter likelihood. Comparisons are made with joint estimation of both the spatially varying parameters and the state. We illustrate the procedures employed in this paper by means of two hyperbolic SPDEs: the advection equation and the wave equation. The Metropolis-Hastings procedure yields better estimates.

【15】 The Information Projection in Moment Inequality Models: Existence, Dual Representation, and Approximation

Author: Rami V. Tabri
Link: https://arxiv.org/abs/2107.07140
Abstract: This paper presents new existence, dual representation and approximation results for the information projection in the infinite-dimensional setting for moment inequality models. These results are established under a general specification of the moment inequality model, nesting both conditional and unconditional models, and allowing for an infinite number of such inequalities. An important innovation of the paper is the exhibition of the dual variable as a weak vector-valued integral to formulate an approximation scheme of the $I$-projection's equivalent Fenchel dual problem. In particular, it is shown under suitable assumptions that the dual problem's optimum value can be approximated by the values of finite-dimensional programs, and that, in addition, every accumulation point of a sequence of optimal solutions for the approximating programs is an optimal solution for the dual problem. This paper illustrates the verification of assumptions and the construction of the approximation scheme's parameters for the cases of unconditional and conditional first-order stochastic dominance constraints.

【16】 Principal component analysis for Gaussian process posteriors

Authors: Hideaki Ishibashi, Shotaro Akaho
Affiliations: Kyushu Institute of Technology; National Institute of Advanced Industrial Science and Technology; RIKEN AIP
Link: https://arxiv.org/abs/2107.07115
Abstract: This paper proposes an extension of principal component analysis to Gaussian process posteriors, denoted GP-PCA. Since GP-PCA estimates a low-dimensional space of GP posteriors, it can be used for meta-learning, which is a framework for improving the precision of a new task by estimating a structure of a set of tasks. The issue is how to define a structure of a set of GPs with an infinite-dimensional parameter, such as a coordinate system and a divergence. In this study, we reduce the infiniteness of GPs to the finite-dimensional case under the information geometrical framework by considering a space of GP posteriors that share the same prior. In addition, we propose an approximation method of GP-PCA based on variational inference and demonstrate the effectiveness of GP-PCA as meta-learning through experiments.

【17】 Hida-Matérn Kernel

Authors: Matthew Dowling, Piotr Sokół, Il Memming Park
Affiliations: Department of Neurobiology and Behavior; Department of Electrical and Computer Engineering; Institute for Advanced Computational Sciences; Institute of AI-driven Discovery and Innovation; Stony Brook University, Stony Brook, NY, USA
Link: https://arxiv.org/abs/2107.07098
Abstract: We present the class of Hida-Matérn kernels, which is the canonical family of covariance functions over the entire space of stationary Gauss-Markov processes. It extends Matérn kernels by allowing for flexible construction of priors over processes with oscillatory components. Any stationary kernel, including the widely used squared-exponential and spectral mixture kernels, is either directly within this class or is an appropriate asymptotic limit, demonstrating the generality of this class. Taking advantage of its Markovian nature, we show how to represent such processes as state space models using only the kernel and its derivatives. In turn, this allows us to perform Gaussian process inference more efficiently and sidestep the usual computational burdens. We also show how exploiting special properties of the state space representation enables improved numerical stability in addition to further reductions of computational complexity.
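Hida-Matérn kernels generalise the Matérn family, whose half-integer members are the textbook example of a Gaussian process written as a linear state-space model. A sketch for the Matérn-3/2 case only (not the Hida-Matérn construction itself), checking numerically that the state-space covariance H expm(F tau) P_inf H^T reproduces the kernel sigma^2 (1 + lambda tau) exp(-lambda tau) with lambda = sqrt(3)/lengthscale. The small Taylor-series matrix exponential stands in for a library routine such as `scipy.linalg.expm`; all function names are hypothetical.

```python
import numpy as np


def expm_taylor(A, terms=30, squarings=8):
    """Matrix exponential by scaling and squaring with a truncated Taylor series
    (adequate for the small, well-scaled matrices used here)."""
    A = np.asarray(A, dtype=float) / 2.0 ** squarings
    E = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        E = E + term
    for _ in range(squarings):
        E = E @ E
    return E


def matern32_state_space(sigma2, lengthscale):
    """Drift matrix F, stationary state covariance P_inf, and observation row H."""
    lam = np.sqrt(3.0) / lengthscale
    F = np.array([[0.0, 1.0], [-lam ** 2, -2.0 * lam]])
    P_inf = np.diag([sigma2, lam ** 2 * sigma2])
    H = np.array([[1.0, 0.0]])                  # observe the function value only
    return F, P_inf, H


def matern32_kernel(tau, sigma2, lengthscale):
    r = np.sqrt(3.0) * abs(tau) / lengthscale
    return sigma2 * (1.0 + r) * np.exp(-r)


def state_space_kernel(tau, F, P_inf, H):
    """Covariance implied by the state-space model, k(tau) = H expm(F tau) P_inf H^T, tau >= 0."""
    return float((H @ expm_taylor(F * tau) @ P_inf @ H.T).item())
```

Once the process is in this form, inference over n time points can be done with a Kalman filter and smoother at cost linear in n, rather than the cubic cost of a dense GP solve.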

【18】 Entropic Inequality Constraints from e-separation Relations in Directed Acyclic Graphs with Hidden Variables

Authors: Noam Finkelstein, Beata Zjawin, Elie Wolfe, Ilya Shpitser, Robert W. Spekkens
Affiliations: Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA; Perimeter Institute for Theoretical Physics, Waterloo, Ontario, Canada
Comments: 15 pages. This arXiv version is slightly updated relative to the version in the UAI proceedings (Theorem 5 and Proposition 8 have been strengthened, with Appendix C revised correspondingly; Appendix D has been added).
Link: https://arxiv.org/abs/2107.07087
Abstract: Directed acyclic graphs (DAGs) with hidden variables are often used to characterize causal relations between variables in a system. When some variables are unobserved, DAGs imply a notoriously complicated set of constraints on the distribution of observed variables. In this work, we present entropic inequality constraints that are implied by $e$-separation relations in hidden variable DAGs with discrete observed variables. The constraints can intuitively be understood to follow from the fact that the capacity of variables along a causal pathway to convey information is restricted by their entropy; e.g., in the extreme case, a variable with entropy $0$ can convey no information. We show how these constraints can be used to learn about the true causal model from an observed data distribution. In addition, we propose a measure of causal influence called the minimal mediary entropy, and demonstrate that it can augment traditional measures such as the average causal effect.
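The intuition in the abstract can be checked in the simplest case: for a chain X -> Z -> Y, the data processing inequality gives I(X;Y) <= I(X;Z) <= H(Z), so a binary mediator carries at most one bit and a constant mediator carries none. A sketch with illustrative probability tables (entropies in bits); this is the elementary bound behind the intuition, not the paper's e-separation constraints, and the function names are hypothetical.

```python
import numpy as np


def entropy_bits(p):
    """Shannon entropy in bits of a (joint) probability table."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]                                  # convention: 0 log 0 = 0
    return float(-(p * np.log2(p)).sum())


def mutual_information_bits(p_xy):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for a joint table with rows x, columns y."""
    return (entropy_bits(p_xy.sum(axis=1)) + entropy_bits(p_xy.sum(axis=0))
            - entropy_bits(p_xy))


def chain_joint(p_x, p_z_given_x, p_y_given_z):
    """Joint p(x, y) and marginal p(z) for the chain X -> Z -> Y."""
    p_xz = p_x[:, None] * p_z_given_x             # p(x, z)
    return p_xz @ p_y_given_z, p_xz.sum(axis=0)   # sum_z p(x, z) p(y | z), and p(z)


# Illustrative numbers: binary X and Z, ternary Y.
p_x = np.array([0.3, 0.7])
p_z_given_x = np.array([[0.9, 0.1], [0.2, 0.8]])            # rows: x, columns: z
p_y_given_z = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])  # rows: z, columns: y
p_xy, p_z = chain_joint(p_x, p_z_given_x, p_y_given_z)
```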

【19】 Independence weights for causal inference with continuous exposures

Authors: Jared D. Huling, Noah Greifer, Guanhua Chen
Affiliations: Division of Biostatistics, University of Minnesota; Department of Mental Health, Johns Hopkins Bloomberg School of Public Health; Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison
Link: https://arxiv.org/abs/2107.07086
Abstract: Studying causal effects of continuous exposures is important for gaining a deeper understanding of many interventions, policies, or medications, yet researchers are often left with observational studies for doing so. In the observational setting, confounding is a barrier to estimation of causal effects. Weighting approaches seek to control for confounding by reweighting samples so that confounders are comparable across different values of the exposure, yet for continuous exposures, weighting methods are highly sensitive to model misspecification. In this paper we elucidate the key property that makes weights effective in estimating causal quantities involving continuous exposures. We show that to eliminate confounding, weights should make exposure and confounders independent on the weighted scale. We develop a measure that characterizes the degree to which a set of weights induces such independence. Further, we propose a new model-free method for weight estimation by optimizing our measure. We study the theoretical properties of our measure and our weights, and prove that our weights can explicitly mitigate exposure-confounder dependence.
The empirical effectiveness of our approach is demonstrated in a suite of challenging numerical experiments, where we find that our weights are quite robust and work well under a broad range of settings.
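To make "independent on the weighted scale" concrete, one can measure the dependence between exposure and confounder under the weighted empirical distribution. This is a minimal sketch that assumes weighted distance covariance as that measure (a natural choice, but the abstract does not name the paper's measure). It uses hand-picked weights on a toy sample and does not estimate weights by optimization.

```python
def weighted_dcov2(xs, ys, w):
    """Squared distance covariance of the weighted empirical distribution of (x, y).

    It is zero exactly when the weighted joint distribution factorizes into
    the product of its weighted marginals.
    """
    total = sum(w)
    w = [wi / total for wi in w]  # normalize so the weights sum to 1
    n = len(xs)
    a = [[abs(xs[i] - xs[j]) for j in range(n)] for i in range(n)]
    b = [[abs(ys[i] - ys[j]) for j in range(n)] for i in range(n)]
    pairs = [(i, j) for i in range(n) for j in range(n)]
    t1 = sum(w[i] * w[j] * a[i][j] * b[i][j] for i, j in pairs)  # E|X-X'||Y-Y'|
    t2 = (sum(w[i] * w[j] * a[i][j] for i, j in pairs)
          * sum(w[i] * w[j] * b[i][j] for i, j in pairs))         # E|X-X'| E|Y-Y'|
    t3 = sum(w[i] * sum(w[j] * a[i][j] for j in range(n))
                  * sum(w[k] * b[i][k] for k in range(n))
             for i in range(n))                                    # E|X-X'||Y-Y''|
    return t1 + t2 - 2 * t3

# Toy binary exposure and confounder: dependent under equal weights...
exposure = [0, 0, 1, 1, 1]
confounder = [0, 1, 0, 1, 1]
print(weighted_dcov2(exposure, confounder, [1] * 5))              # about 0.0064
# ...but down-weighting the duplicated (1, 1) unit makes them independent.
print(weighted_dcov2(exposure, confounder, [1, 1, 1, 0.5, 0.5]))  # 0.0
```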

【20】 A new class of conditional Markov jump processes with regime switching and path dependence: properties and maximum likelihood estimation

Authors: Budhi Surya Comments: 28 pages, 3 figures Link: https://arxiv.org/abs/2107.07026 Abstract: This paper develops a new class of conditional Markov jump processes with regime switching and path dependence. The key novel feature of the developed process lies in its ability to switch the transition rate as it moves from one state to another, with a switching probability that depends on the current state and time of the process as well as on its past trajectory. As such, the transition from the current state to another depends on the holding time of the process in that state. Distributional properties of the process are given explicitly in terms of the speed regimes represented by a finite number of different transition matrices, the probabilities of selecting regime membership within each state, and the past realization of the process. In particular, it has a stochastic representation that is distributionally equivalent to a general mixture of Markov jump processes introduced in Frydman and Surya (2020). Maximum likelihood estimates (MLE) of the distribution parameters of the process are derived in closed form. The estimation is done iteratively using the EM algorithm. The Akaike information criterion is used to assess the goodness-of-fit of the selected model. An explicit observed Fisher information matrix of the MLE is derived for the calculation of standard errors of the MLE. The information matrix takes a simplified form of the general matrix formula of Louis (1982). Large sample properties of the MLE are presented.
In particular, the covariance matrix for the MLE of transition rates is equal to the Cramér-Rao lower bound, and is less for the MLE of regime membership. The simulation study confirms these findings and shows that the parameter estimates are accurate, consistent, and have asymptotic normality as the sample size increases.
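For orientation, the single-regime special case is standard: for a fully observed continuous-time Markov chain, the MLE of the rate from state i to state j is the number of observed i → j jumps divided by the total time spent in i. The paper's model mixes several such generators and is fitted with EM; this minimal sketch covers only the one-generator case, on a made-up path.

```python
from collections import defaultdict

def ctmc_rate_mle(path):
    """MLE of transition rates q[i][j] = N_ij / T_i for a fully observed path.

    `path` is a time-ordered list of (state, holding_time) pairs; the last
    holding time may be censored (no jump is observed out of that state).
    """
    counts = defaultdict(lambda: defaultdict(int))  # N_ij: observed i -> j jumps
    time_in = defaultdict(float)                    # T_i: total time spent in i
    for k, (state, hold) in enumerate(path):
        time_in[state] += hold
        if k + 1 < len(path):
            counts[state][path[k + 1][0]] += 1
    return {i: {j: n / time_in[i] for j, n in row.items()}
            for i, row in counts.items()}

print(ctmc_rate_mle([("A", 2.0), ("B", 1.0), ("A", 2.0), ("C", 4.0)]))
```

State C never jumps in this path, so it gets no estimated outgoing rates.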

【21】 Temporally Local Maximum Likelihood with Application to SIS Model

Authors: Christian Gourieroux, Joann Jasiak Link: https://arxiv.org/abs/2107.06971 Abstract: Parametric estimators applied on a rolling basis are commonly used in the analysis of time series with nonlinear features, such as structural change due to time-varying parameters and local trends. This paper examines the properties of rolling estimators in the class of Temporally Local Maximum Likelihood (TLML) estimators. We study the TLML estimators of constant parameters, of stochastic and stationary parameters, and of parameters with Ultra Long Run (ULR) dynamics, which bridge the gap between the constant and stochastic parameters. Moreover, we explore the properties of TLML estimators in an application to the Susceptible-Infected-Susceptible (SIS) epidemiological model and illustrate their finite-sample performance in a simulation study.
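A rolling estimator re-fits a fixed-parameter model on a moving window, so its estimates can track slowly varying parameters. This is a minimal sketch that assumes a Gaussian model and a flat (equal-weight) trailing window; the TLML class studied in the paper may use other local weightings, and the series below is invented.

```python
def rolling_gaussian_mle(series, window):
    """Fit N(mu, sigma^2) by maximum likelihood on each trailing window.

    Returns (mu_hat, sigma2_hat) for windows ending at t = window-1, ..., n-1.
    The Gaussian MLE of the variance divides by the window length, not window-1.
    """
    estimates = []
    for end in range(window, len(series) + 1):
        chunk = series[end - window:end]
        mu = sum(chunk) / window
        sigma2 = sum((x - mu) ** 2 for x in chunk) / window
        estimates.append((mu, sigma2))
    return estimates

# A level shift from 0 to 10: the rolling mean moves over one window length.
for mu, sigma2 in rolling_gaussian_mle([0.0, 0.0, 0.0, 10.0, 10.0, 10.0], window=3):
    print(round(mu, 3), round(sigma2, 3))
```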

【22】 On the early solution path of best subset selection

Authors: Ziwei Zhu, Shihao Wu Affiliations: Department of Statistics, University of Michigan Link: https://arxiv.org/abs/2107.06939 Abstract: The early solution path, which tracks the first few variables that enter the model of a selection procedure, is of profound importance to scientific discovery. In practice, it is often statistically unattainable to identify all the important features with no false discovery, let alone the intimidating expense of experiments to test their significance. Such a realistic limitation calls for statistical guarantees on the early discoveries of a model selector, to navigate scientific exploration in the sea of big data. In this paper, we focus on the early solution path of best subset selection (BSS), where the sparsity constraint is set to be lower than the true sparsity. Under a sparse high-dimensional linear model, we establish the sufficient and (near) necessary condition for BSS to achieve sure early selection, or equivalently, zero false discovery throughout its entire early path. Essentially, this condition boils down to a lower bound on the minimum projected signal margin, which characterizes the fundamental gap in signal capturing between sure selection models and those with spurious discoveries. Because it is defined through projection operators, this margin is independent of the restricted eigenvalues of the design, suggesting the robustness of BSS against collinearity.
On the numerical side, we choose CoSaMP (Compressive Sampling Matching Pursuit) to approximate the BSS solutions, and we show that the resulting early path exhibits a much lower false discovery rate (FDR) than LASSO, MCP and SCAD, especially in the presence of highly correlated designs. Finally, we apply CoSaMP to perform preliminary feature screening for the knockoff filter to enhance its power.
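Best subset selection at sparsity s picks the s columns whose least-squares fit has the smallest residual sum of squares; the early path is this choice for s below the true sparsity. This minimal sketch does it by brute force over all subsets, which is only feasible for a handful of predictors (the paper instead approximates BSS with CoSaMP, not shown here). The orthogonal toy design is invented for illustration.

```python
from itertools import combinations

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def rss(X, y, cols):
    """Residual sum of squares of the least-squares fit of y on the columns `cols`."""
    Xs = [[row[j] for j in cols] for row in X]
    s = len(cols)
    gram = [[sum(r[a] * r[b] for r in Xs) for b in range(s)] for a in range(s)]
    rhs = [sum(r[a] * yi for r, yi in zip(Xs, y)) for a in range(s)]
    beta = solve(gram, rhs)
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, r))) ** 2 for r, yi in zip(Xs, y))

def best_subset_path(X, y, max_size):
    """Exhaustive best subset selection for sparsity levels 1..max_size."""
    p = len(X[0])
    return [min(combinations(range(p), s), key=lambda cols: rss(X, y, cols))
            for s in range(1, max_size + 1)]

# Orthogonal columns; y = 3 * column 0 + 2 * column 2 (noiseless).
X = [[1, 1, 1], [1, -1, 1], [1, 1, -1], [1, -1, -1]]
y = [5, 5, 1, 1]
print(best_subset_path(X, y, 2))  # [(0,), (0, 2)]
```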

【23】 A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification

Authors: Anastasios N. Angelopoulos, Stephen Bates Comments: Blog and tutorial video this http URL Link: https://arxiv.org/abs/2107.07511 Abstract: Black-box machine learning methods are now routinely used in high-risk settings, like medical diagnostics, which demand uncertainty quantification to avoid consequential model failures. Distribution-free uncertainty quantification (distribution-free UQ) is a user-friendly paradigm for creating statistically rigorous confidence intervals/sets for such predictions. Critically, the intervals/sets are valid without distributional assumptions or model assumptions, with explicit guarantees with finitely many datapoints. Moreover, they adapt to the difficulty of the input; when the input example is difficult, the uncertainty intervals/sets are large, signaling that the model might be wrong. Without much work, one can use distribution-free methods on any underlying algorithm, such as a neural network, to produce confidence sets guaranteed to contain the ground truth with a user-specified probability, such as 90%. Indeed, the methods are easy to understand and general, applying to many modern prediction problems arising in the fields of computer vision, natural language processing, deep reinforcement learning, and so on. This hands-on introduction is aimed at a reader interested in the practical implementation of distribution-free UQ, including conformal prediction and related methods, who is not necessarily a statistician.
We will include many explanatory illustrations, examples, and code samples in Python, with PyTorch syntax. The goal is to provide the reader a working understanding of distribution-free UQ, allowing them to put confidence intervals on their algorithms, with one self-contained document.
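The simplest instance of the methods this tutorial covers is split conformal prediction with absolute-residual scores. This sketch is plain Python rather than the tutorial's PyTorch code; `predict` stands for any already-trained point predictor and is a hypothetical placeholder here.

```python
import math

def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of the calibration scores."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the order statistic to use
    if k > n:
        return math.inf  # too few calibration points: the interval is unbounded
    return sorted(scores)[k - 1]

def split_conformal_interval(predict, calib_x, calib_y, x_new, alpha=0.1):
    """Prediction interval for y at x_new around a fitted point predictor.

    Scores are absolute residuals on a held-out calibration set; if calibration
    and test points are exchangeable, the interval covers the true y with
    probability at least 1 - alpha.
    """
    scores = [abs(y - predict(x)) for x, y in zip(calib_x, calib_y)]
    q = conformal_quantile(scores, alpha)
    center = predict(x_new)
    return center - q, center + q

predict = lambda x: 2.0 * x                            # hypothetical trained model
calib_x = list(range(19))
calib_y = [2.0 * x + (x + 1) for x in calib_x]         # residuals 1, 2, ..., 19
print(split_conformal_interval(predict, calib_x, calib_y, 5.0, alpha=0.1))  # (-8.0, 28.0)
```

With 19 calibration points and alpha = 0.1, the threshold is the 18th smallest score, so the interval is the prediction plus or minus 18.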

【24】 Clustering of heterogeneous populations of networks

Authors: Jean-Gabriel Young, Alec Kirkley, M. E. J. Newman Affiliations: Department of Computer Science, University of Vermont, Burlington VT, USA; Vermont Complex Systems Center, University of Vermont, Burlington VT, USA; Department of Physics, University of Michigan, Ann Arbor MI, USA Comments: 12 pages, 3 figures Link: https://arxiv.org/abs/2107.07489 Abstract: Statistical methods for reconstructing networks from repeated measurements typically assume that all measurements are generated from the same underlying network structure. This need not be the case, however. People's social networks might be different on weekdays and weekends, for instance. Brain networks may differ between healthy patients and those with dementia or other conditions. Here we describe a Bayesian analysis framework for such data that allows for the fact that network measurements may reflect multiple possible structures. We define a finite mixture model of the measurement process and derive a fast Gibbs sampling procedure that samples exactly from the full posterior distribution of model parameters. The end result is a clustering of the measured networks into groups with similar structure. We demonstrate the method on both real and synthetic network populations.

【25】 Newton-LESS: Sparsification without Trade-offs for the Sketched Newton Update

Authors: Michał Dereziński, Jonathan Lacotte, Mert Pilanci, Michael W. Mahoney Link: https://arxiv.org/abs/2107.07480 Abstract: In second-order optimization, a potential bottleneck can be computing the Hessian matrix of the optimized function at every iteration. Randomized sketching has emerged as a powerful technique for constructing estimates of the Hessian which can be used to perform approximate Newton steps. This involves multiplication by a random sketching matrix, which introduces a trade-off between the computational cost of sketching and the convergence rate of the optimization algorithm. A theoretically desirable but practically much too expensive choice is to use a dense Gaussian sketching matrix, which produces unbiased estimates of the exact Newton step and offers strong problem-independent convergence guarantees. We show that the Gaussian sketching matrix can be drastically sparsified, significantly reducing the computational cost of sketching, without substantially affecting its convergence properties. This approach, called Newton-LESS, is based on a recently introduced sketching technique: LEverage Score Sparsified (LESS) embeddings. We prove that Newton-LESS enjoys nearly the same problem-independent local convergence rate as Gaussian embeddings, not just up to constant factors but even down to lower-order terms, for a large class of optimization tasks. In particular, this leads to a new state-of-the-art convergence result for an iterative least squares solver.
Finally, we extend LESS embeddings to include uniformly sparsified random sign matrices which can be implemented efficiently and which perform well in numerical experiments.
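The last sentence above mentions uniformly sparsified random sign matrices. This is a minimal sketch that assumes a simple i.i.d. construction (each entry is nonzero with probability `density` and scaled so that E[S^T S] = I); it is not the leverage-score-based LESS embedding itself. The Monte Carlo check only illustrates that the sketched least-squares Hessian (SA)^T(SA) is unbiased for A^T A; a sketched Newton step would then solve with this matrix in place of the exact Hessian.

```python
import random

def sparse_sign_sketch(m, n, density, rng):
    """m x n sketch with i.i.d. entries +-1/sqrt(m*density) w.p. density, else 0.

    This scaling gives E[S^T S] = I, so (S A)^T (S A) is unbiased for A^T A.
    """
    scale = 1.0 / (m * density) ** 0.5
    return [[rng.choice((-scale, scale)) if rng.random() < density else 0.0
             for _ in range(n)] for _ in range(m)]

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def transpose(P):
    return [list(col) for col in zip(*P)]

def sketched_hessian(A, m, density, rng):
    """Sketched least-squares Hessian (S A)^T (S A) from an m-row sparse sign sketch."""
    SA = matmul(sparse_sign_sketch(m, len(A), density, rng), A)
    return matmul(transpose(SA), SA)

rng = random.Random(0)
A = [[1.0, i / 20] for i in range(20)]   # toy 20 x 2 design
exact = matmul(transpose(A), A)
trials = 3000
avg = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(trials):
    H = sketched_hessian(A, m=10, density=0.2, rng=rng)
    for i in range(2):
        for j in range(2):
            avg[i][j] += H[i][j] / trials
print(exact)
print(avg)  # close to `exact` on average, though each single sketch is noisy
```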

【26】 Shifts: A Dataset of Real Distributional Shift Across Multiple Large-Scale Tasks

Authors: Andrey Malinin, Neil Band, German Chesnokov, Yarin Gal, Mark J. F. Gales, Alexey Noskov, Andrey Ploskonosov, Liudmila Prokhorenkova, Ivan Provilkov, Vatsal Raina, Vyas Raina, Mariya Shmatova, Panos Tigas, Boris Yangel Affiliations: HSE University; Moscow Institute of Physics and Technology; University of Cambridge; University of Oxford; Alan Turing Institute Link: https://arxiv.org/abs/2107.07455 Abstract: There has been significant research on developing methods for improving robustness to distributional shift and uncertainty estimation. In contrast, only limited work has examined developing standard datasets and benchmarks for assessing these approaches. Additionally, most work on uncertainty estimation and robustness has developed new techniques based on small-scale regression or image classification tasks. However, many tasks of practical interest have different modalities, such as tabular data, audio, text, or sensor data, which offer significant challenges involving regression and discrete or continuous structured prediction. Thus, given the current state of the field, a standardized large-scale dataset of tasks across a range of modalities affected by distributional shifts is necessary. This will enable researchers to meaningfully evaluate the plethora of recently developed uncertainty quantification methods, as well as assessment criteria and state-of-the-art baselines. In this work, we propose the Shifts Dataset for evaluation of uncertainty estimates and robustness to distributional shift.
The dataset, which has been collected from industrial sources and services, is composed of three tasks, each corresponding to a particular data modality: tabular weather prediction, machine translation, and self-driving car (SDC) vehicle motion prediction. All of these data modalities and tasks are affected by real, "in-the-wild" distributional shifts and pose interesting challenges with respect to uncertainty estimation. In this work we provide a description of the dataset and baseline results for all tasks.

【27】 Hierarchical graph neural nets can capture long-range interactions

Authors: Ladislav Rampášek, Guy Wolf Affiliations: Université de Montréal, Dept. of Math. & Stat.; Mila - Quebec AI Institute, Montreal, QC, Canada Link: https://arxiv.org/abs/2107.07432 Abstract: Graph neural networks (GNNs) based on message passing between neighboring nodes are known to be insufficient for capturing long-range interactions in graphs. In this project we study hierarchical message passing models that leverage a multi-resolution representation of a given graph. This facilitates learning of features that span large receptive fields without loss of local information, an aspect not studied in preceding work on hierarchical GNNs. We introduce Hierarchical Graph Net (HGNet), which for any two connected nodes guarantees the existence of message-passing paths of at most logarithmic length w.r.t. the input graph size. Yet, under mild assumptions, its internal hierarchy maintains an asymptotic size equivalent to that of the input graph. We observe that our HGNet outperforms conventional stacking of GCN layers, particularly in molecular property prediction benchmarks. Finally, we propose two benchmarking tasks designed to elucidate the capability of GNNs to leverage long-range interactions in graphs.
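The logarithmic path-length idea can be seen on the simplest example, a path graph, where plain neighbor-to-neighbor message passing needs n - 1 hops end to end. This minimal sketch assumes a naive pairwise coarsening (node i's parent is i // 2 at each level), not HGNet's actual hierarchy construction; with the extra parent-child edges, the two ends of a 64-node path are at most 12 hops apart instead of 63.

```python
from collections import deque

def path_with_hierarchy(n):
    """Adjacency sets for a path graph on n nodes plus a binary coarsening hierarchy.

    Level-0 nodes are (0, i). Node (l, i) is the parent of (l-1, 2i) and
    (l-1, 2i+1); consecutive nodes within each coarse level are also linked.
    """
    adj = {}
    def link(u, v):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    for i in range(n - 1):
        link((0, i), (0, i + 1))
    level, size = 0, n
    while size > 1:
        parent_size = (size + 1) // 2
        for i in range(size):
            link((level, i), (level + 1, i // 2))     # child-parent edge
        for i in range(parent_size - 1):
            link((level + 1, i), (level + 1, i + 1))  # edge within the coarse level
        level, size = level + 1, parent_size
    return adj

def hops(adj, src, dst):
    """Breadth-first-search distance (number of edges) between two nodes."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return dist[u]
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return None

print(hops(path_with_hierarchy(64), (0, 0), (0, 63)))  # at most 2 * log2(64) = 12
```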

【28】 Optimal Scoring Rule Design

Authors: Yiling Chen, Fang-Yi Yu Affiliations: Harvard University Link: https://arxiv.org/abs/2107.07420 Abstract: This paper introduces an optimization problem for proper scoring rule design. Consider a principal who wants to collect an agent's prediction about an unknown state. The agent can either report his prior prediction or access a costly signal and report the posterior prediction. Given a collection of possible distributions containing the agent's posterior prediction distribution, the principal's objective is to design a bounded scoring rule that maximizes the agent's worst-case payoff increment between reporting his posterior prediction and reporting his prior prediction. We study two settings of such optimization for proper scoring rules: static and asymptotic. In the static setting, where the agent can access one signal, we propose an efficient algorithm to compute an optimal scoring rule when the collection of distributions is finite. In the asymptotic setting, the agent can adaptively and indefinitely refine his prediction. We first consider a sequence of collections of posterior distributions with vanishing covariance, which emulates general estimators with large samples, and show the optimality of the quadratic scoring rule. Then, when the agent's posterior distribution is a Beta-Bernoulli process, we find that the log scoring rule is optimal. We also prove the optimality of the log scoring rule over a smaller set of functions for categorical distributions with Dirichlet priors.
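Properness, which the design problem above takes as a requirement, means a forecaster maximizes expected score by reporting the true belief. Below is a quick numerical check for the two rules named in the abstract (quadratic and log), in the binary case only; the grid search is for illustration and is not the paper's optimization algorithm.

```python
import math

def quadratic_score(q, outcome):
    """Quadratic (Brier-type) score of reported probability q for a binary outcome."""
    probs = (1 - q, q)
    return 2 * probs[outcome] - sum(p * p for p in probs)

def log_score(q, outcome):
    """Logarithmic score of reported probability q for a binary outcome."""
    return math.log(q if outcome == 1 else 1 - q)

def expected_score(rule, belief, report):
    """Expected score of `report` when the outcome is 1 with probability `belief`."""
    return belief * rule(report, 1) + (1 - belief) * rule(report, 0)

def best_report(rule, belief, grid_size=100):
    """Report on an interior grid of (0, 1) that maximizes the expected score."""
    grid = [i / grid_size for i in range(1, grid_size)]
    return max(grid, key=lambda q: expected_score(rule, belief, q))

for rule in (quadratic_score, log_score):
    print(rule.__name__, best_report(rule, 0.3))  # both recover the belief 0.3
```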

【29】 Efficient Möbius Transformations and their applications to Dempster-Shafer Theory: Clarification and implementation

Authors: Maxime Chaveroche, Franck Davoine, Véronique Cherfaoui Affiliations: Sorbonne University Alliance, Université de technologie de Compiègne, CNRS, Heudiasyc, Compiègne Cedex, France Comments: Extension of an article published in the proceedings of the international conference on Scalable Uncertainty Management (SUM) in 2019 Link: https://arxiv.org/abs/2107.07359 Abstract: Dempster-Shafer Theory (DST) generalizes Bayesian probability theory, offering useful additional information, but suffers from a high computational burden. A lot of work has been done to reduce the complexity of computations used in information fusion with Dempster's rule. The main approaches exploit either the structure of Boolean lattices or the information contained in belief sources. Each has its merits depending on the situation. In this paper, we propose sequences of graphs for the computation of the zeta and Möbius transformations that optimally exploit both the structure of distributive semilattices and the information contained in belief sources. We call them the Efficient Möbius Transformations (EMT). We show that the complexity of the EMT is always lower than the complexity of algorithms that consider the whole lattice, such as the Fast Möbius Transform (FMT) for all DST transformations. We then explain how to use them to fuse two belief sources. More generally, our EMTs apply to any function in any finite distributive lattice, focusing on a meet-closed or join-closed subset. This article extends our work published at the international conference on Scalable Uncertainty Management (SUM).
It clarifies it, brings some minor corrections and provides implementation details such as data structures and algorithms applied to DST.
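For reference, here is the whole-lattice baseline mentioned above: the fast zeta transform turns a mass function m into the belief function bel(A) = sum of m(B) over all B ⊆ A, and the fast Möbius transform inverts it, each in O(n 2^n) operations on a frame of n elements. This minimal sketch assumes subsets are encoded as bitmasks; the paper's EMT, which restricts the computation to the structure induced by the focal sets, is not shown.

```python
def zeta_transform(m, n):
    """Belief function bel(A) = sum of m(B) over all subsets B of A.

    Subsets of an n-element frame are bitmasks 0 .. 2**n - 1; this is the
    standard O(n * 2**n) fast zeta transform over the whole Boolean lattice.
    """
    f = list(m)
    for i in range(n):
        bit = 1 << i
        for a in range(1 << n):
            if a & bit:
                f[a] += f[a ^ bit]  # a ^ bit lacks bit i, so it is untouched this pass
    return f

def mobius_transform(f, n):
    """Inverse of zeta_transform: recovers the mass function from bel."""
    g = list(f)
    for i in range(n):
        bit = 1 << i
        for a in range(1 << n):
            if a & bit:
                g[a] -= g[a ^ bit]
    return g

# Frame {x, y, z} as bits 0, 1, 2: m({x}) = 0.5, m({y, z}) = 0.25, m(frame) = 0.25.
m = [0.0] * 8
m[0b001], m[0b110], m[0b111] = 0.5, 0.25, 0.25
bel = zeta_transform(m, 3)
print(bel[0b011], bel[0b111])  # bel({x, y}) = 0.5, bel(frame) = 1.0
```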

【30】 Copula-Based Normalizing Flows

Authors: Mike Laszkiewicz, Johannes Lederer, Asja Fischer Affiliations: Department of Mathematics Comments: Accepted for presentation at the ICML 2021 Workshop on Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models (INNF+ 2021) Link: https://arxiv.org/abs/2107.07352 Abstract: Normalizing flows, which learn a distribution by transforming the data to samples from a Gaussian base distribution, have proven to be powerful density approximators. But their expressive power is limited by this choice of the base distribution. We, therefore, propose to generalize the base distribution to a more elaborate copula distribution to capture the properties of the target distribution more accurately. In a first empirical analysis, we demonstrate that this replacement can dramatically improve vanilla normalizing flows in terms of flexibility, stability, and effectiveness for heavy-tailed data. Our results suggest that the improvements are related to an increased local Lipschitz stability of the learned flow.

【31】 Predicting the near-wall region of turbulence through convolutional neural networks

Authors: A. G. Balasubramanian, L. Guastoni, A. Güemes, A. Ianiro, S. Discetti, P. Schlatter, H. Azizpour, R. Vinuesa Affiliations: SimExFLOW, Engineering Mechanics, KTH Royal Institute of Technology, Sweden; Swedish e-Science Research Centre (SeRC), Sweden; Aerospace Engineering Research Group, Universidad Carlos III de Madrid, Spain Comments: Proc. 13th ERCOFTAC Symp. on Engineering Turbulence Modeling and Measurements (ETMM13), Rhodes, Greece, September 15-17, 2021 Link: https://arxiv.org/abs/2107.07340 Abstract: Modelling the near-wall region of wall-bounded turbulent flows is a widespread practice to reduce the computational cost of large-eddy simulations (LESs) at high Reynolds number. As a first step towards a data-driven wall model, a neural-network-based approach to predict the near-wall behaviour in a turbulent open channel flow is investigated. The fully-convolutional network (FCN) proposed by Guastoni et al. [preprint, arXiv:2006.12483] is trained to predict the two-dimensional velocity-fluctuation fields at $y^{+}_{\rm target}$, using the sampled fluctuations in wall-parallel planes located farther from the wall, at $y^{+}_{\rm input}$. The data for training and testing are obtained from a direct numerical simulation (DNS) at friction Reynolds numbers $Re_{\tau} = 180$ and $550$. The turbulent velocity-fluctuation fields are sampled at various wall-normal locations, i.e. $y^{+} = \{15, 30, 50, 80, 100, 120, 150\}$.
At $Re_{\tau}=550$, the FCN can take advantage of the self-similarity in the logarithmic region of the flow and predict the velocity-fluctuation fields at $y^{+} = 50$ using the velocity-fluctuation fields at $y^{+} = 100$ as input with less than 20% error in prediction of streamwise-fluctuations intensity. These results are an encouraging starting point to develop a neural-network based approach for modelling turbulence at the wall in numerical simulations.

【32】 Input Dependent Sparse Gaussian Processes

Authors: Bahram Jafrasteh, Carlos Villacampa-Calvo, Daniel Hernández-Lobato Affiliations: Computer Science Department, Universidad Autónoma de Madrid Link: https://arxiv.org/abs/2107.07281 Abstract: Gaussian Processes (GPs) are Bayesian models that provide uncertainty estimates associated with the predictions made. They are also very flexible due to their non-parametric nature. Nevertheless, GPs suffer from poor scalability as the number of training instances $N$ increases. More precisely, they have a cubic cost with respect to $N$. To overcome this problem, sparse GP approximations are often used, where a set of $M \ll N$ inducing points is introduced during training. The location of the inducing points is learned by considering them as parameters of an approximate posterior distribution $q$. Sparse GPs, combined with variational inference for inferring $q$, reduce the training cost of GPs to $\mathcal{O}(M^3)$. Critically, the inducing points determine the flexibility of the model and they are often located in regions of the input space where the latent function changes. A limitation is, however, that for some learning tasks a large number of inducing points may be required to obtain a good prediction performance. To address this limitation, we propose here to amortize the computation of the inducing points locations, as well as the parameters of the variational posterior approximation $q$.
For this, we use a neural network that receives the observed data as an input and outputs the inducing points locations and the parameters of $q$. We evaluate our method in several experiments, showing that it performs similarly to or better than other state-of-the-art sparse variational GP approaches. However, with our method the number of inducing points is reduced drastically due to their dependency on the input data. This makes our method scale to larger datasets and have faster training and prediction times.
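What the inducing points control can be seen in the low-rank (Nyström) approximation Q = K_nm K_mm^{-1} K_mn of the full N x N kernel matrix, which only requires solving with an M x M matrix. This is a minimal sketch assuming an RBF kernel and 1-D inputs; it is not the paper's amortized network that predicts inducing locations. When the inducing points coincide with the training inputs the approximation is exact, and with fewer of them the diagonal (prior variances) can only be underestimated.

```python
import math

def rbf(x, y, lengthscale=1.0):
    """Squared-exponential kernel on scalars."""
    return math.exp(-((x - y) ** 2) / (2 * lengthscale ** 2))

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def nystrom(train, inducing, kernel=rbf):
    """Low-rank approximation Q = K_nm K_mm^{-1} K_mn used by inducing-point GPs."""
    K_mm = [[kernel(u, v) for v in inducing] for u in inducing]
    k_m = [[kernel(u, x) for u in inducing] for x in train]  # k_m[i] = k_m(x_i)
    alphas = [solve(K_mm, col) for col in k_m]               # K_mm^{-1} k_m(x_j)
    n = len(train)
    return [[sum(a * b for a, b in zip(k_m[i], alphas[j])) for j in range(n)]
            for i in range(n)]

train = [0.0, 1.0, 2.0]
K = [[rbf(x, y) for y in train] for x in train]
print(nystrom(train, [1.0]))  # one inducing point: a rank-1 approximation of K
```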

【33】 On the expressivity of bi-Lipschitz normalizing flows

Authors: Alexandre Verine, Benjamin Negrevergne, Fabrice Rossi, Yann Chevaleyre Affiliations: Université Paris-Dauphine, PSL Research University, France Link: https://arxiv.org/abs/2107.07232 Abstract: An invertible function is bi-Lipschitz if both the function and its inverse have bounded Lipschitz constants. Nowadays, most Normalizing Flows are bi-Lipschitz by design or by training to limit numerical errors (among other things). In this paper, we discuss the expressivity of bi-Lipschitz Normalizing Flows and identify several target distributions that are difficult to approximate using such models. Then, we characterize the expressivity of bi-Lipschitz Normalizing Flows by giving several lower bounds on the Total Variation distance between these particularly unfavorable distributions and their best possible approximation. Finally, we discuss potential remedies, which include using more complex latent distributions.

【34】 Determinantal Point Processes in the Flat Limit

Authors: Simon Barthelmé, Nicolas Tremblay, Konstantin Usevich, Pierre-Olivier Amblard Affiliations: CNRS, Univ. Grenoble Alpes, Grenoble INP, GIPSA-lab; Université de Lorraine and CNRS, CRAN (Centre de Recherche en Automatique en Nancy) Comments: Most of this material first appeared in arXiv:2007.04117, which has been split into two. The presentation has been simplified and some material is new. Link: https://arxiv.org/abs/2107.07213 Abstract: Determinantal point processes (DPPs) are repulsive point processes where the interaction between points depends on the determinant of a positive semi-definite matrix. In this paper, we study the limiting process of L-ensembles based on kernel matrices, when the kernel function becomes flat (so that every point interacts with every other point, in a sense). We show that these limiting processes are best described in the formalism of extended L-ensembles and partial projection DPPs, and the exact limit depends mostly on the smoothness of the kernel function. In some cases, the limiting process is even universal, meaning that it does not depend on specifics of the kernel function, but only on its degree of smoothness. Since flat-limit DPPs are still repulsive processes, this implies that practically useful families of DPPs exist that do not require a spatial length-scale parameter.
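For reference, an L-ensemble assigns each subset S the probability det(L_S) / det(L + I). This is a minimal sketch assuming a Gaussian kernel on a few invented 1-D points: the probabilities sum to one, and if the lengthscale is simply sent to infinity at a fixed scale, L tends to the all-ones matrix and nearly all mass falls on sets of size at most one. That degeneracy is one way to see why the flat limit needs care; the paper's extended L-ensembles and partial projection DPPs are not implemented here.

```python
import math
from itertools import combinations

def det(M):
    """Determinant by Gaussian elimination with partial pivoting (1.0 for a 0x0 matrix)."""
    A = [row[:] for row in M]
    n, d = len(A), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        if A[p][c] == 0:
            return 0.0
        if p != c:
            A[c], A[p] = A[p], A[c]
            d = -d
        d *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= f * A[c][k]
    return d

def l_ensemble_probs(points, lengthscale):
    """P(S) = det(L_S) / det(L + I) for every subset S, with a Gaussian kernel L."""
    n = len(points)
    L = [[math.exp(-(x - y) ** 2 / (2 * lengthscale ** 2)) for y in points] for x in points]
    norm = det([[L[i][j] + (i == j) for j in range(n)] for i in range(n)])
    return {S: det([[L[i][j] for j in S] for i in S]) / norm
            for size in range(n + 1) for S in combinations(range(n), size)}

for ell in (1.0, 1e3):
    probs = l_ensemble_probs([0.0, 1.0, 2.0], lengthscale=ell)
    print(ell, sum(p for S, p in probs.items() if len(S) >= 2))  # mass on sets of size >= 2
```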

【35】 Decentralized Bayesian Learning with Metropolis-Adjusted Hamiltonian Monte Carlo

Authors: Vyacheslav Kungurtsev, Adam Cobb, Tara Javidi, Brian Jalaian Affiliations: Department of Computer Science, Czech Technical University in Prague; SRI International; Electrical and Computer Engineering, University of California, San Diego; DEVCOM Army Research Laboratory Link: https://arxiv.org/abs/2107.07211 Abstract: Federated learning performed by a decentralized network of agents is becoming increasingly important with the prevalence of embedded software on autonomous devices. Bayesian approaches to learning benefit from offering more information as to the uncertainty of a random quantity, and Langevin and Hamiltonian methods are effective at realizing sampling from an uncertain distribution with large parameter dimensions. Such methods have only recently appeared in the decentralized setting, and either exclusively use stochastic gradient Langevin and Hamiltonian Monte Carlo approaches, which require a diminishing step size to asymptotically sample from the posterior and are known in practice to characterize uncertainty less faithfully than constant step-size methods with a Metropolis adjustment, or assume strong convexity properties of the potential function. We present the first approach to incorporating constant step-size Metropolis-adjusted HMC in the decentralized sampling framework, show theoretical guarantees for consensus and probability distance to the posterior stationary distribution, and demonstrate their effectiveness numerically on standard real-world problems, including decentralized learning of neural networks, which is known to be highly non-convex.

【36】 Credit scoring using neural networks and SURE posterior probability calibration

Authors: Matthieu Garcin, Samuel Stéphan
Affiliations: Léonard de Vinci Pôle Universitaire, Research Center, Paris La Défense; SAMM, Université Paris Panthéon-Sorbonne, rue de Tolbiac, Paris cedex, France
Comments: 22 pages
Link: https://arxiv.org/abs/2107.07206
Abstract: In this article we compare the performance of logistic regression and a feed-forward neural network for credit scoring. Our results show that logistic regression gives quite good results on the dataset, and the neural network can improve performance slightly. We also consider different sets of features in order to assess their importance in terms of prediction accuracy. We found that temporal features (i.e., repeated measures over time) can be an important source of information and increase overall model accuracy. Finally, we introduce a new technique for calibrating predicted probabilities based on Stein's unbiased risk estimate (SURE). This calibration technique can be applied to very general calibration functions. In particular, we detail the method for the sigmoid function as well as for the Kumaraswamy function, which includes the identity as a particular case. We show that stacking the SURE calibration technique with the classical Platt method can improve the calibration of predicted probabilities.
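The abstract names two calibration maps: the sigmoid (as used in Platt scaling) and the Kumaraswamy function. The Kumaraswamy CDF, $F(p) = 1 - (1 - p^a)^b$, reduces to the identity when $a = b = 1$. The sketch below only writes out these maps plus one hypothetical way of composing them. Fitting their parameters with SURE, which is the paper's contribution, is not shown. The parameter names and the `stacked` helper are illustrative assumptions.

```python
import math

def platt(score, a, b):
    """Platt-style sigmoid calibration of a raw score, computed stably."""
    z = a * score + b
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def kumaraswamy(p, a, b):
    """Kumaraswamy CDF on [0, 1]; equals the identity when a = b = 1."""
    return 1.0 - (1.0 - p ** a) ** b

def stacked(score, platt_params, kuma_params):
    # Hypothetical stacking: Platt sigmoid first, then a Kumaraswamy recalibration
    return kumaraswamy(platt(score, *platt_params), *kuma_params)
```

Both maps are monotone and send probabilities into $[0, 1]$, which is what makes them usable as calibration functions. The Kumaraswamy family adds flexibility in the tails while still containing "no recalibration" as a special case.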

【37】 Lockout: Sparse Regularization of Neural Networks

Authors: Gilmer Valdes, Wilmer Arbelo, Yannet Interian, Jerome H. Friedman
Affiliations: Department of Radiation Oncology, Department of Epidemiology and Biostatistics, University of California San Francisco, CA, USA; M.S. in Data Science Program, University of San Francisco, San Francisco, CA, USA; Department of Statistics
Link: https://arxiv.org/abs/2107.07160
Abstract: Many regression and classification procedures fit a parameterized function $f(x;w)$ of predictor variables $x$ to data $\{x_i, y_i\}_{i=1}^N$ based on some loss criterion $L(y,f)$. Often, regularization is applied to improve accuracy by placing a constraint $P(w)\leq t$ on the values of the parameters $w$. Efficient methods exist for finding solutions to these constrained optimization problems for all values of $t\geq0$ in the special case when $f$ is a linear function. However, none are available when $f$ is nonlinear (e.g., neural networks). Here we present a fast algorithm that provides all such solutions for any differentiable function $f$ and loss $L$, and any constraint $P$ that is a monotonically increasing function of the absolute value of each parameter. Applications involving sparsity-inducing regularization of arbitrary neural networks are discussed. Empirical results indicate that these sparse solutions are usually superior to their dense counterparts in both accuracy and interpretability. This improvement in accuracy can often make neural networks competitive with, and sometimes superior to, state-of-the-art methods in the analysis of tabular data.
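To make the constrained problem $\min_w L(y, f(x;w))$ subject to $P(w) \le t$ concrete, the following sketch covers only the simple linear case. It uses squared-error loss and the L1 constraint $P(w) = \sum_j |w_j|$, and solves one value of $t$ at a time with projected gradient descent. This is a swapped-in baseline, not the Lockout algorithm, which computes the whole path for nonlinear $f$. The function names and iteration settings are assumptions.

```python
import numpy as np

def project_l1_ball(w, t):
    """Euclidean projection of w onto {v : ||v||_1 <= t} (sort-based method)."""
    if t <= 0:
        return np.zeros_like(w)
    if np.abs(w).sum() <= t:
        return w
    u = np.sort(np.abs(w))[::-1]               # magnitudes in decreasing order
    css = np.cumsum(u)
    k = np.arange(1, len(u) + 1)
    rho = np.nonzero(u * k > css - t)[0][-1]   # last index kept above the threshold
    theta = (css[rho] - t) / (rho + 1.0)       # soft-threshold level
    return np.sign(w) * np.maximum(np.abs(w) - theta, 0.0)

def constrained_least_squares(X, y, t, lr=None, n_iter=2000):
    """Projected gradient descent for min ||y - Xw||^2 / (2n) s.t. ||w||_1 <= t."""
    n, d = X.shape
    if lr is None:
        # Step size 1/L, where L = ||X||_2^2 / n is the gradient's Lipschitz constant
        lr = n / np.linalg.norm(X, 2) ** 2
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        w = project_l1_ball(w - lr * grad, t)
    return w
```

Sweeping `t` over a grid traces an approximate regularization path. Small `t` yields sparse solutions with exact zeros, and large `t` recovers ordinary least squares.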

【38】 Hybrid Bayesian Neural Networks with Functional Probabilistic Layers

Authors: Daniel T. Chang
Link: https://arxiv.org/abs/2107.07014
Abstract: Bayesian neural networks provide a direct and natural way to extend standard deep neural networks to support probabilistic deep learning through probabilistic layers that traditionally encode weight (and bias) uncertainty. In particular, hybrid Bayesian neural networks combine standard deterministic layers with a few probabilistic layers judiciously positioned in the network for uncertainty estimation. A major aspect and benefit of Bayesian inference is that priors, in principle, provide the means to encode prior knowledge for use in inference and prediction. However, it is difficult to specify priors on weights, since weights have no intuitive interpretation. Further, the relationship between priors on weights and the functions computed by networks is difficult to characterize. In contrast, functions are intuitive to interpret and are direct, since they map inputs to outputs. It is therefore natural to specify priors on functions to encode prior knowledge, and to use them in function-based inference and prediction. To support this, we propose hybrid Bayesian neural networks with functional probabilistic layers that encode function (and activation) uncertainty. We discuss their foundations in functional Bayesian inference, functional variational inference, sparse Gaussian processes, and sparse variational Gaussian processes. We further perform a few proof-of-concept experiments using GPflus, a new library that provides Gaussian process layers and supports their use with deterministic Keras layers to form hybrid neural network and Gaussian process models.
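The argument above is that priors are easier to state over functions than over weights. A Gaussian process prior is the standard example. The following is a minimal NumPy sketch that draws whole functions from a zero-mean GP prior with a squared-exponential kernel. It does not use GPflus or Keras. The kernel choice, hyperparameters, and jitter value are assumptions.

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance k(x, x') = s^2 exp(-(x - x')^2 / (2 l^2))."""
    sq_dist = (x1[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale ** 2)

def sample_gp_prior(x, n_samples=3, lengthscale=1.0, variance=1.0, seed=0):
    """Draw functions f ~ GP(0, k) evaluated at the 1-D inputs x."""
    rng = np.random.default_rng(seed)
    K = rbf_kernel(x, x, lengthscale, variance)
    # A small diagonal jitter keeps the Cholesky factorization numerically stable
    L = np.linalg.cholesky(K + 1e-6 * np.eye(len(x)))
    z = rng.standard_normal((len(x), n_samples))
    return (L @ z).T  # shape (n_samples, len(x))
```

Here the lengthscale directly encodes prior knowledge about how smooth the function is, which has no simple counterpart for a prior on individual weights.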

【39】 Generalized Covariance Estimator

Authors: Christian Gourieroux, Joann Jasiak
Affiliations: University of Toronto; Toulouse School of Economics and CREST; York University
Link: https://arxiv.org/abs/2107.06979
Abstract: We consider a class of semi-parametric dynamic models with strong white noise errors. This class of processes includes the standard Vector Autoregressive (VAR) model, the nonfundamental structural VAR, the mixed causal-noncausal models, and nonlinear dynamic models such as the (multivariate) ARCH-M model. To estimate processes in this class, we propose the Generalized Covariance (GCov) estimator as an alternative to the Generalized Method of Moments. It is obtained by minimizing a residual-based multivariate portmanteau statistic. We derive the asymptotic properties of the GCov estimator and of the associated residual-based portmanteau statistic. Moreover, we show that the GCov estimators are semi-parametrically efficient and that the residual-based portmanteau statistics are asymptotically chi-square distributed. The finite-sample performance of the GCov estimator is illustrated in a simulation study. The estimator is also applied to a dynamic model of cryptocurrency prices.
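To illustrate "minimizing a residual-based multivariate portmanteau statistic", the following sketch fits a univariate AR(1) model, $y_t = \phi y_{t-1} + u_t$. It stacks the residuals and their squares as nonlinear transforms and minimizes $\sum_{h=1}^{H} \mathrm{Tr}[\hat\Gamma(h)\hat\Gamma(0)^{-1}\hat\Gamma(h)'\hat\Gamma(0)^{-1}]$ over a grid of $\phi$. The AR(1) model, the transforms $(u, u^2)$, the lag count $H$, the grid search, and the centered-exponential simulation are all assumptions for illustration. This is a sketch of the idea, not the authors' implementation.

```python
import numpy as np

def autocov(a, h):
    """Sample autocovariance Gamma(h) = Cov(a_{t+h}, a_t) of a (T, K) array."""
    a = a - a.mean(axis=0)
    T = a.shape[0]
    return a[h:].T @ a[:T - h] / T

def gcov_objective(phi, y, H=3):
    """Portmanteau-type objective on transformed AR(1) residuals."""
    u = y[1:] - phi * y[:-1]                 # residuals u_t(phi)
    a = np.column_stack([u, u ** 2])         # assumed nonlinear transforms
    g0_inv = np.linalg.inv(autocov(a, 0))
    total = 0.0
    for h in range(1, H + 1):
        g = autocov(a, h)
        total += np.trace(g @ g0_inv @ g.T @ g0_inv)
    return total

def gcov_estimate(y, H=3, grid=np.linspace(-0.95, 0.95, 381)):
    """Grid-search minimizer of the objective (a crude stand-in for an optimizer)."""
    values = [gcov_objective(phi, y, H) for phi in grid]
    return grid[int(np.argmin(values))]

def simulate_ar1(phi, T, seed=0):
    """Causal AR(1) with centered exponential (non-Gaussian) errors."""
    rng = np.random.default_rng(seed)
    eps = rng.exponential(1.0, T) - 1.0
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = phi * y[t - 1] + eps[t]
    return y
```

At the true parameter, the residuals are close to i.i.d. and every lagged autocovariance of the transforms is near zero. Any other value leaves serial dependence that the objective penalizes.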

【40】 Performance of Bayesian linear regression in a model with mismatch

Authors: Jean Barbier, Wei-Kuo Chen, Dmitry Panchenko, Manuel Sáenz
Link: https://arxiv.org/abs/2107.06936
Abstract: For a model of high-dimensional linear regression with random design, we analyze the performance of an estimator given by the mean of a log-concave Bayesian posterior distribution with a Gaussian prior. The model is mismatched in the following sense: like the model assumed by the statistician, the label-generating process is linear in the input data, but both the classifier's ground-truth prior and the Gaussian noise variance are unknown to her. This inference model can be rephrased as a version of the Gardner model in spin glasses. Using the cavity method, we provide fixed-point equations for various overlap order parameters. In particular, these yield an expression for the mean-square reconstruction error on the classifier (under an assumption of uniqueness of solutions). As a direct corollary, we obtain an expression for the free energy. Similar models have already been studied by Shcherbina and Tirozzi and by Talagrand, but our arguments are more straightforward and some assumptions are relaxed. An interesting consequence of our analysis is that in the random-design setting of ridge regression, the performance of the posterior mean is independent of the noise variance (or "temperature") assumed by the statistician. It also matches that of the usual (zero-temperature) ridge estimator.
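The link between a Gaussian-prior posterior mean and ridge regression can be checked directly in the fully Gaussian, matched case. This is the textbook conjugate identity, not the paper's mismatched analysis. With $w \sim \mathcal{N}(0, \tau^2 I)$ and $y \mid w \sim \mathcal{N}(Xw, \sigma^2 I)$, the posterior mean is $(X^\top X + (\sigma^2/\tau^2) I)^{-1} X^\top y$, i.e., ridge with $\lambda = \sigma^2/\tau^2$. The function names below are illustrative.

```python
import numpy as np

def posterior_mean(X, y, prior_var, noise_var):
    """Posterior mean of w for y = Xw + noise, w ~ N(0, prior_var I), noise ~ N(0, noise_var I)."""
    d = X.shape[1]
    precision = X.T @ X / noise_var + np.eye(d) / prior_var   # posterior precision
    return np.linalg.solve(precision, X.T @ y / noise_var)

def ridge(X, y, lam):
    """Ridge estimator argmin ||y - Xw||^2 + lam ||w||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
```

Multiplying the posterior precision equation by `noise_var` turns it into the ridge normal equations, which is why the two functions agree when `lam = noise_var / prior_var`.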

【41】 Feature Shift Detection: Localizing Which Features Have Shifted via Conditional Distribution Tests

Authors: Sean Kulinski, Saurabh Bagchi, David I. Inouye
Affiliations: School of Electrical and Computer Engineering, Purdue University
Link: https://arxiv.org/abs/2107.06929
Abstract: Previous distribution shift detection approaches can identify whether a shift has occurred, but they cannot localize which specific features caused it. Localization is a critical step in diagnosing or fixing any underlying issue. For example, in military sensor networks, users will want to detect when one or more sensors have been compromised and, critically, to know which specific sensors might be compromised. We therefore first formalize this problem as multiple conditional distribution hypothesis tests and propose both non-parametric and parametric statistical tests. For both efficiency and flexibility, we then propose a test statistic based on the density model's score function (i.e., the gradient with respect to the input). This statistic makes it easy to compute test statistics for all dimensions in a single forward and backward pass. Any density model could be used to compute the necessary statistics, including deep density models such as normalizing flows or autoregressive models. We additionally develop methods for identifying when and where a shift occurs in multivariate time-series data. We show results for multiple scenarios using realistic attack models on both simulated and real-world data.
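Why can the score function localize a shift? Component $j$ of $\nabla_x \log p(x)$ equals $\partial_{x_j} \log p(x_j \mid x_{-j})$, because the marginal of the other features does not depend on $x_j$. A per-feature comparison of scores is therefore a comparison of conditionals. The following sketch assumes multivariate Gaussian density models in place of the deep models the abstract mentions. It uses a mean squared score difference as the statistic. Both choices are assumptions, and the code is not the authors' exact test.

```python
import numpy as np

def fit_gaussian(X):
    """Fit a multivariate Gaussian density model: (mean, precision matrix)."""
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    return mu, np.linalg.inv(cov)

def score(X, model):
    """Score function grad_x log p(x) = -P (x - mu), evaluated row by row."""
    mu, prec = model
    return -(X - mu) @ prec        # prec is symmetric, so this is row-wise P (x - mu)

def feature_shift_statistics(X_ref, X_query):
    """Per-feature statistic: mean squared difference of score components.

    Component j depends only on the conditional p(x_j | x_-j), so a large
    value flags feature j as a likely shifted feature.
    """
    p_ref = fit_gaussian(X_ref)
    p_query = fit_gaussian(X_query)
    X_all = np.vstack([X_ref, X_query])
    diff = score(X_all, p_ref) - score(X_all, p_query)
    return (diff ** 2).mean(axis=0)
```

A single vectorized pass yields one statistic per feature. This mirrors the abstract's point that all dimensions can be scored at once. In practice, a threshold for flagging a feature would be calibrated, for example by bootstrapping under no shift.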

This article is part of the Tencent Cloud self-media synchronization program and is shared from a WeChat official account.

Originally published: 2021-07-16. For infringement concerns, contact cloudcommunity@tencent.com to request removal.
