
Statistics arXiv Daily Digest [7.8]

Author: 公众号-arXiv每日学术速递
Published 2021-07-27 10:28:13
This article is included in the column: arXiv每日学术速递

stat (Statistics): 42 papers in total

【1】 Robust Variable Selection and Estimation Via Adaptive Elastic Net S-Estimators for Linear Regression

Authors: David Kepplinger
Affiliation: Department of Statistics, School of Computing, George Mason University
Link: https://arxiv.org/abs/2107.03325
Abstract: Heavy-tailed error distributions and predictors with anomalous values are ubiquitous in high-dimensional regression problems and can seriously jeopardize the validity of statistical analyses if not properly addressed. For more reliable estimation under these adverse conditions, we propose a new robust regularized estimator for simultaneous variable selection and coefficient estimation. This estimator, called adaptive PENSE, possesses the oracle property without prior knowledge of the scale of the residuals and without any moment conditions on the error distribution. The proposed estimator gives reliable results even under very heavy-tailed error distributions and aberrant contamination in the predictors or residuals. Importantly, even in these challenging settings variable selection by adaptive PENSE remains stable. Numerical studies on simulated and real data sets highlight superior finite-sample performance in a vast range of settings compared to other robust regularized estimators in the case of contaminated samples, and competitiveness compared to classical regularized estimators in clean samples.

【2】 An algorithmic view of \ell_2 regularization and some path-following algorithms

Authors: Yunzhang Zhu, Renxiong Liu
Affiliation: Department of Statistics, The Ohio State University, Columbus, OH, USA
Editor: Rina Foygel Barber
Comments: 62 pages, 7 figures
Link: https://arxiv.org/abs/2107.03322
Abstract: We establish an equivalence between the $\ell_2$-regularized solution path for a convex loss function and the solution of an ordinary differential equation (ODE). Importantly, this equivalence reveals that the solution path can be viewed as the flow of a hybrid of gradient descent and Newton method applied to the empirical loss, which is similar to a widely used optimization technique called the trust region method. This provides an interesting algorithmic view of $\ell_2$ regularization, in contrast to the conventional view that the $\ell_2$-regularized solution path is similar to the gradient flow of the empirical loss. New path-following algorithms based on homotopy methods and numerical ODE solvers are proposed to numerically approximate the solution path. In particular, we consider the Newton method and the gradient descent method, respectively, as the basis algorithm for the homotopy method, and establish their approximation error rates over the solution path. Importantly, our theory suggests novel schemes to choose grid points that guarantee an arbitrarily small suboptimality for the solution path. In terms of computational cost, we prove that in order to achieve an $\epsilon$-suboptimality for the entire solution path, the number of Newton steps required for the Newton method is $\mathcal{O}(\epsilon^{-1/2})$, while the number of gradient steps required for the gradient descent method is $\mathcal{O}\left(\epsilon^{-1} \ln(\epsilon^{-1})\right)$. Finally, we use $\ell_2$-regularized logistic regression as an illustrating example to demonstrate the effectiveness of the proposed path-following algorithms.
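
The homotopy idea above fits in a few lines: sweep a grid of regularization levels and warm-start Newton's method at each grid point. The toy quadratic loss and all names below are our own illustration of the scheme, not the paper's algorithm.

```python
def ridge_path(grad, hess, lambdas, x0=0.0, newton_steps=1):
    """Follow the l2-regularized path x(lam) = argmin f(x) + lam/2 * x^2
    by warm-starting Newton's method at each grid point (homotopy)."""
    path = []
    x = x0
    for lam in lambdas:
        for _ in range(newton_steps):
            g = grad(x) + lam * x   # gradient of the regularized objective
            h = hess(x) + lam       # Hessian of the regularized objective
            x = x - g / h           # Newton step
        path.append(x)
    return path

# Toy empirical loss f(x) = (x - 1)^2 / 2; the regularized minimizer is 1/(1+lam).
grad = lambda x: x - 1.0
hess = lambda x: 1.0
lams = [10.0, 5.0, 2.0, 1.0, 0.5, 0.1]
path = ridge_path(grad, hess, lams)
```

For a quadratic loss one Newton step per grid point recovers x(lam) = 1/(1+lam) exactly; for general convex losses the paper's error rates govern how fine the lambda-grid must be.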

【3】 Assessing the forensic value of DNA evidence from Y chromosomes and mitogenomes

Authors: Mikkel M Andersen, David J Balding
Affiliations: Department of Mathematical Sciences, Aalborg University, Aalborg; Section of Forensic Genetics, Department of Forensic Medicine, University of Copenhagen, Copenhagen, Denmark; Melbourne Integrative Genomics, University of Melbourne, Melbourne
Link: https://arxiv.org/abs/2107.03289
Abstract: Y-chromosomal and mitochondrial DNA profiles have been used as evidence in courts for decades, yet the problem of evaluating the weight of evidence has not been adequately resolved. Both are lineage markers (inherited from just one parent), which presents different interpretation challenges compared with standard autosomal DNA profiles (inherited from both parents), for which recombination increases profile diversity and weakens the effects of relatedness. We review approaches to the evaluation of lineage marker profiles for forensic identification, focussing on the key roles of profile mutation rate and relatedness. Higher mutation rates imply fewer individuals matching the profile of an alleged contributor, but they will be more closely related. This makes it challenging to evaluate the possibility that one of these matching individuals could be the true source, because relatedness may make them more plausible alternative contributors than less-related individuals, and they may not be well mixed in the population. These issues reduce the usefulness of profile databases drawn from a broad population: the larger the population, the lower the profile relative frequency because of lower relatedness with the alleged contributor. Many evaluation methods do not adequately take account of relatedness, but its effects have become more pronounced with the latest generation of high-mutation-rate Y profiles.

【4】 MD-split+: Practical Local Conformal Inference in High Dimensions

Authors: Benjamin LeRoy, David Zhao (equal contribution)
Affiliation: Department of Statistics and Data Science, Carnegie Mellon University
Comments: Appearing in ICML 2021 workshop on distribution-free uncertainty quantification
Link: https://arxiv.org/abs/2107.03280
Abstract: Quantifying uncertainty in model predictions is a common goal for practitioners seeking more than just point predictions. One tool for uncertainty quantification that requires minimal assumptions is conformal inference, which can help create probabilistically valid prediction regions for black box models. Classical conformal prediction only provides marginal validity, whereas in many situations locally valid prediction regions are desirable. Deciding how best to partition the feature space X when applying localized conformal prediction is still an open question. We present MD-split+, a practical local conformal approach that creates X partitions based on localized model performance of conditional density estimation models. Our method handles complex real-world data settings where such models may be misspecified, and scales to high-dimensional inputs. We discuss how our local partitions philosophically align with expected behavior from an unattainable conditional conformal inference approach. We also empirically compare our method against other local conformal approaches.
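
For context, the marginal baseline that MD-split+ improves on, classical split conformal prediction, can be sketched in pure Python. The constant-mean "model" and the synthetic data are our own stand-ins; the paper's X-partitioning is not reproduced here.

```python
import math
import random
import statistics

def split_conformal(train, calib, alpha=0.1):
    """Classical split conformal: fit a point predictor on `train`, then use
    calibration residuals to form a marginally valid prediction interval."""
    mu = statistics.fmean(y for _, y in train)       # trivial model: constant mean
    scores = sorted(abs(y - mu) for _, y in calib)   # nonconformity scores
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))             # conformal quantile index
    q = scores[min(k, n) - 1]
    return lambda x: (mu - q, mu + q)                # same interval for every x

random.seed(0)
data = [(x, random.gauss(0.0, 1.0)) for x in range(2000)]
predict = split_conformal(data[:1000], data[1000:1500], alpha=0.1)
lo, hi = predict(0.0)
coverage = sum(lo <= y <= hi for _, y in data[1500:]) / 500
```

On held-out data the empirical coverage lands near the nominal 90%, but the interval is the same everywhere in X, which is exactly the marginal-only limitation the abstract describes.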

【5】 Patient-reported outcomes in the context of the benefit assessment in Germany

Authors: Sarah Böhme, Christoph Gerlinger, Susanne Huschens, Annett Kucka, Niclas Kürschner, Friedhelm Leverkus, Michael Schlichting, Waldemar Siemens, Kati Sternberg, Liping Hofmann-Xu (authors in alphabetical order; each contributed different sections of the manuscript)
Affiliation: University Medical School of Saarland, Germany
Comments: 46 pages, 3 figures, 5 tables
Link: https://arxiv.org/abs/2107.03249
Abstract: Since the 2011 Act on the Reform of the Market for Medicinal Products, benefit dossiers are submitted by pharmaceutical companies to facilitate the Health Technology Assessment (HTA) appraisals in Germany. The Institute for Quality and Efficiency in Health Care conducts the added benefit assessment following their General Methods Paper, which was updated November 5, 2020. This White Paper is dedicated to patient-reported outcomes (PRO) to highlight their importance for the added benefit assessment. We focus on methodological aspects but also consider other relevant requirements and challenges demanded by G-BA and IQWiG. The following topics will be presented and discussed:
1. The role of PRO in HTA decision making, exemplified by the benefit assessment in Germany
2. Guidances for PRO evaluations
3. The PRO estimand framework
4. Perception of and requirements for PRO within the German benefit assessment
5. Validity of instruments
6. Response thresholds for assessing the clinical relevance of PRO
7. PRO endpoints / outcome measures / operationalization
8. Missing PRO data
9. PRO after treatment discontinuation
This White Paper aims to provide deeper insights into the new requirements concerning PRO evaluations for HTA decision making in Germany, highlight points to consider that should inform global development in terms of study planning, and frame the requirements in the context of global recommendations and guidelines. We also aim to enhance the understanding of the complexity involved in preparing the benefit dossier and to promote further scientific discussions where appropriate.

【6】 Bounded support in linear random coefficient models: Identification and variable selection

Authors: Philipp Hermann, Hajo Holzmann
Affiliation: Department of Mathematics and Computer Science, Philipps-Universität Marburg
Link: https://arxiv.org/abs/2107.03245
Abstract: We consider linear random coefficient regression models, where the regressors are allowed to have a finite support. First, we investigate identifiability, and show that the means and the variances and covariances of the random coefficients are identified from the first two conditional moments of the response given the covariates if the support of the covariates, excluding the intercept, contains a Cartesian product with at least three points in each coordinate. Next, we show the variable selection consistency of the adaptive LASSO for the variances and covariances of the random coefficients in finite and moderately high dimensions. This implies that the estimated covariance matrix will actually be positive semidefinite and hence a valid covariance matrix, in contrast to the estimate arising from a simple least squares fit. We illustrate the proposed method in a simulation study.

【7】 Coastal water quality prediction based on machine learning with feature interpretation and spatio-temporal analysis

Authors: Luka Grbčić, Siniša Družeta, Goran Mauša, Tomislav Lipić, Darija Vukić Lušić, Marta Alvir, Ivana Lučin, Ante Sikirica, Davor Davidović, Vanja Travaš, Daniela Kalafatović, Kristina Pikelj, Hana Fajković, Lado Kranjčević
Affiliations: Department of Fluid Mechanics and Computational Engineering, Faculty of Engineering, University of Rijeka, Vukovarska, Rijeka, Croatia; Department of Computer Engineering, University of Rijeka; Center for Advanced Computing and Modelling, University of Rijeka, Radmile Matejčić
Link: https://arxiv.org/abs/2107.03230
Abstract: Coastal water quality management is a public health concern, as poor coastal water quality can harbor pathogens that are dangerous to human health. Tourism-oriented countries need to actively monitor the condition of coastal water at popular tourist sites during the summer season. In this study, routine monitoring data of $Escherichia\ coli$ and enterococci across 15 public beaches in the city of Rijeka, Croatia, were used to build machine learning models for predicting their levels based on environmental parameters, as well as to investigate their relationships with environmental stressors. Gradient Boosting (Catboost, Xgboost), Random Forests, Support Vector Regression and Artificial Neural Networks were trained with measurements from all sampling sites and used to predict $E.\ coli$ and enterococci values based on environmental features. Evaluation of stability and generalizability with 10-fold cross-validation showed that the Catboost algorithm performed best, with R$^2$ values of 0.71 and 0.68 for predicting $E.\ coli$ and enterococci, respectively, compared to the other evaluated ML algorithms. We also used the SHapley Additive exPlanations technique to identify and interpret which features have the most predictive power. The results show that measured site salinity is the most important feature for forecasting both $E.\ coli$ and enterococci levels. Finally, the spatial and temporal accuracy of both ML models was examined at sites with the lowest coastal water quality. The spatial $E.\ coli$ and enterococci models achieved strong R$^2$ values of 0.85 and 0.83, while the temporal models achieved R$^2$ values of 0.74 and 0.67. The temporal model also achieved moderate R$^2$ values of 0.44 and 0.46 at a site with high coastal water quality.
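
The 10-fold cross-validated R² used to compare the models is easy to state in code; the toy linear data and the least-squares "model" below are our own stand-ins for the study's environmental features and ML algorithms.

```python
import random

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def cv_r2(xs, ys, fit, k=10):
    """k-fold cross-validated R^2: train `fit` on k-1 folds, score the held-out fold."""
    idx = list(range(len(xs)))
    random.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held = set(fold)
        train = [i for i in idx if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(r2_score([ys[i] for i in fold], [model(xs[i]) for i in fold]))
    return sum(scores) / k

def fit_line(x, y):
    """Least-squares line, standing in for the study's ML models."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return lambda t: a + b * t

random.seed(1)
xs = [random.uniform(0, 10) for _ in range(300)]
ys = [2.0 * x + 1.0 + random.gauss(0.0, 1.0) for x in xs]
score = cv_r2(xs, ys, fit_line)
```

With a strong linear signal the cross-validated R² sits near its population value (about 0.97 here), mirroring how the study reports the 0.71/0.68 model comparison.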

【8】 Combined Global and Local Search for Optimization with Gaussian Process Models

Authors: Qun Meng, Songhao Wang, Szu Hui Ng
Affiliations: Department of Industrial Systems Engineering and Management, National University of Singapore, Singapore; Department of Information Systems and Management Engineering, Southern University of Science and Technology
Link: https://arxiv.org/abs/2107.03217
Abstract: Gaussian process (GP) model based optimization is widely applied in simulation and machine learning. In general, it first estimates a GP model based on a few observations from the true response and then employs this model to guide the search, aiming to quickly locate the global optimum. Despite its successful applications, it has several limitations that may hinder its broader usage. First, building an accurate GP model can be difficult and computationally expensive, especially when the response function is multi-modal or varies significantly over the design space. Second, even with an appropriate model, the search process can be trapped in suboptimal regions before moving to the global optimum due to the excessive effort spent around the current best solution. In this work, we adopt the Additive Global and Local GP (AGLGP) model in the optimization framework. The model is rooted in the inducing-points-based GP sparse approximations and is combined with independent local models in different regions. With these properties, the AGLGP model is suitable for multi-modal responses with relatively large data sizes. Based on this AGLGP model, we propose a Combined Global and Local search for Optimization (CGLO) algorithm. It first divides the whole design space into disjoint local regions and identifies a promising region with the global model. Next, a local model in the selected region is fit to guide detailed search within this region. The algorithm then switches back to the global step when a good local solution is found. The global and local natures of CGLO enable it to enjoy the benefits of both global and local search to efficiently locate the global optimum.

【9】 A Closed-Form Approximation to the Conjugate Prior of the Dirichlet and Beta Distributions

Authors: Kaspar Thommen
Link: https://arxiv.org/abs/2107.03183
Abstract: We derive the conjugate prior of the Dirichlet and beta distributions and explore it with numerical examples to gain an intuitive understanding of the distribution itself, its hyperparameters, and conditions concerning its convergence. Due to the prior's intractability, we proceed to define and analyze a closed-form approximation. Finally, we provide an algorithm implementing this approximation that enables fully tractable Bayesian conjugate treatment of Dirichlet and beta likelihoods without the need for Monte Carlo simulations.
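
For the Beta case, the conjugate prior follows from the likelihood's exponential-family form and can be written down up to its normalizing constant, whose intractability is what motivates the paper's closed-form approximation. The parameterization and hyperparameter names below are a standard textbook construction, not necessarily the paper's notation.

```python
import math

def log_beta_fn(a, b):
    """log B(a, b) computed stably via lgamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_conjugate_prior(a, b, s1, s2, nu):
    """Unnormalized log-density of the conjugate prior over Beta parameters (a, b):
    p(a, b) is proportional to exp((a-1)*s1 + (b-1)*s2 - nu*log B(a, b)).
    The hyperparameters act like accumulated sufficient statistics sum(log x_i),
    sum(log(1-x_i)) and a pseudo-sample-size nu; the normalizer is intractable."""
    return (a - 1.0) * s1 + (b - 1.0) * s2 - nu * log_beta_fn(a, b)

# Conjugacy means the posterior update is plain hyperparameter addition:
xs = [0.2, 0.5, 0.7]
s1 = sum(math.log(x) for x in xs)
s2 = sum(math.log(1.0 - x) for x in xs)
post = lambda a, b: log_conjugate_prior(a, b, -1.0 + s1, -1.0 + s2, 2.0 + len(xs))
```

Adding the Beta log-likelihood of the observations to the prior's log-density reproduces the updated-hyperparameter density exactly, which is the conjugacy property the paper builds on.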

【10】 Distance covariance for random fields

Authors: Muneya Matsui, Thomas Mikosch, Rasool Roozegar, Laleh Tafakori
Comments: 34 pages, 6 figures
Link: https://arxiv.org/abs/2107.03162
Abstract: We study an independence test based on distance correlation for random fields $(X,Y)$. We consider the situations when $(X,Y)$ is observed on a lattice with equidistant grid sizes and when $(X,Y)$ is observed at random locations. We provide asymptotic theory for the sample distance correlation in both situations and show bootstrap consistency. The latter fact allows one to build a test for independence of $X$ and $Y$ based on the considered discretizations of these fields. We illustrate the performance of the bootstrap test in a simulation study involving fractional Brownian and infinite variance stable fields. The independence test is applied to Japanese meteorological data, which are observed over the entire area of Japan.
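
The sample distance correlation underlying the test is compact to implement via double-centered pairwise distance matrices; this O(n²) scalar version is a plain illustration, not the paper's lattice/random-location estimator.

```python
def distance_correlation(x, y):
    """Sample distance correlation (Szekely-Rizzo V-statistic) for scalar samples."""
    n = len(x)

    def centered(v):
        # Pairwise distance matrix, double-centered by row, column and grand means.
        d = [[abs(v[i] - v[j]) for j in range(n)] for i in range(n)]
        row = [sum(r) / n for r in d]
        grand = sum(row) / n
        return [[d[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]

    A, B = centered(x), centered(y)
    dcov2 = max(sum(A[i][j] * B[i][j] for i in range(n) for j in range(n)) / n ** 2, 0.0)
    dvar_x = sum(a * a for r in A for a in r) / n ** 2
    dvar_y = sum(b * b for r in B for b in r) / n ** 2
    denom = (dvar_x * dvar_y) ** 0.5
    return (dcov2 / denom) ** 0.5 if denom > 0 else 0.0

x = [0.1, 0.4, 0.2, 0.9, 0.5, 0.7]
dcor_linear = distance_correlation(x, [2.0 * v + 1.0 for v in x])   # exact dependence
dcor_other = distance_correlation(x, [0.3, 0.3, 0.9, 0.1, 0.8, 0.2])
```

A deterministic linear relation gives distance correlation exactly 1; independence drives it toward 0 as the sample grows, which is what the bootstrap test exploits.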

【11】 Neural Contextual Bandits without Regret

Authors: Parnian Kassraie, Andreas Krause
Affiliation: ETH Zurich
Comments: 37 pages, 6 figures
Link: https://arxiv.org/abs/2107.03144
Abstract: Contextual bandits are a rich model for sequential decision making given side information, with important applications, e.g., in recommender systems. We propose novel algorithms for contextual bandits harnessing neural networks to approximate the unknown reward function. We resolve the open problem of proving sublinear regret bounds in this setting for general context sequences, considering both fully-connected and convolutional networks. To this end, we first analyze NTK-UCB, a kernelized bandit optimization algorithm employing the Neural Tangent Kernel (NTK), and bound its regret in terms of the NTK maximum information gain $\gamma_T$, a complexity parameter capturing the difficulty of learning. Our bounds on $\gamma_T$ for the NTK may be of independent interest. We then introduce our neural network based algorithm NN-UCB, and show that its regret closely tracks that of NTK-UCB. Under broad non-parametric assumptions about the reward function, our approach converges to the optimal policy at a $\tilde{\mathcal{O}}(T^{-1/2d})$ rate, where $d$ is the dimension of the context.
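
NN-UCB requires a neural network, but the UCB acquisition principle it builds on can be shown with the classical non-contextual UCB1 rule on Bernoulli arms; everything below is a generic sketch, not the paper's method.

```python
import math
import random

def ucb1(arm_probs, horizon, seed=0):
    """UCB1 on Bernoulli arms: pull the arm maximizing mean + sqrt(2 ln t / n_a),
    i.e. an optimistic upper confidence bound on the unknown reward."""
    rng = random.Random(seed)
    k = len(arm_probs)
    counts, sums = [0] * k, [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1                     # initialization: pull each arm once
        else:
            arm = max(range(k), key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2.0 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts

counts = ucb1([0.2, 0.8], horizon=2000)
```

The better arm dominates the pull counts while the confidence bonus keeps occasional exploration alive; NTK-UCB and NN-UCB replace the per-arm mean and bonus with kernel or network estimates over contexts.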

【12】 Variable selection in convex quantile regression: L1-norm or L0-norm regularization?

Authors: Sheng Dai
Affiliation: Department of Information and Service Management, Aalto University School of Business
Link: https://arxiv.org/abs/2107.03119
Abstract: The curse of dimensionality is a recognized challenge in nonparametric estimation. This paper develops a new L0-norm regularization approach to the convex quantile and expectile regressions for subset variable selection. We show how to use mixed integer programming to solve the proposed L0-norm regularization approach in practice and build a link to the commonly used L1-norm regularization approach. A Monte Carlo study is performed to compare the finite-sample performances of the proposed L0-penalized convex quantile and expectile regression approaches with the L1-norm regularization approaches. The proposed approach is further applied to benchmark the sustainable development performance of the OECD countries and to empirically analyze the accuracy in the dimensionality reduction of variables. The results from the simulation and application illustrate that the proposed L0-norm regularization approach can more effectively address the curse of dimensionality than the L1-norm regularization approach in multidimensional spaces.
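
The paper's mixed-integer program is beyond a short sketch, but the qualitative L0-versus-L1 contrast shows up already in the orthonormal-design least-squares case, where the two penalties reduce to hard and soft thresholding of the unpenalized coefficients (a standard reduction, not the paper's quantile/expectile estimator):

```python
def hard_threshold(beta, lam):
    """L0 penalty, orthonormal design: keep coefficients with |b| > sqrt(2 * lam)."""
    t = (2.0 * lam) ** 0.5
    return [b if abs(b) > t else 0.0 for b in beta]

def soft_threshold(beta, lam):
    """L1 penalty, orthonormal design: shrink every coefficient toward zero by lam."""
    return [max(abs(b) - lam, 0.0) * (1.0 if b > 0 else -1.0) for b in beta]

ols = [3.0, 0.1, -2.0]          # unpenalized coefficients; the middle one is noise
l0 = hard_threshold(ols, 0.5)   # selects without shrinking the survivors
l1 = soft_threshold(ols, 0.5)   # selects but also biases the surviving coefficients
```

Both penalties zero out the small coefficient, but only L1 shrinks the large ones, which is one reason L0 regularization can select variables more cleanly, at the price of a combinatorial (mixed-integer) optimization problem.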

【13】 MultiColl package and other packages to detect multicollinearity in R

Authors: R. Salmerón, C. B. García, J. García
Affiliation: Department of Quantitative Methods for the Economy and Business, University of Granada (Spain)
Comments: 10 pages, 1 table, working paper
Link: https://arxiv.org/abs/2107.03077
Abstract: This work presents a guide for the use of some of the functions of the multiColl package in R for the detection of near-multicollinearity. The main contribution, in comparison with other existing packages in R or other econometric software, is the treatment of qualitative independent variables and the intercept in the simple/multiple linear regression model. The main goal of this paper is to show the advantages of the multiColl package in R, comparing its results with those obtained by other existing packages in R for the treatment of multicollinearity.
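
The standard diagnostic such packages report is the variance inflation factor (VIF); for two regressors it reduces to a single Pearson correlation. This pure-Python sketch is ours and does not mirror the multiColl API:

```python
def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def vif_two_predictors(x1, x2):
    """Variance inflation factor with two regressors: VIF = 1 / (1 - R^2),
    where R^2 is the squared correlation between the regressors."""
    return 1.0 / (1.0 - pearson_r(x1, x2) ** 2)

x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1]   # nearly collinear with x1
vif = vif_two_predictors(x1, x2)
```

A VIF far above the usual rule-of-thumb of 10 flags near-multicollinearity; the package's contribution is handling qualitative regressors and the intercept correctly in this kind of diagnostic.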

【14】 Distance correlation for long-range dependent time series

Authors: Annika Betken, Herold Dehling
Affiliations: University of Twente; Ruhr-Universität Bochum
Link: https://arxiv.org/abs/2107.03041
Abstract: We apply the concept of distance correlation for testing independence of long-range dependent time series. For this, we establish a non-central limit theorem for stochastic processes with values in an $L_2$-Hilbert space. This limit theorem is of a general theoretical interest that goes beyond the context of this article. For the purpose of this article, it provides the basis for deriving the asymptotic distribution of the distance covariance of subordinated Gaussian processes. Depending on the dependence in the data, the standardization and the limit of distance correlation vary. In any case, the limit is not feasible, so that test decisions are based on a subsampling procedure. We prove the validity of the subsampling procedure and assess the finite-sample performance of a hypothesis test based on the distance covariance. In particular, we compare its finite-sample performance to that of a test based on Pearson's sample correlation coefficient. For this purpose, we additionally establish convergence results for this dependence measure. Different dependencies between the vectors are considered. It turns out that only linear correlation is better detected by Pearson's sample correlation coefficient, while all other dependencies are better detected by distance correlation. An analysis of cross-dependencies between the mean monthly discharges of three different rivers provides an application of the theoretical results established in this article.

【15】 High dimensional precision matrix estimation under weak sparsity

Authors: Zeyu Wu, Cheng Wang, Weidong Liu
Comments: 24 pages, 5 figures
Link: https://arxiv.org/abs/2107.02999
Abstract: In this paper, we estimate the high dimensional precision matrix under the weak sparsity condition where many entries are nearly zero. We study a Lasso-type method for high dimensional precision matrix estimation and derive general error bounds under the weak sparsity condition. The common irrepresentable condition is relaxed and the results are applicable to the weakly sparse matrix. As applications, we study precision matrix estimation with the Lasso-type method for heavy-tailed data, non-paranormal data, and matrix data.
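
A basic operation for weakly sparse matrices, shrinking near-zero entries while keeping the diagonal, can be sketched as entrywise soft-thresholding; this is a generic device for intuition, not the Lasso-type precision-matrix estimator studied in the paper:

```python
def soft_threshold_matrix(S, lam):
    """Entrywise soft-thresholding of a symmetric matrix, keeping the diagonal:
    off-diagonal entries within lam of zero become exactly zero, the rest
    shrink toward zero by lam."""
    p = len(S)
    return [[S[i][j] if i == j
             else max(abs(S[i][j]) - lam, 0.0) * (1.0 if S[i][j] > 0 else -1.0)
             for j in range(p)] for i in range(p)]

S = [[1.00, 0.30, 0.02],
     [0.30, 1.00, -0.01],
     [0.02, -0.01, 1.00]]   # weakly sparse: some off-diagonal entries nearly zero
T = soft_threshold_matrix(S, 0.05)
```

The near-zero entries are set exactly to zero while the genuine signal survives (shrunk by lam), which is the behavior weak-sparsity error bounds are designed to control.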

【16】 Test for non-negligible adverse shifts

Authors: Vathy M. Kamulete
Affiliation: Enterprise Model Risk Management, Royal Bank of Canada, Toronto, Canada
Comments: 14 pages, 4 figures, preprint
Link: https://arxiv.org/abs/2107.02990
Abstract: Statistical tests for dataset shift are susceptible to false alarms: they are sensitive to minor differences where there is in fact adequate sample coverage and predictive performance. We propose instead a robust framework for tests of dataset shift based on outlier scores, D-SOS for short. D-SOS detects adverse shifts and can identify false alarms caused by benign ones. It posits that a new (test) sample is not substantively worse than an old (training) sample, and not that the two are equal. The key idea is to reduce observations to outlier scores and compare contamination rates. Beyond comparing distributions, users can define what "worse" means in terms of predictive performance and other relevant notions. We show how versatile and practical D-SOS is for a wide range of real and simulated datasets. Unlike tests of equal distribution and of goodness-of-fit, the D-SOS tests are uniquely tailored to serve as robust performance metrics to monitor model drift and dataset shift.
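
The "reduce observations to outlier scores and compare contamination rates" idea can be illustrated with a simple robust score (distance from the training median in MAD units); D-SOS uses richer scores and a proper test, so this only conveys the flavor:

```python
import statistics

def outlier_scores(train, sample):
    """Robust outlyingness: absolute deviation from the training median, in MAD units."""
    med = statistics.median(train)
    mad = statistics.median(abs(x - med) for x in train) or 1.0  # guard against MAD = 0
    return [abs(x - med) / mad for x in sample]

def contamination_rate(train, sample, threshold=3.0):
    """Fraction of `sample` scoring above `threshold`; comparing this rate between
    an old and a new sample is the gist of the D-SOS idea."""
    scores = outlier_scores(train, sample)
    return sum(s > threshold for s in scores) / len(scores)

train = [0.1 * i for i in range(-50, 51)]    # well-behaved reference sample
shifted = [x + 20.0 for x in train]          # adversely shifted new sample
rate_ok = contamination_rate(train, train)
rate_bad = contamination_rate(train, shifted)
```

A benign sample keeps the contamination rate at the training level, while an adverse shift inflates it; D-SOS turns this comparison into a one-sided test of "not substantively worse".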

【17】 When to adjust alpha during multiple testing: A consideration of disjunction, conjunction, and individual testing

Authors: Mark Rubin
Affiliation: The University of Newcastle, Australia
Comments: Synthese (2021)
Link: https://arxiv.org/abs/2107.02947
Abstract: Scientists often adjust their significance threshold (alpha level) during null hypothesis significance testing in order to take into account multiple testing and multiple comparisons. This alpha adjustment has become particularly relevant in the context of the replication crisis in science. The present article considers the conditions in which this alpha adjustment is appropriate and the conditions in which it is inappropriate. A distinction is drawn between three types of multiple testing: disjunction testing, conjunction testing, and individual testing. It is argued that alpha adjustment is only appropriate in the case of disjunction testing, in which at least one test result must be significant in order to reject the associated joint null hypothesis. Alpha adjustment is inappropriate in the case of conjunction testing, in which all relevant results must be significant in order to reject the joint null hypothesis. Alpha adjustment is also inappropriate in the case of individual testing, in which each individual result must be significant in order to reject each associated individual null hypothesis. The conditions under which each of these three types of multiple testing is warranted are examined. It is concluded that researchers should not automatically (mindlessly) assume that alpha adjustment is necessary during multiple testing. Illustrations are provided in relation to joint studywise hypotheses and joint multiway ANOVAwise hypotheses.
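
For the disjunction-testing case, where adjustment is appropriate, the usual corrections are Bonferroni and Šidák; a minimal helper (ours, for illustration):

```python
def adjusted_alpha(alpha, m, method="bonferroni"):
    """Per-test alpha for *disjunction* testing of m tests (reject the joint null
    if at least one test is significant). Under the article's argument, conjunction
    and individual testing would instead keep the unadjusted alpha."""
    if method == "bonferroni":
        return alpha / m
    if method == "sidak":
        return 1.0 - (1.0 - alpha) ** (1.0 / m)
    raise ValueError(method)
```

For example, five disjunction tests at a familywise alpha of 0.05 give a per-test threshold of 0.01 under Bonferroni and about 0.0102 under Šidák.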

【18】 Solution of Physics-based Bayesian Inverse Problems with Deep Generative Priors

Authors: Dhruv V Patel, Deep Ray, Assad A Oberai
Affiliation: Department of Aerospace and Mechanical Engineering, University of Southern California, Los Angeles, California, USA
Comments: Paper: 18 pages, 5 figures. Supplementary: 9 pages, 6 figures, 2 tables
Link: https://arxiv.org/abs/2107.02926
Abstract: Inverse problems are notoriously difficult to solve because they can have no solutions, multiple solutions, or solutions that vary significantly in response to small perturbations in measurements. Bayesian inference, which poses an inverse problem as a stochastic inference problem, addresses these difficulties and provides quantitative estimates of the inferred field and the associated uncertainty. However, it is difficult to employ when inferring vectors of large dimensions, and/or when prior information is available through previously acquired samples. In this paper, we describe how deep generative adversarial networks can be used to represent the prior distribution in Bayesian inference and overcome these challenges. We apply these ideas to inverse problems that are diverse in terms of the governing physical principles, sources of prior knowledge, type of measurement, and the extent of available information about measurement noise. In each case we apply the proposed approach to infer the most likely solution and quantitative estimates of uncertainty.

【19】 An application of time truncated single acceptance sampling inspection plan based on transmuted Rayleigh distribution 标题:基于变形瑞利分布的时间截断一次验收抽样检验方案的应用

作者:Harsh Tripathi,Mahendra Saha 机构:Department of Statistics, Central University of Rajasthan, Rajasthan, India 链接:https://arxiv.org/abs/2107.02903 摘要:本文介绍了寿命试验在给定时间截尾时,变形瑞利(TR)分布的单次验收抽样检验计划(SASIP)。针对不同的置信水平、合格判定数以及真实平均寿命与规定平均寿命之比,建立了所提出的方案,并给出了确保达到规定平均寿命所需的最小样本量。文中还给出了该方案的抽检特性(OC)值和生产方风险,并通过两个实例说明了所提出的SASIP的适用性。 摘要:In this paper, we introduce a single acceptance sampling inspection plan (SASIP) for the transmuted Rayleigh (TR) distribution when the lifetime experiment is truncated at a prefixed time. The proposed plan is established for different choices of confidence level, acceptance number, and ratio of true mean lifetime to specified mean lifetime. The minimum sample size necessary to ensure a certain specified mean lifetime is obtained. Operating characteristic (OC) values and the producer's risk of the proposed plan are presented. Two real-life examples are presented to show the applicability of the proposed SASIP.
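按时间截尾单次抽样方案的常见思路,可用变形瑞利分布的 CDF(F(x)=G(x)[1+λ(1−G(x))],G 为瑞利 CDF)计算最小样本量。下面是一段示意草稿(非论文原文;接收准则取"失效数 ≤ c 则接收"的简化形式,各参数均为假设值):

```python
import math

def tr_cdf(x, sigma, lam):
    """变形瑞利分布的 CDF:F(x) = G(x) * [1 + lam * (1 - G(x))],G 为瑞利 CDF,|lam| <= 1。"""
    g = 1.0 - math.exp(-x * x / (2.0 * sigma * sigma))
    return g * (1.0 + lam * (1.0 - g))

def binom_cdf(c, n, p):
    """P(失效数 <= c),失效数 ~ Binomial(n, p)。"""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(c + 1))

def min_sample_size(t, sigma, lam, c, conf=0.95, n_max=1000):
    """最小样本量 n:在截尾时间 t 下,使接收概率 P(失效数 <= c) <= 1 - conf。"""
    p = tr_cdf(t, sigma, lam)  # 单件产品在 t 之前失效的概率
    for n in range(c + 1, n_max + 1):
        if binom_cdf(c, n, p) <= 1.0 - conf:
            return n
    return None

# lam = 0 时退化为普通瑞利分布
print(min_sample_size(t=1.0, sigma=1.0, lam=0.0, c=0, conf=0.95))  # 6
```

原文的方案还会进一步给出 OC 曲线与生产方风险,此处从略。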

【20】 On the asymptotic distribution of the maximum sample spectral coherence of Gaussian time series in the high dimensional regime 标题:高维区高斯时间序列最大样本谱相干性的渐近分布

作者:Alexis Rosuel,Philippe Loubaton,Pascal Vallet 机构:Laboratoire d'Informatique Gaspard Monge (CNRS, Univ. Gustave-Eiffel), Bd. Descartes, Marne-la-Vallée (France), Laboratoire de l'Intégration du Matériau au Système (CNRS, Univ. Bordeaux, Bordeaux INP), Cours de la Libération, Talence (France) 链接:https://arxiv.org/abs/2107.02891 摘要:研究了当维数M和样本数N都收敛到无穷大时,具有相互独立分量的M元复高斯时间序列谱相干性的频率平滑估计的最大值的渐近分布。如果B表示基础平滑周期图估计量的平滑跨度,则在速率假设M/N $\rightarrow$ 0和M/B $\rightarrow$ c $\in$ (0, +$\infty$)下得到I型极值极限分布。然后利用这一结果构造一个渐近水平可控的统计量,用于检验观测时间序列的M个分量之间的独立性。数值模拟支持我们的结果。 摘要:We investigate the asymptotic distribution of the maximum of a frequency smoothed estimate of the spectral coherence of a M-variate complex Gaussian time series with mutually independent components when the dimension M and the number of samples N both converge to infinity. If B denotes the smoothing span of the underlying smoothed periodogram estimator, a type I extreme value limiting distribution is obtained under the rate assumptions M/N $\rightarrow$ 0 and M/B $\rightarrow$ c $\in$ (0, +$\infty$). This result is then exploited to build a statistic with controlled asymptotic level for testing independence between the M components of the observed time series. Numerical simulations support our results.

【21】 Non-Homogeneity Estimation and Universal Kriging on the Sphere 标题:球面上的非齐性估计与泛克里格法

作者:Nicholas W. Bussberg,Jacob Shields,Chunfeng Huang 机构:∗ Corresponding author., a Elon University, Department of Mathematics and Statistics, Campus Drive, Elon, NC , b Elanco Animal Health, Innovation Way, Greenfield, IN , c Indiana University, Department of Statistics, Informatics East, E ,th St, Bloomington, IN 备注:15 pages, 6 figures 链接:https://arxiv.org/abs/2107.02871 摘要:克里格法是一种广泛认可的空间预测方法。在球面上,常用的方法如普通克里格法假设空间过程本质上是均匀的。然而,在许多情况下,内在同质性过于严格。本研究利用内禀随机函数(IRF)理论来放宽同质性假设。IRF过程建模的一个关键组成部分是估计非均质性的程度。提出了一种图形化的方法来完成这一估计。由于具有估计非均匀性的能力,可以开发IRF通用克里格方法。从模拟研究的结果提供了证明优势,使用IRF通用克里格相对于普通克里格时,基本过程不是本质上均匀的。 摘要:Kriging is a widely recognized method for making spatial predictions. On the sphere, popular methods such as ordinary kriging assume that the spatial process is intrinsically homogeneous. However, intrinsic homogeneity is too strict in many cases. This research uses intrinsic random function (IRF) theory to relax the homogeneity assumption. A key component of modeling IRF processes is estimating the degree of non-homogeneity. A graphical approach is proposed to accomplish this estimation. With the ability to estimate non-homogeneity, an IRF universal kriging procedure can be developed. Results from simulation studies are provided to demonstrate the advantage of using IRF universal kriging as opposed to ordinary kriging when the underlying process is not intrinsically homogeneous.

【22】 Randomization-based Test for Censored Outcomes: A New Look at the Logrank Test 标题:基于随机化的删失结果检验:LOGRANK检验的新视角

作者:Xinran Li,Dylan S. Small 机构: Department of Statistics, University of Illinois, University of Pennsylvania 链接:https://arxiv.org/abs/2107.02849 摘要:双样本检验是统计学中最经典的课题之一,在前沿领域也有着广泛的应用。至少有两种推理模式用于证明两个样本测试的合理性。一种是通常的超总体推断,假设单元是来自某个超总体的独立同分布(i.i.d.)样本;另一种是有限总体推断,它依赖于将单元随机分配到不同的组中。在实际实施随机化时,后者的优点是避免了对结果的分布假设。在这篇论文中,我们将集中在有限总体推断删失的结果,这是一个较少探讨的文献。此外,我们允许截尾时间依赖于处理分配,在这种情况下,精确的排列推理是不可能实现的。我们发现,令人惊讶的是,通常的对数秩检验也可以通过随机化来证明。具体地说,在每个治疗组的非信息性i.i.d.删失的Bernoulli随机实验中,logrank检验对于检验Fisher关于任何单位都没有治疗效果的零假设是渐近有效的。此外,logrank检验的渐近有效性不需要对潜在事件时间进行任何分布假设。我们进一步将该理论推广到分层logrank检验,这对于随机分块设计和不同层次的删失机制是有用的。综上所述,由有限总体推断发展起来的对数秩检验理论是对其经典理论的补充,为对数秩检验提供了更广泛的依据。 摘要:Two-sample tests have been one of the most classical topics in statistics with wide application even in cutting edge applications. There are at least two modes of inference used to justify the two-sample tests. One is usual superpopulation inference assuming the units are independent and identically distributed (i.i.d.) samples from some superpopulation; the other is finite population inference that relies on the random assignments of units into different groups. When randomization is actually implemented, the latter has the advantage of avoiding distributional assumptions on the outcomes. In this paper, we will focus on finite population inference for censored outcomes, which has been less explored in the literature. Moreover, we allow the censoring time to depend on treatment assignment, under which exact permutation inference is unachievable. We find that, surprisingly, the usual logrank test can also be justified by randomization. Specifically, under a Bernoulli randomized experiment with non-informative i.i.d. censoring within each treatment arm, the logrank test is asymptotically valid for testing Fisher's null hypothesis of no treatment effect on any unit. Moreover, the asymptotic validity of the logrank test does not require any distributional assumption on the potential event times. 
We further extend the theory to the stratified logrank test, which is useful for randomized blocked designs and when censoring mechanisms vary across strata. In sum, the developed theory for the logrank test from finite population inference supplements its classical theory from usual superpopulation inference, and helps provide a broader justification for the logrank test.
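两样本 logrank 统计量的计算可用如下最简 Python 草稿示意(非论文原文;仅按各事件时刻的超几何均值与方差累加,未含分层与其他修正):

```python
import math

def logrank_statistic(times1, events1, times2, events2):
    """两样本 logrank 统计量 Z = (O1 - E1) / sqrt(V),在零假设下渐近服从 N(0, 1)。
    times*:观测时间;events*:1 表示事件发生,0 表示删失。"""
    data = [(t, e, 0) for t, e in zip(times1, events1)] + \
           [(t, e, 1) for t, e in zip(times2, events2)]
    event_times = sorted({t for t, e, _ in data if e == 1})
    O1 = E1 = V = 0.0
    for t in event_times:
        n = sum(1 for tt, _, _ in data if tt >= t)                # 总在险人数
        n1 = sum(1 for tt, _, g in data if tt >= t and g == 0)    # 组 1 在险人数
        d = sum(1 for tt, e, _ in data if tt == t and e == 1)     # 该时刻总事件数
        d1 = sum(1 for tt, e, g in data if tt == t and e == 1 and g == 0)
        O1 += d1                  # 组 1 的观测事件数
        E1 += d * n1 / n          # 超几何期望
        if n > 1:
            V += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)  # 超几何方差
    return (O1 - E1) / math.sqrt(V)
```

两组数据完全相同时统计量为 0;论文的贡献在于证明这一经典统计量在有限总体随机化推断下同样渐近有效。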

【23】 Transfer Learning in Information Criteria-based Feature Selection 标题:基于信息准则的特征选择中的迁移学习

作者:Shaohan Chen,Nikolaos V. Sahinidis,Chuanhou Gao 机构:School of Mathematical Sciences, Zhejiang University, Hangzhou , China, H. Milton Stewart School of Industrial & Systems Engineering and, School of Chemical & Biomolecular Engineering, Georgia Institute of Technology, Atlanta, GA , USA, Editor: 链接:https://arxiv.org/abs/2107.02847 摘要:本文研究了基于Mallows-Cp的迁移学习的有效性,提出了一种将迁移学习与Mallows-Cp相结合的方法(TLCp),并证明了该方法在准确性和稳定性方面优于传统的Mallows-Cp准则。我们的理论结果表明,对于目标域中的任何样本大小,如果i)源域和目标域任务之间的相异性很小,则所提出的TLCp估计器在正交预测的情况下的均方误差(MSE)度量优于Cp估计器,根据特定的显式规则调整过程参数(复杂性惩罚)。此外,我们证明了我们的迁移学习框架可以扩展到其他特征选择准则,如贝叶斯信息准则。通过分析正交化Cp的解,在非正交预测的情况下,我们确定了一个渐近逼近Cp准则解的估计量。对于非正交TLCp也得到了类似的结果。最后,通过仿真研究和实际数据应用,验证了TLCp方案的有效性。 摘要:This paper investigates the effectiveness of transfer learning based on Mallows' Cp. We propose a procedure that combines transfer learning with Mallows' Cp (TLCp) and prove that it outperforms the conventional Mallows' Cp criterion in terms of accuracy and stability. Our theoretical results indicate that, for any sample size in the target domain, the proposed TLCp estimator performs better than the Cp estimator by the mean squared error (MSE) metric in the case of orthogonal predictors, provided that i) the dissimilarity between the tasks from source domain and target domain is small, and ii) the procedure parameters (complexity penalties) are tuned according to certain explicit rules. Moreover, we show that our transfer learning framework can be extended to other feature selection criteria, such as the Bayesian information criterion. By analyzing the solution of the orthogonalized Cp, we identify an estimator that asymptotically approximates the solution of the Cp criterion in the case of non-orthogonal predictors. Similar results are obtained for the non-orthogonal TLCp. Finally, simulation studies and applications with real data demonstrate the usefulness of the TLCp scheme.
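作为背景,经典 Mallows' Cp 准则可写为 Cp = RSS_p/σ̂² − n + 2p,其中 p 为含截距的参数个数,σ̂² 通常取全模型的残差方差。示意实现如下(非论文的 TLCp 算法;σ̂² 在此作为参数传入):

```python
import numpy as np

def mallows_cp(X, y, subset, sigma2_full):
    """经典 Mallows' Cp:Cp = RSS_p / sigma2_full - n + 2p,p 为含截距的参数个数。"""
    n = len(y)
    Xs = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)   # 子模型的最小二乘拟合
    rss = float(np.sum((y - Xs @ beta) ** 2))
    p = Xs.shape[1]
    return rss / sigma2_full - n + 2 * p
```

子集选择即在所有候选特征子集上最小化 Cp;论文的 TLCp 在此基础上加入了源域任务的迁移项。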

【24】 New Methods and Datasets for Group Anomaly Detection From Fundamental Physics 标题:基础物理群体异常检测的新方法和新数据集

作者:Gregor Kasieczka,Benjamin Nachman,David Shih 机构: Universität Hamburg, Department of Physics & Astronomy, Rutgers University 备注:Accepted for ANDEA (Anomaly and Novelty Detection, Explanation and Accommodation) Workshop at KDD 2021 链接:https://arxiv.org/abs/2107.02821 摘要:在大量的实际应用中,异常超密度的识别是一个丰富的问题。然而,与点异常或其他类型的单实例异常值相比,它在更广泛的ML社区中受到的关注相对较少。其中一个原因是缺乏强大的基准数据集。在本文中,我们首先解释了在诺贝尔奖获得者发现希格斯玻色子之后,无监督的群异常探测如何成为基础物理学的一个新前沿(其动机是寻找新的粒子和力)。然后,我们提出了一个真实的综合基准数据集(LHCO2020)来开发群体异常检测算法。最后,我们比较了几种现有的用于无监督群体异常检测的统计声音技术,并在LHCO2020数据集上展示了它们的性能。 摘要:The identification of anomalous overdensities in data - group or collective anomaly detection - is a rich problem with a large number of real world applications. However, it has received relatively little attention in the broader ML community, as compared to point anomalies or other types of single instance outliers. One reason for this is the lack of powerful benchmark datasets. In this paper, we first explain how, after the Nobel-prize winning discovery of the Higgs boson, unsupervised group anomaly detection has become a new frontier of fundamental physics (where the motivation is to find new particles and forces). Then we propose a realistic synthetic benchmark dataset (LHCO2020) for the development of group anomaly detection algorithms. Finally, we compare several existing statistically-sound techniques for unsupervised group anomaly detection, and demonstrate their performance on the LHCO2020 dataset.

【25】 Differentiable Architecture Pruning for Transfer Learning 标题:用于迁移学习的可微体系结构剪枝

作者:Nicolo Colombo,Yang Gao 机构:Department of Computer Science, Royal Holloway University of London, Egham Hill, Egham TW,EX, UK 备注:19 pages (main + appendix), 7 figures and 1 table, Workshop @ ICML 2021, 24th July 2021 链接:https://arxiv.org/abs/2107.03375 摘要:我们提出了一种新的基于梯度的方法来从给定的大模型中提取子结构。与现有的剪枝方法无法分离网络结构和相应的权值相反,我们的结构剪枝方案产生了可转移的新结构,可以成功地重新训练以解决不同的任务。我们关注的是一个迁移学习设置,在这个设置中,架构可以在一个大的数据集上进行训练,但是很少有数据点可用于在新任务上对它们进行微调。我们定义了一种新的基于梯度的算法,该算法独立于附加的权值来训练任意低复杂度的体系结构。给定一个由现有大型神经网络模型定义的搜索空间,我们将结构搜索任务转化为一个复杂度惩罚的子集选择问题,并通过一个双温度松弛方案进行求解。我们提供了理论上的收敛性保证,并在实际数据上验证了所提出的迁移学习策略。 摘要:We propose a new gradient-based approach for extracting sub-architectures from a given large model. Contrarily to existing pruning methods, which are unable to disentangle the network architecture and the corresponding weights, our architecture-pruning scheme produces transferable new structures that can be successfully retrained to solve different tasks. We focus on a transfer-learning setup where architectures can be trained on a large data set but very few data points are available for fine-tuning them on new tasks. We define a new gradient-based algorithm that trains architectures of arbitrarily low complexity independently from the attached weights. Given a search space defined by an existing large neural model, we reformulate the architecture search task as a complexity-penalized subset-selection problem and solve it through a two-temperature relaxation scheme. We provide theoretical convergence guarantees and validate the proposed transfer-learning strategy on real data.

【26】 Estimation and Inference in Factor Copula Models with Exogenous Covariates 标题:具有外生协变量的因子Copula模型的估计和推断

作者:Alexander Mayer,Dominik Wied 机构:Institute of Econometrics and Statistics, University of Cologne 链接:https://arxiv.org/abs/2107.03366 摘要:提出了一种因子copula模型,该模型中的因子可以模拟,也可以从外部信息中估计。点估计和推断是基于模拟矩量法(SMM)的方法,具有非重叠的模拟图形。建立了估计量的相合性和极限正态性,证明了bootstrap标准差的有效性。这样一来,先前的文献结果在低水平条件下对因子结构的各个组成部分进行了验证。montecarlo证据证实了渐近理论在有限样本下的准确性,并通过一个实证应用说明了该模型在解释股票收益率之间的横截面相关性方面的有效性。 摘要:A factor copula model is proposed in which factors are either simulable or estimable from exogenous information. Point estimation and inference are based on a simulated methods of moments (SMM) approach with non-overlapping simulation draws. Consistency and limiting normality of the estimator is established and the validity of bootstrap standard errors is shown. Doing so, previous results from the literature are verified under low-level conditions imposed on the individual components of the factor structure. Monte Carlo evidence confirms the accuracy of the asymptotic theory in finite samples and an empirical application illustrates the usefulness of the model to explain the cross-sectional dependence between stock returns.

【27】 Mitigating Performance Saturation in Neural Marked Point Processes: Architectures and Loss Functions 标题:减轻神经标记点过程的性能饱和:结构和损失函数

作者:Tianbo Li,Tianze Luo,Yiping Ke,Sinno Jialin Pan 机构:Sea AI Lab, Singapore, Nanyang Technological University 备注:9 pages, 4 figures, accepted by KDD-21 research track. The source code is available at this https URL Hawkes-Processes-GCHP 链接:https://arxiv.org/abs/2107.03354 摘要:属性化事件序列在实践中经常遇到。最近的一个研究方向是将神经网络与统计模型——标记点过程相结合,标记点过程是处理属性事件序列的传统工具。神经标记点过程具有很好的概率模型解释能力和神经网络的表示能力。然而,我们发现神经标记点过程的性能并不总是随着网络结构的复杂化和大型化而提高,这就是我们所说的性能饱和现象。这是由于神经标记点过程的泛化误差同时由网络的表示能力和模型规格决定的。因此,我们可以得出两个主要结论:第一,在某些情况下,简单的网络结构并不比复杂的网络结构差;其次,使用适当的概率假设与提高网络的复杂性同等重要,甚至更重要。基于这一观察,我们提出了一种简单的基于图的网络结构GCHP,它只使用图卷积层,因此可以很容易地被并行机制加速。我们直接考虑到达时间的分布,而不是对条件强度函数施加特定假设,并提出使用似然比损失与矩匹配机制进行优化和模型选择。实验结果表明,GCHP能显著减少训练时间,而在间隔时间概率假设下的似然比损失能显著提高模型性能。 摘要:Attributed event sequences are commonly encountered in practice. A recent research line focuses on incorporating neural networks with the statistical model -- marked point processes, which is the conventional tool for dealing with attributed event sequences. Neural marked point processes possess good interpretability of probabilistic models as well as the representational power of neural networks. However, we find that performance of neural marked point processes is not always increasing as the network architecture becomes more complicated and larger, which is what we call the performance saturation phenomenon. This is due to the fact that the generalization error of neural marked point processes is determined by both the network representational ability and the model specification at the same time. Therefore we can draw two major conclusions: first, simple network structures can perform no worse than complicated ones for some cases; second, using a proper probabilistic assumption is as equally, if not more, important as improving the complexity of the network. Based on this observation, we propose a simple graph-based network structure called GCHP, which utilizes only graph convolutional layers, thus it can be easily accelerated by the parallel mechanism. 
We directly consider the distribution of interarrival times instead of imposing a specific assumption on the conditional intensity function, and propose to use a likelihood ratio loss with a moment matching mechanism for optimization and model selection. Experimental results show that GCHP can significantly reduce training time and the likelihood ratio loss with interarrival time probability assumptions can greatly improve the model performance.

【28】 A Survey of Uncertainty in Deep Neural Networks 标题:深度神经网络中的不确定性综述

作者:Jakob Gawlikowski,Cedrique Rovile Njieutcheu Tassi,Mohsin Ali,Jongseok Lee,Matthias Humt,Jianxiang Feng,Anna Kruspe,Rudolph Triebel,Peter Jung,Ribana Roscher,Muhammad Shahzad,Wen Yang,Richard Bamler,Xiao Xiang Zhu 链接:https://arxiv.org/abs/2107.03342 摘要:由于它们的传播越来越广泛,神经网络预测的可信度变得越来越重要。然而,基本的神经网络不能提供确定性估计或遭受过度或不足的信心。许多研究人员一直致力于理解和量化神经网络预测中的不确定性。因此,不同类型和来源的不确定性已经确定,并提出了各种方法来衡量和量化神经网络中的不确定性。这项工作给出了神经网络中不确定性估计的全面概述,回顾了该领域的最新进展,突出了当前的挑战,并确定了潜在的研究机会。它的目的是给任何对神经网络中的不确定性估计感兴趣的人一个广泛的概述和介绍,而不预先假定在这个领域的先验知识。全面介绍了最关键的不确定性来源,并将其分为可约模型不确定性和不可约数据不确定性。介绍了基于确定性神经网络、贝叶斯神经网络、神经网络集成和测试时间数据扩充方法的不确定性建模,讨论了这些领域的不同分支和最新发展。对于实际应用,我们讨论了不同的不确定度度量,神经网络的校准方法,并给出了现有基线和实现的概述。不同领域的各种挑战中的不同例子说明了实际应用中不确定性的需求和挑战。此外,还讨论了当前任务和安全关键现实世界应用方法的实际局限性,并对下一步更广泛地使用此类方法进行了展望。 摘要:Due to their increasing spread, confidence in neural network predictions became more and more important. However, basic neural networks do not deliver certainty estimates or suffer from over or under confidence. Many researchers have been working on understanding and quantifying uncertainty in a neural network's prediction. As a result, different types and sources of uncertainty have been identified and a variety of approaches to measure and quantify uncertainty in neural networks have been proposed. This work gives a comprehensive overview of uncertainty estimation in neural networks, reviews recent advances in the field, highlights current challenges, and identifies potential research opportunities. It is intended to give anyone interested in uncertainty estimation in neural networks a broad overview and introduction, without presupposing prior knowledge in this field. A comprehensive introduction to the most crucial sources of uncertainty is given and their separation into reducible model uncertainty and not reducible data uncertainty is presented. 
The modeling of these uncertainties based on deterministic neural networks, Bayesian neural networks, ensemble of neural networks, and test-time data augmentation approaches is introduced and different branches of these fields as well as the latest developments are discussed. For a practical application, we discuss different measures of uncertainty, approaches for the calibration of neural networks and give an overview of existing baselines and implementations. Different examples from the wide spectrum of challenges in different fields give an idea of the needs and challenges regarding uncertainties in practical applications. Additionally, the practical limitations of current methods for mission- and safety-critical real world applications are discussed and an outlook on the next steps towards a broader usage of such methods is given.

【29】 KaFiStO: A Kalman Filtering Framework for Stochastic Optimization 标题:KaFiStO:一种随机优化的卡尔曼滤波框架

作者:Aram Davtyan,Sepehr Sameni,Llukman Cerkezi,Givi Meishvilli,Adam Bielski,Paolo Favaro 机构:Computer Vision Group, University of Bern 链接:https://arxiv.org/abs/2107.03331 摘要:优化问题通常是一个确定性问题,通过梯度下降等迭代过程求解。然而,当训练神经网络时,由于样本子集的随机选择,损失函数随(迭代)时间而变化。这种随机化将优化问题转化为随机问题。我们建议考虑一些参考优化的损失作为嘈杂的观察。这种对损失的解释使我们可以采用Kalman滤波作为优化器,因为它的递推公式是用来从噪声测量中估计未知参数的。此外,我们还证明了未知参数演化的Kalman滤波动力学模型可以用来捕捉动量和Adam等先进方法的梯度动力学。我们称这种随机优化方法为KaFiStO。KaFiStO是一种易于实现、可扩展、高效的神经网络训练方法。我们表明,它也产生参数估计,与现有的优化算法相比,在多个神经网络架构和机器学习任务,如计算机视觉和语言建模。 摘要:Optimization is often cast as a deterministic problem, where the solution is found through some iterative procedure such as gradient descent. However, when training neural networks the loss function changes over (iteration) time due to the randomized selection of a subset of the samples. This randomization turns the optimization problem into a stochastic one. We propose to consider the loss as a noisy observation with respect to some reference optimum. This interpretation of the loss allows us to adopt Kalman filtering as an optimizer, as its recursive formulation is designed to estimate unknown parameters from noisy measurements. Moreover, we show that the Kalman Filter dynamical model for the evolution of the unknown parameters can be used to capture the gradient dynamics of advanced methods such as Momentum and Adam. We call this stochastic optimization method KaFiStO. KaFiStO is an easy to implement, scalable, and efficient method to train neural networks. We show that it also yields parameter estimates that are on par with or better than existing optimization algorithms across several neural network architectures and machine learning tasks, such as computer vision and language modeling.
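KaFiStO 将损失视为对某一参考最优值的带噪观测,并借用卡尔曼滤波的递推公式。作为背景示意(非 KaFiStO 本身),下面是用一维卡尔曼滤波从噪声测量中递推估计未知常量的最小实现,过程噪声 q 与观测噪声 r 均为假设值:

```python
import numpy as np

def kalman_constant(observations, q=1e-4, r=1.0, x0=0.0, p0=1.0):
    """一维卡尔曼滤波:在随机游走状态模型下,从带噪观测递推估计未知常量。
    q:过程噪声方差;r:观测噪声方差。"""
    x, p = x0, p0
    estimates = []
    for z in observations:
        p = p + q                  # 预测步:协方差加上过程噪声
        k = p / (p + r)            # 卡尔曼增益
        x = x + k * (z - x)        # 更新步:按观测残差修正估计
        p = (1 - k) * p
        estimates.append(x)
    return estimates

rng = np.random.default_rng(0)
obs = 3.0 + rng.normal(0.0, 0.5, size=500)   # 围绕"最优值"3.0 的带噪观测
est = kalman_constant(obs, r=0.25)
```

估计序列会收敛到真值附近;KaFiStO 将这一递推思想推广到神经网络参数的高维情形。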

【30】 Probabilistic semi-nonnegative matrix factorization: a Skellam-based framework 标题:概率半非负矩阵分解:一个基于Skellam的框架

作者:Benoit Fuentes,Gaël Richard 备注:Submitted for publication 链接:https://arxiv.org/abs/2107.03317 摘要:我们提出了一种新的概率模型来求解半非负矩阵分解(SNMF),称为Skellam-SNMF。它是一个由先验分量、服从Skellam分布的隐变量和观测数据组成的分层生成模型。推导了两种推理算法:用于最大后验估计的期望最大化(EM)算法和用于全贝叶斯推理的变分贝叶斯EM(VBEM)算法,后者包括参数先验分布的估计。在这个基于Skellam的模型中,我们还引入了实值目标数据$x$与两个非负参数$\lambda_{0}$和$\lambda_{1}$之间的一个新的散度$\mathcal{D}$,满足$\mathcal{D}\left(x\mid\lambda_{0},\lambda_{1}\right)=0\Leftrightarrow x=\lambda_{0}-\lambda_{1}$,它是Kullback-Leibler(KL)散度的推广。最后,我们对这些新算法进行了实验研究,以了解其行为,并证明在真实数据的自动聚类任务中它们可以优于经典的SNMF方法。 摘要:We present a new probabilistic model to address semi-nonnegative matrix factorization (SNMF), called Skellam-SNMF. It is a hierarchical generative model consisting of prior components, Skellam-distributed hidden variables and observed data. Two inference algorithms are derived: an Expectation-Maximization (EM) algorithm for maximum a posteriori estimation and a Variational Bayes EM (VBEM) algorithm for full Bayesian inference, including the estimation of the parameters' prior distribution. From this Skellam-based model, we also introduce a new divergence $\mathcal{D}$ between a real-valued target data $x$ and two nonnegative parameters $\lambda_{0}$ and $\lambda_{1}$ such that $\mathcal{D}\left(x\mid\lambda_{0},\lambda_{1}\right)=0\Leftrightarrow x=\lambda_{0}-\lambda_{1}$, which is a generalization of the Kullback-Leibler (KL) divergence. Finally, we conduct experimental studies on those new algorithms in order to understand their behavior and prove that they can outperform the classic SNMF approach on real data in a task of automatic clustering.
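Skellam 分布是两个独立 Poisson 变量之差 X = N0 − N1 的分布,均值为 λ0−λ1、方差为 λ0+λ1。下面的模拟草稿验证这两个矩(摘要未给出散度 D 的具体表达式,此处不作重构):

```python
import numpy as np

# Skellam 分布:X = N0 - N1,其中 N0 ~ Poisson(lam0) 与 N1 ~ Poisson(lam1) 独立,
# 故 E[X] = lam0 - lam1,Var[X] = lam0 + lam1。
rng = np.random.default_rng(42)
lam0, lam1 = 5.0, 2.0
x = rng.poisson(lam0, 200_000) - rng.poisson(lam1, 200_000)
print(x.mean())  # 约为 3.0
print(x.var())   # 约为 7.0
```

正是"差值可取负但两分量非负"这一性质,使 Skellam 隐变量适合作为半非负分解的生成模型。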

【31】 Predicting with Confidence on Unseen Distributions 标题:不可见分布的置信度预测

作者:Devin Guillory,Vaishaal Shankar,Sayna Ebrahimi,Trevor Darrell,Ludwig Schmidt 机构:UC Berkeley, Amazon, Toyota Research Institute 链接:https://arxiv.org/abs/2107.03315 摘要:最近的研究表明,当对来自于接近但不同于训练分布的分布的数据进行评估时,机器学习模型的性能会有很大的不同。因此,预测模型在未知分布上的性能是一个重要的挑战。我们的工作结合了领域适应和预测不确定性文献中的技术,并允许我们在不访问标记数据的情况下预测具有挑战性的未知分布的模型精度。在分布转移的背景下,分布距离常常被用来调整模型并改善其在新领域的性能,然而在这些研究中,精度估计或其他形式的预测不确定性常常被忽略。通过调查广泛的已建立的分布距离,如Frechet距离或最大平均差异,我们确定,他们无法诱导可靠的估计性能下的分布转移。另一方面,我们发现分类器预测的置信度差异(DoC)成功地估计了分类器在各种变化下的性能变化。我们特别研究了综合分布和自然分布变化之间的区别,并观察到尽管DoC简单,但它始终优于其他分布差异的量化方法$DoC$可将几个现实且具有挑战性的分布变化的预测误差减少近一半($46\%$),例如,在ImageNet Vid Robust和ImageNet格式副本数据集上。 摘要:Recent work has shown that the performance of machine learning models can vary substantially when models are evaluated on data drawn from a distribution that is close to but different from the training distribution. As a result, predicting model performance on unseen distributions is an important challenge. Our work connects techniques from domain adaptation and predictive uncertainty literature, and allows us to predict model accuracy on challenging unseen distributions without access to labeled data. In the context of distribution shift, distributional distances are often used to adapt models and improve their performance on new domains, however accuracy estimation, or other forms of predictive uncertainty, are often neglected in these investigations. Through investigating a wide range of established distributional distances, such as Frechet distance or Maximum Mean Discrepancy, we determine that they fail to induce reliable estimates of performance under distribution shift. On the other hand, we find that the difference of confidences (DoC) of a classifier's predictions successfully estimates the classifier's performance change over a variety of shifts. 
We specifically investigate the distinction between synthetic and natural distribution shifts and observe that despite its simplicity DoC consistently outperforms other quantifications of distributional difference. $DoC$ reduces predictive error by almost half ($46\%$) on several realistic and challenging distribution shifts, e.g., on the ImageNet-Vid-Robust and ImageNet-Rendition datasets.
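DoC 的基本思路可示意如下(非论文原文的实现,数值为虚构示例):用源域与目标域上平均最大 softmax 置信度之差,去修正已知的源域精度,得到目标域精度的估计:

```python
import numpy as np

def doc_accuracy_estimate(conf_source, acc_source, conf_target):
    """DoC 思路的简化示意:目标域精度 ≈ 源域精度 - (源域平均置信度 - 目标域平均置信度)。"""
    doc = float(np.mean(conf_source) - np.mean(conf_target))
    return acc_source - doc

conf_src = np.array([0.95, 0.90, 0.92, 0.88])   # 源域上的最大 softmax 置信度(虚构)
conf_tgt = np.array([0.80, 0.75, 0.78, 0.82])   # 目标域上的置信度(虚构)
print(doc_accuracy_estimate(conf_src, 0.91, conf_tgt))  # 0.91 - 0.125 = 0.785
```

该估计不需要目标域标签,只需模型在两域上的置信度输出。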

【32】 A Leap among Entanglement and Neural Networks: A Quantum Survey 标题:纠缠与神经网络的飞跃:量子综述

作者:Fabio Valerio Massoli,Lucia Vadicamo,Giuseppe Amato,Fabrizio Falchi 机构:Istituto di Scienza e Tecnologie dell’Informazione “Alessandro Faedo”, CNR, Italy 链接:https://arxiv.org/abs/2107.03313 摘要:近年来,量子计算在资源可用性和算法开发方面都有了巨大的进步。利用量子现象解决计算问题的能力是一个由来已久的梦想,自80年代末以来一直吸引着科学界的兴趣。在这种情况下,我们提出了我们的贡献。首先,我们介绍与量子计算相关的基本概念,然后解释实现门模型和绝热量子计算范式的技术的核心功能。最后,我们收集、比较和分析了量子感知器和量子神经网络实现的最新进展。 摘要:In recent years, Quantum Computing witnessed massive improvements both in terms of resources availability and algorithms development. The ability to harness quantum phenomena to solve computational problems is a long-standing dream that has drawn the scientific community's interest since the late '80s. In such a context, we pose our contribution. First, we introduce basic concepts related to quantum computations, and then we explain the core functionalities of technologies that implement the Gate Model and Adiabatic Quantum Computing paradigms. Finally, we gather, compare and analyze the current state-of-the-art concerning Quantum Perceptrons and Quantum Neural Networks implementations.

【33】 Nested Counterfactual Identification from Arbitrary Surrogate Experiments 标题:任意代理实验中的嵌套式反事实鉴定

作者:Juan D Correa,Sanghack Lee,Elias Bareinboim 机构:Seoul National University, Columbia University 链接:https://arxiv.org/abs/2107.03190 摘要:因果关系阶梯描述了代理人可能感兴趣的三种性质不同的活动类型,即看(观察)、做(干预)和想象(反事实)(Pearl和Mackenzie,2018)。因果层次结构带来的推理挑战是,数据是由观察或干预系统的代理收集的(第1层和第2层),而它的目标可能是了解如果它采取不同的行动过程会发生什么,与实际结果相反(第3层)。虽然人们对允许从观察到干预进行跨层推断的条件有着坚实的理解,但在针对反事实量时,相关结果则较为稀少。在本文中,我们研究从观察和实验的任意组合中识别嵌套反事实。具体地说,基于嵌套反事实的一个更明确的定义,我们证明了反事实去嵌套定理(CUT),它允许我们将任意嵌套的反事实映射为非嵌套的反事实。例如,中介分析和公平性分析中的应用通常会涉及直接、间接和虚假效应的概念,这自然需要嵌套。 摘要:The Ladder of Causation describes three qualitatively different types of activities an agent may be interested in engaging in, namely, seeing (observational), doing (interventional), and imagining (counterfactual) (Pearl and Mackenzie, 2018). The inferential challenge imposed by the causal hierarchy is that data is collected by an agent observing or intervening in a system (layers 1 and 2), while its goal may be to understand what would have happened had it taken a different course of action, contrary to what factually ended up happening (layer 3). While there exists a solid understanding of the conditions under which cross-layer inferences are allowed from observations to interventions, the results are somewhat scarcer when targeting counterfactual quantities. In this paper, we study the identification of nested counterfactuals from an arbitrary combination of observations and experiments. Specifically, building on a more explicit definition of nested counterfactuals, we prove the counterfactual unnesting theorem (CUT), which allows one to map arbitrary nested counterfactuals to unnested ones. For instance, applications in mediation and fairness analysis usually evoke notions of direct, indirect, and spurious effects, which naturally require nesting. 
Second, we introduce a sufficient and necessary graphical condition for counterfactual identification from an arbitrary combination of observational and experimental distributions. Lastly, we develop an efficient and complete algorithm for identifying nested counterfactuals; failure of the algorithm returning an expression for a query implies it is not identifiable.

【34】 Learning Time-Invariant Reward Functions through Model-Based Inverse Reinforcement Learning 标题:基于模型的逆强化学习学习时不变奖励函数

作者:Todor Davchev,Sarah Bechtle,Subramanian Ramamoorthy,Franziska Meier 机构:School of Informatics, University of Edinburgh, MPI for Intelligent Systems, Facebook AI Research, Menlo Park, CA 链接:https://arxiv.org/abs/2107.03186 摘要:逆强化学习是一种范式,其目标是从已证明的行为中学习一般的奖励函数。然而,学习成本的一般性概念通常仅根据对各种空间扰动的鲁棒性进行评估,假设以固定的执行速度部署。然而,这在机器人学的背景下是不切实际的,构建时不变的解决方案是至关重要的。在这项工作中,我们提出了一个公式,允许我们1)通过学习时不变的成本来改变执行的长度,2)放宽从演示学习的时间对齐要求。我们将我们的方法应用于两种不同类型的成本公式,并在模拟放置和钉孔任务的学习奖励函数的上下文中评估它们的性能。我们的研究结果显示,我们的方法可以学习时间不变的奖励,从错位示范,也可以推广到空间分布外的任务。 摘要:Inverse reinforcement learning is a paradigm motivated by the goal of learning general reward functions from demonstrated behaviours. Yet the notion of generality for learnt costs is often evaluated in terms of robustness to various spatial perturbations only, assuming deployment at fixed speeds of execution. However, this is impractical in the context of robotics and building time-invariant solutions is of crucial importance. In this work, we propose a formulation that allows us to 1) vary the length of execution by learning time-invariant costs, and 2) relax the temporal alignment requirements for learning from demonstration. We apply our method to two different types of cost formulations and evaluate their performance in the context of learning reward functions for simulated placement and peg in hole tasks. Our results show that our approach enables learning temporally invariant rewards from misaligned demonstration that can also generalise spatially to out of distribution tasks.

【35】 Probabilistic partition of unity networks: clustering based deep approximation 标题:单位网络的概率划分:基于聚类的深度逼近

作者:Nat Trask,Mamikon Gulian,Andy Huang,Kookjin Lee 机构:Center for Computing Research, Sandia National Laboratories, Albuquerque, NM , Electrical Models and Simulation, Quantitative Modeling and Analysis 备注:12 pages, 6 figures 链接:https://arxiv.org/abs/2107.03066 摘要:单位网络划分(POU-Nets)可以实现偏微分方程回归和求解的代数收敛速度,但需要对训练参数进行经验调整。我们用高斯噪声模型来丰富POU网络,以获得一个基于梯度的最大似然损失最小化的概率泛化。所得到的结构提供了无噪和有噪数据的空间表示为高斯混合,方差的闭合形式表达式提供了局部误差的估计。训练过程基于函数值的相关性产生显著的输入空间划分。这种训练点的分类可以采用分层细化策略,显著提高回归的局部化程度,允许使用高阶多项式近似。与高斯过程回归相比,该框架更适合于大数据集,并允许空间变化的不确定性,利用深度神经网络的表达能力,同时绕过与其他概率深度学习方法相关的昂贵训练。与标准的深度神经网络相比,该框架在不使用正则化子来调整分区局部性的情况下证明了hp收敛性。我们提供了量化高/低维性能的基准,证明了收敛速度仅依赖于高维空间中数据的潜在维。最后,我们介绍了一个新的基于偏微分方程的半导体器件模拟的开源数据集,并对物理上可解释的降阶基进行了无监督提取。 摘要:Partition of unity networks (POU-Nets) have been shown capable of realizing algebraic convergence rates for regression and solution of PDEs, but require empirical tuning of training parameters. We enrich POU-Nets with a Gaussian noise model to obtain a probabilistic generalization amenable to gradient-based minimization of a maximum likelihood loss. The resulting architecture provides spatial representations of both noiseless and noisy data as Gaussian mixtures with closed form expressions for variance which provides an estimator of local error. The training process yields remarkably sharp partitions of input space based upon correlation of function values. This classification of training points is amenable to a hierarchical refinement strategy that significantly improves the localization of the regression, allowing for higher-order polynomial approximation to be utilized. The framework scales more favorably to large data sets as compared to Gaussian process regression and allows for spatially varying uncertainty, leveraging the expressive power of deep neural networks while bypassing expensive training associated with other probabilistic deep learning methods. 
Compared to standard deep neural networks, the framework demonstrates hp-convergence without the use of regularizers to tune the localization of partitions. We provide benchmarks quantifying performance in high/low-dimensions, demonstrating that convergence rates depend only on the latent dimension of data within high-dimensional space. Finally, we introduce a new open-source data set of PDE-based simulations of a semiconductor device and perform unsupervised extraction of a physically interpretable reduced-order basis.

【36】 Exact Learning Augmented Naive Bayes Classifier 标题:精确学习增广朴素贝叶斯分类器

作者:Shouta Sugahara,Maomi Ueno 机构:Graduate school of Informatics and Engineering, The University of Electro-Communications, -,-, Chofugaoka, Chofu-shi, Tokyo, Japan 备注:29 pages 链接:https://arxiv.org/abs/2107.03018 摘要:以往的研究表明,在给定特征变量的情况下,通过最大化类变量的条件对数似然(CLL)得到的贝叶斯网络(BNs)的分类精度,高于通过最大化边缘似然(ML)得到的分类精度。然而,在早期的研究中,这两个评分的表现差异可能是由于它们使用的是近似学习算法,而不是精确学习算法。本文比较了用CLL近似学习和用ML精确学习的BNs的分类精度,结果表明,对于大数据,最大化ML得到的BNs分类精度高于最大化CLL得到的分类精度。然而,结果也表明,当样本量较小且类变量的父变量较多时,使用ML的精确学习BNs的分类准确率要比其他方法差得多。为了解决这一问题,我们提出了一种精确学习的增广朴素贝叶斯分类器(ANB),它保证类变量没有父变量。该方法可保证渐近估计出与精确学习的BN相同的类别后验概率。对比实验表明了该方法的优越性能。 摘要:Earlier studies have shown that classification accuracies of Bayesian networks (BNs) obtained by maximizing the conditional log likelihood (CLL) of a class variable, given the feature variables, were higher than those obtained by maximizing the marginal likelihood (ML). However, differences between the performances of the two scores in the earlier studies may be attributed to the fact that they used approximate learning algorithms, not exact ones. This paper compares the classification accuracies of BNs with approximate learning using CLL to those with exact learning using ML. The results demonstrate that the classification accuracies of BNs obtained by maximizing the ML are higher than those obtained by maximizing the CLL for large data. However, the results also demonstrate that the classification accuracies of exact learning BNs using the ML are much worse than those of other methods when the sample size is small and the class variable has numerous parents. To resolve the problem, we propose an exact learning augmented naive Bayes classifier (ANB), which ensures a class variable with no parents. The proposed method is guaranteed to asymptotically estimate the identical class posterior to that of the exactly learned BN. Comparison experiments demonstrated the superior performance of the proposed method.
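ANB 强制类变量没有父变量,其最简单的特例即朴素贝叶斯。作为背景示意(非论文的精确结构学习算法),下面是带拉普拉斯平滑的离散朴素贝叶斯最小实现:

```python
import math
from collections import Counter, defaultdict

def fit_naive_bayes(X, y):
    """离散朴素贝叶斯(类变量无父变量)的最简实现。"""
    classes = sorted(set(y))
    prior = {c: y.count(c) / len(y) for c in classes}
    cond = defaultdict(Counter)              # (特征索引, 类别) -> 特征取值计数
    for xs, c in zip(X, y):
        for j, v in enumerate(xs):
            cond[(j, c)][v] += 1
    return classes, prior, cond

def predict(x, classes, prior, cond, n_values=2):
    """按对数后验比较各类别;拉普拉斯平滑,n_values 为每个特征的取值个数。"""
    best, best_lp = None, -math.inf
    for c in classes:
        lp = math.log(prior[c])
        for j, v in enumerate(x):
            cnt = cond[(j, c)]
            lp += math.log((cnt[v] + 1) / (sum(cnt.values()) + n_values))
        if lp > best_lp:
            best, best_lp = c, lp
    return best

X = [(1, 1), (1, 0), (0, 1), (0, 0)]
y = [1, 1, 0, 0]
print(predict((1, 1), *fit_naive_bayes(X, y)))  # 1
```

论文的 ANB 在此基础上允许特征之间存在(精确学习得到的)依赖边,同时保持类变量无父节点。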

【37】 Harnessing Heterogeneity: Learning from Decomposed Feedback in Bayesian Modeling 标题:利用异构性:从贝叶斯建模中分解的反馈中学习

作者:Kai Wang,Bryan Wilder,Sze-chuan Suen,Bistra Dilkina,Milind Tambe 机构:Harvard University, USA, University of Southern California, USA 链接:https://arxiv.org/abs/2107.03003 摘要:学习和优化一个由多个子组件组成的复杂系统,其中这些组件可以是代理或自主传感器,这引起了人们极大的兴趣。在这方面的丰富文献中,基于agent和特定领域的仿真可以捕获复杂的动力学和子组交互,但是在这样的仿真上进行优化在计算和算法上都具有挑战性。贝叶斯方法,如高斯过程(GPs),可以用来学习底层动力学的一个计算上易于处理的近似,但通常忽略了有关复杂系统中子群的详细信息。我们提出"分解反馈"的思想以兼得两者之长:它能刻画基于子群的异质性和动态性。我们引入了一种新的分解GP回归方法来结合按子群分解的反馈。与以前的方法相比,我们的修正回归可证明具有更低的方差,因此后验更准确;它还允许我们引入一个利用子群反馈的分解GP-UCB优化算法。该方法的贝叶斯性质使优化算法易于处理,并在理论上保证收敛性和无遗憾性。 摘要:There is significant interest in learning and optimizing a complex system composed of multiple sub-components, where these components may be agents or autonomous sensors. Among the rich literature on this topic, agent-based and domain-specific simulations can capture complex dynamics and subgroup interaction, but optimizing over such simulations can be computationally and algorithmically challenging. Bayesian approaches, such as Gaussian processes (GPs), can be used to learn a computationally tractable approximation to the underlying dynamics but typically neglect the detailed information about subgroups in the complicated system. We attempt to find the best of both worlds by proposing the idea of decomposed feedback, which captures group-based heterogeneity and dynamics. We introduce a novel decomposed GP regression to incorporate the subgroup decomposed feedback. Our modified regression has provably lower variance -- and thus a more accurate posterior -- compared to previous approaches; it also allows us to introduce a decomposed GP-UCB optimization algorithm that leverages subgroup feedback. The Bayesian nature of our method makes the optimization algorithm tractable with a theoretical guarantee on convergence and no-regret property. 
To demonstrate the wide applicability of this work, we execute our algorithm on two disparate social problems: infectious disease control in a heterogeneous population and allocation of distributed weather sensors. Experimental results show that our new method provides significant improvement compared to the state-of-the-art.
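上文"分解GP回归"的核心做法,是对每个子群信号分别拟合一个GP、再把后验相加,而非对聚合信号拟合单个GP。下面用手写的RBF核GP回归给出一个最小示意(信号形式、核长度尺度与噪声水平均为虚构假设,并非论文的模型设定):

```python
import numpy as np

def rbf(a, b, ls=0.2):
    """Squared-exponential kernel on 1-D inputs."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(X, y, Xs, noise=0.05):
    """Posterior mean/variance of a zero-mean GP at test inputs Xs."""
    K = rbf(X, X) + noise**2 * np.eye(len(X))
    Ks = rbf(X, Xs)
    sol = np.linalg.solve(K, Ks)          # K^{-1} k(X, Xs)
    mean = sol.T @ y
    var = np.diag(rbf(Xs, Xs) - Ks.T @ sol)
    return mean, var

rng = np.random.default_rng(1)
X = np.linspace(0, 1, 20)
f1, f2 = np.sin(4 * X), 0.5 * np.cos(7 * X)   # two subgroup signals
y1 = f1 + 0.05 * rng.standard_normal(20)       # decomposed feedback
y2 = f2 + 0.05 * rng.standard_normal(20)
Xs = np.linspace(0, 1, 5)

# Decomposed: one GP per subgroup, posteriors are summed.
m1, v1 = gp_posterior(X, y1, Xs)
m2, v2 = gp_posterior(X, y2, Xs)
mean_dec, var_dec = m1 + m2, v1 + v2
```

与之对照的做法是只观测 y1+y2 并拟合单个GP;论文的结论是,利用分解后的反馈可证明得到方差更低的后验。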

【38】 Universal Approximation for Log-concave Distributions using Well-conditioned Normalizing Flows 标题:使用良态正规化流的对数凹分布的通用逼近

作者:Holden Lee,Chirag Pabbaraju,Anish Sevekari,Andrej Risteski 机构:Duke University, Carnegie Mellon University 备注:40 pages, 0 figures 链接:https://arxiv.org/abs/2107.02951 摘要:标准化流是一类广泛应用的、具有可处理似然的潜变量生成模型。仿射耦合(Dinh et al, 2014-16)模型是一种特别常见的标准化流类型,其中潜变量到可观测变量变换的雅可比矩阵是三角形的,从而可以在线性时间内计算似然。尽管仿射耦合被广泛使用,但该体系结构的特殊结构使得理解其表示能力颇具挑战。直到最近,三篇并行的论文(Huang等人,2020;Zhang等人,2020;Koehler等人,2020)才解决了普遍逼近问题:它们表明,足够正则的分布可以用仿射耦合任意好地逼近,尽管所用网络的雅可比矩阵几乎奇异。由于病态雅可比矩阵是基于似然训练的一个障碍,基本问题仍然存在:哪些分布可以用条件良好的仿射耦合流来近似?本文证明了任意对数凹分布都可以用条件良好的仿射耦合流来近似。在证明技术方面,我们揭示并利用了仿射耦合结构、欠阻尼Langevin动力学(一种常用于从Gibbs测度采样的随机微分方程)和Hénon映射(辛微分同胚研究中出现的一类结构化动力系统)之间的深层联系。我们的结果也为仿射耦合的训练实践提供了启示:我们用iid高斯对输入分布做填充后再进行近似,Koehler等人(2020)曾根据经验观察到该策略能得到条件更好的流,但此前一直缺乏理论依据。因此,我们的证明为训练标准化流时使用高斯填充的好处提供了理论证据。 摘要:Normalizing flows are a widely used class of latent-variable generative models with a tractable likelihood. Affine-coupling (Dinh et al, 2014-16) models are a particularly common type of normalizing flows, for which the Jacobian of the latent-to-observable-variable transformation is triangular, allowing the likelihood to be computed in linear time. Despite the widespread usage of affine couplings, the special structure of the architecture makes understanding their representational power challenging. The question of universal approximation was only recently resolved by three parallel papers (Huang et al.,2020;Zhang et al.,2020;Koehler et al.,2020) -- who showed reasonably regular distributions can be approximated arbitrarily well using affine couplings -- albeit with networks with a nearly-singular Jacobian. As ill-conditioned Jacobians are an obstacle for likelihood-based training, the fundamental question remains: which distributions can be approximated using well-conditioned affine coupling flows? In this paper, we show that any log-concave distribution can be approximated using well-conditioned affine-coupling flows. 
In terms of proof techniques, we uncover and leverage deep connections between affine coupling architectures, underdamped Langevin dynamics (a stochastic differential equation often used to sample from Gibbs measures) and H\'enon maps (a structured dynamical system that appears in the study of symplectic diffeomorphisms). Our results also inform the practice of training affine couplings: we approximate a padded version of the input distribution with iid Gaussians -- a strategy which Koehler et al.(2020) empirically observed to result in better-conditioned flows, but had hitherto no theoretical grounding. Our proof can thus be seen as providing theoretical evidence for the benefits of Gaussian padding when training normalizing flows.
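摘要中"雅可比矩阵是三角形的、似然可在线性时间内计算"这一点,可以用一个仿射耦合层直接演示:前一半坐标保持不变,后一半按前一半的函数做缩放平移,对数行列式就是各缩放量之和。以下为最小示意(其中 s_net、t_net 是任意示例函数;缩放有界这一点呼应了文中"条件良好"的设定):

```python
import numpy as np

def coupling_forward(x, s_net, t_net):
    """One affine coupling layer; the Jacobian is triangular, so the
    log-determinant is just the sum of log-scales (linear time)."""
    d = x.shape[-1] // 2
    x1, x2 = x[..., :d], x[..., d:]
    s, t = s_net(x1), t_net(x1)
    y = np.concatenate([x1, x2 * np.exp(s) + t], axis=-1)
    return y, s.sum(axis=-1)

def coupling_inverse(y, s_net, t_net):
    """Exact inverse: the untouched half determines the same s, t."""
    d = y.shape[-1] // 2
    y1, y2 = y[..., :d], y[..., d:]
    s, t = s_net(y1), t_net(y1)
    return np.concatenate([y1, (y2 - t) * np.exp(-s)], axis=-1)

# Toy "networks": any functions of the untouched half will do.
s_net = lambda h: 0.5 * np.tanh(h)   # bounded scales => well-conditioned
t_net = lambda h: h ** 2

x = np.array([[0.3, -1.2, 0.7, 2.0]])
y, log_det = coupling_forward(x, s_net, t_net)
x_rec = coupling_inverse(y, s_net, t_net)
```

由于缩放项被 tanh 约束在有界范围内,雅可比矩阵的条件数也随之有界,这正是论文关心的"条件良好"情形。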

【39】 Scaling up Continuous-Time Markov Chains Helps Resolve Underspecification 标题:扩展连续时间马尔可夫链有助于解决欠定问题

作者:Alkis Gotovos,Rebekka Burkholz,John Quackenbush,Stefanie Jegelka 机构:MIT, Harvard University 链接:https://arxiv.org/abs/2107.02911 摘要:在许多生物医学应用中,对离散项目集合(如基因突变)的时间演化进行建模是一个基本问题。我们从连续时间马尔可夫链的视角来处理这个问题,并证明在常见的横截面数据设置下,所产生的学习任务通常是欠定的。我们探索了一种可能令人惊讶的补救方法:纳入一些额外的独立项目可以帮助确定时间顺序,从而解决欠定问题。这与将分析局限于一小部分相关项目的常见做法形成鲜明对比,而后者之所以流行,很大程度上是因为现有方法的可扩展性差。为了将我们的理论洞见付诸实践,我们提出了一种学习连续时间马尔可夫链的近似似然最大化方法,它可以扩展到数百个项目,并且比以前的方法快几个数量级。我们在合成数据和真实癌症数据上证明了方法的有效性。 摘要:Modeling the time evolution of discrete sets of items (e.g., genetic mutations) is a fundamental problem in many biomedical applications. We approach this problem through the lens of continuous-time Markov chains, and show that the resulting learning task is generally underspecified in the usual setting of cross-sectional data. We explore a perhaps surprising remedy: including a number of additional independent items can help determine time order, and hence resolve underspecification. This is in sharp contrast to the common practice of limiting the analysis to a small subset of relevant items, which is followed largely due to poor scaling of existing methods. To put our theoretical insight into practice, we develop an approximate likelihood maximization method for learning continuous-time Markov chains, which can scale to hundreds of items and is orders of magnitude faster than previous methods. We demonstrate the effectiveness of our approach on synthetic and real cancer data.
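连续时间马尔可夫链由生成元矩阵 Q 刻画,时刻 t 的转移概率矩阵为 P(t)=exp(Qt)。下面以两个互相独立、只会"获得"不会"丢失"的项目为例,构造一个4状态的最小示意(速率数值为假设;截断泰勒级数仅作演示,实际中可用 scipy.linalg.expm):

```python
import numpy as np

def ctmc_transition(Q, t, n_terms=40):
    """P(t) = exp(Q t) via a truncated Taylor series (adequate for
    small generators; use scipy.linalg.expm for real work)."""
    P = np.eye(Q.shape[0])
    term = np.eye(Q.shape[0])
    for k in range(1, n_terms):
        term = term @ (Q * t) / k
        P = P + term
    return P

# States: (0,0), (1,0), (0,1), (1,1); the two items are acquired
# independently at rates r1, r2 and never revert.
r1, r2 = 1.0, 0.5
Q = np.array([
    [-(r1 + r2), r1,  r2,  0.0],
    [0.0,       -r2,  0.0, r2 ],
    [0.0,        0.0, -r1, r1 ],
    [0.0,        0.0,  0.0, 0.0],
])
P = ctmc_transition(Q, t=1.0)
```

横截面数据只观测到某一时刻各状态的占比(即 P(t) 的一行),这正是摘要所说欠定性的来源:不同的 Q 与 t 组合可以产生同样的横截面分布。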

【40】 Mid infrared spectroscopy and milk quality traits: a data analysis competition at the "International Workshop on Spectroscopy and Chemometrics 2021"

作者:Maria Frizzarin,Antonio Bevilacqua,Bhaskar Dhariyal,Katarina Domijan,Federico Ferraccioli,Elena Hayes,Georgiana Ifrim,Agnieszka Konkolewska,Thach Le Nguyen,Uche Mbaka,Giovanna Ranzato,Ashish Singh,Marco Stefanucci,Alessandro Casa 机构:Teagasc, Animal & Grassland Research and Innovation Centre, Moorepark, Ireland, School of Mathematics and Statistics, University College Dublin, Ireland, School of Computer Science, University College Dublin, Ireland 备注:17 pages, 6 figures, 6 tables 链接:https://arxiv.org/abs/2107.02906 摘要:在首届"光谱学和化学计量学国际研讨会"期间举办了一场化学计量学数据分析挑战赛,该研讨会由Vistamilk SFI研究中心组织,于2021年4月在线举行。比赛的目的是建立一个校准模型,仅利用中红外光谱中包含的信息预测牛奶品质性状。比赛提供了三个不同的性状,它们的预测难度各不相同,因此可能需要针对具体性状的建模选择。本文概述了参赛者采用的不同方法,并对分析中获得的见解进行了批判性讨论。 摘要:A chemometric data analysis challenge has been arranged during the first edition of the "International Workshop on Spectroscopy and Chemometrics", organized by the Vistamilk SFI Research Centre and held online in April 2021. The aim of the competition was to build a calibration model in order to predict milk quality traits exploiting the information contained in mid-infrared spectra only. Three different traits have been provided, presenting heterogeneous degrees of prediction complexity thus possibly requiring trait-specific modelling choices. In this paper the different approaches adopted by the participants are outlined and the insights obtained from the analyses are critically discussed.
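此类中红外光谱校准问题的经典化学计量学基线是偏最小二乘回归(PLS)。下面在合成"光谱"上给出 PLS1(NIPALS 形式)的最小示意(数据、成分数等均为虚构假设,并非竞赛中任何参赛队的方法):

```python
import numpy as np

def pls1_fit(X, y, n_comp=3):
    """PLS1 via NIPALS deflation; returns intercept and coefficients."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = Xc.T @ yc
        w = w / np.linalg.norm(w)        # weight vector
        t = Xc @ w                       # scores
        tt = t @ t
        p = Xc.T @ t / tt                # X loadings
        qa = yc @ t / tt                 # y loading
        Xc = Xc - np.outer(t, p)         # deflate
        yc = yc - qa * t
        W.append(w); P.append(p); q.append(qa)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)  # regression coefficients
    return y_mean - x_mean @ B, B

rng = np.random.default_rng(2)
# Synthetic "spectra": 100 samples x 300 wavelengths, rank-3 signal.
scores = rng.standard_normal((100, 3))
X = scores @ rng.standard_normal((3, 300)) + 0.01 * rng.standard_normal((100, 300))
y = scores @ np.array([1.0, -0.5, 2.0]) + 0.05 * rng.standard_normal(100)

b0, B = pls1_fit(X, y, n_comp=3)
y_hat = b0 + X @ B
```

PLS 的要点是把数百个高度共线的波长压缩成少量潜在成分,这与光谱数据"样本少、变量多"的结构相匹配;竞赛参赛者采用的具体方法见原文。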

【41】 Network meta-analysis and random walks 标题:网络荟萃分析与随机游走

作者:Annabel L. Davies,Theodoros Papakonstantinou,Adriani Nikolakopoulou,Gerta Rücker,Tobias Galla 机构:Theoretical Physics, Department of Physics, and Astronomy, School of Natural, Sciences, The University of Manchester, Manchester, UK, Institute of Medical Biometry and, Statistics, Centre,University of Freiburg, Freiburg, Germany, Instituto de Física Interdisciplinar y 备注:34 pages, 8 figures 链接:https://arxiv.org/abs/2107.02886 摘要:网络荟萃分析(NMA)是临床研究中证据综合的核心工具。NMA的结果在很大程度上取决于所汇集证据的质量。因此,在评估NMA的有效性时,了解每个直接治疗比较对每个网络治疗效应的贡献比例非常重要。比例贡献的构建基于如下观察:hat矩阵的每一行都代表对应治疗比较的所谓'证据流网络'。然而,现有用于计算这些值的算法在路径的选择上存在歧义。在这项工作中,我们提出了NMA与随机游走之间的一个新类比,并利用这一类比推导出比例贡献的闭式表达式。图上的随机游走是一个随机过程,它描述由边相连的顶点之间一系列随机的'跳跃';边的权重与游走者沿该边移动的概率有关。我们利用NMA的图表示构造了证据网络上随机游走的转移矩阵。 摘要:Network meta-analysis (NMA) is a central tool for evidence synthesis in clinical research. The results of an NMA depend critically on the quality of evidence being pooled. In assessing the validity of an NMA, it is therefore important to know the proportion contributions of each direct treatment comparison to each network treatment effect. The construction of proportion contributions is based on the observation that each row of the hat matrix represents a so-called 'evidence flow network' for each treatment comparison. However, the existing algorithm used to calculate these values is associated with ambiguity according to the selection of paths. In this work we present a novel analogy between NMA and random walks. We use this analogy to derive closed-form expressions for the proportion contributions. A random walk on a graph is a stochastic process that describes a succession of random 'hops' between vertices which are connected by an edge. The weight of an edge relates to the probability that the walker moves along that edge. We use the graph representation of NMA to construct the transition matrix for a random walk on the network of evidence. 
We show that the net number of times a walker crosses each edge of the network is related to the evidence flow network. By then defining a random walk on the directed evidence flow network, we derive analytically the matrix of proportion contributions. The random-walk approach, in addition to being computationally more efficient, has none of the associated ambiguity of the existing algorithm. 我们证明,游走者穿过网络中每条边的净次数与证据流网络相关;通过定义有向证据流网络上的随机游走,我们解析地导出了比例贡献矩阵。随机游走方法除了计算效率更高之外,还不存在现有算法中的上述歧义。
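摘要中"沿某条边移动的概率与该边权重成正比"的转移矩阵构造,可用几行代码示意(下面 4 个处理之间的证据权重完全是虚构示例,并非论文数据;对加权无向图,这类随机游走的平稳分布正比于各结点的权重和):

```python
import numpy as np

# Hypothetical evidence network over treatments A, B, C, D:
# entry (i, j) is the weight of the direct i-vs-j comparison.
Wt = np.array([
    [0.0, 3.0, 1.0, 0.0],
    [3.0, 0.0, 2.0, 1.0],
    [1.0, 2.0, 0.0, 2.0],
    [0.0, 1.0, 2.0, 0.0],
])

# Random-walk transition matrix: hop along an edge with probability
# proportional to its weight.
T = Wt / Wt.sum(axis=1, keepdims=True)

# Evolve the walk's state distribution a few steps from treatment A.
dist = np.array([1.0, 0.0, 0.0, 0.0])
for _ in range(3):
    dist = dist @ T
```

论文在此基础上进一步考虑有向证据流网络上的吸收随机游走,用标准吸收链代数得出各直接比较的贡献比例的闭式表达式。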

【42】 Principles for Evaluation of AI/ML Model Performance and Robustness 标题:AI/ML模型性能和稳健性的评价原则

作者:Olivia Brown,Andrew Curtis,Justin Goodwin 链接:https://arxiv.org/abs/2107.02868 摘要:美国国防部(DoD)已经大幅增加了对人工智能和机器学习(AI/ML)能力的设计、评估和部署的投资,以满足国家安全需求。虽然AI/ML在学术和商业领域取得了许多成功,但其中许多系统也被证明是脆弱和不健壮的。在复杂和不断变化的国家安全环境中,在这些新能力部署到战场之前,国防部必须建立一个健全和系统的过程来评估AI/ML模型的性能和健壮性。本文回顾了AI/ML开发过程,重点介绍了AI/ML模型评估的常见最佳实践,并向国防部评估人员提出了建议,以确保为国家安全需要部署强大的AI/ML能力。 摘要:The Department of Defense (DoD) has significantly increased its investment in the design, evaluation, and deployment of Artificial Intelligence and Machine Learning (AI/ML) capabilities to address national security needs. While there are numerous AI/ML successes in the academic and commercial sectors, many of these systems have also been shown to be brittle and nonrobust. In a complex and ever-changing national security environment, it is vital that the DoD establish a sound and methodical process to evaluate the performance and robustness of AI/ML models before these new capabilities are deployed to the field. This paper reviews the AI/ML development process, highlights common best practices for AI/ML model evaluation, and makes recommendations to DoD evaluators to ensure the deployment of robust AI/ML capabilities for national security needs.

本文参与 腾讯云自媒体同步曝光计划,分享自微信公众号。
原始发表:2021-07-08,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 arXiv每日学术速递 微信公众号。
