Statistics arXiv Daily Digest [11.10]

Author: 公众号-arXiv每日学术速递 (WeChat official account: arXiv Daily Academic Digest)
Published: 2021-11-17 10:56:46

stat (Statistics): 39 papers in total

【1】 An Examination of Sport Climbing's Competition Format and Scoring System
Link: https://arxiv.org/abs/2111.05310

Authors: Quang Nguyen, Hannah Butler, Gregory J. Matthews
Abstract: Sport climbing, which made its Olympic debut at the 2020 Summer Games, generally consists of three separate disciplines: speed climbing, bouldering, and lead climbing. However, the International Olympic Committee (IOC) only allowed one set of medals per gender for sport climbing. As a result, the governing body of sport climbing, rather than choosing only one of the three disciplines to include in the Olympics, decided to create a competition combining all three disciplines. In order to determine a winner, a combined scoring system was created using the product of the ranks across the three disciplines to determine an overall score for each climber. In this work, the rank-product scoring system of sport climbing is evaluated through simulation to investigate its general features, specifically, the advancement probabilities for climbers given certain placements. Additionally, analyses of historical climbing contest results are presented and real examples of violations of the independence of irrelevant alternatives are illustrated. Finally, this work finds evidence that the current competition format is putting speed climbers at a disadvantage.
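As a rough illustration of how such a rank-product simulation can be set up, the sketch below estimates advancement probabilities by speed-climbing placement on a 20-climber field; the field size, the top-8 cutoff, and the independent uniform rankings are our simplifying assumptions, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_sims, q = 20, 20_000, 8   # field size and top-q cutoff are assumptions

# Empirical P(advance | speed rank = r) when all three discipline
# rankings are independent uniform permutations (a simplification).
adv = np.zeros(n)
for _ in range(n_sims):
    ranks = np.stack([rng.permutation(n) + 1 for _ in range(3)])  # 3 x n
    score = ranks.prod(axis=0)              # rank-product: lower is better
    finalists = np.argsort(score)[:q]       # ties broken arbitrarily
    for c in finalists:
        adv[ranks[0, c] - 1] += 1           # bucket by the climber's speed rank

for r, p in enumerate(adv / n_sims, start=1):
    print(f"speed rank {r:2d}: advancement probability ~ {p:.3f}")
```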

【2】 Double Control Variates for Gradient Estimation in Discrete Latent Variable Models
Link: https://arxiv.org/abs/2111.05300

Authors: Michalis K. Titsias, Jiaxin Shi
Affiliations: DeepMind; Microsoft Research New England
Comments: 18 pages
Abstract: Stochastic gradient-based optimisation for discrete latent variable models is challenging due to the high variance of gradients. We introduce a variance reduction technique for score function estimators that makes use of double control variates. These control variates act on top of a main control variate, and try to further reduce the variance of the overall estimator. We develop a double control variate for the REINFORCE leave-one-out estimator using Taylor expansions. For training discrete latent variable models, such as variational autoencoders with binary latent variables, our approach adds no extra computational cost compared to standard training with the REINFORCE leave-one-out estimator. We apply our method to challenging high-dimensional toy examples and training variational autoencoders with binary latent variables. We show that our estimator can have lower variance compared to other state-of-the-art estimators.
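For orientation, here is a minimal sketch of the REINFORCE leave-one-out baseline that the proposed double control variate builds on, for a single Bernoulli latent with p = sigmoid(theta); the Taylor-expansion control variate itself is the paper's contribution and is omitted here, and the toy objective and sample count K are illustrative assumptions.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def reinforce_loo_grad(theta, f, K, rng):
    """Leave-one-out REINFORCE estimate of d/dtheta E_{b~Bern(p)}[f(b)],
    with p = sigmoid(theta). Each sample uses the mean of the other K-1
    samples as its baseline (the main control variate)."""
    p = sigmoid(theta)
    b = (rng.random(K) < p).astype(float)        # K i.i.d. Bernoulli samples
    fb = np.array([f(bi) for bi in b])
    baseline = (fb.sum() - fb) / (K - 1)          # leave-one-out baseline
    score = b - p                                 # d/dtheta log Bern(b; p)
    return np.mean((fb - baseline) * score)

rng = np.random.default_rng(0)
f = lambda b: (b - 0.45) ** 2
est = np.mean([reinforce_loo_grad(0.0, f, K=4, rng=rng) for _ in range(5000)])
exact = 0.25 * (f(1.0) - f(0.0))   # p(1-p) * (f(1) - f(0)) at theta = 0
print(est, exact)                   # both ~ 0.025
```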

【3】 Harmless interpolation in regression and classification with structured features
Link: https://arxiv.org/abs/2111.05198

Authors: Andrew D. McRae, Santhosh Karnik, Mark A. Davenport, Vidya Muthukumar
Affiliations: School of Electrical and Computer Engineering, Georgia Tech; Department of Computational Mathematics, Science, & Engineering; School of Industrial and Systems Engineering, Georgia Tech
Abstract: Overparametrized neural networks tend to perfectly fit noisy training data yet generalize well on test data. Inspired by this empirical observation, recent work has sought to understand this phenomenon of benign overfitting or harmless interpolation in the much simpler linear model. Previous theoretical work critically assumes that either the data features are statistically independent or the input data is high-dimensional; this precludes general nonparametric settings with structured feature maps. In this paper, we present a general and flexible framework for upper bounding regression and classification risk in a reproducing kernel Hilbert space. A key contribution is that our framework describes precise sufficient conditions on the data Gram matrix under which harmless interpolation occurs. Our results recover prior independent-features results (with a much simpler analysis), but they furthermore show that harmless interpolation can occur in more general settings such as features that are a bounded orthonormal system. Furthermore, our results show an asymptotic separation between classification and regression performance in a manner that was previously only shown for Gaussian features.

【4】 In Nonparametric and High-Dimensional Models, Bayesian Ignorability is an Informative Prior
Link: https://arxiv.org/abs/2111.05137

Author: Antonio R. Linero
Affiliation: Department of Statistics and Data Sciences, University of Texas at Austin
Abstract: In problems with large amounts of missing data one must model two distinct data generating processes: the outcome process which generates the response, and the missing data mechanism which determines the data we observe. Under the ignorability condition of Rubin (1976), however, likelihood-based inference for the outcome process does not depend on the missing data mechanism, so that only the former needs to be estimated; partially because of this simplification, ignorability is often used as a baseline assumption. We study the implications of Bayesian ignorability in the presence of high-dimensional nuisance parameters and argue that ignorability is typically incompatible with sensible prior beliefs about the amount of selection bias. We show that, for many problems, ignorability directly implies that the prior on the selection bias is tightly concentrated around zero. This is demonstrated on several models of practical interest, and the effect of ignorability on the posterior distribution is characterized for high-dimensional linear models with a ridge regression prior. We then show how to build high-dimensional models which encode sensible beliefs about the selection bias, and also show that under certain narrow circumstances ignorability is less problematic.

【5】 Using sequential drift detection to test the API economy
Link: https://arxiv.org/abs/2111.05136

Authors: Samuel Ackerman, Parijat Dube, Eitan Farchi
Abstract: The API economy refers to the widespread integration of API (advanced programming interface) microservices, where software applications can communicate with each other, as a crucial element in business models and functions. The number of possible ways in which such a system could be used is huge. It is thus desirable to monitor the usage patterns and identify when the system is used in a way that was never used before. This provides a warning to the system analysts, who can then ensure uninterrupted operation of the system. In this work we analyze both histograms and call graphs of API usage to determine if the usage patterns of the system have shifted. We compare the application of nonparametric statistical and Bayesian sequential analysis to the problem. This is done in a way that overcomes the issue of repeated statistical tests and ensures statistical significance of the alerts. The technique was simulated and tested and proven effective in detecting drift in various scenarios. We also mention modifications to the technique to decrease its memory use, so that it can respond more quickly when the distribution drift occurs at a delay from when monitoring begins.
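To make the "repeated tests" issue concrete, the sketch below compares incoming API-call histograms against a reference with a chi-square test whose alpha is split across the tests (Bonferroni-style spending). This is only a simplified stand-in for the sequential nonparametric and Bayesian procedures the paper actually studies; the endpoint counts and alpha are illustrative.

```python
import numpy as np
from scipy.stats import chi2_contingency

def detect_drift(reference, batches, alpha=0.05):
    """Flag the first batch whose histogram of API-call counts differs
    significantly from the reference. Splitting alpha across the m tests
    keeps the family-wise false-alarm rate controlled under repetition."""
    m = len(batches)
    for t, batch in enumerate(batches, start=1):
        table = np.array([reference, batch])
        _, pval, _, _ = chi2_contingency(table)
        if pval < alpha / m:
            return t        # first batch at which drift is declared
    return None

reference = [500, 300, 150, 50]          # baseline counts per API endpoint
batches = [[480, 310, 160, 50],          # similar usage
           [505, 295, 145, 55],
           [300, 200, 100, 400]]         # usage pattern shifts
print(detect_drift(reference, batches))  # -> 3
```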

【6】 Gated Linear Model induced U-net for surrogate modeling and uncertainty quantification
Link: https://arxiv.org/abs/2111.05123

Authors: Sai Krishna Mendu, Souvik Chakraborty
Affiliations: Department of Electronics and Electrical Engineering, Indian Institute of Technology Guwahati, Assam, India; Department of Applied Mechanics, School of Artificial Intelligence, Indian Institute of Technology Delhi, Hauz Khas, New Delhi, India
Comments: 21 pages
Abstract: We propose a novel deep learning based surrogate model for solving high-dimensional uncertainty quantification and uncertainty propagation problems. The proposed deep learning architecture is developed by integrating the well-known U-net architecture with the Gaussian Gated Linear Network (GGLN) and is referred to as the Gated Linear Network induced U-net or GLU-net. The proposed GLU-net treats the uncertainty propagation problem as an image-to-image regression and hence is extremely data efficient. Additionally, it also provides estimates of the predictive uncertainty. The network architecture of GLU-net is less complex, with 44% fewer parameters than contemporary works. We illustrate the performance of the proposed GLU-net in solving the Darcy flow problem under uncertainty in the sparse data scenario. We consider the stochastic input dimensionality to be up to 4225. Benchmark results are generated using vanilla Monte Carlo simulation. We observe the proposed GLU-net to be accurate and extremely efficient even when no information about the structure of the inputs is provided to the network. Case studies are performed by varying the training sample size and stochastic input dimensionality to illustrate the robustness of the proposed approach.

【7】 Changepoint detection in non-exchangeable data
Link: https://arxiv.org/abs/2111.05054

Authors: Karl L. Hallgren, Nicholas A. Heard, Niall M. Adams
Affiliation: Department of Mathematics, Imperial College London
Abstract: Changepoint models typically assume the data within each segment are independent and identically distributed, conditional on some parameters which change across segments. This construction may be inadequate when data are subject to local correlation patterns, often resulting in many more changepoints fitted than preferable. This article proposes a Bayesian changepoint model which relaxes the assumption of exchangeability within segments. The proposed model supposes data within a segment are $m$-dependent for some unknown $m \geqslant 0$ which may vary between segments, resulting in a model suitable for detecting clear discontinuities in data which are subject to different local temporal correlations. The approach is suited to both continuous and discrete data. A novel reversible jump MCMC algorithm is proposed to sample from the model; in particular, a detailed analysis of the parameter space is exploited to build proposals for the orders of dependence. Two applications demonstrate the benefits of the proposed model: computer network monitoring via change detection in count data, and segmentation of financial time series.

【8】 Exploratory Factor Analysis of Data on a Sphere
Link: https://arxiv.org/abs/2111.04940

Authors: Fan Dai, Karin S. Dorman, Somak Dutta, Ranjan Maitra
Affiliation: Iowa State University
Comments: 26 pages, 12 figures, 6 tables
Abstract: Data on high-dimensional spheres arise frequently in many disciplines, either naturally or as a consequence of preliminary processing, and can have intricate dependence structure that needs to be understood. We develop exploratory factor analysis of the projected normal distribution to explain the variability in such data using a few easily interpreted latent factors. Our methodology provides maximum likelihood estimates through a novel fast alternating expectation profile conditional maximization algorithm. Results on simulation experiments on a wide range of settings are uniformly excellent. Our methodology provides interpretable and insightful results when applied to tweets with the #MeToo hashtag in early December 2018, to time-course functional Magnetic Resonance Images of the average pre-teen brain at rest, to characterize handwritten digits, and to gene expression data from cancerous cells in the Cancer Genome Atlas.

【9】 Kernel estimation of bivariate time-varying coefficient model for longitudinal data with terminal event
Link: https://arxiv.org/abs/2111.04938

Authors: Yue Wang, Bin Nan, Jack D. Kalbfleisch
Affiliations: Department of Statistics, University of California, Irvine; Department of Biostatistics, University of Michigan
Abstract: We propose a nonparametric bivariate time-varying coefficient model for longitudinal measurements with the occurrence of a terminal event that is subject to right censoring. The time-varying coefficients capture the longitudinal trajectories of covariate effects along both the follow-up time and the residual lifetime. The proposed model extends the parametric conditional approach given terminal event time in recent literature, and thus avoids potential model misspecification. We consider a kernel smoothing method for estimating regression coefficients in our model and use cross-validation for bandwidth selection, applying undersmoothing in the final analysis to eliminate the asymptotic bias of the kernel estimator. We show that the kernel estimates are asymptotically normal under mild regularity conditions, and provide an easily computable sandwich variance estimator. We conduct extensive simulations that show desirable performance of the proposed approach, and apply the method to analyzing the medical cost data for patients with end-stage renal disease.
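A one-dimensional sketch of the kernel-smoothing idea behind such estimators (the paper's estimator is bivariate, smoothing over both follow-up time and residual lifetime, and handles censoring; the Gaussian kernel and bandwidth below are assumptions):

```python
import numpy as np

def kernel_varying_coef(t_obs, X, y, t0, h):
    """Kernel-weighted least squares estimate of beta(t0) in the model
    y_i = X_i' beta(t_i) + e_i: weight each observation by a Gaussian
    kernel in |t_i - t0| and solve the weighted normal equations."""
    w = np.exp(-0.5 * ((t_obs - t0) / h) ** 2)
    Xw = X * w[:, None]
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)

rng = np.random.default_rng(0)
t = rng.uniform(0, 1, 500)
X = np.column_stack([np.ones(500), rng.standard_normal(500)])
beta_t = np.column_stack([1 + t, np.sin(2 * np.pi * t)])     # true beta(t)
y = (X * beta_t).sum(axis=1) + 0.1 * rng.standard_normal(500)
print(kernel_varying_coef(t, X, y, t0=0.5, h=0.1))           # ~ [1.5, 0.0]
```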

【10】 The Role of Adaptive Optimizers for Honest Private Hyperparameter Selection
Link: https://arxiv.org/abs/2111.04906

Authors: Shubhankar Mohapatra, Sajin Sasy, Xi He, Gautam Kamath, Om Thakkar
Abstract: Hyperparameter optimization is a ubiquitous challenge in machine learning, and the performance of a trained model depends crucially upon their effective selection. While a rich set of tools exist for this purpose, there are currently no practical hyperparameter selection methods under the constraint of differential privacy (DP). We study honest hyperparameter selection for differentially private machine learning, in which the process of hyperparameter tuning is accounted for in the overall privacy budget. To this end, we i) show that standard composition tools outperform more advanced techniques in many settings, ii) empirically and theoretically demonstrate an intrinsic connection between the learning rate and clipping norm hyperparameters, iii) show that adaptive optimizers like DPAdam enjoy a significant advantage in the process of honest hyperparameter tuning, and iv) draw upon novel limiting behaviour of Adam in the DP setting to design a new and more efficient optimizer.
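A minimal sketch of the clipped, noised gradient that DP optimizers such as DP-SGD and DPAdam are built on, which also makes the learning-rate/clipping-norm coupling visible (both rescale the update). The constants are illustrative, and this is not the paper's tuning procedure.

```python
import numpy as np

def dp_noisy_grad(per_example_grads, clip_norm, noise_mult, rng):
    """Clip each per-example gradient to L2 norm <= clip_norm, average,
    and add Gaussian noise with std noise_mult * clip_norm / batch_size
    (the standard DP-SGD recipe; DPAdam feeds this into Adam's moments)."""
    clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    g_bar = np.mean(clipped, axis=0)
    sigma = noise_mult * clip_norm / len(per_example_grads)
    return g_bar + rng.normal(0.0, sigma, size=g_bar.shape)

rng = np.random.default_rng(0)
grads = [rng.standard_normal(10) for _ in range(32)]
# Halving clip_norm roughly halves the update scale when most gradients
# are clipped, which is why the clipping norm and the learning rate
# cannot be tuned independently.
print(np.linalg.norm(dp_noisy_grad(grads, clip_norm=1.0, noise_mult=1.0, rng=rng)))
print(np.linalg.norm(dp_noisy_grad(grads, clip_norm=0.5, noise_mult=1.0, rng=rng)))
```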

【11】 Determinants of Women's Attitude towards Intimate Partner Violence: Evidence from Bangladesh
Link: https://arxiv.org/abs/2111.04892

Authors: Md Tareq Ferdous Khan, Lianfen Qian
Comments: 22 pages, 6 tables, 2 figures
Abstract: Purpose: The purpose of this study is to identify the important determinants responsible for the variation in women's attitude towards intimate partner violence (IPV). Methods: A nationally representative Bangladesh Demographic and Health Survey 2014 dataset of 17,863 women is used to address the research questions. In the study, two response variables are constructed from the five attitude questions, and a series of individual- and community-level predictors are tested. The preliminary statistical methods employed in the study include univariate and bivariate distributions, while the adopted statistical models include binary logistic, ordinal logistic, and mixed-effects multilevel logistic models for each response variable, and finally the generalized ordinal logistic regression. Results: Statistical analyses reveal that among the individual-level independent variables, age at first marriage, respondent's education, decision score, religion, NGO membership, access to information, husband's education, normalized wealth score, and division indicator have significant effects on women's attitude towards IPV. Among the three community-level variables, only the mean decision score is found significant in lowering the likelihood. Conclusions: It is evident that, other than for religion, NGO membership, and division indicator, the higher the value of the variable, the lower the likelihood of justifying IPV. However, being a Muslim, an NGO member, or a resident of other divisions, women are found more tolerant of IPV than their respective counterparts. These findings suggest that the government, policymakers, practitioners, academicians, and all other stakeholders should work on the significant determinants to divert women's wrong attitude towards IPV, and thus help to take away this deep-rooted problem from society.

【12】 Query-augmented Active Metric Learning
Link: https://arxiv.org/abs/2111.04871

Authors: Yujia Deng, Yubai Yuan, Haoda Fu, Annie Qu
Affiliations: Department of Statistics, University of Illinois Urbana-Champaign; Department of Statistics, University of California, Irvine; Eli Lilly and Company
Abstract: In this paper we propose an active metric learning method for clustering with pairwise constraints. The proposed method actively queries the label of informative instance pairs, while estimating underlying metrics by incorporating unlabeled instance pairs, which leads to a more accurate and efficient clustering process. In particular, we augment the queried constraints by generating more pairwise labels to provide additional information in learning a metric to enhance clustering performance. Furthermore, we increase the robustness of metric learning by updating the learned metric sequentially and penalizing the irrelevant features adaptively. In addition, we propose a novel active query strategy that evaluates the information gain of instance pairs more accurately by incorporating the neighborhood structure, which improves clustering efficiency without extra labeling cost. In theory, we provide a tighter error bound of the proposed metric learning method utilizing augmented queries compared with methods using existing constraints only. Furthermore, we also investigate the improvement using the active query strategy instead of random selection. Numerical studies on simulation settings and real datasets indicate that the proposed method is especially advantageous when the signal-to-noise ratio between significant features and irrelevant features is low.

【13】 Double robust estimation of partially adaptive treatment strategies
Link: https://arxiv.org/abs/2111.04844

Authors: Denis Talbot, Erica EM Moodie, Caroline Diorio
Affiliations: Département de médecine sociale et préventive, Université Laval; Department of Epidemiology and Biostatistics, McGill University; Axe oncologie, Centre de recherche du CHU de Québec
Comments: 22 pages and 8 tables
Abstract: Precision medicine aims to tailor treatment decisions according to patients' characteristics. G-estimation and dynamic weighted ordinary least squares (dWOLS) are double robust statistical methods that can be used to identify optimal adaptive treatment strategies. They require both a model for the outcome and a model for the treatment, and are consistent if at least one of these models is correctly specified. It is underappreciated that these methods additionally require modeling all existing treatment-confounder interactions to yield consistent estimators. Identifying partially adaptive treatment strategies that tailor treatments according to only a few covariates, ignoring some interactions, may be preferable in practice. It has been proposed to combine inverse probability weighting and G-estimation to address this issue, but we argue that the resulting estimator is not expected to be double robust. Building on G-estimation and dWOLS, we propose alternative estimators of partially adaptive strategies and demonstrate their double robustness. We investigate and compare the empirical performance of six estimators in a simulation study. As expected, estimators combining inverse probability weighting with either G-estimation or dWOLS are biased when the treatment model is incorrectly specified. The other estimators are unbiased if either the treatment or the outcome model is correctly specified, and have similar standard errors. Using data maintained by the Centre des Maladies du Sein, the methods are illustrated by estimating a partially adaptive treatment strategy for tailoring hormonal therapy use in breast cancer patients according to their estrogen receptor status and body mass index. R software implementing our estimators is provided.

【14】 Solution to the Non-Monotonicity and Crossing Problems in Quantile Regression
Link: https://arxiv.org/abs/2111.04805

Authors: Resve A. Saleh, A. K. Md. Ehsanes Saleh
Affiliations: Dept. of Electrical and Computer Engineering, University of British Columbia, Vancouver, Canada; School of Mathematics and Statistics, Carleton University, Ottawa, Canada
Comments: 10 pages, 13 figures, IEEE conference format
Abstract: This paper proposes a new method to address the long-standing problem of lack of monotonicity in estimation of the conditional and structural quantile function, also known as the quantile crossing problem. Quantile regression is a very powerful tool in data science in general and econometrics in particular. Unfortunately, the crossing problem has been confounding researchers and practitioners alike for over 4 decades. Numerous attempts have been made to find an acceptable solution but no simple and general solution has been found to date. This paper describes an elegant solution to the problem which is based on a single mathematical equation that is easy to understand and implement in R and Python, while greatly reducing the crossing problem. It will be very important in all areas where quantile regression is routinely used and may also find application in robust regression, especially in the context of machine learning.
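To make the crossing problem concrete, the snippet below fits several quantile levels on heteroskedastic data and counts monotonicity violations, then applies monotone rearrangement (Chernozhukov, Fernandez-Val and Galichon, 2010) as one known remedy. This is not the single-equation method proposed in the paper, just an illustration of the problem and of a standard fix; the simulated data are an assumption.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(0, 10, n)
y = 1.0 + 0.5 * x + (0.5 + 0.3 * x) * rng.standard_normal(n)  # heteroskedastic
X = sm.add_constant(x)

taus = [0.1, 0.25, 0.5, 0.75, 0.9]
grid = sm.add_constant(np.linspace(0, 10, 50))
preds = np.array([sm.QuantReg(y, X).fit(q=t).predict(grid) for t in taus])

print("crossing violations before:", int((np.diff(preds, axis=0) < 0).sum()))

# Monotone rearrangement: sort the fitted quantiles at each grid point
# so they are nondecreasing in tau; crossings are removed by construction.
preds_fixed = np.sort(preds, axis=0)
print("crossing violations after:", int((np.diff(preds_fixed, axis=0) < 0).sum()))
```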

【15】 Data Augmentation Can Improve Robustness
Link: https://arxiv.org/abs/2111.05328

Authors: Sylvestre-Alvise Rebuffi, Sven Gowal, Dan A. Calian, Florian Stimberg, Olivia Wiles, Timothy Mann
Affiliation: DeepMind, London
Comments: Accepted at NeurIPS 2021. arXiv admin note: substantial text overlap with arXiv:2103.01946; text overlap with arXiv:2110.09468
Abstract: Adversarial training suffers from robust overfitting, a phenomenon where the robust test accuracy starts to decrease during training. In this paper, we focus on reducing robust overfitting by using common data augmentation schemes. We demonstrate that, contrary to previous findings, when combined with model weight averaging, data augmentation can significantly boost robust accuracy. Furthermore, we compare various augmentation techniques and observe that spatial composition techniques work the best for adversarial training. Finally, we evaluate our approach on CIFAR-10 against $\ell_\infty$ and $\ell_2$ norm-bounded perturbations of size $\epsilon = 8/255$ and $\epsilon = 128/255$, respectively. We show large absolute improvements of +2.93% and +2.16% in robust accuracy compared to previous state-of-the-art methods. In particular, against $\ell_\infty$ norm-bounded perturbations of size $\epsilon = 8/255$, our model reaches 60.07% robust accuracy without using any external data. We also achieve a significant performance boost with this approach while using other architectures and datasets such as CIFAR-100, SVHN and TinyImageNet.
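The "model weight averaging" component is typically an exponential moving average (EMA) of the parameters kept alongside training; a minimal framework-free sketch follows, where the decay value and the helper names (`train_step`, `augment`, `model_params`) are illustrative assumptions rather than the paper's code.

```python
import copy

class EMAWeights:
    """Exponential moving average of model parameters; `params` is any
    dict of name -> numpy array. decay ~ 0.995 is a common choice
    (assumed here, not taken from the paper)."""
    def __init__(self, params, decay=0.995):
        self.decay = decay
        self.shadow = copy.deepcopy(params)

    def update(self, params):
        for k, v in params.items():
            self.shadow[k] = self.decay * self.shadow[k] + (1 - self.decay) * v

# Usage inside an adversarial-training loop (sketch):
# ema = EMAWeights(model_params)
# for batch in loader:
#     model_params = train_step(model_params, augment(batch))  # e.g. CutMix
#     ema.update(model_params)
# evaluate(ema.shadow)   # robust accuracy is measured on the averaged weights
```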

【16】 Turing-Universal Learners with Optimal Scaling Laws
Link: https://arxiv.org/abs/2111.05321

Author: Preetum Nakkiran
Affiliation: Halıcıoğlu Data Science Institute, University of California San Diego
Abstract: For a given distribution, learning algorithm, and performance metric, the rate of convergence (or data-scaling law) is the asymptotic behavior of the algorithm's test performance as a function of the number of train samples. Many learning methods in both theory and practice have power-law rates, i.e. performance scales as $n^{-\alpha}$ for some $\alpha > 0$. Moreover, both theoreticians and practitioners are concerned with improving the rates of their learning algorithms under settings of interest. We observe the existence of a "universal learner", which achieves the best possible distribution-dependent asymptotic rate among all learning algorithms within a specified runtime (e.g. $O(n^2)$), while incurring only polylogarithmic slowdown over this runtime. This algorithm is uniform, and does not depend on the distribution, and yet achieves best-possible rates for all distributions. The construction itself is a simple extension of Levin's universal search (Levin, 1973). And much like universal search, the universal learner is not at all practical, and is primarily of theoretical and philosophical interest.

【17】 Robust Estimation for Random Graphs
Link: https://arxiv.org/abs/2111.05320

Authors: Jayadev Acharya, Ayush Jain, Gautam Kamath, Ananda Theertha Suresh, Huanyu Zhang
Abstract: We study the problem of robustly estimating the parameter $p$ of an Erdős–Rényi random graph on $n$ nodes, where a $\gamma$ fraction of nodes may be adversarially corrupted. After showing the deficiencies of canonical estimators, we design a computationally-efficient spectral algorithm which estimates $p$ up to accuracy $\tilde O(\sqrt{p(1-p)}/n + \gamma\sqrt{p(1-p)}/\sqrt{n} + \gamma/n)$ for $\gamma < 1/60$. Furthermore, we give an inefficient algorithm with similar accuracy for all $\gamma < 1/2$, the information-theoretic limit. Finally, we prove a nearly-matching statistical lower bound, showing that the error of our algorithms is optimal up to logarithmic factors.
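The corruption model and the failure of the canonical edge-density estimator are easy to see in a few lines; this reproduces only the "deficiencies of canonical estimators" part, not the paper's spectral algorithm, and the parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, gamma = 400, 0.1, 0.05

# Sample G(n, p) as a symmetric boolean adjacency matrix
A = rng.random((n, n)) < p
A = np.triu(A, 1)
A = A | A.T

# Canonical estimator: edge density (each edge counted twice, so
# dividing the full sum by n(n-1) gives p-hat)
print("clean:    ", A.sum() / (n * (n - 1)))        # ~ 0.10

# Adversary corrupts a gamma fraction of nodes by wiring them to everyone
bad = rng.choice(n, size=int(gamma * n), replace=False)
A[bad, :] = True
A[:, bad] = True
np.fill_diagonal(A, False)
print("corrupted:", A.sum() / (n * (n - 1)))        # ~ p + 2*gamma*(1-p), badly biased
```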

【18】 Identifying the atmospheric drivers of drought and heat using a smoothed deep learning approach
Link: https://arxiv.org/abs/2111.05303

Authors: Magdalena Mittermeier, Maximilian Weigert, David Rügamer
Affiliations: Department of Geography, LMU Munich; Statistical Consulting StaBLab, Department of Statistics
Comments: NeurIPS 2021: Tackling Climate Change with Machine Learning
Abstract: Europe was hit by several disastrous heat and drought events in recent summers. Besides thermodynamic influences, such hot and dry extremes are driven by certain atmospheric situations including anticyclonic conditions. Effects of climate change on atmospheric circulations are complex and many open research questions remain in this context, e.g., on future trends of anticyclonic conditions. Based on the combination of a catalog of labeled circulation patterns and spatial atmospheric variables, we propose a smoothed convolutional neural network classifier for six types of anticyclonic circulations that are associated with drought and heat. Our work can help to identify important drivers of hot and dry extremes in climate simulations, which allows to unveil the impact of climate change on these drivers. We address various challenges inherent to circulation pattern classification that are also present in other climate patterns, e.g., subjective labels and unambiguous transition periods.

【19】 Can Information Flows Suggest Targets for Interventions in Neural Circuits?
Link: https://arxiv.org/abs/2111.05299

Authors: Praveen Venkatesh, Sanghamitra Dutta, Neil Mehta, Pulkit Grover
Affiliations: Allen Institute; University of Washington, Seattle; JP Morgan Chase AI Research; Department of Electrical and Computer Engineering and Neuroscience Institute, Carnegie Mellon University
Comments: Accepted to the Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS 2021). (29 pages; 61 figures)
Abstract: Motivated by neuroscientific and clinical applications, we empirically examine whether observational measures of information flow can suggest interventions. We do so by performing experiments on artificial neural networks in the context of fairness in machine learning, where the goal is to induce fairness in the system through interventions. Using our recently developed $M$-information flow framework, we measure the flow of information about the true label (responsible for accuracy, and hence desirable), and separately, the flow of information about a protected attribute (responsible for bias, and hence undesirable) on the edges of a trained neural network. We then compare the flow magnitudes against the effect of intervening on those edges by pruning. We show that pruning edges that carry larger information flows about the protected attribute reduces bias at the output to a greater extent. This demonstrates that $M$-information flow can meaningfully suggest targets for interventions, answering the title's question in the affirmative. We also evaluate bias-accuracy tradeoffs for different intervention strategies, to analyze how one might use estimates of desirable and undesirable information flows (here, accuracy and bias flows) to inform interventions that preserve the former while reducing the latter.

【20】 Generalization in quantum machine learning from few training data
Link: https://arxiv.org/abs/2111.05292

Authors: Matthias C. Caro, Hsin-Yuan Huang, M. Cerezo, Kunal Sharma, Andrew Sornborger, Lukasz Cincio, Patrick J. Coles
Affiliations: Department of Mathematics, Technical University of Munich, Garching, Germany; Munich Center for Quantum Science and Technology (MCQST), Munich, Germany; Institute for Quantum Information and Matter, Caltech, Pasadena, CA, USA
Comments: 14+25 pages, 4+1 figures
Abstract: Modern quantum machine learning (QML) methods involve variationally optimizing a parameterized quantum circuit on a training data set, and subsequently making predictions on a testing data set (i.e., generalizing). In this work, we provide a comprehensive study of generalization performance in QML after training on a limited number $N$ of training data points. We show that the generalization error of a quantum machine learning model with $T$ trainable gates scales at worst as $\sqrt{T/N}$. When only $K \ll T$ gates have undergone substantial change in the optimization process, we prove that the generalization error improves to $\sqrt{K/N}$. Our results imply that the compiling of unitaries into a polynomial number of native gates, a crucial application for the quantum computing industry that typically uses exponential-size training data, can be sped up significantly. We also show that classification of quantum states across a phase transition with a quantum convolutional neural network requires only a very small training data set. Other potential applications include learning quantum error correcting codes or quantum dynamical simulation. Our work injects new hope into the field of QML, as good generalization is guaranteed from few training data.

【21】 Generalized Kernel Ridge Regression for Causal Inference with Missing-at-Random Sample Selection
Link: https://arxiv.org/abs/2111.05277

Author: Rahul Singh
Affiliation: MIT Department of Economics
Comments: 75 pages
Abstract: I propose kernel ridge regression estimators for nonparametric dose response curves and semiparametric treatment effects in the setting where an analyst has access to a selected sample rather than a random sample; only for selected observations, the outcome is observed. I assume selection is as good as random conditional on treatment and a sufficiently rich set of observed covariates, where the covariates are allowed to cause treatment or be caused by treatment -- an extension of missingness-at-random (MAR). I propose estimators of means, increments, and distributions of counterfactual outcomes with closed form solutions in terms of kernel matrix operations, allowing treatment and covariates to be discrete or continuous, and low, high, or infinite dimensional. For the continuous treatment case, I prove uniform consistency with finite sample rates. For the discrete treatment case, I prove root-n consistency, Gaussian approximation, and semiparametric efficiency.
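The "closed form solutions in terms of kernel matrix operations" are, at their core, the standard kernel ridge regression solve; a generic sketch follows (the RBF kernel and regularization level are assumptions, and the counterfactual-specific weighting from the paper is not shown).

```python
import numpy as np

def krr_fit_predict(K_train, y, K_test_train, lam=1e-2):
    """Kernel ridge regression in closed form: alpha = (K + n*lam*I)^{-1} y,
    prediction = K(test, train) @ alpha. lam and the kernel are assumptions."""
    n = K_train.shape[0]
    alpha = np.linalg.solve(K_train + n * lam * np.eye(n), y)
    return K_test_train @ alpha

def rbf(X, Z, gamma=1.0):
    """RBF kernel matrix between row sets X and Z."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, (100, 1))
y = np.sin(2 * X[:, 0]) + 0.1 * rng.standard_normal(100)
Xt = np.linspace(-2, 2, 5)[:, None]
print(krr_fit_predict(rbf(X, X), y, rbf(Xt, X)))   # ~ sin(2x) on the grid
```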

【22】 Towards a Unified Information-Theoretic Framework for Generalization
Link: https://arxiv.org/abs/2111.05275

Authors: Mahdi Haghifam, Gintare Karolina Dziugaite, Shay Moran, Daniel M. Roy
Affiliations: University of Toronto; Vector Institute; Google Research; Mila; Technion
Comments: 22 pages, NeurIPS 2021. This submission subsumes [arXiv:2011.02970] ("On the Information Complexity of Proper Learners for VC Classes in the Realizable Case")
Abstract: In this work, we investigate the expressiveness of the "conditional mutual information" (CMI) framework of Steinke and Zakynthinou (2020) and the prospect of using it to provide a unified framework for proving generalization bounds in the realizable setting. We first demonstrate that one can use this framework to express non-trivial (but sub-optimal) bounds for any learning algorithm that outputs hypotheses from a class of bounded VC dimension. We prove that the CMI framework yields the optimal bound on the expected risk of Support Vector Machines (SVMs) for learning halfspaces. This result is an application of our general result showing that stable compression schemes of size $k$ (Bousquet et al., 2020) have uniformly bounded CMI of order $O(k)$. We further show that an inherent limitation of proper learning of VC classes contradicts the existence of a proper learner with constant CMI, and it implies a negative resolution to an open problem of Steinke and Zakynthinou (2020). We further study the CMI of empirical risk minimizers (ERMs) of class $H$ and show that it is possible to output all consistent classifiers (version space) with bounded CMI if and only if $H$ has a bounded star number (Hanneke and Yang, 2015). Moreover, we prove a general reduction showing that "leave-one-out" analysis is expressible via the CMI framework. As a corollary, we investigate the CMI of the one-inclusion-graph algorithm proposed by Haussler et al. (1994). More generally, we show that the CMI framework is universal in the sense that for every consistent algorithm and data distribution, the expected risk vanishes as the number of samples diverges if and only if its evaluated CMI has sublinear growth with the number of samples.

【23】 A set of R packages to estimate population counts from mobile phone data
Link: https://arxiv.org/abs/2111.05269

Authors: Bogdan Oancea, David Salgado, Luis Sanguiao Sande, Sandra Barragan
Abstract: In this paper, we describe the software implementation of the methodological framework designed to incorporate mobile phone data into the current production chain of official statistics during the ESSnet Big Data II project. We present an overview of the architecture of the software stack, its components, the interfaces between them, and show how they can be used. Our software implementation consists of four R packages: destim, for estimation of the spatial distribution of the mobile devices; deduplication, for classification of the devices as being in 1:1 or 2:1 correspondence with their owner; aggregation, for estimation of the number of individuals detected by the network starting from the geolocation probabilities and the duplicity probabilities; and inference, which combines the number of individuals provided by the previous package with other information, like the population counts from an official register and the mobile operator penetration rates, to provide an estimation of the target population counts.

【24】 Community detection using low-dimensional network embedding algorithms
Link: https://arxiv.org/abs/2111.05267

Authors: Aman Barot, Shankar Bhamidi, Souvik Dhara
Abstract: With the increasing relevance of large networks in important areas such as the study of contact networks for spread of disease, or social networks for their impact on geopolitics, it has become necessary to study machine learning tools that are scalable to very large networks, often containing millions of nodes. One major class of such scalable algorithms is known as network representation learning or network embedding. These algorithms try to learn representations of network functionals (e.g., nodes) by first running multiple random walks and then using the number of co-occurrences of each pair of nodes in observed random walk segments to obtain a low-dimensional representation of nodes on some Euclidean space. The aim of this paper is to rigorously understand the performance of two major algorithms, DeepWalk and node2vec, in recovering communities for canonical network models with ground truth communities. Depending on the sparsity of the graph, we find the length of the random walk segments required such that the corresponding observed co-occurrence window is able to perform almost exact recovery of the underlying community assignments. We prove that, given some fixed co-occurrence window, node2vec using random walks with a low non-backtracking probability can succeed for much sparser networks compared to DeepWalk using simple random walks. Moreover, if the sparsity parameter is low, we provide evidence that these algorithms might not succeed in almost exact recovery. The analysis requires developing general tools for path counting on random networks having an underlying low-rank structure, which are of independent interest.
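A toy version of the co-occurrence statistic these algorithms factorize, with an optional low-backtracking tweak standing in for node2vec's p, q parameters; the walk length, window, and spectral read-out are illustrative choices, not the paper's analysis.

```python
import numpy as np

def cooccurrence_counts(adj, walk_len, window, n_walks=10,
                        backtrack_prob=1.0, rng=np.random.default_rng(0)):
    """Count node co-occurrences within a window along random walks, the
    quantity DeepWalk/node2vec embeddings are built from. Setting
    backtrack_prob < 1 mimics node2vec-style low-backtracking walks."""
    n = len(adj)
    C = np.zeros((n, n))
    for start in range(n):
        for _ in range(n_walks):
            walk = [start]
            for _ in range(walk_len - 1):
                nbrs = list(adj[walk[-1]])
                if len(walk) >= 2 and len(nbrs) > 1 and rng.random() > backtrack_prob:
                    nbrs = [u for u in nbrs if u != walk[-2]]  # avoid backtracking
                walk.append(rng.choice(nbrs))
            for i, u in enumerate(walk):
                for v in walk[i + 1:i + 1 + window]:
                    C[u, v] += 1
                    C[v, u] += 1
    return C

# Two dense blocks joined by one bridge (a toy two-community graph)
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
C = cooccurrence_counts(adj, walk_len=8, window=3)
vals, vecs = np.linalg.eigh(C)
print(np.sign(vecs[:, -2]))   # the second eigenvector often separates the blocks
```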

【25】 High-order joint embedding for multi-level link prediction
Link: https://arxiv.org/abs/2111.05265

Authors: Yubai Yuan, Annie Qu
Affiliation: Department of Statistics, University of California
Comments: 35 pages
Abstract: Link prediction infers potential links from observed networks, and is one of the essential problems in network analyses. In contrast to traditional graph representation modeling which only predicts two-way pairwise relations, we propose a novel tensor-based joint network embedding approach which simultaneously encodes pairwise links and hyperlinks onto a latent space, capturing the dependency between pairwise and multi-way links in inferring potential unobserved hyperlinks. The major advantage of the proposed embedding procedure is that it incorporates both the pairwise relationships and subgroup-wise structure among nodes to capture richer network information. In addition, the proposed method introduces a hierarchical dependency among links to infer potential hyperlinks, and leads to better link prediction. In theory, we establish the estimation consistency for the proposed embedding approach, and provide a faster convergence rate compared to link prediction utilizing pairwise links or hyperlinks only. Numerical studies on both simulation settings and Facebook ego-networks indicate that the proposed method improves both hyperlink and pairwise link prediction accuracy compared to existing link prediction algorithms.

【26】 Logarithmic Regret from Sublinear Hints
Link: https://arxiv.org/abs/2111.05257

Authors: Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, Manish Purohit
Affiliations: Department of Computer Science, University of Utah, Salt Lake City, UT; Dept. of Electrical and Computer Engineering, Boston University, Boston, MA; Google Research, Mountain View, CA
Abstract: We consider the online linear optimization problem, where at every step the algorithm plays a point $x_t$ in the unit ball, and suffers loss $\langle c_t, x_t \rangle$ for some cost vector $c_t$ that is then revealed to the algorithm. Recent work showed that if an algorithm receives a hint $h_t$ that has non-trivial correlation with $c_t$ before it plays $x_t$, then it can achieve a regret guarantee of $O(\log T)$, improving on the bound of $\Theta(\sqrt{T})$ in the standard setting. In this work, we study the question of whether an algorithm really requires a hint at every time step. Somewhat surprisingly, we show that an algorithm can obtain $O(\log T)$ regret with just $O(\sqrt{T})$ hints under a natural query model; in contrast, we also show that $o(\sqrt{T})$ hints cannot guarantee better than $\Omega(\sqrt{T})$ regret. We give two applications of our result, to the well-studied setting of optimistic regret bounds and to the problem of online learning with abstention.

【27】 Bounding Treatment Effects by Pooling Limited Information across Observations
Link: https://arxiv.org/abs/2111.05243

Authors: Sokbae Lee, Martin Weidner
Affiliation: Department of Economics, Columbia University
Abstract: We provide novel bounds on average treatment effects (on the treated) that are valid under an unconfoundedness assumption. Our bounds are designed to be robust in challenging situations, for example, when the conditioning variables take on a large number of different values in the observed sample, or when the overlap condition is violated. This robustness is achieved by only using limited "pooling" of information across observations. Namely, the bounds are constructed as sample averages over functions of the observed outcomes such that the contribution of each outcome only depends on the treatment status of a limited number of observations. No information pooling across observations leads to so-called "Manski bounds", while unlimited information pooling leads to standard inverse propensity score weighting. We explore the intermediate range between these two extremes and provide corresponding inference methods. We show in Monte Carlo experiments and through an empirical application that our bounds are indeed robust and informative in practice.

【28】 Almost Optimal Universal Lower Bound for Learning Causal DAGs with Atomic Interventions
Link: https://arxiv.org/abs/2111.05070

Authors: Vibhor Porwal, Piyush Srivastava, Gaurav Sinha
Affiliation: Tata Institute of Fundamental Research
Abstract: A well-studied challenge that arises in the structure learning problem of causal directed acyclic graphs (DAG) is that using observational data, one can only learn the graph up to a "Markov equivalence class" (MEC). The remaining undirected edges have to be oriented using interventions, which can be very expensive to perform in applications. Thus, the problem of minimizing the number of interventions needed to fully orient the MEC has received a lot of recent attention, and is also the focus of this work. We prove two main results. The first is a new universal lower bound on the number of atomic interventions that any algorithm (whether active or passive) would need to perform in order to orient a given MEC. Our second result shows that this bound is, in fact, within a factor of two of the size of the smallest set of atomic interventions that can orient the MEC. Our lower bound is provably better than previously known lower bounds. The proof of our lower bound is based on the new notion of CBSP orderings, which are topological orderings of DAGs without v-structures and satisfy certain special properties. Further, using simulations on synthetic graphs and by giving examples of special graph families, we show that our bound is often significantly better.

【29】 Misspecified Gaussian Process Bandit Optimization
Link: https://arxiv.org/abs/2111.05008

Authors: Ilija Bogunovic, Andreas Krause
Affiliation: ETH Zürich
Comments: Accepted to NeurIPS 2021
Abstract: We consider the problem of optimizing a black-box function based on noisy bandit feedback. Kernelized bandit algorithms have shown strong empirical and theoretical performance for this problem. They heavily rely on the assumption that the model is well-specified, however, and can fail without it. Instead, we introduce a misspecified kernelized bandit setting where the unknown function can be $\epsilon$-uniformly approximated by a function with a bounded norm in some Reproducing Kernel Hilbert Space (RKHS). We design efficient and practical algorithms whose performance degrades minimally in the presence of model misspecification. Specifically, we present two algorithms based on Gaussian process (GP) methods: an optimistic EC-GP-UCB algorithm that requires knowing the misspecification error, and Phased GP Uncertainty Sampling, an elimination-type algorithm that can adapt to unknown model misspecification. We provide upper bounds on their cumulative regret in terms of $\epsilon$, the time horizon, and the underlying kernel, and we show that our algorithm achieves optimal dependence on $\epsilon$ with no prior knowledge of misspecification. In addition, in a stochastic contextual setting, we show that EC-GP-UCB can be effectively combined with the regret bound balancing strategy and attain similar regret bounds despite not knowing $\epsilon$.
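For reference, a bare GP-UCB loop on a finite grid (the well-specified baseline); EC-GP-UCB would additionally widen the confidence bound by an epsilon-dependent misspecification term, which is omitted here. The kernel, beta, and noise level are illustrative.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def gp_ucb(f, grid, T=30, beta=2.0, noise=0.1, rng=np.random.default_rng(0)):
    """Vanilla GP-UCB: at each round, fit a GP to the noisy observations
    and query the grid point maximizing mu + sqrt(beta) * sigma."""
    x0 = grid[rng.integers(len(grid))]
    X, y = [[x0]], [f(x0) + noise * rng.standard_normal()]
    for _ in range(T - 1):
        gp = GaussianProcessRegressor(kernel=RBF(0.5), alpha=noise ** 2).fit(X, y)
        mu, sd = gp.predict(grid[:, None], return_std=True)
        x = grid[np.argmax(mu + np.sqrt(beta) * sd)]
        X.append([x])
        y.append(f(x) + noise * rng.standard_normal())
    return X[int(np.argmax(y))][0]

grid = np.linspace(0, 1, 200)
print(gp_ucb(lambda x: -(x - 0.3) ** 2, grid))   # ~ 0.3
```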

【30】 On Representation Knowledge Distillation for Graph Neural Networks
Link: https://arxiv.org/abs/2111.04964

Authors: Chaitanya K. Joshi, Fayao Liu, Xu Xun, Jie Lin, Chuan-Sheng Foo
Affiliation: Institute for Infocomm Research
Abstract: Knowledge distillation is a promising learning paradigm for boosting the performance and reliability of resource-efficient graph neural networks (GNNs) using more expressive yet cumbersome teacher models. Past work on distillation for GNNs proposed the Local Structure Preserving loss (LSP), which matches local structural relationships across the student and teacher's node embedding spaces. In this paper, we make two key contributions: From a methodological perspective, we study whether preserving the global topology of how the teacher embeds graph data can be a more effective distillation objective for GNNs, as real-world graphs often contain latent interactions and noisy edges. The purely local LSP objective over pre-defined edges is unable to achieve this as it ignores relationships among disconnected nodes. We propose two new approaches which better preserve global topology: (1) Global Structure Preserving loss (GSP), which extends LSP to incorporate all pairwise interactions; and (2) Graph Contrastive Representation Distillation (G-CRD), which uses contrastive learning to align the student node embeddings to those of the teacher in a shared representation space. From an experimental perspective, we introduce an expanded set of benchmarks on large-scale real-world datasets where the performance gap between teacher and student GNNs is non-negligible. We believe this is critical for testing the efficacy and robustness of knowledge distillation, but was missing from the LSP study which used synthetic datasets with trivial performance gaps. Experiments across 4 datasets and 14 heterogeneous GNN architectures show that G-CRD consistently boosts the performance and robustness of lightweight GNN models, outperforming the structure preserving approaches, LSP and GSP, as well as baselines adapted from 2D computer vision.
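A sketch of the GSP idea, matching all pairwise similarities between student and teacher node embeddings rather than only those along pre-defined edges as in LSP. The cosine kernel and squared-error matching are our simplifications of the paper's loss, and the contrastive G-CRD variant is not shown.

```python
import torch
import torch.nn.functional as F

def gsp_loss(student, teacher):
    """Match the full pairwise cosine-similarity matrices of the student
    and teacher embeddings (n x n each), so that global topology, not
    just local edge structure, is transferred."""
    s = F.normalize(student, dim=1)
    t = F.normalize(teacher, dim=1)
    return F.mse_loss(s @ s.T, t @ t.T)

student = torch.randn(50, 16, requires_grad=True)   # lightweight GNN output
teacher = torch.randn(50, 64)                       # frozen teacher output
loss = gsp_loss(student, teacher)
loss.backward()                                     # gradients flow to the student
print(loss.item())
```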

【31】 American Hate Crime Trends Prediction with Event Extraction
Link: https://arxiv.org/abs/2111.04951

Authors: Songqiao Han, Hailiang Huang, Jiangwei Liu, Shengsheng Xiao
Affiliation: School of Information Management and Engineering, Shanghai University of Finance and Economics, Shanghai, China
Comments: 12 pages, 5 figures, 4 tables
Abstract: Social media platforms may provide potential space for discourses that contain hate speech, and even worse, can act as a propagation mechanism for hate crimes. The FBI's Uniform Crime Reporting (UCR) Program collects hate crime data and releases a statistical report yearly. These statistics provide information for determining national hate crime trends. The statistics can also provide valuable holistic and strategic insight for law enforcement agencies, or justify specific legislation for lawmakers. However, the reports are mostly released the next year, lagging behind many immediate needs. Recent research mainly focuses on hate speech detection in social media text or on empirical studies of the impact of a confirmed crime. This paper proposes a framework that first utilizes text mining techniques to extract hate crime events from New York Times news, then uses the results to facilitate predicting American national-level and state-level hate crime trends. Experimental results show that our method can significantly enhance the prediction performance compared with time series or regression methods without event-related factors. Our framework broadens the methods of national-level and state-level hate crime trends prediction.

【32】 Optimizing Bayesian acquisition functions in Gaussian Processes 标题:高斯过程中贝叶斯采集函数的优化 链接:https://arxiv.org/abs/2111.04930

作者:Ashish Anil Pawar,Ujwal Warbhe 机构:College of Engineering, Pune, National Institute of Technology, Srinagar 备注:9 Pages, 12 Figures, 1 Table, 10 Equations 摘要:贝叶斯优化是一种搜索目标函数全局极大值的有效方法,尤其适用于函数未知的情形。其流程包括使用代理函数、选择采集函数,然后优化采集函数以找到下一个采样点。本文分析了不同的采集函数(如最大改进概率和期望改进)以及多种优化器(如L-BFGS和TNC)在优化采集函数以寻找下一个采样点时的表现。除了分析所耗时间外,本文还说明了初始样本位置选择的重要性。 摘要:Bayesian Optimization is an effective method for searching the global maxima of an objective function especially if the function is unknown. The process comprises using a surrogate function and choosing an acquisition function followed by optimizing the acquisition function to find the next sampling point. This paper analyzes different acquisition functions like Maximum Probability of Improvement and Expected Improvement and various optimizers like L-BFGS and TNC to optimize the acquisition functions for finding the next sampling point. Along with the analysis of time taken, the paper also shows the importance of the position of initial samples chosen.
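作为说明,下面给出期望改进(EI)采集函数并用 L-BFGS-B 优化以选取下一个采样点的极简示意,代理模型用 scikit-learn 的高斯过程回归;目标函数、初始样本与搜索区间均为演示用假设,并非论文中的实验设置。

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize
from sklearn.gaussian_process import GaussianProcessRegressor

f = lambda x: -np.sin(3 * x) - x**2 + 0.7 * x          # 假想的“未知”目标函数
X = np.array([[-0.9], [1.1], [-0.3]])                  # 初始样本
y = f(X).ravel()
gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)  # 代理函数

def expected_improvement(x, best, xi=0.01):
    """EI(x) = E[max(f(x) - best - xi, 0)],由 GP 后验均值与标准差算出。"""
    mu, sd = gp.predict(np.atleast_2d(x), return_std=True)
    mu, sd = float(mu[0]), float(sd[0]) + 1e-9
    z = (mu - best - xi) / sd
    return (mu - best - xi) * norm.cdf(z) + sd * norm.pdf(z)

# 最大化 EI(即最小化 -EI)得到下一个采样点
res = minimize(lambda x: -expected_improvement(x, y.max()),
               x0=np.array([0.0]), bounds=[(-2, 2)], method="L-BFGS-B")
print("next sampling point:", res.x)
```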

【33】 Optimal Decision Rules Under Partial Identification 标题:部分识别下的最优决策规则 链接:https://arxiv.org/abs/2111.04926

作者:Kohei Yata 摘要:我考虑一类统计决策问题:决策者必须基于有限样本,在两个备选政策之间做出选择,以最大化社会福利(例如结果的总体均值)。中心假设是:潜在的、可能是无限维的参数位于一个已知的凸集中,这可能导致福利效应仅被部分识别。此类限制的一个例子是反事实结果函数的平滑性。作为主要理论结果,我得到了一个在极小极大遗憾准则下最优的有限样本决策规则(即将数据映射到决策的函数)。该规则易于计算,却在所有决策规则中达到最优;我没有对决策规则类别施加任何特设限制。我将结果应用于是否在回归不连续设计中更改政策资格截断值的问题,并在布基纳法索BRIGHT学校建设项目(Kazianga、Levy、Linden和Sloan,2013年)的实证应用中阐述该方法,该项目根据由村庄特征计算出的分数来选择接收学校的村庄。在对反事实结果函数平滑性的合理限制下,最优决策规则意味着扩展该项目不具有成本效益。我还实证比较了最优决策规则与备选决策规则的性能。 摘要:I consider a class of statistical decision problems in which the policy maker must decide between two alternative policies to maximize social welfare (e.g., the population mean of an outcome) based on a finite sample. The central assumption is that the underlying, possibly infinite-dimensional parameter, lies in a known convex set, potentially leading to partial identification of the welfare effect. An example of such restrictions is the smoothness of counterfactual outcome functions. As the main theoretical result, I obtain a finite-sample decision rule (i.e., a function that maps data to a decision) that is optimal under the minimax regret criterion. This rule is easy to compute, yet achieves optimality among all decision rules; no ad hoc restrictions are imposed on the class of decision rules. I apply my results to the problem of whether to change a policy eligibility cutoff in a regression discontinuity setup. I illustrate my approach in an empirical application to the BRIGHT school construction program in Burkina Faso (Kazianga, Levy, Linden and Sloan, 2013), where villages were selected to receive schools based on scores computed from their characteristics. Under reasonable restrictions on the smoothness of the counterfactual outcome function, the optimal decision rule implies that it is not cost-effective to expand the program. I empirically compare the performance of the optimal decision rule with alternative decision rules.
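为说明极小极大遗憾准则本身,下面给出一个不依赖数据的数值小例子(与论文的有限样本最优规则不同,仅演示准则的计算方式):假设福利效应 w 仅被部分识别为 w ∈ [-1, 2],比较“维持现状”(福利为 0)与“更改政策”(福利为 w)两个确定性规则的最大遗憾。区间端点为虚构假设。

```python
import numpy as np

# 假设福利效应 w 仅被部分识别:w ∈ [-1, 2](一维凸集),离散化后逐点计算
w_grid = np.linspace(-1.0, 2.0, 301)

welfare = {
    "status quo": np.zeros_like(w_grid),  # 维持现状:福利恒为 0
    "change policy": w_grid,              # 更改政策:福利为 w
}

# 某决策在状态 w 下的遗憾 = 该状态下可达的最优福利 - 该决策的福利
best = np.maximum(welfare["status quo"], welfare["change policy"])
for name, v in welfare.items():
    print(name, "max regret =", float(np.max(best - v)))
# 输出:维持现状的最大遗憾为 2,更改政策为 1;
# 在这两个确定性规则中,极小极大遗憾准则会选择“更改政策”。
```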

【34】 Practical, Provably-Correct Interactive Learning in the Realizable Setting: The Power of True Believers 标题:可实现设定下实用且可证明正确的交互式学习:真正信徒的力量 链接:https://arxiv.org/abs/2111.04915

作者:Julian Katz-Samuels,Blake Mason,Kevin Jamieson,Rob Nowak 机构:University of Wisconsin, Madison, Rice University, University of Washington, Robert Nowak 摘要:我们考虑可实现设定下的交互式学习,并开发了一个通用框架来处理从最佳臂识别到主动分类的各类问题。我们的研究始于以下观察:不可知算法在可实现设定下不可能是极小极大最优的。因此,我们为可实现设定设计了新的、计算高效的算法,它们在对数因子以内匹配极小极大下界,并且是通用的,可容纳多种函数类,包括核方法、Hölder平滑函数和凸函数。我们算法的样本复杂度可以用扩展教学维和haystack维等众所周知的量来刻画。然而,与直接基于这些组合量的算法不同,我们的算法在计算上是高效的。为了实现计算效率,我们的算法使用蒙特卡罗“hit-and-run”算法从版本空间中采样,而不是显式地维护版本空间。我们的方法有两个关键优势。首先,它很简单,由两个统一的贪婪算法组成。其次,我们的算法能够无缝利用在实践中经常可用且有用的先验知识。除了新的理论结果外,我们还从经验上证明了我们的算法与高斯过程UCB方法具有竞争力。 摘要:We consider interactive learning in the realizable setting and develop a general framework to handle problems ranging from best arm identification to active classification. We begin our investigation with the observation that agnostic algorithms cannot be minimax-optimal in the realizable setting. Hence, we design novel computationally efficient algorithms for the realizable setting that match the minimax lower bound up to logarithmic factors and are general-purpose, accommodating a wide variety of function classes including kernel methods, Hölder smooth functions, and convex functions. The sample complexities of our algorithms can be quantified in terms of well-known quantities like the extended teaching dimension and haystack dimension. However, unlike algorithms based directly on those combinatorial quantities, our algorithms are computationally efficient. To achieve computational efficiency, our algorithms sample from the version space using Monte Carlo "hit-and-run" algorithms instead of maintaining the version space explicitly. Our approach has two key strengths. First, it is simple, consisting of two unifying, greedy algorithms. Second, our algorithms have the capability to seamlessly leverage prior knowledge that is often available and useful in practice. In addition to our new theoretical results, we demonstrate empirically that our algorithms are competitive with Gaussian process UCB methods.
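hit-and-run 是在凸体内混合的经典 MCMC 随机游走。下面给出一个在线性约束版本空间(与若干带标签样本一致的线性分类器集合)内采样的极简示意;约束构造、步数与样本均为演示用假设,并非论文算法本身。

```python
import numpy as np

def hit_and_run(A, b, x0, n_steps=500, seed=0):
    """在凸多面体 {x : A x <= b} 内做 hit-and-run 随机游走(示意)。"""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(n_steps):
        d = rng.normal(size=x.shape)
        d /= np.linalg.norm(d)                    # 随机方向
        ad, slack = A @ d, b - A @ x
        # 沿方向 d 的可行弦 [lo, hi]:每条约束 a·(x + t d) <= b_i 给出一个界
        lo = max([s / c for s, c in zip(slack, ad) if c < 0], default=-1e9)
        hi = min([s / c for s, c in zip(slack, ad) if c > 0], default=1e9)
        x = x + rng.uniform(lo, hi) * d           # 在弦上均匀取一点
    return x

# 版本空间示例:与 3 个带标签样本一致的线性分类器 {w : y_i (x_i·w) >= 0},截断到单位盒
Xs = np.array([[1.0, 0.2], [-0.5, 1.0], [0.3, -0.8]])
ys = np.array([1, 1, -1])
A = np.vstack([-(ys[:, None] * Xs), np.eye(2), -np.eye(2)])
b = np.concatenate([np.zeros(3), np.ones(2), np.ones(2)])
print(hit_and_run(A, b, x0=np.array([0.1, 0.1])))
```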

【35】 An Instance-Dependent Analysis for the Cooperative Multi-Player Multi-Armed Bandit 标题:协作多人多臂老虎机的实例相关分析 链接:https://arxiv.org/abs/2111.04873

作者:Aldo Pacchiano,Peter Bartlett,Michael I. Jordan 机构:Microsoft Research, NYC, University of California, Berkeley 备注:44 pages 摘要:我们研究多玩家多臂老虎机中的信息共享与合作问题,并提出了首个在该问题上实现对数遗憾的算法。我们的结果基于两项创新。首先,我们证明了对连续淘汰策略的一个简单修改,可以让玩家在无碰撞的情况下把各自的次优差距估计到常数因子以内。其次,我们利用第一个结果设计了一个通信协议,该协议成功利用碰撞的小额奖励在玩家之间进行协调,同时保留有意义的、依赖于实例的对数遗憾保证。 摘要:We study the problem of information sharing and cooperation in Multi-Player Multi-Armed bandits. We propose the first algorithm that achieves logarithmic regret for this problem. Our results are based on two innovations. First, we show that a simple modification to a successive elimination strategy can be used to allow the players to estimate their suboptimality gaps, up to constant factors, in the absence of collisions. Second, we leverage the first result to design a communication protocol that successfully uses the small reward of collisions to coordinate among players, while preserving meaningful instance-dependent logarithmic regret guarantees.
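作为背景,下面给出单玩家连续淘汰(successive elimination)的极简示意:轮流拉动存活的臂,用置信区间淘汰明显次优的臂,各臂均值估计之差即为次优差距的估计。奖励分布、时间范围与置信半径均为演示用假设;论文中基于碰撞的多玩家通信协议不在此示意范围内。

```python
import numpy as np

def successive_elimination(means, horizon=20000, seed=1):
    """单玩家连续淘汰的示意:轮流拉动存活臂,
    当某臂的置信上界低于最佳臂的置信下界时将其淘汰。"""
    rng = np.random.default_rng(seed)
    K = len(means)
    counts, sums = np.zeros(K), np.zeros(K)
    alive = list(range(K))
    for t in range(horizon):
        arm = alive[t % len(alive)]
        sums[arm] += rng.normal(means[arm], 1.0)   # 观测一次带噪声的奖励
        counts[arm] += 1
        if counts[alive].min() < 1:
            continue                               # 每个存活臂至少采样一次后再淘汰
        mu = sums[alive] / counts[alive]
        rad = np.sqrt(2 * np.log(horizon) / counts[alive])
        keep = mu + rad >= (mu - rad).max()        # UCB 低于最佳 LCB 的臂被淘汰
        alive = [a for a, k in zip(alive, keep) if k]
        if len(alive) == 1:
            break
    est = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    return alive, est                              # 存活臂与各臂均值估计

print(successive_elimination(np.array([0.9, 0.5, 0.2])))
```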

【36】 Efficient estimates of optimal transport via low-dimensional embeddings 标题:基于低维嵌入的最优传输的有效估计 链接:https://arxiv.org/abs/2111.04838

作者:Patric M. Fulop,Vincent Danos 机构:School of Informatics, University of Edinburgh, EH8 9AB 备注:Neurips 2021 Optimal Transport and Machine Learning Workshop 摘要:最优传输距离(OT)作为比较概率分布的一种方法,在机器学习领域得到了广泛应用。当数据位于高维空间时,计算这些距离的成本很高。Paty等人2019年的近期工作旨在通过在数据(视为离散测度)的低秩投影上计算OT来降低这一成本。我们扩展了这一方法,并证明:只要映射族是1-Lipschitz的,就可以使用更一般的映射族来近似OT距离;最佳估计通过在给定映射族上最大化OT获得。由于OT计算是在把数据映射到低维空间之后进行的,我们的方法能随原始数据维度良好扩展。我们用神经网络演示了这一想法。 摘要:Optimal transport distances (OT) have been widely used in recent work in Machine Learning as ways to compare probability distributions. These are costly to compute when the data lives in high dimension. Recent work by Paty et al., 2019, aims specifically at reducing this cost by computing OT using low-rank projections of the data (seen as discrete measures). We extend this approach and show that one can approximate OT distances by using more general families of maps provided they are 1-Lipschitz. The best estimate is obtained by maximising OT over the given family. As OT calculations are done after mapping data to a lower dimensional space, our method scales well with the original data dimension. We demonstrate the idea with neural networks.
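核心观察是:1-Lipschitz 映射不会增大点间距离,因此在低维像上计算的 OT 是原始 OT 的下界,对映射族取最大可收紧该下界。下面用随机正交投影(一类线性 1-Lipschitz 映射)给出极简示意,等权点云间的精确 OT 化为指派问题求解;论文中使用的是神经网络参数化的映射族,此处的维度与投影数均为演示假设。

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def ot_cost(X, Y):
    """等权点云之间的精确 2-Wasserstein^2,化为指派问题求解。"""
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    rows, cols = linear_sum_assignment(C)
    return C[rows, cols].mean()

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))            # 50 维的两组点云
Y = rng.normal(size=(100, 50)) + 0.5

best = 0.0
for _ in range(20):                       # 对若干随机正交投影取最大
    Q, _ = np.linalg.qr(rng.normal(size=(50, 2)))   # 列正交:1-Lipschitz 线性映射
    best = max(best, ot_cost(X @ Q, Y @ Q))
print("low-dim OT lower bound:", round(float(best), 4))
```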

【37】 Explaining Hyperparameter Optimization via Partial Dependence Plots 标题:用部分依赖图解释超参数优化 链接:https://arxiv.org/abs/2111.04820

作者:Julia Moosbauer,Julia Herbinger,Giuseppe Casalicchio,Marius Lindauer,Bernd Bischl 机构:Department of Statistics, Ludwig-Maximilians-University Munich, Munich, Germany, Institute of Information Processing, Leibniz University Hannover, Hannover, Germany 摘要:自动超参数优化(HPO)可以帮助从业者在机器学习模型中获得最佳性能。然而,对于不同超参数对最终模型性能的影响,通常缺乏有价值的洞察。这种可解释性的缺失使得人们难以信任和理解自动化HPO过程及其结果。我们建议使用可解释机器学习(IML),从使用贝叶斯优化(BO)进行HPO期间获得的实验数据中提取洞察。BO倾向于关注具有潜在高性能配置的有前途区域,因此会引入采样偏差。于是,许多IML技术(例如部分依赖图,PDP)就有产生有偏解释的风险。通过利用BO代理模型的后验不确定性,我们引入了一种带有估计置信带的PDP变体。我们还建议对超参数空间进行划分,以在相关子区域内获得更可信、更可靠的PDP。在一项实验研究中,我们为子区域内PDP质量的提升提供了定量证据。 摘要:Automated hyperparameter optimization (HPO) can support practitioners to obtain peak performance in machine learning models. However, there is often a lack of valuable insights into the effects of different hyperparameters on the final model performance. This lack of explainability makes it difficult to trust and understand the automated HPO process and its results. We suggest using interpretable machine learning (IML) to gain insights from the experimental data obtained during HPO with Bayesian optimization (BO). BO tends to focus on promising regions with potential high-performance configurations and thus induces a sampling bias. Hence, many IML techniques, such as the partial dependence plot (PDP), carry the risk of generating biased interpretations. By leveraging the posterior uncertainty of the BO surrogate model, we introduce a variant of the PDP with estimated confidence bands. We propose to partition the hyperparameter space to obtain more confident and reliable PDPs in relevant sub-regions. In an experimental study, we provide quantitative evidence for the increased quality of the PDPs within sub-regions.
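下面给出“带置信带的 PDP”这一思想的极简示意:在假想的 HPO 实验数据上拟合 GP 代理模型,对某个超参数计算部分依赖曲线,并用代理模型后验标准差的平均值充当不确定性带。数据生成过程与聚合方式均为演示假设,论文中的具体估计量及子区域划分方法以原文为准。

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)
H = rng.uniform(0, 1, size=(40, 2))       # 40 次假想 HPO 评估:两个超参数
err = (H[:, 0] - 0.3) ** 2 + 0.1 * np.sin(6 * H[:, 1]) + rng.normal(0, 0.01, 40)
gp = GaussianProcessRegressor(normalize_y=True).fit(H, err)  # BO 代理模型

grid = np.linspace(0, 1, 25)              # 对第 1 个超参数求 PDP
pdp_mean, pdp_band = [], []
for g in grid:
    Hg = H.copy()
    Hg[:, 0] = g                          # 固定该超参数,其余按经验分布平均
    mu, sd = gp.predict(Hg, return_std=True)
    pdp_mean.append(mu.mean())            # PDP 曲线上的一个点
    pdp_band.append(sd.mean())            # 用后验标准差的平均近似置信带宽度
print(np.round(pdp_mean[:5], 3), np.round(pdp_band[:5], 3))
```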

【38】 The World in a Grain of Sand: Condensing the String Vacuum Degeneracy 标题:一粒沙中的世界:压缩弦真空简并 链接:https://arxiv.org/abs/2111.04761

作者:Yang-Hui He,Shailesh Lal,M. Zaid Zaz 机构:London Institute, Royal Institution of GB, Albemarle St., London W,S ,BS, Merton College, University of Oxford, OX,JD, UK, Department of Mathematics, City, University of London, London EC,V,HB, UK, School of Physics, NanKai University, Tianjin, P.R. China∗ 备注:12 pages, 15 figures, 3 tables 摘要:我们提出了一种解决弦景观真空简并问题的新方法:为紧致化方案找到一种高效的相似性度量。以一类约一百万个Calabi-Yau流形为具体例子,少样本(few-shot)机器学习与孪生神经网络(Siamese Neural Networks)的范式将它们表示为R^3中的点,两个流形之间的相似性分数即它们的R^3代表元之间的欧氏距离。使用这些方法,仅需在几百个数据点上训练,我们就能把搜索极为罕见流形的空间压缩到原始数据的1%以内。我们还演示了如何用这些方法刻画真空代表元的“典型性”。 摘要:We propose a novel approach toward the vacuum degeneracy problem of the string landscape, by finding an efficient measure of similarity amongst compactification scenarios. Using a class of some one million Calabi-Yau manifolds as concrete examples, the paradigm of few-shot machine-learning and Siamese Neural Networks represents them as points in R^3 where the similarity score between two manifolds is the Euclidean distance between their R^3 representatives. Using these methods, we can compress the search space for exceedingly rare manifolds to within one percent of the original data by training on only a few hundred data points. We also demonstrate how these methods may be applied to characterize `typicality' for vacuum representatives.
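下面是孪生网络思路的极简示意(非论文实现):同一个编码器把输入特征嵌入到 R^3,用对比损失拉近“同类”样本对、推开“异类”样本对,相似性分数即嵌入间的欧氏距离。输入维度、网络结构与随机数据均为演示假设。

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 3))  # 嵌入到 R^3

def contrastive_loss(a, b, same, margin=1.0):
    """经典孪生网络对比损失:同类样本对拉近,异类样本对推开至 margin 以外。"""
    d = torch.norm(net(a) - net(b), dim=1)        # 相似性分数 = 欧氏距离
    return (same * d ** 2 +
            (1 - same) * torch.clamp(margin - d, min=0) ** 2).mean()

x1, x2 = torch.randn(64, 16), torch.randn(64, 16)   # 假想的 16 维特征向量对
same = torch.randint(0, 2, (64,)).float()           # 1 = 同类, 0 = 异类
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(100):                                 # 简短训练循环
    opt.zero_grad()
    loss = contrastive_loss(x1, x2, same)
    loss.backward()
    opt.step()
print("final loss:", loss.item())
```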

【39】 Realizable Learning is All You Need 标题:可实现的学习是您所需要的全部 链接:https://arxiv.org/abs/2111.04746

作者:Max Hopkins,Daniel Kane,Shachar Lovett,Gaurav Mahajan 摘要:可实现与不可知可学习性的等价是学习理论中的一个基本现象。从PAC学习和回归等经典设定,到对抗鲁棒学习和隐私学习等近期趋势,各种变体层出不穷,而我们仍然缺乏统一的理论,这令人惊讶;传统的等价性证明往往彼此迥异,并依赖一致收敛和样本压缩等特定于模型的强假设。在这项工作中,我们给出了第一个解释可实现与不可知可学习性等价的模型无关框架:一个三行的黑盒约简,它简化、统一并扩展了我们对各种设定的理解。这包括尚无可学习性刻画的模型,例如带任意分布假设或一般损失的学习,以及稳健学习、部分学习、公平学习和统计查询模型等许多其他流行设定。更一般地,我们论证可实现学习与不可知学习的等价实际上是我们称之为“属性泛化”的更广泛现象的特例:任何能在有限假设类上得到满足的学习算法期望属性(例如噪声容忍、隐私性、稳定性),都可以(可能以某种变体形式)推广到任何可学习的假设类。 摘要:The equivalence of realizable and agnostic learnability is a fundamental phenomenon in learning theory. With variants ranging from classical settings like PAC learning and regression to recent trends such as adversarially robust and private learning, it's surprising that we still lack a unified theory; traditional proofs of the equivalence tend to be disparate, and rely on strong model-specific assumptions like uniform convergence and sample compression. In this work, we give the first model-independent framework explaining the equivalence of realizable and agnostic learnability: a three-line blackbox reduction that simplifies, unifies, and extends our understanding across a wide variety of settings. This includes models with no known characterization of learnability such as learning with arbitrary distributional assumptions or general loss, as well as a host of other popular settings such as robust learning, partial learning, fair learning, and the statistical query model. More generally, we argue that the equivalence of realizable and agnostic learning is actually a special case of a broader phenomenon we call property generalization: any desirable property of a learning algorithm (e.g., noise tolerance, privacy, stability) that can be satisfied over finite hypothesis classes extends (possibly in some variation) to any learnable hypothesis class.
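这类“可实现 → 不可知”黑盒约简的一个常见形式是:对训练样本的每个子样本调用可实现学习器得到候选假设集,再在留出数据上做经验风险最小化。下面给出这一思想的玩具示意(指数时间,仅为说明;是否与论文的三行约简逐字一致以原文为准),其中的阈值学习器与数据均为虚构假设。

```python
from itertools import combinations

def agnostic_from_realizable(realizable_learner, train, holdout):
    """黑盒约简示意:对 train 的每个子集运行可实现学习器,
    再在 holdout 上用经验风险最小化挑选输出假设。"""
    candidates = []
    for r in range(len(train) + 1):
        for sub in combinations(range(len(train)), r):
            h = realizable_learner([train[i] for i in sub])
            if h is not None:
                candidates.append(h)
    err = lambda h: sum(h(x) != y for x, y in holdout) / len(holdout)
    return min(candidates, key=err)

def threshold_learner(pts):
    """可实现学习器:若存在与样本一致的阈值分类器 x > t,则返回之。"""
    for t in sorted({x for x, _ in pts} | {float("-inf")}):
        h = lambda x, t=t: int(x > t)
        if all(h(x) == y for x, y in pts):
            return h
    return None

data = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1), (0.2, 1)]  # 最后一个点是噪声
best = agnostic_from_realizable(threshold_learner, data, data)  # 简化:复用训练集作留出集
print(best(0.05), best(0.7))   # 在含噪声数据上仍返回经验误差最小的阈值分类器
```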
