
Statistics arXiv Daily Digest [12.21]

Author: arXiv每日学术速递 (WeChat official account)
Published 2021-12-24 08:51:01
This article is part of the column arXiv每日学术速递.

stat (Statistics): 70 papers in total

【1】 Hypothesis testing and confidence sets: why Bayesian not frequentist, and how to set a prior with a regulatory authority
Link: https://arxiv.org/abs/2112.10685

Authors: Roger Sewell
Comments: 121 pages, 59 figures, 11 tables
Abstract: We marshall the arguments for preferring Bayesian hypothesis testing and confidence sets to frequentist ones. We define admissible solutions to inference problems, noting that Bayesian solutions are admissible. We give six weaker common-sense criteria for solutions to inference problems, all failed by these frequentist methods but satisfied by any admissible method. We note that pseudo-Bayesian methods made by handicapping Bayesian methods to satisfy criteria on type I error rate make them frequentist not Bayesian in nature. We give four examples showing the differences between Bayesian and frequentist methods; the first to be accessible to those with no calculus, the second to illustrate dramatically in abstract what is wrong with these frequentist methods, the third to show that the same problems arise, albeit to a lesser extent, in everyday statistical problems, and the fourth to illustrate how on some real-life inference problems Bayesian methods require less data than fixed sample-size (resp. pseudo-Bayesian) frequentist hypothesis testing by factors exceeding 3000 (resp. 300) without recourse to informative priors. To address the issue of different parties with opposing interests reaching agreement on a prior, we illustrate the beneficial effects of a Bayesian "Let the data decide" policy both on results under a wide variety of conditions and on motivation to reach a common prior by consent. We show that in general the frequentist confidence level contains less relevant Shannon information than the Bayesian posterior, and give an example where no deterministic frequentist critical regions give any relevant information even though the Bayesian posterior contains up to the maximum possible amount. In contrast, use of the Bayesian prior allows construction of non-deterministic critical regions for which the Bayesian posterior can be recovered from the frequentist confidence.

【2】 Trends in hospitalised mortality risk and lengths of stay during the first, second and current waves of COVID-19 in England: a cohort study
Link: https://arxiv.org/abs/2112.10661

Authors: Peter Kirwan, Andre Charlett, Paul Birrell, Suzanne Elgohari, Russell Hope, Sema Mandal, Daniela De Angelis, Anne Presanis
Affiliations: Medical Research Council Biostatistics Unit, School of Clinical Medicine, University of Cambridge, Cambridge, UK; UK Health Security Agency, London, UK
Abstract: The introduction of vaccination has changed the landscape for COVID-19 infection, vastly altering the presentation of symptoms and reducing morbidity of infection. We estimate monthly trends and the impact of vaccination upon hospitalised mortality, controlling for baseline demographics and hospital load. We apply competing risks methods to comprehensive public health surveillance data on patients hospitalised with COVID-19 in England. Among a total of 259,727 individuals hospitalised with COVID-19, 51,948 (20.0%) experienced mortality in hospital, with the remainder being discharged or remaining in hospital by end of September 2021. Hospitalised fatality risk ranged from a high of 40.3% (95% confidence interval 39.4, 41.3%) among those admitted in March 2020 to a low of 8.1% (7.2, 9.0%) in June 2021. Older patients and those with multiple co-morbidities were more likely to die in hospital (46.5% for those aged 85 and over vs. 0.5% for those aged 15-24, and 6.3% for those with no comorbidity at baseline vs. 43.0% for those with a Charlson comorbidity index of 5 or above) or else experienced longer stays prior to discharge (median stays of between 5.1-10.4 days for those aged 85+ vs. 0.9-2.4 days for those aged 15-24). The hazard ratio for mortality following hospital admission was 0.72 (0.67, 0.77) among those admitted with a first vaccine dose, and 0.58 (0.54, 0.62) with a second vaccine dose, compared to a reference category of unvaccinated. The prognosis for patients hospitalised with COVID-19 in England has varied substantially throughout the pandemic and is confounded with age, sex, deprivation, baseline comorbidity and hospital load at admission. After controlling for other factors, outcomes for single and double vaccinated patients were significantly improved compared to unvaccinated patients.

【3】 Robust Functional ANOVA with Application to Additive Manufacturing
Link: https://arxiv.org/abs/2112.10643

Authors: Fabio Centofanti, Bianca Maria Colosimo, Marco Luigi Grasso, Alessandra Menafoglio, Biagio Palumbo, Simone Vantini
Affiliations: Department of Industrial Engineering, University of Naples Federico II, Naples, Italy; Department of Mechanical Engineering and Department of Mathematics, Politecnico di Milano, Milan, Italy
Abstract: The development of data acquisition systems is facilitating the collection of data that are apt to be modelled as functional data. In some applications, the interest lies in the identification of significant differences in group functional means defined by varying experimental conditions, which is known as functional analysis of variance (FANOVA). With real data, it is common that the sample under study is contaminated by some outliers, which can strongly bias the analysis. In this paper, we propose a new robust nonparametric functional ANOVA method (RoFANOVA) that reduces the weights of outlying functional data on the results of the analysis. It is implemented through a permutation test based on a test statistic obtained via a functional extension of the classical robust $M$-estimator. By means of an extensive Monte Carlo simulation study, the proposed test is compared with some alternatives already presented in the literature, in both one-way and two-way designs. The performance of the RoFANOVA is demonstrated in the framework of a motivating real-case study in the field of additive manufacturing that deals with the analysis of spatter ejections. The RoFANOVA method is implemented in the R package rofanova, available online at https://github.com/unina-sfere/rofanova.

【4】 Online control of the False Discovery Rate in group-sequential platform trials
Link: https://arxiv.org/abs/2112.10619

Authors: Sonja Zehetmayer, Martin Posch, Franz Koenig
Affiliations: Center for Medical Statistics, Informatics and Intelligent Systems, Medical University of Vienna, Austria
Comments: 17 pages, 7 figures, 3 tables
Abstract: When testing multiple hypotheses, a suitable error rate should be controlled even in exploratory trials. Conventional methods to control the False Discovery Rate (FDR) assume that all p-values are available at the time point of test decision. In platform trials, however, treatment arms enter and leave the trial at any time during its conduct. Therefore, the number of treatments and hypothesis tests is not fixed in advance and hypotheses are not tested at once, but sequentially. Recently, for such a setting the concept of online control of the FDR was introduced. We investigate the LOND procedure to control the online FDR in platform trials and propose an extension to allow for interim analyses with the option of early stopping for efficacy or futility for individual hypotheses. The power depends sensitively on the prior distribution of effect sizes, e.g., whether true alternatives are uniformly distributed over time or not. We consider the choice of design parameters for the LOND procedure to maximize the overall power and compare the O'Brien-Fleming group-sequential design with the Pocock approach. Finally, we investigate the impact on error rates by including both concurrent and non-concurrent control data.
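The basic LOND rule (Javanmard & Montanari, 2015) that this paper extends is simple to state: test hypothesis $i$ at level $\alpha_i = \beta_i (D_{i-1} + 1)$, where $D_{i-1}$ is the number of discoveries so far and $(\beta_i)$ is a fixed nonnegative sequence summing to the target FDR level $\alpha$. A minimal Python sketch follows; the choice $\beta_i \propto 1/(i(i+1))$ is one common convention, not necessarily the one used in the paper, and the group-sequential extension is not reproduced here.

```python
def lond(p_values, alpha=0.05):
    """Minimal sketch of the LOND online-FDR procedure.

    Uses beta_i = alpha / (i*(i+1)), which sums to alpha over i >= 1.
    Hypothesis i is rejected if p_i <= beta_i * (D_{i-1} + 1), where
    D_{i-1} counts the discoveries among the first i-1 tests.
    """
    decisions = []
    discoveries = 0
    for i, p in enumerate(p_values, start=1):
        beta_i = alpha / (i * (i + 1))
        threshold = beta_i * (discoveries + 1)
        reject = p <= threshold
        decisions.append(reject)
        discoveries += reject
    return decisions

# p-values arrive one at a time, as treatment arms enter the platform trial
print(lond([0.001, 0.8, 0.01, 0.04, 0.3]))  # [True, False, False, False, False]
```

Note that the earlier a discovery is made, the more the later thresholds are relaxed, which is why the power of LOND depends on how true alternatives are distributed over time.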

【5】 PyChEst: a Python package for the consistent retrospective estimation of distributional changes in piece-wise stationary time series
Link: https://arxiv.org/abs/2112.10565

Authors: Azadeh Khaleghi, Lukas Zierahn
Affiliations: Mathematics & Statistics, Lancaster University; Computer Science, Università degli Studi di Milano
Abstract: We introduce PyChEst, a Python package which provides tools for the simultaneous estimation of multiple changepoints in the distribution of piece-wise stationary time series. The nonparametric algorithms implemented are provably consistent in a general framework: when the samples are generated by unknown piece-wise stationary processes. In this setting, samples may have long-range dependencies of arbitrary form and the finite-dimensional marginals of any (unknown) fixed size before and after the changepoints may be the same. The strength of the algorithms included in the package is in their ability to consistently detect the changes without imposing any assumptions beyond stationarity on the underlying process distributions. We illustrate this distinguishing feature by comparing the performance of the package against state-of-the-art models designed for a setting where the samples are independently and identically distributed.
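To make the idea of retrospective distributional changepoint estimation concrete, here is a toy sketch that locates a single changepoint by maximizing the sup-norm distance between the empirical CDFs of the two candidate segments. This is an illustration of the general principle only, not the PyChEst API or its multiple-changepoint algorithm; the function names are mine.

```python
def empirical_cdf_distance(left, right):
    # sup-norm (Kolmogorov-Smirnov-type) distance between the empirical
    # CDFs of two samples, evaluated at all observed points
    points = sorted(set(left) | set(right))
    def ecdf(sample, x):
        return sum(v <= x for v in sample) / len(sample)
    return max(abs(ecdf(left, x) - ecdf(right, x)) for x in points)

def single_changepoint(series, min_seg=5):
    """Estimate one changepoint as the split index maximizing the
    distance between the empirical distributions of the two segments."""
    best_k, best_d = None, -1.0
    for k in range(min_seg, len(series) - min_seg + 1):
        d = empirical_cdf_distance(series[:k], series[k:])
        if d > best_d:
            best_k, best_d = k, d
    return best_k

series = [0.0] * 20 + [5.0] * 20  # distribution shift at index 20
print(single_changepoint(series))  # 20
```

A distribution-based criterion like this detects changes even when means coincide, which is the setting (marginals possibly identical, long-range dependence) that the package's consistency guarantees target.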

【6】 Covariate adjustment in multi-armed, possibly factorial experiments
Link: https://arxiv.org/abs/2112.10557

Authors: Anqi Zhao, Peng Ding
Affiliations: Department of Statistics and Data Science, National University of Singapore; University of California
Abstract: Randomized experiments are the gold standard for causal inference, and justify simple comparisons across treatment groups. Regression adjustment provides a convenient way to incorporate covariate information for additional efficiency. This article provides a unified account of its utility for improving estimation efficiency in multi-armed experiments. We start with the commonly used additive and fully interacted models for regression adjustment, and clarify the trade-offs between the resulting ordinary least-squares (OLS) estimators for estimating average treatment effects in terms of finite-sample performance and asymptotic efficiency. We then move on to regression adjustment based on restricted least squares (RLS), and establish for the first time its properties for inferring average treatment effects from the design-based perspective. The resulting inference has multiple guarantees. First, it is asymptotically efficient when the restriction is correctly specified. Second, it remains consistent as long as the restriction on the coefficients of the treatment indicators, if any, is correctly specified and separate from that on the coefficients of the treatment-covariates interactions. Third, it can have better finite-sample performance than its unrestricted counterpart even if the restriction is moderately misspecified. It is thus our recommendation for covariate adjustment in multi-armed experiments when the OLS fit of the fully interacted regression risks large finite-sample variability in case of many covariates, many treatments, yet a moderate sample size. In addition, the proposed theory of RLS also provides a powerful tool for studying OLS-based inference from general regression specifications. As an illustration, we demonstrate its unique value for studying OLS-based regression adjustment in factorial experiments via both theory and simulation.

【7】 Relational hyperevent models for polyadic interaction networks
Link: https://arxiv.org/abs/2112.10552

Authors: Jürgen Lerner, Alessandro Lomi
Affiliations: University of the Italian Switzerland, Lugano, CH (preprint of a submitted manuscript)
Abstract: Polyadic (one-to-many) social interaction happens when one sender addresses multiple receivers simultaneously. Currently available relational event models (REM) are not well suited to the analysis of polyadic interaction networks because they specify event rates for sets of receivers as functions of dyadic covariates associated with the sender and one receiver at a time. Relational hyperevent models (RHEM) alleviate this problem by specifying event rates as functions of hyperedge covariates associated with the sender and the entire set of receivers. In this article we demonstrate the potential benefits of RHEMs for the analysis of polyadic social interaction. We define and implement practically relevant effects that are not available for REMs but may be incorporated in empirical specifications of RHEM. In a reanalysis of the canonical Enron email data, we illustrate how RHEMs effectively (i) reveal evidence of polyadic dependencies in empirical data, (ii) improve the fit over comparable dyadic specifications of REMs, and (iii) better identify the set of recipients actually receiving the same email message from sets of potential recipients who could have received the same email message, but did not.

【8】 No star is good news: A unified look at rerandomization based on p-values from covariate balance tests
Link: https://arxiv.org/abs/2112.10545

Authors: Anqi Zhao, Peng Ding
Affiliations: Department of Statistics, University of California
Abstract: Modern social and biomedical scientific publications require the reporting of covariate balance tables with not only covariate means by treatment group but also the associated $p$-values from significance tests of their differences. The practical need to avoid small $p$-values renders balance check and rerandomization by hypothesis testing standards an attractive tool for improving covariate balance in randomized experiments. Despite the intuitiveness of such practice and its arguably already widespread use in reality, the existing literature knows little about its implications on subsequent inference, subjecting many effectively rerandomized experiments to possibly inefficient analyses. To fill this gap, we examine a variety of potentially useful schemes for rerandomization based on $p$-values (ReP) from covariate balance tests, and demonstrate their impact on subsequent inference. Specifically, we focus on three estimators of the average treatment effect from the unadjusted, additive, and fully interacted linear regressions of the outcome on treatment, respectively, and derive their respective asymptotic sampling properties under ReP. The main findings are twofold. First, the estimator from the fully interacted regression is asymptotically the most efficient under all ReP schemes examined, and permits convenient regression-assisted inference identical to that under complete randomization. Second, ReP improves not only covariate balance but also the efficiency of the estimators from the unadjusted and additive regressions asymptotically. The standard regression analysis, in consequence, is still valid but can be overly conservative.
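The ReP mechanism itself is easy to simulate: redraw the random assignment until a covariate balance test returns a p-value above some threshold. The sketch below does this for a single covariate using a permutation p-value for the difference in means; the threshold, seeds, and function names are illustrative choices of mine, not the paper's specific schemes.

```python
import random

def balance_p_value(x, assign, n_perm=200, rng=None):
    """Permutation p-value for the absolute difference in covariate
    means between treatment (assign=1) and control (assign=0)."""
    rng = rng or random.Random(0)
    def mean_diff(a):
        t = [xi for xi, ai in zip(x, a) if ai]
        c = [xi for xi, ai in zip(x, a) if not ai]
        return abs(sum(t) / len(t) - sum(c) / len(c))
    observed = mean_diff(assign)
    count = 0
    for _ in range(n_perm):
        perm = assign[:]
        rng.shuffle(perm)
        count += mean_diff(perm) >= observed
    return (count + 1) / (n_perm + 1)

def rerandomize(x, p_threshold=0.5, max_tries=1000, rng=None):
    """Redraw a 1:1 assignment until the balance-test p-value exceeds
    p_threshold -- the rerandomization-by-p-value (ReP) idea."""
    rng = rng or random.Random(42)
    n = len(x)
    base = [1] * (n // 2) + [0] * (n - n // 2)
    for _ in range(max_tries):
        assign = base[:]
        rng.shuffle(assign)
        if balance_p_value(x, assign) > p_threshold:
            return assign
    raise RuntimeError("no acceptable assignment found")

rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(20)]  # one baseline covariate
assign = rerandomize(x)
```

The paper's point is that analyses after such acceptance-rejection sampling of assignments should account for the induced balance, otherwise standard errors are conservative.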

【9】 An iterative clustering algorithm for the Contextual Stochastic Block Model with optimality guarantees
Link: https://arxiv.org/abs/2112.10467

Authors: Guillaume Braun, Hemant Tyagi, Christophe Biernacki
Abstract: Real-world networks often come with side information that can help to improve the performance of network analysis tasks such as clustering. Despite a large number of empirical and theoretical studies conducted on network clustering methods during the past decade, the added value of side information and the methods used to incorporate it optimally in clustering algorithms are relatively less understood. We propose a new iterative algorithm to cluster networks with side information for nodes (in the form of covariates) and show that our algorithm is optimal under the Contextual Symmetric Stochastic Block Model. Our algorithm can be applied to general Contextual Stochastic Block Models and avoids hyperparameter tuning in contrast to previously proposed methods. We confirm our theoretical results on synthetic data experiments where our algorithm significantly outperforms other methods, and show that it can also be applied to signed graphs. Finally, we demonstrate the practical interest of our method on real data.

【10】 Model-based Clustering with Missing Not At Random Data
Link: https://arxiv.org/abs/2112.10425

Authors: Aude Sportisse, Christophe Biernacki, Claire Boyer, Julie Josse, Matthieu Marbac Lourdelle, Gilles Celeux, Fabien Laporte
Affiliations: Inria Sophia Antipolis, Université Côte d'Azur; Inria Lille, Université de Lille, CNRS; Sorbonne Université, Paris; IDESP, Montpellier; Université Rennes, Ensai, CNRS, CREST
Abstract: In recent decades, technological advances have made it possible to collect large data sets. In this context, the model-based clustering is a very popular, flexible and interpretable methodology for data exploration in a well-defined statistical framework. One of the ironies of the increase of large datasets is that missing values are more frequent. However, traditional ways (as discarding observations with missing values or imputation methods) are not designed for the clustering purpose. In addition, they rarely apply to the general case, though frequent in practice, of Missing Not At Random (MNAR) values, i.e. when the missingness depends on the unobserved data values and possibly on the observed data values. The goal of this paper is to propose a novel approach by embedding MNAR data directly within model-based clustering algorithms. We introduce a selection model for the joint distribution of data and missing-data indicator. It corresponds to a mixture model for the data distribution and a general MNAR model for the missing-data mechanism, which may depend on the underlying classes (unknown) and/or the values of the missing variables themselves. A large set of meaningful MNAR sub-models is derived and the identifiability of the parameters is studied for each of the sub-models, which is usually a key issue for any MNAR proposals. The EM and Stochastic EM algorithms are considered for estimation. Finally, we perform empirical evaluations for the proposed submodels on synthetic data and we illustrate the relevance of our method on a medical register, the TraumaBase (R) dataset.

【11】 Generalized Pareto Regression Trees for extreme events analysis
Link: https://arxiv.org/abs/2112.10409

Authors: Sébastien Farkas, Antoine Heranval, Olivier Lopez, Maud Thomas
Affiliations: Sorbonne Université, CNRS, Laboratoire de Probabilités, Statistique et Modélisation (LPSM), Paris, France; Mission Risques Naturels, Paris, France
Abstract: In this paper, we provide finite sample results to assess the consistency of Generalized Pareto regression trees, as tools to perform extreme value regression. The results that we provide are obtained from concentration inequalities, and are valid for a finite sample size, taking into account a misspecification bias that arises from the use of a "Peaks over Threshold" approach. The properties that we derive also legitimate the pruning strategies (i.e. the model selection rules) used to select a proper tree that achieves compromise between bias and variance. The methodology is illustrated through a simulation study, and a real data application in insurance against natural disasters.
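The "Peaks over Threshold" step that underlies this method fits a Generalized Pareto Distribution (GPD) to exceedances above a threshold. A minimal sketch of the classical method-of-moments fit is shown below (the paper uses regression trees on top of such fits; the moment estimator and function name here are my illustrative choices, valid only when the shape parameter satisfies $\xi < 1/2$). From the GPD moments $m = \sigma/(1-\xi)$ and $v = \sigma^2/((1-\xi)^2(1-2\xi))$, one solves $\hat\xi = \tfrac{1}{2}(1 - m^2/v)$ and $\hat\sigma = \tfrac{1}{2} m (m^2/v + 1)$.

```python
import random

def gpd_moment_fit(data, threshold):
    """Method-of-moments fit of the Generalized Pareto Distribution to
    exceedances over a threshold (Peaks-over-Threshold).
    Returns (shape xi, scale sigma); valid when xi < 1/2."""
    exceedances = [x - threshold for x in data if x > threshold]
    n = len(exceedances)
    m = sum(exceedances) / n
    v = sum((y - m) ** 2 for y in exceedances) / (n - 1)
    xi = 0.5 * (1.0 - m * m / v)
    sigma = 0.5 * m * (m * m / v + 1.0)
    return xi, sigma

# sanity check: exponential tails correspond to xi = 0, and the
# exceedances of an Exp(1) sample are again Exp(1), so sigma ~ 1
rng = random.Random(0)
data = [rng.expovariate(1.0) for _ in range(5000)]
xi, sigma = gpd_moment_fit(data, threshold=1.0)
```

In practice maximum likelihood is usually preferred for heavy tails; the moment fit simply makes the POT idea concrete in a few lines.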

【12】 The Predictive Individual Effect for Survival Data
Link: https://arxiv.org/abs/2112.10404

Authors: Beat Neuenschwander, Satrajit Roychoudhury, Simon Wandel, Kannan Natarajan, Emmanuel Zuber
Affiliations: Novartis Pharma AG, Basel, Switzerland; Pfizer Inc, New York, NY, USA
Key words: Bayesian predictive inference, non-proportional hazards, patient-centric measure, rank preservation
Abstract: The call for patient-focused drug development is loud and clear, as expressed in the 21st Century Cures Act and in recent guidelines and initiatives of regulatory agencies. Among the factors contributing to modernized drug development and improved health-care activities are easily interpretable measures of clinical benefit. In addition, special care is needed for cancer trials with time-to-event endpoints if the treatment effect is not constant over time. We propose the predictive individual effect which is a patient-centric and tangible measure of clinical benefit under a wide variety of scenarios. It can be obtained by standard predictive calculations under a rank preservation assumption that has been used previously in trials with treatment switching. We discuss four recent Oncology trials that cover situations with proportional as well as non-proportional hazards (delayed treatment effect or crossing of survival curves). It is shown that the predictive individual effect offers valuable insights beyond p-values, estimates of hazard ratios or differences in median survival. Compared to standard statistical measures, the predictive individual effect is a direct, easily interpretable measure of clinical benefit. It facilitates communication among clinicians, patients, and other parties and should therefore be considered in addition to standard statistical results.

【13】 Quasi-uniform designs with optimal and near-optimal uniformity constant
Link: https://arxiv.org/abs/2112.10401

Authors: Luc Pronzato, Anatoly Zhigljavsky
Abstract: A design is a collection of distinct points in a given set $X$, which is assumed to be a compact subset of $R^d$, and the mesh-ratio of a design is the ratio of its fill distance to its separation radius. The uniformity constant of a sequence of nested designs is the smallest upper bound for the mesh-ratios of the designs. We derive a lower bound on this uniformity constant and show that a simple greedy construction achieves this lower bound. We then extend this scheme to allow more flexibility in the design construction.
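The two quantities in the mesh-ratio are easy to compute for a finite design over a discretized domain, and a farthest-point greedy rule is the standard simple construction of nested quasi-uniform designs. The sketch below illustrates both on a grid over $[0,1]$; it is an illustration of the definitions, and the greedy rule here may differ in details from the paper's construction.

```python
import math

def mesh_ratio(design, domain_points):
    """Fill distance / separation radius of a finite design.
    Fill distance: max over domain points of the distance to the nearest
    design point. Separation radius: half the minimal pairwise distance."""
    fill = max(min(math.dist(x, p) for p in design) for x in domain_points)
    sep = 0.5 * min(math.dist(p, q)
                    for i, p in enumerate(design) for q in design[i + 1:])
    return fill / sep

def greedy_design(domain_points, n):
    """Farthest-point greedy construction: start from one point, then
    repeatedly add the domain point farthest from the current design.
    Successive designs are nested by construction."""
    design = [domain_points[0]]
    while len(design) < n:
        far = max(domain_points,
                  key=lambda x: min(math.dist(x, p) for p in design))
        design.append(far)
    return design

domain = [(i / 100,) for i in range(101)]  # grid on [0, 1]
design = greedy_design(domain, 5)          # picks 0, 1, 0.5, 0.25, 0.75
```

On this example the five greedy points are equispaced, so the mesh-ratio is close to 1, i.e. the design is nearly as uniform as possible at this size.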

【14】 Bayesian nonparametric model based clustering with intractable distributions: an ABC approach
Link: https://arxiv.org/abs/2112.10393

Authors: Mario Beraha, Riccardo Corradin
Affiliations: Department of Mathematics, Politecnico di Milano; Department of Computer Science, Università degli Studi di Bologna; Department of Economics, University of Milano-Bicocca
Comments: 20 pages, 4 figures
Abstract: Bayesian nonparametric mixture models offer a rich framework for model based clustering. We consider the situation where the kernel of the mixture is available only up to an intractable normalizing constant. In this case, most of the commonly used Markov chain Monte Carlo (MCMC) methods are not suitable. We propose an approximate Bayesian computational (ABC) strategy, whereby we approximate the posterior to avoid the intractability of the kernel. We derive an ABC-MCMC algorithm which combines (i) the use of the predictive distribution induced by the nonparametric prior as proposal and (ii) the use of the Wasserstein distance and its connection to optimal matching problems. To overcome the sensitivity with respect to the parameters of our algorithm, we further propose an adaptive strategy. We illustrate the use of the proposed algorithm with several simulation studies and an application on real data, where we cluster a population of networks, comparing its performance with standard MCMC algorithms and validating the adaptive strategy.
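Two building blocks of this approach are easy to illustrate in isolation: in one dimension the Wasserstein-1 distance between equal-size samples reduces to the optimal matching of order statistics, and ABC keeps parameter draws whose simulated data fall within a tolerance of the observed data under that distance. The sketch below shows plain ABC rejection, not the paper's ABC-MCMC algorithm for mixtures; tolerances, seeds, and names are my illustrative choices.

```python
import random

def wasserstein_1d(xs, ys):
    """W1 distance between two equal-size 1-d samples: the mean absolute
    difference of the order statistics (the optimal matching)."""
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

def abc_rejection(observed, prior_sampler, simulator,
                  n_draws=2000, eps=0.4, rng=None):
    """Plain ABC rejection: keep parameter draws whose simulated data
    lie within eps of the observed data in W1 distance."""
    rng = rng or random.Random(0)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)
        sim = simulator(theta, rng)
        if wasserstein_1d(sim, observed) <= eps:
            accepted.append(theta)
    return accepted

# toy example: infer the mean of a Gaussian with known sd = 1
rng = random.Random(1)
observed = [rng.gauss(2.0, 1.0) for _ in range(100)]
posterior = abc_rejection(
    observed,
    prior_sampler=lambda r: r.uniform(-5, 5),
    simulator=lambda mu, r: [r.gauss(mu, 1.0) for _ in range(100)],
)
```

Because no likelihood is ever evaluated, the same recipe works when the kernel has an intractable normalizing constant, which is exactly the situation the paper addresses.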

【15】 Fast iterative proportional scaling for Gaussian graphical models
Link: https://arxiv.org/abs/2112.10388

Authors: Søren Højsgaard, Steffen Lauritzen
Affiliations: Department of Mathematical Sciences, Aalborg University; Department of Mathematical Sciences, University of Copenhagen
Abstract: In Gaussian graphical models, the likelihood equations must typically be solved iteratively, for example by iterative proportional scaling. However, this method may not scale well to models with many variables because it involves repeated inversion of large matrices. We present a version of the algorithm which avoids these inversions, resulting in increased speed, in particular when graphs are sparse.

【16】 Nonparametric estimation of multivariate copula using empirical bayes method
Link: https://arxiv.org/abs/2112.10351

Authors: Lu Lu, Sujit Ghosh
Affiliations: Department of Statistics, North Carolina State University
Abstract: In the field of finance, insurance, and system reliability, etc., it is often of interest to measure the dependence among variables by modeling a multivariate distribution using a copula. The copula models with parametric assumptions are easy to estimate but can be highly biased when such assumptions are false, while the empirical copulas are non-smooth and often not genuine copula making the inference about dependence challenging in practice. As a compromise, the empirical Bernstein copula provides a smooth estimator but the estimation of tuning parameters remains elusive. In this paper, by using the so-called empirical checkerboard copula we build a hierarchical empirical Bayes model that enables the estimation of a smooth copula function for arbitrary dimensions. The proposed estimator based on the multivariate Bernstein polynomials is itself a genuine copula and the selection of its dimension-varying degrees is data-dependent. We also show that the proposed copula estimator provides a more accurate estimate of several multivariate dependence measures which can be obtained in closed form. We investigate the asymptotic and finite-sample performance of the proposed estimator and compare it with some nonparametric estimators through simulation studies. An application to portfolio risk management is presented along with a quantification of estimation uncertainty.
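The nonparametric starting point for all the estimators discussed here (empirical, checkerboard, Bernstein) is the same: transform the data to rank-based pseudo-observations and evaluate the empirical copula. The sketch below shows that starting point only; the checkerboard and Bernstein smoothing steps, and the paper's empirical Bayes layer, are not reproduced, and the function names are mine.

```python
def pseudo_observations(data):
    """Rank-based pseudo-observations u_ij = rank(x_ij) / (n + 1),
    the usual nonparametric starting point for copula estimation."""
    n = len(data)
    d = len(data[0])
    pseudo = [[0.0] * d for _ in range(n)]
    for j in range(d):
        order = sorted(range(n), key=lambda i: data[i][j])
        for rank, i in enumerate(order, start=1):
            pseudo[i][j] = rank / (n + 1)
    return pseudo

def empirical_copula(pseudo, u):
    """Empirical copula C_n(u) = (1/n) * #{i : U_i <= u componentwise}."""
    n = len(pseudo)
    return sum(all(ui <= uj for ui, uj in zip(row, u))
               for row in pseudo) / n

# perfectly comonotone data: the empirical copula tracks C(u, u) = u
pseudo = pseudo_observations([(i, 2 * i) for i in range(1, 11)])
print(empirical_copula(pseudo, (0.5, 0.5)))  # 0.5
```

As the abstract notes, this raw estimator is a step function and not itself a genuine copula, which is what motivates the smoothed Bernstein/checkerboard constructions.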

【17】 Convergence properties of data augmentation algorithms for high-dimensional robit regression
Link: https://arxiv.org/abs/2112.10349

Authors: Sourav Mukherjee, Kshitij Khare, Saptarshi Chakraborty
Affiliations: Department of Statistics, University of Florida; Department of Biostatistics, State University of New York at Buffalo
Comments: 29 pages, 4 figures
Abstract: The logistic and probit link functions are the most common choices for regression models with a binary response. However, these choices are not robust to the presence of outliers/unexpected observations. The robit link function, which is equal to the inverse CDF of the Student's $t$-distribution, provides a robust alternative to the probit and logistic link functions. A multivariate normal prior for the regression coefficients is the standard choice for Bayesian inference in robit regression models. The resulting posterior density is intractable and a Data Augmentation (DA) Markov chain is used to generate approximate samples from the desired posterior distribution. Establishing geometric ergodicity for this DA Markov chain is important as it provides theoretical guarantees for asymptotic validity of MCMC standard errors for desired posterior expectations/quantiles. Previous work [Roy (2012)] established geometric ergodicity of this robit DA Markov chain assuming (i) the sample size $n$ dominates the number of predictors $p$, and (ii) an additional constraint which requires the sample size to be bounded above by a fixed constant which depends on the design matrix $X$. In particular, modern high-dimensional settings where $n < p$ are not considered. In this work, we show that the robit DA Markov chain is trace-class (i.e., the eigenvalues of the corresponding Markov operator are summable) for arbitrary choices of the sample size $n$, the number of predictors $p$, the design matrix $X$, and the prior mean and variance parameters. The trace-class property implies geometric ergodicity. Moreover, this property allows us to conclude that the sandwich robit chain (obtained by inserting an inexpensive extra step in between the two steps of the DA chain) is strictly better than the robit DA chain in an appropriate sense.

【18】 Comprehensive Performance Evaluation of LID Practices for the Sponge City Construction: A Case Study in Guangxi, China Link: https://arxiv.org/abs/2112.10347

Authors: Li Qian, Wang Feng, Yu Yang, Huang Zhengce, Li Mantao, Guan Yuntao Affiliations: Graduate School at Shenzhen, Tsinghua University Remarks: None Abstract: Sponge city construction is a new concept of urban stormwater management, which can effectively relieve urban flooding, reduce non-point source pollution, and promote the usage of rainwater resources, often including the application of Low Impact Development (LID) techniques. Although 30 cities in China have been chosen to implement sponge city construction, there is a lack of a quantitative evaluation method to assess the environmental, economic, and social benefits of LID practices. This paper develops a comprehensive evaluation system to quantify the benefits of different combinations of LID units using the Storm Water Management Model (SWMM) and the Analytic Hierarchy Process (AHP) method. The performance of five LID design scenarios with different locations and sizes of the bio-retention facility, the grassed swale, the sunken green space, the permeable pavement, and the storage tank was analyzed for a sports center project in Guangxi, China. Results indicated that the green scenario, which contains 34.5% of bio-retention facilities and 46.0% of sunken green spaces, had the best comprehensive performance regarding meeting the requirements of 75% annual total runoff reduction and the attainment of good operation performance, rainwater utilization, landscape promotion, and ecological service functions, mainly because they are micro-scale and decentralized facilities that can manage stormwater at the source through the natural process.
The optimal scenario was adopted to construct the project, and the proposed evaluation system can also be applied to the optimal selection and performance evaluation of LID practices in other sponge city projects.
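The AHP step used in the evaluation system boils down to extracting criterion weights from a pairwise-comparison matrix and checking its consistency. A minimal sketch follows; the 3x3 comparison matrix, the three criteria names, and the judgment values are hypothetical illustrations, not the criteria or judgments used in the study:

```python
# Illustrative 3x3 pairwise-comparison matrix for three hypothetical criteria
# (e.g., environmental, economic, social); the judgments are made up.
A = [[1.0, 3.0, 5.0],
     [1 / 3.0, 1.0, 3.0],
     [1 / 5.0, 1 / 3.0, 1.0]]
n = len(A)

# Power iteration for the principal eigenvector, which gives the AHP weights.
w = [1.0 / n] * n
for _ in range(100):
    v = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
    s = sum(v)
    w = [x / s for x in v]
weights = w

# lambda_max and the consistency ratio CR = ((lambda_max - n)/(n - 1)) / RI,
# with RI = 0.58 the standard random index for n = 3; CR < 0.1 is "consistent".
lambda_max = sum(A[0][j] * weights[j] for j in range(n)) / weights[0]
CR = ((lambda_max - n) / (n - 1)) / 0.58
```

With this matrix, the first criterion dominates and the judgments are internally consistent (CR well below the conventional 0.1 threshold).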

【19】 Approximating Bayes in the 21st Century Link: https://arxiv.org/abs/2112.10342

Authors: Gael M. Martin, David T. Frazier, Christian P. Robert Remarks: arXiv admin note: text overlap with arXiv:2004.06425 Abstract: The 21st century has seen an enormous growth in the development and use of approximate Bayesian methods. Such methods produce computational solutions to certain intractable statistical problems that challenge exact methods like Markov chain Monte Carlo: for instance, models with unavailable likelihoods, high-dimensional models, and models featuring large data sets. These approximate methods are the subject of this review. The aim is to help new researchers in particular -- and more generally those interested in adopting a Bayesian approach to empirical work -- distinguish between different approximate techniques; understand the sense in which they are approximate; appreciate when and why particular methods are useful; and see the ways in which they can be combined.

【20】 Adapting the Hill estimator to distributed inference: dealing with the bias Link: https://arxiv.org/abs/2112.10329

Authors: Liujun Chen, Deyuan Li, Chen Zhou Affiliations: School of Management, Fudan University; Erasmus School of Economics, Erasmus University Rotterdam Abstract: The distributed Hill estimator is a divide-and-conquer algorithm for estimating the extreme value index when data are stored in multiple machines. In applications, estimates based on the distributed Hill estimator can be sensitive to the choice of the number of exceedance ratios used in each machine. Even when choosing the number at a low level, a high asymptotic bias may arise. We overcome this potential drawback by designing a bias correction procedure for the distributed Hill estimator, which adheres to the setup of distributed inference. The asymptotically unbiased distributed estimator we obtain is, on the one hand, applicable to distributed stored data and, on the other hand, inherits all known advantages of bias correction methods in extreme value statistics.
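For intuition, the plain (uncorrected) distributed Hill estimator averages per-machine Hill estimates, each computed from that machine's top-k order statistics; the paper's bias-correction step is not sketched here. A minimal illustration on exact Pareto data, where the true extreme value index is 1/alpha and the sample split across machines is artificial:

```python
import math
import random

def hill(sample, k):
    """Hill estimator of the extreme value index from the top-k order statistics."""
    xs = sorted(sample)
    x_thresh = xs[-(k + 1)]  # (k+1)-th largest observation as the threshold
    return sum(math.log(x / x_thresh) for x in xs[-k:]) / k

def distributed_hill(machines, k):
    """Divide-and-conquer Hill estimator: average the per-machine estimates."""
    return sum(hill(m, k) for m in machines) / len(machines)

random.seed(0)
# Pareto(alpha = 2) data, so the true extreme value index is gamma = 1/2.
data = [random.paretovariate(2.0) for _ in range(20000)]
machines = [data[i::10] for i in range(10)]  # split across 10 "machines"
gamma_hat = distributed_hill(machines, k=200)
```

For exact Pareto tails this estimator is essentially unbiased; the bias the abstract targets arises for distributions that are only Pareto-like in the tail.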

【21】 A Nonparametric Statistical Method for Two Crossing Survival Curves Link: https://arxiv.org/abs/2112.10323

Authors: Xinghui Huang, Jingjing Lyu, Yawen Hou, Zheng Chen Affiliations: Department of Biostatistics, Southern Medical University, Guangzhou, China; Department of Statistics, Jinan University, Guangzhou, China Remarks: None Abstract: In comparative research on time-to-event data for two groups, when two survival curves cross each other, it may be difficult to use the log-rank test and hazard ratio (HR) to properly assess the treatment benefit. Our aim was to identify a method for evaluating the treatment benefits for two groups in the above situation. We quantified treatment benefits based on an intuitive measure called the area between two survival curves (ABS), which is a robust measure of treatment benefits in clinical trials regardless of whether the proportional hazards assumption is violated or two survival curves cross each other. Additionally, we propose a permutation test based on the ABS, and we evaluate the effectiveness and reliability of this test with simulated data. The ABS permutation test is a robust statistical inference method with an acceptable type I error rate and superior power to detect differences in treatment effects, especially when the proportional hazards assumption is violated. The ABS can be used to intuitively quantify treatment differences over time and provide reliable conclusions in complicated situations, such as crossing survival curves. The R package "ComparisonSurv" contains the proposed methods and is available from https://CRAN.R-project.org/package=ComparisonSurv.
Keywords: Survival analysis; Area between two survival curves; Crossing survival curves; Treatment benefit
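The ABS idea can be sketched for uncensored data: integrate the absolute gap between two empirical survival curves over a time grid, then assess significance by permuting group labels. This toy version omits censoring (the paper works with Kaplan-Meier curves), and the two simulated groups, grid, and permutation count are illustrative choices:

```python
import random

def surv(sample, t):
    """Empirical survival function S(t) = fraction of observations exceeding t."""
    return sum(x > t for x in sample) / len(sample)

def abs_stat(g1, g2, grid):
    """Area between two empirical survival curves on an equally spaced grid."""
    dt = grid[1] - grid[0]
    return sum(abs(surv(g1, t) - surv(g2, t)) for t in grid) * dt

def permutation_test(g1, g2, grid, n_perm=500, seed=1):
    """Permutation p-value for the ABS statistic (labels shuffled under the null)."""
    rng = random.Random(seed)
    observed = abs_stat(g1, g2, grid)
    pooled = list(g1) + list(g2)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs_stat(pooled[:len(g1)], pooled[len(g1):], grid) >= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)

rng = random.Random(0)
# Two groups whose survival curves cross: Exp(1) vs. Weibull(shape 3).
g1 = [rng.expovariate(1.0) for _ in range(80)]
g2 = [rng.weibullvariate(1.0, 3.0) for _ in range(80)]
grid = [0.05 * i for i in range(1, 60)]
observed, p_value = permutation_test(g1, g2, grid)
```

Because the statistic integrates the gap over time rather than testing a single hazard ratio, it retains power when the curves cross.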

【22】 Tutorial on Asymptotic Properties of Regularized Least Squares Estimator for Finite Impulse Response Model Link: https://arxiv.org/abs/2112.10319

Authors: Yue Ju, Tianshi Chen, Biqiang Mu, Lennart Ljung Affiliations: Key Laboratory of Systems and Control, Institute of Systems Science Abstract: In this paper, we give a tutorial on asymptotic properties of the Least Squares (LS) and Regularized Least Squares (RLS) estimators for the finite impulse response model with filtered white noise inputs. We provide three perspectives: almost sure convergence, convergence in distribution, and boundedness in probability. On the one hand, these properties deepen our understanding of the LS and RLS estimators. On the other hand, we can use them as tools to investigate asymptotic properties of other estimators, such as various hyper-parameter estimators.
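The two estimators under study can be sketched directly: for an FIR model y_t = sum_k g_k u_{t-k} + e_t, the LS estimate solves a linear regression on lagged inputs and the RLS estimate adds a regularizer. In this sketch the input is plain (unfiltered) white noise and the regularizer is a simple ridge penalty; both are simplifying assumptions, not the kernel-based regularization typically paired with RLS in this literature:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 400, 20                        # sample size and FIR order (illustrative)
g_true = 0.8 ** np.arange(p)          # exponentially decaying impulse response
u = rng.standard_normal(n + p)        # white-noise input

# Regression matrix of lagged inputs: row t collects u_{t}, u_{t-1}, ..., u_{t-p+1}.
Phi = np.column_stack([u[p - 1 - k : n + p - 1 - k] for k in range(p)])
y = Phi @ g_true + 0.1 * rng.standard_normal(n)

# Least Squares estimate.
g_ls = np.linalg.lstsq(Phi, y, rcond=None)[0]

# Regularized Least Squares with a ridge penalty lam * ||g||^2.
lam = 1.0
g_rls = np.linalg.solve(Phi.T @ Phi + lam * np.eye(p), Phi.T @ y)
```

As n grows with p fixed, both estimates converge to the true impulse response, which is the almost-sure-convergence perspective of the tutorial.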

【23】 Marginal Independence Models Link: https://arxiv.org/abs/2112.10287

Authors: Tobias Boege, Sonja Petrović, Bernd Sturmfels Remarks: 14 pages Abstract: We impose rank one constraints on marginalizations of a tensor, given by a simplicial complex. Following work of Kirkup and Sullivant, such marginal independence models can be made toric by a linear change of coordinates. We study their toric ideals, with emphasis on random graph models and independent set polytopes of matroids. We develop the numerical algebra of parameter estimation, using both Euclidean distance and maximum likelihood, and we present a comprehensive database of small models.

【24】 Variational Bayes for high-dimensional proportional hazards models with applications to gene expression variable selection Link: https://arxiv.org/abs/2112.10270

Authors: Michael Komodromos, Eric Aboagye, Marina Evangelou, Sarah Filippi, Kolyan Ray Affiliations: Department of Mathematics, Imperial College London; Department of Surgery and Cancer, Imperial College London Abstract: We propose a variational Bayesian proportional hazards model for prediction and variable selection regarding high-dimensional survival data. Our method, based on a mean-field variational approximation, overcomes the high computational cost of MCMC whilst retaining the useful features, providing excellent point estimates and offering a natural mechanism for variable selection via posterior inclusion probabilities. The performance of our proposed method is assessed via extensive simulations and compared against other state-of-the-art Bayesian variable selection methods, demonstrating comparable or better performance. Finally, we demonstrate how the proposed method can be used for variable selection on two transcriptomic datasets with censored survival outcomes, where we identify genes with pre-existing biological interpretations.

【25】 Realistic and Fast Modeling of Spatial Extremes over Large Geographical Domains Link: https://arxiv.org/abs/2112.10248

Authors: Arnab Hazra, Raphaël Huser, David Bolin Abstract: Various natural phenomena exhibit spatial extremal dependence at short distances only, while it usually vanishes as the distance between sites increases arbitrarily. However, models proposed in the literature for spatial extremes, which are based on max-stable or Pareto processes, or on comparatively less computationally demanding "sub-asymptotic" models based on Gaussian location and/or scale mixtures, generally assume that spatial extremal dependence persists across the entire spatial domain. This is a clear limitation when modeling extremes over large geographical domains, but surprisingly, it has been mostly overlooked in the literature. In this paper, we develop a more realistic Bayesian framework based on a novel Gaussian scale mixture model, where the Gaussian process component is defined by a stochastic partial differential equation that yields a sparse precision matrix, and the random scale component is modeled as a low-rank Pareto-tailed or Weibull-tailed spatial process determined by compactly supported basis functions. We show that our proposed model is approximately tail-stationary despite its non-stationary construction in terms of basis functions, and we demonstrate that it can capture a wide range of extremal dependence structures as a function of distance.
Furthermore, the inherently sparse structure of our spatial model allows fast Bayesian computations, even in high spatial dimensions, based on a customized Markov chain Monte Carlo algorithm that prioritizes calibration in the tail. In our application, we fit our model to analyze heavy monsoon rainfall data in Bangladesh. Our study indicates that the proposed model outperforms some natural alternatives, and that the model fits precipitation extremes satisfactorily well. Finally, we use the fitted model to draw inferences on long-term return levels for marginal precipitation at each site and for spatial aggregates.

【26】 Valid inferential models for prediction in supervised learning problems Link: https://arxiv.org/abs/2112.10234

Authors: Leonardo Cella, Ryan Martin Remarks: Comments welcome at this https URL Abstract: Prediction, where observed data is used to quantify uncertainty about a future observation, is a fundamental problem in statistics. Prediction sets with coverage probability guarantees are a common solution, but these do not provide probabilistic uncertainty quantification in the sense of assigning beliefs to relevant assertions about the future observable. Alternatively, we recommend the use of a probabilistic predictor, a data-dependent (imprecise) probability distribution for the to-be-predicted observation given the observed data. It is essential that the probabilistic predictor be reliable or valid, and here we offer a notion of validity and explore its behavioral and statistical implications. In particular, we show that valid probabilistic predictors avoid sure loss and lead to prediction procedures with desirable frequentist error rate control properties. We also provide a general inferential model construction that yields a provably valid probabilistic predictor, and we illustrate this construction in regression and classification applications.

【27】 Approximately valid probabilistic inference on a class of statistical functionals Link: https://arxiv.org/abs/2112.10232

Authors: Leonardo Cella, Ryan Martin Remarks: Comments welcome at this https URL Abstract: Existing frameworks for probabilistic inference assume the inferential target is the posited statistical model's parameter. In machine learning applications, however, often there is no statistical model, so the quantity of interest is not a model parameter but a statistical functional. In this paper, we develop a generalized inferential model framework for cases when this functional is a risk minimizer or a solution to an estimating equation. We construct a data-dependent possibility measure for uncertainty quantification and inference whose computation is based on the bootstrap. We then prove that this new generalized inferential model provides approximately valid inference in the sense that the plausibility values assigned to hypotheses about the unknowns are asymptotically well-calibrated in a frequentist sense. Among other things, this implies that confidence regions for the underlying functional derived from our new generalized inferential model are approximately valid. The method is shown to perform well in classical examples, including quantile regression, and in a personalized medicine application.

【28】 Stable Conformal Prediction Sets Link: https://arxiv.org/abs/2112.10224

Authors: Eugene Ndiaye Affiliations: Georgia Institute of Technology (ISyE) Abstract: When one observes a sequence of variables $(x_1, y_1), ..., (x_n, y_n)$, conformal prediction is a methodology that allows one to estimate a confidence set for $y_{n+1}$ given $x_{n+1}$ by merely assuming that the distribution of the data is exchangeable. While appealing, the computation of such a set turns out to be infeasible in general, e.g. when the unknown variable $y_{n+1}$ is continuous. In this paper, we combine conformal prediction techniques with algorithmic stability bounds to derive a prediction set computable with a single model fit. We perform some numerical experiments that illustrate the tightness of our estimation when the sample size is sufficiently large.
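For contrast with the stability-based construction in the paper, the standard split conformal method also needs only a single model fit, but pays for it by holding out half the data for calibration. A minimal sketch with a least-squares line as the base predictor (the data-generating process and miscoverage level alpha = 0.1 are illustrative):

```python
import math
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
        sum((x - xbar) ** 2 for x in xs)
    return ybar - b * xbar, b

def split_conformal(x_tr, y_tr, x_cal, y_cal, x_new, alpha=0.1):
    """Split conformal interval: fit once, calibrate on held-out residuals."""
    a, b = fit_line(x_tr, y_tr)
    # Conformity scores: absolute residuals on the calibration half.
    scores = sorted(abs(y - (a + b * x)) for x, y in zip(x_cal, y_cal))
    m = len(scores)
    k = min(m - 1, math.ceil((m + 1) * (1 - alpha)) - 1)
    q = scores[k]                        # calibrated quantile of the scores
    pred = a + b * x_new
    return pred - q, pred + q

rng = random.Random(0)
xs = [rng.uniform(0.0, 1.0) for _ in range(400)]
ys = [2.0 * x + rng.gauss(0.0, 0.3) for x in xs]
lo, hi = split_conformal(xs[:200], ys[:200], xs[200:], ys[200:], x_new=0.5)
```

Under exchangeability this interval covers y_{n+1} with probability at least 1 - alpha; the paper's contribution is to achieve a single-fit construction without the data split.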

【29】 Sequential Estimation of Temporally Evolving Latent Space Network Models Link: https://arxiv.org/abs/2112.10220

Authors: Kathryn Turnbull, Christopher Nemeth, Matthew Nunes, Tyler McCormick Affiliations: Department of Mathematics and Statistics, Lancaster University; School of Mathematical Sciences, University of Bath; Department of Statistics, University of Washington Abstract: In this article we focus on dynamic network data which describe interactions among a fixed population through time. We model these data using the latent space framework, in which the probability of a connection forming is expressed as a function of low-dimensional latent coordinates associated with the nodes, and we consider sequential estimation of model parameters via Sequential Monte Carlo (SMC) methods. In this setting, SMC is a natural candidate for estimation which offers greater scalability than existing approaches commonly considered in the literature, allows for estimates to be conveniently updated given additional observations, and facilitates both online and offline inference. We present a novel approach to sequentially infer parameters of dynamic latent space network models by building on techniques from the high-dimensional SMC literature. Furthermore, we examine the scalability and performance of our approach via simulation, demonstrate the flexibility of our approach to model variants, and analyse a real-world dataset describing classroom contacts.

【30】 RELAX: Representation Learning Explainability Link: https://arxiv.org/abs/2112.10161

Authors: Kristoffer K. Wickstrøm, Daniel J. Trosten, Sigurd Løkse, Karl Øyvind Mikalsen, Michael C. Kampffmeyer, Robert Jenssen Affiliations: Department of Physics and Technology, UiT The Arctic University of Norway Abstract: Despite the significant improvements that representation learning via self-supervision has led to when learning from unlabeled data, no methods exist that explain what influences the learned representation. We address this need through our proposed approach, RELAX, which is the first approach for attribution-based explanations of representations. Our approach can also model the uncertainty in its explanations, which is essential to produce trustworthy explanations. RELAX explains representations by measuring similarities in the representation space between an input and masked-out versions of itself, providing intuitive explanations and significantly outperforming the gradient-based baseline. We provide theoretical interpretations of RELAX and conduct a novel analysis of feature extractors trained using supervised and unsupervised learning, providing insights into different learning strategies. Finally, we illustrate the usability of RELAX in multi-view clustering and highlight that incorporating uncertainty can be essential for providing low-complexity explanations, taking a crucial step towards explaining representations.

【31】 Edge differentially private estimation in the β-model via jittering and method of moments Link: https://arxiv.org/abs/2112.10151

Authors: Jinyuan Chang, Qiao Hu, Eric D. Kolaczyk, Qiwei Yao, Fengting Yi Affiliations: School of Statistics, Southwestern University of Finance and Economics, Chengdu, China; Department of Mathematics and Statistics, Boston University, Boston, MA, USA Abstract: A standing challenge in data privacy is the trade-off between the level of privacy and the efficiency of statistical inference. Here we conduct an in-depth study of this trade-off for parameter estimation in the $\beta$-model (Chatterjee, Diaconis and Sly, 2011) for edge differentially private network data released via jittering (Karwa, Krivitsky and Slavković, 2017). Unlike most previous approaches based on maximum likelihood estimation for this network model, we proceed via the method of moments. This choice facilitates our exploration of a substantially broader range of privacy levels -- corresponding to stricter privacy -- than has been explored to date. Over this new range we discover that our proposed estimator for the parameters exhibits an interesting phase transition, with both its convergence rate and asymptotic variance following one of three different regimes of behavior depending on the level of privacy. Because identification of the operable regime is difficult to impossible in practice, we devise a novel adaptive bootstrap procedure to construct uniform inference across different phases.
In fact, leveraging this bootstrap we are able to provide for simultaneous inference of all parameters in the $\beta$-model (i.e., equal to the number of vertices), which would appear to be the first result of its kind. Numerical experiments confirm the competitive and reliable finite sample performance of the proposed inference methods, next to a comparable maximum likelihood method, as well as significant advantages in terms of computational speed and memory.

【32】 A bivariate copula capturing the dependence of a random variable and a random vector, its estimation and applications Link: https://arxiv.org/abs/2112.10147

Authors: Sebastian Fuchs Affiliations: Universität Salzburg, Salzburg, Austria Remarks: 24 pages, 9 figures Abstract: We define a bivariate copula that captures the scale-invariant extent of dependence of a single random variable $Y$ on a set of potential explanatory random variables $X_1, \dots, X_d$. The copula itself contains the information whether $Y$ is completely dependent on $X_1, \dots, X_d$, and whether $Y$ and $X_1, \dots, X_d$ are independent. Evaluating this copula uniformly along the diagonal, i.e. calculating Spearman's footrule, leads to the so-called 'simple measure of conditional dependence' recently introduced by Azadkia and Chatterjee [1]. On the other hand, evaluating this copula uniformly over the unit square, i.e. calculating Spearman's rho, leads to a distribution-free coefficient of determination. Applying the techniques introduced in [1], we construct an estimate for this copula and show that this copula estimator is strongly consistent. Since, for $d=1$, the copula under consideration coincides with the well-known Markov product of copulas, as a by-product we also obtain a strongly consistent copula estimator for the Markov product. A simulation study illustrates the small-sample performance of the proposed estimator.
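To get a feel for the dependence measures this copula encodes, consider the single-covariate case: a closely related rank coefficient in the Azadkia-Chatterjee line of work is Chatterjee's rank correlation, which is near 0 under independence and near 1 when Y is a noiseless function of X. The sketch below illustrates that related coefficient only, not the copula estimator of the paper:

```python
import random

def chatterjee_xi(x, y):
    """Chatterjee's rank correlation (no-ties version).
    xi ~ 0 under independence; xi ~ 1 when y is a function of x."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    # r[i]: rank of the y-value paired with the i-th smallest x, among all y.
    r = [sum(yj <= y[idx] for yj in y) for idx in order]
    return 1.0 - 3.0 * sum(abs(r[i + 1] - r[i]) for i in range(n - 1)) / (n * n - 1)

rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(500)]
y_dep = [v * v for v in x]                            # y fully determined by x
y_indep = [rng.uniform(-1.0, 1.0) for _ in range(500)]
xi_dep = chatterjee_xi(x, y_dep)
xi_indep = chatterjee_xi(x, y_indep)
```

Note that y = x^2 is non-monotone, so classical rank correlations would be near zero here while this functional-dependence coefficient is near one.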

【33】 Information Field Theory as Artificial Intelligence Link: https://arxiv.org/abs/2112.10133

Authors: Torsten Enßlin Affiliations: Max Planck Institute for Astrophysics, Karl-Schwarzschild-Str., Garching, Germany; Ludwig-Maximilians-Universität München, Geschwister-Scholl-Platz, Munich, Germany Remarks: 8 pages, no figures, invited talk at MaxEnt2020/2021 Abstract: Information field theory (IFT), the information theory for fields, is a mathematical framework for signal reconstruction and non-parametric inverse problems. Here, fields denote physical quantities that change continuously as a function of space (and time), and information theory refers to Bayesian probabilistic logic equipped with the associated entropic information measures. Reconstructing a signal with IFT is a computational problem similar to training a generative neural network (GNN). In this paper, the inference in IFT is reformulated in terms of GNN training, and the cross-fertilization of numerical variational inference methods used in IFT and machine learning is discussed. The discussion suggests that IFT inference can be regarded as a specific form of artificial intelligence. In contrast to classical neural networks, IFT-based GNNs can operate without pre-training thanks to incorporating expert knowledge into their architecture.

【34】 Temporal and spectral governing dynamics of Australian hydrological streamflow time series Link: https://arxiv.org/abs/2112.10073

Authors: Nick James, Howard Bondell Affiliations: School of Mathematics and Statistics, University of Melbourne, Victoria, Australia Remarks: 24 pages Abstract: We use new and established methodologies in multivariate time series analysis to study the dynamics of 414 Australian hydrological stations' streamflow. First, we analyze our collection of time series in the temporal domain and compare the similarity of hydrological stations' candidate trajectories. Then, we introduce a Whittle-likelihood-based optimization framework to study the collective similarity in periodic phenomena among our collection of stations. Having identified noteworthy similarity in the temporal and spectral domains, we introduce an algorithmic procedure to estimate a governing hydrological streamflow process across Australia. To determine the stability of such behaviours over time, we then study the evolution of the governing dynamics and underlying time series with time-varying applications of principal components analysis (PCA) and spectral analysis.

【35】 Statistical Efficiency of Travel Time Prediction Link: https://arxiv.org/abs/2112.09993

Authors: Chiwei Yan, James Johndrow, Dawn Woodard Abstract: Modern mobile applications such as navigation services and ride-hailing platforms rely heavily on geospatial technologies, most critically predictions of the time required for a vehicle to traverse a particular route. Two major categories of prediction methods are segment-based approaches, which predict travel time at the level of road segments and then aggregate across the route, and route-based approaches, which use generic information about the trip, such as origin and destination, to predict travel time. Though various forms of these methods have been developed and used, there has been no rigorous theoretical comparison of the accuracy of these two approaches, and empirical studies have in many cases drawn opposite conclusions. We fill this gap by conducting the first theoretical analysis to compare these two approaches in terms of their predictive accuracy as a function of the sample size of the training data (the statistical efficiency). We introduce a modeling framework and formally define a family of segment-based estimators and route-based estimators that resemble many practical estimators proposed in the literature and used in practice. Under both finite-sample and asymptotic settings, we give conditions under which segment-based approaches dominate their route-based counterparts.
We find that although route-based approaches can avoid the accumulative errors introduced by aggregating over individual road segments, this advantage is often offset by (significantly) smaller relevant sample sizes. For this reason, we recommend the use of segment-based approaches if one has to make a choice between the two methods in practice. Our work highlights that the accuracy of travel time prediction is driven not just by the sophistication of the model, but also by the spatial granularity at which those methods are applied.
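The sample-size trade-off behind this conclusion can be illustrated with a toy simulation: a segment-based estimator pools every trip that touches a segment, while a route-based estimator only uses the (far fewer) trips matching the full target route. The segment count, trip-generation mechanism, and noise levels below are all illustrative assumptions, not the paper's model:

```python
import random

rng = random.Random(0)
n_segments = 6
true_seg_time = [2.0, 3.0, 1.5, 4.0, 2.5, 3.5]   # hypothetical mean segment times
true_total = sum(true_seg_time)

# Generate trips over contiguous sub-routes; guarantee some full-route trips.
trips = []
for i in range(500):
    if i < 20:
        segs = list(range(n_segments))            # full target route
    else:
        start = rng.randrange(n_segments)
        end = rng.randrange(start, n_segments)
        segs = list(range(start, end + 1))
    times = [rng.gauss(true_seg_time[s], 0.5) for s in segs]
    trips.append((segs, times))

target = list(range(n_segments))

# Segment-based: average each segment over ALL trips covering it, then sum.
seg_sums, seg_counts = [0.0] * n_segments, [0] * n_segments
for segs, times in trips:
    for s, t in zip(segs, times):
        seg_sums[s] += t
        seg_counts[s] += 1
segment_estimate = sum(seg_sums[s] / seg_counts[s] for s in target)

# Route-based: average total time over only the trips matching the full route.
route_totals = [sum(times) for segs, times in trips if segs == target]
route_estimate = sum(route_totals) / len(route_totals)
```

The segment-based estimate effectively uses hundreds of trips per segment, whereas the route-based one sees only the handful of exact-route trips, which is the "smaller relevant sample size" effect described above.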

【36】 Off-Policy Evaluation Using Information Borrowing and Context-Based Switching Link: https://arxiv.org/abs/2112.09865

Authors: Sutanoy Dasgupta, Yabo Niu, Kishan Panaganti, Dileep Kalathil, Debdeep Pati, Bani Mallick Affiliations: Department of Statistics, Texas A&M University; Department of Electrical and Computer Engineering Remarks: 23 pages, 6 figures, manuscript under review Abstract: We consider the off-policy evaluation (OPE) problem in contextual bandits, where the goal is to estimate the value of a target policy using the data collected by a logging policy. The most popular approaches to OPE are variants of the doubly robust (DR) estimator obtained by combining a direct method (DM) estimator and a correction term involving the inverse propensity score (IPS). Existing algorithms primarily focus on strategies to reduce the variance of the DR estimator arising from large IPS. We propose a new approach called the Doubly Robust with Information borrowing and Context-based switching (DR-IC) estimator that focuses on reducing both bias and variance. The DR-IC estimator replaces the standard DM estimator with a parametric reward model that borrows information from 'closer' contexts through a correlation structure that depends on the IPS. The DR-IC estimator also adaptively interpolates between this modified DM estimator and a modified DR estimator based on a context-specific switching rule. We give provable guarantees on the performance of the DR-IC estimator. We also demonstrate the superior performance of the DR-IC estimator compared to state-of-the-art OPE algorithms on a number of benchmark problems.
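The baseline DR estimator that DR-IC builds on is easy to sketch: it adds an IPS-weighted correction to a reward-model (DM) estimate, so a biased reward model still yields an unbiased value estimate when the propensities are correct. The bandit below is a made-up one-feature example (this sketches the generic DR estimator, not the paper's DR-IC):

```python
import random

rng = random.Random(0)

# Two-action contextual bandit with scalar context x ~ U(0, 1).
# True mean rewards (hypothetical, chosen for illustration only).
def true_mean(x, a):
    return 1.0 + x if a == 0 else 2.0 * x

# Logged data: logging policy picks action 0 with probability 0.7.
n = 5000
logs = []
for _ in range(n):
    x = rng.uniform(0.0, 1.0)
    a = 0 if rng.random() < 0.7 else 1
    r = true_mean(x, a) + rng.gauss(0.0, 0.1)
    logs.append((x, a, r))

def target_policy(x):
    return 1 if x > 0.5 else 0        # deterministic target policy to evaluate

# Direct-method reward model, deliberately biased by +0.3 so the
# IPS correction term of the DR estimator has something to fix.
def q_hat(x, a):
    return true_mean(x, a) + 0.3

def dm_estimate(logs):
    return sum(q_hat(x, target_policy(x)) for x, _, _ in logs) / len(logs)

def dr_estimate(logs):
    total = 0.0
    for x, a, r in logs:
        mu = 0.7 if a == 0 else 0.3                   # logging propensity
        pi = 1.0 if target_policy(x) == a else 0.0    # target propensity
        total += q_hat(x, target_policy(x)) + (pi / mu) * (r - q_hat(x, a))
    return total / len(logs)

v_dm, v_dr = dm_estimate(logs), dr_estimate(logs)
v_true = 0.5 * 1.25 + 0.5 * 1.5      # exact target-policy value = 1.375
```

Here the DM estimate inherits the +0.3 model bias while the DR estimate does not; the paper's DR-IC replaces this DM component with an information-borrowing reward model and switches between estimators per context.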

【37】 High-Dimensional Knockoffs Inference for Time Series Data 标题:时间序列数据的高维仿冒推理 链接:https://arxiv.org/abs/2112.09851

作者:Chien-Ming Chi,Yingying Fan,Ching-Kang Ing,Jinchi Lv 机构:University of Southern California, and National Tsing Hua University 备注:76 pages, 2 figures 摘要:model-X仿冒(knockoffs)框架为变量选择中的精确有限样本错误发现率(FDR)控制提供了灵活的工具。它还完全绕过了传统p值的使用,使其在高维非线性模型中特别具有吸引力。现有的工作主要集中在独立同分布观测值的设定上。然而,时间序列数据在实际应用中非常普遍。这推动了对时间序列数据的model-X仿冒推理的研究。本文为建立时间序列数据model-X仿冒推理的理论和方法基础做出了一些初步尝试。我们提出了时间序列仿冒推理(TSKI)方法,利用子抽样的思想来缓解序列依赖性带来的困难。我们建立了充分条件,在此条件下,原始的model-X仿冒推理与子采样相结合仍能实现渐近FDR控制。我们的技术分析揭示了序列依赖性对FDR控制的确切影响。为了缓解因子采样导致样本量减少而造成的功效损失这一实际问题,我们利用了带副本的仿冒和多重仿冒的思想。在相当一般的时间序列模型设置下,我们证明了FDR仍然是渐近受控的。为了从理论上证明TSKI的功效,我们进一步提出了新的仿冒统计量,即反向消除排序(BE)统计量,并表明它在线性时间序列模型设置中既具有确定筛选性质,又具有受控的FDR。通过几个模拟实例和一个经济通货膨胀预测应用,说明了建议的TSKI方法与BE耦合的理论结果和诱人的有限样本性能。 摘要:The framework of model-X knockoffs provides a flexible tool for exact finite-sample false discovery rate (FDR) control in variable selection. It also completely bypasses the use of conventional p-values, making it especially appealing in high-dimensional nonlinear models. Existing works have focused on the setting of independent and identically distributed observations. Yet time series data is prevalent in practical applications. This motivates the study of model-X knockoffs inference for time series data. In this paper, we make some initial attempt to establish the theoretical and methodological foundation for the model-X knockoffs inference for time series data. We suggest the method of time series knockoffs inference (TSKI) by exploiting the idea of subsampling to alleviate the difficulty caused by the serial dependence. We establish sufficient conditions under which the original model-X knockoffs inference combined with subsampling still achieves the asymptotic FDR control. Our technical analysis reveals the exact effect of serial dependence on the FDR control. To alleviate the practical concern on the power loss because of reduced sample size caused by subsampling, we exploit the idea of knockoffs with copies and multiple knockoffs.
Under fairly general time series model settings, we show that the FDR remains to be controlled asymptotically. To theoretically justify the power of TSKI, we further suggest the new knockoff statistic, the backward elimination ranking (BE) statistic, and show that it enjoys both the sure screening property and controlled FDR in the linear time series model setting. The theoretical results and appealing finite-sample performance of the suggested TSKI method coupled with the BE are illustrated with several simulation examples and an economic inflation forecasting application.

【38】 Improving upon the effective sample size based on Godambe information for block likelihood inference 标题:改进基于Godambe信息的分块似然推断的有效样本量 链接:https://arxiv.org/abs/2112.09840

作者:Rahul Mukerjee 机构:Indian Institute of Management Calcutta, Joka, Diamond Harbour Road, Kolkata , India 摘要:我们考虑基于Godambe信息的分块似然推断的有效样本量;对于大型相关数据集,分块似然推断是全似然推断的一种有吸引力且计算上可行的替代方法。参考具有恒定均值的高斯随机场,我们探索块的选择如何影响该有效样本量。可以看出,将每个块内的空间点分散开来,而不是将它们保持在一起,可以在保持计算简单性的同时获得可观的收益。在AR(1)模型下得到了这个方向的分析结果。这样发现的洞见有助于研究其他模型,包括平面上的相关模型,其中封闭形式的表达式是难以处理的。 摘要:We consider the effective sample size, based on Godambe information, for block likelihood inference which is an attractive and computationally feasible alternative to full likelihood inference for large correlated datasets. With reference to a Gaussian random field having a constant mean, we explore how the choice of blocks impacts this effective sample size. It is seen that spreading out the spatial points within each block, instead of keeping them close together, can lead to considerable gains while retaining computational simplicity. Analytical results in this direction are obtained under the AR(1) model. The insights so found facilitate the study of other models, including correlation models on a plane, where closed form expressions are intractable.
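For intuition about how serial correlation shrinks the information in a sample, here is the classical effective sample size for mean estimation under an AR(1) model (a standard textbook quantity, illustrative only; it is not the Godambe-information-based measure studied in the paper):

```python
import numpy as np

def ess_ar1(n, rho):
    """Effective sample size for the sample mean of n observations from a
    stationary AR(1) process with lag-one correlation rho.  Since
    Var(mean) = (sigma^2 / n) * c_n with
    c_n = 1 + 2 * sum_{k=1}^{n-1} (1 - k/n) * rho**k,
    the ESS is n / c_n."""
    k = np.arange(1, n)
    c_n = 1.0 + 2.0 * np.sum((1.0 - k / n) * rho**k)
    return n / c_n
```

For rho = 0 every observation counts fully, while strong positive correlation sharply reduces the information content, approaching n*(1-rho)/(1+rho) for large n — e.g., 1000 observations with rho = 0.9 carry the information of only about 53 independent ones.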

【39】 Multimeasurement Generative Models 标题:多测量生成模型 链接:https://arxiv.org/abs/2112.09822

作者:Saeed Saremi,Rupesh Kumar Srivastava 机构:NNAISENSE Inc., Redwood Center, UC Berkeley 摘要:我们正式地将从$\mathbb{R}^d$中密度为$p_X$的未知分布采样的问题,映射为学习并采样$\mathbb{R}^{Md}$中的$p_\mathbf{Y}$的问题,后者由$p_X$与一个固定的阶乘核做卷积得到:$p_\mathbf{Y}$称为M-密度,该阶乘核称为多测量噪声模型(MNM)。M-密度比$p_X$更平滑,更容易学习和采样,但是对于大的$M$,这两个问题在数学上是等价的,因为给定$\mathbf{Y}=\mathbf{y}$时,$X$可以用Bayes估计器$\widehat{x}(\mathbf{y})=\mathbb{E}[X\vert\mathbf{Y}=\mathbf{y}]$精确估计。为了表述该问题,我们针对泊松和高斯MNM推导了$\widehat{x}(\mathbf{y})$,其封闭形式用未归一化的$p_\mathbf{Y}$表示。这导出了学习参数化能量函数和分数函数的简单最小二乘目标。我们提出了各种感兴趣的参数化方案,包括直接研究高斯M-密度从而导出多重去噪自动编码器的方案——这是文献中去噪自动编码器与经验贝叶斯之间的第一个理论联系。来自$p_X$的样本通过步行-跳跃采样(Saremi & Hyvarinen, 2019)获得:先用欠阻尼Langevin MCMC从$p_\mathbf{Y}$采样(walk),再通过$X$的多测量Bayes估计(jump)。我们研究了MNIST、CIFAR-10和FFHQ-256数据集上的置换不变高斯M-密度,并证明了该框架在实现高维快速混合稳定马尔可夫链方面的有效性。 摘要:We formally map the problem of sampling from an unknown distribution with density $p_X$ in $\mathbb{R}^d$ to the problem of learning and sampling $p_\mathbf{Y}$ in $\mathbb{R}^{Md}$ obtained by convolving $p_X$ with a fixed factorial kernel: $p_\mathbf{Y}$ is referred to as M-density and the factorial kernel as multimeasurement noise model (MNM). The M-density is smoother than $p_X$, easier to learn and sample from, yet for large $M$ the two problems are mathematically equivalent since $X$ can be estimated exactly given $\mathbf{Y}=\mathbf{y}$ using the Bayes estimator $\widehat{x}(\mathbf{y})=\mathbb{E}[X\vert\mathbf{Y}=\mathbf{y}]$. To formulate the problem, we derive $\widehat{x}(\mathbf{y})$ for Poisson and Gaussian MNMs expressed in closed form in terms of unnormalized $p_\mathbf{Y}$. This leads to a simple least-squares objective for learning parametric energy and score functions. We present various parametrization schemes of interest, including one in which studying Gaussian M-densities directly leads to multidenoising autoencoders--this is the first theoretical connection made between denoising autoencoders and empirical Bayes in the literature.
Samples from $p_X$ are obtained by walk-jump sampling (Saremi & Hyvarinen, 2019) via underdamped Langevin MCMC (walk) to sample from $p_\mathbf{Y}$ and the multimeasurement Bayes estimation of $X$ (jump). We study permutation invariant Gaussian M-densities on MNIST, CIFAR-10, and FFHQ-256 datasets, and demonstrate the effectiveness of this framework for realizing fast-mixing stable Markov chains in high dimensions.

【40】 A Bayesian hierarchical small-area population model accounting for data source specific methodologies from American Community Survey, Population Estimates Program, and Decennial Census data 标题:从美国社区调查、人口估计计划和十年人口普查数据中考虑特定数据源方法的贝叶斯分层小区域人口模型 链接:https://arxiv.org/abs/2112.09813

作者:Emily N Peterson,Rachel C Nethery,Tullia Padellini,Jarvis T Chen,Brent A Coull,Frederic B Piel,Jon Wakefield,Marta Blangiardo,Lance A Waller 机构:Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta GA USA, Department of Biostatistics, Harvard TH Chan School of Public Health, Boston MA USA 摘要:许多流行病学研究都需要对人口进行小面积估计,但其质量和准确性往往得不到评估。在美国,美国人口普查局(USCB)以十年一次的人口普查计数、普查间人口预测(PEP)和美国社区调查(ACS)估计的形式发布了人口计数的小面积估计。虽然这些数据源之间存在着重要的关系,但在数据收集和处理方法上存在着重要的对比,因此每组估计值可能会受到不同来源和误差大小的影响。此外,由于每个数据源的调查后调整,这些数据源不报告相同的小面积人口计数。根据用于人口计数的数据源(分母数据),得出的小面积疾病/死亡率可能有所不同。为了准确捕获年度小面积人口数量和相关的不确定性,我们提出了一个贝叶斯人口模型(B-Pop),该模型融合了所有三个USCB来源的信息,考虑了数据源特定的方法和相关错误。我们框架的主要特点是:1)集成多个数据源的单一模型,2)考虑特定于数据源的数据生成机制,并具体考虑特定于数据源的错误,以及3)预测没有USCB报告数据的年份的估计。我们的研究重点是佐治亚州的159个县,并对2005-2021年进行了估算。 摘要:Small area estimates of population are necessary for many epidemiological studies, yet their quality and accuracy are often not assessed. In the United States, small area estimates of population counts are published by the United States Census Bureau (USCB) in the form of the Decennial census counts, Intercensal population projections (PEP), and American Community Survey (ACS) estimates. Although there are significant relationships between these data sources, there are important contrasts in data collection and processing methodologies, such that each set of estimates may be subject to different sources and magnitudes of error. Additionally, these data sources do not report identical small area population counts due to post-survey adjustments specific to each data source. Resulting small area disease/mortality rates may differ depending on which data source is used for population counts (denominator data). To accurately capture annual small area population counts, and associated uncertainties, we present a Bayesian population model (B-Pop), which fuses information from all three USCB sources, accounting for data source specific methodologies and associated errors.
The main features of our framework are: 1) a single model integrating multiple data sources, 2) accounting for data source specific data generating mechanisms, and specifically accounting for data source specific errors, and 3) prediction of estimates for years without USCB reported data. We focus our study on the 159 counties of Georgia, and produce estimates for years 2005-2021.

【41】 Nested Bayesian Optimization for Computer Experiments 标题:计算机实验的嵌套贝叶斯优化 链接:https://arxiv.org/abs/2112.09797

作者:Yan Wang,Meng Wang,Areej AlBahar,Xiaowei Yue 机构: Wang are with the School of Statistics and Data Science, Beijing University of Technology 备注:12 pages, 13 figures 摘要:计算机实验可以模拟物理系统,帮助进行计算研究,并得出解析解。它们已被广泛应用于许多工程应用中(例如,航空航天、汽车、能源系统)。传统的贝叶斯优化在计算机实验中没有包含嵌套结构。本文针对具有多步骤或分层特征的复杂计算机实验提出了一种新的嵌套贝叶斯优化。我们证明了两种情况(高斯或非高斯)下嵌套输出的理论性质。导出了嵌套期望改进的封闭形式。我们还提出了嵌套贝叶斯优化的计算算法。三个数值研究表明,所提出的嵌套贝叶斯优化方法优于忽略内部计算机代码中间输出的五种基准贝叶斯优化方法。算例分析表明,嵌套贝叶斯优化能有效地减小复合材料结构装配过程中的残余应力,避免收敛到局部最优。 摘要:Computer experiments can emulate the physical systems, help computational investigations, and yield analytic solutions. They have been widely employed with many engineering applications (e.g., aerospace, automotive, energy systems). Conventional Bayesian optimization did not incorporate the nested structures in computer experiments. This paper proposes a novel nested Bayesian optimization for complex computer experiments with multi-step or hierarchical characteristics. We prove the theoretical properties of nested outputs given two cases: Gaussian or non-Gaussian. The closed forms of nested expected improvement are derived. We also propose the computational algorithms for nested Bayesian optimization. Three numerical studies show that the proposed nested Bayesian optimization outperforms the five benchmark Bayesian optimization methods ignoring the intermediate outputs of the inner computer code. The case study shows that the nested Bayesian optimization can efficiently minimize the residual stress during composite structures assembly and avoid convergence to the local optimum.

【42】 Functional Linear Regression for Partially Observed Functional Data 标题:部分观测函数数据的泛函线性回归 链接:https://arxiv.org/abs/2112.09784

作者:Yafei Wang,Tingyu Lai,Bei Jiang,Linglong Kong,Zhongzhan Zhang 机构:Beijing University of Technology, Beijing, China, University of Alberta, Edmonton, T,G ,G, Canada. 摘要:在函数线性回归模型中,当函数型预测变量在整个定义域上可观测时,人们提出并研究了许多估计斜率函数的方法。然而,关于具有部分观测轨迹的函数线性回归模型的研究较少受到关注。在本文中,为了填补这一文献空白,我们考虑单个函数型预测变量可能仅在定义域的一部分上被观测的情形。根据函数预测器中是否存在测量误差,开发了两种方法,一种是基于轨迹观测部分的线性泛函,另一种是使用条件主成分分数。我们建立了这两种方法的渐近性质。通过有限样本模拟验证了其性能。分析了来自阿尔茨海默病神经成像倡议(ADNI)研究的扩散张量成像(DTI)数据。 摘要:In the functional linear regression model, many methods have been proposed and studied to estimate the slope function while the functional predictor was observed in the entire domain. However, works on functional linear regression models with partially observed trajectories have received less attention. In this paper, to fill the literature gap we consider the scenario where individual functional predictor may be observed only on part of the domain. Depending on whether measurement error is presented in functional predictors, two methods are developed, one is based on linear functionals of the observed part of the trajectory and the other one uses conditional principal component scores. We establish the asymptotic properties of the two proposed methods. Finite sample simulations are conducted to verify their performance. Diffusion tensor imaging (DTI) data from Alzheimer's Disease Neuroimaging Initiative (ADNI) study is analyzed.

【43】 Probabilistic Inverse Optimal Transport 标题:概率逆最优运输 链接:https://arxiv.org/abs/2112.09754

作者:Wei-Ting Chiu,Pei Wang,Patrick Shafto 机构:Department of Mathematics and Computer Science, Rutgers University Newark, NJ , School of Mathematics, Institute for Advanced Study (IAS), Princeton NJ 备注:18 pages, 9 figures 摘要:最优运输(OT)形式化了在给定成本矩阵的概率测度之间寻找最优耦合的问题。给定耦合来推断成本的逆问题是逆最优运输(IOT)。IOT不如OT那样被充分理解。我们使用熵正则化OT研究中的工具对IOT的性质进行形式化和系统化分析。理论贡献包括交叉比等效成本流形的表征、模型先验的含义以及MCMC采样器的推导。经验贡献包括在基本示例上可视化交叉比等效效应和验证理论结果的模拟。 摘要:Optimal transport (OT) formalizes the problem of finding an optimal coupling between probability measures given a cost matrix. The inverse problem of inferring the cost given a coupling is Inverse Optimal Transport (IOT). IOT is less well understood than OT. We formalize and systematically analyze the properties of IOT using tools from the study of entropy-regularized OT. Theoretical contributions include characterization of the manifold of cross-ratio equivalent costs, the implications of model priors, and derivation of an MCMC sampler. Empirical contributions include visualizations of cross-ratio equivalent effect on basic examples and simulations validating theoretical results.
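The forward problem that IOT inverts — from a cost matrix to an entropy-regularized optimal coupling — can be sketched with a minimal Sinkhorn iteration (an illustrative sketch of the forward map only, not the paper's MCMC sampler for the inverse problem; the toy cost matrix is an assumption):

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.05, n_iter=1000):
    """Entropy-regularized OT via Sinkhorn matrix scaling: returns the
    coupling P with marginals a and b that minimizes <P, C> plus an
    eps-weighted entropy term."""
    K = np.exp(-C / eps)            # Gibbs kernel of the cost
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)           # rescale to match column marginals
        u = a / (K @ v)             # rescale to match row marginals
    return u[:, None] * K * v[None, :]

a = np.array([0.5, 0.5])
b = np.array([0.5, 0.5])
C = np.array([[0.0, 1.0],
              [1.0, 0.0]])         # cheap to stay in place, expensive to swap
P = sinkhorn(a, b, C)
```

With a small regularization the coupling concentrates on the cheap diagonal; IOT asks the reverse question of which costs C are consistent with an observed P.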

【44】 Supervised Multivariate Learning with Simultaneous Feature Auto-grouping and Dimension Reduction 标题:同时进行特征自动分组和降维的有监督多元学习 链接:https://arxiv.org/abs/2112.09746

作者:Yiyuan She,Jiahui Shen,Chao Zhang 机构:Department of Statistics, Florida State University, Tallahassee, USA., Center for Information Science, Peking University, Beijing, China. 摘要:现代高维方法通常采用“赌稀疏”原则,而在有监督的多元学习中,统计学家可能面临大量非零系数的“密集”问题。本文提出了一种新的聚类降秩学习(CRL)框架,该框架采用两种联合矩阵正则化来自动分组特征,以构造预测因子。CRL比低秩模型更具解释性,并且在变量选择中放松了严格的稀疏性假设。本文提出了新的信息理论极限,揭示了寻求聚类的内在代价,以及多元学习中维度带来的好处。此外,还提出了一种高效的优化算法,该算法在保证收敛的前提下进行子空间学习和聚类。所得到的不动点估计虽然不一定是全局最优的,但在某些正则条件下,其统计精度超过了标准似然设置。此外,提出了一种新的信息准则及其无标度形式,用于聚类和秩选择,在不假设无限样本量的情况下,具有严格的理论支持。大量的仿真和实际数据实验证明了该方法的统计精度和可解释性。 摘要:Modern high-dimensional methods often adopt the ``bet on sparsity'' principle, while in supervised multivariate learning statisticians may face ``dense'' problems with a large number of nonzero coefficients. This paper proposes a novel clustered reduced-rank learning (CRL) framework that imposes two joint matrix regularizations to automatically group the features in constructing predictive factors. CRL is more interpretable than low-rank modeling and relaxes the stringent sparsity assumption in variable selection. In this paper, new information-theoretical limits are presented to reveal the intrinsic cost of seeking for clusters, as well as the blessing from dimensionality in multivariate learning. Moreover, an efficient optimization algorithm is developed, which performs subspace learning and clustering with guaranteed convergence. The obtained fixed-point estimators, though not necessarily globally optimal, enjoy the desired statistical accuracy beyond the standard likelihood setup under some regularity conditions. Moreover, a new kind of information criterion, as well as its scale-free form, is proposed for cluster and rank selection, and has a rigorous theoretical support without assuming an infinite sample size. Extensive simulations and real-data experiments demonstrate the statistical accuracy and interpretability of the proposed method.

【45】 Consistency and Rate of Convergence of Switched Least Squares System Identification for Autonomous Switched Linear Systems 标题:自治切换线性系统切换最小二乘系统辨识的一致性和收敛速度 链接:https://arxiv.org/abs/2112.10753

作者:Borna Sayedana,Mohammad Afshari,Peter E. Caines,Aditya Mahajan 机构:Electrical and Computer Engineering, McGill University, Canada, Computer Science, University of Alberta, Canada 摘要:本文研究具有完全状态观测的自治切换线性系统的系统辨识问题。我们提出了用于切换线性系统辨识的切换最小二乘法,证明了该方法的强一致性,并推导了与数据相关和与数据无关的收敛速度。特别是,我们的数据相关收敛速度表明,几乎可以肯定,系统识别错误是$\mathcal{O}\big(\sqrt{\log(T)/T}\big)$,其中,$T$是时间范围。这些结果表明,对于切换线性系统,我们的方法与对于非切换线性系统的最小二乘法具有相同的收敛速度。我们将我们的结果与文献中的结果进行比较。我们给出了数值例子来说明所提出的系统辨识方法的性能。 摘要:In this paper, we investigate the problem of system identification for autonomous switched linear systems with complete state observations. We propose switched least squares method for the identification for switched linear systems, show that this method is strongly consistent, and derive data-dependent and data-independent rates of convergence. In particular, our data-dependent rate of convergence shows that, almost surely, the system identification error is $\mathcal{O}\big(\sqrt{\log(T)/T} \big)$ where $T$ is the time horizon. These results show that our method for switched linear systems has the same rate of convergence as least squares method for non-switched linear systems. We compare our results with those in the literature. We present numerical examples to illustrate the performance of the proposed system identification method.
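The switched least squares idea above can be sketched for a toy autonomous switched linear system with observed modes (an illustrative simulation under assumed dynamics and noise levels, not the paper's analysis): regress the next state on the current state separately for each active mode.

```python
import numpy as np

rng = np.random.default_rng(1)
A = [np.array([[0.8, 0.1], [0.0, 0.7]]),     # mode-0 dynamics (stable)
     np.array([[0.5, -0.2], [0.3, 0.6]])]    # mode-1 dynamics (stable)

# Simulate x_{t+1} = A_{m_t} x_t + noise with an i.i.d. switching signal.
T = 5000
x = np.zeros((T + 1, 2))
x[0] = [1.0, -1.0]
modes = rng.integers(0, 2, size=T)
for t in range(T):
    x[t + 1] = A[modes[t]] @ x[t] + 0.01 * rng.normal(size=2)

# Switched least squares: one ordinary least squares fit per observed mode.
A_hat = []
for m in range(2):
    idx = np.where(modes == m)[0]
    theta, *_ = np.linalg.lstsq(x[idx], x[idx + 1], rcond=None)
    A_hat.append(theta.T)                    # theta solves x_t^T theta = x_{t+1}^T
```

As the abstract's $\mathcal{O}\big(\sqrt{\log(T)/T}\big)$ rate suggests, the per-mode estimates converge to the true system matrices as the horizon grows.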

【46】 RvS: What is Essential for Offline RL via Supervised Learning? 标题:RVS:通过监督学习实现离线RL的关键是什么? 链接:https://arxiv.org/abs/2112.10751

作者:Scott Emmons,Benjamin Eysenbach,Ilya Kostrikov,Sergey Levine 机构:UC Berkeley, Carnegie Mellon University 摘要:最近的研究表明,在没有时间差(TD)学习的情况下,单独的监督学习对于离线RL是非常有效的。这在什么时候成立,哪些算法组件是必需的?通过大量实验,我们将用于离线RL的监督学习归结为其基本要素。在我们考虑的每个环境套件中,简单地用两层前馈MLP最大化似然,就能与基于TD学习或基于Transformer序列建模的复杂得多的方法的最新结果相竞争。仔细选择模型容量(例如,通过正则化或架构)以及选择以哪些信息为条件(例如,目标或奖励)对性能至关重要。这些见解可作为通过监督学习进行强化学习(我们称之为“RvS学习”)的实践者的现场指南。它们还探索了现有RvS方法的局限性,这些方法在随机数据上相对较弱,并提出了一些有待解决的问题。 摘要:Recent work has shown that supervised learning alone, without temporal difference (TD) learning, can be remarkably effective for offline RL. When does this hold true, and which algorithmic components are necessary? Through extensive experiments, we boil supervised learning for offline RL down to its essential elements. In every environment suite we consider, simply maximizing likelihood with a two-layer feedforward MLP is competitive with state-of-the-art results of substantially more complex methods based on TD learning or sequence modeling with Transformers. Carefully choosing model capacity (e.g., via regularization or architecture) and choosing which information to condition on (e.g., goals or rewards) are critical for performance. These insights serve as a field guide for practitioners doing Reinforcement Learning via Supervised Learning (which we coin "RvS learning"). They also probe the limits of existing RvS methods, which are comparatively weak on random data, and suggest a number of open problems.

【47】 Measuring Segregation via Analysis on Graphs 标题:通过图上的分析测量隔离 链接:https://arxiv.org/abs/2112.10708

作者:Moon Duchin,James M. Murphy,Thomas Weighill 机构:Tufts University, Department of Mathematics 备注:25 pages, 10 figures, 1 table 摘要:在本文中,我们使用图上的分析来研究隔离的定量度量。我们关注的是地理学和城市社会学文献中的一个经典统计量,即莫兰I分数(Moran's I)。我们的结果刻画了I的极值行为,说明了底层图几何和度分布在解释该分数中的重要作用。这为用户在不同地区之间用I进行比较的有用性敲响了警钟。我们提出了I的一种新的随机游走解释,通过扩散将测量到的隔离水平与方差缩减率联系起来。对I可解释性的关注促使我们提出图上的H^1-范数作为隔离的替代度量,使其能够与网络社区检测文献以及高频和低频图傅里叶模式相联系。我们的方法概述了一个研究地理隔离的新计划,该计划由图上的时频分析驱动。我们用风格化的合成示例和从真实地理和人口数据中得出的图来说明我们的理论结果。 摘要:In this paper, we use analysis on graphs to study quantitative measures of segregation. We focus on a classical statistic from the geography and urban sociology literature known as Moran's I score. Our results characterizing the extremal behavior of I illustrate the important role of the underlying graph geometry and degree distribution in interpreting the score. This leads to caveats for users about the usefulness of I for making comparisons across different localities. We present a novel random walk interpretation of I, connecting the measured level of segregation to the rate of variance reduction via diffusion. Concerns about interpretability of I lead us to propose an H^1-norm on graphs as an alternative measure of segregation, enabling connections with the literature on community detection in networks and high- and low-frequency graph Fourier modes. Our methods outline a new program for the study of geographic segregation that is motivated by time-frequency analysis on graphs. We offer illustrations of our theoretical results with a mix of stylized synthetic examples and graphs derived from real geographic and demographic data.
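The Moran's I statistic discussed above can be computed directly from a graph's weight matrix; a small sketch using its standard definition on a toy 6-cycle (rather than real geographic data):

```python
import numpy as np

def morans_i(W, x):
    """Moran's I of values x on a graph with symmetric weight matrix W:
    I = (n / sum(W)) * (z^T W z) / (z^T z), with z = x - mean(x)."""
    z = x - x.mean()
    return (len(x) / W.sum()) * (z @ W @ z) / (z @ z)

# Weight matrix of a 6-cycle: node i is adjacent to i-1 and i+1 (mod 6).
n = 6
W = np.zeros((n, n))
for i in range(n):
    W[i, (i + 1) % n] = W[(i + 1) % n, i] = 1.0

i_seg = morans_i(W, np.array([1., 1., 1., -1., -1., -1.]))  # two homogeneous blocs
i_alt = morans_i(W, np.array([1., -1., 1., -1., 1., -1.]))  # fully interleaved
```

Positive I indicates that neighbouring vertices carry similar values (segregated blocs); the fully alternating pattern attains strong negative spatial autocorrelation — illustrating the abstract's point that the graph geometry shapes the attainable range of the score.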

【48】 Mathematical modelling, selection and hierarchical inference to determine the minimal dose in IFN$\alpha$ therapy against Myeloproliferative Neoplasms 标题:确定干扰素α治疗骨髓增殖性肿瘤最小剂量的数学建模、选择和层次推断 链接:https://arxiv.org/abs/2112.10688

作者:Gurvan Hermange,William Vainchenker,Isabelle Plo,Paul-Henry Cournède 机构:Université Paris-Saclay, CentraleSupélec, Laboratory of Mathematics and Informatics (MICS), Gif-sur-Yvette, France., INSERM U, (INSERM, Gustave Roussy, Université Paris-Saclay), Villejuif, France, Gustave Roussy, Villejuif, France 备注:18 pages and 9 figures for the article, 20 additional pages for the Appendix 摘要:骨髓增生性肿瘤(MPN)是在获得造血干细胞的驱动突变后出现的血癌。这些血液系统恶性肿瘤会导致成熟血细胞的过度产生,如果不治疗,会诱发心血管事件和血栓形成的风险。聚乙二醇化IFN$\alpha$通常用于治疗MPN,但对于患者的处方剂量尚无明确的指南。我们应用模型选择程序,并运行分层贝叶斯推理方法来解释剂量变化如何影响对治疗的反应。我们推断IFN$\alpha$通过诱导突变干细胞分化为祖细胞而作用于突变干细胞,剂量越高,作用越大。我们发现,当达到足够的(患者依赖性)剂量时,治疗可以诱导长期缓解。我们在一组患者中确定了个体的最小剂量,并估计了给新患者以增加治愈机会的最合适起始剂量。 摘要:Myeloproliferative Neoplasms (MPN) are blood cancers that appear after acquiring a driver mutation in a hematopoietic stem cell. These hematological malignancies result in the overproduction of mature blood cells and, if not treated, induce a risk of cardiovascular events and thrombosis. Pegylated IFN$\alpha$ is commonly used to treat MPN, but no clear guidelines exist concerning the dose prescribed to patients. We applied a model selection procedure and ran a hierarchical Bayesian inference method to decipher how dose variations impact the response to the therapy. We inferred that IFN$\alpha$ acts on mutated stem cells by inducing their differentiation into progenitor cells, the higher the dose, the higher the effect. We found that when a sufficient (patient-dependent) dose is reached, the treatment can induce a long-term remission. We determined this minimal dose for individuals in a cohort of patients and estimated the most suitable starting dose to give to a new patient to increase the chances of being cured.

【49】 Regularity based spectral clustering and mapping the Fiedler-carpet 标题:基于正则性的菲德勒地毯谱聚类与映射 链接:https://arxiv.org/abs/2112.10637

作者:Marianna Bolla,Vilas Winstein,Tao You,Frank Seidl,Fatma Abdelkhalek 摘要:通过将谱聚类扩展到矩形阵列和差异最小化,从多个角度对谱聚类进行了讨论。通过奇异值分解和加权$k$均值算法得到了近似最优聚类。对于矩形阵列,这意味着通过聚类增强对应分析的方法,对于边加权图,则是基于规范化拉普拉斯的聚类。在后一种情况下,证明了当顶点代表的簇数为$2^{k-1}$时,归一化拉普拉斯矩阵的$(k-1)$th和$k$th最小正特征值之间的谱差导致簇内方差的突然减小,但仅第一个$k-1$特征向量,构成所谓的费德勒地毯,用于表示。讨论了有向迁移图的应用。 摘要:Spectral clustering is discussed from many perspectives, by extending it to rectangular arrays and discrepancy minimization too. Near optimal clusters are obtained with singular value decomposition and with the weighted $k$-means algorithm. In case of rectangular arrays, this means enhancing the method of correspondence analysis with clustering, and in case of edge-weighted graphs, a normalized Laplacian based clustering. In the latter case it is proved that a spectral gap between the $(k-1)$th and $k$th smallest positive eigenvalues of the normalized Laplacian matrix gives rise to a sudden decrease of the inner cluster variances when the number of clusters of the vertex representatives is $2^{k-1}$, but only the first $k-1$ eigenvectors, constituting the so-called Fiedler-carpet, are used in the representation. Application to directed migration graphs is also discussed.
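A minimal sketch of normalized-Laplacian spectral clustering with the Fiedler vector, on a toy graph of two cliques joined by one weak edge (illustrative only; the paper's representation uses several eigenvectors — the "Fiedler carpet" — and weighted $k$-means rather than a sign cut):

```python
import numpy as np

def normalized_laplacian(W):
    """Symmetric normalized Laplacian I - D^{-1/2} W D^{-1/2}."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.eye(len(d)) - D_inv_sqrt @ W @ D_inv_sqrt

# Two 3-node cliques joined by one weak edge between nodes 2 and 3.
W = np.zeros((6, 6))
for block in ([0, 1, 2], [3, 4, 5]):
    for i in block:
        for j in block:
            if i != j:
                W[i, j] = 1.0
W[2, 3] = W[3, 2] = 0.1

L = normalized_laplacian(W)
eigvals, eigvecs = np.linalg.eigh(L)      # ascending eigenvalues
fiedler = eigvecs[:, 1]                   # eigenvector of the 2nd smallest eigenvalue
labels = (fiedler > 0).astype(int)        # sign cut recovers the two communities
```

The spectral gap between the second and third smallest eigenvalues signals that two clusters suffice, echoing the abstract's point about gaps in the normalized Laplacian spectrum.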

【50】 Turbo-Sim: a generalised generative model with a physical latent space 标题:Turbo-Sim:一种具有物理潜在空间的广义生成模型 链接:https://arxiv.org/abs/2112.10629

作者:Guillaume Quétant,Mariia Drozdova,Vitaliy Kinakh,Tobias Golling,Slava Voloshynovkiy 机构:Department of Computer Science, University of Geneva, Carouge, Department of Particle Physics, University of Geneva, Genève 备注:8 pages, 2 figures, 1 table 摘要:我们介绍了Turbo-Sim,这是一个从信息论原理衍生出来的通用自动编码器框架,可以用作生成模型。通过最大化编码器和解码器的输入和输出之间的互信息,我们能够重新发现通常在对抗性自动编码器和生成性对抗性网络以及各种更复杂的相关模型中发现的损失项。我们的通用框架使这些模型在数学上可以解释,并通过分别设置每个损失项的权重,允许新模型的多样性。该框架还独立于编码器和解码器的固有架构,因此为整个网络的构建块留下了广泛的选择。我们将Turbo-Sim应用于对撞机物理中的生成问题:将若干粒子的性质从碰撞刚结束时的理论空间变换到实验探测刚结束时的观测空间。 摘要:We present Turbo-Sim, a generalised autoencoder framework derived from principles of information theory that can be used as a generative model. By maximising the mutual information between the input and the output of both the encoder and the decoder, we are able to rediscover the loss terms usually found in adversarial autoencoders and generative adversarial networks, as well as various more sophisticated related models. Our generalised framework makes these models mathematically interpretable and allows for a diversity of new ones by setting the weight of each loss term separately. The framework is also independent of the intrinsic architecture of the encoder and the decoder thus leaving a wide choice for the building blocks of the whole network. We apply Turbo-Sim to a collider physics generation problem: the transformation of the properties of several particles from a theory space, right after the collision, to an observation space, right after the detection in an experiment.

【51】 Multidimensional Projection Filters via Automatic Differentiation and Sparse-Grid Integration 标题:基于自动微分和稀疏网格积分的多维投影滤波 链接:https://arxiv.org/abs/2112.10594

作者:Muhammad Fuady Emzir,Zheng Zhao,Simo Särkkä 摘要:投影滤波器是一种近似最优滤波问题的条件概率密度动力学的方法。在投影滤波器中,控制最优滤波密度演化的Kushner-Stratonovich随机偏微分方程被投影到参数密度的流形上,得到一个有限维随机微分方程。尽管投影滤波器具有捕获复杂概率密度的能力,但其实现(到目前为止)仅限于高斯族或一维滤波问题。本文考虑将数值积分和自动微分相结合来构造更一般问题的投影滤波器。我们对指数族流形的这种组合作了详细的阐述。我们通过数值实验表明,与基于有限差分的Zakai滤波器和粒子滤波器相比,该方法可以保持相当精确的滤波密度近似,同时需要相对较少的正交点。 摘要:The projection filter is a method for approximating the dynamics of conditional probability densities of optimal filtering problems. In projection filters, the Kushner--Stratonovich stochastic partial differential equation governing the evolution of the optimal filtering density is projected to a manifold of parametric densities, yielding a finite-dimensional stochastic differential equation. Despite its capability of capturing complex probability densities, the implementations of projection filters are (so far) restricted to either the Gaussian family or unidimensional filtering problems. This paper considers a combination of numerical integration and automatic differentiation to construct projection filters for more general problems. We give a detailed exposition about this combination for the manifold of the exponential family. We show via numerical experiments that this approach can maintain a fairly accurate approximation of the filtering density compared to the finite-difference based Zakai filter and a particle filter while requiring a relatively low number of quadrature points.

【52】 Learning Robust Policy against Disturbance in Transition Dynamics via State-Conservative Policy Optimization 标题:基于状态保守策略优化的过渡动态鲁棒抗扰策略学习 链接:https://arxiv.org/abs/2112.10513

作者:Yufei Kuang,Miao Lu,Jie Wang,Qi Zhou,Bin Li,Houqiang Li 机构:CAS Key Laboratory of Technology in GIPAS, University of Science and Technology of China, Institute of Artificial Intelligence, Hefei Comprehensive National Science Center 备注:Accepted to AAAI 2022 摘要:由于源环境和目标环境之间的差异,深度强化学习算法在实际任务中的性能较差。这种差异通常被视为过渡动力学中的扰动。现有的许多算法通过对干扰进行建模并在训练期间将其应用于源环境来学习鲁棒策略,这通常需要事先了解模拟器的干扰和控制。然而,当目标环境的干扰未知或难以在模拟器中建模时,这些算法可能会失败。为了解决这个问题,我们提出了一种新的无模型参与者-批评家算法——状态保守策略优化(SCPO)——在不预先建模干扰的情况下学习鲁棒策略。具体地说,SCPO将过渡动力学中的扰动降低到状态空间中的扰动,然后通过一个简单的基于梯度的正则化器对其进行逼近。SCPO的吸引人的特点包括:它易于实现,不需要额外的干扰知识或专门设计的模拟器。在多个机器人控制任务中的实验表明,SCPO能够针对过渡动力学中的干扰学习鲁棒策略。 摘要:Deep reinforcement learning algorithms can perform poorly in real-world tasks due to the discrepancy between source and target environments. This discrepancy is commonly viewed as the disturbance in transition dynamics. Many existing algorithms learn robust policies by modeling the disturbance and applying it to source environments during training, which usually requires prior knowledge about the disturbance and control of simulators. However, these algorithms can fail in scenarios where the disturbance from target environments is unknown or is intractable to model in simulators. To tackle this problem, we propose a novel model-free actor-critic algorithm -- namely, state-conservative policy optimization (SCPO) -- to learn robust policies without modeling the disturbance in advance. Specifically, SCPO reduces the disturbance in transition dynamics to that in state space and then approximates it by a simple gradient-based regularizer. The appealing features of SCPO include that it is simple to implement and does not require additional knowledge about the disturbance or specially designed simulators. Experiments in several robot control tasks demonstrate that SCPO learns robust policies against the disturbance in transition dynamics.

【53】 Transformers Can Do Bayesian Inference 标题:Transformer可以做贝叶斯推理 链接:https://arxiv.org/abs/2112.10510

作者:Samuel Müller,Noah Hollmann,Sebastian Pineda Arango,Josif Grabocka,Frank Hutter 机构:University of Freiburg,Charité Berlin,Bosch Center for Artificial Intelligence 摘要:目前,贝叶斯方法很难享受到深度学习的好处,而贝叶斯方法允许明确指定先验知识并准确捕获模型不确定性。我们提出了先验数据拟合网络(Prior-Data Fitted Networks, PFN)。PFN利用大规模机器学习技术来逼近一大组后验分布。PFN工作的唯一要求是能够从监督学习任务(或函数)的先验分布中采样。我们的方法将后验近似的目标重述为具有集值输入的监督分类问题:它反复从先验中抽取一个任务(或函数),从中抽取一组数据点及其标签,屏蔽其中一个标签,并学习基于其余数据点的集值输入对其进行概率预测。以一组来自新的有监督学习任务的样本作为输入,PFN已学会近似贝叶斯推理,能在单次前向传播中对任意其他数据点进行概率预测。我们证明了PFN可以近乎完美地模拟高斯过程,也可以对难以处理的问题进行有效的贝叶斯推理,与现有方法相比,在多个设置中的加速比超过200倍。我们在非常不同的领域获得了强有力的结果,如高斯过程回归、贝叶斯神经网络、小表格数据集分类和Few-Shot图像分类,证明了PFN的通用性。代码和经过训练的PFN发布于https://github.com/automl/TransformersCanDoBayesianInference. 摘要:Currently, it is hard to reap the benefits of deep learning for Bayesian methods, which allow the explicit specification of prior knowledge and accurately capture model uncertainty. We present Prior-Data Fitted Networks (PFNs). PFNs leverage large-scale machine learning techniques to approximate a large set of posteriors. The only requirement for PFNs to work is the ability to sample from a prior distribution over supervised learning tasks (or functions). Our method restates the objective of posterior approximation as a supervised classification problem with a set-valued input: it repeatedly draws a task (or function) from the prior, draws a set of data points and their labels from it, masks one of the labels and learns to make probabilistic predictions for it based on the set-valued input of the rest of the data points. Presented with a set of samples from a new supervised learning task as input, PFNs make probabilistic predictions for arbitrary other data points in a single forward propagation, having learned to approximate Bayesian inference. We demonstrate that PFNs can near-perfectly mimic Gaussian processes and also enable efficient Bayesian inference for intractable problems, with over 200-fold speedups in multiple setups compared to current methods.
We obtain strong results in very diverse areas such as Gaussian process regression, Bayesian neural networks, classification for small tabular data sets, and few-shot image classification, demonstrating the generality of PFNs. Code and trained PFNs are released at https://github.com/automl/TransformersCanDoBayesianInference.

【54】 Community detection and reciprocity in networks by jointly modeling pairs of edges 标题:通过联合建模边对实现网络中的社区检测和互易性 链接:https://arxiv.org/abs/2112.10436

作者:Martina Contisciani,Hadiseh Safdari,Caterina De Bacco 机构:Max Planck Institute for Intelligent Systems, Cyber Valley, Tuebingen , Germany 摘要:我们提出了一个概率生成模型和一个有效的算法来执行社区检测和捕获网络中的互惠性。我们的方法使用精确的2边联合分布联合建模边对。此外,它还为边际分布和条件分布提供了封闭形式的分析表达式。我们在恢复社区、边缘预测任务和生成复制真实网络中观察到的互易值的合成网络方面验证了我们的合成数据模型。我们还强调了两个真实数据集上的这些发现,这两个数据集与社会科学家和行为生态学家相关。我们的方法克服了标准算法和通过伪似然近似结合互易性的最新模型的局限性。我们在线提供代码的开源实现。 摘要:We present a probabilistic generative model and an efficient algorithm to both perform community detection and capture reciprocity in networks. Our approach jointly models pairs of edges with exact 2-edge joint distributions. In addition, it provides closed-form analytical expressions for both marginal and conditional distributions. We validate our model on synthetic data in recovering communities, edge prediction tasks, and generating synthetic networks that replicate the reciprocity values observed in real networks. We also highlight these findings on two real datasets that are relevant for social scientists and behavioral ecologists. Our method overcomes the limitations of both standard algorithms and recent models that incorporate reciprocity through a pseudo-likelihood approximation. We provide an open-source implementation of the code online.

【55】 How to estimate the memory of the Elephant Random Walk 标题:如何估计“大象漫步”的记忆 链接:https://arxiv.org/abs/2112.10405

作者:Bernard Bercu,Lucile Laulin 摘要:我们介绍了一种原始的方法来估计大象随机游动的记忆参数,这是一种有趣的离散时间整数随机游动,具有完整的历史记忆。我们的估计是基于对数似然函数的二阶泰勒近似的拟极大似然估计。我们证明了我们的估计在扩散区、临界区和超扩散区几乎肯定收敛。我们的统计过程的局部渐近正态性是在扩散区建立的,而局部渐近混合正态性是在超扩散区证明的。还提供了渐近和精确置信区间以及统计检验。我们所有的分析都依赖于鞅和相关的二次变化的渐近结果。 摘要:We introduce an original way to estimate the memory parameter of the elephant random walk, a fascinating discrete time random walk on integers having a complete memory of its entire history. Our estimator is nothing more than a quasi-maximum likelihood estimator, based on a second order Taylor approximation of the log-likelihood function. We show the almost sure convergence of our estimate in the diffusive, critical and superdiffusive regimes. The local asymptotic normality of our statistical procedure is established in the diffusive regime, while the local asymptotic mixed normality is proven in the superdiffusive regime. Asymptotic and exact confidence intervals as well as statistical tests are also provided. All our analysis relies on asymptotic results for martingales and the quadratic variations associated.
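大象随机游走本身很容易模拟:第 k+1 步均匀回忆过去某一步,以概率 p 重复、以概率 1-p 反转,记忆参数为 a = 2p-1。下面是一个假设性示意:模拟该过程,并用一个朴素的最小二乘估计恢复 p——它只是利用 E[X_{k+1} | S_k] = a·S_k/k 的粗略示意,并非论文中的拟极大似然估计:

```python
import numpy as np

rng = np.random.default_rng(1)

def elephant_walk(n, p, q=0.5):
    """模拟 n 步大象随机游走: 首步以概率 q 取 +1;
    之后每步均匀回忆过去一步, 以概率 p 重复、以概率 1-p 反转。"""
    steps = np.empty(n, dtype=int)
    steps[0] = 1 if rng.random() < q else -1
    for k in range(1, n):
        past = steps[rng.integers(0, k)]
        steps[k] = past if rng.random() < p else -past
    return steps

def estimate_memory(steps):
    """朴素估计: 对 X_{k+1} 关于 S_k/k 做无截距最小二乘回归得到 a,
    再由 a = 2p-1 还原 p。仅作示意, 收敛较慢且方差较大。"""
    S = np.cumsum(steps)
    ratio = S[:-1] / np.arange(1, len(steps))
    a_hat = np.sum(steps[1:] * ratio) / np.sum(ratio**2)
    return (a_hat + 1) / 2

steps = elephant_walk(20000, p=0.7)
print(estimate_memory(steps))  # 粗略接近 0.7 (朴素估计方差较大)
```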

【56】 Classifier Calibration: How to assess and improve predicted class probabilities: a survey 标题:分类器校准:如何评估和改进预测类概率:综述 链接:https://arxiv.org/abs/2112.10327

作者:Telmo Silva Filho,Hao Song,Miquel Perello-Nieto,Raul Santos-Rodriguez,Meelis Kull,Peter Flach 机构:Department of Statistics, Federal University of Paraíba, João Pessoa, Brazil; Intelligent Systems Laboratory, University of Bristol, Bristol, United Kingdom 摘要:本文对分类器校准的原理与实践作了入门介绍和详细综述。经过良好校准的分类器能够正确量化其逐实例预测所伴随的不确定性或置信水平。这对于关键应用、最优决策、代价敏感分类以及某些类型的情境变化至关重要。校准研究历史悠久,比机器学习作为学术领域的诞生还早几十年。然而,近来对校准兴趣的上升带来了新的方法,并将其从二分类扩展到了多分类设定。可供选择的方法与需要考虑的问题非常多,驾驭这一领域需要一套正确的概念和工具。我们既提供主要概念与方法的入门材料,也提供最新的技术细节,包括恰当评分规则(proper scoring rules)与其他评估指标、可视化方法、二分类与多分类事后校准方法的全面介绍,以及若干高级主题。 摘要:This paper provides both an introduction to and a detailed overview of the principles and practice of classifier calibration. A well-calibrated classifier correctly quantifies the level of uncertainty or confidence associated with its instance-wise predictions. This is essential for critical applications, optimal decision making, cost-sensitive classification, and for some types of context change. Calibration research has a rich history which predates the birth of machine learning as an academic field by decades. However, a recent increase in the interest on calibration has led to new methods and the extension from binary to the multiclass setting. The space of options and issues to consider is large, and navigating it requires the right set of concepts and tools. We provide both introductory material and up-to-date technical details of the main concepts and methods, including proper scoring rules and other evaluation metrics, visualisation approaches, a comprehensive account of post-hoc calibration methods for binary and multiclass classification, and several advanced topics.
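校准评估中最常用的量之一是期望校准误差(ECE):将预测按置信度分桶,比较每桶内的平均置信度与经验准确率,按桶内样本比例加权求和。下面是一个假设性的 numpy 示意:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: 按置信度等宽分桶, 对每桶 |准确率 - 平均置信度|
    按桶内样本比例加权求和。值越小校准越好。"""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            acc = correct[mask].mean()    # 桶内经验准确率
            conf = confidences[mask].mean()  # 桶内平均置信度
            ece += mask.mean() * abs(acc - conf)
    return ece

# 完美校准的极端例子: 置信度 0.75 的预测恰有 3/4 正确 -> ECE = 0
conf = np.array([0.75, 0.75, 0.75, 0.75])
corr = np.array([1, 1, 1, 0])
print(expected_calibration_error(conf, corr))  # 0.0
```

过度自信的情形(例如置信度 0.9 但只有一半正确)会给出正的 ECE,这正是文中事后校准方法要缩小的量。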

【57】 Balancing Adaptability and Non-exploitability in Repeated Games 标题:重复博弈中适应性与不可利用性的平衡 链接:https://arxiv.org/abs/2112.10314

作者:Anthony DiGiovanni,Ambuj Tewari 机构:Department of Statistics, University of Michigan-Ann Arbor 备注:25 pages, 4 figures 摘要:我们研究在重复博弈中,面对类别未知(属于若干类别之一)的对手时保证低遗憾的问题。我们额外加入一个约束:算法必须是不可利用的,即对手没有动机去采用某种使我们无法获得超过某个“公平”值收益的算法。我们的解决方案是一个专家算法(LAFF):它在一组分别对各对手类别最优的子算法中进行搜索,并在检测到对手存在利用行为的证据时采用惩罚策略。借助依赖于对手类别的基准,我们证明 LAFF 对各类可能的对手一致地具有次线性遗憾,利用型对手除外——对这类对手,我们保证其遗憾为线性。据我们所知,这是首个同时为多智能体学习中的遗憾与不可利用性提供保证的工作。 摘要:We study the problem of guaranteeing low regret in repeated games against an opponent with unknown membership in one of several classes. We add the constraint that our algorithm is non-exploitable, in that the opponent lacks an incentive to use an algorithm against which we cannot achieve rewards exceeding some "fair" value. Our solution is an expert algorithm (LAFF) that searches within a set of sub-algorithms that are optimal for each opponent class and uses a punishment policy upon detecting evidence of exploitation by the opponent. With benchmarks that depend on the opponent class, we show that LAFF has sublinear regret uniformly over the possible opponents, except exploitative ones, for which we guarantee that the opponent has linear regret. To our knowledge, this work is the first to provide guarantees for both regret and non-exploitability in multi-agent learning.

【58】 Estimating Causal Effects of Multi-Aspect Online Reviews with Multi-Modal Proxies 标题:用多模态指标估计多方面在线评论的因果效应 链接:https://arxiv.org/abs/2112.10274

作者:Lu Cheng,Ruocheng Guo,Huan Liu 机构: School of Computing and Augmented Intelligence, Arizona State University, USA, School of Data Science, City University of Hong Kong, China 备注:10 pages, 6 figures, accepted to WSDM22 摘要:在线评论使消费者能够与公司互动并提供重要反馈。由于高维文本的复杂性,这些评论通常被简化为单一的数值分数,例如评分或情绪分数。本工作在细粒度层面上实证检验了用户生成的在线评论的因果效应:我们考虑多个方面,例如餐厅的“食物”和“服务”。了解消费者对不同方面的意见,有助于详细评估业务绩效并有效制定运营策略。具体来说,我们的目标是回答诸如“如果餐厅在‘服务’这一方面的质量提高 10%,其受欢迎程度将如何变化?”之类的干预性问题。利用观察数据进行因果推断的决定性挑战是“混杂因素”的存在——它们可能无法被观察或测量(例如消费者对食物类型的偏好),使估计出的效应有偏且方差较高。为应对这一挑战,我们求助于多模态代理,例如消费者档案信息以及消费者与企业之间的互动。我们展示了如何有效利用这些丰富的信息来识别和估计在线评论中蕴含的多个方面的因果效应。 摘要:Online reviews enable consumers to engage with companies and provide important feedback. Due to the complexity of the high-dimensional text, these reviews are often simplified as a single numerical score, e.g., ratings or sentiment scores. This work empirically examines the causal effects of user-generated online reviews on a granular level: we consider multiple aspects, e.g., the Food and Service of a restaurant. Understanding consumers' opinions toward different aspects can help evaluate business performance in detail and strategize business operations effectively. Specifically, we aim to answer interventional questions such as What will the restaurant popularity be if the quality w.r.t. its aspect Service is increased by 10%? The defining challenge of causal inference with observational data is the presence of "confounder", which might not be observed or measured, e.g., consumers' preference to food type, rendering the estimated effects biased and high-variance. To address this challenge, we have recourse to the multi-modal proxies such as the consumer profile information and interactions between consumers and businesses. We show how to effectively leverage the rich information to identify and estimate causal effects of multiple aspects embedded in online reviews. 
Empirical evaluations on synthetic and real-world data corroborate the efficacy and shed light on the actionable insight of the proposed approach.

【59】 Exploration-exploitation trade-off for continuous-time episodic reinforcement learning with linear-convex models 标题:基于线性-凸模型的连续时间分段强化学习的探索开发权衡 链接:https://arxiv.org/abs/2112.10264

作者:Lukasz Szpruch,Tanut Treetanthiploet,Yufei Zhang 机构:Alan Turing Institute, Department of Statistics 摘要:我们开发了一个用于分析情景(episodic)设定下基于模型的强化学习的概率框架,并将其应用于研究动力学为线性但系数未知、目标函数为凸(但可能不规则)的有限时域随机控制问题。利用概率表示,我们研究了相关成本函数的正则性,并对分别由估计模型参数与真实模型参数导出的最优反馈控制之间的性能差距建立了精确估计。我们给出了使该性能差距为二次的条件,改进了近期工作 [X. Guo, A. Hu, and Y. Zhang, arXiv preprint, arXiv:2104.09311, (2021)] 中的线性性能差距,与随机线性二次问题的已知结果相匹配。接下来,我们提出了一种分阶段的学习算法,展示了如何优化探索-利用权衡,并在高概率与期望意义下实现次线性遗憾。当二次性能差距所需的假设成立时,该算法在一般情形下于 $N$ 个回合内达到 $\mathcal{O}(\sqrt{N} \ln N)$ 阶的高概率遗憾,在自我探索情形下达到 $\mathcal{O}((\ln N)^2)$ 阶的期望遗憾,与文献中的最优可能结果相匹配。该分析需要我们新推导的、针对相关连续时间观测的集中不等式。 摘要:We develop a probabilistic framework for analysing model-based reinforcement learning in the episodic setting. We then apply it to study finite-time horizon stochastic control problems with linear dynamics but unknown coefficients and convex, but possibly irregular, objective function. Using probabilistic representations, we study regularity of the associated cost functions and establish precise estimates for the performance gap between applying optimal feedback control derived from estimated and true model parameters. We identify conditions under which this performance gap is quadratic, improving the linear performance gap in recent work [X. Guo, A. Hu, and Y. Zhang, arXiv preprint, arXiv:2104.09311, (2021)], which matches the results obtained for stochastic linear-quadratic problems. Next, we propose a phase-based learning algorithm for which we show how to optimise exploration-exploitation trade-off and achieve sublinear regrets in high probability and expectation. When assumptions needed for the quadratic performance gap hold, the algorithm achieves an order $\mathcal{O}(\sqrt{N} \ln N)$ high probability regret, in the general case, and an order $\mathcal{O}((\ln N)^2)$ expected regret, in self-exploration case, over $N$ episodes, matching the best possible results from the literature. The analysis requires novel concentration inequalities for correlated continuous-time observations, which we derive.

【60】 On Causal Inference for Data-free Structured Pruning 标题:关于无数据结构化剪枝的因果推理 链接:https://arxiv.org/abs/2112.10229

作者:Martin Ferianc,Anush Sankaran,Olivier Mastropietro,Ehsan Saboori,Quentin Cappart 机构:Department of Electronic and Electrical Engineering, University College London, London, UK, Department of Computer Engineering and Software Engineering, Polytechnique Montréal, Montreal, QC, Canada 备注:Accepted to ITCI'22: The AAAI-22 Workshop on Information-Theoretic Methods for Causal Inference and Discovery 摘要:神经网络(NNs)正在对研究和工业产生巨大影响。然而,随着NNs精度的提高,其规模、所需计算操作数和能耗也随之增加。资源消耗的增加导致NNs采用率的降低和实际部署的不切实际。因此,需要对NNs进行压缩,使其可供更广泛的受众使用,同时降低其运行时成本。在这项工作中,我们从因果推理的角度来处理这一挑战,并提出了一种评分机制来促进NNs的结构化修剪。该方法基于在最大熵扰动下测量互信息,通过神经网络顺序传播。我们在两个数据集和不同规模的神经网络上展示了该方法的性能,并且我们证明了我们的方法在具有挑战性的条件下取得了有竞争力的性能。 摘要:Neural networks (NNs) are making a large impact both on research and industry. Nevertheless, as NNs' accuracy increases, it is followed by an expansion in their size, required number of compute operations and energy consumption. Increase in resource consumption results in NNs' reduced adoption rate and real-world deployment impracticality. Therefore, NNs need to be compressed to make them available to a wider audience and at the same time decrease their runtime costs. In this work, we approach this challenge from a causal inference perspective, and we propose a scoring mechanism to facilitate structured pruning of NNs. The approach is based on measuring mutual information under a maximum entropy perturbation, sequentially propagated through the NN. We demonstrate the method's performance on two datasets and various NNs' sizes, and we show that our approach achieves competitive performance under challenging conditions.

【61】 Rethinking Importance Weighting for Transfer Learning 标题:对迁移学习重要性权重的再思考 链接:https://arxiv.org/abs/2112.10157

作者:Nan Lu,Tianyi Zhang,Tongtong Fang,Takeshi Teshima,Masashi Sugiyama 机构:The University of Tokyo 摘要:监督学习的一个关键假设是训练数据与测试数据服从相同的概率分布。然而,这一基本假设在实践中并不总是成立,例如由于环境变化、样本选择偏差、隐私问题或高昂的标注成本。迁移学习(Transfer Learning, TL)放宽了这一假设,使我们能够在分布偏移下进行学习。经典的迁移学习方法通常依赖于重要性加权——根据按重要性(即测试密度与训练密度之比)加权的训练损失来训练预测器。然而,随着现实世界的机器学习任务变得日益复杂、高维和动态,近来人们探索了新的方法来应对这些挑战。本文在介绍基于重要性加权的迁移学习的基础之后,回顾了基于重要性与预测器的联合估计和动态估计的最新进展。此外,我们还介绍了一种将因果结构纳入迁移学习的因果机制迁移方法。最后,我们讨论了迁移学习研究的未来展望。 摘要:A key assumption in supervised learning is that training and test data follow the same probability distribution. However, this fundamental assumption is not always satisfied in practice, e.g., due to changing environments, sample selection bias, privacy concerns, or high labeling costs. Transfer learning (TL) relaxes this assumption and allows us to learn under distribution shift. Classical TL methods typically rely on importance-weighting -- a predictor is trained based on the training losses weighted according to the importance (i.e., the test-over-training density ratio). However, as real-world machine learning tasks are becoming increasingly complex, high-dimensional, and dynamical, novel approaches are explored to cope with such challenges recently. In this article, after introducing the foundation of TL based on importance-weighting, we review recent advances based on joint and dynamic importance-predictor estimation. Furthermore, we introduce a method of causal mechanism transfer that incorporates causal structure in TL. Finally, we discuss future perspectives of TL research.
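经典重要性加权的核心恒等式是 E_test[f(X)] = E_train[w(X) f(X)],其中 w(x) = p_test(x)/p_train(x) 为密度比。下面的假设性 numpy 示意在两侧密度均已知的协变量偏移设定下验证这一点(实际任务中密度比需要另行估计):

```python
import numpy as np

rng = np.random.default_rng(2)

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# 协变量偏移: 训练分布 x ~ N(0,1), 测试分布 x ~ N(1,1); 目标量 f(x) = x^2
x_train = rng.normal(0.0, 1.0, size=200000)
f = x_train ** 2

# 密度比加权后, 训练样本上的加权平均逼近测试分布下的期望
w = gaussian_pdf(x_train, 1.0, 1.0) / gaussian_pdf(x_train, 0.0, 1.0)
est_weighted = np.mean(w * f)
est_naive = np.mean(f)

print(est_naive, est_weighted)  # 真值: E_train[x^2] = 1, E_test[x^2] = 2
```

同样的权重 w 乘在训练损失上,就得到文中所说的重要性加权预测器训练。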

【62】 Wasserstein Generative Learning of Conditional Distribution 标题:条件分布的Wasserstein生成学习 链接:https://arxiv.org/abs/2112.10039

作者:Shiao Liu,Xingyu Zhou,Yuling Jiao,Jian Huang 备注:34 pages, 8 figures 摘要:条件分布是描述响应变量与预测变量之间关系的基本量。我们提出了一种学习条件分布的Wasserstein生成方法。该方法使用条件生成器将已知分布转换为目标条件分布。通过匹配包含条件生成器的联合分布和目标联合分布,使用Wasserstein距离作为这些联合分布的差异度量,来估计条件生成器。我们建立了由该方法生成的条件抽样分布的非渐近误差界,并证明了在假设数据分布支撑在低维集合上的情况下,该方法能够减轻维数灾难。我们进行了数值实验来验证所提方法,并说明了它在条件样本生成、非参数条件密度估计、预测不确定性量化、双变量响应数据、图像重建和图像生成中的应用。 摘要:Conditional distribution is a fundamental quantity for describing the relationship between a response and a predictor. We propose a Wasserstein generative approach to learning a conditional distribution. The proposed approach uses a conditional generator to transform a known distribution to the target conditional distribution. The conditional generator is estimated by matching a joint distribution involving the conditional generator and the target joint distribution, using the Wasserstein distance as the discrepancy measure for these joint distributions. We establish non-asymptotic error bound of the conditional sampling distribution generated by the proposed method and show that it is able to mitigate the curse of dimensionality, assuming that the data distribution is supported on a lower-dimensional set. We conduct numerical experiments to validate the proposed method and illustrate its applications to conditional sample generation, nonparametric conditional density estimation, prediction uncertainty quantification, bivariate response data, image reconstruction and image generation.

【63】 Weisfeiler and Leman go Machine Learning: The Story so far 标题:魏斯费勒和莱曼围棋机器学习:到目前为止的故事 链接:https://arxiv.org/abs/2112.09992

作者:Christopher Morris,Yaron Lipman,Haggai Maron,Bastian Rieck,Nils M. Kriege,Martin Grohe,Matthias Fey,Karsten Borgwardt 机构:McGill University and Mila – Quebec AI Institute, Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, NVIDIA Research, AIDOS Lab, Institute of AI for Health, Helmholtz Zentrum München, University of Vienna, Vienna 摘要:近年来,基于Weisfeiler-Leman算法的算法和神经结构(Weisfeiler-Leman算法是解决图同构问题的一种著名的启发式算法)成为利用图和关系数据进行机器学习的有力工具。在这里,我们对该算法在机器学习环境中的应用进行了全面概述,重点介绍了监督机制。我们讨论了理论背景,展示了如何将其用于有监督的图和节点表示学习,讨论了最近的扩展,并概述了该算法与(置换)等变神经结构的联系。此外,我们概述了当前的应用和未来的方向,以促进进一步的研究。 摘要:In recent years, algorithms and neural architectures based on the Weisfeiler-Leman algorithm, a well-known heuristic for the graph isomorphism problem, emerged as a powerful tool for machine learning with graphs and relational data. Here, we give a comprehensive overview of the algorithm's use in a machine learning setting, focusing on the supervised regime. We discuss the theoretical background, show how to use it for supervised graph- and node representation learning, discuss recent extensions, and outline the algorithm's connection to (permutation-)equivariant neural architectures. Moreover, we give an overview of current applications and future directions to stimulate further research.

【64】 Syntactic-GCN Bert based Chinese Event Extraction 标题:基于句法-GCN-BERT的中文事件抽取 链接:https://arxiv.org/abs/2112.09939

作者:Jiangwei Liu,Jingshu Zhang,Xiaohong Huang,Liangyu Min 机构:School of Information Management and Engineering, Shanghai University of Finance and Economics, Shanghai , China 备注:9 pages, 4 figures, 3 tables. arXiv admin note: text overlap with arXiv:2111.03212 摘要:随着信息技术的快速发展,在线平台(如新闻门户和社交媒体)每时每刻都会产生大量的网络信息。因此,从社会流中提取事件的结构化表示至关重要。通常,现有的事件提取研究利用模式匹配、机器学习或深度学习方法来执行事件提取任务。然而,由于汉语的独特性,汉语事件抽取的性能不如英语。本文提出了一个完整的中文事件提取框架。该方法是一个多通道输入的神经网络框架,集成了语义特征和句法特征。语义特征由BERT体系结构捕获。词性(POS)特征和依赖解析(DP)特征分别通过分析嵌入和图卷积网络(GCN)捕获。我们还将在真实数据集上评估我们的模型。实验结果表明,该方法的性能明显优于基准方法。 摘要:With the rapid development of information technology, online platforms (e.g., news portals and social media) generate enormous web information every moment. Therefore, it is crucial to extract structured representations of events from social streams. Generally, existing event extraction research utilizes pattern matching, machine learning, or deep learning methods to perform event extraction tasks. However, the performance of Chinese event extraction is not as good as English due to the unique characteristics of the Chinese language. In this paper, we propose an integrated framework to perform Chinese event extraction. The proposed approach is a multiple channel input neural framework that integrates semantic features and syntactic features. The semantic features are captured by BERT architecture. The Part of Speech (POS) features and Dependency Parsing (DP) features are captured by profiling embeddings and Graph Convolutional Network (GCN), respectively. We also evaluate our model on a real-world dataset. Experimental results show that the proposed method outperforms the benchmark approaches significantly.

【65】 Federated Dynamic Sparse Training: Computing Less, Communicating Less, Yet Learning Better 标题:联合动态稀疏训练:计算更少,交流更少,学习更好 链接:https://arxiv.org/abs/2112.09824

作者:Sameer Bibikar,Haris Vikalo,Zhangyang Wang,Xiaohan Chen 机构:Department of Electrical and Computer Engineering, The University of Texas at Austin 摘要:联邦学习(FL)支持将机器学习工作负载从云分发到资源有限的边缘设备。不幸的是,当前的深度网络不仅计算量太大,无法在边缘设备上进行推理和训练,而且对于在带宽受限的网络上进行更新通信来说也太大。在本文中,我们开发、实现并实验验证了一种称为联邦动态稀疏训练(FedDST)的新型FL框架,通过该框架可以部署和训练复杂的神经网络,大大提高了设备计算和网络通信的效率。FedDST的核心是从目标完整网络中提取和训练稀疏子网络的动态过程。在这个方案中,“一举两得”:每个客户端对自己的稀疏网络进行有效的训练,而不是完整的模型,并且只有稀疏网络在设备和云之间传输。此外,我们的结果表明,与固定的共享稀疏掩码相比,FL训练期间的动态稀疏性更灵活地适应FL代理中的局部异质性。此外,动态稀疏自然地将“即时自集成(in-time self-ensembling)效应”引入训练动态中,使FL性能甚至优于稠密训练。在现实且具有挑战性的非 i.i.d. FL设置中,FedDST在我们的实验中始终优于竞争算法:例如,在非 i.i.d. CIFAR-10 上任一固定的上传数据上限下,它比相同上传上限的 FedAvgM 高出 10% 的显著精度优势;即使给 FedAvgM 两倍的上传数据上限,精度差距仍保持 3%,进一步证明了FedDST的有效性。代码可从以下网址获取:https://github.com/bibikar/feddst. 摘要:Federated learning (FL) enables distribution of machine learning workloads from the cloud to resource-limited edge devices. Unfortunately, current deep networks remain not only too compute-heavy for inference and training on edge devices, but also too large for communicating updates over bandwidth-constrained networks. In this paper, we develop, implement, and experimentally validate a novel FL framework termed Federated Dynamic Sparse Training (FedDST) by which complex neural networks can be deployed and trained with substantially improved efficiency in both on-device computation and in-network communication. At the core of FedDST is a dynamic process that extracts and trains sparse sub-networks from the target full network. With this scheme, "two birds are killed with one stone:" instead of full models, each client performs efficient training of its own sparse networks, and only sparse networks are transmitted between devices and the cloud. Furthermore, our results reveal that the dynamic sparsity during FL training more flexibly accommodates local heterogeneity in FL agents than the fixed, shared sparse masks. 
Moreover, dynamic sparsity naturally introduces an "in-time self-ensembling effect" into the training dynamics and improves the FL performance even over dense training. In a realistic and challenging non i.i.d. FL setting, FedDST consistently outperforms competing algorithms in our experiments: for instance, at any fixed upload data cap on non-iid CIFAR-10, it gains an impressive accuracy advantage of 10% over FedAvgM when given the same upload data cap; the accuracy gap remains 3% even when FedAvgM is given 2x the upload data cap, further demonstrating efficacy of FedDST. Code is available at: https://github.com/bibikar/feddst.
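动态稀疏训练中的掩码更新通常遵循“按幅值剪枝、再随机生长”的模式。下面是一个假设性的 numpy 草图(非 FedDST 的官方实现,细节为虚构),示意单轮掩码更新并保持稀疏度不变:

```python
import numpy as np

rng = np.random.default_rng(3)

def prune_and_regrow(weights, mask, drop_frac=0.2):
    """一轮动态稀疏更新: 在当前活跃权重中剪掉幅值最小的 drop_frac 比例,
    再在非活跃位置随机激活同样数量, 使总稀疏度保持不变。"""
    mask = mask.copy()
    active = np.flatnonzero(mask)
    n_drop = int(len(active) * drop_frac)
    if n_drop == 0:
        return mask
    # 剪枝: 活跃权重中幅值最小的 n_drop 个
    drop = active[np.argsort(np.abs(weights[active]))[:n_drop]]
    mask[drop] = 0
    # 再生长: 从当前非活跃位置随机挑 n_drop 个激活 (示意从简, 可能重选刚剪掉的位置)
    inactive = np.flatnonzero(mask == 0)
    grow = rng.choice(inactive, size=n_drop, replace=False)
    mask[grow] = 1
    return mask

w = rng.normal(size=100)
mask = np.zeros(100, dtype=int)
mask[rng.choice(100, size=30, replace=False)] = 1  # 初始稀疏度 70%

new_mask = prune_and_regrow(w, mask)
print(mask.sum(), new_mask.sum())  # 活跃参数数保持不变: 30 30
```

在联邦设定中,每个客户端对自己的掩码做类似更新,只传输活跃位置上的权重,从而同时节省计算与通信。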

【66】 GPEX, A Framework For Interpreting Artificial Neural Networks 标题:GPEX,一个解释人工神经网络的框架 链接:https://arxiv.org/abs/2112.09820

作者:Amir Akbarnejad,Gilbert Bigras,Nilanjan Ray 机构:University of Alberta 摘要:机器学习研究人员长期以来一直注意到可解释性和预测性能之间的权衡。一方面,传统的模型通常是可以解释给人类的,但是它们不能达到很高的预测性能。相反,深度模型可以在许多任务中实现最先进的性能。然而,深度模型的预测是人类无法理解的。在本文中,我们提出了一个框架,缩短了上述两组方法之间的差距。给定一个人工神经网络(ANN),我们的方法找到一个高斯过程(GP),其预测几乎和ANN的预测相匹配。由于GP具有高度的可解释性,我们使用经过训练的GP来解释ANN的决策。我们使用该方法在多个数据集上解释ANN的决策。这些解释提供了有关ANN决策的有趣见解。据我们所知,我们为GP给出的推理表述是第一个让ANN与一个行为相似的高斯过程自然地同时出现的表述。此外,我们还研究了一些已知的理论条件,在这些条件下,人工神经网络可以被GP解释。其中一些理论条件对现代架构来说过于严格。然而,我们推测这些理论条件中只有一个子集就已充分。最后,我们将该框架实现为一个名为GPEX的公开可用工具。给定任何PyTorch前馈模块,GPEX允许用户轻松地解释模块的任何ANN子组件,而无需参与推理算法。GPEX可在线公开获取:www.github.com/Nilanjan-Ray/gpex 摘要:Machine learning researchers have long noted a trade-off between interpretability and prediction performance. On the one hand, traditional models are often interpretable to humans but they cannot achieve high prediction performances. At the opposite end of the spectrum, deep models can achieve state-of-the-art performances in many tasks. However, deep models' predictions are known to be uninterpretable to humans. In this paper we present a framework that shortens the gap between the two aforementioned groups of methods. Given an artificial neural network (ANN), our method finds a Gaussian process (GP) whose predictions almost match those of the ANN. As GPs are highly interpretable, we use the trained GP to explain the ANN's decisions. We use our method to explain ANNs' decisions on many datasets. The explanations provide intriguing insights about the ANNs' decisions. To the best of our knowledge, our inference formulation for GPs is the first one in which an ANN and a similarly behaving Gaussian process naturally appear. Furthermore, we examine some of the known theoretical conditions under which an ANN is interpretable by GPs. Some of those theoretical conditions are too restrictive for modern architectures. However, we hypothesize that only a subset of those theoretical conditions are sufficient. 
Finally, we implement our framework as a publicly available tool called GPEX. Given any PyTorch feed-forward module, GPEX allows users to interpret any ANN subcomponent of the module effortlessly and without having to be involved in the inference algorithm. GPEX is publicly available online: www.github.com/Nilanjan-Ray/gpex

【67】 AutoTransfer: Subject Transfer Learning with Censored Representations on Biosignals Data 标题:AutoTransfer:基于删失表征的生物信号数据被试迁移学习 链接:https://arxiv.org/abs/2112.09796

作者:Niklas Smedemark-Margulies,Ye Wang,Toshiaki Koike-Akino,Deniz Erdogmus 摘要:我们为被试(subject)迁移学习提供了一个正则化框架:训练编码器和分类器以最小化分类损失,同时施加一个度量潜在表征与被试标签之间独立性的惩罚项。我们引入了三种独立性概念及相应的惩罚项,以互信息或散度作为独立性的代理。对于每个惩罚项,我们给出了若干具体的估计算法,既使用解析方法,也使用神经评判(critic)函数。我们提供了一种将这一系列多样的正则化算法应用于新数据集的免手动调整策略,称之为“AutoTransfer”。我们在EEG、EMG和ECoG数据集上评估了这些单独的正则化策略以及AutoTransfer方法的性能,表明这些方法可以改善具有挑战性的真实数据集上的被试迁移学习。 摘要:We provide a regularization framework for subject transfer learning in which we seek to train an encoder and classifier to minimize classification loss, subject to a penalty measuring independence between the latent representation and the subject label. We introduce three notions of independence and corresponding penalty terms using mutual information or divergence as a proxy for independence. For each penalty term, we provide several concrete estimation algorithms, using analytic methods as well as neural critic functions. We provide a hands-off strategy for applying this diverse family of regularization algorithms to a new dataset, which we call "AutoTransfer". We evaluate the performance of these individual regularization strategies and our AutoTransfer method on EEG, EMG, and ECoG datasets, showing that these approaches can improve subject transfer learning for challenging real-world datasets.

【68】 Heavy-tailed denoising score matching 标题:重尾去噪得分匹配 链接:https://arxiv.org/abs/2112.09788

作者:Jacob Deasy,Nikola Simidjievski,Pietro Liò 机构:Department of Computer Science and Technology, University of Cambridge 摘要:在过去几年中,基于分数的模型研究通过高斯去噪分数匹配(DSM)产生了最先进的生成模型。然而,高斯噪声假设存在若干高维局限,这促使人们未来更有针对性地研究更高维的概率密度估计。我们先概述这一局限,然后将理论推广到更广的噪声分布族——即广义正态分布。为了在理论上奠定这一推广,我们放宽了(去噪)分数匹配理论中的一个关键假设,证明“几乎处处”可微的分布允许与高斯情形相同的目标函数简化。对于噪声向量长度的分布,我们证明了在深度学习中普遍存在的高维空间里有利的测度集中现象。在此过程中,我们发现了一个偏态的噪声向量长度分布,并开发了一种迭代噪声缩放算法,以便一致地初始化退火Langevin动力学中的多个噪声水平。在实践方面,重尾DSM的使用带来了更好的分数估计、可控的采样收敛性,以及在不平衡数据集上更均衡的无条件生成性能。 摘要:Score-based model research in the last few years has produced state of the art generative models by employing Gaussian denoising score-matching (DSM). However, the Gaussian noise assumption has several high-dimensional limitations, motivating a more concrete route toward even higher dimension PDF estimation in future. We outline this limitation, before extending the theory to a broader family of noising distributions -- namely, the generalised normal distribution. To theoretically ground this, we relax a key assumption in (denoising) score matching theory, demonstrating that distributions which are differentiable almost everywhere permit the same objective simplification as Gaussians. For noise vector length distributions, we demonstrate favourable concentration of measure in the high-dimensional spaces prevalent in deep learning. In the process, we uncover a skewed noise vector length distribution and develop an iterative noise scaling algorithm to consistently initialise the multiple levels of noise in annealed Langevin dynamics. On the practical side, our use of heavy-tailed DSM leads to improved score estimation, controllable sampling convergence, and more balanced unconditional generative performance for imbalanced datasets.
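去噪分数匹配的回归目标是加噪分布的分数函数。对文中采用的广义正态噪声 p(z) ∝ exp(-(|z|/α)^β),分数有闭式表达;β=2 时退化为高斯情形。下面是一个假设性示意(符号约定未必与论文一致):

```python
import numpy as np

def gen_normal_score(z, alpha, beta):
    """广义正态噪声 p(z) ∝ exp(-(|z|/alpha)**beta) 的分数函数:
    d/dz log p(z) = -beta * sign(z) * |z|**(beta-1) / alpha**beta。"""
    z = np.asarray(z, dtype=float)
    return -beta * np.sign(z) * np.abs(z) ** (beta - 1) / alpha ** beta

# 验证: beta=2 时退化为方差 sigma^2 = alpha^2/2 的高斯, 分数应为 -z/sigma^2
z = np.linspace(-3.0, 3.0, 7)
alpha = 1.5
print(gen_normal_score(z, alpha, beta=2.0))
print(-z / (alpha**2 / 2))  # 两行应一致
```

取 β<2 即得到比高斯更重尾的噪声族,对应的分数作为 DSM 的回归目标。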

【69】 Neurashed: A Phenomenological Model for Imitating Deep Learning Training 标题:Neurash:一种模仿深度学习训练的现象学模型 链接:https://arxiv.org/abs/2112.09741

作者:Weijie J. Su 备注:8 pages 摘要:为了在未来十年推进深度学习方法,需要一个关于现代神经网络推理的理论框架。尽管人们越来越多地试图揭开深度学习为何如此有效的神秘面纱,但仍然缺乏一个全面的图景,表明更好的理论是可能的。我们认为,未来的深度学习理论应该继承三个特征:层次结构的网络结构、使用基于随机梯度的方法优化的参数、以及来自数据的信息。作为一个实例,我们将这些特性集成到一个名为\textit{neurashed}的图形模型中。该模型有效地解释了深度学习中一些常见的经验模式。特别是,neurashed能够深入了解隐式正则化、信息瓶颈和局部弹性。最后,我们讨论了neurashed如何指导深度学习理论的发展。 摘要:To advance deep learning methodologies in the next decade, a theoretical framework for reasoning about modern neural networks is needed. While efforts are increasing toward demystifying why deep learning is so effective, a comprehensive picture remains lacking, suggesting that a better theory is possible. We argue that a future deep learning theory should inherit three characteristics: a \textit{hierarchically} structured network architecture, parameters \textit{iteratively} optimized using stochastic gradient-based methods, and information from the data that evolves \textit{compressively}. As an instantiation, we integrate these characteristics into a graphical model called \textit{neurashed}. This model effectively explains some common empirical patterns in deep learning. In particular, neurashed enables insights into implicit regularization, information bottleneck, and local elasticity. Finally, we discuss how neurashed can guide the development of deep learning theories.

【70】 Game-theoretic Formulations of Sequential Nonparametric One- and Two-Sample Tests 标题:序贯非参数单样本与双样本检验的博弈论表述 链接:https://arxiv.org/abs/2112.09162

作者:Shubhanshu Shekhar,Aaditya Ramdas 机构:Department of Statistics and Data Science, Carnegie Mellon University, Department of Machine Learning, Carnegie Mellon University 备注:56 pages, 7 figures 摘要:我们研究在非参数设定下设计相合的序贯单样本与双样本检验的问题。在“以下注进行检验”(testing by betting)原则的指导下,我们把构造序贯检验的任务重新表述为:选择支付函数,使一名虚拟下注者在重复博弈中与原假设对赌时的财富最大化。当下注者的财富过程超过适当阈值时,所得的序贯检验即拒绝原假设。我们提出了一种选择支付函数的一般策略:将其取为与某些统计距离度量(如积分概率度量(IPM)和 $\varphi$-散度)的变分表示相关联的见证函数(witness function)的可预测估计。总体而言,这一方法确保 (i) 财富过程在原假设下是非负鞅,从而可以严格控制 I 型错误;(ii) 在备择假设下它几乎必然增长到无穷,从而蕴含相合性。我们通过设计复合 e-过程来实现这一点:它在原假设下期望保持有界,而在备择假设下增长到无穷。我们将该通用检验在若干常用距离度量上实例化,得到 Kolmogorov-Smirnov (KS) 检验、$\chi^2$ 检验与核 MMD 检验的序贯版本,并通过实验证明它们能够适应问题的未知难度。本文构建的序贯检验框架非常通用,最后我们讨论了如何将这些思想应用于两个相关问题:高阶随机占优检验与对称性检验。 摘要:We study the problem of designing consistent sequential one- and two-sample tests in a nonparametric setting. Guided by the principle of testing by betting, we reframe the task of constructing sequential tests into that of selecting payoff functions that maximize the wealth of a fictitious bettor, betting against the null in a repeated game. The resulting sequential test rejects the null when the bettor's wealth process exceeds an appropriate threshold. We propose a general strategy for selecting payoff functions as predictable estimates of the witness function associated with the variational representation of some statistical distance measures, such as integral probability metrics (IPMs) and $\varphi$-divergences. Overall, this approach ensures that (i) the wealth process is a non-negative martingale under the null, thus allowing tight control over the type-I error, and (ii) it grows to infinity almost surely under the alternative, thus implying consistency. We accomplish this by designing composite e-processes that remain bounded in expectation under the null, but grow to infinity under the alternative. 
We instantiate the general test for some common distance metrics to obtain sequential versions of Kolmogorov-Smirnov (KS) test, $\chi^2$-test and kernel-MMD test, and empirically demonstrate their ability to adapt to the unknown hardness of the problem. The sequential testing framework constructed in this paper is versatile, and we end with a discussion on applying these ideas to two related problems: testing for higher-order stochastic dominance, and testing for symmetry.
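“以下注进行检验”的骨架是一个财富过程:原假设下每步下注乘子的条件期望为 1,财富是非负鞅;由 Ville 不等式,财富曾超过 1/α 的概率不超过 α,据此拒绝即可控制 I 型错误。下面是一个假设性的最小示意——序贯检验硬币是否公平(相比论文中基于见证函数的一般构造大为简化):

```python
import numpy as np

rng = np.random.default_rng(4)

def betting_test(xs, lam=0.5, alpha=0.05):
    """对 H0: P(X=1) = 1/2 的序贯下注检验。
    每观测一次, 财富乘以 1 + lam*(2x-1); H0 下该乘子期望为 1,
    故财富是非负鞅; 由 Ville 不等式, P(财富曾 >= 1/alpha) <= alpha。"""
    wealth = 1.0
    for t, x in enumerate(xs, 1):
        wealth *= 1.0 + lam * (2 * x - 1)
        if wealth >= 1.0 / alpha:
            return True, t          # 拒绝 H0, 并返回停止时刻
    return False, len(xs)

# 备择假设: 偏硬币 P(X=1) = 0.9 —— 财富指数增长, 很快拒绝
biased = (rng.random(200) < 0.9).astype(int)
print(betting_test(biased))
```

论文的单样本/双样本检验用同一骨架,只是把固定下注换成由见证函数的可预测估计给出的支付函数。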

机器翻译,仅供参考

本文参与 腾讯云自媒体同步曝光计划,分享自微信公众号。
原始发表:2021-12-21,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 arXiv每日学术速递 微信公众号,前往查看


