
Statistics arXiv Digest [6.30]

By: arXiv每日学术速递 (arXiv Daily Academic Digest, WeChat official account)
Published 2021-07-02 17:28:15

Visit www.arxivdaily.com for digests with abstracts, covering CS, Physics, Mathematics, Economics, Statistics, Finance, Biology, and Electrical Engineering, plus search, bookmarking, posting, and more.

stat (Statistics): 50 papers in total

【1】 On the Optimal Configuration of a Square Array Group Testing Algorithm

Authors: Ugnė Čižikovienė, Viktor Skorniakov
Affiliations: Institute of Applied Mathematics, Informatics, Vilnius University, Naugarduko, Vilnius LT-, Lithuania
Link: https://arxiv.org/abs/2106.15603
Abstract: To date, only lower and upper bounds for the optimal configuration of a Square Array (A2) Group Testing (GT) algorithm are known. We establish exact analytical formulae and provide a couple of applications of our result. First, we compare the A2 GT scheme to several other classical GT schemes in terms of the gain per specimen attained at optimal configuration. Second, operating under an objective Bayesian framework with the loss designed to attain its minimum at the optimal GT configuration, we suggest the preferred choice of the group size under natural minimal assumptions: the prior information regarding the prevalence suggests that grouping and application of A2 is better than individual testing. The same suggestion is provided for the Minimax strategy.
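As a quick numerical companion to the optimal-configuration question (a Monte Carlo sketch under an assumed prevalence and the standard A2 retesting rule, not the paper's exact analytical formulae), one can estimate the expected tests per specimen and minimize over the array size:

```python
import numpy as np

rng = np.random.default_rng(0)

def a2_tests_per_specimen(n, p, reps=2000):
    """Monte Carlo estimate of expected tests per specimen for the A2 scheme:
    pool the n rows and n columns of an n-by-n array of specimens, then
    individually retest every specimen sitting at the intersection of a
    positive row and a positive column."""
    total_tests = 0
    for _ in range(reps):
        infected = rng.random((n, n)) < p      # i.i.d. infection indicators
        pos_rows = infected.any(axis=1)
        pos_cols = infected.any(axis=0)
        retests = pos_rows.sum() * pos_cols.sum()
        total_tests += 2 * n + retests         # 2n pooled tests + retests
    return total_tests / (reps * n * n)

p = 0.02  # illustrative prevalence
costs = {n: a2_tests_per_specimen(n, p) for n in range(3, 21)}
best = min(costs, key=costs.get)
print(f"best array size ~ {best}, tests per specimen ~ {costs[best]:.3f}")
```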

【2】 Estimation of the odds ratio in a proportional odds model with censored time-lagged outcome in a randomized clinical trial

Authors: Anastasios A. Tsiatis, Marie Davidian, Shannon T. Holloway
Affiliations: Department of Statistics, North Carolina State University, Raleigh, North Carolina
Note: 15 pages
Link: https://arxiv.org/abs/2106.15559
Abstract: In many randomized clinical trials of therapeutics for COVID-19, the primary outcome is an ordinal, categorical variable for which the final category is often death, which can be ascertained at the time of occurrence. For the remaining categories, determination of into which of these categories a participant's outcome falls cannot be made until some ascertainment time that can be less than or equal to a pre-specified follow-up time. Interest focuses on the odds ratio (active agent vs. control) under the assumption of a proportional odds model. Although at the final analysis the outcome will be determined for all subjects, at an interim analysis, the status of some participants may not yet be determined; accordingly, the outcome from these subjects can be viewed as censored. A valid interim analysis can be based on data only from those subjects with full follow-up; however, this approach is inefficient, as it does not exploit additional information that may be available on those who have not reached the follow-up time at the time of the interim analysis. Appealing to the theory of semiparametrics, we propose an estimator for the odds ratio in a proportional odds model with censored, time-lagged categorical outcome that incorporates such additional baseline and time-dependent information and demonstrate that it can result in considerable gains in efficiency relative to simpler approaches.

【3】 Locally correct confidence intervals for a binomial proportion: A new criteria for an interval estimator

Authors: Paul H. Garthwaite, Maha W. Moustafa, Fadlalla G. Elfadaly
Affiliations: School of Mathematics and Statistics, The Open University, Milton Keynes, UK
Link: https://arxiv.org/abs/2106.15521
Abstract: Well-recommended methods of forming `confidence intervals' for a binomial proportion give interval estimates that do not actually meet the definition of a confidence interval, in that their coverages are sometimes lower than the nominal confidence level. The methods are favoured because their intervals have a shorter average length than the Clopper-Pearson (gold-standard) method, whose intervals really are confidence intervals. Comparison of such methods is tricky -- the best method should perhaps be the one that gives the shortest intervals (on average), but when is the coverage of a method so poor that it should not be classed as a means of forming confidence intervals? As the definition of a confidence interval is not being adhered to, another criterion for forming interval estimates for a binomial proportion is needed. In this paper we suggest a new criterion; methods which meet the criterion are said to yield $\textit{locally correct confidence intervals}$. We propose a method that yields such intervals and prove that its intervals have a shorter average length than those of any other method that meets the criterion. Compared with the Clopper-Pearson method, the proposed method gives intervals with an appreciably smaller average length. The mid-$p$ method also satisfies the new criterion and has its own optimality property.
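To make the coverage comparison concrete, here is a minimal sketch (standard constructions, not code from the paper) of the exact Clopper-Pearson interval via beta quantiles, plus the exact coverage of any interval method obtained by summing binomial probabilities:

```python
import numpy as np
from scipy.stats import beta, binom

def clopper_pearson(x, n, conf=0.95):
    """Exact (gold-standard) interval for a binomial proportion."""
    a = 1 - conf
    lo = beta.ppf(a / 2, x, n - x + 1) if x > 0 else 0.0
    hi = beta.ppf(1 - a / 2, x + 1, n - x) if x < n else 1.0
    return lo, hi

def exact_coverage(ci_fun, n, p, conf=0.95):
    """Coverage probability at a true p: sum the binomial probabilities of
    all outcomes x whose interval contains p."""
    xs = np.arange(n + 1)
    covers = np.array([lo <= p <= hi
                       for lo, hi in (ci_fun(x, n, conf) for x in xs)])
    return float((binom.pmf(xs, n, p) * covers).sum())

n = 40
for p in (0.05, 0.2, 0.5):
    print(f"p={p}: Clopper-Pearson coverage = {exact_coverage(clopper_pearson, n, p):.4f}")
```

Coverage for Clopper-Pearson never drops below the nominal level, which is exactly the property the shorter-interval methods sacrifice.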

【4】 Scaled process priors for Bayesian nonparametric estimation of the unseen genetic variation

Authors: Federico Camerlenghi, Stefano Favaro, Lorenzo Masoero, Tamara Broderick
Affiliations: Department of Economics, University of Milano-Bicocca; Department of Economic and Social Sciences, University of Torino; Department of Electrical Engineering and Computer Science
Link: https://arxiv.org/abs/2106.15480
Abstract: There is a growing interest in the estimation of the number of unseen features, mostly driven by applications in biological sciences. A recent work brought out the upside and the downside of the popular stable-Beta process prior, and generalizations thereof, in Bayesian nonparametric inference for the unseen-features problem: i) the downside lies in the limited use of the sampling information in the posterior distributions, which depend on the observable sample only through the sample size; ii) the upside lies in the analytical tractability and interpretability of the posterior distributions, which are simple Poisson distributions whose parameters are simple to compute, and depend on the sample size and the prior's parameter. In this paper, we introduce and investigate an alternative nonparametric prior, referred to as the stable-Beta scaled process prior, which is the first prior that allows one to enrich the posterior distribution of the number of unseen features, through the inclusion of the sampling information on the number of distinct features in the observable sample, while maintaining the same analytical tractability and interpretability as the stable-Beta process prior. Our prior leads to a negative Binomial posterior distribution, whose parameters depend on the sample size, the observed number of distinct features and the prior's parameter, providing estimates that are simple, linear in the sampling information and computationally efficient. We apply our approach to synthetic and real genetic data, showing that it outperforms parametric and nonparametric competitors in terms of estimation accuracy.

【5】 Topological Data Analysis through alignment of Persistence Landscapes

Authors: James Matuk, Sebastian Kurtek, Karthik Bharath
Affiliations: Department of Statistics, The Ohio State University, Columbus, OH, USA; School of Mathematical Sciences, University of Nottingham, Nottingham, UK
Link: https://arxiv.org/abs/2106.15436
Abstract: Persistence landscapes are functional summaries of persistence diagrams designed to enable analysis of the diagrams using tools from functional data analysis. They comprise a collection of scalar functions such that birth and death times of topological features in persistence diagrams map to extrema of functions and intervals where they are non-zero. As a consequence, topological information is encoded in both amplitude and phase components of persistence landscapes. Through functional data analysis of persistence landscapes under an elastic Riemannian metric, we show how meaningful statistical summaries of persistence landscapes (e.g., mean, dominant directions of variation) can be obtained by decoupling topological signal present in amplitude and phase variations. The estimated phase functions are tied to the resolution parameter that determines the filtration of simplicial complexes used to construct persistence diagrams. For a dataset obtained under scale and sampling variabilities, the phase function prescribes an optimal rate of increase of the resolution parameter for enhancing the topological signal in a persistence diagram. We demonstrate benefits of alignment through several simulation examples and a real data example concerning structure of brain artery trees represented as 3D point clouds.
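For readers unfamiliar with the object being aligned: the $k$-th landscape function at $t$ is the $k$-th largest of the "tent" values $\min(t-b, d-t)$ clipped at zero, over the (birth, death) pairs of the diagram. A minimal numpy sketch of the standard definition (not the paper's alignment method):

```python
import numpy as np

def persistence_landscape(diagram, ts, k_max=3):
    """First k_max landscape functions on a grid ts, for a persistence
    diagram given as (birth, death) pairs: the k-th landscape at t is the
    k-th largest value of the tent min(t - b, d - t), clipped at zero."""
    diagram = np.asarray(diagram, dtype=float)
    b, d = diagram[:, :1], diagram[:, 1:]
    tents = np.maximum(np.minimum(ts[None, :] - b, d - ts[None, :]), 0.0)
    tents = np.sort(tents, axis=0)[::-1]       # descending at each t
    return tents[:k_max]

diagram = [(0.0, 1.0), (0.2, 0.9), (0.5, 0.6)]
ts = np.linspace(0.0, 1.0, 11)
print(persistence_landscape(diagram, ts)[0])   # first landscape function
```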

【6】 Fast Approximation of the Sliced-Wasserstein Distance Using Concentration of Random Projections

Authors: Kimia Nadjahi, Alain Durmus, Pierre E. Jacob, Roland Badeau, Umut Şimşekli
Affiliations: LTCI, Télécom Paris, Institut Polytechnique de Paris, France; Centre Borelli, ENS Paris-Saclay, CNRS, Université Paris-Saclay, France; Department of Statistics, Harvard University, USA
Link: https://arxiv.org/abs/2106.15427
Abstract: The Sliced-Wasserstein distance (SW) is being increasingly used in machine learning applications as an alternative to the Wasserstein distance and offers significant computational and statistical benefits. Since it is defined as an expectation over random projections, SW is commonly approximated by Monte Carlo. We adopt a new perspective to approximate SW by making use of the concentration of measure phenomenon: under mild assumptions, one-dimensional projections of a high-dimensional random vector are approximately Gaussian. Based on this observation, we develop a simple deterministic approximation for SW. Our method does not require sampling a number of random projections, and is therefore both accurate and easy to use compared to the usual Monte Carlo approximation. We derive nonasymptotic guarantees for our approach, and show that the approximation error goes to zero as the dimension increases, under a weak dependence condition on the data distribution. We validate our theoretical findings on synthetic datasets, and illustrate the proposed approximation on a generative modeling problem.
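For reference, the Monte Carlo baseline that the paper's deterministic approximation is meant to replace looks as follows; for equal sample sizes, the 1D Wasserstein distance reduces to a distance between sorted projections. This is the standard estimator, not the authors' method:

```python
import numpy as np

rng = np.random.default_rng(1)

def sliced_wasserstein_mc(X, Y, n_proj=200, p=2):
    """Monte Carlo SW estimator: average the one-dimensional p-Wasserstein
    distance over random directions; with equal sample sizes the 1D
    distance is computed from sorted projections."""
    d = X.shape[1]
    thetas = rng.standard_normal((n_proj, d))
    thetas /= np.linalg.norm(thetas, axis=1, keepdims=True)  # uniform on sphere
    acc = 0.0
    for theta in thetas:
        u, v = np.sort(X @ theta), np.sort(Y @ theta)
        acc += np.mean(np.abs(u - v) ** p)
    return (acc / n_proj) ** (1.0 / p)

X = rng.standard_normal((500, 50))
Y = rng.standard_normal((500, 50)) + 0.5
print(f"SW_2 estimate: {sliced_wasserstein_mc(X, Y):.3f}")
```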

【7】 Place of Occurrence of COVID-19 Deaths in the UK: Modelling and Analysis

Authors: Spencer A. Thomas
Link: https://arxiv.org/abs/2106.15381
Abstract: We analysed publicly available data on place of occurrence of COVID-19 deaths from national statistical agencies in the UK between March 9 2020 and February 28 2021. We introduce a modified Weibull model that describes the deaths due to COVID-19 at a national and place of occurrence level. We observe similar trends in the UK where deaths due to COVID-19 first peak in Homes, followed by Hospitals and Care Homes 1-2 weeks later in the first and second waves. This is in line with the infectious period of the disease, indicating a possible transmission vehicle between the settings. Our results show that the first wave is characterised by fast growth and a slow reduction after the peak in deaths due to COVID-19. The second and third waves have the converse property, with slow growth and a rapid decrease from the peak. This difference may result from behavioural changes in the population (social distancing, masks, etc). Finally, we introduce a double logistic model to describe the dynamic proportion of COVID-19 deaths occurring in each setting. This analysis reveals that the proportion of COVID-19 deaths occurring in Care Homes increases from the start of the pandemic and past the peak in total number of COVID-19 deaths in the first wave. After the catastrophic impact in the first wave, the proportion of COVID-19 deaths occurring in Care Homes gradually decreased from its maximum after the first wave, indicating residents were better protected in the second and third waves compared to the first.
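The abstract does not give the parametrization, so as an illustrative assumption, a double logistic curve for a dynamic proportion can be written as a baseline plus one rising and one declining logistic term and fitted with scipy (all parameter values below are synthetic):

```python
import numpy as np
from scipy.optimize import curve_fit

def double_logistic(t, p0, a1, k1, t1, a2, k2, t2):
    """Baseline plus one rising and one declining logistic term."""
    return (p0 + a1 / (1 + np.exp(-k1 * (t - t1)))
               + a2 / (1 + np.exp(-k2 * (t - t2))))

rng = np.random.default_rng(2)
t = np.arange(52, dtype=float)                          # weeks
truth = double_logistic(t, 0.15, 0.25, 0.8, 5, -0.15, 0.5, 25)
y = truth + rng.normal(0, 0.01, t.size)                 # noisy proportions

popt, _ = curve_fit(double_logistic, t, y,
                    p0=[0.1, 0.2, 1.0, 4.0, -0.1, 1.0, 20.0], maxfev=20000)
print(np.round(popt, 3))
```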

【8】 Unified Framework for Spectral Dimensionality Reduction, Maximum Variance Unfolding, and Kernel Learning By Semidefinite Programming: Tutorial and Survey

Authors: Benyamin Ghojogh, Ali Ghodsi, Fakhri Karray, Mark Crowley
Affiliations: Department of Electrical and Computer Engineering, Machine Learning Laboratory, University of Waterloo, Waterloo, ON, Canada; Department of Statistics and Actuarial Science & David R. Cheriton School of Computer Science
Note: To appear as a part of an upcoming textbook on dimensionality reduction and manifold learning
Link: https://arxiv.org/abs/2106.15379
Abstract: This is a tutorial and survey paper on unification of spectral dimensionality reduction methods, kernel learning by Semidefinite Programming (SDP), Maximum Variance Unfolding (MVU) or Semidefinite Embedding (SDE), and its variants. We first explain how the spectral dimensionality reduction methods can be unified as kernel Principal Component Analysis (PCA) with different kernels. This unification can be interpreted as eigenfunction learning or representation of kernel in terms of distance matrix. Then, since the spectral methods are unified as kernel PCA, we seek to learn the best kernel for unfolding the manifold of the data to its maximum variance. We first briefly introduce kernel learning by SDP for the transduction task. Then, we explain MVU in detail. Various versions of supervised MVU using nearest neighbors graph, by class-wise unfolding, by Fisher criterion, and by colored MVU are explained. We also explain out-of-sample extension of MVU using eigenfunctions and kernel mapping. Finally, we introduce other variants of MVU including action respecting embedding, relaxed MVU, and landmark MVU for big data.
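A minimal sketch of the core MVU semidefinite program (maximize the total variance of a centered kernel while preserving neighborhood distances), written with cvxpy and its default conic solver; the toy data and neighborhood size are assumptions:

```python
import numpy as np
import cvxpy as cp
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(3)
t = np.linspace(0, np.pi, 30)
X = np.c_[np.cos(t), np.sin(t), 0.05 * rng.standard_normal(30)]  # noisy 3D arc

n = X.shape[0]
G = kneighbors_graph(X, n_neighbors=4, mode="connectivity")
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # squared distances

K = cp.Variable((n, n), PSD=True)
constraints = [cp.sum(K) == 0]                        # centered embedding
for i, j in zip(*G.nonzero()):                        # preserve local distances
    constraints.append(K[i, i] - 2 * K[i, j] + K[j, j] == D2[i, j])
cp.Problem(cp.Maximize(cp.trace(K)), constraints).solve()

w, V = np.linalg.eigh(K.value)                        # embed via top eigenpairs
Y = V[:, -2:] * np.sqrt(np.maximum(w[-2:], 0))
print("top eigenvalues:", np.round(w[-3:], 3))
```

The final kernel PCA step on the learned kernel is what ties MVU back to the unified framework described in the tutorial.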

【9】 Isotonic regression for functionals of elicitation complexity greater than one

Authors: Anja Mühlemann, Johanna F. Ziegel
Link: https://arxiv.org/abs/2106.15369
Abstract: We study the non-parametric isotonic regression problem for bivariate elicitable functionals that are given as an elicitable univariate functional and its Bayes risk. Prominent examples for functionals of this type are (mean, variance) and (Value-at-Risk, Expected Shortfall), where the latter pair consists of important risk measures in finance. We present our results for totally ordered covariates but extensions to partial orders are given in the appendix.
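As a point of reference, univariate isotonic regression is available off the shelf; the naive two-step plug-in below (isotonic mean, then an isotonic fit of squared residuals as a crude second component) is only a sketch and not the paper's joint estimator for elicitable pairs:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(4)
x = np.sort(rng.uniform(0, 10, 200))
y = 0.5 * x + rng.normal(0, 1, 200)            # mean increasing in x

iso_mean = IsotonicRegression(increasing=True)
mean_hat = iso_mean.fit_transform(x, y)        # PAVA fit of an increasing mean

# crude second component: isotonic fit of squared residuals (variance proxy)
iso_var = IsotonicRegression(increasing=True)
var_hat = iso_var.fit_transform(x, (y - mean_hat) ** 2)
print(mean_hat[:5], var_hat[:5])
```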

【10】 Towards Sample-Optimal Compressive Phase Retrieval with Sparse and Generative Priors

Authors: Zhaoqiang Liu, Subhroshekhar Ghosh, Jonathan Scarlett
Affiliations: Department of Mathematics, National University of Singapore
Link: https://arxiv.org/abs/2106.15358
Abstract: Compressive phase retrieval is a popular variant of the standard compressive sensing problem, in which the measurements only contain magnitude information. In this paper, motivated by recent advances in deep generative models, we provide recovery guarantees with order-optimal sample complexity bounds for phase retrieval with generative priors. We first show that when using i.i.d. Gaussian measurements and an $L$-Lipschitz continuous generative model with bounded $k$-dimensional inputs, roughly $O(k \log L)$ samples suffice to guarantee that the signal is close to any vector that minimizes an amplitude-based empirical loss function. Attaining this sample complexity with a practical algorithm remains a difficult challenge, and a popular spectral initialization method has been observed to pose a major bottleneck. To partially address this, we further show that roughly $O(k \log L)$ samples ensure sufficient closeness between the signal and any {\em globally optimal} solution to an optimization problem designed for spectral initialization (though finding such a solution may still be challenging). We adapt this result to sparse phase retrieval, and show that $O(s \log n)$ samples are sufficient for a similar guarantee when the underlying signal is $s$-sparse and $n$-dimensional, matching an information-theoretic lower bound. While our guarantees do not directly correspond to a practical algorithm, we propose a practical spectral initialization method motivated by our findings, and experimentally observe significant performance gains over various existing spectral initialization methods of sparse phase retrieval.
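The classical (dense, non-generative) spectral initialization the abstract alludes to takes the leading eigenvector of a $y^2$-weighted covariance of the measurement vectors; a minimal sketch with assumed problem sizes:

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 100, 800
x_star = rng.standard_normal(n)
x_star /= np.linalg.norm(x_star)
A = rng.standard_normal((m, n))        # i.i.d. Gaussian measurement vectors
y = np.abs(A @ x_star)                 # magnitude-only observations

# leading eigenvector of the y^2-weighted covariance of measurement vectors
Y = (A.T * y**2) @ A / m
w, V = np.linalg.eigh(Y)
x0 = V[:, -1]

print(f"|<x0, x*>| = {abs(x0 @ x_star):.3f}")   # close to 1 when m >> n
```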

【11】 Achieving Statistical Optimality of Federated Learning: Beyond Stationary Points

Authors: Lili Su, Jiaming Xu, Pengkun Yang
Affiliations: Electrical and Computer Engineering, Northeastern University; The Fuqua School of Business, Duke University; Center for Statistical Science, Tsinghua University
Link: https://arxiv.org/abs/2106.15216
Abstract: Federated Learning (FL) is a promising framework that has great potential in privacy preservation and in lowering the computation load at the cloud. FedAvg and FedProx are two widely adopted algorithms. However, recent work raised concerns on these two methods: (1) their fixed points do not correspond to the stationary points of the original optimization problem, and (2) the common model found might not generalize well locally. In this paper, we alleviate these concerns. Towards this, we adopt the statistical learning perspective yet allow the distributions to be heterogeneous and the local data to be unbalanced. We show, in the general kernel regression setting, that both FedAvg and FedProx converge to the minimax-optimal error rates. Moreover, when the kernel function has a finite rank, the convergence is exponentially fast. Our results further analytically quantify the impact of the model heterogeneity and characterize the federation gain - the reduction of the estimation error for a worker to join the federated learning compared to the best local estimator. To the best of our knowledge, we are the first to show the achievability of minimax error rates under FedAvg and FedProx, and the first to characterize the gains in joining FL. Numerical experiments further corroborate our theoretical findings on the statistical optimality of FedAvg and FedProx and the federation gains.
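For orientation, FedAvg itself is short: each round, clients take a few local gradient steps from the current global model and the server averages the results, weighted by local sample size. A generic sketch on heterogeneous linear-regression clients (all problem details assumed, not the paper's kernel-regression analysis):

```python
import numpy as np

rng = np.random.default_rng(6)

def make_client(n, w_true, shift):
    """Heterogeneous client: its own covariate shift, shared true weights."""
    X = rng.standard_normal((n, 5)) + shift
    y = X @ w_true + 0.1 * rng.standard_normal(n)
    return X, y

w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
clients = [make_client(n, w_true, s)
           for n, s in [(200, 0.0), (50, 1.0), (120, -0.5)]]

w = np.zeros(5)
for _ in range(50):                      # communication rounds
    updates, sizes = [], []
    for X, y in clients:
        w_local = w.copy()
        for _ in range(5):               # local gradient steps
            w_local -= 0.05 * X.T @ (X @ w_local - y) / len(y)
        updates.append(w_local)
        sizes.append(len(y))
    w = np.average(updates, axis=0, weights=sizes)   # FedAvg aggregation

print("estimation error:", np.linalg.norm(w - w_true))
```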

【12】 Tensor-train approximation of the chemical master equation and its application for parameter inference

Authors: Ion Gabriel Ion, Christian Wildner, Dimitrios Loukrezis, Heinz Koeppl, Herbert De Gersem
Affiliations: Centre for Computational Engineering, Technische Universität Darmstadt; Department of Electrical Engineering, Technische Universität Darmstadt; Centre for Synthetic Biology, Technische Universität Darmstadt
Link: https://arxiv.org/abs/2106.15188
Abstract: In this work, we perform Bayesian inference tasks for the chemical master equation in the tensor-train format. The tensor-train approximation has been proven to be very efficient in representing high dimensional data arising from the explicit representation of the chemical master equation solution. An additional advantage of representing the probability mass function in the tensor train format is that parametric dependency can be easily incorporated by introducing a tensor product basis expansion in the parameter space. Time is treated as an additional dimension of the tensor and a linear system is derived to solve the chemical master equation in time. We exemplify the tensor-train method by performing inference tasks such as smoothing and parameter inference using the tensor-train framework. A very high compression ratio is observed for storing the probability mass function of the solution. Since all linear algebra operations are performed in the tensor-train format, a significant reduction of the computational time is observed as well.
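The compression the authors rely on can be seen in the standard TT-SVD construction, which factors a dense tensor into a chain of 3-way cores by successive truncated SVDs; the sketch below is generic, not the paper's CME-specific solver:

```python
import numpy as np

def tt_svd(tensor, max_rank=8, tol=1e-10):
    """Compress a dense tensor into tensor-train cores by successive
    truncated SVDs (the standard TT-SVD construction)."""
    dims = tensor.shape
    cores, r_prev, M = [], 1, tensor
    for k in range(len(dims) - 1):
        M = M.reshape(r_prev * dims[k], -1)
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        r = max(1, min(max_rank, int((s > tol).sum())))
        cores.append(U[:, :r].reshape(r_prev, dims[k], r))
        M = s[:r, None] * Vt[:r]
        r_prev = r
    cores.append(M.reshape(r_prev, dims[-1], 1))
    return cores

# a low-rank 4-way tensor and a reconstruction check
x = np.linspace(0, 1, 10)
T = np.einsum('i,j,k,l->ijkl', x, 1 - x, x**2, np.ones(10))
cores = tt_svd(T)
R = cores[0]
for G in cores[1:]:
    R = np.tensordot(R, G, axes=([-1], [0]))
print([c.shape for c in cores], "max error:", np.abs(R.squeeze() - T).max())
```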

【13】 Meta-learning for Matrix Factorization without Shared Rows or Columns

Authors: Tomoharu Iwata
Affiliations: NTT Communication Science Laboratories
Link: https://arxiv.org/abs/2106.15133
Abstract: We propose a method that meta-learns knowledge about matrix factorization from various matrices, and uses that knowledge to factorize unseen matrices. The proposed method uses a neural network that takes a matrix as input, and generates prior distributions of factorized matrices of the given matrix. The neural network is meta-learned such that the expected imputation error is minimized when the factorized matrices are adapted to each matrix by a maximum a posteriori (MAP) estimation. We use a gradient descent method for the MAP estimation, which enables us to backpropagate the expected imputation error through the gradient descent steps for updating neural network parameters since each gradient descent step is written in a closed form and is differentiable. The proposed method can meta-learn from matrices even when their rows and columns are not shared, and their sizes are different from each other. In our experiments with three user-item rating datasets, we demonstrate that our proposed method can impute the missing values from a limited number of observations in unseen matrices after being trained with different matrices.

【14】 Inference in Spatial Experiments with Interference using the SpatialEffect Package

Authors: Peter M. Aronow, Cyrus Samii, Jonathan Sullivan, Ye Wang
Link: https://arxiv.org/abs/2106.15081
Abstract: This paper presents methods for analyzing spatial experiments when complex spillovers, displacement effects, and other types of "interference" are present. We present a robust, design-based approach to analyzing effects in such settings. The design-based approach derives inferential properties for causal effect estimators from known features of the experimental design, in a manner analogous to inference in sample surveys. The methods presented here target a quantity of interest called the "average marginalized response," which is equal to the average effect of activating a treatment at an intervention point that is a given distance away, averaging ambient effects emanating from other intervention points. We provide a step-by-step tutorial based on the SpatialEffect package for R. We apply the methods to a randomized experiment on payments for community forest conservation in Uganda, showing how our methods reveal possibly substantial spatial spillovers that more conventional analyses cannot detect.

【15】 How to Account for Alternatives When Comparing Effects: Revisiting 'Bringing Education to Afghan Girls'

Authors: Dana Burde, Joel Middleton, Cyrus Samii, Ye Wang
Affiliations: New York University
Link: https://arxiv.org/abs/2106.15076
Abstract: This paper uses a "principal strata" approach to decompose treatment effects and interpret why a schooling intervention that yielded exceptional initial effects yielded substantially smaller effects in a replication years later. The specific application is a set of 2008 and 2015 replications of an intervention aiming to increase primary education for girls in rural Afghanistan. The intervention offers a new schooling option, and as such, its effects depend on how individuals use alternatives that already exist. The principal strata approach accounts for variation in use patterns when comparing effects across the replications. Our findings show that even though the share of girls for whom the intervention would be valuable dropped considerably in 2015 as compared to 2008, the intervention was even more efficacious for those who continued to benefit from it.

【16】 Causal Inference under Temporal and Spatial Interference

Authors: Ye Wang
Affiliations: UCSD; Wilf Family Department of Politics, New York University
Link: https://arxiv.org/abs/2106.15074
Abstract: Many social events and policies generate spillover effects in both time and space. Their occurrence influences not only the outcomes of interest in the future, but also these outcomes in nearby areas. In this paper, we propose a design-based approach to estimate the direct and indirect/spillover treatment effects of any event or policy under the assumption of sequential ignorability, when both temporal and spatial interference are allowed to be present. The proposed estimators are shown to be consistent and asymptotically Normal if the degree of interference dependence does not grow too fast relative to the sample size. The conventional difference-in-differences (DID) or two-way fixed effects model, nevertheless, leads to biased estimates in this scenario. We apply the method to examine the impact of Hong Kong's Umbrella Movement on the result of the ensuing election and how an institutional reform affects real estate assessment in New York State.

【17】 Logistic-tree normal model for microbiome compositions

Authors: Zhuoqun Wang, Jialiang Mao, Li Ma
Affiliations: Duke University, Durham, NC; LinkedIn Corporation, Sunnyvale, CA
Note: 33 pages, 7 figures
Link: https://arxiv.org/abs/2106.15051
Abstract: We introduce a probabilistic model, called the "logistic-tree normal" (LTN), for microbiome compositional data. The LTN marries two popular classes of models -- the logistic-normal (LN) and the Dirichlet-tree (DT) -- and inherits the key benefits of both. LN models are flexible in characterizing rich covariance structure among taxa but can be computationally prohibitive in face of high dimensionality (i.e., when the number of taxa is large) due to its lack of conjugacy to the multinomial sampling model. On the other hand, DT avoids this issue by decomposing the multinomial sampling model into a collection of binomials, one at each split of the phylogenetic tree of the taxa, and adopting a conjugate beta model for each binomial probability, but at the same time the DT incurs restrictive covariance among the taxa. In contrast, the LTN model decomposes the multinomial model into binomials as the DT does, but it jointly models the corresponding binomial probabilities using a (multivariate) LN distribution instead of betas. It therefore allows rich covariance structures as the LN models, while the decomposition of the multinomial likelihood allows conjugacy to be restored through the Pólya-Gamma augmentation. Accordingly, Bayesian inference on the LTN model can readily proceed by Gibbs sampling. Moreover, the multivariate Gaussian aspect of the model allows common techniques for effective inference on high-dimensional data -- such as those based on sparsity and low-rank assumptions in the covariance structure -- to be readily incorporated. Depending on the goal of the analysis, the LTN model can be used either as a standalone model or embedded into more sophisticated models. We demonstrate its use in estimating taxa covariance and in mixed-effects modeling. Finally, we carry out a case study using an LTN-based mixed-effects model to analyze a longitudinal dataset from the DIABIMMUNE project.
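The decomposition driving both DT and LTN is easy to state: each internal node of the tree contributes one binomial observation, namely (count in the left subtree, total count at the node). A small sketch with a hypothetical four-taxa tree:

```python
import numpy as np

def tree_binomials(counts, tree):
    """One binomial observation per internal node of a binary tree over
    taxa: (count in the left subtree, total count at the node). Leaves are
    integer indices into `counts`; `tree` maps node id -> (left, right)."""
    def total(node):
        if isinstance(node, int):                  # leaf
            return int(counts[node])
        left, right = tree[node]
        return total(left) + total(right)
    return {node: (total(l), total(l) + total(r))
            for node, (l, r) in tree.items()}

# hypothetical taxonomy over 4 taxa: the root splits {0,1} vs {2,3}
tree = {"root": ("A", "B"), "A": (0, 1), "B": (2, 3)}
counts = np.array([10, 5, 30, 2])
print(tree_binomials(counts, tree))
# {'root': (15, 47), 'A': (10, 15), 'B': (30, 32)}
```

DT places independent beta priors on each node's binomial probability; LTN instead models all (logit-transformed) node probabilities jointly with a multivariate normal, which is where the richer covariance comes from.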

【18】 Characterization of the Variation Spaces Corresponding to Shallow Neural Networks

Authors: Jonathan W. Siegel, Jinchao Xu
Affiliations: Department of Mathematics, Pennsylvania State University, University Park, PA
Note: arXiv admin note: substantial text overlap with arXiv:2101.12365
Link: https://arxiv.org/abs/2106.15002
Abstract: We consider the variation space corresponding to a dictionary of functions in $L^2(\Omega)$ and present the basic theory of approximation in these spaces. Specifically, we compare the definition based on integral representations with the definition in terms of convex hulls. We show that in many cases, including the dictionaries corresponding to shallow ReLU$^k$ networks and a dictionary of decaying Fourier modes, that the two definitions coincide. We also give a partial characterization of the variation space for shallow ReLU$^k$ networks and show that the variation space with respect to the dictionary of decaying Fourier modes corresponds to the Barron spectral space.

【19】 Improved Convergence Rates for the Orthogonal Greedy Algorithm

Authors: Jonathan W. Siegel, Jinchao Xu
Affiliations: Department of Mathematics, Pennsylvania State University, University Park, PA
Link: https://arxiv.org/abs/2106.15000
Abstract: We analyze the orthogonal greedy algorithm when applied to dictionaries $\mathbb{D}$ whose convex hull has small entropy. We show that if the metric entropy of the convex hull of $\mathbb{D}$ decays at a rate of $O(n^{-\frac{1}{2}-\alpha})$ for $\alpha > 0$, then the orthogonal greedy algorithm converges at the same rate. This improves upon the well-known $O(n^{-\frac{1}{2}})$ convergence rate of the orthogonal greedy algorithm in many cases, most notably for dictionaries corresponding to shallow neural networks. Finally, we show that these improved rates are sharp under the given entropy decay assumptions.
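The algorithm being analyzed is simple to state: select the dictionary element most correlated with the residual, then re-project onto the span of everything selected so far. A generic sketch over an assumed dictionary of normalized ReLU ridge functions on a grid:

```python
import numpy as np

def orthogonal_greedy(f, dictionary, n_steps=8):
    """Orthogonal greedy algorithm: repeatedly pick the dictionary element
    most correlated with the residual, then project f onto the span of all
    elements selected so far."""
    selected, residual = [], f.copy()
    for _ in range(n_steps):
        selected.append(int(np.argmax(np.abs(dictionary @ residual))))
        G = dictionary[selected].T                    # columns = chosen elements
        coef, *_ = np.linalg.lstsq(G, f, rcond=None)
        residual = f - G @ coef                       # orthogonal projection
    return selected, residual

# dictionary of normalized ReLU ridge functions on a grid; smooth target
x = np.linspace(-1, 1, 200)
biases = np.linspace(-1, 1, 50, endpoint=False)[:, None]
dictionary = np.maximum(x[None, :] - biases, 0.0)
dictionary /= np.linalg.norm(dictionary, axis=1, keepdims=True)
f = np.sin(np.pi * x)

idx, res = orthogonal_greedy(f, dictionary)
print("residual norm after 8 steps:", np.linalg.norm(res))
```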

【20】 Test-Time Adaptation to Distribution Shift by Confidence Maximization and Input Transformation

Authors: Chaithanya Kumar Mummadi, Robin Hutmacher, Kilian Rambach, Evgeny Levinkov, Thomas Brox, Jan Hendrik Metzen
Affiliations: University of Freiburg; Bosch Center for Artificial Intelligence
Note: 16 pages, 5 figures, 7 tables
Link: https://arxiv.org/abs/2106.14999
Abstract: Deep neural networks often exhibit poor performance on data that is unlikely under the train-time data distribution, for instance data affected by corruptions. Previous works demonstrate that test-time adaptation to data shift, for instance using entropy minimization, effectively improves performance on such shifted distributions. This paper focuses on the fully test-time adaptation setting, where only unlabeled data from the target distribution is required. This allows adapting arbitrary pretrained networks. Specifically, we propose a novel loss that improves test-time adaptation by addressing both premature convergence and instability of entropy minimization. This is achieved by replacing the entropy by a non-saturating surrogate and adding a diversity regularizer based on batch-wise entropy maximization that prevents convergence to trivial collapsed solutions. Moreover, we propose to prepend an input transformation module to the network that can partially undo test-time distribution shifts. Surprisingly, this preprocessing can be learned solely using the fully test-time adaptation loss in an end-to-end fashion without any target domain labels or source domain data. We show that our approach outperforms previous work in improving the robustness of publicly available pretrained image classifiers to common corruptions on such challenging benchmarks as ImageNet-C.

【21】 Sharp Lower Bounds on the Approximation Rate of Shallow Neural Networks

Authors: Jonathan W. Siegel, Jinchao Xu
Affiliations: Department of Mathematics, Pennsylvania State University, University Park, PA
Note: arXiv admin note: substantial text overlap with arXiv:2101.12365
Link: https://arxiv.org/abs/2106.14997
Abstract: We consider the approximation rates of shallow neural networks with respect to the variation norm. Upper bounds on these rates have been established for sigmoidal and ReLU activation functions, but it has remained an important open problem whether these rates are sharp. In this article, we provide a solution to this problem by proving sharp lower bounds on the approximation rates for shallow neural networks, which are obtained by lower bounding the $L^2$-metric entropy of the convex hull of the neural network basis functions. In addition, our methods also give sharp lower bounds on the Kolmogorov $n$-widths of this convex hull, which show that the variation spaces corresponding to shallow neural networks cannot be efficiently approximated by linear methods. These lower bounds apply to both sigmoidal activation functions with bounded variation and to activation functions which are a power of the ReLU. Our results also quantify how much stronger the Barron spectral norm is than the variation norm and, combined with previous results, give the asymptotics of the $L^\infty$-metric entropy up to logarithmic factors in the case of the ReLU activation function.

【22】 Fast Bayesian Variable Selection in Binomial and Negative Binomial Regression

Authors: Martin Jankowiak
Affiliations: Broad Institute, Cambridge, Massachusetts, USA
Note: 18 pages
Link: https://arxiv.org/abs/2106.14981
Abstract: Bayesian variable selection is a powerful tool for data analysis, as it offers a principled method for variable selection that accounts for prior information and uncertainty. However, wider adoption of Bayesian variable selection has been hampered by computational challenges, especially in difficult regimes with a large number of covariates or non-conjugate likelihoods. Generalized linear models for count data, which are prevalent in biology, ecology, economics, and beyond, represent an important special case. Here we introduce an efficient MCMC scheme for variable selection in binomial and negative binomial regression that exploits Tempered Gibbs Sampling (Zanella and Roberts, 2019) and that includes logistic regression as a special case. In experiments we demonstrate the effectiveness of our approach, including on cancer data with seventeen thousand covariates.

【23】 A novel approach to photon transfer conversion gain estimation

Authors: Aaron Hendrickson
Note: 122 pages, 12 figures
Link: https://arxiv.org/abs/2106.14958
Abstract: Nonuniformities in the imaging characteristics of modern image sensors are a primary factor in the push to develop a pixel-level generalization of the photon transfer characterization method. In this paper, we seek to develop a body of theoretical results leading toward a comprehensive approach for tackling the biggest obstacle in the way of this goal: a means of pixel-level conversion gain estimation. This is accomplished by developing an estimator for the reciprocal-difference of normal variances and then using this to construct a novel estimator of the conversion gain. The first two moments of this estimator are derived and used to construct exact and approximate confidence intervals for its absolute relative bias and absolute coefficient of variation, respectively. A means of approximating and computing optimal sample sizes are also discussed and used to demonstrate the process of pixel-level conversion gain estimation for a real image sensor.
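The paper's estimator is built from a reciprocal-difference of variances; the sketch below shows only the classical photon-transfer gain estimate (difference of variances over difference of means) on simulated flat-field frames, with all sensor parameters assumed:

```python
import numpy as np

rng = np.random.default_rng(8)

# one pixel at two illumination levels: Poisson electrons, conversion gain
# g (DN per electron), additive Gaussian read noise
g_true, read_sigma, n_frames = 2.0, 3.0, 200
mu_e = {"low": 500, "high": 2000}                      # mean electron counts
frames = {k: g_true * rng.poisson(m, n_frames)
             + rng.normal(0, read_sigma, n_frames)
          for k, m in mu_e.items()}

# for Poisson electrons, variance in DN = g * mean in DN + constant read
# noise, so a difference of variances over a difference of means recovers g
d_mean = frames["high"].mean() - frames["low"].mean()
d_var = frames["high"].var(ddof=1) - frames["low"].var(ddof=1)
print(f"estimated conversion gain: {d_var / d_mean:.3f} (true {g_true})")
```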

【24】 Continuous Latent Process Flows

Authors: Ruizhi Deng, Marcus A. Brubaker, Greg Mori, Andreas M. Lehrmann
Link: https://arxiv.org/abs/2106.15580
Abstract: Partial observations of continuous time-series dynamics at arbitrary time stamps exist in many disciplines. Fitting this type of data using statistical models with continuous dynamics is not only promising at an intuitive level but also has practical benefits, including the ability to generate continuous trajectories and to perform inference on previously unseen time stamps. Despite exciting progress in this area, the existing models still face challenges in terms of their representational power and the quality of their variational approximations. We tackle these challenges with continuous latent process flows (CLPF), a principled architecture decoding continuous latent processes into continuous observable processes using a time-dependent normalizing flow driven by a stochastic differential equation. To optimize our model using maximum likelihood, we propose a novel piecewise construction of a variational posterior process and derive the corresponding variational lower bound using trajectory re-weighting. Our ablation studies demonstrate the effectiveness of our contributions in various inference tasks on irregular time grids. Comparisons to state-of-the-art baselines show our model's favourable performance on both synthetic and real-world time-series data.

【25】 As easy as APC: Leveraging self-supervised learning in the context of time series classification with varying levels of sparsity and severe class imbalance

Authors: Fiorella Wever, T. Anderson Keller, Victor Garcia, Laura Symul
Affiliations: University of Amsterdam; Department of Statistics, Stanford University
Link: https://arxiv.org/abs/2106.15577
Abstract: High levels of sparsity and strong class imbalance are ubiquitous challenges that are often presented simultaneously in real-world time series data. While most methods tackle each problem separately, our proposed approach handles both in conjunction, while imposing fewer assumptions on the data. In this work, we propose leveraging a self-supervised learning method, specifically Autoregressive Predictive Coding (APC), to learn relevant hidden representations of time series data in the context of both missing data and class imbalance. We apply APC using either a GRU or GRU-D encoder on two real-world datasets, and show that applying one-step-ahead prediction with APC improves the classification results in all settings. In fact, by applying GRU-D with APC, we achieve state-of-the-art AUPRC results on the Physionet benchmark.
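A minimal PyTorch sketch of the APC idea (a GRU encoder trained for one-step-ahead prediction, whose hidden states serve as the representation); architecture sizes and training details are assumptions, and GRU-D's handling of missingness is not included:

```python
import torch
import torch.nn as nn

class APC(nn.Module):
    """Minimal autoregressive predictive coding: a GRU encoder trained to
    predict the input one step ahead."""
    def __init__(self, n_feat, n_hidden=32):
        super().__init__()
        self.gru = nn.GRU(n_feat, n_hidden, batch_first=True)
        self.head = nn.Linear(n_hidden, n_feat)

    def forward(self, x):
        h, _ = self.gru(x)       # hidden states = learned representation
        return self.head(h)

x = torch.randn(16, 50, 8)       # (batch, time, features)
model = APC(8)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(100):
    pred = model(x[:, :-1])      # predict x[t+1] from x[<= t]
    loss = ((pred - x[:, 1:]) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
print(float(loss))
```

After pretraining, the GRU hidden states (not the prediction head) would feed a downstream classifier.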

【26】 Near-Optimal Explainable k-Means for All Dimensions

Authors: Moses Charikar, Lunjia Hu
Affiliations: Stanford University
Note: 31 pages
Link: https://arxiv.org/abs/2106.15566
Abstract: Many clustering algorithms are guided by certain cost functions such as the widely-used $k$-means cost. These algorithms divide data points into clusters with often complicated boundaries, creating difficulties in explaining the clustering decision. In a recent work, Dasgupta, Frost, Moshkovitz, and Rashtchian (ICML'20) introduced explainable clustering, where the cluster boundaries are axis-parallel hyperplanes and the clustering is obtained by applying a decision tree to the data. The central question here is: how much does the explainability constraint increase the value of the cost function? Given $d$-dimensional data points, we show an efficient algorithm that finds an explainable clustering whose $k$-means cost is at most $k^{1 - 2/d}\mathrm{poly}(d\log k)$ times the minimum cost achievable by a clustering without the explainability constraint, assuming $k,d\ge 2$. Combining this with an independent work by Makarychev and Shan (ICML'21), we get an improved bound of $k^{1 - 2/d}\mathrm{polylog}(k)$, which we show is optimal for every choice of $k,d\ge 2$ up to a poly-logarithmic factor in $k$. For $d = 2$ in particular, we show an $O(\log k\log\log k)$ bound, improving exponentially over the previous best bound of $\widetilde O(k)$.
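To see the object being priced, the sketch below builds a $k$-leaf axis-parallel clustering and compares its cost to unconstrained $k$-means. Note the tree here is an ordinary CART surrogate fit to the $k$-means labels, purely for illustration; the paper (and Dasgupta et al.) construct the threshold tree with dedicated algorithms:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(9)
X = np.vstack([rng.normal(c, 0.5, (100, 2)) for c in [(0, 0), (4, 0), (2, 4)]])

k = 3
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)

# axis-parallel surrogate: a k-leaf decision tree mimicking the k-means
# labels; its leaves define the explainable clusters
tree = DecisionTreeClassifier(max_leaf_nodes=k).fit(X, km.labels_)
labels = tree.apply(X)                      # leaf id = explainable cluster

def kmeans_cost(X, labels):
    return sum(((X[labels == c] - X[labels == c].mean(0)) ** 2).sum()
               for c in np.unique(labels))

print("price of explainability (cost ratio):",
      kmeans_cost(X, labels) / km.inertia_)
```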

【27】 Learning latent causal graphs via mixture oracles

Authors: Bohdan Kivva, Goutham Rajendran, Pradeep Ravikumar, Bryon Aragam
Affiliations: University of Chicago; Carnegie Mellon University
Note: 37 pages
Link: https://arxiv.org/abs/2106.15563
Abstract: We study the problem of reconstructing a causal graphical model from data in the presence of latent variables. The main problem of interest is recovering the causal structure over the latent variables while allowing for general, potentially nonlinear dependence between the variables. In many practical problems, the dependence between raw observations (e.g. pixels in an image) is much less relevant than the dependence between certain high-level, latent features (e.g. concepts or objects), and this is the setting of interest. We provide conditions under which both the latent representations and the underlying latent causal model are identifiable by a reduction to a mixture oracle. The proof is constructive, and leads to several algorithms for explicitly reconstructing the full graphical model. We discuss efficient algorithms and provide experiments illustrating the algorithms in practice.

【28】 Damping effect in innovation processes: case studies from Twitter

Authors: Giacomo Aletti, Irene Crimaldi
Link: https://arxiv.org/abs/2106.15528
Abstract: Understanding the innovation process, that is the underlying mechanisms through which novelties emerge, diffuse and trigger further novelties is undoubtedly of fundamental importance in many areas (biology, linguistics, social science and others). The models introduced so far satisfy the Heaps' law, regarding the rate at which novelties appear, and the Zipf's law, that states a power law behavior for the frequency distribution of the elements. However, there are empirical cases far from showing a pure power law behavior and such a deviation is present for elements with high frequencies. We explain this phenomenon by means of a suitable "damping" effect in the probability of a repetition of an old element. While the proposed model is extremely general and may be also employed in other contexts, it has been tested on some Twitter data sets and demonstrated great performances with respect to Heaps' law and, above all, with respect to the fitting of the frequency-rank plots for low and high frequencies.

【29】 Generalized Power Method for Generalized Orthogonal Procrustes Problem: Global Convergence and Optimization Landscape Analysis

Authors: Shuyang Ling
Link: https://arxiv.org/abs/2106.15493
Abstract: Given a set of multiple point clouds, how to find the rigid transformations (rotation, reflection, and shifting) such that these point clouds are well aligned? This problem, known as the generalized orthogonal Procrustes problem (GOPP), plays a fundamental role in several scientific disciplines including statistics, imaging science and computer vision. Despite its tremendous practical importance, it is still a challenging computational problem due to the inherent nonconvexity. In this paper, we study the semidefinite programming (SDP) relaxation of the generalized orthogonal Procrustes problems and prove that the tightness of the SDP relaxation holds, i.e., the SDP estimator exactly equals the least squares estimator, if the signal-to-noise ratio (SNR) is relatively large. We also prove that an efficient generalized power method with a proper initialization enjoys global linear convergence to the least squares estimator. In addition, we analyze the Burer-Monteiro factorization and show the corresponding optimization landscape is free of spurious local optima if the SNR is large. This explains why first-order Riemannian gradient methods with random initializations usually produce a satisfactory solution despite the nonconvexity. One highlight of our work is that the theoretical guarantees are purely algebraic and do not require any assumptions on the statistical property of the noise. Our results partially resolve one open problem posed in [Bandeira, Khoo, Singer, 2014] on the tightness of the SDP relaxation in solving the generalized orthogonal Procrustes problem. Numerical simulations are provided to complement our theoretical analysis.
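A sketch of the alternating scheme behind generalized Procrustes alignment, where each subproblem is solved by a polar decomposition (SVD); the paper's generalized power method and SDP analysis operate on a block-matrix formulation, so the classical alternating variant below is only a stand-in:

```python
import numpy as np

rng = np.random.default_rng(10)

def polar(M):
    """Nearest orthogonal matrix to M (via SVD)."""
    U, _, Vt = np.linalg.svd(M)
    return U @ Vt

# m noisy copies of a common cloud, each under an unknown orthogonal map
n, d, m = 50, 3, 6
base = rng.standard_normal((n, d))
clouds = [base @ polar(rng.standard_normal((d, d)))
          + 0.05 * rng.standard_normal((n, d)) for _ in range(m)]

# initialize by aligning everything to the first cloud, then alternate:
# template = average of aligned clouds; each O_i = Procrustes fit to it
O_hat = [polar(A.T @ clouds[0]) for A in clouds]
for _ in range(20):
    Z = np.mean([A @ O for A, O in zip(clouds, O_hat)], axis=0)
    O_hat = [polar(A.T @ Z) for A in clouds]

aligned = [A @ O for A, O in zip(clouds, O_hat)]
Z = np.mean(aligned, axis=0)
print("mean residual:", np.mean([np.linalg.norm(B - Z) for B in aligned]))
```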

【30】 Personalized Federated Learning with Gaussian Processes

Authors: Idan Achituve, Aviv Shamsian, Aviv Navon, Gal Chechik, Ethan Fetaya
Affiliations: Bar-Ilan University, Israel; NVIDIA, Israel
Link: https://arxiv.org/abs/2106.15482
Abstract: Federated learning aims to learn a global model that performs well on client devices with limited cross-client communication. Personalized federated learning (PFL) further extends this setup to handle data heterogeneity between clients by learning personalized models. A key challenge in this setting is to learn effectively across clients even though each client has unique data that is often limited in size. Here we present pFedGP, a solution to PFL that is based on Gaussian processes (GPs) with deep kernel learning. GPs are highly expressive models that work well in the low data regime due to their Bayesian nature. However, applying GPs to PFL raises multiple challenges. Mainly, GPs performance depends heavily on access to a good kernel function, and learning a kernel requires a large training set. Therefore, we propose learning a shared kernel function across all clients, parameterized by a neural network, with a personal GP classifier for each client. We further extend pFedGP to include inducing points using two novel methods, the first helps to improve generalization in the low data regime and the second reduces the computational cost. We derive a PAC-Bayes generalization bound on novel clients and empirically show that it gives non-vacuous guarantees. Extensive experiments on standard PFL benchmarks with CIFAR-10, CIFAR-100, and CINIC-10, and on a new setup of learning under input noise show that pFedGP achieves well-calibrated predictions while significantly outperforming baseline methods, reaching up to 21% in accuracy gain.

【31】 Interactive Dimensionality Reduction for Comparative Analysis

Authors: Takanori Fujiwara, Xinhai Wei, Jian Zhao, Kwan-Liu Ma
Affiliations: University of California
Note: This manuscript is currently under review
Link: https://arxiv.org/abs/2106.15481
Abstract: Finding the similarities and differences between two or more groups of datasets is a fundamental analysis task. For high-dimensional data, dimensionality reduction (DR) methods are often used to find the characteristics of each group. However, existing DR methods provide limited capability and flexibility for such comparative analysis as each method is designed only for a narrow analysis target, such as identifying factors that most differentiate groups. In this work, we introduce an interactive DR framework where we integrate our new DR method, called ULCA (unified linear comparative analysis), with an interactive visual interface. ULCA unifies two DR schemes, discriminant analysis and contrastive learning, to support various comparative analysis tasks. To provide flexibility for comparative analysis, we develop an optimization algorithm that enables analysts to interactively refine ULCA results. Additionally, we provide an interactive visualization interface to examine ULCA results with a rich set of analysis libraries. We evaluate ULCA and the optimization algorithm to show their efficiency as well as present multiple case studies using real-world datasets to demonstrate the usefulness of our framework.

【32】 High-dimensional separability for one- and few-shot learning

Authors: Alexander N. Gorban, Bogdan Grechuk, Evgeny M. Mirkes, Sergey V. Stasenko, Ivan Y. Tyukin
Affiliations: Department of Mathematics, University of Leicester, Leicester, UK; Lobachevsky University, Nizhni Novgorod, Russia; Department of Geoscience and Petroleum, Norwegian University of Science and Technology
Link: https://arxiv.org/abs/2106.15416
Abstract: This work is driven by a practical question, corrections of Artificial Intelligence (AI) errors. Systematic re-training of a large AI system is hardly possible. To solve this problem, special external devices, correctors, are developed. They should provide quick and non-iterative system fix without modification of a legacy AI system. A common universal part of the AI corrector is a classifier that should separate undesired and erroneous behavior from normal operation. Training of such classifiers is a grand challenge at the heart of the one- and few-shot learning methods. Effectiveness of one- and few-shot methods is based on either significant dimensionality reductions or the blessing of dimensionality effects. Stochastic separability is a blessing of dimensionality phenomenon that allows one- and few-shot error correction: in high-dimensional datasets under broad assumptions each point can be separated from the rest of the set by simple and robust linear discriminant. The hierarchical structure of data universe is introduced where each data cluster has a granular internal structure, etc. New stochastic separation theorems for the data distributions with fine-grained structure are formulated and proved. Separation theorems in infinite-dimensional limits are proven under assumptions of compact embedding of patterns into data space. New multi-correctors of AI systems are presented and illustrated with examples of predicting errors and learning new classes of objects by a deep convolutional neural network.
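A quick simulation of the phenomenon the theorems formalize: in high dimension, almost every point is separable from the rest of an i.i.d. sample by a simple inner-product discriminant. The functional and threshold alpha below are one standard choice from earlier stochastic separation results, assumed here for illustration:

```python
import numpy as np

rng = np.random.default_rng(11)

def separable_fraction(X, alpha=0.9):
    """Fraction of points x separable from all others by the simple linear
    functional h(z) = <x, z>, i.e. <x, y> < alpha * <x, x> for every y != x
    (a Fisher-type discriminant used in stochastic separation theorems)."""
    G = X @ X.T
    mask = G < alpha * np.diag(G)[:, None]
    np.fill_diagonal(mask, True)
    return mask.all(axis=1).mean()

n = 1000
for d in (5, 20, 100, 500):
    X = rng.uniform(-1, 1, (n, d))          # i.i.d. points in a cube
    print(f"d={d:4d}: separable fraction = {separable_fraction(X):.3f}")
```

The fraction climbs from near zero at d=5 to essentially one at d=500, which is what makes one-shot correctors feasible.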

【33】 Online Interaction Detection for Click-Through Rate Prediction

Authors: Qiuqiang Lin, Chuanhou Gao
Affiliations: School of Mathematical Sciences, Zhejiang University
Comments: 11 pages, 4 figures, 1 supplement
Link: https://arxiv.org/abs/2106.15400
Abstract: Click-Through Rate prediction aims to predict the ratio of clicks to impressions of a specific link. This is a challenging task since (1) there are usually categorical features, and the inputs will be extremely high-dimensional if one-hot encoding is applied, (2) not only the original features but also their interactions are important, (3) an effective prediction may rely on different features and interactions in different time periods. To overcome these difficulties, we propose a new interaction detection method, named Online Random Intersection Chains (ORIC). The method, which is based on the idea of frequent itemset mining, detects informative interactions by observing the intersections of randomly chosen samples. The discovered interactions enjoy high interpretability as they can be comprehended as logical expressions. ORIC can be updated every time new data is collected, without being retrained on historical data. What's more, the importance of the historical and latest data can be controlled by a tuning parameter. A framework is designed to deal with the streaming interactions, so almost all existing models for CTR prediction can be applied after interaction detection. Empirical results demonstrate the efficiency and effectiveness of ORIC on three benchmark datasets.
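The paper's online algorithm is more elaborate, but the core intersection idea can be sketched in a few lines (all names and the toy data are hypothetical, not the authors' implementation):

```python
import random

def random_intersection_chains(samples, chain_length=5, n_chains=100):
    """Intersect the active-feature sets of randomly drawn (e.g. clicked)
    samples; feature combinations that survive many intersections are
    candidate frequent interactions. `samples` is a list of sets of
    active one-hot features."""
    counts = {}
    for _ in range(n_chains):
        chain = random.sample(samples, chain_length)
        common = set.intersection(*chain)
        if common:
            key = frozenset(common)
            counts[key] = counts.get(key, 0) + 1
    return sorted(counts.items(), key=lambda kv: -kv[1])

# Toy data: "A=1" and "B=0" co-occur in most positive samples.
data = [{"A=1", "B=0", "C=2"}, {"A=1", "B=0", "C=5"},
        {"A=1", "B=0", "C=1"}, {"A=1", "B=1", "C=2"}] * 25
print(random_intersection_chains(data)[:3])
```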

【34】 Scalable Gaussian Processes for Data-Driven Design using Big Data with Categorical Factors

Authors: Liwei Wang, Akshay Iyer, Suraj Yerramilli, Daniel Apley, Ping Zhu, Wei Chen
Affiliations: a. The State Key Laboratory of Mechanical System and Vibration, Shanghai Key Laboratory of Digital Manufacture for Thin-Walled Structures, School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai, P.R. China; b. Dept. of Mechanical Engineering
Comments: Preprint submitted to Journal of Mechanical Design
Link: https://arxiv.org/abs/2106.15356
Abstract: Scientific and engineering problems often require the use of artificial intelligence to aid understanding and the search for promising designs. While Gaussian processes (GP) stand out as easy-to-use and interpretable learners, they have difficulties in accommodating big datasets, categorical inputs, and multiple responses, which has become a common challenge for a growing number of data-driven design applications. In this paper, we propose a GP model that utilizes latent variables and functions obtained through variational inference to address the aforementioned challenges simultaneously. The method is built upon the latent variable Gaussian process (LVGP) model where categorical factors are mapped into a continuous latent space to enable GP modeling of mixed-variable datasets. By extending variational inference to LVGP models, the large training dataset is replaced by a small set of inducing points to address the scalability issue. Output response vectors are represented by a linear combination of independent latent functions, forming a flexible kernel structure to handle multiple responses that might have distinct behaviors. Comparative studies demonstrate that the proposed method scales well for large datasets with over 10^4 data points, while outperforming state-of-the-art machine learning methods without requiring much hyperparameter tuning. In addition, an interpretable latent space is obtained to draw insights into the effect of categorical factors, such as those associated with building blocks of architectures and element choices in metamaterial and materials design. Our approach is demonstrated for machine learning of ternary oxide materials and topology optimization of a multiscale compliant mechanism with aperiodic microstructures and multiple materials.
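A minimal sketch of the LVGP kernel idea the method builds on, assuming fixed 2-D latent positions for illustration (in practice these are estimated, e.g., by maximum likelihood, and the paper adds variational inducing points and multi-response latent functions on top):

```python
import numpy as np

def lvgp_kernel(x1, x2, z1, z2, lengthscale=1.0):
    """Each categorical level is mapped to a point z in a low-dimensional
    latent space, and a standard RBF kernel is evaluated on the
    concatenated (continuous, latent) coordinates."""
    u1 = np.concatenate([x1, z1])
    u2 = np.concatenate([x2, z2])
    return np.exp(-np.sum((u1 - u2) ** 2) / (2 * lengthscale ** 2))

# Hypothetical latent positions for a 3-level categorical factor.
latent = {"steel": np.array([0.0, 0.0]),
          "aluminum": np.array([0.9, 0.1]),
          "titanium": np.array([0.2, 1.1])}
k = lvgp_kernel(np.array([0.5]), np.array([0.4]),
                latent["steel"], latent["aluminum"])
print(k)
```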

【35】 Text mining and sentiment analysis of COVID-19 tweets

Authors: Qihuang Zhang, Grace Y. Yi, Li-Pang Chen, Wenqing He
Affiliations: 1. Department of Statistical and Actuarial Sciences, University of Western Ontario, Canada; 2. Department of Computer Science
Comments: 20 pages, 10 figures, 1 table
Link: https://arxiv.org/abs/2106.15354
Abstract: The human severe acute respiratory syndrome coronavirus 2 (SARS-Cov-2), causing the COVID-19 disease, has continued to spread all over the world. It menacingly affects not only public health and global economics but also mental health and mood. While the impact of the COVID-19 pandemic has been widely studied, relatively fewer discussions about the sentimental reaction of the population have been available. In this article, we scrape COVID-19 related tweets on the microblogging platform, Twitter, and examine the tweets from Feb 24, 2020 to Oct 14, 2020 in four Canadian cities (Toronto, Montreal, Vancouver, and Calgary) and four U.S. cities (New York, Los Angeles, Chicago, and Seattle). Applying the Vader and NRC approaches, we evaluate the sentiment intensity scores and visualize the information over different periods of the pandemic. Sentiment scores for the tweets concerning three anti-epidemic measures, masks, vaccine, and lockdown, are computed for comparisons. The results of four Canadian cities are compared with four cities in the United States. We study the causal relationships between the infected cases, the tweet activities, and the sentiment scores of COVID-19 related tweets, by integrating the echo state network method with convergent cross-mapping. Our analysis shows that public sentiments regarding COVID-19 vary in different time periods and locations. In general, people have a positive mood about COVID-19 and masks, but negative in the topics of vaccine and lockdown. The causal inference shows that the sentiment influences people's activities on Twitter, which is also correlated to the daily number of infections.
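The VADER scoring step can be reproduced with the publicly available vaderSentiment package; the tweets below are invented examples, not data from the study:

```python
# pip install vaderSentiment
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
tweets = [
    "Masks keep our community safe!",
    "Another lockdown... this is exhausting.",
]
for t in tweets:
    scores = analyzer.polarity_scores(t)  # keys: neg, neu, pos, compound
    print(f"{scores['compound']:+.3f}  {t}")
```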

【36】 Patch-Based Image Restoration using Expectation Propagation

Authors: Dan Yao, Stephen McLaughlin, Yoann Altmann
Affiliations: School of Engineering and Physical Sciences, Heriot-Watt University, Edinburgh, United Kingdom
Link: https://arxiv.org/abs/2106.15327
Abstract: This paper presents a new Expectation Propagation (EP) framework for image restoration using patch-based prior distributions. While Monte Carlo techniques are classically used to sample from intractable posterior distributions, they can suffer from scalability issues in high-dimensional inference problems such as image restoration. To address this issue, EP is used here to approximate the posterior distributions using products of multivariate Gaussian densities. Moreover, imposing structural constraints on the covariance matrices of these densities allows for greater scalability and distributed computation. While the method is naturally suited to handle additive Gaussian observation noise, it can also be extended to non-Gaussian noise. Experiments conducted for denoising, inpainting and deconvolution problems with Gaussian and Poisson noise illustrate the potential benefits of such a flexible approximate Bayesian method for uncertainty quantification in imaging problems, at a reduced computational cost compared to sampling techniques.
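The bookkeeping at the heart of EP, approximating a posterior by a product of Gaussian site terms whose natural parameters add, can be illustrated in one dimension (a sketch of the general mechanism only; the paper's contribution lies in the patch-based priors and structured covariances):

```python
import numpy as np

def gaussian_product(mus, sigmas2):
    """A product of Gaussian site approximations is again Gaussian:
    precisions and precision-weighted means simply add. Returns the
    (mean, variance) of the normalized product."""
    lam = 1.0 / np.asarray(sigmas2)   # precisions add
    eta = np.asarray(mus) * lam       # precision-weighted means add
    var = 1.0 / lam.sum()
    return eta.sum() * var, var

print(gaussian_product([0.0, 2.0, 1.0], [1.0, 0.5, 2.0]))
```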

【37】 Face Identification Proficiency Test Designed Using Item Response Theory

Authors: Géraldine Jeckeln, Ying Hu, Jacqueline G. Cavazos, Amy N. Yates, Carina A. Hahn, Larry Tang, Jonathon Phillips, Alice J. O'Toole
Affiliations: The University of Texas at Dallas; National Institute of Standards and Technology; University of Central Florida
Comments: 17 pages (including references), 7 figures
Link: https://arxiv.org/abs/2106.15323
Abstract: Measures of face identification proficiency are essential to ensure accurate and consistent performance by professional forensic face examiners and others who perform face identification tasks in applied scenarios. Current proficiency tests rely on static sets of stimulus items, and so, cannot be administered validly to the same individual multiple times. To create a proficiency test, a large number of items of "known" difficulty must be assembled. Multiple tests of equal difficulty can then be constructed using subsets of items. Here, we introduce a proficiency test, the Triad Identity Matching (TIM) test, based on stimulus difficulty measures from Item Response Theory (IRT). Participants view face-image "triads" (N=225) (two images of one identity and one image of a different identity) and select the different identity. In Experiment 1, university students (N=197) showed wide-ranging accuracy on the TIM test. Furthermore, IRT modeling demonstrated that the TIM test produces items of various difficulty levels. In Experiment 2, IRT-based item difficulty measures were used to partition the TIM test into three equally "easy" and three equally "difficult" subsets. Simulation results indicated that the full set, as well as curated subsets, of the TIM items yielded reliable estimates of subject ability. In summary, the TIM test can provide a starting point for developing a framework that is flexible, calibrated, and adaptive to measure proficiency across various ability levels (e.g., professionals or populations with face processing deficits).
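For readers unfamiliar with IRT, a standard item response function from this model family looks as follows (a generic two-parameter logistic curve; the paper specifies and fits its own model variant to the triad items):

```python
import numpy as np

def irt_2pl(theta, a, b):
    """Two-parameter logistic IRT model: probability that a subject of
    ability theta answers an item of difficulty b correctly, with
    discrimination parameter a."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

abilities = np.linspace(-3, 3, 7)
print(irt_2pl(abilities, a=1.5, b=0.0))  # rises with subject ability
```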

【38】 Deep Random Projection Outlyingness for Unsupervised Anomaly Detection

Authors: Martin Bauw, Santiago Velasco-Forero, Jesus Angulo, Claude Adnet, Olivier Airiau
Affiliations: Center for Mathematical Morphology, MINES ParisTech, PSL Research University, France; Thales LAS France, Advanced Radar Concepts, Limours, France
Link: https://arxiv.org/abs/2106.15307
Abstract: Random projection is a common technique for designing algorithms in a variety of areas, including information retrieval, compressive sensing and measuring of outlyingness. In this work, the original random projection outlyingness measure is modified and associated with a neural network to obtain an unsupervised anomaly detection method able to handle multimodal normality. Theoretical and experimental arguments are presented to justify the choices of the anomaly score estimator, the dimensions of the random projections, and the number of such projections. The contribution of adapted dropouts is investigated, along with the affine stability of the proposed method. The performance of the proposed neural network approach is comparable to a state-of-the-art anomaly detection method. Experiments conducted on the MNIST, Fashion-MNIST and CIFAR-10 datasets show the relevance of the proposed approach, and suggest a possible extension to a semi-supervised setup.
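The classical random projection outlyingness measure that the paper modifies can be sketched directly (a Stahel-Donoho-style score; dimensions and counts below are arbitrary):

```python
import numpy as np

def rp_outlyingness(X, x, n_proj=500, seed=0):
    """Project onto random unit directions and take the worst-case
    robust z-score of the query point x relative to the data X."""
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((n_proj, X.shape[1]))
    U /= np.linalg.norm(U, axis=1, keepdims=True)
    proj = X @ U.T                                 # (n, n_proj)
    med = np.median(proj, axis=0)
    mad = np.median(np.abs(proj - med), axis=0) + 1e-12
    return np.max(np.abs(x @ U.T - med) / mad)

X = np.random.default_rng(1).standard_normal((500, 20))
print(rp_outlyingness(X, X[0]))         # typical point: small score
print(rp_outlyingness(X, X[0] + 8.0))   # shifted point: large score
```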

【39】 US Fatal Police Shooting Analysis and Prediction

Authors: Yuan Wang, Yangxin Fan
Affiliations: University of Rochester
Link: https://arxiv.org/abs/2106.15298
Abstract: We believe that "all men are created equal". With the rise of the police shootings reported by media, more people in the U.S. think that police use excessive force during law enforcement, especially to a specific group of people. We want to apply multidimensional statistical analysis to reveal more facts than the monotone mainstream media. Our paper has three parts. First, we proposed a new method to quantify fatal police shooting news reporting deviation of mainstream media, which includes CNN, FOX, ABC, and NBC. Second, we analyzed the most comprehensive US fatal police shooting dataset from Washington Post. We used FP-growth to reveal the frequent patterns and DBSCAN clustering to find fatal shooting hotspots. We brought multi-attributes (social economics, demographics, political tendency, education, gun ownership rate, police training hours, etc.) to reveal connections under the iceberg. We found that the police shooting rate of a state depends on many variables. The top four most relevant attributes were state joined year, state land area, gun ownership rate, and violent crime rate. Third, we proposed four regression models to predict police shooting rates at the state level. The best model Kstar could predict the fatal police shooting rate with about 88.53% correlation coefficient. We also proposed classification models, including Gradient Boosting Machine, Multi-class Classifier, Logistic Regression, and Naive Bayes Classifier, to predict the race of fatal police shooting victims. Our classification models show no significant evidence to conclude that racial discrimination happened during fatal police shootings recorded by the WP dataset.
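The hotspot-detection step maps onto a standard DBSCAN call; the coordinates below are invented, and for real latitude/longitude data a haversine metric would be more appropriate than the Euclidean default:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical (latitude, longitude) incident coordinates.
coords = np.array([[34.05, -118.24], [34.06, -118.25], [34.04, -118.23],
                   [40.71, -74.00],  [40.72, -74.01],
                   [45.00, -100.00]])                   # isolated point
labels = DBSCAN(eps=0.05, min_samples=2).fit_predict(coords)
print(labels)  # two hotspot clusters; -1 marks noise (no hotspot)
```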

【40】 Anomaly Detection and Automated Labeling for Voter Registration File Changes

Authors: Sam Royston, Ben Greenberg, Omeed Tavasoli, Courtenay Cotton
Affiliations: VoteShield, Protect Democracy, New York, USA
Link: https://arxiv.org/abs/2106.15285
Abstract: Voter eligibility in United States elections is determined by a patchwork of state databases containing information about which citizens are eligible to vote. Administrators at the state and local level are faced with the exceedingly difficult task of ensuring that each of their jurisdictions is properly managed, while also monitoring for improper modifications to the database. Monitoring changes to Voter Registration Files (VRFs) is crucial, given that a malicious actor wishing to disrupt the democratic process in the US would be well-advised to manipulate the contents of these files in order to achieve their goals. In 2020, we saw election officials perform admirably when faced with administering one of the most contentious elections in US history, but much work remains to secure and monitor the election systems Americans rely on. Using data created by comparing snapshots taken of VRFs over time, we present a set of methods that make use of machine learning to ease the burden on analysts and administrators in protecting voter rolls. We first evaluate the effectiveness of multiple unsupervised anomaly detection methods in detecting VRF modifications by modeling anomalous changes as sparse additive noise. In this setting we determine that statistical models comparing administrative districts within a short time span and non-negative matrix factorization are most effective for surfacing anomalous events for review. These methods were deployed during 2019-2020 in our organization's monitoring system and were used in collaboration with the office of the Iowa Secretary of State. Additionally, we propose a newly deployed model which uses historical and demographic metadata to label the likely root cause of database modifications. We hope to use this model to predict which modifications have known causes and therefore better identify potentially anomalous modifications.
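One way to read the NMF component: districts whose change counts are poorly reconstructed by a low-rank factorization are surfaced for review. The sketch below is a hypothetical rendering of that idea, not the deployed system:

```python
import numpy as np
from sklearn.decomposition import NMF

# Hypothetical counts: rows = administrative districts, columns = types
# of VRF change in a snapshot window (adds, drops, address edits, ...).
rng = np.random.default_rng(0)
V = rng.poisson(20, size=(50, 6)).astype(float)
V[7, 3] = 400                       # one district with an unusual spike

model = NMF(n_components=3, init="nndsvda", max_iter=500)
W = model.fit_transform(V)
residual = np.linalg.norm(V - W @ model.components_, axis=1)
print(np.argsort(residual)[-3:])    # districts to surface for review
```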

【41】 Joint Majorization-Minimization for Nonnegative Matrix Factorization with the β-divergence

Authors: Arthur Marmin, José Henrique de Morais Goulart, Cédric Févotte
Link: https://arxiv.org/abs/2106.15214
Abstract: This article proposes new multiplicative updates for nonnegative matrix factorization (NMF) with the β-divergence objective function. Our new updates are derived from a joint majorization-minimization (MM) scheme, in which an auxiliary function (a tight upper bound of the objective function) is built for the two factors jointly and minimized at each iteration. This is in contrast with the classic approach in which the factors are optimized alternately and an MM scheme is applied to each factor individually. Like the classic approach, our joint MM algorithm also results in multiplicative updates that are simple to implement. They however yield a significant drop in computation time (for equally good solutions), in particular for some β-divergences of important applicative interest, such as the squared Euclidean distance and the Kullback-Leibler or Itakura-Saito divergences. We report experimental results using diverse datasets: face images, audio spectrograms, hyperspectral data and song play counts. Depending on the value of β and on the dataset, our joint MM approach yields a CPU time reduction of about 10% to 78% in comparison to the classic alternating scheme.
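For context, the classic alternating multiplicative updates for the Kullback-Leibler case (β = 1), which the paper's joint MM scheme is designed to speed up, look like this (a textbook baseline, not the new joint updates):

```python
import numpy as np

def nmf_kl_multiplicative(V, k=5, n_iter=200, seed=0):
    """Classic alternating multiplicative updates for NMF under the
    Kullback-Leibler divergence: each factor is majorized and minimized
    in turn while the other is held fixed."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k)) + 0.1
    H = rng.random((k, n)) + 0.1
    for _ in range(n_iter):
        WH = W @ H + 1e-12
        H *= (W.T @ (V / WH)) / W.sum(axis=0)[:, None]
        WH = W @ H + 1e-12
        W *= ((V / WH) @ H.T) / H.sum(axis=1)[None, :]
    return W, H

V = np.abs(np.random.default_rng(1).standard_normal((40, 30)))
W, H = nmf_kl_multiplicative(V)  # V is approximated by W @ H
```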

【42】 Optimal Rates for Random Order Online Optimization

Authors: Uri Sherman, Tomer Koren, Yishay Mansour
Link: https://arxiv.org/abs/2106.15207
Abstract: We study online convex optimization in the random order model, recently proposed by Garber et al. (2020), where the loss functions may be chosen by an adversary, but are then presented to the online algorithm in a uniformly random order. Focusing on the scenario where the cumulative loss function is (strongly) convex, yet individual loss functions are smooth but might be non-convex, we give algorithms that achieve the optimal bounds and significantly outperform the results of Garber et al. (2020), completely removing the dimension dependence and improving their scaling with respect to the strong convexity parameter. Our analysis relies on novel connections between algorithmic stability and generalization for sampling without replacement, analogous to those studied in the with-replacement i.i.d. setting, as well as on a refined average stability analysis of stochastic gradient descent.
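The random-order setting itself is easy to simulate, for instance with plain online gradient descent (this illustrates the model only; the paper's algorithms and rates are not reproduced here):

```python
import numpy as np

def random_order_ogd(grad_fns, x0, eta, seed=0):
    """Online gradient descent where the individual losses may be chosen
    adversarially but arrive in a uniformly random order."""
    rng = np.random.default_rng(seed)
    x = float(x0)
    for t in rng.permutation(len(grad_fns)):
        x -= eta * grad_fns[t](x)
    return x

# Pieces f_t(x) = 0.5 * (x - a_t)^2: each piece is smooth, and the
# cumulative loss is strongly convex with minimizer mean(a).
a = np.random.default_rng(1).standard_normal(100)
grad_fns = [lambda x, ai=ai: x - ai for ai in a]
print(random_order_ogd(grad_fns, x0=0.0, eta=0.05))  # drifts toward mean(a)
```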

【43】 INN: A Method Identifying Clean-annotated Samples via Consistency Effect in Deep Neural Networks

Authors: Dongha Kim, Yongchan Choi, Kunwoong Kim, Yongdai Kim
Affiliations: Sungsin Women's University; Seoul National University, Department of Statistics and Graduate School of Data Science
Comments: 17 pages, 9 figures
Link: https://arxiv.org/abs/2106.15185
Abstract: In many classification problems, collecting massive clean-annotated data is not easy, and thus a lot of research has been done on handling data with noisy labels. Most recent state-of-the-art solutions for noisy label problems are built on the small-loss strategy, which exploits the memorization effect. While it is a powerful tool, the memorization effect has several drawbacks. Performance is sensitive to the choice of the training epoch required for utilizing the memorization effect. In addition, when the labels are heavily contaminated or imbalanced, the memorization effect may not occur, in which case the methods based on the small-loss strategy fail to identify clean labeled data. We introduce a new method called INN (Integration with the Nearest Neighborhoods) to refine clean labeled data from training data with noisy labels. The proposed method is based on a new discovery that a prediction pattern at neighbor regions of clean labeled data is consistently different from that of noisy labeled data regardless of training epochs. The INN method requires more computation but is much more stable and powerful than the small-loss strategy. By carrying out various experiments, we demonstrate that the INN method resolves the shortcomings of the memorization effect successfully and thus is helpful for constructing more accurate deep prediction models with training data with noisy labels.
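The small-loss strategy that INN is contrasted with fits in a few lines, which also makes its fragility visible: the selection depends entirely on when the losses are snapshotted (the numbers below are hypothetical):

```python
import numpy as np

def small_loss_selection(losses, clean_ratio=0.7):
    """Small-loss baseline: treat the fraction of samples with the
    smallest current training loss as clean-labeled. Sensitive to the
    epoch at which `losses` is recorded, which is the weakness the
    neighborhood-consistency idea targets."""
    n_keep = int(len(losses) * clean_ratio)
    return np.argsort(losses)[:n_keep]           # indices deemed clean

losses = np.array([0.1, 2.3, 0.2, 1.9, 0.05, 0.3])
print(small_loss_selection(losses, clean_ratio=0.5))  # [4 0 2]
```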

【44】 Local field reconstruction from rotating coil measurements in particle accelerator magnets

Authors: Ion Gabriel Ion, Melvin Liebsch, Abele Simona, Dimitrios Loukrezis, Carlo Petrone, Stephan Russenschuck, Herbert De Gersem, Sebastian Schöps
Affiliations: Institut für Teilchenbeschleunigung und Elektromagnetische Felder, Technische Universität Darmstadt; Graduate School Computational Engineering, Technische Universität Darmstadt; European Organization for Nuclear Research (CERN), Geneva, Switzerland
Link: https://arxiv.org/abs/2106.15168
Abstract: In this paper a general approach to reconstruct three-dimensional field solutions in particle accelerator magnets from distributed magnetic measurements is presented. To exploit the locality of the measurement operation, a special discretization of the Laplace equation is used. Extracting the coefficients of the field representations yields an inverse problem which is solved by Bayesian inversion. This not only paves the way for uncertainty quantification, but also allows a suitable regularization to be derived. The approach is applied to rotating coil measurements and can be extended to any other measurement procedure.
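When the field representation is linear in its coefficients and the noise is Gaussian, the Bayesian inversion step has the familiar closed form below (a generic stand-in: in the paper, the operator comes from a local discretization of the Laplace equation and the rotating-coil measurements):

```python
import numpy as np

def bayesian_linear_inversion(A, y, noise_var, prior_prec):
    """Posterior mean and covariance of coefficients c in the Gaussian
    linear model y = A c + noise, with a Gaussian prior of precision
    matrix `prior_prec`."""
    P = A.T @ A / noise_var + prior_prec
    cov = np.linalg.inv(P)
    mean = cov @ (A.T @ y / noise_var)
    return mean, cov

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 5))            # measurement operator
c_true = rng.standard_normal(5)             # "field" coefficients
y = A @ c_true + 0.1 * rng.standard_normal(30)
mean, cov = bayesian_linear_inversion(A, y, 0.01, np.eye(5))
print(np.round(mean - c_true, 2))           # small residuals
```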

【45】 Evolving-Graph Gaussian Processes

Authors: David Blanco-Mulero, Markus Heinonen, Ville Kyrki
Affiliations: School of Electrical Engineering, Aalto University, Finland; Department of Computer Science
Comments: Accepted for publication at ICML 2021 Time Series Workshop (TSW)
Link: https://arxiv.org/abs/2106.15127
Abstract: Graph Gaussian Processes (GGPs) provide a data-efficient solution on graph structured domains. Existing approaches have focused on static structures, whereas many real graph data represent a dynamic structure, limiting the applications of GGPs. To overcome this we propose evolving-Graph Gaussian Processes (e-GGPs). The proposed method is capable of learning the transition function of graph vertices over time with a neighbourhood kernel to model the connectivity and interaction changes between vertices. We assess the performance of our method on time-series regression problems where graphs evolve over time. We demonstrate the benefits of e-GGPs over static graph Gaussian Process approaches.

【46】 Small random initialization is akin to spectral learning: Optimization and generalization guarantees for overparameterized low-rank matrix reconstruction

Authors: Dominik Stöger, Mahdi Soltanolkotabi
Affiliations: Ming Hsieh Department of Electrical and Computer Engineering, University of Southern California
Comments: 80 pages
Link: https://arxiv.org/abs/2106.15013
Abstract: Recently there has been significant theoretical progress on understanding the convergence and generalization of gradient-based methods on nonconvex losses with overparameterized models. Nevertheless, many aspects of optimization and generalization, and in particular the critical role of small random initialization, are not fully understood. In this paper, we take a step towards demystifying this role by proving that small random initialization followed by a few iterations of gradient descent behaves akin to popular spectral methods. We also show that this implicit spectral bias from small random initialization, which is provably more prominent for overparameterized models, also puts the gradient descent iterations on a particular trajectory towards solutions that are not only globally optimal but also generalize well. Concretely, we focus on the problem of reconstructing a low-rank matrix from a few measurements via a natural nonconvex formulation. In this setting, we show that the trajectory of the gradient descent iterations from small random initialization can be approximately decomposed into three phases: (I) a spectral or alignment phase, where we show that the iterates have an implicit spectral bias akin to spectral initialization, allowing us to show that at the end of this phase the column space of the iterates and the underlying low-rank matrix are sufficiently aligned, (II) a saddle avoidance/refinement phase, where we show that the trajectory of the gradient iterates moves away from certain degenerate saddle points, and (III) a local refinement phase, where we show that after avoiding the saddles the iterates converge quickly to the underlying low-rank matrix. Underlying our analysis are insights for the analysis of overparameterized nonconvex optimization schemes that may have implications for computational problems beyond low-rank reconstruction.
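The spectral effect of small random initialization can be seen in a full-observation toy (the paper studies recovery from few measurements; this simplified factorization problem, with our own step size and scaling, only illustrates the implicit bias):

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 50, 3
Xstar = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
Xstar /= np.linalg.norm(Xstar, 2)          # scale to spectral norm 1

# Overparameterized factorization X ~ U V^T with full-width factors but
# *small* random init: the early iterations grow fastest along the top
# singular directions of Xstar, i.e. they behave like a spectral method.
U = 1e-4 * rng.standard_normal((n, n))
V = 1e-4 * rng.standard_normal((n, n))
eta = 0.2
for _ in range(2000):
    R = U @ V.T - Xstar                    # residual of 0.5*||UV^T - X||_F^2
    U, V = U - eta * R @ V, V - eta * R.T @ U
print(np.linalg.norm(U @ V.T - Xstar))     # small: rank-3 Xstar recovered
```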

【47】 A Note on the Topology of the First Stage of 2SLS with Many Instruments

Authors: Guy Tchuente
Affiliations: University of Kent
Comments: 21
Link: https://arxiv.org/abs/2106.15003
Abstract: The finite sample properties of estimators are usually understood or approximated using asymptotic theories. Two main asymptotic constructions have been used to characterize the presence of many instruments. The first assumes that the number of instruments increases with the sample size. I demonstrate that in this case, one of the key assumptions used in the asymptotic construction may imply that the number of "effective" instruments should be finite, resulting in an internal contradiction. The second asymptotic representation considers that the number of instrumental variables (IVs) may be finite, infinite, or even a continuum. The number does not change with the sample size. In this scenario, the regularized estimator obtained depends on the topology imposed on the set of instruments as well as on a regularization parameter. These restrictions may induce a bias or restrict the set of admissible instruments. However, the assumptions are internally coherent. The limitations of many-IV asymptotic assumptions provide support for finite sample distributional studies to better understand the behavior of many-IV estimators.

【48】 Modularity in Reinforcement Learning via Algorithmic Independence in Credit Assignment

Authors: Michael Chang, Sidhant Kaushik, Sergey Levine, Thomas L. Griffiths
Affiliations: Equal contribution. 1. Department of Computer Science, USA; 2. Department of Computer Science, Princeton University
Comments: Long Presentation at the Thirty-eighth International Conference on Machine Learning (ICML) 2021. 21 pages, 11 figures
Link: https://arxiv.org/abs/2106.14993
Abstract: Many transfer problems require re-using previously optimal decisions for solving new tasks, which suggests the need for learning algorithms that can modify the mechanisms for choosing certain actions independently of those for choosing others. However, there is currently no formalism nor theory for how to achieve this kind of modular credit assignment. To answer this question, we define modular credit assignment as a constraint on minimizing the algorithmic mutual information among feedback signals for different decisions. We introduce what we call the modularity criterion for testing whether a learning algorithm satisfies this constraint by performing causal analysis on the algorithm itself. We generalize the recently proposed societal decision-making framework as a more granular formalism than the Markov decision process to prove that for decision sequences that do not contain cycles, certain single-step temporal difference action-value methods meet this criterion while all policy-gradient methods do not. Empirical evidence suggests that such action-value methods are more sample efficient than policy-gradient methods on transfer problems that require only sparse changes to a sequence of previously optimal decisions.

【49】 On component interactions in two-stage recommender systems

Authors: Jiri Hron, Karl Krauth, Michael I. Jordan, Niki Kilbertus
Affiliations: University of Cambridge; UC Berkeley; Helmholtz AI, Munich
Link: https://arxiv.org/abs/2106.14979
Abstract: Thanks to their scalability, two-stage recommenders are used by many of today's largest online platforms, including YouTube, LinkedIn, and Pinterest. These systems produce recommendations in two steps: (i) multiple nominators, tuned for low prediction latency, preselect a small subset of candidates from the whole item pool; (ii) a slower but more accurate ranker further narrows down the nominated items, and serves to the user. Despite their popularity, the literature on two-stage recommenders is relatively scarce, and the algorithms are often treated as the sum of their parts. Such treatment presupposes that the two-stage performance is explained by the behavior of the individual components if they were deployed independently. This is not the case: using synthetic and real-world data, we demonstrate that interactions between the ranker and the nominators substantially affect the overall performance. Motivated by these findings, we derive a generalization lower bound which shows that careful choice of each nominator's training set is sometimes the only difference between a poor and an optimal two-stage recommender. Since searching for a good choice manually is difficult, we learn one instead. In particular, using a Mixture-of-Experts approach, we train the nominators (experts) to specialize on different subsets of the item pool. This significantly improves performance.
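The two-stage architecture under discussion reduces to the following skeleton (all names, slice assignments, and the per-feature ranker weights are illustrative, not the paper's experimental setup):

```python
import numpy as np

def two_stage_recommend(user_vec, item_vecs, nominator_slices, ranker_w, k=10):
    """Fast nominators preselect a few candidates from their own slices
    of the item pool; a (here only nominally) richer ranker re-scores
    the union and serves the top-k."""
    pool = set()
    for idx in nominator_slices:
        scores = item_vecs[idx] @ user_vec            # cheap dot products
        pool.update(np.asarray(idx)[np.argsort(scores)[-5:]])
    pool = np.array(sorted(pool))
    rank_scores = item_vecs[pool] @ (user_vec * ranker_w)  # "slower" model
    return pool[np.argsort(rank_scores)[::-1][:k]]

rng = np.random.default_rng(0)
items, user = rng.standard_normal((1000, 16)), rng.standard_normal(16)
slices = np.array_split(np.arange(1000), 4)           # four nominators
print(two_stage_recommend(user, items, slices, ranker_w=rng.standard_normal(16)))
```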

【50】 Robust Distributed Optimization With Randomly Corrupted Gradients

Authors: Berkay Turan, César A. Uribe, Hoi-To Wai, Mahnoosh Alizadeh
Comments: 17 pages, 3 figures, submitted to IEEE TSP
Link: https://arxiv.org/abs/2106.14956
Abstract: In this paper, we propose a first-order distributed optimization algorithm that is provably robust to Byzantine failures, i.e., arbitrary and potentially adversarial behavior, where all the participating agents are prone to failure. We model each agent's state over time as a two-state Markov chain that indicates Byzantine or trustworthy behaviors at different time instants. We set no restrictions on the maximum number of Byzantine agents at any given time. We design our method based on three layers of defense: 1) temporal gradient averaging, 2) robust aggregation, and 3) gradient normalization. We study two settings for stochastic optimization, namely Sample Average Approximation and Stochastic Approximation, and prove that for strongly convex and smooth non-convex cost functions, our algorithm achieves order-optimal statistical error and convergence rates.
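A hedged sketch of the three defense layers for a single aggregation round, using a coordinate-wise median as the robust aggregator (the paper's exact aggregation rule and Markov-chain failure model are not reproduced here):

```python
import numpy as np

def robust_aggregate(grad_buffers, clip=5.0):
    """One round of: (1) temporal averaging of each agent's recent
    gradients, (2) robust aggregation across agents via the
    coordinate-wise median, (3) norm clipping to bound the influence of
    any single round. `grad_buffers` is a list of (window, dim) arrays,
    one buffer per agent."""
    temporal = np.stack([g.mean(axis=0) for g in grad_buffers])  # (1)
    agg = np.median(temporal, axis=0)                            # (2)
    norm = np.linalg.norm(agg)                                   # (3)
    return agg if norm <= clip else agg * (clip / norm)

rng = np.random.default_rng(0)
honest = [rng.normal(1.0, 0.1, size=(5, 4)) for _ in range(8)]
byzantine = [np.full((5, 4), -50.0) for _ in range(2)]           # attackers
print(robust_aggregate(honest + byzantine))  # median stays near honest mean
```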
