Using factor analysis for decomposition

到不了的都叫做远方
Modified 2020-04-20 10:14:55

Factor analysis is another technique we can use to reduce dimensionality. Unlike PCA, however, factor analysis makes explicit modeling assumptions about the data. The basic assumption is that a small number of implicit (latent) features are responsible for the observed features of the dataset.

This recipe boils down the explicit features of our samples, in an attempt to understand the independent variables as well as we understand the dependent variables.

Getting ready

To compare PCA and factor analysis, let's use the iris dataset again, but we'll first need to load the factor analysis class:

from sklearn.decomposition import FactorAnalysis

How to do it...

From a programming perspective, factor analysis isn't much different from PCA:

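from sklearn.datasets import load_iris

iris = load_iris()
# Keep two latent factors, matching the two-column output below
fa = FactorAnalysis(n_components=2)
iris_two_dim = fa.fit_transform(iris.data)
iris_two_dim[:5]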
array([[-1.33125848, 0.55846779],
       [-1.33914102, -0.00509715],
       [-1.40258715, -0.307983 ],
       [-1.29839497, -0.71854288],
       [-1.33587575, 0.36533259]])
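
To make the similarity at the API level concrete, here is a small side-by-side sketch. It assumes scikit-learn's PCA class and the load_iris loader, and the variable names are illustrative only. Exact values can differ between scikit-learn versions, so only the shapes are printed.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA, FactorAnalysis

iris = load_iris()

# Both estimators share the fit / transform / fit_transform interface.
pca = PCA(n_components=2)
fa = FactorAnalysis(n_components=2)

iris_pca = pca.fit_transform(iris.data)
iris_fa = fa.fit_transform(iris.data)

# Each result has one row per sample and one column per component.
print(iris.data.shape, iris_pca.shape, iris_fa.shape)
```

The two calls differ only in the class name, which is why swapping one decomposition for the other is cheap to try.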

Compare the following plot to the plot in the last section:
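
A plot along these lines could be produced with matplotlib. This is only a sketch: the plotting code, the axis labels, the title, and the output file name are assumptions, not taken from the original recipe.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove for interactive use
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import FactorAnalysis

iris = load_iris()
iris_two_dim = FactorAnalysis(n_components=2).fit_transform(iris.data)

fig, ax = plt.subplots(figsize=(6, 4))
# One scatter call per species so that each gets its own legend entry.
for label, name in enumerate(iris.target_names):
    mask = iris.target == label
    ax.scatter(iris_two_dim[mask, 0], iris_two_dim[mask, 1], label=name)
ax.set_xlabel("Factor 1")
ax.set_ylabel("Factor 2")
ax.set_title("Iris after factor analysis")
ax.legend()
fig.savefig("iris_factor_analysis.png")  # hypothetical output file name
```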

Since factor analysis is a probabilistic transform, we can examine different aspects of the fit, such as the log likelihood of the observations under the model, and, better still, compare the log likelihoods across models.

Factor analysis is not without flaws. The reason is that you're not fitting a model to predict an outcome; you're fitting a model as a preparation step. This isn't a bad thing per se, but errors made here compound when training the actual model.
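
The log likelihood comparison mentioned above can be sketched as follows. This is one possible approach, not necessarily the one the recipe has in mind: it assumes held-out log likelihood from shuffled 5-fold cross-validation is a reasonable yardstick, and the helper name mean_heldout_loglik and the candidate component counts are made up for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA, FactorAnalysis
from sklearn.model_selection import KFold, cross_val_score

X = load_iris().data
# Shuffle because the iris rows are ordered by species.
cv = KFold(n_splits=5, shuffle=True, random_state=0)


def mean_heldout_loglik(model, X):
    # With no `scoring` argument, cross_val_score falls back to model.score,
    # which for both estimators is the average log likelihood per sample.
    return float(np.mean(cross_val_score(model, X, cv=cv)))


results = {}
for k in (1, 2, 3):
    results[("FA", k)] = mean_heldout_loglik(FactorAnalysis(n_components=k), X)
    results[("PCA", k)] = mean_heldout_loglik(PCA(n_components=k), X)

for (name, k), ll in sorted(results.items()):
    print(f"{name} with {k} component(s): {ll:.3f}")
```

Higher values mean the held-out observations are more probable under the fitted model, which gives a common scale for comparing component counts or model families.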

How it works...

Factor analysis is similar to PCA, which was covered previously. However, there is an important distinction to be made. PCA is a linear transformation of the data to a different space where the first component "explains" as much of the variance of the data as possible, and each subsequent component is orthogonal to all of the preceding components.
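
The two PCA properties described above, orthogonal components and variance explained in decreasing order, can be checked numerically. The sketch below assumes the iris data and scikit-learn's PCA class; the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data
pca = PCA().fit(X)  # keep all four components

# Rows of components_ are the principal directions; their Gram matrix
# should be the identity if the directions are orthonormal.
gram = pca.components_ @ pca.components_.T
print(np.allclose(gram, np.eye(gram.shape[0])))

# Share of the total variance explained by each component, in order.
ratios = pca.explained_variance_ratio_
print(np.round(ratios, 3))
```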

For example, you can think of PCA as taking a dataset of N dimensions and projecting it down to some space of M dimensions, where M < N.

Factor analysis, on the other hand, works under the assumption that there are only M important features and that a linear combination of these features (plus noise) creates the dataset in N dimensions. To put it another way, you don't do regression on an outcome variable; you do regression on the features to determine the latent factors of the dataset.
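
The generative assumption can be illustrated by simulating data that follows it and then fitting the model. Everything in this sketch is hypothetical: the sizes, the random loadings W, and the per-feature noise variances psi are invented for illustration, and Gaussian factors and noise are assumed.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n_samples, N, M = 5000, 6, 2             # illustrative sizes, not from the text

W = rng.normal(size=(M, N))              # loadings: factors -> features
psi = rng.uniform(0.1, 0.5, size=N)      # per-feature noise variance
Z = rng.normal(size=(n_samples, M))      # latent factors
noise = rng.normal(size=(n_samples, N)) * np.sqrt(psi)
X = Z @ W + noise                        # observed N-dimensional data

fa = FactorAnalysis(n_components=M).fit(X)

# Covariance implied by the true parameters and by the fitted model.
true_cov = W.T @ W + np.diag(psi)
model_cov = fa.get_covariance()
rel_err = np.linalg.norm(model_cov - true_cov) / np.linalg.norm(true_cov)
print(f"relative covariance error: {rel_err:.3f}")
```

The comparison is made on covariances rather than on W directly because the loadings are only identifiable up to a rotation of the latent factors.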

This article is a translation of a foreign-language original.
