
Using factor analysis for decomposition

Factor analysis is another technique we can use to reduce dimensionality. However, factor analysis makes assumptions that PCA does not. The basic assumption is that there are implicit (latent) features responsible for the observed features of the dataset.


This recipe boils the explicit features of our samples down to those implicit features, in an attempt to understand the independent variables as much as the dependent variables.


Getting ready

To compare PCA and factor analysis, let's use the iris dataset again, but we'll first need to load the factor analysis class:

from sklearn.decomposition import FactorAnalysis

How to do it...

From a programming perspective, factor analysis isn't much different from PCA:

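from sklearn.datasets import load_iris

iris = load_iris()  # the same iris data used in the PCA recipe
fa = FactorAnalysis(n_components=2)  # keep two latent factors
iris_two_dim = fa.fit_transform(iris.data)
iris_two_dim[:5]  # first five rows; exact values can vary across scikit-learn versions
array([[-1.33125848, 0.55846779],
       [-1.33914102, -0.00509715],
       [-1.40258715, -0.307983 ],
       [-1.29839497, -0.71854288],
       [-1.33587575, 0.36533259]])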

Compare a scatter plot of this two-factor projection with the PCA plot from the last section.
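A minimal sketch for drawing the two projections side by side, assuming matplotlib is installed. It fits PCA and factor analysis with two components each on the iris data and colours the points by species; the file name `iris_pca_vs_fa.png` is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend, so no display is needed
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA, FactorAnalysis

iris = load_iris()

# Project the same data to two dimensions with both techniques
pca_two_dim = PCA(n_components=2).fit_transform(iris.data)
fa_two_dim = FactorAnalysis(n_components=2).fit_transform(iris.data)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
panels = [(pca_two_dim, "PCA"), (fa_two_dim, "Factor analysis")]
for ax, (coords, title) in zip(axes, panels):
    # Colour each point by its species label (0, 1 or 2)
    ax.scatter(coords[:, 0], coords[:, 1], c=iris.target, s=15)
    ax.set_title(title)
    ax.set_xlabel("component 1")
    ax.set_ylabel("component 2")

fig.tight_layout()
fig.savefig("iris_pca_vs_fa.png")  # hypothetical output file name
```

The two panels share axis labels only for convenience: a PCA component and a latent factor are not the same kind of quantity.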

Since factor analysis is a probabilistic transform, we can examine different aspects such as the log likelihood of the observations under the model, and better still, compare the log likelihoods across models.
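A sketch of both ideas, assuming a recent scikit-learn. `score_samples` returns the log-likelihood of each observation and `score` returns their mean; passing an unsupervised model to `cross_val_score` without `y` or `scoring` makes it use that same `score` method on each held-out fold. The particular candidate models are illustrative choices, not from the recipe.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA, FactorAnalysis
from sklearn.model_selection import KFold, cross_val_score

X = load_iris().data

fa = FactorAnalysis(n_components=2).fit(X)
per_sample_ll = fa.score_samples(X)  # log-likelihood of each observation
mean_ll = fa.score(X)                # average log-likelihood over all samples
print(per_sample_ll[:3], mean_ll)

# Compare models by held-out average log-likelihood (higher is better).
# Shuffle because the iris rows are ordered by species.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
candidates = {
    "FA, 1 factor": FactorAnalysis(n_components=1),
    "FA, 2 factors": FactorAnalysis(n_components=2),
    "PCA, 2 components": PCA(n_components=2),  # PCA.score uses probabilistic PCA
}
cv_means = {}
for name, model in candidates.items():
    # With y omitted and no scoring argument, each fold is scored by model.score(X_test)
    cv_means[name] = cross_val_score(model, X, cv=cv).mean()
    print(f"{name}: {cv_means[name]:.3f}")
```

Scoring on held-out folds rather than the training data keeps a model with more factors from winning simply because it has more parameters to fit.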

Factor analysis is not without flaws. One reason is that you're not fitting a model to predict an outcome; you're fitting a model as a preparation step. This isn't a bad thing per se, but errors made here compound when the actual model is trained.
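One way to see how much the preparation step costs downstream, offered here as a suggestion rather than part of the original recipe, is to cross-validate factor analysis and the final model together in a pipeline. The classifier (`LogisticRegression`) and the two-factor setting are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import FactorAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

iris = load_iris()

# Factor analysis is refit inside each training fold, so any error it
# introduces is reflected in the downstream classifier's held-out score.
fa_then_clf = make_pipeline(
    FactorAnalysis(n_components=2),
    LogisticRegression(max_iter=1000),
)
fa_scores = cross_val_score(fa_then_clf, iris.data, iris.target, cv=5)

# Same classifier on the raw features, as a point of comparison
raw_scores = cross_val_score(
    LogisticRegression(max_iter=1000), iris.data, iris.target, cv=5
)

print(f"FA + logistic regression: {fa_scores.mean():.3f}")
print(f"logistic regression only: {raw_scores.mean():.3f}")
```

Fitting factor analysis on the full dataset before splitting would leak information from the test folds into the preparation step and hide part of that compounded error.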



How it works...

Factor analysis is similar to PCA, which was covered previously. However, there is an important distinction to be made. PCA is a linear transformation of the data to a different space where the first component "explains" the largest share of the variance of the data, and each subsequent component explains as much of the remaining variance as possible while being orthogonal to all the components before it.
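A short check of both PCA properties on the iris data, assuming scikit-learn's `PCA`: `explained_variance_ratio_` is sorted from largest to smallest, and the rows of `components_` are orthonormal, so multiplying the matrix by its transpose gives the identity.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data
pca = PCA().fit(X)  # keep all four components

# Fraction of the total variance explained by each component, largest first
ratios = pca.explained_variance_ratio_
print(np.round(ratios, 3))

# Rows of components_ are unit vectors that are mutually orthogonal,
# so multiplying the matrix by its transpose gives the identity
gram = pca.components_ @ pca.components_.T
print(np.allclose(gram, np.eye(len(gram))))
```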


For example, you can think of PCA as taking a dataset of N dimensions and going down to some space of M dimensions, where M < N.
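As a concrete instance with N = 4 and M = 2, this sketch (assuming scikit-learn's `PCA` with its default `whiten=False`) shows that the projection is just centring followed by multiplication with the M x N matrix `components_`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data              # 150 samples, N = 4 features
pca = PCA(n_components=2).fit(X)  # M = 2 < N

Z = pca.transform(X)

# The same projection by hand: centre the data, then project onto the
# M principal directions stored as the rows of components_ (an M x N matrix)
Z_manual = (X - pca.mean_) @ pca.components_.T

print(X.shape, "->", Z.shape)     # (150, 4) -> (150, 2)
print(np.allclose(Z, Z_manual))   # True
```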


Factor analysis, on the other hand, works under the assumption that there are only M important (latent) features, and that a linear combination of these features, plus noise, creates the dataset in N dimensions. In other words, rather than regressing an outcome variable, you regress the features themselves to determine the latent factors of the dataset.
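To make the assumption concrete, this sketch simulates data from a hypothetical model with M = 2 latent factors and N = 6 observed features, x = h W + mu + eps (row-vector form), where h ~ N(0, I) and eps ~ N(0, diag(psi)). The sizes, loadings, and noise levels are made up for illustration. It then checks that scikit-learn's fitted `components_` and `noise_variance_` imply the covariance W^T W + diag(psi), which is what `get_covariance()` returns, and compares it with the covariance of the generating process.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n_samples, N, M = 2000, 6, 2   # illustrative sizes, not from the recipe

# Simulate the assumed generative model (row-vector form): x = h W + mu + eps
W_true = rng.normal(size=(M, N))              # factor loadings, M x N
psi_true = rng.uniform(0.1, 0.5, size=N)      # noise variance of each feature
h = rng.normal(size=(n_samples, M))           # latent factors, h ~ N(0, I)
eps = rng.normal(size=(n_samples, N)) * np.sqrt(psi_true)  # eps ~ N(0, diag(psi))
X = h @ W_true + 5.0 + eps                    # mu = 5 for every feature

fa = FactorAnalysis(n_components=M).fit(X)
print(fa.components_.shape)       # (2, 6): estimated loadings
print(fa.noise_variance_.shape)   # (6,): one noise variance per observed feature

# The fitted model implies cov(x) = W^T W + diag(psi)
model_cov = fa.components_.T @ fa.components_ + np.diag(fa.noise_variance_)
print(np.allclose(model_cov, fa.get_covariance()))

# Relative error against the covariance of the process that generated X
true_cov = W_true.T @ W_true + np.diag(psi_true)
rel_err = np.linalg.norm(model_cov - true_cov) / np.linalg.norm(true_cov)
print(f"relative covariance error: {rel_err:.3f}")
```

The loadings themselves are only identified up to a rotation, which is why the comparison is made on the implied covariance rather than on `components_` directly.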



Original author: Trent Hauck

