What is PCA?

The figures cited here come from a recommended article: A Step-by-Step Explanation of Principal Component Analysis

PCA (Principal Component Analysis) is a dimensionality-reduction method. It reduces the number of variables in a data set by using one or more components to represent the original data.

Principal components are constructed as linear combinations of the initial variables.
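As a small illustration of what "linear combination" means here, the k-th principal component can be written as a weighted sum of the standardized variables. The symbols below (d variables x_1, ..., x_d and weights w_{kj}) are notation assumed for this sketch, not taken from the original article.

```latex
\mathrm{PC}_k = w_{k1} x_1 + w_{k2} x_2 + \dots + w_{kd} x_d,
\qquad \sum_{j=1}^{d} w_{kj}^2 = 1
```

The weight vector (w_{k1}, ..., w_{kd}) is a unit-length direction in the original variable space.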

Geometrically speaking, principal components are new axes along which the projections of the data points are most spread out.

The more spread out the projection, the more variance the axis carries and the more information it keeps. This is how PCA reduces dimensionality while preserving as much information as possible.

Step 1: Standardization

This step transforms all the variables to the same scale, because PCA is quite sensitive to the variances of the initial variables: variables with larger ranges would otherwise dominate those with smaller ranges.
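A minimal sketch of standardization with NumPy, assuming the data is a 2-D array with one row per sample and one column per variable. The function name `standardize` and the toy height/weight values are hypothetical, chosen only for illustration.

```python
import numpy as np

def standardize(X):
    """Center each column to mean 0 and scale it to unit standard deviation."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)  # sample standard deviation per column
    std[std == 0] = 1.0          # constant columns become all zeros instead of NaN
    return (X - mean) / std

# Hypothetical toy data: height in cm, weight in kg (very different scales)
X = np.array([[170.0, 65.0],
              [180.0, 80.0],
              [160.0, 50.0],
              [175.0, 72.0]])
Z = standardize(X)
```

After this step every column of `Z` has mean 0 and standard deviation 1, so no variable dominates simply because of its units.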

Step 2: Compute the Covariance Matrix

This matrix reflects the relationships among all pairs of variables. High correlation between variables means they carry redundant information.
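The covariance matrix can be sketched as follows, assuming the same samples-in-rows layout. The helper name `covariance_matrix` and the toy data are assumptions for this example; NumPy's `np.cov(Z, rowvar=False)` computes the same quantity.

```python
import numpy as np

def covariance_matrix(Z):
    """Sample covariance matrix of the columns of Z (rows are samples)."""
    Z = np.asarray(Z, dtype=float)
    n = Z.shape[0]
    centered = Z - Z.mean(axis=0)          # remove each column's mean
    return centered.T @ centered / (n - 1)

# Hypothetical data in which the second column is roughly twice the first
Z = np.array([[1.0, 2.0],
              [2.0, 4.1],
              [3.0, 5.9],
              [4.0, 8.2]])
C = covariance_matrix(Z)
```

The diagonal entries of `C` are the variances of each variable, and the off-diagonal entries are the covariances between pairs of variables. A large off-diagonal value relative to the diagonal signals redundancy.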

Step 3: Compute the eigenvectors and eigenvalues of the covariance matrix

The eigenvectors of the covariance matrix give the directions of the principal components, since these are the directions along which the data has the most variance. Each eigenvalue is the amount of variance carried by the corresponding principal component.
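A minimal sketch of this step, assuming a symmetric covariance matrix `C` is already available. `np.linalg.eigh` is used because it is designed for symmetric matrices; it returns eigenvalues in ascending order, so they are re-sorted here. The function name `principal_axes` and the example matrix are hypothetical.

```python
import numpy as np

def principal_axes(C):
    """Return eigenvalues (descending) and matching eigenvectors as columns."""
    eigenvalues, eigenvectors = np.linalg.eigh(C)  # ascending order for symmetric C
    order = np.argsort(eigenvalues)[::-1]          # indices for descending order
    return eigenvalues[order], eigenvectors[:, order]

# Hypothetical 2x2 covariance matrix of two positively correlated variables
C = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigenvalues, eigenvectors = principal_axes(C)
```

Column `eigenvectors[:, 0]` is the direction of the first principal component, and `eigenvalues[0]` is the variance along it.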

Step 4: Keep p components

Rank the eigenvalues from highest to lowest. For example, PC1 may carry 95% of the variance and PC2 the remaining 5%. We can keep all the components or discard the less significant ones, keeping only the top p.
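Putting the four steps together, a hedged end-to-end sketch might look like the following. The function name `pca_project`, the synthetic data, and the choice `p=2` are all assumptions for illustration; the sketch also assumes no column is constant, so the standard deviations are non-zero.

```python
import numpy as np

def pca_project(X, p):
    """Project X onto its top-p principal components.

    Returns the projected data and the fraction of variance per component.
    """
    X = np.asarray(X, dtype=float)
    # Step 1: standardize each variable
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Step 2: covariance matrix of the standardized variables
    C = np.cov(Z, rowvar=False)
    # Step 3: eigen-decomposition, sorted from largest to smallest eigenvalue
    eigenvalues, eigenvectors = np.linalg.eigh(C)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    # Step 4: keep the top-p components and project the data onto them
    explained_ratio = eigenvalues / eigenvalues.sum()
    return Z @ eigenvectors[:, :p], explained_ratio

# Synthetic data: the second column nearly duplicates the first, the third is independent
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
X = np.hstack([base,
               2 * base + 0.1 * rng.normal(size=(200, 1)),
               rng.normal(size=(200, 1))])
scores, explained_ratio = pca_project(X, p=2)
```

`explained_ratio` lists the share of total variance carried by each component, which is one way to decide how many components to keep.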
