做PCA的函数有很多,但是一直没有搞清楚他们的差别。正好最近有看到一篇公众号在说这个事情,我顺便也总结一下。
我们在R中输入的数据类型有两类,分别为R mode和Q mode。一般来说数据每一列为一个变量(variable),每一行为一个数据(observation)。其中R mode的数据行数大于列数,是基于变量的分析;Q mode数据列数大于行数,是基于数据的分析。而OTU表一般情况下样本数小于OTU数,属于R mode型数据。
Princomp和prcomp都是R自带的stats包中的函数。Princomp只能用于R mode,它基于协方差(covariance) 或者相关矩阵(correlation) 提取的特征(eigen)并进行特征值分解。默认用法为x.princomp=princomp(x,cor = FALSE, scores = TRUE)。Cor是逻辑值的参数,默认cor = FALSE用协方差矩阵计算。cor = TRUE就会用相关矩阵计算特征值。Princomp的说明文档中推荐prcomp更好:
The calculation is done using eigen
onthe correlation or covariance matrix, as determined by cor
. This is done for compatibility with the S-PLUSresult. A preferred method of calculation is to use svd
on x
, as is done in prcomp
.
Prcomp对于R mode和Q mode都可以使用,它基于奇异值分解singular value decomposition(svd)。默认用法为x.prcomp=prcomp(x,center = TRUE,scale. = FALSE)。
Center为逻辑值,决定是否要以0为中心化,Scale为逻辑值,决定是否要以单位方差进行标准化。该函数文档中说这种方法在数值上更准确:This is generally the preferred method fornumerical accuracy。
Princomp和prcomp在算法上略有差异。除了分别为特征值分解和奇异值分解外,两者在之前计算协方差的时候标准化的过程存在差异:princomp计算时分母为N,而prcomp分母为
N-1。
Rda是vegan包的一个函数,我自己一直用的是rda这个函数来做PCA。虽然简单,但是功能强大。只输入OTU表时做PCA,如果再加上环境因子就做RDA。函数的说明文档中没有专门提做PCA时的方法。但是做RDA采用的是奇异值分解。
对一批数据进行了测试,发现三种方法解释度基本一样,princomp和prcomp的标准偏差也很相似。Princomp cor=TRUE 或FALSE对标准偏差和解释量会产生一定的影响。
#princomp, cor=FLASE >x.princomp=princomp(x,cor = FALSE, scores = TRUE) >summary.princomp=summary(x.princomp)Importance of components: Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6Standard deviation 158.0098553 39.56810555 27.9073064 25.59040875 19.16191625 17.393073476Proportion of Variance 0.8220015 0.05154597 0.0256413 0.02156049 0.01208877 0.009959944Cumulative Proportion 0.8220015 0.87354743 0.8991887 0.92074922 0.93283799 0.942797936 Comp.7 Comp.8 Comp.9 Comp.10 Standard deviation 15.600129519 14.280121953 13.789363984 12.467862969 Proportion of Variance 0.008012363 0.006713795 0.006260265 0.005117859 Cumulative Proportion 0.950810299 0.957524094 0.963784359 0.968902217 #princomp, cor=TRUE>x.princomp=princomp(x,cor = TRUE, scores = TRUE)>summary.princomp=summary(x.princomp)Importance of components: Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7Standard deviation 4.958829 1.33309878 1.00000725 0.93005754 0.8879892 0.69828170 0.61823692Proportion of Variance 0.768437 0.05553601 0.03125045 0.02703147 0.0246414 0.01523742 0.01194428Cumulative Proportion 0.768437 0.82397302 0.85522347 0.88225494 0.9068963 0.92213376 0.93407804 Comp.8 Comp.9 Comp.10 Standard deviation 0.543226379 0.508760183 0.496698409 Proportion of Variance 0.009221716 0.008088654 0.007709666 Cumulative Proportion 0.943299752 0.951388406 0.959098072 #prcomp>x.prcomp=prcomp(x,center = TRUE, scale. = FALSE)>summary.prcomp=summary(x.prcomp)Importance of components%s: PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8Standard deviation 158.023 39.57147 27.90968 25.59258 19.16354 17.39455 15.60145 14.28133Proportion of Variance 0.822 0.05155 0.02564 0.02156 0.01209 0.00996 0.00801 0.00671Cumulative Proportion 0.822 0.87355 0.89919 0.92075 0.93284 0.94280 0.95081 0.95752 PC9 PC10 Standard deviation 13.79054 12.46892 Proportion of Variance 0.00626 0.00512 Cumulative Proportion 0.96378 0.96890 #rda>x.pca = rda(x)>summary(x.pca)
#注意,这里如果先将OTU表转置一下再做princomp就会报错:
> x.princomp=princomp(t(x),cor = TRUE, scores = TRUE);x.princomp
Error in princomp.default(t(x), cor = TRUE, scores = TRUE) : 'princomp'只能在单位比变量多的情况下使用
#再次证明了princomp不能用于Q mode型数据。
参考材料:
https://stats.stackexchange.com/questions/9500/why-do-the-r-functions-princomp-and-prcomp-give-different-eigenvalues
https://mp.weixin.qq.com/s?__biz=MzU5OTM1OTk0MA==&mid=2247484166&idx=1&sn=15a0877fbbab19c75d32a849dc04ece2&chksm=feb76badc9c0e2bb3c9c6021a88bb7785c28f3d2ccd6b63426793906b6881fadb21204353b08&mpshare=1&scene=1&srcid=1130lGzuiCxmLAJlZwvFuiRw&pass_ticket=rdDMz%2Bbpbq8TMr3Q9Hn4toBgGUJVesJ1NyaSe1b2EX0%2B4yWnO2kkzUioHfREJhpj#rd
http://cache.baiducontent.com/c?m=9d78d513d98101f404bcc9214f029026475bda257a9591027ea78e55d6211e564711a2e67d650704a49421381cfc545ce8f33777370526bc8ccffc40ddbf942929&p=87769a47ca934eac5ae9ca12534dbb&newp=8b2a9753cd9b52ff57ee967a450d81231610db2151d6d5106b82c825d7331b001c3bbfb423261001d5c0786c0ba54858ebfa3470300625a3dda5c91d9fb4c574799e3c6c63&user=baidu&fm=sc&query=princomp&qid=cee4ebea00000bfe&p1=5
https://blog.csdn.net/lfz_carlos/article/details/48442091