前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >高维数据 | R语言绘图基础之主成分分析

高维数据 | R语言绘图基础之主成分分析

作者头像
黑妹的小屋
发布2020-08-06 15:05:15
1.8K0
发布2020-08-06 15:05:15
举报

高维数据可视化之主成分分析

在视觉性方面,人类普遍能够感知的是二维和三维空间。对于高维数据的可视化是将高维数据投影到二维或三维空间,去掉冗余属性,同时保留高维空间的数据和特征。说白了,高维数据的可视化就是使用降维度方法,主要分成线性和非线性两大类,关于非线性的非度量多维尺度分析NMDS见往期文章非度量多维尺度分析(NMDS),关于线性的PCA方法,见往期文章PCA做图最佳搭档-ggbiplot,本文主要针对迷弥小粉丝关于绘制线性PCA图数据处理过程遇到的问题进行记录。

01

原始数据

#迷弥小可爱提供的原始数据,为保密具体数值,此处公布随机数生成部分数据截图如下:

原始数据来看,变量“处理组”和“时间”属于中文字符型,需要将其改成英文的字符型“Treament”和“Time”,用reader时才可被RStudio识别。

02

ggbiplot绘制

#数据的导入。

library(readr)
Enzyme <- read_csv("~/Desktop/主成分分析/Enzyme.csv")
#导入
View(Enzyme)

#查看Enzyme,确保导入成功,不乱码。

#数据处理。

我们处理数据的目标是研究不同的Treament对MDA,LOX等指标的影响。

>df1<-Enzyme[,2:10]
#预处理,选取Enzyme数据框中2到10列的数据形成新的矩阵df1.
> df1
# A tibble: 36 x 9
    Time   MDA   LOX   APX   PAL   CAT   POD   PPO   TPC
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1    40    41    19    58    80 186.     40    20   965
 2    40    42    20    55   155 172.     34    37   955
 3    40    43    17    51    92 166.     30    36   937
 4    40    30    23    57   208 163.     85    23  1261
 5    40    34    24    58   180 146.     89    26  1249
 6    40    38    24    52    99 167.     81    21  1269
 7    40    40    27    55   150  70.7    81    16  1322
 8    40    30    30    54   104  56.7    81    16  1312
 9    40    43    29    55   167  73.7    82    22  1357
10    40    46    27    58    96  66.7    64    22  1080
# … with 26 more rows
>Enzyme1.pca <- prcomp(df1,scale. = TRUE)
#对给定的数据矩阵进行主成分分析,并将结果作为类prcomp的对象返回。
> Enzyme1.pca
Standard deviations (1, .., p=9):
[1] 2.0909293 1.1638511 1.0689045 0.9458037 0.7227202 0.6075112 0.5181985 0.2353166 0.1451410

Rotation (n x k) = (9 x 9):
             PC1         PC2         PC3          PC4         PC5         PC6         PC7
Time  0.46315503 -0.04916147  0.03637764  0.162115952  0.09831722 -0.05605495  0.15384446
MDA   0.43051775 -0.04773755 -0.14795093 -0.069491134 -0.36795168 -0.34716366 -0.30304858
LOX  -0.09369929  0.32940027 -0.72371726 -0.441381496 -0.08318472 -0.00732021  0.37465711
APX  -0.08716532 -0.75858481 -0.10162202 -0.059575764 -0.45541633  0.34852207  0.26596939
PAL   0.18847050 -0.30570836  0.24096297 -0.784312315  0.43453792 -0.08163434 -0.03770386
CAT  -0.37539362 -0.03778714  0.33391251 -0.026114784 -0.18066354 -0.69437542  0.46568301
POD   0.41870137  0.00242730  0.02591931  0.254736824  0.31195985  0.10630817  0.65436944
PPO   0.44279023 -0.08209132 -0.12802074 -0.008043973 -0.27235413 -0.34781106 -0.01258019
TPC  -0.18960223 -0.45795031 -0.50603522  0.299564950  0.49765782 -0.36567563 -0.16354171
              PC8         PC9
Time -0.423321246  0.73470281
MDA  -0.488029855 -0.44691662
LOX  -0.086060923  0.09687049
APX  -0.054174793  0.02299334
PAL   0.013325895 -0.02694037
CAT  -0.113141408  0.03184374
POD   0.038176838 -0.46993978
PPO   0.746837400  0.16634344
TPC  -0.009080942 -0.01764606

#ggbiplot绘图

> library(ggbiplot)
>ggbiplot(Enzyme1.pca,obs.scale = 1,var.scale = 1,
groups =Enzyme$Treament,ellipse = TRUE,circle = TRUE)
+scale_color_discrete(name="")
+theme(legend.direction = 'horizontal'
,legend.position = 'top')

03

fviz_pca_ind绘制-1

#数据处理。

df<-Enzyme[c(2,3,4,5,6,7,8,9,10)]
View(df)
> Enzyme.pca<-PCA(df,graph=FALSE)
> Enzyme.pca
**Results for the Principal Component Analysis (PCA)**
The analysis was performed on 36 individuals, described by 9 variables
*The results are available in the following objects:


   name               description                          
1  "$eig"             "eigenvalues"                        
2  "$var"             "results for the variables"          
3  "$var$coord"       "coord. for the variables"           
4  "$var$cor"         "correlations variables - dimensions"
5  "$var$cos2"        "cos2 for the variables"             
6  "$var$contrib"     "contributions of the variables"     
7  "$ind"             "results for the individuals"        
8  "$ind$coord"       "coord. for the individuals"         
9  "$ind$cos2"        "cos2 for the individuals"           
10 "$ind$contrib"     "contributions of the individuals"   
11 "$call"            "summary statistics"                 
12 "$call$centre"     "mean of the variables"              
13 "$call$ecart.type" "standard error of the variables"    
14 "$call$row.w"      "weights for the individuals"        
15 "$call$col.w"      "weights for the variables"      

#fviz_pca_ind()绘图

> library(factoextra)
> library(FactoMineR)
> fviz_pca_ind(Enzyme.pca,geom.ind="point",pointsize=3,
pointshape=21,fill.ind=Enzyme$Treament,
palette=c("#00AFBB","#E7B800","#FC4E07","blue"),
addEllipses=TRUE,legend.titl="Groups",title="")
+theme_grey()

04

fviz_pca_ind绘制-2

#数据处理。

我们处理数据的目标是研究在不同的作用时间Time对MDA,LOX等指标的影响。若直接按照fviz_pca_ind绘制-1提供的方法绘制PCA,这时候会出现以下错误:

>fviz_pca_ind(Enzyme2.pca,geom.ind="point",pointsize=4,pointshape=21,fill.ind=Enzyme$Time,palette=c("#00AFBB","#E7B800","#FC4E07"),addEllipses=TRUE,legend.titl="Groups",title="")+theme_grey()
错误: Continuous value supplied to discrete scale

这是什么原因呢???我们来查看数据框

>str(Enzyme$Time)
num [1:36] 40 40 40 40 40 40 40 40 40 40 ...
#查看数据框中某列的数值类型,你会发现是num数值型,
而fill.ind=Enzyme$Time这列数据需要因子向量型。
>as.character(Enzyme$Time)
 [1] "40"  "40"  "40"  "40"  "40"  "40"  "40"  "40"  "40"  "40"  "40"  "40"  "80"  "80"  "80" 
[16] "80"  "80"  "80"  "80"  "80"  "80"  "80"  "80"  "80"  "120" "120" "120" "120" "120" "120"
[31] "120" "120" "120" "120" "120" "120"
#将Time所在列的数据转换为字符型的因子向量。
>Enzyme$Time<-as.character(Enzyme$Time)
#将数据框中的某列的数值转换为因子向量,变量名仍为Time
>str(Enzyme$Time)
  chr [1:36] "40" "40" "40" "40" "40" "40" "40" "40" "40" "40" "40" "40" "80" "80" "80" "80" ...
#再次查看,Time的类型,发现修改成功,变成字符型chr了。

#绘图

> library(factoextra)
> library(FactoMineR)
> df2<-Enzyme[,3:10]
> df2
# A tibble: 36 x 8
     MDA   LOX   APX   PAL   CAT   POD   PPO   TPC
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1    41    19    58    80 186.     40    20   965
 2    42    20    55   155 172.     34    37   955
 3    43    17    51    92 166.     30    36   937
 4    30    23    57   208 163.     85    23  1261
 5    34    24    58   180 146.     89    26  1249
 6    38    24    52    99 167.     81    21  1269
 7    40    27    55   150  70.7    81    16  1322
 8    30    30    54   104  56.7    81    16  1312
 9    43    29    55   167  73.7    82    22  1357
10    46    27    58    96  66.7    64    22  1080
# … with 26 more rows
> Enzyme2.pca<-PCA(df2,graph=FALSE)

> Enzyme2.pca
**Results for the Principal Component Analysis (PCA)**
The analysis was performed on 36 individuals, described by 8 variables
*The results are available in the following objects:

   name               description                          
1  "$eig"             "eigenvalues"                        
2  "$var"             "results for the variables"          
3  "$var$coord"       "coord. for the variables"           
4  "$var$cor"         "correlations variables - dimensions"
5  "$var$cos2"        "cos2 for the variables"             
6  "$var$contrib"     "contributions of the variables"     
7  "$ind"             "results for the individuals"        
8  "$ind$coord"       "coord. for the individuals"         
9  "$ind$cos2"        "cos2 for the individuals"           
10 "$ind$contrib"     "contributions of the individuals"   
11 "$call"            "summary statistics"                 
12 "$call$centre"     "mean of the variables"              
13 "$call$ecart.type" "standard error of the variables"    
14 "$call$row.w"      "weights for the individuals"        
15 "$call$col.w"      "weights for the variables"  
> fviz_pca_ind(Enzyme2.pca,geom.ind="point",
pointsize=4,pointshape=21,fill.ind=Enzyme$Time,
palette=c("#00AFBB","#E7B800","#FC4E07"),
addEllipses=TRUE,legend.titl="Groups",title="")
+theme_grey()

小结

ggbiplot:是ggplot2中绘制主成分的bioplot

groups:指可选因素的变量,即观测值所属的组。

prcomp():对给定的数据矩阵进行主成分分析,并将结果作为类prcomp的对象返回。

PCA():是FactoMineR包中主成分分析函数,能够对数据进行降维处理。

fviz_pca_ind():是factoextra包中的函数,能够以散点的形式展现数据分析结果。

本文参与 腾讯云自媒体分享计划,分享自微信公众号。
原始发表:2020-04-27,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 菜鸟学数据分析之R语言 微信公众号,前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体分享计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
相关产品与服务
图数据库 KonisGraph
图数据库 KonisGraph(TencentDB for KonisGraph)是一种云端图数据库服务,基于腾讯在海量图数据上的实践经验,提供一站式海量图数据存储、管理、实时查询、计算、可视化分析能力;KonisGraph 支持属性图模型和 TinkerPop Gremlin 查询语言,能够帮助用户快速完成对图数据的建模、查询和可视化分析。
领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档