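• Reference: the WeChat official account 宇宙实验媛, article 《群体结构分析三种常用方法(下篇)》 ("Three Common Methods for Population Structure Analysis, Part 2")
Download ADMIXTURE from http://software.genetics.ucla.edu/admixture/download.html . No installation is needed; unpack the archive and it is ready to use.
The input is the VCF file obtained in this earlier post: https://www.jianshu.com/p/5938ca3b6725 . First convert it to PLINK binary format (.bed/.bim/.fam):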
plink --vcf ../KiwifruitPathogenFiltered.recode.vcf --make-bed --out KiwifruitPathogen --allow-extra-chr
The PLINK version used here is 1.9.
For K, the number of ancestral populations, I arbitrarily picked 5:
admixture --cv KiwifruitPathogen.bed 5 | tee log5.out
This run fails with an error:
**** ADMIXTURE Version 1.3.0 ****
**** Copyright 2008-2015 ****
**** David Alexander, Suyash Shringarpure, ****
**** John Novembre, Ken Lange ****
**** ****
**** Please cite our paper! ****
**** Information at www.genetics.ucla.edu/software/admixture ****
Cross-validation will be performed. Folds=5.
Random seed: 43
Point estimation method: Block relaxation algorithm
Convergence acceleration algorithm: QuasiNewton, 3 secant conditions
Point estimation will terminate when objective function delta < 0.0001
Estimation of standard errors disabled; will compute point estimates only.
Invalid chromosome code! Use integers.
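Searching for the last line of the error, "Invalid chromosome code! Use integers.", led to a fix at https://www.biostars.org/p/236704/ : one answer suggests changing the first column of the .bim file produced by the plink step to integers.
My data come from a haploid bacterium, so I set the first column to 1 on every line, using vim's bulk string replacement ( https://www.cnblogs.com/nkwy2012/p/6365714.html ):
:%s/A/B/g
This replaces every occurrence of A with B on all lines. The match is not restricted to the first column, so A has to be a pattern that appears only there.
Running the admixture command again then succeeded (log below). Unexpectedly, once the CM000 chromosome IDs in the .bim file had been replaced with the number 1, principal component analysis with smartpca also worked!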
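The same edit can also be made without opening an editor. Below is a minimal sketch, assuming awk is available and that every contig should receive the same integer code, as in the vim approach above. The helper names bim_chroms and set_bim_chrom are made up for this example and are not part of PLINK.

```shell
# Sketch only: inspect and rewrite column 1 (chromosome code) of a PLINK .bim file.
# bim_chroms and set_bim_chrom are hypothetical helper names.

# Count how many variants carry each chromosome code.
bim_chroms() {
    awk '{ print $1 }' "$1" | sort | uniq -c
}

# Overwrite column 1 with a single integer (default 1).
# The untouched file is kept next to it as <file>.orig.
set_bim_chrom() {
    bim_path=$1
    chrom_code=${2:-1}
    cp "$bim_path" "$bim_path.orig"
    awk -v c="$chrom_code" 'BEGIN { OFS = "\t" } { $1 = c; print }' \
        "$bim_path.orig" > "$bim_path"
}

# Example usage (file name taken from the plink step above):
#   bim_chroms KiwifruitPathogen.bim
#   set_bim_chrom KiwifruitPathogen.bim 1
```

Unlike a global :%s substitution, this touches only the first column, so variant IDs that happen to contain the contig name are left alone.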
**** ADMIXTURE Version 1.3.0 ****
**** Copyright 2008-2015 ****
**** David Alexander, Suyash Shringarpure, ****
**** John Novembre, Ken Lange ****
**** ****
**** Please cite our paper! ****
**** Information at www.genetics.ucla.edu/software/admixture ****
Cross-validation will be performed. Folds=5.
Random seed: 43
Point estimation method: Block relaxation algorithm
Convergence acceleration algorithm: QuasiNewton, 3 secant conditions
Point estimation will terminate when objective function delta < 0.0001
Estimation of standard errors disabled; will compute point estimates only.
Size of G: 21x22612
Performing five EM steps to prime main algorithm
1 (EM) Elapsed: 0.032 Loglikelihood: -68682.8 (delta): 609659
2 (EM) Elapsed: 0.027 Loglikelihood: -64365.8 (delta): 4316.97
3 (EM) Elapsed: 0.028 Loglikelihood: -62115.1 (delta): 2250.72
4 (EM) Elapsed: 0.027 Loglikelihood: -60418 (delta): 1697.11
5 (EM) Elapsed: 0.028 Loglikelihood: -59017.1 (delta): 1400.89
Initial loglikelihood: -59017.1
Starting main algorithm
1 (QN/Block) Elapsed: 0.117 Loglikelihood: -31056.4 (delta): 27960.7
2 (QN/Block) Elapsed: 0.117 Loglikelihood: -24590.8 (delta): 6465.61
3 (QN/Block) Elapsed: 0.15 Loglikelihood: -21524.3 (delta): 3066.49
4 (QN/Block) Elapsed: 0.134 Loglikelihood: -20417.1 (delta): 1107.22
5 (QN/Block) Elapsed: 0.136 Loglikelihood: -20102.6 (delta): 314.425
6 (QN/Block) Elapsed: 0.159 Loglikelihood: -19876 (delta): 226.682
7 (QN/Block) Elapsed: 0.147 Loglikelihood: -19726.1 (delta): 149.867
8 (QN/Block) Elapsed: 0.132 Loglikelihood: -19708.2 (delta): 17.8477
9 (QN/Block) Elapsed: 0.148 Loglikelihood: -19707.1 (delta): 1.15903
10 (QN/Block) Elapsed: 0.131 Loglikelihood: -19707.1 (delta): 0.000143719
11 (QN/Block) Elapsed: 0.139 Loglikelihood: -19707.1 (delta): 1.3288e-05
Summary:
Converged in 11 iterations (1.755 sec)
Loglikelihood: -19707.085953
Fst divergences between estimated populations:
        Pop0    Pop1    Pop2    Pop3
Pop0
Pop1    0.774
Pop2    0.746   0.829
Pop3    0.865   0.773   0.894
Pop4    0.780   0.705   0.793   0.811
CV error (K=5): 0.15979
Writing output files.
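Since K=5 was picked arbitrarily, the "CV error" line near the end of the log is the number worth comparing across runs: the ADMIXTURE manual suggests running --cv for several values of K and favoring a K with a low cross-validation error. Below is a minimal sketch, assuming admixture is on PATH and each run is logged to logK.out as above. The function names scan_k and cv_table are made up for this example.

```shell
# Sketch only: run ADMIXTURE cross-validation over a range of K, then tabulate CV errors.
# scan_k and cv_table are hypothetical helper names.

# Run `admixture --cv` on the .bed file $1 for K = $2 .. $3,
# saving each run's log as logK.out in the current directory.
scan_k() {
    bed_path=$1
    for k in $(seq "$2" "$3"); do
        admixture --cv "$bed_path" "$k" | tee "log${k}.out"
    done
}

# Print "K<TAB>CV error" for the given log files, lowest CV error first.
# Parses lines of the form: CV error (K=5): 0.15979
cv_table() {
    awk '/^CV error/ { k = $3; gsub(/[^0-9]/, "", k); print k "\t" $4 }' "$@" |
        LC_ALL=C sort -k2,2n
}

# Example usage:
#   scan_k KiwifruitPathogen.bed 2 6
#   cv_table log*.out
```

Applied to log5.out from the run above, cv_table would print a single row with K=5 and 0.15979; the comparison only becomes meaningful once logs for other K values exist.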
How to interpret these results, though, still calls for a careful read of the literature!
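As a first step toward reading the output, the ancestry fractions that ADMIXTURE writes to the .Q file (here KiwifruitPathogen.5.Q, one row per sample in .fam order, one column per cluster) can be paired with sample IDs. Below is a minimal sketch, assuming the .fam and .Q files come from the same run. The helper name top_cluster is made up for this example, and the cluster number it reports is simply the 1-based .Q column index.

```shell
# Sketch only: for each sample, report the .Q column with the largest ancestry fraction.
# top_cluster is a hypothetical helper name.
# Output columns: sample ID, 1-based .Q column index, ancestry fraction in that column.
top_cluster() {
    fam_path=$1
    q_path=$2
    # Column 2 of .fam is the individual ID; .Q rows follow the same order.
    awk '{ print $2 }' "$fam_path" | paste - "$q_path" |
        awk '{
            best = 2                          # field 1 is the sample ID
            for (i = 3; i <= NF; i++)
                if ($i + 0 > $best + 0) best = i
            print $1 "\t" (best - 1) "\t" $best
        }'
}

# Example usage:
#   top_cluster KiwifruitPathogen.fam KiwifruitPathogen.5.Q
```

This gives only a hard assignment; samples whose largest fraction is well below 1 are the admixed ones that deserve a closer look.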