PRSice Usage Documentation (English Version)

生信与临床
Published 2020-08-27 16:57:43

usage: Rscript PRSice.R [options] <-b base_file> <-t target_file> <--prsice prsice_location>

Required:
    --prsice                Location of the PRSice binary
    --dir                   Location to install ggplot. Only required if
                            ggplot is not installed

Base File:
    --base-info             Base INFO score filtering. Format should be
                            <Column name>:<Threshold>. SNPs with info
                            score less than <Threshold> will be ignored
                            Column name default: INFO
                            Threshold default: 0.9
    --base-maf              Base MAF filtering. Format should be
                            <Column name>:<Threshold>. SNPs with maf
                            less than <Threshold> will be ignored. An
                            additional column can also be added (e.g.
                            also filter MAF for cases), using the
                            following format:
                            <Column name>:<Threshold>,<Column name>:<Threshold>
    --beta                  Whether the test statistic is in the form of
                            BETA or OR. If set, test statistic is assumed
                            to be in the form of BETA. Mutually exclusive
                            with --or
    --bp                    Column header containing the SNP coordinate
                            Default: BP
    --chr                   Column header containing the chromosome
                            Default: CHR
    --index                 If set, assume the INDEX instead of NAME for
                            the corresponding columns are provided. Index
                            should be 0-based (start counting from 0)
    --no-default            Remove all default options. If set, PRSice
                            will not set any default column name and you
                            must manually provide all required columns
                            (--snp, --stat, --A1, --pvalue)
    --or                    Whether the test statistic is in the form of
                            BETA or OR. If set, test statistic is assumed
                            to be in the form of OR. Mutually exclusive
                            with --beta
    --pvalue | -p           Column header containing the p-value
                            Default: P
    --snp                   Column header containing the SNP ID
                            Default: SNP
    --stat                  Column header containing the summary statistic
                            If --beta is set, default as BETA. Otherwise,
                            will search for OR or BETA from the header
                            of the base file

Target File:
    --binary-target         Indicate whether the target phenotype is
                            binary or not. Either T or F should be
                            provided where T represents a binary
                            phenotype. For multiple phenotypes, the input
                            should be separated by comma without space.
                            Default: T if --beta and F if --beta is not
    --info                  Filter SNPs based on info score. Only used
                            for imputed target
    --keep                  File containing the sample(s) to be extracted
                            from the target file. First column should be
                            FID and the second column should be IID. If
                            --ignore-fid is set, first column should be
                            IID
                            Mutually exclusive with --remove
    --maf                   Filter SNPs based on minor allele frequency
                            (MAF)
    --nonfounders           Keep the nonfounders in the analysis
                            Note: They will still be excluded from LD
                            calculation
    --pheno | -f            Phenotype file containing the phenotype(s).
                            First column must be FID of the samples and
                            the second column must be IID of the samples.
                            When --ignore-fid is set, first column must
                            be the IID of the samples.
                            Must contain a header if --pheno-col is
                            specified
    --pheno-col | -F        Headers of phenotypes to be included from the
                            phenotype file
    --prevalence | -k       Prevalence of all binary traits. If provided,
                            will adjust the ascertainment bias of the R2.
                            Note that when multiple binary traits are
                            found, prevalence information must be
                            provided for all of them
    --remove                File containing the sample(s) to be removed
                            from the target file. First column should be
                            FID and the second column should be IID. If
                            --ignore-fid is set, first column should be
                            IID
                            Mutually exclusive with --keep
    --target | -t           Target genotype file. Currently supports both
                            BGEN and binary PLINK format. For multiple
                            chromosome input, simply substitute the
                            chromosome number with #. PRSice will
                            automatically replace # with 1-22
                            For binary plink format, you can also specify
                            a separate fam file by <prefix>,<fam file>
    --target-list           File containing prefix of target genotype
                            files. Similar to --target but allows more
                            flexibility. Does not support external fam
                            file at the moment
    --type                  File type of the target file. Supports bed
                            (binary plink) and bgen format. Default: bed

Dosage:
    --allow-inter           Allow the generation of intermediate files.
                            This will speed up PRSice when using dosage
                            data as clumping reference and for hard
                            coding PRS calculation
    --dose-thres            Translate any SNPs with highest genotype
                            probability less than this threshold to
                            missing call
    --hard-thres            A hardcall is saved when the distance to the
                            nearest hardcall is less than the hardcall
                            threshold. Otherwise a missing code is saved
                            Default is: 0.1
    --hard                  Use hard coding instead of dosage for PRS
                            construction. Default is to use dosage
                            instead of hard coding

Clumping:
    --clump-kb              The distance for clumping in kb
                            Default: 250kb (1mb for PRSet)
    --clump-r2              The R2 threshold for clumping
                            Default: 0.1
    --clump-p               The p-value threshold used for clumping.
                            Default: 1
    --ld | -L               LD reference file. Used for LD calculation.
                            If not provided, will use the post-filtered
                            target genotype for LD calculation. Supports
                            multiple chromosome input
                            Please see --target for more information
    --ld-dose-thres         Translate any SNPs with highest genotype
                            probability less than this threshold to
                            missing call
    --ld-geno               Filter SNPs based on genotype missingness
    --ld-hard-thres         A hardcall is saved when the distance to the
                            nearest hardcall is less than the hardcall
                            threshold. Otherwise a missing code is saved
                            Default is: 0.1
    --ld-info               Filter SNPs based on info score. Only used
                            for imputed LD reference
    --ld-keep               File containing the sample(s) to be extracted
                            from the LD reference file. First column
                            should be FID and the second column should be
                            IID. If --ignore-fid is set, first column
                            should be IID
                            Mutually exclusive with --ld-remove
                            No effect if --ld was not provided
    --ld-list               File containing prefix of LD reference files.
                            Similar to --ld but allows more flexibility.
                            Does not support external fam file at the
                            moment
    --ld-maf                Filter SNPs based on minor allele frequency
    --ld-remove             File containing the sample(s) to be removed
                            from the LD reference file. First column
                            should be FID and the second column should be
                            IID. If --ignore-fid is set, first column
                            should be IID
                            Mutually exclusive with --ld-keep
    --ld-type               File type of the LD file. Supports bed
                            (binary plink) and bgen format. Default: bed
    --no-clump              Stop PRSice from performing clumping
    --proxy                 Proxy threshold for index SNP to be
                            considered as part of the region represented
                            by the clumped SNP(s). e.g. --proxy 0.8 means
                            the index SNP will represent the region of
                            any clumped SNP(s) that has R2>=0.8 even if
                            the index SNP is not physically located
                            within the region

Covariate:
    --cov | -C              Covariate file. First column should be FID
                            and the second column should be IID. If
                            --ignore-fid is set, first column should be
                            IID
    --cov-col | -c          Header of covariates. If not provided, will
                            use all variables in the covariate file. By
                            adding @ in front of the string, any numbers
                            within [ and ] will be parsed. E.g. @PC[1-3]
                            will be read as PC1,PC2,PC3. Discontinuous
                            input is also supported: @cov[1.3-5] will be
                            parsed as cov1,cov3,cov4,cov5
    --cov-factor            Header of categorical covariate(s). Dummy
                            variables will be automatically generated.
                            Any items in --cov-factor must also be found
                            in --cov-col
                            Also accepts continuous input (start with @).

P-value Thresholding:
    --bar-levels            Level of barchart to be plotted. When
                            --fastscore is set, PRSice will only
                            calculate the PRS for thresholds within the
                            bar level. Levels should be comma separated
                            without space
    --fastscore             Only calculate thresholds stated in
                            --bar-levels
    --no-full               By default, PRSice will include the full
                            model, i.e. p-value threshold = 1. Setting
                            this flag will disable that behaviour
    --interval | -i         The step size of the threshold.
                            Default: 0.00005
    --lower | -l            The starting p-value threshold.
                            Default: 5e-8
    --model                 Genetic model used for regression. The
                            genetic encoding is based on the base data
                            where the encoding represents the number of
                            the coding allele
                            Available models include:
                            add - Additive model, code as 0/1/2 (default)
                            dom - Dominant model, code as 0/1/1
                            rec - Recessive model, code as 0/0/1
                            het - Heterozygous only model, code as 0/1/0
    --missing               Method to handle missing genotypes. By
                            default, final scores are averages of valid
                            per-allele scores, with missing genotypes
                            contributing an amount proportional to
                            imputed allele frequency. To throw out
                            missing observations instead (decreasing the
                            denominator in the final average when this
                            happens), use the 'SET_ZERO' modifier.
                            Alternatively, you can use the 'CENTER'
                            modifier to shift all scores to mean zero.
    --no-regress            Do not perform the regression analysis and
                            simply output all PRS.
    --score                 Method to calculate the polygenic score.
                            Available methods include:
                            avg     - Take the average effect size (default)
                            std     - Standardize the effect size
                            con-std - Standardize the effect size using
                                      mean and sd derived from control
                                      samples
                            sum     - Direct summation of the effect size
    --upper | -u            The final p-value threshold. Default: 0.5

PRSet:
    --background            String to indicate a background file. This
                            string should have the format of Name:Type
                            where type can be
                            bed   - 0-based range with 3 columns.
                                    Chr Start End
                            range - 1-based range with 3 columns.
                                    Chr Start End
                            gene  - A file containing a column of gene
                                    names
    --bed | -B              Bed file containing the selected regions.
                            Name of bed file will be used as the region
                            identifier. WARNING: Bed file is 0-based
    --feature               Feature(s) to be included from the gtf file.
                            Default: exon,CDS,gene,protein_coding.
    --full-back             Use the whole genome as background for
                            competitive p-value calculation
    --gtf | -g              GTF file containing gene boundaries. Required
                            when --msigdb is used
    --msigdb | -m           MSIGDB file containing the pathway
                            information. Requires the gtf file
    --snp-set               Provide a SNP set file containing the SNP
                            set(s). Two different file formats are
                            allowed:
                            SNP list format - A file containing a single
                                              column of SNP ID. Name of
                                              the set will be the file
                                              name or can be provided
                                              using --snp-set File:Name
                            MSigDB format   - Each row represents a
                                              single SNP set with the
                                              first column containing
                                              the name of the SNP set.
    --wind-3                Add N base(s) to the 3' region of each
                            feature(s)
    --wind-5                Add N base(s) to the 5' region of each
                            feature(s)

Plotting:
    --bar-col-high          Colour of the most predicting threshold
                            Default: firebrick
    --bar-col-lower         Colour of the poorest predicting threshold
                            Default: dodgerblue
    --bar-col-p             Change the colour of bar to p-value threshold
                            instead of the association with phenotype
    --bar-palatte           Colour palette to be used for bar plotting
                            when --bar-col-p is set. Default: YlOrRd
    --device                Select different plotting devices. You can
                            choose any plotting devices supported by
                            base R. Default: png
    --multi-plot            Plot the top N phenotype / gene set in a
                            summary plot
    --plot                  When set, will only perform plotting.
    --plot-set              Define the gene set to be plotted.
                            Default: Base
    --quantile | -q         Number of quantiles to plot. No quantile plot
                            will be generated when this is not provided.
    --quant-break           Quantile groupings for plotting the strata
                            plot
    --quant-extract | -e    File containing sample IDs to be plotted on a
                            separate quantile, e.g. an extra quantile
                            containing only schizophrenia samples. Must
                            contain IID. Should contain FID if
                            --ignore-fid isn't set.
    --quant-ref             Reference quantile for quantile plot
    --scatter-r2            y-axis of the high resolution scatter plot
                            should be R2

Misc:
    --all-score             Output PRS for ALL thresholds. WARNING: This
                            will generate a huge file
    --chr-id                Try to construct an RS ID for SNP based on
                            its chromosome, coordinate, effective allele
                            and non-effective allele.
                            e.g. c:L-aBd is translated to:
                            <chr>:<coordinate>-<effective><noneffective>d
                            This is always true for target file, whereas
                            for base file, this is only used if the RS ID
                            wasn't provided
    --exclude               File containing SNPs to be excluded from the
                            analysis
    --extract               File containing SNPs to be included in the
                            analysis
    --id-delim              This parameter causes sample IDs to be parsed
                            as <FID><delimiter><IID>; the default
                            delimiter is '_'.
    --ignore-fid            Ignore FID for all input. When this is set,
                            the first column of all files will be assumed
                            to be IID instead of FID
    --keep-ambig            Keep ambiguous SNPs. Only use this option if
                            you are certain that the base and target have
                            the same A1 and A2 alleles
    --logit-perm            When performing permutation, still use
                            logistic regression instead of linear
                            regression. This will substantially slow
                            down PRSice
    --memory                Maximum memory usage allowed (in Mb). PRSice
                            will try its best to honor this setting
    --non-cumulate          Calculate non-cumulative PRS. PRS will be
                            reset to 0 for each new P-value threshold
                            instead of adding up
    --out | -o              Prefix for all file output
    --perm                  Number of permutations to perform. This will
                            generate the empirical p-value. Recommend
                            using a value larger than 10,000
    --print-snp             Print all SNPs that remain in the analysis
                            after clumping is performed. For PRSet, Y
                            indicates the SNP falls within the gene set
                            of interest and N otherwise. If only PRSice
                            is performed, a single "gene set" called
                            "Base" will be presented with all entries
                            marked as Y
    --seed | -s             Seed used for permutation. If not provided,
                            system time will be used as seed. When the
                            same seed and same input are provided, the
                            same result can be generated
    --thread | -n           Number of threads to use
    --use-ref-maf           When specified, missingness imputation will
                            be performed based on the reference samples
    --ultra                 Ultra aggressive memory usage. When this is
                            enabled, PRSice and PRSet will try to load
                            all genotypes into memory after clumping is
                            performed. This should drastically speed up
                            PRSice and PRSet at the expense of higher
                            memory consumption. Has no effect for dosage
                            score
    --x-range               Range of SNPs to be excluded from the whole
                            analysis. It can either be a single bed file
                            or a comma separated list of ranges. Range
                            must be in the format of chr:start-end or
                            chr:coordinate
    --help | -h             Display this help message
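The `@` shorthand described under --cov-col can be easier to follow with a small worked example. The Python sketch below is not PRSice's own parser; it only mimics the two expansions quoted in the help text (`@PC[1-3]` and `@cov[1.3-5]`). The function name `expand_cov_col` is hypothetical, and two behaviours are assumptions rather than documented facts: each item holds at most one bracket group, and an `@` item without brackets is returned with the `@` stripped.

```python
import re
from typing import List


def expand_cov_col(spec: str) -> List[str]:
    """Expand a --cov-col style string into individual column names.

    Illustrative sketch only: assumes at most one [...] group per item.
    """
    names: List[str] = []
    for item in spec.split(","):
        if not item.startswith("@"):
            # Plain header name, used as-is.
            names.append(item)
            continue
        body = item[1:]
        match = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", body)
        if match is None:
            # Assumption: "@name" without brackets is just "name".
            names.append(body)
            continue
        prefix, ranges, suffix = match.groups()
        # Inside the brackets, "." separates entries and "-" marks a range.
        for part in ranges.split("."):
            if "-" in part:
                start, end = (int(x) for x in part.split("-"))
            else:
                start = end = int(part)
            names.extend(f"{prefix}{i}{suffix}" for i in range(start, end + 1))
    return names


print(expand_cov_col("@PC[1-3]"))
print(expand_cov_col("@cov[1.3-5]"))
```

Under these assumptions, the two calls reproduce the expansions given in the help text: PC1, PC2, PC3 and cov1, cov3, cov4, cov5.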

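Originally published 2020-08-25 on a WeChat official account and syndicated through the Tencent Cloud self-media program. For infringement concerns, contact cloudcommunity@tencent.com for removal.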

This article is shared from the 生信与临床 WeChat official account.
