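Using promor, an integrated R package for proteomics analysis. Let's start by simply running the example from start to finish and see what it gives.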
01
1. Build the object. Two tables are needed: the protein expression matrix and the sample information table (the experimental design).
library(promor)
raw_2 <- create_df(
  prot_groups = "https://raw.githubusercontent.com/caranathunge/promor_example_data/main/st.txt",
  exp_design = "https://raw.githubusercontent.com/caranathunge/promor_example_data/main/ed1.txt",
  input_type = "standard"
)
## 0 empty row(s) removed.
## 0 empty column(s) removed.
## Zeros have been replaced with NAs.
## Data have been log-transformed.
head(raw_2)
##                                                                 H_1      H_2
## A0AV96;B7Z8Z7;A0AV96-2;D6R9D6                              27.31346 27.39656
## A0AVT1;A0AVT1-2                                            25.04756 25.46382
## H7BXI1;A0FGR8-6;A0FGR8-2;A0FGR8;C9JGI7;A0FGR8-4            25.74894 25.71981
## A0JLT2;A0JLT2-2                                            22.32174 23.72161
## A0MZ66-4;A0MZ66;A0MZ66-3;B7Z7Z9;A0MZ66-5;A0MZ66-6;A0MZ66-2 23.93796 23.78596
## A1A528;O43264                                              25.76873 25.10205
##                                                                 H_3      L_1
## A0AV96;B7Z8Z7;A0AV96-2;D6R9D6                              27.06825 27.59676
## A0AVT1;A0AVT1-2                                            24.56093 26.10937
## H7BXI1;A0FGR8-6;A0FGR8-2;A0FGR8;C9JGI7;A0FGR8-4            25.80931 26.35760
## A0JLT2;A0JLT2-2                                            23.09950 21.20967
## A0MZ66-4;A0MZ66;A0MZ66-3;B7Z7Z9;A0MZ66-5;A0MZ66-6;A0MZ66-2 23.45826 24.21727
## A1A528;O43264                                              24.34333 24.70962
##                                                                 L_2      L_3
## A0AV96;B7Z8Z7;A0AV96-2;D6R9D6                              27.86850 27.95160
## A0AVT1;A0AVT1-2                                            26.10129 25.79853
## H7BXI1;A0FGR8-6;A0FGR8-2;A0FGR8;C9JGI7;A0FGR8-4            26.59747 26.45245
## A0JLT2;A0JLT2-2                                            23.76203 22.60479
## A0MZ66-4;A0MZ66;A0MZ66-3;B7Z7Z9;A0MZ66-5;A0MZ66-6;A0MZ66-2 24.95323 24.20725
## A1A528;O43264
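2. Filter out proteins with too many missing values
# A protein is removed if more than 40% of its values are missing in any one group.
# This is a fairly strict QC criterion.
raw_filtered <- filterbygroup_na(raw_2, set_na = 0.4)
## 224 proteins with higher than 40% NAs in at least one group removed.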
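To make the group-wise rule concrete, here is a minimal base R sketch of the same idea on a toy matrix. It is not promor's implementation: the matrix, the `P1`..`P4` names, and the trick of reading the group from the column-name prefix are all assumptions made for illustration.

```r
# Toy log-intensity matrix: 4 proteins x 6 samples (3 per group, H and L).
mat <- matrix(
  c(25, 26, 25, 24, 25, 26,   # P1: complete
    NA, NA, 23, 22, 23, 22,   # P2: 2 of 3 missing in group H
    21, NA, 22, 21, 22, NA,   # P3: 1 of 3 missing in each group
    20, 21, 20, NA, NA, NA),  # P4: entirely missing in group L
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("P", 1:4),
                  c("H_1", "H_2", "H_3", "L_1", "L_2", "L_3"))
)

# Group labels taken from the column-name prefix (an assumption of this sketch).
groups <- sub("_.*$", "", colnames(mat))

# Fraction of NAs per protein within each group: a proteins x groups matrix.
na_frac <- sapply(unique(groups), function(g) {
  rowMeans(is.na(mat[, groups == g, drop = FALSE]))
})

# Keep a protein only if no group exceeds the allowed NA fraction.
set_na <- 0.4
keep <- apply(na_frac, 1, max) <= set_na
filtered <- mat[keep, , drop = FALSE]
rownames(filtered)
```

With three replicates per group, a cutoff of 0.4 tolerates one missing value per group (1/3) but not two (2/3), which is why this setting is on the strict side.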
3. Visualize the distribution of missing values
heatmap_na(raw_filtered, palette = "mako")
4. Impute missing values
The default method is "minProb", which is what the call below uses. Other available methods: "minDet", "RF", "kNN", and "SVD".
imp_df_mp <- impute_na(raw_filtered, seed = 327)
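For intuition about the minimum-based methods, here is a rough base R sketch of deterministic minimum imputation: each missing value in a sample is replaced by a low quantile of that sample's observed values. The function `min_det_impute()` and the choice `q = 0.01` are hypothetical and only illustrate the idea; the details of what promor does for "minDet" or "minProb" may differ.

```r
# Replace NAs in each column (sample) with a low quantile of its observed values.
# min_det_impute() is a hypothetical helper, not part of promor.
min_det_impute <- function(x, q = 0.01) {
  for (j in seq_len(ncol(x))) {
    low <- quantile(x[, j], probs = q, na.rm = TRUE)
    x[is.na(x[, j]), j] <- low
  }
  x
}

# Toy matrix with a few missing values (made-up numbers).
toy <- cbind(
  S1 = c(25.1, NA, 23.4, 22.0, 26.3),
  S2 = c(24.8, 21.9, NA, NA, 25.7)
)
toy_imp <- min_det_impute(toy)
toy_imp
```

The assumption behind this family of methods is that values are missing because the protein fell below the detection limit, so a small value is a reasonable stand-in.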
5. Visualize the imputed data
impute_plot(original = raw_filtered, imputed = imp_df_mp, n_row = 3, n_col = 3, palette = "mako")
## Warning: Removed 324 rows containing non-finite values (`stat_density()`).
6. Normalize the data (this uses normalizeBetweenArrays() from the limma package)
norm_df <- normalize_data(imp_df_mp)
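normalizeBetweenArrays() supports several methods, quantile normalization among them. As a rough illustration of what quantile normalization does, here is a base R sketch. It assumes a complete matrix with no NAs and breaks ties by order of appearance, so it is not equivalent to limma's implementation; `quantile_normalize()` is a hypothetical helper.

```r
# Quantile normalization sketch: force every column to share the same
# distribution while keeping the within-column ordering of values.
quantile_normalize <- function(x) {
  # Rank of each value within its column (ties broken by order of appearance).
  ranks <- apply(x, 2, rank, ties.method = "first")
  # Mean of the k-th smallest value across columns, for each k.
  sorted_means <- rowMeans(apply(x, 2, sort))
  # Replace each value by the mean that corresponds to its rank.
  out <- apply(ranks, 2, function(r) sorted_means[r])
  dimnames(out) <- dimnames(x)
  out
}

# Toy matrix: 4 proteins x 3 samples (made-up numbers).
x <- cbind(S1 = c(5, 2, 3, 4), S2 = c(4, 1, 4.5, 2), S3 = c(3, 4, 6, 8))
xn <- quantile_normalize(x)
xn
```

After this step every column of `xn` holds the same set of values, only in a different order, which is why the box plots of normalized samples line up.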
7. Box plots of the data before and after normalization
norm_plot(original = imp_df_mp, normalized = norm_df, palette = "mako")
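8. Identify differentially expressed proteins
(Differential expression is computed with the limma package. The default cutoffs are BH-adjusted P < 0.05 and abs(logFC) > 1.) You can, of course, set your own cutoffs.
fit_df <- find_dep(imp_df_mp)
## 1294 siginificantly differentially expressed proteins found.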
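To show how the two cutoffs combine, here is a small base R sketch on made-up numbers. It uses only p.adjust() from the stats package. The data frame `res` and its column names are assumptions for illustration, not the structure of the object that find_dep() returns.

```r
# Toy results for 6 proteins (made-up numbers, not from the example data).
res <- data.frame(
  protein = paste0("P", 1:6),
  logFC   = c(2.1, -1.5, 0.4, 1.2, -0.2, -3.0),
  P.Value = c(0.001, 0.004, 0.010, 0.030, 0.600, 0.0005)
)

# Benjamini-Hochberg adjustment with base R's p.adjust().
res$adj.P.Val <- p.adjust(res$P.Value, method = "BH")

# Apply both cutoffs; change these two values to set your own thresholds.
cutoff <- 0.05
lfc <- 1
res$significant <- res$adj.P.Val < cutoff & abs(res$logFC) > lfc

res$protein[res$significant]
```

Here P3 passes the adjusted P-value cutoff but is dropped because its fold change is too small, which is exactly the kind of protein that sits in the lower middle of a volcano plot.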
9. Volcano plot
volcano_plot(fit_df, text_size = 5, palette = "mako")
10. Heatmap of the differentially expressed proteins