Q: Constructing a simple data frame (transition from the DADA2 pipeline to phyloseq)

Stack Overflow user

Asked on 2020-12-19 17:05:58

Answers: 1 · Views: 129 · Followers: 0 · Votes: 0

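I successfully worked through the DADA2 tutorial (https://benjjneb.github.io/dada2/tutorial.html) with my own data, but I'm having trouble with the hand-off to phyloseq. I need to construct a simple data.frame from the information encoded in the file names. This is the code provided in the tutorial: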

#Make a data.frame holding the sample data
samples.out <- rownames(seqtab.nochim)
subject <- sapply(strsplit(samples.out, "D"), `[`, 1)
gender <- substr(subject,1,1)
subject <- substr(subject,2,999)
day <- as.integer(sapply(strsplit(samples.out, "D"), `[`, 2))
samdf <- data.frame(Subject=subject, Gender=gender, Day=day)
samdf$When <- "Early"
samdf$When[samdf$Day>100] <- "Late"
rownames(samdf) <- samples.out
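To see what these tutorial lines actually extract, the block below runs them unchanged on a few made-up sample names in the tutorial's style (gender letter, subject number, "D", day). The names are hypothetical and only stand in for rownames(seqtab.nochim); this is an illustration, not real data.

```r
# Hypothetical sample names in the tutorial's "F3D0" style
# (stand-in for rownames(seqtab.nochim); not real data)
samples.out <- c("F3D0", "F3D141", "M2D8", "M4D145")

# Split each name at "D" and keep the piece before it ("F3", "M2", ...)
subject <- sapply(strsplit(samples.out, "D"), `[`, 1)
gender  <- substr(subject, 1, 1)    # first character: "F" or "M"
subject <- substr(subject, 2, 999)  # everything after it: the subject number
# The piece after "D" is the sampling day, converted to an integer
day <- as.integer(sapply(strsplit(samples.out, "D"), `[`, 2))

samdf <- data.frame(Subject = subject, Gender = gender, Day = day)
samdf$When <- "Early"
samdf$When[samdf$Day > 100] <- "Late"
rownames(samdf) <- samples.out
print(samdf)
```

The parsing only works because "D" appears exactly once in each tutorial name; names without a "D", or with more than one, would split differently, which is why these lines do not transfer to arbitrary sample names.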

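Mine should be simpler than this, because time is not a factor in my design. I only have six treatment groups.

This is what I have been trying to work out: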

#Make a data.frame holding the sample data
samples.out <- rownames(seqtab.nochim)

#create vector with the treatments
trtmt <- c("EM", "EP", "EM", "AR37", "NEA2", "AR1", "AR37", "NEA2", "EP", "NEA2", "EP", "EM", "AR37", "EP", "NEA2", "Ctrl", "Ctrl", "AR37", "EP", "AR37", "AR37", "EP", "AR1", "AR1", "EP", "EM", "EM", "AR37", "AR1", "EM", "AR37", "NEA2", "AR1", "Ctrl", "EP", "Ctrl", "EP", "AR37", "AR37")

#Add a new column to the samples.out dataframe 
samples.out_2 <- samples.out
samples.out_2 <- cbind(samples.out, new_col = trtmt)

#Rename columns
colnames(samples.out_2)[colnames(samples.out_2) == "samples.out"] <- "Sample"
colnames(samples.out_2)[colnames(samples.out_2) == "new_col"] <- "Treatment"

#Head of my samples.out_2 data frame (I have a total of 39 samples and 6 treatment groups)
Sample Treatment
193    EM
194    EP
196    EM
197    AR37
198    NEA2

#Still stuck with how to make this relevant to my metadata!
sample <- sapply(strsplit(samples.out_2, "D"), `[`, 1) #what does the "D" mean (I think it has to do with the mouse dataset used in the tutorial)? However, I am not sure what I need to pull from my data.frame. Also, what does '[' mean? I know the meanings of operators like [], (), etc., but not of a single one in quotes.
treatment <- substr(sample,1,39) #I don't understand what I am trying to extract or change
sample <- substr(sample,2,999) #I don't understand what I am trying to extract or change
samdf <- data.frame(Sample=sample, Treatment=treatment)
rownames(samdf) <- samples.out
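On the two questions in the comments: in the tutorial, "D" is simply the separator in sample names such as F3D0 (it precedes the day number), and `[` is R's ordinary extraction operator written as a function name. The backquotes let it be passed to sapply(), so sapply(x, `[`, 1) calls `[`(element, 1), i.e. element[1], on every element of the list. A minimal illustration with made-up strings:

```r
# strsplit() returns a list: one character vector per input string
parts <- strsplit(c("F3D0", "M2D8"), "D")
str(parts)  # a list of two character vectors: c("F3", "0") and c("M2", "8")

# `[` is the function behind x[i]; these two calls are equivalent
x <- c("F3", "0")
x[1]
`[`(x, 1)

# sapply() applies `[` with the extra argument 1 to each list element,
# returning the first piece of every split name
firsts <- sapply(parts, `[`, 1)

# A quoted name also works, because sapply() resolves it with match.fun()
firsts_quoted <- sapply(parts, "[", 1)
print(firsts)
```

Since sample names like 193 or 194 contain no "D" and encode no treatment, there is nothing for these lines to extract from them.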

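If anyone has worked through this tutorial with their own data and understands this conversion, I would greatly appreciate your insight. Thanks!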

phyloseq

1 Answer

Stack Overflow user

Answered on 2021-04-15 13:19:20

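You want to create a data frame holding your metadata in an object named samdf (as in the tutorial). In the tutorial, the sample metadata is encoded in the file names, which does not seem to be the case for your data.

For example, the first tutorial sample, F3D0, breaks down as: Gender (F), Subject (3), Day (D0).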

The tutorial lines that define Subject, Gender, and Day are therefore not relevant to your data:

subject <- sapply(strsplit(samples.out, "D"), `[`, 1) # define subject as beginning of the filename string up to D
gender <- substr(subject,1,1) #gets first letter for the gender
subject <- substr(subject,2,999) #remove gender to actually get the subject number
day <- as.integer(sapply(strsplit(samples.out, "D"), `[`, 2)) #the part after "D" is the day, converted to an integer

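Only two lines carry over to your case: the one that creates the data frame with your metadata (samdf <- data.frame(...)) and the one that gives it the same row names as seqtab.nochim (rownames(samdf) <- samples.out), so that you can build the phyloseq object further down the pipeline. Make sure samdf and seqtab.nochim have the same number of rows: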

isTRUE(dim(seqtab.nochim)[1] == dim(samdf)[1]) #should be true
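For a treatment-only design like the one in the question, samdf needs just one column plus row names matching rownames(seqtab.nochim). The sketch below assumes a tiny made-up matrix in place of the real DADA2 sequence table, and uses the five sample/treatment pairs shown in the question's output. The lookup vector trtmt_by_sample is a hypothetical name; keying treatments by sample name, rather than trusting that trtmt is in the same order as the table rows, is a design choice of this sketch, not something the tutorial requires.

```r
# Made-up stand-in for the DADA2 sequence table: samples in rows, ASVs in columns
seqtab.nochim <- matrix(c(10L, 0L, 5L, 3L, 7L,
                          1L, 4L, 0L, 8L, 2L),
                        nrow = 5,
                        dimnames = list(c("193", "194", "196", "197", "198"),
                                        c("ASV1", "ASV2")))

# Hypothetical lookup: treatment for each sample, keyed by sample name
trtmt_by_sample <- c("193" = "EM", "194" = "EP", "196" = "EM",
                     "197" = "AR37", "198" = "NEA2")

samples.out <- rownames(seqtab.nochim)
# Look up each row's treatment by name, so the lookup's order
# does not have to match the order of the sequence table
samdf <- data.frame(Treatment = unname(trtmt_by_sample[samples.out]))
rownames(samdf) <- samples.out

# Row counts agree, every sample found a treatment, and row names line up
stopifnot(nrow(samdf) == nrow(seqtab.nochim),
          !anyNA(samdf$Treatment),
          identical(rownames(samdf), rownames(seqtab.nochim)))
print(samdf)
```

With the real data the lookup would cover all 39 samples, for example via setNames(trtmt, samples.out) once you have confirmed that trtmt follows the row order, and samdf could then be wrapped with phyloseq's sample_data() as in the tutorial.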

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/65368230
