Filtering read_lines output?

Stack Overflow user
Asked 2019-03-21 14:44:29
2 answers, 90 views, 0 followers, 0 votes

Given the following read_lines output:

c("# This data file generated by ffffff at: Wed Jan 13 11:57:32 2011", 
"#", "# This file contains raw genotype data, including data that is not used in ffffff reports.", 
"# This data has undergone a general quality review however only a subset of markers have been ", 
"# individually validated for accuracy. As such, this data is suitable only for research, ", 
"# educational, and informational use and not for medical or other use.", 
"# ", "# Below is a text version of your data.  Fields are TAB-separated", 
"# Each line corresponds to a single SNP.  For each SNP, we provide its identifier ", 
"# (an rsid or an internal id), its location on the reference human genome, and the ", 
"# genotype call oriented with respect to the plus strand on the human reference sequence.", 
"# We are using reference human assembly build 37 (also known as Annotation Release 104).", 
"# Note that it is possible that data downloaded at different times may be different due to ongoing ", 
"# improvements in our ability to call genotypes. More information about these changes can be found at:", 
"# fffffffff", 
"# ", "# More information on reference human assembly builds:", 
"# ffffffffffffffff", 
"#", "# rsid\tchromosome\tposition\tgenotype", "rs548049170\t1\t69869\tTT", 
"rs13328684\t1\t74792\t--", "rs9283150\t1\t565508\tAA", "i713426\t1\t726912\t--", 
"rs116587930\t1\t727841\tGG", "rs3131972\t1\t752721\tAG", "rs12184325\t1\t754105\tCC", 
"rs12567639\t1\t756268\tAA", "rs114525117\t1\t759036\tGG", "rs12124819\t1\t776546\tAA", 
"rs12127425\t1\t794332\tGG", "rs79373928\t1\t801536\tTT", "rs72888853\t1\t815421\t--", 
"rs7538305\t1\t824398\tAC", "rs28444699\t1\t830181\tAA", "i713449\t1\t830731\t--", 
"rs116452738\t1\t834830\tGG", "rs72631887\t1\t835092\tTT", "rs28678693\t1\t838665\tTT", 
"rs4970382\t1\t840753\tCC", "rs4475691\t1\t846808\tCC", "rs72631889\t1\t851390\tGG", 
"rs7537756\t1\t854250\tAA", "rs13302982\t1\t861808\tGG", "rs376747791\t1\t863130\tAA", 
"rs2880024\t1\t866893\tCC", "rs13302914\t1\t868404\tTT", "rs76723341\t1\t872952\tCC", 
"rs2272757\t1\t881627\tAA", "rs35471880\t1\t881918\tGG")

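I want to parse this with read_csv, but first I need to filter out all the leading lines that start with #.

Please suggest how to parse the file starting from the first line that does not begin with #.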


2 Answers

Stack Overflow user

Accepted answer

Posted 2019-03-21 14:48:26

Your file appears to be a tab-separated dataset whose comment lines are marked with #. I suggest

readr::read_tsv("your_file", comment="#")

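You will probably also need col_names=FALSE, because it looks like your header line is commented out as well (which is awkward; ideally this would be fixed upstream). With col_names=FALSE, readr names the columns X1, X2, and so on.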
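As a minimal sketch of the suggestion above, the commented-out header can be replaced by passing the column names explicitly instead of using col_names=FALSE. The sample text is a shortened excerpt of the question's data, and `sample_text` and `snps` are illustrative names. The column types are an assumption: chromosome is kept as character in case non-numeric labels appear outside the excerpt. `I()` marks the string as literal data and assumes readr 2.0 or later; with a real file, pass the path instead.

```r
library(readr)

# Shortened excerpt of the question's data: "#" comment lines,
# a commented-out header, then tab-separated rows.
sample_text <- paste(
  "# This file contains raw genotype data",
  "# rsid\tchromosome\tposition\tgenotype",
  "rs548049170\t1\t69869\tTT",
  "rs13328684\t1\t74792\t--",
  "i713426\t1\t726912\t--",
  sep = "\n"
)

# I() tells readr this is literal data, not a file path (readr >= 2.0).
snps <- read_tsv(
  I(sample_text),
  comment = "#",                                   # drop every "#" line
  col_names = c("rsid", "chromosome", "position", "genotype"),
  col_types = "ccic"  # assumed types: character, character, integer, character
)

print(snps)
```

Because the header line starts with #, it is discarded along with the other comments, so the names have to come from col_names.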

Votes: 2

Stack Overflow user

Posted 2019-03-21 14:50:04

You can do this:

df <- readr::read_tsv("input", comment = "#", col_names = FALSE)

Edit:

You can also filter the lines you have already read:

# dt: the read_lines() vector; I() marks it as literal data (readr >= 2.0)
df <- readr::read_tsv(I(dt[!grepl("^#", dt)]), col_names = FALSE)
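If the lines are already in memory, the column names can also be recovered from the comments rather than discarded. The sketch below assumes that the last line starting with # is the header row, which holds for the excerpt in the question but is not guaranteed in general. `lines`, `header`, and `snps` are illustrative names, and `I()` assumes readr 2.0 or later.

```r
library(readr)

# Stand-in for the read_lines() result (shortened excerpt of the question's vector).
lines <- c(
  "# This file contains raw genotype data",
  "#",
  "# rsid\tchromosome\tposition\tgenotype",
  "rs548049170\t1\t69869\tTT",
  "rs13328684\t1\t74792\t--",
  "rs9283150\t1\t565508\tAA"
)

is_comment <- startsWith(lines, "#")

# Assumption: the last comment line holds the column names.
header_line <- tail(lines[is_comment], 1)
# Strip the leading "#" and whitespace, then split on tabs.
header <- strsplit(sub("^#\\s*", "", header_line), "\t", fixed = TRUE)[[1]]

# Parse only the data lines; read every column as character to keep this simple.
snps <- read_tsv(
  I(lines[!is_comment]),
  col_names = header,
  col_types = cols(.default = col_character())
)

print(snps)
```

Logical indexing keeps duplicate data lines intact, whereas a set operation such as setdiff() would silently drop repeated rows.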
Votes: 1
Original content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/55283070
