
Shift values in a column based on the next value and fill the empty entries

Stack Overflow user
Asked on 2020-04-29 17:05:08
2 answers, 55 views, 0 followers, 1 vote

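I have a data-wrangling problem that I don't know how to solve. I have a dataframe in which the values of one column have all been shifted up, so that column is not completely filled. I need to shift those values down and repeat each one for X rows, where X depends on how many rows of data the other columns hold.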

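Edit: I changed the way the data is displayed. I had previously posted it as a Markdown table, which led people astray. Sorry about that. The data I am working with looks like this: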

code    IdGene  Type    COGgene PosLeft postRight   Strand  Function
1
    1075082 CDS ROG0189 93  710 +   NA
8
    1075089 CDS COG0226 5632    6741    +   [P] ABC-type phosphate transport system, periplasmic component
    1075103 CDS NA  6796    7869    +   NA
9
    1075105 CDS NA  8075    8923    +   NA
    1075096 CDS ROG0189 8983    10149   +   NA
    1071820 CDS NA  10181   10723   +   NA
10
    1071880 CDS COG0642 10893   13316   +   [T] Signal transduction histidine kinase
    1072052 CDS COG2204 13288   14586   +   [T] Response regulator containing CheY-like receiver, AAA-type
12
    1075092 CDS NA  15525   16472   +   NA
13
    1075087 CDS NA  16655   17371   +   NA
    1074837 CDS NA  17383   17703   +   NA
    1071956 CDS NA  17710   18168   +   NA
14
    1071684 CDS NA  18251   18919   -   NA
15
    1075519 CDS ROG5478 19044   19334   +   NA
27
    1075067 CDS ROG8331 35989   36417   +   NA
    1075056 CDS COG2244 36478   38019   +   [R] Membrane protein involved in the export
    1075546 CDS COG1035 38016   39218   +   [C] Coenzyme F420-reducing hydrogenase, beta subunit
    1074004 CDS ROG1263 39215   40375   +   NA
    1075083 CDS COG1701 40406   40582   +   [S] Uncharacterized protein conserved in archaea
    1075068 CDS COG0463 40593   41537   +   [M] Glycosyltransferases involved in cell wall biogenesis
    1075064 CDS ROG2632 41534   42700   +   NA
    1075066 CDS COG0463 42724   43656   +   [M] Glycosyltransferases involved in cell wall biogenesis
    1075069 CDS COG1215 43671   44066   +   [M] Glycosyltransferases, probably involved in cell wall

I need to transform it into:

code    IdGene  Type    COGgene PosLeft postRight   Strand  Function
1   1075082 CDS ROG0189 93  710 +   NA
8   1075089 CDS COG0226 5632    6741    +   [P] ABC-type phosphate transport system, periplasmic component
8   1075103 CDS NA  6796    7869    +   NA
9   1075105 CDS NA  8075    8923    +   NA
9   1075096 CDS ROG0189 8983    10149   +   NA
9   1071820 CDS NA  10181   10723   +   NA
10  1071880 CDS COG0642 10893   13316   +   [T] Signal transduction histidine kinase
10  1072052 CDS COG2204 13288   14586   +   [T] Response regulator containing CheY-like receiver, AAA-type
12  1075092 CDS NA  15525   16472   +   NA
13  1075087 CDS NA  16655   17371   +   NA
13  1074837 CDS NA  17383   17703   +   NA
13  1071956 CDS NA  17710   18168   +   NA
14  1071684 CDS NA  18251   18919   -   NA
15  1075519 CDS ROG5478 19044   19334   +   NA
27  1075067 CDS ROG8331 35989   36417   +   NA
27  1075056 CDS COG2244 36478   38019   +   [R] Membrane protein involved in the export
27  1075546 CDS COG1035 38016   39218   +   [C] Coenzyme F420-reducing hydrogenase, beta subunit
27  1074004 CDS ROG1263 39215   40375   +   NA
27  1075083 CDS COG1701 40406   40582   +   [S] Uncharacterized protein conserved in archaea
27  1075068 CDS COG0463 40593   41537   +   [M] Glycosyltransferases involved in cell wall biogenesis
27  1075064 CDS ROG2632 41534   42700   +   NA
27  1075066 CDS COG0463 42724   43656   +   [M] Glycosyltransferases involved in cell wall biogenesis
27  1075069 CDS COG1215 43671   44066   +   [M] Glycosyltransferases, probably involved in cell wall

Any suggestions on how to solve this would be great. Ideally in R, but awk or something else is fine too.


2 Answers

Stack Overflow user

Answered on 2020-04-29 17:17:11

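If you are fine with the formatting of the output (in terms of column spacing), you could try the following in awk, assuming you are reading the data from Input_file.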

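awk '
# Appears to target the earlier pipe-delimited (Markdown table) input,
# where line 2 is the separator row and "|" counts as a field.
BEGIN{
  OFS="\t"              # rebuilt lines are joined with tabs
}
FNR==1 || FNR==2{       # first two lines: print unchanged
  print
  next
}
$2~/[0-9]+/{            # second field contains a digit: remember it, skip the line
  value=$2
  next
}
{
  $2=value"    | "      # otherwise overwrite field 2 with the remembered code
}
1                       # print the (possibly modified) line
'  Input_file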
Votes: 2

Stack Overflow user

Answered on 2020-04-29 17:21:22

This gives the expected result:

awk -F '|' '1*$2{id=$2;next}NR<3||sub(/\s+/,id)' input

With the input data in f:

$ awk -F '|' '1*$2{id=$2;next}NR<3||sub(/\s+/,id)' f
| code | IdGene  | Type | COGgene | PosLeft | postRight | Strand | Function |
|------|---------|------|---------|---------|-----------|--------|----------|
| 1    | 1075082 | CDS  | ROG0189 | 93      | 710       | +      | NA       |
| 2    | 1075099 | CDS  | NA      | 783     | 1778      | +      | NA       |
| 3    | 1073305 | CDS  | NA      | 1872    | 2648      | +      | NA       |
| 4    | 1075537 | CDS  | NA      | 2783    | 3451      | +      | NA       |
| 4    | 1074931 | CDS  | COG0186 | 3460    | 3996      | +      | KO       |
| 5    | 1075097 | CDS  | NA      | 4088    | 4534      | +      | NA       |
| 5    | 1074010 | CDS  | NA      | 4457    | 4849      | -      | NA       |
| 5    | 1075093 | CDS  | ROG5695 | 4958    | 5503      | +      | NA       |
| 5    | 1075089 | CDS  | COG0226 | 5632    | 6741      | +      | KO       |
| 5    | 1075103 | CDS  | NA      | 6796    | 7869      | +      | NA       |
| 5    | 1075105 | CDS  | NA      | 8075    | 8923      | +      | NA       |
| 5    | 1075096 | CDS  | ROG0189 | 8983    | 10149     | +      | NA       |
| 5    | 1071820 | CDS  | NA      | 10181   | 10723     | +      | NA       |

Update for the changed input:

This one-liner works for the new input and keeps the output format:

awk  'NF<2{id=$1;next}NR==1||sub("\\s{"length(id)"}",id)' file

Tested again with the input data in f:

$ awk  'NF<2{id=$1;next}NR==1||sub("\\s{"length(id)"}",id)' f
code    IdGene  Type    COGgene PosLeft postRight   Strand  Function
1   1075082 CDS ROG0189 93  710 +   NA
8   1075089 CDS COG0226 5632    6741    +   [P] ABC-type phosphate transport system, periplasmic component
8   1075103 CDS NA  6796    7869    +   NA
9   1075105 CDS NA  8075    8923    +   NA
9   1075096 CDS ROG0189 8983    10149   +   NA
9   1071820 CDS NA  10181   10723   +   NA
10  1071880 CDS COG0642 10893   13316   +   [T] Signal transduction histidine kinase
10  1072052 CDS COG2204 13288   14586   +   [T] Response regulator containing CheY-like receiver, AAA-type
12  1075092 CDS NA  15525   16472   +   NA
13  1075087 CDS NA  16655   17371   +   NA
13  1074837 CDS NA  17383   17703   +   NA
13  1071956 CDS NA  17710   18168   +   NA
14  1071684 CDS NA  18251   18919   -   NA
15  1075519 CDS ROG5478 19044   19334   +   NA
27  1075067 CDS ROG8331 35989   36417   +   NA
27  1075056 CDS COG2244 36478   38019   +   [R] Membrane protein involved in the export
27  1075546 CDS COG1035 38016   39218   +   [C] Coenzyme F420-reducing hydrogenase, beta subunit
27  1074004 CDS ROG1263 39215   40375   +   NA
27  1075083 CDS COG1701 40406   40582   +   [S] Uncharacterized protein conserved in archaea
27  1075068 CDS COG0463 40593   41537   +   [M] Glycosyltransferases involved in cell wall biogenesis
27  1075064 CDS ROG2632 41534   42700   +   NA
27  1075066 CDS COG0463 42724   43656   +   [M] Glycosyltransferases involved in cell wall biogenesis
27  1075069 CDS COG1215 43671   44066   +   [M] Glycosyltransferases, probably involved in cell
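
The one-liner above is compact, so here is the same idea spelled out step by step. This is only a minimal sketch under a few assumptions: the function name `fill_codes` is made up for illustration, a line is treated as a code line when it has exactly one field, and the output is tab-separated rather than preserving the original spacing. It avoids `\s`, which is not part of POSIX awk regular expressions (GNU awk supports it).

```shell
#!/bin/sh
# fill_codes: hypothetical helper name, not from the original answers.
# Reads the listing on stdin and writes one tab-separated row per gene.
fill_codes() {
  awk '
    NR == 1 { print; next }      # header row: pass through unchanged
    NF == 1 { id = $1; next }    # a line holding only a code: remember it
    {
      sub(/^[ \t]+/, "")         # drop the indentation of the data row
      print id "\t" $0           # prefix the row with the remembered code
    }
  '
}

# Small sample in the same layout as the question:
# header, then a code line followed by its indented data rows.
printf '%s\n' \
  'code IdGene Type' \
  '1' \
  '    1075082 CDS' \
  '8' \
  '    1075089 CDS' \
  '    1075103 CDS' | fill_codes
```

To run it on the real data, something like `fill_codes < f` should do. If the original column alignment matters, the one-liner above is the better fit, since this sketch normalises the first separator to a tab.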
Votes: 1
Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/61507417
