
snakemake study notes 3

邓飞
Published 2019-07-07 15:29:57
Included in the column: 育种数据分析之放飞自我

This is a blog post I wrote earlier; I'm reposting it to keep a record of my learning path.

Goal

This time I want to build the following workflow.

The goal in detail

  • First: create 1.txt, 2.txt and 3.txt
  • Second: add the string "add a" to each file, saving the results as 1_add_a.txt, 2_add_a.txt, 3_add_a.txt
  • Third: add "add b" to these files, saving them as 1_add_a_add_b.txt, 2_add_a_add_b.txt, 3_add_a_add_b.txt
  • Fourth: add "add c", saving them as 1_add_a_add_b_add_c.txt, 2_add_a_add_b_add_c.txt, 3_add_a_add_b_add_c.txt
  • Fifth: merge 1_add_a_add_b.txt, 2_add_a_add_b.txt, 1_add_a_add_b_add_c.txt, 2_add_a_add_b_add_c.txt and 3_add_a_add_b_add_c.txt into a single file, hebing.txt
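The target workflow can be written down as a plain dependency graph in which each file maps to the files it is built from. The sketch below is plain Python, not Snakemake; `build_graph` and `build_order` are hypothetical helpers, and it assumes Python 3.9+ for the standard-library `graphlib` module. `TopologicalSorter.static_order()` returns one valid build order, which is roughly the ordering problem Snakemake solves for us once the rules are written.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def build_graph():
    """Map each file to the set of files it is built from (hypothetical helper)."""
    graph = {}
    for i in ("1", "2", "3"):
        graph[f"{i}_add_a.txt"] = {f"{i}.txt"}                          # step 2
        graph[f"{i}_add_a_add_b.txt"] = {f"{i}_add_a.txt"}              # step 3
        graph[f"{i}_add_a_add_b_add_c.txt"] = {f"{i}_add_a_add_b.txt"}  # step 4
    # Step 5: all three add_c files plus the add_b files of 1 and 2.
    graph["hebing.txt"] = {f"{i}_add_a_add_b_add_c.txt" for i in ("1", "2", "3")} | {
        "1_add_a_add_b.txt",
        "2_add_a_add_b.txt",
    }
    return graph

def build_order():
    """Return one valid order in which all files can be produced."""
    return list(TopologicalSorter(build_graph()).static_order())

if __name__ == "__main__":
    for name in build_order():
        print(name)
```

Every file is an ancestor of hebing.txt, so hebing.txt always comes last, while the three per-file chains may be interleaved in any order.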

1. Create three files

(snake_test) [dengfei@localhost ex4]$ ls *txt
1.txt  2.txt  3.txt
(snake_test) [dengfei@localhost ex4]$ cat *txt
this is 1.txt
this is 2.txt
this is 3.txt
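The post does not show how the three input files were created. One way to produce them, assuming the content shown above (a single line `this is N.txt` per file), is a small Python helper; `make_inputs` is a hypothetical name, and a shell loop with `echo` would do the same job.

```python
from pathlib import Path

def make_inputs(directory=".", ids=("1", "2", "3")):
    """Write N.txt containing the line "this is N.txt" for each id (hypothetical helper)."""
    paths = []
    for i in ids:
        path = Path(directory) / f"{i}.txt"
        path.write_text(f"this is {i}.txt\n")
        paths.append(path)
    return paths
```

Calling `make_inputs()` from inside the ex4 directory writes 1.txt, 2.txt and 3.txt into the current directory.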

2. Add "add a" to each file

The corresponding Snakefile:

rule adda:
    input: "{file}.txt"
    output: "{file}_add_a.txt"
    shell: "cat {input} |xargs echo add a >{output}"
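The shell command deserves a note: `xargs` reads whitespace-separated words from standard input (here, the contents of the input file) and appends them as extra arguments to `echo add a`, and `echo` prints all its arguments joined by single spaces. That is why the result is `add a this is 1.txt` on one line. The Python function below approximates this for short inputs without quotes or backslashes (which `xargs` treats specially); `xargs_echo` is a hypothetical name, not part of Snakemake.

```python
def xargs_echo(prefix, text):
    """Approximate the output of `cat FILE | xargs echo PREFIX`.

    xargs splits its input on whitespace (spaces and newlines) and appends the
    words as arguments to `echo PREFIX`; echo joins all arguments with single
    spaces and ends the line with a newline. Quotes and backslashes, which
    xargs treats specially, are not handled here.
    """
    return " ".join(prefix.split() + text.split()) + "\n"

print(xargs_echo("add a", "this is 1.txt\n"), end="")  # add a this is 1.txt
```

One side effect worth knowing: because the input is split on all whitespace, a multi-line input file would be collapsed into a single line.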

Preview the jobs with snakemake -np {1,2,3}_add_a.txt (-n: dry run, nothing is executed; -p: print the shell commands).

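Note: the target files {1,2,3}_add_a.txt have to be listed on the command line. This Snakefile has no rule without wildcards that could serve as a default target, so Snakemake cannot work out on its own which files to build. The shell's brace expansion turns {1,2,3}_add_a.txt into 1_add_a.txt 2_add_a.txt 3_add_a.txt before Snakemake sees it.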
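How does Snakemake know that `1_add_a.txt` should be built by rule adda with `file=1`? It matches each requested file name against the rules' output patterns, where a wildcard like `{file}` matches any non-empty string by default (the regex `.+`). The sketch below imitates that matching with Python's `re` module; `pattern_to_regex` and `match_wildcards` are hypothetical helpers, and the sketch assumes each wildcard appears only once per pattern.

```python
import re

def pattern_to_regex(pattern):
    """Turn an output pattern such as "{file}_add_a.txt" into a compiled regex."""
    # With a capturing group, re.split keeps the wildcard names:
    # "{file}_add_a.txt" -> ["", "file", "_add_a.txt"]
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>.+)"  # default wildcard regex: .+
        for i, part in enumerate(parts)
    )
    return re.compile(regex)

def match_wildcards(pattern, filename):
    """Return the wildcard values if filename matches pattern completely, else None."""
    m = pattern_to_regex(pattern).fullmatch(filename)
    return m.groupdict() if m else None

print(match_wildcards("{file}_add_a.txt", "3_add_a.txt"))  # {'file': '3'}
print(match_wildcards("{file}_add_a.txt", "3.txt"))        # None
```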

(snake_test) [dengfei@localhost ex4]$ snakemake -np {1,2,3}_add_a.txt
Building DAG of jobs...
Job counts:
    count    jobs
    3    adda
    3

[Tue Apr  2 21:09:19 2019]
rule adda:
    input: 3.txt
    output: 3_add_a.txt
    jobid: 2
    wildcards: file=3

cat 3.txt |xargs echo add a >3_add_a.txt

[Tue Apr  2 21:09:19 2019]
rule adda:
    input: 2.txt
    output: 2_add_a.txt
    jobid: 0
    wildcards: file=2

cat 2.txt |xargs echo add a >2_add_a.txt

[Tue Apr  2 21:09:19 2019]
rule adda:
    input: 1.txt
    output: 1_add_a.txt
    jobid: 1
    wildcards: file=1

cat 1.txt |xargs echo add a >1_add_a.txt
Job counts:
    count    jobs
    3    adda
    3
This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.

Run:

snakemake  {1,2,3}_add_a.txt
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
    count    jobs
    3    adda
    3

[Tue Apr  2 21:11:09 2019]
rule adda:
    input: 3.txt
    output: 3_add_a.txt
    jobid: 0
    wildcards: file=3

[Tue Apr  2 21:11:09 2019]
Finished job 0.
1 of 3 steps (33%) done

[Tue Apr  2 21:11:09 2019]
rule adda:
    input: 1.txt
    output: 1_add_a.txt
    jobid: 1
    wildcards: file=1

[Tue Apr  2 21:11:09 2019]
Finished job 1.
2 of 3 steps (67%) done

[Tue Apr  2 21:11:09 2019]
rule adda:
    input: 2.txt
    output: 2_add_a.txt
    jobid: 2
    wildcards: file=2

[Tue Apr  2 21:11:09 2019]
Finished job 2.
3 of 3 steps (100%) done
Complete log: /home/dengfei/test/snakemake/ex4/.snakemake/log/2019-04-02T211109.153566.snakemake.log

Check the *add_a.txt files:

(snake_test) [dengfei@localhost ex4]$ ls *add_a.txt
1_add_a.txt  2_add_a.txt  3_add_a.txt
(snake_test) [dengfei@localhost ex4]$ cat *add_a.txt
add a this is 1.txt
add a this is 2.txt
add a this is 3.txt

Done.

3. Add "add b" to each file

The corresponding Snakefile:

rule adda:
    input: "{file}.txt"
    output: "{file}_add_a.txt"
    shell: "cat {input} |xargs echo add a >{output}"
rule addb:
    input:
        "{file}_add_a.txt"
    output:
        "{file}_add_a_add_b.txt"
    shell:
        "cat {input} | xargs echo add b >{output}"
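Requesting `1_add_a_add_b.txt` is enough for Snakemake to chain the two rules: addb's output pattern matches with `file=1`, its input `1_add_a.txt` is in turn the output of adda, and adda's input `1.txt` already exists. A minimal backward-chaining sketch of that idea follows. `plan` is a hypothetical helper; it assumes every pattern is `{file}` plus a fixed suffix (true for all rules in this post) and, unlike Snakemake, ignores file timestamps, ambiguity between rules and multiple inputs.

```python
# Hypothetical rule table: (rule name, input pattern, output pattern).
# Every pattern here is "{file}" followed by a fixed suffix.
RULES = [
    ("adda", "{file}.txt", "{file}_add_a.txt"),
    ("addb", "{file}_add_a.txt", "{file}_add_a_add_b.txt"),
]

def plan(target, existing, rules=RULES):
    """Return the (rule, input, output) jobs needed to build target, in run order."""
    if target in existing:
        return []  # file already present; this sketch ignores timestamps
    for name, in_pattern, out_pattern in rules:
        suffix = out_pattern.replace("{file}", "", 1)
        if target.endswith(suffix) and len(target) > len(suffix):
            stem = target[: -len(suffix)]          # the value of {file}
            source = in_pattern.format(file=stem)  # the file this rule needs
            return plan(source, existing, rules) + [(name, source, target)]
    raise ValueError(f"no rule produces {target!r} and it does not exist")

for job in plan("1_add_a_add_b.txt", existing={"1.txt", "2.txt", "3.txt"}):
    print(job)
```

If the *_add_a.txt files already exist, as they do after step 2, only the addb job is left for each file, which matches the job counts in the run log that follows.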

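Preview the jobs with snakemake -np {1,2,3}_add_a_add_b.txt. The log below is from the actual run (without -np); only the three addb jobs are scheduled because the *_add_a.txt files from step 2 already exist:
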
(snake_test) [dengfei@localhost ex4]$ snakemake  {1,2,3}_add_a_add_b.txt
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
    count    jobs
    3    addb
    3

[Tue Apr  2 21:13:57 2019]
rule addb:
    input: 2_add_a.txt
    output: 2_add_a_add_b.txt
    jobid: 0
    wildcards: file=2

[Tue Apr  2 21:13:57 2019]
Finished job 0.
1 of 3 steps (33%) done

[Tue Apr  2 21:13:57 2019]
rule addb:
    input: 1_add_a.txt
    output: 1_add_a_add_b.txt
    jobid: 1
    wildcards: file=1

[Tue Apr  2 21:13:57 2019]
Finished job 1.
2 of 3 steps (67%) done

[Tue Apr  2 21:13:57 2019]
rule addb:
    input: 3_add_a.txt
    output: 3_add_a_add_b.txt
    jobid: 2
    wildcards: file=3

[Tue Apr  2 21:13:57 2019]
Finished job 2.
3 of 3 steps (100%) done
Complete log: /home/dengfei/test/snakemake/ex4/.snakemake/log/2019-04-02T211357.666661.snakemake.log

Run:

snakemake  {1,2,3}_add_a_add_b.txt

View the workflow graph

Command:

snakemake --dag {1,2,3}_add_a_add_b.txt |dot -Tpdf >a.pdf

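--dag prints the job graph in Graphviz DOT format instead of running any jobs, and dot -Tpdf (from Graphviz) renders it into a.pdf.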
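DOT is a small text format, so the same kind of picture can also be sketched by hand. The function below writes a minimal DOT graph for the three adda → addb chains. The node labels are an assumption made for illustration; Snakemake's own `--dag` output uses its own node IDs, labels and colours.

```python
def workflow_dot(ids=("1", "2", "3")):
    """Return a minimal Graphviz DOT graph of the adda -> addb chains."""
    lines = ["digraph workflow {"]
    for i in ids:
        # One node per job, labelled "rule (file=N)"; arrows follow the data flow.
        lines.append(f'    "adda (file={i})" -> "addb (file={i})";')
    lines.append("}")
    return "\n".join(lines)

if __name__ == "__main__":
    print(workflow_dot())
```

Saved as, say, `workflow_dot.py` (a hypothetical file name), it can be rendered the same way: `python workflow_dot.py | dot -Tpdf > sketch.pdf`.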

4. Add "add c" to each file

Snakefile:

rule adda:
    input: "{file}.txt"
    output: "{file}_add_a.txt"
    shell: "cat {input} |xargs echo add a >{output}"
rule addb:
    input:
        "{file}_add_a.txt"
    output:
        "{file}_add_a_add_b.txt"
    shell:
        "cat {input} | xargs echo add b >{output}"

rule addc:
    input:
        "{file}_add_a_add_b.txt"
    output:
        "{file}_add_a_add_b_add_c.txt"
    shell:
        "cat {input} | xargs echo add c >{output}"
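Each rule puts its label in front of the previous file's content, so the labels end up in reverse order of the steps: after adda, addb and addc the file reads `add c add b add a this is 1.txt`. The snippet below checks this with `functools.reduce`; `add_prefix` is a hypothetical stand-in for `cat {input} | xargs echo ...` and only holds for short, quote-free input.

```python
from functools import reduce

def add_prefix(text, prefix):
    """Mimic `cat input | xargs echo PREFIX` for a short, quote-free file."""
    return " ".join(prefix.split() + text.split()) + "\n"

# Start from the contents of 1.txt and apply adda, addb and addc in turn.
final = reduce(add_prefix, ["add a", "add b", "add c"], "this is 1.txt\n")
print(final, end="")  # add c add b add a this is 1.txt
```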

Draw the workflow graph with:

snakemake --dag {1,2,3}_add_a_add_b_add_c.txt |dot -Tpdf >a1.pdf

5. Merge the files

Snakefile (rule hebing is new):

rule adda:
    input: "{file}.txt"
    output: "{file}_add_a.txt"
    shell: "cat {input} |xargs echo add a >{output}"
rule addb:
    input:
        "{file}_add_a.txt"
    output:
        "{file}_add_a_add_b.txt"
    shell:
        "cat {input} | xargs echo add b >{output}"

rule addc:
    input:
        "{file}_add_a_add_b.txt"
    output:
        "{file}_add_a_add_b_add_c.txt"
    shell:
        "cat {input} | xargs echo add c >{output}"

rule hebing:
    input:
       a=expand("{file}_add_a_add_b_add_c.txt",file=["1","2","3"]),
       b=expand("{file}_add_a_add_b.txt",file=["1","2"])
    output:"hebing.txt"
    shell:"cat {input.a} {input.b} >{output}"
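`expand()` builds a list of file names by filling the pattern with every combination of the given wildcard values, so `input.a` holds three file names and `input.b` two. Here is a simplified stand-in written with `itertools.product`; `expand_sketch` is a hypothetical name, and the real Snakemake `expand` supports more (for example passing `zip` as a combinator instead of the default product).

```python
from itertools import product

def expand_sketch(pattern, **wildcards):
    """Simplified stand-in for Snakemake's expand(): fill pattern with every
    combination (Cartesian product) of the wildcard values."""
    names = list(wildcards)
    # Treat a plain string as a single value rather than iterating its characters.
    values = [[v] if isinstance(v, str) else list(v) for v in wildcards.values()]
    return [pattern.format(**dict(zip(names, combo))) for combo in product(*values)]

a = expand_sketch("{file}_add_a_add_b_add_c.txt", file=["1", "2", "3"])
b = expand_sketch("{file}_add_a_add_b.txt", file=["1", "2"])
print(a + b)  # the order in which `cat {input.a} {input.b}` reads the files
```

Because the shell command is `cat {input.a} {input.b}`, the three add_c files come first in hebing.txt, followed by the two add_b files.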

Run:

snakemake hebing.txt

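Output. Only 4 jobs are scheduled: the *_add_a_add_b.txt files from step 3 already exist, so Snakemake runs just the three addc jobs and hebing:
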
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
    count    jobs
    3    addc
    1    hebing
    4

[Tue Apr  2 21:21:04 2019]
rule addc:
    input: 1_add_a_add_b.txt
    output: 1_add_a_add_b_add_c.txt
    jobid: 1
    wildcards: file=1

[Tue Apr  2 21:21:04 2019]
Finished job 1.
1 of 4 steps (25%) done

[Tue Apr  2 21:21:04 2019]
rule addc:
    input: 3_add_a_add_b.txt
    output: 3_add_a_add_b_add_c.txt
    jobid: 3
    wildcards: file=3

[Tue Apr  2 21:21:04 2019]
Finished job 3.
2 of 4 steps (50%) done

[Tue Apr  2 21:21:04 2019]
rule addc:
    input: 2_add_a_add_b.txt
    output: 2_add_a_add_b_add_c.txt
    jobid: 2
    wildcards: file=2

[Tue Apr  2 21:21:04 2019]
Finished job 2.
3 of 4 steps (75%) done

[Tue Apr  2 21:21:04 2019]
rule hebing:
    input: 1_add_a_add_b_add_c.txt, 2_add_a_add_b_add_c.txt, 3_add_a_add_b_add_c.txt, 1_add_a_add_b.txt, 2_add_a_add_b.txt
    output: hebing.txt
    jobid: 0

[Tue Apr  2 21:21:04 2019]
Finished job 0.
4 of 4 steps (100%) done
Complete log: /home/dengfei/test/snakemake/ex4/.snakemake/log/2019-04-02T212104.719887.snakemake.log

Done.

Feel free to follow my WeChat official account: R-breeding

Related reading

snakemake study notes 1, snakemake study notes 2

Postscript 1

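Today I tried out rule all. It declares the final output files of the workflow; without it, the target files have to be written on the command line.

Since the final output is hebing.txt, we declare it in the Snakefile as the input of rule all, placed at the top:
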
rule all:
    input:"hebing.txt"
rule adda:
    input: "{file}.txt"
    output: "{file}_add_a.txt"
    shell: "cat {input} |xargs echo add a >{output}"
rule addb:
    input:
        "{file}_add_a.txt"
    output:
        "{file}_add_a_add_b.txt"
    shell:
        "cat {input} | xargs echo add b >{output}"

rule addc:
    input:
        "{file}_add_a_add_b.txt"
    output:
        "{file}_add_a_add_b_add_c.txt"
    shell:
        "cat {input} | xargs echo add c >{output}"

rule hebing:
    input:
       a=expand("{file}_add_a_add_b_add_c.txt",file=["1","2","3"]),
       b=expand("{file}_add_a_add_b.txt",file=["1","2"])
    output:"hebing.txt"
    shell:"cat {input.a} {input.b} >{output}"
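Why does rule all have to be at the top? When no target is given on the command line, Snakemake by default uses the first rule in the Snakefile as the target, and that rule must not contain wildcards. That is why the earlier Snakefiles, whose first rule adda has the wildcard `{file}`, needed the targets on the command line. The sketch below models that decision in plain Python; `default_targets` and the tuple layout of `RULES` are hypothetical, and the rule list is abbreviated.

```python
# Hypothetical, abbreviated rule table in Snakefile order:
# (name, input patterns, output pattern or None).
RULES = [
    ("all", ["hebing.txt"], None),
    ("adda", ["{file}.txt"], "{file}_add_a.txt"),
    ("addb", ["{file}_add_a.txt"], "{file}_add_a_add_b.txt"),
    ("addc", ["{file}_add_a_add_b.txt"], "{file}_add_a_add_b_add_c.txt"),
]

def has_wildcards(pattern):
    return "{" in pattern

def default_targets(rules, cli_targets=()):
    """Sketch: targets on the command line win; otherwise use the first rule."""
    if cli_targets:
        return list(cli_targets)
    name, inputs, output = rules[0]
    patterns = list(inputs) + ([output] if output else [])
    if any(has_wildcards(p) for p in patterns):
        raise ValueError(f"first rule {name!r} contains wildcards; give target files on the command line")
    # A rule without output (like `all`) is satisfied once its inputs exist.
    return [output] if output else list(inputs)

print(default_targets(RULES))  # ['hebing.txt']
```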

Run (no target on the command line this time):

snakemake

Output (all 11 jobs run: adda, addb and addc for each of the three files, plus hebing and all):

(base) [dengfei@localhost ex4]$ snakemake
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
    count    jobs
    3    adda
    3    addb
    3    addc
    1    all
    1    hebing
    11

rule adda:
    input: 1.txt
    output: 1_add_a.txt
    jobid: 7
    wildcards: file=1

Finished job 7.
1 of 11 steps (9%) done

rule adda:
    input: 2.txt
    output: 2_add_a.txt
    jobid: 9
    wildcards: file=2

Finished job 9.
2 of 11 steps (18%) done

rule adda:
    input: 3.txt
    output: 3_add_a.txt
    jobid: 10
    wildcards: file=3

Finished job 10.
3 of 11 steps (27%) done

rule addb:
    input: 3_add_a.txt
    output: 3_add_a_add_b.txt
    jobid: 8
    wildcards: file=3

Finished job 8.
4 of 11 steps (36%) done

rule addb:
    input: 1_add_a.txt
    output: 1_add_a_add_b.txt
    jobid: 3
    wildcards: file=1

Finished job 3.
5 of 11 steps (45%) done

rule addb:
    input: 2_add_a.txt
    output: 2_add_a_add_b.txt
    jobid: 6
    wildcards: file=2

Finished job 6.
6 of 11 steps (55%) done

rule addc:
    input: 3_add_a_add_b.txt
    output: 3_add_a_add_b_add_c.txt
    jobid: 5
    wildcards: file=3

Finished job 5.
7 of 11 steps (64%) done

rule addc:
    input: 2_add_a_add_b.txt
    output: 2_add_a_add_b_add_c.txt
    jobid: 2
    wildcards: file=2

Finished job 2.
8 of 11 steps (73%) done

rule addc:
    input: 1_add_a_add_b.txt
    output: 1_add_a_add_b_add_c.txt
    jobid: 4
    wildcards: file=1

Finished job 4.
9 of 11 steps (82%) done

rule hebing:
    input: 1_add_a_add_b_add_c.txt, 2_add_a_add_b_add_c.txt, 3_add_a_add_b_add_c.txt, 1_add_a_add_b.txt, 2_add_a_add_b.txt
    output: hebing.txt
    jobid: 1

Finished job 1.
10 of 11 steps (91%) done

localrule all:
    input: hebing.txt
    jobid: 0

Finished job 0.
11 of 11 steps (100%) done

Check the result:

(base) [dengfei@localhost ex4]$ cat hebing.txt 
add c add b add a this is 1.txt
add c add b add a this is 2.txt
add c add b add a this is 3.txt
add b add a this is 1.txt
add b add a this is 2.txt

Postscript 2

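By default Snakemake looks for a file named Snakefile, but editors don't syntax-highlight that name. You can name the file a.py instead and run it with snakemake -s a.py. The file is still Snakemake syntax, not plain Python, so it cannot be run with python a.py.
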
rule all:
    input:"hebing.txt"
rule adda:
    input: "{file}.txt"
    output: "{file}_add_a.txt"
    shell: "cat {input} |xargs echo add a >{output}"
rule addb:
    input:
        "{file}_add_a.txt"
    output:
        "{file}_add_a_add_b.txt"
    shell:
        "cat {input} | xargs echo add b >{output}"

rule addc:
    input:
        "{file}_add_a_add_b.txt"
    output:
        "{file}_add_a_add_b_add_c.txt"
    shell:
        "cat {input} | xargs echo add c >{output}"

rule hebing:
    input:
       a=expand("{file}_add_a_add_b_add_c.txt",file=["1","2","3"]),
       b=expand("{file}_add_a_add_b.txt",file=["1","2"])
    output:"hebing.txt"
    shell:"cat {input.a} {input.b} >{output}"

Output:

(base) [dengfei@localhost ex4]$ snakemake -s a.py 
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
    count    jobs
    3    adda
    3    addb
    3    addc
    1    all
    1    hebing
    11

rule adda:
    input: 1.txt
    output: 1_add_a.txt
    jobid: 8
    wildcards: file=1

Finished job 8.
1 of 11 steps (9%) done

rule adda:
    input: 3.txt
    output: 3_add_a.txt
    jobid: 10
    wildcards: file=3

Finished job 10.
2 of 11 steps (18%) done

rule adda:
    input: 2.txt
    output: 2_add_a.txt
    jobid: 9
    wildcards: file=2

Finished job 9.
3 of 11 steps (27%) done

rule addb:
    input: 3_add_a.txt
    output: 3_add_a_add_b.txt
    jobid: 7
    wildcards: file=3

Finished job 7.
4 of 11 steps (36%) done

rule addb:
    input: 2_add_a.txt
    output: 2_add_a_add_b.txt
    jobid: 4
    wildcards: file=2

Finished job 4.
5 of 11 steps (45%) done

rule addb:
    input: 1_add_a.txt
    output: 1_add_a_add_b.txt
    jobid: 3
    wildcards: file=1

Finished job 3.
6 of 11 steps (55%) done

rule addc:
    input: 3_add_a_add_b.txt
    output: 3_add_a_add_b_add_c.txt
    jobid: 2
    wildcards: file=3

Finished job 2.
7 of 11 steps (64%) done

rule addc:
    input: 2_add_a_add_b.txt
    output: 2_add_a_add_b_add_c.txt
    jobid: 5
    wildcards: file=2

Finished job 5.
8 of 11 steps (73%) done

rule addc:
    input: 1_add_a_add_b.txt
    output: 1_add_a_add_b_add_c.txt
    jobid: 6
    wildcards: file=1

Finished job 6.
9 of 11 steps (82%) done

rule hebing:
    input: 1_add_a_add_b_add_c.txt, 2_add_a_add_b_add_c.txt, 3_add_a_add_b_add_c.txt, 1_add_a_add_b.txt, 2_add_a_add_b.txt
    output: hebing.txt
    jobid: 1

Finished job 1.
10 of 11 steps (91%) done

localrule all:
    input: hebing.txt
    jobid: 0

Finished job 0.
11 of 11 steps (100%) done

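This article is part of the Tencent Cloud self-media sync program and was shared from the WeChat official account 育种数据分析之放飞自我.
Originally published: 2019-04-13. For infringement concerns, contact cloudcommunity@tencent.com to have it removed.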
