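Regular expressions are a widely used tool for sequence matching, and they were also covered in the earlier post "Three Open-Source Bioinformatics Tutorials and Video Courses". Recently, while adding a cover and demo image for the sequence search feature of our free online SCI plotting tool, I put together a few common cases to illustrate how the simplest regular expressions are used.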
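import re

# Example DNA sequence to search
seq = "ACGTCGATGACTGACGACTCGAACTGACGCATGACGCACGAGCATGAGAGACGCGATACGACGAGACTGA"

# Patterns: literal, character class, negated class, start/end anchors,
# wildcard, fixed and ranged repetition, greedy and lazy quantifiers
pat = ['ACTG', '[AC]TG', '[^TG]TG', '^[AC]CG', 'GA$', 'A.G', 'A..G', 'A.{2}G',
       'A.{2,4}G', 'A.*G', 'A.?G', 'A.+G', 'A.+?G']

for i in pat:
    re_obj = re.compile(i)
    # finditer returns an iterator, which is always truthy, so collect
    # the matches into a list before testing whether any were found
    match_list = list(re_obj.finditer(seq))
    if match_list:
        print(i)
        for match_part in match_list:
            # span() gives the (start, end) position, group() the matched text
            print(match_part.span(), match_part.group())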
The output is shown in the figure above.
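To make the difference between these pattern types easier to see, here is a minimal sketch that runs a few of them on a much shorter sequence. The sequence `toy` and the helper `find_all` are hypothetical and used only for illustration; they are not part of the search tool itself.

```python
import re


def find_all(pattern, text):
    """Return the matched text of every non-overlapping match, left to right."""
    return [m.group() for m in re.finditer(pattern, text)]


# Hypothetical short sequence, chosen only to keep the results easy to check by hand
toy = "ATGCAGGA"

print(find_all(r"^[AC]TG", toy))   # anchored at the start        -> ['ATG']
print(find_all(r"GA$", toy))       # anchored at the end          -> ['GA']
print(find_all(r"A.G", toy))       # exactly one character between -> ['ATG', 'AGG']
print(find_all(r"A.{2,4}G", toy))  # two to four characters between -> ['ATGCAG']
print(find_all(r"A.+G", toy))      # greedy: as long as possible   -> ['ATGCAGG']
print(find_all(r"A.+?G", toy))     # lazy: as short as possible    -> ['ATG', 'AGG']
```

The last two lines show the main practical point: a greedy quantifier (`.+`) stretches to the last possible `G`, while the lazy form (`.+?`) stops at the first one, so the lazy pattern usually returns more, shorter hits on a long sequence.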