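I currently have a BNDM search algorithm in Java, but I want to modify it so that the letter "N" matches any other letter. For example, the string "NATG" should match "CATG". I am building software for nucleotide matching, so sequences will only contain A, G, T, C and N, where N can be any of A, G, T, C.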
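For example: if Sequence is "ATGCN" and Source is "ATGATGAATGCC", the program should return the index range of Source that matches Sequence, in this case 7-11 (0-based, inclusive). Also, if the sequence matches more than once, every match should be printed. Since Source is usually thousands of characters long, I want a fast search algorithm. Below is my current BNDM code, but it only allows exact matches.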
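To pin down the expected output, here is a plain brute-force scan that tests every window of Source against Sequence, treating N as a wildcard. It runs in O(n·m) time, so it is not the fast search being asked for, but it can serve as a reference to check a faster matcher against. The names `NaiveWildcardMatch` and `findAll` are hypothetical, and the sketch assumes N only appears in the sequence, not in the source.

```java
import java.util.ArrayList;
import java.util.List;

public class NaiveWildcardMatch {

    // Assumption: 'N' in the sequence matches any character; everything
    // else must match exactly.
    static boolean matches(char p, char s) {
        return p == 'N' || p == s;
    }

    // Returns every [start, end] range (0-based, end inclusive) where the
    // pattern matches the source, including overlapping matches.
    static List<int[]> findAll(String pattern, String source) {
        List<int[]> result = new ArrayList<>();
        int m = pattern.length(), n = source.length();
        if (m == 0) {
            return result; // an empty pattern is treated as "no matches"
        }
        for (int j = 0; j + m <= n; j++) {
            int i = 0;
            while (i < m && matches(pattern.charAt(i), source.charAt(j + i))) {
                i++;
            }
            if (i == m) {
                result.add(new int[] {j, j + m - 1});
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (int[] r : findAll("ATGCN", "ATGATGAATGCC")) {
            System.out.println(r[0] + "-" + r[1]); // prints 7-11
        }
    }
}
```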
I'm not sure whether the BNDM algorithm below can be adapted to do this. I'm open to other search algorithms.
My code is attached below:
import java.util.Scanner;

public class BNDM {
    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        String source, pattern;
        System.out.print("Enter sequence:");
        pattern = sc.nextLine();
        System.out.print("Enter source:");
        source = sc.nextLine();
        if (pattern.length() == source.length() && pattern.equals(source)) {
            System.out.println("Sequence = Source");
        }
        char[] x = pattern.toCharArray(), y = source.toCharArray();
        int i, j, s, d, last, m = x.length, n = y.length;
        int[] b = new int[65536]; // one bitmask per possible char value
        /* Pre processing */
        for (i = 0; i < b.length; i++) {
            b[i] = 0;
        }
        s = 1;
        for (i = m - 1; i >= 0; i--) {
            b[x[i]] |= s;
            s <<= 1;
        }
        /* Searching phase */
        j = 0;
        while (j <= n - m) {
            i = m - 1;
            last = m;
            d = ~0;
            while (i >= 0 && d != 0) {
                d &= b[y[j + i]];
                i--;
                if (d != 0) {
                    if (i >= 0) {
                        last = i + 1;
                    } else {
                        System.out.println("Sequence in Source starting at position:");
                        System.out.println(j);
                        System.out.println("Sequence:");
                        System.out.println(pattern);
                        System.out.println("Source:");
                        System.out.println(source.substring(j, j + m));
                    }
                }
                d <<= 1;
            }
            j += last;
        }
    }
}
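As a sketch of how the BNDM code above could be adapted: the table `b` already behaves like a set of character classes, since bit k of `b[c]` just records that pattern position m-1-k accepts character `c`. An N can therefore be handled entirely in preprocessing by setting its bit for each of A, C, G and T, leaving the search phase the same as in the question. The names `WildcardBNDM` and `search` are hypothetical; the sketch assumes a pattern of 1 to 32 characters (the state is a 32-bit `int`) and that a literal N in the source should not match anything.

```java
import java.util.ArrayList;
import java.util.List;

public class WildcardBNDM {

    // Returns the 0-based start index of every match of pattern in source,
    // where N in the pattern matches A, C, G or T.
    // Assumption: 1 <= pattern length <= 32, because the state is an int.
    static List<Integer> search(String pattern, String source) {
        char[] x = pattern.toCharArray(), y = source.toCharArray();
        int m = x.length, n = y.length;
        if (m == 0 || m > 32) {
            throw new IllegalArgumentException("pattern length must be 1..32");
        }

        // Pre-processing: bit k of b[c] is set when pattern position m-1-k
        // accepts character c. Java zero-fills the array.
        int[] b = new int[Character.MAX_VALUE + 1];
        int s = 1;
        for (int i = m - 1; i >= 0; i--) {
            if (x[i] == 'N') {
                // The only change from the exact-match version:
                // a wildcard position accepts every nucleotide.
                for (char c : "ACGT".toCharArray()) {
                    b[c] |= s;
                }
            } else {
                b[x[i]] |= s;
            }
            s <<= 1;
        }

        // Searching phase: same loop as the exact-match code above.
        List<Integer> hits = new ArrayList<>();
        int j = 0;
        while (j <= n - m) {
            int i = m - 1, last = m, d = ~0;
            while (i >= 0 && d != 0) {
                d &= b[y[j + i]];
                i--;
                if (d != 0) {
                    if (i >= 0) {
                        last = i + 1;
                    } else {
                        hits.add(j); // the whole window matched
                    }
                }
                d <<= 1;
            }
            j += last;
        }
        return hits;
    }

    public static void main(String[] args) {
        String sequence = "ATGCN";
        for (int start : search(sequence, "ATGATGAATGCC")) {
            System.out.println(start + "-" + (start + sequence.length() - 1)); // 7-11
        }
    }
}
```

Because N only changes the table, the search loop costs the same as exact BNDM. Sequences longer than 32 characters would need a `long` (up to 64) or a multi-word bit vector, which this sketch does not attempt.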
Posted on 2018-10-26 03:36:44
This kind of matching can be done easily with a regular expression:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatch {
    public static void main(String[] args) {
        String pattern = "ATGCN";
        String nucleotides = "ATGATGAATGCC";
        // first convert the pattern into a proper regex,
        // i.e. replace every N with [ATCG]
        Pattern regex = Pattern.compile(pattern.replaceAll("N", "[ATCG]"));
        // create a Matcher to find everywhere that the pattern matches
        Matcher m = regex.matcher(nucleotides);
        // find all the matches
        while (m.find()) {
            System.out.println("Match found:");
            System.out.println("start:" + m.start());
            // minus 1 because Matcher.end() returns the index just past the last matched character
            System.out.println("end:" + (m.end() - 1));
            System.out.println();
        }
    }
}
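One caveat with the regex approach: `Matcher.find()` resumes searching at the first character not matched by the previous match, so overlapping occurrences are skipped (the regex built from "NN" finds only 0-1 in "AAA", not 1-2). Since the question asks for every match, a common workaround is to wrap the pattern in a zero-width lookahead with a capturing group and read the range from group 1. A sketch, with hypothetical names `OverlappingWildcardRegex` and `findAll`, assuming the sequence contains only A, C, G, T and N, so nothing needs regex escaping:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OverlappingWildcardRegex {

    // Returns "start-end" strings (0-based, end inclusive) for every match,
    // including overlapping ones.
    static List<String> findAll(String sequence, String source) {
        // N -> any nucleotide; the lookahead makes each match zero-width,
        // so find() moves on by one character instead of past the match.
        String body = sequence.replace("N", "[ACGT]");
        Pattern regex = Pattern.compile("(?=(" + body + "))");
        Matcher m = regex.matcher(source);
        List<String> ranges = new ArrayList<>();
        while (m.find()) {
            // group 1 holds the nucleotides that actually matched
            ranges.add(m.start(1) + "-" + (m.end(1) - 1));
        }
        return ranges;
    }

    public static void main(String[] args) {
        System.out.println(findAll("ATGCN", "ATGATGAATGCC")); // [7-11]
        System.out.println(findAll("NN", "AAA"));             // [0-1, 1-2]
    }
}
```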
Posted on 2018-10-26 05:57:44
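import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Match {
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        String origin = in.next(); // the source, e.g. ATGATGAATGCC
        String match = in.next();  // the sequence, e.g. ATGCN
        // N may be any of A, G, T, C
        Pattern pattern = Pattern.compile(match.replaceAll("N", "(A|G|T|C)"));
        Matcher matcher = pattern.matcher(origin);
        // print each match as start-end (0-based, end inclusive)
        while (matcher.find()) {
            System.out.println(matcher.start() + "-" + (matcher.end() - 1));
        }
    }
}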
https://stackoverflow.com/questions/52996620