Leetcode: Repeated DNA Sequences

题目:

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].

刚开始就想到用map进行子串的存储,结果Memory Limit Exceeded

我刚开始这样写的:

class Solution
{
public:
    vector<string> findRepeatedDnaSequences(string s)
    {
    	map<string, int> dnamap;
    	vector<string> result;
    	string::size_type length = s.length();
    	string sequence;
    	for (size_t i = 0; i + 10 < length; i++)
    	{
    		sequence = s.substr(i, 10);
    		dnamap[sequence] += 1;
    	}
    	for (std::map<string, int>::iterator it = dnamap.begin(); it != dnamap.end(); ++it)
    	{
    		if (it->second > 1)
    		{
    			result.push_back(it->first);
    		}
    	}
    	return result;
    }
};

后来看到https://leetcode.com/discuss/24478/i-did-it-in-10-lines-of-c的讨论,原来可以将子字符串转成数字,存储在map中,这样就不会超出限制的内存空间了。

原文是这样说的:The main idea is to store the substring as int in map to bypass the memory limits.

下面是我写的C++代码:

class Solution
{
    public:
        int str2int(string s) {
    	int num = 0;
    	for (int i = 0; i < s.size(); ++i)
    		//s[i]&7取出s[i]的后三个字节,因为用三个字节可以完全表示A,C,G,T
    		num = (num << 3) + (s[i] & 7);
    	return num;
    }
    
    vector<string> findRepeatedDnaSequences(string s) {
    	vector<string> result;
    	unordered_map<int, int> dictionary;
    	string sequence;
    	for (int i = 0; i + 10 <= s.size(); ++i)
    	{
    	    sequence = s.substr(i, 10);
    		if (dictionary[str2int(sequence)]++ == 1)
    			result.push_back(sequence);
    	}
    	return result;
    }
};

看到很多朋友这道题都使用的是unordered_map,而不是map,然后我查阅了资料,下面是unordered_map和map的区别:

unordered_map原来是boost库中的容器,在C++11标准中被引入到STL中。unordered_map和map的区别就是,map是按照operator<比较判断元素是否相同,以及比较元素的大小,然后选择合适的位置插入到树中。所以,如果对map进行遍历(中序遍历)的话,输出的结果是有序的。顺序就是按照operator< 定义的大小排序。而unordered_map是计算元素的Hash值,根据Hash值判断元素是否相同。所以,对unordered_map进行遍历,结果是无序的。

用法的区别就是,map的key需要定义operator<。 而boost::unordered_map需要定义hash_value函数并且重载operator==。对于内置类型,如string,这些都不用操心。对于自定义的类型做key,就需要自己重载operator<或者hash_value()了。 当不需要结果排好序时,最好unordered_map

其实,C++中map对于与Java中的TreeMap,而unordered_map对应于Java中的HashMap。

C#代码:

貌似C#中Dictionary不支持对不存在的key的直接索引,所以要先判断key存不存在。C++中如果不存在会自动添加要索引的key。

public class Solution
{
    private int StrToInt(string s)
    {
        int num = 0;
        for (int i = 0; i < s.Length; ++i)
        {
            num = (num << 3) + (s[i] & 7);
        }
        return num;
    }
    
    public IList<string> FindRepeatedDnaSequences(string s)
    {
        IList<string> result = new List<string>();
        IDictionary<int, int> map = new Dictionary<int, int>();
        string sequence = string.Empty;
        int key = 0;
        for (int i = 0; i + 10 <= s.Length; ++i)
        {
            sequence = s.Substring(i, 10);
            key = StrToInt(sequence);
            if (map.ContainsKey(key))
            {
                if (map[key]++ == 1)
                {
                    result.Add(sequence);
                }
            }
            else
            {
                map.Add(key, 1);
            }
        }
        return result;
    }
}

Python代码:

注意内置函数ord将单个字符串即字符转成对于的ASCII(Return the integer ordinal of a one-character string)

class Solution:
    # @param s, a string
    # @return a list of strings
    def str2int(self, s):
        num = 0
        for i in range(len(s)):
            num = (num << 3) + (ord(s[i]) & 7)
        return num

    def findRepeatedDnaSequences(self, s):
        length = len(s)
        result = []
        if length <= 10:
            return result
        map = {}
        i = 0
        while (i + 10) <= length:
            sequence = s[i : i + 10]
            num = self.str2int(sequence)
            if map.has_key(num):
                map[num] = map[num] + 1
                if map[num] == 1:
                    result.append(sequence)
            else:
                map[num] = 0
            i = i + 1
        return result

本文参与腾讯云自媒体分享计划,欢迎正在阅读的你也加入,一起分享。

发表于

我来说两句

0 条评论
登录 后参与评论

扫码关注云+社区

领取腾讯云代金券