Problem:

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:

["AAAAACCCCC", "CCCCCAAAAA"].

Hints:

My first idea was to solve this with a hash map, so I started with unordered_map from the STL:

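#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>
using namespace std;

class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        unordered_set<string> string_set;
        // A string of 10 characters or fewer cannot contain a repeated 10-letter substring.
        if (s.length() <= 10) {
            return {};
        }
        unordered_map<string, int> m;
        // There are s.length() - 9 windows of length 10.
        for (size_t i = 0; i < s.length() - 9; ++i) {
            string tmp = s.substr(i, 10);
            if (m.find(tmp) == m.end()) {
                m[tmp] = 1;              // first occurrence
            } else {
                string_set.insert(tmp);  // seen before; the set removes duplicates
            }
        }
        return vector<string>(string_set.begin(), string_set.end());
    }
};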

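This was accepted (AC), but the running time was far from ideal. The reason is simple: for a specific problem like this one, a custom map and hash function can make the algorithm much more efficient. As for how to design the map and the hash, let's first look at the characteristics of the input:

Each substring has a fixed length (10 characters).
There are only four possible characters: {'A', 'C', 'T', 'G'}.
Combining these two points, the number of possible substrings can be computed: 4^10 = 1,048,576.

This means we can use an array of a primitive type as the map. For the hash, we find a way to map the four characters to the four numbers in [0, 3], so that each character takes 2 bits and a 10-letter substring takes 20 bits. See the code for the details.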
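To make the mapping concrete, here is a minimal sketch. The helper names baseCode and encode are hypothetical and only used for illustration. The expression (c - 'A' + 1) % 5 sends A, C, G, T to 1, 3, 2, 0, which are four distinct 2-bit values, so packing ten of them gives a number below 2^20 = 1,048,576.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: map one nucleotide to a 2-bit code.
// 'A' -> 1, 'C' -> 3, 'G' -> 2, 'T' -> 0.
int baseCode(char c) {
    return (c - 'A' + 1) % 5;
}

// Hypothetical helper: pack a window into an integer, 2 bits per base.
// Assumes w contains only A, C, G, T; at most 15 bases fit in a 32-bit int.
int encode(const std::string& w) {
    assert(w.size() <= 15);
    int num = 0;
    for (char c : w) {
        num = (num << 2) | baseCode(c);
    }
    return num;
}
```

For instance, encode("CCCCCCCCCC") is 0xfffff, the largest possible index, and encode("TTTTTTTTTT") is 0, the smallest.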

Code:

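#include <string>
#include <vector>
using namespace std;

class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        // Use this char array as the map: one counter per 20-bit code (4^10 entries).
        // It is heap-allocated here to avoid a 1 MB array on the stack.
        vector<char> map(1 << 20, 0);
        int len = s.length(), num = 0;
        vector<string> res;
        if (len <= 10) {
            return res;
        }
        // Pre-load the first 9 characters, 2 bits each.
        for (int i = 0; i < 9; ++i) {
            num <<= 2;
            // Try the computation below: it maps A, C, G, T to 1, 3, 2, 0,
            // i.e. to four distinct 2-bit values.
            num |= (s[i] - 'A' + 1) % 5;
        }
        for (int i = 9; i < len; ++i) {
            num <<= 2;
            num |= (s[i] - 'A' + 1) % 5;
            // 0xfffff is a run of twenty 1-bits; after the bitwise AND,
            // num is the code of the current 10-letter window.
            num = num & 0xfffff;
            // Report a window on its second occurrence only.
            // The counter stops at 2, so it cannot overflow on long inputs.
            if (map[num] < 2 && map[num]++ == 1) {
                res.push_back(s.substr(i - 9, 10));
            }
        }
        return res;
    }
};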
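The shift-and-mask step can also be checked in isolation. The sketch below uses hypothetical helper names (encodeWindow, rollingMatchesFresh) and assumes the input contains only A, C, G, T. It slides over the string and confirms that shifting in one base and masking with 0xfffff gives the same code as encoding the current 10-letter window from scratch.

```cpp
#include <string>

// Hypothetical helper: encode s[start, start + 10) from scratch, 2 bits per base.
int encodeWindow(const std::string& s, int start) {
    int num = 0;
    for (int k = 0; k < 10; ++k) {
        num = (num << 2) | ((s[start + k] - 'A' + 1) % 5);
    }
    return num;
}

// Returns true if the rolling code equals the freshly computed code
// for every complete 10-letter window of s.
bool rollingMatchesFresh(const std::string& s) {
    int len = static_cast<int>(s.length());
    if (len < 10) {
        return true;  // no complete window to compare
    }
    int num = 0;
    for (int i = 0; i < len; ++i) {
        // Shift in the new base; the mask drops the base that left the window.
        num = ((num << 2) | ((s[i] - 'A' + 1) % 5)) & 0xfffff;
        if (i >= 9 && num != encodeWindow(s, i - 9)) {
            return false;
        }
    }
    return true;
}
```

Because the rolling update costs a constant amount of work per character, the whole scan is linear in the length of the string, whereas re-encoding (or re-hashing) each window costs 10 steps per position.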

【LeetCode】187. Repeated DNA Sequences
