算法作业HW15：LeetCode187 Repeated DNA Sequences

Description:

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

Note:

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].

Solution:

Analysis and Thinking:

问题要求我们写一个函数，实现找到所有长度为10的重复字符串（在输入中出现次数超过两次）。这里为了便于对输入数据进行处理，采取了字符->二进制数字映射操作，即对于ACGT这四个代表不同碱基类型的字符，用00,01,10,11来代替，从而把长度位10的字符串转换为20位整数值，从而便于处理，改变存储的容器类型还能一定程度减小空间复杂性。

问题的求解可以通过构造一个整数容器，先收集所有答案，具体如下：

Steps:

1.判断输入字符串总长度有无大于10，无则返回空结果

2.构建map，存储ACGT到二进制的映射关系

3. 利用前十个碱基构成的字符子串，构造出事的rstTemp值（循环十次，rstTemp不断左移两位，相加）

4. 当输入DNA序列（字符串）还未遍历完，定义一个十六进制数0x3ffff，rstTemp通过每次循环左移两位在与该十六进制数逐位与，消除其最高两位，然后通过加上s[当前遍历数+10]，从而获得一个碱基值（对应二进制）

5.定义两个set，定义为record、helper，helper用于存储一长度为10子串

6. 通过find（）函数，分别去record、helper查找是否包含子串，如果helper成功找，则将当前子串最高位碱基添加入结果容器

7.全部字符串内容遍历完，返回结果容器

Codes:

class Solution
{
    #defineeraser 03xffff
public:
    vector<string>findRepeatedDnaSequences(string s)
    {
        vector<string> result;
        int rstTemp;
        
        if(s.size()<10)
            return result;
        
        set<unsigned int> record;
        set<unsigned int> helper;
        map<char,int> mapper{{'A',0},{'C',1},{'G',2},{'T',4}};
        
        for(int i=0;i!=10;++i)
        {
            rstTemp=(rstTemp<<2)+mapper[s[i]];
            set<unsigned int>::iterator j=record.find(rstTemp);
            
            if(j!=record.end()){
                j=helper.find(rstTemp);
                if(j==helper.end()){
                    result.push_back(string(s,i-9,10));
                    helper.insert(rstTemp);
                }
                
            }
            else{
                record.inset(rstTemp);
            }
        }
        return result;
    }
};

Results:

秒客网

算法作业HW15：LeetCode187 Repeated DNA Sequences

相关文章