Find all occurrences of a split substring in a string

Date: 2021-04-10 19:20:39

I'm trying to solve a slightly unusual problem. I need to count all occurrences of a substring in a string, where the substring doesn't have to be in one piece: its characters only need to appear in the same order.


Example:

Input:

adnndaend

I want to find the substring "and".

Occurrences (matched characters shown in uppercase):

AdNnDaend

AdNndaenD

AdnNDaend

AdnNdaenD

AdnndaeND

adnndAeND

Output:

6

I tried to get the list of occurrences with Python's re.findall:

re.findall('^.*a.*n.*d.*$', 'adnndaend')

but it returns a list with just one item, the whole string:

['adnndaend']

Could you please tell me what's wrong with my regex, or show me a better solution? Ideally in Python or Java; I'm not very familiar with other languages.

4 answers

#1


2  

You could keep only the characters of the string that appear in the word, then count the combinations of those characters that spell the word:

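from itertools import combinations

def sub_s(st, word):
    # Keep only characters that occur in word; the others can never be part of a match.
    all_s = (x for x in st if x in word)
    # combinations() preserves input order, so each tuple is a candidate subsequence.
    return sum(1 for x in combinations(all_s, len(word)) if "".join(x) == word)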
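As a rough illustration of what the filtering step buys, the sketch below restates the same function so that it runs on its own and compares the number of candidate combinations with and without the filter. The names `text`, `word` and `filtered` are introduced here only for the example, and `math.comb` assumes Python 3.8 or later.

```python
from itertools import combinations
from math import comb  # available from Python 3.8

def sub_s(st, word):
    # Same idea as the answer above: drop characters that cannot be part of a match.
    kept = [x for x in st if x in word]
    return sum(1 for c in combinations(kept, len(word)) if "".join(c) == word)

text, word = "adnndaend", "and"
filtered = [x for x in text if x in word]  # only the 'e' is removed here

print(sub_s(text, word))               # 6 matching combinations
print(comb(len(text), len(word)))      # 84 candidates without filtering
print(comb(len(filtered), len(word)))  # 56 candidates after filtering
```

The saving is modest for this input, and the number of candidates still grows quickly with the length of the string, so this approach is best suited to short inputs.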

#2


2  

re.findall returns non-overlapping matches, and with the greedy .* your pattern consumes the entire string in a single match, so you only ever get one result. A plain regex is therefore not the right tool for counting these occurrences. Instead, I came up with this little recursive function:

def count(haystack, needle):
    result = 0
    pos = -1
    char = needle[0]  # we'll be searching the haystack for all occurrences of this character

    while True:
        # find the next occurrence
        pos = haystack.find(char, pos + 1)

        # if there are no more occurrences, we're done
        if pos == -1:
            return result

        # once we have found the first character, recursively count the occurrences
        # of needle (without its first character) in what's left of haystack
        if len(needle) == 1:
            result += 1
        else:
            result += count(haystack[pos + 1:], needle[1:])

I haven't tested it extensively, but:

>>> print(count('adnndaend', 'and'))
6
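Since the question asks for the list of occurrences rather than only their number, the same recursive idea can be adapted to yield the matched positions. This is a sketch under assumptions, not part of the answer above: the generator name `occurrences` and its `start` and `prefix` parameters are made up for the example, and `yield from` requires Python 3.3 or later.

```python
def occurrences(haystack, needle, start=0, prefix=()):
    """Yield one tuple of haystack indices per occurrence of needle as a subsequence."""
    if not needle:
        # Every needle character has been placed: prefix is a complete occurrence.
        yield prefix
        return
    pos = haystack.find(needle[0], start)
    while pos != -1:
        # Fix needle[0] at pos, then place the rest of needle strictly to its right.
        yield from occurrences(haystack, needle[1:], pos + 1, prefix + (pos,))
        pos = haystack.find(needle[0], pos + 1)

for indices in occurrences("adnndaend", "and"):
    print(indices)
# (0, 2, 4)
# (0, 2, 8)
# (0, 3, 4)
# (0, 3, 8)
# (0, 7, 8)
# (5, 7, 8)
```

Passing a start index instead of slicing the haystack keeps the reported indices relative to the original string.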

#3


1  

public int findOccurrences(String str, String key) {
    int total = 0;
    for (int i = 0; i < str.length(); i++) {
        if (str.charAt(i) == key.charAt(0)) {
            if (key.length() > 1) {
                // Search strictly after position i so that one character of str
                // is never matched twice (this matters for keys such as "aa").
                total += findOccurrences(str.substring(i + 1), key.substring(1));
            } else {
                total += 1;
            }
        }
    }
    return total;
}

@Test
public void yup(){
    System.out.println(findOccurrences("adnndaend", "and"));
}

Output = 6

#4


1  

You could use itertools.combinations as follows:

import itertools
pattern = "and"
print(len([''.join(i) for i in itertools.combinations('adnndaend', len(pattern)) if ''.join(i) == pattern]))

output:

6

The idea is to generate all combinations of characters with itertools.combinations and compare each one against your pattern; the resulting list contains only the matching items.
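Generating every combination becomes expensive as the string gets longer. If only the count is needed, one possible alternative is a small dynamic-programming pass over the string. This is a different technique from the answers above, offered as a sketch; the function name `count_subsequences` and the `ways` table are introduced here for illustration.

```python
def count_subsequences(haystack, needle):
    # ways[j] = number of ways to form needle[:j] from the characters seen so far.
    # The empty prefix can always be formed in exactly one way.
    ways = [1] + [0] * len(needle)
    for ch in haystack:
        # Walk right to left so each haystack character extends a prefix at most once.
        for j in range(len(needle), 0, -1):
            if needle[j - 1] == ch:
                ways[j] += ways[j - 1]
    return ways[len(needle)]

print(count_subsequences("adnndaend", "and"))  # 6
```

The work is proportional to len(haystack) * len(needle), and the function returns 1 for an empty needle because the empty subsequence is counted once.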
