Basic indexing of recurrences of a substring within a string (Python)

I'm working on teaching myself basic programming.
One simple project is to find the indexes of recurrences of a substring within a string. For example, given the string "abcdefdef" and the substring "def", I would like the output to be 3 and 6. I have some code written, but I'm not getting the answers I want. Here is what I have written:

Note: I'm aware that there may be easier ways to produce the result by leveraging built-in features/packages of the language, such as regular expressions. I'm also aware that my approach is probably not an optimal algorithm. Nevertheless, at this time I'm only seeking advice on fixing the following logic, rather than on using more idiomatic approaches.

import string

def MIT(String, substring): # "String" is the main string I'm searching within
    String_list = list(String)
    substring_list = list(substring)
    i = 0
    j = 0
    counter = 0
    results = []
    while i < (len(String)-1):
        if [j] == [i]:
            j = j + 1
            i = i + 1
            counter  = counter + 1
            if counter == len(substring):
                results.append([i - len(substring)+1])
                counter = 0
                j = 0
                i = i+1
        else:
            counter = 0
            j = 0
            i = i+1
    print results
    return

My line of reasoning is as follows. I turn the String and substring into lists, which allows for indexing of each letter in the string. I set i and j to 0; these will be my first positions in the String and substring, respectively. I also have a new variable, counter, which I set to 0. Basically, I'm using counter to count how many times the letter in position [i] is equal to the element in position [j]. If counter equals the length of substring, then I know that [i - len(substring) + 1] is a position where my substring starts, so I add it to a list called results. Then I reset counter and j and continue searching for more substrings.

I know the code is awkward, but I thought that I should still be able to get the answer. Instead I get:

>>> MIT("abcdefghi", "def")
[[3]]
>>> MIT("abcdefghi", "efg")
[[3]]
>>> MIT("abcdefghi", "b")
[[1]]
>>> MIT("abcdefghi", "k")
[[1]]

Any thoughts?

5 Answers

#1

The regular expressions module (re) is much more suited for this task.

Good reference: http://docs.python.org/howto/regex.html

Also: http://docs.python.org/library/re.html

EDIT: A more 'manual' way may be to use slicing:

s = len(String)
l = len(substring)
for i in range(s-l+1):
    if String[i:i+l] == substring:
        pass #add to results or whatever

#2

The main problems are the following:

For the comparison, use: if String[i] == substring[j]
You increment i twice when you find a match; remove the second increment.
The loop should run to the end of the string: while i < len(String):

And of course it won't find overlapping matches (e.g. MIT("aaa", "aa")).
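
As a rough sketch of where those fixes lead, here is one way the two-index loop could be written so that it also handles overlaps. This is not the asker's code with only the fixes above applied: the name find_all, the rewind step on a mismatch, and the restart one past each match are additions made for this illustration.

```python
def find_all(text, pattern):
    """Naive two-index search; returns the start of every match."""
    results = []
    if not pattern:
        return results  # assumption: an empty pattern yields no matches
    i = 0  # current position in text
    j = 0  # current position in pattern
    while i < len(text):
        if text[i] == pattern[j]:
            i += 1
            j += 1
            if j == len(pattern):
                start = i - j
                results.append(start)
                # Restart one past this match's start so that
                # overlapping matches are not skipped.
                i = start + 1
                j = 0
        else:
            # Rewind to one past where this attempt started.
            i = i - j + 1
            j = 0
    return results

print(find_all("abcdefdef", "def"))
```

The rewind on a mismatch matters for inputs such as find_all("aaab", "aab"), where simply resetting j and moving on would miss the match that starts at index 1.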

There are also some minor "problems": the code is not really Pythonic, there is no need to build lists, increments are clearer when written as i += 1, a useful function should return its values rather than print them, etc.

If you want proper and fast code, check the classic algorithm book: http://www.amazon.com/Introduction-Algorithms-Thomas-H-Cormen/dp/0262033844 . It has a whole chapter about string search.

If you want a Pythonic solution without implementing the whole thing, check the other answers.

#3

First, I added some comments to your code to give some tips

import string

def MIT(String, substring): 
    String_list = list(String)  # this doesn't need to be done; you can index strings
    substring_list = list(substring)
    i = 0
    j = 0
    counter = 0
    results = []
    while i < (len(String)-1):   
        if [j] == [i]:   # here you're comparing two one-item lists; you must compare substring[j] and String[i]
            j = j + 1
            i = i + 1
            counter  = counter + 1
            if counter == len(substring):
                results.append([i - len(substring)+1]) # remove the brackets; append doesn't require them
                counter = 0
                j = 0
                i = i+1 # remove this 
        else:
            counter = 0
            j = 0
            i = i+1
    print results
    return

Here's how I would do it without using built-in libraries and such:

def MIT(fullstring, substring):
    results = []
    sub_len = len(substring)
    for i in range(len(fullstring)):  # i takes the values 0 to len(fullstring) - 1
        if fullstring[i:i+sub_len] == substring: # slice notation: take characters i up to (but not including) i + the length of the substring
            results.append(i)
    return results

#4

I'm not clear on whether you want to learn some good string searching algorithms, or a straightforward way to do it in Python. If it's the latter, then string.find is your friend. Something like

def find_all_indexes(needle, haystack):
    """Find the index for the beginning of each occurrence of ``needle`` in ``haystack``. Overlaps are allowed."""
    indexes = []
    last_index = haystack.find(needle)
    while -1 != last_index:
        indexes.append(last_index)
        last_index = haystack.find(needle, last_index + 1)
    return indexes


if __name__ == '__main__':
    print(find_all_indexes('is', 'This is my string.'))

While this is a pretty naive approach, it should be easily understandable.

If you're looking for something that uses even less of the standard library (and will actually teach you a fairly common algorithm used when implementing libraries), you could try implementing the Boyer-Moore string search algorithm.
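
As a starting point, here is a minimal sketch of the Horspool simplification of Boyer-Moore. It uses only the bad-character shift table and leaves out the good-suffix rule, so it is not the full algorithm. The function name horspool_find_all is made up for this example, and the argument order follows find_all_indexes above.

```python
def horspool_find_all(needle, haystack):
    """Return the start index of each occurrence of needle in haystack."""
    m, n = len(needle), len(haystack)
    if m == 0 or m > n:
        return []  # assumption: an empty needle yields no matches
    # For every character of the needle except the last one, store the
    # distance from its rightmost occurrence to the end of the needle.
    shift = {ch: m - 1 - k for k, ch in enumerate(needle[:-1])}
    indexes = []
    i = 0
    while i <= n - m:
        if haystack[i:i + m] == needle:
            indexes.append(i)
        # Shift according to the haystack character aligned with the
        # needle's last position; characters not in the table allow a
        # jump of the full needle length.
        i += shift.get(haystack[i + m - 1], m)
    return indexes

print(horspool_find_all('is', 'This is my string.'))
```

The window comparison is done with a slice to keep the sketch short; a textbook implementation would compare characters from right to left.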

#5

For finding the position of substring in a string this algorithm will do:

def posnof_substring(string, sub_string):
    l = len(sub_string)
    posn = -1  # returned when sub_string never occurs
    for i in range(len(string) - l + 1):
        if string[i:i + l] == sub_string:
            posn = i + 1  # 1-based position; a later match overwrites an earlier one
    return posn

I myself checked this algorithm and it worked!
