如何判断字符串是否在Python中重复出现?

时间:2021-12-12 07:25:45

I'm looking for a way to test whether or not a given string repeats itself for the entire string or not.

我正在寻找一种方法来测试给定的字符串是否在整个字符串中重复出现。

Examples:

例子:

[
    '0045662100456621004566210045662100456621',             # '00456621'
    '0072992700729927007299270072992700729927',             # '00729927'
    '001443001443001443001443001443001443001443',           # '001443'
    '037037037037037037037037037037037037037037037',        # '037'
    '047619047619047619047619047619047619047619',           # '047619'
    '002457002457002457002457002457002457002457',           # '002457'
    '001221001221001221001221001221001221001221',           # '001221'
    '001230012300123001230012300123001230012300123',        # '00123'
    '0013947001394700139470013947001394700139470013947',    # '0013947'
    '001001001001001001001001001001001001001001001001001',  # '001'
    '001406469760900140646976090014064697609',              # '0014064697609'
]

are strings which repeat themselves, and

弦是重复的吗

[
    '004608294930875576036866359447',
    '00469483568075117370892018779342723',
    '004739336492890995260663507109',
    '001508295625942684766214177978883861236802413273',
    '007518796992481203',
    '0071942446043165467625899280575539568345323741',
    '0434782608695652173913',
    '0344827586206896551724137931',
    '002481389578163771712158808933',
    '002932551319648093841642228739',
    '0035587188612099644128113879',
    '003484320557491289198606271777',
    '00115074798619102416570771',
]

are examples of ones that do not.

有些例子是没有的。

The repeating sections of the strings I'm given can be quite long, and the strings themselves can be 500 or more characters, so looping through each character trying to build a pattern then checking the pattern vs the rest of the string seems awful slow. Multiply that by potentially hundreds of strings and I can't see any intuitive solution.

我给出的字符串的重复部分可能很长,字符串本身可能有500个或更多字符,所以循环遍历试图构建模式的每个字符,然后检查模式与字符串的其余部分似乎非常慢。把它乘以可能的数百个字符串我看不到任何直观的解。

I've looked into regexes a bit and they seem good for when you know what you're looking for, or at least the length of the pattern you're looking for. Unfortunately, I know neither.

我已经研究过一些regexes,当您知道您要查找的是什么,或者至少是您要查找的模式的长度时,它们似乎很适合。不幸的是,我都不认识。

How can I tell if a string is repeating itself and if it is, what the shortest repeating subsequence is?

如何判断一个字符串是否在重复如果是,最短重复子序列是什么?

13 个解决方案

#1


558  

Here's a concise solution which avoids regular expressions and slow in-Python loops:

这里有一个简洁的解决方案,它避免了正则表达式和python中的缓慢循环:

def principal_period(s):
    i = (s+s).find(s, 1, -1)
    return None if i == -1 else s[:i]

See the Community Wiki answer started by @davidism for benchmark results. In summary,

参见由@davidism results启动的社区Wiki答复。总之,

David Zhang's solution is the clear winner, outperforming all others by at least 5x for the large example set.

张大卫的解决方案显然是赢家,在这个大的示例集上,他的表现至少比其他所有人高出5倍。

(That answer's words, not mine.)

(那个答案是话,不是我的。)

This is based on the observation that a string is periodic if and only if it is equal to a nontrivial rotation of itself. Kudos to @AleksiTorhamo for realizing that we can then recover the principal period from the index of the first occurrence of s in (s+s)[1:-1], and for informing me of the optional start and end arguments of Python's string.find.

这是基于观察到一个弦是周期性的当且仅当它等于它自身的非平凡旋转时。感谢@AleksiTorhamo意识到我们可以从(s+s)中第一次出现s的索引中恢复主周期[1:-1],并告诉我Python的string.find的可选的开始和结束参数。

#2


180  

Here's a solution using regular expressions.

这里有一个使用正则表达式的解决方案。

import re

REPEATER = re.compile(r"(.+?)\1+$")

def repeated(s):
    match = REPEATER.match(s)
    return match.group(1) if match else None

Iterating over the examples in the question:

重复问题中的例子:

examples = [
    '0045662100456621004566210045662100456621',
    '0072992700729927007299270072992700729927',
    '001443001443001443001443001443001443001443',
    '037037037037037037037037037037037037037037037',
    '047619047619047619047619047619047619047619',
    '002457002457002457002457002457002457002457',
    '001221001221001221001221001221001221001221',
    '001230012300123001230012300123001230012300123',
    '0013947001394700139470013947001394700139470013947',
    '001001001001001001001001001001001001001001001001001',
    '001406469760900140646976090014064697609',
    '004608294930875576036866359447',
    '00469483568075117370892018779342723',
    '004739336492890995260663507109',
    '001508295625942684766214177978883861236802413273',
    '007518796992481203',
    '0071942446043165467625899280575539568345323741',
    '0434782608695652173913',
    '0344827586206896551724137931',
    '002481389578163771712158808933',
    '002932551319648093841642228739',
    '0035587188612099644128113879',
    '003484320557491289198606271777',
    '00115074798619102416570771',
]

for e in examples:
    sub = repeated(e)
    if sub:
        print("%r: %r" % (e, sub))
    else:
        print("%r does not repeat." % e)

... produces this output:

…产生该输出:

'0045662100456621004566210045662100456621': '00456621'
'0072992700729927007299270072992700729927': '00729927'
'001443001443001443001443001443001443001443': '001443'
'037037037037037037037037037037037037037037037': '037'
'047619047619047619047619047619047619047619': '047619'
'002457002457002457002457002457002457002457': '002457'
'001221001221001221001221001221001221001221': '001221'
'001230012300123001230012300123001230012300123': '00123'
'0013947001394700139470013947001394700139470013947': '0013947'
'001001001001001001001001001001001001001001001001001': '001'
'001406469760900140646976090014064697609': '0014064697609'
'004608294930875576036866359447' does not repeat.
'00469483568075117370892018779342723' does not repeat.
'004739336492890995260663507109' does not repeat.
'001508295625942684766214177978883861236802413273' does not repeat.
'007518796992481203' does not repeat.
'0071942446043165467625899280575539568345323741' does not repeat.
'0434782608695652173913' does not repeat.
'0344827586206896551724137931' does not repeat.
'002481389578163771712158808933' does not repeat.
'002932551319648093841642228739' does not repeat.
'0035587188612099644128113879' does not repeat.
'003484320557491289198606271777' does not repeat.
'00115074798619102416570771' does not repeat.

The regular expression (.+?)\1+$ is divided into three parts:

正则表达式(.+?)\1+$分为三个部分:

  1. (.+?) is a matching group containing at least one (but as few as possible) of any character (because +? is non-greedy).

    (.+?)是一个匹配组,包含至少一个(但尽可能少)的任何字符(因为+?是贪婪的,)。

  2. \1+ checks for at least one repetition of the matching group in the first part.

    \1+在第一部分中检查匹配组的至少一次重复。

  3. $ checks for the end of the string, to ensure that there's no extra, non-repeating content after the repeated substrings (and using re.match() ensures that there's no non-repeating text before the repeated substrings).

    $检查字符串的末尾,以确保在重复的子字符串之后没有额外的、非重复的内容(并且使用re.match()确保在重复的子字符串之前没有非重复的文本)。

In Python 3.4 and later, you could drop the $ and use re.fullmatch() instead, or (in any Python at least as far back as 2.3) go the other way and use re.search() with the regex ^(.+?)\1+$, all of which are more down to personal taste than anything else.

在Python 3.4和以后,你可以把美元和使用re.fullmatch()相反,或者(在任何Python至少早在2.3)采取其他方法和使用re.search()的regex ^(. + ?)\ 1 +美元,所有这一切更比其他任何个人品味。

#3


90  

You can make the observation that for a string to be considered repeating, its length must be divisible by the length of its repeated sequence. Given that, here is a solution that generates divisors of the length from 1 to n / 2 inclusive, divides the original string into substrings with the length of the divisors, and tests the equality of the result set:

你可以观察到,对于一个被认为是重复的字符串,它的长度必须被它的重复序列的长度整除。考虑到这一点,这里有一个解决方案,生成长度从1到n / 2的除数,将原始字符串分割成具有除数长度的子字符串,并测试结果集的相等性:

from math import sqrt, floor

def divquot(n):
    if n > 1:
        yield 1, n
    swapped = []
    for d in range(2, int(floor(sqrt(n))) + 1):
        q, r = divmod(n, d)
        if r == 0:
            yield d, q
            swapped.append((q, d))
    while swapped:
        yield swapped.pop()

def repeats(s):
    n = len(s)
    for d, q in divquot(n):
        sl = s[0:d]
        if sl * q == s:
            return sl
    return None

EDIT: In Python 3, the / operator has changed to do float division by default. To get the int division from Python 2, you can use the // operator instead. Thank you to @TigerhawkT3 for bringing this to my attention.

编辑:在Python 3中,/操作符已经默认更改为浮动除法。要从Python 2中获得int分割,可以使用//运算符。谢谢@TigerhawkT3提醒我。

The // operator performs integer division in both Python 2 and Python 3, so I've updated the answer to support both versions. The part where we test to see if all the substrings are equal is now a short-circuiting operation using all and a generator expression.

/运算符在Python 2和Python 3中执行整数除法,因此我更新了答案以支持这两个版本。我们测试所有子字符串是否相等的部分现在是使用all和一个生成器表达式的短路操作。

UPDATE: In response to a change in the original question, the code has now been updated to return the smallest repeating substring if it exists and None if it does not. @godlygeek has suggested using divmod to reduce the number of iterations on the divisors generator, and the code has been updated to match that as well. It now returns all positive divisors of n in ascending order, exclusive of n itself.

更新:为了响应原始问题中的更改,现在已经更新了代码,如果存在最小的重复子字符串,则返回no。@godlygeek建议使用divmod减少除数生成器上的迭代次数,并更新了代码以与之匹配。它现在以升序返回n的所有正因数,不包括n本身。

Further update for high performance: After multiple tests, I've come to the conclusion that simply testing for string equality has the best performance out of any slicing or iterator solution in Python. Thus, I've taken a leaf out of @TigerhawkT3 's book and updated my solution. It's now over 6x as fast as before, noticably faster than Tigerhawk's solution but slower than David's.

进一步提高性能:在多次测试之后,我得出结论,简单地测试字符串相等性,在Python中的任何切片或迭代器解决方案中都有最好的性能。因此,我从@TigerhawkT3的书中删除了一页,并更新了我的解决方案。它现在比以前快了6倍,比泰格霍克的方案快得多,但比大卫的要慢。

#4


84  

Here are some benchmarks for the various answers to this question. There were some surprising results, including wildly different performance depending on the string being tested.

以下是对这个问题的各种回答的一些基准。有一些令人惊讶的结果,包括根据所测试的字符串有很大不同的性能。

Some functions were modified to work with Python 3 (mainly by replacing / with // to ensure integer division). If you see something wrong, want to add your function, or want to add another test string, ping @ZeroPiraeus in the Python chatroom.

一些函数被修改为使用Python 3(主要是通过替换/ with /来确保整数除法)。如果发现问题,希望添加函数,或者希望在Python聊天室中添加另一个测试字符串ping @ZeroPiraeus。

In summary: there's about a 50x difference between the best- and worst-performing solutions for the large set of example data supplied by OP here (via this comment). David Zhang's solution is the clear winner, outperforming all others by around 5x for the large example set.

总之:对于OP提供的大量示例数据(通过此注释),最好的和最差的解决方案之间的差异是50x。David Zhang的解决方案显然是赢家,在这个大的示例集上,其表现要比其他所有人高出5倍左右。

A couple of the answers are very slow in extremely large "no match" cases. Otherwise, the functions seem to be equally matched or clear winners depending on the test.

在非常大的“无匹配”情况下,有几个答案非常慢。否则,根据测试,函数似乎是相等的或明确的赢家。

Here are the results, including plots made using matplotlib and seaborn to show the different distributions:

这里是结果,包括使用matplotlib和seaborn进行的图,以显示不同的发行版:


Corpus 1 (supplied examples - small set)

语料库1(提供示例-小集合)

mean performance:
 0.0003  david_zhang
 0.0009  zero
 0.0013  antti
 0.0013  tigerhawk_2
 0.0015  carpetpython
 0.0029  tigerhawk_1
 0.0031  davidism
 0.0035  saksham
 0.0046  shashank
 0.0052  riad
 0.0056  piotr

median performance:
 0.0003  david_zhang
 0.0008  zero
 0.0013  antti
 0.0013  tigerhawk_2
 0.0014  carpetpython
 0.0027  tigerhawk_1
 0.0031  davidism
 0.0038  saksham
 0.0044  shashank
 0.0054  riad
 0.0058  piotr

如何判断字符串是否在Python中重复出现?


Corpus 2 (supplied examples - large set)

语料库2(提供示例-大集合)

mean performance:
 0.0006  david_zhang
 0.0036  tigerhawk_2
 0.0036  antti
 0.0037  zero
 0.0039  carpetpython
 0.0052  shashank
 0.0056  piotr
 0.0066  davidism
 0.0120  tigerhawk_1
 0.0177  riad
 0.0283  saksham

median performance:
 0.0004  david_zhang
 0.0018  zero
 0.0022  tigerhawk_2
 0.0022  antti
 0.0024  carpetpython
 0.0043  davidism
 0.0049  shashank
 0.0055  piotr
 0.0061  tigerhawk_1
 0.0077  riad
 0.0109  saksham

如何判断字符串是否在Python中重复出现?


Corpus 3 (edge cases)

语料库3(边界情况)

mean performance:
 0.0123  shashank
 0.0375  david_zhang
 0.0376  piotr
 0.0394  carpetpython
 0.0479  antti
 0.0488  tigerhawk_2
 0.2269  tigerhawk_1
 0.2336  davidism
 0.7239  saksham
 3.6265  zero
 6.0111  riad

median performance:
 0.0107  tigerhawk_2
 0.0108  antti
 0.0109  carpetpython
 0.0135  david_zhang
 0.0137  tigerhawk_1
 0.0150  shashank
 0.0229  saksham
 0.0255  piotr
 0.0721  davidism
 0.1080  zero
 1.8539  riad

如何判断字符串是否在Python中重复出现?


The tests and raw results are available here.

测试和原始结果在这里可用。

#5


37  

Non-regex solution:

Non-regex解决方案:

def repeat(string):
    for i in range(1, len(string)//2+1):
        if not len(string)%len(string[0:i]) and string[0:i]*(len(string)//len(string[0:i])) == string:
            return string[0:i]

Faster non-regex solution, thanks to @ThatWeirdo (see comments):

更快的非regex解决方案,感谢@ThatWeirdo(参见注释):

def repeat(string):
    l = len(string)
    for i in range(1, len(string)//2+1):
        if l%i: continue
        s = string[0:i]
        if s*(l//i) == string:
            return s

The above solution is very rarely slower than the original by a few percent, but it's usually a good bit faster - sometimes a whole lot faster. It's still not faster than davidism's for longer strings, and zero's regex solution is superior for short strings. It comes out to the fastest (according to davidism's test on github - see his answer) with strings of about 1000-1500 characters. Regardless, it's reliably second-fastest (or better) in all cases I tested. Thanks, ThatWeirdo.

上面的解决方案很少比原来的方案慢几个百分点,但是它通常会快一点——有时会快很多。对于长字符串,它仍然不会比戴维德式更快,而对于短字符串,zero的正则解更优越。它以大约1000-1500个字符组成的字符串(根据戴维德对github的测试——见他的答案)获得了最快的结果。无论如何,它在我测试的所有情况下都是可靠的第二快(或更好)。谢谢,ThatWeirdo。

Test:

测试:

print(repeat('009009009'))
print(repeat('254725472547'))
print(repeat('abcdeabcdeabcdeabcde'))
print(repeat('abcdefg'))
print(repeat('09099099909999'))
print(repeat('02589675192'))

Results:

结果:

009
2547
abcde
None
None
None

#6


23  

First, halve the string as long as it's a "2 part" duplicate. This reduces the search space if there are an even number of repeats. Then, working forwards to find the smallest repeating string, check if splitting the full string by increasingly larger sub-string results in only empty values. Only sub-strings up to length // 2 need to be tested since anything over that would have no repeats.

首先,只要字符串是“2部分”重复,就将其减半。如果重复次数为偶数,这将减少搜索空间。然后,通过工作转发来查找最小的重复字符串,检查是否通过越来越大的子字符串将整个字符串拆分为空值。只有长度为// 2的子字符串需要进行测试,因为任何超过该长度的字符串都不会重复。

def shortest_repeat(orig_value):
    if not orig_value:
        return None

    value = orig_value

    while True:
        len_half = len(value) // 2
        first_half = value[:len_half]

        if first_half != value[len_half:]:
            break

        value = first_half

    len_value = len(value)
    split = value.split

    for i in (i for i in range(1, len_value // 2) if len_value % i == 0):
        if not any(split(value[:i])):
            return value[:i]

    return value if value != orig_value else None

This returns the shortest match or None if there is no match.

如果没有匹配,则返回最短的匹配。

#7


16  

The problem may also be solved in O(n) in worst case with prefix function.

这个问题也可以用O(n)来解决,在最坏的情况下用前缀函数。

Note, it may be slower in general case(UPD: and is much slower) than other solutions which depend on number of divisors of n, but usually find fails sooner, I think one of bad cases for them will be aaa....aab, where there are n - 1 = 2 * 3 * 5 * 7 ... *p_n - 1 a's

注意,它可能在一般情况下慢(乌利希期刊指南:更慢)比其他解决方案,依赖于n的因子,但通常发现早失败,我想为他们糟糕的情况下将aaa ....aab, n - 1 = 2 * 3 * 5 * 7…* p_n - 1 a

First of all you need to calculate prefix function

首先需要计算前缀函数

def prefix_function(s):
    n = len(s)
    pi = [0] * n
    for i in xrange(1, n):
        j = pi[i - 1]
        while(j > 0 and s[i] != s[j]):
            j = pi[j - 1]
        if (s[i] == s[j]):
            j += 1
        pi[i] = j;
    return pi

then either there's no answer or the shortest period is

要么没有答案,要么就是最短的周期。

k = len(s) - prefix_function(s[-1])

and you just have to check if k != n and n % k == 0 (if k != n and n % k == 0 then answer is s[:k], else there's no answer

你只需要检查k != n和n % k是否= 0(如果k != n和n % k = 0,那么答案是s[:k],否则就没有答案

You may check the proof here (in Russian, but online translator will probably do the trick)

你可以在这里查看证明(用俄语,但在线翻译可能会有帮助)

def riad(s):
    n = len(s)
    pi = [0] * n
    for i in xrange(1, n):
        j = pi[i - 1]
        while(j > 0 and s[i] != s[j]):
            j = pi[j - 1]
        if (s[i] == s[j]):
            j += 1
        pi[i] = j;
    k = n - pi[-1]
    return s[:k] if (n != k and n % k == 0) else None

#8


16  

This version tries only those candidate sequence lengths that are factors of the string length; and uses the * operator to build a full-length string from the candidate sequence:

这个版本只尝试那些作为字符串长度因子的候选序列长度;并使用*操作符从候选序列中构建一个全长字符串:

def get_shortest_repeat(string):
    length = len(string)
    for i in range(1, length // 2 + 1):
        if length % i:  # skip non-factors early
            continue

        candidate = string[:i]
        if string == candidate * (length // i):
            return candidate

    return None

Thanks to TigerhawkT3 for noticing that length // 2 without + 1 would fail to match the abab case.

感谢TigerhawkT3注意到长度// 2没有+ 1将无法匹配abab案例。

#9


15  

Here's a straight forward solution, without regexes.

这里有一个直截了当的解决方案,没有regex。

For substrings of s starting from zeroth index, of lengths 1 through len(s), check if that substring, substr is the repeated pattern. This check can be performed by concatenating substr with itself ratio times, such that the length of the string thus formed is equal to the length of s. Hence ratio=len(s)/len(substr).

对于从零索引开始的s的子字符串,长度为1到len(s),检查该子字符串是否为重复模式。这个检查可以通过将子str与自身的比值乘以串联来执行,这样形成的字符串的长度就等于s的长度。因此ratio=len(s)/len(substr)。

Return when first such substring is found. This would provide the smallest possible substring, if one exists.

找到第一个子字符串时返回。如果存在子字符串,这将提供最小的子字符串。

def check_repeat(s):
    for i in range(1, len(s)):
        substr = s[:i]
        ratio = len(s)/len(substr)
        if substr * ratio == s:
            print 'Repeating on "%s"' % substr
            return
    print 'Non repeating'

>>> check_repeat('254725472547')
Repeating on "2547"
>>> check_repeat('abcdeabcdeabcdeabcde')
Repeating on "abcde"

#10


9  

I started with more than eight solutions to this problem. Some were bases on regex (match, findall, split), some of string slicing and testing, and some with string methods (find, count, split). Each had benefits in code clarity, code size, speed and memory consumption. I was going to post my answer here when I noticed that execution speed was ranked as important, so I did more testing and improvement to arrive at this:

对于这个问题,我一开始就提出了八种以上的解决方案。有些基于regex (match、findall、split),有些基于字符串切片和测试,有些基于字符串方法(find、count、split)。它们在代码清晰、代码大小、速度和内存消耗方面都有好处。当我注意到执行速度被列为重要的时候,我会在这里发布我的答案,所以我做了更多的测试和改进来达到这个目的:

def repeating(s):
    size = len(s)
    incr = size % 2 + 1
    for n in xrange(1, size//2+1, incr):
        if size % n == 0:
            if s[:n] * (size//n) == s:
                return s[:n]

This answer seems similar to a few other answers here, but it has a few speed optimisations others have not used:

这个答案似乎与这里的一些其他答案相似,但它有一些其他人没有使用的速度优化:

  • xrange is a little faster in this application,
  • xrange在这个应用程序中稍微快一些,
  • if an input string is an odd length, do not check any even length substrings,
  • 如果输入字符串是奇数长度,不要检查任何偶数长度的子字符串,
  • by using s[:n] directly, we avoid creating a variable in each loop.
  • 通过直接使用s[:n],我们避免在每个循环中创建一个变量。

I would be interested to see how this performs in the standard tests with common hardware. I believe it will be well short of David Zhang's excellent algorithm in most tests, but should be quite fast otherwise.

我很想看看在普通硬件的标准测试中它是如何执行的。我相信,在大多数测试中,它将远远低于张大卫的优秀算法,但在其他测试中应该会很快。

I found this problem to be very counter-intuitive. The solutions I thought would be fast were slow. The solutions that looked slow were fast! It seems that Python's string creation with the multiply operator and string comparisons are highly optimised.

我发现这个问题非常反直觉。我认为很快的解决方案是缓慢的。看起来慢的解决方案很快!看起来Python的字符串创建与乘法运算符和字符串比较是高度优化的。

#11


2  

This function runs very quickly (tested and it's over 3 times faster than fastest solution here on strings with over 100k characters and the difference gets bigger the longer the repeating pattern is). It tries to minimise the number of comparisons needed to get the answer:

这个函数运行得非常快(经过测试,对于超过100k个字符的字符串,它比最快的解决方案快3倍多,重复模式越长,差异就越大)。它试图将得到答案所需的比较数量最小化:

def repeats(string):
    n = len(string)
    tried = set([])
    best = None
    nums = [i for i in  xrange(2, int(n**0.5) + 1) if n % i == 0]
    nums = [n/i for i in nums if n/i!=i] + list(reversed(nums)) + [1]
    for s in nums:
        if all(t%s for t in tried):
            print 'Trying repeating string of length:', s
            if string[:s]*(n/s)==string:
                best = s
            else:
                tried.add(s)
    if best:
        return string[:best]

Note that for example for string of length 8 it checks only fragment of size 4 and it does not have to test further because pattern of length 1 or 2 would result in repeating pattern of length 4:

注意,对于长度为8的字符串,它只检查大小为4的片段,不需要进一步测试,因为长度为1或2的模式会导致长度为4的重复模式:

>>> repeats('12345678')
Trying repeating string of length: 4
None

# for this one we need only 2 checks 
>>> repeats('1234567812345678')
Trying repeating string of length: 8
Trying repeating string of length: 4
'12345678'

#12


1  

In David Zhang's answer if we have some sort of circular buffer this will not work: principal_period('6210045662100456621004566210045662100456621') due to the starting 621, where I would have liked it to spit out: 00456621.

在David Zhang的回答中,如果我们有某种循环缓冲区,这将不起作用:principal_period(' 62100456621006621006621006621006610066210066100662100662100456621006666 '),因为开始的621,我希望它能吐出:0045666621。

Extending his solution we can use the following:

扩展他的解决方案,我们可以使用以下方法:

def principal_period(s):
    for j in range(int(len(s)/2)):
        idx = (s[j:]+s[j:]).find(s[j:], 1, -1)
        if idx != -1:
            # Make sure that the first substring is part of pattern
            if s[:j] == s[j:][:idx][-j:]:
                break

    return None if idx == -1 else s[j:][:idx]

principal_period('6210045662100456621004566210045662100456621')
>>> '00456621'

#13


-1  

Here is the code in python that checks for repetition of sub string in the main string given by the user.

这是python中的代码,用于检查用户给定的主字符串中的子字符串的重复。

print "Enter a string...."
#mainstring = String given by user
mainstring=raw_input(">")
if(mainstring==''):
    print "Invalid string"
    exit()
#charlist = Character list of mainstring
charlist=list(mainstring)
strarr=''
print "Length of your string :",len(mainstring)
for i in range(0,len(mainstring)):
    strarr=strarr+charlist[i]
    splitlist=mainstring.split(strarr)
    count = 0
    for j in splitlist:
        if j =='':
            count+=1
    if count == len(splitlist):
        break
if count == len(splitlist):
    if count == 2:
        print "No repeating Sub-String found in string %r"%(mainstring)

    else:
        print "Sub-String %r repeats in string %r"%(strarr,mainstring)
else :
    print "No repeating Sub-String found in string %r"%(mainstring)

Input:

输入:

0045662100456621004566210045662100456621

0045662100456621004566210045662100456621

Output :

输出:

Length of your string : 40

弦的长度:40

Sub-String '00456621' repeats in string '0045662100456621004566210045662100456621'

子串'00456621'在字符串'0045662100456621004566210045662100456621'中重复

Input :

输入:

004608294930875576036866359447

004608294930875576036866359447

Output:

输出:

Length of your string : 30

弦的长度:30。

No repeating Sub-String found in string '004608294930875576036866359447'

在字符串'004608294930875576036866359447'中没有发现重复子字符串

#1


558  

Here's a concise solution which avoids regular expressions and slow in-Python loops:

这里有一个简洁的解决方案,它避免了正则表达式和python中的缓慢循环:

def principal_period(s):
    i = (s+s).find(s, 1, -1)
    return None if i == -1 else s[:i]

See the Community Wiki answer started by @davidism for benchmark results. In summary,

参见由@davidism results启动的社区Wiki答复。总之,

David Zhang's solution is the clear winner, outperforming all others by at least 5x for the large example set.

张大卫的解决方案显然是赢家,在这个大的示例集上,他的表现至少比其他所有人高出5倍。

(That answer's words, not mine.)

(那个答案是话,不是我的。)

This is based on the observation that a string is periodic if and only if it is equal to a nontrivial rotation of itself. Kudos to @AleksiTorhamo for realizing that we can then recover the principal period from the index of the first occurrence of s in (s+s)[1:-1], and for informing me of the optional start and end arguments of Python's string.find.

这是基于观察到一个弦是周期性的当且仅当它等于它自身的非平凡旋转时。感谢@AleksiTorhamo意识到我们可以从(s+s)中第一次出现s的索引中恢复主周期[1:-1],并告诉我Python的string.find的可选的开始和结束参数。

#2


180  

Here's a solution using regular expressions.

这里有一个使用正则表达式的解决方案。

import re

REPEATER = re.compile(r"(.+?)\1+$")

def repeated(s):
    match = REPEATER.match(s)
    return match.group(1) if match else None

Iterating over the examples in the question:

重复问题中的例子:

examples = [
    '0045662100456621004566210045662100456621',
    '0072992700729927007299270072992700729927',
    '001443001443001443001443001443001443001443',
    '037037037037037037037037037037037037037037037',
    '047619047619047619047619047619047619047619',
    '002457002457002457002457002457002457002457',
    '001221001221001221001221001221001221001221',
    '001230012300123001230012300123001230012300123',
    '0013947001394700139470013947001394700139470013947',
    '001001001001001001001001001001001001001001001001001',
    '001406469760900140646976090014064697609',
    '004608294930875576036866359447',
    '00469483568075117370892018779342723',
    '004739336492890995260663507109',
    '001508295625942684766214177978883861236802413273',
    '007518796992481203',
    '0071942446043165467625899280575539568345323741',
    '0434782608695652173913',
    '0344827586206896551724137931',
    '002481389578163771712158808933',
    '002932551319648093841642228739',
    '0035587188612099644128113879',
    '003484320557491289198606271777',
    '00115074798619102416570771',
]

for e in examples:
    sub = repeated(e)
    if sub:
        print("%r: %r" % (e, sub))
    else:
        print("%r does not repeat." % e)

... produces this output:

…产生该输出:

'0045662100456621004566210045662100456621': '00456621'
'0072992700729927007299270072992700729927': '00729927'
'001443001443001443001443001443001443001443': '001443'
'037037037037037037037037037037037037037037037': '037'
'047619047619047619047619047619047619047619': '047619'
'002457002457002457002457002457002457002457': '002457'
'001221001221001221001221001221001221001221': '001221'
'001230012300123001230012300123001230012300123': '00123'
'0013947001394700139470013947001394700139470013947': '0013947'
'001001001001001001001001001001001001001001001001001': '001'
'001406469760900140646976090014064697609': '0014064697609'
'004608294930875576036866359447' does not repeat.
'00469483568075117370892018779342723' does not repeat.
'004739336492890995260663507109' does not repeat.
'001508295625942684766214177978883861236802413273' does not repeat.
'007518796992481203' does not repeat.
'0071942446043165467625899280575539568345323741' does not repeat.
'0434782608695652173913' does not repeat.
'0344827586206896551724137931' does not repeat.
'002481389578163771712158808933' does not repeat.
'002932551319648093841642228739' does not repeat.
'0035587188612099644128113879' does not repeat.
'003484320557491289198606271777' does not repeat.
'00115074798619102416570771' does not repeat.

The regular expression (.+?)\1+$ is divided into three parts:

正则表达式(.+?)\1+$分为三个部分:

  1. (.+?) is a matching group containing at least one (but as few as possible) of any character (because +? is non-greedy).

    (.+?)是一个匹配组,包含至少一个(但尽可能少)的任何字符(因为+?是贪婪的,)。

  2. \1+ checks for at least one repetition of the matching group in the first part.

    \1+在第一部分中检查匹配组的至少一次重复。

  3. $ checks for the end of the string, to ensure that there's no extra, non-repeating content after the repeated substrings (and using re.match() ensures that there's no non-repeating text before the repeated substrings).

    $检查字符串的末尾,以确保在重复的子字符串之后没有额外的、非重复的内容(并且使用re.match()确保在重复的子字符串之前没有非重复的文本)。

In Python 3.4 and later, you could drop the $ and use re.fullmatch() instead, or (in any Python at least as far back as 2.3) go the other way and use re.search() with the regex ^(.+?)\1+$, all of which are more down to personal taste than anything else.

在Python 3.4和以后,你可以把美元和使用re.fullmatch()相反,或者(在任何Python至少早在2.3)采取其他方法和使用re.search()的regex ^(. + ?)\ 1 +美元,所有这一切更比其他任何个人品味。

#3


90  

You can make the observation that for a string to be considered repeating, its length must be divisible by the length of its repeated sequence. Given that, here is a solution that generates divisors of the length from 1 to n / 2 inclusive, divides the original string into substrings with the length of the divisors, and tests the equality of the result set:

你可以观察到,对于一个被认为是重复的字符串,它的长度必须被它的重复序列的长度整除。考虑到这一点,这里有一个解决方案,生成长度从1到n / 2的除数,将原始字符串分割成具有除数长度的子字符串,并测试结果集的相等性:

from math import sqrt, floor

def divquot(n):
    if n > 1:
        yield 1, n
    swapped = []
    for d in range(2, int(floor(sqrt(n))) + 1):
        q, r = divmod(n, d)
        if r == 0:
            yield d, q
            swapped.append((q, d))
    while swapped:
        yield swapped.pop()

def repeats(s):
    n = len(s)
    for d, q in divquot(n):
        sl = s[0:d]
        if sl * q == s:
            return sl
    return None

EDIT: In Python 3, the / operator has changed to do float division by default. To get the int division from Python 2, you can use the // operator instead. Thank you to @TigerhawkT3 for bringing this to my attention.

编辑:在Python 3中,/操作符已经默认更改为浮动除法。要从Python 2中获得int分割,可以使用//运算符。谢谢@TigerhawkT3提醒我。

The // operator performs integer division in both Python 2 and Python 3, so I've updated the answer to support both versions. The part where we test to see if all the substrings are equal is now a short-circuiting operation using all and a generator expression.

/运算符在Python 2和Python 3中执行整数除法,因此我更新了答案以支持这两个版本。我们测试所有子字符串是否相等的部分现在是使用all和一个生成器表达式的短路操作。

UPDATE: In response to a change in the original question, the code has now been updated to return the smallest repeating substring if it exists and None if it does not. @godlygeek has suggested using divmod to reduce the number of iterations on the divisors generator, and the code has been updated to match that as well. It now returns all positive divisors of n in ascending order, exclusive of n itself.

更新:为了响应原始问题中的更改,现在已经更新了代码,如果存在最小的重复子字符串,则返回no。@godlygeek建议使用divmod减少除数生成器上的迭代次数,并更新了代码以与之匹配。它现在以升序返回n的所有正因数,不包括n本身。

Further update for high performance: After multiple tests, I've come to the conclusion that simply testing for string equality has the best performance out of any slicing or iterator solution in Python. Thus, I've taken a leaf out of @TigerhawkT3 's book and updated my solution. It's now over 6x as fast as before, noticably faster than Tigerhawk's solution but slower than David's.

进一步提高性能:在多次测试之后,我得出结论,简单地测试字符串相等性,在Python中的任何切片或迭代器解决方案中都有最好的性能。因此,我从@TigerhawkT3的书中删除了一页,并更新了我的解决方案。它现在比以前快了6倍,比泰格霍克的方案快得多,但比大卫的要慢。

#4


84  

Here are some benchmarks for the various answers to this question. There were some surprising results, including wildly different performance depending on the string being tested.

以下是对这个问题的各种回答的一些基准。有一些令人惊讶的结果,包括根据所测试的字符串有很大不同的性能。

Some functions were modified to work with Python 3 (mainly by replacing / with // to ensure integer division). If you see something wrong, want to add your function, or want to add another test string, ping @ZeroPiraeus in the Python chatroom.

一些函数被修改为使用Python 3(主要是通过替换/ with /来确保整数除法)。如果发现问题,希望添加函数,或者希望在Python聊天室中添加另一个测试字符串ping @ZeroPiraeus。

In summary: there's about a 50x difference between the best- and worst-performing solutions for the large set of example data supplied by OP here (via this comment). David Zhang's solution is the clear winner, outperforming all others by around 5x for the large example set.

总之:对于OP提供的大量示例数据(通过此注释),最好的和最差的解决方案之间的差异是50x。David Zhang的解决方案显然是赢家,在这个大的示例集上,其表现要比其他所有人高出5倍左右。

A couple of the answers are very slow in extremely large "no match" cases. Otherwise, the functions seem to be equally matched or clear winners depending on the test.

在非常大的“无匹配”情况下,有几个答案非常慢。否则,根据测试,函数似乎是相等的或明确的赢家。

Here are the results, including plots made using matplotlib and seaborn to show the different distributions:

这里是结果,包括使用matplotlib和seaborn进行的图,以显示不同的发行版:


Corpus 1 (supplied examples - small set)

语料库1(提供示例-小集合)

mean performance:
 0.0003  david_zhang
 0.0009  zero
 0.0013  antti
 0.0013  tigerhawk_2
 0.0015  carpetpython
 0.0029  tigerhawk_1
 0.0031  davidism
 0.0035  saksham
 0.0046  shashank
 0.0052  riad
 0.0056  piotr

median performance:
 0.0003  david_zhang
 0.0008  zero
 0.0013  antti
 0.0013  tigerhawk_2
 0.0014  carpetpython
 0.0027  tigerhawk_1
 0.0031  davidism
 0.0038  saksham
 0.0044  shashank
 0.0054  riad
 0.0058  piotr

如何判断字符串是否在Python中重复出现?


Corpus 2 (supplied examples - large set)

语料库2(提供示例-大集合)

mean performance:
 0.0006  david_zhang
 0.0036  tigerhawk_2
 0.0036  antti
 0.0037  zero
 0.0039  carpetpython
 0.0052  shashank
 0.0056  piotr
 0.0066  davidism
 0.0120  tigerhawk_1
 0.0177  riad
 0.0283  saksham

median performance:
 0.0004  david_zhang
 0.0018  zero
 0.0022  tigerhawk_2
 0.0022  antti
 0.0024  carpetpython
 0.0043  davidism
 0.0049  shashank
 0.0055  piotr
 0.0061  tigerhawk_1
 0.0077  riad
 0.0109  saksham

如何判断字符串是否在Python中重复出现?


Corpus 3 (edge cases)

语料库3(边界情况)

mean performance:
 0.0123  shashank
 0.0375  david_zhang
 0.0376  piotr
 0.0394  carpetpython
 0.0479  antti
 0.0488  tigerhawk_2
 0.2269  tigerhawk_1
 0.2336  davidism
 0.7239  saksham
 3.6265  zero
 6.0111  riad

median performance:
 0.0107  tigerhawk_2
 0.0108  antti
 0.0109  carpetpython
 0.0135  david_zhang
 0.0137  tigerhawk_1
 0.0150  shashank
 0.0229  saksham
 0.0255  piotr
 0.0721  davidism
 0.1080  zero
 1.8539  riad

如何判断字符串是否在Python中重复出现?


The tests and raw results are available here.

测试和原始结果在这里可用。

#5


37  

Non-regex solution:

Non-regex解决方案:

def repeat(string):
    for i in range(1, len(string)//2+1):
        if not len(string)%len(string[0:i]) and string[0:i]*(len(string)//len(string[0:i])) == string:
            return string[0:i]

Faster non-regex solution, thanks to @ThatWeirdo (see comments):

更快的非regex解决方案,感谢@ThatWeirdo(参见注释):

def repeat(string):
    l = len(string)
    for i in range(1, len(string)//2+1):
        if l%i: continue
        s = string[0:i]
        if s*(l//i) == string:
            return s

The above solution is very rarely slower than the original by a few percent, but it's usually a good bit faster - sometimes a whole lot faster. It's still not faster than davidism's for longer strings, and zero's regex solution is superior for short strings. It comes out to the fastest (according to davidism's test on github - see his answer) with strings of about 1000-1500 characters. Regardless, it's reliably second-fastest (or better) in all cases I tested. Thanks, ThatWeirdo.

上面的解决方案很少比原来的方案慢几个百分点,但是它通常会快一点——有时会快很多。对于长字符串,它仍然不会比戴维德式更快,而对于短字符串,zero的正则解更优越。它以大约1000-1500个字符组成的字符串(根据戴维德对github的测试——见他的答案)获得了最快的结果。无论如何,它在我测试的所有情况下都是可靠的第二快(或更好)。谢谢,ThatWeirdo。

Test:

测试:

print(repeat('009009009'))
print(repeat('254725472547'))
print(repeat('abcdeabcdeabcdeabcde'))
print(repeat('abcdefg'))
print(repeat('09099099909999'))
print(repeat('02589675192'))

Results:

结果:

009
2547
abcde
None
None
None

#6


23  

First, halve the string as long as it's a "2 part" duplicate. This reduces the search space if there are an even number of repeats. Then, working forwards to find the smallest repeating string, check if splitting the full string by increasingly larger sub-string results in only empty values. Only sub-strings up to length // 2 need to be tested since anything over that would have no repeats.

首先,只要字符串是“2部分”重复,就将其减半。如果重复次数为偶数,这将减少搜索空间。然后,通过工作转发来查找最小的重复字符串,检查是否通过越来越大的子字符串将整个字符串拆分为空值。只有长度为// 2的子字符串需要进行测试,因为任何超过该长度的字符串都不会重复。

def shortest_repeat(orig_value):
    if not orig_value:
        return None

    value = orig_value

    while True:
        len_half = len(value) // 2
        first_half = value[:len_half]

        if first_half != value[len_half:]:
            break

        value = first_half

    len_value = len(value)
    split = value.split

    for i in (i for i in range(1, len_value // 2) if len_value % i == 0):
        if not any(split(value[:i])):
            return value[:i]

    return value if value != orig_value else None

This returns the shortest match or None if there is no match.

如果没有匹配,则返回最短的匹配。

#7


16  

The problem may also be solved in O(n) in worst case with prefix function.

这个问题也可以用O(n)来解决,在最坏的情况下用前缀函数。

Note, it may be slower in general case(UPD: and is much slower) than other solutions which depend on number of divisors of n, but usually find fails sooner, I think one of bad cases for them will be aaa....aab, where there are n - 1 = 2 * 3 * 5 * 7 ... *p_n - 1 a's

注意,它可能在一般情况下慢(乌利希期刊指南:更慢)比其他解决方案,依赖于n的因子,但通常发现早失败,我想为他们糟糕的情况下将aaa ....aab, n - 1 = 2 * 3 * 5 * 7…* p_n - 1 a

First of all you need to calculate prefix function

首先需要计算前缀函数

def prefix_function(s):
    n = len(s)
    pi = [0] * n
    for i in xrange(1, n):
        j = pi[i - 1]
        while(j > 0 and s[i] != s[j]):
            j = pi[j - 1]
        if (s[i] == s[j]):
            j += 1
        pi[i] = j;
    return pi

then either there's no answer or the shortest period is

要么没有答案,要么就是最短的周期。

k = len(s) - prefix_function(s[-1])

and you just have to check if k != n and n % k == 0 (if k != n and n % k == 0 then answer is s[:k], else there's no answer

你只需要检查k != n和n % k是否= 0(如果k != n和n % k = 0,那么答案是s[:k],否则就没有答案

You may check the proof here (in Russian, but online translator will probably do the trick)

你可以在这里查看证明(用俄语,但在线翻译可能会有帮助)

def riad(s):
    n = len(s)
    pi = [0] * n
    for i in xrange(1, n):
        j = pi[i - 1]
        while(j > 0 and s[i] != s[j]):
            j = pi[j - 1]
        if (s[i] == s[j]):
            j += 1
        pi[i] = j;
    k = n - pi[-1]
    return s[:k] if (n != k and n % k == 0) else None

#8


16  

This version tries only those candidate sequence lengths that are factors of the string length; and uses the * operator to build a full-length string from the candidate sequence:

这个版本只尝试那些作为字符串长度因子的候选序列长度;并使用*操作符从候选序列中构建一个全长字符串:

def get_shortest_repeat(string):
    length = len(string)
    for i in range(1, length // 2 + 1):
        if length % i:  # skip non-factors early
            continue

        candidate = string[:i]
        if string == candidate * (length // i):
            return candidate

    return None

Thanks to TigerhawkT3 for noticing that length // 2 without + 1 would fail to match the abab case.

感谢TigerhawkT3注意到长度// 2没有+ 1将无法匹配abab案例。

#9


15  

Here's a straight forward solution, without regexes.

这里有一个直截了当的解决方案,没有regex。

For substrings of s starting from zeroth index, of lengths 1 through len(s), check if that substring, substr is the repeated pattern. This check can be performed by concatenating substr with itself ratio times, such that the length of the string thus formed is equal to the length of s. Hence ratio=len(s)/len(substr).

对于从零索引开始的s的子字符串,长度为1到len(s),检查该子字符串是否为重复模式。这个检查可以通过将子str与自身的比值乘以串联来执行,这样形成的字符串的长度就等于s的长度。因此ratio=len(s)/len(substr)。

Return when first such substring is found. This would provide the smallest possible substring, if one exists.

找到第一个子字符串时返回。如果存在子字符串,这将提供最小的子字符串。

def check_repeat(s):
    for i in range(1, len(s)):
        substr = s[:i]
        ratio = len(s)/len(substr)
        if substr * ratio == s:
            print 'Repeating on "%s"' % substr
            return
    print 'Non repeating'

>>> check_repeat('254725472547')
Repeating on "2547"
>>> check_repeat('abcdeabcdeabcdeabcde')
Repeating on "abcde"

#10


9  

I started with more than eight solutions to this problem. Some were bases on regex (match, findall, split), some of string slicing and testing, and some with string methods (find, count, split). Each had benefits in code clarity, code size, speed and memory consumption. I was going to post my answer here when I noticed that execution speed was ranked as important, so I did more testing and improvement to arrive at this:

对于这个问题,我一开始就提出了八种以上的解决方案。有些基于regex (match、findall、split),有些基于字符串切片和测试,有些基于字符串方法(find、count、split)。它们在代码清晰、代码大小、速度和内存消耗方面都有好处。当我注意到执行速度被列为重要的时候,我会在这里发布我的答案,所以我做了更多的测试和改进来达到这个目的:

def repeating(s):
    size = len(s)
    incr = size % 2 + 1
    for n in xrange(1, size//2+1, incr):
        if size % n == 0:
            if s[:n] * (size//n) == s:
                return s[:n]

This answer seems similar to a few other answers here, but it has a few speed optimisations others have not used:

这个答案似乎与这里的一些其他答案相似,但它有一些其他人没有使用的速度优化:

  • xrange is a little faster in this application,
  • xrange在这个应用程序中稍微快一些,
  • if an input string is an odd length, do not check any even length substrings,
  • 如果输入字符串是奇数长度,不要检查任何偶数长度的子字符串,
  • by using s[:n] directly, we avoid creating a variable in each loop.
  • 通过直接使用s[:n],我们避免在每个循环中创建一个变量。

I would be interested to see how this performs in the standard tests with common hardware. I believe it will be well short of David Zhang's excellent algorithm in most tests, but should be quite fast otherwise.

我很想看看在普通硬件的标准测试中它是如何执行的。我相信,在大多数测试中,它将远远低于张大卫的优秀算法,但在其他测试中应该会很快。

I found this problem to be very counter-intuitive. The solutions I thought would be fast were slow. The solutions that looked slow were fast! It seems that Python's string creation with the multiply operator and string comparisons are highly optimised.

我发现这个问题非常反直觉。我认为很快的解决方案是缓慢的。看起来慢的解决方案很快!看起来Python的字符串创建与乘法运算符和字符串比较是高度优化的。

#11


2  

This function runs very quickly (tested and it's over 3 times faster than fastest solution here on strings with over 100k characters and the difference gets bigger the longer the repeating pattern is). It tries to minimise the number of comparisons needed to get the answer:

这个函数运行得非常快(经过测试,对于超过100k个字符的字符串,它比最快的解决方案快3倍多,重复模式越长,差异就越大)。它试图将得到答案所需的比较数量最小化:

def repeats(string):
    n = len(string)
    tried = set([])
    best = None
    nums = [i for i in  xrange(2, int(n**0.5) + 1) if n % i == 0]
    nums = [n/i for i in nums if n/i!=i] + list(reversed(nums)) + [1]
    for s in nums:
        if all(t%s for t in tried):
            print 'Trying repeating string of length:', s
            if string[:s]*(n/s)==string:
                best = s
            else:
                tried.add(s)
    if best:
        return string[:best]

Note that for example for string of length 8 it checks only fragment of size 4 and it does not have to test further because pattern of length 1 or 2 would result in repeating pattern of length 4:

注意,对于长度为8的字符串,它只检查大小为4的片段,不需要进一步测试,因为长度为1或2的模式会导致长度为4的重复模式:

>>> repeats('12345678')
Trying repeating string of length: 4
None

# for this one we need only 2 checks 
>>> repeats('1234567812345678')
Trying repeating string of length: 8
Trying repeating string of length: 4
'12345678'

#12


1  

In David Zhang's answer if we have some sort of circular buffer this will not work: principal_period('6210045662100456621004566210045662100456621') due to the starting 621, where I would have liked it to spit out: 00456621.

在David Zhang的回答中,如果我们有某种循环缓冲区,这将不起作用:principal_period(' 62100456621006621006621006621006610066210066100662100662100456621006666 '),因为开始的621,我希望它能吐出:0045666621。

Extending his solution we can use the following:

扩展他的解决方案,我们可以使用以下方法:

def principal_period(s):
    for j in range(int(len(s)/2)):
        idx = (s[j:]+s[j:]).find(s[j:], 1, -1)
        if idx != -1:
            # Make sure that the first substring is part of pattern
            if s[:j] == s[j:][:idx][-j:]:
                break

    return None if idx == -1 else s[j:][:idx]

principal_period('6210045662100456621004566210045662100456621')
>>> '00456621'

#13


-1  

Here is the code in python that checks for repetition of sub string in the main string given by the user.

这是python中的代码,用于检查用户给定的主字符串中的子字符串的重复。

print "Enter a string...."
#mainstring = String given by user
mainstring=raw_input(">")
if(mainstring==''):
    print "Invalid string"
    exit()
#charlist = Character list of mainstring
charlist=list(mainstring)
strarr=''
print "Length of your string :",len(mainstring)
for i in range(0,len(mainstring)):
    strarr=strarr+charlist[i]
    splitlist=mainstring.split(strarr)
    count = 0
    for j in splitlist:
        if j =='':
            count+=1
    if count == len(splitlist):
        break
if count == len(splitlist):
    if count == 2:
        print "No repeating Sub-String found in string %r"%(mainstring)

    else:
        print "Sub-String %r repeats in string %r"%(strarr,mainstring)
else :
    print "No repeating Sub-String found in string %r"%(mainstring)

Input:

输入:

0045662100456621004566210045662100456621

0045662100456621004566210045662100456621

Output :

输出:

Length of your string : 40

弦的长度:40

Sub-String '00456621' repeats in string '0045662100456621004566210045662100456621'

子串'00456621'在字符串'0045662100456621004566210045662100456621'中重复

Input :

输入:

004608294930875576036866359447

004608294930875576036866359447

Output:

输出:

Length of your string : 30

弦的长度:30。

No repeating Sub-String found in string '004608294930875576036866359447'

在字符串'004608294930875576036866359447'中没有发现重复子字符串