检查两个字符串是否是Python中彼此的排列

时间:2022-01-29 11:29:14

I'm checking if two strings a and b are permutations of each other, and I'm wondering what the ideal way to do this is in Python. From the Zen of Python, "There should be one -- and preferably only one -- obvious way to do it," but I see there are at least two ways:

我正在检查两个字符串a和b是否是彼此的排列,我想知道在Python中执行此操作的理想方法是什么。从Python的禅宗,“应该有一个 - 最好只有一个 - 显而易见的方法”,但我发现至少有两种方式:

sorted(a) == sorted(b)

and

all(a.count(char) == b.count(char) for char in a)

but the first one is slower when (for example) the first char of a is nowhere in b, and the second is slower when they are actually permutations.

但是第一个是慢的时候(例如)a的第一个字符在b中没有,而第二个字符在它们实际上是排列时更慢。

Is there any better (either in the sense of more Pythonic, or in the sense of faster on average) way to do it? Or should I just choose from these two depending on which situation I expect to be most common?

有更好的方法(无论是更多的Pythonic,还是平均更快的意义上)的方式呢?或者我应该从这两个中选择,具体取决于我期望最常见的情况?

21 个解决方案

#1


heuristically you're probably better to split them off based on string size.

启发式地,你可能最好根据字符串大小将它们分开。

Pseudocode:

returnvalue = false
if len(a) == len(b)
   if len(a) < threshold
      returnvalue = (sorted(a) == sorted(b))
   else
       returnvalue = naminsmethod(a, b)
return returnvalue

If performance is critical, and string size can be large or small then this is what I'd do.

如果性能至关重要,字符串大小可以大或小,那么这就是我要做的。

It's pretty common to split things like this based on input size or type. Algorithms have different strengths or weaknesses and it would be foolish to use one where another would be better... In this case Namin's method is O(n), but has a larger constant factor than the O(n log n) sorted method.

根据输入大小或类型分割这样的东西是很常见的。算法有不同的优点或缺点,使用另一个更好的方法是愚蠢的......在这种情况下,Namin的方法是O(n),但是具有比O(n log n)排序方法更大的常数因子。

#2


Here is a way which is O(n), asymptotically better than the two ways you suggest.

这是一种O(n)的方法,渐近地比你建议的两种方式更好。

import collections

def same_permutation(a, b):
    d = collections.defaultdict(int)
    for x in a:
        d[x] += 1
    for x in b:
        d[x] -= 1
    return not any(d.itervalues())

## same_permutation([1,2,3],[2,3,1])
#. True

## same_permutation([1,2,3],[2,3,1,1])
#. False

#3


"but the first one is slower when (for example) the first char of a is nowhere in b".

“但是,当(例如)a的第一个字符在b中无处时,第一个更慢。”

This kind of degenerate-case performance analysis is not a good idea. It's a rat-hole of lost time thinking up all kinds of obscure special cases.

这种退化案例的性能分析并不是一个好主意。想到各种晦涩的特殊情况,这是一个失去时间的老鼠洞。

Only do the O-style "overall" analysis.

只进行O型“整体”分析。

Overall, the sorts are O( n log( n ) ).

总的来说,排序是O(n log(n))。

The a.count(char) for char in a solution is O( n 2 ). Each count pass is a full examination of the string.

解决方案中char的a.count(char)是O(n 2)。每个计数通过都是对字符串的全面检查。

If some obscure special case happens to be faster -- or slower, that's possibly interesting. But it only matters when you know the frequency of your obscure special cases. When analyzing sort algorithms, it's important to note that a fair number of sorts involve data that's already in the proper order (either by luck or by a clever design), so sort performance on pre-sorted data matters.

如果一些不起眼的特殊情况碰巧更快 - 或更慢,那可能很有趣。但只有当你知道你不明显的特殊情况的频率时才重要。在分析排序算法时,重要的是要注意相当多的排序涉及已经按正确顺序排列的数据(通过运气或巧妙的设计),因此对预排序数据的排序性能很重要。

In your obscure special case ("the first char of a is nowhere in b") is this frequent enough to matter? If it's just a special case you thought of, set it aside. If it's a fact about your data, then consider it.

在你不起眼的特殊情况下(“a的第一个字母在b中无处可去”)这是否经常发生?如果它只是你想到的特殊情况,请把它放在一边。如果这是关于您的数据的事实,那么请考虑它。

#4


I think the first one is the "obvious" way. It is shorter, clearer, and likely to be faster in many cases because Python's built-in sort is highly optimized.

我认为第一个是“明显的”方式。它更短,更清晰,并且在许多情况下可能更快,因为Python的内置排序是高度优化的。

#5


Your second example won't actually work:

你的第二个例子实际上不会起作用:

all(a.count(char) == b.count(char) for char in a)

will only work if b does not contain extra characters not in a. It also does duplicate work if the characters in string a repeat.

仅当b不包含不在a中的额外字符时才会起作用。如果字符串中的字符重复,它也会重复工作。

If you want to know whether two strings are permutations of the same unique characters, just do:

如果您想知道两个字符串是否是相同唯一字符的排列,只需执行以下操作:

set(a) == set(b)

To correct your second example:

要纠正你的第二个例子:

all(str1.count(char) == str2.count(char) for char in set(a) | set(b))

set() objects overload the bitwise OR operator so that it will evaluate to the union of both sets. This will make sure that you will loop over all the characters of both strings once for each character only.

set()对象重载按位OR运算符,以便它将计算为两个集合的并集。这将确保您将仅为每个字符循环两个字符串的所有字符。

That said, the sorted() method is much simpler and more intuitive, and would be what I would use.

也就是说,sorted()方法更简单,更直观,而且我会使用它。

#6


Here are some timed executions on very small strings, using two different methods:
1. sorting
2. counting (specifically the original method by @namin).

下面是一些非常小的字符串的定时执行,使用两种不同的方法:1。排序2.计数(特别是@namin的原始方法)。

a, b, c = 'confused', 'unfocused', 'foncused'

sort_method = lambda x,y: sorted(x) == sorted(y)

def count_method(a, b):
    d = {}
    for x in a:
        d[x] = d.get(x, 0) + 1
    for x in b:
        d[x] = d.get(x, 0) - 1
    for v in d.itervalues():
        if v != 0:
            return False
    return True

Average run times of the 2 methods over 100,000 loops are:

超过100,000个循环的2种方法的平均运行时间为:

non-match (string a and b)

不匹配(字符串a和b)

$ python -m timeit -s 'import temp' 'temp.sort_method(temp.a, temp.b)'
100000 loops, best of 3: 9.72 usec per loop
$ python -m timeit -s 'import temp' 'temp.count_method(temp.a, temp.b)'
10000 loops, best of 3: 28.1 usec per loop

match (string a and c)

匹配(字符串a和c)

$ python -m timeit -s 'import temp' 'temp.sort_method(temp.a, temp.c)'
100000 loops, best of 3: 9.47 usec per loop
$ python -m timeit -s 'import temp' 'temp.count_method(temp.a, temp.c)'
100000 loops, best of 3: 24.6 usec per loop

Keep in mind that the strings used are very small. The time complexity of the methods are different, so you'll see different results with very large strings. Choose according to your data, you may even use a combination of the two.

请记住,使用的字符串非常小。方法的时间复杂度不同,因此您会看到非常大的字符串的不同结果。根据您的数据选择,您甚至可以使用两者的组合。

#7


I did a pretty thorough comparison in Java with all words in a book I had. The counting method beats the sorting method in every way. The results:

我在Java中用我所拥有的书中的所有单词进行了非常彻底的比较。计数方法以各种方式击败排序方法。结果:

Testing against 9227 words.

Permutation testing by sorting ... done.        18.582 s
Permutation testing by counting ... done.       14.949 s

If anyone wants the algorithm and test data set, comment away.

如果有人想要算法和测试数据集,请注释掉。

#8


First, for solving such problems, e.g. whether String 1 and String 2 are exactly the same or not, easily, you can use an "if" since it is O(1). Second, it is important to consider that whether they are only numerical values or they can be also words in the string. If the latter one is true (words and numerical values are in the string at the same time), your first solution will not work. You can enhance it by using "ord()" function to make it ASCII numerical value. However, in the end, you are using sort; therefore, in the worst case your time complexity will be O(NlogN). This time complexity is not bad. But, you can do better. You can make it O(N). My "suggestion" is using Array(list) and set at the same time. Note that finding a value in Array needs iteration so it's time complexity is O(N), but searching a value in set (which I guess it is implemented with HashTable in Python, I'm not sure) has O(1) time complexity:

首先,为了解决这些问题,例如无论字符串1和字符串2是否完全相同,您都可以使用“if”,因为它是O(1)。其次,重要的是要考虑它们是否只是数值,还是字符串中的单词。如果后一个为真(单词和数值同时在字符串中),则第一个解决方案将不起作用。您可以使用“ord()”函数来增强它,使其成为ASCII数值。但是,最后,你正在使用sort;因此,在最坏的情况下,您的时间复杂度将为O(NlogN)。这个时间复杂性也不错。但是,你可以做得更好。你可以把它做成O(N)。我的“建议”是使用Array(列表)并同时设置。请注意,在Array中查找值需要迭代,因此它的时间复杂度为O(N),但是在集合中搜索值(我猜它是用Python中的HashTable实现的,我不确定)具有O(1)时间复杂度:

def Permutation2(Str1, Str2):

ArrStr1 = list(Str1) #convert Str1 to array
SetStr2 = set(Str2) #convert Str2 to set

ArrExtra = []

if len(Str1) != len(Str2): #check their length
    return False

elif Str1 == Str2: #check their values
    return True

for x in xrange(len(ArrStr1)):
    ArrExtra.append(ArrStr1[x])

for x in xrange(len(ArrExtra)): #of course len(ArrExtra) == len(ArrStr1) ==len(ArrStr2)
    if ArrExtra[x] in SetStr2: #checking in set is O(1)
        continue
    else:
        return False

return True

#9


Go with the first one - it's much more straightforward and easier to understand. If you're actually dealing with incredibly large strings and performance is a real issue, then don't use Python, use something like C.

与第一个一起 - 它更直接,更容易理解。如果你实际上处理的是非常大的字符串并且性能是一个真正的问题,那么不要使用Python,使用像C这样的东西。

As far as the Zen of Python is concerned, that there should only be one obvious way to do things refers to small, simple things. Obviously for any sufficiently complicated task, there will always be zillions of small variations on ways to do it.

就Python的禅而言,只有一种明显的做事方式是指小而简单的事情。显然,对于任何足够复杂的任务,总会有数以万计的小变化。

#10


Sorry that my code is not in Python, I have never used it, but I am sure this can be easily translated into python. I believe this is faster than all the other examples already posted. It is also O(n), but stops as soon as possible:

很抱歉,我的代码不是Python,我从未使用它,但我确信这可以很容易地转换为python。我相信这比已发布的所有其他示例更快。它也是O(n),但是尽快停止:

public boolean isPermutation(String a, String b) {
    if (a.length() != b.length()) {
        return false;
    }

    int[] charCount = new int[256];
    for (int i = 0; i < a.length(); ++i) {
        ++charCount[a.charAt(i)];
    }

    for (int i = 0; i < b.length(); ++i) {
        if (--charCount[b.charAt(i)] < 0) {
            return false;
        }
    }
    return true;
}

First I don't use a dictionary but an array of size 256 for all the characters. Accessing the index should be much faster. Then when the second string is iterated, I immediately return false when the count gets below 0. When the second loop has finished, you can be sure that the strings are a permutation, because the strings have equal length and no character was used more often in b compared to a.

首先,我不使用字典,而是使用大小为256的数组来表示所有字符。访问索引应该快得多。然后当第二个字符串被迭代时,当计数值低于0时我立即返回false。当第二个循环结束时,你可以确定字符串是一个排列,因为字符串长度相等而且没有更频繁地使用字符在b中与a相比。

#11


Here's martinus code in python. It only works for ascii strings:

这是python中的martinus代码。它仅适用于ascii字符串:

def is_permutation(a, b):
    if len(a) != len(b):
        return False

    char_count = [0] * 256
    for c in a:
        char_count[ord(c)] += 1

    for c in b:
        char_count[ord(c)] -= 1
        if char_count[ord(c)] < 0:
            return False

    return True

#12


In Python 3.1/2.7 you can just use collections.Counter(a) == collections.Counter(b).

在Python 3.1 / 2.7中,您可以使用collections.Counter(a)== collections.Counter(b)。

But sorted(a) == sorted(b) is still the most obvious IMHO. You are talking about permutations - changing order - so sorting is the obvious operation to erase that difference.

但排序(a)==排序(b)仍然是最明显的恕我直言。你在谈论排列 - 改变顺序 - 所以排序是消除这种差异的明显操作。

#13


This is a PHP function I wrote about a week ago which checks if two words are anagrams. How would this compare (if implemented the same in python) to the other methods suggested? Comments?

这是我一周前写的PHP函数,它检查两个单词是否是字谜。如何比较建议的其他方法(如果在python中实现相同)?评论?

public function is_anagram($word1, $word2) {
    $letters1 = str_split($word1);
    $letters2 = str_split($word2);
    if (count($letters1) == count($letters2)) {
        foreach ($letters1 as $letter) {
            $index = array_search($letter, $letters2);
            if ($index !== false) {
                unset($letters2[$index]);
            }
            else { return false; }
        }
        return true;
    }
    return false;        
}

Here's a literal translation to Python of the PHP version (by JFS):

这是PHP版本的Python的字面翻译(由JFS提供):

def is_anagram(word1, word2):
    letters2 = list(word2)
    if len(word1) == len(word2):
       for letter in word1:
           try:
               del letters2[letters2.index(letter)]
           except ValueError:
               return False               
       return True
    return False

Comments:

    1. The algorithm is O(N**2). Compare it to @namin's version (it is O(N)).
    2. The multiple returns in the function look horrible.

#14


This version is faster than any examples presented so far except it is 20% slower than sorted(x) == sorted(y) for short strings. It depends on use cases but generally 20% performance gain is insufficient to justify a complication of the code by using different version for short and long strings (as in @patros's answer).

此版本比目前为止提供的任何示例都要快,除了它比排序(x)==排序(y)短字符串慢20%。这取决于用例,但通常20%的性能增益不足以证明通过对短字符串和长字符串使用不同版本来证明代码的复杂性(如@ patros的答案)。

It doesn't use len so it accepts any iterable therefore it works even for data that do not fit in memory e.g., given two big text files with many repeated lines it answers whether the files have the same lines (lines can be in any order).

它不使用len,因此它接受任何迭代,因此它甚至适用于不适合内存的数据,例如,给定两个带有许多重复行的大文本文件,它可以回答文件是否具有相同的行(行可以按任何顺序排列) )。

def isanagram(iterable1, iterable2):
    d = {}
    get = d.get
    for c in iterable1:
        d[c] = get(c, 0) + 1
    try:
        for c in iterable2:
            d[c] -= 1
        return not any(d.itervalues())
    except KeyError:
        return False

It is unclear why this version is faster then defaultdict (@namin's) one for large iterable1 (tested on 25MB thesaurus).

目前还不清楚为什么这个版本比defaultdict(@namin的)快一个大的iterable1(在25MB词库上测试)。

If we replace get in the loop by try: ... except KeyError then it performs 2 times slower for short strings i.e. when there are few duplicates.

如果我们通过try替换get循环:...除了KeyError之外,它对短字符串执行的速度要慢2倍,即重复次数很少时。

#15


This is derived from @patros' answer.

这来自@patros的回答。

from collections import Counter

def is_anagram(a, b, threshold=1000000):
    """Returns true if one sequence is a permutation of the other.

    Ignores whitespace and character case.
    Compares sorted sequences if the length is below the threshold,
    otherwise compares dictionaries that contain the frequency of the
    elements.
    """
    a, b = a.strip().lower(), b.strip().lower()
    length_a, length_b = len(a), len(b)
    if length_a != length_b:
        return False
    if length_a < threshold:
        return sorted(a) == sorted(b)
    return Counter(a) == Counter(b)  # Or use @namin's method if you don't want to create two dictionaries and don't mind the extra typing.

#16


In Swift (or another languages implementation), you could look at the encoded values ( in this case Unicode) and see if they match.

在Swift(或其他语言实现)中,您可以查看编码值(在本例中为Unicode)并查看它们是否匹配。

Something like:

let string1EncodedValues = "Hello".unicodeScalars.map() {
//each encoded value
$0
//Now add the values
}.reduce(0){ total, value in
    total + value.value
}

let string2EncodedValues = "oellH".unicodeScalars.map() {
    $0
    }.reduce(0) { total, value in
    total + value.value
}

let equalStrings = string1EncodedValues == string2EncodedValues ? true : false

You will need to handle spaces and cases as needed.

您需要根据需要处理空间和案例。

#17


def matchPermutation(s1, s2):
  a = []
  b = []

  if len(s1) != len(s2):
    print 'length should be the same'
  return

  for i in range(len(s1)):
    a.append(s1[i])

  for i in range(len(s2)):
    b.append(s2[i])

  if set(a) == set(b):
    print 'Permutation of each other'
  else:
    print 'Not a permutation of each other'
  return

#matchPermutaion('rav', 'var') #returns True
matchPermutaion('rav', 'abc') #returns False

#18


This is an O(n) solution in Python using hashing with dictionaries. Notice that I don't use default dictionaries because the code can stop this way if we determine the two strings are not permutations after checking the second letter for instance.

这是Python中使用散列哈希的O(n)解决方案。请注意,我不使用默认字典,因为如果我们在检查第二个字母后确定两个字符串不是排列,则代码可以以此方式停止。

def if_two_words_are_permutations(s1, s2):
    if len(s1) != len(s2):
        return False
    dic = {}
for ch in s1:
    if ch in dic.keys():
        dic[ch] += 1
    else:
        dic[ch] = 1
for ch in s2:
    if not ch in dic.keys():
        return False
    elif dic[ch] == 0:
        return False
    else:
        dic[ch] -= 1
return True

#19


Checking if two strings are permutations of each other in Python

# First method
def permutation(s1,s2):
 if len(s1) != len(s2):return False;
 return ' '.join(sorted(s1)) == ' '.join(sorted(s2))


# second method
def permutation1(s1,s2):
 if len(s1) != len(s2):return False;
 array = [0]*128;
 for c in s1:
 array[ord(c)] +=1
 for c in s2:
   array[ord(c)] -=1
   if (array[ord(c)]) < 0:
     return False
 return True

#20


How about something like this. Pretty straight-forward and readable. This is for strings since the as per the OP.

这样的事情怎么样?非常简单易读。这是因为根据OP的字符串。

Given that the complexity of sorted() is O(n log n).

鉴于sorted()的复杂性是O(n log n)。

def checkPermutation(a,b):
    # input: strings a and b
    # return: boolean true if a is Permutation of b

    if len(a) != len(b):
        return False
    else:
        s_a = ''.join(sorted(a))
        s_b = ''.join(sorted(b))
        if s_a == s_b:
            return True
        else:
            return False

# test inputs
a = 'sRF7w0qbGp4fdgEyNlscUFyouETaPHAiQ2WIxzohiafEGJLw03N8ALvqMw6reLN1kHRjDeDausQBEuIWkIBfqUtsaZcPGoqAIkLlugTxjxLhkRvq5d6i55l4oBH1QoaMXHIZC5nA0K5KPBD9uIwa789sP0ZKV4X6'
b = 'Vq3EeiLGfsAOH2PW6skMN8mEmUAtUKRDIY1kow9t1vIEhe81318wSMICGwf7Rv2qrLrpbeh8bh4hlRLZXDSMyZJYWfejLND4u9EhnNI51DXcQKrceKl9arWqOl7sWIw3EBkeu7Fw4TmyfYwPqCf6oUR0UIdsAVNwbyyiajtQHKh2EKLM1KlY6NdvQTTA7JKn6bLInhFvwZ4yKKbzkgGhF3Oogtnmzl29fW6Q2p0GPuFoueZ74aqlveGTYc0zcXUJkMzltzohoRdMUKP4r5XhbsGBED8ReDbL3ouPhsFchERvvNuaIWLUCY4gl8OW06SMuvceZrCg7EkSFxxprYurHz7VQ2muxzQHj7RG2k3khxbz2ZAhWIlBBtPtg4oXIQ7cbcwgmBXaTXSBgBe3Y8ywYBjinjEjRJjVAiZkWoPrt8JtZv249XiN0MTVYj0ZW6zmcvjZtRn32U3KLMOdjLnRFUP2I3HJtp99uVlM9ghIpae0EfC0v2g78LkZE1YAKsuqCiiy7DVOhyAZUbOrRwXOEDHxUyXwCmo1zfVkPVhwysx8HhH7Iy0yHAMr0Tb97BqcpmmyBsrSgsV1aT3sjY0ctDLibrxbRXBAOexncqB4BBKWJoWkQwUZkFfPXemZkWYmE72w5CFlI6kuwBQp27dCDZ39kRG7Txs1MbsUnRnNHBy1hSOZvTQRYZPX0VmU8SVGUqzwm1ECBHZakQK4RUquk3txKCqbDfbrNmnsEcjFaiMFWkY3Esg6p3Mm41KWysTpzN6287iXjgGSBw6CBv0hH635WiZ0u47IpUD5mY9rkraDDl5sDgd3f586EWJdKAaou3jR7eYU7YuJT3RQVRI0MuS0ec0xYID3WTUI0ckImz2ck7lrtfnkewzRMZSE2ANBkEmg2XAmwrCv0gy4ExW5DayGRXoqUv06ZLGCcBEiaF0fRMlinhElZTVrGPqqhT03WSq4P97JbXA90zUxiHCnsPjuRTthYl7ZaiVZwNt3RtYT4Ff1VQ5KXRwRzdzkRMsubBX7YEhhtl0ZGVlYiP4N4t00Jr7fB4687eabUqK6jcUVpXEpTvKDbj0JLcLYsneM9fsievUz193f6aMQ5o5fm4Ilx3TUZiX4AUsoyd8CD2SK3NkiLuR255BDIA0Zbgnj2XLyQPiJ1T4fjStpjxKOTzsQsZxpThY9Fvjvoxcs3HAiXjLtZ0TSOX6n4ZLjV3TdJMc4PonwqIb3lAndlTMnuzEPof2dXnpexoVm5c37XQ7fBkoMBJ4ydnW25XKYJbkrueRDSwtJGHjY37dob4jPg0axM5uWbqGocXQ4DyiVm5GhvuYX32RQaOtXXXw8cWK6JcSUnlP1gGLMNZEGeDXOuGWiy4AJ7SH93ZQ4iPgoxdfCuW0qbsLKT2HopcY9dtBIRzr91wnES9lDL49tpuW77LSt5dGA0YLSeWAaZt9bDrduE0gDZQ2yX4SDvAOn4PMcbFRfTqzdZXONmO7ruBHHb1tVFlBFNc4xkoetDO2s7mpiVG6YR4EYMFIG1hBPh7Evhttb34AQzqImSQm1gyL3O7n3p98Kqb9qqIPbN1kuhtW5mIbIioWW2n7MHY7E5mt0'

print(checkPermutation(a, b)) #optional

#21


def permute(str1,str2):
    if sorted(str1) == sorted(str2):
        return True
    else:
        return False

str1="hello"
str2='olehl'
a=permute(str1,str2)
print(a

#1


heuristically you're probably better to split them off based on string size.

启发式地,你可能最好根据字符串大小将它们分开。

Pseudocode:

returnvalue = false
if len(a) == len(b)
   if len(a) < threshold
      returnvalue = (sorted(a) == sorted(b))
   else
       returnvalue = naminsmethod(a, b)
return returnvalue

If performance is critical, and string size can be large or small then this is what I'd do.

如果性能至关重要,字符串大小可以大或小,那么这就是我要做的。

It's pretty common to split things like this based on input size or type. Algorithms have different strengths or weaknesses and it would be foolish to use one where another would be better... In this case Namin's method is O(n), but has a larger constant factor than the O(n log n) sorted method.

根据输入大小或类型分割这样的东西是很常见的。算法有不同的优点或缺点,使用另一个更好的方法是愚蠢的......在这种情况下,Namin的方法是O(n),但是具有比O(n log n)排序方法更大的常数因子。

#2


Here is a way which is O(n), asymptotically better than the two ways you suggest.

这是一种O(n)的方法,渐近地比你建议的两种方式更好。

import collections

def same_permutation(a, b):
    d = collections.defaultdict(int)
    for x in a:
        d[x] += 1
    for x in b:
        d[x] -= 1
    return not any(d.itervalues())

## same_permutation([1,2,3],[2,3,1])
#. True

## same_permutation([1,2,3],[2,3,1,1])
#. False

#3


"but the first one is slower when (for example) the first char of a is nowhere in b".

“但是,当(例如)a的第一个字符在b中无处时,第一个更慢。”

This kind of degenerate-case performance analysis is not a good idea. It's a rat-hole of lost time thinking up all kinds of obscure special cases.

这种退化案例的性能分析并不是一个好主意。想到各种晦涩的特殊情况,这是一个失去时间的老鼠洞。

Only do the O-style "overall" analysis.

只进行O型“整体”分析。

Overall, the sorts are O( n log( n ) ).

总的来说,排序是O(n log(n))。

The a.count(char) for char in a solution is O( n 2 ). Each count pass is a full examination of the string.

解决方案中char的a.count(char)是O(n 2)。每个计数通过都是对字符串的全面检查。

If some obscure special case happens to be faster -- or slower, that's possibly interesting. But it only matters when you know the frequency of your obscure special cases. When analyzing sort algorithms, it's important to note that a fair number of sorts involve data that's already in the proper order (either by luck or by a clever design), so sort performance on pre-sorted data matters.

如果一些不起眼的特殊情况碰巧更快 - 或更慢,那可能很有趣。但只有当你知道你不明显的特殊情况的频率时才重要。在分析排序算法时,重要的是要注意相当多的排序涉及已经按正确顺序排列的数据(通过运气或巧妙的设计),因此对预排序数据的排序性能很重要。

In your obscure special case ("the first char of a is nowhere in b") is this frequent enough to matter? If it's just a special case you thought of, set it aside. If it's a fact about your data, then consider it.

在你不起眼的特殊情况下(“a的第一个字母在b中无处可去”)这是否经常发生?如果它只是你想到的特殊情况,请把它放在一边。如果这是关于您的数据的事实,那么请考虑它。

#4


I think the first one is the "obvious" way. It is shorter, clearer, and likely to be faster in many cases because Python's built-in sort is highly optimized.

我认为第一个是“明显的”方式。它更短,更清晰,并且在许多情况下可能更快,因为Python的内置排序是高度优化的。

#5


Your second example won't actually work:

你的第二个例子实际上不会起作用:

all(a.count(char) == b.count(char) for char in a)

will only work if b does not contain extra characters not in a. It also does duplicate work if the characters in string a repeat.

仅当b不包含不在a中的额外字符时才会起作用。如果字符串中的字符重复,它也会重复工作。

If you want to know whether two strings are permutations of the same unique characters, just do:

如果您想知道两个字符串是否是相同唯一字符的排列,只需执行以下操作:

set(a) == set(b)

To correct your second example:

要纠正你的第二个例子:

all(str1.count(char) == str2.count(char) for char in set(a) | set(b))

set() objects overload the bitwise OR operator so that it will evaluate to the union of both sets. This will make sure that you will loop over all the characters of both strings once for each character only.

set()对象重载按位OR运算符,以便它将计算为两个集合的并集。这将确保您将仅为每个字符循环两个字符串的所有字符。

That said, the sorted() method is much simpler and more intuitive, and would be what I would use.

也就是说,sorted()方法更简单,更直观,而且我会使用它。

#6


Here are some timed executions on very small strings, using two different methods:
1. sorting
2. counting (specifically the original method by @namin).

下面是一些非常小的字符串的定时执行,使用两种不同的方法:1。排序2.计数(特别是@namin的原始方法)。

a, b, c = 'confused', 'unfocused', 'foncused'

sort_method = lambda x,y: sorted(x) == sorted(y)

def count_method(a, b):
    d = {}
    for x in a:
        d[x] = d.get(x, 0) + 1
    for x in b:
        d[x] = d.get(x, 0) - 1
    for v in d.itervalues():
        if v != 0:
            return False
    return True

Average run times of the 2 methods over 100,000 loops are:

超过100,000个循环的2种方法的平均运行时间为:

non-match (string a and b)

不匹配(字符串a和b)

$ python -m timeit -s 'import temp' 'temp.sort_method(temp.a, temp.b)'
100000 loops, best of 3: 9.72 usec per loop
$ python -m timeit -s 'import temp' 'temp.count_method(temp.a, temp.b)'
10000 loops, best of 3: 28.1 usec per loop

match (string a and c)

匹配(字符串a和c)

$ python -m timeit -s 'import temp' 'temp.sort_method(temp.a, temp.c)'
100000 loops, best of 3: 9.47 usec per loop
$ python -m timeit -s 'import temp' 'temp.count_method(temp.a, temp.c)'
100000 loops, best of 3: 24.6 usec per loop

Keep in mind that the strings used are very small. The time complexity of the methods are different, so you'll see different results with very large strings. Choose according to your data, you may even use a combination of the two.

请记住,使用的字符串非常小。方法的时间复杂度不同,因此您会看到非常大的字符串的不同结果。根据您的数据选择,您甚至可以使用两者的组合。

#7


I did a pretty thorough comparison in Java with all words in a book I had. The counting method beats the sorting method in every way. The results:

我在Java中用我所拥有的书中的所有单词进行了非常彻底的比较。计数方法以各种方式击败排序方法。结果:

Testing against 9227 words.

Permutation testing by sorting ... done.        18.582 s
Permutation testing by counting ... done.       14.949 s

If anyone wants the algorithm and test data set, comment away.

如果有人想要算法和测试数据集,请注释掉。

#8


First, for solving such problems, e.g. whether String 1 and String 2 are exactly the same or not, easily, you can use an "if" since it is O(1). Second, it is important to consider that whether they are only numerical values or they can be also words in the string. If the latter one is true (words and numerical values are in the string at the same time), your first solution will not work. You can enhance it by using "ord()" function to make it ASCII numerical value. However, in the end, you are using sort; therefore, in the worst case your time complexity will be O(NlogN). This time complexity is not bad. But, you can do better. You can make it O(N). My "suggestion" is using Array(list) and set at the same time. Note that finding a value in Array needs iteration so it's time complexity is O(N), but searching a value in set (which I guess it is implemented with HashTable in Python, I'm not sure) has O(1) time complexity:

首先,为了解决这些问题,例如无论字符串1和字符串2是否完全相同,您都可以使用“if”,因为它是O(1)。其次,重要的是要考虑它们是否只是数值,还是字符串中的单词。如果后一个为真(单词和数值同时在字符串中),则第一个解决方案将不起作用。您可以使用“ord()”函数来增强它,使其成为ASCII数值。但是,最后,你正在使用sort;因此,在最坏的情况下,您的时间复杂度将为O(NlogN)。这个时间复杂性也不错。但是,你可以做得更好。你可以把它做成O(N)。我的“建议”是使用Array(列表)并同时设置。请注意,在Array中查找值需要迭代,因此它的时间复杂度为O(N),但是在集合中搜索值(我猜它是用Python中的HashTable实现的,我不确定)具有O(1)时间复杂度:

def Permutation2(Str1, Str2):

ArrStr1 = list(Str1) #convert Str1 to array
SetStr2 = set(Str2) #convert Str2 to set

ArrExtra = []

if len(Str1) != len(Str2): #check their length
    return False

elif Str1 == Str2: #check their values
    return True

for x in xrange(len(ArrStr1)):
    ArrExtra.append(ArrStr1[x])

for x in xrange(len(ArrExtra)): #of course len(ArrExtra) == len(ArrStr1) ==len(ArrStr2)
    if ArrExtra[x] in SetStr2: #checking in set is O(1)
        continue
    else:
        return False

return True

#9


Go with the first one - it's much more straightforward and easier to understand. If you're actually dealing with incredibly large strings and performance is a real issue, then don't use Python, use something like C.

与第一个一起 - 它更直接,更容易理解。如果你实际上处理的是非常大的字符串并且性能是一个真正的问题,那么不要使用Python,使用像C这样的东西。

As far as the Zen of Python is concerned, that there should only be one obvious way to do things refers to small, simple things. Obviously for any sufficiently complicated task, there will always be zillions of small variations on ways to do it.

就Python的禅而言,只有一种明显的做事方式是指小而简单的事情。显然,对于任何足够复杂的任务,总会有数以万计的小变化。

#10


Sorry that my code is not in Python, I have never used it, but I am sure this can be easily translated into python. I believe this is faster than all the other examples already posted. It is also O(n), but stops as soon as possible:

很抱歉,我的代码不是Python,我从未使用它,但我确信这可以很容易地转换为python。我相信这比已发布的所有其他示例更快。它也是O(n),但是尽快停止:

public boolean isPermutation(String a, String b) {
    if (a.length() != b.length()) {
        return false;
    }

    int[] charCount = new int[256];
    for (int i = 0; i < a.length(); ++i) {
        ++charCount[a.charAt(i)];
    }

    for (int i = 0; i < b.length(); ++i) {
        if (--charCount[b.charAt(i)] < 0) {
            return false;
        }
    }
    return true;
}

First I don't use a dictionary but an array of size 256 for all the characters. Accessing the index should be much faster. Then when the second string is iterated, I immediately return false when the count gets below 0. When the second loop has finished, you can be sure that the strings are a permutation, because the strings have equal length and no character was used more often in b compared to a.

首先,我不使用字典,而是使用大小为256的数组来表示所有字符。访问索引应该快得多。然后当第二个字符串被迭代时,当计数值低于0时我立即返回false。当第二个循环结束时,你可以确定字符串是一个排列,因为字符串长度相等而且没有更频繁地使用字符在b中与a相比。

#11


Here's martinus code in python. It only works for ascii strings:

这是python中的martinus代码。它仅适用于ascii字符串:

def is_permutation(a, b):
    if len(a) != len(b):
        return False

    char_count = [0] * 256
    for c in a:
        char_count[ord(c)] += 1

    for c in b:
        char_count[ord(c)] -= 1
        if char_count[ord(c)] < 0:
            return False

    return True

#12


In Python 3.1/2.7 you can just use collections.Counter(a) == collections.Counter(b).

在Python 3.1 / 2.7中,您可以使用collections.Counter(a)== collections.Counter(b)。

But sorted(a) == sorted(b) is still the most obvious IMHO. You are talking about permutations - changing order - so sorting is the obvious operation to erase that difference.

但排序(a)==排序(b)仍然是最明显的恕我直言。你在谈论排列 - 改变顺序 - 所以排序是消除这种差异的明显操作。

#13


This is a PHP function I wrote about a week ago which checks if two words are anagrams. How would this compare (if implemented the same in python) to the other methods suggested? Comments?

这是我一周前写的PHP函数,它检查两个单词是否是字谜。如何比较建议的其他方法(如果在python中实现相同)?评论?

public function is_anagram($word1, $word2) {
    $letters1 = str_split($word1);
    $letters2 = str_split($word2);
    if (count($letters1) == count($letters2)) {
        foreach ($letters1 as $letter) {
            $index = array_search($letter, $letters2);
            if ($index !== false) {
                unset($letters2[$index]);
            }
            else { return false; }
        }
        return true;
    }
    return false;        
}

Here's a literal translation to Python of the PHP version (by JFS):

这是PHP版本的Python的字面翻译(由JFS提供):

def is_anagram(word1, word2):
    letters2 = list(word2)
    if len(word1) == len(word2):
       for letter in word1:
           try:
               del letters2[letters2.index(letter)]
           except ValueError:
               return False               
       return True
    return False

Comments:

    1. The algorithm is O(N**2). Compare it to @namin's version (it is O(N)).
    2. The multiple returns in the function look horrible.

#14


This version is faster than any examples presented so far except it is 20% slower than sorted(x) == sorted(y) for short strings. It depends on use cases but generally 20% performance gain is insufficient to justify a complication of the code by using different version for short and long strings (as in @patros's answer).

此版本比目前为止提供的任何示例都要快,除了它比排序(x)==排序(y)短字符串慢20%。这取决于用例,但通常20%的性能增益不足以证明通过对短字符串和长字符串使用不同版本来证明代码的复杂性(如@ patros的答案)。

It doesn't use len so it accepts any iterable therefore it works even for data that do not fit in memory e.g., given two big text files with many repeated lines it answers whether the files have the same lines (lines can be in any order).

它不使用len,因此它接受任何迭代,因此它甚至适用于不适合内存的数据,例如,给定两个带有许多重复行的大文本文件,它可以回答文件是否具有相同的行(行可以按任何顺序排列) )。

def isanagram(iterable1, iterable2):
    d = {}
    get = d.get
    for c in iterable1:
        d[c] = get(c, 0) + 1
    try:
        for c in iterable2:
            d[c] -= 1
        return not any(d.itervalues())
    except KeyError:
        return False

It is unclear why this version is faster then defaultdict (@namin's) one for large iterable1 (tested on 25MB thesaurus).

目前还不清楚为什么这个版本比defaultdict(@namin的)快一个大的iterable1(在25MB词库上测试)。

If we replace get in the loop by try: ... except KeyError then it performs 2 times slower for short strings i.e. when there are few duplicates.

如果我们通过try替换get循环:...除了KeyError之外,它对短字符串执行的速度要慢2倍,即重复次数很少时。

#15


This is derived from @patros' answer.

这来自@patros的回答。

from collections import Counter

def is_anagram(a, b, threshold=1000000):
    """Returns true if one sequence is a permutation of the other.

    Ignores whitespace and character case.
    Compares sorted sequences if the length is below the threshold,
    otherwise compares dictionaries that contain the frequency of the
    elements.
    """
    a, b = a.strip().lower(), b.strip().lower()
    length_a, length_b = len(a), len(b)
    if length_a != length_b:
        return False
    if length_a < threshold:
        return sorted(a) == sorted(b)
    return Counter(a) == Counter(b)  # Or use @namin's method if you don't want to create two dictionaries and don't mind the extra typing.

#16


In Swift (or another languages implementation), you could look at the encoded values ( in this case Unicode) and see if they match.

在Swift(或其他语言实现)中,您可以查看编码值(在本例中为Unicode)并查看它们是否匹配。

Something like:

let string1EncodedValues = "Hello".unicodeScalars.map() {
//each encoded value
$0
//Now add the values
}.reduce(0){ total, value in
    total + value.value
}

let string2EncodedValues = "oellH".unicodeScalars.map() {
    $0
    }.reduce(0) { total, value in
    total + value.value
}

let equalStrings = string1EncodedValues == string2EncodedValues ? true : false

You will need to handle spaces and cases as needed.

您需要根据需要处理空间和案例。

#17


def matchPermutation(s1, s2):
  a = []
  b = []

  if len(s1) != len(s2):
    print 'length should be the same'
  return

  for i in range(len(s1)):
    a.append(s1[i])

  for i in range(len(s2)):
    b.append(s2[i])

  if set(a) == set(b):
    print 'Permutation of each other'
  else:
    print 'Not a permutation of each other'
  return

#matchPermutaion('rav', 'var') #returns True
matchPermutaion('rav', 'abc') #returns False

#18


This is an O(n) solution in Python using hashing with dictionaries. Notice that I don't use default dictionaries because the code can stop this way if we determine the two strings are not permutations after checking the second letter for instance.

这是Python中使用散列哈希的O(n)解决方案。请注意,我不使用默认字典,因为如果我们在检查第二个字母后确定两个字符串不是排列,则代码可以以此方式停止。

def if_two_words_are_permutations(s1, s2):
    if len(s1) != len(s2):
        return False
    dic = {}
for ch in s1:
    if ch in dic.keys():
        dic[ch] += 1
    else:
        dic[ch] = 1
for ch in s2:
    if not ch in dic.keys():
        return False
    elif dic[ch] == 0:
        return False
    else:
        dic[ch] -= 1
return True

#19


Checking if two strings are permutations of each other in Python

# First method
def permutation(s1,s2):
 if len(s1) != len(s2):return False;
 return ' '.join(sorted(s1)) == ' '.join(sorted(s2))


# second method
def permutation1(s1,s2):
 if len(s1) != len(s2):return False;
 array = [0]*128;
 for c in s1:
 array[ord(c)] +=1
 for c in s2:
   array[ord(c)] -=1
   if (array[ord(c)]) < 0:
     return False
 return True

#20


How about something like this. Pretty straight-forward and readable. This is for strings since the as per the OP.

这样的事情怎么样?非常简单易读。这是因为根据OP的字符串。

Given that the complexity of sorted() is O(n log n).

鉴于sorted()的复杂性是O(n log n)。

def checkPermutation(a,b):
    # input: strings a and b
    # return: boolean true if a is Permutation of b

    if len(a) != len(b):
        return False
    else:
        s_a = ''.join(sorted(a))
        s_b = ''.join(sorted(b))
        if s_a == s_b:
            return True
        else:
            return False

# test inputs
a = 'sRF7w0qbGp4fdgEyNlscUFyouETaPHAiQ2WIxzohiafEGJLw03N8ALvqMw6reLN1kHRjDeDausQBEuIWkIBfqUtsaZcPGoqAIkLlugTxjxLhkRvq5d6i55l4oBH1QoaMXHIZC5nA0K5KPBD9uIwa789sP0ZKV4X6'
b = 'Vq3EeiLGfsAOH2PW6skMN8mEmUAtUKRDIY1kow9t1vIEhe81318wSMICGwf7Rv2qrLrpbeh8bh4hlRLZXDSMyZJYWfejLND4u9EhnNI51DXcQKrceKl9arWqOl7sWIw3EBkeu7Fw4TmyfYwPqCf6oUR0UIdsAVNwbyyiajtQHKh2EKLM1KlY6NdvQTTA7JKn6bLInhFvwZ4yKKbzkgGhF3Oogtnmzl29fW6Q2p0GPuFoueZ74aqlveGTYc0zcXUJkMzltzohoRdMUKP4r5XhbsGBED8ReDbL3ouPhsFchERvvNuaIWLUCY4gl8OW06SMuvceZrCg7EkSFxxprYurHz7VQ2muxzQHj7RG2k3khxbz2ZAhWIlBBtPtg4oXIQ7cbcwgmBXaTXSBgBe3Y8ywYBjinjEjRJjVAiZkWoPrt8JtZv249XiN0MTVYj0ZW6zmcvjZtRn32U3KLMOdjLnRFUP2I3HJtp99uVlM9ghIpae0EfC0v2g78LkZE1YAKsuqCiiy7DVOhyAZUbOrRwXOEDHxUyXwCmo1zfVkPVhwysx8HhH7Iy0yHAMr0Tb97BqcpmmyBsrSgsV1aT3sjY0ctDLibrxbRXBAOexncqB4BBKWJoWkQwUZkFfPXemZkWYmE72w5CFlI6kuwBQp27dCDZ39kRG7Txs1MbsUnRnNHBy1hSOZvTQRYZPX0VmU8SVGUqzwm1ECBHZakQK4RUquk3txKCqbDfbrNmnsEcjFaiMFWkY3Esg6p3Mm41KWysTpzN6287iXjgGSBw6CBv0hH635WiZ0u47IpUD5mY9rkraDDl5sDgd3f586EWJdKAaou3jR7eYU7YuJT3RQVRI0MuS0ec0xYID3WTUI0ckImz2ck7lrtfnkewzRMZSE2ANBkEmg2XAmwrCv0gy4ExW5DayGRXoqUv06ZLGCcBEiaF0fRMlinhElZTVrGPqqhT03WSq4P97JbXA90zUxiHCnsPjuRTthYl7ZaiVZwNt3RtYT4Ff1VQ5KXRwRzdzkRMsubBX7YEhhtl0ZGVlYiP4N4t00Jr7fB4687eabUqK6jcUVpXEpTvKDbj0JLcLYsneM9fsievUz193f6aMQ5o5fm4Ilx3TUZiX4AUsoyd8CD2SK3NkiLuR255BDIA0Zbgnj2XLyQPiJ1T4fjStpjxKOTzsQsZxpThY9Fvjvoxcs3HAiXjLtZ0TSOX6n4ZLjV3TdJMc4PonwqIb3lAndlTMnuzEPof2dXnpexoVm5c37XQ7fBkoMBJ4ydnW25XKYJbkrueRDSwtJGHjY37dob4jPg0axM5uWbqGocXQ4DyiVm5GhvuYX32RQaOtXXXw8cWK6JcSUnlP1gGLMNZEGeDXOuGWiy4AJ7SH93ZQ4iPgoxdfCuW0qbsLKT2HopcY9dtBIRzr91wnES9lDL49tpuW77LSt5dGA0YLSeWAaZt9bDrduE0gDZQ2yX4SDvAOn4PMcbFRfTqzdZXONmO7ruBHHb1tVFlBFNc4xkoetDO2s7mpiVG6YR4EYMFIG1hBPh7Evhttb34AQzqImSQm1gyL3O7n3p98Kqb9qqIPbN1kuhtW5mIbIioWW2n7MHY7E5mt0'

print(checkPermutation(a, b)) #optional

#21


def permute(str1,str2):
    if sorted(str1) == sorted(str2):
        return True
    else:
        return False

str1="hello"
str2='olehl'
a=permute(str1,str2)
print(a