Why is some code deterministic in Python 2 and non-deterministic in Python 3?

Date: 2022-10-26 03:19:24

I'm trying to write a script to calculate all of the possible fuzzy string matches for a short string, or 'kmer'. The same code that works in Python 2.7.X gives me a non-deterministic answer with Python 3.3.X, and I can't figure out why.

I iterate over a dictionary, itertools.product, and itertools.combinations in my code, but I iterate over all of them to completion with no breaks or continues. In addition, I store all of my results in a separate dictionary instead of the one I'm iterating over. In short - I'm not making any mistakes that are obvious to me, so why is the behavior different between Python2 and Python3?

Sample, slightly simplified code below:

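import itertools

def find_best_fuzzy_kmer(kmers, m):
    # m (maximum number of substituted positions) was an undefined name in
    # the original snippet, so it is passed in explicitly here.
    fuzzy_counts = {}  # results go in a separate dict, not the one iterated over
    for kmer, value in kmers.items():
        for similar_kmer in permute_string(kmer, m):
            # Tabulate kmer (elided in the original; summing values is an assumption)
            fuzzy_counts[similar_kmer] = fuzzy_counts.get(similar_kmer, 0) + value
    return fuzzy_counts

def permute_string(query, m):
    query_list = list(query)
    output = set()  # hold output
    for i in range(m + 1):
        # pre-calculate the possible combinations of new bases
        base_combinations = list(itertools.product('AGCT', repeat=i))
        # for each combination of i positions, substitute the new bases
        for positions in itertools.combinations(range(len(query_list)), i):
            for bases in base_combinations:
                # Generate permutations and add to output
                # (elided in the original; this substitution step is an assumption)
                candidate = list(query_list)
                for position, base in zip(positions, bases):
                    candidate[position] = base
                output.add(''.join(candidate))
    return output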

1 Solution

#1


32  

If by "non-deterministic" you mean the order in which dictionary keys appear (when you iterate over a dictionary) changes from run to run, and the dictionary keys are strings, please say so. Then I can help. But so far you haven't said any of that ;-)

Assuming that's the problem, here's a little program:

d = dict((L, i) for i, L in enumerate('abcd'))
print(d)

and the output from 4 runs under Python 3.3.2:

{'d': 3, 'a': 0, 'c': 2, 'b': 1}
{'d': 3, 'b': 1, 'c': 2, 'a': 0}
{'d': 3, 'a': 0, 'b': 1, 'c': 2}
{'a': 0, 'b': 1, 'c': 2, 'd': 3}

The cause is hinted at in this part of the python -h output:

Other environment variables:
...
PYTHONHASHSEED: if this variable is set to 'random', a random value is used
   to seed the hashes of str, bytes and datetime objects.  It can also be
   set to an integer in the range [0,4294967295] to get hash values with a
   predictable seed.

This is a half-baked "security fix", intended to help prevent DoS attacks based on constructing dict inputs designed to provoke quadratic-time behavior. "random" is the default in Python 3 (starting with 3.3).

You can turn that off by setting the environment variable PYTHONHASHSEED to an integer (your choice - pick 0 if you don't care). Then iterating a dict with string keys will produce them in the same order across runs.
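To see the effect of the seed without touching your shell, here is a rough sketch that launches child interpreters with PYTHONHASHSEED set explicitly. The helper name run_with_seed is made up for illustration. It prints a string hash and a set's iteration order rather than a dict, because from Python 3.7 on dicts preserve insertion order and the dict example above no longer varies, while sets of strings still depend on the hash.

```python
import os
import subprocess
import sys

def run_with_seed(seed):
    """Run a tiny script in a child interpreter with PYTHONHASHSEED=seed.

    `run_with_seed` is a hypothetical helper, not part of any library.
    """
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    code = "print(hash('abcd'), list(set('abcd')))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,  # requires Python 3.7+
        text=True,
        check=True,
    )
    return result.stdout.strip()

# Same integer seed: identical hashes and set order in both runs.
print(run_with_seed(0))
print(run_with_seed(0))
# 'random' (the Python 3 default): output will usually differ from run to run.
print(run_with_seed("random"))
```

The two runs with seed 0 should print the same line; the 'random' run is there only for comparison.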

As @AlcariTheMad said in a comment, you can enable the Python3 default behavior under Python 2 via python -R ....
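If you would rather not depend on the hash seed at all, one option is to impose an order yourself. This is only a sketch: the question does not show how the "best" kmer is picked, so the best_kmer function and the sample counts below are assumptions, not the asker's code. The idea is to iterate keys in sorted order and to break ties explicitly, so the result does not depend on dict or set iteration order.

```python
def best_kmer(counts):
    """Return the key with the highest count.

    Ties are broken by taking the alphabetically smallest key, so the
    result is the same regardless of iteration order. (Hypothetical helper.)
    """
    return min(counts, key=lambda kmer: (-counts[kmer], kmer))

# Made-up sample data for illustration only.
counts = {'AGCT': 3, 'TTTT': 5, 'ACGT': 5}

# Sorted iteration gives the same order on every run.
for kmer in sorted(counts):
    print(kmer, counts[kmer])

print(best_kmer(counts))
```

Sorting costs a little time, but it makes the output reproducible under Python 2, Python 3, and any PYTHONHASHSEED setting.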
