Make this faster (min and max in the same iteration, with a condition)

I would like to ask whether, and how, I could rewrite the lines below so that they run faster.

*(-10000, 10000) is just a range that I can be sure all my numbers fall within.

    first = 10000
    last = -10000

    for key in my_data.keys():
        if "LastFirst_" in key:  # In my_data there are many more keys with lots of vals.
            first = min(first, min(my_data[key]))
            last = max(last, max(my_data[key]))

    print first, last

Also, is there a more Pythonic way to write this (even if it does not run any faster)?

Thx

5 solutions

#1

Use the * operator to unpack the values:

>>> my_data = {'LastFirst_1':[1, 4, 5], 'LastFirst_2':[2, 4, 6]}
>>> d = [item for k,v in my_data.items() if 'LastFirst_' in k for item in v]
>>> first = 2
>>> last = 5
>>> min(first, *d)
1
>>> max(last, *d)
6
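
One caveat with the unpacking form, shown as a minimal sketch: when no key matches, d is empty and min(first, *d) collapses to min(first), which raises a TypeError because a single int is not iterable. Putting the seed value in front of the data with itertools.chain is one way around that. The helper name bounds_with_seed and its prefix parameter are made up for illustration.

```python
import itertools

def bounds_with_seed(my_data, first, last, prefix="LastFirst_"):
    """Return (min, max) over the seeds and all values under matching keys."""
    d = [item for k, v in my_data.items() if prefix in k for item in v]
    # chain() puts the seed in front, so min()/max() never see an empty iterable.
    lo = min(itertools.chain([first], d))
    hi = max(itertools.chain([last], d))
    return lo, hi

my_data = {'LastFirst_1': [1, 4, 5], 'LastFirst_2': [2, 4, 6]}
print(bounds_with_seed(my_data, 2, 5))         # same seeds as in the session above
print(bounds_with_seed({'other': [9]}, 2, 5))  # no matching key: the seeds come back
```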

#2

You could use some comprehensions to simplify the code.

first = min(min(data) for (key, data) in my_data.items() if "LastFirst_" in key)
last = max(max(data) for (key, data) in my_data.items() if "LastFirst_" in key)
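
The two lines above walk the dict twice, once for min and once for max. If a single pass is wanted, as the question title suggests, it could look roughly like the sketch below. The function name first_last is hypothetical, and in CPython a hand-written loop like this is not necessarily faster than the built-in min and max, so it is worth timing before adopting it.

```python
def first_last(my_data, prefix="LastFirst_"):
    """Return (min, max) over all values whose key contains prefix, in one pass."""
    first = last = None
    for key, data in my_data.items():
        if prefix not in key:
            continue  # skip unrelated keys without touching their values
        for value in data:
            if first is None or value < first:
                first = value
            if last is None or value > last:
                last = value
    return first, last  # (None, None) when nothing matched

my_data = {'LastFirst_1': [1, 4, 5], 'LastFirst_2': [2, 4, 6], 'other': [-99, 99]}
print(first_last(my_data))
```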

#3

The min and max functions are overloaded to take either multiple values (as you use it), or one sequence of values, so you can pass in iterables (e.g. lists) and get the min or max of them.

Also, if you're only interested in the values, use .values() or .itervalues(). If you're interested in both keys and values, use .items() or .iteritems(). (Python 3 has no .iter* variants; there, .values() and .items() already return lazy views.)

If you have many sequences, you can use itertools.chain to make them one long iterable. You can also string them together manually by using multiple for clauses in a single comprehension, but that can be distasteful.

import itertools

def flatten1(iterables):
    # The "list" is necessary, because we want to use this twice
    # but `chain` returns an iterator, which can only be used once.
    return list(itertools.chain(*iterables))

# Note: The "(" ")" indicates that this is an iterator, not a list.
valid_lists = (v for k,v in my_data.iteritems() if "LastFirst_" in k)
valid_values = flatten1(valid_lists)
# Alternative: [w for k,v in my_data.iteritems() if "LastFirst_" in k for w in v]

first = min(valid_values)
last = max(valid_values)

print first, last
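
For readers on Python 3, the same idea might be written as in the sketch below. It swaps .iteritems() for .items() and uses itertools.chain.from_iterable, which reads the lists straight from the generator instead of unpacking them with *. The sample dict is invented for the example.

```python
import itertools

# Made-up sample data; only the "LastFirst_" keys should count.
my_data = {'LastFirst_a': [1, 2, 3], 'LastFirst_b': [-5, 0], 'unrelated': [100]}

# Generator of the matching lists; nothing is copied yet.
valid_lists = (v for k, v in my_data.items() if "LastFirst_" in k)
# from_iterable consumes the generator lazily; list() lets us scan the result twice.
valid_values = list(itertools.chain.from_iterable(valid_lists))

first = min(valid_values)
last = max(valid_values)
print(first, last)
```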

If no key matches, so that there are no values to take the minimum or maximum of, the coder has to decide what should happen. I would suggest keeping the default behavior of min/max, which raise a ValueError on an empty sequence, rather than trying to guess an upper or lower bound. Letting the exception propagate, or returning None, would be more Pythonic than a guessed bound.

In Python 3.4 and later, you may specify a default argument, e.g. max(valid_values, default=10000).
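
A short sketch of both behaviors on an empty input, assuming Python 3.4 or later for the default keyword. Using None as the fallback is just one possible choice.

```python
valid_values = []  # pretend no key matched "LastFirst_"

# Without a default, min() and max() raise ValueError on an empty sequence.
try:
    min(valid_values)
except ValueError as exc:
    print("empty input:", exc)

# With default=, the fallback is returned instead of raising.
first = min(valid_values, default=None)
last = max(valid_values, default=None)
print(first, last)
```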

#4

my_data = {'LastFirst_a': [1, 2, 34000], 'LastFirst_b': [-12000, 1, 5]}

first = 10000
last = -10000

# Note: replace .items() with .iteritems() if you're using Python 2.
relevant_data = [el for k, v in my_data.items() for el in v if "LastFirst_" in k]
# maybe faster:
# relevant_data = [el for k, v in my_data.items() for el in v if k.startswith("LastFirst_")]

first = min(first, min(relevant_data))
last = max(last, max(relevant_data))

print(first, last)
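
Note that the two filters in this answer are not interchangeable: in matches the text anywhere in the key, while startswith only matches it at the beginning. The sketch below illustrates the difference with an invented key, not_LastFirst_c. It also puts the if clause before the inner for, so each key is tested once rather than once per element.

```python
my_data = {
    'LastFirst_a': [1, 2, 34000],
    'LastFirst_b': [-12000, 1, 5],
    'not_LastFirst_c': [99999],  # hypothetical key: contains the text, does not start with it
}

# `in` matches the text anywhere in the key.
contains = [el for k, v in my_data.items() if "LastFirst_" in k for el in v]
# startswith() only matches keys that begin with the prefix.
prefixed = [el for k, v in my_data.items() if k.startswith("LastFirst_") for el in v]

print(max(contains), max(prefixed))
```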

#5

values = [my_data[k] for k in my_data if 'LastFirst_' in k]
flattened = [item for sublist in values for item in sublist]
first = min(first, min(flattened))
last = max(last, max(flattened))

or

# Python 2; use .items() instead of .iteritems() in Python 3.
values = [item for sublist in (j for a, j in my_data.iteritems() if 'LastFirst_' in a) for item in sublist]
first = min(first, min(values))
last = max(last, max(values))

I ran some benchmarks, and the second solution seems slightly faster than the first. I also compared both versions with the code posted in the other answers:

solution one:  0.648876905441
solution two:  0.634277105331
solution three (TigerhawkT3):  2.14495801926
solution four (Def_Os):  1.07884407043
solution five (leewangzhong):  0.635314941406

These timings are based on a randomly generated dictionary of 1 million keys. I think leewangzhong's solution is really good. Although it is marginally behind my second solution in the timings above, in later runs it came out slightly faster (we are talking about milliseconds, though), for example:

solution one:  0.678879022598
solution two:  0.62641787529
solution three:  2.15943193436
solution four:  1.05863213539
solution five:  0.611482858658
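
The benchmark script itself is not shown in this answer. A comparison of this kind could be set up along the lines of the sketch below, which is only an assumption about the method: the data generator, the dictionary size (1,000 keys here, to keep it quick), the function names, and the repeat count are all invented, and the numbers it prints will differ from those above.

```python
import itertools
import random
import timeit

# Hypothetical test data: 1,000 keys, half of them matching "LastFirst_".
random.seed(0)
my_data = {
    ("LastFirst_%d" if i % 2 else "Other_%d") % i:
        [random.randint(-10000, 10000) for _ in range(10)]
    for i in range(1000)
}

def nested_comprehension():
    # Flatten with two for clauses in one comprehension.
    values = [item for k, v in my_data.items() if "LastFirst_" in k for item in v]
    return min(values), max(values)

def chained():
    # Flatten with itertools.chain.from_iterable.
    lists = (v for k, v in my_data.items() if "LastFirst_" in k)
    values = list(itertools.chain.from_iterable(lists))
    return min(values), max(values)

for fn in (nested_comprehension, chained):
    seconds = timeit.timeit(fn, number=100)
    print("%s: %.4f s" % (fn.__name__, seconds))
```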

Itertools is really a great module!
