How to iterate over a dictionary, n key-value pairs at a time

Time: 2022-06-02 07:38:27

I have a very large dictionary with thousands of elements. I need to execute a function with this dictionary as parameter. Now, instead of passing the whole dictionary in a single execution, I want to execute the function in batches - with x key-value pairs of the dictionary at a time.

I am doing the following:

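mydict = ...  # placeholder: some large hash
x = ...       # placeholder: batch size

def some_func(data):
    pass  # do something on data

temp = {}
for key, value in mydict.items():  # iteritems() on Python 2
    if len(temp) != 0 and len(temp) % x == 0:
        # temp holds a full batch: process it, then start a new one
        some_func(temp)
        temp = {}
        temp[key] = value
    else:
        temp[key] = value
# process the final, possibly shorter, batch
if temp != {}:
    some_func(temp)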

This looks very hackish to me. I want to know if there is an elegant/better way of doing this.

3 solutions

#1


6  

I often use this little utility:

import itertools

def chunked(it, size):
    it = iter(it)
    while True:
        p = tuple(itertools.islice(it, size))
        if not p:
            break
        yield p

For your use case:

for chunk in chunked(big_dict.items(), batch_size):  # iteritems() on Python 2
    func(chunk)
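
Note that each chunk produced this way is a tuple of (key, value) pairs rather than a dictionary. If the function expects a dict, each chunk can be wrapped in dict(). The following is a minimal sketch under that assumption, written for Python 3; the helper name dict_batches and the sample data are hypothetical and only serve as illustration:

```python
import itertools

def chunked(it, size):
    """Yield successive tuples of at most `size` items from any iterable."""
    it = iter(it)
    while True:
        p = tuple(itertools.islice(it, size))
        if not p:
            break
        yield p

def dict_batches(d, size):
    """Hypothetical helper: yield sub-dictionaries of at most `size` items."""
    for chunk in chunked(d.items(), size):
        yield dict(chunk)

# Illustrative data only: 7 entries split into batches of 3.
sample = {n: n * n for n in range(7)}
batches = list(dict_batches(sample, 3))
print([len(b) for b in batches])  # [3, 3, 1]
```

Building a dict per batch copies only that batch, so at most `size` items are duplicated at any one time.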

#2


1  

Here are two solutions adapted from earlier answers of mine.

First, you can take the list of items from the dictionary and build new dicts from slices of that list. This is not optimal, though, because it copies every item of that huge dictionary into an intermediate list.

def chunks(dictionary, size):
    # list() makes the items sliceable on Python 3, where items() returns a view
    items = list(dictionary.items())
    return (dict(items[i:i+size]) for i in range(0, len(items), size))

Alternatively, you can use some of the itertools module's functions to yield (generate) new sub-dictionaries as you loop. This is similar to @georg's answer, just using a for loop.

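from itertools import chain, islice
def chunks(dictionary, size):
    # iter() is required on Python 3, where items() returns a re-iterable view
    # (on Python 2 this was dictionary.iteritems())
    iterator = iter(dictionary.items())
    for first in iterator:
        # first starts the batch; islice pulls up to size - 1 more items
        yield dict(chain([first], islice(iterator, size - 1)))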

Example usage, for both cases:

mydict = {i+1: chr(i+65) for i in range(26)}
for sub_d in chunks(mydict, 10):
    some_func(sub_d)
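
To make the batching behaviour concrete, here is a small self-contained check of the generator-based variant on the same 26-entry dictionary. It is only a sketch, assuming Python 3 (so the items view is wrapped in iter()); the variable name sizes is introduced here for illustration:

```python
from itertools import chain, islice

def chunks(dictionary, size):
    """Yield sub-dictionaries holding at most `size` items each."""
    iterator = iter(dictionary.items())  # one shared iterator over all items
    for first in iterator:
        # `first` starts the batch; islice pulls up to size - 1 more items
        # from the same iterator, so no item is visited twice.
        yield dict(chain([first], islice(iterator, size - 1)))

mydict = {i + 1: chr(i + 65) for i in range(26)}
sizes = [len(sub_d) for sub_d in chunks(mydict, 10)]
print(sizes)  # [10, 10, 6]
```

The last batch is simply shorter when the dictionary size is not a multiple of the batch size.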

#3


0  

From more-itertools:

from itertools import zip_longest  # izip_longest on Python 2

_marker = object()  # sentinel used as padding for the final group

def chunked(iterable, n):
    """Break an iterable into lists of a given length::

        >>> list(chunked([1, 2, 3, 4, 5, 6, 7], 3))
        [[1, 2, 3], [4, 5, 6], [7]]

    If the length of ``iterable`` is not evenly divisible by ``n``, the last
    returned list will be shorter.

    This is useful for splitting up a computation on a large number of keys
    into batches, to be pickled and sent off to worker processes. One example
    is operations on rows in MySQL, which does not implement server-side
    cursors properly and would otherwise load the entire dataset into RAM on
    the client.
    """
    # Doesn't seem to run into any number-of-args limits.
    for group in (list(g) for g in zip_longest(*[iter(iterable)] * n,
                                               fillvalue=_marker)):
        if group[-1] is _marker:
            # If this is the last group, shuck off the padding:
            del group[group.index(_marker):]
        yield group
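
On Python 3.12 and later, the standard library offers itertools.batched, which yields tuples of at most n items and can replace a hand-written helper. The sketch below uses it when available and otherwise falls back to an islice-based equivalent; the fallback function and the sample dictionary inventory are assumptions added for illustration, not part of the answers above:

```python
import itertools

try:
    batched = itertools.batched  # available in Python 3.12 and later
except AttributeError:
    def batched(iterable, n):
        """Fallback for older Pythons: yield tuples of at most n items."""
        it = iter(iterable)
        while True:
            batch = tuple(itertools.islice(it, n))
            if not batch:
                return
            yield batch

inventory = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}  # illustrative data
sub_dicts = [dict(batch) for batch in batched(inventory.items(), 2)]
print(sub_dicts)  # [{'a': 1, 'b': 2}, {'c': 3, 'd': 4}, {'e': 5}]
```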
