如何有效地列出一个图像序列?Python中的数字序列比较。

时间:2021-10-20 04:42:38

I have a directory of 9 images:

我有一个9张图片的目录:

image_0001, image_0002, image_0003
image_0010, image_0011
image_0011-1, image_0011-2, image_0011-3
image_9999

I would like to be able to list them in an efficient way, like this (4 entries for 9 images):

我希望能够以一种有效的方式列出它们,比如(9张图片有4个条目):

(image_000[1-3], image_00[10-11], image_0011-[1-3], image_9999)

Is there a way in python, to return a directory of images, in a short/clear way (without listing every file)?

在python中,是否有一种方法,以简短/清晰的方式返回图像目录(不列出每个文件)?

So, possibly something like this:

可能是这样的:

list all images, sort numerically, create a list (counting each image in sequence from start). When an image is missing (create a new list), continue until original file list is finished. Now I should just have some lists that contain non broken sequences.

列出所有图像,按数字排序,创建一个列表(从开始按顺序计算每个图像)。当一个图像丢失(创建一个新列表),继续直到原始文件列表完成。现在我应该有一些包含不间断序列的列表。

I'm trying to make it easy to read/describe a list of numbers. If I had a sequence of 1000 consecutive files It could be clearly listed as file[0001-1000] rather than file['0001','0002','0003' etc...]

我正设法使读/描述一串数字变得容易。如果我有一个连续1000个文件的序列,它可以被清楚地列出为文件[0001-]而不是文件['0001','0002','0003'等等…

Edit1(based on suggestion): Given a flattened list, how would you derive the glob patterns?

Edit1(基于建议):给定一个扁平的列表,如何派生glob模式?

Edit2 I'm trying to break the problem down into smaller pieces. Here is an example of part of the solution: data1 works, data2 returns 0010 as 64, data3 (the realworld data) doesn't work:

Edit2,我试着把问题分解成小块。这里有一个解决方案的部分示例:data1工作,data2返回0010作为64,data3 (realworld数据)不起作用:

# Find runs of consecutive numbers using groupby.  The key to the solution
# is differencing with a range so that consecutive numbers all appear in
# same group.
from operator import itemgetter
from itertools import *

data1=[01,02,03,10,11,100,9999]
data2=[0001,0002,0003,0010,0011,0100,9999]
data3=['image_0001','image_0002','image_0003','image_0010','image_0011','image_0011-2','image_0011-3','image_0100','image_9999']

list1 = []
for k, g in groupby(enumerate(data1), lambda (i,x):i-x):
    list1.append(map(itemgetter(1), g))
print 'data1'
print list1

list2 = []
for k, g in groupby(enumerate(data2), lambda (i,x):i-x):
    list2.append(map(itemgetter(1), g))
print '\ndata2'
print list2

returns:

返回:

data1
[[1, 2, 3], [10, 11], [100], [9999]]

data2
[[1, 2, 3], [8, 9], [64], [9999]]

3 个解决方案

#1


6  

Here is a working implementation of what you want to achieve, using the code you added as a starting point:

以下是您想要实现的目标的有效实现,使用您添加的代码作为起点:

#!/usr/bin/env python

import itertools
import re

# This algorithm only works if DATA is sorted.
DATA = ["image_0001", "image_0002", "image_0003",
        "image_0010", "image_0011",
        "image_0011-1", "image_0011-2", "image_0011-3",
        "image_0100", "image_9999"]

def extract_number(name):
    # Match the last number in the name and return it as a string,
    # including leading zeroes (that's important for formatting below).
    return re.findall(r"\d+$", name)[0]

def collapse_group(group):
    if len(group) == 1:
        return group[0][1]  # Unique names collapse to themselves.
    first = extract_number(group[0][1])  # Fetch range
    last = extract_number(group[-1][1])  # of this group.
    # Cheap way to compute the string length of the upper bound,
    # discarding leading zeroes.
    length = len(str(int(last)))
    # Now we have the length of the variable part of the names,
    # the rest is only formatting.
    return "%s[%s-%s]" % (group[0][1][:-length],
        first[-length:], last[-length:])

groups = [collapse_group(tuple(group)) \
    for key, group in itertools.groupby(enumerate(DATA),
        lambda(index, name): index - int(extract_number(name)))]

print groups

This prints ['image_000[1-3]', 'image_00[10-11]', 'image_0011-[1-3]', 'image_0100', 'image_9999'], which is what you want.

它打印['image_000[1-3]', 'image_00[10-11]', 'image_0011-[1-3]', 'image_0100', 'image_9999'],这就是你想要的。

HISTORY: I initially answered the question backwards, as @Mark Ransom pointed out below. For the sake of history, my original answer was:

历史:正如@Mark Ransom指出的,我最初是逆向回答这个问题的。为了历史,我最初的答案是:

You're looking for glob. Try:

你正在寻找水珠。试一试:

import glob
images = glob.glob("image_[0-9]*")

Or, using your example:

或者,用你的例子:

images = [glob.glob(pattern) for pattern in ("image_000[1-3]*",
    "image_00[10-11]*", "image_0011-[1-3]*", "image_9999*")]
images = [image for seq in images for image in seq]  # flatten the list

#2


3  

Okay, so I found your question to be a fascinating puzzle. I've left how to "compress" the numeric ranges up to you (marked as a TODO), as there are different ways to accomplish that depending on how you like it formatted and if you want the minimum number of elements or the minimum string description length.

我觉得你的问题很有趣。我已经将如何“压缩”数值范围留给了您(标记为TODO),因为根据您喜欢的格式以及希望元素的最小数量或最小字符串描述长度有不同的实现方法。

This solution uses a simple regular expression (digit strings) to classify each string into two groups: static and variable. After the data is classified, I use groupby to collect the static data into longest matching groups to achieve the summary effect. I mix integer index sentinals into the result (in matchGrouper) so I can re-select the varying parts from all elements (in unpack).

这个解决方案使用一个简单的正则表达式(数字字符串)将每个字符串分成两组:静态和变量。对数据进行分类后,我使用groupby将静态数据收集到最长的匹配组中,实现汇总效果。我将整型索引标记混合到结果中(在matchGrouper中),以便可以从所有元素(在unpack中)中重新选择不同的部分。

import re
import glob
from itertools import groupby
from operator import itemgetter

def classifyGroups(iterable, reObj=re.compile('\d+')):
    """Yields successive match lists, where each item in the list is either
    static text content, or a list of matching values.

     * `iterable` is a list of strings, such as glob('images/*')
     * `reObj` is a compiled regular expression that describes the
            variable section of the iterable you want to match and classify
    """
    def classify(text, pos=0):
        """Use a regular expression object to split the text into match and non-match sections"""
        r = []
        for m in reObj.finditer(text, pos):
            m0 = m.start()
            r.append((False, text[pos:m0]))
            pos = m.end()
            r.append((True, text[m0:pos]))
        r.append((False, text[pos:]))
        return r

    def matchGrouper(each):
        """Returns index of matches or origional text for non-matches"""
        return [(i if t else v) for i,(t,v) in enumerate(each)]

    def unpack(k,matches):
        """If the key is an integer, unpack the value array from matches"""
        if isinstance(k, int):
            k = [m[k][1] for m in matches]
        return k

    # classify each item into matches
    matchLists = (classify(t) for t in iterable)

    # group the matches by their static content
    for key, matches in groupby(matchLists, matchGrouper):
        matches = list(matches)
        # Yield a list of content matches.  Each entry is either text
        # from static content, or a list of matches
        yield [unpack(k, matches) for k in key]

Finally, we add enough logic to perform pretty printing of the output, and run an example.

最后,我们添加足够的逻辑来执行输出的漂亮打印,并运行一个示例。

def makeResultPretty(res):
    """Formats data somewhat like the question"""
    r = []
    for e in res:
        if isinstance(e, list):
            # TODO: collapse and simplify ranges as desired here
            if len(set(e))<=1:
                # it's a list of the same element
                e = e[0]
            else: 
                # prettify the list
                e = '['+' '.join(e)+']'
        r.append(e)
    return ''.join(r)

fnList = sorted(glob.glob('images/*'))
re_digits = re.compile(r'\d+')
for res in classifyGroups(fnList, re_digits):
    print makeResultPretty(res)

My directory of images was created from your example. You can replace fnList with the following list for testing:

我的图像目录是由您的示例创建的。您可以用以下列表替换fnList进行测试:

fnList = [
 'images/image_0001.jpg',
 'images/image_0002.jpg',
 'images/image_0003.jpg',
 'images/image_0010.jpg',
 'images/image_0011-1.jpg',
 'images/image_0011-2.jpg',
 'images/image_0011-3.jpg',
 'images/image_0011.jpg',
 'images/image_9999.jpg']

And when I run against this directory, my output looks like:

当我运行这个目录时,我的输出是:

*/3926936% python classify.py
images/image_[0001 0002 0003 0010].jpg
images/image_0011-[1 2 3].jpg
images/image_[0011 9999].jpg

#3


2  

def ranges(sorted_list):
    first = None
    for x in sorted_list:
        if first is None:
            first = last = x
        elif x == increment(last):
            last = x
        else:
            yield first, last
            first = last = x
    if first is not None:
        yield first, last

The increment function is left as an exercise for the reader.

递增函数留给读者作为练习。

Edit: here's an example of how it would be used with integers instead of strings as input.

编辑:这里有一个例子,说明如何将它用于整数而不是字符串作为输入。

def increment(x): return x+1

list(ranges([1,2,3,4,6,7,8,10]))
[(1, 4), (6, 8), (10, 10)]

For each contiguous range in the input you get a pair indicating the start and end of the range. If an element isn't part of a range, the start and end values are identical.

对于输入中的每个连续范围,您将得到一对表示范围的开始和结束。如果元素不是范围的一部分,则开始值和结束值是相同的。

#1


6  

Here is a working implementation of what you want to achieve, using the code you added as a starting point:

以下是您想要实现的目标的有效实现,使用您添加的代码作为起点:

#!/usr/bin/env python

import itertools
import re

# This algorithm only works if DATA is sorted.
DATA = ["image_0001", "image_0002", "image_0003",
        "image_0010", "image_0011",
        "image_0011-1", "image_0011-2", "image_0011-3",
        "image_0100", "image_9999"]

def extract_number(name):
    # Match the last number in the name and return it as a string,
    # including leading zeroes (that's important for formatting below).
    return re.findall(r"\d+$", name)[0]

def collapse_group(group):
    if len(group) == 1:
        return group[0][1]  # Unique names collapse to themselves.
    first = extract_number(group[0][1])  # Fetch range
    last = extract_number(group[-1][1])  # of this group.
    # Cheap way to compute the string length of the upper bound,
    # discarding leading zeroes.
    length = len(str(int(last)))
    # Now we have the length of the variable part of the names,
    # the rest is only formatting.
    return "%s[%s-%s]" % (group[0][1][:-length],
        first[-length:], last[-length:])

groups = [collapse_group(tuple(group)) \
    for key, group in itertools.groupby(enumerate(DATA),
        lambda(index, name): index - int(extract_number(name)))]

print groups

This prints ['image_000[1-3]', 'image_00[10-11]', 'image_0011-[1-3]', 'image_0100', 'image_9999'], which is what you want.

它打印['image_000[1-3]', 'image_00[10-11]', 'image_0011-[1-3]', 'image_0100', 'image_9999'],这就是你想要的。

HISTORY: I initially answered the question backwards, as @Mark Ransom pointed out below. For the sake of history, my original answer was:

历史:正如@Mark Ransom指出的,我最初是逆向回答这个问题的。为了历史,我最初的答案是:

You're looking for glob. Try:

你正在寻找水珠。试一试:

import glob
images = glob.glob("image_[0-9]*")

Or, using your example:

或者,用你的例子:

images = [glob.glob(pattern) for pattern in ("image_000[1-3]*",
    "image_00[10-11]*", "image_0011-[1-3]*", "image_9999*")]
images = [image for seq in images for image in seq]  # flatten the list

#2


3  

Okay, so I found your question to be a fascinating puzzle. I've left how to "compress" the numeric ranges up to you (marked as a TODO), as there are different ways to accomplish that depending on how you like it formatted and if you want the minimum number of elements or the minimum string description length.

我觉得你的问题很有趣。我已经将如何“压缩”数值范围留给了您(标记为TODO),因为根据您喜欢的格式以及希望元素的最小数量或最小字符串描述长度有不同的实现方法。

This solution uses a simple regular expression (digit strings) to classify each string into two groups: static and variable. After the data is classified, I use groupby to collect the static data into longest matching groups to achieve the summary effect. I mix integer index sentinals into the result (in matchGrouper) so I can re-select the varying parts from all elements (in unpack).

这个解决方案使用一个简单的正则表达式(数字字符串)将每个字符串分成两组:静态和变量。对数据进行分类后,我使用groupby将静态数据收集到最长的匹配组中,实现汇总效果。我将整型索引标记混合到结果中(在matchGrouper中),以便可以从所有元素(在unpack中)中重新选择不同的部分。

import re
import glob
from itertools import groupby
from operator import itemgetter

def classifyGroups(iterable, reObj=re.compile('\d+')):
    """Yields successive match lists, where each item in the list is either
    static text content, or a list of matching values.

     * `iterable` is a list of strings, such as glob('images/*')
     * `reObj` is a compiled regular expression that describes the
            variable section of the iterable you want to match and classify
    """
    def classify(text, pos=0):
        """Use a regular expression object to split the text into match and non-match sections"""
        r = []
        for m in reObj.finditer(text, pos):
            m0 = m.start()
            r.append((False, text[pos:m0]))
            pos = m.end()
            r.append((True, text[m0:pos]))
        r.append((False, text[pos:]))
        return r

    def matchGrouper(each):
        """Returns index of matches or origional text for non-matches"""
        return [(i if t else v) for i,(t,v) in enumerate(each)]

    def unpack(k,matches):
        """If the key is an integer, unpack the value array from matches"""
        if isinstance(k, int):
            k = [m[k][1] for m in matches]
        return k

    # classify each item into matches
    matchLists = (classify(t) for t in iterable)

    # group the matches by their static content
    for key, matches in groupby(matchLists, matchGrouper):
        matches = list(matches)
        # Yield a list of content matches.  Each entry is either text
        # from static content, or a list of matches
        yield [unpack(k, matches) for k in key]

Finally, we add enough logic to perform pretty printing of the output, and run an example.

最后,我们添加足够的逻辑来执行输出的漂亮打印,并运行一个示例。

def makeResultPretty(res):
    """Formats data somewhat like the question"""
    r = []
    for e in res:
        if isinstance(e, list):
            # TODO: collapse and simplify ranges as desired here
            if len(set(e))<=1:
                # it's a list of the same element
                e = e[0]
            else: 
                # prettify the list
                e = '['+' '.join(e)+']'
        r.append(e)
    return ''.join(r)

fnList = sorted(glob.glob('images/*'))
re_digits = re.compile(r'\d+')
for res in classifyGroups(fnList, re_digits):
    print makeResultPretty(res)

My directory of images was created from your example. You can replace fnList with the following list for testing:

我的图像目录是由您的示例创建的。您可以用以下列表替换fnList进行测试:

fnList = [
 'images/image_0001.jpg',
 'images/image_0002.jpg',
 'images/image_0003.jpg',
 'images/image_0010.jpg',
 'images/image_0011-1.jpg',
 'images/image_0011-2.jpg',
 'images/image_0011-3.jpg',
 'images/image_0011.jpg',
 'images/image_9999.jpg']

And when I run against this directory, my output looks like:

当我运行这个目录时,我的输出是:

*/3926936% python classify.py
images/image_[0001 0002 0003 0010].jpg
images/image_0011-[1 2 3].jpg
images/image_[0011 9999].jpg

#3


2  

def ranges(sorted_list):
    first = None
    for x in sorted_list:
        if first is None:
            first = last = x
        elif x == increment(last):
            last = x
        else:
            yield first, last
            first = last = x
    if first is not None:
        yield first, last

The increment function is left as an exercise for the reader.

递增函数留给读者作为练习。

Edit: here's an example of how it would be used with integers instead of strings as input.

编辑:这里有一个例子,说明如何将它用于整数而不是字符串作为输入。

def increment(x): return x+1

list(ranges([1,2,3,4,6,7,8,10]))
[(1, 4), (6, 8), (10, 10)]

For each contiguous range in the input you get a pair indicating the start and end of the range. If an element isn't part of a range, the start and end values are identical.

对于输入中的每个连续范围,您将得到一对表示范围的开始和结束。如果元素不是范围的一部分,则开始值和结束值是相同的。