How do I perform a case-insensitive search for files with a given suffix in Python?

Date: 2022-10-16 20:16:01

I'm looking for the equivalent of find $DIR -iname '*.mp3', and I don't want to do the kooky ['mp3', 'Mp3', 'MP3', etc.] thing. But I can't figure out how to combine the re.IGNORECASE stuff with the simple endswith() approach. My goal is to not miss a single file, and I'd like to eventually expand this to other media/file types/suffixes.

import os
import re
suffix = ".mp3"

mp3_count = 0

for root, dirs, files in os.walk("/Volumes/audio"):
    for file in files:
        # if file.endswith(suffix):
        if re.findall('mp3', suffix, flags=re.IGNORECASE):
            mp3_count += 1

print(mp3_count)

TIA for any feedback

3 Answers

#1


1  

You can try this :)

import os
# import re
suffix = "mp3"

mp3_count = 0

for root, dirs, files in os.walk("/Volumes/audio"):
    for file in files:
        # if file.endswith(suffix):
        if file.split('.')[-1].lower() == suffix:
            mp3_count += 1

print(mp3_count)

Python's str.split() separates a string into a list on the delimiter you pass in, so the last element of that list, [-1], is the text after the final dot, i.e. the suffix.
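
As a rough sketch of how this behaves on edge cases (the helper names `split_matches` and `has_suffix` and the sample filenames are made up for illustration): the split-based test also matches a file that is literally named `mp3` with no dot, whereas lowercasing the name and calling `endswith()` with the leading dot does not. `str.endswith()` also accepts a tuple, which may help when expanding to other suffixes.

```python
def split_matches(filename, suffix="mp3"):
    """The split-based test from the answer above."""
    return filename.split(".")[-1].lower() == suffix


def has_suffix(filename, suffixes=(".mp3",)):
    """Case-insensitive test; `suffixes` must be lowercase and include the dot."""
    return filename.lower().endswith(tuple(suffixes))


# Compare both tests on a few hypothetical filenames.
for name in ["song.MP3", "Track.Mp3", "notes.txt", "mp3"]:
    print(name, split_matches(name), has_suffix(name))
```

Only the last name differs between the two tests: `split_matches("mp3")` is true, `has_suffix("mp3")` is false.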

#2


1  

Don't bother with os.walk. Learn to use the easier, awesome pathlib.Path instead. Like so:

from pathlib import Path

suffix = ".mp3"

mp3_count = 0

p = Path('/Volumes') / 'audio'  # note the easy path creation syntax
# OR even:
p = Path('/') / 'Volumes' / 'audio'

for subp in p.rglob('*'): #  recursively iterate all items matching the glob pattern
    # .suffix property refers to .ext extension
    ext = subp.suffix
    # use the .lower() method to get lowercase version of extension
    if ext.lower() == suffix: 
        mp3_count += 1

print(mp3_count)

"One-liner", if you're into that sort of thing (multiple lines for clarity):

sum(1 for subp in (Path('/Volumes') / 'audio').rglob('*')
    if subp.suffix.lower() == suffix)
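
Since the question mentions expanding to other media types later, here is a minimal sketch of the same pathlib idea generalized to a set of suffixes. The function name `count_by_suffix` and its parameters are hypothetical, not part of pathlib; it also skips directories, which `rglob('*')` would otherwise yield.

```python
from collections import Counter
from pathlib import Path


def count_by_suffix(root, suffixes):
    """Count files under `root` per extension, ignoring case.

    `suffixes` is an iterable such as [".mp3", ".flac"] (leading dot included).
    """
    wanted = {s.lower() for s in suffixes}
    counts = Counter()
    for path in Path(root).rglob("*"):
        ext = path.suffix.lower()
        # rglob("*") yields directories too, so keep regular files only.
        if ext in wanted and path.is_file():
            counts[ext] += 1
    return counts
```

Calling `count_by_suffix("/Volumes/audio", [".mp3", ".flac"])` would return a Counter keyed by the lowercase extension.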

#3


0  

The regex equivalent of .endswith is the $ sign.

To use your example above, you could do this:

if re.findall('mp3$', file, flags=re.IGNORECASE):
    mp3_count += 1

Though it might be more accurate to do this:

if re.findall(r'\.mp3$', file, flags=re.IGNORECASE):
    mp3_count += 1

which makes sure that the filename ends with .mp3 rather than picking up files such as test.amp3.
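
A small sketch of the difference, assuming a precompiled pattern and a hypothetical helper named `is_mp3`; the sample filenames are made up, and the pattern is applied to the filename rather than to the suffix string.

```python
import re

# \. matches a literal dot; $ anchors the match at the end of the name.
MP3_RE = re.compile(r"\.mp3$", flags=re.IGNORECASE)


def is_mp3(filename):
    """Return True if `filename` ends with .mp3 in any letter case."""
    return MP3_RE.search(filename) is not None


for name in ["song.MP3", "test.amp3", "mp3.txt"]:
    print(name, is_mp3(name))
```

Of these three names, only `song.MP3` matches.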

This is a pretty good example of a situation that doesn't really require regex - so while you're welcome to learn from these examples, it's worth considering the alternatives provided by other answerers.
