I want to open a series of subfolders in a folder and find some text files and print some lines of the text files. I am using this:
configfiles = glob.glob('C:/Users/sam/Desktop/file1/*.txt')
But this does not search the subfolders. Does anyone know how I can use the same command to search the subfolders as well?
9 solutions
#1
88
In Python 3.5 and newer, use the new recursive **/ functionality:
configfiles = glob.glob('C:/Users/sam/Desktop/file1/**/*.txt', recursive=True)
When recursive is set, ** followed by a path separator matches 0 or more subdirectories.
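To make the "0 or more subdirectories" behaviour concrete, here is a small sketch. The directory layout is hypothetical and is created in a temporary directory purely for illustration; it also shows that without recursive=True the ** is treated like an ordinary *.

```python
import glob
import os
import tempfile

# Hypothetical layout, created in a temporary directory for illustration only.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "sub", "deeper"))
for rel in ("top.txt",
            os.path.join("sub", "mid.txt"),
            os.path.join("sub", "deeper", "low.txt")):
    with open(os.path.join(root, rel), "w") as fh:
        fh.write("example\n")

pattern = os.path.join(root, "**", "*.txt")

# With recursive=True, "**" spans zero or more directory levels.
recursive_hits = sorted(os.path.relpath(p, root)
                        for p in glob.glob(pattern, recursive=True))

# Without it, "**" behaves like a plain "*" and matches exactly one level.
flat_hits = sorted(os.path.relpath(p, root) for p in glob.glob(pattern))

print(recursive_hits)  # all three files, including top.txt
print(flat_hits)       # only the file one level down
```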
In earlier Python versions, glob.glob() cannot list files in subdirectories recursively.
In that case I'd use os.walk() combined with fnmatch.filter() instead:
import os
import fnmatch
path = 'C:/Users/sam/Desktop/file1'
configfiles = [os.path.join(dirpath, f)
               for dirpath, dirnames, files in os.walk(path)
               for f in fnmatch.filter(files, '*.txt')]
This walks your directories recursively and returns the full pathnames of all matching .txt files (absolute here, because path is absolute). In this specific case fnmatch.filter() may be overkill; you could also use a .endswith() test:
import os
path = 'C:/Users/sam/Desktop/file1'
configfiles = [os.path.join(dirpath, f)
               for dirpath, dirnames, files in os.walk(path)
               for f in files if f.endswith('.txt')]
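The question also asks to print some lines of each file once it is found. A minimal sketch of that last step follows; the helper name head, the three-line limit, the UTF-8 encoding, and the sample tree (a temporary directory standing in for C:/Users/sam/Desktop/file1) are all assumptions made for illustration.

```python
import itertools
import os
import tempfile

def head(path, n=3):
    """Return the first n lines of a text file, without trailing newlines."""
    with open(path, encoding="utf-8") as fh:
        return [line.rstrip("\n") for line in itertools.islice(fh, n)]

# Hypothetical sample tree standing in for C:/Users/sam/Desktop/file1.
path = tempfile.mkdtemp()
os.makedirs(os.path.join(path, "sub"))
with open(os.path.join(path, "sub", "config.txt"), "w", encoding="utf-8") as fh:
    fh.write("one\ntwo\nthree\nfour\n")
with open(os.path.join(path, "notes.md"), "w", encoding="utf-8") as fh:
    fh.write("ignored\n")

# Same os.walk() + .endswith() search as above.
configfiles = [os.path.join(dirpath, f)
               for dirpath, dirnames, files in os.walk(path)
               for f in files if f.endswith('.txt')]

# Print each matching file name followed by its first few lines.
for name in configfiles:
    print(name)
    for line in head(name):
        print("   ", line)
```

Reading through itertools.islice stops after n lines, so large files are not loaded in full.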
#2
13
The glob2 package supports wildcards and is reasonably fast:
import timeit
code = '''
import glob2
glob2.glob("files/*/**")
'''
timeit.timeit(code, number=1)
On my laptop it takes approximately 2 seconds to match >60,000 file paths.
#3
8
To find files in immediate subdirectories:
configfiles = glob.glob(r'C:\Users\sam\Desktop\*\*.txt')
For a recursive version that traverses all subdirectories, you can use ** and pass recursive=True (available since Python 3.5):
configfiles = glob.glob(r'C:\Users\sam\Desktop\**\*.txt', recursive=True)
Both function calls return lists. You could use glob.iglob() to return paths one by one, or use pathlib:
from pathlib import Path
path = Path(r'C:\Users\sam\Desktop')
txt_files_only_subdirs = path.glob('*/*.txt')
txt_files_all_recursively = path.rglob('*.txt') # including the current dir
Both methods return iterators (you can get paths one by one).
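As a rough illustration of the difference between the two pathlib calls, the sketch below builds a hypothetical three-file tree in a temporary directory and collects both iterators into sorted lists so the results can be compared.

```python
import tempfile
from pathlib import Path

# Hypothetical tree, created in a temporary directory for illustration only.
base = Path(tempfile.mkdtemp())
(base / "a" / "b").mkdir(parents=True)
for rel in ("root.txt", "a/one.txt", "a/b/two.txt"):
    (base / rel).write_text("example\n")

# '*/*.txt' looks exactly one directory level below base.
only_subdirs = sorted(p.relative_to(base).as_posix()
                      for p in base.glob('*/*.txt'))

# rglob('*.txt') searches base itself and every directory beneath it.
all_recursive = sorted(p.relative_to(base).as_posix()
                       for p in base.rglob('*.txt'))

print(only_subdirs)   # ['a/one.txt']
print(all_recursive)  # ['a/b/two.txt', 'a/one.txt', 'root.txt']
```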
#4
6
You can use Formic with Python 2.6
import formic
fileset = formic.FileSet(include="**/*.txt", directory="C:/Users/sam/Desktop/")
Disclosure - I am the author of this package.
#5
2
Here is an adapted version that provides glob.glob-like functionality without using glob2.
import fnmatch
import os

def find_files(directory, pattern='*'):
    if not os.path.exists(directory):
        raise ValueError("Directory not found {}".format(directory))
    matches = []
    for root, dirnames, filenames in os.walk(directory):
        for filename in filenames:
            full_path = os.path.join(root, filename)
            # Match the pattern against the full path, not just the file name.
            if fnmatch.fnmatch(full_path, pattern):
                matches.append(full_path)
    return matches
So if you have the following directory structure:
tests/files
├── a0
│   ├── a0.txt
│   ├── a0.yaml
│   └── b0
│       ├── b0.yaml
│       └── b00.yaml
└── a1
You can do something like this:
files = utils.find_files('tests/files','**/b0/b*.yaml')
> ['tests/files/a0/b0/b0.yaml', 'tests/files/a0/b0/b00.yaml']
This is essentially an fnmatch pattern match on the whole path, rather than on the file name only.
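The reason a pattern such as **/b0/b*.yaml works here is that fnmatch has no special ** token: every * simply matches any run of characters, path separators included. The sketch below checks this against a hard-coded list of paths mirroring the tree above; no files are touched.

```python
import fnmatch

# Paths mirroring the example tree; plain strings, nothing is read from disk.
paths = [
    "tests/files/a0/a0.txt",
    "tests/files/a0/a0.yaml",
    "tests/files/a0/b0/b0.yaml",
    "tests/files/a0/b0/b00.yaml",
]

# "*" in fnmatch matches any characters, "/" included, so the leading "**"
# is just a wildcard prefix in front of the literal "/b0/b".
matched = [p for p in paths if fnmatch.fnmatch(p, "**/b0/b*.yaml")]

# For the same reason "*.yaml" matches at any depth when applied to full paths.
any_yaml = [p for p in paths if fnmatch.fnmatch(p, "*.yaml")]

print(matched)
print(any_yaml)
```

This also means the pattern cannot restrict a match to a single directory level the way glob's * does.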
#6
2
configfiles = glob.glob('C:/Users/sam/Desktop/**/*.txt')
This doesn't work in all cases; use glob2 instead:
configfiles = glob2.glob('C:/Users/sam/Desktop/**/*.txt')
#7
2
If you can install the glob2 package...
import glob2
filenames = glob2.glob("C:\\top_directory\\**\\*.ext") # Where ext is a specific file extension
folders = glob2.glob("C:\\top_directory\\**\\")
All filenames and folders:
all_ff = glob2.glob("C:\\top_directory\\**\\**")
#8
1
If you're running Python 3.4+, you can use the pathlib module. The Path.glob() method supports the ** pattern, which means “this directory and all subdirectories, recursively”. It returns a generator yielding Path objects for all matching files.
from pathlib import Path
configfiles = Path("C:/Users/sam/Desktop/file1/").glob("**/*.txt")
#9
0
As pointed out by Martijn, glob can only do this through the ** pattern introduced in Python 3.5. Since the OP explicitly asked for the glob module, the following returns a lazily evaluated iterator that behaves similarly:
import os, glob, itertools
configfiles = itertools.chain.from_iterable(
    glob.iglob(os.path.join(root, '*.txt'))
    for root, dirs, files in os.walk('C:/Users/sam/Desktop/file1/'))
Note that you can only iterate over configfiles once with this approach. If you need a real list of config files that can be used in multiple operations, you have to create it explicitly with list(configfiles).
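The single-pass behaviour can be seen in a short sketch. The directory here is a hypothetical temporary tree with two .txt files, used in place of C:/Users/sam/Desktop/file1/, and the variable names first_pass and second_pass are illustrative.

```python
import glob
import itertools
import os
import tempfile

# Hypothetical tree with two .txt files, created for illustration only.
top = tempfile.mkdtemp()
os.makedirs(os.path.join(top, "sub"))
for rel in ("a.txt", os.path.join("sub", "b.txt")):
    with open(os.path.join(top, rel), "w") as fh:
        fh.write("example\n")

# Lazily chain one iglob() per directory visited by os.walk().
configfiles = itertools.chain.from_iterable(
    glob.iglob(os.path.join(root, '*.txt'))
    for root, dirs, files in os.walk(top))

first_pass = list(configfiles)   # consumes the iterator
second_pass = list(configfiles)  # the iterator is already exhausted

print(len(first_pass))   # 2
print(second_pass)       # []
```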