I have this function that prints the names of all the files in a directory recursively. The problem is that my code is very slow because it has to access a remote network device with every iteration.
My plan is to first load all the files from the directory recursively, and then go through them with the regex to filter out all the files I don't want. Does anyone have a better suggestion?
public static void printFnames(String sDir) {
    File[] faFiles = new File(sDir).listFiles();
    for (File file : faFiles) {
        if (file.getName().matches("^(.*?)")) {
            System.out.println(file.getAbsolutePath());
        }
        if (file.isDirectory()) {
            printFnames(file.getAbsolutePath());
        }
    }
}
This is just a test; later on I'm not going to use the code like this. Instead, I'm going to add the path and modification date of every file which matches an advanced regex to an array.
15 Answers
#1
121
Assuming this is actual production code you'll be writing, then I suggest using the solution to this sort of thing that's already been solved: Apache Commons IO, specifically FileUtils.listFiles(). It handles nested directories and filters (based on name, modification time, etc.).
For example, for your regex:
Collection<File> files = FileUtils.listFiles(
    dir,
    new RegexFileFilter("^(.*?)"),
    DirectoryFileFilter.DIRECTORY
);
This will recursively search for files matching the ^(.*?) regex, returning the results as a collection.
It's worth noting that this will be no faster than rolling your own code; it's doing the same thing, and trawling a filesystem in Java is just slow. The difference is, the Apache Commons version will have no bugs in it.
#2
41
In Java 8, it's a one-liner via Files.find() with an arbitrarily large depth (e.g. 999) and BasicFileAttributes of isRegularFile():
public static void printFnames(String sDir) throws IOException {
    Files.find(Paths.get(sDir), 999, (p, bfa) -> bfa.isRegularFile()).forEach(System.out::println);
}
To add more filtering, enhance the lambda, for example all jpg files modified in the last 24 hours:
(p, bfa) -> bfa.isRegularFile()
        && p.getFileName().toString().matches(".*\\.jpg")
        && bfa.lastModifiedTime().toMillis() > System.currentTimeMillis() - 86400000
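For reference, here is a minimal sketch of that enhanced predicate plugged back into the Files.find() call above; the method name printRecentJpgs is just a placeholder, and 86400000 ms is 24 hours:
public static void printRecentJpgs(String sDir) throws IOException {
    // same Files.find() call as above, with the extended predicate;
    // try-with-resources closes the underlying stream when done
    try (Stream<Path> paths = Files.find(Paths.get(sDir), 999,
            (p, bfa) -> bfa.isRegularFile()
                    && p.getFileName().toString().matches(".*\\.jpg")
                    && bfa.lastModifiedTime().toMillis() > System.currentTimeMillis() - 86400000)) {
        paths.forEach(System.out::println);
    }
}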
#3
22
This is a very simple recursive method to get all files from a given root.
It uses the Java 7 NIO Path class.
private List<String> getFileNames(List<String> fileNames, Path dir) {
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
        for (Path path : stream) {
            if (path.toFile().isDirectory()) {
                getFileNames(fileNames, path);
            } else {
                fileNames.add(path.toAbsolutePath().toString());
                System.out.println(path.getFileName());
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return fileNames;
}
#4
16
With Java 7, a faster way to walk through a directory tree was introduced with the Paths and Files functionality. They're much faster than the "old" File way.
This would be the code to walk through and check path names with a regular expression:
public final void test() throws IOException, InterruptedException {
    final Path rootDir = Paths.get("path to your directory where the walk starts");

    // Walk thru mainDir directory
    Files.walkFileTree(rootDir, new FileVisitor<Path>() {
        // First (minor) speed up. Compile regular expression pattern only one time.
        private Pattern pattern = Pattern.compile("^(.*?)");

        @Override
        public FileVisitResult preVisitDirectory(Path path, BasicFileAttributes atts) throws IOException {
            boolean matches = pattern.matcher(path.toString()).matches();
            // TODO: Put here your business logic when matches equals true/false
            return (matches) ? FileVisitResult.CONTINUE : FileVisitResult.SKIP_SUBTREE;
        }

        @Override
        public FileVisitResult visitFile(Path path, BasicFileAttributes mainAtts) throws IOException {
            boolean matches = pattern.matcher(path.toString()).matches();
            // TODO: Put here your business logic when matches equals true/false
            return FileVisitResult.CONTINUE;
        }

        @Override
        public FileVisitResult postVisitDirectory(Path path, IOException exc) throws IOException {
            return FileVisitResult.CONTINUE;
        }

        @Override
        public FileVisitResult visitFileFailed(Path path, IOException exc) throws IOException {
            exc.printStackTrace();
            // If the root directory has failed it makes no sense to continue
            return path.equals(rootDir) ? FileVisitResult.TERMINATE : FileVisitResult.CONTINUE;
        }
    });
}
#5
12
Java's interface for reading filesystem folder contents is not very performant (as you've discovered). JDK 7 fixes this with a completely new interface for this sort of thing, which should bring native level performance to these sorts of operations.
The core issue is that Java makes a native system call for every single file. On a low latency interface, this is not that big of a deal - but on a network with even moderate latency, it really adds up. If you profile your algorithm above, you'll find that the bulk of the time is spent in the pesky isDirectory() call - that's because you are incurring a round trip for every single call to isDirectory(). Most modern OSes can provide this sort of information when the list of files/folders was originally requested (as opposed to querying each individual file path for its properties).
If you can't wait for JDK7, one strategy for addressing this latency is to go multi-threaded and use an ExecutorService with a maximum # of threads to perform your recursion. It's not great (you have to deal with locking of your output data structures), but it'll be a heck of a lot faster than doing this single threaded.
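As a rough illustration of that strategy (this is a hedged sketch, not the answer's own code), the class below uses a fixed-size ExecutorService, a thread-safe result queue, and a pending-task counter to know when the traversal has finished; the names ParallelLister and MAX_THREADS are placeholders:
import java.io.File;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelLister {
    private static final int MAX_THREADS = 8;
    private final ExecutorService pool = Executors.newFixedThreadPool(MAX_THREADS);
    private final Queue<String> results = new ConcurrentLinkedQueue<>();
    private final AtomicInteger pending = new AtomicInteger();
    private final CountDownLatch done = new CountDownLatch(1);

    public Queue<String> list(File root) throws InterruptedException {
        submit(root);
        done.await();      // block until the last directory task has finished
        pool.shutdown();
        return results;
    }

    private void submit(File dir) {
        pending.incrementAndGet();
        pool.execute(() -> {
            File[] children = dir.listFiles();
            if (children != null) {
                for (File child : children) {
                    if (child.isDirectory()) {
                        submit(child);   // each subdirectory becomes its own task
                    } else {
                        results.add(child.getAbsolutePath());
                    }
                }
            }
            if (pending.decrementAndGet() == 0) {
                done.countDown();        // no outstanding tasks left
            }
        });
    }
}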
In all of your discussions about this sort of thing, I highly recommend that you compare against the best you could do using native code (or even a command line script that does roughly the same thing). Saying that it takes an hour to traverse a network structure doesn't really mean that much. Telling us that you can do it natively in 7 seconds, but it takes an hour in Java, will get people's attention.
#6
12
The fast way to get the content of a directory using Java 7 NIO:
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.FileSystems;
import java.nio.file.Path;
...
Path dir = FileSystems.getDefault().getPath(filePath);
// try-with-resources makes sure the directory stream is closed even if an exception is thrown
try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
    for (Path path : stream) {
        System.out.println(path.getFileName());
    }
}
#7
5
This will work just fine ... and it's recursive:
File root = new File("ROOT PATH");
for (File file : root.listFiles())
{
    getFilesRecursive(file);
}

private static void getFilesRecursive(File pFile)
{
    for (File files : pFile.listFiles())
    {
        if (files.isDirectory())
        {
            getFilesRecursive(files);
        }
        else
        {
            // do your thing
            // you can either save in HashMap and use it as
            // per your requirement
        }
    }
}
#8
3
I personally like this version of FileUtils. Here's an example that finds all mp3s or flacs in a directory or any of its subdirectories:
String[] types = {"mp3", "flac"};
Collection<File> files2 = FileUtils.listFiles(new File("/path/to/your/dir"), types, true);
#9
2
This will work fine
public void displayAll(File path) {
    if (path.isFile()) {
        System.out.println(path.getName());
    } else {
        System.out.println(path.getName());
        File[] files = path.listFiles();
        for (File dirOrFile : files) {
            displayAll(dirOrFile);
        }
    }
}
#10
1
This function will list all the file names and their paths from the given directory and its subdirectories.
public void listFile(String pathname) {
    File f = new File(pathname);
    File[] listfiles = f.listFiles();
    for (int i = 0; i < listfiles.length; i++) {
        if (listfiles[i].isDirectory()) {
            File[] internalFile = listfiles[i].listFiles();
            for (int j = 0; j < internalFile.length; j++) {
                System.out.println(internalFile[j]);
                if (internalFile[j].isDirectory()) {
                    String name = internalFile[j].getAbsolutePath();
                    listFile(name);
                }
            }
        } else {
            System.out.println(listfiles[i]);
        }
    }
}
#11
0
"it feels like it's stupid to access the filesystem and get the contents for every subdirectory instead of getting everything at once."
Your feeling is wrong. That's how filesystems work; there is no faster way (except when you have to do this repeatedly or for different patterns, you can cache all the file paths in memory, but then you have to deal with cache invalidation, i.e. what happens when files are added/removed/renamed while the app runs).
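As a minimal sketch of that caching idea (an illustration under stated assumptions, not a production design: it watches only a single directory, whereas a real version would have to register every subdirectory, and it simply drops the whole cache on any change):
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.stream.Stream;
import static java.nio.file.StandardWatchEventKinds.*;

public class CachedListing {
    private final List<String> cache = new CopyOnWriteArrayList<>();
    private volatile boolean stale = true;

    // Returns the cached paths, rescanning only when the cache has been invalidated.
    public List<String> files(Path root) throws IOException {
        if (stale) {
            cache.clear();
            try (Stream<Path> paths = Files.walk(root)) {
                paths.filter(Files::isRegularFile)
                     .forEach(p -> cache.add(p.toString()));
            }
            stale = false;
        }
        return cache;
    }

    // Run from a background thread: marks the cache stale on any create/delete/modify event.
    public void watch(Path root) throws IOException, InterruptedException {
        WatchService watcher = FileSystems.getDefault().newWatchService();
        root.register(watcher, ENTRY_CREATE, ENTRY_DELETE, ENTRY_MODIFY);
        while (true) {
            WatchKey key = watcher.take();
            if (!key.pollEvents().isEmpty()) {
                stale = true;
            }
            key.reset();
        }
    }
}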
#12
0
Just so you know, isDirectory() is quite a slow method. I'm finding it quite slow in my file browser. I'll be looking into a library to replace it with native code.
#13
0
The most efficient way I found when dealing with millions of folders and files is to capture the directory listing through a DOS command into a file and parse it. Once you have parsed the data, you can do analysis and compute statistics.
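A hedged sketch of that approach, assuming a Windows environment where "dir /s /b" prints one bare absolute path per line (the class name is illustrative, and this reads straight from the process output instead of writing to an intermediate file):
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class DirListingCapture {

    public static List<String> capture(String rootDir) throws IOException, InterruptedException {
        // "dir /s /b" lists the directory recursively, one absolute path per line
        ProcessBuilder pb = new ProcessBuilder("cmd", "/c", "dir", "/s", "/b", rootDir);
        pb.redirectErrorStream(true);
        Process process = pb.start();

        List<String> paths = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                paths.add(line);
            }
        }
        process.waitFor();
        return paths;
    }
}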
#14
0
import java.io.*;

public class MultiFolderReading {

    public void checkNoOfFiles(String filename) throws IOException {
        File dir = new File(filename);
        File[] files = dir.listFiles(); // files array stores the list of files
        for (int i = 0; i < files.length; i++) {
            if (files[i].isFile()) { // check whether files[i] is a file or a directory
                System.out.println("File::" + files[i].getName());
                System.out.println();
            } else if (files[i].isDirectory()) {
                System.out.println("Directory::" + files[i].getName());
                System.out.println();
                checkNoOfFiles(files[i].getAbsolutePath());
            }
        }
    }

    public static void main(String[] args) throws IOException {
        MultiFolderReading mf = new MultiFolderReading();
        String str = "E:\\file";
        mf.checkNoOfFiles(str);
    }
}
#15
0
In Guava you don't have to wait for a Collection to be returned to you, but can actually iterate over the files. It is easy to imagine an IDoSomethingWithThisFile interface in the signature of the below function:
public static void collectFilesInDir(File dir) {
    TreeTraverser<File> traverser = Files.fileTreeTraverser();
    FluentIterable<File> filesInPreOrder = traverser.preOrderTraversal(dir);
    for (File f : filesInPreOrder) {
        System.out.printf("File: %s\n", f.getPath());
    }
}
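That IDoSomethingWithThisFile interface is only imagined, not defined anywhere; a hypothetical version of it, and how the traversal above could accept it, might look like this:
public interface IDoSomethingWithThisFile {
    void doSomething(File f);
}

public static void collectFilesInDir(File dir, IDoSomethingWithThisFile action) {
    TreeTraverser<File> traverser = Files.fileTreeTraverser();
    for (File f : traverser.preOrderTraversal(dir)) {
        action.doSomething(f);   // the caller decides what happens to each file
    }
}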
TreeTraverser also allows you to switch between various traversal styles.