Is it possible to speed up a recursive file scan in PHP?

Time: 2021-06-21 23:39:54

I've been trying to replicate Gnu Find ("find .") in PHP, but it seems impossible to get even close to its speed. The PHP implementations use at least twice the time of Find. Are there faster ways of doing this with PHP?

EDIT: I added a code example using the SPL implementation -- its performance is equal to the iterative approach

EDIT2: When calling find from PHP it was actually slower than the native PHP implementation. I guess I should be satisfied with what I've got :)

// measured at 317% of GNU find's running time when run directly from a shell
function list_recursive($dir) { 
  if ($dh = opendir($dir)) {
    while (false !== ($entry = readdir($dh))) {
      if ($entry == '.' || $entry == '..') continue;

      $path = "$dir/$entry";
      echo "$path\n";
      if (is_dir($path)) list_recursive($path);       
    }
    closedir($dh);  // close the handle opened above (was the undefined $d)
  }
}

// measured at 315% of GNU find's running time when run directly from a shell
function list_iterative($from) {
  $dirs = array($from);  
  while (NULL !== ($dir = array_pop($dirs))) {  
    if ($dh = opendir($dir)) {    
      while (false !== ($entry = readdir($dh))) {      
        if ($entry == '.' || $entry == '..') continue;        

        $path = "$dir/$entry";        
        echo "$path\n";        
        if (is_dir($path)) $dirs[] = $path;        
      }      
      closedir($dh);      
    }    
  }  
}

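// measured at 315% of GNU find's running time when run directly from a shell
function list_recursivedirectoryiterator($path) {
  // RecursiveDirectoryIterator only descends when wrapped in RecursiveIteratorIterator;
  // SKIP_DOTS drops "." and "..", SELF_FIRST lists a directory before its contents
  $it = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($path, FilesystemIterator::SKIP_DOTS),
    RecursiveIteratorIterator::SELF_FIRST
  );
  foreach ($it as $file) {
    echo $file->getPathname(), "\n";
  }
}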

// measured at 390% of GNU find's running time when run directly from a shell
function list_gnufind($dir) { 
  // escapeshellarg() quotes the path so it reaches find as a single argument
  $dir = escapeshellarg($dir);
  $h = popen("/usr/bin/find $dir", "r");
  while ('' != ($s = fread($h, 2048))) {
    echo $s;
  }
  pclose($h);
}

7 solutions

#1


PHP just cannot perform as fast as C, plain and simple.

#2


I'm not sure if the performance is better, but you could use a recursive directory iterator to make your code simpler... See RecursiveDirectoryIterator and SplFileInfo.

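// Wrap in RecursiveIteratorIterator so the loop actually descends into subdirectories
$it = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($from, FilesystemIterator::SKIP_DOTS),
    RecursiveIteratorIterator::SELF_FIRST
);
foreach ($it as $file)
{
    // SKIP_DOTS already filters out "." and ".."
    echo $file->getPathname(), "\n";
}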

#3


Before you start changing anything, profile your code.

Use something like Xdebug (plus kcachegrind for a pretty graph) to find out where the slow parts are. If you start changing things blindly, you won't get anywhere.

My only other advice is to use the SPL directory iterators as posted already. Letting the internal C code do the work is almost always faster.
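
Short of a full Xdebug profile, a coarse wall-clock comparison of the candidate functions is easy to sketch. The helper below is hypothetical (the name `time_listing` is not from the question) and assumes PHP 5.4 or later. It captures the listing in an output buffer so that terminal I/O does not distort the timing.

```php
<?php
// Hypothetical helper: wall-clock seconds for one call of $fn($dir).
function time_listing(callable $fn, $dir) {
    ob_start();                          // capture the listing so terminal output is not timed
    $start = microtime(true);
    $fn($dir);
    $elapsed = microtime(true) - $start;
    ob_end_clean();                      // discard the captured listing
    return $elapsed;
}

// Usage, assuming the question's functions are defined:
// foreach (array('list_recursive', 'list_iterative') as $fn) {
//     printf("%-16s %.4f s\n", $fn, time_listing($fn, '/some/dir'));
// }
```

Run each candidate several times and ignore the first run, since the operating system caches directory data after the first pass.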

#4


Why would you expect the interpreted PHP code to be as fast as the compiled C version of find? Being only twice as slow is actually pretty good.

About the only advice I would add is to call ob_start() at the beginning, then echo ob_get_contents() and call ob_end_clean() at the end (ob_end_flush() does both in one step). Buffering the output this way might speed things up.
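
A minimal sketch of that buffering idea, assuming any of the listing functions from the question is passed in as a callable. The wrapper name `list_buffered` is made up for illustration.

```php
<?php
// Hypothetical wrapper: run $lister($dir) with its echo output buffered,
// then send the whole listing in one go.
function list_buffered(callable $lister, $dir) {
    ob_start();          // start collecting echo output in memory
    $lister($dir);
    ob_end_flush();      // emit the buffered listing and stop buffering
}

// Usage, assuming list_iterative() from the question is defined:
// list_buffered('list_iterative', '/some/dir');
```

The whole listing sits in memory until the flush, so for very large trees this trades memory for fewer writes. Whether it helps at all depends on how the output is consumed.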

#5


You're keeping N directory streams open, where N is the depth of the directory tree. Instead, try reading an entire directory's worth of entries at once, and then iterate over the entries. At the very least you'll maximize use of the disk I/O caches.
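
One way to sketch this is to combine scandir() with an explicit stack of pending directories. scandir() opens, reads, and closes a directory in a single call, so no handle stays open while deeper levels are processed. The function name `list_scandir_iterative` is hypothetical, and whether this is actually faster has not been measured here.

```php
<?php
// Sketch: walk the tree with an explicit stack of pending directories.
function list_scandir_iterative($from) {
    $dirs = array($from);
    while (null !== ($dir = array_pop($dirs))) {
        $entries = @scandir($dir);           // whole directory in one call
        if ($entries === false) continue;    // unreadable directory: skip it
        foreach ($entries as $entry) {
            if ($entry === '.' || $entry === '..') continue;
            $path = "$dir/$entry";
            echo "$path\n";
            if (is_dir($path)) $dirs[] = $path;   // visit later, no handle kept open
        }
    }
}
```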

#6


You might want to seriously consider just using GNU find. If it's available, and safe mode isn't turned on, you'll probably like the results just fine:

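function list_recursive($dir) { 
  // escapeshellarg() quotes the path so it reaches find as a single argument
  $dir = escapeshellarg($dir);
  $h = popen("/usr/bin/find $dir -type f", "r");
  // fgets() returns false at end of output
  while (false !== ($s = fgets($h, 1024))) { 
    echo $s;
  }
  pclose($h);
}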

However, there might be some directory that's so big that you're not going to want to bother with this either. Consider amortizing the slowness in other ways. Your second try (the iterative version) can be checkpointed, for example, by simply saving the directory stack in the session. If you're giving the user a list of files, simply collect a pageful, then save the rest of the state in the session for page 2.
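
The checkpointing idea could look roughly like the sketch below. The function name `list_page` and the session key are assumptions made for illustration. For simplicity it always finishes the directory it has started, so a page may hold somewhat more than `$pageSize` files.

```php
<?php
// Sketch: collect about $pageSize files, then hand back the pending
// directory stack so a later request can resume where this one stopped.
function list_page(array $dirs, $pageSize) {
    $files = array();
    while (count($files) < $pageSize && null !== ($dir = array_pop($dirs))) {
        $entries = @scandir($dir);
        if ($entries === false) continue;    // unreadable directory: skip it
        foreach ($entries as $entry) {
            if ($entry === '.' || $entry === '..') continue;
            $path = "$dir/$entry";
            if (is_dir($path)) $dirs[] = $path;   // still to be scanned
            else $files[] = $path;
        }
    }
    return array($files, $dirs);   // page contents, remaining state
}

// Usage with a session (hypothetical key name 'scan_stack'):
// session_start();
// $stack = isset($_SESSION['scan_stack']) ? $_SESSION['scan_stack'] : array('/some/dir');
// list($files, $_SESSION['scan_stack']) = list_page($stack, 50);
```

An empty remaining stack means the scan is complete.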

#7


Try using scandir() to read a whole directory at once, as Jason Cohen has suggested. I've based the following code on code from the PHP manual comments for scandir():

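 function scan( $dir ){
        // scandir() reads the whole directory in one call; drop "." and ".."
        $entries = array_diff( scandir( $dir ), array( ".", ".." ));
        foreach( $entries as $entry ) {
            $path = $dir."/".$entry;
            if ( is_dir( $path ) )
                scan( $path );        // recurse; directories themselves are not printed
            else
                print $path."\n";
        }
 }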
