Recursively replace multiple strings in all files in a directory using Perl

Time: 2021-02-16 21:45:05

I'm new to Perl. I've seen many samples but had problems composing a solution. I have a list of strings, each of which should be replaced with a different string: a->a2, b->b34, etc. The list of replacements is in a CSV file. I need to perform this replacement recursively on all files in a directory. It could be any other language; I just thought Perl would be the quickest.

1 solution

#1

Your problem can be split into three steps:

  1. Getting the search-and-replace strings from the CSV file,
  2. Getting a list of all text files inside a given directory, including subdirectories, and
  3. Replacing all occurrences of the search strings with their replacements.

So let's do a countdown and see how we can do that :)

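```perl
#!/usr/bin/perl
use strict; use warnings;

# the "outside hash" used by searchAndReplace: search string => replacement
my %replacements;
```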

3. Search and replace

We will define a sub searchAndReplace. It takes a file name as argument and accesses an outside hash. We will call this hash %replacements. Each key is a string we want to replace, and the value is the replacement. This "imposes" the restriction that there can only be one replacement per search string, but that should seem natural. I will further assume that each file is reasonably small (i.e. fits into RAM).
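
For illustration, here is how the hash might look for the two pairs from the question (a->a2, b->b34), applied with the same loop shape to a made-up in-memory string instead of a file. This is a minimal sketch; the sample text is an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# search string => replacement (the example pairs from the question)
my %replacements = (
  'a' => 'a2',
  'b' => 'b34',
);

# made-up sample text standing in for a file's contents
my $content = "a cab";

# one s///g per pair; \Q...\E makes each key match literally
while (my ($string, $replacement) = each %replacements) {
  $content =~ s/\Q$string\E/$replacement/g;
}

print "$content\n";  # prints "a2 ca2b34"
```

Here the order in which the pairs are applied does not matter, because neither replacement contains another search string.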

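```perl
sub searchAndReplace {
  my ($filename) = @_;
  # read the whole file into a single string
  my $content = do {
    open my $file, "<", $filename or die "Can't open $filename: $!";
    local $/ = undef; # set slurp mode
    <$file>;
  };
  # apply each search => replacement pair; \Q...\E makes the search string literal
  while (my ($string, $replacement) = each %replacements) {
    $content =~ s/\Q$string\E/$replacement/g;
  }
  # overwrite the original file with the new content
  open my $file, ">", $filename or die "Can't open $filename: $!";
  print $file $content; # no comma after the filehandle: that is intentional
  close $file or die "Can't close $filename: $!";
}
```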

This code is pretty straightforward: I escape $string inside the regex with \Q...\E so that its contents aren't treated as a pattern. This implementation has a side effect: a later pair can match text that an earlier replacement produced. For example, with a->b and b->c an original "a" may end up as "c". Because Perl's hash order is unspecified, whether that happens can even differ between runs. One could work around that if it is absolutely necessary.
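
One possible workaround (a sketch, not the answer's own code) is to build a single regex alternation from all keys, sorted longest first so that a longer key wins over a shorter one at the same position, and substitute in one pass with a hash lookup. Each piece of the original text is then replaced at most once, and the result no longer depends on hash order. The helper name replace_all_once is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# hypothetical helper: replace every key of %$map in $text in a single pass
sub replace_all_once {
  my ($text, $map) = @_;
  # skip empty keys (they would match everywhere); longest keys first,
  # so that "ab" is tried before "a" at the same position
  my @keys = sort { length($b) <=> length($a) || $a cmp $b }
             grep { length } keys %$map;
  return $text unless @keys;   # no usable pairs: nothing to do
  my $alternation = join '|', map { quotemeta($_) } @keys;
  # each match is looked up once; inserted text is never scanned again
  $text =~ s/($alternation)/$map->{$1}/g;
  return $text;
}

my %replacements = ('a' => 'b', 'b' => 'c');
print replace_all_once("ab", \%replacements), "\n";  # prints "bc"
```

With the per-pair loop, "ab" could come out as either "cc" or "bc" depending on hash order; the single pass always gives "bc". Inside searchAndReplace, the while loop could be swapped for `$content = replace_all_once($content, \%replacements);`.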

2. Traversing the file tree

We will define a sub called anakinFileWalker. It takes a file or directory name and a code reference (here, \&searchAndReplace) as arguments. If the name refers to a plain file, it calls the action on it; if it is a directory, it opens the directory and calls itself on each entry.

```perl
sub anakinFileWalker {
  my ($filename, $action) = @_;
  if (-d $filename) {
    opendir my $dir, $filename or die "Can't open $filename: $!";
    while (defined(my $entry = readdir $dir)) {
      next if $entry eq '.' or $entry eq '..';
      # come to the dark side of recursion
      anakinFileWalker("$filename/$entry", $action); # be sure to give full path
    }
    closedir $dir;
  } elsif (-f $filename) {
    # Houston, we have a plain file (sockets, devices etc. are skipped):
    $action->($filename);
  }
}
```

Of course, this sub gets caught in endless recursion if symlinks form a loop, because -d follows symbolic links.
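
If looping symlinks are a concern, the core module File::Find is one alternative to the hand-written walker (a swapped-in technique, not part of the answer): by default it does not follow symbolic links, so a link cycle cannot cause endless recursion. A minimal sketch; walk_files is a hypothetical name, and $action is any code reference such as \&searchAndReplace.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# hypothetical wrapper: call $action on every plain file below $top
sub walk_files {
  my ($top, $action) = @_;
  find({
    no_chdir => 1,   # stay in the current directory, so $File::Find::name is a usable path
    wanted   => sub {
      my $path = $File::Find::name;
      # plain files only; symlinks are skipped so nothing outside the tree is touched
      $action->($path) if -f $path && !-l $path;
    },
  }, $top);
}
```

Usage would then be `walk_files($topDirectory, \&searchAndReplace);`, mirroring the call to anakinFileWalker.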

1. Setting up the %replacements

There is a nice module, Text::CSV, that will help you parse the file. Just make sure that %replacements matches the definition above (search string as the key, replacement as the value), which isn't hard.
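
A sketch of that step, assuming the CSV has two columns per row (search string, then replacement) and no header row; load_replacements is a hypothetical name. It uses Text::CSV's getline, so quoted fields that contain commas are handled.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

# hypothetical helper: read "search,replacement" rows into a hash reference
sub load_replacements {
  my ($csv_path) = @_;
  my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });
  open my $fh, "<", $csv_path or die "Can't open $csv_path: $!";
  my %map;
  while (my $row = $csv->getline($fh)) {
    my ($search, $replacement) = @$row;
    next unless defined $search && length $search;  # skip blank lines / empty keys
    $replacement //= '';                             # missing second column: delete matches
    # one replacement per search string: the last row wins
    warn "duplicate search string '$search'\n" if exists $map{$search};
    $map{$search} = $replacement;
  }
  close $fh;
  return \%map;
}
```

With %replacements as the outside hash from step 3, it could be filled with `%replacements = %{ load_replacements('replacements.csv') };`, where the file name is a placeholder.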

Starting it all

When %replacements is ready, we just call

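```perl
# assumption: the top directory is passed as the first command-line argument
my $topDirectory = shift @ARGV or die "usage: $0 DIRECTORY\n";
anakinFileWalker($topDirectory, \&searchAndReplace);
```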

and it should work. If not, this should at least have given you an idea of how to solve such a problem.
