Renaming files with AWK, SED, and REGEX

Time: 2021-09-09 01:08:21

I'm only learning to use REGEX, AWK and SED. I currently have a group of files that I'd like to rename - they all sit in one directory.

The naming pattern is consistent, but I would like to rearrange the filenames. Here is the format:

01._HORRIBLE_HISTORIES_S2.mp4
02._HORRIBLE_HISTORIES_S2.mp4

I'd like to rename them to HORRIBLE_HISTORIES_s02e01.mp4, where the e01 is taken from the first column. I know that I want to grab "01" from the first column, store it in a variable, and then paste it after the S2 in each filename. At the same time I want to remove it from the beginning of the filename along with the "._", and I also want to change the "S2" to "s02".

If anyone would be so kind, could you help me write something using awk/sed and explain the procedure, that I might learn from it?

5 answers

#1


7  

for f in *.mp4; do
  echo mv "$f" \
    "$(awk -F '[._]' '{ # %02d pads the season number portably; zero-padding with %s is not portable
                          si = sprintf("%02d", substr($5,2));
                          print $3 "_" $4 "_s" si "e" $1 "." $6 }' <<<"$f")"
done
  • Loops over all *.mp4 files.
  • Renames each to the result of the awk command, provided via command substitution ($(...)).
  • The awk command splits the input filename into tokens on "." or "_" (which makes the first token available as $1, the second as $2, ...). Because "._" is two adjacent separators, $2 is empty.
  • First, the number in "_S{number}" is left-padded to 2 digits with a 0 (i.e., a 0 is only prepended if the number doesn't already have 2 digits) and stored in variable si (season index); if it's OK to always prepend 0, the awk "program" can be simplified to: { print $3 "_" $4 "_s0" substr($5,2) "e" $1 "." $6 }
  • The result, along with the remaining tokens, is then rearranged to form the desired filename.
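
To make the field numbering concrete, here is a small sketch that prints every field awk sees for one of the sample filenames. The helper name show_fields is made up for illustration; it only reads a string and touches no files.

```shell
# Print each field awk produces when splitting on "." or "_".
show_fields() {
  printf '%s\n' "$1" |
    awk -F '[._]' '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }'
}

show_fields '01._HORRIBLE_HISTORIES_S2.mp4'
# $1=[01]
# $2=[]
# $3=[HORRIBLE]
# $4=[HISTORIES]
# $5=[S2]
# $6=[mp4]
```

Note that the field numbers depend on the title having exactly two words. A title with more or fewer underscores shifts $5 and $6, so the awk program above would need adjusting.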

Note the echo before mv to allow you to safely preview the resulting command - remove it to perform actual renaming.

Alternative: a pure bash solution using a regular expression:

for f in *.mp4; do
  # Skip any file that does not follow the expected naming pattern.
  [[ $f =~ ^([0-9]+)\._([^.]+)_S([^.]+)\.(.+)$ ]] || continue
  echo mv "$f" \
    "${BASH_REMATCH[2]}_s0${BASH_REMATCH[3]}e${BASH_REMATCH[1]}.${BASH_REMATCH[4]}"
done
  • Uses bash's regular-expression matching operator, =~, with capture groups (the substrings in (...)) to match against each filename and extract substrings of interest.
  • The matching results are stored in the special array variable $BASH_REMATCH, with element 0 containing the entire match, 1 containing what matches the first capture group, 2 the second, and so on.
  • The mv command's target argument then assembles the capture-group matches in the desired order; note that in this case, for simplicity, I've made the zero-padding of s{number} unconditional: a 0 is simply prepended.
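
If the zero-padding should be conditional here too, one possible sketch is to pad with printf instead of prepending a literal 0. The function name rematch_new_name is hypothetical, and the pattern assumes the season is purely numeric, which is slightly stricter than the regex above.

```shell
# Requires bash: uses [[ =~ ]], BASH_REMATCH and printf -v.
rematch_new_name() {
  local f=$1 season episode
  # Groups: 1 = episode, 2 = title, 3 = season number, 4 = extension.
  [[ $f =~ ^([0-9]+)\._([^.]+)_S([0-9]+)\.(.+)$ ]] || return 1
  # 10# forces base 10, so values such as "08" are not parsed as octal.
  printf -v season '%02d' "$((10#${BASH_REMATCH[3]}))"
  printf -v episode '%02d' "$((10#${BASH_REMATCH[1]}))"
  printf '%s_s%se%s.%s\n' "${BASH_REMATCH[2]}" "$season" "$episode" "${BASH_REMATCH[4]}"
}

rematch_new_name '01._HORRIBLE_HISTORIES_S2.mp4'   # HORRIBLE_HISTORIES_s02e01.mp4
rematch_new_name '03._SHOW_S12.mkv'                # SHOW_s12e03.mkv
```

The function returns a non-zero status for names that do not match, so a calling loop can skip them with something like `new=$(rematch_new_name "$f") || continue`.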

As above, you need to remove echo before mv to perform actual renaming.

#2


8  

A common way of renaming multiple files according to a pattern is to use the Perl command rename. It uses Perl regular expressions and is very powerful. Use -n -v to test the pattern without touching the files:

$ rename -n -v 's/^(\d+)\._(.+)_S2\.mp4$/$2_s02e$1.mp4/' *.mp4
01._HORRIBLE_HISTORIES_S2.mp4 renamed as HORRIBLE_HISTORIES_s02e01.mp4
02._HORRIBLE_HISTORIES_S2.mp4 renamed as HORRIBLE_HISTORIES_s02e02.mp4

Use parentheses to capture strings into variables $1 (first capture), $2 (second capture), etc.:

  • ^(\d+) captures the digits at the beginning of the filename (into $1)
  • \._(.+)_S2\.mp4$ captures everything between ._ and _S2.mp4 (into $2)
  • $2_s02e$1.mp4 assembles the new filename from the captured data
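
The question also asked about sed, which none of the answers use. As a minimal sketch, assuming a sed that supports -E for extended regular expressions (GNU and BSD sed both do), the same capture-and-rearrange idea looks like this. The function name sed_new_name is made up for illustration, and the season is assumed to be a single digit.

```shell
# Compute the new name for one filename; names that do not match pass through unchanged.
sed_new_name() {
  printf '%s\n' "$1" |
    sed -E 's/^([0-9]+)\._(.+)_S([0-9])\.([^.]+)$/\2_s0\3e\1.\4/'
}

# Preview only: echo prints the mv commands instead of running them.
for f in 01._HORRIBLE_HISTORIES_S2.mp4 02._HORRIBLE_HISTORIES_S2.mp4; do
  echo mv "$f" "$(sed_new_name "$f")"
done
```

In sed the captures are written \1, \2, ... in the replacement rather than $1, $2, and sed only computes the new name: the shell loop still has to call mv.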

When you are happy with the result, remove -n from the command and it will rename all the files for real.

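Note that two different commands go by the name rename on Linux. The Perl-based one used here is packaged as rename on Debian and Ubuntu and as perl-rename or prename on some other distributions; the rename that ships with util-linux takes a plain from/to string pair and does not accept Perl expressions. There is a similar discussion here on SO with more details about finding/installing the right command.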

#3


1  

You can do it in almost pure bash (with substring expansion):

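for f in *.mp4; do
  # Offsets are zero-based: characters 0-1 are the episode number,
  # and the 18-character title starts at offset 4 (after "NN._").
  newfilename="${f:4:18}_s02e${f:0:2}.mp4"
  echo mv "$f" "$newfilename"
done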

If the output of this command suits your needs, you can remove the echo from the loop or, more simply (if the loop above was your last command), run: !! | bash
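
Fixed offsets and lengths only work while every title has the same length. As a sketch of a more tolerant variant, the pattern-removal expansions (%, %%, #, ##) can cut the name at its separators instead. The function name expand_new_name and its variable names are assumptions for illustration; the variables are deliberately not declared local so that the sketch also runs in a plain POSIX sh.

```shell
expand_new_name() {
  f=$1
  episode=${f%%._*}     # text before the first "._"      -> 01
  rest=${f#*._}         # text after the first "._"       -> HORRIBLE_HISTORIES_S2.mp4
  ext=${rest##*.}       # text after the last "."         -> mp4
  base=${rest%.*}       # drop the extension              -> HORRIBLE_HISTORIES_S2
  title=${base%_S*}     # drop the trailing "_S<number>"  -> HORRIBLE_HISTORIES
  season=${base##*_S}   # text after the last "_S"        -> 2
  # The 0 before the season is unconditional, as in the answer's loop.
  printf '%s_s0%se%s.%s\n' "$title" "$season" "$episode" "$ext"
}

expand_new_name '01._HORRIBLE_HISTORIES_S2.mp4'   # HORRIBLE_HISTORIES_s02e01.mp4
```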

#4


0  

Put the filenames into a text file (one per line), then use a loop and awk to rename each file.

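while IFS= read -r oldname; do
  # First awk: drop the leading "NN._" and append "_eNN" before the extension.
  # Second awk: keep both title words, turn "S2" into "s02" and join it to "eNN".
  newname=$(awk -F'.' '{ print substr($2, 2) "_e" $1 "." $3 }' <<< "$oldname" | \
        awk -F'_' '{ print $1 "_" $2 "_s0" substr($3, 2) $4 }')
  mv -- "$oldname" "$newname"
done < input.txt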
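
The answer does not show how input.txt is produced. One way, sketched here under the assumption that the file should simply list every .mp4 name in the directory, is to let the shell expand the glob and print one name per line. The temporary directory and sample files exist only to make the sketch safe to run.

```shell
# Work in a throwaway directory so the sketch cannot touch real files.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch 01._HORRIBLE_HISTORIES_S2.mp4 02._HORRIBLE_HISTORIES_S2.mp4

# One filename per line; the glob expands in sorted order.
printf '%s\n' *.mp4 > input.txt
cat input.txt
```

A file built this way can be reviewed or edited by hand before the rename loop reads it.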

#5


0  

If you're willing to use gawk, the regex matching really comes in handy. I find this pipe-based solution a little nicer than worrying about looping constructs.

ls -1 | \
    gawk 'match($0, /.../, a) { printf ... | "sh" } \
    END { close("sh") }'

For ease of reading I've replaced the regex and the mv command with ellipses.

  • Line 1 lists all the file names in the current directory, one per line, and pipes that list to the gawk command.
  • Line 2 runs the regex match, assigning captured groups to the array variable a. The action turns these into the desired command with printf, whose output is itself piped to sh for execution.
  • Line 3 closes the shell that was implicitly opened when we started piping things to it.

So then you just fill in your regex and command syntax (borrowing from mklement0). For example (LIVE CODE WARNING):

ls -1 | \
    gawk 'match($0, /^([0-9]+)\._([^.]+)_S([^.]+)\.(.+)$/, a) { printf "mv %s %s_s0%se%s.%s\n",a[0],a[2],a[3],a[1],a[4] | "sh" } \
    END { close("sh") }'

To preview that command (as you should) you can simply remove the | "sh" from the second line.
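
The same generate-then-review pattern can be sketched without gawk. The version below swaps gawk's three-argument match() for the field splitting used in answer #1, so it should run in any POSIX awk, and it wraps both names in single quotes so that spaces survive the trip through sh. The function name gen_mv_commands is hypothetical, and names that themselves contain a single quote are not handled.

```shell
# Reads filenames on stdin, one per line, and prints one mv command per matching name.
gen_mv_commands() {
  awk -F '[._]' -v q="'" '
    # Expect exactly: NN, empty, word, word, S<number>, extension.
    NF == 6 && $2 == "" && $5 ~ /^S[0-9]+$/ {
      new = sprintf("%s_%s_s%02de%s.%s", $3, $4, substr($5, 2), $1, $6)
      print "mv -- " q $0 q " " q new q
    }'
}

printf '%s\n' '01._HORRIBLE_HISTORIES_S2.mp4' 'notes.txt' | gen_mv_commands
# mv -- '01._HORRIBLE_HISTORIES_S2.mp4' 'HORRIBLE_HISTORIES_s02e01.mp4'
```

Once the printed commands look right, appending `| sh` to the pipeline executes them.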
