Using AWK, SED, and REGEX to rename files

I'm only learning to use REGEX, AWK and SED. I currently have a group of files that I'd like to rename - they all sit in one directory.

The naming pattern is consistent, but I would like to rearrange the filenames. Here is the format:

01._HORRIBLE_HISTORIES_S2.mp4
02._HORRIBLE_HISTORIES_S2.mp4

I'd like to rename them to HORRIBLE_HISTORIES_s02e01.mp4, where the e01 is taken from the first column. I know that I want to grab "01" from the first column, store it in a variable, and paste it after the S2 in each filename. At the same time I want to remove it from the beginning of the filename along with the "._". Additionally, I want to change the "S2" to "s02".

If anyone would be so kind, could you help me write something using awk/sed and explain the procedure, that I might learn from it?

5 solutions

#1

# %02d is used for the padding: the 0 flag is only well defined for numeric formats.
for f in *.mp4; do 
  echo mv "$f" \
    "$(awk -F '[._]' '{ si = sprintf("%02d", substr($5,2)); 
                          print $3 "_" $4 "_s" si "e" $1 "." $6 }' <<<"$f")"
done

Loops over all *.mp4 files.
Renames each to the result of the awk command, provided via command substitution ($(...)).
The awk command splits the input filename into tokens by "." or "_" (which makes the first token available as $1, the second as $2, ...).
First, the number in "_S{number}" is left-padded to 2 digits with a 0 (i.e., a 0 is only prepended if the number doesn't already have 2 digits) and stored in variable si (season index); if it's OK to always prepend 0, the awk "program" can be simplified to: { print $3 "_" $4 "_s0" substr($5,2) "e" $1 "." $6 }
The result, along with the remaining tokens, is then rearranged to form the desired filename.

Note the echo before mv to allow you to safely preview the resulting command - remove it to perform actual renaming.
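
To see which token lands in which field, it can help to print them. The helper below uses a hypothetical name (show_fields) and is not part of the answer; it applies the same -F '[._]' separator to a single name. Note the empty second field, which comes from the adjacent "." and "_" in "01._":

```shell
# Hypothetical helper: print each awk field of one filename, one per line,
# using the same field separator as the loop above.
show_fields() {
  printf '%s\n' "$1" |
    awk -F '[._]' '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }'
}

show_fields "01._HORRIBLE_HISTORIES_S2.mp4"
# $1=[01]
# $2=[]
# $3=[HORRIBLE]
# $4=[HISTORIES]
# $5=[S2]
# $6=[mp4]
```

Because the title itself contains an underscore, it is spread over $3 and $4; a title with a different number of words would need different field numbers.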

Alternative: a pure bash solution using a regular expression:

for f in *.mp4; do 
  # Skip any file that does not match the expected pattern.
  [[ $f =~ ^([0-9]+)\._([^.]+)_S([^.]+)\.(.+)$ ]] || continue
  echo mv "$f" \
"${BASH_REMATCH[2]}_s0${BASH_REMATCH[3]}e${BASH_REMATCH[1]}.${BASH_REMATCH[4]}"
done

Uses bash's regular-expression matching operator, =~, with capture groups (the substrings in (...)) to match against each filename and extract substrings of interest.
The matching results are stored in the special array variable BASH_REMATCH, with element 0 containing the entire match, 1 containing what matches the first capture group, 2 the second, and so on.
The mv command's target argument then assembles the capture-group matches in the desired order; note that in this case, for simplicity, I've made the zero-padding of s{number} unconditional: a 0 is simply prepended.

As above, you need to remove echo before mv to perform actual renaming.
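
If the zero-padding should be conditional here as well, one option is bash's printf with %02d. The following is a minimal sketch, not the answer's own code: the function name new_name is hypothetical, and the season group is narrowed to digits ([0-9]+) so that it can be treated as a number.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the new name for one file, padding the season
# number to two digits only when it is shorter than that.
new_name() {
  local f=$1 season
  # Assumption: the season is numeric, so the third group is [0-9]+ here.
  [[ $f =~ ^([0-9]+)\._([^.]+)_S([0-9]+)\.(.+)$ ]] || return 1
  # 10# forces base 10 so that a value such as "08" is not read as octal.
  printf -v season '%02d' "$((10#${BASH_REMATCH[3]}))"
  printf '%s_s%se%s.%s\n' \
    "${BASH_REMATCH[2]}" "$season" "${BASH_REMATCH[1]}" "${BASH_REMATCH[4]}"
}

new_name "01._HORRIBLE_HISTORIES_S2.mp4"    # HORRIBLE_HISTORIES_s02e01.mp4
new_name "03._HORRIBLE_HISTORIES_S12.mp4"   # HORRIBLE_HISTORIES_s12e03.mp4
```

The function returns a non-zero status for names that do not match, so a caller can skip them instead of building a malformed target name.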

#2

A common way of renaming multiple files according to a pattern is to use the Perl command rename. It uses Perl regular expressions and is very powerful. Use -n -v to test the pattern without touching the files:

$ rename -n -v 's/^(\d+)\._(.+)_S2\.mp4/$2_s02e$1.mp4/' *.mp4
01._HORRIBLE_HISTORIES_S2.mp4 renamed as HORRIBLE_HISTORIES_s02e01.mp4
02._HORRIBLE_HISTORIES_S2.mp4 renamed as HORRIBLE_HISTORIES_s02e02.mp4

Use parentheses to capture strings into variables $1 (first capture), $2 (second capture) etc:

^(\d+) capture numbers at beginning of filename (into $1)
\._(.+)_S2\.mp4 capture everything between ._ and _S2.mp4 (into $2); the dots are escaped so that they match a literal "." rather than any character
$2_s02e$1.mp4 assemble your new filename with the captured data as you want it

When you are happy with the result, remove -n from the command and it will rename all the files for real.

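The Perl-based rename is available on many Linux systems, though the package name varies by distribution (for example rename, prename, or perl-rename). Note that the rename shipped with util-linux is a different tool with a different syntax and does not accept Perl expressions. There is a similar discussion here on SO with more details about finding/installing the right command.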
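
If the Perl rename is not installed, the same capture-and-reassemble substitution can be sketched with sed, which the question asked about. This is an illustration rather than part of the answer above: sed_name is a hypothetical helper name, and the -E option (extended regular expressions) is assumed to be supported, as it is in GNU and BSD sed. sed refers to captures as \1, \2, ... instead of $1, $2, ...

```shell
# Hypothetical helper: compute the new name for one file with sed.
# Groups: \1 = episode digits, \2 = title, \3 = season digits.
sed_name() {
  printf '%s\n' "$1" |
    sed -E 's/^([0-9]+)\._(.+)_S([0-9]+)\.mp4$/\2_s0\3e\1.mp4/'
}

for f in *.mp4; do
  [ -e "$f" ] || continue              # skip the literal pattern if nothing matches
  echo mv -- "$f" "$(sed_name "$f")"   # remove echo to rename for real
done
```

A name that does not match the pattern passes through sed unchanged, so the preview would show the same name twice for such a file.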

#3

You can do it with almost pure bash (using substring expansion):

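# Offsets assume the fixed layout NN._HORRIBLE_HISTORIES_S2.mp4
for f in *.mp4; do
  newfilename="${f:4:18}_s02e${f:0:2}.mp4"   # 18-character title, 2-digit episode
  echo mv "$f" "$newfilename"
done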

If the output of this command suits your needs, you may remove the echo from the loop or, more simply (if your last command was the loop above), run: !! | bash
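
Fixed offsets only work when every title has the same length. As a minimal sketch of an alternative that stays within parameter expansion, the name can be split on its delimiters instead. The helper name pe_name and its variable names are hypothetical, and the zero in front of the season is prepended unconditionally:

```shell
# Hypothetical helper: split one filename with parameter expansion only,
# without relying on fixed character offsets.
pe_name() {
  f=$1
  episode=${f%%.*}      # text before the first "."   -> 01
  rest=${f#*._}         # drop the leading "01._"     -> HORRIBLE_HISTORIES_S2.mp4
  ext=${rest##*.}       # text after the last "."     -> mp4
  base=${rest%.*}       # drop the extension          -> HORRIBLE_HISTORIES_S2
  title=${base%_S*}     # drop the trailing "_S2"     -> HORRIBLE_HISTORIES
  season=${base##*_S}   # text after the last "_S"    -> 2
  printf '%s_s0%se%s.%s\n' "$title" "$season" "$episode" "$ext"
}

pe_name "01._HORRIBLE_HISTORIES_S2.mp4"   # HORRIBLE_HISTORIES_s02e01.mp4
```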

#4

Write the filenames to a text file (one per line), then use a loop and awk to rename each file.

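while IFS= read -r oldname; do
  # First awk: 01._TITLE_S2.mp4 -> TITLE_S2_e01.mp4
  # Second awk: HORRIBLE_HISTORIES_S2_e01.mp4 -> HORRIBLE_HISTORIES_s02e01.mp4
  newname=$(awk -F'.' '{ print substr($2, 2) "_e" $1 "." $3 }' <<< "$oldname" |
        awk -F'_' '{ print $1 "_" $2 "_s0" substr($3, 2) $4 }')
  mv -- "$oldname" "$newname"
done < input.txt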
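
The answer does not show how input.txt is created. One possible way, assuming that no filename contains a newline, is to print the matching names one per line; list_mp4 is a hypothetical helper name:

```shell
# Hypothetical helper: print every .mp4 name in the current directory,
# one per line. The -e test skips the unexpanded pattern when nothing matches.
list_mp4() {
  for f in *.mp4; do
    if [ -e "$f" ]; then
      printf '%s\n' "$f"
    fi
  done
}

list_mp4 > input.txt
```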

#5

If you're willing to use gawk, the regex matching really comes in handy. I find this pipe-based solution a little nicer than worrying about looping constructs.

ls -1 | \
    gawk 'match($0, /.../, a) { printf ... | "sh" } \
    END { close("sh") }'

For ease of reading I've replaced the regex and the mv command with ellipses.

Line 1 lists all the file names in the current directory, one per line, and pipes them to the gawk command.
Line 2 runs the regex match, assigning captured groups to the array variable a. The action converts this into our desired command with printf, whose output is itself piped to sh to execute.
Line 3 closes the shell that was implicitly opened when we started piping things to it.

So then you just fill in your regex and command syntax (borrowing from mklement0). For example (LIVE CODE WARNING):

ls -1 | \
    gawk 'match($0, /^([0-9]+)\._([^.]+)_S([^.]+)\.(.+)$/, a) { printf "mv %s %s_s0%se%s.%s\n",a[0],a[2],a[3],a[1],a[4] | "sh" } \
    END { close("sh") }'

To preview that command (as you should) you can simply remove the | "sh" from the second line.
