Using grep or sed to delete a regex group from matched lines

Time: 2021-09-08 01:47:46

I have a file with contents like this:

- 2 equal files of size 288903252
- 2 equal files of size 284164096
  "C:\E\100p disk util bak\Softwares\OSs\gparted-live-0.26.1-1-i686.iso"
  "H:\Softwares\Linux\gparted-live-0.26.1-1-i686.iso"
- 2 equal files of size 277436598
- 2 equal files of size 161356649
  "H:\Softwares\Dev Tools\Eclipse\Windows\eclipse-java-luna-SR1a-win32-x86_64.zip"
- 35 equal files of size 97078976
  "C:\Windows\System32\DriverStore\FileRepository\nvacwu.inf_amd64_9934c34dc6ca0c4b\NvCplSetupInt.exe"
  "C:\Windows\System32\DriverStore\FileRepository\nvamwu.inf_amd64_d4715679184092a8\NvCplSetupInt.exe"
  "C:\Windows\System32\DriverStore\FileRepository\nvaowu.inf_amd64_785608ed2524cdea\NvCplSetupInt.exe"
  "C:\Windows\System32\DriverStore\FileRepository\nvblwu.inf_amd64_31f54e2d1ba058d5\NvCplSetupInt.exe"

I want to delete those "- X equal files of size ..." header lines that have no actual file paths following them, for example the first and third bullet points above. The result should be:

- 2 equal files of size 284164096
  "C:\E\100p disk util bak\Softwares\OSs\gparted-live-0.26.1-1-i686.iso"
  "H:\Softwares\Linux\gparted-live-0.26.1-1-i686.iso"
- 2 equal files of size 161356649
  "H:\Softwares\Dev Tools\Eclipse\Windows\eclipse-java-luna-SR1a-win32-x86_64.zip"
- 35 equal files of size 97078976
  "C:\Windows\System32\DriverStore\FileRepository\nvacwu.inf_amd64_9934c34dc6ca0c4b\NvCplSetupInt.exe"
  "C:\Windows\System32\DriverStore\FileRepository\nvamwu.inf_amd64_d4715679184092a8\NvCplSetupInt.exe"
  "C:\Windows\System32\DriverStore\FileRepository\nvaowu.inf_amd64_785608ed2524cdea\NvCplSetupInt.exe"
  "C:\Windows\System32\DriverStore\FileRepository\nvblwu.inf_amd64_31f54e2d1ba058d5\NvCplSetupInt.exe"

I formed a regex that matches these lines:

(^-.*\n)-

It can be checked in action at the link above. I want to delete that first group, which is essentially the whole line, but I can't work out how to do this with grep or sed. Can it be done in a single command?

4 solutions

#1


2  

Using sed

sed '/^-/{N;/\n-/D}' file

- 2 equal files of size 284164096
  "C:\E\100p disk util bak\Softwares\OSs\gparted-live-0.26.1-1-i686.iso"
  "H:\Softwares\Linux\gparted-live-0.26.1-1-i686.iso"
- 2 equal files of size 161356649
  "H:\Softwares\Dev Tools\Eclipse\Windows\eclipse-java-luna-SR1a-win32-x86_64.zip"
- 35 equal files of size 97078976
  "C:\Windows\System32\DriverStore\FileRepository\nvacwu.inf_amd64_9934c34dc6ca0c4b\NvCplSetupInt.exe"
  "C:\Windows\System32\DriverStore\FileRepository\nvamwu.inf_amd64_d4715679184092a8\NvCplSetupInt.exe"
  "C:\Windows\System32\DriverStore\FileRepository\nvaowu.inf_amd64_785608ed2524cdea\NvCplSetupInt.exe"
  "C:\Windows\System32\DriverStore\FileRepository\nvblwu.inf_amd64_31f54e2d1ba058d5\NvCplSetupInt.exe"

Portable version for any version of sed

sed -e '/^-/{N' -e '/\
-/D' -e '}' file

If you also want to remove the last line when it starts with -:

sed -e '/^-/{$d' -e 'N' -e '/\
-/D' -e '}' file
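
The N and D commands do the real work here, so a commented sketch may help. It wraps the same logic in a shell function and runs it on a small made-up sample; the function name drop_empty_groups, the variable names, and the sample lines are illustrative assumptions, not part of the original answer.

```shell
# Hypothetical wrapper around the sed logic above (name is illustrative).
drop_empty_groups() {
  # /^-/{   only act when the current line is a header (starts with -)
  # N       append the next input line to the pattern space
  # /\n-/D  if that next line is also a header, delete the first line
  #         and restart the cycle with the remaining header
  sed -e '/^-/{' -e 'N' -e '/\n-/D' -e '}'
}

# Made-up sample: the groups of size 1 and 3 have no paths under them.
sample='- 2 equal files of size 1
- 2 equal files of size 2
  "a.iso"
  "b.iso"
- 2 equal files of size 3
- 1 equal files of size 4
  "c.zip"'

printf '%s\n' "$sample" | drop_empty_groups
```

On this sample only the groups of size 2 and 4 remain, each followed by its paths.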

#2


1  

You can just grep it:

grep -v -B1 "^-" test_file.txt | grep -v "\-\-"

- 2 equal files of size 284164096
  "C:\E\100p disk util bak\Softwares\OSs\gparted-live-0.26.1-1-i686.iso"
  "H:\Softwares\Linux\gparted-live-0.26.1-1-i686.iso"
- 2 equal files of size 161356649
  "H:\Softwares\Dev Tools\Eclipse\Windows\eclipse-java-luna-SR1a-win32-x86_64.zip"
- 35 equal files of size 97078976
  "C:\Windows\System32\DriverStore\FileRepository\nvacwu.inf_amd64_9934c34dc6ca0c4b\NvCplSetupInt.exe"
  "C:\Windows\System32\DriverStore\FileRepository\nvamwu.inf_amd64_d4715679184092a8\NvCplSetupInt.exe"
  "C:\Windows\System32\DriverStore\FileRepository\nvaowu.inf_amd64_785608ed2524cdea\NvCplSetupInt.exe"
  "C:\Windows\System32\DriverStore\FileRepository\nvblwu.inf_amd64_31f54e2d1ba058d5\NvCplSetupInt.exe"

How does it work? The first grep selects every line that does not start with a -, plus the one line before each of them (-B1). The second grep just removes the group separator (--); some grep versions support --no-group-separator, so you can do it in one go. Note that the second pattern matches any line containing --, so a path with two consecutive hyphens would be dropped as well.
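
With GNU grep, the one-step variant mentioned above might look like the sketch below. --no-group-separator is a GNU extension, so this assumes GNU grep; the function name and the sample data are made up for illustration. Because no second grep is involved, a path that happens to contain -- is kept.

```shell
# Hypothetical helper (name is illustrative); assumes GNU grep.
keep_groups_with_paths() {
  # -v '^-'               select every line that is not a header
  # -B1                   also print the one line before each selected line
  # --no-group-separator  suppress the "--" line between separate groups
  grep -v -B1 --no-group-separator '^-'
}

# Made-up sample: the groups of size 1 and 3 have no paths under them.
sample='- 2 equal files of size 1
- 2 equal files of size 2
  "a--b.iso"
  "b.iso"
- 2 equal files of size 3
- 1 equal files of size 4
  "c.zip"'

printf '%s\n' "$sample" | keep_groups_with_paths
```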

#3


0  

Is perl okay?

cat input.txt | perl -pe 'BEGIN{undef $/;} s/^-.*?\n-/-/smg'

The BEGIN block allows the multiline search by undefining the input record separator ($/), which essentially tells perl that there is no end-of-line character, so the whole input is read as one string. The s/// part then substitutes anything matching your regex with a - (no need for a capturing group).

Oh, and I slightly modified your regex to be non-greedy, with a ?. Otherwise, because the search is multiline and the s flag lets . match newlines, it would match from the first - to the last one and remove almost everything. For the same reason, a match can span the path lines between two headers, so verify the result on your own data.

Edit: here is a lengthy and informative Q/A about multiline search, which shows that it will be difficult with sed.

Edit2: it is actually quite easy with a modern sed; see @123's answer.

#4


0  

sed is for simple substitutions on individual lines, that is all. For anything else you should be using awk. If you are using sed constructs other than s, g, and p (with -n) then you are using constructs that became obsolete in the mid-1970s when awk was invented.

This will work robustly, efficiently, and portably with any awk on any UNIX box:

$ awk '/^ /{print p $0; p=""; next} {p=$0 ORS}' file
- 2 equal files of size 284164096
  "C:\E\100p disk util bak\Softwares\OSs\gparted-live-0.26.1-1-i686.iso"
  "H:\Softwares\Linux\gparted-live-0.26.1-1-i686.iso"
- 2 equal files of size 161356649
  "H:\Softwares\Dev Tools\Eclipse\Windows\eclipse-java-luna-SR1a-win32-x86_64.zip"
- 35 equal files of size 97078976
  "C:\Windows\System32\DriverStore\FileRepository\nvacwu.inf_amd64_9934c34dc6ca0c4b\NvCplSetupInt.exe"
  "C:\Windows\System32\DriverStore\FileRepository\nvamwu.inf_amd64_d4715679184092a8\NvCplSetupInt.exe"
  "C:\Windows\System32\DriverStore\FileRepository\nvaowu.inf_amd64_785608ed2524cdea\NvCplSetupInt.exe"
  "C:\Windows\System32\DriverStore\FileRepository\nvblwu.inf_amd64_31f54e2d1ba058d5\NvCplSetupInt.exe"

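Spelled out with comments, the awk one-liner above works roughly as sketched below. The function name, the variable name pending (p in the original), and the sample data are made up for illustration. Note that a header on the last line is dropped too, because a pending header is printed only when a path line follows it.

```shell
# Hypothetical helper (name is illustrative).
drop_empty_groups_awk() {
  awk '
    # Path lines start with a space: print the pending header (if any)
    # followed by the path, then clear the pending header.
    /^ / { print pending $0; pending = ""; next }
    # Any other line is a header: remember it with its line ending,
    # overwriting an earlier header that never got a path.
    { pending = $0 ORS }
  '
}

# Made-up sample: the groups of size 1, 3 and 5 have no paths under them.
sample='- 2 equal files of size 1
- 2 equal files of size 2
  "a.iso"
  "b.iso"
- 2 equal files of size 3
- 1 equal files of size 4
  "c.zip"
- 3 equal files of size 5'

printf '%s\n' "$sample" | drop_empty_groups_awk
```
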