
Time: 2021-09-08 01:47:46

I have a file with contents like this:


- 2 equal files of size 288903252
- 2 equal files of size 284164096
  "C:\E\100p disk util bak\Softwares\OSs\gparted-live-0.26.1-1-i686.iso"
- 2 equal files of size 277436598
- 2 equal files of size 161356649
  "H:\Softwares\Dev Tools\Eclipse\Windows\eclipse-java-luna-SR1a-win32-x86_64.zip"
- 35 equal files of size 97078976

I want to delete the lines of the form "- X equal files of size ..." that have no file paths following them (for example, the first and third bullet points above), so that this remains:

- 2 equal files of size 284164096
  "C:\E\100p disk util bak\Softwares\OSs\gparted-live-0.26.1-1-i686.iso"
- 2 equal files of size 161356649
  "H:\Softwares\Dev Tools\Eclipse\Windows\eclipse-java-luna-SR1a-win32-x86_64.zip"
- 35 equal files of size 97078976

I formed a regex that matches these lines:



which can be checked in action at the above link. I want to delete the first group, which is essentially the whole line, but I can't figure out how to do that with grep or sed. Can this be done in a single command?


4 Answers



Using sed


sed '/^-/{N;/\n-/D}' file

- 2 equal files of size 284164096
  "C:\E\100p disk util bak\Softwares\OSs\gparted-live-0.26.1-1-i686.iso"
- 2 equal files of size 161356649
  "H:\Softwares\Dev Tools\Eclipse\Windows\eclipse-java-luna-SR1a-win32-x86_64.zip"
- 35 equal files of size 97078976
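For readers unfamiliar with N and D, the one-liner above can be spelled out with comments. This is only a sketch: it assumes GNU sed (which accepts \n in a regex and prints the pattern space when N reaches the end of input, so the trailing - line survives as in the output above), and the file name sample.txt, the shortened paths, and the function name drop_pathless are made up for illustration.

```shell
#!/bin/sh
# Sample input shaped like the file in the question (paths shortened, hypothetical).
cat > sample.txt <<'EOF'
- 2 equal files of size 288903252
- 2 equal files of size 284164096
  "C:\a.iso"
- 2 equal files of size 277436598
- 2 equal files of size 161356649
  "H:\b.zip"
- 35 equal files of size 97078976
EOF

# drop_pathless: hypothetical wrapper around the sed one-liner.
drop_pathless() {
  sed '
    # Only act on lines that start with "-".
    /^-/{
      # Append the next input line to the pattern space.
      N
      # If the appended line also starts with "-", delete everything up to
      # the first newline and restart the cycle with the remaining line.
      /\n-/D
    }
  ' "$1"
}

drop_pathless sample.txt
```

Because D restarts the cycle without reading new input, the surviving - line is paired with the line after it in turn, which is why runs of several path-less lines collapse correctly.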

Portable version for any version of sed


sed -e '/^-/{N' -e '/\
-/D' -e '}' file

If you also want to remove the last line when it starts with -


sed -e '/^-/{$d' -e 'N' -e '/\
-/D' -e '}' file
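To see what the extra $d does, here is a small sketch of the same script in commented form. It assumes GNU sed for the \n escape and the inline comments; the file name tail.txt, its contents, and the function name drop_all_pathless are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample: the last "-" line has no path after it.
cat > tail.txt <<'EOF'
- 2 equal files of size 100
- 2 equal files of size 200
  "C:\one.iso"
- 5 equal files of size 300
EOF

drop_all_pathless() {
  sed '
    /^-/{
      # Last input line and still a "-" line: nothing can follow it, so delete it.
      $d
      # Otherwise pair it with the next line, as before.
      N
      /\n-/D
    }
  ' "$1"
}

drop_all_pathless tail.txt
```

Only the size-200 line and its path should remain; the final - line is dropped because $ matches on the last input line before N is attempted.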



You can just grep it:


grep -v -B1 "^-" test_file.txt | grep -v '^--$'

- 2 equal files of size 284164096
  "C:\E\100p disk util bak\Softwares\OSs\gparted-live-0.26.1-1-i686.iso"
- 2 equal files of size 161356649
  "H:\Softwares\Dev Tools\Eclipse\Windows\eclipse-java-luna-SR1a-win32-x86_64.zip"

How does it work? The first grep selects every line that does not start with a -, plus the line before each of them (-B1). A trailing - line with no path after it is therefore not printed. The second grep just removes the -- group separator; some grep versions support --no-group-separator, so you can do it in one go.
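As a sketch of the one-pass form mentioned above, assuming GNU grep (the --no-group-separator option is a GNU extension), with a made-up file name groups.txt and function name keep_groups:

```shell
#!/bin/sh
# Hypothetical sample input with shortened paths.
cat > groups.txt <<'EOF'
- 2 equal files of size 100
- 2 equal files of size 200
  "C:\one.iso"
- 2 equal files of size 300
- 2 equal files of size 400
  "H:\two.zip"
EOF

# keep_groups: select every line that does not start with "-", together with
# the one line before it, and suppress the "--" separator between groups.
keep_groups() {
  grep -v -B1 --no-group-separator '^-' "$1"
}

keep_groups groups.txt
```

Note that -B1 only pulls in a single preceding line, so this relies on each path block being directly preceded by its - line.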




Is perl okay?


perl -pe 'BEGIN{undef $/;} s/^-.*\n(?=-)//mg' input.txt

The BEGIN block allows the multiline search by undefining the input record separator $/, so perl reads the whole file as one string. The s/// part then deletes every line that starts with - and is immediately followed by another line starting with - (no need for a capturing group: the (?=-) lookahead checks the next line without consuming it).

Oh, and I slightly modified your regex so that a match cannot run past the end of a line: there is no /s flag, so . does not match a newline. Otherwise, the search being multiline, a match could stretch from one - line across a path line to a later -, and remove lines that should stay.


Edit: here is a lengthy and informative Q/A about multiline search, that shows it will be difficult with sed.


Edit2: actually quite easy with a modern sed, see @123's answer




sed is for simple substitutions on individual lines, that is all. For anything else you should be using awk. If you are using sed constructs other than s, g, and p (with -n) then you are using constructs that became obsolete in the mid-1970s when awk was invented.


This will work robustly, efficiently, and portably with any awk on any UNIX box:


$ awk '/^ /{print p $0; p=""; next} {p=$0 ORS}' file
- 2 equal files of size 284164096
  "C:\E\100p disk util bak\Softwares\OSs\gparted-live-0.26.1-1-i686.iso"
- 2 equal files of size 161356649
  "H:\Softwares\Dev Tools\Eclipse\Windows\eclipse-java-luna-SR1a-win32-x86_64.zip"
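The awk one-liner above is easier to follow when spread over several lines. The sketch below is the same logic with a longer variable name and comments; the file name sizes.txt, its contents, and the names keep_with_paths and pending are hypothetical, and it assumes, as the original does, that path lines start with a space.

```shell
#!/bin/sh
# Hypothetical sample: the first kept group has two paths.
cat > sizes.txt <<'EOF'
- 2 equal files of size 100
- 2 equal files of size 200
  "C:\one.iso"
  "D:\one-copy.iso"
- 2 equal files of size 300
- 3 equal files of size 400
  "H:\two.zip"
EOF

keep_with_paths() {
  awk '
    # Indented line = a file path: print the buffered "-" line (if any),
    # then the path, and clear the buffer so later paths print alone.
    /^ / { print pending $0; pending = ""; next }
    # Any other line is a "-" line: remember it with its newline, overwriting
    # whatever was buffered before, which drops "-" lines that had no path.
    { pending = $0 ORS }
  ' "$1"
}

keep_with_paths sizes.txt
```

A - line is only ever printed when a path line arrives right after it, so groups with several paths keep all of them while path-less lines are silently overwritten.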


