I need a solution to delete duplicate lines where the first field is an IPv4 address. For example, I have the following lines in a file:
192.168.0.1/text1/text2
192.168.0.18/text03/text7
192.168.0.15/sometext/sometext
192.168.0.1/text100/ntext
192.168.0.23/othertext/sometext
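Assuming the first occurrence of each duplicated IP is the one to keep (the question doesn't say which copy should survive), the desired output for this example would be:

192.168.0.1/text1/text2
192.168.0.18/text03/text7
192.168.0.15/sometext/sometext
192.168.0.23/othertext/sometext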
So the only part that should be matched in the scenario above is the IP address. All I know is that the regex for an IP address is:
\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b
It would be nice if the solution were a one-liner and as fast as possible.
3 Answers
#1
6
If the file contains lines only in the format you show, i.e. the first field is always an IP address, you can get away with one line of awk:
awk -F/ '!x[$1]++' "$PATH_TO_FILE"
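In case the idiom is unclear: x[$1]++ counts how many times each first field has been seen, and the leading ! makes the expression true only on the first occurrence, which triggers awk's default action of printing the line. A longer equivalent, just as a sketch (seen plays the role of x, and file stands in for your file name):

awk -F/ '{
    if (!seen[$1]) {   # first line with this IP?
        print          # then print it (the one-liner relies on the default action)
    }
    seen[$1] = 1       # remember the IP for later lines
}' file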
EDIT: This removes duplicates based only on the IP address. I'm not sure that's what the OP wanted when I wrote this answer.
#2
0
If you don't need to preserve the original ordering, one way to do this is using sort:
sort -u <file>
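Note that plain sort -u compares entire lines, so it only drops lines that are duplicated in full. To deduplicate on the first (IP) field alone, sort can be told to key on that field; a sketch, assuming / as the delimiter as in the question:

sort -t/ -u -k1,1 file

With -u and a key, sort keeps one line per unique key, though which of the duplicates survives is determined by the sort order, not by the original line order.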
#3
0
The awk that ArjunShankar posted worked wonders for me.
I had a huge list of items with multiple copies in field 1 and a sequential number in field 2. I needed the "newest", i.e. highest, sequential number for each unique field 1.
I had to use sort -rn first to push the highest numbers into the "first entry" position, because the awk one-liner keeps the first entry it sees for each key and skips the rest; it does not keep the last/most recent entry in the list.
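Put together, that approach looks something like this (a sketch, assuming / is the delimiter and the sequential number sits in field 2, as described above):

# sort descending by the number in field 2, then keep the first
# line seen for each value of field 1 (i.e. the highest number)
sort -t/ -k2,2 -rn file | awk -F/ '!x[$1]++'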
Thanks, ArjunShankar!