How to filter out duplicate IDs between two files

Time: 2022-01-08 08:56:51

I have two text files in the format below:


  • File 1:

    2017-08-16 00:00:00,115 - [INFO]  TRANSACTIONS: 123456788 id: 123456
    2017-08-16 00:00:00,115 - [INFO]  TRANSACTIONS: 123456789 id: 123457
    

  • File 2:

    123456 123457 123458 123459

The goal: I would like to get the records from file1 whose id does not appear in file2.


The command lines and results that I tried:


  • 1st command line: grep -vf file2 file1

  • 2nd command line: comm -23 <(sort file1) <(sort file2)

Both commands work, but there are 3 million records in file1 and 1 million records in file2. The 1st command finishes when there are few records, but it cannot cope with 3 million. The 2nd command is faster than the 1st and completes when I execute it manually in an ssh console, but it does not work from a bash script: it fails with `syntax error at "("`.
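A likely cause of that error (an inference from the message, not stated in the question): process substitution `<( ... )` is a bash/ksh/zsh feature, so a script executed with `sh script.sh`, or with a `#!/bin/sh` shebang on systems where sh is a POSIX shell such as dash, rejects the `(`. A minimal sketch with made-up sample data in place of the real file1/file2:

```shell
#!/bin/bash
# <( ... ) is process substitution, a bash feature. Running this same
# script with plain sh (e.g. dash) instead of bash stops at the "(" --
# matching the "syntax error at (" the question reports.
printf '1\n2\n3\n' > /tmp/demo_f1
printf '2\n' > /tmp/demo_f2
result=$(comm -23 <(sort /tmp/demo_f1) <(sort /tmp/demo_f2))
echo "$result"
rm -f /tmp/demo_f1 /tmp/demo_f2
```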


Any idea to solve this and complete the goal ?


2 Solutions

#1



awk 'NR==FNR{a[$1];next} !($NF in a)' file2 file1
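The one-liner reads file2 first (while `NR==FNR`), stores each line's first field as a key of array `a`, then prints only the file1 lines whose last field `$NF` (the id) is not in `a`. Note it indexes only `$1` of file2, so it assumes one id per line; since the question's File 2 shows several ids on one line, a sketch (with made-up sample data) that indexes every field instead:

```shell
# Made-up sample data mirroring the question's formats.
printf 'x TRANSACTIONS: 1 id: 123456\nx TRANSACTIONS: 2 id: 999999\n' > /tmp/awk_f1
printf '123456 123457 123458 123459\n' > /tmp/awk_f2
# Index EVERY field of file2 (several ids per line), then keep only
# file1 lines whose last field is not a known id.
out=$(awk 'NR==FNR{for(i=1;i<=NF;i++) a[$i]; next} !($NF in a)' /tmp/awk_f2 /tmp/awk_f1)
echo "$out"
rm -f /tmp/awk_f1 /tmp/awk_f2
```

Because awk holds the ids in a hash table and streams file1 once, this stays roughly linear in the input size, which suits the 3-million-record file better than `grep -vf`.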

#2



I found a way to make the 2nd command work in the script:


sort file1 > file1.txt
sort file2 > file2.txt
comm -23 file1.txt file2.txt > result.txt
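A runnable sketch of the same three steps with made-up sample data (temp-file names are illustrative): writing the sorted files to disk first avoids process substitution, which is presumably why this version also runs under plain sh.

```shell
# Sort each input to a plain file, then let comm -23 emit the lines
# unique to the first file. No <( ... ) involved, so POSIX sh accepts it.
printf 'b\nc\na\n' > /tmp/comm_f1
printf 'b\n' > /tmp/comm_f2
sort /tmp/comm_f1 > /tmp/comm_f1.sorted
sort /tmp/comm_f2 > /tmp/comm_f2.sorted
comm -23 /tmp/comm_f1.sorted /tmp/comm_f2.sorted > /tmp/comm_result
result=$(cat /tmp/comm_result)
echo "$result"
rm -f /tmp/comm_f1 /tmp/comm_f2 /tmp/comm_f1.sorted /tmp/comm_f2.sorted /tmp/comm_result
```

Keep in mind that `comm` compares whole lines, so this only filters file1 lines that are byte-for-byte equal to file2 lines; for extracting by id field alone, the awk answer above is the more direct fit.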
