How to add and sort lines of text?

Time: 2022-05-21 16:07:45

I have 2 files:

  • first.txt
  • second.txt

first.txt contains:

A
B
C
D
A
B
C
D

second.txt contains:

1 header
123
456
2 header
123
1 header
123
2 header
123
456

How can I add and sort the lines so that each group in second.txt (from a "1 header" line up to the line before the next "1 header") follows one A B C D block of first.txt, like below?

A
B
C
D
1 header
123
456
2 header
123
A
B
C
D
1 header
123
2 header
123
456

I tried using cat first.txt second.txt, but it only outputs the following:

A
B
C
D
A
B
C
D
1 header
123
456
2 header
123
1 header
123
2 header
123
456

Do you have any ideas?
This is a sample problem. The real data has millions of lines, and because the dataset is sensitive I can only share a sample.

Thanks,
Am

2 Answers

#1


Could you please try the following and let me know if it helps.

awk '
FNR==NR{
  if(FNR%4==0 && FNR>1){
     a[++i]=val ORS $0;
     val="";
     next};
  val=val?val ORS $0:$0;
  next
}
/header/{
  if(++count==3){
    print a[++j] ORS val;
    count=1;
    val=""}
}
{
  val=val?val ORS $0:$0
}
END{
  if(count){
    print a[++j] ORS val}
}' first.txt second.txt

Output will be as follows.

A
B
C
D
1 header
123
456
2 header
123
A
B
C
D
1 header
123
2 header
123
456

Explanation: here is the same code with a comment on each line.

awk '
FNR==NR{                 ##True only while the first input file (first.txt) is being read.
  if(FNR%4==0 && FNR>1){ ##Every 4th line of first.txt closes one A-D block.
     a[++i]=val ORS $0;  ##Store the block (accumulated lines plus the current line) in array a at the next index i.
     val="";             ##Reset val for the next block.
     next};              ##Skip the remaining rules for this line.
  val=val?val ORS $0:$0; ##Append the current line to val, separated by a newline.
  next                   ##Skip the remaining rules for this line.
}
/header/{                ##For lines of second.txt that contain the string header:
  if(++count==3){        ##A third header means a new group starts (each group is assumed to have exactly two headers).
    print a[++j] ORS val;##Print the next stored block from first.txt, then the group collected so far.
    count=1;             ##The current line is the first header of the new group.
    val=""}              ##Reset val for the new group.
}
{
  val=val?val ORS $0:$0  ##Append the current line to the group being collected.
}
END{                     ##After all input has been read:
  if(count){             ##If at least one header was seen, a last group is still pending.
    print a[++j] ORS val}##Print the next stored block followed by the last group.
}' first.txt second.txt  ##Input files: first.txt is read first, then second.txt.
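
Since the question mentions millions of lines, and the code above keeps every block of first.txt in an array, here is a rough sketch of a streaming variant. It is not part of the original answer. It assumes that every group in second.txt starts with a line that is exactly 1 header and that first.txt is made of 4-line blocks. The scratch directory and the names workdir, blocks and streamed are only for illustration.

```shell
#!/bin/sh
# Hypothetical demo setup: recreate the sample files in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' A B C D A B C D > "$workdir/first.txt"
printf '%s\n' '1 header' 123 456 '2 header' 123 \
              '1 header' 123 '2 header' 123 456 > "$workdir/second.txt"

# Stream second.txt. Before every "1 header" line, read the next 4 lines
# of first.txt with getline, so first.txt is never held in memory as a whole.
# The result is captured in a variable only to keep the demo small.
streamed=$(awk -v blocks="$workdir/first.txt" '
/^1 header$/ {
    for (n = 0; n < 4; n++)
        if ((getline line < blocks) > 0)
            print line
}
{ print }
' "$workdir/second.txt")

printf '%s\n' "$streamed"
rm -rf "$workdir"
```

On real data you would probably redirect the awk output to a file instead of capturing it in a shell variable.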

#2


Then it is quite straightforward:

BUFF=`sed -n '1,4p' first.txt`; awk -v buff="$BUFF" '!/^1 header$/{print}/^1 header$/{print buff;print}' second.txt
A
B
C
D
1 header
123
456
2 header
123
A
B
C
D
1 header
123
2 header
123
456

You store the first 4 lines of first.txt in a variable using sed -n '1,4p'. Then you pass that variable's content to awk with the syntax -v buff="$BUFF". The awk program reads the second file: every line that is not exactly 1 header is printed as is, and when a line is exactly 1 header, the 4 lines extracted by the sed command are printed before that line.
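
The same idea can probably be written as a single awk call without the shell variable. The following is a sketch, not the answerer's code. Like the command above, it assumes that the first 4 lines of first.txt are the block to insert before every 1 header line. The names workdir and single_pass are hypothetical.

```shell
#!/bin/sh
# Hypothetical demo setup: recreate the sample files in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' A B C D A B C D > "$workdir/first.txt"
printf '%s\n' '1 header' 123 456 '2 header' 123 \
              '1 header' 123 '2 header' 123 456 > "$workdir/second.txt"

# While reading first.txt (NR==FNR), keep only its first 4 lines in buff.
# While reading second.txt, print buff before each line that is exactly
# "1 header", then print the line itself.
single_pass=$(awk '
NR==FNR { if (FNR <= 4) buff = buff $0 ORS; next }
/^1 header$/ { printf "%s", buff }
{ print }
' "$workdir/first.txt" "$workdir/second.txt")

printf '%s\n' "$single_pass"
rm -rf "$workdir"
```

If the blocks in first.txt differ from one another, this variant (like the sed version) would repeat the first block each time, so it only fits data where all blocks are identical.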
