I have 2 files:
- first.txt
- second.txt
first.txt contains:
A
B
C
D
A
B
C
D
second.txt contains:
1 header
123
456
2 header
123
1 header
123
2 header
123
456
How can I interleave the two files so that each block of second.txt (from "1 header" through its "2 header" lines) follows the corresponding ABCD block of first.txt, like below:
A
B
C
D
1 header
123
456
2 header
123
A
B
C
D
1 header
123
2 header
123
456
I tried using cat first.txt second.txt, but it only outputs this:
A
B
C
D
A
B
C
D
1 header
123
456
2 header
123
1 header
123
2 header
123
456
Do you guys have any ideas?
These are sample data; the real files have millions of lines of text, but because the dataset is sensitive I can only share a sample.
Thanks,
Am
2 answers
#1
Could you please try the following and let me know if this helps.
awk '
FNR==NR{
  val=(val=="" ? $0 : val ORS $0)
  if(FNR%4==0){
    a[++i]=val
    val=""
  }
  next
}
/header/ && count==2{
  print a[++j] ORS val
  count=0
  val=""
}
/header/{
  count++
}
{
  val=(val=="" ? $0 : val ORS $0)
}
END{
  if(val!=""){
    print a[++j] ORS val
  }
}' first.txt second.txt
Output will be as follows.
A
B
C
D
1 header
123
456
2 header
123
A
B
C
D
1 header
123
2 header
123
456
Explanation: here is the same code with comments. It assumes, as in the sample, that first.txt is made of 4-line blocks and that every block of second.txt contains exactly two header lines.
awk '
FNR==NR{                                  ##TRUE only while the first Input_file (first.txt) is being read.
  val=(val=="" ? $0 : val ORS $0)         ##Append the current line to val, separated by ORS (a newline).
  if(FNR%4==0){                           ##Every 4th line completes one A/B/C/D block.
    a[++i]=val                            ##Store the finished block in array a at the next index i.
    val=""                                ##Reset val for the next block.
  }
  next                                    ##Skip all remaining rules for lines of first.txt.
}
/header/ && count==2{                     ##A header line after two headers means a new block of second.txt starts.
  print a[++j] ORS val                    ##Print the next stored A/B/C/D block, then the collected block of second.txt.
  count=0                                 ##Reset the header counter.
  val=""                                  ##Reset val for the new block.
}
/header/{                                 ##For every line containing the string header ...
  count++                                 ##... increase the header counter by 1.
}
{
  val=(val=="" ? $0 : val ORS $0)         ##Append the current line of second.txt to val.
}
END{                                      ##After all input has been read ...
  if(val!=""){                            ##... if a block of second.txt is still pending ...
    print a[++j] ORS val                  ##... print the next A/B/C/D block followed by it.
  }
}' first.txt second.txt                   ##Input_files: first.txt is read first, then second.txt.
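Since the real files have millions of lines, a streaming variant that does not keep first.txt in memory may also be worth trying. This is a minimal sketch, not the answer above: the function name interleave_blocks is hypothetical, and it assumes every block of first.txt is exactly 4 lines and every block of second.txt begins with a line reading exactly "1 header". Whenever such a line appears, it pulls the next 4 lines of first.txt with getline from the named file, so only one line of first.txt is held at a time.

```shell
# Hypothetical helper: interleave_blocks FIRST SECOND
# Assumption: FIRST is made of 4-line blocks; each block of SECOND starts with "1 header".
interleave_blocks() {
  awk -v first="$1" '
    /^1 header$/ {
      # Copy the next 4 lines of the first file before this block.
      for (k = 0; k < 4; k++)
        if ((getline line < first) > 0)
          print line
    }
    { print }   # always print the current line of the second file
  ' "$2"
}
```

For example, interleave_blocks first.txt second.txt writes the merged result to standard output.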
#2
If every ABCD block in first.txt is the same, then it is quite straightforward:
BUFF=$(sed -n '1,4p' first.txt); awk -v buff="$BUFF" '/^1 header$/{print buff} {print}' second.txt
A
B
C
D
1 header
123
456
2 header
123
A
B
C
D
1 header
123
2 header
123
456
You store the first 4 lines of first.txt in a variable using sed -n '1,4p'. Then you pass that variable's content to awk using the syntax -v buff="$BUFF". The awk program reads the second file and prints each line; when it reaches a line whose content is exactly 1 header, it first prints the 4 lines extracted by sed and then prints that line.
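The same idea can be sketched with awk alone, under the same assumption that the first 4 lines of first.txt belong before every "1 header" line. The function name prepend_header_block is hypothetical. Because awk reads those lines itself instead of receiving them through -v, backslashes in them are not processed as escape sequences (values passed with -v are).

```shell
# Hypothetical helper: prepend_header_block FIRST SECOND
# Assumption: the first 4 lines of FIRST go before every "1 header" line of SECOND.
prepend_header_block() {
  awk '
    FNR == NR {                       # reading the first file
      if (FNR <= 4)                   # keep only its first 4 lines
        buff = (FNR == 1 ? $0 : buff ORS $0)
      next
    }
    /^1 header$/ { print buff }       # emit the saved lines before each block
    { print }                         # then copy the second file unchanged
  ' "$1" "$2"
}
```

Note that this still scans all of first.txt once, although it only keeps 4 lines of it.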