How do I parse a large file in Linux?

Time: 2022-08-03 21:37:12

I am a Linux beginner. I have the following flat file, test.txt:

Iteration 1   
Telephony   


Pass/Fail

5.1.1.1   voiceCallPhoneBook   50   45
5.1.1.4   voiceCallPhoneHistory   50   49
5.1.1.7   receiveCall   100   100
5.1.1.8   deleteContacts   20   19
5.1.1.9   addContacts   20   20
Telephony   16:47:42
Messaging   


Pass/Fail

5.1.2.3   openSMS   50   49
5.1.2.1   smsManuallyEntryOption   50   50
5.1.2.2   smsSelectContactsOption   50   50
Messaging   03:26:31
Email   


Pass/Fail

Email   00:00:48
Email   


Pass/Fail

Email   00:00:40
PIM   


Pass/Fail

5.1.6.1   addAppointment   5   0
5.1.6.2   setAlarm   1   0
5.1.6.3   deleteAppointment   5   0
5.1.6.4   deleteAlarm   1   0
5.1.6.5   addTask   1   0
5.1.6.6   openTask   1   0
5.1.6.7   deleteTask   1   0
PIM   00:03:06
Multi-Media   

Iteration 2   
Telephony   


Pass/Fail

5.1.1.1   voiceCallPhoneBook   50   47
5.1.1.4   voiceCallPhoneHistory   50   50
5.1.1.7   receiveCall   100   100
5.1.1.8   deleteContacts   20   20
5.1.1.9   addContacts   20   20
Telephony   04:02:05
Messaging   


Pass/Fail

5.1.2.3   openSMS   50   50
5.1.2.1   smsManuallyEntryOption   50   50
5.1.2.2   smsSelectContactsOption   50   50
Messaging   03:20:01
Email   


Pass/Fail

Email   00:00:47
Email   


Pass/Fail

Email   00:00:40
PIM   


Pass/Fail

5.1.6.1   addAppointment   5   5
5.1.6.2   setAlarm   1   1
5.1.6.3   deleteAppointment   5   5
5.1.6.4   deleteAlarm   1   1
5.1.6.5   addTask   1   1
5.1.6.6   openTask   1   1
5.1.6.7   deleteTask   1   1
PIM   00:09:20
Multi-Media   

I want to count the number of occurrences of a specific word in the file. For example, if I search for "voiceCallPhoneBook", it should report 2.

I can use:

cat reports.txt | grep "5.1.1.4" | cut -d' ' -f1,4,7,10

After running this command, I got the following output:

5.1.1.4 voiceCallPhoneBook  50  45
5.1.1.4 voiceCallPhoneBook  50  47

It is a very large file, and I want to use loops in a bash/awk script and also find the average of the sums of the 3rd and 4th column values. I am struggling to write this as a bash script. I would appreciate it if someone could provide a solution.

Thanks


2 solutions

#1


0  

This matches the lines that begin with a test ID such as 5.1.1.4,
keeps a running total of the 3rd and 4th columns for each test,
then prints all the totals at the end.

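awk '/^5\.[0-9]/ {                    # lines that start with a test ID, e.g. 5.1.1.4
      a[$1 " " $2] += $3              # running total of column 3 per test
      b[$1 " " $2] += $4              # running total of column 4 per test
}
END { for (k in a) {
      printf("%-50s%-10d%-10d\n", k, a[k], b[k]) }
}' "$1"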
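
The trailing $1 is the shell's first positional parameter, so the command above is meant to live inside a script that receives the file name as its argument. A minimal sketch of that setup follows. The names tally.sh and report.txt are made up for illustration, and the sample data is a small hypothetical excerpt in the same layout as the file in the question.

```shell
# Work in a throwaway directory so nothing is left behind.
workdir=$(mktemp -d)

# Hypothetical excerpt in the same layout as the file in the question.
cat > "$workdir/report.txt" <<'EOF'
Iteration 1
5.1.1.1   voiceCallPhoneBook   50   45
5.1.1.7   receiveCall   100   100
Telephony   16:47:42
Iteration 2
5.1.1.1   voiceCallPhoneBook   50   47
5.1.1.7   receiveCall   100   100
EOF

# tally.sh (hypothetical name): the awk command wrapped in a script.
# "$1" is the file name given on the command line.
cat > "$workdir/tally.sh" <<'EOF'
#!/bin/sh
awk '/^5\.[0-9]/ { a[$1 " " $2] += $3; b[$1 " " $2] += $4 }
END { for (k in a) printf("%-50s%-10d%-10d\n", k, a[k], b[k]) }' "$1"
EOF

# Run the script on the sample file; sort because "for (k in a)"
# visits the keys in no particular order.
tally=$(sh "$workdir/tally.sh" "$workdir/report.txt" | LC_ALL=C sort)
printf '%s\n' "$tally"

rm -rf "$workdir"
```

Quoting "$1" keeps the script working when the file name contains spaces.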

This duplicates a question from earlier today: "Parse the large test files using awk".

Here is a version with column headers, averages and an occurrence count, formatted a bit more neatly for easier reading :)

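awk 'BEGIN {
        printf("%-50s%-10s%-10s%-10s\n", "Name", "Col3 Tot", "Col4 Tot", "Occur")
}

/^5\.[0-9]/ {                   # lines that start with a test ID, e.g. 5.1.1.4
        count++                 # number of matching lines overall
        c3 += $3                # grand total of column 3
        c4 += $4                # grand total of column 4
        a[$1 " " $2] += $3      # column 3 total per test
        b[$1 " " $2] += $4      # column 4 total per test
        c[$1 " " $2]++          # occurrences per test
}

END {
        for (k in a)
                printf("%-50s%-10d%-10d%-10d\n", k, a[k], b[k], c[k])

        if (count > 0)          # avoid dividing by zero when nothing matched
                print "col3 avg: " c3/count "\ncol4 avg: " c4/count
}' "$1"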
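
The script above prints per-test totals but only overall averages. If per-test averages are what is wanted, one possible variation is sketched below. It assumes the test name in column 2 is enough to identify a test, and the sample data is a small hypothetical excerpt rather than the real file.

```shell
# Hypothetical excerpt in the same layout as the file in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Iteration 1
5.1.1.1   voiceCallPhoneBook   50   45
5.1.1.4   voiceCallPhoneHistory   50   49
Telephony   16:47:42
Iteration 2
5.1.1.1   voiceCallPhoneBook   50   47
5.1.1.4   voiceCallPhoneHistory   50   50
EOF

# Sum columns 3 and 4 per test name, then divide each sum by the
# number of times that test appeared.
result=$(awk '
/^5\.[0-9]/ {
    sum3[$2] += $3
    sum4[$2] += $4
    seen[$2]++
}
END {
    for (name in seen)
        printf "%s %d %.1f %.1f\n", name, seen[name],
               sum3[name] / seen[name], sum4[name] / seen[name]
}' "$sample" | LC_ALL=C sort)

printf '%s\n' "$result"
rm -f "$sample"
```

Each output line holds the test name, its occurrence count, and the column 3 and column 4 averages. For this sample, voiceCallPhoneBook comes out as 2 occurrences with averages 50.0 and 46.0.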

#2


1  

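#!/usr/bin/awk -f
BEGIN {
    # awk variables start out empty, so this is only for readability
    c3 = 0
    c4 = 0
    count = 0
}

/voiceCallPhoneBook/ {
    c3 = c3 + $3    # running total of column 3
    c4 = c4 + $4    # running total of column 4
    count++         # number of matching lines
}

END {
    if (count > 0) {    # avoid dividing by zero when nothing matched
        print "column 3 avg: " c3/count
        print "column 4 avg: " c4/count
    }
}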

1) Save it in a file, for example countVoiceCall.awk

2) Run: awk -f countVoiceCall.awk sample.txt

Output:

column 3 avg: 50
column 4 avg: 46

Brief explanation:

a.    The BEGIN{...} block is used to initialize variables
b.    The /PATTERN/{...} block is used to search for your keyword, for example "voiceCallPhoneBook"
c.    The END{...} block is used to print the results
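
The keyword does not have to be hard-coded in the pattern. As a rough sketch, it could be passed in with awk's -v option and compared against column 2, which also gives the occurrence count the question asks for. The variable name "name" and the sample data are assumptions made for illustration.

```shell
# Hypothetical excerpt in the same layout as the file in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
5.1.1.1   voiceCallPhoneBook   50   45
5.1.1.4   voiceCallPhoneHistory   50   49
5.1.1.1   voiceCallPhoneBook   50   47
EOF

# -v sets the awk variable "name" before the program starts.
# $2 == name is an exact comparison, so a partial word matches nothing.
summary=$(awk -v name="voiceCallPhoneBook" '
$2 == name {
    c3 += $3
    c4 += $4
    count++
}
END {
    print "occurrences: " (count + 0)   # + 0 prints 0 instead of an empty string
    if (count > 0) {
        print "column 3 avg: " c3 / count
        print "column 4 avg: " c4 / count
    }
}' "$sample")

printf '%s\n' "$summary"
rm -f "$sample"
```

With this sample the summary reports 2 occurrences, a column 3 average of 50 and a column 4 average of 46. Changing the value after name= switches to another test without editing the awk program.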
