How to get the length of each line of grep output

I am very new to bash scripting. I have a network trace file I want to parse. Part of the trace file is (two packets):

    [continues...]
    +---------+---------------+----------+
    05:00:00,727,744   ETHER
    |0  
    |00|03|a0|09|5c|1c|00|10|07|df|a4|20|08|00|45|00|00|38|e7|55|

    +---------+---------------+----------+
    05:00:00,727,751   ETHER
    |0  
    |00|03|a0|09|5c|1c|00|10|07|df|a4|20|08|00|45|00|00|38|e7|56|00|00|3a|01|

    [continues...]

For each packet, I want to print the time stamp and the length of the packet (the number of hex values on the line that follows the |0 header), so the output will look like:

    05:00:00.727744 20 bytes
    05:00:00.727751 24 bytes

I can get the line with time stamp and the packets separately using grep in bash:

    times=$(grep '..:..:' "$fileName")
    packets=$(grep '..|..|' "$fileName")

But I can't work with the separate output lines after that. The whole result is concatenated in the two variables "times" and "packets". How can I get the length of each packet?

P.S. a good reference that really explains how to do bash programming, rather than just doing examples would be appreciated.

2 Solutions

#1

You really don't want to do such things with your shell.

You want to write a real parser that understands the format and outputs the information you need.

For a quick and dirty hack, you can do something like this:

    perl -wne 'print "$& " if /^\d\S*/; print split(/\|/)-2, " bytes\n" if /^\|..\|/'

#2

Okay, with plain old shell...

You can get the length of the line like this:

    line="|00|03|a0|09|5c|1c|00|10|07|df|a4|20|08|00|45|00|00|38|e7|55|"
    wc -c <<<"$line"
    62

There are sixty-two characters in that line. Think of each byte as |00, where 00 stands for any two hex digits. Counted that way, there's one extra | on the end. In addition, wc -c includes the newline that the here-string appends.

So, if we take the value of wc -c and subtract 2, we get 60. If we divide that by 3, we get 20, which is the number of bytes.
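
The same arithmetic can be sketched without wc, using Bash's ${#var} length expansion. That expansion does not count a newline, so only the final | is subtracted. The function name packet_len is made up for illustration and is not part of the original answer.

```shell
#!/bin/bash
# packet_len is a hypothetical helper name, used only for this sketch.
packet_len() {
    local line=$1
    # ${#line} is the string length with no trailing newline, so only
    # the final "|" is subtracted before dividing by 3 ("|xx" per byte).
    echo $(( (${#line} - 1) / 3 ))
}

packet_len "|00|03|a0|09|5c|1c|00|10|07|df|a4|20|08|00|45|00|00|38|e7|55|"
```

For the 61-character sample line this prints 20, the same result as the wc -c calculation.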

Okay, now we need a little loop that figures out what kind of line it has and then parses it:

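    #!/bin/bash

    # Read the trace line by line; -r keeps backslashes literal.
    while read -r line
    do
        # Timestamp line: starts with two digits.
        if [[ $line =~ ^[[:digit:]]{2} ]]
        then
            # Strip the trailing " ETHER"; -n suppresses the newline.
            echo -n "${line% *}"
        # Packet line: "|" then two digits (use [[:xdigit:]] if the
        # first byte may contain the hex letters a-f).
        elif [[ $line =~ ^\|[[:digit:]]{2} ]]
        then
            length=$(wc -c <<<"$line")   # count includes the trailing newline
            ((length-=2))                # drop the newline and the final "|"
            ((length/=3))                # each byte is "|xx", three characters
            echo "$length bytes"
        fi
    done < test.txt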

There's a (nearly) pure Bash solution to your problem! The only external command it uses is wc.

You're a beginning Bash programmer, and you have no idea what's going on...

Let's take this one step at a time:

A common way to loop through a file in BASH is using a while read loop. This combines the while with a read:

    while read line
    do
       echo "My line is '$line'"
    done < test.txt

Each line in test.txt is read into the $line shell variable in turn.
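
If you would rather keep the two grep calls from the question, one option is to load each grep's output into an array so that every line stays separately addressable. This is only a sketch: it assumes Bash 4 or later for mapfile, and the inline sample data and variable names are illustrative.

```shell
#!/bin/bash
# Inline sample data stands in for the trace file (illustrative only).
trace=$'05:00:00,727,744   ETHER\n|0  \n|00|03|a0|\n05:00:00,727,751   ETHER\n|0  \n|00|03|a0|09|'

# mapfile -t stores one output line per array element, newline removed.
mapfile -t times   < <(grep '..:..:' <<<"$trace")
mapfile -t packets < <(grep '..|..|' <<<"$trace")

# Walk both arrays by index; entry i of each belongs to the same packet.
for i in "${!times[@]}"; do
    echo "${times[i]%% *} -> ${packets[i]}"
done
```

Pairing by index relies on every timestamp line being followed by exactly one packet line, which holds for the sample but is an assumption about the full trace.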

Let's take the next one:

    if [[ $line =~ ^[[:digit:]]{2} ]]

This is an if statement. In Bash, prefer the [[ ... ]] form: unquoted variables inside it are not word-split or glob-expanded, which avoids a whole class of quoting bugs. It also has a bit more power, such as the =~ operator used here.
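
A small sketch of the quoting difference between the two test forms; the variable name and its value are invented for the example.

```shell
#!/bin/bash
value="two words"

# Inside [[ ]] an unquoted variable is not word-split, so the whole
# string is compared even though it contains a space.
if [[ $value == "two words" ]]; then
    echo "[[ ]] compared one string"
fi

# With the older [ ] form the variable has to be quoted; unquoted, it
# would expand to two separate arguments and the test would fail with
# an error.
if [ "$value" = "two words" ]; then
    echo "[ ] works when quoted"
fi
```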

The =~ operator performs a regular expression match. [[:digit:]] matches any single digit, ^ anchors the expression to the beginning of the line, and {2} means exactly two of the preceding item. So if the line starts with two digits (which is your timestamp line), the if clause is executed.

${line% *} is a parameter expansion that strips a suffix. The % removes the shortest match of the glob pattern " *" (a space followed by anything) from the right-hand end of $line. I use this to remove ETHER from the line. The -n tells echo not to print a trailing newline.
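
To see exactly what the expansion strips, here is a short sketch comparing % (shortest match) with %% (longest match) on the sample timestamp line. The variable names short and long are illustrative.

```shell
#!/bin/bash
line="05:00:00,727,744   ETHER"

# % removes the shortest suffix matching " *": only " ETHER" goes,
# so two of the three separating spaces remain.
short=${line% *}

# %% removes the longest such suffix: everything from the first space on.
long=${line%% *}

# Brackets make the leftover whitespace visible.
printf '[%s]\n' "$short" "$long"
```

The first bracketed value still ends in two spaces, while the second is the bare timestamp, so %% may be the better fit if the output spacing matters.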

Let's take my elif which is an else if clause.

    elif [[ $line =~ ^\|[[:digit:]]{2} ]]

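Again, I am matching a regular expression. It starts (after the ^ anchor) with a |. I have to put a backslash in front because | is a special regular expression character (alternation), and the backslash makes it a literal pipe. That is followed by two digits, so this skips the |0 line but catches |00. Because [[:digit:]] matches only 0-9, a packet whose first byte contains a hex letter (a0, say) would be missed; [[:xdigit:]] covers that case.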
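
The two patterns can be tried on sample lines in isolation. The classify function below is a hypothetical helper written for this sketch; the last line shows an assumed [[:xdigit:]] variant that is not in the original answer.

```shell
#!/bin/bash
# classify is a hypothetical helper that applies the answer's two patterns.
classify() {
    if [[ $1 =~ ^[[:digit:]]{2} ]]; then
        echo "timestamp"
    elif [[ $1 =~ ^\|[[:digit:]]{2} ]]; then
        echo "packet"
    else
        echo "other"
    fi
}

classify "05:00:00,727,744   ETHER"   # timestamp
classify "|00|03|a0|"                 # packet
classify "|0"                         # other: only one digit after the pipe
classify "|a0|03|"                    # other: "a" is not a decimal digit

# Assumed variant: [[:xdigit:]] also accepts a-f, so this line matches.
[[ "|a0|03|" =~ ^\|[[:xdigit:]]{2} ]] && echo "packet (xdigit)"
```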

Now, we have to do some calculations:

    length=$(wc -c<<<$line)

The $(...) executes the enclosed command and substitutes its output back into the line. wc -c counts the characters, and <<<$line (a here-string) supplies what we're counting. This gave us 62 characters. We have to subtract 2, then divide by 3. That's the next two lines:

    ((length-=2))
    ((length/=3))

The ((...)) allows me to do integer based math. The first subtracts 2 from $length and the next divides it by 3. Now, I can echo this out:

    echo "$length bytes"

And that's our pure Bash answer to this question.
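
As a variation, the pieces above can be combined so the output matches the exact format asked for in the question (05:00:00.727744 20 bytes). This is a sketch under assumptions, not the original answer's code: the function name format_trace is invented, the timestamp is assumed to always have the form hh:mm:ss,mmm,uuu, and [[:xdigit:]] plus ${#line} are swapped in for [[:digit:]] and wc -c.

```shell
#!/bin/bash
# format_trace is a hypothetical name; it reads a trace on standard input.
format_trace() {
    local line stamp
    while read -r line; do
        if [[ $line =~ ^[[:digit:]]{2}: ]]; then
            stamp=${line%% *}     # keep everything before the first space
            stamp=${stamp/,/.}    # first comma becomes the decimal point
            stamp=${stamp//,/}    # drop the remaining comma
            printf '%s ' "$stamp"
        elif [[ $line =~ ^\|[[:xdigit:]]{2}\| ]]; then
            # ${#line} has no newline, so only the final "|" is subtracted.
            printf '%d bytes\n' $(( (${#line} - 1) / 3 ))
        fi
    done
}

format_trace <<'EOF'
+---------+---------------+----------+
05:00:00,727,744   ETHER
|0  
|00|03|a0|09|5c|1c|00|10|07|df|a4|20|08|00|45|00|00|38|e7|55|
EOF
```

To run it on a real file, redirect the file into the function, for example format_trace < test.txt.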
