Bash: Reading output into a string with special characters

Time: 2020-12-21 21:16:50

I'm using TShark to read TCP streams of a PCAP into a file of a set format. My code:

#!/bin/bash
OUT="*/temp/Temp.txt"
NEW="\"REQ:"
i=0
echo "Generating conversations..."
echo ""  > $OUT
while [ "$COUNT" != 1 ]
do
    BLOCK="$(tshark -r */browser.pcap -q -z follow,tcp,ascii,$i)"
    SUB=$(echo "$BLOCK" | sed -n '5p')
    PORT=${SUB##*:}
    BLOCK="${BLOCK//$'\t'/\"RES:}"
    BLOCK=$(echo "$BLOCK" | tail -n +6)
    BLOCK=$(echo "$BLOCK" | head -n -1)
    COUNT=$(echo "$BLOCK" | wc -l)
    BLOCK=$(echo "$BLOCK" | awk '{print $j"\""}')
    j=1
    while [ $j -lt $(($COUNT+2)) ]
    do
        CHECK=$(echo "$BLOCK" | sed $j'q;d')
        PREF=${CHECK:0:5}
        if [ "$PREF" != "\"RES:" ]; then
            CHECK=$NEW$CHECK
            BLOCK=$(echo "$BLOCK" | sed $j's/.*/'$CHECK'/')
        fi
        j=$(($j+1))
    done
    if [ "$COUNT" != 1 ]; then
        echo ""  >> $OUT
        echo "\$" >> $OUT
        echo "tag = \"gen."$i"\"" >> $OUT
        echo "port = \""$PORT"\"" >> $OUT
        echo "base = \"TCP\"" >> $OUT
        echo "payloads:" >> $OUT
        echo "$BLOCK" >> $OUT
        echo "Generated conversation "$i
    fi
    i=$(($i+1))
done
echo "Generation complete!"

When I run this, I get the following error for each conversation read:

> sed: -e expression #1, char 18: unterminated `s' command

I believe the problem lies in the call to TShark on line 9. Originally I used the "raw" argument for the command, which outputs raw hex data. This worked and output correctly. However, my task requires outputting ASCII data. Changing "raw" to "ascii" (both recognized by TShark) causes the aforementioned errors. I believe this is because the ASCII data in the read packets contains special characters; a small piece of data generated by line 9 in command line is:

..7.<.......Y.|.$.......2...W...v.'#

My question is are the special characters in the ASCII data I'm parsing causing the sed errors? If so, how could I make bash ignore them? Thanks!

Edit- I am ultimately trying to get the output of this TShark command, which looks like this...

===================================================================
Follow: tcp,raw
Filter: tcp.stream eq 4
Node 0: 10.211.55.3:58733
Node 1: 157.127.239.146:80
47455420687474703a2f2f73656d696e617270726f6a656374732e6f72672f6373732e7068703f7374796c6573686565743d393620485454502f312e310d0a486f73743a2073656d696e617270726f6a656374732e6f72670d0a557365722d4167656e743a204d6f7a696c6c612f352e3020285831313b204c696e7578207838365f36343b2072763a33382e3029204765636b6f2f32303130303130312046697265666f782f33382e300d0a4163636570743a20746578742f6373732c2a2f2a3b713d302e310d0a4163636570742d4c616e67756167653a20656e2d55532c656e3b713d302e350d0a4163636570742d456e636f64696e673a20677a69702c206465666c6174650d0a526566657265723a20687474703a2f2f73656d696e617270726f6a656374732e6f72672f632f74736861726b2d666f6c6c6f772d7463702d73747265616d0d0a436f6f6b69653a205f5f6366647569643d646564613432383039663566623634356461663239333963366235336565653764313433373734383236323b206d7962625b6c61737476697369745d3d313433373734383333353b206d7962625b6c6173746163746976655d3d313433373734383333353b207369643d31663739303463373761383761656234363537306131636161316462336161310d0a436f6e6e656374696f6e3a206b6565702d616c6976650d0a0d0a
    485454502f312e3120323030204f4b0d0a446174653a204672692c203234204a756c20323031352031343a33313a303420474d540d0a436f6e74656e742d547970653a20746578742f6373730d0a582d506f77657265642d42793a205048502f352e342e31360d0a5365727665723a20636c6f7564666c6172652d6e67696e780d0a43462d5241593a20323062303533396434326436313365332d4c41580d0a436f6e74656e742d456e636f64696e673a20677a69700d0a436f6e74656e742d4c656e6774683a203134320d0a4167653a20300d0a5669613a20312e31206e657070737730390d0a0d0a1f8b08000000000000036c8cbd0a03211084ebf52916ac13f2db689bcb6b04bd15919caeac060e42de3d981469325f37df305bcf4ee896436b2e067c2af06ebe47e14721837aba0eac8299171683faf88955e05928c8a6733578a82b365e12a1be9c063fefb977ceff27d511a5120d9eeb6a1564273195efe37e37aa970278030000ffff0300cc348afaa1000000
47455420687474703a2f2f7777772e676f6f676c652d616e616c79746963732e636f6d2f616e616c79746963732e6a7320485454502f312e310d0a486f73743a207777772e676f6f676c652d616e616c79746963732e636f6d0d0a557365722d4167656e743a204d6f7a696c6c612f352e3020285831313b204c696e7578207838365f36343b2072763a33382e3029204765636b6f2f32303130303130312046697265666f782f33382e300d0a4163636570743a202a2f2a0d0a4163636570742d4c616e67756167653a20656e2d55532c656e3b713d302e350d0a4163636570742d456e636f64696e673a20677a69702c206465666c6174650d0a526566657265723a20687474703a2f2f73656d696e617270726f6a656374732e6f72672f632f74736861726b2d666f6c6c6f772d7463702d73747265616d0d0a436f6e6e656374696f6e3a206b6565702d616c6976650d0a49662d4d6f6469666965642d53696e63653a205468752c203039204a756c20323031352032333a35303a353620474d540d0a0d0a
    485454502f312e3120333034204e6f74204d6f6469666965640d0a446174653a204672692c203234204a756c20323031352031343a33303a353520474d540d0a457870697265733a204672692c203234204a756c20323031352031353a35313a343120474d540d0a43616368652d436f6e74726f6c3a207075626c69632c206d61782d6167653d373230300d0a566172793a204163636570742d456e636f64696e670d0a436f6e6e656374696f6e3a20636c6f73650d0a5669613a20312e31206e657070737730390d0a0d0a
===================================================================

...into a custom format for a program to read. The above output is in the working raw hex data format. The custom format looks like this for the corresponding conversation:

$
tag = "gen.4"
port = "58733"
base = "TCP"
payloads:
"REQ:47455420687474703a2f2f73656d696e617270726f6a656374732e6f72672f6373732e7068703f7374796c6573686565743d393620485454502f312e310d0a486f73743a2073656d696e617270726f6a656374732e6f72670d0a557365722d4167656e743a204d6f7a696c6c612f352e3020285831313b204c696e7578207838365f36343b2072763a33382e3029204765636b6f2f32303130303130312046697265666f782f33382e300d0a4163636570743a20746578742f6373732c2a2f2a3b713d302e310d0a4163636570742d4c616e67756167653a20656e2d55532c656e3b713d302e350d0a4163636570742d456e636f64696e673a20677a69702c206465666c6174650d0a526566657265723a20687474703a2f2f73656d696e617270726f6a656374732e6f72672f632f74736861726b2d666f6c6c6f772d7463702d73747265616d0d0a436f6f6b69653a205f5f6366647569643d646564613432383039663566623634356461663239333963366235336565653764313433373734383236323b206d7962625b6c61737476697369745d3d313433373734383333353b206d7962625b6c6173746163746976655d3d313433373734383333353b207369643d31663739303463373761383761656234363537306131636161316462336161310d0a436f6e6e656374696f6e3a206b6565702d616c6976650d0a0d0a"
"RES:485454502f312e3120323030204f4b0d0a446174653a204672692c203234204a756c20323031352031343a33313a303420474d540d0a436f6e74656e742d547970653a20746578742f6373730d0a582d506f77657265642d42793a205048502f352e342e31360d0a5365727665723a20636c6f7564666c6172652d6e67696e780d0a43462d5241593a20323062303533396434326436313365332d4c41580d0a436f6e74656e742d456e636f64696e673a20677a69700d0a436f6e74656e742d4c656e6774683a203134320d0a4167653a20300d0a5669613a20312e31206e657070737730390d0a0d0a1f8b08000000000000036c8cbd0a03211084ebf52916ac13f2db689bcb6b04bd15919caeac060e42de3d981469325f37df305bcf4ee896436b2e067c2af06ebe47e14721837aba0eac8299171683faf88955e05928c8a6733578a82b365e12a1be9c063fefb977ceff27d511a5120d9eeb6a1564273195efe37e37aa970278030000ffff0300cc348afaa1000000"
"REQ:47455420687474703a2f2f7777772e676f6f676c652d616e616c79746963732e636f6d2f616e616c79746963732e6a7320485454502f312e310d0a486f73743a207777772e676f6f676c652d616e616c79746963732e636f6d0d0a557365722d4167656e743a204d6f7a696c6c612f352e3020285831313b204c696e7578207838365f36343b2072763a33382e3029204765636b6f2f32303130303130312046697265666f782f33382e300d0a4163636570743a202a2f2a0d0a4163636570742d4c616e67756167653a20656e2d55532c656e3b713d302e350d0a4163636570742d456e636f64696e673a20677a69702c206465666c6174650d0a526566657265723a20687474703a2f2f73656d696e617270726f6a656374732e6f72672f632f74736861726b2d666f6c6c6f772d7463702d73747265616d0d0a436f6e6e656374696f6e3a206b6565702d616c6976650d0a49662d4d6f6469666965642d53696e63653a205468752c203039204a756c20323031352032333a35303a353620474d540d0a0d0a"
"RES:485454502f312e3120333034204e6f74204d6f6469666965640d0a446174653a204672692c203234204a756c20323031352031343a33303a353520474d540d0a457870697265733a204672692c203234204a756c20323031352031353a35313a343120474d540d0a43616368652d436f6e74726f6c3a207075626c69632c206d61782d6167653d373230300d0a566172793a204163636570742d456e636f64696e670d0a436f6e6e656374696f6e3a20636c6f73650d0a5669613a20312e31206e657070737730390d0a0d0a"

1 Solution

#1


You can tell bash to not interpret metacharacters by quoting the variable expansion:

sed $j's/.*/'"$CHECK"'/'

In fact, there is no reason to use single quotes in the above, so you could just double-quote the entire command argument:

sed "${j}s/.*/$CHECK/"
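To see what the quoting changes, here is a small sketch. The helper count_args is hypothetical (it is not part of the original script) and the value of CHECK is made-up sample data; the helper only reports how many arguments a command would receive. With the expansion unquoted, a value containing spaces is split into several words, so sed would receive a truncated first expression, which is consistent with the "unterminated `s' command" error:

```shell
# count_args is an illustrative helper: it prints the number of arguments it was given.
count_args() { echo "$#"; }

j=1
CHECK='"REQ:GET /index.html HTTP/1.1"'   # assumed sample value containing spaces

# Unquoted: the shell splits $CHECK on whitespace, producing three separate arguments.
unquoted=$(count_args $j's/.*/'$CHECK'/')

# Quoted: the whole sed expression stays a single argument.
quoted=$(count_args "${j}s/.*/$CHECK/")

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"
```

Only the first of those split words would be parsed by sed as the script; the remaining words would be taken as file names.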

However, neither quoted form tells sed to avoid interpreting special characters in the replacement part of the s command, so if $CHECK contains a /, that will prematurely terminate the replacement. An & or a \ in $CHECK is also treated specially by sed.
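If sed is kept anyway, the replacement text has to be escaped first. A minimal sketch, assuming the replacement is a single line and / is the delimiter of the outer s command; escape_sed_replacement is a hypothetical helper name and the values of BLOCK and CHECK are made-up sample data:

```shell
# escape_sed_replacement: put a backslash in front of every backslash, slash and
# ampersand, the characters that are special in a s/.../.../ replacement.
# The inner sed uses | as its delimiter so that / needs no escaping in the pattern.
escape_sed_replacement() {
    printf '%s\n' "$1" | sed -e 's|[\\/&]|\\&|g'
}

BLOCK=$'first line\nsecond line\nthird line'
CHECK='"REQ:GET /a/b?x=1&y=2 HTTP/1.1"'   # assumed sample data containing / and &
j=2

# Replace line $j of BLOCK with the literal text of CHECK.
BLOCK=$(printf '%s\n' "$BLOCK" | sed "${j}s/.*/$(escape_sed_replacement "$CHECK")/")
printf '%s\n' "$BLOCK"
```

This does not handle a replacement that itself contains newlines; that case would need additional escaping.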

So the question really is, is there a better way of accomplishing this:

BLOCK=$(echo "$BLOCK" | sed $j's/.*/'$CHECK'/')

Apparently, the goal is to replace line $j of the value of $BLOCK with the value of $CHECK. One way to do this, using awk:

BLOCK="$(echo "$BLOCK" | awk -v repl="$CHECK" -v n="$j" 'NR==n{print repl;next}1')"
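One caveat with awk -v: the assigned value goes through awk's escape-sequence processing, so a backslash in $CHECK may be altered (\t becomes a tab, for example). A sketch of a variant that hands the text to awk through the environment instead; the environment variable name REPL and the sample values are assumptions made for illustration:

```shell
BLOCK=$'line one\nline two\nline three'
CHECK='"REQ:C:\temp\new & /path"'   # assumed sample data with backslashes, & and /
j=2

# ENVIRON["REPL"] gives awk the raw string, with no escape processing and no
# special meaning for / or &. The line number is still passed with -v.
BLOCK=$(printf '%s\n' "$BLOCK" |
        REPL="$CHECK" awk -v n="$j" 'NR == n { print ENVIRON["REPL"]; next } 1')
printf '%s\n' "$BLOCK"
```

Line 2 of BLOCK now holds the value of CHECK byte for byte, and the other lines are passed through unchanged.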

Notes:

  1. Although I didn't fix it in my example, it is very bad style to use ALL CAPS for shell variables. By convention, shell variables in ALL CAPS are reserved for environment variables and for variables set by bash or system utilities (e.g. $PATH, $IFS, $TERM). Your own variables should be lower-case to avoid conflicts.

  2. The full loop that the command is excerpted from could probably all be implemented more efficiently, more cleanly and more understandably in awk. Based on the sample output, the following would probably work:

    echo "Generating conversations..."
    i=0
    while
        tshark -r */browser.pcap -q -z follow,tcp,ascii,$i |
        awk -v idx="$i" '
          NR==4 { n = split($0, a, /:/); port = a[n] }   # port is the last colon-separated field
          NR<6  { next }                                  # skip the header lines
          /^=========/ { exit }                           # closing separator: jump to END
          !started { print "$"
                     printf "tag = \"gen.%d\"\n", idx
                     printf "port = \"%s\"\n", port
                     print "base = \"TCP\""
                     print "payloads:"
                     started = 1
                   }
          /^\t/ { printf "\"RES:%s\"\n", substr($0, 2); next }
                { printf "\"REQ:%s\"\n", $0 }
          END   { exit !started }   # guess: non-zero status stops the loop when no payload was seen
        ' >> "$OUT"
    do
        echo "Generated conversation $i"
        i=$((i+1))
    done
    echo "Generation complete!"
    

    I didn't try it. It may well be buggy. I don't understand the termination condition, so I just made a guess. I'm not sure if you really meant to extract the port number from line 5 (as in the code) or line 4 (as in the example.)
