sed: Using variables across multiple lines

I am attempting to "grep" out the bind operations for a specific user from an LDAP log file. The information I need is spread across multiple lines in the log. Here is example input:

[2009/04/28 17:04:42.414] DoBind on connection 0x7c8affc0
[2009/04/28 17:04:42.414] Bind name:cn=admin,ou=appids,o=admineq, version:3, authentication:simple
[2009/04/28 17:04:42.415] Failed to authenticate local on connection 0x6cc8ee80, err = log account expired (-220)
[2009/04/28 17:04:42.416] Sending operation result 53:"":"NDS error: log account expired (-220)" to connection 0x6cc8ee80
[2009/04/28 17:04:42.416] Operation 0x3:0x60 on connection 0x6cc8ee80 completed in 3 seconds
[2009/04/28 17:04:42.416] Sending operation result 0:"":"" to connection 0x7c8affc0
[2009/04/28 17:04:42.416] Operation 0x1:0x60 on connection 0x7c8affc0 completed in 0 seconds
[2009/04/28 17:04:48.772] DoSearch on connection 0x7c8affc0
[2009/04/28 17:04:48.772] Search request:
base: "o=intranet"
scope:2  dereference:0  sizelimit:0  timelimit:600  attrsonly:0
filter: "(guid='03ADmin)"
attribute: "cn"
attribute: "cn"
attribute: "cn"
attribute: "cn"
attribute: "objectClass"
attribute: "guid"
attribute: "mail"
[2009/04/28 17:04:48.773] Sending operation result 0:"":"" to connection 0x7c8affc0
[2009/04/28 17:04:48.773] Operation 0xe851:0x63 on connection 0x7c8affc0 completed in 0 seconds

For this example the following should be the result:

[2009/04/28 17:04:42.414] DoBind on connection 0x7c8affc0
[2009/04/28 17:04:42.414] Bind name:cn=admin,ou=appids,o=admineq, version:3, authentication:simple
[2009/04/28 17:04:42.416] Sending operation result 0:"":"" to connection 0x7c8affc0
[2009/04/28 17:04:42.416] Operation 0x1:0x60 on connection 0x7c8affc0 completed in 0 seconds

Basically, this is a log of server operations across multiple connections. I need to analyze the time spent in 'bind' operations by the admin user, but this server is very busy, so I need to eliminate a lot of noise.

In pseudocode:

for each line in file
    if line contains "DoBind" and next line contains "cn=admin"
        print both lines
        find the connection number X in lines
        skip lines until "Sending operation result.*to connection X" is found
        print two lines
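A minimal Python sketch of the pseudocode above, assuming the log is read as an iterable of lines; `admin_binds`, `DOBIND` and `RESULT` are hypothetical names. One deliberate departure from the pseudocode: instead of skipping ahead to a single result line, it keeps a set of connections whose admin bind is still open, so a second admin bind that starts before the first one finishes is not lost.

```python
import re

# Hypothetical pattern names; connection ids look like 0x7c8affc0 in the sample.
DOBIND = re.compile(r"DoBind on connection (0x[0-9a-f]+)")
RESULT = re.compile(r"Sending operation result .* to connection (0x[0-9a-f]+)")


def admin_binds(lines, user="cn=admin"):
    """Yield the DoBind, Bind name, result and completion lines of each bind by `user`."""
    pending = set()      # connections whose admin bind has not reported a result yet
    prev = None          # (connection id, line) if the previous line was a DoBind
    print_next = False   # the line after a matching result ("Operation ... completed")
    for raw in lines:
        line = raw.rstrip("\n")
        if print_next:
            yield line
            print_next = False
            prev = None
            continue
        m = DOBIND.search(line)
        if m:
            prev = (m.group(1), line)
            continue
        # "if line contains DoBind and next line contains cn=admin": print both
        if prev is not None and "Bind name:" + user in line:
            pending.add(prev[0])
            yield prev[1]
            yield line
        prev = None
        # "skip lines until Sending operation result ... to connection X"
        m = RESULT.search(line)
        if m and m.group(1) in pending:
            pending.discard(m.group(1))  # later results on this connection are ignored
            yield line
            print_next = True
```

Run over the sample input, this should yield the four lines of the expected result; `for line in admin_binds(open("logfile")): print(line)` is one way to drive it from a file (the file name is an assumption).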

I would like to get the "DoBind" lines that are immediately followed by the "Bind name" line for user "cn=admin", and then the result lines for the same connection number ("0x7c8affc0" in this example). Other operations that I do not need may take place between the beginning and end of the bind, such as the "Failed to authenticate" message, which takes place on a different connection.

Furthermore, other operations that I'm not interested in will take place on the same connection after the bind is done. In the example above, the results of the DoSearch operation that happens after the bind must not be captured.

I'm trying to do this with 'sed', which seemed like the right tool for the job. Alas, though, I'm a beginner and this is a learning experience. Here's what I have so far:

/.*DoBind on connection \(0x[0-9a-f]*\)\n.*Bind name:cn=OblixAppId.*/ p
/.*Sending operation result.*to connection \1\nOperation.*on connection \1 completed.*/ p

sed complains about the second line, where I use '\1'. I'm trying to capture the connection address and use it in a subsequent search to capture the result strings, but I'm obviously not using it correctly. The back-reference variables (\1, \2, ...) seem to be local to each search operation.

Is there a way to pass "variables" from one search to another or should I be learning perl instead?
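For comparison, in a general-purpose language the captured connection id is just an ordinary variable that outlives the match, so it can be spliced into the next search. A small Python sketch; `result_pattern_for` is a hypothetical helper, and the regexes assume the log format shown above.

```python
import re


def result_pattern_for(dobind_line):
    """Hypothetical helper: turn a DoBind line into a regex for its result line."""
    m = re.search(r"DoBind on connection (0x[0-9a-f]+)", dobind_line)
    if m is None:
        return None
    conn = m.group(1)  # plain variable: still available after this match is gone
    # re.escape keeps the captured text literal when it is spliced into a new regex
    return re.compile(r"Sending operation result .* to connection " + re.escape(conn) + r"\b")
```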

4 solutions

#1

As an intellectual challenge, I have come up with a solution using sed (as requested), but I would say that using some other technology (Perl being my favorite) would be easier to understand, and hence easier to support.

You have a couple of options when it comes to multi-line processing in sed:

you can use the hold space - which can be used to store all or part of the pattern space for subsequent processing, or

you can append further lines to the pattern space using commands like N.

Note: the example below uses GNU sed. It can also be made to work with Solaris sed by changing the multi-command syntax (';' replaced with a newline). I have used the GNU sed variation to make the script more compact.

The script below is commented, for the reader's benefit and mine.

sed -n '
# if we see the line "DoBind" then store the pattern in the hold space
/DoBind/ h

# if we see the line "cn=admin", append the pattern to the hold space
# and branch to dobind (the ";" ends the label before the closing brace)
/cn=admin/{H;b dobind;}

# if we see the pattern "Sending...." append the hold space to the
# pattern and branch to doop
/Sending operation result/{G;b doop;}

# branch to the end of the script
b

# we have just seen a cn=admin, and the hold space contains the last
# two lines
:dobind

# swap hold space with pattern space
x

# print out the pattern space
p

# strip off everything that is not the connection identifier
s/^.*connection //
s/\n.*$//

# put it in the hold space
x

# branch to end of script.
b

# have just seen "Sending operation" and the current stored connection
# identifier has been appended to the pattern space
:doop

# does the connection id on both lines match? If yes, go to gotop.
/connection \(0x[0-9a-f]*\).*\n\1$/ b gotop

# branch to end of script
b

# pattern contains two lines "Sending....", and the connection id.
:gotop

# delete the second line
s/\n.*$//

# read the next line and append it to the pattern space.
N

# print it out
p

# clear the pattern space, and put it into the hold space - hence
# clearing the hold space
s/^.*$//
x
' logfile
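To see what the script does with its two buffers, its steps can be replayed in Python, with `hold` standing in for the hold space and the loop variable for the pattern space. This is a hypothetical model (`sed_script_model` is not part of any tool); the `(?s)` flag mirrors GNU sed, where `.` also matches the newline embedded in the pattern space.

```python
import re


def sed_script_model(lines):
    """Hypothetical model: replay the sed script's pattern/hold-space steps."""
    hold = ""   # sed's hold space starts out empty
    out = []    # what the `p` commands would print
    it = (line.rstrip("\n") for line in lines)
    for pattern in it:
        if "DoBind" in pattern:                      # /DoBind/ h
            hold = pattern
        if "cn=admin" in pattern:                    # /cn=admin/{H;b dobind;}
            both = hold + "\n" + pattern             # H
            out.extend(both.split("\n"))             # :dobind  x; p
            conn = re.sub(r"(?s)^.*connection ", "", both)
            conn = re.sub(r"(?s)\n.*$", "", conn)    # the two s/// commands
            hold = conn                              # x: the id goes back to hold
            continue                                 # b (end of script)
        if "Sending operation result" in pattern:    # /Sending operation result/{G;b doop;}
            joined = pattern + "\n" + hold           # G
            if re.search(r"connection (0x[0-9a-f]*).*\n\1$", joined):  # :doop
                nxt = next(it, None)                 # N (at end of input, sed -n stops here)
                if nxt is None:
                    break
                out.extend([pattern, nxt])           # :gotop  s/\n.*$//; N; p
                hold = ""                            # s/^.*$//; x clears the hold space
    return out
```

Fed the sample input, the model should return the same four lines as the expected result, which makes it a convenient way to check what each sed command contributes.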

#2

You're going to want to look closely at a sed reference if you want to do it in one pass; you could certainly do it. Look into the sed commands that swap the hold and pattern buffers, and compare the two. You could write a multi-step rule that matches "cn=admin" and swaps it to the hold buffer, and then match the "DoBind" pattern when the hold buffer is not empty.

I can't remember the commands offhand, but it's not terribly complicated; you'll just need to look it up in the reference documentation.

#3

fgrep -B1 cn=admin logfile | 
sed -n 's/.*DoBind on connection \(.*\)/\1/p' | 
fgrep -wf - logfile

The first fgrep extracts the Bind line and the line before it (-B1), the sed pulls out the connection number, and the final fgrep finds all lines that contain one of the connection numbers as a whole word (-w).

This is a two-pass solution; a one-pass solution is possible but more complicated to implement.
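The pipeline's two passes can be mirrored in Python to make each step explicit; `two_pass` is a hypothetical name, and the `(?<!\w)...(?!\w)` lookarounds only approximate `fgrep -w`. Like the pipeline, the second pass keeps every line that mentions the connection, so on the sample input it also returns the later DoSearch lines that the question wants excluded.

```python
import re


def two_pass(lines, user="cn=admin"):
    """Hypothetical mirror of: fgrep -B1 user | sed ... | fgrep -wf - logfile."""
    lines = [line.rstrip("\n") for line in lines]
    # Pass 1: for each line containing `user`, look at the line before it (-B1)
    # and pull the connection number out of a DoBind line (the sed step).
    conns = set()
    for before, current in zip(lines, lines[1:]):
        if user in current:
            m = re.search(r".*DoBind on connection (.*)", before)
            if m:
                conns.add(m.group(1))
    if not conns:
        return []  # an empty alternation would otherwise match every line
    # Pass 2: keep every line that contains one of the ids as a whole word (-w).
    word = re.compile(r"(?<!\w)(?:" + "|".join(map(re.escape, sorted(conns))) + r")(?!\w)")
    return [line for line in lines if word.search(line)]
```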

Edit: Here's a solution in Python that does what you want. Note, however, that it is not fully correct, as it won't handle log lines from different connections that are interleaved; I'll leave it up to you to fix that if you care enough. It's also a bit inefficient, and does more regex compiles and matches than necessary.

import re

todo = set()             # connection ids of admin binds still waiting for a result
display_next = False     # print the line after a matching result line
previous_dobind = None   # (connection id, line) of the last DoBind line

with open('logfile') as f:
  for line in f:
    line = line.strip()
    if display_next:
      print(line)
      display_next = False
      continue
    dobind = re.search(r'DoBind on connection (.*)', line)
    bind = re.search(r'Bind name:cn=admin', line)
    oper = re.search(r'Sending operation result.*to connection (.*)', line)
    if dobind:
      previous_dobind = (dobind.group(1), line)
    elif previous_dobind:
      if bind:
        todo.add(previous_dobind[0])
        print(previous_dobind[1])
        print(line)
      previous_dobind = None
    elif oper:
      conn = oper.group(1)
      if conn in todo:
        print(line)
        display_next = True
        todo.remove(conn)

#4

Well, I couldn't find a solution with sed alone. Here's my ugly perl solution:

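use strict;
use warnings;

my $file = shift @ARGV or die "usage: $0 logfile\n";
open my $in, '<', $file or die "Couldn't open file $file: $!";
while (<$in>) {
  # a DoBind line: remember it and its connection id
  if (/(.*DoBind on connection (0x[0-9a-f]*))/) {
    my ($potentialmatch, $connid) = ($1, $2);
    my $currentline = <$in>;
    last unless defined $currentline;
    # only binds by the user of interest are reported
    if ($currentline =~ /(.*Bind name:cn=OblixAppId.*)/) {
      print "$potentialmatch\n$1\n";
      # remember this position so the lines between the bind and its
      # result are re-read by the outer loop afterwards
      my $offset = tell $in;
      while ($currentline = <$in>) {
        if ($currentline =~ /(.*Sending operation result.*to connection $connid.*)/) {
          print "$1\n";
          next;
        }
        if ($currentline =~ /(.*Operation.*on connection $connid completed.*)/) {
          print "$1\n";
          seek $in, $offset, 0;
          last;
        }
      }
    }
  }
}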
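The look-ahead-then-rewind idea behind `tell`/`seek` can be sketched in Python over an in-memory list of lines, where a list index plays the role of the file offset. `lookahead_binds` is a hypothetical name, and the user defaults to `cn=admin` to match the sample input (the Perl above uses `cn=OblixAppId`). Unlike the Perl, a line after DoBind that is not the wanted Bind line is not consumed, so it is still examined on its own.

```python
import re

DOBIND = re.compile(r"DoBind on connection (0x[0-9a-f]+)")


def lookahead_binds(lines, user="cn=admin"):
    """Hypothetical sketch: report each bind by `user`, scanning ahead for its result."""
    lines = [line.rstrip("\n") for line in lines]
    out = []
    i = 0
    while i < len(lines):
        m = DOBIND.search(lines[i])
        if not (m and i + 1 < len(lines) and "Bind name:" + user in lines[i + 1]):
            i += 1
            continue
        conn = re.escape(m.group(1))
        out += [lines[i], lines[i + 1]]
        i += 2  # like `tell`: this is where the main loop resumes
        sending = re.compile(r"Sending operation result.*to connection " + conn + r"\b")
        done = re.compile(r"Operation.*on connection " + conn + r" completed")
        for later in lines[i:]:  # scan ahead without moving i
            if sending.search(later):
                out.append(later)
            elif done.search(later):
                out.append(later)
                break  # like `seek ...; last`: resume right after the Bind line
    return out
```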
