I am attempting to "grep" out bind operations for a specific user from an LDAP log file. The information I need is spread across several lines in the log. Here is example input:


[2009/04/28 17:04:42.414] DoBind on connection 0x7c8affc0
[2009/04/28 17:04:42.414] Bind name:cn=admin,ou=appids,o=admineq, version:3, authentication:simple
[2009/04/28 17:04:42.415] Failed to authenticate local on connection 0x6cc8ee80, err = log account expired (-220)
[2009/04/28 17:04:42.416] Sending operation result 53:"":"NDS error: log account expired (-220)" to connection 0x6cc8ee80
[2009/04/28 17:04:42.416] Operation 0x3:0x60 on connection 0x6cc8ee80 completed in 3 seconds
[2009/04/28 17:04:42.416] Sending operation result 0:"":"" to connection 0x7c8affc0
[2009/04/28 17:04:42.416] Operation 0x1:0x60 on connection 0x7c8affc0 completed in 0 seconds
[2009/04/28 17:04:48.772] DoSearch on connection 0x7c8affc0
[2009/04/28 17:04:48.772] Search request:
base: "o=intranet"
scope:2  dereference:0  sizelimit:0  timelimit:600  attrsonly:0
filter: "(guid='03ADmin)"
attribute: "cn"
attribute: "cn"
attribute: "cn"
attribute: "cn"
attribute: "objectClass"
attribute: "guid"
attribute: "mail"
[2009/04/28 17:04:48.773] Sending operation result 0:"":"" to connection 0x7c8affc0
[2009/04/28 17:04:48.773] Operation 0xe851:0x63 on connection 0x7c8affc0 completed in 0 seconds

For this example the following should be the result:


[2009/04/28 17:04:42.414] DoBind on connection 0x7c8affc0
[2009/04/28 17:04:42.414] Bind name:cn=admin,ou=appids,o=admineq, version:3, authentication:simple
[2009/04/28 17:04:42.416] Sending operation result 0:"":"" to connection 0x7c8affc0
[2009/04/28 17:04:42.416] Operation 0x1:0x60 on connection 0x7c8affc0 completed in 0 seconds

Basically, this is a log of server operations across multiple connections. I need to analyze the time spent in 'bind' operations by the admin user, but this server is very busy so I need to eliminate a lot of noise.


In pseudocode:

for each line in file
    if line contains "DoBind" and next line contains "cn=admin"
        print both lines
        find the connection number X in the DoBind line
        skip lines until "Sending operation result.*to connection X" is found
        print that line and the one after it
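For comparison, the pseudocode translates almost line for line into a short Python function. This is only a sketch: it assumes the whole log fits in memory as a list of lines, and the name `follow_pseudocode` is made up for illustration. Like the pseudocode, it assumes that the line right after the matching result is the "Operation ... completed" line for the same connection.

```python
import re


def follow_pseudocode(lines, user="cn=admin"):
    """Return the DoBind/Bind pair for `user` plus the two result lines."""
    out = []
    i = 0
    while i < len(lines) - 1:
        m = re.search(r"DoBind on connection (0x[0-9a-f]+)", lines[i])
        if m and user in lines[i + 1]:
            out += [lines[i], lines[i + 1]]      # print both lines
            conn = m.group(1)                    # the connection number X
            result = re.compile(r"Sending operation result.*to connection "
                                + re.escape(conn) + r"$")
            j = i + 2
            while j < len(lines) and not result.search(lines[j]):
                j += 1                           # skip lines from other connections
            out += lines[j:j + 2]                # the result line and the one after it
        i += 1
    return out
```

Because it works on a list, it can scan ahead for the result without losing its place, which is exactly what is awkward to express in a stream editor.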

I want the "DoBind" line that is followed by the "Bind name:cn=admin" line, and then the result lines, which are identified by the connection number ("0x7c8affc0" in this example). Other operations that I do not need may take place between the beginning and end of the bind, such as the "Failed to authenticate" message, which belongs to a different connection.


Furthermore, other operations that I'm not interested in will take place on the same connection after the bind is done. In the example above, the results of the DoSearch operation that follows the bind must not be captured.


I'm trying to do this with 'sed', which seemed like the right tool for the job. Alas, I'm a beginner and this is a learning experience. Here's what I have so far:


/.*DoBind on connection \(0x[0-9a-f]*\)\n.*Bind name:cn=admin.*/ p
/.*Sending operation result.*to connection \1\nOperation.*on connection \1 completed.*/ p

sed complains about the second line, where I use '\1'. I'm trying to capture the connection address and use it in a subsequent search to capture the result lines, but I'm obviously not using it correctly. The numbered backreferences (\1, \2, ...) seem to be local to each search operation.


Is there a way to pass "variables" from one search to another, or should I be learning perl instead?
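The usual workaround in a scripting language is to copy the captured text out of the first match into an ordinary variable and build the second pattern from it, since a backreference such as \1 only refers to a group in the same regular expression. A minimal Python sketch of that idea (the two sample lines are taken from the log above; the variable names are illustrative):

```python
import re

dobind_line = "[2009/04/28 17:04:42.414] DoBind on connection 0x7c8affc0"
result_line = '[2009/04/28 17:04:42.416] Sending operation result 0:"":"" to connection 0x7c8affc0'

# First search: the group only lives inside this match object...
m = re.search(r"DoBind on connection (0x[0-9a-f]+)", dobind_line)
conn = m.group(1)  # ...so save it in a normal variable

# Second search: a new pattern built from the saved value.
# re.escape protects against regex metacharacters in the captured text.
pattern = r"Sending operation result.*to connection " + re.escape(conn) + r"\b"
print(bool(re.search(pattern, result_line)))  # True
```

sed has no variables, so a sed solution has to carry the value in the hold space instead and bring both strings together into one pattern space before a single regex can compare them with \1.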


As an intellectual challenge, I have come up with a solution using sed (as requested), but I would say that some other technology (perl is my favorite) would be easier to understand, and hence easier to support.


You have a couple of options when it comes to multi-line processing in sed:


  • you can use the hold space - which can be used to store all or part of the pattern space for subsequent processing, or

  • you can append further lines to the pattern space using commands like N.
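If the two buffers are unfamiliar, it may help to think of them as two string variables. The toy class below (`SedBuffers` is a hypothetical name, not a real library) models what the relevant commands do according to GNU sed's documented behavior: h and H copy or append the pattern space to the hold space, g and G do the reverse, x swaps them, and N appends the next input line to the pattern space. It is an illustration only, not a sed implementation.

```python
class SedBuffers:
    """Toy model of sed's pattern space and hold space."""

    def __init__(self):
        self.pattern = ""  # pattern space: the current line, plus anything appended
        self.hold = ""     # hold space: starts empty and survives between lines

    def h(self):  # copy pattern space to hold space
        self.hold = self.pattern

    def H(self):  # append a newline and the pattern space to the hold space
        self.hold += "\n" + self.pattern

    def g(self):  # copy hold space to pattern space
        self.pattern = self.hold

    def G(self):  # append a newline and the hold space to the pattern space
        self.pattern += "\n" + self.hold

    def x(self):  # exchange the two buffers
        self.pattern, self.hold = self.hold, self.pattern

    def N(self, next_line):  # append a newline and the next input line
        self.pattern += "\n" + next_line


buf = SedBuffers()
buf.pattern = "DoBind on connection 0x7c8affc0"
buf.h()                            # hold space now holds the DoBind line
buf.pattern = "Bind name:cn=admin"
buf.H()                            # hold space: DoBind line, newline, Bind line
buf.x()                            # pattern space now holds both lines
print(buf.pattern)
```

Note that G with an empty hold space still appends a newline, so the pattern space ends with an empty line.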




Note: the example below uses GNU sed. It can also be made to work with Solaris sed by changing the multi-command syntax (';' replaced with a newline). I have used the GNU sed variation to make the script more compact.


The script below is commented, for the reader's benefit and mine.


sed -n '
# if we see the line "DoBind" then store the pattern in the hold space
/DoBind/ h

# if we see the line "cn=admin", append the pattern to the hold space
# and branch to dobind
/cn=admin/{H;b dobind;}

# if we see the pattern "Sending...." append the hold space to the
# pattern and branch to doop
/Sending operation result/{G;b doop;}

# branch to the end of the script
b

# we have just seen a cn=admin, and the hold space contains the last
# two lines
:dobind

# swap hold space with pattern space
x

# print out the pattern space
p

# strip off everything that is not the connection identifier
s/^.*connection //
s/\n.*//

# put it in the hold space
h

# branch to end of script.
b

# have just seen "Sending operation" and the current stored connection
# identifier has been appended to the pattern space
:doop

# does the connection id on both lines match? If yes, go to gotop.
/connection \(0x[0-9a-f]*\).*\n\1$/ b gotop

# branch to end of script
b

# pattern contains two lines "Sending....", and the connection id.
:gotop

# delete the second line
s/\n.*//

# read the next line and append it to the pattern space.
N

# print it out
p

# clear the pattern space, and put it into the hold space - hence
# clearing the hold space
s/.*//
h
' logfile


You're going to want to look closely at a sed reference if you want it in one pass - you could certainly do it. Look into the sed commands that swap the hold and pattern buffers, and compare the two. You could write a multi-step rule that matches "cn=admin", and swaps it to the hold buffer, and then match the "DoBind" pattern when the hold buffer is not empty.


I can't remember the commands offhand, but it's not terribly complicated; you'll just need to look it up in the reference documentation.



fgrep -B1 cn=admin logfile | 
sed -n 's/.*DoBind on connection \(.*\)/\1/p' | 
fgrep -wf - logfile

The first fgrep extracts the Bind line and the line before it (-B1), the sed pulls out the connection number, and the final fgrep finds all lines that contain one of the connection numbers.
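The same two passes can be written out in Python, which also makes the pipeline's behavior easy to check. This is a sketch under stated assumptions: grep's whole-word matching (-w) is approximated with `\b`, and the function name `two_pass` is made up. On the example log it returns every line that mentions 0x7c8affc0 as a word, so it also includes the later DoSearch lines on that connection, and it leaves out the "Bind name" line, which contains no connection id.

```python
import re


def two_pass(lines, user="cn=admin"):
    """Python equivalent of: fgrep -B1 USER | sed ... | fgrep -wf - logfile."""
    # Pass 1: for every line containing the user, take the line before it
    # and keep the connection id if it is a DoBind line.
    ids = set()
    for prev, cur in zip(lines, lines[1:]):
        if user in cur:
            m = re.search(r"DoBind on connection (.*)", prev)
            if m:
                ids.add(m.group(1))
    if not ids:
        return []  # an empty pattern list matches nothing
    # Pass 2: every line containing one of the ids as a whole word.
    word = re.compile(r"\b(?:" + "|".join(map(re.escape, ids)) + r")\b")
    return [line for line in lines if word.search(line)]
```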


This is a two-pass solution; a one-pass solution is possible but more complicated to implement.


Edit: Here's a solution in Python that does what you want. Note, however, that it is not fully correct: it won't correctly handle log lines from different connections that are interleaved. I'll leave it up to you to fix that if you care enough. It's also a bit inefficient, and does more regex compiles and matches than necessary.


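import re

todo = set()            # connections with an admin bind waiting for its result
display_next = False    # print the line after a matching "Sending operation result"
previous_dobind = None  # (connection id, line) of the most recent DoBind line

for line in open('logfile'):
    line = line.strip()
    if display_next:
        print(line)
        display_next = False
    dobind = re.search('DoBind on connection (.*)', line)
    bind = re.search('Bind name:cn=admin', line)
    oper = re.search('Sending operation result.*to connection (.*)', line)
    if dobind:
        previous_dobind = (dobind.group(1), line)
    elif previous_dobind:
        if bind:
            print(previous_dobind[1])
            print(line)
            todo.add(previous_dobind[0])  # remember the connection of this bind
        previous_dobind = None
    elif oper:
        conn = oper.group(1)
        if conn in todo:
            print(line)
            display_next = True
            todo.discard(conn)  # ignore later operations on this connection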
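One way to address the interleaving caveat is to key every decision on the connection id rather than on "the next line". The sketch below is a variation on the code above (function and variable names are illustrative): after printing a result line for an admin bind, it waits for the "Operation ... on connection X completed" line of that same connection, so lines from other connections in between are skipped. It still assumes the "Bind name" line immediately follows its DoBind line, because the "Bind name" line carries no connection id.

```python
import re

DOBIND = re.compile(r"DoBind on connection (\S+)")
RESULT = re.compile(r"Sending operation result.*to connection (\S+)$")
DONE = re.compile(r"Operation \S+ on connection (\S+) completed")


def admin_binds_interleaved(lines, user="Bind name:cn=admin"):
    """Collect admin bind lines, tracking progress per connection id."""
    out = []
    prev_dobind = None       # (conn, line) of a DoBind whose Bind line may follow
    awaiting_result = set()  # connections with an admin bind in progress
    awaiting_done = set()    # connections whose bind result was already printed
    for line in lines:
        line = line.rstrip("\n")
        m = DOBIND.search(line)
        if m:
            prev_dobind = (m.group(1), line)
            continue
        if prev_dobind and user in line:
            out += [prev_dobind[1], line]
            awaiting_result.add(prev_dobind[0])
        prev_dobind = None
        m = RESULT.search(line)
        if m and m.group(1) in awaiting_result:
            out.append(line)
            awaiting_result.discard(m.group(1))
            awaiting_done.add(m.group(1))
            continue
        m = DONE.search(line)
        if m and m.group(1) in awaiting_done:
            out.append(line)
            awaiting_done.discard(m.group(1))  # later operations are ignored
    return out
```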


Well, I couldn't find a solution with sed alone. Here's my ugly perl solution:


open INFILE, $ARGV[0] or die "Couldn't open file $ARGV[0]";
while (<INFILE>) {
  if (/(.*DoBind on connection (0x[0-9a-f]*))/) {
    $potentialmatch = $1; $connid = $2;
    $currentline = <INFILE>;
    if ($currentline =~ /(.*Bind name:cn=admin.*)/) {
      print $potentialmatch . "\n" . $1 . "\n";
      # remember where we are, so we can come back after scanning ahead
      $offset = tell INFILE;
      while ($currentline = <INFILE>) {
        if ($currentline =~ /(.*Sending operation result.*to connection $connid.*)/) {
          print "$1\n";
        }
        if ($currentline =~ /(.*Operation.*on connection $connid completed.*)/) {
          print "$1\n";
          last;    # the bind is finished; stop scanning ahead
        }
      }
      # rewind to just after the Bind line and keep looking for DoBinds
      seek INFILE, $offset, 0;
    }
  }
}
close INFILE;


