Thin server process hanging at 100% CPU, apparently in a regex loop. Where can I get more debugging info?

Time: 2021-02-01 22:05:04

I have a gdb backtrace on it that yields this:

#0  match_at (reg=0xcce4a00, 
    str=0xd47b101 "206193045.1297252703.66.40.utmcsr=sendmail|utmccn=52%%20off|utmcmd=email|utmctr=View%20this|utmcct=52%%20off", end=0xd47b1a6 "", 
    sstart=0xd47b101 "206193045.1297252703.66.40.utmcsr=sendmail|utmccn=52%%20off|utmcmd=email|utmctr=View%20this|utmcct=52%%20off", 
    sprev=0xd47b131 "52%%20off|utmcmd=email|utmctr=View%20this|utmcct=52%%20off", 
    msa=0x7fff7bc66870) at regexec.c:2433
#1  0x00002b785390329c in onig_search (reg=0xcce4a00, 
    str=0xd47b101 "206193045.1297252703.66.40.utmcsr=sendmail|utmccn=52%%20off|utmcmd=email|utmctr=View%20this|utmcct=52%%20off", end=0xd47b1a6 "", start=<value optimized out>, 
    range=0xd47b102 "06193045.1297252703.66.40.utmcsr=sendmail|utmccn=52%%20off|utmcmd=email|utmctr=View%20this|utmcct=52%%20off", region=0x7fff7bc66990, option=0) at regexec.c:3646
#2  0x00002b78538ef51e in rb_reg_search (re=137570880, str=218457520, pos=0, reverse=0) at re.c:1372
#3  0x00002b78538efc42 in rb_reg_match (re=214846549, str=218457520) at re.c:2740
#4  0x00002b7853971c36 in vm_exec_core (th=0x8329020, initial=<value optimized out>) at insns.def:2089
#5  0x00002b7853979b99 in vm_exec (th=0xcce4c55) at vm.c:1147
#6  0x00002b7853970cd9 in invoke_block_from_c (th=0x8329020, block=0x2b78572fea38, self=174397400, argc=1, argv=0x0, blockptr=0x0, cref=0x0)
    at vm.c:558
#7  0x00002b785397b037 in vm_yield (val=218457520) at vm.c:588
#8  rb_yield_0 (val=218457520) at vm_eval.c:740
#9  rb_yield (val=218457520) at vm_eval.c:750
#10 0x00002b7853840926 in rb_ary_collect (ary=218457680) at array.c:2166
#11 0x00002b78539716aa in vm_call_cfunc (th=0x8329020, cfp=0x2b78572fea10, num=0, blockptr=0x2b78572fea39, flag=0, id=1544, me=0x83e0ac0, 
    recv=218457680) at vm_insnhelper.c:402
#12 vm_call_method (th=0x8329020, cfp=0x2b78572fea10, num=0, blockptr=0x2b78572fea39, flag=0, id=1544, me=0x83e0ac0, recv=218457680)
    at vm_insnhelper.c:524
#13 0x00002b785397438c in vm_exec_core (th=0x8329020, initial=<value optimized out>) at insns.def:1006
#14 0x00002b7853979b99 in vm_exec (th=0xcce4c55) at vm.c:1147
#15 0x00002b7853970cd9 in invoke_block_from_c (th=0x8329020, block=0x2b78572feb40, self=174397400, argc=1, argv=0x0, blockptr=0x0, cref=0x0)
    at vm.c:558
#16 0x00002b785397b037 in vm_yield (val=218460560) at vm.c:588
#17 rb_yield_0 (val=218460560) at vm_eval.c:740
#18 rb_yield (val=218460560) at vm_eval.c:750
#19 0x00002b785383bc9c in rb_ary_each (ary=218460960) at array.c:1427
#20 0x00002b78539716aa in vm_call_cfunc (th=0x8329020, cfp=0x2b78572feb18, num=0, blockptr=0x2b78572feb41, flag=0, id=424, me=0x83dfb90, 
    recv=218460960) at vm_insnhelper.c:402
#21 vm_call_method (th=0x8329020, cfp=0x2b78572feb18, num=0, blockptr=0x2b78572feb41, flag=0, id=424, me=0x83dfb90, recv=218460960)
    at vm_insnhelper.c:524
#22 0x00002b785397438c in vm_exec_core (th=0x8329020, initial=<value optimized out>) at insns.def:1006
#23 0x00002b7853979b99 in vm_exec (th=0xcce4c55) at vm.c:1147
#24 0x00002b785396b4e5 in vm_call0 (th=0x8329020, recv=180469600, id=384, argc=2, argv=0x7fff7bc67990, me=0xd47b130) at vm_eval.c:66
#25 0x00002b785396d354 in vm_method_missing (th=0x8329020, id=1513486, recv=180469600, num=<value optimized out>, blockptr=0x0, opt=0)
    at vm_insnhelper.c:448
#26 0x00002b78539715e3 in vm_call_method (th=0x8329020, cfp=0x0, num=1, blockptr=0x0, flag=0, id=5912, me=0x0, recv=180469600)
    at vm_insnhelper.c:666
#27 0x00002b785397438c in vm_exec_core (th=0x8329020, initial=<value optimized out>) at insns.def:1006
#28 0x00002b7853979b99 in vm_exec (th=0xcce4c55) at vm.c:1147
#29 0x00002b7853970cd9 in invoke_block_from_c (th=0x8329020, block=0x2b78572ffa60, self=218506160, argc=1, argv=0x0, blockptr=0x0, cref=0x0)
    at vm.c:558
#30 0x00002b785397ad01 in rb_yield_0 (tag=30793742, data=<value optimized out>) at vm.c:588
#31 catch_i (tag=30793742, data=<value optimized out>) at vm_eval.c:1459
#32 0x00002b7853968483 in rb_catch_obj (tag=30793742, func=0x2b785397acc0 <catch_i>, data=0) at vm_eval.c:1534
#33 0x00002b785396978d in rb_f_catch (argc=<value optimized out>, argv=<value optimized out>) at vm_eval.c:1510
#34 0x00002b78539716aa in vm_call_cfunc (th=0x8329020, cfp=0x2b78572ffa38, num=1, blockptr=0x2b78572ffa61, flag=8, id=2840, me=0x83b2250, 
    recv=218506160) at vm_insnhelper.c:402
#35 vm_call_method (th=0x8329020, cfp=0x2b78572ffa38, num=1, blockptr=0x2b78572ffa61, flag=8, id=2840, me=0x83b2250, recv=218506160)
    at vm_insnhelper.c:524
#36 0x00002b785397438c in vm_exec_core (th=0x8329020, initial=<value optimized out>) at insns.def:1006
#37 0x00002b7853979b99 in vm_exec (th=0xcce4c55) at vm.c:1147
#38 0x00002b785396b4e5 in vm_call0 (th=0x8329020, recv=218506160, id=40296, argc=1, argv=0x7fff7bc685e0, me=0xd47b130) at vm_eval.c:66
#39 0x00002b785396bf9a in rb_call0 (recv=218506160, mid=40296, n=1) at vm_eval.c:235
#40 rb_call (recv=218506160, mid=40296, n=1) at vm_eval.c:438
#41 rb_funcall (recv=218506160, mid=40296, n=1) at vm_eval.c:638
#42 0x00002aaaac565bb6 in event_callback_wrapper (a1=<value optimized out>, a2=<value optimized out>, a3=<value optimized out>, 
    a4=<value optimized out>) at rubymain.cpp:162
#43 0x00002aaaac555688 in ConnectionDescriptor::_DispatchInboundData (this=0xca4a060, buffer=0x7fff7bc654e0 "\001", size=1402620608)
    at ed.cpp:770
#44 0x00002aaaac55571f in ConnectionDescriptor::Read (this=0xca4a060) at ed.cpp:718
#45 0x00002aaaac55e969 in EventMachine_t::_RunEpollOnce (this=0xc874530) at em.cpp:488
#46 0x00002aaaac55ec46 in EventMachine_t::_RunOnce (this=0xcce4c55) at em.cpp:451
#47 0x00002aaaac55ec93 in EventMachine_t::Run (this=0xc874530) at em.cpp:432
#48 0x00002aaaac565629 in t_run_machine_without_threads (self=214846549) at rubymain.cpp:185
#49 0x00002b78539716aa in vm_call_cfunc (th=0x8329020, cfp=0x2b78572ffbf0, num=0, blockptr=0x1, flag=24, id=39400, me=0x9055d80, 
    recv=139822160) at vm_insnhelper.c:402
#50 vm_call_method (th=0x8329020, cfp=0x2b78572ffbf0, num=0, blockptr=0x1, flag=24, id=39400, me=0x9055d80, recv=139822160)
    at vm_insnhelper.c:524
#51 0x00002b785397438c in vm_exec_core (th=0x8329020, initial=<value optimized out>) at insns.def:1006
#52 0x00002b7853979b99 in vm_exec (th=0xcce4c55) at vm.c:1147
#53 0x00002b7853979ff7 in rb_iseq_eval (iseqval=145595680) at vm.c:1374
#54 0x00002b785386d741 in rb_load_internal (fname=145597920, wrap=<value optimized out>) at load.c:294
#55 0x00002b785386d8a1 in rb_f_load (argc=<value optimized out>, argv=<value optimized out>) at load.c:367
#56 0x00002b78539716aa in vm_call_cfunc (th=0x8329020, cfp=0x2b78572fff08, num=1, blockptr=0x1, flag=8, id=5944, me=0x8421cd0, 
    recv=137928280) at vm_insnhelper.c:402
#57 vm_call_method (th=0x8329020, cfp=0x2b78572fff08, num=1, blockptr=0x1, flag=8, id=5944, me=0x8421cd0, recv=137928280)
    at vm_insnhelper.c:524
#58 0x00002b785397438c in vm_exec_core (th=0x8329020, initial=<value optimized out>) at insns.def:1006
#59 0x00002b7853979b99 in vm_exec (th=0xcce4c55) at vm.c:1147
#60 0x00002b7853979e74 in rb_iseq_eval_main (iseqval=145661040) at vm.c:1388
#61 0x00002b785386ae52 in ruby_exec_internal (n=0x8ae9c70) at eval.c:214
#62 0x00002b785386ae79 in ruby_exec_node (n=0x8ae9c70) at eval.c:261
#63 0x00002b785386d1bf in ruby_run_node (n=0x8ae9c70) at eval.c:254
#64 0x00000000004008ef in main (argc=10, argv=0x7fff7bc6df78) at main.c:35

What I can deduce from this:

  • It's hanging on a regular expression performed on something like 52%%20off|utmcmd=email|utmctr=View%20this|utmcct=52%%20off, which appears to be a bastardized form of a query string (any insights on why it's pipes rather than ampersands?). I don't know what the regex they're looking for is, though (any way I can find it?).

  • It's getting through thin/eventmachine to the Rails stack just fine, because at #42 it seems to be initializing an event_callback_wrapper, which I take to mean that it's handing the request off to the next step in Rack.

And a weird thing:

  • netstat doesn't list any outstanding connections, and nginx logs don't show any requests, successful, unsuccessful, or abandoned, with the query string implied by the string that shows up in the backtrace.

Other things I've tried:

I've tried just going into gdb and nexting a bunch of times, and it just goes in a loopish manner. I've also tried using hijack, but I couldn't find anything useful to do.

Things that might be useful that I don't know how to do or whether they are possible:

  • Get an actual ruby code stack.

  • Figure out what's calling the regex.

  • Nail down what the regex actually is and what it's being matched against.

Any other advice or whatnot would be greatly appreciated.

3 Solutions

#1


4  

For the record, we just had and resolved that same issue (also using gdb, which was the only place we saw any indication of this). The problem is in lib/rack/backports/uri/common.rb, and varies depending on the version of Rack you're using. We were using Rack 1.3.3, and migrated to 1.3.6 to correct it.

In short, this catastrophic backtracking bug is present in Rack versions up to 1.3.3, and is properly fixed for both Ruby 1.8.7 and 1.9.2+ starting at Rack 1.3.4. The string the OP pasted above is the Google Analytics cookie, and the campaign name in that cookie is "52% off", encoded here as "52%%20off". The solitary, unescaped "%" character in that string (in this case, followed by the legal percent sign of a URI-encoded space, "%20") triggers a regex designed to catch malformed URI-escaped strings (paste into irb for best results):

/\A(?:%[0-9a-fA-F]{2}|[^%]+)*\z/

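The [^%]+ nested inside the (?:)* block is what drives your process to distraction: when the overall match fails, the engine goes back and retries every possible way of splitting each run of non-% characters. The replacement appears to work much better: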

/\A[^%]*(?:%\h\h[^%]*)*\z/ 
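To see the difference between the two patterns without hanging a process, here is a small sketch that times both on deliberately short inputs ending in a stray "%". The constant names, the well_formed? helper, and the input lengths are my own assumptions for illustration, not Rack's code. On older Rubies the first pattern's time should grow very quickly with each added character; newer Ruby releases include regex optimizations that may hide the effect.

```ruby
require "benchmark"

# Original pattern quoted above: [^%]+ sits inside (...)*, so a run of n
# non-% characters can be split into chunks in 2**(n-1) different ways.
OLD_PATTERN = /\A(?:%[0-9a-fA-F]{2}|[^%]+)*\z/

# Replacement pattern quoted above: each character can be consumed in only
# one way, so a failed match gives up quickly.
NEW_PATTERN = /\A[^%]*(?:%\h\h[^%]*)*\z/

# Hypothetical helper: true when the pattern accepts the whole string.
def well_formed?(pattern, str)
  !pattern.match(str).nil?
end

# Inputs are kept short on purpose; the cookie in the backtrace has a much
# longer run of non-% characters before the stray "%".
[12, 16, 20].each do |n|
  input = ("a" * n) + "%" # trailing lone "%" forces the match to fail
  old_time = Benchmark.realtime { well_formed?(OLD_PATTERN, input) }
  new_time = Benchmark.realtime { well_formed?(NEW_PATTERN, input) }
  puts format("n=%2d old=%.6fs new=%.6fs", n, old_time, new_time)
end
```

Both patterns agree on the answer (the string is malformed); they only differ in how long it takes to reach it.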

Important to note: The most catastrophic effect of this is on older Rack instances, which will hang and eat up all your CPU, bringing down your box. The "corrected" effect, in which the desired error is raised, can result in the simple failure of your user's request. Good for servers, bad for visitors. You can target specific strings in your cookies and scrub the adjacent percent signs in your Apache configuration as follows:

RequestHeader edit Cookie "%%" "Percent%"

That ought to be good enough.
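If you would rather do the scrubbing inside the Ruby stack than in Apache, a small Rack middleware can rewrite the Cookie header before anything tries to decode it. This is only a sketch: the class name StrayPercentScrubber is made up, and unlike the Apache rule above it escapes each stray "%" as "%25" rather than replacing "%%" with "Percent%". It also assumes it is mounted ahead of whatever parses the cookies. Upgrading Rack is still the real fix.

```ruby
# Hypothetical middleware (not part of Rack): escapes any "%" in the Cookie
# header that is not followed by two hex digits.
class StrayPercentScrubber
  LONE_PERCENT = /%(?!\h\h)/

  def initialize(app)
    @app = app
  end

  def call(env)
    cookie = env["HTTP_COOKIE"]
    env["HTTP_COOKIE"] = self.class.scrub(cookie) if cookie
    @app.call(env)
  end

  # "52%%20off" becomes "52%25%20off"; valid escapes are left alone.
  def self.scrub(value)
    value.gsub(LONE_PERCENT, "%25")
  end
end
```

In a config.ru this would be enabled with `use StrayPercentScrubber` above the application.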

#2


1  

Let me guess: Rack 1.3.0? There's a catastrophic backtracking regex bug in that version. Upgrade to 1.3.1 or later.

#3


0  

I wound up being unable to solve the problem. I finally figured out how to pull together a Ruby trace, though, and nailed down the stuck infinite loop to lib/ruby/1.9.1/uri/common.rb:778:in `decode_www_form_component'.

I switched back to 1.8.7 and all was fine again.
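For anyone landing here on a fixed Ruby or Rack, the same malformed value no longer hangs; URI.decode_www_form_component raises ArgumentError instead, which is the "corrected" behaviour mentioned in the first answer. A minimal sketch of handling that, assuming you are happy to treat an undecodable value as missing (the safe_decode helper is hypothetical):

```ruby
require "uri"

# Hypothetical helper: returns the decoded string, or nil when the input
# contains an invalid %-escape such as the "%%" in "52%%20off".
def safe_decode(component)
  URI.decode_www_form_component(component)
rescue ArgumentError
  nil
end

["View%20this", "52%25%20off", "52%%20off"].each do |raw|
  puts "#{raw.inspect} => #{safe_decode(raw).inspect}"
end
```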
