Ruby on Rails - slow loading and a lot of time spent in the garbage collector

Time: 2021-05-13 23:31:35

I've got a large Rails app and I'm looking to improve (dismal) performance.

Running with ruby-prof doesn't help me much; I get output similar to this (running in production mode on thin):

Thread ID: 9322800
Total: 1.607768
Sort by: self_time

 %self     total     self     wait    child    calls   name
 26.03      0.42     0.42     0.00     0.00     1657   Module#define_method 
  8.03      0.13     0.13     0.00     0.00      267   Set#initialize 
  4.41      0.07     0.07     0.00     0.00       44   PG::Result#values 
  4.28      0.07     0.07     0.00     0.00     1926   ActiveSupport::Callbacks::Callback#start 
  4.21      0.07     0.07     0.00     0.00    14835   Kernel#hash 
  4.13      0.08     0.07     0.00     0.01      469   Module#redefine_method 
  4.11      0.07     0.07     0.00     0.00       63  *<Class::ActiveRecord::Base>#with_scope 
  4.02      0.07     0.06     0.00     0.00      774   ActiveSupport::Callbacks::Callback#_compile_options 
  3.24      0.05     0.05     0.00     0.00       30   PG::Connection#async_exec 
  2.31      0.40     0.04     0.00     0.37     2130  *Module#class_eval 
  1.47      0.02     0.02     0.00     0.00        6   PG::Connection#unescape_bytea 
  1.03      0.05     0.02     0.00     0.03      390  *Array#select 

* indicates recursively called methods

I guessed that it might be spending a lot of time in the garbage collector, so, since I'm running on REE, I decided to try GC.enable_stats to get some more information. I added the following to my application controller:

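# Wrap every action so the REE GC statistics cover exactly one request.
around_filter :enable_gc_stats

private

def enable_gc_stats
  GC.enable_stats

  begin
    yield
  ensure
    # Any counters (GC.collections, GC.time, ...) must be read before this
    # point: clear_stats resets them for the next request.
    GC.disable_stats
    GC.clear_stats
  end
end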

On a relatively large page running on my machine here in production mode with REE and the thin webserver (ruby-prof disabled since it makes it a bit slower) I get:

Completed 200 OK in 1093ms (Views: 743.1ms | ActiveRecord: 139.2ms)

GC.collections: 11
GC.time: 666299 us 666.299 ms
GC.growth: 461 KB

GC.allocated_size: 152 MB
GC.num_allocations: 1,924,773
ObjectSpace.live_objects: 1,015,195
ObjectSpace.allocated_objects: 12,393,644

So for a page that took 1093 ms, it seems like almost 700 ms was spent in the garbage collector. Has anybody had this kind of problem before? I realize you cannot help with my app in particular (it is quite big, with a lot of gems and things), but are there techniques or tools to get a better idea of why so much garbage is being created?

Any ideas would be very much appreciated!

1 answer

#1


Your Rails log shows that most of the time (about 68%: 743.1 ms of 1093 ms) is spent in view code.

Your profile report shows three obvious hotspots: Module#define_method for self time, Module#class_eval for total time, and Set#initialize.

define_method and class_eval indicate there's likely a lot of dynamic code execution, which seems excessive to me. Generally you want to generate that code early and reuse it instead of repeatedly regenerating it. It is almost certainly part of your excessive object allocation problem. Producing a graph report instead of a flat report should help you find the parent methods that fall into these expensive paths, which may point you to where you could optimize.
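
As a rough illustration of the difference between generating methods once and regenerating them on every use, here is a minimal sketch. SlowReport, FastReport, FIELDS and DefinitionCounter are hypothetical names made up for this example, not anything from the app in question; the counter uses the standard method_added hook to count how many times a method gets (re)defined.

```ruby
# Hypothetical helper: counts method definitions on a class via method_added.
module DefinitionCounter
  def definition_count
    @definition_count || 0
  end

  def method_added(name)
    @definition_count = definition_count + 1 unless name == :initialize
    super
  end
end

class SlowReport
  extend DefinitionCounter
  FIELDS = [:title, :author, :total]

  def initialize(row)
    @row = row
    # Anti-pattern: every new instance re-runs define_method for each field.
    FIELDS.each do |field|
      self.class.send(:define_method, field) { @row[field] }
    end
  end
end

class FastReport
  extend DefinitionCounter
  FIELDS = [:title, :author, :total]

  # Accessors are generated once, when the class body is evaluated.
  FIELDS.each do |field|
    define_method(field) { @row[field] }
  end

  def initialize(row)
    @row = row
  end
end

rows = Array.new(100) { |i| { :title => "t#{i}", :author => "a", :total => i } }
rows.each { |row| SlowReport.new(row) }
rows.each { |row| FastReport.new(row) }

puts SlowReport.definition_count # 300: three definitions per instance
puts FastReport.definition_count # 3: defined once at load time
```

If a graph report shows define_method being reached from a per-request or per-record code path, moving the definition to class-load time in this way is the usual fix.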

Set#initialize may be a real artifact of what your code needs to do, or it might be a sign that there are some significant inline Set[...] or Set.new calls which could be done once and assigned to a constant or an instance/class variable for reuse.
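
A minimal sketch of that hoisting, assuming the set contents do not change between calls. TagFilter, ALLOWED_TAGS and both method names are hypothetical, chosen only for illustration.

```ruby
require 'set'

module TagFilter
  # Built once at load time and frozen, then shared by every call.
  ALLOWED_TAGS = Set['a', 'b', 'em', 'strong'].freeze

  # Allocates a new Set (and runs Set#initialize) on every call.
  def self.allowed_inline?(tag)
    Set['a', 'b', 'em', 'strong'].include?(tag)
  end

  # Reuses the constant, so there is no per-call Set allocation.
  def self.allowed_hoisted?(tag)
    ALLOWED_TAGS.include?(tag)
  end
end

puts TagFilter.allowed_inline?('em')      # true
puts TagFilter.allowed_hoisted?('em')     # true
puts TagFilter.allowed_hoisted?('script') # false
```

Both methods return the same answers; only the second avoids building a throwaway Set each time it is called.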

ruby-prof is OK, but you might also want to try perftools.rb, which is easy to hook up to Rack/Rails apps with rack-perftools_profiler. perftools has some enhanced visualization tools which can make it much easier to understand hot execution paths.

Since you're running REE and extensive object allocation (and hence garbage collection) is an issue, you could try memprof to get some insight into what these allocations are and where they are coming from.
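
If memprof is not an option, a much cruder way to narrow things down is to count allocations around a suspect piece of code. The sketch below is not memprof and does not run on REE: it assumes a newer MRI (2.2 or later) where GC.stat exposes :total_allocated_objects. On REE, the ObjectSpace.allocated_objects counter shown in the question's output could presumably play the same role. count_allocations is a hypothetical helper name.

```ruby
# Returns [result_of_block, number_of_objects_allocated_while_it_ran].
# Assumption: MRI 2.2+, where GC.stat(:total_allocated_objects) exists.
def count_allocations
  before = GC.stat(:total_allocated_objects)
  result = yield
  after = GC.stat(:total_allocated_objects)
  [result, after - before]
end

_, allocated = count_allocations do
  Array.new(1_000) { |i| "row #{i}" }
end
puts "allocated objects: #{allocated}"
```

Wrapping individual partials or helpers this way gives a rough ranking of which parts of a view create the most garbage, without saying what kind of objects they are.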

If you can't find a way to reduce the number of objects being allocated, you could ease the GC burden, at the expense of a larger process memory size, by tuning the GC to preallocate a heap large enough to hold a typical request's allocation demands. Unicorn offers a Rack module for out-of-band GC. You might be able to adapt this module's approach to work with thin and move all the GC time to between requests. You'll still pay the CPU cost, but at least you won't delay your responses for garbage collection.
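
One way the out-of-band idea could be adapted is sketched below. This is not Unicorn's implementation, only an illustration under a few assumptions: the server calls close on the response body after it has been sent (which the Rack specification asks servers to do), requests are handled one at a time per process, and running a full collection every few requests is enough to keep most collections out of the request itself. DeferredGC, BodyProxy, the :interval option and the collections counter are all hypothetical names.

```ruby
# Rack-style middleware that runs GC.start after every Nth response is closed.
class DeferredGC
  # Wraps a response body so a callback runs when the server closes it.
  class BodyProxy
    def initialize(body, &on_close)
      @body = body
      @on_close = on_close
    end

    def each(&block)
      @body.each(&block)
    end

    def close
      @body.close if @body.respond_to?(:close)
    ensure
      @on_close.call
    end
  end

  attr_reader :collections # how many deferred collections have run

  def initialize(app, options = {})
    @app = app
    @interval = options.fetch(:interval, 5)
    @requests = 0
    @collections = 0
  end

  def call(env)
    status, headers, body = @app.call(env)
    [status, headers, BodyProxy.new(body) { collect_if_due }]
  end

  private

  # Not thread-safe: assumes one request at a time per process.
  def collect_if_due
    @requests += 1
    return unless @requests >= @interval
    @requests = 0
    @collections += 1
    GC.start
  end
end

# Usage with a stand-in app (any object responding to call(env) works).
app = lambda { |env| [200, {}, ["ok"]] }
stack = DeferredGC.new(app, :interval => 2)
2.times do
  _status, _headers, body = stack.call({})
  body.each { |chunk| chunk }
  body.close
end
puts stack.collections # 1
```

Whether the collection really happens after the client has its response depends on when the server calls close, so this would need to be checked against thin's behaviour before relying on it.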
