Is there a modern review of solutions to the 10,000-clients-per-second problem?

Posted: 2021-01-08 00:17:52

(Commonly called the C10K problem)

Is there a more contemporary review of solutions to the c10k problem (Last updated: 2 Sept 2006), specifically focused on Linux (epoll, signalfd, eventfd, timerfd..) and libraries like libev or libevent?

Something that discusses all the solved and still unsolved issues on a modern Linux server?
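As a concrete reference point for the Linux primitives the question names, here is a minimal, Linux-only sketch (my own illustration, not drawn from any answer) of the epoll readiness model via Python's `select.epoll`, using a `socketpair` in place of real network clients:

```python
import select
import socket

# Create a connected pair of sockets to stand in for a client/server link.
a, b = socket.socketpair()
a.setblocking(False)
b.setblocking(False)

ep = select.epoll()
ep.register(a.fileno(), select.EPOLLIN)  # interest: `a` becomes readable

b.send(b"ping")               # make `a` readable
events = ep.poll(timeout=1)   # list of (fd, event-mask) pairs

fd_a = a.fileno()
ready = [fd for fd, mask in events if mask & select.EPOLLIN]
assert ready == [fd_a]
assert a.recv(4) == b"ping"

ep.close()
a.close()
b.close()
```

The same register/wait loop scales from one descriptor to tens of thousands, and it is what libraries like libev and libevent wrap behind a portable callback API.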

7 Answers

#1


11  

The C10K problem generally assumes you're trying to optimize a single server, but as your referenced article points out, "hardware is no longer the bottleneck". Therefore, the first step is to check whether it isn't easiest and cheapest to just throw more hardware at the problem.

If we've got a $500 box serving X clients per second, it's far more efficient to buy another $500 box and double our throughput than to let an employee burn who knows how many hours and dollars trying to figure out how to squeeze more out of the original box. Of course, that assumes our app is multi-server friendly, that we know how to load balance, and so on...

#2


10  

Coincidentally, just a few days ago, Programming Reddit or maybe Hacker News mentioned this piece:

Thousands of Threads and Blocking IO

In the early days of Java, my C programming friends laughed at me for doing socket IO with blocking threads; at the time, there was no alternative. These days, with plentiful memory and processors it appears to be a viable strategy.
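As a concrete illustration of the thread-per-connection model the article defends (a sketch of mine, not the article's code), here is a blocking echo server that dedicates one thread to each client:

```python
import socket
import threading

def handle(conn):
    # One blocking thread per client: read until EOF, echo everything back.
    with conn:
        while data := conn.recv(1024):
            conn.sendall(data)

server = socket.create_server(("127.0.0.1", 0))  # port 0: OS picks a free port
host, port = server.getsockname()

def serve_one():
    conn, _ = server.accept()
    handle(conn)

t = threading.Thread(target=serve_one, daemon=True)
t.start()

with socket.create_connection((host, port)) as client:
    client.sendall(b"hello")
    echoed = client.recv(1024)

assert echoed == b"hello"
t.join(timeout=1)
server.close()
```

The appeal is exactly what the article argues: the per-connection logic reads top to bottom with no callbacks or state machines; the cost is one thread stack per client, which plentiful memory now makes tolerable for many workloads.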

The article is dated 2008, so it moves your horizon forward by a couple of years.

#3


5  

To answer the OP's question: today the equivalent document is not about optimizing a single server for load, but about optimizing your entire online service for load. From that perspective, the number of combinations is so large that what you're looking for is not a document but a live website that collects such architectures and frameworks. Such a website exists, and it's called www.highscalability.com.

Side Note 1:

I'd argue against the belief that throwing more hardware at it is a long term solution:

  • Perhaps the cost of an engineer who "gets" performance is high compared to the cost of a single server. But what happens when you scale out? Let's say you have 100 servers. A 10 percent improvement in server capacity can save you 10 servers a month.

  • Even if you have just two machines, you still need to handle performance spikes. The difference between a service that degrades gracefully under load and one that breaks down is that someone spent time optimizing for the load scenario.

Side note 2:

The subject of this post is slightly misleading. The C10K document does not try to solve the problem of 10k clients per second. (The number of clients per second is irrelevant unless you also define a workload, along with sustained throughput under bounded latency. I think Dan Kegel was aware of this when he wrote that doc.) Look at it instead as a compendium of approaches for building concurrent servers, and of micro-benchmarks for the same. Perhaps what has changed between then and now is that back then you could assume the service was a website serving static pages; today it might be a NoSQL datastore, a cache, a proxy, or one of hundreds of other pieces of network infrastructure software.

#4


2  

You can also take a look at this series of articles:

http://www.metabrew.com/article/a-million-user-comet-application-with-mochiweb-part-3

He shows a fair amount of performance data and the OS configuration work he had to do in order to support 10K and then 1M connections.

It seems like a system with 30GB of RAM could handle 1 million connected clients in a social-network-style simulation, using a libevent frontend to an Erlang-based app server.
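Part of that "OS configuration work" is raising the per-process file-descriptor limit, since every connected client holds a descriptor. A small sketch of mine (not from the article) of inspecting and raising it from inside a process:

```python
import resource

# Each connected socket consumes one file descriptor, so the RLIMIT_NOFILE
# soft limit is the first ceiling you hit on the way to 10K+ connections.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

# A process may raise its own soft limit up to the hard limit without
# privileges; going beyond the hard limit needs root (sysctl fs.nr_open,
# /etc/security/limits.conf, or a systemd LimitNOFILE= setting).
target = hard if hard != resource.RLIM_INFINITY else 1_048_576
resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))

new_soft, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
assert new_soft == target
```

Beyond descriptor limits, reaching 1M connections also involves kernel network tuning (ephemeral port ranges, socket buffer sizes), which the linked series walks through in detail.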

#5


1  

The libev project publishes some benchmarks comparing itself against libevent...

#6


1  

I'd recommend reading Zed Shaw's poll, epoll, science, and superpoll [1]. It covers why epoll isn't always the answer, why sometimes it's even better to go with poll, and how to get the best of both worlds.

[1] http://sheddingbikes.com/posts/1280829388.html
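The crux of Shaw's argument is that poll() re-scans its entire fd array on every call, while epoll keeps interest state in the kernel; with few, mostly-active descriptors, poll can therefore win. A tiny sketch of the poll side of that comparison (my illustration, not code from the post):

```python
import select
import socket

a, b = socket.socketpair()
fd_a = a.fileno()

# poll(): the full set of watched fds is handed to the kernel on every call,
# so its per-call cost grows with the number of registered fds, active or not.
p = select.poll()
p.register(fd_a, select.POLLIN)

b.send(b"x")
events = p.poll(1000)  # note: timeout in milliseconds, vs seconds for epoll

readable = [fd for fd, mask in events if mask & select.POLLIN]
assert readable == [fd_a]
assert a.recv(1) == b"x"

a.close()
b.close()
```

With one fd the two APIs are indistinguishable; the gap only appears as the registered set grows large and mostly idle, which is exactly the regime Shaw measures.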

#7


1  

Have a look at the RamCloud project at Stanford: https://ramcloud.atlassian.net/wiki/display/RAM/RAMCloud

Their goal is 1,000,000 RPC operations/sec per server. They have numerous benchmarks and commentary on the bottlenecks in a system that would prevent it from reaching that throughput goal.
