As a side project I'm currently writing a server for an age-old game I used to play. I'm trying to make the server as loosely coupled as possible, but I am wondering what would be a good design decision for multithreading. Currently I have the following sequence of actions:


  • Startup (creates) ->

  • Server (listens for clients, creates) ->

  • Client (listens for commands and sends periodic data)

I'm assuming an average of 100 clients, as that was the maximum at any given time for the game. What would be the right threading model for the whole thing? My current setup is as follows:


  • 1 thread on the server which listens for new connections, on new connection create a client object and start listening again.

  • Client object has one thread, listening for incoming commands and sending periodic data. This is done using a non-blocking socket, so it simply checks if there's data available, deals with that and then sends messages it has queued. Login is done before the send-receive cycle is started.

  • One thread (for now) for the game itself, as I consider that to be separate from the whole client-server part, architecturally speaking.

This would result in a total of 102 threads. I am even considering giving the client 2 threads, one for sending and one for receiving. If I do that, I can use blocking I/O on the receiver thread, which means that thread will be mostly idle in an average situation.


My main concern is that by using this many threads I'll be hogging resources. I'm not worried about race conditions or deadlocks, as that's something I'll have to deal with anyway.


My design is set up in such a way that I could use a single thread for all client communications, whether there is 1 client or 100. I've separated the communications logic from the client object itself, so I could implement it without having to rewrite a lot of code.
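For illustration, here is a minimal Python sketch of what "a single thread for all client communications" can look like. It uses the standard `selectors` module; the function names `make_listener` and `serve_once` and the `ack:` reply are made up for this example, and it assumes replies are small enough that `sendall` on a non-blocking socket will not fill the send buffer.

```python
import selectors
import socket


def make_listener(sel, host="127.0.0.1", port=0):
    """Create a non-blocking listening socket and register it with the selector."""
    listener = socket.socket()
    listener.bind((host, port))  # port 0 lets the OS pick a free port
    listener.listen()
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ, data="listener")
    return listener


def serve_once(sel, timeout=0.1):
    """One pass of the loop: accept new clients and answer any pending data."""
    for key, _events in sel.select(timeout):
        sock = key.fileobj
        if key.data == "listener":
            conn, _addr = sock.accept()
            conn.setblocking(False)
            sel.register(conn, selectors.EVENT_READ, data="client")
        else:
            data = sock.recv(4096)
            if data:
                # Stand-in for real command handling (assumes a small reply).
                sock.sendall(b"ack:" + data)
            else:
                # Empty read means the client closed the connection.
                sel.unregister(sock)
                sock.close()
```

Calling `serve_once` in a loop from one thread services the listener and every connected client, so the thread count stays the same whether 1 or 100 clients are connected.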


The main question is: is it wrong to use over 200 threads in an application? Does it have advantages? I'm thinking about running this on a multi-core machine; would this design take good advantage of multiple cores?



Out of all these threads, most of them will be blocked usually. I don't expect connections to be over 5 per minute. Commands from the client will come in infrequently, I'd say 20 per minute on average.


Going by the answers I get here (the context switching was the performance hit I was thinking about, but I didn't know that until you pointed it out, thanks!) I think I'll go for the approach with one listener, one receiver, one sender, and some miscellaneous stuff ;-)


I write in .NET and I'm not sure if the way I code is due to .NET limitations and their API design or if this is a standard way of doing things, but this is how I've done this kind of thing in the past:


  • A queue object that will be used for processing incoming data. This should be sync locked between the queuing thread and worker thread to avoid race conditions.


  • A worker thread for processing data in the queue. The thread that fills the queue uses a semaphore to notify this thread that there are items to process. The worker starts before any of the other threads and runs a continuous loop until it receives a shutdown request. The first instruction in the loop checks a flag that says whether to pause, continue, or terminate processing. The flag is initially set to pause, so the thread sits idle (instead of looping continuously) while there is nothing to process. The queuing thread changes the flag when there are items in the queue. The worker then processes a single item on each iteration of the loop. When the queue is empty, it sets the flag back to pause, so that on the next iteration it waits until the queuing thread notifies it that there is more work to be done.


  • One connection listener thread which listens for incoming connection requests and passes these off to...


  • A connection processing thread that creates the connection/session. Having a separate thread from your connection listener thread means that you're reducing the potential for missed connection requests due to reduced resources while that thread is processing requests.


  • An incoming data listener thread that listens for incoming data on the current connection. All data is passed off to a queuing thread to be queued up for processing. Your listener threads should do as little as possible outside of basic listening and passing the data off for processing.


  • A queuing thread that queues up the data in the right order so everything can be processed correctly. This thread raises the semaphore to the processing queue to let it know there's data to be processed. Having this thread separate from the incoming data listener means that you're less likely to miss incoming data.


  • Some session object which is passed between methods so that each user's session is self contained throughout the threading model.
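As a rough sketch of the queue-plus-worker part of this list (in Python rather than .NET, so not the code described above): `queue.Queue` already does the locking and the blocking wait internally, which stands in for the sync lock, the semaphore, and the pause flag. The `_STOP` sentinel plays the role of the terminate request. The names `worker`, `run_demo`, and `_STOP` are invented for this example.

```python
import queue
import threading

_STOP = object()  # hypothetical sentinel standing in for the "terminate" flag


def worker(inbox, handle):
    """Process one item per loop iteration until the stop sentinel arrives."""
    while True:
        item = inbox.get()  # blocks without spinning while the queue is empty
        if item is _STOP:
            break
        handle(item)


def run_demo(messages):
    """Start the worker first, queue the messages in order, then shut down."""
    inbox = queue.Queue()
    processed = []
    t = threading.Thread(target=worker, args=(inbox, processed.append))
    t.start()
    for m in messages:
        inbox.put(m)  # put() wakes the worker if it is waiting
    inbox.put(_STOP)
    t.join()
    return processed
```

Because a single worker drains a FIFO queue, items are handled in the order they were queued, which is the ordering guarantee the queuing thread is meant to provide.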


This keeps threads down to as simple but as robust a model as I've figured out. I would love to find a simpler model than this, but I've found that if I try and reduce the threading model any further, that I start missing data on the network stream or miss connection requests.


It also assists with TDD (Test Driven Development), because each thread handles a single task and is much easier to write tests for. Having hundreds of threads can quickly become a resource allocation nightmare, while having a single thread becomes a maintenance nightmare.


It's far simpler to keep one thread per logical task the same way you would have one method per task in a TDD environment and you can logically separate what each should be doing. It's easier to spot potential problems and far easier to fix them.




Use an event stream/queue and a thread pool to maintain the balance; this will adapt better to other machines, which may have more or fewer cores.


In general, having many more active threads than you have cores will waste time on context switching.


If your game consists of a lot of short actions, a circular/recycling event queue will give better performance than a fixed number of threads.
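A minimal sketch of the event-queue-plus-pool idea, in Python, with the pool sized from the machine's core count. `handle_event` and the `(kind, payload)` event shape are placeholders invented for the example. Note also that in CPython a thread pool mainly helps with I/O-bound work, so treat this as a picture of the structure rather than a performance claim.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def handle_event(event):
    """Placeholder for one short game action."""
    kind, payload = event
    return f"{kind}:{payload.upper()}"


def process_events(events, max_workers=None):
    """Run events on a pool sized to the core count unless told otherwise."""
    workers = max_workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order the events were submitted.
        return list(pool.map(handle_event, events))
```

The same code then uses 4 workers on a 4-core machine and 16 on a 16-core one, without the number of clients entering into it.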




To answer the question simply, it is entirely wrong to use 200 threads on today's hardware.


Each thread takes up 1 MB of memory for its stack by default, so with 200 threads you're taking up 200 MB of page file before you even start doing anything useful.


By all means break your operations up into little pieces that can be safely run on any thread, but put those operations on queues and have a fixed, limited number of worker threads servicing those queues.


Update: Does wasting 200MB matter? On a 32-bit machine, it's 10% of the entire theoretical address space for a process - no further questions. On a 64-bit machine, it sounds like a drop in the ocean of what could be theoretically available, but in practice it's still a very big chunk (or rather, a large number of pretty big chunks) of storage being pointlessly reserved by the application, and which then has to be managed by the OS. It has the effect of surrounding each client's valuable information with lots of worthless padding, which destroys locality, defeating the OS and CPU's attempts to keep frequently accessed stuff in the fastest layers of cache.
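The arithmetic behind those figures can be checked quickly. This sketch assumes a 1 MB stack per thread and a 2 GB user-mode address space for a 32-bit process (the usual Windows default); both numbers are assumptions for the example, and the function names are made up.

```python
MB = 1024 * 1024
GB = 1024 * MB


def stack_reservation(threads, stack_mb=1):
    """Total bytes set aside for thread stacks (assumes 1 MB per thread)."""
    return threads * stack_mb * MB


def share_of_address_space(reserved_bytes, address_space_bytes=2 * GB):
    """Fraction of the (assumed 2 GB) user-mode address space consumed."""
    return reserved_bytes / address_space_bytes


reserved = stack_reservation(200)
print(reserved // MB, "MB reserved")
print(f"{share_of_address_space(reserved):.1%} of a 2 GB address space")
```

Under those assumptions, 200 threads come to 200 MB, which is just under 10% of 2 GB.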


In any case, the memory wastage is just one part of the insanity. Unless you have 200 cores (and an OS capable of utilizing them), you don't really have 200 parallel threads. You have (say) 8 cores, each frantically switching between 25 threads. Naively, you might think that as a result each thread experiences the equivalent of running on a core that is 25 times slower. But it's actually much worse than that: the OS spends more time taking one thread off a core and putting another one on it ("context switching") than it does actually allowing your code to run.


Just look at how any well-known successful design tackles this kind of problem. The CLR's thread pool (even if you're not using it) serves as a fine example. It starts off assuming just one thread per core will be sufficient. It allows more to be created, but only to ensure that badly designed parallel algorithms will eventually complete. It refuses to create more than 2 threads per second, so it effectively punishes thread-greedy algorithms by slowing them down.




What's your platform? If it's Windows, then I'd suggest looking at async operations and thread pools (or I/O Completion Ports directly if you're working at the Win32 API level in C/C++).


The idea is that you have a small number of threads that deal with your I/O. This lets your system scale to large numbers of concurrent connections, because there's no relationship between the number of connections and the number of threads used by the process that is serving them. As expected, .NET insulates you from the details and Win32 doesn't.


The challenge of using async I/O and this style of server is that the processing of client requests becomes a state machine on the server and the data arriving triggers changes of state. Sometimes this takes some getting used to but once you do it's really rather marvellous;)
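To make the state-machine point concrete, here is a small sketch using Python's `asyncio.Protocol` (not IOCP or .NET, just the same style of server). Each connection carries its own state, and every chunk of arriving data either keeps that state or moves it on. The `LOGIN`/`QUIT` commands, the `State` names, and the reply strings are invented for the example, and it assumes each call to `data_received` delivers exactly one complete line, which a real server would have to buffer for.

```python
import asyncio
from enum import Enum, auto


class State(Enum):
    AWAITING_LOGIN = auto()
    IN_GAME = auto()
    CLOSED = auto()


class ClientProtocol(asyncio.Protocol):
    """One instance per connection; no thread is tied to the client."""

    def __init__(self):
        self.state = State.AWAITING_LOGIN
        self.transport = None
        self.name = None

    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        # Simplification: treats each chunk as one whole command line.
        line = data.decode().strip()
        if self.state is State.AWAITING_LOGIN:
            if line.startswith("LOGIN "):
                self.name = line[len("LOGIN "):]
                self.state = State.IN_GAME
                self.transport.write(b"OK\n")
            else:
                self.transport.write(b"ERR login first\n")
        elif self.state is State.IN_GAME:
            if line == "QUIT":
                self.state = State.CLOSED
                self.transport.close()
            else:
                self.transport.write(f"{self.name} did {line}\n".encode())

    def connection_lost(self, exc):
        self.state = State.CLOSED
```

A server would hand this class to `loop.create_server(ClientProtocol, host, port)` as the protocol factory; the event loop then calls `data_received` for whichever connection has data, all from a single thread.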


I've got some free code that demonstrates various server designs in C++ using IOCP here.


If you're using Unix or need to be cross-platform and you're in C++, then you might want to look at Boost.Asio, which provides async I/O functionality.




I think the question you should be asking is not whether 200 threads is a good or bad number in general, but rather how many of those threads are going to be active.


If only several of them are active at any given moment, while all the others are sleeping or waiting or whatnot, then you're fine. Sleeping threads, in this context, cost you nothing.


However, if all of those 200 threads are active, your CPU is going to waste a lot of time on context switches between them.
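One way to see the CPU side of this for yourself is a small experiment, sketched here in Python. It starts a number of threads that all block on one event and measures how much CPU time the whole process consumes while they wait. The function name and the default numbers are arbitrary choices for the example, and it only looks at CPU time, not at the memory each thread's stack occupies.

```python
import threading
import time


def cpu_used_while_blocked(n_threads=200, wall_seconds=0.5):
    """Return (CPU seconds used while threads were blocked, threads that woke)."""
    gate = threading.Event()
    woke = []

    def waiter(i):
        gate.wait()  # blocked here; the OS does not schedule this thread
        woke.append(i)

    threads = [threading.Thread(target=waiter, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()

    cpu_before = time.process_time()  # CPU time for the whole process
    time.sleep(wall_seconds)
    cpu_used = time.process_time() - cpu_before

    gate.set()  # release every waiting thread
    for t in threads:
        t.join()
    return cpu_used, len(woke)
```

Comparing the returned CPU time with `wall_seconds` shows how little processor time the blocked threads take; replacing `gate.wait()` with a busy loop would show the opposite case.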




