Many threads or as few threads as possible?

Time: 2021-02-11 20:58:43

As a side project I'm currently writing a server for an age-old game I used to play. I'm trying to make the server as loosely coupled as possible, but I am wondering what would be a good design decision for multithreading. Currently I have the following sequence of actions:

  • Startup (creates) ->
  • Server (listens for clients, creates) ->
  • Client (listens for commands and sends periodic data)

I'm assuming about 100 clients, as that was the maximum at any given time for the game. What would be the right threading design for the whole thing? My current setup is as follows:

  • One thread on the server listens for new connections; on each new connection it creates a client object and starts listening again.
  • Each client object has one thread, which listens for incoming commands and sends periodic data. This is done using a non-blocking socket, so it simply checks whether there's data available, deals with that, and then sends the messages it has queued. Login is done before the send-receive cycle is started.
  • One thread (for now) for the game itself, as I consider that to be separate from the whole client-server part, architecturally speaking.

This would result in a total of 102 threads. I am even considering giving the client 2 threads, one for sending and one for receiving. If I do that, I can use blocking I/O on the receiver thread, which means that thread will be mostly idle in an average situation.

My main concern is that by using this many threads I'll be hogging resources. I'm not worried about race conditions or deadlocks, as that's something I'll have to deal with anyway.

My design is set up in such a way that I could use a single thread for all client communications, whether there is 1 client or 100. I've separated the communications logic from the client object itself, so I could implement this without having to rewrite a lot of code.

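To make the single-thread option concrete, here is a minimal sketch in Python (chosen only for illustration, since the question doesn't name a language). It uses the standard `selectors` module to service every client from one thread: read whatever has arrived, then flush each client's queued messages. The `Client` class and the `add_client` and `poll_clients` functions are hypothetical names, not part of my actual server.

```python
import selectors
import socket
from collections import deque


class Client:
    """Hypothetical per-client state: commands received and messages queued."""

    def __init__(self, sock):
        self.sock = sock
        self.received = []       # raw data read from this client so far
        self.outgoing = deque()  # messages waiting to be sent to this client


def add_client(sel, sock):
    """Register a connected socket with the selector in non-blocking mode."""
    sock.setblocking(False)
    client = Client(sock)
    sel.register(sock, selectors.EVENT_READ, data=client)
    return client


def poll_clients(sel, timeout=0.1):
    """Service all clients once, entirely on the calling thread."""
    # 1. Read from every socket that has data waiting.
    for key, _events in sel.select(timeout):
        client = key.data
        data = client.sock.recv(4096)
        if data:
            client.received.append(data)
        else:  # an empty read means the peer closed the connection
            sel.unregister(client.sock)
            client.sock.close()
    # 2. Flush queued messages for every client that is still connected.
    #    A real server would also have to handle partial sends.
    for key in list(sel.get_map().values()):
        client = key.data
        while client.outgoing:
            client.sock.send(client.outgoing.popleft())
```

Calling `poll_clients` in a loop gives the same "check for data, handle it, send what is queued" cycle as the per-client thread, but the thread count no longer depends on the number of clients.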

The main question is: is it wrong to use over 200 threads in an application? Does it have advantages? I'm thinking about running this on a multi-core machine; would a design like this take much advantage of the multiple cores?

Thanks!


Most of these threads will usually be blocked. I don't expect more than 5 new connections per minute. Commands from a client will come in infrequently; I'd say 20 per minute on average.

Going by the answers I got here (context switching was the performance hit I was thinking about, but I didn't know that until you pointed it out, thanks!), I think I'll go for the approach with one listener, one receiver, one sender, and some miscellaneous stuff ;-)

5 solutions

#1


4  

I write in .NET and I'm not sure if the way I code is due to .NET limitations and their API design or if this is a standard way of doing things, but this is how I've done this kind of thing in the past:

  • A queue object that will be used for processing incoming data. This should be sync locked between the queuing thread and worker thread to avoid race conditions.

  • A worker thread for processing data in the queue. The thread that fills the queue uses a semaphore to notify this thread that there are items to process. This thread starts before any of the other threads and runs a continuous loop until it receives a shutdown request. The first instruction in the loop checks a flag to pause/continue/terminate processing. The flag is initially set to pause so that the thread sits idle (instead of looping continuously) while there is nothing to process. The queuing thread changes the flag when there are items in the queue. The worker then processes a single item on each iteration of the loop. When the queue is empty it sets the flag back to pause, so that on the next iteration it waits until the queuing thread notifies it that there is more work to be done.

  • One connection listener thread which listens for incoming connection requests and passes these off to...

  • A connection processing thread that creates the connection/session. Having a separate thread from your connection listener thread means that you're reducing the potential for missed connection requests due to reduced resources while that thread is processing requests.

  • An incoming data listener thread that listens for incoming data on the current connection. All data is passed off to a queuing thread to be queued up for processing. Your listener threads should do as little as possible outside of basic listening and passing the data off for processing.

  • A queuing thread that queues up the data in the right order so everything can be processed correctly. This thread raises the semaphore on the processing queue to let the worker know there's data to be processed. Having this thread separate from the incoming data listener means that you're less likely to miss incoming data.

  • Some session object which is passed between methods so that each user's session is self-contained throughout the threading model.

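The queue-plus-worker part of the list above can be sketched as follows. This is Python rather than .NET, purely for illustration, and it swaps in the standard `queue.Queue` for the explicit semaphore and pause flag: `get()` blocks while the queue is empty, which gives the same "sit idle until notified" behaviour. The `SHUTDOWN` sentinel and the `worker` and `run_demo` names are assumptions made up for this sketch.

```python
import queue
import threading

SHUTDOWN = object()  # hypothetical sentinel that tells the worker to stop


def worker(inbox, handle):
    """Process one queued item per loop iteration until shutdown is requested."""
    while True:
        item = inbox.get()  # blocks (the thread sleeps) while the queue is empty
        if item is SHUTDOWN:
            break
        handle(item)


def run_demo(messages):
    """Queue messages from the calling thread and let the worker process them."""
    inbox = queue.Queue()  # thread-safe: locking is handled inside the queue
    processed = []
    thread = threading.Thread(target=worker, args=(inbox, processed.append))
    thread.start()         # the worker starts first and waits for work
    for message in messages:
        inbox.put(message)  # putting an item wakes the worker if it is waiting
    inbox.put(SHUTDOWN)
    thread.join()
    return processed
```

With a single worker, items are handled in the order they were queued, so `run_demo(["login", "move", "chat"])` returns the same three items in the same order.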

This is as simple yet robust a threading model as I've been able to figure out. I would love to find a simpler one, but I've found that if I try to reduce the threading model any further, I start missing data on the network stream or missing connection requests.

It also assists with TDD (Test Driven Development) such that each thread is processing a single task and is much easier to code tests for. Having hundreds of threads can quickly become a resource allocation nightmare, while having a single thread becomes a maintenance nightmare.

It's far simpler to keep one thread per logical task the same way you would have one method per task in a TDD environment and you can logically separate what each should be doing. It's easier to spot potential problems and far easier to fix them.

#2


7  

Use an event stream/queue and a thread pool to maintain the balance; this will adapt better to other machines, which may have more or fewer cores.

In general, having many more active threads than cores wastes time on context switching.

If your game consists of a lot of short actions, a circular/recycling event queue will give better performance than a fixed number of threads.

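As a rough sketch of the idea, here is a pool sized to the machine's core count, written in Python purely for illustration. `handle_event` and `process_events` are hypothetical names, and the events are made-up `(kind, payload)` pairs. Note that in standard CPython the global interpreter lock limits CPU-bound parallelism between threads, so this shows the structure rather than a guaranteed speed-up.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def handle_event(event):
    """Hypothetical short game action: turn an event into a result string."""
    kind, payload = event
    return f"{kind}:{payload}"


def process_events(events, workers=None):
    """Run short actions on a pool sized to the number of cores."""
    workers = workers or os.cpu_count() or 1  # adapts to the machine it runs on
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() hands events to idle workers and returns results in input order.
        return list(pool.map(handle_event, events))
```

The number of threads here depends on the hardware, not on how many clients or events there are.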

#3


5  

To answer the question simply, it is entirely wrong to use 200 threads on today's hardware.

Each thread takes up 1 MB of memory for its stack (the Windows default), so you're taking up 200 MB of page file before you even start doing anything useful.

By all means break your operations up into little pieces that can be safely run on any thread, but put those operations on queues and have a fixed, limited number of worker threads servicing those queues.

Update: Does wasting 200MB matter? On a 32-bit machine, it's 10% of the entire theoretical address space for a process - no further questions. On a 64-bit machine, it sounds like a drop in the ocean of what could be theoretically available, but in practice it's still a very big chunk (or rather, a large number of pretty big chunks) of storage being pointlessly reserved by the application, and which then has to be managed by the OS. It has the effect of surrounding each client's valuable information with lots of worthless padding, which destroys locality, defeating the OS and CPU's attempts to keep frequently accessed stuff in the fastest layers of cache.

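The arithmetic behind those figures, as a small Python sketch (Python is used only for illustration). It assumes 1 MB of stack per thread and a 2 GB user address space for a 32-bit process; the 2 GB figure is my assumption for where the "10%" comes from.

```python
MB = 1024 * 1024
GB = 1024 * MB


def stack_reservation(threads, stack_bytes=1 * MB):
    """Total memory set aside for thread stacks, assuming 1 MB per thread."""
    return threads * stack_bytes


reserved = stack_reservation(200)
# Assumption: a 32-bit process gets 2 GB of user address space.
share_32bit = reserved / (2 * GB)
print(f"{reserved // MB} MB reserved, {share_32bit:.1%} of a 2 GB address space")
```

Under those assumptions, 200 threads account for roughly a tenth of the address space before any real work is done.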

In any case, the memory wastage is just one part of the insanity. Unless you have 200 cores (and an OS capable of utilizing them), you don't really have 200 parallel threads. You have (say) 8 cores, each frantically switching between 25 threads. Naively you might think that, as a result, each thread experiences the equivalent of running on a core that is 25 times slower. But it can be much worse than that: if all of those threads are busy, the OS can end up spending more time taking one thread off a core and putting another one on it ("context switching") than it does actually allowing your code to run.

Just look at how any well-known successful design tackles this kind of problem. The CLR's thread pool (even if you're not using it) serves as a fine example. It starts off assuming just one thread per core will be sufficient. It allows more to be created, but only to ensure that badly designed parallel algorithms will eventually complete. It refuses to create more than 2 threads per second, so it effectively punishes thread-greedy algorithms by slowing them down.

#4


2  

What's your platform? If Windows then I'd suggest looking at async operations and thread pools (or I/O Completion Ports directly if you're working at the Win32 API level in C/C++).

The idea is that you have a small number of threads that deal with your I/O. This makes your system capable of scaling to large numbers of concurrent connections, because there's no relationship between the number of connections and the number of threads used by the process that is serving them. As expected, .NET insulates you from the details and Win32 doesn't.

The challenge of using async I/O and this style of server is that the processing of client requests becomes a state machine on the server and the data arriving triggers changes of state. Sometimes this takes some getting used to but once you do it's really rather marvellous;)

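To show what "the processing becomes a state machine" means, here is a minimal sketch in Python (used only for illustration, not IOCP or .NET code). The I/O layer calls `feed` whenever bytes arrive, and the session moves between states instead of blocking in a read call. The line-based `LOGIN`/`QUIT` protocol and all the names here are invented for the example.

```python
from enum import Enum, auto


class State(Enum):
    AWAIT_LOGIN = auto()
    PLAYING = auto()
    CLOSED = auto()


class Session:
    """Hypothetical per-connection state machine driven by arriving data."""

    def __init__(self):
        self.state = State.AWAIT_LOGIN
        self.buffer = b""   # bytes received but not yet forming a full line
        self.replies = []   # replies the I/O layer would send back

    def feed(self, data):
        """Called whenever bytes arrive; never blocks."""
        self.buffer += data
        while b"\n" in self.buffer and self.state is not State.CLOSED:
            line, self.buffer = self.buffer.split(b"\n", 1)
            self._on_line(line.decode().strip())

    def _on_line(self, line):
        if self.state is State.AWAIT_LOGIN:
            if line.startswith("LOGIN "):
                self.state = State.PLAYING
                self.replies.append("OK")
            else:
                self.replies.append("ERR login required")
        elif self.state is State.PLAYING:
            if line == "QUIT":
                self.state = State.CLOSED
                self.replies.append("BYE")
            else:
                self.replies.append(f"ACK {line}")
```

Because `feed` copes with data arriving in arbitrary chunks, any I/O thread can call it for any connection, which is what removes the link between connection count and thread count.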

I've got some free code that demonstrates various server designs in C++ using IOCP here.

If you're using Unix or need to be cross-platform and you're in C++, then you might want to look at Boost.Asio, which provides async I/O functionality.

#5


0  

I think the question you should be asking is not if 200 as a general thread number is good or bad, but rather how many of those threads are going to be active.

If only several of them are active at any given moment, while all the others are sleeping or waiting or whatnot, then you're fine. Sleeping threads, in this context, cost you nothing.

However, if all of those 200 threads are active, your CPU is going to waste a lot of time context switching between them.

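If you want to check the "sleeping threads cost nothing" claim on your own machine, something like the following rough Python sketch would do (Python is chosen only for illustration, and `measure_idle_cpu` is a made-up helper). It parks a number of threads on an event and measures how much CPU time the whole process uses while they wait. It only measures CPU time; the memory each thread reserves is a separate matter.

```python
import threading
import time


def measure_idle_cpu(count=200, seconds=0.5):
    """Park `count` threads on an Event and report CPU time used meanwhile."""
    release = threading.Event()
    woken = []
    lock = threading.Lock()

    def wait_for_work(index):
        release.wait()  # blocked here: the thread is not being scheduled
        with lock:
            woken.append(index)

    threads = [threading.Thread(target=wait_for_work, args=(i,)) for i in range(count)]
    for thread in threads:
        thread.start()

    before = time.process_time()  # CPU time of the whole process so far
    time.sleep(seconds)           # every worker thread is asleep in this window
    idle_cpu = time.process_time() - before

    release.set()                 # wake all threads and let them finish
    for thread in threads:
        thread.join()
    return idle_cpu, len(woken)
```

The first value returned is the CPU time consumed while all the threads were blocked, which I would expect to be a small fraction of the wall-clock wait; the second confirms that every thread did wake up afterwards.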
