How do I measure and fix context-switching bottlenecks?

Time: 2022-03-31 23:54:05

I have a multi-threaded socket program. I use boost threadpool (http://threadpool.sourceforge.net/) for executing tasks. I create one TCP client socket per thread in the threadpool. Whenever I send a large amount of data, say 500KB (message size), the throughput drops significantly. I checked my code for:

1) Waits that might cause context switching
2) Locks/mutexes

For example, a 500KB message is divided into multiple lines and I send each line through the socket using ::send( ).

typedef std::list< std::string > LinesListType;

// Now send the lines to the server.
for ( LinesListType::const_iterator it = linesOut.begin( );
      it != linesOut.end( );
      ++it )
{
    std::string line = *it;

    // Escape a leading '.' by doubling it.
    if ( !line.empty( ) && '.' == line[0] )
    {
        line.insert( 0, "." );
    }

    // One SendData call (and at least one ::send) per line.
    SendData( line + CRLF );
}

SendData:

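void SendData( const std::string& data )
{
    try
    {
        uint32_t bytesToSendNo  = static_cast<uint32_t>( data.length( ) );
        uint32_t totalBytesSent = 0;

        ASSERT( m_socketPtr.get( ) != NULL );
        while ( bytesToSendNo > 0 )
        {
            try
            {
                // Send may accept fewer bytes than requested, so loop until all are sent.
                int32_t ret = m_socketPtr->Send( data.data( ) + totalBytesSent, bytesToSendNo );

                if ( 0 == ret )
                {
                    // A bare `throw;` outside a handler calls std::terminate,
                    // so throw a concrete exception instead.
                    throw SocketSendFailed( );
                }

                bytesToSendNo  -= ret;
                totalBytesSent += ret;
            }
            catch ( ... )
            {
                // Handler body elided in the original. Rethrow so that a
                // failed send cannot keep the loop spinning forever.
                throw;
            }
        }
    }
    catch ( ... )
    {
        // Handler body elided in the original.
        throw;
    }
}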

Send method in the client socket:

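int Send( const char* buffer, int length )
{
    try
    {
        int bytes = 0;
        do
        {
            // Retry when a signal interrupts the call.
            bytes = ::send( m_handle, buffer, length, MSG_NOSIGNAL );
        }
        while ( bytes == -1 && errno == EINTR );

        if ( bytes == -1 )
        {
            throw SocketSendFailed( );
        }

        return bytes;
    }
    catch ( ... )
    {
        // Handler body elided in the original. Rethrow: swallowing the
        // exception here would fall off the end of a non-void function.
        throw;
    }
}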

Invoking ::select() before sending caused context switches, since ::select() can block. Holding a lock on a shared mutex caused parallel threads to wait and switch context. Both hurt performance.

Is there a best practice for avoiding context switches, especially in network programming? I have spent at least a week trying various tools (vmstat, callgrind in valgrind) with no luck. Are there any tools on Linux that would help measure these bottlenecks?

1 answer

#1


In general, and not specific to networking, you need one thread for each resource that can be used in parallel. In other words, if you have a single network interface, a single thread is enough to service it. Since you typically don't just receive or send data but also do something with it, your thread then switches to consuming a different resource, such as the CPU for computations or the IO channel to the hard disk for storage or retrieval. That work needs to be done in a different thread, while the single network thread keeps retrieving messages from the network.
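
As a rough illustration of that split, here is a minimal sketch, assuming C++11 and a POSIX system. The names MessageQueue, NetworkLoop, WorkerLoop and RunDemo are made up for this example, and a local socket pair stands in for the real TCP connection. One thread does nothing but read from the socket and enqueue what arrived; a second thread does the processing.

```cpp
#include <sys/socket.h>   // socketpair, recv, send
#include <unistd.h>       // close
#include <cassert>
#include <cerrno>
#include <condition_variable>
#include <deque>
#include <functional>     // std::ref
#include <mutex>
#include <string>
#include <thread>
#include <utility>        // std::move

// Hypothetical hand-off queue between the network thread and the worker.
class MessageQueue
{
public:
    void Push( std::string chunk )
    {
        {
            std::lock_guard<std::mutex> lock( m_mutex );
            m_items.push_back( std::move( chunk ) );
        }
        m_ready.notify_one( );
    }

    // Signals that no more chunks will arrive.
    void Close( )
    {
        {
            std::lock_guard<std::mutex> lock( m_mutex );
            m_closed = true;
        }
        m_ready.notify_all( );
    }

    // Blocks until a chunk is available; returns false once closed and drained.
    bool Pop( std::string& out )
    {
        std::unique_lock<std::mutex> lock( m_mutex );
        m_ready.wait( lock, [this] { return !m_items.empty( ) || m_closed; } );
        if ( m_items.empty( ) ) return false;
        out = std::move( m_items.front( ) );
        m_items.pop_front( );
        return true;
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_ready;
    std::deque<std::string> m_items;
    bool m_closed = false;
};

// Network thread: only reads from the socket and enqueues what arrived.
void NetworkLoop( int fd, MessageQueue& queue )
{
    char buffer[4096];
    for ( ;; )
    {
        ssize_t n = ::recv( fd, buffer, sizeof buffer, 0 );
        if ( n < 0 && errno == EINTR ) continue;   // interrupted, retry
        if ( n <= 0 ) break;                       // peer closed, or error
        queue.Push( std::string( buffer, static_cast<size_t>( n ) ) );
    }
    queue.Close( );
}

// Worker thread: uses a different resource (here just the CPU, counting bytes).
void WorkerLoop( MessageQueue& queue, size_t& bytesProcessed )
{
    std::string chunk;
    while ( queue.Pop( chunk ) ) bytesProcessed += chunk.size( );
}

// Demo: push `payload` through a local socket pair; returns bytes the worker saw.
size_t RunDemo( const std::string& payload )
{
    int fds[2];
    if ( ::socketpair( AF_UNIX, SOCK_STREAM, 0, fds ) != 0 ) return 0;

    MessageQueue queue;
    size_t bytesProcessed = 0;
    std::thread network( NetworkLoop, fds[0], std::ref( queue ) );
    std::thread worker( WorkerLoop, std::ref( queue ), std::ref( bytesProcessed ) );

    size_t sent = 0;
    while ( sent < payload.size( ) )
    {
        ssize_t n = ::send( fds[1], payload.data( ) + sent, payload.size( ) - sent, 0 );
        if ( n <= 0 ) break;
        sent += static_cast<size_t>( n );
    }
    ::close( fds[1] );   // end of stream for the network thread

    network.join( );
    worker.join( );
    ::close( fds[0] );
    return bytesProcessed;
}
```

Calling RunDemo( std::string( 500 * 1024, 'x' ) ) pushes a 500KB payload through the pair. The only synchronization point between the two threads is the queue, so the network thread never waits on the processing step.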

As a consequence, your approach of creating a thread for each connection looks like a simple way to keep things clean and separate, but it doesn't scale, because it involves too much unnecessary context switching. Instead, keep the networking in one place if you can. Also, don't reinvent the wheel. There are tools such as ZeroMQ that serve several connections, assemble whole messages from fragmented network packets and only invoke a callback once a message has been received completely. It does so efficiently, so I'd suggest using it as the base for your communication. In addition, it provides a plethora of language bindings, so you can quickly prototype nodes in a scripting language and switch to C++ for performance later on.
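
Before restructuring anything, it may be worth confirming that context switching really is what the thread-per-connection design is paying for. On Linux, `pidstat -w` and `perf stat -e context-switches` report switch counts from outside the process, and getrusage() exposes the process's own counters from inside it. A minimal sketch, assuming a POSIX system; SwitchCounts, ReadSwitchCounts and Delta are hypothetical helper names, not part of any library:

```cpp
#include <sys/resource.h>   // getrusage, struct rusage
#include <cassert>

// Snapshot of the context-switch counters the kernel keeps for this process.
struct SwitchCounts
{
    long voluntary;     // ru_nvcsw: a thread gave up the CPU (blocking call, lock wait)
    long involuntary;   // ru_nivcsw: the scheduler preempted a thread
};

// Reads the counters for the whole process; returns {-1, -1} if getrusage fails.
inline SwitchCounts ReadSwitchCounts( )
{
    struct rusage usage;
    if ( getrusage( RUSAGE_SELF, &usage ) != 0 )
    {
        return SwitchCounts{ -1, -1 };
    }
    return SwitchCounts{ usage.ru_nvcsw, usage.ru_nivcsw };
}

// Switches that happened between two snapshots.
inline SwitchCounts Delta( const SwitchCounts& before, const SwitchCounts& after )
{
    return SwitchCounts{ after.voluntary - before.voluntary,
                         after.involuntary - before.involuntary };
}
```

Take one snapshot before the send loop and one after it. A large voluntary delta typically points at blocking calls and lock waits, while a large involuntary delta suggests more runnable threads than cores.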

Lastly, I'm afraid that the library you are using (which does not seem to be part of Boost!) is abandonware, i.e. its development has been discontinued. I'm not sure of that, but according to the changelog they made it compatible with Boost 1.37, which is really old. Make sure that what you are using is worth your time!
