Can someone explain to me how event-driven IO system calls like select, poll, and epoll relate to blocking vs non-blocking IO?
I don't understand how these concepts are related, if they are related at all.
3 answers
#1
The select system call is supported on almost all Unixes. It lets userland applications watch a group of descriptors and find out which subset of that group is ready for reading or writing. Its interface is a bit clunky: the fd_set bitmaps must be rebuilt before every call and can only hold descriptors below FD_SETSIZE. Its implementation in most kernels is mediocre at best, since every call has to check the whole set.
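The following is a minimal C sketch of that "which subset is ready" question, assuming a POSIX system. The helper name readable_subset is hypothetical. It rebuilds the fd_set on every call and uses a zero timeout, so it only reports current readiness instead of waiting.

```c
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>   /* pipe(), write() for a usage check */

/* Hypothetical helper: ask select() which descriptors in fds[] are
 * readable right now. ready[i] is set to 1 if fds[i] is readable,
 * otherwise 0. Returns the number of ready descriptors, or -1 on error. */
int readable_subset(const int *fds, size_t n, int *ready)
{
    fd_set rset;
    int maxfd = -1;

    FD_ZERO(&rset);                   /* the set must be rebuilt before every call */
    for (size_t i = 0; i < n; i++) {
        FD_SET(fds[i], &rset);        /* each fd must be below FD_SETSIZE */
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }

    struct timeval tv = {0, 0};       /* zero timeout: check, don't wait */
    int nready = select(maxfd + 1, &rset, NULL, NULL, &tv);
    if (nready < 0)
        return -1;

    /* select() overwrote rset with the ready subset; scan it. */
    for (size_t i = 0; i < n; i++)
        ready[i] = FD_ISSET(fds[i], &rset) ? 1 : 0;
    return nready;
}
```

For example, take two pipes where only the second has unread data. The helper should return 1 and mark only the second read end as ready.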
epoll serves the same purpose but is available only on Linux. It is a huge improvement over select in both efficiency and programming interface. Other Unixes have their own specialised calls too, such as kqueue on the BSDs and macOS.
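For comparison, here is a Linux-only sketch of the same idea using epoll. The helper names make_read_watcher and wait_readable are hypothetical. Unlike select, the interest set is registered once with epoll_ctl. Each epoll_wait call then returns only the descriptors that are ready, rather than a bitmap the caller must scan.

```c
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: create an epoll instance watching every fd in
 * fds[] for readability. Registration happens once here; the returned
 * epoll descriptor can be reused for any number of wait_readable()
 * calls. Returns -1 on error. */
int make_read_watcher(const int *fds, size_t n)
{
    int epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;

    for (size_t i = 0; i < n; i++) {
        struct epoll_event ev = {0};
        ev.events = EPOLLIN;          /* level-triggered readability */
        ev.data.fd = fds[i];          /* handed back to us by epoll_wait */
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[i], &ev) < 0) {
            close(epfd);
            return -1;
        }
    }
    return epfd;
}

/* Hypothetical helper: wait up to timeout_ms (0 = don't wait,
 * -1 = forever) and copy the ready descriptors into out[]
 * (capacity max_out, which must be > 0). Returns how many were
 * copied, or -1 on error. */
int wait_readable(int epfd, int *out, int max_out, int timeout_ms)
{
    struct epoll_event events[64];
    if (max_out > 64)
        max_out = 64;
    int nready = epoll_wait(epfd, events, max_out, timeout_ms);
    for (int i = 0; i < nready; i++)
        out[i] = events[i].data.fd;   /* only ready fds are returned */
    return nready;
}
```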
That said, the event-driven IO system calls work with both blocking and non-blocking descriptors; they require neither. Blocking is a behaviour that affects system calls like read, write, accept and connect. select and epoll_wait do have timeouts that control how long they themselves block, but that is unrelated to the descriptors' blocking mode.
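To make the distinction concrete, here is a small C sketch, assuming a POSIX system. The helper names set_nonblocking, try_read_byte and wait_readable_ms are hypothetical. set_nonblocking sets the descriptor's O_NONBLOCK flag with fcntl, which is what changes how read behaves. The timeout in wait_readable_ms only bounds how long select itself waits and does not touch the descriptor.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <unistd.h>

/* Turn on O_NONBLOCK for fd, preserving its other status flags. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: read one byte. On a non-blocking fd this never
 * waits. Returns 1 if a byte was read, 0 on EOF, -2 if no data is
 * available right now (EAGAIN/EWOULDBLOCK), -1 on any other error. */
int try_read_byte(int fd, char *c)
{
    ssize_t r = read(fd, c, 1);
    if (r == 1)
        return 1;
    if (r == 0)
        return 0;
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return -2;
    return -1;
}

/* Hypothetical helper: the timeout limits how long select() itself
 * waits; it neither depends on nor changes the fd's O_NONBLOCK flag.
 * Returns 1 if fd became readable, 0 on timeout, -1 on error. */
int wait_readable_ms(int fd, long ms)
{
    fd_set rset;
    FD_ZERO(&rset);
    FD_SET(fd, &rset);
    struct timeval tv = { ms / 1000, (ms % 1000) * 1000 };
    return select(fd + 1, &rset, NULL, NULL, &tv);
}
```

On an empty pipe whose read end has O_NONBLOCK set, try_read_byte returns -2 immediately. wait_readable_ms returns 0 once its timeout expires.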
Of course, using these event-driven system calls with blocking descriptors is a bit odd. Once you have been notified that data is available, you expect to read it immediately without blocking. Relying on a blocking descriptor not to block after a readiness notification is risky, because race conditions are possible.
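The race can be reproduced deterministically in a single thread. In this sketch, select reports a pipe readable. Then a competing reader drains it; this reader is a stand-in for another thread or process sharing the descriptor. Only after that does "our" read run. Because the descriptor is non-blocking, that read fails with EAGAIN instead of hanging. The function name stale_readiness_demo is illustrative only.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <unistd.h>

/* Illustrative only: replays the race in one thread so it is
 * deterministic. Returns the errno from our read() (EAGAIN expected),
 * 0 if that read unexpectedly succeeded, or -1 if setup failed. */
int stale_readiness_demo(void)
{
    int p[2];
    int flags;
    int result = -1;
    char c;
    fd_set rset;

    if (pipe(p) < 0)
        return -1;

    /* Our end of the pipe is non-blocking. */
    flags = fcntl(p[0], F_GETFL, 0);
    if (flags < 0 || fcntl(p[0], F_SETFL, flags | O_NONBLOCK) < 0)
        goto out;
    if (write(p[1], "x", 1) != 1)
        goto out;

    /* 1. select() reports the pipe readable (one byte is waiting). */
    FD_ZERO(&rset);
    FD_SET(p[0], &rset);
    if (select(p[0] + 1, &rset, NULL, NULL, NULL) != 1)
        goto out;

    /* 2. A competing reader consumes the byte before we get to it. */
    if (read(p[0], &c, 1) != 1)
        goto out;

    /* 3. Our read: the readiness report is now stale. A blocking fd
     *    would hang here; the non-blocking one fails with EAGAIN. */
    result = (read(p[0], &c, 1) < 0) ? errno : 0;

out:
    close(p[0]);
    close(p[1]);
    return result;
}
```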
Non-blocking, event-driven IO can make server applications vastly more efficient because they do not need a thread for each descriptor (connection). Compare the performance of the Apache web server, in its traditional thread- or process-per-connection configurations, with Nginx or Lighttpd and you'll see the benefit.
#2
They're largely unrelated, except that you may want to use non-blocking file descriptors with event-driven IO for the following reasons:
- Old versions of Linux had kernel bugs where read could block even after select indicated that a socket was readable; this happened with UDP sockets and packets with bad checksums. Current versions may still behave this way: the Linux select(2) man page still lists this under BUGS and recommends O_NONBLOCK on sockets that should not block.
- If other processes might have access to your file descriptors and read from or write to them, there is a race condition. The same applies if your program is multi-threaded and other threads might do so. The race is between select determining that the file descriptor is readable/writable and your program performing IO on it, and it could result in blocking.
- You almost certainly want to make a socket non-blocking before calling connect; otherwise you will block until the connection is made. Then use select to wait for the socket to become writable, which means the connection attempt has finished. Read the SO_ERROR socket option with getsockopt to find out whether it succeeded (0) or failed (an errno value).
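Here is a sketch of that non-blocking connect pattern in C for IPv4, assuming a POSIX system; connect_with_timeout is a hypothetical helper name. Completion, successful or not, shows up as writability. The SO_ERROR option distinguishes the two outcomes, as the Linux connect(2) man page describes for EINPROGRESS.

```c
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: connect sock to addr without blocking inside
 * connect(), waiting at most timeout_ms in select(). Returns 0 on
 * success, or an errno value (e.g. ECONNREFUSED, ETIMEDOUT) on failure. */
int connect_with_timeout(int sock, const struct sockaddr_in *addr, long timeout_ms)
{
    int flags = fcntl(sock, F_GETFL, 0);
    if (flags < 0 || fcntl(sock, F_SETFL, flags | O_NONBLOCK) < 0)
        return errno;

    if (connect(sock, (const struct sockaddr *)addr, sizeof *addr) == 0)
        return 0;                    /* connected immediately */
    if (errno != EINPROGRESS)
        return errno;                /* failed immediately */

    fd_set wset;
    FD_ZERO(&wset);
    FD_SET(sock, &wset);
    struct timeval tv = { timeout_ms / 1000, (timeout_ms % 1000) * 1000 };
    int n = select(sock + 1, NULL, &wset, NULL, &tv);
    if (n < 0)
        return errno;
    if (n == 0)
        return ETIMEDOUT;            /* attempt still in progress */

    /* Writable: the attempt finished. SO_ERROR says how. */
    int err = 0;
    socklen_t len = sizeof err;
    if (getsockopt(sock, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return errno;
    return err;                      /* 0 means connected */
}
```

The socket is left non-blocking afterwards, which suits an event loop. A program that expects blocking IO would clear O_NONBLOCK again.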
#3
select and similar functions (you mentioned a few) are usually used to implement an event loop in an event-driven system.
That is, the application does not call read() directly on a socket or file, which may block if no data is available. Instead, it calls select() on multiple file descriptors and waits for data to become available on any one of them.
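When select() reports a file descriptor as readable, data is normally available and the subsequent read() will not block. Strictly speaking this is not guaranteed for blocking descriptors: as noted above, a spurious readiness report or a competing reader can still make read() block.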
This is one way of processing data from multiple sources simultaneously without resorting to multiple threads.
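Here is a minimal select-based event loop along those lines. drain_all is a hypothetical helper: in a single thread, it reads from whichever of several descriptors are ready and stops once all of them have reached end-of-file. It treats EAGAIN as "not ready after all", so it also tolerates non-blocking descriptors.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: single-threaded event loop over fds[]. Each
 * iteration waits in select() until at least one fd is readable, reads
 * from every ready fd, and closes an fd (setting its entry to -1) at
 * EOF. Returns the total number of bytes read, or -1 on error. */
long drain_all(int *fds, size_t n)
{
    long total = 0;
    size_t open_count = 0;
    char buf[4096];

    for (size_t i = 0; i < n; i++)
        if (fds[i] >= 0)
            open_count++;

    while (open_count > 0) {
        fd_set rset;
        int maxfd = -1;
        FD_ZERO(&rset);                    /* rebuilt on every iteration */
        for (size_t i = 0; i < n; i++) {
            if (fds[i] >= 0) {
                FD_SET(fds[i], &rset);
                if (fds[i] > maxfd)
                    maxfd = fds[i];
            }
        }

        if (select(maxfd + 1, &rset, NULL, NULL, NULL) < 0) {
            if (errno == EINTR)
                continue;                  /* interrupted by a signal: retry */
            return -1;
        }

        for (size_t i = 0; i < n; i++) {
            if (fds[i] < 0 || !FD_ISSET(fds[i], &rset))
                continue;
            ssize_t r = read(fds[i], buf, sizeof buf);
            if (r > 0) {
                total += r;
            } else if (r == 0) {           /* EOF: stop watching this fd */
                close(fds[i]);
                fds[i] = -1;
                open_count--;
            } else if (errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR) {
                return -1;
            }
        }
    }
    return total;
}
```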