How do Node.js streams work?

Date: 2022-09-28 18:28:46

I have a question about Node.js streams - specifically how they work conceptually.

There is no lack of documentation on how to use streams, but I've had difficulty finding out how streams work at the data level.

My limited understanding of web communication, HTTP, is that full "packages" of data are sent back and forth. Similar to an individual ordering a company's catalogue, a client sends a GET (catalogue) request to the server, and the server responds with the catalogue. The browser doesn't receive a page of the catalogue, but the whole book.

Are node streams perhaps multipart messages?

I like the REST model - especially that it is stateless. Every single interaction between the browser and server is completely self-contained and sufficient. Are node streams therefore not RESTful? One developer mentioned the similarity with socket pipes, which keep the connection open. Back to my catalogue-ordering example, would this be like an infomercial with the line "But wait! There's more!" instead of the fully contained catalogue?

A large part of streams is the ability for the 'downstream' receiver to send messages like 'pause' and 'continue' upstream. What do these messages consist of? Are they POST requests?

Finally, my limited visual understanding of how Node works includes the event loop. Functions can be placed on separate threads from the thread pool, and the event loop carries on. But shouldn't sending a stream of data keep the event loop occupied (i.e. stopped) until the stream is complete? How is it ALSO keeping watch for the 'pause' request from downstream? Does the event loop place the stream on another thread from the pool and, when it encounters a 'pause' request, retrieve the relevant thread and pause it?

I've read the node.js docs, completed the nodeschool tutorials, built a heroku app, purchased TWO books (real, self-contained books, kinda like the catalogues spoken of before and likely not like node streams), and asked several "node" instructors at code bootcamps - all speak about how to use streams, but none speak about what's actually happening underneath.

Perhaps you have come across a good resource explaining how these work? Perhaps a good anthropomorphic analogy for a non-CS mind?

3 Answers

#1 (8 votes)

The first thing to note is: node.js streams are not limited to HTTP requests. HTTP requests / network resources are just one example of a stream in node.js.

Streams are useful for everything that can be processed in small chunks. They allow you to process potentially huge resources in smaller chunks that fit into your RAM more easily.

Say you have a file (several gigabytes in size) and want to convert all lowercase into uppercase characters and write the result to another file. The naive approach would read the whole file using fs.readFile (error handling omitted for brevity):

var fs = require('fs');

fs.readFile('my_huge_file', function (err, data) {
    var convertedData = data.toString().toUpperCase();

    fs.writeFile('my_converted_file', convertedData, function (err) {
        // error handling omitted for brevity
    });
});

Unfortunately this approach will easily overwhelm your RAM, as the whole file has to be stored before processing it. You would also waste precious time waiting for the file to be read. Wouldn't it make sense to process the file in smaller chunks? You could start processing as soon as you get the first bytes while waiting for the hard disk to provide the remaining data:

var fs = require('fs');

var readStream = fs.createReadStream('my_huge_file');
var writeStream = fs.createWriteStream('my_converted_file');
readStream.on('data', function (chunk) {
    var convertedChunk = chunk.toString().toUpperCase();
    writeStream.write(convertedChunk);
});
readStream.on('end', function () {
    writeStream.end();
});

This approach is much better:

  1. You will only deal with small parts of data that easily fit into your RAM.
  2. You start processing as soon as the first bytes arrive and don't waste time doing nothing but waiting.

Once you open the stream, node.js will open the file and start reading from it. Once the operating system passes some bytes to the thread that's reading the file, they will be passed along to your application.


Coming back to the HTTP streams:

  1. The first issue is valid here as well: an attacker could send you large amounts of data to overwhelm your RAM and take down (DoS) your service.
  2. The second issue is even more important in this case: the network may be very slow (think smartphones) and it may take a long time until everything has been sent by the client. By using a stream you can start processing the request early and cut response times.

On pausing the HTTP stream: this is not done at the HTTP level, but way lower. If you pause the stream, node.js will simply stop reading from the underlying TCP socket. What happens then is up to the kernel. It may still buffer the incoming data, so it's ready for you once you've finished your current work. It may also inform the sender at the TCP level that it should pause sending data. Applications don't need to deal with that; it is none of their business. In fact, the sender application probably does not even realize that you are no longer actively reading!

So it's basically about being provided data as soon as it is available, but without overwhelming your resources. The underlying hard work is done either by the operating system (e.g. net, fs, http) or by the author of the stream you are using (e.g. zlib, which is a Transform stream and is usually bolted onto fs or net).

#2 (3 votes)

The chart below seems to be a pretty accurate 10,000-foot overview / diagram of the node streams class.

It represents streams3, contributed by Chris Dickinson.

[diagram: overview of the Node.js streams classes (streams3)]

#3 (2 votes)

I think you are overthinking how all this works and I like it.

What streams are good for

Streams are good for two things:

  • when an operation is slow and can give you partial results as it gets them. For example, reading a file: it is slow because HDDs are slow, and the stream can give you parts of the file as it reads them. With streams you can use these parts of the file and start to process them right away.

  • they are also good for connecting programs together (think of them as functions). Just as on the command line, you can pipe different programs together to produce the desired output. Example: cat file | grep word.

How they work under the hood...

Most of these operations that take time to process and can give you partial results as they arrive are not performed by the JavaScript code itself. They are done by the underlying system (Node's I/O layer, libuv, together with the operating system), which only hands those results over to JS for you to work with.

To understand your http example you need to understand how http works

There are different encodings a web page can be sent with. In the beginning there was only one way: the whole page was sent when it was requested. Now there are more efficient encodings. One of them is chunked, where parts of the web page are sent until the whole page has been sent. This is good because a web page can be processed as it is received. Imagine a web browser: it can start to render websites before the download is complete.

Your .pause and .resume questions

First, Node.js streams only work within the same Node.js program. A Node.js stream can't interact with a stream on another server or even in another program.

That means that in the diagram below, Node.js can't talk to the webserver. It can't tell it to pause or resume.

Node.js <-> Network <-> Webserver

What really happens is that Node.js asks for a webpage and starts to download it, and there is no way to stop that download short of dropping the socket.

So, what really happens when you call .pause or .resume in Node.js?

Node.js starts to buffer the incoming data until you are ready to consume it again. But the download itself never stops.

Event Loop

I have a whole answer prepared to explain how the Event Loop works but I think it is better for you to watch this talk.
