Performance issue when writing to a large file?

Time: 2023-01-28 22:01:13

I have recently been handling the console logs for a server, and out of curiosity I was wondering whether writing to a large file carries a performance penalty compared with writing to small ones.

For instance, is it a good idea to keep log files small instead of letting them grow bulky? I was not able to argue convincingly in favor of either approach.

There might be problems with reading or searching the file, but right now I am more interested in whether writing can be affected in any way. I am looking for expert advice.

Edit: My understanding is that the OS only has to open a file handle and push the data to the file system. File size should matter little, since you keep appending data to the end of the file, and whenever a block is full the OS assigns another block to the file. As I said earlier, reading and searching may suffer because of fragmentation of the file's blocks, but I could not find much difference when writing.

2 answers

#1


9  

As a general rule, there should be no practical difference between appending a block to a small file (or writing the first block, which is appending to a zero-length file) and appending a block to a large file.
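One way to check this claim on your own machine is to time the same number of appends against an empty file and against a file that has already been grown. The following is only a minimal sketch: the 4 KiB block size, the block counts, and the names `time_appends` and `compare` are assumptions made for illustration, and the timings will vary with the file system, the disk, and caching.

```python
import os
import tempfile
import time

BLOCK = b"x" * 4096  # one 4 KiB chunk of log data (assumed size)


def time_appends(path, count):
    """Append `count` blocks to `path` and return the elapsed seconds."""
    start = time.perf_counter()
    with open(path, "ab") as f:       # "ab" = append, binary mode
        for _ in range(count):
            f.write(BLOCK)
        f.flush()
        os.fsync(f.fileno())          # ask the OS to push the data to disk
    return time.perf_counter() - start


def compare(prefill_blocks=25_000, append_blocks=1_000):
    """Time identical appends to an empty file and to a pre-filled one."""
    with tempfile.TemporaryDirectory() as tmp:
        small = os.path.join(tmp, "small.log")
        large = os.path.join(tmp, "large.log")
        time_appends(large, prefill_blocks)   # grow the "large" file first (about 100 MB by default)
        t_small = time_appends(small, append_blocks)
        t_large = time_appends(large, append_blocks)
        sizes = (os.path.getsize(small), os.path.getsize(large))
    return t_small, t_large, sizes


if __name__ == "__main__":
    t_small, t_large, sizes = compare()
    print(f"append to empty file: {t_small:.4f}s (final size {sizes[0]} bytes)")
    print(f"append to large file: {t_large:.4f}s (final size {sizes[1]} bytes)")
```

A single run proves little; repeating it several times and with a larger prefill gives a fairer picture.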

There are special cases (such as having to fault in a triple-indirect block, or the initial open having to read all of the mapping information) that could add extra I/Os, but the steady state should be the same.
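To get a feel for how large a file must be before a triple-indirect block comes into play, the arithmetic can be sketched as below. The answer does not name a file system, so this assumes a classic Unix-style inode (as in ext2) with 12 direct pointers, 4 KiB blocks, and 4-byte block pointers; the function name `indirect_thresholds` is made up for this example. Extent-based file systems map blocks differently, so these numbers do not apply to them.

```python
def indirect_thresholds(block_size=4096, pointer_size=4, direct_pointers=12):
    """Return the file size (in bytes) at which each mapping level runs out."""
    per_block = block_size // pointer_size            # pointers that fit in one indirect block
    direct = direct_pointers * block_size             # reachable through direct pointers only
    single = direct + per_block * block_size          # ... plus the single-indirect block
    double = single + per_block ** 2 * block_size     # ... plus the double-indirect tree
    triple = double + per_block ** 3 * block_size     # ... plus the triple-indirect tree
    return {"direct": direct, "single": single, "double": double, "triple": triple}


if __name__ == "__main__":
    for level, size in indirect_thresholds().items():
        print(f"{level:>6}: up to {size:,} bytes ({size / 2**30:.3f} GiB)")
```

Under these assumptions the triple-indirect block is only touched once the file grows past roughly 4 GiB, which is why it counts as a special case rather than the steady state.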

I'd be more worried about the manageability of huge files: slow to back up, slow to copy, slow to view, etc.

#2


2  

I am not an expert, but I will try to answer anyway.

Larger files may take longer to write to disk, and this is not a programming issue but a file system issue. Perhaps there are file systems that do not have this problem, but on Windows a large file often cannot be written in one contiguous piece, so it ends up fragmented, and writing the fragments takes time (for the simple reason that the drive head has to move to another cylinder). This assumes we are talking about "classic" hard drives...

If you want advice, I would write smaller files and rotate them either daily or when they reach a certain size (or both, actually). That is a rather common approach that I have seen in enterprise-grade products.
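The thread does not say which language or logging library the server uses, so as one possible illustration, here is a minimal sketch of size-based rotation with Python's standard `logging.handlers.RotatingFileHandler`. The logger name, the file name `server.log`, the size limit, and the helper `make_rotating_logger` are all assumptions chosen for the example.

```python
import logging
import logging.handlers
import os
import tempfile


def make_rotating_logger(path, max_bytes=1_000_000, backups=5, name="console"):
    """Build a logger that rolls over to a new file before `path` exceeds max_bytes."""
    logger = logging.getLogger(name)        # hypothetical logger name
    logger.setLevel(logging.INFO)
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backups
    )
    handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
    logger.addHandler(handler)
    return logger, handler


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "server.log")
        # Tiny limit so the rotation is visible after a few hundred lines.
        logger, handler = make_rotating_logger(path, max_bytes=2_000, backups=3)
        for i in range(200):
            logger.info("console line %d", i)
        handler.close()
        logger.removeHandler(handler)
        print(sorted(os.listdir(tmp)))      # the active file plus numbered backups
```

For daily rotation the same module offers `TimedRotatingFileHandler` (for example with `when="midnight"`). The standard library does not combine the size and time triggers in a single handler, so doing both would need a custom handler or an external tool.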
