How to delete a line from the middle of a text file using Ruby

I know how to write to a file and read from a file, but I don't know how to modify a file other than by reading the entire file into memory, manipulating it, and rewriting the entire file. For large files this isn't very efficient.

I don't really know the difference between append and write.

For example, if I have a file containing:

Person1,will,23
Person2,Richard,32
Person3,Mike,44

How can I delete just the line containing Person2?

4 Solutions

#1

You can delete a line in several ways:

Simulate deletion. That is, just overwrite the line's content with spaces. Later, when you read and process the file, ignore such blank lines.

Pros: this is easy and fast. Cons: it's not real deletion of data (the file doesn't shrink), and you need to do more work when reading and processing the file.

Code:
```
File.open(filename, 'r+') do |f|
  f.each_line do |line|
    next unless should_be_deleted(line)

    # Seek back to the beginning of the line. Use bytesize rather than
    # length so that multibyte characters are counted correctly.
    f.seek(-line.bytesize, IO::SEEK_CUR)

    # Overwrite the line with spaces and restore the trailing newline.
    f.write(' ' * (line.bytesize - 1))
    f.write("\n")
  end
end

File.foreach(filename) { |line| p line }

# >> "Person1,will,23\n"
# >> "                  \n"
# >> "Person3,Mike,44\n"
```
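
When reading a file that has been treated this way, the blanked-out lines have to be skipped. A minimal sketch of that reading side, assuming a hypothetical helper named `each_live_line` (not part of Ruby) and assuming that no legitimate record consists only of whitespace:

```ruby
# Hypothetical helper: yields every line of the file except those that
# were "deleted" by being overwritten with spaces.
def each_live_line(path)
  return enum_for(:each_live_line, path) unless block_given?

  File.foreach(path) do |line|
    next if line.strip.empty? # skip blanked-out (and genuinely empty) lines
    yield line
  end
end
```

It can be used like `each_live_line('file.txt') { |line| puts line }`.
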
Do real deletion. This means that the line will no longer exist. To do this in place you have to read the next line and overwrite the current line with it, then repeat this for all following lines until the end of the file is reached. This is an error-prone task (lines of different lengths, etc.), so here's a less error-prone alternative: open a temp file, write to it the lines up to (but not including) the line you want to delete, skip the line you want to delete, then write the rest to the temp file. Delete the original file and rename the temporary one to take its name. Done.

While this is technically a total rewrite of the file, it does differ from what you described. The file doesn't need to be loaded fully into memory. You need only one line at a time. Ruby provides a method for this: IO#each_line.

Pros: no assumptions. Lines really get deleted. The reading code does not need to be altered. Cons: a lot more work when deleting the line (not only in code, but also in IO/CPU time).

There is a snippet that illustrates this approach in @azgult's answer.

#2

As files are saved essentially as a continuous block of data to the disk, removing any part of it necessitates rewriting at least what comes after it. This does in essence mean that - as you say - it isn't particularly efficient for large files. It is therefore generally a good idea to limit file sizes so that such problems don't occur.

A "compromise" solution might be to copy the file line by line to a second file and then move that file to replace the first. This avoids loading the file into memory but does not avoid any hard disk access:

```
require 'fileutils'

open('file.txt', 'r') do |f|
  open('file.txt.tmp', 'w') do |f2|
    f.each_line do |line|
      f2.write(line) unless line.start_with? "Person2"
    end
  end
end
FileUtils.mv 'file.txt.tmp', 'file.txt'
```

It would be even more efficient to open the file read-write, skip ahead to the position you want to delete, and then shift the rest of the data back, but that would make for some quite ugly code (and I can't be bothered to write that now).
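
For reference, the shift-back approach might look roughly like the following. This is only a sketch under stated assumptions: the helper name `delete_lines_in_place` is made up for illustration, the file is opened in binary mode so that byte offsets are exact, and the block decides which lines to drop.

```ruby
# Hypothetical helper: removes every line for which the block returns true,
# shifting the kept lines toward the start of the file and then truncating.
def delete_lines_in_place(path)
  File.open(path, 'r+b') do |f|
    write_pos = 0                 # where the next kept line should land
    until f.eof?
      line = f.gets
      read_pos = f.pos            # where reading has to resume afterwards
      next if yield(line)         # dropped line: write_pos stays put

      # Only rewrite the line if something before it has been removed.
      if write_pos != read_pos - line.bytesize
        f.seek(write_pos)
        f.write(line)
        f.seek(read_pos)
      end
      write_pos += line.bytesize
    end
    f.truncate(write_pos)         # cut off the leftover tail
  end
end
```

A call such as `delete_lines_in_place('file.txt') { |line| line.start_with?('Person2') }` would then remove the Person2 line. Note that, unlike the temp-file version, an interruption halfway through leaves the file partially rewritten.
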

#3

You could open the file and read it line by line, appending lines you want to keep to a new file. This allows you the most control over which lines are kept, without destroying the original file.

```
File.open('output_file_path', 'w') do |output| # 'w' for a new file, 'a' to append to an existing one
  File.open('input_file_path', 'r') do |input|
    input.each_line do |line|
      # keep_line holds the logic that decides whether the line should be kept
      output.write(line) if keep_line(line)
    end
  end
end
```

If you know the positions of the beginning and end of the chunk you want to remove, you can open the file, read up to the start of the chunk, then seek to its end and continue reading.

Look up the parameters of the read method, and read about seeking, here:

http://ruby-doc.org/core-2.0/IO.html#method-i-read
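
A rough sketch of the seek-based idea, assuming the chunk's byte offsets are already known. The helper name `copy_without_chunk` and its parameters are hypothetical, chosen only for illustration:

```ruby
# Hypothetical helper: copies src to dst, leaving out the bytes in the
# half-open range [chunk_start, chunk_end).
def copy_without_chunk(src, dst, chunk_start, chunk_end)
  File.open(src, 'rb') do |input|
    File.open(dst, 'wb') do |output|
      # Copy the bytes that come before the chunk.
      IO.copy_stream(input, output, chunk_start) if chunk_start > 0
      # Jump over the chunk, then copy everything after it.
      input.seek(chunk_end)
      IO.copy_stream(input, output)
    end
  end
end
```

For the sample file in the question, the Person2 line occupies bytes 16 up to (but not including) 35, so `copy_without_chunk('input.txt', 'output.txt', 16, 35)` would leave it out.
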

#4

Read here:

```
File.open('output.txt', 'w') do |out_file|
  File.foreach('input.txt') do |line|
    # Skip the whole line rather than only blanking out the matched text.
    out_file.print line unless line.include?('Person2')
  end
end
```
