Ruby CSV: How do I skip the first two lines of a file?

Time: 2021-12-15 21:50:19

I have a file where the first line is useless and the second is a header. The problem is that when I loop through the file, both are counted as rows. Is there a way to use foreach with options to skip 2 lines? I know there's a read method on CSV, but that loads all the data into RAM, and I don't think it will scale well if the file is too big.

However, if there is no other option I will consider it. This is what I have so far:

CSV.foreach(filename, col_sep: "\t") do |row|
  # Stop reading once enough listings have been collected
  break if listings.size >= limit

  listing_class = 'Sale'
  address = row[7]
  unit = row[8]
  price = row[2]
  url = row[0]
  listings << {listing_class: listing_class, address: address, unit: unit, url: url, price: price}
end

3 solutions

#1


5  

I didn't benchmark, but try this:

CSV.to_enum(:foreach, filename, col_sep: "\t").drop(2).each do |row|
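One caveat worth checking: Enumerable#drop returns an Array, so the line above likely reads all remaining rows into memory before iterating. A minimal sketch of a lazy variant follows, assuming a reasonably recent csv library where CSV.foreach returns an enumerator when called without a block. The helper name data_rows and the sample file contents are made up for illustration.

```ruby
require "csv"
require "tempfile"

# Hypothetical helper (not part of the CSV library): returns a lazy
# enumerator over the rows of `path`, skipping the first `skip` lines.
def data_rows(path, skip: 2, **csv_options)
  # Without a block, CSV.foreach returns an Enumerator; `lazy` keeps
  # `drop` from building an Array of every remaining row.
  CSV.foreach(path, **csv_options).lazy.drop(skip)
end

# Made-up sample: one junk line, one header line, two data rows.
sample = Tempfile.new(["listings", ".tsv"])
sample.write("junk line\nurl\tprice\nhttp://a.example\t100\nhttp://b.example\t200\n")
sample.close

data_rows(sample.path, col_sep: "\t").each do |row|
  puts row.inspect
end
```

Because the enumerator is lazy, something like data_rows(path, col_sep: "\t").first(limit) should stop reading once limit rows have been produced, which fits the limit check in the question.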

#2


1  

Use a counter variable: initialize it to 0 and increment it on every line. While it is smaller than 2, skip to the next row.
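The counter idea could look roughly like this. It is only a sketch: the helper name each_row_after and the sample data are invented for the example, and the counter is checked before it is incremented so that exactly two lines are skipped.

```ruby
require "csv"
require "tempfile"

# Hypothetical helper: yields every row after the first `skip` lines,
# streaming through the file one row at a time.
def each_row_after(path, skip: 2, **csv_options)
  skipped = 0
  CSV.foreach(path, **csv_options) do |row|
    if skipped < skip
      skipped += 1 # still inside the lines to ignore
      next
    end
    yield row
  end
end

# Made-up sample: one junk line, one header line, two data rows.
sample = Tempfile.new(["listings", ".tsv"])
sample.write("junk line\nurl\tprice\nhttp://a.example\t100\nhttp://b.example\t200\n")
sample.close

collected = []
each_row_after(sample.path, col_sep: "\t") { |row| collected << row }
puts collected.inspect
```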

#3


1  

You can also use #read or #readlines, like so:

CSV.readlines(filename, col_sep: "\t")[2..-1].each do |row|

#readlines is an alias for #read, so it does not matter which you use. Either one parses the whole CSV into an Array of Arrays, so [2..-1] selects row 3 through the end.
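To make the slicing concrete, here is a small sketch with made-up sample data. One detail to be aware of: slicing with [2..-1] returns nil when the Array has fewer than two elements, whereas Array#drop(2) returns an empty Array, so drop may be the safer choice for very short files.

```ruby
require "csv"
require "tempfile"

# Made-up sample: one junk line, one header line, two data rows.
sample = Tempfile.new(["listings", ".tsv"])
sample.write("junk line\nurl\tprice\nhttp://a.example\t100\nhttp://b.example\t200\n")
sample.close

# The whole file is parsed into memory as an Array of Arrays.
all_rows = CSV.readlines(sample.path, col_sep: "\t")

# Two ways to discard the first two rows.
sliced_rows  = all_rows[2..-1]
dropped_rows = all_rows.drop(2)

sliced_rows.each { |row| puts row.inspect }
```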

Both this and @Nakilon's answer are probably better and definitely cleaner than using a counter.

As always, Ruby classes are well documented, and reading the docs can be much more beneficial than waiting for someone to hand you an answer.