I have a file where the first line is a useless line and the second is a header. The problem is that when I loop through the file, those two lines are counted as rows. Is there a way to use foreach with an option to skip 2 lines? I know there's a read method on CSV, but that loads the data into RAM, and if the file is too big I don't think it'll scale well.
However, if there is no other option I will consider it. This is what I have so far:
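require 'csv'

CSV.foreach(filename, col_sep: "\t") do |row|
  # Stop reading once enough listings have been collected.
  # (An `until` loop here would append the same row repeatedly.)
  break if listings.size == limit

  listing_class = 'Sale'
  address = row[7]
  unit = row[8]
  price = row[2]
  url = row[0]
  listings << {listing_class: listing_class, address: address, unit: unit, url: url, price: price}
end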
3 Answers
#1
5
I didn't benchmark, but try this:
CSV.to_enum(:foreach, filename, col_sep: "\t").drop(2).each do |row|
  # process row
end
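One caveat worth checking before relying on this for big files: Enumerable#drop returns an Array, so the one-liner above collects all remaining rows in memory before the block runs. A possible streaming variant goes through a lazy enumerator instead. This is only a sketch; the helper name each_row_after and the sample data are made up for illustration.

```ruby
require "csv"
require "tempfile"

# Hypothetical sample: one junk line, one header line, then two data rows.
SAMPLE_TSV = "junk line\nurl\tprice\nhttp://a.example\t100\nhttp://b.example\t200\n"

# Yields every row after the first `skip` rows. CSV.foreach without a block
# returns an Enumerator; `.lazy` keeps `drop` from building an Array.
def each_row_after(path, skip, **csv_options)
  CSV.foreach(path, **csv_options).lazy.drop(skip).each { |row| yield row }
end

Tempfile.create("listings") do |file|
  file.write(SAMPLE_TSV)
  file.flush
  each_row_after(file.path, 2, col_sep: "\t") { |row| p row }
end
```

Rows are still parsed one at a time, so the first two lines are read and discarded rather than skipped at the file level.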
#2
1
Use a counter variable: initialize it to 0 and increment it on every line. While it is smaller than 2, skip to the next row.
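A rough sketch of the counter idea, in case it helps. The method name rows_after_counter and the sample data are placeholders, not part of the original code; the counter here is incremented before the check, so it compares against the number of lines to skip.

```ruby
require "csv"
require "tempfile"

# Collects rows from a tab-separated file, ignoring the first `skip` lines.
def rows_after_counter(path, skip)
  rows = []
  seen = 0
  CSV.foreach(path, col_sep: "\t") do |row|
    seen += 1
    next if seen <= skip # still inside the junk/header lines
    rows << row
  end
  rows
end

Tempfile.create("listings") do |file|
  file.write("junk line\nurl\tprice\nhttp://a.example\t100\n")
  file.flush
  p rows_after_counter(file.path, 2)
end
```

This still streams the file row by row, so memory use stays flat apart from whatever you collect yourself.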
#3
1
You can also use #read or #readlines, like so:
CSV.readlines(filename, col_sep: "\t")[2..-1].each do |row|
  # process row
end
#readlines is an alias for #read, so it does not matter which you use. Either one parses the whole CSV into an Array of Arrays, so [2..-1] means rows 3 through the end.
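To make the slicing concrete, here is a small sketch. The method name rows_from_third_line and the sample data are invented for the example. Note that the whole file is held in memory, which is the trade-off the question was worried about.

```ruby
require "csv"
require "tempfile"

# Reads the entire file, then returns everything from the third row onward.
def rows_from_third_line(path)
  all_rows = CSV.readlines(path, col_sep: "\t") # Array of Arrays
  all_rows[2..-1] || [] # the slice is nil when the file has fewer than 2 rows
end

Tempfile.create("listings") do |file|
  file.write("junk line\nurl\tprice\nhttp://a.example\t100\n")
  file.flush
  p rows_from_third_line(file.path)
end
```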
Both this and @Nakilon's answer are probably better and definitely cleaner than using a counter.
As always, Ruby classes are well documented, and reading the docs can be much more beneficial than just waiting for someone to hand you an answer.