All I need to do is get the headers from a CSV file.
file.csv is:
"A", "B", "C"
"1", "2", "3"
My code is:
require 'csv'

table = CSV.open("file.csv", :headers => true)
puts table.headers
table.each do |row|
  puts row
end
Which gives me:
true
"1", "2", "3"
I've been looking at Ruby CSV documentation for hours and this is driving me crazy. I am convinced that there must be a simple one-liner that can return the headers to me. Any ideas?
3 Answers
#1
12
It looks like CSV.read will give you access to a headers method:
headers = CSV.read("file.csv", headers: true).headers
# => ["A", "B", "C"]
The above is really just a shortcut for CSV.open("file.csv", headers: true).read.headers. You could have gotten to it using CSV.open as you tried, but since CSV.open doesn't actually read the file when you call the method, there is no way for it to know what the headers are until it has actually read some data. This is why it just returns true in your example. After reading some data, it would finally return the headers:
table = CSV.open("file.csv", :headers => true)
table.headers
# => true
table.read
# => #<CSV::Table mode:col_or_row row_count:2>
table.headers
# => ["A", "B", "C"]
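You don't have to read the whole file to populate the headers. As a minimal sketch (the helper name headers_before_and_after and the temporary sample file are made up for illustration), reading a single row with shift should be enough:

```ruby
require 'csv'
require 'tempfile'

# Hypothetical helper: returns what #headers reports before and after
# one row has been read from a CSV opened with headers: true.
def headers_before_and_after(path)
  CSV.open(path, headers: true) do |csv|
    before = csv.headers # nothing has been read yet, so this is just `true`
    csv.shift            # reads the header line and the first data row
    [before, csv.headers]
  end
end

# Throwaway sample file standing in for file.csv (an assumption).
sample = Tempfile.new(['sample', '.csv'])
sample.write("A,B,C\n1,2,3\n")
sample.close

before, after = headers_before_and_after(sample.path)
p before
p after
```

The block form of CSV.open also closes the file for you once the block returns.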
#2
4
In my opinion the best way to do this is:
headers = CSV.foreach('file.csv').first
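This works because CSV.foreach without a block returns an Enumerator, and first stops after the first row. A small sketch, assuming you wrap it in a helper (first_row_headers is a made-up name) and want to see what happens with an empty file:

```ruby
require 'csv'
require 'tempfile'

# Hypothetical helper: returns the first row of the file as an Array,
# or nil when the file has no rows at all.
def first_row_headers(path)
  CSV.foreach(path).first
end

# Throwaway sample files (assumptions, standing in for real data).
sample = Tempfile.new(['sample', '.csv'])
sample.write("A,B,C\n1,2,3\n")
sample.close

empty = Tempfile.new(['empty', '.csv'])
empty.close

p first_row_headers(sample.path) # => ["A", "B", "C"]
p first_row_headers(empty.path)  # => nil
```

Note that headers: true is not passed here, so the first row comes back as a plain Array rather than a CSV::Row.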
Please note that it's very tempting to use CSV.read('file.csv', headers: true).headers, but the catch is that CSV.read loads the complete file into memory, which increases your memory footprint and makes it very slow for bigger files. Whenever possible, please use CSV.foreach. Below are the benchmarks for just a 20 MB file:
Ruby version: ruby 2.4.1p111
File size: 20M
****************
Time and memory usage with CSV.foreach:
Time: 0.0 seconds
Memory: 0.04 MB
****************
Time and memory usage with CSV.read:
Time: 5.88 seconds
Memory: 314.25 MB
A 20 MB file increased the memory footprint by 314 MB with CSV.read; imagine what a 1 GB file would do. In short, please do not use CSV.read. I did, and the system went down on a 300 MB file.
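If you need the rows as well as the headers, the same streaming approach applies. Here is a rough sketch, assuming a numeric column you want to total (sum_column and the sample data are invented for illustration); CSV.foreach with a block hands you one row at a time instead of building the whole table in memory:

```ruby
require 'csv'
require 'tempfile'

# Hypothetical helper: totals one column while streaming the file row by row.
def sum_column(path, column)
  total = 0
  CSV.foreach(path, headers: true) do |row|
    # With headers: true each row is a CSV::Row, addressable by header name.
    total += row[column].to_i
  end
  total
end

# Throwaway sample file (an assumption, standing in for a large CSV).
sample = Tempfile.new(['sample', '.csv'])
sample.write("A,B,C\n1,2,3\n4,5,6\n")
sample.close

p sum_column(sample.path, 'B') # => 7
```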
For further reading: If you want to read more about this, here is a very good article on handling big files.
Also, below is the script I used for benchmarking CSV.foreach and CSV.read:
require 'benchmark'
require 'csv'

def print_memory_usage
  memory_before = `ps -o rss= -p #{Process.pid}`.to_i
  yield
  memory_after = `ps -o rss= -p #{Process.pid}`.to_i
  puts "Memory: #{((memory_after - memory_before) / 1024.0).round(2)} MB"
end

def print_time_spent
  time = Benchmark.realtime do
    yield
  end
  puts "Time: #{time.round(2)} seconds"
end

file_path = '{path_to_csv_file}'

puts 'Ruby version: ' + `ruby -v`
puts 'File size:' + `du -h #{file_path}`

puts 'Time and memory usage with CSV.foreach: '
print_memory_usage do
  print_time_spent do
    headers = CSV.foreach(file_path, headers: false).first
  end
end

puts 'Time and memory usage with CSV.read:'
print_memory_usage do
  print_time_spent do
    headers = CSV.read(file_path, headers: true).headers
  end
end
#3
0
If you want a shorter answer, you can try:
headers = CSV.open("file.csv", &:readline)
# => ["A", "B", "C"]
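For anyone unfamiliar with the &:readline shorthand, it is just a compact way of passing a block that calls readline on the opened CSV object. A small sketch comparing the two spellings (the sample file is a stand-in created for the example):

```ruby
require 'csv'
require 'tempfile'

# Throwaway sample file standing in for file.csv (an assumption).
sample = Tempfile.new(['sample', '.csv'])
sample.write("A,B,C\n1,2,3\n")
sample.close

# Shorthand: Symbol#to_proc turns :readline into a block.
short_form = CSV.open(sample.path, &:readline)

# Equivalent explicit block; the file is closed when the block returns.
long_form = CSV.open(sample.path) { |csv| csv.readline }

p short_form              # => ["A", "B", "C"]
p short_form == long_form # => true
```

Since headers: true is not passed, the first line is returned as an ordinary row, which is exactly the header list here.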