I'm generating some CSV output using Ruby's built-in CSV. Everything works fine, but the customer wants the name field in the output to have wrapping double-quotes so the output looks like the input file. For instance, the input looks something like this:
1,1.1.1.1,"Firstname Lastname",more,fields
2,2.2.2.2,"Firstname Lastname, Jr.",more,fields
CSV's output, which is correct, looks like:
1,1.1.1.1,Firstname Lastname,more,fields
2,2.2.2.2,"Firstname Lastname, Jr.",more,fields
I know CSV is doing the right thing by not double-quoting the third field just because it has embedded blanks, and wrapping the field with double-quotes when it has the embedded comma. What I'd like to do, to help the customer feel warm and fuzzy, is tell CSV to always double-quote the third field.
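A quick way to see the behavior described above, assuming a current Ruby with the bundled csv library and using the question's sample rows: CSV.generate_line only quotes the field that actually contains a separator.

```ruby
require 'csv'

# The question's sample rows.
rows = [
  [1, '1.1.1.1', 'Firstname Lastname', 'more', 'fields'],
  [2, '2.2.2.2', 'Firstname Lastname, Jr.', 'more', 'fields']
]

# CSV quotes a field only when it must: the comma in the second name forces
# quoting, while the space in the first name does not.
output = rows.map { |row| CSV.generate_line(row) }.join
puts output
```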
I tried wrapping the field in double-quotes in my to_a method, which passes a "Firstname Lastname" field (quotes included) to CSV, but CSV laughed at my puny-human attempt and output """Firstname Lastname""". That is the correct thing to do, because it's escaping the double-quotes, so that didn't work.
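A minimal sketch reproducing that failed attempt with the question's sample row (the to_a method itself is not shown in the question, so the pre-quoted value is simply written inline): because " is CSV's quote character, CSV doubles each embedded quote and then wraps the whole field again.

```ruby
require 'csv'

# Simulates a to_a that pre-quotes the name: CSV escapes the embedded quotes
# by doubling them and then quotes the field itself.
line = CSV.generate_line([1, '1.1.1.1', '"Firstname Lastname"', 'more', 'fields'])
puts line
```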
Then I tried setting CSV's :force_quotes => true in the open method, which wrapped every field in double-quotes as expected, but the customer didn't like that, which I also expected. So that didn't work either.
I've looked through the Table and Row docs and nothing appeared to give me access to the "generate a String field" method, or a way to set a "for field n always use quoting" flag.
I'm about to dive into the source to see if there are some super-secret tweaks, or a way to monkey-patch CSV and bend it to my will, but I wondered if anyone had some special knowledge or had run into this before.
And, yes, I know I could roll my own CSV output, but I prefer not to reinvent well-tested wheels. I'm also aware of FasterCSV; that's now part of Ruby 1.9.2, which I'm using, so explicitly using FasterCSV buys me nothing special. Also, I'm not using Rails and have no intention of rewriting this in Rails, so unless you have a cute way of implementing it using a small subset of Rails, don't bother. I'll downvote any recommendation to use any of those approaches, because it means you didn't bother to read this far.
6 answers
#1 (score: 9)
Well, there's a way to do it but it wasn't as clean as I'd hoped the CSV code could allow.
I had to subclass CSV, override its << method (CSV#<<), and add a forced_quote_fields= method so I can define which fields to force-quote, plus copy two lambdas out of other CSV methods. At least it works for what I want:
require 'csv'
class MyCSV < CSV
  def <<(row)
    # make sure headers have been assigned
    if header_row? and [Array, String].include? @use_headers.class
      parse_headers # won't read data for Array or String
      self << @headers if @write_headers
    end
    # handle CSV::Row objects and Hashes
    row = case row
          when self.class::Row then row.fields
          when Hash then @headers.map { |header| row[header] }
          else row
          end
    @headers = row if header_row?
    @lineno += 1
    @do_quote ||= lambda do |field|
      field = String(field)
      encoded_quote = @quote_char.encode(field.encoding)
      encoded_quote +
        field.gsub(encoded_quote, encoded_quote * 2) +
        encoded_quote
    end
    @quotable_chars ||= encode_str("\r\n", @col_sep, @quote_char)
    @forced_quote_fields ||= []
    @my_quote_lambda ||= lambda do |field, index|
      if field.nil? # represent +nil+ fields as empty unquoted fields
        ""
      else
        field = String(field) # Stringify fields
        # represent empty fields as empty quoted fields
        if (
          field.empty? or
          field.count(@quotable_chars).nonzero? or
          @forced_quote_fields.include?(index)
        )
          @do_quote.call(field)
        else
          field # unquoted field
        end
      end
    end
    output = row.map.with_index(&@my_quote_lambda).join(@col_sep) + @row_sep # quote and separate
    if (
      @io.is_a?(StringIO) and
      output.encoding != raw_encoding and
      (compatible_encoding = Encoding.compatible?(@io.string, output))
    )
      @io = StringIO.new(@io.string.force_encoding(compatible_encoding))
      @io.seek(0, IO::SEEK_END)
    end
    @io << output
    self # for chaining
  end
  alias_method :add_row, :<<
  alias_method :puts, :<<

  def forced_quote_fields=(indexes=[])
    @forced_quote_fields = indexes
  end
end
That's the code. Calling it:
data = [
  %w[1 2 3],
  [ 2, 'two too', 3 ],
  [ 3, 'two, too', 3 ]
]
quote_fields = [1]
puts "Ruby version: #{ RUBY_VERSION }"
puts "Quoting fields: #{ quote_fields.join(', ') }", "\n"
csv = MyCSV.generate do |_csv|
  _csv.forced_quote_fields = quote_fields
  data.each do |d|
    _csv << d
  end
end
puts csv
results in:
# >> Ruby version: 1.9.2
# >> Quoting fields: 1
# >>
# >> 1,"2",3
# >> 2,"two too",3
# >> 3,"two, too",3
#2 (score: 5)
This post is old, but I can't believe no one thought of this.
Why not do:
csv = CSV.generate :quote_char => "\0" do |csv|
where \0 is the null character. Since " is no longer CSV's quote character, CSV leaves any double-quotes you add yourself alone, so just add them to each field that needs them:
csv << [product.upc, "\"" + product.name + "\"" # ...
Then, at the end, strip out the null characters (CSV still wraps any field containing a comma in \0):
csv.gsub!(/\0/, '')
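A self-contained, runnable version of this trick, using the question's sample rows in place of the answer's elided product fields. One caveat worth knowing: any field you don't quote by hand also loses its \0 wrapper at the end, so this is only safe when the remaining columns never contain commas, newlines or \0.

```ruby
require 'csv'

# The question's sample rows stand in for the answer's product fields.
rows = [
  [1, '1.1.1.1', 'Firstname Lastname', 'more', 'fields'],
  [2, '2.2.2.2', 'Firstname Lastname, Jr.', 'more', 'fields']
]

# With "\0" as the quote character, literal double-quotes are not special,
# so CSV passes the hand-added quotes through unescaped.
csv = CSV.generate(quote_char: "\0") do |out|
  rows.each do |id, ip, name, *rest|
    # Quote the name by hand, doubling any embedded double-quotes (RFC 4180).
    out << [id, ip, '"' + name.gsub('"', '""') + '"', *rest]
  end
end

# Fields containing a comma are still wrapped in "\0"; remove those wrappers.
csv = csv.delete("\0")
puts csv
```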
#3 (score: 4)
I doubt this will make the customer feel warm and fuzzy after all this time, but this seems to work:
require 'csv'
# prepare a lambda which converts the field with index 2 (the third field)
quote_col2 = lambda do |field, fieldinfo|
  # fieldinfo responds to #line, #header and #index
  if fieldinfo.index == 2 && field && !field.start_with?('"')
    '"' + field + '"'
  else
    field
  end
end
# specify the above lambda as one of the converters
csv = CSV.read("test1.csv", :converters => [quote_col2])
p csv
# => [["aaa", "bbb", "\"ccc\"", "ddd"], ["fff", "ggg", "\"hhh\"", "iii"]]
# join(",") writes each field verbatim (no CSV quoting), so the added quotes survive
File.open("test1.txt", "w") { |out| csv.each { |line| out.puts line.join(",") } }
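The same converter idea as a self-contained sketch, parsing the question's sample input from a string instead of the test1.csv file. Note the trade-off: join(",") writes every field verbatim, so the added quotes survive, but any other field that happened to contain a comma would come out unquoted.

```ruby
require 'csv'

# The question's sample input, inlined instead of reading test1.csv.
input = <<~DATA
  1,1.1.1.1,"Firstname Lastname",more,fields
  2,2.2.2.2,"Firstname Lastname, Jr.",more,fields
DATA

# Field converter: wrap the third field (index 2) in literal double-quotes.
quote_col2 = lambda do |field, field_info|
  field_info.index == 2 ? '"' + field + '"' : field
end

rows = CSV.parse(input, converters: [quote_col2])

# join(",") writes fields verbatim, so the added quotes survive; it would
# NOT quote any other field that contained a comma.
output = rows.map { |row| row.join(',') + "\n" }.join
puts output
```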
#4 (score: 0)
It doesn't look like there's any way to do this with the existing CSV implementation short of monkey-patching/rewriting it.
However, assuming you have full control over the source data, you could do this:
- Append a custom string including a comma (i.e. one that would never be naturally found in the data) to the end of the field in question for each row; maybe something like "FORCE_COMMAS,".
- Generate the CSV output.
- Now that your field is quoted on every row of the CSV output, remove the custom string:
  csv.gsub!(/FORCE_COMMAS,/, "")
- Customer feels warm and fuzzy.
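The steps above as a runnable sketch, assuming the question's sample rows and that the name sits at index 2. Because the sentinel's comma makes CSV quote the field itself, embedded double-quotes are still escaped correctly; the only requirement is that the sentinel never appears anywhere in the real data.

```ruby
require 'csv'

# Assumed sentinel: it must contain a comma and never occur in the real data.
SENTINEL = 'FORCE_COMMAS,'

# The question's sample rows; the name is assumed to be at index 2.
rows = [
  [1, '1.1.1.1', 'Firstname Lastname', 'more', 'fields'],
  [2, '2.2.2.2', 'Firstname Lastname, Jr.', 'more', 'fields']
]

csv = CSV.generate do |out|
  rows.each do |row|
    marked = row.dup
    marked[2] = "#{marked[2]}#{SENTINEL}" # the comma forces CSV to quote it
    out << marked
  end
end

# Strip the sentinel; the quotes CSV added around the field remain.
csv = csv.gsub(SENTINEL, '')
puts csv
```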
#5 (score: 0)
CSV has a force_quotes option that will force it to quote all fields (it may not have been there when you originally posted this). I realize this isn't exactly what you were proposing, but it avoids monkey patching.
2.1.0 :008 > puts CSV.generate_line [1,'1.1.1.1','Firstname Lastname','more','fields']
1,1.1.1.1,Firstname Lastname,more,fields
2.1.0 :009 > puts CSV.generate_line [1,'1.1.1.1','Firstname Lastname','more','fields'], force_quotes: true
"1","1.1.1.1","Firstname Lastname","more","fields"
The drawback is that the first integer value ends up listed as a string, which changes things when you import into Excel.
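One way to get force_quotes for selected columns only, without subclassing, is to run each field through CSV.generate_line on its own and join the pieces. This is a sketch: generate_line_forcing is a hypothetical helper, not part of Ruby's CSV API, and it only handles plain arrays (no headers or CSV::Row objects).

```ruby
require 'csv'

# Hypothetical helper (not part of Ruby's CSV API): force-quote only the
# column indexes in +forced+, and let CSV decide for every other column.
def generate_line_forcing(row, forced, col_sep: ',')
  fields = row.each_with_index.map do |field, i|
    # Render each field as a one-field CSV line so CSV still does all the
    # escaping, then chomp off the row separator it appends.
    CSV.generate_line([field], col_sep: col_sep,
                               force_quotes: forced.include?(i)).chomp
  end
  fields.join(col_sep) + "\n"
end

puts generate_line_forcing([1, '1.1.1.1', 'Firstname Lastname', 'more', 'fields'], [2])
puts generate_line_forcing([2, '2.2.2.2', 'Firstname Lastname, Jr.', 'more', 'fields'], [2])
```

Because CSV still renders every field, embedded quotes, separators and newlines are escaped the way CSV would escape them, and the integer column stays unquoted.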
#6 (score: 0)
CSV changed a bit in Ruby 2.1, as @jwadsack mentioned, so here's a working version of @the-tin-man's MyCSV. It's slightly modified: you set forced_quote_fields via an option.
MyCSV.generate(forced_quote_fields: [1]) do |_csv|...
The modified code
require 'csv'
class MyCSV < CSV
  def <<(row)
    # make sure headers have been assigned
    if header_row? and [Array, String].include? @use_headers.class
      parse_headers # won't read data for Array or String
      self << @headers if @write_headers
    end
    # handle CSV::Row objects and Hashes
    row = case row
          when self.class::Row then row.fields
          when Hash then @headers.map { |header| row[header] }
          else row
          end
    @headers = row if header_row?
    @lineno += 1
    output = row.map.with_index(&@quote).join(@col_sep) + @row_sep # quote and separate
    if @io.is_a?(StringIO) and
       output.encoding != (encoding = raw_encoding)
      if @force_encoding
        output = output.encode(encoding)
      elsif (compatible_encoding = Encoding.compatible?(@io.string, output))
        @io.set_encoding(compatible_encoding)
        @io.seek(0, IO::SEEK_END)
      end
    end
    @io << output
    self # for chaining
  end

  def init_separators(options)
    # store the selected separators
    @col_sep = options.delete(:col_sep).to_s.encode(@encoding)
    @row_sep = options.delete(:row_sep) # encode after resolving :auto
    @quote_char = options.delete(:quote_char).to_s.encode(@encoding)
    @forced_quote_fields = options.delete(:forced_quote_fields) || []
    if @quote_char.length != 1
      raise ArgumentError, ":quote_char has to be a single character String"
    end
    #
    # automatically discover row separator when requested
    # (not fully encoding safe)
    #
    if @row_sep == :auto
      if [ARGF, STDIN, STDOUT, STDERR].include?(@io) or
         (defined?(Zlib) and @io.class == Zlib::GzipWriter)
        @row_sep = $INPUT_RECORD_SEPARATOR
      else
        begin
          #
          # remember where we were (pos() will raise an exception if @io is pipe
          # or not opened for reading)
          #
          saved_pos = @io.pos
          while @row_sep == :auto
            #
            # if we run out of data, it's probably a single line
            # (ensure will set default value)
            #
            break unless sample = @io.gets(nil, 1024)
            # extend sample if we're unsure of the line ending
            if sample.end_with? encode_str("\r")
              sample << (@io.gets(nil, 1) || "")
            end
            # try to find a standard separator
            if sample =~ encode_re("\r\n?|\n")
              @row_sep = $&
              break
            end
          end
          # tricky seek() clone to work around GzipReader's lack of seek()
          @io.rewind
          # reset back to the remembered position
          while saved_pos > 1024 # avoid loading a lot of data into memory
            @io.read(1024)
            saved_pos -= 1024
          end
          @io.read(saved_pos) if saved_pos.nonzero?
        rescue IOError # not opened for reading
          # do nothing: ensure will set default
        rescue NoMethodError # Zlib::GzipWriter doesn't have some IO methods
          # do nothing: ensure will set default
        rescue SystemCallError # pipe
          # do nothing: ensure will set default
        ensure
          #
          # set default if we failed to detect
          # (stream not opened for reading, a pipe, or a single line of data)
          #
          @row_sep = $INPUT_RECORD_SEPARATOR if @row_sep == :auto
        end
      end
    end
    @row_sep = @row_sep.to_s.encode(@encoding)
    # establish quoting rules
    @force_quotes = options.delete(:force_quotes)
    do_quote = lambda do |field|
      field = String(field)
      encoded_quote = @quote_char.encode(field.encoding)
      encoded_quote +
        field.gsub(encoded_quote, encoded_quote * 2) +
        encoded_quote
    end
    quotable_chars = encode_str("\r\n", @col_sep, @quote_char)
    @quote = if @force_quotes
               # << calls @quote with (field, index) via map.with_index, so this
               # lambda must accept the index too (a one-argument lambda would
               # raise ArgumentError)
               lambda { |field, _index| do_quote.call(field) }
             else
               lambda do |field, index|
                 if field.nil? # represent +nil+ fields as empty unquoted fields
                   ""
                 else
                   field = String(field) # Stringify fields
                   # represent empty fields as empty quoted fields
                   if field.empty? or
                      field.count(quotable_chars).nonzero? or
                      @forced_quote_fields.include?(index)
                     do_quote.call(field)
                   else
                     field # unquoted field
                   end
                 end
               end
             end
  end
end