How do I parse an Excel file so that I get data that looks exactly like what I see on screen?

Time: 2021-05-04 20:44:30

I'm on Rails 5 (Ruby 2.4). I want to read an .xls doc and I would like to get the data into CSV format, just as it appears in the Excel file. Someone recommended I use Roo, and so I have

book = Roo::Spreadsheet.open(file_location)
sheet = book.sheet(0)
text = sheet.to_csv
arr_of_arrs = CSV.parse(text)

However, what is getting returned is not the same as what I see in the spreadsheet. For instance, a cell in the spreadsheet has

16:45.81

and when I get the CSV data from above, what is returned is

"0.011641319444444444"

How do I parse the Excel doc and get exactly what I see? I don't care whether I use Roo to parse it or not, as long as I can get CSV data that represents what I see rather than some weird internal representation. For reference, the file I was parsing gives this when I run "file name_of_file.xls" ...

Composite Document File V2 Document, Little Endian, Os: Windows, Version 5.1, Code page: 1252, Author: Dwight Schroot, Last Saved By: Dwight Schroot, Name of Creating Application: Microsoft Excel, Create Time/Date: Tue Sep 21 17:05:21 2010, Last Saved Time/Date: Wed Oct 13 16:52:14 2010, Security: 0

4 answers

#1


3  

You need to store the value as text on the .xls side, rather than as a number with a custom format. This won't work if you're just downloading the .xls file from the internet, but it will fix your problem if you can edit the file. You can do this with the TEXT function, e.g. =TEXT(A2, "mm:ss.0"), where A2 is just the cell I'm using as an example.

book = ::Roo::Spreadsheet.open(file_location)
puts book.cell('B', 2) 
=> '16.45.8' 

If manipulating the file is not an option, you could pass a custom converter to CSV.new() and convert the decimal time back to the format you need.

require 'roo-xls'
require 'csv'

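# Convert Excel's fraction-of-a-day value in the "time" column back to "mm:ss.xx".
CSV::Converters[:time_parser] = lambda do |field, info|
  # The header in this sheet has a trailing space ("time "), hence the strip.
  if info.header.to_s.strip == "time"
    begin
      # 0.011641319444444444 days * 24 hours * 3600 seconds = 1005.81 seconds
      # Float() raises on non-numeric input, so the rescue below can actually fire.
      seconds = Float(field) * 24 * 3600
      # 1005.81.divmod(60) gives 16 minutes and roughly 45.81 seconds
      mm, ss = seconds.divmod(60)
      # returns "16:45.81"
      "#{mm}:#{ss.round(2)}"
    rescue ArgumentError, TypeError
      # Leave anything that is not a number untouched.
      field
    end
  else
    field
  end
end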

book = ::Roo::Spreadsheet.open(file_location)
sheet = book.sheet(0)
csv = CSV.new(sheet.to_csv, headers: true, converters: [:time_parser]).map {|row| row.to_hash}
puts csv 
=> {"time "=>"16:45.81"}
   {"time "=>"12:46.0"}
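To try the converter idea without an .xls file at hand, here is a minimal sketch that runs on an inline CSV string standing in for sheet.to_csv. The sample rows, the TIME_COLUMN index and the to_mmss name are assumptions made up for illustration, not something from the asker's file. It picks the column by position instead of by header, and pads the seconds so the result looks like "12:46.00" rather than "12:46.0".

```ruby
require 'csv'

# Stand-in for sheet.to_csv (hypothetical data; time fractions in the second column).
raw_csv = "name,time\n" \
          "runner_a,0.011641319444444444\n" \
          "runner_b,0.008865740740740741\n"

# Zero-based index of the column that holds Excel time fractions (assumed).
TIME_COLUMN = 1

# Two-argument converter: CSV passes the field and a FieldInfo struct
# (index, line, header) to converters that accept two parameters.
to_mmss = lambda do |field, info|
  next field unless info.index == TIME_COLUMN
  begin
    seconds = Float(field) * 24 * 3600   # fraction of a day -> seconds
    mm, ss = seconds.divmod(60)          # whole minutes and leftover seconds
    format('%d:%05.2f', mm, ss)          # zero-pad seconds to two decimals
  rescue ArgumentError, TypeError
    field                                # e.g. the header row, which is not numeric
  end
end

arr_of_arrs = CSV.parse(raw_csv, converters: [to_mmss])
csv_text = arr_of_arrs.map(&:to_csv).join

puts csv_text
```

One caveat with this sketch: a value that rounds up to 60 seconds (say 59.999) would print as "0:60.00", so real data may need rounding before the divmod.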

#2


1  

Under the hood, the roo-xls gem uses the spreadsheet gem to parse the xls file. There was a similar issue to yours logged here, but it doesn't appear that there was any real resolution. Internally, xls stores 16:45.81 as a Number and associates some formatting with it. I believe the issue has something to do with the spreadsheet gem not correctly handling the cell format.

I did try messing around with adding a format mm:ss.0 by following this guide, but I couldn't get it to work. Maybe you'll have more luck.
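To make the "stored as a Number" point concrete: Excel keeps a time of day as a fraction of a 24-hour day, and the cell format only controls how that fraction is displayed. The sketch below goes both ways between the displayed time and the stored number. The helper names are made up for illustration, and it assumes the cell uses a display format along the lines of mm:ss.00; it works in whole hundredths of a second so that rounding carries over into the minutes correctly.

```ruby
SECONDS_PER_DAY = 24 * 60 * 60

# Displayed minutes and seconds -> the fraction of a day that gets stored.
def time_to_day_fraction(minutes, seconds)
  (minutes * 60 + seconds) / SECONDS_PER_DAY.to_f
end

# Stored fraction of a day -> "mm:ss.xx" text (assumed display format).
def day_fraction_to_mmss(fraction)
  hundredths = (fraction * SECONDS_PER_DAY * 100).round  # integer hundredths of a second
  minutes, rest = hundredths.divmod(60 * 100)
  seconds, frac = rest.divmod(100)
  format('%02d:%02d.%02d', minutes, seconds, frac)
end

puts time_to_day_fraction(16, 45.81)              # close to the value Roo returned
puts day_fraction_to_mmss(0.011641319444444444)   # prints 16:45.81
```

This only reproduces one particular format by hand; it does not read the format string out of the workbook, which is the part that seems to be going wrong in the gem.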

#3


0  

You can use the converters option. It would look something like this:

arr_of_arrs = CSV.parse(text, converters: :date_time)

http://ruby-doc.org/stdlib-2.0.0/libdoc/csv/rdoc/CSV.html
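A quick way to check what the built-in :date_time converter does with the value from the question is to feed it a one-line CSV. The input line below is made up for illustration: it pairs the raw fraction with a timestamp written in a form the converter recognizes. As far as I can tell, the converter only touches fields that already look like a date and time, so the bare fraction comes back unchanged as a String.

```ruby
require 'csv'

# Hypothetical one-line CSV: the raw fraction from the question, plus a
# timestamp in "YYYY-MM-DD HH:MM:SS" form.
text = "0.011641319444444444,2010-09-21 17:05:21\n"

row = CSV.parse(text, converters: :date_time).first

puts row[0].class   # String: the fraction is left as-is
puts row[1].class   # DateTime: only this field is converted
```

So this option on its own probably won't turn 0.011641319444444444 back into 16:45.81; a custom converter is still needed for that.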

#4


0  

Your problem seems to be with the way you're parsing (reading) the input file.

Of the Excel formats, roo itself parses only Excel 2007-2013 (.xlsx) files. From your question, you want to parse .xls, which is a different format.

Like the documentation says, use the roo-xls gem instead.
