How to customize the spreadsheet gem / output?

I have a program that uses the spreadsheet gem to create a CSV file, but I haven't been able to find how to configure it to do what I need.

This is what I would like the gem to do: the model number and additional_image fields should be "in sync"; that is, each additional image written to the spreadsheet should go on its own row instead of being packed into one cell.

Here is a snippet of the desired output (currently all the image names end up in a single cell). These fields come from XPath queries on pages that are screen-scraped with another gem (Capybara). The program can't know in advance how many objects it will find in the additional image field, but because of business logic the number of model-number entries written to the spreadsheet should match the number of additional images.

model         additional_image
168868837a    1688688371.jpg
168868837a    1688688372.jpg
168868837a    1688688373.jpg
168868837a    1688688374.jpg
168868837a    1688688375.jpg
168868837a    1688688376.jpg

This is the current code:

require "capybara/dsl"
require "spreadsheet"
require "fileutils"
require "open-uri"

LOCAL_DIR = 'data-hold/images'

FileUtils.mkdir_p(LOCAL_DIR)   # no-op if the directory already exists
Capybara.run_server = false
Capybara.default_driver = :selenium
Capybara.default_selector = :xpath
Spreadsheet.client_encoding = 'UTF-8'

class Tomtop
  include Capybara::DSL

  def initialize
    @excel = Spreadsheet::Workbook.new
    @work_list = @excel.create_worksheet
    @row = 0
  end

  def go
    visit_main_link
  end

  # Calls the block; if it raises opts[:on], makes up to opts[:tries]
  # rescued attempts, then one final attempt without rescuing.
  def retryable(options = {}, &block)
    opts = { :tries => 1, :on => Exception }.merge(options)

    retry_exception, retries = opts[:on], opts[:tries]

    begin
      return yield
    rescue retry_exception
      retry if (retries -= 1) > 0
    end

    yield
  end

  def visit_main_link
    retryable(:tries => 1, :on => OpenURI::HTTPError) do
      visit "http://www.example.com/clothing-accessories?dir=asc&limit=72&order=position"
      results = all("//h5/a[contains(@onclick, 'analyticsLog')]")
      links = results.map { |a| a[:href] }

      links.each do |link|
        visit link
        save_item
      end
      # Note: the spreadsheet gem writes Excel (.xls) data whatever the extension
      @excel.write "inventory.csv"
    end
  end

  def save_item
    data = all("//*[@id='content-wrapper']/div[2]/div/div")
    data.each do |info|
      @work_list[@row, 0] = info.find("//*[@id='productright']/div/div[1]/h1").text
      price = info.first("//div[contains(@class, 'price font left')]")
      @work_list[@row, 1] = (price.text.to_f * 1.33).round(2) if price
      @work_list[@row, 2] = info.find("//*[@id='productright']/div/div[11]").text
      @work_list[@row, 3] = info.find("//*[@id='tabcontent1']/div/div").text.strip
      color = info.all("//dd[1]//select[contains(@name, 'options')]//*[@price='0']")
      @work_list[@row, 4] = color.collect(&:text).join(', ')
      size = info.all("//dd[2]//select[contains(@name, 'options')]//*[@price='0']")
      @work_list[@row, 5] = size.collect(&:text).join(', ')
      model = File.basename(info.find("//*[@id='content-wrapper']/div[2]/div/div/div[1]/div[1]/a")['href'])
      # gsub, not gsub!: gsub! returns nil when there is nothing to replace
      @work_list[@row, 6] = model.gsub(/\D/, "")
      @work_list[@row, 7] = model
      additional_image = info.all("//*[@rel='lightbox[rotation]']")
      # Current behaviour: all image names go into ONE cell, comma-separated
      @work_list[@row, 8] = additional_image.map { |link| File.basename(link['href']) }.join(', ')
      images = additional_image.map { |link| link['href'] }   # `imagelink` was undefined
      images.each do |image|
        # Save each image inside LOCAL_DIR, in binary mode
        File.open(File.join(LOCAL_DIR, File.basename(image)), 'wb') do |f|
          URI.open(image) { |remote| f.write(remote.read) }
        end
      end
      @row += 1
    end
  end
end

tomtop = Tomtop.new
tomtop.go

I would like this to do two things that I'm not sure how to do:

Each additional image should be written to a new row (currently they are all written into one cell).

I would like the model field to be repeated, one row at a time in the same way, exactly as many times as there are additional images.
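
A minimal sketch of the row-per-image layout described above, written against the same sheet[row, col] = value assignment that save_item already uses on @work_list. write_image_rows and its model_col/image_col defaults (columns 6 and 8, as in the code above) are hypothetical names chosen for illustration, not part of the spreadsheet gem; the helper returns the next free row so it can take over from the @row increment.

```ruby
# Hypothetical helper (not part of the spreadsheet gem): writes one row per
# additional image and repeats the model number on every one of those rows.
# `sheet` can be anything supporting sheet[row, col] = value, such as the
# @work_list worksheet in the question.
def write_image_rows(sheet, row, model, image_names, model_col: 6, image_col: 8)
  if image_names.empty?
    # No images: still give the item one row so its other columns survive
    sheet[row, model_col] = model
    return row + 1
  end

  image_names.each do |name|
    sheet[row, model_col] = model # model repeated on each row
    sheet[row, image_col] = name  # one image per row, no joining
    row += 1
  end
  row # index of the next free row
end

# In save_item this could replace the join(', ') line and the row increment:
#   names = additional_image.map { |link| File.basename(link['href']) }
#   @row = write_image_rows(@work_list, @row, model_number, names)
# (model_number: whatever value you write to column 6)
```

Columns 0 to 5 and 7 stay on the first row of each item, so only the model and image columns repeat down the sheet.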

1 solution

#1

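Use Ruby's CSV library (require 'csv') instead; the spreadsheet gem writes Excel (.xls) files, so naming the output inventory.csv does not make it a CSV file. I took the long way of writing this so you can see how it works.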

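require 'csv'

DOC = "file.csv"

# Header row: a CSV row is just an array of cell values
profile = []
profile[0] = "model"

# "a" = append mode, so every CSV.open adds rows to the end of the file
CSV.open(DOC, "a") do |me|
  me << profile
end

img_url = ['pic_1.jpg','pic_2.jpg','pic_3.jpg','pic_4.jpg','pic_5.jpg','pic_6.jpg']

# Write each image name as its own row (one cell per row)
a = 0
b = img_url.length
while a < b
  profile = []
  profile[0] = img_url[a]

  CSV.open(DOC, "a") do |me|
    me << profile
  end

  a += 1
end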

The CSV file should look like this:

model
pic_1.jpg
pic_2.jpg
pic_3.jpg
pic_4.jpg
pic_5.jpg
pic_6.jpg
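
The same idea can be written more compactly, and with both columns from the question, by opening the file once and repeating the model value on every row. A sketch, assuming a single model number and a list of image names that have already been scraped; write_inventory is a hypothetical helper name:

```ruby
require "csv"

# Hypothetical helper: writes a header plus one row per image, repeating the
# model number in the first column so the two columns stay "in sync".
def write_inventory(path, model, image_names)
  CSV.open(path, "w") do |csv|               # "w" replaces any existing file
    csv << ["model", "additional_image"]     # header row
    image_names.each { |name| csv << [model, name] }
  end
end

# Example with the values from the question:
# write_inventory("inventory.csv", "168868837a",
#                 (1..6).map { |i| "168868837#{i}.jpg" })
```

Because the file is opened with "w" rather than "a", re-running the script rewrites the file instead of appending a second header and another copy of the rows.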

For your last question:

# join two fields with a space, via a temporary variable...
whatever = temp[1] + " " + temp[2]
profile[x] = whatever

# ...or directly, in one line
profile[x] = temp[1] + " " + temp[2]

If temp[2] is nil, the concatenation fails (String#+ raises a TypeError when given nil), so check for it first:

if temp[2].nil?
  profile[x] = temp[1]
else
  profile[x] = temp[1] + " " + temp[2]
end
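
A shorter alternative for the nil case, assuming temp is an array whose elements 1 and 2 are strings or nil (join_fields is a hypothetical name): Array#values_at returns nil for missing positions, and compact drops the nils before joining. Unlike the if/else above, it returns "" rather than nil when both fields are missing.

```ruby
# Hypothetical helper: joins temp[1] and temp[2] with a space, skipping nils.
def join_fields(temp)
  temp.values_at(1, 2).compact.join(" ")
end

# profile[x] = join_fields(temp)
```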
