I have a program using the spreadsheet gem to create a CSV file; I have not been able to find the way to configure the functionality that I need.
This is what I would like the gem to do: the model number and additional_image fields should be "in sync"; that is, each additional image written to the spreadsheet document should start a new line and should not be wrapped.
Here are some snippets of the desired output in contrast with the current. These fields are defined by XPath objects that are screen scraped using another gem. The program won't know for sure how many objects it will encounter in the additional image field but due to business logic the number of objects in the additional image field should mirror the number of model number objects that are written to the spreadsheet.
model         additional_image
168868837a    1688688371.jpg
168868837a    1688688372.jpg
168868837a    1688688373.jpg
168868837a    1688688374.jpg
168868837a    1688688375.jpg
168868837a    1688688376.jpg
This is the current code:
require "capybara/dsl"
require "spreadsheet"
require "fileutils"
require "open-uri"

LOCAL_DIR = 'data-hold/images'
FileUtils.makedirs(LOCAL_DIR) unless File.exist?(LOCAL_DIR)

Capybara.run_server = false
Capybara.default_driver = :selenium
Capybara.default_selector = :xpath
Spreadsheet.client_encoding = 'UTF-8'

class Tomtop
  include Capybara::DSL

  def initialize
    @excel = Spreadsheet::Workbook.new
    @work_list = @excel.create_worksheet
    @row = 0
  end

  def go
    visit_main_link
  end

  # Run the block, retrying up to opts[:tries] times on opts[:on].
  def retryable(options = {}, &block)
    opts = { :tries => 1, :on => Exception }.merge(options)
    retry_exception, retries = opts[:on], opts[:tries]
    begin
      return yield
    rescue retry_exception
      retry if (retries -= 1) > 0
    end
    yield
  end

  def visit_main_link
    retryable(:tries => 1, :on => OpenURI::HTTPError) do
      visit "http://www.example.com/clothing-accessories?dir=asc&limit=72&order=position"
      results = all("//h5/a[contains(@onclick, 'analyticsLog')]")
      # Collect the hrefs first, because visiting a link invalidates the nodes.
      item = []
      results.each do |a|
        item << a[:href]
      end
      item.each do |link|
        visit link
        save_item
      end
      @excel.write "inventory.csv"
    end
  end

  def save_item
    data = all("//*[@id='content-wrapper']/div[2]/div/div")
    data.each do |info|
      @work_list[@row, 0] = info.find("//*[@id='productright']/div/div[1]/h1").text
      price = info.first("//div[contains(@class, 'price font left')]")
      @work_list[@row, 1] = (price.text.to_f * 1.33).round(2) if price
      @work_list[@row, 2] = info.find("//*[@id='productright']/div/div[11]").text
      @work_list[@row, 3] = info.find("//*[@id='tabcontent1']/div/div").text.strip
      color = info.all("//dd[1]//select[contains(@name, 'options')]//*[@price='0']")
      @work_list[@row, 4] = color.collect(&:text).join(', ')
      size = info.all("//dd[2]//select[contains(@name, 'options')]//*[@price='0']")
      @work_list[@row, 5] = size.collect(&:text).join(', ')
      model = File.basename(info.find("//*[@id='content-wrapper']/div[2]/div/div/div[1]/div[1]/a")['href'])
      # gsub (not gsub!) so the result is never nil when nothing is replaced.
      @work_list[@row, 6] = model.gsub(/\D/, "")
      @work_list[@row, 7] = File.basename(info.find("//*[@id='content-wrapper']/div[2]/div/div/div[1]/div[1]/a")['href'])
      additional_image = info.all("//*[@rel='lightbox[rotation]']")
      # This is the line in question: all images are joined into one cell.
      @work_list[@row, 8] = additional_image.map { |link| File.basename(link['href']) }.join(', ')
      # The original referenced an undefined `imagelink`; the additional_image
      # links are assumed to be what was meant.
      images = additional_image.map { |link| link['href'] }
      images.each do |image|
        # Save into LOCAL_DIR, in binary mode since these are image files.
        File.open(File.join(LOCAL_DIR, File.basename(image)), 'wb') do |f|
          # URI.open comes from open-uri; Kernel#open no longer opens URLs on Ruby 3.0+.
          f.write(URI.open(image).read)
        end
      end
      @row = @row + 1
    end
  end
end

tomtop = Tomtop.new
tomtop.go
I would like this to do two things that I'm not sure how to do:
- Each additional image should print to a new line (currently it prints all in one cell).
- I would like the model field to be duplicated exactly as many times as there are additional_images, in the same new-line manner.
1 Solution
#1
Use the CSV gem. I took the long way of writing this so you can see how it works.
require 'csv'

DOC = "file.csv"

# Write the header row.
profile = []
profile[0] = "model"
CSV.open(DOC, "a") do |me|
  me << profile
end

img_url = ['pic_1.jpg','pic_2.jpg','pic_3.jpg','pic_4.jpg','pic_5.jpg','pic_6.jpg']

# Append one row per image so that each image lands on its own line.
a = 0
b = img_url.length
while a < b
  profile = []
  profile[0] = img_url[a]
  CSV.open(DOC, "a") do |me|
    me << profile
  end
  a += 1
end
The CSV file should look like this:
model
pic_1.jpg
pic_2.jpg
pic_3.jpg
pic_4.jpg
pic_5.jpg
pic_6.jpg
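The same row-per-image idea can be written with a single `CSV.open` and extended so that the model number is repeated beside each image, which is what the question asks for. This is only a sketch under assumptions: `rows_for`, the output file name `inventory_sketch.csv`, and the sample `model` and `images` values are hypothetical stand-ins for the scraped data.

```ruby
require 'csv'

# Hypothetical helper: build one [model, image] row per image so that
# the two columns stay in sync.
def rows_for(model, images)
  images.map { |image| [model, image] }
end

model  = '168868837a'                             # sample value taken from the question
images = (1..6).map { |n| "168868837#{n}.jpg" }   # sample image names from the question

# Open the file once, write the header, then one row per image.
CSV.open('inventory_sketch.csv', 'w') do |csv|
  csv << %w[model additional_image]
  rows_for(model, images).each { |row| csv << row }
end
```

In the scraper, rows like these could presumably be built inside `save_item` from the model string and the additional_image links, instead of joining the image names into one cell.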
For your last question:
whatever = temp[1] + " " + temp[2]
profile[x] = whatever
OR
profile[x] = temp[1] + " " + temp[2]
To avoid a nil error in the array:
if temp[2].nil?
  profile[x] = temp[1]
else
  profile[x] = temp[1] + " " + temp[2]
end
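A more compact way to handle the missing element is to drop nil values before joining. This is a minimal sketch, assuming `temp` is an array of strings that may contain nil; `join_parts` is a hypothetical helper name and the sample data is made up for illustration.

```ruby
# Hypothetical helper: join only the parts that are present, so a missing
# element does not raise an error during string concatenation.
def join_parts(*parts)
  parts.compact.join(' ')
end

temp = ['ignored', 'red', nil]   # sample data; temp[2] is missing
profile = []
profile[0] = join_parts(temp[1], temp[2])
```

This avoids the explicit if/else at the cost of hiding which element was missing, so the branch above may still be preferable when the two cases need different handling.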