Ruby on Rails - Importing data from a CSV file

Date: 2022-04-19 14:05:03

I would like to import data from a CSV file into an existing database table. I do not want to save the CSV file, just take the data from it and put it into the existing table. I am using Ruby 1.9.2 and Rails 3.

This is my table:

create_table "mouldings", :force => true do |t|
  t.string   "suppliers_code"
  t.datetime "created_at"
  t.datetime "updated_at"
  t.string   "name"
  t.integer  "supplier_id"
  t.decimal  "length",         :precision => 3, :scale => 2
  t.decimal  "cost",           :precision => 4, :scale => 2
  t.integer  "width"
  t.integer  "depth"
end

Can you give me some code showing the best way to do this? Thanks.

11 solutions

#1


310  

require 'csv'    

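# Read the whole file into memory, then parse it using the first line as headers.
csv_text = File.read('...')
csv = CSV.parse(csv_text, :headers => true)
csv.each do |row|
  # Each row becomes a hash of column name => value; header names must match the table's columns.
  Moulding.create!(row.to_hash)
end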

#2


172  

A simpler version of yfeldblum's answer that also works well with large files:

require 'csv'    

CSV.foreach(filename, :headers => true) do |row|
  Moulding.create!(row.to_hash)
end

No need for with_indifferent_access or symbolize_keys, and no need to read in the file to a string first.

It doesn't keep the whole file in memory at once; it reads the file line by line and creates one Moulding per line.
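
To make the row-by-row behaviour concrete, here is a minimal sketch that runs without Rails. The sample data and the imported array are assumptions for illustration only; in the real app the block body would be the Moulding.create! call shown above.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical sample data standing in for the real CSV file.
sample = Tempfile.new(['mouldings', '.csv'])
sample.write("suppliers_code,name,cost\nS1,Oak,1.25\nS2,Pine,2.50\n")
sample.close

imported = []
# CSV.foreach yields one CSV::Row at a time instead of loading the whole file.
CSV.foreach(sample.path, headers: true) do |row|
  # In the Rails app this would be: Moulding.create!(row.to_hash)
  imported << row.to_hash
end

sample.unlink
```

Note that every value arrives as a string; ActiveRecord casts it to the column type when the record is created.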

#3


10  

The smarter_csv gem was created specifically for this use case: reading data from a CSV file and quickly creating database entries.

  require 'smarter_csv'
  options = {}
  SmarterCSV.process('input_file.csv', options) do |chunk|
    chunk.each do |data_hash|
      Moulding.create!( data_hash )
    end
  end

You can use the chunk_size option to read N CSV rows at a time, and then use Resque in the inner loop to enqueue jobs that create the new records instead of creating them right away. This way you can spread the load of creating entries across multiple workers.
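
The chunking idea can be sketched with the standard library alone. This is not the smarter_csv API; the sample data, the chunk size of 2 and the chunks array are assumptions for illustration.

```ruby
require 'csv'

# Hypothetical sample data; a real import would read from a file instead.
csv_text = "name,cost\nOak,1.25\nPine,2.50\nAsh,3.75\n"
chunk_size = 2 # assumed value, tune it to your workload

chunks = []
# CSV is Enumerable, so each_slice groups the parsed rows into batches of chunk_size.
CSV.new(csv_text, headers: true).each_slice(chunk_size) do |rows|
  chunk = rows.map(&:to_hash)
  # A real importer might enqueue one background job per chunk here.
  chunks << chunk
end
```

Each chunk is an array of hashes, so a worker can create its records independently of the others.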

See also: https://github.com/tilo/smarter_csv

#4


4  

This can help. It has code examples too:

http://csv-mapper.rubyforge.org/

Or, for a rake task that does the same:

http://erikonrails.snowedin.net/?p=212

#5


4  

You might try Upsert:

require 'upsert' # add this to your Gemfile
require 'csv'    

u = Upsert.new Moulding.connection, Moulding.table_name
CSV.foreach(file, headers: true) do |row|
  selector = { name: row['name'] } # this treats "name" as the primary key and prevents the creation of duplicates by name
  setter = row.to_hash
  u.row selector, setter
end

If this is what you want, you might also consider removing the auto-increment primary key from the table and making name the primary key. Alternatively, if some combination of attributes forms a primary key, use that combination as the selector. An index is not required; it just makes the upsert faster.
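
The selector/setter semantics can be illustrated without a database. The sketch below is not the Upsert gem: a plain Hash stands in for the mouldings table (an assumption for illustration), keyed by the selector column, to show that a repeated name updates the existing entry instead of creating a duplicate.

```ruby
require 'csv'

# Hypothetical sample data in which "Oak" appears twice.
csv_text = "name,cost\nOak,1.25\nPine,2.50\nOak,1.75\n"

table = {} # in-memory stand-in for the mouldings table, keyed by name
CSV.parse(csv_text, headers: true).each do |row|
  selector = row['name']
  setter = row.to_hash
  # Update the existing entry when the selector matches, insert otherwise.
  table[selector] = (table[selector] || {}).merge(setter)
end
```

After the loop the table holds two entries, and the later Oak row has overwritten the earlier cost.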

#6


1  

It is better to wrap the database work inside a transaction block. The snippet below is the complete process for seeding a set of languages into a Language model:

require 'csv'

namespace :lan do
  desc 'Seed initial languages data with language & code'
  task init_data: :environment do
    puts '>>> Initializing Languages Data Table'
    ActiveRecord::Base.transaction do
      csv_path = File.expand_path('languages.csv', File.dirname(__FILE__))
      csv_str = File.read(csv_path)
      csv = CSV.new(csv_str).to_a
      csv.each do |lan_set|
        lan_code = lan_set[0]
        lan_str = lan_set[1]
        Language.create!(language: lan_str, code: lan_code)
        print '.'
      end
    end
    puts ''
    puts '>>> Languages Database Table Initialization Completed'
  end
end

Below is part of the languages.csv file:

aa,Afar
ab,Abkhazian
af,Afrikaans
ak,Akan
am,Amharic
ar,Arabic
as,Assamese
ay,Aymara
az,Azerbaijani
ba,Bashkir
...

#7


0  

Use this gem: https://rubygems.org/gems/active_record_importer

class Moulding < ActiveRecord::Base
  acts_as_importable
end

You can then use:

Moulding.import!(file: File.open(PATH_TO_FILE))

Just be sure that your headers match the column names of your table.

#8


0  

A better way is to put it in a rake task. Create an import.rake file inside /lib/tasks/ and put this code in it.

desc "Imports a CSV file into an ActiveRecord table"
task :csv_model_import, [:filename, :model] => [:environment] do |task,args|
  lines = File.new(args[:filename], "r:ISO-8859-1").readlines
  header = lines.shift.strip
  keys = header.split(',')
  lines.each do |line|
    values = line.strip.split(',')
    attributes = Hash[keys.zip values]
    Module.const_get(args[:model]).create(attributes)
  end
end

After that, run this command in your terminal: rake csv_model_import[file.csv,Name_of_the_Model]
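
One caveat with splitting on commas by hand is that quoted fields containing commas get broken apart. The sketch below, using hypothetical sample values, compares a manual split with CSV.parse_line from the standard library, which could replace the split calls in a task like the one above.

```ruby
require 'csv'

# Hypothetical header and data line; the name field contains a comma.
header = 'name,suppliers_code'
line   = '"Oak, stained",S1'

keys = CSV.parse_line(header)

# A manual split cuts the quoted field in two.
naive_values = line.strip.split(',')

# CSV.parse_line honours the quoting rules.
parsed_values = CSV.parse_line(line)

attributes = Hash[keys.zip(parsed_values)]
```

Here naive_values has three elements while parsed_values has the expected two, so only the parsed version lines up with the header keys.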

#9


0  

I know it's an old question, but it is still in the first 10 links on Google.

Saving rows one by one is not very efficient, because it issues a database call on every loop iteration. You should avoid that, especially when you need to insert large amounts of data.

It's better (and significantly faster) to use a batch insert.

INSERT INTO `mouldings` (suppliers_code, name, cost)
VALUES
    ('s1', 'supplier1', 1.111),
    ('s2', 'supplier2', 2.222)

You can build such a query manually and then run Model.connection.execute(RAW SQL STRING) (not recommended), or use the activerecord-import gem (first released on 11 Aug 2010). In that case, just put the data in an array rows and call Model.import rows.

Refer to the gem docs for details.
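
As a rough sketch of preparing data for a batch insert, the code below turns CSV text into a list of column names and an array of value rows using only the standard library. The sample data is an assumption, and the final import call is left as a comment because it needs the activerecord-import gem and the Moulding model.

```ruby
require 'csv'

# Hypothetical sample data matching three of the mouldings columns.
csv_text = "suppliers_code,name,cost\ns1,supplier1,1.11\ns2,supplier2,2.22\n"

table   = CSV.parse(csv_text, headers: true)
columns = table.headers        # column names taken from the header line
values  = table.map(&:fields)  # one array of field values per data row

# With activerecord-import installed, one call could then insert every row:
#   Moulding.import columns, values
```

Building the arrays first keeps the parsing separate from the database write, so the rows can be inserted in one statement rather than one per row.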

#10


-2  

It's better to use CSV::Table together with String#encode(universal_newline: true), which converts CRLF and CR to LF.
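
A minimal sketch of that normalisation, assuming hypothetical input with mixed line endings: the text is passed through String#encode first and then parsed into a CSV::Table.

```ruby
require 'csv'

# Hypothetical input mixing CRLF and bare CR line endings.
raw = "name,cost\r\nOak,1.5\rPine,2.0\r\n"

# universal_newline: true rewrites CRLF and CR as LF.
normalized = raw.encode(universal_newline: true)

# With headers: true, CSV.parse returns a CSV::Table.
table = CSV.parse(normalized, headers: true)
names = table.map { |row| row['name'] }
```

Normalising first means the parser only ever sees one kind of row separator.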

#11


-3  

If you want to use SmarterCSV:

all_data = SmarterCSV.process(
             params[:file].tempfile, 
             { 
               :col_sep => "\t", 
               :row_sep => "\n" 
             }
           )

This parses tab-delimited data (col_sep "\t") with rows separated by newlines (row_sep "\n").
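
For comparison, the same separators can be passed to the standard library CSV parser. This is a sketch with hypothetical sample data, not the SmarterCSV call above.

```ruby
require 'csv'

# Hypothetical tab-delimited sample data.
tsv_text = "name\tcost\nOak\t1.25\nPine\t2.50\n"

# col_sep and row_sep are standard CSV options for the field and row separators.
rows = CSV.parse(tsv_text, headers: true, col_sep: "\t", row_sep: "\n").map(&:to_hash)
```

Each element of rows is a hash keyed by the header names, ready to be passed to a model.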
