Importing data from CSV files into an Amazon Web Services RDS MySQL database

Time: 2021-10-27 23:06:57

I have created a relational database (MySQL) hosted on Amazon Web Services. What I would like to do next is import the data in my local CSV files into this database. I would really appreciate it if someone could provide me with an outline of how to go about it. Thanks!

3 solutions

#1


1  

This is easiest and most hands-off using the MySQL command line. For large loads, consider spinning up a new EC2 instance, installing the MySQL command-line tools, and transferring your file to that machine. Then, after connecting to your database from the command line, you'd do something like:

mysql> LOAD DATA LOCAL INFILE 'C:/upload.csv' INTO TABLE myTable;

There are also options to match your file's format and to skip the header row (plenty more in the docs):

mysql> LOAD DATA LOCAL INFILE 'C:/upload.csv' INTO TABLE myTable FIELDS TERMINATED BY ','
ENCLOSED BY '"' IGNORE 1 LINES;

If you're hesitant to use the command line, download MySQL Workbench. It connects to AWS RDS with no problem.

Closing thoughts:

  • MySQL LOAD DATA Docs
  • AWS's Aurora RDS is MySQL-compatible, so the command works there too
  • The "LOCAL" flag actually transfers the file from your client machine (where you're running the command) to the DB server. Without LOCAL, the file must already be on the DB server, which you can't arrange in advance with RDS
  • Works great on huge files too! I just sent an 8.2 GB file (260 million rows) via this method. It took just over 10 hours from a t2.medium EC2 instance to a db.t2.small Aurora instance
  • Not a solution if you need to watch out for unique keys or read the CSV row by row and change the data before inserting/updating
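
For reference, the same load can be driven from a short script instead of the interactive prompt. The following is only a sketch under a few assumptions that are not part of the original answer: the mysql-connector-python package, placeholder endpoint/credentials/file names, and (depending on your MySQL version) the local_infile setting being enabled on the RDS side via its parameter group.

# Sketch: run LOAD DATA LOCAL INFILE against RDS MySQL from Python.
# Assumes: pip install mysql-connector-python; endpoint, credentials, file and
# table names below are placeholders, not values from the original answer.
import mysql.connector

conn = mysql.connector.connect(
    host="mydb.xxxxxxxx.us-east-1.rds.amazonaws.com",  # placeholder RDS endpoint
    user="admin",                                      # placeholder credentials
    password="secret",
    database="mydatabase",
    allow_local_infile=True,  # the client must explicitly allow LOCAL INFILE
)
cursor = conn.cursor()
cursor.execute(
    """
    LOAD DATA LOCAL INFILE 'upload.csv'
    INTO TABLE myTable
    FIELDS TERMINATED BY ',' ENCLOSED BY '"'
    IGNORE 1 LINES
    """
)
conn.commit()
cursor.close()
conn.close()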

#2


0  

I think your best bet would be to develop a script in your language of choice to connect to the database and import the data.
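
As a concrete illustration of this approach, here is a minimal sketch in Python (one of the languages Lambda supports, as noted below). It assumes the PyMySQL package, a placeholder RDS endpoint and credentials, and a hypothetical myTable(id, name, amount) whose columns match the CSV; adapt the column names and any per-row transformation to your own data.

# Sketch: read a local CSV and insert it into RDS MySQL row by row.
# Assumes: pip install pymysql; the endpoint, credentials and the
# id/name/amount columns are placeholders, not from the original answer.
import csv
import pymysql

conn = pymysql.connect(
    host="mydb.xxxxxxxx.us-east-1.rds.amazonaws.com",  # placeholder RDS endpoint
    user="admin",                                      # placeholder credentials
    password="secret",
    database="mydatabase",
)
cursor = conn.cursor()

with open("upload.csv", newline="") as f:
    reader = csv.DictReader(f)  # treats the first CSV line as the header
    rows = []
    for record in reader:
        # Clean or transform values here before they are inserted.
        rows.append((record["id"], record["name"], record["amount"]))

cursor.executemany(
    "INSERT INTO myTable (id, name, amount) VALUES (%s, %s, %s)",
    rows,
)
conn.commit()  # PyMySQL does not autocommit by default
cursor.close()
conn.close()

Because every row passes through your own code, this is also where you can handle the unique-key checks and per-row changes that the LOAD DATA approach above doesn't cover.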

If your database is internet-accessible, you can run that script locally. If it is in a private subnet, you can either run the script on an EC2 instance with access to that subnet or on a Lambda function connected to your VPC. You should really only use Lambda if you expect the runtime to be less than 5 minutes or so.

Edit: Note that Lambda only supports a handful of languages:

AWS Lambda supports code written in Node.js (JavaScript), Python, Java (Java 8 compatible), and C# (.NET Core).

#3


0  

I did some digging and found this official AWS documentation on how to import data from any source to MySQL hosted on RDS.

It is a very detailed, step-by-step guide and includes an explanation of how to import CSV files.

http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/MySQL.Procedural.Importing.AnySource.html

Basically, each table must have its own file. Data for multiple tables cannot be combined in the same file. Give each file the same name as the table it corresponds to. The file extension can be anything you like. For example, if the table name is "sales", the file name could be "sales.csv" or "sales.txt", but not "sales_01.csv".

Whenever possible, order the data by the primary key of the table being loaded. This drastically improves load times and minimizes disk storage requirements.
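
If your files aren't already in that order, a small preprocessing step can sort them before the import. The sketch below is only an illustration: it assumes Python, a hypothetical sales.csv with a header row, and an integer primary-key column named "id" (the column name and type are assumptions, not part of the AWS guide), and it keeps the table-matching file name by rewriting the file in place.

# Sketch: sort a CSV by its (assumed) integer primary-key column before loading.
# Assumes the file fits in memory; "sales.csv" and the "id" column are hypothetical.
import csv

with open("sales.csv", newline="") as f:
    reader = csv.DictReader(f)
    fieldnames = reader.fieldnames
    rows = sorted(reader, key=lambda row: int(row["id"]))  # "id" = assumed primary key

with open("sales.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)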

There is another option for importing data into a MySQL database: you can use an external tool, Alooma, which can do the data import for you in real time.
