In Amazon Redshift's Getting Started Guide, data is pulled from Amazon S3 and loaded into an Amazon Redshift cluster using SQLWorkbench/J. I'd like to mimic that same process of connecting to the cluster and loading sample data into it using Boto3.
However, in Boto3's Redshift documentation I'm unable to find a method that would allow me to upload data into an Amazon Redshift cluster.
I've been able to connect to Redshift using Boto3 with the following code:
import boto3
client = boto3.client('redshift')
But I'm not sure what method would allow me to create tables or upload data to Amazon Redshift the way it's done in the tutorial with SQLWorkbench/J.
2 Answers
#1
Go back to step 4 in the tutorial you linked: that's where it shows you how to get the URL of the cluster. You have to connect to that URL with a PostgreSQL driver. The AWS SDKs such as Boto3 only provide access to the AWS management API. To create tables or load data, you connect to Redshift over its PostgreSQL interface, just like you would connect to a PostgreSQL database on RDS.
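For illustration, a minimal sketch of that approach might look like the following. The cluster identifier 'examplecluster' and the 'dev' database are assumptions based on the Getting Started tutorial, and psycopg2 stands in for any PostgreSQL driver:

import boto3
import psycopg2

# Use Boto3 only to look up the cluster endpoint through the AWS API
redshift = boto3.client('redshift')
cluster = redshift.describe_clusters(ClusterIdentifier='examplecluster')['Clusters'][0]
endpoint = cluster['Endpoint']

# Connect to that endpoint with a PostgreSQL driver and issue SQL as usual
con = psycopg2.connect(
    host=endpoint['Address'],
    port=endpoint['Port'],
    dbname='dev',
    user='***',
    password='***',
)
cur = con.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS users (userid INTEGER, username VARCHAR(50));")
con.commit()
con.close()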
#2
Right, you need the psycopg2 Python module to execute the COPY command.
My code looks like this:
import psycopg2

# Amazon Redshift connection string
conn_string = "dbname='***' port='5439' user='***' password='***' host='mycluster.***.redshift.amazonaws.com'"

# Connect to Redshift (the cluster must be reachable from the machine running this)
con = psycopg2.connect(conn_string)

# Build the COPY statement; to_table, fn, delim, quote and gzip are set elsewhere,
# e.g. fn = 's3://path_to__input_file.gz' and gzip = 'gzip'
sql = ("""COPY %s FROM '%s' credentials
'aws_access_key_id=%s; aws_secret_access_key=%s'
delimiter '%s' FORMAT CSV %s %s; commit;"""
       % (to_table, fn, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, delim, quote, gzip))

cur = con.cursor()
cur.execute(sql)
con.close()
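With hypothetical values substituted for the placeholders (the table name, bucket path, delimiter, and keys below are examples, not real values), the formatted statement comes out roughly as:

COPY my_table FROM 's3://mybucket/input_file.gz' credentials 'aws_access_key_id=<access-key-id>; aws_secret_access_key=<secret-key>' delimiter ',' FORMAT CSV gzip; commit;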
I used boto3 and psycopg2 to write CSV_Loader_For_Redshift.