I have a Python script that uses psycopg2 to execute a COPY command that loads data from S3 into Redshift. It runs fine on a cron schedule.
Now I want to check that the data has loaded properly each time, so I want to query the STL_LOAD_COMMITS and STL_LOAD_ERRORS tables.
Does anyone know if there is a way of getting the query ID returned from the COPY command, so it can be used to query the tables above and retrieve the relevant log records?
I don't believe COPY returns anything at all, but if someone has come across a clever way of checking loads in code, I'd be interested.
EDIT: Perhaps the right way to do this is to query using the filename instead of the query ID since I know the names of the files I've loaded.
select *
from STL_LOAD_COMMITS
where filename in ('s3://bucket/4f737c05-8f16-4ba7-8f50-30423369c389.csv.gz',
's3://bucket/5fe4fea9-a9e4-4622-b9f6-ed3f98f7d1e2.csv.gz')
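The filename check could be wrapped in a small Python helper. This is only a sketch under a few assumptions: `find_missing_files` is a hypothetical name, `conn` is an open psycopg2 (or other DB-API, `%s`-style) connection to the cluster, and the `filename` column is space-padded, which is why it is trimmed before comparing.

```python
def find_missing_files(conn, expected_files):
    """Return the expected files that have no row in STL_LOAD_COMMITS.

    conn is assumed to be an open DB-API connection (e.g. from psycopg2.connect).
    """
    expected = list(expected_files)
    if not expected:
        return []

    # One %s placeholder per filename, so the driver handles the quoting.
    placeholders = ", ".join(["%s"] * len(expected))
    sql = (
        "select distinct trim(filename) "
        "from stl_load_commits "
        "where trim(filename) in (" + placeholders + ")"
    )

    cur = conn.cursor()
    try:
        cur.execute(sql, expected)
        loaded = {row[0] for row in cur.fetchall()}
    finally:
        cur.close()

    # Keep the caller's ordering so the result is easy to log.
    return [name for name in expected if name not in loaded]
```

An empty return value would mean every expected file has at least one commit record. It says nothing about rows rejected during the load, so STL_LOAD_ERRORS still needs its own check.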
1 Answer
#1
2
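PG_LAST_COPY_ID(), as the name suggests, returns the query ID of the most recently completed COPY command in the current session, so it has to be called on the same connection that ran the COPY.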
Source: AWS Redshift documentation for PG_LAST_COPY_ID()
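With psycopg2, this might look roughly like the sketch below. The name `copy_and_check` and the selected columns are illustrative choices, not part of any library. `conn` is assumed to be the same open connection that runs the COPY, and committing is left to the caller.

```python
def copy_and_check(conn, copy_sql):
    """Run a COPY, then fetch its load records using PG_LAST_COPY_ID().

    conn is assumed to be an open psycopg2 connection; every statement
    runs on it so that pg_last_copy_id() sees the COPY from this session.
    """
    cur = conn.cursor()
    try:
        cur.execute(copy_sql)

        # Query ID of the COPY that just completed in this session.
        cur.execute("select pg_last_copy_id()")
        query_id = cur.fetchone()[0]

        # Files committed by that COPY.
        cur.execute(
            "select trim(filename), lines_scanned "
            "from stl_load_commits where query = %s",
            (query_id,),
        )
        commits = cur.fetchall()

        # Rows rejected by that COPY, if any.
        cur.execute(
            "select trim(filename), line_number, trim(err_reason) "
            "from stl_load_errors where query = %s",
            (query_id,),
        )
        errors = cur.fetchall()
    finally:
        cur.close()

    return query_id, commits, errors
```

A caller could treat a non-empty `errors` list, or a `commits` list that is missing an expected file, as a failed load and raise an alert from the cron job.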