Loading data in sortkey order and VACUUM

Date: 2022-12-25 23:06:29

I am loading a nightly snapshot of a table into Redshift. I have added a column called "rundate" at the end of the table, which just represents when the data was pulled through my ETL process. It is also the primary sortkey.

The tables just get longer and longer every night, and many of them have 400+ columns.

Right now I am using FILLRECORD in conjunction with EMPTYASNULL in order to get NULLS into the table, and once the COPY command has finished, I use

update table set rundate = 'date' where rundate is NULL

in order to have the correct snapshot date.

I am wondering whether this still counts as "loading the data in sortkey order", so that I will not need to VACUUM. Aside from this, no updates or deletes are done to any of the records.

1 Answer

#1


Unfortunately, no. An UPDATE performs a delete followed by an insert, so each day's entire load is left behind as dead rows that require a VACUUM. I would recommend loading into an empty staging table instead, and then inserting the data into the target table with the extra rundate column added during that insert.
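
A minimal sketch of that staging approach might look like the following. The table names (snapshot_stage, snapshot_history), the S3 path, the IAM role, and the date value are all hypothetical placeholders, and the sketch assumes the staging table has the same columns as the target minus rundate, with rundate being the last column of the target as described in the question.

```sql
-- Hypothetical names throughout: snapshot_stage, snapshot_history,
-- the S3 path, and the IAM role are placeholders, not from the question.

-- 1. Start each night with an empty staging table.
--    snapshot_stage is assumed to have every source column but no rundate.
TRUNCATE snapshot_stage;

-- 2. Load the nightly file into the staging table.
COPY snapshot_stage
FROM 's3://my-bucket/snapshots/2022-12-25/'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
EMPTYASNULL;

-- 3. Append to the history table, supplying rundate during the insert
--    so that no UPDATE (and therefore no delete/insert) is needed.
--    s.* followed by rundate matches the target only because rundate
--    is assumed to be the target's last column.
INSERT INTO snapshot_history
SELECT s.*, CAST('2022-12-25' AS DATE) AS rundate
FROM snapshot_stage s;
```

Because the rows are written once with their final rundate, nothing is marked for deletion. If in doubt about whether the table still needs sorting afterwards, the unsorted column of SVV_TABLE_INFO is one place to check.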
