Redshift performance tuning on a JOIN query

Date: 2021-03-09 23:07:53

I'm having trouble with performance on the following query:

SELECT [COLUMNS] FROM TABLE A JOIN TABLE B ON [KEYS]

If I remove the join, leaving only the select, the query takes seconds. With the join, it takes 30 minutes.

Table sizes are A (844,082,912 rows) and B (1,540,379,815 rows). The distribution and sort keys are the same as the join KEYS.

Looking at the AWS graphs, I see (attached) one node that hits 100% CPU utilisation for a short time.

Looking at the system table (svv_diskusage), I am not sure what I am seeing (attached), as it does not indicate (as far as I can tell) whether one node holds much more data than the others.

If the issue is faulty distribution, how can I see it? Is it something else?

1 Answer

#1


Here https://aws.amazon.com/articles/8341516668711341 (Uneven Distribution) you can see an example of the same graph style: one node is working harder than the others, which indicates your data is not evenly distributed.

Regarding svv_diskusage, it describes the values stored in each slice. If the slices are not used relatively evenly, that's an indicator of a bad distribution key. Try the following query to get a higher-level view of the distribution among nodes rather than slices:

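-- Disk usage per node and disk; compare pctused across owners (nodes).
select owner, host, diskno, used, capacity,
       (used - tossed) / capacity::numeric * 100 as pctused
from stv_partitions
order by owner;

-- pg_table_def only lists tables in schemas on the search_path;
-- 'ic' is presumably the schema holding the table, so adjust it to yours.
set search_path to '$user', 'public', 'ic';
select * from pg_table_def where tablename = '{TableNameHere}';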
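
As a complementary check, the two queries below sketch how row skew could be inspected per table rather than per disk. They are only a sketch: table_a and table_b are placeholder names (not from the original post), and the queries rely on the Redshift system views svv_table_info, stv_tbl_perm and stv_slices, which may need superuser access to show data for all tables.

```sql
-- Row skew per table: skew_rows is the ratio of rows in the fullest slice
-- to rows in the emptiest slice; values well above 1 suggest a skewed key.
select "table", diststyle, sortkey1, tbl_rows, skew_rows
from svv_table_info
where "table" in ('table_a', 'table_b');  -- hypothetical table names

-- Rows per node for one table, rolling slices up to the node that owns them.
select s.node, sum(p.rows) as row_count
from stv_tbl_perm p
join stv_slices s on s.slice = p.slice
where trim(p.name) = 'table_a'            -- hypothetical table name
group by s.node
order by row_count desc;
```

If one node holds far more rows than the others, the distribution key is probably sending a few very common values (for example NULLs or a default value) to the same slice.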
