I'm having trouble with performance on the following query:
SELECT [COLUMNS] FROM TABLE A JOIN TABLE B ON [KEYS]
If I remove the join and leave only the select, the query takes seconds. With the join, it takes 30 minutes.
Table A has 844,082,912 rows and table B has 1,540,379,815 rows. The distribution and sort keys of both tables are equivalent to the join KEYS.
Looking at the AWS graphs (attached), I see one node that reaches 100% CPU utilisation for a short time.
Looking at the system table svv_diskusage (output attached), I am not sure what I am seeing: as far as I can tell, it does not indicate whether one node holds much more data than the others.
If the issue is faulty distribution, how can I see it? Or is it something else?
1 solution
#1
The article at https://aws.amazon.com/articles/8341516668711341 (see "Uneven Distribution") shows an example of the same graph pattern: one node is working harder than the others, which indicates that your data is not evenly distributed.
Regarding svv_diskusage: it describes the values stored in each slice. If the slices are not used relatively evenly, that is an indicator of a bad distribution key. Try the following query to look at the distribution at a higher level, among nodes rather than slices:
-- Disk usage per node (owner) and disk, as a percentage of capacity
select owner, host, diskno, used, capacity,
       (used - tossed) / capacity::numeric * 100 as pctused
from stv_partitions
order by owner;
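If disk usage alone does not make the skew obvious, counting rows per slice for the table in question may help. The following is only a sketch: it assumes the standard Redshift system tables stv_tbl_perm and stv_slices, and 'my_table' is a placeholder name that is not from the original post. stv_tbl_perm covers every database in the cluster, so a table name that exists in more than one place would be summed together.

```sql
-- Rows per slice (and the node that owns each slice) for one table.
-- 'my_table' is a hypothetical placeholder; replace it with the real table name.
select s.node,
       p.slice,
       sum(p.rows) as row_count
from stv_tbl_perm p
join stv_slices s on s.slice = p.slice
where trim(p.name) = 'my_table'   -- name is a padded char column, hence trim()
group by s.node, p.slice
order by s.node, p.slice;
```

If a few slices hold far more rows than the rest, the distribution key is probably concentrating values on those slices.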
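-- pg_table_def only lists tables in schemas that are on the search_path
-- The distkey and sortkey columns of the result show which keys the table uses
set search_path to '$user', 'public', 'ic';
select * from pg_table_def where tablename = '{TableNameHere}';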
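As a quicker summary, svv_table_info reports a single skew figure per table. This is a sketch under the assumption that the view is visible to your user; 'my_table' is again a placeholder and not a name from the original post.

```sql
-- skew_rows is the ratio of rows in the fullest slice to rows in the emptiest slice.
-- 'my_table' is a hypothetical placeholder; replace it with the real table name.
select "schema", "table", diststyle, sortkey1, tbl_rows, skew_rows
from svv_table_info
where "table" = 'my_table';
```

A skew_rows value close to 1 suggests the rows are spread evenly across slices; a much larger value points to a skewed distribution key.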