I'm merging two datasets of ~1M rows each on Google Cloud SQL (MySQL 5.5 with 4 GB RAM), and the query takes over 5 hours to run. I run the following query from Sequel Pro:
create table newtable as (select * from table1 t1 left join table2 t2 using (key))
Each table has approximately 20 VARCHAR columns. Key is also a VARCHAR.
I've created an index on key in both tables, but that didn't noticeably change performance. I've searched a lot, but can't find any direct advice on how to improve the query time. Is this the expected query time for MySQL?
EDIT: each table is ~250MB
2 solutions
#1
3
The first thing I noticed is that your key is a VARCHAR. This could be a major cause of the poor performance you are seeing. If the join cannot use the index, each of the million key values in ‘table1’ is compared against each of the million key values in ‘table2’, which is very expensive, and more so because strings are compared character by character. You can improve this by adding an auto-incremented integer PRIMARY KEY: comparing integers is a simple value-to-value comparison, so it costs much less.
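Before restructuring the tables, it may be worth checking whether the join really is comparing every row against every row. The statements below are a minimal sketch, not a tested fix: `key` stands in for the real join column, the column name `id` is hypothetical, and the ALTER TABLE assumes the table has no primary key yet.

```sql
-- `key` is a placeholder for the real join column. KEY is a reserved word
-- in MySQL, so it is backtick-quoted here.

-- Show the join plan. In the row for t2, a type of "ref" or "eq_ref" with an
-- index name in the "key" column means the index is used for lookups;
-- a type of "ALL" means table2 is being scanned in full.
EXPLAIN
SELECT * FROM table1 t1 LEFT JOIN table2 t2 USING (`key`);

-- Compare the definitions of the two key columns. A character set or
-- collation mismatch between them can prevent MySQL from using the index.
SHOW CREATE TABLE table1;
SHOW CREATE TABLE table2;

-- The suggestion above, applied to one table. The column name `id` is
-- hypothetical, and this fails if the table already has a primary key.
ALTER TABLE table1
  ADD COLUMN id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY;
```

Note that ids generated independently in each table will not correspond to each other, so joining on an integer column only works if both tables take their integer from a shared mapping of the original key values.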
The tier of your Cloud SQL instance will also have a big effect on performance, because it determines the hardware resources available to the instance. You can temporarily change the tier to test this, either in the ‘Edit’ section of the Cloud SQL user interface or by using the Cloud SDK.
#2
0
Silly as it may sound, you might have better luck exporting your table with mysqldump, changing the table name, and then re-importing it.