Large MySQL table, very slow SELECTs

Date: 2021-03-13 23:47:39

I have a large table in MySQL (running within MAMP); it has 28 million rows and is 3.1 GB in size. Here is its structure:

    CREATE TABLE `termusage` (
      `id` bigint(20) NOT NULL AUTO_INCREMENT,
      `termid` bigint(20) DEFAULT NULL,
      `date` datetime DEFAULT NULL,
      `dest` varchar(255) DEFAULT NULL,
      `cost_type` tinyint(4) DEFAULT NULL,
      `cost` decimal(10,3) DEFAULT NULL,
      `gprsup` bigint(20) DEFAULT NULL,
      `gprsdown` bigint(20) DEFAULT NULL,
      `duration` time DEFAULT NULL,
      PRIMARY KEY (`id`),
      KEY `termid_idx` (`termid`),
      KEY `date_idx` (`date`),
      KEY `cost_type_idx` (`cost_type`),
      CONSTRAINT `termusage_cost_type_cost_type_cost_code` FOREIGN KEY (`cost_type`) REFERENCES `cost_type` (`cost_code`),
      CONSTRAINT `termusage_termid_terminal_id` FOREIGN KEY (`termid`) REFERENCES `terminal` (`id`)
    ) ENGINE=InnoDB AUTO_INCREMENT=28680315 DEFAULT CHARSET=latin1

Here is the output from SHOW TABLE STATUS:

    Name:            termusage
    Engine:          InnoDB
    Version:         10
    Row_format:      Compact
    Rows:            29656469
    Avg_row_length:  87
    Data_length:     2605711360
    Max_data_length: 0
    Index_length:    2156920832
    Data_free:       545259520
    Auto_increment:  28680315
    Create_time:     2011-08-16 15:16:08
    Update_time:     NULL
    Check_time:      NULL
    Collation:       latin1_swedish_ci
    Checksum:        NULL
    Create_options:  ''
    Comment:         ''

I'm trying to run the following SELECT statement:

    select u.id from termusage u
    where u.date between '2010-11-01' and '2010-12-01'

It takes 35 minutes to return the result (approximately 14 million rows). This is using MySQL Workbench.

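Two cheap sanity checks are worth running before tuning anything (a sketch; the actual EXPLAIN output depends on the optimizer and table statistics): EXPLAIN shows whether `date_idx` is being considered for the range scan, and COUNT(*) measures the server-side cost of the same filter without shipping 14 million rows to the client.

    ```sql
    -- Does the optimizer pick date_idx for the range scan?
    EXPLAIN
    SELECT u.id
    FROM termusage u
    WHERE u.date BETWEEN '2010-11-01' AND '2010-12-01';

    -- Same filter, but no 14-million-row result set to transfer.
    -- If this is fast, most of the 35 minutes is network/client time.
    SELECT COUNT(*)
    FROM termusage u
    WHERE u.date BETWEEN '2010-11-01' AND '2010-12-01';
    ```
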
I have the following MySQL configuration:

    Variable_name                 Value
    bulk_insert_buffer_size       8388608
    innodb_buffer_pool_instances  1
    innodb_buffer_pool_size       3221225472
    innodb_change_buffering       all
    innodb_log_buffer_size        8388608
    join_buffer_size              131072
    key_buffer_size               8388608
    myisam_sort_buffer_size       8388608
    net_buffer_length             16384
    preload_buffer_size           32768
    read_buffer_size              131072
    read_rnd_buffer_size          262144
    sort_buffer_size              2097152
    sql_buffer_result             OFF

Eventually I'm trying to run a larger query that joins a couple of tables and groups some data, all based on a variable customer ID:

    select c.id, u.termid, u.cost_type, count(*) as count, sum(u.cost) as cost,
           (sum(u.gprsup) + sum(u.gprsdown)) as gprsuse,
           sum(time_to_sec(u.duration)) as duration
    from customer c
    inner join terminal t on (c.id = t.customer)
    inner join termusage u on (t.id = u.termid)
    where c.id = 1 and u.date between '2011-03-01' and '2011-04-01'
    group by c.id, u.termid, u.cost_type

This returns a maximum of 8 rows (as there are only 8 separate cost_types). The query runs fine when there are not many rows (fewer than 1 million) in the termusage table to calculate over, but it takes forever when the number of rows in termusage is large. How can I reduce the SELECT time?

Data is added to the termusage table once a month from CSV files using the LOAD DATA method, so it doesn't need to be heavily tuned for inserts.

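For reference, the monthly load presumably looks something like the following (a sketch only; the file name, field terminators and column list are placeholders, not taken from the question):

    ```sql
    -- Hypothetical monthly bulk load; adjust path, terminators and
    -- column order to match the actual CSV layout:
    LOAD DATA LOCAL INFILE 'usage_2011_03.csv'
    INTO TABLE termusage
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n'
    IGNORE 1 LINES
    (termid, date, dest, cost_type, cost, gprsup, gprsdown, duration);
    ```
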
EDIT: EXPLAIN output for the main query:

    id,select_type,table,type,possible_keys,key,key_len,ref,rows,Extra
    1,SIMPLE,c,const,PRIMARY,PRIMARY,8,const,1,"Using index; Using temporary; Using filesort"
    1,SIMPLE,u,ALL,"termid_idx,date_idx",NULL,NULL,NULL,29656469,"Using where"
    1,SIMPLE,t,eq_ref,"PRIMARY,customer_idx",PRIMARY,8,wlnew.u.termid,1,"Using where"
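
Note the `ALL` on table u: despite both `termid_idx` and `date_idx` being candidates, the join scans all ~29 million rows. One hedged option (untested here) is a composite index matching both the join column and the date filter, so the optimizer can range-scan a single terminal's rows by date:

    ```sql
    -- Serves both the join (termid) and the range filter (date).
    -- Building it on 28 million rows will take time and disk space.
    ALTER TABLE termusage ADD KEY termid_date_idx (termid, date);
    ```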

3 Answers

#1


3  

Looks like you're asking two questions - correct?

The most likely reason the first query is taking so long is that it's IO-bound. It takes a long time to transfer 14 million records from disk and down the wire to MySQL Workbench.

Have you tried putting the second query through EXPLAIN? Yes, you only get back 8 rows, but the SUM operation may be summing millions of records.

I'm assuming the "customer" and "terminal" tables are appropriately indexed? As the join from termusage goes through the primary key of terminal, that should be really quick...

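That assumption is easy to verify (the output will list each index and its columns; no changes are made to the tables):

    ```sql
    SHOW INDEX FROM customer;
    SHOW INDEX FROM terminal;
    ```
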
#2


0  

You could try removing the WHERE clause that restricts by date and instead put an IF expression in the SELECT, so that if the date is within these boundaries the value is returned, otherwise zero is returned. The SUM will then of course only sum values that lie in this range, as all others will be zero.

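One way to express this suggestion against the question's query (an untested sketch; note that COUNT(*) also has to become a conditional SUM, and groups whose rows all fall outside the range will now appear with zero totals):

    ```sql
    SELECT c.id, u.termid, u.cost_type,
           -- date filter moved into the aggregates; out-of-range rows add 0
           SUM(IF(u.date BETWEEN '2011-03-01' AND '2011-04-01', 1, 0))      AS count,
           SUM(IF(u.date BETWEEN '2011-03-01' AND '2011-04-01', u.cost, 0)) AS cost
    FROM customer c
    INNER JOIN terminal t  ON (c.id = t.customer)
    INNER JOIN termusage u ON (t.id = u.termid)
    WHERE c.id = 1
    GROUP BY c.id, u.termid, u.cost_type;
    ```
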
It sounds a bit nonsensical to fetch more rows than you need, but we recently observed on an Oracle DB that this made quite a big improvement. Of course it will depend on many other factors, but it might be worth a try.

#3


0  

You may also think about breaking the table down by year or month, so you have termusage_2010, termusage_2011, ... or something like this.

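Instead of hand-maintained per-year tables, MySQL's native range partitioning can achieve a similar effect. This is a sketch only: the partitioning column must be part of every unique key, and partitioned InnoDB tables cannot carry foreign keys, so the existing constraints and primary key would have to be reworked first (a long operation on 28 million rows).

    ```sql
    -- Prerequisites: drop the FKs and widen the PK to include `date`
    ALTER TABLE termusage DROP FOREIGN KEY termusage_cost_type_cost_type_cost_code;
    ALTER TABLE termusage DROP FOREIGN KEY termusage_termid_terminal_id;
    ALTER TABLE termusage DROP PRIMARY KEY, ADD PRIMARY KEY (id, date);

    -- One partition per year; date-range queries then touch only
    -- the partitions that overlap the range (partition pruning)
    ALTER TABLE termusage
    PARTITION BY RANGE (TO_DAYS(date)) (
      PARTITION p2010 VALUES LESS THAN (TO_DAYS('2011-01-01')),
      PARTITION p2011 VALUES LESS THAN (TO_DAYS('2012-01-01')),
      PARTITION pmax  VALUES LESS THAN MAXVALUE
    );
    ```
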
Not a very nice solution, but given that your table is quite large, it might be useful on a smaller server.
