This morning I noticed that our MySQL server load was going sky high. The maximum should be around 8, but it hit over 100 at one point. When I checked the process list I found loads of UPDATE queries (simple ones, incrementing a "hitcounter") stuck in the "query end" state. We couldn't kill them (well, we could, but they remained in the killed state indefinitely) and our site ground to a halt.
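(In case it helps anyone diagnosing the same thing, this is roughly how the stuck connections showed up for us; the information_schema query assumes MySQL 5.1 or later.)

    -- Quick look at every connection and its state
    SHOW FULL PROCESSLIST;

    -- Or filtered to the stuck state, sorted by how long they've been there
    SELECT id, user, time, state, info
    FROM information_schema.PROCESSLIST
    WHERE state = 'query end'
    ORDER BY time DESC;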
We had loads of problems restarting the service and had to forcibly kill some processes. When we did, we were able to get mysqld to come back up, but the processes started to build up again immediately. As far as we're aware, no configuration had been changed at this point.
So, we changed innodb_flush_log_at_trx_commit from 2 to 1 (note that we need ACID compliance) in the hope that this would resolve the problem, and set the connections in PHP/PDO to be persistent. This seemed to work for an hour or so, and then the connections started to run out again.
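(For reference, this is roughly what that change looks like; the config file path is an assumption and varies by distro.)

    -- Check the current value, then apply the new one at runtime
    SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
    SET GLOBAL innodb_flush_log_at_trx_commit = 1;

    # /etc/my.cnf (or equivalent), so the setting survives a restart
    [mysqld]
    innodb_flush_log_at_trx_commit = 1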
Fortunately, I set up a slave server a couple of months ago and was able to promote it, and it's taking up the slack for now. But I need to understand why this happened and how to stop it, because the slave is significantly underpowered compared to the master, so I need to switch back soon.
Has anyone any ideas? Could it be that something needs clearing out? I don't know what, maybe the binary logs or something? Any ideas at all? It's extremely important that we can get this server back as the master ASAP but frankly I have no idea where to look and everything I have tried so far has only resulted in a temporary fix.
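(In case anyone wants to check the same thing, binary log usage can be inspected with something like this; the retention variable name assumes MySQL 5.x.)

    -- List the binary logs the server knows about, with their sizes
    SHOW BINARY LOGS;

    -- See where they live and whether automatic expiry is configured
    SHOW VARIABLES LIKE 'log_bin%';
    SHOW VARIABLES LIKE 'expire_logs_days';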
Help! :)
3 Solutions
#1
22
I'll answer my own question here. I checked the partition sizes with a simple df command and could see that /var was 100% full. I found an archive that someone had left behind that was 10GB in size. I deleted that, started MySQL, and ran a PURGE BINARY LOGS BEFORE '2012-10-01 00:00:00' query to clear out a load of space, which reduced the /var/lib/mysql directory from 346GB to 169GB. Changed back to the master and everything is running great again.
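Roughly, the recovery steps looked like this (a sketch of my case; the cutoff date and mount point will obviously differ for you):

    # From the shell: check partition usage -- /var showed up as 100% full
    df -h /var

    -- From the mysql client, once the server was back up:
    -- drop binary logs older than the cutoff, then confirm what's left
    PURGE BINARY LOGS BEFORE '2012-10-01 00:00:00';
    SHOW BINARY LOGS;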
From this I've learnt that our log files get VERY large, VERY quickly. So I'll be establishing a maintenance routine to not only keep the log files down, but also to alert me when we're nearing a full partition.
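For the log-rotation half of that routine, MySQL can expire binary logs on its own (a sketch assuming MySQL 5.x; the 14-day window is just an example, and in 8.0 the variable is binlog_expire_logs_seconds):

    -- Keep at most 14 days of binary logs; takes effect immediately
    SET GLOBAL expire_logs_days = 14;

    # And in my.cnf so it persists across restarts
    [mysqld]
    expire_logs_days = 14

The disk-space alerting side is down to whatever monitoring you already run; the important part is watching the partition that holds /var/lib/mysql, not just the root filesystem.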
I hope that's some use to someone in the future who stumbles across this with the same problem. Check your drive space! :)
#2
6
We've been having a very similar problem, where the mysql processlist showed that almost all of our connections were stuck in the "query end" state. Our problem was also related to replication and writing the binlog.
We changed the sync_binlog variable from 1 to 0, which means that instead of flushing binlog changes to disk on each commit, it allows the operating system to decide when to fsync() to the binlog. That entirely resolved the "query end" problem for us.
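For reference, the change was just this (a sketch; bear in mind that sync_binlog = 0 trades binlog durability on a crash for throughput, so weigh it against your own requirements):

    -- Check the current setting, then hand fsync timing over to the OS
    SHOW VARIABLES LIKE 'sync_binlog';
    SET GLOBAL sync_binlog = 0;

    # my.cnf, to persist across restarts
    [mysqld]
    sync_binlog = 0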
According to this post from Mats Kindahl, writing to the binlog won't be as much of a problem in the 5.6 release of MySQL.
#3
3
In my case, it was indicative of maxing out the I/O on the disk. I had already reduced fsyncs to a minimum, so it wasn't that. Another symptom is that "log*.tokulog*" files start accumulating because the system can't keep up with all the writes.
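If you suspect the same thing, disk saturation is easy to confirm from the shell (a sketch; device names and exact columns depend on your sysstat version):

    # Extended per-device stats every second; a %util pinned near 100 on the
    # data disk, together with high await, points at the disk as the bottleneck
    iostat -x 1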