I am having the weirdest issue, one that I've been trying to track down for months now. I've added lines and lines of debugging code that write entries to a MySQL-based log, and the result makes no sense.
Basically, the script simply stops sometimes. Sometimes it stops at a random location, then it stops a dozen times in the same location, then it might run all the way through, and then it fails again on the next run.
More details:
Every 15 minutes, I am looping through a list of clients, and every client has a list of data that needs to be parsed and collected for emails. If a previous instance of this script is already running (i.e. a log entry exists that is less than 5 minutes old), the script is not executed again. So if I see a break in log entries longer than a few minutes and then another start 15 minutes after the first start, I know something is wrong. The most bizarre situation from the log is as follows:
I put into the log that I am about to create the database query for client X. Then I create a variable with SQL code that contains the client id and the day of the week (date("l", strtotime("now"))). Then I log that the query was created successfully. Note that the query ONLY exists in a PHP variable and was NOT submitted to MySQL yet!
So let me give you an example of what I see in the log:
- 3:00:00pm - (script starts)
- 3:00:00pm - (it loops through clients)
- 3:00:04pm - (it has gone through some clients and is now working on client 20)
- 3:00:04pm - creating query for client 20
- (log ends here until the script is automatically restarted 15 minutes later if there has not been any log entry for at least 5 minutes)
- 3:15:00pm - (script starts)
- 3:15:00pm - (it loops through clients)
- 3:15:04pm - (it has gone through some clients, is skipping client 20 because that obviously had a problem, and is now working on client 21)
- 3:15:04pm - creating query for client 21
- 3:15:04pm - successfully created query for client 21
- (log ends here until the script is automatically restarted 15 minutes later if there has not been any log entry for at least 5 minutes)
- 3:30:00pm - (script starts)
- 3:30:00pm - (it loops through clients)
- 3:30:04pm - (it has gone through some clients and is now working on client 20)
- 3:30:04pm - creating query for client 20
And rinse and repeat. Now for a few hours, it will alternate between failing just before it creates the query for client 20 and failing just after it created the query for client 21. Then, suddenly, it might make it all the way through the rest of the clients. Then the script starts again and will continue the same weird loop. And every day or so, this will happen with one or two other clients.
The query is simple enough, something like this:
$sql = "
select fldClientName
from client
where fldClientId = $clientId
and fldEmail".date("l", strtotime("now"))." = 1
";
Basically, if today is Monday, it should check if fldEmailMonday is set to 1 to let us know that this client needs to be emailed today.
This works for tons of our clients, but it just randomly gets stuck at one or two clients that change from day to day. And again, this happens BEFORE $sql is submitted to MySQL! We get stuck in the creation of the variable $sql.
Granted, the actual query is much more complex than what I've written here, but $clientId and date("l", strtotime("now")) are the only variable parts in an otherwise static piece of text.
Furthermore, we've had the same issue for years (I've only now started tracking it down in earnest), and by now, we have gone through three PHP servers and two MySQL servers. We're still having the same issue, so we're moderately sure it's not a hardware issue (like memory or hard drive).
I don't know if this might be the problem, but this script is run by a cron job that starts it in lynx. This was established before I took on the code, and I don't know the reason for it. Whenever we run it manually (I usually use php index.php instead of lynx hostname://index.php), it doesn't seem to fail, ever.
So could this be an issue with Lynx? If so, why does it work 70% of the time and fail otherwise? Why the randomness?
Or is there a PHP issue that I should figure out? We're running version 5.3.2 (yes, a bit old, but our server admin doesn't want to mess with it unless absolutely necessary).
I am guessing that Lynx is used so we get the Apache log (which by the way is empty except for some deprecated code warnings that I'm trying to get rid of now, too), and I am guessing that if I run php index.php, I don't get an Apache log. Maybe there's another log file I'm missing that might help here?
Also, it can't be the logging code itself that's causing it because it only submits plain static text and the client id that is established before the SQL code is created.
The script fails in plenty more locations, but this is the one that really makes no sense to me.
Any thoughts at all on what could cause something like this?
Any thoughts on what else I can do to track it down? I mean, I'm logging right before and after I create a variable, and it dies in between. I'm not sure how much more I could be logging here...
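One more thing that could be logged, offered as a rough sketch rather than a known fix: what PHP itself reports at the moment the script ends. A shutdown function still runs after a fatal error or a time-limit abort, so it can record error_get_last() and connection_status() even when the normal log entries stop. The log path and the describeShutdown() helper are made-up names for illustration, and writing to a file instead of the MySQL log is an assumption that keeps the sketch independent of the database.

```php
<?php
// Hypothetical log path; the original script logs to MySQL instead.
$logFile = sys_get_temp_dir() . '/cron_debug.log';

// Build a one-line description of how the run ended.
// $error is the array returned by error_get_last(), or null if no error occurred.
function describeShutdown($error, $connectionStatus)
{
    $parts = array();
    if ($error !== null) {
        $parts[] = sprintf(
            'last error: [%d] %s in %s:%d',
            $error['type'],
            $error['message'],
            $error['file'],
            $error['line']
        );
    } else {
        $parts[] = 'no error recorded';
    }
    $parts[] = 'connection_status=' . $connectionStatus;
    return implode('; ', $parts);
}

// Runs when the script finishes, including after a fatal error or a timeout.
register_shutdown_function(function () use ($logFile) {
    $line = date('c') . ' shutdown: '
        . describeShutdown(error_get_last(), connection_status()) . "\n";
    file_put_contents($logFile, $line, FILE_APPEND);
});
```

In the resulting line, a connection_status of 0 means the connection was normal, 1 means the client aborted, and 2 means the time limit was hit (the values can combine, so 3 means both).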
3 Answers
#1
1
To summarise the conversation, my main strategy was to reduce the number of software components involved in the cron call. So, this means changing from:
Cron -> Lynx -> Apache -> Script
to
Cron -> Script
If problems are still exhibited in the new approach, then at least two components have been ruled out (and it may be faster too). It sounds like, from your comments, that this fixed it; if so, great.
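To see what actually differs between the two call chains, a small sketch like the following could be placed at the top of the script. It is not something the conversation above confirmed as the cause; it only rests on two documented PHP behaviours: the command-line SAPI defaults to no execution time limit, and a web request can be aborted when the client disconnects unless ignore_user_abort is enabled. The describeRuntime() helper is a hypothetical name.

```php
<?php
// Collect the settings that typically differ between a web request and a CLI run.
function describeRuntime()
{
    return array(
        'sapi'               => PHP_SAPI,                      // "cli" when started as `php index.php`
        'max_execution_time' => ini_get('max_execution_time'), // "0" means no time limit
        'ignore_user_abort'  => ini_get('ignore_user_abort'),
    );
}

if (PHP_SAPI !== 'cli') {
    // Assumption: the run is a long batch job that should survive a
    // disconnecting client and should not be cut off by the time limit.
    ignore_user_abort(true);
    set_time_limit(0);
}

// Print (or log) the values once per run so the two environments can be compared.
print_r(describeRuntime());
```

Comparing this output from the Lynx-triggered run with the output of php index.php should show whether the two environments run under different limits.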
A wider point I'd reiterate here is that, whilst a challenging problem can be interesting to look into, there is a point where for cost reasons it is best to solve the problem another way. This may be such an occasion, and the mystery of Lynx failing might just have to persist!
#2
2
It might be that your issue is not related to PHP, MySQL or Apache at all, and is more of a server environment problem, such as the OS freezing and dropping zombie processes. Did you try running that cron job in any environment other than Lynx? I mean maybe locally or on some other server.
Have you checked the server logs for any alert/error messages?
#3
1
My first step in debugging this would be to extract the day variable from the $sql generation. I can't see anything wrong with it so I doubt it'll accomplish much, but at least it'll confirm one way or the other if that is contributing to the problem.
$day = date("l", strtotime("now"));
$sql = "
select fldClientName
from client
where fldClientId = $clientId
and fldEmail{$day} = 1
";
You could even check the value of $day and throw a RuntimeException if it isn't valid. But as I mentioned, I suspect the answer lies elsewhere.
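A minimal sketch of that check, assuming the column names follow the fldEmail<Weekday> pattern from the question. The emailDayColumn() helper and the example client id are hypothetical, and the query is shortened to the part shown above.

```php
<?php
// Return the per-weekday email column for a timestamp, or throw if
// date() ever yields something that is not an English weekday name.
function emailDayColumn($timestamp)
{
    $validDays = array(
        'Monday', 'Tuesday', 'Wednesday', 'Thursday',
        'Friday', 'Saturday', 'Sunday',
    );
    $day = date('l', $timestamp);
    if (!in_array($day, $validDays, true)) {
        throw new RuntimeException('Unexpected weekday name: ' . var_export($day, true));
    }
    return 'fldEmail' . $day;
}

$clientId = 20; // hypothetical client id for illustration
$column = emailDayColumn(time());
$sql = "
select fldClientName
from client
where fldClientId = " . (int) $clientId . "
and {$column} = 1
";
echo $sql;
```

If the exception never fires, the date() call can be ruled out, which narrows the search to whatever surrounds the string building.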