I have a custom-written Windows service that I run on a number of Hyper-V VMs. The VMs get rebooted a couple times an hour as part of some automated tests being run. The service is set to automatic start and almost all of the time, it starts up fine.
However, maybe 5% of the time, with no pattern that I can discern, the service fails to start. When it fails, I get an error in Event Viewer saying:
A timeout was reached (30000 milliseconds) while waiting for the My Service Name service to connect.
When this occurs, I can start the service manually, or restart again, and the service will start fine.
The thing I can't figure out is that the 30 second timeout doesn't appear to be occurring in my code. The very first line of my service class's OnStart() method logs "Starting..." to its log4net log. When the service fails to start, I don't even get anything logged at all, which indicates to me that either log4net can't log for whatever reason, or the timeout is occurring before my OnStart() gets called.
The service runs on a variety of OSes, from XP all the way up to Win7 and 2008R2, and I know that setting the service to delayed start may solve this for Vista and later, but that seems like a hack.
I haven't been able to remote debug this because it happens so intermittently and only during system startup, and I'm at a loss for further ways to figure out what's going on. Any ideas?
7 Solutions
#1
10
You may want to look at this post. It's not identical to your situation, but the solution does offer sound advice on Windows services and startup functionality.
#2
5
My guess - and that's all it is - is that the disk is thrashing hard during startup, to the point where the .NET Framework itself isn't starting in the 30 seconds that Windows allocates for services to start.
A kludgy workaround may be to set the service to start manually, then write a very small stub service in unmanaged code (e.g. C++, Delphi) to start the service.
Another approach may be to start the service remotely from another machine. The sc command should do the job nicely.
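As a rough illustration of the remote-start idea, here is a minimal Python sketch that wraps `sc query` and `sc start` in a retry loop. It is not part of the original answer. The host and service names, the retry count, and the delay are placeholders, and the state check assumes the English `STATE ... RUNNING` line that `sc query` prints. The `runner` and `sleep` parameters exist only so the logic can be exercised without a Windows machine.

```python
import subprocess
import time


def build_sc_command(action, service, machine=None):
    """Build an sc.exe command line such as: sc \\\\HOST start "My Service Name"."""
    cmd = ["sc"]
    if machine:
        cmd.append("\\\\" + machine)  # sc expects the remote host in \\HOST form
    cmd += [action, service]
    return cmd


def run_sc(cmd):
    """Run sc.exe and return (exit_code, stdout). Only meaningful on Windows."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stdout


def is_running(query_output):
    """True if 'sc query' output contains a STATE line reporting RUNNING."""
    return any("STATE" in line and "RUNNING" in line
               for line in query_output.splitlines())


def ensure_started(service, machine=None, attempts=5, delay=10.0,
                   runner=run_sc, sleep=time.sleep):
    """Query the service; if it is not running, ask sc to start it and retry.

    attempts and delay are arbitrary defaults, not recommended values.
    """
    for _ in range(attempts):
        _, out = runner(build_sc_command("query", service, machine))
        if is_running(out):
            return True
        runner(build_sc_command("start", service, machine))
        sleep(delay)
    # One last check after the final start attempt.
    _, out = runner(build_sc_command("query", service, machine))
    return is_running(out)
```

Run from a controller machine, a call such as `ensure_started("My Service Name", machine="VM01")` would poll each VM after a reboot. The account running the script needs permission to control services on the target.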
#3
3
For what it's worth, I discovered that I received this message (almost immediately upon service startup) because I did not have version 4.5 of the .NET Framework installed on the target machine. I rolled back the version I was targeting to 4.0 (which was already installed on the target machine) and the service worked as expected.
#4
2
I was seeing this error in the Event Viewer when trying to install a service with PowerShell.
The problem was that the values for "Service Name" and "Service Display Name" in my PowerShell script differed from those I had specified in the Program.cs file of my console application.
#5
1
I think I may have also found another contributing factor to this kind of "does not start on reboot" error.
It appears that if the Windows Event Log is set to overwrite events older than 7 days, with a size of 512 KB, and a lot of activity has occurred within that window, then the Event Log is effectively full because it can't overwrite the events generated inside that timeframe. If you set the event log to a much larger size, or to overwrite events as needed, you won't experience this issue.
#6
0
My issue with the same error was that the .NET installation on the server was not working correctly.
To figure this out:
I made a small console app with the same logic as the service, wrapped the whole thing in a try-catch, and dumped any exception out to the console.
Not sure why the information didn't bubble up, but we saw the valuable messages about the Framework errors that we would never have seen otherwise.
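The original harness was a .NET console app, which is not shown here. Purely to illustrate the same pattern, this small Python sketch runs the startup logic inside one catch-all and dumps the full error text to the console. `failing_startup` is a hypothetical stand-in for the real startup code.

```python
import traceback


def run_with_diagnostics(startup):
    """Run startup(); return None on success, or the full traceback text on failure."""
    try:
        startup()
        return None
    except Exception:  # deliberately broad: this is a diagnostic harness only
        return traceback.format_exc()


def failing_startup():
    # Hypothetical stand-in for the service's real startup logic.
    raise RuntimeError("simulated framework load failure")


if __name__ == "__main__":
    report = run_with_diagnostics(failing_startup)
    print(report or "startup completed without errors")
```

The point is only that running the logic interactively, outside the Service Control Manager, lets errors reach the console that would otherwise be lost.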
#7
0
We are having the same problem on Windows Server 2016.
A fix that seems to be working is changing the account the service runs under from the Local Service account to the local Administrator (not sure what the cause is).