Two RAID volumes: the VMware kernel/console runs on a RAID1, and the vmdks live on a RAID5. Entering a login at the console just produces SCSI errors and no password prompt. Praise be, the VMs are actually still running. We're worried, though, that after a reboot the kernel may not start again and the VMs will be down.
We have database and disk backups of the VMs, but not backups of the vmdks themselves.
What are my options?
Our current best idea is:
1. Use VMware Converter to create live vmdks from the running VMs, as if it were a P2V migration.
2. Reboot the host server and run RAID diagnostics to figure out what in the "h" happened.
3. Attempt to start ESX again, possibly after rebuilding its RAID volume.
4. Possibly re-install ESX on its volume and re-attach the VMs.
5. If that doesn't work, attach the "live" vmdks created in step 1 to a different VM host.
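The ordering in the plan above matters: nothing touches the host until the running VMs have been imaged. A minimal Python sketch of that ordering follows. The `HostState` fields and the `remaining_steps` helper are hypothetical names made up for illustration; this is a checklist aid, not something that talks to ESX or the RAID controller.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HostState:
    """Hypothetical snapshot of where the recovery stands."""
    images_captured: bool  # Converter images of the running VMs exist
    raid_healthy: bool     # RAID diagnostics pass after the reboot
    esx_boots: bool        # ESX starts from its own volume
    vms_reattached: bool   # the original VMs are registered and running


def remaining_steps(state: HostState) -> List[str]:
    """Return the outstanding steps of the plan, in order."""
    # Until the live images exist, imaging is the only safe step.
    if not state.images_captured:
        return ["Image the running VMs with VMware Converter before touching the host"]
    steps = []
    if not state.raid_healthy:
        steps.append("Reboot the host and run RAID diagnostics; repair or rebuild the volumes")
    if not state.esx_boots:
        steps.append("Attempt to start ESX; re-install it on its volume if it will not boot")
    if not state.vms_reattached:
        steps.append("Re-attach the VMs; failing that, attach the Converter images to a different host")
    return steps


if __name__ == "__main__":
    for step in remaining_steps(HostState(True, False, False, False)):
        print("-", step)
```

Returning only the imaging step while `images_captured` is false is deliberate: a reboot is the one action in the plan that cannot be undone.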
1 Answer
It was the backplane. Both drives of the RAID1 and one drive of the RAID5 were inaccessible. Incredibly, the VMware hypervisor continued to run for three days from memory with no access to its host disk, keeping the VMs it managed alive.
At step 3 above, we diagnosed the hardware problem and replaced the RAID controller, cables, and backplane. After the restart, we re-initialized the RAID by instructing the controller to read the configuration back from the drives. Both volumes came up degraded, and both were repaired successfully.
At step 4, it was not necessary to reinstall ESX, although at bootup it would not register the VMs. We had to dig up some buried management settings to instruct the kernel to resignature the VMs. (Search the VMware docs for "resignature.")
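The post does not say which commands were used, so the following is only a sketch of what a resignature typically looked like from the ESX service console. It assumes the VMware docs' usage, where "resignature" applies to the VMFS volume holding the VMs. The command names (`esxcfg-advcfg`, `esxcfg-rescan`, `esxcfg-volume`) are the service-console tools of that era as I recall them. The adapter name `vmhba1` and the `resignature_commands` helper are placeholders. The function only builds the command lines and never runs them; check the documentation for your ESX version before executing anything.

```python
from typing import List, Optional


def resignature_commands(adapter: str = "vmhba1",
                         volume_label: Optional[str] = None) -> List[List[str]]:
    """Build (but do not run) service-console commands for a VMFS resignature.

    Without volume_label: ESX 3.x-style global setting plus an adapter rescan.
    With volume_label: ESX 4.x-style per-volume resignature.
    Both command sets are assumptions; the original post names neither.
    """
    if volume_label is None:
        return [
            # Allow the LVM layer to resignature volumes it no longer recognizes.
            ["esxcfg-advcfg", "-s", "1", "/LVM/EnableResignature"],
            # Rescan the adapter so the volumes are re-read and resignatured.
            ["esxcfg-rescan", adapter],
            # Turn the setting back off once the volumes are visible again.
            ["esxcfg-advcfg", "-s", "0", "/LVM/EnableResignature"],
        ]
    return [
        # List volumes the host has detected as snapshots or replicas.
        ["esxcfg-volume", "-l"],
        # Resignature the one volume named by label (or UUID).
        ["esxcfg-volume", "-r", volume_label],
    ]


if __name__ == "__main__":
    for cmd in resignature_commands():
        print(" ".join(cmd))
```

After a resignature the datastore usually shows up under a new name, so the VMs typically have to be registered again from their `.vmx` files.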
I believe our fallback plan would have worked: the VMware Converter images of the VMs that were running "orphaned" were tested and ran fine with no data loss. I highly recommend making a VMware Converter image of any VM that gets into this state, after shutting down as many services as possible and getting the VM into as read-only a state as possible. Loading a vmdk, either elsewhere or on the original host as a repair, is usually going to be WAY faster than rebuilding a server from the ground up with backups.