Rescan VMFS results in a deadlock of vCenter Server 4.x
January 13, 2011
When you’re using different types of storage in your vSphere environment, you might need to use different kinds of alarms. So I thought I'd be smart and create a lot of folders and assign different alarms to these folders. When the folders and alarms were ready, I moved the Datastores into the folders. Everything looked perfect so far. I was able to add different alarms for each type of Datastore. This scenario is also described by Jeremy Waldrop. The setup looks like this:
So far so good.
But… when I added a new Datastore and started a rescan on a cluster… vCenter froze with a deadlock!
VMware released a KB article, KB1031225, which describes the following symptoms:
- Defining alarms (for example, to monitor storage space or overallocation) on datastore folders can cause vCenter Server to become unresponsive
- Defining alarms causes a deadlock issue, which causes vCenter Server to become unresponsive
The solution and the temporary workaround are simple, but they're not the answer I was looking for:
Restarting the VirtualCenter Server service temporarily resolves this issue. For more information, see Stopping, starting, or restarting vCenter services (1003895).
To avoid this issue, do not define alarms on datastore folders. Only define alarms on Datacenter or datacenter folders.
It was worth a shot because the deadlocks were getting really annoying. But before deleting all the custom alarms, I wanted to create a backup of the settings. Unfortunately this is not a standard feature within vCenter Server. Then I remembered a script from @LucD22, which he mentioned during his presentation PowerCLI is for Administrators with @Alanrenouf at the #Dutch VMUG last December. During this presentation he mentioned two scripts to export and import alarm settings. These scripts will be published in the upcoming PowerCLI book, VMware vSphere PowerCLI Reference: Automating vSphere Administration; more info soon at www.powerclibook.com. So I contacted Luc and asked if I could test these scripts in my environment. Lucky me, the answer was yes. After a quick test and fixing a naming convention issue with the alarm names (quick tip: don’t use [ ] in your alarm names), I was able to export the alarm settings.
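To illustrate the quick tip about alarm names, here is a minimal Python sketch that flags names containing square brackets so they can be renamed before an export. It is not part of Luc's scripts; the function name and the sample alarm names are made up for illustration, and in practice the list of names would come from your own vCenter inventory.

```python
def names_with_brackets(alarm_names):
    """Return the alarm names that contain '[' or ']'."""
    return [name for name in alarm_names if "[" in name or "]" in name]


# Hypothetical alarm names, for illustration only.
alarm_names = [
    "Datastore usage on disk",
    "[SAN] Datastore overallocation",
    "NFS datastore usage",
]

for name in names_with_brackets(alarm_names):
    print(f"Rename before export: {name}")
```

Only the second sample name is reported, because it is the only one containing a bracket.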
Now that I had a backup of the alarm settings, I could test the solution mentioned in the KB article, and if it did not work, I had a rollback scenario. So I removed all the alarms from the Datastore folders and started some testing with rescans. First I added a new Datastore. The rescan went fine. Then I started a rescan on the whole Datacenter. This was also not a problem. The last test was to remove a Datastore from vCenter, which triggers an automatic rescan on all the vSphere hosts that were connected to the Datastore. And… this final test was also successful. So NO deadlocks in vCenter anymore!
The last step was to set up a new Datastore alarm definition, or use the default one, at vCenter level.
After creating the alarm I started the whole test procedure again. And I have never seen a deadlock again.
I hope VMware will fix this issue in the upcoming release of vSphere.