An important vSphere 4 storage bug is solved in patch ESX400-200912401-BG



Chad Sakac over at http://virtualgeek.typepad.com already blogged about the APD bug in December last year. You can find his post here. 

Here is a short quote from Chad’s post about the symptoms of this APD bug:

Recently saw a little uptick (still a small number) in customers running into a specific issue – and I wanted to share the symptom and resolution.   Common behavior:

  1. They want to remove a LUN from a vSphere 4 cluster
  2. They move or Storage vMotion the VMs off the datastore who is being removed (otherwise, the VMs would hard crash if you just yank out the datastore)
  3. After removing the LUN, VMs on OTHER datastores would become unavailable (not crashing, but becoming periodically unavailable on the network)
  4. the ESX logs would show a series of errors starting with “NMP”

Examples of the error messages include:

    “NMP: nmp_DeviceAttemptFailover: Retry world failover device "naa._______________" – failed to issue command due to Not found (APD)”

    “NMP: nmp_DeviceUpdatePathStates: Activated path "NULL" for NMP device "naa.__________________".”

What a weird one…   I also found that this was affecting multiple storage vendors (suggesting an ESX-side issue).  You can see the VMTN thread on this here.
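If you want to check a host log for the two messages quoted above, a small script can do the searching. This is only a minimal sketch: it assumes you already have the log contents as text, and the sample lines and naa ids in it are invented for illustration, not taken from a real host.

```python
import re

# Patterns modeled on the two NMP messages quoted above.
# The device id (naa.*) is captured so affected devices can be listed.
APD_PATTERNS = [
    re.compile(r'nmp_DeviceAttemptFailover: Retry world failover device "(naa\.[^"]*)".*\(APD\)'),
    re.compile(r'nmp_DeviceUpdatePathStates: Activated path "NULL" for NMP device "(naa\.[^"]*)"'),
]


def find_apd_devices(log_text):
    """Return the sorted device ids mentioned in APD-style NMP messages."""
    devices = set()
    for line in log_text.splitlines():
        for pattern in APD_PATTERNS:
            match = pattern.search(line)
            if match:
                devices.add(match.group(1))
    return sorted(devices)


# Invented sample lines; the third one has a real path name and must not match.
SAMPLE_LOG = '''\
NMP: nmp_DeviceAttemptFailover: Retry world failover device "naa.6000aaaa" - failed to issue command due to Not found (APD)
NMP: nmp_DeviceUpdatePathStates: Activated path "NULL" for NMP device "naa.6000bbbb".
NMP: nmp_DeviceUpdatePathStates: Activated path "vmhba33:C0:T0:L1" for NMP device "naa.6000cccc".
'''

if __name__ == "__main__":
    for device in find_apd_devices(SAMPLE_LOG):
        print(device)
```

Only lines that match one of the two quoted message shapes are reported, so ordinary path activations are ignored.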

 

We found out about this issue during a big storage project. We were creating a lot of new LUNs and removing a lot of the old ones. You run into the bug when you remove a LUN in any way other than the one described in Chad’s post:

This workaround falls under “operational excellence”.   The sequence of operations here is important – the issue only occurs if the LUN is removed while the datastore and disk device are expected by the ESX host.   The correct sequence for removing a LUN backing a datastore.

  1. In the vSphere client, vacate the VMs from the datastore being removed (migrate or Storage vMotion)
  2. In the vSphere client, remove the Datastore
  3. In the vSphere client, remove the storage device
  4. Only then, in your array management tool remove the LUN from the host.
  5. In the vSphere client, rescan the bus.
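The ordering in the five steps above can be written down as a small check, for example to validate a change plan before you start. This is a sketch under my own assumptions: the step names are labels I made up to mirror the list, and nothing here talks to vSphere or to an array.

```python
# Step labels invented for this sketch; they mirror the five steps listed above.
SAFE_ORDER = [
    "vacate_vms",
    "remove_datastore",
    "remove_storage_device",
    "unpresent_lun_on_array",
    "rescan",
]


def check_removal_order(performed):
    """Return a list of ordering problems; an empty list means the plan is fine."""
    problems = []
    done = set()
    for step in performed:
        if step not in SAFE_ORDER:
            problems.append(f"unknown step: {step}")
            continue
        # Every step that comes earlier in SAFE_ORDER must already be done.
        earlier = SAFE_ORDER[:SAFE_ORDER.index(step)]
        missing = [s for s in earlier if s not in done]
        if missing:
            problems.append(f"{step} ran before: {', '.join(missing)}")
        done.add(step)
    return problems


if __name__ == "__main__":
    # The risky case from the post: the LUN is pulled on the array
    # while the host still expects the datastore and the device.
    risky_plan = ["vacate_vms", "unpresent_lun_on_array", "rescan"]
    for problem in check_removal_order(risky_plan):
        print(problem)
```

The point of the check is the same as the point of the list: the array-side removal has to come after the host no longer expects the datastore and the device.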

So when we used the workaround described above, everything went fine. But at my current employer, we use a large LeftHand iSCSI SAN. One of the great things about a LeftHand SAN is the ability to move LUNs between different clusters. With the APD bug, we couldn’t use this option anymore.

When we discovered this APD bug we contacted VMware Support. After a couple of weeks we received an e-mail with the following fix.

I can now confirm that the APD (All paths dead) issue has been resolved by a patch released as part of P03.

To install this patch, please upgrade your hosts to vSphere Update 1 and use Update Manager to install the latest patches.

Please ensure that ESX400-200912401-BG is installed as this resolves the APD problem

We upgraded one of our clusters to Update 1 and installed the latest patches including the ESX400-200912401-BG patch. After installing the patch, we did some tests and I can confirm that the APD bug is history!!

To reproduce this issue I created two iSCSI LUNs on the EMC VSA. Instead of removing the LUNs, I disconnected the iSCSI network to simulate the removal. Before I disconnected the iSCSI network, all LUNs were working just fine:

image

After I disconnected the iSCSI network and waited a while, all the paths to the EMC LUNs were dead and shown in red:

image

This is normal behavior, but before the ESX400-200912401-BG patch was installed, the ESX host would stall for 30 to 60 seconds. That means all the VMs running on a host that lost a LUN would stall, even the VMs on a different datastore!! I am happy that VMware has solved this APD bug.

 

If you want to check whether the APD patch is already installed, you can easily verify this with vCenter Update Manager.

Go to the tab Update Manager and open the Admin View. Add a new baseline. Select the Host Patch option:

image

In the next screen select Fixed:

image 

Now we are going to create a filter. Enter the name of the patch:

image

Select the ESX400-200912401-BG patch:

image

When the new baseline is ready, return to the Compliance view and attach the new baseline:

image

The final step is to perform a scan on your Datacenter, Cluster or ESX Host. Now wait and see if the patch is already installed or not.
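If you have many hosts and a text listing of the installed bulletins for each, the same check can be scripted. This is a minimal sketch and not part of Update Manager: it assumes you obtained the listing yourself in some way, with one bulletin per line and the bulletin id as a separate word. The sample listing is invented.

```python
PATCH_ID = "ESX400-200912401-BG"


def is_bulletin_installed(listing_text, bulletin_id=PATCH_ID):
    """True if bulletin_id appears as a whole whitespace-separated word on any line."""
    wanted = bulletin_id.strip().lower()
    for line in listing_text.splitlines():
        if wanted in line.lower().split():
            return True
    return False


# Invented listing; EXAMPLE-0001-BG is a placeholder, not a real bulletin.
SAMPLE_LISTING = """\
EXAMPLE-0001-BG      placeholder summary
ESX400-200912401-BG  placeholder summary
"""

if __name__ == "__main__":
    if is_bulletin_installed(SAMPLE_LISTING):
        print(f"{PATCH_ID} is installed")
    else:
        print(f"{PATCH_ID} is missing")
```

Matching on whole words avoids a false positive when a longer id merely contains the one you are looking for.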

 

More info about the patch can be found here:

For the readers who cannot upgrade to vSphere Update 1 and the latest patches, you can find some workarounds here:

VI Toolkit: Patch an ESX host with VUM



This script will automate the following steps:

  • Enter Maintenance mode
  • Attach baselines
  • Scan Host and check Compliance
  • Remediate Host
  • Detach baselines
  • Exit Maintenance mode
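The order of the six steps above, plus a cleanup rule, can be sketched without any VMware libraries. This is not the script from poshcode.org. The `RecordingOps` class, the step names and the host name are hypothetical stand-ins, and leaving maintenance mode even after a failed step is a choice I made for this sketch.

```python
class RecordingOps:
    """Stand-in for real host operations; it only records the steps it is asked to run."""

    def __init__(self, fail_on=None):
        self.calls = []
        self.fail_on = fail_on  # optional step name that should raise an error

    def run(self, step, host):
        self.calls.append(step)
        if step == self.fail_on:
            raise RuntimeError(f"{step} failed on {host}")


def patch_host(ops, host):
    """Run the six steps in order; cleanup steps still run when a middle step fails."""
    ops.run("enter_maintenance_mode", host)
    try:
        ops.run("attach_baselines", host)
        try:
            ops.run("scan_and_check_compliance", host)
            ops.run("remediate", host)
        finally:
            # Detach only happens if the attach step itself succeeded.
            ops.run("detach_baselines", host)
    finally:
        ops.run("exit_maintenance_mode", host)


if __name__ == "__main__":
    ops = RecordingOps()
    patch_host(ops, "esx01.example.local")  # hypothetical host name
    for step in ops.calls:
        print(step)
```

Whether a host should really leave maintenance mode after a failed remediation depends on your environment, so treat that part as something to decide rather than copy.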

This is my first “trial & error” script with the VMware.VumAutomation snap-in. If you have any tips, please leave a comment.

You can download the script here: http://poshcode.org/1001

How to: Update ESX3i (USB) Without VUM



In this post you’ll find the information you need to update your ESX 3i host without VMware Update Manager.

In the whitepaper vi3_35_25_3i_i_setup.pdf (page 115), you’ll find the following information about the Infrastructure Update tool:

When you install the VI Client, the software installs Infrastructure Update. Infrastructure Update lets you learn about, download, and install maintenance and patch releases, which provide security, stability, and feature enhancements for VMware Infrastructure.

Infrastructure Update downloads available updates. The downloads are background tasks and do not disrupt normal operation. The update service does not install updates for you. Instead, the update service displays a list of available updates that you can choose to install.
When new updates are available, the system tray icon for Infrastructure Update displays a notification. The notifications appear only if you keep automatic update notifications enabled.

This is how it works:

Open the VI Client. Log on to your ESX 3i server. Close the VI Client. Go to Start -> Programs -> VMware and open VMware Infrastructure Update.

image

To update an ESX 3i host with the VMware Infrastructure Update tool, you need to follow these three steps:


VI Toolkit 1.5 + VUM = Error



I was experimenting with the VUM PowerShell library. I ran the Get-Baseline cmdlet and got the following error:

image

After some digging around on Google and vmware.com, I found that the VUM library doesn’t work with the new version of the VI Toolkit. VMware is aware of the problem, according to a post from Carter on VMTN and the release notes.

Carter Shanklin posted this information on VMTN:

Hi everyone,

There is an incompatibility between VI Toolkit 1.5 and the VUM cmdlets. For now if you need to use the VUM cmdlets you will need to do so on a system that has VITK 1.0 installed.
This incompatibility will be resolved by a minor update to VI Toolkit that will be shipped when the next version of VUM ships. The next version of VUM will have cmdlets that support that version as well as the current version.

We’ve updated the release notes to note the incompatibility. Sorry for the inconvenience.

From the release notes:

VI Toolkit (for Windows) 1.5 is not compatible with VMware Update Manager – PowerShell Library 1.0.

windowstoolkit15-200901-releasenotes.html

To fix this “problem” I removed VI Toolkit 1.5 and installed VI Toolkit 1.0 again. After the rollback, the VUM library is working again :-).

I hope that VMware will fix this soon.

VMware: VUM Performance Whitepaper


VMware Update Manager (VUM) provides a patch management framework for VMware Virtual Infrastructure. IT administrators can use it to patch VMware ESX, Windows, and certain versions of Linux virtual machines.  As data centers get bigger, performance implications become more important for patch management. This study covers the following topics:

  • Benchmarking Methodology
  • VUM Server Host Deployment
  • Latency Overview
  • Resource Consumption Matrix
  • Guest Operating System Tuning
  • Network Latencies
  • On-Access Virus Scanning

Download the whitepaper here: http://www.vmware.com/pdf/vum_1.0_performance.pdf