vSphere: Storage vMotion of a virtual machine might stop responding at 18%


For some reason a couple of storage vMotions went really slow. So I looked in the logs and found the following lines in the vmware.log:

Oct 15 11:43:37.889: Worker#0| DISKLIB-VMFS :CopyData ‘/vmfs/volumes/xxxxx/vm1/vm1_1-flat.vmdk’ : failed to move data (Cannot allocate memory:0xc0009).
Oct 15 11:43:37.890: Worker#0| DISKLIB-LIB : DiskLib_Clone: Could not perform clone using vmkernel data mover. Falling back to non-accelerated clone. Cannot allocate memory

I started a Google search and found the following thread http://communities.vmware.com/message/1545132 in the communities. A short quote from richardt his post:

"This problem is caused by the VMFS3-DM (Data Mover) having to use contiguous memory space on the host. Apparently, when host’s kernel memory usage is >50% and memory has been fragmented, the DM cannot allocate more space and throws the errors.

I couldn’t get any fix that day so a made an internal case and got focused on another case. The next day I was reading the release notes of patch ESX400-302009402-SG and found the following quote:

Storage vMotion of a virtual machine might stop responding at 18%, and might be completed after a long time, even though other Storage vMotion operations on the host might continue without any errors. If you try to cancel the Storage vMotion operation when it stops responding, the system disconnects the ESX host from the vCenter Server and automatically connects it after a few minutes.

The vmware.log file might contain the following error:
Could not perform clone using vmkernel data mover. Falling back to non-accelerated clone. Cannot allocate memory

The VMkernel log file might contain the following error:
status Out of memory copying 16777216 bytes from file offset 0 to file offset 0, bytesTransferred = 0 extentsTransferred: 0

This issue occurs because the DataMover cannot allocate contiguous physical space when the host’s kernel memory usage is around 50% and the memory is fragmented. The operation fails back to the Application layer data movement. The operation continues and succeeds, but might take more time when compared to the usual DataMove time. The DataMover requires 16MB of contiguous physical memory on the ESX host for each DataMover thread. This patch provides a fix to make DataMover work with fragmented memory.

Installing patch ESX400-302009402-SG did resolve this issue.

 

Source: KB1023759

Advertisements

2 thoughts on “vSphere: Storage vMotion of a virtual machine might stop responding at 18%

  1. Scott

    Once the SvMotion task has stopped responding and you have canceled the task, how long till the “task” clears or completes/fails/finishes?

    Reply

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s