vSphere – Page 2 – ICT-Freak.nl

December 30, 2010 afokkema VMware

vMotion error: Virtual machine must be running in order to be migrated

Today I wanted to Storage vMotion a VM to a new datastore. But for the first time I got a general error message:

followed up by a general system error message:

I got the same message when I tried to start a “normal” vMotion, so I started to troubleshoot the error. First I looked at the vmware.log of the VM, but there was nothing unusual in there. The next stop was the VMkernel log, but there was nothing unusual in it either. So I used the good old Microsoft-like fix and restarted the VMware management services at the Service Console using the following command:

service mgmt-vmware restart

After a minute or so I was able to start a vMotion again. After the vMotion completed I started the Storage vMotion I had planned, and it worked like a charm again.

To recap: sometimes you need to restart the mgmt-vmware service to fix the connection between the vSphere host and vCenter.
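The restart can be wrapped in a small function for the Service Console. This is only a minimal sketch: the function name restart_services and the DRY_RUN variable are hypothetical, and the default service is the mgmt-vmware service used in this post. With DRY_RUN=1 the commands are printed instead of executed.

```shell
#!/bin/sh
# Restart one or more Service Console services (default: mgmt-vmware).
# Set DRY_RUN=1 to print the commands instead of running them.
restart_services() {
    services="${*:-mgmt-vmware}"
    for svc in $services; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "service $svc restart"
        else
            # Stop at the first service that fails to restart
            service "$svc" restart || return 1
        fi
    done
}
```

Calling restart_services without arguments runs the same command as shown in this post. Additional service names can be passed as arguments.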

November 30, 2010 (updated February 3, 2011) afokkema Automation

PowerCLI: RE: Disallowing Multiple VM Console Sessions

Frank Denneman posted today about disallowing multiple VM console sessions in a high-secure virtual infrastructure design: http://frankdenneman.nl/2010/11/disallowing-multiple-vm-console-sessions

The first thing that popped into my mind was: why not automate this setting with PowerCLI? So I created a function called Set-MaxMKSConnections:

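Function Set-MaxMKSConnections{
param(
    [parameter(Mandatory = $true)]
    [string[]]$vmName,
    [parameter(Mandatory = $true)]
    [int]$Sessions
)
    # Build a config spec that only carries the extra option we want to change
    $vmConfigSpec = New-Object VMware.Vim.VirtualMachineConfigSpec

    $extra = New-Object VMware.Vim.OptionValue
    $extra.Key = "RemoteDisplay.maxConnections"
    $extra.Value = "$Sessions"
    $vmConfigSpec.ExtraConfig += $extra

    # Apply the spec to every VM that matches the given name(s)
    foreach($vm in (Get-VM $vmName | Get-View)){
        $vm.ReconfigVM($vmConfigSpec)
    }
}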

You can run this function by copying the code into the PowerCLI window. To run it on a single VM, you can use the following line:

Set-MaxMKSConnections -vmName Thinapp -Sessions 1

To run it on all your VMs, you can use the following foreach loop:

$vms = Get-VM
foreach($vm in $vms){
    Set-MaxMKSConnections -vmName $vm.Name -Sessions 1
}

The configuration is changed even on Virtual Machines that are powered on (you need to restart the VM to activate the new setting):

If you want to raise the maxConnections value back to 2 or another value, pass the desired value to the -Sessions parameter and run the script again.
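Because the setting ends up as a line in the VM’s .vmx file, the result can also be checked from the Service Console. The following is a minimal sketch, assuming the key is stored in the form RemoteDisplay.maxConnections = "<value>". The function name get_max_mks_connections is hypothetical.

```shell
#!/bin/sh
# Print the RemoteDisplay.maxConnections value stored in a .vmx file.
# Prints nothing when the key is not present.
# $1 = path to the .vmx file
get_max_mks_connections() {
    sed -n 's/^[[:space:]]*RemoteDisplay\.maxConnections[[:space:]]*=[[:space:]]*"\([^"]*\)".*$/\1/p' "$1"
}
```

Run it against the .vmx file of the VM you changed. Empty output means the key is not set, so the VM still uses its default.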

November 22, 2010 (updated November 27, 2010) afokkema Automation

Reconfigure DNS settings and add vSphere hosts to Windows DNS

I needed to change the DNS setup in our vSphere environment. Instead of doing this by hand on every host, I decided to create a script. First I needed a script to add the A and PTR records to the Windows DNS servers. I remembered a post by the Scripting Guys, so I took their function and added it to my script. The final step is to change the vSphere host DNS configuration. This one is easy with PowerCLI and a simple loop.

Warning! If you are using vSphere 4.1 and the vSphere hosts are joined to a Windows domain, you are not able to change the DNS settings!

From the Hey Scripting Guy post I quote the following about the new-dnsrecord function:

I’ve written various scripts in the past to work with individual record types, and I’ve found that each class has slightly different syntax and requirements. This makes life awkward when you want to start automating this process, because you have to have a different script or function for each record type. I decided I wanted a universal script for creating records so that I could create multiple records at the same time from minimal information. The following script shows the function that I came up with to create A, PTR, MX, and CNAME records—these being the most common ones I have to deal with. We will be using the MicrosoftDNS_ResourceRecord class with varying inputs.

I have combined the new-dnsrecord function with some PowerCLI code to accomplish my goal of migrating the DNS settings of all the vSphere hosts and to add all the hosts to the DNS servers. I did this task by running the following script:

Continue reading “Reconfigure DNS settings and add vSphere hosts to Windows DNS” →

November 19, 2010 afokkema Uncategorized

Demo: Uniserver IaaS platform (Dutch)

I have been working at Uniserver Internet since October 2009. When I joined, the team was working hard on a solution to offer IaaS on our UniStructure (vSphere/Dell/HP/Juniper) platform. This year the service was brought to market through a partner model, so a customer first has to become a partner to get access to the IaaS environment. The partners are IT service providers from all over the country.

But how does this service work? You can see that in the following demo:

The demo above was given yesterday at the ICT Dag Midden-Nederland: http://www.ictdag2010.nl/

More information about the IaaS service can be found here: http://uniserver.nl/

November 2, 2010 afokkema Linux

Linux: Disk Timeout settings not increased by VMware Tools

Recently I had some issues with Linux VMs whose file systems became read-only. In my earlier post about disk timeout settings I wrote that the timeout value is increased during the VMware Tools installation. But how does the VMware Tools installer change this value? I thought the answer could be found in the vmware-config-tools.pl script. To find the vmware-config-tools.pl script, just run:

[root@linuxvm1 ~]# type vmware-config-tools.pl
vmware-config-tools.pl is /usr/bin/vmware-config-tools.pl

Now run the less command:

less /usr/bin/vmware-config-tools.pl

Press / and type 180, and you will see the info we are looking for:

The disk timeout value can only be changed with Linux kernel 2.6.13 or higher. Ok so what if you use a Linux distribution with a kernel older than 2.6.13? From KB51306:

VMware has identified a problem wherein file systems may become read-only after encountering busy I/O retry or SAN or iSCSI path failover errors.

The same behavior is expected even on a native Linux environment, where the time required for the file system to become read-only depends on the number of paths available to a particular target, the multi-path software installed on the operating system, and whether the failing I/O was to an EXT3 Journal. However, the problem is aggravated in an ESX Server environment because ESX Server manages multiple paths to the storage target and provides a single path to the guest operating system, which effectively reduces the number of retries done by the guest operating system.

These guest operating systems are affected:

RHEL5 (RedHat)

RHEL4 U6

RHEL4 U4

RHEL4 U3

SLES10

SLES9 SP3

Note: This issue may affect other Linux distributions based on early 2.6 kernels as well, such as Ubuntu 7.04.
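To see on which side of the 2.6.13 boundary a guest falls, the running kernel version can be compared field by field. This is a minimal sketch: the function names are hypothetical, and it uses awk rather than sort -V because older distributions may not ship a sort with version sorting.

```shell
#!/bin/sh
# Return success (0) when dotted version $1 is greater than or equal to $2.
version_at_least() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = x[i] + 0; yi = y[i] + 0   # missing fields count as 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# Check the running kernel against 2.6.13, ignoring the distribution
# suffix after the first dash (for example "-194.el5").
kernel_supports_disk_timeout() {
    kver=$(uname -r | cut -d- -f1)
    version_at_least "$kver" "2.6.13"
}
```

A guest for which kernel_supports_disk_timeout fails has a kernel older than 2.6.13, so the disk timeout value cannot be changed there.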

This situation can lead to serious issues and can only be solved with a reboot of the VM. But there is a workaround. From KB1009465:

Increasing the timeout value

The timeout value for a Linux block device can be set using sysfs.
Note: This is usually increased automatically when deploying VMware-Tools, but if it is not installed, you will need increase it manually.

Check the current values using the following command:

for a in /sys/class/scsi_generic/*/device/timeout; do echo -n "$a "; cat "$a" ; done;

Increase the timeout value for an individual disk using the following command. For example to change the values for device sdc, run:

echo 180 > /sys/block/sdc/device/timeout

Run the following command to change the timeout values for all devices to 180:

for i in /sys/class/scsi_generic/*/device/timeout; do echo 180 > "$i"; done

You can add the following command:

for i in /sys/class/scsi_generic/*/device/timeout; do echo 180 > "$i"; done

to the /etc/rc.d/rc.local file to make sure the disk timeout is changed during startup.
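The loop can also be wrapped in a function that reports how many devices were changed, which makes it easier to verify from rc.local or by hand. This is a minimal sketch: the function name set_scsi_timeouts and the optional base directory argument are hypothetical additions, the latter so the function can be tried against a test directory instead of the real sysfs tree.

```shell
#!/bin/sh
# Set the SCSI command timeout for every device below a sysfs-style tree
# and print the number of devices that were changed.
# $1 = timeout in seconds (default 180, the value used in this post)
# $2 = base directory (default /sys/class/scsi_generic)
set_scsi_timeouts() {
    timeout="${1:-180}"
    base="${2:-/sys/class/scsi_generic}"
    changed=0
    for f in "$base"/*/device/timeout; do
        # Skip the unexpanded glob (no devices) and entries we cannot write
        [ -w "$f" ] || continue
        echo "$timeout" > "$f" && changed=$((changed + 1))
    done
    echo "$changed"
}
```

Calling set_scsi_timeouts 180 as root has the same effect as the one-liner above. An output of 0 means no timeout files were found or none were writable.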

Source:
KB1009465	http://kb.vmware.com/
KB51306	http://kb.vmware.com/
VMTN communities	http://communities.vmware.com/thread/257251

October 25, 2010 (updated October 27, 2010) afokkema Veeam

Veeam: Instant Recovery Fails – Unable to Mount filesystem

Before I start with the solution to the error mentioned in the title of this post I want to share some information about Veeam Instant Recovery. I wanted to test the new feature Instant Recovery. So how does Instant Recovery work? This quote comes from the veeam_backup_5_0_user_guide.pdf user guide:

With Veeam Backup & Replication, you can immediately recover a VM from a created backup file. Instant VM recovery accelerates VM restore, allowing you to improve recovery time objectives and decrease downtime of production VMs.

When performing instant recovery, Veeam Backup & Replication creates an independent temporary copy of a VM in your VMware environment and immediately starts it (if necessary). You can then move this copy to your production storage using Storage vMotion and cold migration to finalize recovery, or alternatively, replicate a restored VM with Veeam Backup & Replication and then fail over to the created replica during the next maintenance window. You can also use a recovered VM for testing purposes to make sure VM guest OS and applications are functioning properly.

Similar to the SureBackup recovery verification technology, instant VM recovery does not require you to extract a VM from a backup and move it across datacenter — it mounts a VM directly from a compressed backup file on a selected ESX host. The archived image of a VM remains in a read-only state to avoid unexpected modifications. All changes to a virtual disk that take place while a VM is running are logged to an auxiliary file on the Veeam Backup server or any datastore you select. These changes are discarded as soon as a restored VM is removed.

Let’s start an Instant Recovery restore job:

Continue reading “Veeam: Instant Recovery Fails – Unable to Mount filesystem” →

October 19, 2010 afokkema VMware

vSphere: Storage vMotion of a virtual machine might stop responding at 18%

For some reason a couple of storage vMotions went really slow. So I looked in the logs and found the following lines in the vmware.log:

Oct 15 11:43:37.889: Worker#0| DISKLIB-VMFS :CopyData ‘/vmfs/volumes/xxxxx/vm1/vm1_1-flat.vmdk’ : failed to move data (Cannot allocate memory:0xc0009).
Oct 15 11:43:37.890: Worker#0| DISKLIB-LIB : DiskLib_Clone: Could not perform clone using vmkernel data mover. Falling back to non-accelerated clone. Cannot allocate memory

I started a Google search and found the following thread in the VMware communities: http://communities.vmware.com/message/1545132 Here is a short quote from richardt’s post:

"This problem is caused by the VMFS3-DM (Data Mover) having to use contiguous memory space on the host. Apparently, when host’s kernel memory usage is >50% and memory has been fragmented, the DM cannot allocate more space and throws the errors."

I couldn’t get a fix that day, so I opened an internal case and moved on to another issue. The next day I was reading the release notes of patch ESX400-302009402-SG and found the following quote:

Storage vMotion of a virtual machine might stop responding at 18%, and might be completed after a long time, even though other Storage vMotion operations on the host might continue without any errors. If you try to cancel the Storage vMotion operation when it stops responding, the system disconnects the ESX host from the vCenter Server and automatically connects it after a few minutes.

The vmware.log file might contain the following error:
Could not perform clone using vmkernel data mover. Falling back to non-accelerated clone. Cannot allocate memory

The VMkernel log file might contain the following error:
status Out of memory copying 16777216 bytes from file offset 0 to file offset 0, bytesTransferred = 0 extentsTransferred: 0

This issue occurs because the DataMover cannot allocate contiguous physical space when the host’s kernel memory usage is around 50% and the memory is fragmented. The operation fails back to the Application layer data movement. The operation continues and succeeds, but might take more time when compared to the usual DataMove time. The DataMover requires 16MB of contiguous physical memory on the ESX host for each DataMover thread. This patch provides a fix to make DataMover work with fragmented memory.

Installing patch ESX400-302009402-SG did resolve this issue.
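To check whether other VMs hit the same fallback, the vmware.log files can be searched for the message quoted above. This is a minimal sketch: the function name find_datamover_fallback is hypothetical, and the directory to search is whatever VM folder or datastore path you pass in.

```shell
#!/bin/sh
# List vmware log files below a directory that contain the data mover
# fallback message. Rotated logs such as vmware-1.log are included.
# $1 = directory to search (default: current directory)
find_datamover_fallback() {
    dir="${1:-.}"
    find "$dir" -type f -name 'vmware*.log' \
        -exec grep -l 'Could not perform clone using vmkernel data mover' {} \;
}
```

Every file name printed belongs to a VM whose clone or Storage vMotion fell back to the non-accelerated copy at least once.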

Source: KB1023759

October 17, 2010 (updated October 18, 2010) afokkema VMware

VMware Tools: VMwareService.exe or vmware-guestd High Memory Usage

Ever seen the VMwareService.exe or vmware-guestd process taking all your memory? Well, I saw this issue on a few VMs, like this VM with Windows Server 2003 R2 SP2:

VMwareService.exe took 1.3 GB of RAM. OK, this is a problem, but what started this memory “leak”?

Continue reading “VMware Tools: VMwareService.exe or vmware-guestd High Memory Usage” →

October 8, 2010 (updated June 14, 2011) afokkema VMware

How to manage iSCSI targets with PowerCLI part 1

In part 1 of this series I want to show some basic reporting and how you can add a single target and multiple targets to your vSphere hosts. Let’s start with a simple script to report all the targets on all your vSphere hosts:

$esxHosts = Get-VMHost
foreach($esx in $esxHosts){
    # iSCSI adapter(s) of the current host
    $hba = $esx | Get-VMHostHba -Type iScsi
    Write-Host "=========================================="
    Write-Host "iSCSI Targets on $esx"
    Write-Host "=========================================="
    # List the dynamic discovery (Send) targets, sorted by address
    Get-IScsiHbaTarget -IScsiHba $hba -Type Send | Sort-Object Address
    Write-Host " "
}

The following output will be generated:

Continue reading “How to manage iSCSI targets with PowerCLI part 1” →

October 6, 2010 (updated October 12, 2010) afokkema VMware

PowerCLI: Enable Beacon Probing or Link Status Only on a vSwitch

In some cases you need to troubleshoot your network infrastructure from your vSphere hosts all the way back into your core network. In this case you can enable beacon probing and watch your log files for issues. But before I show how to enable beacon probing, I want to share some information about what beacon probing is and when it is recommended:

Beacon Probing is a network failover detection mechanism that sends out and listens for beacon probes on all NICs in the team and uses this information, in addition to link status, to determine link failure.

This detects failures, such as cable pulls and physical switch power failures, but not configuration errors, such as a physical switch port being blocked by spanning tree or misconfigured to the wrong VLAN or cable pulls on the other side of a physical switch that are not detected by link status alone.
Beacon probing is most useful to detect failures in the closest switch to the ESX hosts, where the failure does not cause a link-down event for the host.

You can use beaconing with 2 NICs, but this only detects failures on the immediate uplink. If you really want to detect upstream failures, use beaconing with 3 or more NICs.
When there are only two NICs in service and one of them loses connectivity it is unclear which NIC needs to be taken out of service as both no longer receive beacons. Using at least 3 NICs in such a team allows for N – 2 failures where N is the number of NICs in the team before getting into an ambiguous situation.

Source: KB1005577

More information about beacon probing can be found here: http://blogs.vmware.com/networking/ and in part 4 of the great vSwitch debate: http://kensvirtualreality.wordpress.com/2009/04/10/the-great-vswitch-debate-part-4/

Continue reading “PowerCLI: Enable Beacon Probing or Link Status Only on a vSwitch” →