Linux: Disk Timeout settings not increased by VMware Tools

Recently I had some issues with Linux VMs that became read-only. In my earlier post about disk timeout settings I wrote that the timeout value is increased during the VMware Tools installation. But how does the VMware Tools installer change this value? I thought the answer could be found in the vmware-config-tools.pl script. To find the script, just run:

[root@linuxvm1 ~]# type vmware-config-tools.pl
vmware-config-tools.pl is /usr/bin/vmware-config-tools.pl

Now run the less command:

less /usr/bin/vmware-config-tools.pl

Press / and type 180, and you will see the info we are looking for:

image

The disk timeout value can only be changed with Linux kernel 2.6.13 or higher. Ok so what if you use a Linux distribution with a kernel older than 2.6.13? From KB51306:

VMware has identified a problem wherein file systems may become read-only after encountering busy I/O retry or SAN or iSCSI path failover errors.

The same behavior is expected even on a native Linux environment, where the time required for the file system to become read-only depends on the number of paths available to a particular target, the multi-path software installed on the operating system, and whether the failing I/O was to an EXT3 Journal. However, the problem is aggravated in an ESX Server environment because ESX Server manages multiple paths to the storage target and provides a single path to the guest operating system, which effectively reduces the number of retries done by the guest operating system.

These guest operating systems are affected:

  • RHEL5 (RedHat)
  • RHEL4 U6
  • RHEL4 U4
  • RHEL4 U3
  • SLES10
  • SLES9 SP3 
    Note: This issue may affect other Linux distributions based on early 2.6 kernels as well, such as Ubuntu 7.04.

This situation can lead to serious issues and can only be solved with a reboot of the VM. But there is a workaround. From KB1009465:

Increasing the timeout value

The timeout value for a Linux block device can be set using sysfs.
Note: This is usually increased automatically when deploying VMware-Tools, but if it is not installed, you will need increase it manually.

Check the current values using the following command:

for a in /sys/class/scsi_generic/*/device/timeout; do echo -n "$a "; cat "$a" ; done;

Increase the timeout value for an individual disk using the following command. For example to change the values for device sdc, run:

echo 180 > /sys/block/sdc/device/timeout

Run the following command to change the timeout values for all devices to 180:

for i in /sys/class/scsi_generic/*/device/timeout; do echo 180 > "$i"; done

You can add the following command:

for i in /sys/class/scsi_generic/*/device/timeout; do echo 180 > "$i"; done

to the /etc/rc.d/rc.local file to make sure the disk timeout is changed during startup.
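The loop above can also be wrapped in a small function that reports what it changed. This is only a sketch: the name set_disk_timeouts and the optional second argument (an alternative sysfs root, useful for a dry run against a scratch directory) are my own assumptions and not part of VMware Tools or the KB article.

```shell
#!/bin/sh
# Sketch: set the SCSI timeout for every device below a sysfs root.
# set_disk_timeouts is an illustrative name, not a VMware or kernel tool.
set_disk_timeouts() {
    timeout="$1"
    sysfs_root="${2:-/sys}"    # second argument exists only to allow a dry run
    for f in "$sysfs_root"/class/scsi_generic/*/device/timeout; do
        [ -w "$f" ] || continue    # skips an unexpanded glob and unwritable files
        echo "$timeout" > "$f"
        echo "$f $(cat "$f")"      # print the path and the value now in effect
    done
}

# On a real system, as root:
#   set_disk_timeouts 180
```

Calling the function from /etc/rc.d/rc.local has the same effect as the one-liner, with the added benefit that the new values show up in the boot output.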

Source:  
KB1009465 http://kb.vmware.com/
KB51306 http://kb.vmware.com/
VMTN communities http://communities.vmware.com/thread/257251

Veeam: Instant Recovery Fails – Unable to Mount filesystem

Before I start with the solution to the error mentioned in the title of this post I want to share some information about Veeam Instant Recovery. I wanted to test the new feature Instant Recovery. So how does Instant Recovery work? This quote comes from the veeam_backup_5_0_user_guide.pdf user guide:

With Veeam Backup & Replication, you can immediately recover a VM from a created backup file. Instant VM recovery accelerates VM restore, allowing you to improve recovery time objectives and decrease downtime of production VMs.

When performing instant recovery, Veeam Backup & Replication creates an independent temporary copy of a VM in your VMware environment and immediately starts it (if necessary). You can then move this copy to your production storage using Storage vMotion and cold migration to finalize recovery, or alternatively, replicate a restored VM with Veeam Backup & Replication and then fail over to the created replica during the next maintenance window. You can also use a recovered VM for testing purposes to make sure VM guest OS and applications are functioning properly.

Similar to the SureBackup recovery verification technology, instant VM recovery does not require you to extract a VM from a backup and move it across datacenter — it mounts a VM directly from a compressed backup file on a selected ESX host. The archived image of a VM remains in a read-only state to avoid unexpected modifications. All changes to a virtual disk that take place while a VM is running are logged to an auxiliary file on the Veeam Backup server or any datastore you select. These changes are discarded as soon as a restored VM is removed.

Let’s start an Instant Recovery restore job:

image 


vSphere: Storage vMotion of a virtual machine might stop responding at 18%

For some reason a couple of storage vMotions went really slow. So I looked in the logs and found the following lines in the vmware.log:

Oct 15 11:43:37.889: Worker#0| DISKLIB-VMFS :CopyData ‘/vmfs/volumes/xxxxx/vm1/vm1_1-flat.vmdk’ : failed to move data (Cannot allocate memory:0xc0009).
Oct 15 11:43:37.890: Worker#0| DISKLIB-LIB : DiskLib_Clone: Could not perform clone using vmkernel data mover. Falling back to non-accelerated clone. Cannot allocate memory

I started a Google search and found the following thread in the communities: http://communities.vmware.com/message/1545132. A short quote from richardt’s post:

"This problem is caused by the VMFS3-DM (Data Mover) having to use contiguous memory space on the host. Apparently, when host’s kernel memory usage is >50% and memory has been fragmented, the DM cannot allocate more space and throws the errors."

I couldn’t find a fix that day, so I opened an internal case and moved on to another case. The next day I was reading the release notes of patch ESX400-302009402-SG and found the following quote:

Storage vMotion of a virtual machine might stop responding at 18%, and might be completed after a long time, even though other Storage vMotion operations on the host might continue without any errors. If you try to cancel the Storage vMotion operation when it stops responding, the system disconnects the ESX host from the vCenter Server and automatically connects it after a few minutes.

The vmware.log file might contain the following error:
Could not perform clone using vmkernel data mover. Falling back to non-accelerated clone. Cannot allocate memory

The VMkernel log file might contain the following error:
status Out of memory copying 16777216 bytes from file offset 0 to file offset 0, bytesTransferred = 0 extentsTransferred: 0

This issue occurs because the DataMover cannot allocate contiguous physical space when the host’s kernel memory usage is around 50% and the memory is fragmented. The operation fails back to the Application layer data movement. The operation continues and succeeds, but might take more time when compared to the usual DataMove time. The DataMover requires 16MB of contiguous physical memory on the ESX host for each DataMover thread. This patch provides a fix to make DataMover work with fragmented memory.

Installing patch ESX400-302009402-SG did resolve this issue.
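To check whether a VM has hit this fallback, you can count the matching lines in its vmware.log. The sketch below assumes the message text is exactly as quoted above; the function name count_datamover_fallbacks is made up for this example.

```shell
#!/bin/sh
# Sketch: count how often the data mover fell back to a non-accelerated clone.
# The search string is copied from the vmware.log lines quoted in this post.
count_datamover_fallbacks() {
    # grep -c prints the number of matching lines (0 when nothing matches)
    grep -c "Could not perform clone using vmkernel data mover" "$1"
}

# Example (path is hypothetical):
#   count_datamover_fallbacks /vmfs/volumes/xxxxx/vm1/vmware.log
```

A result greater than 0 means at least one clone or Storage vMotion on that VM used the slower application-layer copy.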

 

Source: KB1023759

VMware Tools: VMwareService.exe or vmware-guestd High Memory Usage

Ever seen the VMwareService.exe or vmware-guestd process taking all your memory? Well, I saw this issue on a few VMs. Like this VM with Windows Server 2003 SP2 R2:

image

The VMwareService.exe process took 1.3 GB of RAM. OK, this is a problem, but what started this memory “leak”?


How to manage iSCSI targets with PowerCLI part 1

In part 1 of this series I want to show some basic reporting and how you can add a single target and multiple targets to your vSphere hosts. Let’s start with a simple script to report all the targets on all your vSphere hosts:

$esxHosts = Get-VMhost
foreach($esx in $esxhosts){
    $hba = $esx | Get-VMHostHba -Type iScsi
    Write-Host "=========================================="
    Write-Host "iSCSI Targets on $esx"
    Write-Host "=========================================="
    Get-IScsiHbaTarget -IScsiHba $hba -Type Send | Sort Address
    Write-Host " "
}

The following output will be generated:

image


PowerCLI: Enable Beacon Probing or Link Status Only on a vSwitch

In some cases you need to troubleshoot your network infrastructure from your vSphere hosts all the way back into your core network. In this case you can enable beacon probing and watch your log files for issues. But before I show how to enable beacon probing, I want to share some information about what beacon probing is and when it is recommended:

Beacon Probing is a network failover detection mechanism that sends out and listens for beacon probes on all NICs in the team and uses this information, in addition to link status, to determine link failure.

This detects failures, such as cable pulls and physical switch power failures, but not configuration errors, such as a physical switch port being blocked by spanning tree or misconfigured to the wrong VLAN or cable pulls on the other side of a physical switch that are not detected by link status alone.
Beacon probing is most useful to detect failures in the closest switch to the ESX hosts, where the failure does not cause a link-down event for the host.

You can use beaconing with 2 NICs, but this only detects failures on the immediate uplink. If you really want to detect upstream failures, use beaconing with 3 or more NICs.
When there are only two NICs in service and one of them loses connectivity it is unclear which NIC needs to be taken out of service as both no longer receive beacons. Using at least 3 NICs in such a team allows for N – 2 failures where N is the number of NICs in the team before getting into an ambiguous situation.

Source: KB1005577

More information about beacon probing can be found here: http://blogs.vmware.com/networking/ and in part 4 of the great vSwitch debate: http://kensvirtualreality.wordpress.com/2009/04/10/the-great-vswitch-debate-part-4/


vSphere: Set NFS Advanced Configuration Settings via esxcfg-advcfg

Yesterday I created a post about changing the advanced configuration settings for NFS via PowerCLI. Today I will show you how you can change the advanced configuration settings with the use of esxcfg-advcfg. This is quite useful for kickstart installations.

This is a snippet from my ks.cfg file:

# Set NFS advanced Configuration Settings
/usr/sbin/esxcfg-advcfg -s 30 /Net/TcpipHeapSize
/usr/sbin/esxcfg-advcfg -s 120 /Net/TcpipHeapMax
/usr/sbin/esxcfg-advcfg -s 10 /NFS/HeartbeatMaxFailures
/usr/sbin/esxcfg-advcfg -s 12 /NFS/HeartbeatFrequency
/usr/sbin/esxcfg-advcfg -s 5 /NFS/HeartbeatTimeout
/usr/sbin/esxcfg-advcfg -s 32 /NFS/MaxVolumes

So how do you know what values you need to enter when you want to use this command? Bouke has an HTML version of the esxcfg manuals on his blog: http://www.jume.nl/esx4man/man8/esxcfg-advcfg.8.html. But this page doesn’t show the information I needed. Instead, open the Advanced Settings screen in the vSphere client.

image

Open the NFS settings. Let’s use NFS.MaxVolumes as an example. NFS is the ‘root’ folder and the setting, in this case MaxVolumes, is the child. So to change this setting via /usr/sbin/esxcfg-advcfg you need to use the path /NFS/MaxVolumes. If you want to know what the current value is, just run the following command from the service console:

/usr/sbin/esxcfg-advcfg -g /NFS/MaxVolumes

This will be the output:

image

When you change the value to 32 via this command:

/usr/sbin/esxcfg-advcfg -s 32 /NFS/MaxVolumes

This will be the output:

image
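The mapping from the dotted name in the vSphere client to the path that esxcfg-advcfg expects is mechanical, so it can be scripted. The helper below is a sketch; the function name advcfg_path is my own invention and not part of the ESX tooling.

```shell
#!/bin/sh
# Sketch: turn a vSphere client setting name into an esxcfg-advcfg path.
# advcfg_path is an illustrative helper, not an ESX command.
advcfg_path() {
    # NFS.MaxVolumes -> /NFS/MaxVolumes
    printf '/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# On the service console:
#   /usr/sbin/esxcfg-advcfg -g "$(advcfg_path NFS.MaxVolumes)"
#   /usr/sbin/esxcfg-advcfg -s 32 "$(advcfg_path NFS.MaxVolumes)"
```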

vSphere ISO containing Intel 82575 and 82576 Gigabit Ethernet Adapter drivers

This post is a complete re-post from Eric Sarakaitis’s blog: http://www.vmwareadmins.com

 

So, after spending three days on this, I was finally able to get the  Intel 82575 and 82576 Gigabit Ethernet Adapter driver slipstreamed onto the installation media.

I first followed this: http://patrickvanbeek.wordpress.com/2010/01/30/slipstreaming-drivers-in-the-esx4i-install-iso/ to get the post install drivers working.

But I still had the problem of not being able to see the NICs during the install.

To do that, I had to explode the ESX 4.0u1 ISO and grab the initrd.img file from the isolinux folder.

To modify the img file I needed a Linux guest, so I fired up an Ubuntu image on Lab Manager and SCP’d the img file there.

To extract the IMG file do:

mkdir ~/tmp
cd ~/tmp
cp /boot/initrd.img ./initrd.gz
gunzip initrd.gz
mkdir tmp2
cd tmp2
# gunzip leaves a file named initrd (without the .gz suffix)
cpio -id < ../initrd

Now you should have a lot of files in the ~/tmp/tmp2 directory, including subdirectories such as sbin and lib.

Now you need to extract igb.xml and igb.o from the VMware RPM (http://www.vmware.com/support/vsphere4/doc/drivercd/esx40-net-igb_400.1.3.19.12-1.0.4.html).

I then moved these files to their respective locations within the exploded initrd.

The igb.xml went into

/usr/share/hwdata/pciids/

The igb.o went into

/usr/lib/vmware/vmkmod/

Then pack the files back into the archive using the following commands:

cd ~/tmp/tmp2
find . | cpio --create --format='newc' > ~/tmp/newinitrd
cd ~/tmp
gzip newinitrd

Now you have a newinitrd.gz. Rename it:

mv newinitrd.gz newinitrd.img

This is the new boot image.

I then re-created the ISO. Oddly enough, it worked on the first try :)

and the link to the ISO… http://www.vmwareadmins.com

vSphere: Hot Add or Remove a VMDK with a Linux VM

In this post I will show you how to hot add a new VMDK to a Linux VM. I will also post how to remove a VMDK if necessary.

 

Hot Add a new VMDK

Add the new VMDK:

image

After you have added the new VMDK, log in to the VM and run fdisk -l:

[root@nagios ~]#  fdisk -l

Disk /dev/sda: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14        2610    20860402+  8e  Linux LVM

The new disk isn’t available yet so we have to do a SCSI bus rescan. You can run the following command to do a rescan:

echo "- - -" > /sys/class/scsi_host/host0/scan
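If the VM has more than one SCSI controller, the new disk may sit behind a host other than host0. A sketch that rescans every host is shown here; the three dashes are wildcards for channel, target and LUN. The function name and the optional sysfs root argument (for a dry run) are assumptions of this example.

```shell
#!/bin/sh
# Sketch: trigger a rescan on every SCSI host instead of only host0.
# rescan_scsi_hosts is an illustrative name; the optional argument lets you
# point it at a scratch directory instead of the real /sys.
rescan_scsi_hosts() {
    sysfs_root="${1:-/sys}"
    for scan in "$sysfs_root"/class/scsi_host/host*/scan; do
        [ -e "$scan" ] || continue    # no hosts found: the glob did not expand
        echo "- - -" > "$scan"        # wildcard channel, target and LUN
    done
}

# On a real system, as root:
#   rescan_scsi_hosts
```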

When you run the fdisk -l command after the rescan, you will see the new disk.

[root@nagios ~]# fdisk -l

Disk /dev/sda: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14        2610    20860402+  8e  Linux LVM

Disk /dev/sdb: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdb doesn’t contain a valid partition table

The new disk doesn’t contain a valid partition table. This can be fixed by running fdisk /dev/sdb and entering the following keys, one per prompt:

n p 1 1 {enter} x b 1 128 w

The keys x b 1 128 switch to expert mode and move the start of partition 1 to sector 128, which aligns the new partition; w writes the table and exits. For more info about alignment, see Bob Plankers’ post here: http://lonesysadmin.net/2010/03/30/i-will-keep-saying-it-align-your-partitions/
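The arithmetic behind the value 128 can be checked quickly. This sketch assumes 512-byte sectors, as shown in the fdisk output, and that an unaligned partition would start at sector 63 (one track, per the "63 sectors/track" geometry); the helper names are made up for the example.

```shell
#!/bin/sh
# Sketch: byte offset of a start sector and a simple alignment check,
# assuming 512-byte sectors.
offset_bytes() { echo $(( $1 * 512 )); }

# is_aligned SECTOR BOUNDARY_BYTES -> exit status 0 when aligned
is_aligned() { [ $(( $1 * 512 % $2 )) -eq 0 ]; }

offset_bytes 128    # 128 * 512 = 65536 bytes, i.e. 64 KiB
is_aligned 128 65536 && echo "sector 128 is aligned on a 64 KiB boundary"
is_aligned 63 4096   || echo "sector 63 is not even aligned on 4 KiB"
```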

Now we have a valid partition table but no file system. Run the mkfs.ext3 /dev/sdb1 command to create one:

[root@nagios ~]# mkfs.ext3 /dev/sdb1
mke2fs 1.39 (29-May-2006)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
1310720 inodes, 2620595 blocks
131029 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=2684354560
80 block groups
32768 blocks per group, 32768 fragments per group
16384 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632

Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 39 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

Run the fdisk -l command to verify the new configuration:

[root@nagios ~]# fdisk -l

Disk /dev/sda: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14        2610    20860402+  8e  Linux LVM

Disk /dev/sdb: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1        1305    10482381   83  Linux

If you want to mount the new disk automatically, you have to create a new folder and add an entry to the /etc/fstab file.

mkdir /disk2
vi /etc/fstab    # or nano /etc/fstab

Add the following line (ext3, to match the file system created with mkfs.ext3 above):
/dev/sdb1               /disk2                  ext3    defaults        1 2

Now you are ready to mount the new disk.

mount /dev/sdb1 /disk2/
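If you script this step, it helps to make the fstab edit safe to run twice. The function below is a sketch under my own naming (add_fstab_entry); the optional fourth argument only exists so the function can be tried on a copy of the file first.

```shell
#!/bin/sh
# Sketch: append an fstab entry only if the device is not listed yet.
# add_fstab_entry is an illustrative helper, not a standard tool.
add_fstab_entry() {
    device="$1"; mountpoint="$2"; fstype="$3"
    fstab="${4:-/etc/fstab}"
    if grep -q "^${device}[[:space:]]" "$fstab"; then
        echo "$device is already listed in $fstab" >&2
        return 0
    fi
    printf '%s\t%s\t%s\tdefaults\t1 2\n' "$device" "$mountpoint" "$fstype" >> "$fstab"
}

# Example, as root (ext3 because the file system was created with mkfs.ext3):
#   add_fstab_entry /dev/sdb1 /disk2 ext3
```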

These are all the steps.

Hot Remove a VMDK

If you want to remove an extra VMDK from a Linux VM, you need to follow these steps.

First you need to unmount the /dev/sdb1:

umount /dev/sdb1

Remove the /disk2 folder:

rmdir /disk2/

Remove the entry from the /etc/fstab:

vi /etc/fstab    # or nano /etc/fstab

Remove the following line:
/dev/sdb1               /disk2                  ext3    defaults        1 2

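Delete the device. Use the disk name (sdb), not the partition name (sdb1), because only whole disks appear under /sys/block:

echo 1 > /sys/block/sdb/device/delete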
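Because the sysfs delete file belongs to the whole disk, a script that starts from a partition name has to strip the partition number first. The sketch below only handles the simple sdX naming used in this post; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: derive the sysfs delete path from a disk or partition name.
# Only covers sdX-style names (sdb, sdb1); other naming schemes are not handled.
parent_disk() {
    # strip a trailing partition number: sdb1 -> sdb
    echo "$1" | sed 's/[0-9]*$//'
}

delete_path() {
    echo "/sys/block/$(parent_disk "$1")/device/delete"
}

delete_path sdb1    # prints /sys/block/sdb/device/delete
# To actually remove the device, as root and after unmounting it:
#   echo 1 > "$(delete_path sdb1)"
```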

Remove the VMDK:

image

vSphere: Unattended ESX4 installation Tips & Tricks

In this post I will share some tips / tricks and scripts, which I used to create an unattended ESX4 installation.

 

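One of the important lessons I have learned while creating a ks.cfg file for vSphere is how to use proper escaping.

The script is written to disk through a here-document (cat <<EOF1), so every unescaped $ would be expanded while the file is written instead of when the script runs. For each $ in your script, use a \ to escape it properly. See the example below:

VMHBA=\$(/usr/sbin/esxcfg-scsidevs -a |grep "Software iSCSI" |awk '{print \$1}')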
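The effect of the backslash is easy to see in a small experiment outside of kickstart. This sketch writes three lines through an unquoted here-document into a temporary file; the variable NAME and the value vmhba33 are placeholders chosen for the demonstration.

```shell
#!/bin/sh
# Sketch: what an unquoted here-document does with escaped and unescaped $.
out=$(mktemp)
NAME="expanded while writing"

cat > "$out" <<EOF1
echo "unescaped: $NAME"
echo "escaped: \$NAME"
VMHBA=\$(echo vmhba33)
EOF1

# Show the generated script: only the unescaped $NAME has been replaced,
# the escaped forms are written as plain $NAME and $(...) for later use.
cat "$out"
```

In the generated file, the first line already contains the text "expanded while writing", while the other two still contain $NAME and $(echo vmhba33) and are evaluated only when the generated script itself runs.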

This form of escaping was necessary to get my script working. My script started with the following lines:

%post

cat > /root/esx01.sh <<EOF1
#!/bin/sh

and these are the last lines of the script:

##########################
# Finish
##########################
echo "Making sure the script runs only once"

EOF1

### Make esx01.sh executable
chmod +x /root/esx01.sh

###Backup original rc.local file
cp /etc/rc.d/rc.local /etc/rc.d/rc.local.bak

###Make esx01.sh run from rc.local and make rc.local reset itself
cat >> /etc/rc.d/rc.local <<EOF
cd /tmp
/root/esx01.sh
mv -f /etc/rc.d/rc.local.bak /etc/rc.d/rc.local
shutdown -r now
EOF

In the rest of this post, I will show you some tips about configuring Syslog and iSCSI, creating users, changing the Service Console memory, installing the Dell Open Manage agent, and putting the host into maintenance mode.

But before I start with the tips mentioned above, I want to share a little trick I learned from a comment by David on an excellent blog post by Robert Patton. Instead of using a long sleep at the beginning of your script, you can use the following tip:

hostd-vmdb

Before you start the post script, you have to wait until the hostd-vmdb service is ready. This is necessary if you want to use the /usr/bin/vmware-vim-cmd command. With the following while loop, you can check the status of the hostd-vmdb service. When the service is ready, the script continues to configure your ESX server.

####################################################
#Wait until host service is ready
####################################################
while ! vmware-vim-cmd /hostsvc/runtimeinfo; do
    sleep 20
done
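The loop above waits forever if the service never comes up. A variant with an upper bound on the number of attempts could look like the sketch below; wait_for and its arguments are my own naming, and inside the ks.cfg here-document every $ in it would need the \ escaping described earlier.

```shell
#!/bin/sh
# Sketch: retry a command a limited number of times.
# Usage: wait_for MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
wait_for() {
    max_tries="$1"; delay="$2"; shift 2
    tries=0
    until "$@" >/dev/null 2>&1; do
        tries=$((tries + 1))
        [ "$tries" -ge "$max_tries" ] && return 1    # give up after MAX_TRIES attempts
        sleep "$delay"
    done
    return 0
}

# Example: try for at most 30 * 20 seconds
#   wait_for 30 20 vmware-vim-cmd /hostsvc/runtimeinfo || echo "hostd is not ready"
```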

 

I configured the Syslog settings at the beginning of my script, so I can monitor all the steps via the Syslog service:

Syslog

This is just an easy one. The only thing you have to do is echo the following lines:

####################################################
# Configure Syslog
####################################################
echo "# remote syslog server Splunk" >> /etc/syslog.conf
echo "*.* @192.168.123.219" >> /etc/syslog.conf
service syslog restart

The next tip is about the configuration of iSCSI.

Configure iSCSI

The following script part adds a port group called iSCSI to vSwitch1, assigns it VLAN 36, and sets the IP settings.

####################################################
# Add Storage Networking
####################################################
/usr/sbin/esxcfg-vswitch --add-pg="iSCSI" vSwitch1
/usr/sbin/esxcfg-vswitch --pg="iSCSI" -v 36 vSwitch1
/usr/sbin/esxcfg-vmknic -a -i 172.1.1.202 -n 255.255.255.0 "iSCSI"

/usr/sbin/esxcfg-route 192.168.123.254

# Refresh network settings
/usr/bin/vmware-vim-cmd internalsvc/refresh_network

The next step is to enable the iSCSI initiator and add a rule to the firewall. After the 10-second sleep, the correct VMHBA is selected for the rest of the steps. The VMHBA is saved in a variable, which is used to set the CHAP password, add the iSCSI Send Targets, and perform a VMHBA rescan.

####################################################
# Configure iSCSI
####################################################
/usr/bin/vmware-vim-cmd hostsvc/firewall_enable_ruleset swISCSIClient
/usr/bin/vmware-vim-cmd hostsvc/storage/software_iscsi_enabled true

sleep 10

VMHBA=\$(/usr/sbin/esxcfg-scsidevs -a |grep "Software iSCSI" |awk '{print \$1}')

# Set CHAP password
/usr/bin/vmware-vim-cmd hostsvc/storage/iscsi_enable_chap \$VMHBA iscsi_cluster_01 <chap_password>

# Add iSCSI Send Targets
/usr/bin/vmware-vim-cmd hostsvc/storage/iscsi_add_send_target \$VMHBA 172.1.1.10
/usr/bin/vmware-vim-cmd hostsvc/storage/iscsi_add_send_target \$VMHBA 172.1.1.11

sleep 15

/usr/sbin/esxcfg-rescan \$VMHBA

The rest of the vSwitches / Portgroups are left out of this post.

 

Add Users

If you want to add users with encrypted passwords, you can use the openssl passwd -1 command on an existing ESX Server to generate an MD5-hashed password.

image

This little trick can be used to generate the root password for ESX and to generate passwords for other users.

You can use the following line to set the root password during the installation:

# root Password
rootpw --iscrypted $1$EpQvSrYkznF6yCLKPQqZPUYr6z

and if you want to add more users to the Service console, you can use the following lines:

####################################################
# Add users
####################################################
/usr/sbin/useradd -p '\$1\$L4fGhr0F\$ImLwX47v3xZkAH4HrmBjr0' -c "Arne Fokkema" afokkema

Instead of generating passwords, you can also use the string from the /etc/shadow file. You can open the file with cat and copy the string:

image

 

Change the vSwitch portnumber value to 120

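To change the number of ports on the vSwitch to 120, you can use the following command. The command sets 128 ports; 8 of them are reserved by the host, so 120 remain available: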

####################################################
# Change the vSwitch portnumber to 120
####################################################
/usr/bin/vmware-vim-cmd  hostsvc/net/vswitch_setnumports vSwitch0 128

This will change the default setting to 120:

image

 

Change the Service Console Memory to 800MB

To change the Service Console memory to 800MB, you can use the following commands. These settings are applied after a reboot.

####################################################
# Configure Service Console Memory to 800MB
####################################################
/usr/bin/vmware-vim-cmd /hostsvc/memoryinfo 838860800
/usr/sbin/esxcfg-boot -b
/usr/sbin/esxcfg-boot -t
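The value passed to memoryinfo is the memory size in bytes. As a sketch, a tiny helper (mb_to_bytes, a name I made up) shows where 838860800 comes from:

```shell
#!/bin/sh
# Sketch: convert megabytes to bytes (1 MB = 1024 * 1024 bytes here).
mb_to_bytes() { echo $(( $1 * 1024 * 1024 )); }

mb_to_bytes 800    # prints 838860800
```

For another Service Console size, pass the result for that number of megabytes to the vmware-vim-cmd call instead.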

This is what it looks like in the vSphere client:

 image

Dell Open Manage Agent

The script below is based on a script by Scot Hanson (aka @DellServerGeek), which you can find here.

This script downloads the OM agent from an internal web server and opens the firewall for the Open Manage agent.

####################################################
# Dell OM Agent        
####################################################

mkdir -p /root/OM

#Download OM.tar.gz
esxcfg-firewall --allowOutgoing
lwp-download http://webserver/OM/OM.tar.gz /root/OM/.
esxcfg-firewall --blockOutgoing

cd /root/OM
tar -zxf OM.tar.gz
chmod a+x *.*

./linux/supportscripts/srvadmin-install.sh -x
#./linux/supportscripts/srvadmin-services.sh start

/usr/sbin/esxcfg-firewall -o 1311,tcp,in,OpenManageRequest

Enable vMotion

To enable vMotion, we use another variable to capture the right VMkernel port group:

####################################################
# Enable vMotion on the vMotion PG
####################################################

service mgmt-vmware restart
sleep 1m

VMK=\$(esxcfg-vmknic -l |grep vMotion |awk '{print \$1}')
/usr/bin/vmware-vim-cmd hostsvc/vmotion/vnic_set \$VMK

# Refresh network settings
/usr/bin/vmware-vim-cmd internalsvc/refresh_network

Enter Maintenance mode

When the installation is ready, the ESX host will enter maintenance mode before it restarts to finalize the installation.

####################################################
# Enter Maintenance mode
####################################################
/usr/bin/vmware-vim-cmd /hostsvc/maintenance_mode_enter

 

It can cost you a lot of time to create a ks.cfg that matches your vSphere environment. But when it’s ready, it will save you a lot of time when deploying new hosts or redeploying existing ones.

If you have any additional scripts or tips please leave a comment or contact me on twitter: @afokkema

 

