Failing disk checks on normal filesystems, read-only filesystems.
* Fsck could not correct all errors, manual repair needed [ !! ]
Give root password for maintenance
(or type Control-D to continue):
Kernel panic - not syncing: Attempted to kill init!
These are some of the errors I have seen during startup of VMs.
In all of these cases the symptoms were the same: Virtual Machines (VMs) crashed and could not be restarted.
Watching the console windows during startup, I saw that they all had problems with their filesystems. The errors occurred on VMs which, the engineers assured me, had enough disk space available.
After taking a closer look at the Storage Repository in which the VMs were created, I found that the engineers had assigned more virtual disk space to the VMs in the Storage Repository than there was physical disk space available. That is possible because the VMs were created using Sparse Allocation.
Sparse Allocation creates a sparse disk, so the size of the disk is initially small and increases as it is used. Creating a virtual machine with Sparse Allocation is faster than with Non-Sparse Allocation. This is a great feature and has some other nice advantages, such as faster migration.
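To make the idea of a sparse disk concrete, here is a minimal Python sketch, assuming a Linux host. It is not how Oracle VM creates its virtual disks; it only shows the general effect on an ordinary file: the apparent size is what the guest would see, while the allocated size is what is actually consumed on the underlying storage. The function names and the file name `demo.img` are made up for this illustration.

```python
import os
import tempfile


def make_sparse_file(path, size_bytes):
    """Create a file with the given apparent size without writing any data."""
    with open(path, "wb") as f:
        f.truncate(size_bytes)


def sparse_file_sizes(path):
    """Return (apparent_bytes, allocated_bytes) for a file."""
    st = os.stat(path)
    # On Linux, st_blocks is counted in 512-byte units.
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        disk = os.path.join(tmp, "demo.img")  # hypothetical file name
        make_sparse_file(disk, 1024 ** 3)     # 1 GiB apparent size
        apparent, allocated = sparse_file_sizes(disk)
        print(f"apparent:  {apparent} bytes")
        print(f"allocated: {allocated} bytes")
```

On a filesystem that supports sparse files, the allocated figure stays far below the apparent one until data is written, which is exactly why the Storage Repository can look emptier than the sum of the disks it holds.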
The caveat, however, is that you can overspend your physical disk space.
As VMs grow, they take up more and more space in your Storage Repository, until you run out of physical disk space! The Operating System (O/S) within the VM, however, still thinks that there is enough space. At that point the O/S within the VM starts getting I/O errors, chokes and finally crashes the VM.
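The arithmetic behind overspending is simple, and a small sketch may help when reviewing a repository. The Python below is only an illustration under stated assumptions: the function name, the dictionary keys and the numbers in the usage example are invented, and the disk sizes would have to come from your own inventory.

```python
def overcommit_report(virtual_disk_sizes_gb, physical_capacity_gb):
    """Compare the virtual disk space assigned to VMs with physical capacity.

    virtual_disk_sizes_gb: maximum (fully grown) size of each virtual disk.
    physical_capacity_gb:  physical size of the Storage Repository.
    """
    if physical_capacity_gb <= 0:
        raise ValueError("physical capacity must be positive")
    assigned = sum(virtual_disk_sizes_gb)
    return {
        "assigned_gb": assigned,
        "ratio": assigned / physical_capacity_gb,
        # True means the disks cannot all grow to full size.
        "overcommitted": assigned > physical_capacity_gb,
    }


if __name__ == "__main__":
    # Invented example: three sparse disks in a 500 GB repository.
    print(overcommit_report([100, 200, 300], 500))
```

A ratio above 1 does not mean trouble today, but it does mean that the repository will fill up before the guests believe their disks are full.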
To solve the problems, start by creating some free space in the Storage Repository, either by extending the iSCSI LUN or NFS share, or by simply moving a VM to another Storage Repository. Once the free space has been created, some old-fashioned Linux filesystem repair actions follow.
I found several different scenarios after startup of the VMs, and not all of them are recoverable!
- At startup, the O/S forced checks and corrections of the filesystems, was able to repair all inconsistencies automatically, and the system booted correctly.
Problem solved
- At startup, the O/S forced checks and corrections of the filesystems, was unable to correct all errors and forced a manual repair.
After running fsck on the filesystem(s) in maintenance mode, the system booted correctly.
Problem solved
- At startup, the O/S failed with a Kernel Panic. Ouch, this can be very nasty!
This usually means that something is wrong on a part of either the boot or root filesystem that the kernel needs.
Change the VM to start from DVD/ISO* and boot into Linux Rescue mode. In rescue mode I was able to repair a filesystem only once. In the other cases the filesystems had sustained too much damage and the VMs needed to be re-created/re-installed.
In one case the customer was very lucky. This VM had two virtual disks, one for the O/S and one for the Oracle software and database. The O/S disk was unrecoverable, but the other disk had no problems. The VM had only recently been migrated from another OVM Server. The ‘old’ VM was still available and very few changes had been made to the O/S. I was able to migrate the O/S disk from the old OVM Server and start the VM again.
Just to see what happens when you proceed with filesystem repair on a filesystem with too much damage, I ran an fsck -y on this device. This ended up with a very large lost+found directory filled with #xxxxxx files and directories. Not quite the filesystem you can boot from. :-)
I did not write this article to show you how to repair disk corruption.
There is enough information around to help you with that.
However, you can prevent this from happening!
Don’t overspend disk space in your Oracle VM environment!
(Or at least put some strict monitoring on physical disk space usage)
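As a starting point for such monitoring, a minimal Python sketch could look like the one below. It assumes the Storage Repository is mounted as a normal filesystem on the host where the script runs; the mount point shown is a placeholder, and the 15% threshold is an arbitrary example value, not a recommendation from Oracle.

```python
import shutil


def check_free_space(path, min_free_fraction=0.15):
    """Return (ok, free_fraction) for the filesystem that contains path.

    ok is False when the free fraction drops below min_free_fraction.
    """
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        raise ValueError(f"filesystem at {path!r} reports zero capacity")
    free_fraction = usage.free / usage.total
    return free_fraction >= min_free_fraction, free_fraction


if __name__ == "__main__":
    repo_mount = "/path/to/storage/repository"  # placeholder, adjust to your setup
    ok, free = check_free_space(repo_mount)
    status = "OK" if ok else "WARNING"
    print(f"{status}: {free:.1%} free on {repo_mount}")
```

Run from cron or hooked into an existing monitoring tool, a check like this gives a warning while there is still time to extend the storage or move a VM.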
* Since it is not possible to boot a PVM from DVD/ISO, change the VM to HVM.
See MOS Doc ID 884085.1: Oracle VM: How to configure a guest Virtual Machine to boot from CDROM/DVDROM