Technological Wanderings - md http://www.technologicalwanderings.co.uk/taxonomy/term/14 en Linux soft RAID hanging on boot at Mounting Root http://www.technologicalwanderings.co.uk/node/8 <div class="field field-name-taxonomy-vocabulary-1 field-type-taxonomy-term-reference field-label-above"><div class="field-label">Keywords:&nbsp;</div><div class="field-items"><div class="field-item even"><a href="/taxonomy/term/13">raid</a></div><div class="field-item odd"><a href="/taxonomy/term/14">md</a></div><div class="field-item even"><a href="/taxonomy/term/15">mdadm</a></div></div></div><div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even"><p>I have a Linux (Gentoo) server which has been somewhat unreliable and suffers from frequent lockups[1]. Today, it started to hang at boot on "Mounting Root Filesystem".</p> <p>I booted a recovery CD and took a look at the RAID arrays, all of which use Linux's MD software RAID1. They all assembled fine, and the ext3 and reiser3 filesystems on them mounted without trouble. So I started to look in more detail.</p> <p>Querying one of the components of the root RAID, I found:</p> <p># mdadm -Q /dev/hda2<br /> /dev/hda2: is not an md array<br /> /dev/hda2: device 0 in 2 device mismatch raid1 /dev/md3. Use mdadm --examine for more detail.</p> <p>"Mismatch!" All the others show "active" or "inactive". Looking closer, I noticed "md3": my root is md1, and /boot is md3!<br /> What is happening is that each RAID component records in its superblock which md device node (minor number) it was last assembled as. When booting, Linux looks for /dev/md1 to mount the root. Knowing this to be an MD RAID device, it examines the component devices and starts those whose superblocks match.</p> <p>In this case, I've probably made a mistake during a previous recovery and assembled the root array as md3, which the superblock has remembered. 
So on bootup, I have two arrays claiming to be /dev/md3 and none claiming to be the root device, which is set as /dev/md1 in the LILO boot loader.</p> <p>To fix this, you need to update the superblock. This is done while assembling the array, so do it from a fresh boot off your recovery disk, before the array has been started.</p> <p>This is what I did:</p> <p># mdadm --assemble /dev/md1 --update=super-minor /dev/hda2 /dev/hdd2</p> <p>Once done, a query shows:<br /> # mdadm -Q /dev/hda2<br /> /dev/hda2: is not an md array<br /> /dev/hda2: device 0 in 2 device active raid1 /dev/md1. Use mdadm --examine for more detail.</p> <p>Rebooting, the root is mounted instantly and everything works. Huzzah!</p> <p>[1] Once every couple of days, and almost certainly temperature related, as the environment has been getting very hot and humid at the same time. It has a hardware-based watchdog which brings it back up - I do like real server hardware. I pulled the heatsinks off the CPUs and noticed a lot of thermal transfer compound (which would be my fault). I've wiped them down, left just a very thin film, and will see how well it works now.</p> <p>========</p> <p>Update:<br /> I noticed that the machine is running the disks in mdma2 mode, rather than udma5.<br /> So I played with the kernel options (2.6.22-r9) to try to fix that, and on rebooting got the same problem again. Going back to kernel 2.6.21-r5 solved both the mounting root and UDMA issues. So I suspect the real reason behind all this is a broken kernel revision, at least with the Broadcom CSB5 (Intel SDS2 board).</p> </div></div></div> Sun, 25 Nov 2007 17:26:41 +0000 techuser 8 at http://www.technologicalwanderings.co.uk http://www.technologicalwanderings.co.uk/node/8#comments
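The check described in the post (query each RAID component and look for anything other than "active" or "inactive") can be sketched as a small shell helper. This is only an illustration, not part of the original fix: `check_query` is a hypothetical name, and it assumes the `mdadm -Q` output format quoted in the post, which may differ between mdadm versions.

```shell
#!/bin/sh
# Hypothetical helper: reads `mdadm -Q <device>` output on stdin and prints
# "<component> <state> <md device>" for each component line it recognises.
# Assumption: lines look like the ones quoted in the post, e.g.
#   /dev/hda2: device 0 in 2 device mismatch raid1 /dev/md3. Use mdadm ...
check_query() {
    sed -n 's|^\(/dev/[^:]*\): device [0-9]* in [0-9]* device \([a-z]*\) raid[0-9]* \(/dev/md[0-9]*\)\..*$|\1 \2 \3|p'
}

# Example use from a recovery environment (run as root; device names are the
# ones from the post and will differ on other machines):
#   for dev in /dev/hda2 /dev/hdd2; do mdadm -Q "$dev" | check_query; done
```

A component reported as "mismatch", or listed against an md device other than the one the boot loader expects, is a candidate for reassembly with `--update=super-minor`.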