For the past few days I’ve been struggling to get this new Xen platform to work.
Basically, the problem is that after the OS install (CentOS 5), the machine would refuse to boot, with one of two kinds of errors: either a frozen initial boot screen or a kernel panic (can’t find VolGroup00).
Since the total space of the hard drives was more than 2 terabytes (six 750 GB drives in RAID 10, plus 2 hot spares), I had to either turn autocarve on OR use a separate boot volume and convert the second volume (sdb) to GPT format.
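For anyone wondering why 2 terabytes is the magic number, here is a rough back-of-the-envelope sketch. It assumes decimal 750 GB drives, 512-byte sectors, and that the ceiling comes from 32-bit sector addressing (as in an MBR partition table). The names are just made up for illustration; none of this comes from 3ware’s documentation.

```python
# Assumptions for illustration: decimal gigabytes and 512-byte sectors.
DRIVE_BYTES = 750 * 10**9   # one 750 GB drive
RAID10_DRIVES = 6           # hot spares hold no data, so they are not counted
SECTOR_BYTES = 512


def raid10_usable_bytes(drives, drive_bytes):
    """RAID 10 mirrors every drive, so only half the raw capacity is usable."""
    return (drives // 2) * drive_bytes


def mbr_limit_bytes(sector_bytes=SECTOR_BYTES):
    """Largest size addressable with 32-bit sector numbers."""
    return 2**32 * sector_bytes


usable = raid10_usable_bytes(RAID10_DRIVES, DRIVE_BYTES)
limit = mbr_limit_bytes()

print(f"usable array size: {usable / 10**12:.2f} TB")
print(f"32-bit sector limit: {limit / 10**12:.2f} TB")
print("needs GPT or carving" if usable > limit else "fits in MBR")
```

So the array ends up just over the limit, which is why it either has to be carved into smaller volumes or labeled as GPT.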
Since I couldn’t get the machine to boot, my initial reaction was to update the 3ware RAID card and the motherboard to the latest firmware and BIOS versions. But I still couldn’t get past the frozen screen or the kernel panic screen no matter what I did. I tried every possible scenario I could think of, and we’re talking EVERYTHING.
RAM tested good, the HDs were fine, and I tried an IDE CD-ROM, a DVD-ROM and a SATA DVD-ROM.
Found some interesting threads:
http://forums.fedoraforum.org/showthread.php?p=1073621
http://www.3ware.com/kb/article.aspx?id=15388
But nothing worked. The thread on the fedoraforum looked promising and I thought I was onto something with it, but I ended up with NO results.
Then I had this bright idea of reducing the speed of the bus from PCI-X 133 to PCI-X 66. By now I was starting to think that perhaps something was wrong with the card, since we had bought a used one. Well, the install went fine, but afterward during the brute force test a ton of errors popped up on the screen and the thing just looked horribly flaky.
I called 3ware and asked them for some tips. The guy was very helpful and gave me some things I hadn’t tried yet (although I had tried pretty much everything else). I tried his tips and nothing worked.
Well, finally I took the card home and plugged it into my Tyan server (my sweet new fileserver) and bam, it worked FLAWLESSLY.
This is when I started to realize that it could be the motherboard itself. I tried a little more random stuff and finally just decided to RMA and cross-ship the board. We use Supermicro boards at work, and even the server I built for SW used Supermicro. I mean, this is how much love I have for Supermicro: I didn’t even remotely consider that the board could be the flaky part.
Well, when the new board arrived, I plugged the parts in and it was basically a WHAM BAM THANK YOU MA’AM. Worked flawlessly. It looks like the PCI-X slots on the old board were just bad. Perhaps if the RAID card had been PCI-E, that slot might have worked perfectly.
Lesson I learned: Don’t assume the motherboard can’t be the weakest link just because the brand is hot.