I’ve been having a recurring problem with one node of our Windows 2008 NLB cluster. When joined to the cluster, it runs for a while and then blue-screens. When not in the cluster, it runs normally. It was extremely frustrating, as the only way to troubleshoot it was to add it to the production cluster and give it some load. I tried an OS reinstall and a hardware swap. Still no good, but today I think I got it fixed.
The node in question is a Dell PowerEdge M600. It’s a blade that comes with 4 Broadcom NICs. I had been relying on Windows Update to tell me whether the drivers were up to date, but when I actually looked at them in Device Manager, I saw that they were two years old. A quick look at Dell’s support page showed new drivers released in October.
I installed them, and the machine is now back in the cluster and hasn’t had a blue screen yet.
I feel really dumb about missing this. I’ve been out of the hardware game for a while, but this is pretty basic stuff. Oh well, hopefully I won’t forget again.
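To avoid relying on memory next time, the Device Manager check could be scripted. What follows is only a rough sketch, not something I ran on the cluster. It assumes the driver list has been exported to CSV with `DeviceName`, `DriverVersion`, and `DriverDate` columns (the property names of the WMI class `Win32_PnPSignedDriver`), with dates in the CIM format `YYYYMMDDhhmmss...`. The function names, the one-year threshold, and the sample rows are all made up for illustration.

```python
import csv
import io
from datetime import datetime, timedelta

# Invented sample data, NOT real output from the blade in this post.
SAMPLE = """DeviceName,DriverVersion,DriverDate
Example NIC A,1.0.0.0,20080115000000.000000-000
Example NIC B,2.0.0.0,20101001000000.000000-000
"""


def parse_cim_date(value):
    """Parse the leading YYYYMMDD part of a CIM datetime string."""
    return datetime.strptime(value[:8], "%Y%m%d")


def stale_drivers(csv_text, now, max_age_days=365):
    """Return (device, version, ISO date) for drivers older than max_age_days.

    Hypothetical helper: the column names are assumed, see the note above.
    """
    cutoff = now - timedelta(days=max_age_days)
    stale = []
    # strip() drops leading blank lines so the header row is read correctly.
    for row in csv.DictReader(io.StringIO(csv_text.strip())):
        raw = (row.get("DriverDate") or "").strip()
        if len(raw) < 8 or not raw[:8].isdigit():
            continue  # skip devices that report no usable driver date
        driver_date = parse_cim_date(raw)
        if driver_date < cutoff:
            stale.append(
                (row["DeviceName"], row["DriverVersion"],
                 driver_date.date().isoformat())
            )
    return stale


if __name__ == "__main__":
    # The "now" value is fixed here only to make the example repeatable.
    for device, version, date in stale_drivers(SAMPLE, datetime(2010, 11, 15)):
        print(f"{device}: driver {version} dated {date} looks old")
```

Anything it flags still needs to be compared by hand against the vendor’s support page, since an old date only means the driver deserves a look, not that a newer one exists.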