I’ve been having a recurring problem with one node of our Windows 2008 NLB cluster. When joined to the cluster, it runs for a while and then blue-screens. Outside the cluster it runs normally. It was extremely frustrating, as the only way to troubleshoot it was to add it to the production cluster and give it some load. I tried an OS reinstall and a hardware swap. Still no good, but today I think I got it fixed.
The node in question is a Dell PowerEdge M600. It’s a blade that comes with 4 Broadcom NICs. I had been relying on Windows Update to tell me whether the drivers were up to date, but when I actually looked at them in Device Manager, I saw that they were two years old. A quick look at Dell’s support page showed new drivers released in October.
I installed them, and the machine is now back in the cluster and hasn’t had a blue screen yet.
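For anyone who would rather not click through Device Manager on every node, here is a rough Python sketch of the same check. It assumes PowerShell is installed on the box and that the `Win32_PnPSignedDriver` WMI class reports network adapters under the `NET` device class; the one-year threshold and the function names are my own choices, not anything Dell or Microsoft recommends.

```python
import csv
import io
import subprocess
from datetime import date, datetime

# PowerShell query for network adapter drivers, emitted as CSV.
# Assumption: PowerShell and the Win32_PnPSignedDriver WMI class are available.
QUERY = (
    "Get-WmiObject Win32_PnPSignedDriver | "
    "Where-Object { $_.DeviceClass -eq 'NET' } | "
    "Select-Object DeviceName, DriverVersion, DriverDate | "
    "ConvertTo-Csv -NoTypeInformation"
)


def parse_cim_date(value):
    """Read the date part of a CIM datetime string (yyyymmddHHMMSS...)."""
    return datetime.strptime(value[:8], "%Y%m%d").date()


def stale_drivers(csv_text, today, max_age_days=365):
    """Return (device, version, driver date) for drivers older than max_age_days."""
    stale = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not row.get("DriverDate"):
            continue  # some devices report no driver date at all
        driver_date = parse_cim_date(row["DriverDate"])
        if (today - driver_date).days > max_age_days:
            stale.append((row["DeviceName"], row["DriverVersion"], driver_date))
    return stale


def query_nic_drivers():
    """Run the PowerShell query on the local machine and return its CSV output."""
    result = subprocess.run(
        ["powershell", "-NoProfile", "-Command", QUERY],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

On the node itself, `stale_drivers(query_nic_drivers(), date.today())` would list any NIC driver more than a year old. An old date only means the driver is worth comparing against the vendor's download page; it does not prove a newer one exists.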
I feel really dumb about missing this. I’ve been out of the hardware game for a while, but this is pretty basic stuff. Oh well, hopefully I won’t forget again.
One thought on “More Issues with Windows 2008 Clustering”
Hi – enjoyed the CF9 preso.
I’m looking into a really simple failover scenario: single standalone CF8 instances on 2 machines in an NLB cluster. I don’t fully understand, though, how NLB determines whether CF is unresponsive.
Is it just checking if the server is alive? Or can it detect if CF itself is unresponsive even if the server is still up?