Tuesday, 29 July 2008

Crazy from the Heat...

Computers don't like heat. Apparently. Years ago, I was putting together a system for my brother based on one of the old AMD Athlon CPUs. Built it, tested it, installed Windows, everything running beautifully. Fire it up an hour or two before he arrives to pick it up... it bluescreens and won't boot. Open up the case, check everything's seated properly... you know the drill. It's all fine, of course. Three hours later, I still can't work out what's wrong. Every component is fine. Every diagnostic passes. The disks are fine. The memory is fine. Eventually, and completely by chance, I move the case off the desk onto the floor whilst it's running... and it crashes.

Turns out the heatsink clamp was ever-so-slightly bent out of shape. Unlike the LGA775 heatsinks of today with their wonderfully-engineered motherboard mountings, the old Athlon heatsinks just clipped onto the plastic CPU socket. When the box - a generic mini-tower case - was up on the desk, on its side, running tests and diagnostics, everything was fine. When I put it back together and flipped it the right way up - i.e. standing vertically - the weight of the heatsink combined with the bent clip was just enough to pull the heatsink out of contact with the CPU, which would then shoot up to 96°C and crash spectacularly. A new heatsink clip and some Arctic Silver and it worked perfectly.

Anyway. Moral of the story is, in my experience, PCs go funny in the summer. Whether it's the heat or just plain coincidence I don't know, but they do. And when they do, the first thing to check - always - is the memory. Get the Ultimate Boot CD, load up MemTest86, and let it run overnight. (If anything's wrong, it generally shows up in about two minutes... but if it'll run overnight without any problems, your RAM is almost certainly OK.)

Faulty memory creates the most bewildering array of crashes, faults, errors and bluescreens I have ever seen. Having inadvertently run a system with a stick of bad RAM for a couple of weeks, I would at various points have sworn it was the RAID controller, the hard drive, the video card, Windows, the printer driver - in fact, pretty much every component of the system seemed to have caused a crash at one point or another. I'd ignored the possibility of the memory, because the system in question wasn't that old and the RAM had been tested when I put it together... I was wrong, and just running MemTest86 in the first place would have saved literally hours of troubleshooting and head-scratching.
