Last night I logged into work from home to initiate a reboot of all the servers. Windows Updates were pending, and had been pending for about a week, but it’s hard to reboot production servers in the middle of the day when people are using it. Throw in some Flex Hours, and they’re in use from 6am to about 8pm.
The Domain Controllers have their own policy for updates, and they’re still required to be initiated manually, and then “restart now” clicked to reboot them.
When new “critical” patches are released and there are known 0-day flaws being exploited, I’ll use the ‘deadline’ feature in Windows Software Update Services (sort of a mini Windows Update server you can run on your own, approve and distribute updates around your own network but only downloading it once from Microsoft) where if a deadline passes and a user has been clicking “restart later” it will disable that button and start a 15 minute countdown before it forcibly reboots.
There was no deadline on this latest batch of updates from the last Patch Tuesday, so the (member) servers were politely asking to be rebooted. I logged into each of them one by one and clicked “restart now” and then waited for them to shut down, restart, and start back up again.
All of them worked and came back up (according to pinging them for responsiveness) except one. It SEEMED to come back up. I could ping it and it responded, so I moved on to the next and the next and the next.
It wasn’t until this morning when I walked in the door and had four people waiting for me saying “the network is down” (which of course was a misnomer, the network wasn’t down, it was just the shares on THE MAIN FILE SERVER that were disconnected) I poked my head into the server room, and the KVM was already set to that server and on the screen (which was blue, but not that Blue) was “Configuring Updates stage 3 of 3 0% Do not turn off your computer” I watched it for a minute to see what happens, as the hard drive LEDs were blinking away, so it WAS doing SOMETHING… then the screen went black.
The cursor was flashing up in the upper-left, so I waited some more… then the BIOS splash screen came up. The server had rebooted itself.
Turns out it had been in this startup, stage 3, fail, reboot loop since 9:00 last night.
Step 1, try a cold-boot. I waited for it to fail again, and then I held down the power button until it powered off. I removed the power cables and let it sit for 30 seconds to make sure everything had powered off, plugged it back in and tried again. Same result.
Step 2, try Safe Mode…. Applying Computer Settings… Configuring Updates stage 3 of 3… reboot. Crap.
Step 3, Last Known Good Configuration. This resets key windows files back to how they were the last time you successfully logged in. You would think that this would break it out of a bad update loop. You would be wrong.
Step 4, booted from the Windows Server 2008 x64 DVD and clicked on Repair. There’s a new “Startup repair” tool that’s incl-wait, it’s not? only in Server 2008 R2 that’s based on Windows 7 and NOT in Server 2008 that’s based on Vista? There are NO repair options for Server 2008 other than re-imaging of the system from the latest full-system-image? You DO have one of those, right?
Step 5, Uncle Google suggested I click through to “Get Vista out of the Infinite Reboot Loop” and the comment there by Tribus was:
I know a different way to resolve this issue without using a restore point.
1. Insert your Vista Media into your dirve and boot from it.
2. Select "Repair your Computer" from the list.
3. Select "Command Prompt" from the recovery choices.
4. At the command prompt change your directory to C:WindowsWinSxS
5. Type: del pending.xml
6. Exit and reboot
This will fix all Windows update reboot loops and does not require you to restore your PC to and earlier state.
Figuring I had nothing else left to lose, I gave this suggestion a shot, even though it was for Vista. If this didn’t work, then I’d be getting on the horn to Microsoft Support for some help. Instead of deleting it, I renamed it pending.xml.old and then exited and rebooted.
“Applying computer settings…” OK so far so good…
“Configuring Updates stage 3 of 3 0%. Do not turn off your computer…” FUCKBURGERS!!!
“Press Ctrl+Alt+Del to Begin” WHAAAAAAAAAAAT? it worked.
Once it was up and running the first thing I did (other than tell the users they could access their files again) was to look in the event log and see what happened. On the first reboot last night at 9pm, there was an event from source Winlogon, Event ID 6004 “The winlogon notification subscriber <TrustedInstaller>failed a critical notification event".”
So the next step is to research that error and see if I can figure out WHICH update caused it… it could be a moot point though because my co-worker turned up some early results that once you do this, you’ve pretty much broken Windows Update on this computer forever. I can live with that for now, because people are working and the data is intact. If I figure out that that is the case, and figure out a workaround, I’ll post a follow-up.