server outage
We had a server outage from about 2012-11-06 19:00 GMT to 2012-11-06 23:00 GMT. The primary cause was a PDU, owned and operated by the data center, going bad. There was some back and forth, as the tech initially claimed the server power supply was dead (he made a few more trips to check things when I questioned the simultaneous death of the dual PSUs).

The server now seems to be functioning normally and is still handling a higher than normal load while it catches up with all the clients. Please make us aware of any issues that this may have caused.

Further timeline details (all times GMT):

19:05: PowerBlade and I received alerts
19:18: ticket opened for remote hands, given that the server *and* out-of-band management were not reachable
19:58: call to the DC to ask what the holdup was
20:16: tech claims the power supply is dead
20:23: I fired back questioning their conclusion
20:49: we briefly regained power
20:59: request for another check of the machine
21:32: machine made available again (file system checks, db integrity checks, RAID integrity confirmation, etc.)
22:46: started bringing everything back online
23:00: noticed and started one important service that had been missed (upload handler)
