JeffD said:
Good time to review the backup procedure. I can't figure out how a failure on the server would screw up your backup, but I guess it did. I started backing up my web server off site daily, after an event similar to yours convinced me it was essential. Now I sleep better at night.
Well, luckily I had 4 backups in play, and between them we were able to recover everything:
1 backup on the server's secondary drive (partially corrupted, since it was a copy of the corrupt primary disk)
1 decommissioned server in California (its data was good but from the 15th)
1 off-site full rsync backup (data was good, but it only had apug.org and was from the 17th)
1 full site backup from the 19th on my local PC (data was good, but hard to transfer over my slow NZ DSL connection)
The problem is that most of the data was corrupt in production, and that corrupt data was then copied over to the nightly backups, corrupting them as well. We had no indication there was an issue until the crashes. With about 6.5 GB of data in a backup, you never truly know it's good until you try to use it. I never anticipated this, because usually a primary disk dies and you fall back to the secondary with last night's data. In this case last night's data was also bad. Luckily we had a variety of backups, plus we were able to pull some still-good databases off the old disks.
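For what it's worth, even a cheap automated sanity check on each dump would have flagged an unreadable or truncated backup before it replaced the only good copy. Here's a minimal sketch of that idea in Python, assuming gzipped mysqldump files; the path handling and the end-of-dump marker are illustrative, not how our scripts actually work:

```python
#!/usr/bin/env python3
# Minimal sketch: sanity-check a gzipped mysqldump before letting it
# replace an older backup. Assumes dumps are gzipped and were made with
# mysqldump's default comments, which end with a "-- Dump completed" line.
import gzip
import sys
import zlib

def dump_looks_complete(path):
    """Return True if the dump decompresses cleanly end to end and its
    last non-blank line is mysqldump's completion comment."""
    last = b""
    try:
        with gzip.open(path, "rb") as f:
            # Stream the whole file so a truncated or corrupt gzip
            # stream raises; remember only the final non-blank line.
            for line in f:
                if line.strip():
                    last = line
    except (OSError, EOFError, zlib.error):
        return False  # unreadable, corrupt, or truncated archive
    return last.startswith(b"-- Dump completed")

if __name__ == "__main__":
    path = sys.argv[1]
    if dump_looks_complete(path):
        print(f"{path}: looks complete")
    else:
        sys.exit(f"{path}: failed check -- keep the older backups!")
```

Of course a check like this can't catch logically bad data; a dump of a corrupted table can still be a perfectly well-formed file, so the only complete test is still an actual restore.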
In the future I plan to stop overwriting nightly backups; for example, I might keep 15 days' worth of backups on the backup disk. If data goes bad we can keep going back a day until it's clean (hopefully always having a recoverable set of databases less than a day old); see the sketch below. I'll be examining a lot of options. At this very moment I have a newly generated zip of the entire site transferring to my off-site storage and my local PC. I'll then take nightly database backups until the new solution is in place.
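To make the retention idea concrete, here's a minimal sketch of that rotation in Python; the backup root, the 15-day window, and the dated-directory naming are just assumptions for illustration:

```python
#!/usr/bin/env python3
# Minimal sketch of "keep 15 days, never overwrite": each nightly backup
# lands in its own dated directory, and only directories older than the
# retention window are pruned. Paths here are hypothetical.
import datetime
import pathlib
import shutil

BACKUP_ROOT = pathlib.Path("/backup/nightly")  # hypothetical backup disk
RETAIN_DAYS = 15

def new_backup_dir(today=None):
    """Create tonight's dated directory, e.g. /backup/nightly/YYYY-MM-DD."""
    today = today or datetime.date.today()
    dest = BACKUP_ROOT / today.isoformat()
    dest.mkdir(parents=True, exist_ok=True)
    return dest

def prune_old_backups(today=None):
    """Remove dated directories older than RETAIN_DAYS. Newer nights are
    never touched, so a bad backup can't clobber a good one."""
    today = today or datetime.date.today()
    cutoff = today - datetime.timedelta(days=RETAIN_DAYS)
    for d in BACKUP_ROOT.iterdir():
        if not d.is_dir():
            continue
        try:
            stamp = datetime.date.fromisoformat(d.name)
        except ValueError:
            continue  # not a dated backup directory
        if stamp < cutoff:
            shutil.rmtree(d)

if __name__ == "__main__":
    dest = new_backup_dir()
    # ... dump the databases / copy the site files into `dest` here ...
    prune_old_backups()
```

The point of the dated directories is that the newest backup never overwrites anything: if tonight's data turns out to be corrupt, every previous night is still sitting there, so the worst case is losing a day instead of the whole set.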