A Message From Jude About This Last Outage
Oct 23, 2008 at 6:02 PM Thread Starter Post #1 of 1
Dear Friends,

You may have noticed that some of your posts are missing, so, for that, I apologize. We had to restore from our last backup--a backup made about 10 or 11 hours ago, so I estimate a loss of approximately 600 to 700 posts, as most of that time was off-peak. Nevertheless, that's a substantial loss, and, again, I'm very sorry.

Over the past few weeks, you may have noticed an increase in up and down periods, with maintenance messages posted by me about system work going on from time to time. This type of activity has increased substantially, as we have been preparing (and are still preparing) to make a major move to (somewhat ironically at this point) increase our data safety--we have been very sensitive about this issue since last November's huge outage.

This particular outage was somewhat unexpected, as what we were doing was pretty routine. We back up the forums every other day to daily now, both to a storage appliance and locally. And, as a matter of practice, before any work that we think poses even a somewhat higher risk, we back up just before doing so. We were working on the database servers (which replicate data amongst themselves in real time), doing what was pretty routine work, when a corruption we inadvertently caused on one server started replicating to the other database server(s). I take full responsibility for not backing up immediately before the maintenance (as that decision was mine, and quickly made, given the nature of what we were doing)--again, I thought that what we were doing was pretty routine (or should have been). More simply put, as it is currently set up, we use replication as our first line of defense, and the backups when that fails. Replication bit us today (again, I'll take the blame for that), so the last backup (again, made about 10 or 11 hours ago) is where we're at right now.

The work continues. We'll be even more careful than we've been. But big moves are coming to accommodate the growth, and the growing rapidity with which content is being created by the community here (and so the stakes are ever higher), so you'll still be seeing more and more system maintenance notices (ideally all planned-for notices). In short, we're looking to move completely away from directly managing our own Head-Fi.org hardware, and toward managed hosting services that are better suited to Head-Fi's growing needs. But, again, I'm sorry I lost some of the content you created today (the posts, private messages, and registrations made since last night's backup to the point of the maintenance-induced corruption an hour or so ago).

Admin/Webmaster, Head-Fi.org

(This message is being copied to the homepage, as well as to all of the big sub-forums.)