We had a couple of five-minute blips on one of our servers today. The server became overloaded and failed faster than our monitoring could react, so we had to shut it down and move everybody who was logged on to it over to another server, which meant those people had to log in again.
The good news is that our failover/replication worked perfectly, so no data was lost and users could just log in again and continue working.
This evening we’ve turned on some more servers and smartened up the load balancing, so things should run smoothly again tomorrow.
It is a strange fact that servers never seem to degrade slowly: one minute they are fine, the next minute they slip into a death spiral with no warning. We’ve been signing up a lot of new users in the last few months, and we had new servers racked and standing by ready to go, but the suddenness of the extra load took us by surprise.
So our apologies to those users who were inconvenienced. We are working hard here to make sure it doesn’t happen again.
Posted by reallysimplesystems