Back to newsletter 148 contents
A colleague described to me what happened when he was setting up a booth at a small IT show. During preparation, as the booths gradually turned on their equipment, the circuit breakers tripped and power was lost to all equipment. It turned out that the power equipment had been misconfigured; after twenty minutes the problem was identified and the circuit breaker re-enabled. But the circuit breaker immediately tripped again. After trying a few more times with the same result, the power engineer went looking for a new problem in the power equipment. Everywhere he probed, there was no detectable problem. But whenever he re-enabled the circuit breaker, it immediately tripped again. Finally, he worked out that because almost all the booths were now fully plugged in, as soon as the circuit breaker was re-enabled, the equipment in all the booths tried to fire up simultaneously. This caused a huge surge that the power equipment couldn't handle, and so the circuit breaker tripped (correctly protecting the power equipment). He actually had to disconnect the majority of booths manually, and then gradually reattach them to the circuit before the floor was live again.
I'm sure you can see the analogy here. Some of you have probably even seen it happen to your servers. Something tips your server over the edge, causing it to fail. You bring it back up as quickly as possible - maybe it is even redundant and you have an automatic failover to your reserve system. Then the restarted or reserve system immediately fails too. The cause is all the subsidiary systems or users reconnecting at once. If it's an outward-facing webserver, users are refreshing their pages because the page didn't display, so you get an immediate storm of requests. It's a particularly embarrassing failure mode, because the more successful the system, the more likely this is to happen. Load tests and stress tests rarely check for request storms on startup or failover, because the standard testing paradigm is to ramp up load gradually, since that tends to be the most common way load increases. If you don't have the "request storm on startup and failover" scenario in your load test performance suite - it's time to add it.
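As a starting point for such a scenario, here is a minimal sketch of a "storm" driver. Rather than ramping up, it parks every simulated client at a gate and then releases them all at the same instant, much like re-enabling the circuit breaker with every booth plugged in. The names `RequestStormSketch`, `fireStorm` and `fakeServer` are hypothetical, and `fakeServer` is only an illustrative stand-in that rejects anything beyond a fixed number of concurrent requests. In a real suite you would replace it with a call to your freshly restarted or failed-over system, and the client counts and timings here are arbitrary.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

public class RequestStormSketch {

    /** Releases `clients` threads at the same instant; returns how many requests succeeded. */
    static int fireStorm(int clients, BooleanSupplier request) {
        ExecutorService pool = Executors.newFixedThreadPool(clients);
        CountDownLatch ready = new CountDownLatch(clients); // counts clients parked at the gate
        CountDownLatch go = new CountDownLatch(1);          // the gate itself
        AtomicInteger succeeded = new AtomicInteger();
        for (int i = 0; i < clients; i++) {
            pool.execute(() -> {
                ready.countDown();
                try {
                    go.await(); // wait until every client is ready
                    if (request.getAsBoolean()) {
                        succeeded.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        try {
            ready.await();  // all clients are waiting
            go.countDown(); // open the gate: everyone hits the system at once
            pool.shutdown();
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("storm did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while running storm", e);
        } finally {
            pool.shutdownNow();
        }
        return succeeded.get();
    }

    /** Hypothetical system under test: rejects requests beyond `capacity` concurrent ones. */
    static BooleanSupplier fakeServer(int capacity, long workMillis) {
        Semaphore slots = new Semaphore(capacity);
        return () -> {
            if (!slots.tryAcquire()) {
                return false; // overloaded: request rejected
            }
            try {
                Thread.sleep(workMillis); // simulate the work of serving one request
                return true;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            } finally {
                slots.release();
            }
        };
    }

    public static void main(String[] args) {
        int clients = 200;
        int ok = fireStorm(clients, fakeServer(20, 50));
        System.out.println(ok + " of " + clients + " storm requests succeeded");
    }
}
```

The two latches are the important part: the first ensures every client thread is actually running before the second lets them all go, so the load arrives as a single spike rather than as the gradual ramp most load tests produce.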
Now on to all our usual links to Java performance tools, news, articles and, as ever, all the extracted tips from all of this month's referenced articles.
Java performance tuning related news.
Java performance tuning related tools.