"Load tests and stress tests rarely check for request storms on startup or failover, because the standard testing paradigm is to ramp up load gradually since that tends to be the most common load increase. If you don't have the 'request storm on startup and failover' scenario in your load test performance suite - it's time to add it."
"If application processing is slow but there doesn't seem to be an abnormally high CPU usage, look for blocked threads across several stack dumps. These are likely showing lock contention causing a bottleneck in an otherwise parallel application. The stack traces and lock information show exactly where the contention is occurring."