Back to newsletter 062 contents
Just recently I was asked the question, "How do you estimate how much hardware is needed to support an application?" Unfortunately the only answer I have is, "as much as you need." Not a particularly useful answer, as it is as vague as the question. The questioner is asking us to divine a significant amount of information: what the users will be up to, how those activities will be supported, what resources that support will require, and at what rate those resources will be consumed. These are not simple questions, but they do start leading us in a direction that may help us de-mystify the original one.
The answer to the question, "what amount of system resources will be consumed?", is, in effect, the answer to "how much hardware is needed for this application?" The former question, however, makes explicit that different implementations will require different resources and utilize them at different rates. This is fundamentally an architectural question, and to answer questions of architecture we must use architectural tools. The architectural tool to employ in this instance is benchmarking.
One of the purposes of benchmarking is to help architects model an architectural design decision to determine whether that choice is strong enough to support the expected workload. To be successful, both the model and the stress placed on it have to be representative. Using a non-representative load can lead either to a failure to recognize a weak design or to a design that is over-engineered and consequently expensive to implement. Using a non-representative model means that you are not testing your design, you're testing some other design, and consequently you will derive, at best, questionable data about how your system will react to load. The issue is that the only true measurement of what your application will look like under load is your application under load in your production environment, and of course we are trying to answer that question before we are in that situation. Consequently we need to answer these questions before building the entire system.
If we look at the complete system as the sum of its constituent components, then we can view how it will utilize system resources as the sum of how each of its constituent components will utilize those resources. This decomposition is useful because it allows us to isolate and study the theoretical limits of each individual component. For example, if our system will rely on JMS, we can provide estimates for message size and frequency to help answer the question of how much network bandwidth is needed. By keeping the benchmark small and the questions that it is to answer simple, we now have a means to estimate hardware requirements. The key to success is to size the benchmark so that it is just large enough to answer the question at hand, and not to try to answer every question. For example, the JMS solution may be configured as it would be in production, but instead of being used by a real application, it should most likely be driven by a harness that creates messages on one end and receives messages on the other. There should be no handling of the message. If it is required that a listener become unavailable (i.e. we need to simulate the in-line processing of a message), then the listener should sleep for some configurable period of time. This way the benchmark is controlled and consequently repeatable.
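To make this concrete, here is a minimal sketch of such a harness written against the standard javax.jms API. It is an illustration only: the class name, the constructor parameters and the choice of a queue rather than a topic are assumptions, and the ConnectionFactory would come from your provider's JNDI tree or client library, which is not shown.

    import javax.jms.*;
    import java.util.Arrays;

    public class JmsBenchmarkHarness {

        private final ConnectionFactory factory;   // obtained from JNDI or the provider; not shown here
        private final String queueName;
        private final int messageSize;             // characters per message
        private final long processingDelayMillis;  // simulated in-line processing time

        public JmsBenchmarkHarness(ConnectionFactory factory, String queueName,
                                   int messageSize, long processingDelayMillis) {
            this.factory = factory;
            this.queueName = queueName;
            this.messageSize = messageSize;
            this.processingDelayMillis = processingDelayMillis;
        }

        // Consumer end: receive messages but do no real work beyond an optional sleep,
        // so the benchmark measures the messaging infrastructure, not the application.
        public void startConsumer() throws JMSException {
            Connection connection = factory.createConnection();
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageConsumer consumer = session.createConsumer(session.createQueue(queueName));
            consumer.setMessageListener(new MessageListener() {
                public void onMessage(Message message) {
                    try {
                        if (processingDelayMillis > 0) {
                            Thread.sleep(processingDelayMillis);  // simulate in-line processing
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
            connection.start();
        }

        // Producer end: send a fixed number of fixed-size messages and report the elapsed time.
        public void runProducer(int messageCount) throws JMSException {
            Connection connection = factory.createConnection();
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(session.createQueue(queueName));
            char[] payload = new char[messageSize];
            Arrays.fill(payload, 'x');
            TextMessage message = session.createTextMessage(new String(payload));
            long start = System.currentTimeMillis();
            for (int i = 0; i < messageCount; i++) {
                producer.send(message);
            }
            long elapsed = System.currentTimeMillis() - start;
            System.out.println(messageCount + " messages of " + messageSize
                    + " characters sent in " + elapsed + "ms");
            connection.close();
        }
    }

Message size, send rate and the listener's sleep time are the knobs to vary; everything else stays fixed so that each run answers the one question being asked.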
If we attempt to construct an all-encompassing benchmark, we run the risk of ending up with a benchmark that will not answer the question that we are asking. For example, the authors of a well known framework recently attempted to answer the question of how much drag their framework contributes to overall system performance. In their testing they found that their framework produced better results than the same test conducted against the raw resources. They concluded that their test was flawed in a way that they had not been able to ascertain (due to time constraints). However, a quick look at that benchmark makes it clear that they over-complicated the implementation. This over-complication introduced "noise" into the system that resulted in these unexpected results. To understand a system one needs to tease it apart from any undue influences. In this case they included a database in the benchmark, and although it may be necessary to understand the effects of having a database in your system, that element of testing needs to be added back into the benchmark in a controlled manner, and only after one understands the performance characteristics of the system that uses it.
Another example to illustrate this point: we have a series of micro performance benchmarks that measures the cost of object creation. The first implementation in the series completes the unit of work in less than 800ms. The final implementation completes the unit of work in about 60ms. The first thing to know is that each of these benchmarks completes the exact same task in the exact same manner. The only difference between them is that garbage collection has been configured out of the results. Note that the question was regarding the cost of object creation, and consequently garbage collection should not be part of the answer. However, when considering the cost of an object's life-cycle, we can now see that object allocation is quite cheap but reclamation is very expensive. This point is not easily visible without a proper answer to the initial question. It is the naïve implementation of this type of benchmark that has resulted in the recommendation that objects be pooled. We now know that object pooling actually increases the cost of each individual collection. Object pooling still has a place for some types of objects, mainly those with a very high cost of creation, or where you need to minimize garbage collection to maintain deterministic pause times, but it is almost certainly not needed as often as it is used.
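As an illustration of what "configuring garbage collection out of the results" can look like, the following sketch (which is not the benchmark series referred to above) times a burst of allocations and subtracts the accumulated collection time reported by the JVM's GarbageCollectorMXBeans. The iteration count is arbitrary, a modern JIT may eliminate some of the allocations, and concurrent collectors report their time differently, so treat the absolute numbers as indicative only.

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    public class ObjectCreationBenchmark {

        private static final int ITERATIONS = 10000000;

        // written through a volatile field so the allocations are not trivially optimized away
        static volatile Object sink;

        // sum of the accumulated collection times (in ms) reported by all of the JVM's collectors
        private static long totalGcTimeMillis() {
            long total = 0;
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                total += gc.getCollectionTime();
            }
            return total;
        }

        public static void main(String[] args) {
            long gcBefore = totalGcTimeMillis();
            long start = System.currentTimeMillis();

            for (int i = 0; i < ITERATIONS; i++) {
                sink = new Object();
            }

            long elapsed = System.currentTimeMillis() - start;
            long gcTime = totalGcTimeMillis() - gcBefore;

            System.out.println("raw time         : " + elapsed + "ms");
            System.out.println("time spent in GC : " + gcTime + "ms");
            System.out.println("allocation only  : " + (elapsed - gcTime) + "ms");
        }
    }

Separating the two figures is what makes the life-cycle visible: the "allocation only" line stays small while the collection line grows with the amount of garbage produced.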
We often have to deal with unreasonable questions from those for whom we build applications. Quite often a more reasonable set of questions is lurking just beneath the surface of those unreasonable ones, and quite often we have good tools that we can employ to divine an answer. In this case we can convert the original question into three, and with some careful crafting we can use benchmarking to provide useful answers.
Back to newsletter 062 contents