Java Performance Tuning

The Roundup October 2004

Back to newsletter 047 contents

The client to whom I was teaching our performance tuning courses was interested in being able to use JDK 5.0 during the course. Jack and I both figured, why not? After all, we encourage our trainees to experiment. As is the case with most courses, we have one special application that gets more than its fair share of exposure during the week. It was this application that I put to the test.

The application contains three tiers: client, servlet, and database. I fired up the servlet engine in 1.4 and hit it with our web loading script. After getting an idea of the CPU utilization and response times, I switched to running the servlet engine in 5.0. What I noticed was that 5.0 used about 5% more CPU and added just under 10% to the overall response time. If you are looking to 5.0 for a performance boost, then you might like to wait for the next release; historically, the .0 release always seems to be slightly slower than the previously released version.

From javagaming we have an interesting explanation of tiered compilation. Currently the HotSpot profiler works to identify those portions of an application that are executed very frequently. Once such a portion is identified, the JIT is directed to compile the "hot" section of code. How the JIT works is a trade-off between the speed of compilation and the quality of the native code being generated. Tiered compilation will be able to go either way: produce higher quality native code at the cost of slower code generation, or work quickly to produce a lower quality product. How all of this will work is still being sorted out. It is safe to say that it won't be showing up any time soon, but it is something to watch out for.

Another poster was curious about how much memory 'new byte[1000000][1]' allocates as opposed to 'new byte[1][1000000]'. Would you be surprised to find out that the difference is 1.6MBytes? Yes, you've read this correctly: the first declaration uses 1.6MBytes more than the second one! Remember, each array is treated as a regular Java object. Thus you can either have 1000000 of them, each 1 byte long, or one of them 1000000 bytes long. The choice is yours!
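To see where the extra memory goes, here is a minimal sketch that estimates the footprint of both shapes. The header size, reference width, and alignment used here are assumptions chosen for illustration, not values taken from the posting; real values vary by JVM, version, and settings, so the totals it prints need not match the poster's figure.

```java
// Rough footprint estimate for a two-dimensional byte array.
// ASSUMPTIONS (not from the posting): 16-byte array header, 4-byte
// references, 8-byte object alignment. Real values vary by JVM and settings.
class ArrayFootprint {
    static final long HEADER_BYTES = 16; // assumed array header size
    static final long REF_BYTES = 4;     // assumed reference width
    static final long ALIGNMENT = 8;     // assumed object alignment

    // Rounds a size up to the next multiple of ALIGNMENT.
    static long align(long bytes) {
        return (bytes + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    // Estimated bytes used by new byte[outer][inner]: one array of
    // references plus 'outer' separate byte arrays of length 'inner'.
    static long estimate(long outer, long inner) {
        long outerArray = align(HEADER_BYTES + outer * REF_BYTES);
        long eachInner = align(HEADER_BYTES + inner);
        return outerArray + outer * eachInner;
    }

    public static void main(String[] args) {
        long many = estimate(1000000, 1);
        long one = estimate(1, 1000000);
        System.out.println("new byte[1000000][1] ~ " + many + " bytes");
        System.out.println("new byte[1][1000000] ~ " + one + " bytes");
        System.out.println("difference           ~ " + (many - one) + " bytes");
    }
}
```

The point of the sketch is the shape of the calculation: every inner array pays its own header and padding, so a million one-byte arrays cost far more than the one million bytes of data they hold.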

In our final thread from the gamers we have the following advice. "Mostly what you need to be able to do is tune the young generation collector to ensure that all our garbage gets zapped very frequently and hardly any of it gets to end up in the old generation." The question is, how does one do this? In my experience, I've found that it is best to increase the size of the survivor spaces. Since Eden and the survivor spaces are all carved out of young space, increasing the size of the survivor spaces (for a fixed young generation size) reduces the size of Eden. Reducing the overall size of Eden increases the frequency of GC. Increasing the size of the survivor spaces keeps the objects in young, where GC is much cheaper. As with all performance tuning advice, your mileage may vary.
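The trade-off between Eden and the survivor spaces can be sketched with a little arithmetic. The example below assumes the HotSpot convention that the young generation holds one Eden and two equally sized survivor spaces, and that the -XX:SurvivorRatio option expresses Eden as a multiple of one survivor space. The 60MB young generation (as might be set with -Xmn60m) is an arbitrary figure chosen for illustration, and a real JVM rounds these sizes to its own alignment.

```java
// Sketch of how a fixed young generation is split between Eden and the
// two survivor spaces for a given eden-to-survivor ratio.
// ASSUMPTION: eden = ratio * survivor, and young = eden + 2 * survivor.
class YoungGenLayout {
    // Size of one survivor space, in the same unit as 'young'.
    static long survivor(long young, int survivorRatio) {
        return young / (survivorRatio + 2);
    }

    // Size of Eden: whatever is left after the two survivor spaces.
    static long eden(long young, int survivorRatio) {
        return young - 2 * survivor(young, survivorRatio);
    }

    public static void main(String[] args) {
        long youngMb = 60; // illustrative young generation size in MB
        int[] ratios = {8, 4, 2};
        for (int ratio : ratios) {
            System.out.println("ratio " + ratio
                    + ": eden " + eden(youngMb, ratio) + "MB"
                    + ", each survivor " + survivor(youngMb, ratio) + "MB");
        }
    }
}
```

Lowering the ratio grows each survivor space and shrinks Eden, which is exactly the effect described above: collections happen more often, but objects get more room to die young.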

The Server Side

In our first thread from TheServerSide, we see a report of an attempt to use J2SE 5.0. Apparently the poster's attempt to run an Orion server on both Windows and Linux resulted in a significant degradation in performance. One caveat: there is not enough information included in the posting to really determine if it is the move to 5.0 or if there is another issue that has caused the problem.

A posting asked which J2EE monitoring tool one should use that integrates with Microsoft's NTLM security framework. The response quite generously included a complete market survey of all such monitoring tools. Sweet!

Our final posting from TheServerSide asks for help in locating a tool that will load test an application server from inside. While it's useful to unit test from inside an application server, it is questionable whether the same tactic is useful when load testing an application server. It is not surprising that there was not a single load testing tool recommended that would work from inside the application server.

The JavaRanch

From the JavaRanch, we have an eight month old thread just pop back to life as some developers took another stab at determining which runs faster, Enumeration or Iterator. As is the case with most initial benchmarks, this one was fraught with problems. Since none of the follow-on posts included a benchmark, it's difficult to say if the numbers that they got were any better. In one case, the post claimed that using an Enumeration allowed the garbage collector to be "quieter".

Which does bring up an interesting point: should garbage collection be included in this benchmark or not? The argument for including GC time in the benchmark is sound: if the algorithm consumes a lot of temporary objects, then the cost of managing the life-cycle of those objects should be included. The problem is, the only connection between garbage collection and the algorithm is that the algorithm has dereferenced objects. This is compounded by the fact that garbage collection is triggered when there is not enough memory in the heap to accommodate any more objects. Since this is all dependent upon the size of the heap, how big do we make the heap? In a normal tuning situation, you'd configure the garbage collector and memory spaces until you reached a sequential GC overhead of 5% or less. But if you apply this type of tuning practice to a micro performance benchmark, then how do you compare the results?
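One way to check a GC overhead figure like the 5% target is to ask the running VM. The sketch below uses the standard java.lang.management API (available from Java 5) to sum the collection time reported by each collector and express it as a percentage of VM uptime. Treat it as an approximation under stated assumptions: it takes reported collection time as a stand-in for pause time, which may overstate pauses for collectors that do part of their work concurrently.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Approximates GC overhead as (reported GC time) / (VM uptime).
class GcOverhead {
    // Percentage of elapsed time spent in GC; 0 when no time has elapsed.
    static double overheadPercent(long gcMillis, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            return 0.0;
        }
        return 100.0 * gcMillis / elapsedMillis;
    }

    // Sums the accumulated collection time reported by every collector.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        long gcTime = totalGcMillis();
        System.out.println("GC time " + gcTime + "ms of " + uptime + "ms uptime, overhead "
                + overheadPercent(gcTime, uptime) + "%");
    }
}
```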

Since the question is, which technique is faster (and we are assuming that we want to know by approximately how much), we could run both in a benchmark with equal settings. The next step would be to tune the VM until you've eliminated GC from either the Iterator run or the Enumeration run. If we assume that Enumeration is fastest, then we would use that as the baseline (i.e., it runs with no GC). If GC runs during the Iterator test, then that value is in excess of what it takes to run the Enumeration test. Not a perfect measurement but it certainly should give you a good idea of an answer.
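The approach described above can be sketched as follows. This is an illustration, not the benchmark from the thread: the use of a Vector of Integer, the element count, and the number of rounds are all assumptions. Each round times both traversals and reports how many collections happened during each, so you can keep enlarging the heap (for example with -Xms and -Xmx) until at least one of the two runs shows no GC activity.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.Enumeration;
import java.util.Iterator;
import java.util.Vector;

// Times Enumeration and Iterator traversal of the same Vector and reports
// how many garbage collections occurred during each timed section.
class EnumVsIterator {
    static long sumWithEnumeration(Vector<Integer> v) {
        long sum = 0;
        for (Enumeration<Integer> e = v.elements(); e.hasMoreElements();) {
            sum += e.nextElement();
        }
        return sum;
    }

    static long sumWithIterator(Vector<Integer> v) {
        long sum = 0;
        for (Iterator<Integer> it = v.iterator(); it.hasNext();) {
            sum += it.next();
        }
        return sum;
    }

    // Total number of collections reported so far by all collectors.
    static long gcCount() {
        long count = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long c = gc.getCollectionCount(); // -1 if the collector does not report it
            if (c > 0) {
                count += c;
            }
        }
        return count;
    }

    // Builds a Vector holding 0..n-1.
    static Vector<Integer> data(int n) {
        Vector<Integer> v = new Vector<Integer>(n);
        for (int i = 0; i < n; i++) {
            v.add(i);
        }
        return v;
    }

    public static void main(String[] args) {
        Vector<Integer> v = data(1000000); // illustrative size
        // Several rounds, so that early rounds serve as warm-up for the JIT.
        for (int round = 0; round < 5; round++) {
            long gc0 = gcCount();
            long t0 = System.nanoTime();
            long s1 = sumWithEnumeration(v);
            long t1 = System.nanoTime();
            long gc1 = gcCount();
            long s2 = sumWithIterator(v);
            long t2 = System.nanoTime();
            long gc2 = gcCount();
            System.out.println("round " + round
                    + ": enumeration " + (t1 - t0) + "ns (" + (gc1 - gc0) + " GCs)"
                    + ", iterator " + (t2 - t1) + "ns (" + (gc2 - gc1) + " GCs)"
                    + ", sums equal: " + (s1 == s2));
        }
    }
}
```

Returning and comparing the sums keeps the JIT from discarding either loop as dead code. The timings remain indicative only; as noted above, this is not a perfect measurement.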

In another posting, a greenhorn was microbenchmarking for differences in the performance of static variables versus local variables. For the sake of sanity, I'm going to come out in favor of design. In other words, I don't care if static variables are faster; I'll let the design tell me if I should be using them or not. If it turns out that using local variables is a bottleneck, then I'll hit up Sun to explain why!

In another thread there is an interesting discussion around the question: are we just now reaching the upper limits of processor speeds? The discussion draws no conclusion but does bring up a number of interesting points. First, even though chip feature sizes are getting smaller, the relative speedups are not growing as fast as expected. Though IBM has reached its size expectations, it has not reached its expected performance targets, and yields are apparently very low. Even Intel has just recently announced that it will not be going after a 4.0GHz chip and instead will be focusing on improving its 3.6GHz offering. Maybe they feel that it is time for some refactoring before going forward in the race for speed. One thing is certain: processor speed is only one gauge of performance. For example, my Pentium-M 1.6GHz is rated the same as a Pentium IV 2.4GHz. One final note: there is an interesting reference to SEDA, or Staged Event Driven Architecture. In yet another plug for Java performance, the site claims to have achieved speeds with their Java implementation that meet or exceed those of C/C++. Let's hear it again for dynamic compilation! And just in case you think this is just another clean room exercise, think again. The list of applications based on their framework includes LimeWire, TerraLycos, Apache Excalibur Event Package, and SwiftMQ, as well as a few other highly scalable, highly concurrent applications.

The final word

You'd think that after the first round published in 2001, TheServerSide would realize that no large company is going to gamble on the results when funding a study such as the one recently funded by Microsoft. I dare say that Microsoft once again has scored a marketing success by putting the J2EE crowd back on its heels. Don't TMC and Tyler Jewel get it? Microsoft (or any large company) does not leave these types of studies to chance! The fix is in before the study is commissioned and, as such, why would TMC risk further damage to their already shaky credibility? For not letting the memory fade on the deliberate controversy generated from its first study, I award TMC and Tyler Jewel the uncoveted Meadow Muffin award. Congratulations guys, you really stepped in it again.

Kirk Pepperdine.

Last Updated: 2022-06-29
Copyright © 2000-2022 All Rights Reserved.
All trademarks and registered trademarks appearing on are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.