Java Performance Tuning

The Roundup September 2004

Back to newsletter 046 contents

It is getting close to the time to start thinking about the latest and greatest version of Java ever to cross our computers. That would be none other than Tiger, a.k.a. JDK 1.5, a.k.a. Java 5.0. Now, I get the progression from 1.2 to 1.3 to 1.4, and I even get the idea of Java2. What I don't understand is why Sun had to go and bump the number to Java5, skipping 3 and 4. I guess it has something to do with the enormous amount of change that the 1.5 is going to throw at us. Funny thing is, not many people have paid attention until now. Yeah, I know that most of you have heard of generics and autoboxing, but do you really know what these things are? Have you thought through the impact that these changes to the core language might have on our ability to develop code? How about the impact that these structures have on performance?

In some instances, there should be no impact on performance as the changes are really all in the compiler. For example, generics are mostly about type safety and casting. In this instance, the work is done at compilation and there is no effect on the run time ... or is there? I guess the effect would depend upon the ability of the compiler to reduce this new grammar into efficient byte code. On the other hand, autoboxing has already been proven to be a performance drag. For me, this was not unexpected. I argued against the introduction of autoboxing more than a year ago in a guest editorial published in the JDJ. As projected, the main problem with autoboxing is the excessive amount of garbage that it creates as it unwraps and rewraps primitives. The (not so) funny thing is, there is a perfectly good solution that eliminates the need for autoboxing. Unfortunately, this solution has either been overlooked or ignored. I would say ignored, because the people who suggested autoboxing should have known that Smalltalk has a perfectly good answer to handling primitives: a structure known as an immediate object. The difference between an immediate object and a regular object is that the value is held in the object-oriented pointer (OOP) itself as opposed to being on the heap. This is all handled by the VM, so programmers never notice, and an immediate object seems no different from a regular object.
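To make the garbage argument a little more concrete, here is a minimal sketch of where the wrapping and unwrapping happens. The class and method names are made up for illustration, and this is not a benchmark: it only shows that a boxed accumulator gives the same answer as a primitive one while asking the compiler to unbox and re-box on every pass through the loop.

```java
final class BoxingSketch {
    // Boxed accumulator: every += unboxes total, adds, and boxes the result again.
    // Results outside the small Long cache (-128..127) may be fresh objects.
    static long sumBoxed(int n) {
        Long total = 0L;
        for (long i = 0; i < n; i++) {
            total += i; // roughly: total = Long.valueOf(total.longValue() + i)
        }
        return total; // unboxed once more on return
    }

    // Primitive accumulator: no wrapper objects are involved.
    static long sumPrimitive(int n) {
        long total = 0L;
        for (long i = 0; i < n; i++) {
            total += i;
        }
        return total;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        System.out.println("boxed     = " + sumBoxed(n));
        System.out.println("primitive = " + sumPrimitive(n));
    }
}
```

Whether the extra allocation actually hurts depends on the VM and the workload, so measure before drawing conclusions from a loop like this.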

Now that the 1.5 (or 5.0 or whatever) is getting closer to being released, it is starting to stabilize, and with that comes my desire to go in and see how well these new features actually fare in a benchmark. As always, I suspect that the overall performance of the VM will be much better than it was in the 1.4. The 1.4 was a vast improvement over the 1.3, as the 1.3 was over the 1.2. Be sure to watch this space for some micro-benchmarks, and keep reading to see what's gone on this month in the discussion groups.

The JavaRanch

We start this month's round-up at the Java Ranch, where the first question at the bar concerned a customer's request that reflection not be used in the delivered code. The perception is that reflection is slow, and the reality is that it is slower than a direct method call. The questions are: exactly how much slower is it and, the big one, does it matter?

It does seem sensible to be concerned about the use of reflection, as there is a lot more work to be done to make a method call. First you have to find the method, which means that you have to recreate the method signature. Then, once you've found the method, you have to pack the parameters into an array, and only then is the call to the method made. In a direct call, you only do the last step. So it's extra work and it's got to take more time. But if you're only doing it once, does it really matter? If the application is meeting its performance requirements, does it really matter? If the client requests that you don't use it, do you ignore the request? The answer to all these questions is a resounding no. The thread does offer a link to an article that discusses the performance of reflection. A quick review of the article revealed a couple of flaws in the micro-benchmark which may affect the result. Once again, fair warning: beware the micro-benchmark result.
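The extra steps are easy to see side by side. The following is only a sketch with invented names (ReflectionSketch, twice); it makes no claim about how much slower the reflective path is, it just shows the lookup by signature, the packing of the argument, and the unpacking of the result that a direct call never needs.

```java
import java.lang.reflect.Method;

final class ReflectionSketch {
    // Hypothetical target method used only for this illustration.
    static int twice(int x) {
        return 2 * x;
    }

    // Direct call: the compiler has already resolved the method.
    static int callDirectly(int x) {
        return twice(x);
    }

    // Reflective call: look the method up by name and parameter types,
    // box the argument into an Object[], invoke, then unbox the result.
    static int callReflectively(int x) {
        try {
            Method m = ReflectionSketch.class.getDeclaredMethod("twice", int.class);
            Object result = m.invoke(null, x); // null receiver because twice is static
            return (Integer) result;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("reflective call failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("direct     = " + callDirectly(21));
        System.out.println("reflective = " + callReflectively(21));
    }
}
```

If reflection has to be used repeatedly, one common mitigation is to look the Method up once and keep it, so that only the invoke step is paid on each call.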

In another discussion, the question centers on calculating the big O value for an algorithm. It is a very academic subject that doesn't seem to make it out into the real world that much. That said, it is important in that it can be used to describe the strength of an algorithm in a worst case scenario. The algorithm in question needed to create a list of N unique random numbers. If we ignore the really bad case where the random number generator cannot generate N unique numbers, then the worst case is that we need to check each number in our list to see if the one that we have is unique. Then we would insert that number into the list. Since each of the N insertions may scan up to N entries, in the worst case (problems with the RNG aside) the algorithm runs in O(N^2) time.
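As a rough sketch of the algorithm described above (the thread's actual code is not reproduced here, and all names are invented), the first method below scans the list before every insert, which is where the quadratic behaviour comes from. The second method is an alternative that was not part of the discussion as summarized: it tracks the values already seen in a HashSet, so each uniqueness check takes expected constant time instead of a linear scan.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

final class UniqueRandomSketch {
    // List scan: contains() walks the list, so N inserts cost O(N^2) comparisons.
    static List<Integer> uniqueWithListScan(int n, int bound, Random rng) {
        if (n > bound) throw new IllegalArgumentException("cannot draw n unique values below bound");
        List<Integer> result = new ArrayList<>(n);
        while (result.size() < n) {
            int candidate = rng.nextInt(bound);
            if (!result.contains(candidate)) {
                result.add(candidate);
            }
        }
        return result;
    }

    // Alternative: a HashSet remembers what has been seen; add() returns false on duplicates.
    static List<Integer> uniqueWithHashSet(int n, int bound, Random rng) {
        if (n > bound) throw new IllegalArgumentException("cannot draw n unique values below bound");
        Set<Integer> seen = new HashSet<>();
        List<Integer> result = new ArrayList<>(n);
        while (result.size() < n) {
            int candidate = rng.nextInt(bound);
            if (seen.add(candidate)) {
                result.add(candidate);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(uniqueWithListScan(5, 100, new Random(42)));
        System.out.println(uniqueWithHashSet(5, 100, new Random(42)));
    }
}
```

Both versions still retry on duplicates, so the "RNG cannot produce N unique numbers" caveat applies to each; the bound check only rules out the impossible case.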

Speaking of stronger algorithms, in another thread the question is posed on the efficiency of calculating sin and cos in Java. It was pointed out quite nicely that a power series can be used to approximate the values of sin and cos. These power series will converge in about 6 iterations for 32-bit numbers. It is surprising that this results in a speed-up over the routines offered by Java, but then the guys and gals at Java gaming spend a lot of time avoiding the Java math routines, so...
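For the curious, here is a minimal sketch of the power series idea, assuming the standard Taylor expansion sin x = x - x^3/3! + x^5/5! - ... (the thread's own code is not shown here, and the class name is invented). One assumption worth stating loudly: six terms only get close to single-precision accuracy once the argument has been folded into [-pi/2, pi/2], so the sketch does that range reduction first. No speed claim is made for this code.

```java
final class SinSeriesSketch {
    // Approximates sin(x) with the first `terms` terms of its Taylor series.
    static double sinSeries(double x, int terms) {
        // Reduce to [-pi, pi]; Java's % keeps the sign of the dividend.
        x = x % (2 * Math.PI);
        if (x > Math.PI) {
            x -= 2 * Math.PI;
        } else if (x < -Math.PI) {
            x += 2 * Math.PI;
        }
        // Fold into [-pi/2, pi/2] using sin(pi - x) = sin(x) and sin(-pi - x) = sin(x).
        if (x > Math.PI / 2) {
            x = Math.PI - x;
        } else if (x < -Math.PI / 2) {
            x = -Math.PI - x;
        }
        double x2 = x * x;
        double term = x; // k = 0 term
        double sum = x;
        for (int k = 1; k < terms; k++) {
            // Each term is the previous one times -x^2 / ((2k)(2k+1)).
            term *= -x2 / ((2.0 * k) * (2.0 * k + 1.0));
            sum += term;
        }
        return sum;
    }

    public static void main(String[] args) {
        for (double x : new double[] {0.5, 1.0, 3.0, 10.0}) {
            System.out.println(x + ": series=" + sinSeries(x, 6) + " Math.sin=" + Math.sin(x));
        }
    }
}
```

With the argument folded this way, the first omitted term is at most (pi/2)^13 / 13!, which is below 1e-7, in line with the "about 6 iterations" remark for 32-bit values.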

Good news on Java startup times. It is being reported that the JDK 1.5 is starting up much faster than the 1.4 did. This information was teased out of a discussion that was triggered by the desire to have Windows pre-start a VM to reduce perceived startup time. The thread dies hard when it is pointed out that all one is doing is moving JVM startup time into the startup time of Windows, and that this effect would not be appreciated. I concur with that assessment: Windows already takes too long to start up, so I'm not interested in slowing it down.

On the home page of Java gaming there is the announcement that, after a long absence, Jeff Kesselman is back! So, get out your Kesselman translators (Jeff is notorious for his bad typing) because the information that is encoded in his postings is worth the time it takes to decipher them.

At the heart of every gaming application is frame generation and, because of this, you can be certain that there are a lot of discussions on that topic. What is fascinating is that one posting at the beginning of the thread publishes a chart, and then the thread launches into a discussion about garbage collection, garbage collection pauses, and object pooling. On the subject of object pooling there were two camps: those that used it extensively and those that let the object die and be collected (hence the tie-in to GC). Letting the VM manage an object's life-cycle does result in less complex code. Using object pooling increases complexity but also reduces the strain on GC. Managing a pool will most likely require synchronization, which can be an impediment to performance. Even so, the claim is that using pools significantly increased the performance of one participant's game. Truth be told, pooling was a solution to slow GC, but the newer VMs are much better at GC than they once were, which makes pooling less attractive.

My vote is always for the simplest thing that works, and object pooling is never all that simple. For example, the posted code is not thread safe. Now, that may be intentional, as I have seen situations where collections with multi-threaded access have intentionally been left unsynchronized. That said, this looks more like an accidental decision.
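To show what the thread safety costs, here is a minimal sketch of a synchronized pool. It is not the code posted in the thread; SimplePool and its methods are hypothetical names, and the design (one lock, a bounded idle list, the caller resets state before release) is just one simple choice among many.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

final class SimplePool<T> {
    private final Deque<T> free = new ArrayDeque<>();
    private final Supplier<T> factory;
    private final int maxIdle;

    SimplePool(Supplier<T> factory, int maxIdle) {
        this.factory = factory;
        this.maxIdle = maxIdle;
    }

    // Hands out an idle object if one exists, otherwise creates a new one.
    // synchronized is the price paid for sharing the pool between threads.
    synchronized T acquire() {
        T obj = free.pollFirst();
        return (obj != null) ? obj : factory.get();
    }

    // Returns an object to the pool; beyond maxIdle it is dropped and left to the GC.
    synchronized void release(T obj) {
        if (free.size() < maxIdle) {
            free.addFirst(obj);
        }
    }

    synchronized int idleCount() {
        return free.size();
    }

    public static void main(String[] args) {
        SimplePool<StringBuilder> pool = new SimplePool<>(StringBuilder::new, 8);
        StringBuilder sb = pool.acquire();
        sb.append("frame data");
        sb.setLength(0); // the caller resets state before handing the object back
        pool.release(sb);
        System.out.println("idle objects: " + pool.idleCount());
    }
}
```

Even in this small form there are decisions to get wrong (who resets the object, what happens on double release), which is the complexity argument in a nutshell.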

In the next post we return to the performance of sin and cos. Though the author is anonymous, it would appear to be someone from Sun engineering. The good news is that sin, cos, tan, ln, log10, square root and pow now use the x86 and AMD64 hardware. He is asking for suggestions on any other speed-ups, so if you have any requests, here is your chance!

The Server Side

From TheServerSide, we have a posting asking for advice on how to build a distributed cache. In this case every post came back with an answer from someone who has had experience building distributed caches. As simple as the problem sounds, it is actually quite complex. The best solution was to buy one. To this end, several were offered: JBossCache, Tangosol and, one that I'm going to mention here that wasn't listed there, GemFire.

How does one list all the threads in a WebLogic Server? You can do it the BEA way, by acquiring the ExecuteQueueRuntimeMBean and then running a query against it, or you can do it the Java way and send a kill -3 to the server on Unix or a Ctrl-Break on Windows.
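Neither approach from the thread is shown in code here. As a third option that the thread summary does not mention, the 1.5 API adds Thread.getAllStackTraces(), which lets code running inside the VM produce something similar to the kill -3 output. The sketch below assumes you can run code in the server's VM; the class name and output format are invented.

```java
import java.util.Map;

final class ThreadDumpSketch {
    // Builds a plain-text listing of every live thread and its current stack.
    static String dumpThreads() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread t = entry.getKey();
            out.append('"').append(t.getName()).append('"')
               .append(" daemon=").append(t.isDaemon())
               .append(" state=").append(t.getState())
               .append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpThreads());
    }
}
```

Unlike kill -3, this only sees what the Java API exposes, so treat it as a convenience rather than a replacement for a full VM thread dump.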

Our final thread starts off as RMI vs JavaSpaces but ends up being RMI vs raw sockets! RMI performance has gotten better with every release, though it may never be as fast as sending data over a raw socket. But RMI stands for Remote Method Invocation. OK, I know you all know this, but I'm stating it here for emphasis. RMI is about remote object interaction; raw sockets are about sending data. As you can see, there is a significant difference in the level of abstraction of these two mechanisms. One supports the paradigm that we work in, object-oriented programming, and the other is a simple service. Thus, from a design perspective, using RMI fits. If RMI doesn't provide the performance you need, you can always extend it (as I've seen done in a couple of cases) or replace the non-performing piece of code with raw sockets. As we always say, design considerations should win out over any premature optimization concerns.

The final word

Last month I was asked about my favorite Java group. That is straight-talking java, and it can be found at Yahoo Groups. It has been known to go off topic on occasion, so if you do happen to check it out, expect to see topics beyond Java being discussed there.

Kirk Pepperdine.


Last Updated: 2017-04-01
Copyright © 2000-2017 All Rights Reserved.
All trademarks and registered trademarks appearing on are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.