Java Performance Tuning

The Roundup January 2005

Back to newsletter 050 contents

BeJUG's Javapolis conference is a perfect chance to have face-to-face conversations with people. For me, it is always a pleasure to run into (the now infamous author) Gregor Hohpe. We spent some time talking about a thread that I'd started a while back with the others who were involved in reviewing Enterprise Integration Patterns. In that discussion, I was trying to get feedback on what they thought about Sun tying the Data Transfer Object (DTO) pattern to the Data Access Object (DAO) pattern.

Even though I don't believe that it started out this way, by the time Sun published the J2EE blueprints, a DTO was the contracted object acting as a go-between for the domain and the persistence layer. How I believe it started (and I may be wrong here, but...) is that the author said, "DAO to some object... say a DTO". In essence, the DTO was used instead of a specific domain object. Others picked up on this, and now it's locked in stone.

Now I've seen some wonderful implementations of DAOs. Done right, they do isolate persistence from the domain. What they don't do is decouple what is persisted in the domain from what is persisted in the database. If that sentence sounds kooky, then you've read it right, and if the idea sounds kooky, then you've understood it. The funny thing is, people use the DTO to try to do exactly that: decouple what is to be persisted from what is persisted. A less kooky way of saying this is that they are trying to decouple the DAO from the domain, and unfortunately this cannot be achieved. OK, maybe I should relax this and say it is difficult and expensive to achieve, and you ain't going to get there with a DTO.

I started my conversation with Gregor by saying I think that I have an anti-pattern and, once I finished my diatribe, Gregor blurted out False Decoupling (alternatively called Obfuscating Couplings). Once we got the basics worked out, our conversation degenerated into a spat over who was going to get to blog it. Gregor graciously let me win, but later on I realized that if I published this wonderful new anti-pattern, no one (besides you all) would ever read about it. So I've dutifully sent an email off to Gregor telling him to go ahead and publish it [Kirk is being a little modest here, this roundup column has thousands of readers each month - ed.].

One of the things that I've been interested in is the performance of patterns. This interest stems from the fact that architecture and design will have the largest impact on an application's performance over all other factors (though it may be considered an edge case in this argument, I've seen good architecture make up for a woeful lack of hardware). Bruce Tate does a bit of this in his Bitter book series. Where we differ is that he focuses on anti-patterns in general, whereas I am interested in what happens to performance when patterns are misused.

What happens when DTOs are misused (or used as prescribed in Sun's blueprints)? The answer to this question is that you've introduced yet another layer of serialization into your application. Serialization is but one form of marshaling an object. Marshaling is the act of taking an object in one form and transforming it into another, usually with the explicit purpose of being able to reconstitute the original. With this definition, you can most likely see that we do a lot of marshaling aside from that which takes place in network communications. In fact, Bela marshals objects into JBossCache, we sometimes marshal objects to GUI widgets, and we often marshal objects into a database. In fact, that is what the DAO is designed to do: marshal objects into the database. It does this by scooping the state out of an object and injecting it into a SQL update or insert statement. In the case of the Sun blueprint, that object to be marshaled is always a DTO. But in order to do that, we must transform our domain object into a DTO. To do that, we marshal our domain object to the DTO.

DO -> DTO -> SQL -> Byte buffer

Figure 1. Progression of a Domain Object to the Database
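To make the extra marshaling step concrete, here is a minimal sketch of a DAO that first copies a domain object into a DTO and only then scoops the DTO's state into SQL. All class names are hypothetical illustrations, not blueprint code; with no DTO, toSQL could read the domain object directly and the coupling would be identical.

```java
// Hypothetical names for illustration only.
class OrderDO {                      // the domain object
    private final long id;
    private final double total;
    OrderDO(long id, double total) { this.id = id; this.total = total; }
    long getId() { return id; }
    double getTotal() { return total; }
}

class OrderDTO {                     // the intermediate copy
    long id;
    double total;
}

class OrderDAO {
    // Step 1: marshal the domain object into a DTO - the extra layer.
    OrderDTO toDTO(OrderDO order) {
        OrderDTO dto = new OrderDTO();
        dto.id = order.getId();
        dto.total = order.getTotal();
        return dto;
    }

    // Step 2: marshal the DTO into SQL. Nothing here is decoupled from
    // the domain; the DTO only obscures the dependency.
    String toSQL(OrderDTO dto) {
        return "UPDATE orders SET total = " + dto.total
             + " WHERE id = " + dto.id;
    }
}

public class DtoMarshalingSketch {
    public static void main(String[] args) {
        OrderDAO dao = new OrderDAO();
        OrderDTO dto = dao.toDTO(new OrderDO(42L, 99.5));
        System.out.println(dao.toSQL(dto));
        // prints: UPDATE orders SET total = 99.5 WHERE id = 42
    }
}
```

Every field is copied twice on its way to the database, which is exactly the extra CPU and garbage discussed below.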

If we examine the progression of a domain object on its journey to the database, we see why I've labeled the DTO yet another layer of serialization. Granted, we often put in extra layers, such as proxies, as these often yield a better design or some other advantage. But remember, there is no decoupling effect here, and without that there would appear to be no other advantage to using a DTO. So, it's helpful in explaining how the DTO works, maybe not so helpful in the application. The real question is: is it harmful? Well, since this is part of an anti-pattern discussion...

Using a DTO is harmful in several ways. First, the extra layer of marshaling consumes more CPU, introducing a small delay in what is often a critical section of code. Using a DTO creates more temporary objects and consequently puts more strain on GC. The DTO obfuscates the real coupling that exists between the DAO and the domain. Finally (at least for this argument), DTOs bloat the code base; after all, this extra marshaling requires code to manage it.

In performance management, you have two types of consumers of resources and response time budgets: those that grab large chunks all at once and those that take a wee bit very often. The DTO often (but not always) falls into the latter category. Unlike other objects (such as String) that fall into the latter category, the use of DTOs is typically well contained, which does make this an easier problem to deal with later on. That said, we should heed the lessons that anti-patterns teach us. In this instance it's not a premature optimization, it's planning for performance. And with that, let's move on to see what lessons we can learn from others in the performance discussion groups.

The Server Side

From The Server Side, we start with a thread announcing the release of JCrawler. Though the post claims that JCrawler is not just another web load testing tool, it is just another load testing tool. The first respondent to the post pointed out that they had used JCrawler and were disappointed when they couldn't find their bottleneck until they instrumented their application with PerformaSure. It is an interesting complaint because it's like complaining about having to boil water to make tea. It can be done but... JCrawler only provides load on the application; you still have to instrument your application to see how its constituent parts are responding to that load. That said, we have a new load testing tool to check out and at least one person who has accidentally used it successfully.

In my experience the most commonly asked question is: how many users will that support? True to form, that question gets asked once again this month in relation to J2EE applications running in a clustered environment. It's a bit of a naïve question because how a J2EE application server performs is tightly coupled to how your application decides to use it. Used effectively, it should be able to support thousands of users performing light tasks, assuming you have the right hardware. Used poorly, or if the computational requirements are high, you may find yourself only able to support a few dozen users. So the question becomes: did you architect, design and code your application to fit into a J2EE environment? Do you have enough hardware to support the computational and communication requirements? Currently, this information has not been codified in a format that makes it available for all to read. Yes, we have design patterns and anti-patterns, but it's still a question as to which patterns should be used and which ones should be ignored. For now this information comes bottled inside the minds of experienced J2EE experts. The best bet is to find one or two of these resources before starting down the road of building your J2EE application.

Do prepared statements get pre-compiled? That was the start of another thread. The answer: prepared statements get compiled and cached as they are used in WAS 4. There were a few warnings about how WAS may not properly close JDBC result sets, which has the effect of running the application server out of connection resources. One would think that this would not be an issue any longer, as WAS 4.0 is a very old release. Also, WAS 4.0 is tied to JDK 1.3.1. That said, it is surprising (though it shouldn't be) that many development shops are still on versions of the JDK and J2EE that are this old and older. The reasons for this are best known by those that are still reliant on this ancient technology.

One other point of interest that came out of the thread was a technique for tuning the size of the prepared statement cache. The technique starts by creating a modestly sized cache and then watching the rate of recompilation. If that rate is high, increase the size of the cache by 50% and repeat the process as needed.
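That sizing loop can be sketched as follows. The acceptable-rate threshold and the simulated measurements are assumptions for illustration; in practice you would read the recompilation rate from the container's own statistics between iterations.

```java
// Sketch of the iterative cache-sizing technique from the thread.
public class StatementCacheSizer {
    // Assumed threshold: stop growing once fewer than 5% of
    // executions trigger a recompilation.
    static final double ACCEPTABLE_RECOMPILE_RATE = 0.05;

    // Grow the cache by 50% after each measurement until the observed
    // recompilation rate drops to an acceptable level.
    static int tuneCacheSize(int initialSize, double[] observedRates) {
        int size = initialSize;
        for (double rate : observedRates) {
            if (rate <= ACCEPTABLE_RECOMPILE_RATE) break;
            size = size + size / 2;   // increase by 50%, then re-measure
        }
        return size;
    }

    public static void main(String[] args) {
        // Simulated measurements: the rate falls as the cache grows.
        double[] rates = { 0.40, 0.20, 0.08, 0.03 };
        System.out.println(tuneCacheSize(20, rates)); // prints 67
    }
}
```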

The JavaRanch

From the JavaRanch we find a very short thread that brings up an old favorite subject: tail recursion. The thread starts with a bit of code designed to test for the presence of a tail recursion optimization. A method is considered to be recursive if it calls itself. An example of this is found in listing 1.

public int tailRecursiveSum (int num) {
    if (num <= 0)
        return num;
    return num + tailRecursiveSum (num - 1);
}

Listing 1. An example of a recursive method

If we were to make a call to tailRecursiveSum(int) with a parameter value of 3, we would make recursive calls for the values of 2, 1, and then 0. Each of these calls would result in a building up of the execution stack. The fact that we are keeping a reference to the current value (num) forces us to return to that execution context as we calculate the final result. Tail recursion is a special case of recursion. A tail-recursive version of the code is found in listing 2.

public int tailRecursiveSum (int num) {
    return trsHelper (num, 0);
}

private int trsHelper (int num, int soFar) {
    if (num <= 0)
        return soFar;
    return trsHelper (num - 1, num + soFar);
}

Listing 2. An example of a tail recursive method

In this instance, the helper method passes in the current result and in doing so, frees the application from having to return to this particular execution context. Now it is up to the compiler to recognize that the code is tail recursive.

If the compiler is designed to recognize tail recursion, it can optimize by not building a new stack frame with every invocation. Instead it can safely overwrite the values in the current stack frame.
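In other words, a tail-call-optimizing compiler effectively turns the recursion into a loop: the recursive call becomes an assignment to the parameters plus a jump back to the top. A hand-written sketch of what such a compiler might produce for listing 2:

```java
// The loop a tail-call optimization would effectively generate:
// the frame's values (num, soFar) are overwritten in place, so the
// stack never grows no matter how deep the "recursion".
public class IterativeSum {
    public static int tailRecursiveSum(int num) {
        int soFar = 0;
        while (num > 0) {           // same test as trsHelper's base case
            soFar = num + soFar;    // same accumulation as trsHelper
            num = num - 1;          // overwrite instead of pushing a frame
        }
        return soFar;
    }

    public static void main(String[] args) {
        System.out.println(tailRecursiveSum(3));     // prints 6
        System.out.println(tailRecursiveSum(50000)); // deep, but no stack growth
    }
}
```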

If you run the code with a large enough value, all the stack frames that you are building up will exhaust the thread's stack. Since tail recursion doesn't build up stack frames, it won't fail for that reason. If you run this test using Sun's JDK, you will see a StackOverflowError thrown for both versions. Thus we can conclude that javac does not optimize tail recursion.
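The test from the thread amounts to something like the following sketch. The depth of 10,000,000 is an assumption chosen to exhaust a default-sized thread stack; on HotSpot the failure surfaces as a StackOverflowError.

```java
// Drive the plain recursive sum deep enough to exhaust the stack.
// On Sun's JDK the tail-recursive form of listing 2 fails the same
// way, which is the evidence that javac performs no tail-call elimination.
public class TailCallTest {
    static int sum(int num) {                 // plain recursion, one frame per call
        if (num <= 0) return num;
        return num + sum(num - 1);
    }

    static boolean blowsTheStack(int depth) {
        try {
            sum(depth);
            return false;
        } catch (StackOverflowError e) {      // the stack ran out of frames
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(blowsTheStack(100));      // false: shallow is fine
        System.out.println(blowsTheStack(10000000)); // true on a default stack
    }
}
```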

In our next thread from the JavaRanch, we get the question: how does one avoid an OutOfMemoryError when trying to load 14,000,000 objects into an ArrayList? The obvious answers include: get a whole lot more memory or, alternatively, don't load that many objects into memory all at the same time.

The problem with the first solution is that keeping that many objects in memory increases the cost of every GC cycle. In other words, there is a cost to be paid for adding extra memory to an application. The problem with the second solution is that by not loading all of these objects into memory, one will most likely end up asking for an object that is not in memory. The question now becomes: which solution offers the smaller penalty?

It is typical that an application will only visit a small portion of the entire data set that is available. Because of this, most solutions favor the latter option and, as we have seen, the industry is busy working hard to solve the problem of intelligent caching.
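One simple way to avoid holding the full data set is to stream it through in fixed-size chunks so only one chunk is live at a time. A sketch follows; the RecordSource interface and chunk size are illustrative assumptions, standing in for a JDBC cursor or similar paged source.

```java
import java.util.ArrayList;
import java.util.List;

// Process records in fixed-size chunks instead of loading 14,000,000
// objects into one ArrayList.
public class ChunkedProcessor {
    static final int CHUNK_SIZE = 10000;    // assumed tuning value

    interface RecordSource {
        // Returns up to max records, or an empty list when exhausted.
        List<String> next(int max);
    }

    static long process(RecordSource source) {
        long seen = 0;
        List<String> chunk;
        while (!(chunk = source.next(CHUNK_SIZE)).isEmpty()) {
            for (String record : chunk) {
                seen++;                     // do the real per-record work here
            }
            // the chunk becomes garbage after each pass, so the heap
            // never holds more than CHUNK_SIZE records at once
        }
        return seen;
    }

    public static void main(String[] args) {
        // A fake source standing in for a database cursor: 25,000 records.
        RecordSource src = new RecordSource() {
            int remaining = 25000;
            public List<String> next(int max) {
                List<String> out = new ArrayList<String>();
                int n = Math.min(max, remaining);
                for (int i = 0; i < n; i++) out.add("record");
                remaining -= n;
                return out;
            }
        };
        System.out.println(process(src)); // prints 25000
    }
}
```

The trade-off is exactly the one described above: if the access pattern revisits records, a cache in front of the source pays off.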

From Java Gaming we have a discussion on how Sun might improve the interactions between Java and C so that we might have a more efficient JNI. I must admit that I've only thought about interfacing C code to Java twice in the last 8 years. I do have to admit that interfacing with C is a painful experience, more painful than any other inter-language communication that I've done in the past, and that does include a number of different combinations. Since there is no way that Java is going to be able to do everything for you (even though it does quite a bit), there is always going to be a need to interface with low level languages. What is interesting is that there is a reference to a VM called black bird that was new to many in the group (including myself). Maybe we have a new VM to check out?

Finally, from Java Gaming we have a thread on the optimizations available in the JRockit VM. The JRockit VM is a great example of how using a different set of assumptions can affect the implementation of a very specific specification. The assumptions that the JRockit team has made are all based on a specific target environment: long running server processes. In that environment one typically has many CPUs and lots of available memory. Using these assumptions, they went on to optimize the implementation for those hardware conditions. JRockit does not contain an interpreter, which means it must compile all byte code to native code straight up. They have also tuned GC and other operations in the VM to suit the server environment.

The effect of some of these optimizations is that JRockit appears to perform poorly in naïve benchmarks (those that run for a very short period of time) on desktop computers. Where JRockit shines is in longer term studies that more closely mimic those found in production. For example, studies have shown that JRockit outstrips its competitors when it comes to picking up and dropping socket connections. However, I have seen shops stop using JRockit because it is a different beast.

It is interesting that JRockit would show up in a gaming discussion group because I would not expect that JRockit would do so well in this area. That said, we've seen other unexpected results when using techniques that we didn't expect to work. The conclusion we've drawn from these experiences is that the only way to really know whether something is going to work is to experiment. Experimentation is something that we always encourage. So, experiment with JRockit; you may be pleasantly surprised.

Kirk Pepperdine.

