Java Performance Tuning

The Interview: Steve Mayer, GC guy

Back to newsletter 035 contents

This month we interviewed Steve Mayer, an authority on Java garbage collection. And we find out just why he is an authority on Java garbage collection.

JPT: Can you tell us a bit about yourself and what you do?

I am a Principal Software Engineer at dynamicsoft, Inc., where I have worked for the last 5 years. Before that, I had a 5-year career as a Software Engineer on Wall Street. In all of these positions I spent a significant amount of my time performance tuning the applications I worked on. At dynamicsoft, I have focused much of my time on analyzing the Java garbage collector's effects on our applications' performance and optimizing accordingly.

JPT: You've become something of an authority on analyzing Java garbage collection. How did that happen?

It happened quite naturally. In the early days, our products were designed elegantly for flexibility and growth. This was necessary, since the underlying protocol on which our products are based (SIP - RFC 3261) was still in its infancy within the IETF. As SIP evolved and our products became full implementations of the specification, we turned to performance tuning.

Performance was so bad in the beginning, as it so often is with new software, that the time spent in garbage collection (GC) did not even register as an issue. The applications were not fast enough to create large amounts of objects in a short amount of time. Since we were doing first-pass performance tuning, it was relatively easy to get an order of magnitude improvement rather quickly.

Once the applications were running this fast, they were able to create a much larger amount of garbage. At the time, all SIP transactions lived for 32 seconds. So, a proxy server that could do 20 transactions per second (TPS) created 640 incoming and 640 outgoing transactions in 32 seconds. At capacity, there would always be 1,280 "live" transactions that the GC could not collect. These transactions were (at the time) about 8 KB each, so a total of about 10 MB would always remain uncollected in a 32 MB heap. The JVM paused for well under a second during a GC of this size, so it went unnoticed. Maybe 10 transactions would have to wait for a GC, which was no big deal.

After we performance tuned to over 200 TPS, we were using about 100 MB of a 256 MB heap during a GC. A collection took a couple of seconds to complete, so about 400 transactions would have to wait. Worse than that, there were retransmissions of each transaction that waited more than 500 ms, and again of each that waited more than another second. Ironically, the more we performance tuned, the worse our problems became. It sounds strange, but the explanation is simple: the application spent less time processing each transaction, allowing more garbage to be generated, which took longer to collect. Hence, there were two multiplying factors that caused GC times to become a larger percentage of application time.

Once we realized that GC was our biggest performance problem, the search was on to solve it. This was primarily my responsibility at dynamicsoft. Given this directive, I spent the last three years understanding the behavior of the different garbage collectors, and determining how to tune our applications to best work in concert with the collector. So, the simple answer is "I learned to analyze Java garbage collection out of necessity." If you are going to write near real-time, high performance applications in Java, you are going to need to understand the effects of garbage collection on your application. And after we did, performance of our applications went up by well over another order of magnitude.

JPT: What do you consider to be the biggest Java performance issue currently?

On the one hand, Java is elegantly designed with a large collection of general-purpose classes that make writing any kind of application much easier than it would be if you had to create these classes on your own. Even getting third-party classes to do this would not be as good, since not everyone would be using the same classes; you end up losing some skill portability between programmers. So it is great that this infrastructure exists.

The downside of this is that everything is generic, as it must be. These classes cannot be performance tuned to nearly the level that they could be for a specific task. A simple example of this is parsing an int from a byte array. The Java classes can only do this for you by parsing from a String. Turning a byte array into a String is very expensive due to multi-byte international character support (a great thing, but slow). Plus, this creates a new char array and a temporary String object that is immediately thrown away. Even with the vast improvement in object creation and garbage collection speeds, they are obviously much more expensive than doing nothing at all. Avoiding the international character conversion causes a second char array to be created, since you will need to do the casting yourself and then the String class will copy the chars from the array you provide.

So, let's take a look at the difference in the times it takes to parse a 4-digit int from a byte array by:

  1. Doing it the natural Java way
  2. Casting your own char array and then doing it the Java way
  3. Writing your own specific parser that works directly on the byte array

The natural Java way takes 2.20 microseconds. By doing your own casting, even though this creates an extra object, the time taken is reduced to 0.86 microseconds. This means you can now parse about 2.5 numbers in the time it would have taken you to parse 1, which is not too bad. But, if you write your own parser to work directly on the bytes, without any temporary objects being created or collected, it takes only 0.14 microseconds. That's almost 16 times faster than the original, easy-to-do method!
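The three approaches above could be sketched as follows. This is an illustration of the technique being described, not dynamicsoft's actual code, and the direct parser assumes plain ASCII digits with no sign or validation:

```java
public class ByteIntParse {
    // 1. The natural Java way: byte[] -> String -> Integer.parseInt.
    //    new String(...) does charset decoding and allocates a char[] plus a String.
    static int parseViaString(byte[] buf, int off, int len) {
        return Integer.parseInt(new String(buf, off, len));
    }

    // 2. Do the byte-to-char cast yourself, skipping charset decoding,
    //    then let the standard parser work on the resulting String.
    static int parseViaCharArray(byte[] buf, int off, int len) {
        char[] chars = new char[len];
        for (int i = 0; i < len; i++) {
            chars[i] = (char) buf[off + i];
        }
        return Integer.parseInt(new String(chars));
    }

    // 3. Parse directly on the bytes: no temporary objects at all.
    //    Assumes ASCII digits only ('0'..'9').
    static int parseDirect(byte[] buf, int off, int len) {
        int value = 0;
        for (int i = off; i < off + len; i++) {
            value = value * 10 + (buf[i] - '0');
        }
        return value;
    }

    public static void main(String[] args) {
        byte[] msg = "1234".getBytes();
        System.out.println(parseViaString(msg, 0, 4));    // 1234
        System.out.println(parseViaCharArray(msg, 0, 4)); // 1234
        System.out.println(parseDirect(msg, 0, 4));       // 1234
    }
}
```

All three produce the same result; they differ only in how many temporary objects they create along the way.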

However, if you notice, the problem still lies with object creation and garbage collection. This is at the heart of why these generic classes can be so costly to an application's performance.

JPT: Do you know of any requirements that potentially affect performance significantly enough that you find yourself altering designs to handle them?

At this point, every time a new requirement comes, I find myself asking, "How is this going to affect performance?" As a team, we became so focused on performance that we never wanted to add new code. Every time you add code, you are slowing things down. Maybe not significantly, but still, we were that concerned. So, we would always try to tweak something else, when we added something new so that our overall performance would never decrease.

So yes, I try to tailor the design to encompass the performance that we need. The important thing here is that the overall design is flexible enough to let you modify the code for performance without being completely intrusive to all the code. As you tune, you are going to need to refactor things a few times, but that is just part of the game.

JPT: What are the most common performance related mistakes that you have seen projects make when developing Java applications?

The biggest performance related mistake, and it happens a lot, is not having the performance requirements defined early enough in the project. When you are designing software, you need to understand just how fast it is going to need to go. Having a hard number makes it possible to leave out optimizations that are intrusive, make the code much harder to maintain, or just take a lot of effort. If you have a soft number, like "as fast as it can go", then you have to make an effort to make these kinds of optimizations.

I know that it is often difficult to understand these numbers early in the project, but it is just as important as any other requirement. The later you put that requirement in, the harder it is to account for. It is probably even harder than most requirements, since performance affects the entire system, not just a part of the code, like a new feature might. This is one of the reasons that you need a design that lets you adapt to changing performance requirements.

JPT: Which change have you seen applied in a project that gained the largest performance improvement?

The largest single gain was when we were allowed to break external APIs and re-design major areas of the code. Although we had this freedom, common sense told us not to take complete advantage of the opportunity, since it would likely come back to haunt us, and as it turns out, it would have. Anyway, we did make a major change: we replaced the usage of the String class with a byte array based string class.

Since the majority of the applications built on our components were server-based, there was no need to have real Java Strings, since there was no user to show them to. So, when the main thing that the application needs to do is read bytes from one side of the wire and write bytes on the other side, why spend all the time transforming the bytes to Strings and then back to bytes?

Since the applications do so much character manipulation, this optimization alone made our applications about 40% faster. And this was after significant optimizations had already been made to the code. In fact, we felt that we could not optimize further without taking steps like this one.
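The idea can be illustrated with a minimal byte-array-backed string class. This is a hypothetical sketch of the approach being described, not dynamicsoft's class; the name `ByteString` and its methods are invented for illustration:

```java
import java.io.IOException;
import java.io.OutputStream;

// A "string" that stays in byte form from the wire to the wire,
// avoiding the byte -> char -> byte round trip of java.lang.String.
public final class ByteString {
    private final byte[] data;
    private final int offset;
    private final int length;

    public ByteString(byte[] data, int offset, int length) {
        this.data = data;
        this.offset = offset;
        this.length = length;
    }

    public int length() { return length; }

    public byte byteAt(int i) { return data[offset + i]; }

    // Substrings share the underlying buffer instead of copying it.
    public ByteString substring(int from, int to) {
        return new ByteString(data, offset + from, to - from);
    }

    // Write straight back to the wire with no charset conversion.
    public void writeTo(OutputStream out) throws IOException {
        out.write(data, offset, length);
    }
}
```

Parsing, substring extraction, and output all operate on the original buffer, so a message can pass through the server without ever being decoded into chars.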

JPT: Have you found any Java performance benchmarks useful?

Not really, but only because I have not been in a position to make a JVM or hardware switch. We have been using 1.2.2 from Sun because of the concurrent old collector. It was only recently, in 1.4.1, that this collector reappeared. Now we will be moving to J2SE 1.4.2 from Sun.

It is nice to have the benchmarks so that we can see how the world of Java performance has improved and continues to improve. They are also a good starting point for picking a JVM and hardware platform, if you have that luxury. But, in the end, you need to test your particular applications on a given platform and make sure that they perform the way that you need them to.

JPT: Do you know of any performance related tools that you would like to share with our readers?

There is a lot of good stuff out there that can help you performance tune. The important thing is that you are using something that helps you find the bottlenecks. I have used OptimizeIt! with a good deal of success in the past.

JPT: Do you have any particular performance tips you would like to tell our readers about?

Move to J2SE 1.4.2 and use the optimizing -server JVM option when appropriate.

This link is a very useful one for understanding the JVM options in general:

Use these options to get optimal performance from your application without having to change any code. Use the parallel young-generation collector (if you have more than 1 CPU) and the concurrent old-generation collector (regardless of the number of CPUs).
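As a concrete illustration, a server launch line combining these suggestions might look like the following. The heap sizes and the class name are placeholders; the flag names are the HotSpot 1.4.2-era ones, so check your own JVM's documentation before copying them:

```shell
# -XX:+UseParNewGC         parallel young-generation collector (multi-CPU)
# -XX:+UseConcMarkSweepGC  concurrent old-generation collector
# Heap sizes below are placeholders; tune them for your application.
java -server -Xms256m -Xmx256m \
     -XX:+UseParNewGC -XX:+UseConcMarkSweepGC \
     MyServer
```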

If you want the extreme details on garbage collection and how to optimally set the JVM options that affect the garbage collector's behavior, see the detailed paper that I co-authored with Nagendra Nagarajayya, of Sun Microsystems, at:

The most important thing is that you are able to analyze your application code and make the optimizations. You will need to be able to identify problem areas, measure the time it takes to do a certain operation and then optimize it and repeat the exercise. I have seen plenty of "optimizations" in my time that really ended up slowing things down. So, you need to have confidence, through empirical measurement, that the new code is in fact faster than the original.
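A measurement loop of the sort this implies can be very simple. This is a bare-bones sketch, not a rigorous benchmark: it does a single warm-up pass so the JIT has a chance to compile the code under test, and its numbers are only indicative (note that `System.nanoTime` appeared in J2SE 5.0; on earlier JVMs you would use `System.currentTimeMillis`):

```java
public class MicroTimer {
    interface Op { void run(); }

    static long timeNanosPerOp(Op op, int iterations) {
        // Warm-up pass: let the JIT compile the operation before timing it.
        for (int i = 0; i < iterations; i++) op.run();
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) op.run();
        return (System.nanoTime() - start) / iterations;
    }

    public static void main(String[] args) {
        final byte[] digits = "1234".getBytes();
        long perOp = timeNanosPerOp(new Op() {
            public void run() { Integer.parseInt(new String(digits)); }
        }, 1000000);
        System.out.println(perOp + " ns/op");
    }
}
```

Timing the old and new code with the same harness, on the same machine, is what gives you the empirical confidence the interview calls for.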

Assuming that you have done all of the basics, like using the fastest JVM and the fastest hardware, designing the code properly for performance, and optimizing the bottlenecks, you are going to need to get down and dirty to get any further performance out of your code.

One of the things that you need to know is how much time certain operations cost relative to other operations. When you have a choice of doing things several ways, you just naturally start to pick the fastest one. In order to gain this knowledge, you need to take a lot of measurements. Over time, you will start to just know the fastest way of doing a particular operation and use it automatically. This is the only way to truly start writing optimal code.

Thank you for your time Steve.
(End of interview).


Last Updated: 2024-03-29
Copyright © 2000-2024 All Rights Reserved.