Java Performance Tuning

The Roundup July 2004

Back to newsletter 044 contents

Is virtual memory management now becoming an outmoded technology? It is an interesting question and one that was indirectly asked in the forums by someone who was looking for tuning parameters that would lock a VM into memory. It's also a question that Jack and I have discussed. The reason for such a question is that paging and virtual memory will result in your application being scattered throughout real memory. For most processes, this does not represent a real problem as the OS and hardware support the use of virtual memory addressing with translation tables. The translation table works out where things really are. So if the hardware and OS are well adapted to using virtual memory, why would we be suggesting that one stop using it?

One reason is that we are now running a specific type of process, the Java virtual machine, that presents a different memory utilization profile than what more traditional C/C++ applications offer.

One of the first things that I found surprising about Cray supercomputers is that although they run BSD UNIX (UniCOS), the OS and hardware do not support virtual memory. What this means is that every bit of every program running on a Cray is in real memory all of the time. Why would they do this? The reason turns out to be quite simple: swapping in pages from disk would defeat the usefulness of the hardware supporting vector processing. The vector processing hardware and memory banks are highly optimized to produce a single result on every clock tick. They do this by scattering memory addresses throughout the different banks of memory that reside on the machine. If the hardware or OS had to perform a calculation to determine whether an address was in memory, that act in itself would defeat the usefulness of the vector processor. And this says nothing of the cost of swapping in memory.

Cray Research did some studies on program execution. In those studies, they found that program execution is typically highly localized and that the point of localization progresses quite slowly through the program. This information was used to optimize the construction of the instruction buffer (used to cache the next 40 words of instructions to execute). The purpose of this optimization was to ensure that an executable could run without going to RAM to get the next instruction, because accessing RAM was found to be disruptive to the application's performance. So although this optimization has nothing to do with virtual memory, what it does say is that Cray Research found that even accessing RAM was costly enough that it was worth avoiding the trip. Once again, if we compare the cost of moving from RAM to the CPU against moving from disk to RAM to the CPU, we can see why we would not want shared text (the instructions and symbols portion of an executable in a memory image) to be swapped to disk. UniCOS may look and feel like Unix, but under the hood it is quite a different beast altogether. Cray has been able to achieve phenomenal execution speeds by understanding its users' applications and using that knowledge to help design the hardware.

In many respects, the execution profile of a Java Virtual Machine should be far easier to predict than that of an arbitrary application running on a Cray. Take memory management for example. The JVM acquires Java heap space (really a large chunk of C heap) and then uses it to create the data space for objects. Once an object is no longer referenced, it is the job of the JVM to recover the memory so that it might be reused. Now this is where performance problems can creep in. Objects that reference one another don't have to be co-located on the same page in memory. If two related objects are not on the same page, then it is quite possible that one will be in memory and the other will be swapped to disk. Traversing those objects will result in the OS swapping pages in and out of RAM as it tries to resolve references that bounce throughout memory (and hence virtual memory, which may or may not be in real memory). In this scenario, it is quite possible that what should be a simple and quick garbage collection may take a considerable amount of time.

In order to avoid "dead time", we should avoid swapping any part of the JVM out to disk. In other words, we should avoid setting the maximum heap size (using the -Xmx flag) to a value that is greater than the amount of real memory that we have available. By available, we mean what is left after accounting for the memory being consumed by other processes and the memory that will be consumed by the JVM itself. When we take all of this information into consideration, it makes sense to follow the lessons learned by Cray and just turn off virtual memory altogether. The garbage collector will thank you for it.
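The sizing rule above is simple arithmetic, and a minimal sketch of it might look like the following. The class name HeapBudget, its method names, and the memory figures in main are all assumptions made for illustration; the real numbers have to come from measuring your own machine.

```java
/** Hypothetical helper: checks that a -Xmx setting leaves the whole JVM in real memory. */
final class HeapBudget {
    private HeapBudget() {}

    /**
     * Largest heap (in bytes) that still fits in physical memory once other
     * processes and the JVM's own non-heap footprint are subtracted.
     * All three inputs are estimates supplied by the operator.
     */
    static long maxHeapBytes(long physicalBytes, long otherProcessesBytes, long jvmOverheadBytes) {
        long available = physicalBytes - otherProcessesBytes - jvmOverheadBytes;
        return Math.max(0L, available); // never report a negative budget
    }

    /** True if the configured maximum heap stays within the budget. */
    static boolean fitsInRam(long configuredMaxHeap, long budgetBytes) {
        return configuredMaxHeap <= budgetBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Assumed figures: 2 GB of RAM, 512 MB used by other processes, 128 MB of JVM overhead.
        long budget = maxHeapBytes(2048 * mb, 512 * mb, 128 * mb);
        // Runtime.maxMemory() reports roughly what -Xmx was set to for this JVM.
        long configured = Runtime.getRuntime().maxMemory();
        System.out.println("Heap budget (MB):     " + budget / mb);
        System.out.println("Configured heap (MB): " + configured / mb);
        System.out.println("Fits in real memory:  " + fitsInRam(configured, budget));
    }
}
```

Running this with different -Xmx values shows at a glance whether a setting risks pushing part of the heap out to swap, under whatever estimates you plug in.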

From we see a posting containing "-XX:MaxGCPauseMillis=2". It would seem that Sun is slowly revealing more options to control GC in the 1.5/5.0 release. In general, GC seems much improved in the 1.5/5.0 release. Jack and I are looking forward to giving it a workout in the coming days.
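One way to give a GC setting a workout is to read the collector statistics that the java.lang.management API exposes. The sketch below is only an illustration: the class name GcReport and the allocation loop are invented here, and the pause-time flag is a goal the collector tries to meet rather than a guarantee, so the numbers will vary by JVM and collector.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper: sums the collection statistics reported by the JVM's collectors. */
final class GcReport {
    private GcReport() {}

    /** Total accumulated collection time in milliseconds across all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) total += t;
        }
        return total;
    }

    /** Total number of collections across all collectors. */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long c = gc.getCollectionCount(); // -1 when the collector does not report it
            if (c > 0) total += c;
        }
        return total;
    }

    public static void main(String[] args) {
        // Churn through short-lived arrays so the collector has something to do.
        long checksum = 0;
        for (int i = 0; i < 200000; i++) {
            byte[] garbage = new byte[1024];
            checksum += garbage.length;
        }
        System.out.println("Allocated bytes: " + checksum);
        System.out.println("Collections:     " + totalGcCount());
        System.out.println("GC time (ms):    " + totalGcMillis());
    }
}
```

Compiling this and running it once with and once without -XX:MaxGCPauseMillis=2 on the java command line gives a rough before-and-after comparison; a realistic test would use your own application's allocation pattern instead of the toy loop.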

Is there any way within the JVM to detect what the CPU is doing while your application is running? The motivation for asking such a question was so that the developer could use the runtime information to try to reduce the load on the CPU. There were three suggestions offered. The first answer: it's running your program. The second answer was that you'd need to resort to using JNI to acquire that type of information. The more interesting answer was to use the frame rate to determine if the (gaming) application was keeping up. Though this option may not be available in other domains, it certainly stresses the most important issue: is your application handling the assigned workload within the time budgeted to do so?
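The frame-rate idea generalizes to any workload with a time budget: time each unit of work and count how often it overruns. The following is a minimal sketch under that assumption; the class name FrameBudget, the 50 frames-per-second target, and the simulated workload are all made up for illustration and are not taken from the forum thread.

```java
/** Hypothetical helper: tracks how often a unit of work exceeds its time budget. */
final class FrameBudget {
    private final long budgetNanos;
    private long frames;
    private long overBudget;

    /** The budget per frame is derived from the target frame rate. */
    FrameBudget(double targetFps) {
        this.budgetNanos = (long) (1_000_000_000L / targetFps);
    }

    long budgetNanos() {
        return budgetNanos;
    }

    /** Records the measured duration of one frame. */
    void record(long frameNanos) {
        frames++;
        if (frameNanos > budgetNanos) {
            overBudget++;
        }
    }

    /** Fraction of recorded frames that took longer than the budget. */
    double overBudgetRatio() {
        return frames == 0 ? 0.0 : (double) overBudget / frames;
    }

    public static void main(String[] args) {
        FrameBudget budget = new FrameBudget(50.0); // assumed target: 50 frames per second
        double sink = 0;
        for (int frame = 0; frame < 100; frame++) {
            long start = System.nanoTime();
            for (int i = 1; i <= 100000; i++) { // stand-in for the real per-frame work
                sink += Math.sqrt(i);
            }
            budget.record(System.nanoTime() - start);
        }
        System.out.println("Work result (ignore): " + sink);
        System.out.println("Frames over budget:   " + (budget.overBudgetRatio() * 100) + "%");
    }
}
```

If the over-budget ratio climbs, the application is not keeping up with its workload, whatever the CPU happens to be doing underneath.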

On occasion, one can run into a thread that is so rich in information that it just defies a summary or comment in a forum like this. The fact that I keep finding these types of threads at is a testament to the talent and dedication it takes to squeeze out that final bit of performance. This thread was inspired by the Java vs C++ debate that I do hope has now run its course. The initiator of the discussion posted code to calculate Mandelbrot sets. The first and most noticeable result was that running the benchmark with the -client option outperformed runs that used the -server option. Others on the list took up the cause, and they all found that running with -server offered the best performance. A little more discussion and we find out that the original result set came from using Sun's JVM on an AMD Athlon processor, while the expected results were found on an Intel PIV. Needing an explanation, they "broke" into the JIT and found that it was not producing the right set of native instructions for the AMD processor.

Though this result in itself is interesting, the analysis doesn't stop there. The code produced by the JIT was compared to that produced by gcc. The conclusion was that the C code ran slower because the compiler did not produce the best native code for the processor. The killer conclusion from this analysis is that programs written in Java may have better performance because the JIT CAN adjust itself for the underlying hardware and this cannot happen with C/C++. But, we're not done as the thread continues to reveal even more!

It would seem that the server JIT relies on Streaming SIMD Extensions 2 (SSE2). If your processor does not include this feature, then you may experience better performance using the -client option. But as is always the case, assumptions should be backed up by measurements. By the way, the thread continues on with other tidbits.
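To back up the assumption with a measurement, a small Mandelbrot timing loop can be run once under each option. The code below is not the benchmark from the thread, which is not reproduced in this column; it is a sketch written for illustration, and the class name MandelbrotBench, the grid size, and the iteration limit are arbitrary choices.

```java
/** Hypothetical micro-benchmark: times a Mandelbrot escape-time computation. */
final class MandelbrotBench {
    private MandelbrotBench() {}

    /** Number of iterations before the point c = cr + ci*i escapes, capped at maxIter. */
    static int iterations(double cr, double ci, int maxIter) {
        double zr = 0.0;
        double zi = 0.0;
        int i = 0;
        while (i < maxIter && zr * zr + zi * zi <= 4.0) {
            double next = zr * zr - zi * zi + cr;
            zi = 2.0 * zr * zi + ci;
            zr = next;
            i++;
        }
        return i;
    }

    /** Sums the iteration counts over a size x size grid covering [-2,1] x [-1.5,1.5]. */
    static long render(int size, int maxIter) {
        long total = 0;
        for (int y = 0; y < size; y++) {
            double ci = -1.5 + 3.0 * y / size;
            for (int x = 0; x < size; x++) {
                double cr = -2.0 + 3.0 * x / size;
                total += iterations(cr, ci, maxIter);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Warm-up runs give the JIT a chance to compile the hot loop before timing starts.
        for (int i = 0; i < 5; i++) {
            render(200, 500);
        }
        for (int run = 1; run <= 5; run++) {
            long start = System.nanoTime();
            long checksum = render(400, 1000);
            long millis = (System.nanoTime() - start) / 1000000L;
            System.out.println("run " + run + ": " + millis + " ms (checksum " + checksum + ")");
        }
    }
}
```

On JVMs that offer both compilers, running the compiled class with java -client MandelbrotBench and then java -server MandelbrotBench and comparing the later runs gives a first indication; some JVMs may accept one of these options and ignore it, so check which compiler is actually in use before drawing conclusions.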

Every now and then I arrive at a site that restricts employees' access to certain websites. Quite often, is on that list. All I can say is that it is a real shame, because although games may be what brings the group of developers at together, it is their love of discovering how things work that makes this site unique. Keep up the good work!

Kirk Pepperdine.

Last Updated: 2017-03-29
Copyright © 2000-2017 All Rights Reserved.
All trademarks and registered trademarks appearing on are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.