Java Performance Tuning
Tips February 2008
Back to newsletter 087 contents
Memory Leaks in Java (Page last updated March 2007, Added 2008-02-27, Author Gregg Sporar, Publisher Sun). Tips:
- As a first step in handling an OutOfMemoryError (OOME), try increasing the heap size.
- Turn on -verbose:gc and analyse the GC output.
- Try visualgc, which gives a visual display of the same statistics that are available from -verbose:gc.
- jconsole is Gregg's preferred utility for looking at heap memory activity.
- It is difficult to correlate GC activity with application activity using GC monitoring tools.
- JDK 6 now includes a stack trace with the OOME, but the code in the stack trace might not be the problem: it could just be an allocation that happened to fail after something else had used up all the memory.
- JDK 1.5.0_07 and up: -XX:+HeapDumpOnOutOfMemoryError produces an hprof heap dump, which can be read with various tools: jhat, SAP Memory Analyzer, and various commercial profilers.
- jconsole and jmap can also generate a dump on demand (not available in all versions).
- Testing with tiny datasets can hide memory demands and memory leaks. Always ensure you test with representative datasets.
- Use a memory profiler (e.g. YourKit, JProbe, NetBeans profiler).
- Find the largest objects by "retained size" - how much space the object is keeping alive. These can indicate the root objects that are causing a memory leak.
- You usually need application code knowledge to track down which objects are causing a memory leak, as you need to know roughly which objects should be there and how many of them there should be.
- Use generation count to look for objects that may be causing memory leaks. Every object on the heap has an age, the number of garbage collections it has survived, called its generation. The number of different generations among the instances of a class is called the generation count. A class with an increasing generation count is a prime suspect in a memory leak: it has instances of many different ages, and the number of different ages keeps growing.
- Taking a snapshot of the heap and walking it to look for problems is a fairly low-impact mechanism (except while the snapshot is being generated). However, it can lead to information overload (looking for a needle in a haystack), and probably needs knowledge of the code to be useful, because the snapshot does not show which piece of code made one object reference another.
- Instrumenting the code has some overhead, but can quickly show where the problem is. However, it does not show relationships between objects.
- The permanent generation is used to store classes, and its size is controlled with different flags from the rest of the heap (-XX:PermSize for the initial size, -XX:MaxPermSize for the maximum).
- Profilers generally have problems identifying issues in the perm gen and with classloaders, though an hprof heap dump will show the classes and classloaders.
- To track down memory leaks from class objects being retained, you can undeploy a class (or set of classes) and examine the heap dump: if the class is still present even though no instances of it exist, something is still holding on to the class or its classloader.
- jhat and SAP Memory Analyzer support OQL (Object Query Language), which lets you query large heap dumps to pinpoint leaks in large datasets.
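The tips above mention generating a heap dump on demand with jconsole or jmap. As a minimal sketch, assuming a HotSpot-based JVM and Java 7 or later, an application can also request an hprof dump itself through the HotSpotDiagnosticMXBean, the same diagnostic MBean whose dumpHeap operation jconsole exposes. The class name HeapDumper and the default file name are hypothetical. The resulting file can be opened with the same tools as a -XX:+HeapDumpOnOutOfMemoryError dump.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;

// Sketch only: HotSpot/OpenJDK-specific, requires Java 7+ for getPlatformMXBean(Class).
public class HeapDumper {

    /**
     * Writes an hprof heap dump to the given path.
     *
     * @param path     output file; it should not already exist, and recent JDKs
     *                 expect the ".hprof" extension
     * @param liveOnly if true, only objects that are still reachable are dumped
     */
    public static void dumpHeap(String path, boolean liveOnly) throws IOException {
        HotSpotDiagnosticMXBean diagnostics =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        diagnostics.dumpHeap(path, liveOnly);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default file name, used when no path is given.
        String path = args.length > 0 ? args[0] : "app-heap.hprof";
        dumpHeap(path, true);
        System.out.println("Heap dump written to " + path);
    }
}
```

For example, `java HeapDumper /tmp/app.hprof` writes a dump of the reachable objects, which can then be loaded into jhat or a profiler.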
Using Callable to Return Results From Runnables (Page last updated December 2007, Added 2008-02-27, Author John Zukowski, Publisher Sun). Tips:
- The Callable interface has a call() method which returns a value (of Callable's generic type) from code run in another thread. The only requirement of call() is that the value is returned at the end of the call, after all the calculations needed to determine it have completed.
- ExecutorService.submit() accepts a Callable, executes it and returns a Future. The Future's get() method then blocks until the task has completed, which is equivalent to calling Thread.join() and then accessing data calculated in the other thread.
- When get() is called, it returns the value immediately if the task has already completed; otherwise it waits until the value has been generated. Calling get() multiple times does not cause the task to be rerun.
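A minimal sketch of the submit()/get() pattern described above, using a hypothetical summing task. The CALLS counter exists only to show that calling get() twice does not run call() a second time. An anonymous class is used so the sketch also compiles on Java 5; on Java 8 and later a lambda works equally well.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class CallableDemo {

    // Counts how many times call() has actually executed.
    public static final AtomicInteger CALLS = new AtomicInteger();

    // Hypothetical task: sums the integers 1..n in another thread.
    static Callable<Long> sumTo(final long n) {
        return new Callable<Long>() {
            @Override
            public Long call() {
                CALLS.incrementAndGet();
                long total = 0;
                for (long i = 1; i <= n; i++) {
                    total += i;
                }
                return total; // returned only once the whole calculation is done
            }
        };
    }

    public static long[] runTwice(long n) throws InterruptedException, ExecutionException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<Long> future = executor.submit(sumTo(n));
            long first = future.get();   // blocks until call() has returned
            long second = future.get();  // returns the stored result; the task is not rerun
            return new long[] {first, second};
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        long[] results = runTwice(100);
        System.out.println(results[0] + " " + results[1] + " calls=" + CALLS.get());
        // prints "5050 5050 calls=1"
    }
}
```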
Our Collectors (Page last updated February 2008, Added 2008-02-27, Author Jon Masamitsu, Publisher Sun). Tips:
- The GC algorithms currently (Java 6) available are: "Serial"; "ParNew"; "Parallel Scavenge"; "Serial Old"; "CMS"; "Parallel Old".
- "Serial" GC is a stop-the-world, young generation, copying collector which uses a single GC thread.
- "Parallel Scavenge" is a stop-the-world, young generation, copying collector which uses multiple GC threads.
- "ParNew" is a stop-the-world, young generation, copying collector which uses multiple GC threads. It differs from "Parallel Scavenge" in that it has enhancements that make it usable with CMS. For example, "ParNew" does the synchronization needed so that it can run during the concurrent phases of CMS.
- "Serial Old" is a stop-the-world, old generation, mark-sweep-compact collector that uses a single GC thread.
- "CMS" is a mostly concurrent, old generation, low-pause collector.
- "Parallel Old" is an old generation, compacting collector that uses multiple GC threads.
- -XX:+UseSerialGC sets the JVM to use the "Serial" and the "Serial Old" GC algorithms
- -XX:+UseParNewGC sets the JVM to use the "ParNew" and the "Serial Old" GC algorithms
- -XX:+UseConcMarkSweepGC sets the JVM to use the "ParNew" and the "CMS" and the "Serial Old" GC algorithms. "CMS" is used most of the time to collect the tenured generation. "Serial Old" is used when a concurrent mode failure occurs.
- -XX:+UseParallelGC sets the JVM to use the "Parallel Scavenge" and the "Serial Old" GC algorithms
- -XX:+UseParallelOldGC sets the JVM to use the "Parallel Scavenge" and the "Parallel Old" GC algorithms
- If you want to use GC ergonomics and parallel GC you must use UseParallelGC (and UseParallelOldGC); UseParNewGC will not work with ergonomics.
- "ParNew" and "Parallel Old" do not work together.
- To use "CMS" with "Serial", use -XX:+UseConcMarkSweepGC -XX:-UseParNewGC. Don't use -XX:+UseConcMarkSweepGC and -XX:+UseSerialGC.
- Sun is working on a new garbage collector (currently called Garbage First, or G1) which aims to be best of breed, with predictable GC pauses.
- Compacting and copying GCs do not keep track of all objects, only the live objects, which are then moved. Everything else is space that can be reused, which makes them very efficient, but not complete (they can consider some objects alive when they are really dead).
- When the code cache is full, the JVM stops compiling further methods. Code is evicted when classes are unloaded and when methods are invalidated. Filling the heap with lots of classes that are used only once and never unloaded can fill the code cache and permanent generation.
- Garbage collection can match the performance of explicit memory management when it has about 5 times as much memory available, but becomes slower than explicit memory management when it has less memory than that (http://www-cs.canisius.edu/~hertzm/gcmalloc-oopsla-2005.pdf).
- Garbage collection performs very badly on a swapping system (orders-of-magnitude worse than explicit memory management in this worst case scenario).
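To check which collectors a given set of flags actually selected, a sketch like the following lists the collectors registered with the running JVM through the standard java.lang.management API. The reported names are HotSpot's bean names rather than the algorithm names used above (for example "PS Scavenge" for "Parallel Scavenge"), and several of the flags above have since been deprecated or removed in later JDKs (CMS, for example, was removed in JDK 14). The class name ShowCollectors is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class ShowCollectors {

    // Returns the names of the collectors the running JVM has registered.
    public static List<String> collectorNames() {
        List<String> names = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // One line per collector: name, number of collections so far, accumulated time in ms.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + " collections=" + gc.getCollectionCount()
                    + " timeMs=" + gc.getCollectionTime());
        }
    }
}
```

For example, run `java -XX:+UseParallelGC ShowCollectors` and compare the output with a run using a different collector flag.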
My advice on JVM heap tuning, keep your fingers off the knobs! (Page last updated February 2008, Added 2008-02-27, Author Kirk Pepperdine, Publisher kodewerk). Tips:
- Measure, don't guess.
- The primary goal of all performance tuning exercises should be to maximize the end-user experience within the given resource constraints.
- Minimize response times to the end user, which means you need to be measuring user response times.
- If the user response times are within tolerance then there is no need to change anything.
- GC efficiency is defined as the percentage of the application's total run time during which GC is running to the exclusion of your application.
- The best way to calculate GC efficiency is to collect the logs produced by the -verbose:gc switch (-Xloggc:logfile.name is preferred for the Sun JVM) and feed them through a tool such as HPJTune.
- If the GC efficiency is greater than 10%, you have a case for proceeding with tuning. If it is less than 10% but greater than 5%, tuning may help but it might not give you the boost you were hoping for. Anything less than 5% and you're most likely wasting your time.
- The less you fiddle with parameters, the better GC ergonomics will work.
- The most likely way to improve GC throughput and decrease GC pause times is to configure the maximum memory using the -Xmx flag.
- If you give the system too much memory, GC frequency will fall and GC efficiency will improve, but you will start to experience long GC pause times because each collection has to deal with the oversized heap.
- GC ergonomics will try to adjust things to best cope with the current situation, unless you pin it down to specific values using command-line settings, in which case it has to struggle within the given limits.
- You don't want objects to be promoted prematurely, as removing them from the old generation requires a mark, sweep and compaction in an expensive full GC. The dilemma is that settings which keep short-lived objects in the young generation (where they are quickly collected) will also keep long-lived objects out of the old generation (which adds useless copying overhead).
- Survivor ratio and tenuring threshold are primary tuning parameters to tune how long objects stay in the young generation.
- The measurement to take is the age distribution of your objects in the survivor spaces (which -XX:+PrintTenuringDistribution will log). The default threshold for Sun is 31. If objects are leaking into the old generation and the age distribution doesn't include old objects, the only conclusion is that you are experiencing premature promotion.
- In tuning the GC, I've rarely needed to set anything other than max memory and the survivor ratio.
- A bad guess on GC parameters can cripple GC ergonomics.
- Make sure you really measure the behavior you target for the real world, and don't just cover up the behavior long enough to pass a relatively short test.
- When you use normal stop-the-world GC, you want full GCs to be rare.
- If you use a "mostly concurrent" GC (such as CMS), you want to avoid compaction and, more importantly, avoid the concurrent collector failing to keep up with the mutation rate under load and causing a full pause.
- Don't trust a test unless you've seen it do 10 *compacting* full GCs during the test, or have proven to yourself that for the duration of time you need the system to run you will *never* see a compacting full GC.
- Testing with the 1.6 JVM suggests that erratic GC behavior continues for about 10 minutes after startup before GC ergonomics settles into a steady state.
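The GC efficiency calculation and the 10%/5% thresholds above can be sketched as follows. This is only a rough in-process approximation, not a substitute for analysing -verbose:gc logs: it sums the cumulative collection times reported by the GarbageCollectorMXBeans and divides by JVM uptime, and for concurrent collectors the reported time may not correspond to time the application was actually paused. The class and method names, and the handling of the exact 5% and 10% boundaries, are assumptions.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcOverhead {

    // Percentage of elapsed time spent in GC (what the tips above call "GC efficiency").
    public static double overheadPercent(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            throw new IllegalArgumentException("uptime must be positive");
        }
        return 100.0 * gcMillis / uptimeMillis;
    }

    // Thresholds from the tips: over 10% worth tuning, 5-10% maybe, below 5% probably not.
    public static String verdict(double percent) {
        if (percent > 10.0) {
            return "tune";
        } else if (percent > 5.0) {
            return "maybe";
        }
        return "leave alone";
    }

    // Rough estimate from the running JVM's own counters.
    public static double currentOverheadPercent() {
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) { // -1 means the time is not available for this collector
                gcMillis += t;
            }
        }
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        return overheadPercent(gcMillis, Math.max(uptime, 1));
    }

    public static void main(String[] args) {
        double pct = currentOverheadPercent();
        System.out.printf("GC overhead: %.2f%% -> %s%n", pct, verdict(pct));
    }
}
```

Given the point about erratic behavior in the first minutes after startup, such a figure is only meaningful once the application has run long enough to reach a steady state.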
Just what are the important performance factors for Virtualization? (Page last updated November 2007, Added 2008-02-27, Author Richard McDougall, Publisher richardmcdougall). Tips:
- The basics of virtualization efficiency require knowledge of: power costs; CPU efficiency; CPU performance versus cost.
- Extended knowledge of virtualization efficiency requires measuring: throughput; latency; scalability; memory efficiency; throughput relative to power; space-performance (per rack unit); agility (time taken to deploy a new application).
- An assessment of virtualization efficiency should also cover the observability of the system and its availability (e.g. how much downtime is needed to change various aspects of a VM/OS combination).
The Law of the Corrupt Politician (Page last updated November 2007, Added 2008-02-27, Author Dr. Heinz M. Kabutz, Publisher javaspecialists). Tips:
- The += operation is not atomic, so it can corrupt data if it is used to update the same variable concurrently.
- Where a variable can be updated from multiple threads, it is probably worth checking for data integrity before updates proceed.
- Synchronizing on "this" is usually a bad idea. There must be at least one other place in your code with a reference to "this" (otherwise the object would be garbage collected), and if that other code also synchronizes on it, you could see all sorts of strange liveness issues.
- To avoid visibility problems with a concurrently updated variable, synchronize the access method or make the variable volatile (volatile fixes visibility only; it does not make += atomic).
- You cannot interrupt a thread that is blocked waiting for a synchronized lock. If you want to be able to interrupt a deadlock, use a Java 5 lock such as ReentrantLock and acquire it with lockInterruptibly().
- A nice feature of the Java 5 locks is that you can differentiate between reads and writes. This is particularly useful when you have lots of threads reading concurrently, but only a few writing. The reads do not block each other, but the writes have exclusive access to the critical section.
- Using the ReadWriteLock is slower than simply using the synchronized keyword, but provides much more flexibility, and allows you much finer concurrency control.
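A sketch tying several of these tips together, using a hypothetical vote tally in keeping with the article's title. The read-modify-write that += would perform happens under the exclusive write lock, reads share a read lock so they do not block each other, the lock is a private field rather than "this", and writers acquire the lock with lockInterruptibly() so a waiting thread can be interrupted. The class VoteCounter and its methods are illustrative only, not code from the article.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical vote tally: many readers, few writers.
public class VoteCounter {
    // Private lock: no outside code can acquire it, unlike synchronizing on "this".
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<String, Integer> votes = new HashMap<String, Integer>();

    // Write: the read-modify-write runs under the exclusive write lock.
    public void addVotes(String candidate, int count) throws InterruptedException {
        lock.writeLock().lockInterruptibly(); // a thread waiting here can be interrupted
        try {
            Integer current = votes.get(candidate);
            votes.put(candidate, (current == null ? 0 : current) + count);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Read: shared lock, so concurrent readers do not block each other.
    public int getVotes(String candidate) {
        lock.readLock().lock();
        try {
            Integer current = votes.get(candidate);
            return current == null ? 0 : current;
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        final VoteCounter counter = new VoteCounter();
        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(new Runnable() {
                public void run() {
                    try {
                        for (int i = 0; i < 10000; i++) {
                            counter.addVotes("alice", 1);
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt(); // preserve interrupt status
                    }
                }
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            thread.join();
        }
        System.out.println(counter.getVotes("alice")); // prints 40000
    }
}
```

With an unguarded `count += 1` on a shared field, the same four threads could lose updates and print less than 40000.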
Last Updated: 2018-05-28
Copyright © 2000-2018 Fasterj.com. All Rights Reserved.
All trademarks and registered trademarks appearing on JavaPerformanceTuning.com are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. JavaPerformanceTuning.com is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.