Java Performance Tuning
Java(TM) - see bottom of page
Tips August 2011
Back to newsletter 129 contents
Proposed change to the GC efficiency calculation (Page last updated July 2011, Added 2011-08-29, Author Kirk Pepperdine, Publisher kodewerks). Tips:
- The standard calculation for GC efficiency is: [1 - (total GC pause time / application run time)]. This gives a reasonable measure of the percentage of available run time in which the application, rather than the garbage collector, is doing productive work.
- A GC efficiency measure of less than 95% (for an evenly busy application) is a signal that the collector is working too hard. This can often be fixed by tuning JVM parameters.
- The standard calculation for GC efficiency using pause times ignores "stolen" CPU time from concurrent collections.
- A calculation for GC efficiency that adds in CPU time used by the concurrent collections gives a better indication of total resources taken away from the application by the garbage collector.
- A concurrent CMS phase logged with PrintGCDetails looks like "13.071: [CMS-concurrent-mark: 0.068/0.084 secs] [Times: user=0.13 sys=0.01, real=0.08 secs]". The CPU time is given by the first number after the CMS tag, here 0.068 secs; the subsequent number is the elapsed time, so here 0.068/0.084 (about 81%) is the proportion of total CPU used by the GC during this phase.
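The two measures above can be sketched in Java as follows. This is a minimal illustration, not the article's exact formula: the class and method names are hypothetical, the adjusted measure simply adds concurrent CPU seconds to pause seconds, and the 4-second and 100-second totals in main are made-up figures.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GcEfficiency {
    // Captures the two timings of a CMS concurrent phase record, e.g.
    // "[CMS-concurrent-mark: 0.068/0.084 secs]" gives cpu=0.068, elapsed=0.084.
    static final Pattern CMS_PHASE =
        Pattern.compile("\\[CMS-concurrent-[a-z-]+: (\\d+\\.\\d+)/(\\d+\\.\\d+) secs\\]");

    /** Standard measure: 1 - (total GC pause time / application run time). */
    static double pauseOnlyEfficiency(double pauseSecs, double runSecs) {
        return 1.0 - pauseSecs / runSecs;
    }

    /** Assumed adjustment: concurrent CPU time is counted alongside pause time. */
    static double adjustedEfficiency(double pauseSecs, double concurrentCpuSecs, double runSecs) {
        return 1.0 - (pauseSecs + concurrentCpuSecs) / runSecs;
    }

    /** CPU seconds of a CMS concurrent phase record, or 0.0 if the line is not one. */
    static double concurrentCpuSecs(String logLine) {
        Matcher m = CMS_PHASE.matcher(logLine);
        return m.find() ? Double.parseDouble(m.group(1)) : 0.0;
    }

    public static void main(String[] args) {
        String line = "13.071: [CMS-concurrent-mark: 0.068/0.084 secs] "
                    + "[Times: user=0.13 sys=0.01, real=0.08 secs]";
        double cpu = concurrentCpuSecs(line);
        // Hypothetical totals for illustration: 4 s of pauses in a 100 s run.
        System.out.println("pause-only: " + pauseOnlyEfficiency(4.0, 100.0));
        System.out.println("adjusted:   " + adjustedEfficiency(4.0, cpu, 100.0));
    }
}
```

In practice the concurrent CPU seconds would be summed over every concurrent phase record in the log before being passed to adjustedEfficiency.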
Memory Usage of Maps (Page last updated June 2011, Added 2011-08-29, Author Dr. Heinz M. Kabutz, Publisher The Java Specialists' Newsletter). Tips:
- ConcurrentHashMap splits its bucket table into a number of segments, thus reducing the probability of contention when modifying the map. It scales to about 50 cores; above 50 cores, you might be better off using Cliff Click's Highly Scalable Libraries.
- -XX:+UseCompressedOops can speed up your application substantially if you are using a 64 bit machine.
- Empty default hash maps vary significantly in size. Smallest to largest are: Hashtable, HashMap, SynchronizedMap, NonBlockingHashMap, ConcurrentHashMap. Tuning using the constructor parameters allows you to reduce the empty size.
- NonBlockingHashMap scales to hundreds of cores, whereas Hashtable would have contention with two.
- Only change your code if you have measured and verified that the change actually fixes a bottleneck.
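As a sketch of the constructor tuning mentioned above, the following creates maps sized for only a handful of entries. The class name and the chosen values (capacity 2, load factor 0.75, concurrency level 1) are illustrative assumptions, not recommendations from the article; the constructors themselves are the standard java.util ones. Note that since Java 8 ConcurrentHashMap no longer uses segments and treats the concurrency level only as a sizing hint, so the saving applies mainly to the older implementation the tips describe.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class SmallMaps {
    /** HashMap with a small initial capacity instead of the default. */
    static <K, V> Map<K, V> smallHashMap() {
        return new HashMap<>(2, 0.75f);
    }

    /** Hashtable with a small initial capacity instead of the default. */
    static <K, V> Map<K, V> smallHashtable() {
        return new Hashtable<>(2, 0.75f);
    }

    /** ConcurrentHashMap sized for few entries and a single updating thread. */
    static <K, V> Map<K, V> smallConcurrentMap() {
        // Arguments: initialCapacity, loadFactor, concurrencyLevel
        // (the estimated number of concurrently updating threads).
        return new ConcurrentHashMap<>(2, 0.75f, 1);
    }

    public static void main(String[] args) {
        Map<String, Integer> map = smallConcurrentMap();
        map.put("a", 1);
        map.put("b", 2);
        // The map still grows on demand; only its empty footprint is reduced.
        System.out.println(map.size());
    }
}
```

Whether this matters depends on how many mostly-empty maps the application holds, which is something to measure before changing code.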
How Garbage Collection Differs in the Three Big JVMs (Page last updated May 2011, Added 2011-08-29, Author Michael Kopp, Publisher JDJ). Tips:
- HotSpot uses a generational heap. The young generation uses a copying collection: objects are created in Eden and, when that is full, are copied to survivor spaces and eventually to old (tenured) space. No fragmentation occurs in the young generation, and this is a fast collection as long as nearly all objects die in the young generation.
- HotSpot uses a generational heap. The old (tenured) generation uses mark and sweep algorithms; these take longer the more objects are alive, so can be expensive. They also produce fragmentation, which leads either to wasted space or to more overhead in defragmentation.
- With HotSpot generational heaps, the young generation collection is very fast and the old (tenured) generation collection can be quite slow, so the single most important optimization for garbage collection in the HotSpot JVM is sizing the generations to avoid objects reaching the old generation.
- HotSpot tenured garbage collection (GC) compacts depending on the GC algorithm in use: serial compacting GC performs compaction for every GC; parallel compacting GC performs compaction for every GC but only for parts of the heap (where it thinks the effort is worth it); concurrent GC does not compact at all, and relies on failover to serial or parallel for compaction when the heap is full enough.
- The second most important tuning option for HotSpot garbage collection, after sizing the generations, is to choose the garbage collection algorithm to be used by each generation.
- JRockit can use a generational heap or a continuous heap. Choosing which to use is the most important tuning choice for JRockit garbage collection.
- In the JRockit generational heap the young generation space is called the Nursery; new objects are placed in a Keep Area, their first GC moves them out of it, and their next GC promotes them to tenured.
- JRockit continuous heap - the default setting - can result in better performance for throughput oriented batch jobs.
- Applications running on JRockit which are response time oriented should choose the low pause time garbage collection mode or a generational garbage collection strategy.
- JRockit does compacting for all Tenured Generation garbage collections, using an incremental mode for portions of the heap. This compaction is tuneable.
- The size of a thread local area (TLA) can be tuned in JRockit (and possibly in HotSpot, though there is no documentation for that); a larger TLA can be beneficial where multiple threads allocate a lot of objects, but can lead to more fragmentation and takes more memory for each thread.
- The IBM JVM has a (by default) continuous heap or an optional generational heap similar to the HotSpot one with Allocate and Survivor space (which are switched on GC) in the Nursery (young) generation.
- The IBM JVM has a tuneable compaction phase in the old (tenured) generation, which can be a full compaction or incremental, selectable by configuration.
- In Java 7, HotSpot also comes with G1 (Garbage First), a region-based, incremental garbage collection algorithm.
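Because the heap layout and collectors differ between JVMs and between GC settings, it can help to check what the running JVM actually uses. The sketch below relies only on the standard java.lang.management API; the class and method names are hypothetical, and the pool and collector names it prints depend on the JVM vendor, version, and flags in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

class HeapLayout {
    /** Names of the heap memory pools (for example the young and old generation spaces). */
    static List<String> heapPoolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    /** One line per collector: its name, the pools it manages, and its totals so far. */
    static List<String> collectorSummaries() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName() + " -> " + String.join(", ", gc.getMemoryPoolNames())
                + " (collections=" + gc.getCollectionCount()
                + ", timeMs=" + gc.getCollectionTime() + ")");
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println("Heap pools: " + heapPoolNames());
        for (String line : collectorSummaries()) {
            System.out.println(line);
        }
    }
}
```

Running it under different GC settings shows which collector is attached to which generation, which is a useful sanity check before tuning either one.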
Garbage collection in WebSphere Application Server V8, Part 2: Balanced garbage collection as a new option (Page last updated August 2011, Added 2011-08-29, Author Ryan Sciampacone, Peter Burka, Aleksandar Micic, Publisher IBM). Tips:
- IBM JVM has a new balanced garbage collection policy -Xgcpolicy:balanced which targets evening out pause times and reducing the overhead of some of the costlier garbage collection operations. This dynamically selects heap areas to collect based on the biggest bang-for-buck while keeping pause time short.
- Important garbage collection costs come from: the total size of live data; the amount of heap fragmentation; the rate of object allocation.
- Techniques used to improve garbage collection algorithms include: using multiple generations to target different garbage collection algorithms at objects with different lifetimes; increasing parallelism; increasing concurrency; moving to more incremental collections to reduce the individual collection pause times.
- The IBM JVM's new balanced garbage collection policy is a region-based garbage collector which evens out pause times across GC operations on a best-effort basis rather than with a real-time guarantee.
- The IBM JVM balanced garbage collection policy has the same tuning options as other collectors; primary tuning is sizing the heap spaces with the -Xmn[sx] options.
- The primary goal of tuning eden space should be to contain the objects allocated for all transactions within the system at any one time. This means that the amount of data surviving from eden space in a system under regular load should be significantly less than the actual size of eden space itself.
- Tuning the heap should be iterative: select your heap parameters; run the application under a stress load; gather GC logs; analyse whether performance is better or worse, using the GC logs to identify possible bottlenecks and worthwhile changes in configuration; repeat (ideally changing only one parameter at a time).
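The eden sizing goal above can be checked from figures read out of the GC logs on each tuning iteration. The sketch below is a hypothetical helper, not part of any IBM tooling: it assumes you have already extracted the bytes surviving each eden collection, and the sizes in main are made-up values.

```java
class EdenSurvival {
    /** Fraction of eden that survived one collection (0.0 to 1.0). */
    static double survivalRatio(long survivedBytes, long edenBytes) {
        if (edenBytes <= 0) {
            throw new IllegalArgumentException("edenBytes must be positive");
        }
        return (double) survivedBytes / edenBytes;
    }

    /** Mean survival ratio over the collections sampled from one test run. */
    static double meanSurvivalRatio(long[] survivedBytes, long edenBytes) {
        if (survivedBytes.length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (long survived : survivedBytes) {
            sum += survivalRatio(survived, edenBytes);
        }
        return sum / survivedBytes.length;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Hypothetical figures: a 512 MB eden and three sampled collections.
        long[] survived = {20 * mb, 35 * mb, 28 * mb};
        System.out.println("mean survival ratio: " + meanSurvivalRatio(survived, 512 * mb));
    }
}
```

Comparing this ratio between runs that differ in a single parameter gives one concrete number to judge whether a change to eden size helped.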
Scalability Rules: 50 Principles for Scaling Web Sites (Page last updated June 2011, Added 2011-08-29, Author Martin L. Abbott, Michael T. Fisher, Publisher informIT). Tips:
- Design and build your system to scale to the potential workload, and not hugely beyond it: you waste resources if your system can scale to much more than it needs to.
- Don't access more data than needed. For example don't do a 'select * from X' only to then process the results and throw away most, when a 'select A,B,C from X where Y And Z' would give you just the data you need.
- Easy to follow solutions increase the scalability of your organization and your solution.
- Systems that work too hard increase your cost and limit your ultimate size. Systems that make users work too hard limit how quickly you will grow your business.
- Design for 20x capacity, implement for 3x capacity, deploy for ~1.5x capacity.
- Reduce unnecessary features: simplify scope using the 80/20 rule - target the 80% of your benefit that is achieved from 20% of the work.
- Simplify design by thinking about cost effectiveness and scalability. Complexity elimination is about doing less work, and design simplification is about doing that work faster and easier.
- Simplify implementation by leveraging the experience of others - use pre-existing solutions rather than building your own.
- Reduce DNS lookups - they are time-consuming.
- Fewer (and smaller) objects usually correspond to more scalable, faster systems.
- Homogeneous networks tend to perform better as there is more compatibility between segments, so fewer things to go wrong.
Fork and Join: Java Can Excel at Painless Parallel Programming Too! (Page last updated July 2011, Added 2011-08-29, Author Julien Ponge, Publisher Oracle). Tips:
- Basic Thread/Runnable based threads need correct and consistent behavior with respect to shared mutable objects, avoiding incorrect read/write operations while not creating deadlocks induced by race conditions on lock acquisitions. This is fine for simple examples, but such code can quickly become error-prone with increasing complexity.
- A common multi-threading pitfall is to use synchronized to provide mutual exclusion over large pieces of code. While this leads to thread-safe code, it also yields poor performance because the exclusion lasts too long and so limits parallelism.
- java.util.concurrent provides: Executors which provide thread pooling and asynchronous execution strategies; Thread-safe queues; Fine-grained specification of time-out delays; Many synchronization patterns beyond the mutual exclusion provided by low-level synchronized blocks; Efficient, concurrent data collections; Atomic variables; A range of locks.
- An executor service must be shut down. If it is not shut down, the Java Virtual Machine will not exit when the main method does, because there will still be active threads around.
- "map and reduce" are divide and conquer algorithms: you split the data space to be processed by an algorithm into smaller, independent chunks (map); then the result of processing each chunk is collected together to form the final result (reduce).
- ForkJoinPool executor is dedicated to running instances implementing ForkJoinTask, which support the creation of subtasks (fork()) plus waiting for the subtasks to complete (join()). These are not suitable for I/O dependent or cooperating tasks (i.e. those that will wait on external resources), as each task occupies a thread until done; instead they should be used for CPU bound tasks.
- From Java 7 the try statement has automatic resource management (try-with-resources): any class implementing java.lang.AutoCloseable can be declared in the opening of a try block, and it will be properly closed when execution leaves the try block.
- A fork/join task should perform enough computation to outweigh the fork/join thread pool and task management overhead; splitting the work too finely hampers the efficiency of the approach.
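The fork/join points above can be illustrated with a CPU-bound array sum. This is a minimal sketch: the class name SumTask and the THRESHOLD value are assumptions chosen for illustration, and a suitable threshold for real work has to be found by measurement. The pool is shut down in a finally block so that its threads do not linger, in line with the tip about shutting down executor services.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class SumTask extends RecursiveTask<Long> {
    // Hypothetical cut-off: ranges of this size or smaller are summed directly,
    // so each task does enough work to outweigh the task management overhead.
    static final int THRESHOLD = 10_000;

    private final long[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    SumTask(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = from + (to - from) / 2;
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();                     // run the left half asynchronously
        long rightSum = right.compute(); // compute the right half in this thread
        return left.join() + rightSum;   // wait for the left half and combine
    }

    static long parallelSum(long[] data) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new SumTask(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        System.out.println(parallelSum(data));
    }
}
```

Computing one half directly instead of forking both halves keeps the current worker thread busy and avoids creating a task only to wait on it immediately.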
Last Updated: 2021-03-29
Copyright © 2000-2021 Fasterj.com. All Rights Reserved.
All trademarks and registered trademarks appearing on JavaPerformanceTuning.com are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. JavaPerformanceTuning.com is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.
RSS Feed: http://www.JavaPerformanceTuning.com/newsletters.rss