Java Performance Tuning
Tips June 2014
Back to newsletter 163 contents
Introduction of Java GC Tuning and Java Mission Control (Page last updated December 2013, Added 2014-06-29, Author Leon Chen, Publisher Oracle). Tips:
- If an object is going out of scope, there is no benefit to nulling the reference to it first
- The performance tuning process is: (load) Test; Monitor; Measure/Profile; Tune; repeat until performance is achieved.
- Enable GC logging in production; the overhead is low to non-existent, and the JVM has supported GC log rolling since 1.6.0_34.
- Minimum recommended GC logging flags are: -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintGCDetails.
- If you want to rotate the GC logs: -Xloggc:<file> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=<number> -XX:GCLogFileSize=<size>
- The max heap on 32-bit Windows is 1.8GB, though you'll struggle to get over 1.5GB except on a completely empty system or by using the /3GB switch and the Windows large address space. With JRockit it's possible to achieve 2.8GB with that combination. See http://download.boulder.ibm.com/ibmdl/pub/software/dw/jdk/diagnosis/dw3gbswitch3.pdf
- The young generation size determines the frequency of minor GCs and the number of objects reclaimed in the minor GC.
- The old generation should be large enough to hold the steady-state size of the application - but you should aim to minimize the frequency of old generation GCs.
- The JVM memory footprint should be less than 90% of RAM available on the system to avoid any swapping - swapping will kill performance.
- A good GC tuning target is to avoid old generation GCs and maximize the number of objects reclaimed in the young generation GCs.
- To calculate your live data size, induce a full GC (probably best to do this in test) and see the heap size after that.
- A rule of thumb is to size your max heap at 4 times the live dataset size, and set the young gen to be a third of the max heap.
- Total heap size is the most important tuning decision, followed by young generation size.
- A typical sign of a memory leak is for the heap used after (full) GCs to gradually increase over time.
- SoftReferences can be GCed any time after there are no strong references to the referent, but typically last until memory is low.
- Softly reachable objects will remain alive for one second of lifetime per free megabyte in the heap by default. This value can be adjusted using the -XX:SoftRefLRUPolicyMSPerMB=milliseconds flag, e.g. a value of 2500 changes the lifetime from one second to 2.5 seconds per free megabyte.
- WeakReferences are GCed any time after there are no strong references or soft references to the referent.
- PhantomReferences are for use instead of using finalize(). Avoid using finalize() - use a PhantomReference instead.
- The serial collector -XX:+UseSerialGC is good for single processor configurations.
- The parallel scavenge collector -XX:+UseParallelGC/-XX:+UseParallelOldGC tries to maximize throughput, so is the overall most efficient collector on multi-core machines, but may have long pauses.
- The concurrent collector -XX:+UseConcMarkSweepGC targets minimizing pause times at the cost of overall GC throughput. Best used with the -XX:+UseParNewGC young generation collector.
- The G1 collector -XX:+UseG1GC is the primary focus of collector improvements; it targets deterministic low pause times while maintaining good throughput and avoiding heap fragmentation.
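The live data size measurement and the sizing rule of thumb from the tips above can be sketched in a few lines. This is a minimal illustration, not taken from the original presentation: the class and method names are hypothetical, and System.gc() is only a request that the JVM may ignore, so treat the measured value as an estimate.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Hypothetical helper: max heap = 4 x live data, young gen = max heap / 3. */
final class HeapSizingSketch {

    /** Heap in use after requesting a full GC (System.gc() is only a hint). */
    static long estimateLiveDataBytes() {
        System.gc();
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed();
    }

    /** Rule of thumb: max heap is four times the live data size. */
    static long recommendedMaxHeapBytes(long liveDataBytes) {
        return Math.multiplyExact(liveDataBytes, 4L);
    }

    /** Rule of thumb: young generation is one third of the max heap. */
    static long recommendedYoungGenBytes(long maxHeapBytes) {
        return maxHeapBytes / 3;
    }

    public static void main(String[] args) {
        long live = estimateLiveDataBytes();
        long maxHeap = recommendedMaxHeapBytes(live);
        long young = recommendedYoungGenBytes(maxHeap);
        long mb = 1024L * 1024L;
        // These would be starting values for -Xmx and -Xmn, to be validated by load testing.
        System.out.printf("live ~%d MB, max heap ~%d MB, young gen ~%d MB%n",
                live / mb, maxHeap / mb, young / mb);
    }
}
```

Run it in a test environment after the application has reached steady state; the numbers are a starting point for the test/monitor/tune loop, not a final answer.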
Six things I wish we had known about scaling (Page last updated March 2014, Added 2014-06-29, Author Martin Kleppmann, Publisher Kleppmann). Tips:
- Realistic load testing is very hard - it's easy to miss things. Simulating realistic writes is particularly hard. So have an efficient rollback mechanism and make sure you can compare before and after performance of your production systems.
- Schema changes are very painful, especially considering multiple levels of data storage (down to archived data), but they are inevitable for multiple reasons including performance. Expect some very laborious data migration tasks in your future.
- Have a data access layer which wraps access behind an API you can change, and which pools connections. Database connections can be a severe limitation on performance.
- Consider the operational costs and downtimes required for data replication - your solution will inevitably require replication and this should be designed to limit performance impact and operational effort.
- More RAM is an effective and cost-efficient tuning option for many issues.
- Have a subsystem which tracks all changes to your system (change capture); this allows you huge flexibility in experimenting how to handle the changes in other ways.
- Applications written in a stateless way are pretty easy to scale since you can just add copies. Scaling stateful parts of your system is hard.
How to interrupt a long-running "infinite" Java regular expression (Page last updated May 2014, Added 2014-06-29, Author Lincoln Baxter III, Publisher ocpsoft). Tips:
- Regular expression processing time is unbounded: valid expressions can incur catastrophic backtracking that takes an indefinite amount of time to return. The following two changes allow you to cleanly interrupt regular expression processing: 1. run the processing in a separate thread; 2. pass the matcher an interruptible implementation of CharSequence which wraps any other CharSequence but throws an exception from charAt() when the thread is interrupted.
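The two changes described above can be sketched as follows. This is an illustration under assumptions rather than the article's exact code: the class names, the matchesWithTimeout() helper and the choice of IllegalStateException are all hypothetical.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.regex.Pattern;

final class InterruptibleRegexSketch {

    /** Wraps any CharSequence; charAt() fails once the calling thread is interrupted. */
    static final class InterruptibleCharSequence implements CharSequence {
        private final CharSequence inner;

        InterruptibleCharSequence(CharSequence inner) { this.inner = inner; }

        @Override public char charAt(int index) {
            // The matcher calls charAt() constantly, so this check runs during backtracking.
            if (Thread.currentThread().isInterrupted()) {
                throw new IllegalStateException("regex matching interrupted");
            }
            return inner.charAt(index);
        }
        @Override public int length() { return inner.length(); }
        @Override public CharSequence subSequence(int start, int end) {
            return new InterruptibleCharSequence(inner.subSequence(start, end));
        }
        @Override public String toString() { return inner.toString(); }
    }

    /** Runs the match on a separate thread; throws TimeoutException if it takes too long. */
    static boolean matchesWithTimeout(Pattern pattern, CharSequence input, long timeoutMillis)
            throws InterruptedException, ExecutionException, TimeoutException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<Boolean> result = executor.submit(
                    () -> pattern.matcher(new InterruptibleCharSequence(input)).matches());
            return result.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } finally {
            // Interrupts the worker if it is still matching, so charAt() throws and it exits.
            executor.shutdownNow();
        }
    }
}
```

A caller would use something like matchesWithTimeout(Pattern.compile(regex), text, 1000) and treat a TimeoutException as "expression too expensive for this input".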
JVM concurrency: Java 8 concurrency basics (Page last updated April 2014, Added 2014-06-29, Author Dennis Sosnoski, Publisher IBM). Tips:
- Pre-Java 8 Futures support only checking whether the future has completed and waiting for the future to complete. Java 8 adds CompletableFuture with many more operations supported.
- CompletionStage represents a step in a possibly asynchronous computation. It defines many different ways to chain CompletionStage instances with other instances or with code, such as methods to be called on completion.
- CompletableFuture is best used when you're doing different types of operations and must coordinate the results; when instead you're running the same calculation on many different data values, parallel streams give you a simpler approach and likely better performance.
- Streams are push iterators over a sequence of values; they can be chained with adapters to perform operations such as filtering and mapping, and have both sequential and parallel variations. Streams exist for primitive int, long, double, and typed objects.
- Combining Streams with CompletableFutures enables you to perform asynchronous parallel execution; but Stream.parallel() (and Collection.parallelStream()) executes in parallel without needing CompletableFutures.
- Spliterator lets you iterate a collection of elements, but rather than getting each element you apply an action to the elements using the tryAdvance() or forEachRemaining() method. Spliterators can be split in two, making it easy for the Stream parallel-processing code to spread the work to be done across available threads.
- A sequence of Stream operations (called a pipeline) allows you to pass only one result from each step on to the next stage of the pipeline. If you want to pass multiple results, you must pass them as an object, but creating an object for the result of each individual comparison would hurt the performance of a stream approach compared to chunked approaches. Streams let you handle this case efficiently by allowing you to pass a mutable intermediate container object that is updated as each element is processed.
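As a rough illustration of the tips above, the sketch below combines two independent asynchronous results with CompletableFuture and uses a mutable container with a parallel stream's collect() to carry two results (minimum and maximum) without allocating an object per element. The class, the Range container and the method names are hypothetical and are not taken from the article.

```java
import java.util.concurrent.CompletableFuture;
import java.util.stream.IntStream;

final class StreamsAndFuturesSketch {

    /** Mutable container that carries two results through the stream. */
    static final class Range {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;

        void accept(int value) {
            min = Math.min(min, value);
            max = Math.max(max, value);
        }

        void combine(Range other) {
            min = Math.min(min, other.min);
            max = Math.max(max, other.max);
        }
    }

    /** Same calculation over many values: a parallel stream is the simpler fit. */
    static Range rangeOf(int[] values) {
        return IntStream.of(values)
                .parallel()
                // One Range per worker chunk, merged at the end by combine().
                .collect(Range::new, Range::accept, Range::combine);
    }

    /** Different operations whose results must be coordinated: CompletableFuture. */
    static int sumOfSquaresAsync(int a, int b) {
        CompletableFuture<Integer> left = CompletableFuture.supplyAsync(() -> a * a);
        CompletableFuture<Integer> right = CompletableFuture.supplyAsync(() -> b * b);
        return left.thenCombine(right, Integer::sum).join();
    }
}
```

The collect() call takes a supplier, an accumulator and a combiner, which is what lets the parallel version build partial containers independently and merge them afterwards.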
AtomicLong JDK7/8 vs. LongAdder (Page last updated June 2014, Added 2014-06-29, Author Nitsan Wakart, Publisher Psychosomatic, Lobotomy, Saw). Tips:
- JMH is a micro-benchmarking harness written by the Oracle performance engineering team to help in constructing performance experiments while sidestepping the many pitfalls of Java-related benchmarks.
- In Java 8 AtomicLong's CAS loop has been replaced with a getAndAddLong() intrinsic which is likely to be more efficient (e.g. on x86 CPUs this intrinsic atomically returns the current value and increments it). This translates into better performance under contention.
- Java 8 introduces LongAdder (there's a Java 7 backport available if needed) which is significantly more efficient than AtomicLong under contention. They don't have identical functionality, though: you still need to use AtomicLong if you want a unique sequence generator.
- When you need long increment() performance (rather than long get()) and expect contention and you don't have the tightest memory constraints then LongAdder is a great choice.
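The split between the two classes can be sketched as below: LongAdder for a contended counter that is incremented often and read rarely, AtomicLong where each caller needs its own unique value. The class and method names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

final class CounterChoiceSketch {

    private static final AtomicLong SEQUENCE = new AtomicLong();

    /** Unique sequence generator: incrementAndGet() returns a distinct value per call. */
    static long nextId() {
        return SEQUENCE.incrementAndGet();
    }

    /** Contended counter: many threads increment, the total is read once at the end. */
    static long countWithLongAdder(int threads, int incrementsPerThread)
            throws InterruptedException {
        LongAdder hits = new LongAdder();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    hits.increment();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        // sum() is exact here because all writers have finished.
        return hits.sum();
    }
}
```

LongAdder.increment() returns nothing, which is why it cannot hand out sequence numbers; sum() is only a snapshot while other threads are still updating.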
Java 8 Concurrency: LongAdder (Page last updated May 2013, Added 2014-06-29, Author oakley, Publisher mind.out). Tips:
- LongAdder is new in Java 8 and provides an atomic way to add up a large number of values faster than AtomicLong.
- Striped64 is a new Java 8 class that is the base for a family of new classes in java.util.concurrent.atomic: LongAccumulator, LongAdder, DoubleAdder and DoubleAccumulator.
- Striped64 holds a hash table of Cells. When two threads try to add something to a Striped64 then there is a good chance that the threads will try to add their value to different Cells in the hash table, reducing the contention.
- Cells in Striped64s are internally padded to ensure that their values fall in separate CPU cache lines from other Cell values, so avoiding contention (this has been shown to improve performance).
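The striping idea can be illustrated with a deliberately simplified counter. This is not the JDK implementation: the class name is hypothetical, the cell is chosen from the thread's hash code rather than Striped64's per-thread probe, the table never grows, and the cells are not padded, so neighbouring cells may still share a cache line.

```java
import java.util.concurrent.atomic.AtomicLongArray;

/** Toy striped counter: threads usually update different cells, reducing CAS contention. */
final class StripedCounterSketch {

    private final AtomicLongArray cells;

    StripedCounterSketch(int stripes) {
        if (stripes <= 0) {
            throw new IllegalArgumentException("stripes must be positive");
        }
        this.cells = new AtomicLongArray(stripes);
    }

    /** Adds delta to the cell selected by the current thread's hash. */
    void add(long delta) {
        int index = Math.floorMod(Thread.currentThread().hashCode(), cells.length());
        cells.addAndGet(index, delta);
    }

    /** Sums all cells; only a snapshot if other threads are still adding. */
    long sum() {
        long total = 0;
        for (int i = 0; i < cells.length(); i++) {
            total += cells.get(i);
        }
        return total;
    }
}
```

In real code prefer LongAdder itself; the sketch only shows why spreading updates over several cells lowers contention compared with a single AtomicLong.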
Last Updated: 2022-06-29
Copyright © 2000-2022 Fasterj.com. All Rights Reserved.
All trademarks and registered trademarks appearing on JavaPerformanceTuning.com are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. JavaPerformanceTuning.com is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.