Java Performance Tuning
Tips August 2014
Back to newsletter 165 contents
http://www.infoq.com/presentations/garbage-collection-benefits
Garbage Collection is Good! (Page last updated July 2014, Added 2014-08-27, Author Eva Andreasson, Publisher InfoQ). Tips:
- The JVM memory footprint can be bigger than the maximum heap size because the JVM itself can use further OS memory beyond the heap.
- You need to re-tune the garbage collector if workloads change (e.g. transaction volumes increase significantly).
- Measure in production! Test environments just don't provide useful profiles.
- Tuning GC: Choose the right GC algorithm for your application profile; understand your allocation rate to allocate enough heap.
- The standard deviation and average of latency are not useful, because latency is not normally distributed. You must look at ALL pauses.
- If the heap gets fragmented, you'll get a stop-the-world pause dependent on the heap size in order to defragment (currently only the Zing JVM defragments concurrently).
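The first tip above (footprint can exceed the maximum heap size) can be checked from inside a running JVM. The following is a minimal sketch using the standard java.lang.management API; the class and method names are hypothetical and not taken from the presentation. Note that the non-heap figure reported here covers only JVM-managed pools such as class metadata and the code cache, so the real process footprint (thread stacks, direct buffers, native allocations) is likely to be larger still.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Hypothetical helper class for illustration only.
final class FootprintSketch {

    /** Bytes currently used inside the Java heap. */
    static long heapUsedBytes() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        return memory.getHeapMemoryUsage().getUsed();
    }

    /** Bytes currently used in JVM-managed memory outside the Java heap. */
    static long nonHeapUsedBytes() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        return memory.getNonHeapMemoryUsage().getUsed();
    }

    public static void main(String[] args) {
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.printf("max heap:      %d KB%n", maxHeap / 1024);
        System.out.printf("heap used:     %d KB%n", heapUsedBytes() / 1024);
        System.out.printf("non-heap used: %d KB%n", nonHeapUsedBytes() / 1024);
    }
}
```

Comparing these figures with the resident size the operating system reports for the process gives a rough idea of how much memory lies outside the heap.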
http://sett.ociweb.com/sett/settAug2014.html
Clean Readable Performant Java (Page last updated August 2014, Added 2014-08-27, Author Nathan Tippy, Publisher Object Computing, Inc.). Tips:
- The JVM works better on straightforward, easy-to-understand code.
- Break larger methods down into smaller methods that each have a clear simple purpose.
- There is almost no performance penalty for small private or static methods, which are easily inlined by the JIT compiler.
- Projects with too many interfaces and abstract base classes can be very difficult to trace through for people as well as compilers. Composition is encouraged over inheritance.
- By default, methods larger than 35 bytes of bytecode will not be inlined unless they are called very frequently.
- -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining allows you to see what is being inlined. Lines ending with "hot method too big" would have been inlined if the method had been smaller.
- Methods are easier to understand when all the data fields they use are local - and the JVM puts the data closer together which helps caching and prefetching.
- Eliminate member variables (instance fields) inside tight loops by reading the value once into a local variable before entering the loop.
- Local objects which have a short scope and are not assigned to fields nor returned from methods can potentially be inlined on the stack by the compiler, avoiding use of the heap.
- Avoid boxing and unboxing primitives, whether explicitly by conversion, or implicitly with autoboxing.
- Do not use exceptions as flow control; the code should flow sequentially and only throw exceptions in edge cases.
- The JIT compiler is unlikely to compile catch blocks; any code within the catch block will likely execute more slowly than other code.
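Several of the tips above can be illustrated together: reading fields once into locals before a tight loop, keeping arithmetic in primitives rather than boxed types, and splitting work into small private methods that are candidates for inlining. This is a minimal sketch; the class, fields and methods are hypothetical and not taken from the article.

```java
// Hypothetical example class for illustration only.
final class LoopSketch {
    private final int[] values;
    private final int scale;

    LoopSketch(int[] values, int scale) {
        this.values = values;
        this.scale = scale;
    }

    /** Sums values[i] * scale using only locals and primitives inside the loop. */
    long scaledSum() {
        // Read the instance fields once, before entering the loop.
        final int[] localValues = values;
        final int localScale = scale;
        long sum = 0; // primitive long, not a boxed Long, so no autoboxing per iteration
        for (int i = 0; i < localValues.length; i++) {
            sum += scaled(localValues[i], localScale);
        }
        return sum;
    }

    // Small private static method with one clear purpose; a likely inlining candidate.
    private static long scaled(int value, int scale) {
        return (long) value * scale;
    }
}
```

Whether the JIT compiler actually inlines scaled() on a given JVM can be checked with the -XX:+PrintInlining option mentioned above.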
http://vanillajava.blogspot.com/2014/08/try-optimising-memory-consumption-first.html
Try optimising memory consumption first (Page last updated August 2014, Added 2014-08-27, Author Peter Lawrey, Publisher vanillajava). Tips:
- Target memory first - speeding up memory processing is often the quicker route to better overall speed.
- Memory allocation is multi-threaded, so scales reasonably well.
- If you want to minimize cache-level concurrency conflicts, you want to spend as much time as possible in the L2 (256KB) cache. For memory, this means avoiding having memory allocation become a bottleneck.
- Measure the allocation rate of your machine by writing a test which creates lots of garbage. Then you can check your GC logs for allocation rates to see how close to this you are - if you are close, you have an allocation rate problem.
- If you have long pause times e.g. into the seconds, your memory consumption has a very high impact on your performance, and reducing the memory consumption and allocation rate can improve scalability as well as reduce your worst case jitter.
- After targeting allocation rate, look at the CPU consumption with memory profiling turned on. This gives more weight to the memory allocations and so gives you a different view to looking at CPU alone. Only after there are no quick wins in this view should you start on CPU profiling alone.
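The garbage-creating test suggested above could look something like the following. This is a rough sketch under stated assumptions, not the author's test: the class name, the volatile sink field and the chunk sizes are all hypothetical, the byte count ignores array headers, and a single-threaded loop only approximates what the machine can sustain.

```java
// Hypothetical allocation-rate probe for illustration only.
final class AllocationRateSketch {
    // Each array is published here so the JIT cannot optimise the allocation away.
    static volatile byte[] sink;

    /** Allocates 'chunks' arrays of 'chunkBytes' bytes and returns the observed MB/s. */
    static double measureMBPerSecond(int chunkBytes, long chunks) {
        long start = System.nanoTime();
        for (long i = 0; i < chunks; i++) {
            sink = new byte[chunkBytes]; // becomes garbage on the next iteration
        }
        long elapsedNanos = Math.max(1L, System.nanoTime() - start);
        double megabytes = (double) chunkBytes * chunks / (1024.0 * 1024.0);
        return megabytes / (elapsedNanos / 1e9);
    }

    public static void main(String[] args) {
        measureMBPerSecond(1024, 1_000_000L); // warm-up pass so the loop is compiled
        System.out.printf("approx. allocation rate: %.0f MB/s%n",
                measureMBPerSecond(1024, 5_000_000L));
    }
}
```

The figure printed is only a rough ceiling to compare against the allocation rates seen in the application's GC logs.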
http://java.dzone.com/articles/time-memory-tradeoff-example
Time - memory tradeoff with the example of Java Maps (Page last updated July 2014, Added 2014-08-27, Author Roman Leventov, Publisher DZone). Tips:
- The more memory a hash table takes, the faster each operation runs.
- Apart from the core collections that come with the JDK, there are now many alternative collection frameworks. The following provide high performance options: Higher Frequency Trading's OpenHFT Collections; Carrotsearch's High Performance Primitive Collections (HPPC); The fastutil collections; Goldman Sachs GS-Collections; The Trove collections; Apache's Mahout Collections.
- When the total memory taken by the map goes beyond CPU cache capacity, cache misses become more frequent as the map grows. Small maps can be an order of magnitude faster in performance than large maps.
- Open address hashmap implementations are often faster and more memory efficient than implementations that hold a linked list at a collided node.
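To make the open addressing idea concrete, here is a minimal sketch of a fixed-capacity long-to-long map using linear probing. It is not the implementation of any of the libraries listed above; the class name, the reserved EMPTY key and the hash multiplier are assumptions chosen for illustration, and resizing and removal are left out.

```java
import java.util.Arrays;

// Hypothetical fixed-capacity open addressing map for illustration only.
final class OpenAddressingLongMap {
    private static final long EMPTY = Long.MIN_VALUE; // reserved key marking a free slot
    private final long[] keys;
    private final long[] values;
    private final int mask;
    private int size;

    OpenAddressingLongMap(int capacityPowerOfTwo) {
        if (Integer.bitCount(capacityPowerOfTwo) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        keys = new long[capacityPowerOfTwo];
        values = new long[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
        Arrays.fill(keys, EMPTY);
    }

    void put(long key, long value) {
        if (key == EMPTY) throw new IllegalArgumentException("reserved key");
        int i = slot(key);
        // Linear probing: on a collision, step to the next slot in the same array.
        while (keys[i] != EMPTY && keys[i] != key) {
            i = (i + 1) & mask;
        }
        if (keys[i] == EMPTY) {
            // Always keep one slot free so that probing loops terminate.
            if (size == keys.length - 1) throw new IllegalStateException("map is full");
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    long get(long key, long missingValue) {
        int i = slot(key);
        while (keys[i] != EMPTY) {
            if (keys[i] == key) return values[i];
            i = (i + 1) & mask;
        }
        return missingValue;
    }

    private int slot(long key) {
        long h = key * 0x9E3779B97F4A7C15L; // multiplicative scrambling of the key
        return (int) (h >>> 32) & mask;
    }
}
```

Keys and values sit in two flat primitive arrays with no per-entry node objects, which is where the memory saving comes from. Choosing a larger capacity for the same number of entries shortens the probe sequences, which is the time-memory tradeoff the first tip describes.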
http://psy-lob-saw.blogspot.co.uk/2014/08/the-many-meanings-of-volatile-read-and.html
The many meanings of volatile read and write (Page last updated August 2014, Added 2014-08-27, Author Nitsan Wakart, Publisher Psychosomatic, Lobotomy, Saw). Tips:
- Volatile fields in Java provide three distinct features: Atomicity of reads and writes across threads (including for longs and doubles); Store/Load to/from memory (cannot be optimised away by the compiler when a non-volatile field might be considered redundantly accessed or updated by the compiler, e.g. if (done)...); Global Ordering (volatile operations are global barriers against reads and/or writes, preventing reorders across the barriers).
- AtomicLong.lazySet (aka Unsafe.putOrderedLong) is an atomic operation.
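The "if (done)..." case mentioned above can be sketched as follows. The class and method names are hypothetical. The worker polls a volatile flag, so its read cannot be hoisted out of the loop, and it publishes a counter with AtomicLong.lazySet, which is an atomic, ordered write that is generally cheaper than a full volatile write.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical example class for illustration only.
final class VolatileSketch {
    // volatile: every loop iteration in the worker re-reads this from memory.
    private volatile boolean done;
    private final AtomicLong progress = new AtomicLong();

    /** Runs a worker for about 50 ms, stops it via the flag, returns its final count. */
    long runWorkerBriefly() {
        Thread worker = new Thread(() -> {
            long n = 0;
            while (!done) {
                progress.lazySet(++n); // atomic write of a long, with ordered-store semantics
            }
        });
        worker.start();
        try {
            Thread.sleep(50);
            done = true;         // volatile write: the worker will observe it
            worker.join(5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
        if (worker.isAlive()) {
            throw new IllegalStateException("worker did not observe the done flag");
        }
        return progress.get();
    }
}
```

If done were a plain (non-volatile) field, the JIT compiler would be free to read it once and spin forever, which is the redundant-access optimisation the tip refers to.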
http://www.infoq.com/presentations/latency-lessons-tools
Understanding Latency (Page last updated July 2014, Added 2014-08-27, Author Gil Tene, Publisher InfoQ). Tips:
- If a user does 100 operations, they are very likely to see the worst 1% latency operation amongst those operations.
- Latency does not follow the normal distribution, so standard deviation and means are not appropriate. Never use std dev. Always measure maximum times.
- The worst case latency can be several orders of magnitude larger (tens of thousands of times larger) than the average in many systems.
- Periodic freezes are common to almost all systems. They are caused by many different things - not necessarily anything core to the system.
- Report all latencies on a graph whose x-axis is a logarithmic percentile scale (... 90%, 99%, 99.9%, 99.99% ...).
- If the worst latencies are failing the requirements, you cannot improve them by tuning the "hot" code; you have to look at the code that produces the outliers!
- Performance requirements need to be specified as a pass/fail against the criteria (not "it should be faster"), otherwise there are no criteria for when to stop tuning. Measurements need to evaluate the requirements.
- Latency requirements should have at least three cases, including the worst case, e.g. 90% better than 50 ms, 99.9% better than 500 ms, 100% better than 2 seconds.
- Load test tools which pause issuing requests when the system is paused (because all the requests in progress are paused too, waiting for the system to return) are not realistic, because in real systems new requests will carry on being issued while the system pauses. Beware of this situation (termed coordinated omission). Worst case latencies will probably be reported correctly, but averages won't.
- Test your measurement technique to ensure it is measuring what it should be measuring. Use artificial systems that will guarantee what the measurements should show, and check whether it does show these.
- HdrHistogram is a free open source histogram tool for latencies that includes being able to handle coordinated omission.
- LatencyUtils is a free open source code-level measuring tool for measuring latencies, including handling coordinated omission.
- jHiccup is a free open source Java agent for measuring pauses.
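Two of the points above lend themselves to a small worked example: the chance that a user doing 100 operations meets the worst 1% at least once is 1 - 0.99^100, roughly 63%, and a pass/fail requirement with three cases can be checked mechanically. The sketch below uses plain Java with a simple nearest-rank percentile; it is not the HdrHistogram API, the class and method names are hypothetical, and the thresholds are the example values from the tip.

```java
import java.util.Arrays;

// Hypothetical helper class for illustration only.
final class LatencySketch {

    /** Probability of hitting the worst 'worstFraction' of operations at least once. */
    static double chanceOfSeeingWorst(double worstFraction, int operations) {
        return 1.0 - Math.pow(1.0 - worstFraction, operations);
    }

    /** Nearest-rank percentile of an ascending-sorted, non-empty array. */
    static long percentile(long[] sortedLatencies, double pct) {
        if (sortedLatencies.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        int rank = (int) Math.ceil(pct / 100.0 * sortedLatencies.length);
        rank = Math.min(sortedLatencies.length, Math.max(1, rank));
        return sortedLatencies[rank - 1];
    }

    /** Pass/fail: 90% better than 50 ms, 99.9% better than 500 ms, 100% better than 2 s. */
    static boolean meetsRequirements(long[] latenciesMs) {
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        return percentile(sorted, 90.0) < 50
                && percentile(sorted, 99.9) < 500
                && percentile(sorted, 100.0) < 2000; // the 100th percentile is the maximum
    }

    public static void main(String[] args) {
        System.out.printf("chance of seeing worst 1%% in 100 ops: %.3f%n",
                chanceOfSeeingWorst(0.01, 100));
    }
}
```

This only evaluates the samples it is given. If the load generator suffered from coordinated omission, the samples themselves are misleading and need correcting first, for example with one of the tools listed above.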
Jack Shirazi
Last Updated: 2024-08-26
Copyright © 2000-2024 Fasterj.com. All Rights Reserved.
All trademarks and registered trademarks appearing on JavaPerformanceTuning.com are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. JavaPerformanceTuning.com is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.
URL: http://www.JavaPerformanceTuning.com/news/newtips165.shtml
RSS Feed: http://www.JavaPerformanceTuning.com/newsletters.rss