Java Performance Tuning
Tips August 2017
Back to newsletter 201 contents
http://javarevisited.blogspot.co.uk/2017/08/how-to-create-thread-safe-concurrent-hashset-in-java-8.html
How to create a thread-safe ConcurrentHashSet in Java 8 (Page last updated August 2017, Added 2017-08-30, Author Javin Paul, Publisher Javarevisited). Tips:
- ConcurrentHashMap.newKeySet() and concurrentHashMap.keySet(defaultValue) both provide fully capable thread-safe Set instances (backed by a ConcurrentHashMap) - there is no ConcurrentHashSet class in the JDK.
- ConcurrentHashMap.newKeySet() is designed for general mutable use, whereas CopyOnWriteArraySet is only suitable where you need a thread-safe set that stays small and where read operations vastly outnumber writes.
- ConcurrentHashMap.keySet() is a set view of the keys in the underlying ConcurrentHashMap (with limited set operation support), so elements can be changed directly against the map, whereas ConcurrentHashMap.newKeySet() returns a fully independent, fully operational set.
- CopyOnWriteArraySet mutation operations (add, remove, etc.) are expensive since each usually entails copying the entire underlying array.
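The options above can be sketched in a few lines; a minimal illustration of the two ConcurrentHashMap-backed approaches versus CopyOnWriteArraySet (the element values are invented for illustration):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArraySet;

public class ConcurrentSetDemo {
    public static void main(String[] args) {
        // Fully independent, fully operational concurrent set
        Set<String> concurrentSet = ConcurrentHashMap.newKeySet();
        concurrentSet.add("a");
        concurrentSet.add("b");

        // keySet(mappedValue) view: add() inserts the key into the
        // backing map with the given default value
        ConcurrentHashMap<String, Boolean> map = new ConcurrentHashMap<>();
        Set<String> keyView = map.keySet(Boolean.TRUE);
        keyView.add("x");   // equivalent to map.put("x", Boolean.TRUE)

        // CopyOnWriteArraySet: cheap reads, but every mutation copies the array
        Set<String> cowSet = new CopyOnWriteArraySet<>();
        cowSet.add("y");

        System.out.println(concurrentSet.size() + " " + map.get("x"));
    }
}
```

Note that the keySet(mappedValue) view only supports add() because it has a value to insert; the plain keySet() view throws UnsupportedOperationException on add().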
https://www.youtube.com/watch?v=QJYmERaS7vo
Java performance techniques - The cost of HotSpot runtime optimizations (Page last updated July 2017, Added 2017-08-30, Author Ionut Balosin, Publisher Voxxed Days Vienna). Tips:
- The JIT compiler can operate in client (C1) mode (compilation after 1500 iterations), server (C2) mode (compilation after 10 000 iterations), and tiered mode (C1+C2) which has 5 tiers: (0) Interpreted; (1) C1 without profiling; (2) C1 with basic profiling; (3) C1 with full profiling; (4) C2
- -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -XX:PrintAssemblyOptions=intel prints the generated assembly code in Intel syntax (this requires the hsdis disassembler library to be available).
- The server compiler (C2) can speculatively optimize (eliminating many branches, including for polymorphic calls) based on runtime conditions, which produces much faster code as long as the speculation is correct. When it isn't, an uncommon trap or caught OS signal gets hit, the compiled code is thrown away, and execution falls back to the interpreter until the JIT compiler is triggered again. If you need the tightest performance, be aware of this and code so that the JIT compiler avoids the deoptimization path.
- Explicit loop unrolling in your code is not advised, leave it to the JIT compiler which will do at least as well, and possibly better for the specific hardware being used.
- Using StringDeduplication (-XX:+UseG1GC -XX:+UseStringDeduplication) saves memory but will make garbage collection take longer.
- Biased locking is good for uncontended locks, but is worth disabling (-XX:-UseBiasedLocking) if there is contention (monitor with -XX:+PrintSafepointStatistics -XX:+PrintGCApplicationStoppedTime, which let you see safepoint statistics for stopped times).
- MaxInlineLevel sets how many levels of calls can be inlined.
- Try to keep methods inlinable - inlining is limited by MaxInlineSize (default 35 bytes of bytecode), FreqInlineSize (hot methods - default 325 bytes), MinInliningThreshold (default 250 invocations), MaxInlineLevel (default 9), and MaxRecursiveInlineLevel (default 1).
- JarScan is part of JITWatch that statically analyzes jar files for the size in bytes of methods (so you can set MaxInlineSize appropriately, or refactor to make methods smaller).
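The diagnostic flags mentioned above can be combined on the command line; a sketch (the class name MyBenchmark is a placeholder, and -XX:+PrintAssembly additionally requires the hsdis disassembler library on the library path):

```shell
# Watch JIT compilation and inlining decisions (MyBenchmark is a placeholder)
java -XX:+UnlockDiagnosticVMOptions \
     -XX:+PrintCompilation \
     -XX:+PrintInlining \
     MyBenchmark

# Dump the generated assembly in Intel syntax (requires hsdis)
java -XX:+UnlockDiagnosticVMOptions \
     -XX:+PrintAssembly -XX:PrintAssemblyOptions=intel \
     MyBenchmark
```

The -XX:+PrintInlining output flags methods that were "too big" or "not inlineable", which is where JarScan's per-method byte counts help you decide whether to refactor or adjust MaxInlineSize.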
https://www.youtube.com/watch?v=fN3MtD-lNHc
Java Performance Puzzlers (Page last updated August 2017, Added 2017-08-30, Author Douglas Hawkins, Publisher Devoxx Poland). Tips:
- There are many magic numbers in JVM operations which cause cliffs in performance, e.g. the 8000-byte bytecode limit on "huge" methods, beyond which methods don't get JIT-compiled.
- Intrinsics (like System.arraycopy) are always optimized, so should be preferred to other mechanisms of doing the same thing (the other mechanisms might get to the same performance after JITing, but might not).
- The order of the algorithm is not necessarily the most important factor for performance - two O(N) algorithms will have different performances and an O(Nlog(N)) could even be faster for your particular problem.
- When you always hit the cache or always miss the cache, the hardware performs predictably well. But when the miss rate is intermediate (e.g. 10%), the hardware's prediction mechanisms are defeated and performance suffers.
- Memory locality matters - the closer together data is in memory, the better the performance. It's not (yet) possible to specify where objects go in memory, but you have some control over data locality with arrays and fields.
- Runtime checks can be eliminated by the JIT, but only if it is clear to the compiler that the check is unnecessary - which can be difficult for the developer to infer.
- The runtime can optimize allocation by skipping zero-initialization when it knows the data will be written immediately - so where the JIT can see that the code creates the object/array itself (rather than receiving one from elsewhere), the allocation can be slightly faster.
- Both the JIT compiler and the javac compiler have optimizations that speed up very specific patterns of code - but only when the compiler recognizes that pattern (String concatenation with an empty string is one such pattern for the JIT).
- Sizing collections correctly usually improves performance (except for some edge cases).
- Most collections resize exponentially, but Vector can resize linearly (via its capacityIncrement), which can be really inefficient if you will be growing it a lot.
- Monomorphic calls are more efficient than dimorphic calls, which in turn are more efficient than megamorphic calls (the HotSpot JIT doesn't optimize beyond dimorphic). Around 90% of calls are monomorphic, and 5% are dimorphic.
- Write clean code, measure and improve only the hottest code - carefully.
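Two of the tips above - preferring intrinsics like System.arraycopy, and sizing collections up front - can be sketched as follows (the sizes and values are invented for illustration):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SizingAndIntrinsics {
    // System.arraycopy is a HotSpot intrinsic - prefer it to a manual copy loop
    static int[] copy(int[] src) {
        int[] dst = new int[src.length];
        System.arraycopy(src, 0, dst, 0, src.length);
        return dst;
    }

    public static void main(String[] args) {
        int[] copied = copy(new int[] {1, 2, 3});

        // Pre-sizing avoids repeated resize-and-copy as the collection grows
        int expected = 10_000;
        List<Integer> list = new ArrayList<>(expected);
        // HashMap resizes when size exceeds capacity * loadFactor (default 0.75),
        // so divide the expected size by the load factor when pre-sizing
        Map<Integer, Integer> map = new HashMap<>((int) (expected / 0.75f) + 1);
        for (int i = 0; i < expected; i++) {
            list.add(i);
            map.put(i, i);
        }
        System.out.println(copied[2] + " " + list.size() + " " + map.size());
    }
}
```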
https://www.infoq.com/presentations/java-performance-guide
Java Performance Engineer's Survival Guide (Page last updated July 2017, Added 2017-08-30, Author Monica Beckwith, Publisher Emerging Technologies). Tips:
- Ask what is the expected throughput, how it is measured and what happens if it falls below that - how long and how low can it go before failure.
- Ask what is the expected response time, how it is measured and what happens if it goes above that - how long and how high can it go for how many users before failure.
- Average response times are not useful - target the worst case or a high percentile (e.g. five 9s, the 99.999th percentile).
- What happens if one system is loaded more than others - how is that measured, handled and fixed. What is the maximum load that any one system can handle?
- The tuning process is: monitor, profile (identify areas of improvement), analyze, tune and apply.
- Useful JVM monitoring tools include: VisualVM, Java Flight Recorder, PrintCompilation, PrintGCDetails, PrintGCDateStamps, jmap -clstats, jcmd GC.class_stats
- Useful Linux/Windows monitoring tools include: mpstat, sysstat, iostat, pidstat, prstat, vmstat, dash, cacti / Performance Monitor, Task Manager, Resource Monitor, CPU-Z
- Useful (free) system profiling tools include: Oracle Studio Performance Analyzer, perftools, PAPI, Code XL, Dtrace, Oprofile, gprof, LTT (linux trace toolkit)
- Useful (free) Java profiling tools include: VisualVM, Netbeans profiler, jconsole
- Tuning the GC: select the right heap; select the right GC algorithm; age objects appropriately; promote only long-lived objects; tune the number of GC threads (for stop-the-world and separately for concurrent work); see if CompressedOops is useful; larger heaps may need AlwaysPretouch and UseLargePages.
- Confirm you are measuring the right thing!
- CPU stats worth looking at include: overall CPU and per-core stats; cache hits, misses and levels; branch predictions; pipeline information; out-of-order execution; load-store unit load and queues.
- Memory stats worth looking at include: Memory utilization; Memory bandwidth; read-write stats; max read bandwidth; max write bandwidth; max cross traffic bandwidth.
- SLAs must be measurable.
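Since averages hide outliers, a measurable response-time SLA is usually expressed as a percentile. A minimal sketch of the nearest-rank percentile calculation over recorded latencies (the sample data is invented for illustration):

```java
import java.util.Arrays;

public class LatencyPercentile {
    // Nearest-rank percentile: smallest value with at least p% of samples
    // at or below it
    static long percentile(long[] latenciesMillis, double p) {
        long[] sorted = latenciesMillis.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) {
        // Invented sample: mostly fast responses with a couple of slow outliers
        long[] samples = {12, 15, 11, 14, 13, 12, 250, 13, 12, 900};
        System.out.println("p50 = " + percentile(samples, 50.0)
                + "ms, p99 = " + percentile(samples, 99.0) + "ms");
    }
}
```

The p99 here lands on the 900ms outlier even though the median is 13ms, which is exactly why an averaged figure cannot serve as the SLA measurement.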
Jack Shirazi
Last Updated: 2024-08-26
Copyright © 2000-2024 Fasterj.com. All Rights Reserved.
All trademarks and registered trademarks appearing on JavaPerformanceTuning.com are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. JavaPerformanceTuning.com is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.
URL: http://www.JavaPerformanceTuning.com/news/newtips201.shtml
RSS Feed: http://www.JavaPerformanceTuning.com/newsletters.rss