Java Performance Tuning
Tips September 2019
Back to newsletter 226 contents
https://www.youtube.com/watch?v=AuKHEGVyzEQ
Hunting Down Scalability Bottlenecks in Java (Page last updated September 2019, Added 2019-09-29, Author Sergey Kuksenko, Publisher Oracle CodeONE). Tips:
- There are 3 goals for using concurrency: reduce latency, hide latency, increase throughput
- Tuning strategies: guess and check (eg add a resource - is it faster?); deliberate choice (find the scarcest resource and add more of it - but it's hard to correctly identify the scarcest resource; eg if CPU is the bottleneck, is it the cache, the pipeline, or actually the processor?)
- The Universal Scalability Law tells you there is a maximum throughput you can reach
- Java 10+ correctly detects the container core count; prior to that the JVM always sees the cores of the underlying box. Use -XX:ActiveProcessorCount=N to override
- Hardware resource contention can happen on: cores, CPU caches, memory controllers, and from cache coherency (shared data in a cache line, locks, false sharing). If you hit 100% CPU utilization, it can be from any of these
- HyperThreads can slow down your application - try turning them off and compare test results
- Check your working set size to see if it fits in the CPU cache (on Linux: `sudo sh -c "echo -n 1 > /proc/[pid]/clear_refs"; cat /proc/[pid]/smaps`, looking at the "Referenced" sizes)
- The object allocation rate correlates with memory throughput
- Typical lock coding errors are: protecting too many data items; protecting too much code; making the critical section too small (in this last case the locking overhead dominates the cost of the protected work).
- Monitoring thread states across multiple stack traces is a good way to track contention; looking at BLOCKED states shows the locked threads, provided they are blocked for non-trivial periods
- Threads are the most important resource: you should know what each of your application threads is for. CachedThreadPools are unbounded, so FixedThreadPools should be preferred. Split long-running and short-running tasks into different thread pools.
- Unfair locking can be much faster.
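On the fairness point, java.util.concurrent's ReentrantLock is unfair by default for exactly this reason; a minimal sketch showing that fairness is opt-in:

```java
import java.util.concurrent.locks.ReentrantLock;

public class Fairness {
    public static void main(String[] args) {
        ReentrantLock unfair = new ReentrantLock();      // default: unfair, allows barging, usually higher throughput
        ReentrantLock fair   = new ReentrantLock(true);  // FIFO hand-off, avoids starvation at a throughput cost
        System.out.println(unfair.isFair() + " " + fair.isFair());
    }
}
```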
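The thread pool advice above can be sketched with java.util.concurrent (the pool sizes and tasks here are illustrative assumptions, not from the talk):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PoolSplit {
    public static void main(String[] args) throws InterruptedException {
        // Bounded pools: a CachedThreadPool grows without limit under load,
        // so prefer fixed sizes you have chosen deliberately.
        ExecutorService shortTasks = Executors.newFixedThreadPool(4);
        ExecutorService longTasks  = Executors.newFixedThreadPool(2);

        // Short-running work never queues behind long-running work,
        // because the two kinds of task use separate pools.
        longTasks.submit(() -> {
            try { TimeUnit.MILLISECONDS.sleep(200); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        shortTasks.submit(() -> System.out.println("short task done"));

        shortTasks.shutdown();
        longTasks.shutdown();
        shortTasks.awaitTermination(5, TimeUnit.SECONDS);
        longTasks.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```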
https://www.youtube.com/watch?v=mc2ZWmOgYDo
Monitoring and Troubleshooting Tools in JDK/bin (Page last updated September 2019, Added 2019-09-29, Author Poonam Parhar, Publisher Oracle CodeONE). Tips:
- jvmstat writes performance counter information to /tmp/hsperfdata_user/[pid]. jvmstat is enabled in a JVM with -XX:+UsePerfData (on by default). To access the data remotely, jstatd needs to be running on the remote system and needs permission to read the data. jps and jstat read these files.
- jstat prints statistics about JVM components: class, compiler, gc, gccapacity, gcpause, gcnew, gcnewcapacity, gcold, gcoldcapacity, gcmetacapacity, gcutil, printcompilation
- jcmd can send diagnostic commands to local JVMs (not remote), using the attach API. Use 'jcmd [pid] help' for commands. Popular commands: VM.version, VM.system_properties, VM.flags, GC.class_histogram, GC.class_stats, GC.heap_dump filename=dump, Thread.print, JFR.start name=rec settings=profile duration=2m filename=rec.jfr, JFR.check, JFR.stop, JFR.dump name=rec filename=rec.jfr
- jcmd [pid] PerfCounter.print prints all the perf counters available
- jinfo can enable, disable and set flags (if they can be dynamically set)
- jmap can provide classloader statistics, finalization info, class histograms and heap dumps
- jstack prints thread dumps, with concurrent lock info if used with the -l option
- jps, jinfo, jmap, jstack and jstat are all shortcuts for specific jcmd commands (so you could use just jcmd)
- jfr is a command-line tool to examine JFR files
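As a complement to the command-line tools above, many of the same counters are also reachable in-process via the java.lang.management API; a minimal sketch, roughly the GC data that jstat -gc reports (counter names and values vary per JVM and collector):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcCounters {
    public static void main(String[] args) {
        // One MXBean per collector (e.g. young and old generation collectors).
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": count=" + gc.getCollectionCount()
                    + " timeMs=" + gc.getCollectionTime());
        }
    }
}
```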
https://www.youtube.com/watch?v=ASsAhWiz5Pw
Monica-Beckwith-on-Java-Garbage-Collection (Page last updated September 2019, Added 2019-09-29, Author Monica Beckwith, Publisher ieeeComputerSociety). Tips:
- You can increase the heap so that no collection is needed for the lifetime of some processes (eg recycled cluster nodes, daily GCs, batch runs)
- Copying collector pauses tend to be proportional to the number of objects being copied
- Whether users notice pauses depends on both pause duration and pause frequency (occasional longer pauses can be better than many short ones, depending on how much each affects the user)
- Some common GC tuning options: if references are a problem, turn on parallel reference processing; tune the generation sizes based on the live dataset sizes; compressed oops can give good improvement; test if using the numa aware allocator -XX:+UseNUMA can improve performance
- You should enable -XX:+PrintGCDetails -XX:+PrintGCTimeStamps (Java 9+: -Xlog:gc*) to see allocation and promotion rates and heap occupancies.
- Statistical analysis of GC events help to tell you if you have a problem, but to identify the problem you need to investigate the outliers
- Use a baseline to compare against changes to see which improve the situation. Always try one change at a time!
- If you need pauses of less than 1 millisecond, you cannot have garbage collection. But servers tend to have network overheads above that, so GC is probably not the limiting factor.
- Off heap and reusing buffers is a reasonable approach to avoid garbage collection for ultra-low pause applications
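The off-heap/buffer-reuse idea can be sketched with a direct ByteBuffer allocated once and cleared between uses, so the steady state allocates nothing for the GC to collect; the buffer size and encode method are illustrative assumptions:

```java
import java.nio.ByteBuffer;

public class ReusedBuffer {
    // Allocated once, outside the Java heap; reused for every message.
    private static final ByteBuffer BUF = ByteBuffer.allocateDirect(64 * 1024);

    static int encode(byte[] payload) {
        BUF.clear();              // reset position/limit instead of allocating a new buffer
        BUF.put(payload);
        BUF.flip();
        return BUF.remaining();   // bytes ready to be written to a channel
    }

    public static void main(String[] args) {
        System.out.println(encode(new byte[]{1, 2, 3}));
        System.out.println(encode(new byte[100]));   // same buffer, no new allocation
    }
}
```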
https://www.youtube.com/watch?v=s-b1HxY1DC0
Java Concurrency, A(nother) Peek Under the Hood (Page last updated September 2019, Added 2019-09-29, Author David Buck, Publisher Oracle CodeONE). Tips:
- The JIT compiler can reorder statements, fold them, and do many other things to optimize the code. Only single-threaded code is guaranteed to conform to the functional order as written.
- You need a barrier when you write AND when you read to guarantee that write ordering in one thread is consistent with read ordering in another thread.
- volatile is not atomic, but does guarantee a happens-before ordering - everything written before a volatile write is readable by another thread after the volatile read.
- longs and doubles are not atomic (unless volatile)
- Barriers occur at: synchronized monitor acquisition and release, volatile reads and writes, final field writes in constructors, JNI entry/exit, IO, thread starts and joins, and lots of operations in java.util.concurrent
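The volatile happens-before guarantee described above can be sketched as follows (the field names are illustrative); once the reader observes the volatile write to ready, it is guaranteed to see the earlier plain write to data:

```java
public class HappensBefore {
    static int data;                  // plain field, no synchronization of its own
    static volatile boolean ready;   // the volatile write/read pair orders access to 'data'

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (!ready) { /* spin until the volatile write becomes visible */ }
            // Happens-before: the write to 'data' before 'ready = true' is visible here.
            System.out.println(data);
        });
        reader.start();

        data = 42;     // ordinary write...
        ready = true;  // ...published by the volatile write
        reader.join();
    }
}
```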
Jack Shirazi
Last Updated: 2024-08-26
Copyright © 2000-2024 Fasterj.com. All Rights Reserved.
All trademarks and registered trademarks appearing on JavaPerformanceTuning.com are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. JavaPerformanceTuning.com is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.
URL: http://www.JavaPerformanceTuning.com/news/newtips226.shtml
RSS Feed: http://www.JavaPerformanceTuning.com/newsletters.rss