Java Performance Tuning
Tips November 2017
Back to newsletter 204 contents
https://www.youtube.com/watch?v=QH1jr8Y1FTQ
In-Memory Caching: Curb Tail Latency (Page last updated October 2017, Added 2017-11-28, Author Yao Yue, Publisher InfoQ). Tips:
- Caches losing a small percentage of hits can mean a large increase in datastore load (eg a 95% hit ratio dropping to 90% means DB load goes from 5% to 10% - a doubling of load; the same 5-point drop from 99.5% to 94.5% means DB load goes from 0.5% to 5.5% - more than a 10x increase in DB load)
- Client timeouts that close connections and open new ones can cause a connection storm against the (remote) cache. TCP connection creation is expensive compared to a read request, so a connection storm can severely impact performance. Worse, the timeouts tend to happen in the first place because the cache is under heavier-than-normal load, so a load increase causes a storm which causes even worse performance
- Periodic regular slowdowns are often caused by intensive I/O on the box - possibly from other processes, backups or log copies
- Hash table expansion takes time - caches can have "blips" of bad performance while they rehash to handle more data (especially if it needs to lock during rehashing)
- After running for a sufficiently long time, memory fragmentation may cause a spike in the memory needed - and if that is too much memory, you run out of memory
- Fragmented memory uses more memory than just the data needs, and this can be more than provisioned - which can cause running out of system memory
- Put operations of different kinds on different thread pools so that the high priority tasks are not blocked by lower priority and slow ones
- Make slow operations lockless - slow operations are the worst ones to hold a lock across
- Cap all memory requirements and avoid churn (reuse buffers) so that memory usage is deterministic and doesn't fragment
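The hit-ratio arithmetic in the first tip can be sketched as a quick calculation (class and method names are illustrative):

```java
public class CacheMissLoad {
    // Fraction of requests that fall through the cache to the backing store.
    static double dbLoad(double hitRatio) {
        return 1.0 - hitRatio;
    }

    // How much the backing-store load multiplies when the hit ratio drops.
    static double loadMultiplier(double hitRatioBefore, double hitRatioAfter) {
        return dbLoad(hitRatioAfter) / dbLoad(hitRatioBefore);
    }

    public static void main(String[] args) {
        // 95% -> 90% hit ratio: backing-store load roughly doubles
        System.out.println(loadMultiplier(0.95, 0.90));
        // 99.5% -> 94.5%: backing-store load goes up more than tenfold
        System.out.println(loadMultiplier(0.995, 0.945));
    }
}
```

The point of the calculation: what matters to the datastore is the miss rate, not the hit rate, so the same absolute drop in hit ratio is far more damaging the closer the cache already is to 100%.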
https://www.youtube.com/watch?v=8_NRiQGyW8M
Optimizing Java with Linux Perf (Page last updated October 2017, Added 2017-11-28, Author Staffan Friberg, Thorvald Natvig, Publisher JavaOne). Tips:
- Linux perf gives you access to the hardware counters, kernel and user probes, tracing, CPU scheduling and eBPF
- perf-map-agent generates a perf map from a running JVM with -XX:+PreserveFramePointer -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints, then use 'perf record -e EVENT-LIST -g -p PID' and create the java perf map using perf-map-agent. Analyse with 'perf report --no-children -i perf.data' or flamegraphs from https://github.com/brendangregg/FlameGraph with 'perf script | stackcollapse-perf.pl --all | flamegraph.pl --color=java --hash > perf.svg' or using https://github.com/KDAB/hotspot 'hotspot perf.data' which provides a GUI tool for this
- Cycles per instruction indicate how efficiently you are using your CPU - mainly from efficiently (or not) using the CPU caches, so CPU cache miss data is also important
- A block matrix algorithm splits matrices up into sub-matrices for matrix operations so that you can push all the data as efficiently as possible through the CPU caches for matrix operations
- perf-jitdump-agent https://github.com/sfriberg/perf-jitdump-agent adds JIT compiled information to 'perf' dumps - use 'jitdump -p PID' and 'perf record -e EVENT-LIST -g -k 1 -p PID' and 'perf inject --jit -i perf.data -o perf.jit.data', then view with 'perf report --source --no-children -i perf.jit.data'
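The block-matrix idea can be sketched in Java: the multiplication is tiled so that each pair of sub-matrices is fully reused while still cache-resident, reducing cache misses and so improving cycles per instruction. The block size of 64 is an illustrative assumption to be tuned against your actual cache sizes:

```java
public class BlockedMatMul {
    // Tile edge length - tune so three BLOCK x BLOCK tiles fit in cache.
    static final int BLOCK = 64;

    // Multiply square matrices a and b in BLOCK x BLOCK tiles.
    static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length;
        double[][] c = new double[n][n];
        for (int ii = 0; ii < n; ii += BLOCK)
            for (int kk = 0; kk < n; kk += BLOCK)
                for (int jj = 0; jj < n; jj += BLOCK)
                    // Multiply one pair of tiles into the result tile.
                    for (int i = ii; i < Math.min(ii + BLOCK, n); i++)
                        for (int k = kk; k < Math.min(kk + BLOCK, n); k++) {
                            double aik = a[i][k];
                            for (int j = jj; j < Math.min(jj + BLOCK, n); j++)
                                c[i][j] += aik * b[k][j];
                        }
        return c;
    }
}
```

Profiling with perf before and after this kind of change should show the cache-miss counters dropping along with cycles per instruction.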
https://www.youtube.com/watch?v=Oi6-pXX11qw
Finding Subtle but Common Concurrency Issues in Java Programs (Page last updated September 2014, Added 2017-11-28, Author Mark Winterrowd, Publisher Oracle Developers). Tips:
- Use higher-level concurrent structures (eg ConcurrentHashMap) if you can adapt your code to them, rather than building your own
- If you have to use low-level concurrency tools, try to keep your use hidden behind higher level APIs
- Naive lazy initialization of a singleton doesn't work in a multi-threaded context, and double-checked locking is fraught with concurrency errors in many implementations. Fast wrong results are NOT better than slightly slower correct results
- Concurrency bugs that look like they'll happen very infrequently can actually happen often enough to be quite painful
- Before you make a clever concurrency optimization, look at best practices, eg no need to contend on lock at all if the lock is protecting unrelated critical sections (so they should be using different locks); or if the lock is necessary, can slow operations be moved outside of the locked section.
- Don't lock on mutable fields - the field can be set to a different object which means different threads using the same critical section can lock on different objects
- Synchronizing on an object which is different for different threads (eg 'this') while changing a common field (like a static field) is a known concurrency anti-pattern
- Any field guarded by a lock needs the lock to be accessible as widely as the field itself
- Use notifyAll() rather than notify()
- Waiting on a stale condition (a wait() where the notify() that should wake it has already happened before the wait() was entered) can look very similar to a deadlock without an actual deadlock, ie things are available for processing, but nothing progresses because the threads that could progress are waiting for a signal that doesn't come (or that may only come at some point in the future when the condition that triggers the signal happens to occur again)
- Always check the wait() condition while holding the lock, ie the condition that let you enter the block to call the wait()
- Object.wait() can unblock without a signal - a spurious wake. This can result in unintended progress when the condition that put it into the wait() state is still such that it should stay in the wait(). Check your wait() condition in a loop.
synchronized (lock) {
    while (condition()) {
        lock.wait(); // re-check the condition after every wake-up
    }
    progress();
}
finish();
- Ask yourself: does it actually need to be multi-threaded? Does eager initialization cause any actual performance issue? Is there a high level pre-built structure that provides the thread-safe functionality you need so you can avoid using low-level concurrency structures?
- Looking for concurrency bugs? Search for "synchronized" (or use a tool)
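For the lazy-initialization tip above, the initialization-on-demand holder idiom is a standard safe alternative to hand-rolled double-checked locking: the JVM guarantees a class is initialized exactly once, on first use, so no explicit locking is needed (class names are illustrative):

```java
public class Config {
    private Config() {
        // expensive initialization goes here
    }

    // The nested class is not initialized until getInstance() first
    // touches it; class initialization is thread-safe by the JLS, so
    // INSTANCE is created exactly once with no explicit locking.
    private static class Holder {
        static final Config INSTANCE = new Config();
    }

    public static Config getInstance() {
        return Holder.INSTANCE;
    }
}
```

This answers the "does eager initialization cause any actual performance issue?" question a different way: you get lazy initialization without low-level concurrency code at all.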
https://www.youtube.com/watch?v=UQjnp_i6PtQ
How to Properly Blame Things for Causing Latency: Distributed Tracing & Zipkin (Page last updated November 2017, Added 2017-11-28, Author Adrian Cole, Publisher Barcelona JUG). Tips:
- Distributed tracing is not just for latency - it also gives you an architecture view, showing which services call which other services
- Trace across microservices by adding headers that hold trace events and pass the headers downstream - but make sure there is no significant overhead (maybe even sample if needed to keep down overheads)
- Use size limited buffers and size limited trace entries to ensure that memory overhead of tracing is small
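A minimal sketch of the sampling idea (class name and rate are illustrative; real tracers such as Zipkin's instrumentation make this decision once at the trace root and propagate it downstream in a header so all services agree):

```java
import java.util.concurrent.ThreadLocalRandom;

public class ProbabilisticSampler {
    private final double rate; // fraction of traces to record, eg 0.01

    public ProbabilisticSampler(double rate) {
        if (rate < 0.0 || rate > 1.0)
            throw new IllegalArgumentException("rate must be in [0,1]");
        this.rate = rate;
    }

    // Called once per incoming root request; the boolean decision is
    // then passed downstream with the trace headers, never re-rolled.
    public boolean isSampled() {
        return ThreadLocalRandom.current().nextDouble() < rate;
    }
}
```

Sampling at, say, 1% keeps the CPU and memory overhead of tracing negligible while still giving a statistically useful picture of latency across services.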
Jack Shirazi
Last Updated: 2024-12-27
Copyright © 2000-2024 Fasterj.com. All Rights Reserved.
All trademarks and registered trademarks appearing on JavaPerformanceTuning.com are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. JavaPerformanceTuning.com is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.
URL: http://www.JavaPerformanceTuning.com/news/newtips204.shtml
RSS Feed: http://www.JavaPerformanceTuning.com/newsletters.rss