Java Performance Tuning
Tips May 2019
Back to newsletter 222 contents
https://www.youtube.com/watch?v=PNx9WqQ9QeA
The 10 Common Concurrency Models (Page last updated May 2019, Added 2019-05-26, Author Jack Shirazi, Publisher Devoxx). Tips:
- "Shared mutable state" is what makes concurrency particularly difficult. You can either eliminate one of those three properties - shared, mutable, or state - or you need to use optimistic or pessimistic transactional operations.
- Your high level options for correct concurrency are: Optimistic transactional operations (compare and swap); Pessimistic transactional operations (locks); Unshared state (state is isolated to a single thread); Immutable state; No state (stateless).
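A minimal sketch of two of these options in Java (the class and method names here are illustrative, not from the talk): an optimistic compare-and-swap retry loop using AtomicInteger, and a pessimistic lock using synchronized.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CounterModels {
    // Optimistic transactional operation: read the current value, then
    // retry the compare-and-swap until no other thread interfered.
    private final AtomicInteger casCounter = new AtomicInteger();

    public int incrementOptimistic() {
        int current;
        do {
            current = casCounter.get();
        } while (!casCounter.compareAndSet(current, current + 1));
        return current + 1;
    }

    // Pessimistic transactional operation: acquire the lock so only one
    // thread can mutate the shared state at a time.
    private int lockedCounter = 0;

    public synchronized int incrementPessimistic() {
        return ++lockedCounter;
    }
}
```

The optimistic version never blocks, at the cost of retrying under contention; the pessimistic version blocks contending threads but each attempt succeeds exactly once.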
- Frameworks that implement high level concurrency models tend to have a fundamental flaw that makes maintenance difficult: the framework tends to obscure where problems are when troubleshooting. So far no framework has adequately addressed this. Consequently, the best practice when using any concurrency model framework is to keep your codebase small enough to be able to reason about the whole application. (Project Loom with fibers may avoid this limitation, but that's at least a year away).
https://www.youtube.com/watch?v=-E4q1CZg-Jw
The 7 deadly sins of concurrent programming (Page last updated May 2019, Added 2019-05-26, Authors Sarah Zebian & Taoufik Benayad, Publisher Devoxx). Tips:
- Don't hold a lock for too long, as doing so blocks all other threads trying to proceed through that same lock. Minimize the scope of locked code, even if that requires extensive refactoring.
- Synchronizing on "this" is a maintenance bad practice, as the code can be refactored into another object without anyone considering whether "this" is still appropriate to lock on.
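A minimal sketch combining the last two tips (class and method names are made up for illustration): a dedicated private lock object instead of "this", held only for the shared mutation rather than across the expensive computation.

```java
class LockScope {
    // A private lock object: callers can't accidentally synchronize on it,
    // and refactoring the class doesn't change what is locked.
    private final Object lock = new Object();
    private long total = 0;

    public long addSquare(int n) {
        long square = (long) n * n;  // the "expensive" work, done unlocked
        synchronized (lock) {        // lock held only for the shared update
            total += square;
            return total;
        }
    }
}
```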
- Unbounded thread creation is an antipattern which leads to resource exhaustion - use bounded thread pools.
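One way to sketch a bounded pool with the standard java.util.concurrent API (the sizes here are arbitrary examples): a fixed number of threads, a bounded task queue, and a rejection policy that applies back pressure instead of silently dropping work.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class BoundedPool {
    // At most 4 threads and at most 100 queued tasks. When both are full,
    // CallerRunsPolicy makes the submitting thread run the task itself,
    // which naturally slows down producers (back pressure).
    public static ExecutorService create() {
        return new ThreadPoolExecutor(
                4, 4,                           // fixed pool of 4 threads
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(100),  // bounded task queue
                new ThreadPoolExecutor.CallerRunsPolicy());
    }
}
```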
- A message passing model lets you parallelize across consumers that have distinct state while minimizing or eliminating locking. The Actor model is a well known example of this.
- Techniques to process queues quicker include: consume multiple items from a queue if possible, rather than one at a time; back pressure so that the consumer is not overwhelmed; horizontal scaling, ie more consumers in parallel.
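The first of those techniques can be sketched with BlockingQueue.drainTo, which moves every currently available item (up to a cap) in one call instead of polling one at a time; the bounded queue itself provides back pressure, since producers block on put() when it fills. The helper name is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;

class BatchConsumer {
    // Drain up to max items in a single operation rather than taking
    // them one at a time, reducing per-item synchronization overhead.
    public static List<Integer> drainBatch(BlockingQueue<Integer> queue, int max) {
        List<Integer> batch = new ArrayList<>(max);
        queue.drainTo(batch, max);
        return batch;
    }
}
```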
https://www.youtube.com/watch?v=bz1N5qy8w_4
Low level Java optimisation (Page last updated April 2019, Added 2019-05-26, Author Peter Lawrey, Publisher MelbJVM). Tips:
- You need bare metal for controlled ultra-low latency; you can't achieve consistent low latency on virtualized hardware.
- The isolcpus kernel boot parameter combined with a thread affinity library allows you to specify which threads run on which cores, and is really needed for consistent ultra-low latency.
- The Spectre and Meltdown (nopti) patches should be turned off for consistent ultra-low latency - obviously you can only do this on a system that runs only trusted code.
- Turn off C-states (CPU power management) if you want to achieve consistent ultra-low latency - set the maximum C-state to 1, so the core never drops into a deeper sleep state.
- The command cpupower frequency-info tells you the variation in frequency of the CPU (frequency scaling is used to conserve power). Set the minimum frequency to the maximum available, eg sudo cpupower frequency-set -g performance -d 2.9g -u 2.9g. Pinning the frequency at the maximum gives you consistent ultra-low latency, but disables turbo mode (so you lose the very highest possible burst performance). The BIOS can override this, so you should also go into the BIOS and switch off power management. grep "cpu MHz" /proc/cpuinfo shows the actual frequency being used.
- An application level technique to keep the core running at the highest frequency is to busy-wait on that core.
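A minimal sketch of that busy-wait technique (the class name is illustrative): spin on the core instead of parking the thread, so the core never idles long enough for the power manager to down-clock it. Thread.onSpinWait() (Java 9+) hints to the CPU that this is a spin loop.

```java
import java.util.function.BooleanSupplier;

class BusyWait {
    // Spin until the condition holds, keeping the core busy the whole time.
    // Unlike sleeping or blocking, this never lets the core enter a low
    // power state, at the cost of burning 100% of that core.
    public static void spinUntil(BooleanSupplier condition) {
        while (!condition.getAsBoolean()) {
            Thread.onSpinWait();  // CPU hint; may lower spin-loop power/contention cost
        }
    }
}
```

This only makes sense on a dedicated, isolated core; on a shared machine it simply starves other work.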
- For latencies of 10 microseconds and under, your application should manage each core separately with isolated dedicated threads.
- For consistent ultra-low latency you need your code and data (running on the dedicated thread) to fit in the L2 cache, which is around 256KB.
- With power management on, pausing for even one millisecond is enough for the power manager to put the core into low power mode and subsequent performance will be an order of magnitude slower until the core has been kicked back into high power mode.
- TLB misses are very expensive for ultra-low latency, costing around 10 microseconds.
- At 1% utilization, 1 in 100 requests finds a CPU queue of 2, which doubles your 99th percentile latency!
- The first branch of an if-condition is considered more likely by the compiler, so order your conditions from most to least likely (of course the compiler might reorder, but it usually follows this). For ultra-low latency, you should try different orderings of the code in the critical code sections.
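A small illustration of that ordering, assuming most inputs in a hypothetical request stream are ordinary valid requests (the method, thresholds, and return codes are invented for the example):

```java
class BranchOrder {
    // Put the common case first so the usually-taken branch is the first
    // condition tested; rare error paths come after.
    public static int classify(int requestSize) {
        if (requestSize > 0 && requestSize <= 4096) {  // common case first
            return 0;  // normal request
        } else if (requestSize == 0) {                 // rarer
            return 1;  // empty request
        } else {
            return 2;  // oversized or negative
        }
    }
}
```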
- Keep your code as simple as possible; the JVM is optimized to optimize simple code.
- Once you have enough parallelism that you are using all your cores, you can't effectively parallelize more.
https://www.youtube.com/watch?v=fbpEs51JfdU
Concurrency and Parallel Programming in Java (Page last updated March 2019, Added 2019-05-26, Author Arvind Kumar, Publisher Oracle). Tips:
- Any object can be a lock using the synchronized keyword. The object (actually its monitor) is used as the lock; only one thread at a time can acquire it, and other threads waiting to acquire it are blocked.
- Object.wait() causes the calling thread to release the lock and be added to the wait set of that lock. Object.notify() wakes one thread from the wait set; Object.notifyAll() wakes all of them.
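A minimal wait/notify sketch (the Gate class is an illustrative example, not from the talk): a private object serves as the monitor, wait() is called in a loop to guard against spurious wakeups, and notifyAll() wakes every waiter when the condition changes.

```java
class Gate {
    // Any object can serve as the monitor; a private one keeps it
    // out of reach of external code.
    private final Object lock = new Object();
    private boolean isOpen = false;

    // Blocks until the gate is opened. The loop re-checks the condition
    // after each wakeup, since wait() may return spuriously.
    public void awaitOpen() throws InterruptedException {
        synchronized (lock) {
            while (!isOpen) {
                lock.wait();  // releases the lock, joins the wait set
            }
        }
    }

    public void open() {
        synchronized (lock) {
            isOpen = true;
            lock.notifyAll();  // wake every thread in the wait set
        }
    }
}
```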
- A data race is when at least one thread writes a field while at least one other thread reads or writes that same field without synchronization.
- Thread starvation can happen when a concurrent resource is unfair in choosing which thread to execute next - a particular thread may never be activated if there is a lot of contention on the resource. ReentrantLock has the capability to run as a fair lock (but this has extra overhead compared to the default unfair mode). The fork/join framework uses a work-stealing algorithm to mitigate thread starvation by allowing threads whose own work queue is empty to steal tasks from queues that still have them.
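The fair-lock option is a one-argument change (the counter class here is just an example): passing true to the ReentrantLock constructor makes waiting threads acquire the lock roughly in arrival order, trading throughput for starvation-freedom.

```java
import java.util.concurrent.locks.ReentrantLock;

class FairCounter {
    // true requests a fair lock: the longest-waiting thread acquires next,
    // at the cost of extra overhead versus the default unfair mode.
    private final ReentrantLock lock = new ReentrantLock(true);
    private int count = 0;

    public int increment() {
        lock.lock();
        try {
            return ++count;
        } finally {
            lock.unlock();  // always release, even if the body throws
        }
    }
}
```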
- The fork/join framework is designed to speed up the execution of tasks that can be subdivided and executed in parallel.
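A standard fork/join sketch along those lines (the class and threshold are example choices): a RecursiveTask splits a summation range in half until each piece is small enough to compute directly, and idle pool workers steal the forked halves.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class RangeSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000;  // max size computed directly
    private final long from, to;                 // inclusive bounds

    RangeSum(long from, long to) {
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (long i = from; i <= to; i++) sum += i;
            return sum;
        }
        long mid = (from + to) / 2;
        RangeSum left = new RangeSum(from, mid);
        RangeSum right = new RangeSum(mid + 1, to);
        left.fork();                          // schedule the left half async
        return right.compute() + left.join(); // compute right here, then join
    }

    public static long sum(long from, long to) {
        return ForkJoinPool.commonPool().invoke(new RangeSum(from, to));
    }
}
```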
Jack Shirazi
Last Updated: 2024-08-26
Copyright © 2000-2024 Fasterj.com. All Rights Reserved.
All trademarks and registered trademarks appearing on JavaPerformanceTuning.com are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. JavaPerformanceTuning.com is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.
URL: http://www.JavaPerformanceTuning.com/news/newtips222.shtml
RSS Feed: http://www.JavaPerformanceTuning.com/newsletters.rss