Java Performance Tuning
Tips January 2024
https://www.p99conf.io/session/distributed-system-performance-troubleshooting-like-youve-been-doing-it-for-twenty-years/
Distributed System Performance Troubleshooting Like You've Been Doing it for Twenty Years (Page last updated October 2023, Added 2024-01-29, Author Jon Haddad, Publisher P99 Conf). Tips:
- Ask the right questions: How slow is it, and what is normal? What does the latency histogram look like? Did throughput change? Is it affecting every machine or, if not, which subset? Is it one type of request or all of them?
- Use distributed tracing to find which components are affected.
- Focus on throughput, latency, error rate, utilization (of every affected resource), and how saturated they are (queues, wait time, etc).
- Get metrics from affected boxes: CPUs, IO per device, cache effectiveness. Profile!
- Ensure you have observability including distributed tracing; narrow the problem to the fewest affected systems/components/requests; focus on the big measures (error rates and resource utilization); profile affected components.
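The "how slow is it, what is normal?" question from the tips above is usually answered with percentiles rather than an average. The following is a minimal sketch, assuming latencies have already been collected as millisecond samples; the class name `LatencySummary`, the nearest-rank percentile method, and the sample values are illustrative and not taken from the talk.

```java
import java.util.Arrays;

public class LatencySummary {

    /**
     * Nearest-rank percentile: the smallest sample such that at least
     * p percent of all samples are less than or equal to it.
     * The input array must already be sorted in ascending order.
     */
    public static long percentile(long[] sortedMillis, double p) {
        if (sortedMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        int rank = (int) Math.ceil(p / 100.0 * sortedMillis.length);
        return sortedMillis[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Hypothetical request latencies: mostly fast, with two slow outliers.
        long[] samples = {12, 11, 13, 12, 14, 11, 12, 950, 13, 12,
                          11, 13, 12, 14, 12, 13, 11, 12, 870, 12};
        long[] sorted = samples.clone();
        Arrays.sort(sorted);

        double mean = Arrays.stream(sorted).average().orElse(0);
        System.out.printf("mean=%.1f ms, p50=%d ms, p99=%d ms, max=%d ms%n",
                mean,
                percentile(sorted, 50),
                percentile(sorted, 99),
                sorted[sorted.length - 1]);
    }
}
```

With data shaped like this, the mean is pulled far above the typical request while the median stays low, which is why the shape of the histogram matters more than any single number.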
https://www.youtube.com/watch?v=Cw4nN5L-2vU
Taming performance issues into the wild: a practical guide to JVM profiling (Page last updated October 2023, Added 2024-01-29, Author Francesco Nigro, Mario Fusco, Publisher Devoxx). Tips:
- Performance tuning means using resources more efficiently. Profiling to identify inefficiencies is one good way to achieve that.
- There are too many layers and levels of abstraction to know what is inefficient from just considering the system - you need to measure resource usage and identify inefficiencies.
- Any profiler that suffers from safepoint bias is less informative than one that doesn't - profiling only at safepoints will miss potentially important information and lead to misleading analyses.
- async-profiler is one of the better Java profilers: it doesn't have safepoint bias, can be configured to be low impact, can see native stack frames as well as Java ones, has integrated flame graphs, and has many profiling modes (including CPU, wall clock, locks, allocations, cache misses, etc).
- Wall clock profiling uses thread sampling if there are too many threads - otherwise the sampling would take too long and the profiling would be ineffective. This means that you can be missing important data, especially with short profiling periods.
- Use regexes to investigate flame graphs to see which frameworks take up time.
- Event driven frameworks have overhead from the abstraction layers.
- Profiling the wrong resource, ie the one not causing an inefficiency, provides misleading or useless information. You may need multiple investigations at different levels to determine which resource needs profiling for effective tuning.
- Make sure with tests that you are not measuring warmup effects unless that's specifically what you are intending to measure.
- High CPU usage needs CPU profiling, but low CPU usage needs other profiling, eg wall clock profiling.
- Caching is a good solution for any resource that is in high demand and which can have demand reduced by serving pre-computed data.
- Logging can have high overheads, and misusing logging frameworks or using them inefficiently is quite common.
- Bottlenecks can hide other bottlenecks, so be aware that making a bottleneck more efficient might not make the application more efficient. You need to iterate the tuning process until you achieve your goals.
- Using the wrong system clock source can produce very inefficient timing code. A profiler that goes down to the kernel calls will show this type of inefficiency.
- The most powerful optimizations are removing code and caching.
- Allocations in TLAB are much more efficient than allocations outside the TLAB, so it's useful to use a profiler that shows these different allocations.
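One common form of the logging misuse mentioned in the tips above is building an expensive message even when the log level is disabled. The sketch below uses `java.util.logging` to contrast eager and lazy message construction; the class name `LazyLoggingSketch`, the logger name, and the `describeState()` helper are hypothetical stand-ins, and the talk may have demonstrated a different framework or a different problem.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

public class LazyLoggingSketch {
    private static final Logger LOG = Logger.getLogger("sketch.lazy");

    /** Counts how often the expensive message-building code actually runs. */
    public static final AtomicInteger expensiveCalls = new AtomicInteger();

    static {
        // FINE is below INFO, so FINE messages are discarded by this logger.
        LOG.setLevel(Level.INFO);
    }

    /** Stands in for a costly toString(), serialization, or collection dump. */
    static String describeState() {
        expensiveCalls.incrementAndGet();
        return "state=" + System.nanoTime();
    }

    /** Eager: the message is built before the logger checks the level. */
    public static void logEagerly() {
        LOG.fine("Current " + describeState());
    }

    /** Lazy: the supplier is only invoked if FINE is enabled. */
    public static void logLazily() {
        LOG.fine(() -> "Current " + describeState());
    }

    public static void main(String[] args) {
        logEagerly();
        System.out.println("expensive calls after eager log: " + expensiveCalls.get());
        logLazily();
        System.out.println("expensive calls after lazy log:  " + expensiveCalls.get());
    }
}
```

Neither call writes anything at the INFO level, but only the eager version pays for building the message. On a hot path, this kind of hidden cost is what a CPU or allocation profile would surface.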
https://www.infoq.com/presentations/factors-code-performance/
Adventures in Performance: Efficiency Analysis of Large-scale Compute (Page last updated March 2023, Added 2024-01-29, Author Thomas Dullien, Publisher InfoQ). Tips:
- Performance and efficiency are more generally important now because of cloud services: inefficiency in a vendor's implementation cuts directly into gross margins; efficiency improvements in customer cloud deployments cut costs directly, and the savings can be seen the next day!
- The memory wall is significant when it comes to performance work. If you have to hit DRAM, you're looking at 100 to 200 cycles. You can do a lot of computation in 100 to 200 cycles on a modern superscalar CPU, so hitting memory is no longer a cheap thing.
- Multiple threads contending for the same hard disk will ruin performance, because the hard disk has to seek back and forth between two areas on the disk. That's terrible for throughput. Modern SSD or NVMe drives have 1000 times more peak throughput than hard disks. They have almost free seeks, near instant (buffered) writes, and actually require parallelism to get peak performance! The optimal programming models for these different types of storage are very different. If you do blocking I/O on NVMe drives, your thread pools are almost certainly sized too small.
- Spinning disks are high latency, low concurrency; local NVMe drives are low latency, high concurrency; network attached storage is high latency but essentially almost arbitrary concurrency.
- In a large organization, the cost of the biggest service is going to be nowhere close to the aggregate cost of the garbage collector, or of your most popular library (eg a logging library), because these end up in every single service. If you start profiling across a fleet of services, what matters for (in)efficiency is the code they execute in common. Garbage collection costs across an entire organization are hugely significant.
- Because garbage collection costs across an entire organization are hugely significant, and garbage collection cost tends to be dominated by memory requirements and pressure, memory profiling and tuning actually reduce costs across the board: reduced memory pressure translates directly to reduced garbage collection, which translates directly to reduced CPU! A lot of high-performance Java developers have become experts at avoiding allocations altogether for exactly this reason.
- Modern CPUs have sticky state, which means your branch predictor will be trained by a particular code path taken. That means your benchmarks will vary in performance, based on whether they run for the third time or for the fifth time. One solution is randomly interleaving benchmark runs. You still have to contend with things like your CPU clocking up and down and architectural things. The address space layout of your code may actually cause 10% noise in your performance measurements, just from an unlucky layout. In the cloud you can have noisy neighbours and cloud instances that are maxing out the memory bandwidth of the machine, stalling your code. There can be all sorts of things that you did not anticipate.
- Pretty much any tuning parameter in a code base that hasn't been updated in 3 years is likely to be wrong for the current generation of hardware.
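The suggestion above to randomly interleave benchmark runs can be sketched as follows. This is only an illustration under stated assumptions: the class name `InterleavedBenchmark`, the two string-building workloads, the run count, and the fixed seed are all hypothetical. For real measurements a harness such as JMH handles warmup and dead-code elimination far more carefully than this hand-rolled loop.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class InterleavedBenchmark {
    // Written by the workloads so the JIT cannot discard their results.
    private static volatile int sink;

    /** Builds a run order with each variant repeated runsPerVariant times, then shuffled. */
    public static List<String> schedule(List<String> variants, int runsPerVariant, long seed) {
        List<String> order = new ArrayList<>();
        for (String variant : variants) {
            for (int i = 0; i < runsPerVariant; i++) {
                order.add(variant);
            }
        }
        Collections.shuffle(order, new Random(seed));
        return order;
    }

    /** Runs the variants in shuffled order and collects elapsed nanoseconds per variant. */
    public static Map<String, List<Long>> run(Map<String, Runnable> variants,
                                              int runsPerVariant, long seed) {
        Map<String, List<Long>> timings = new LinkedHashMap<>();
        for (String name : variants.keySet()) {
            timings.put(name, new ArrayList<>());
        }
        for (String name : schedule(new ArrayList<>(variants.keySet()), runsPerVariant, seed)) {
            long start = System.nanoTime();
            variants.get(name).run();
            timings.get(name).add(System.nanoTime() - start);
        }
        return timings;
    }

    public static void main(String[] args) {
        Map<String, Runnable> variants = new LinkedHashMap<>();
        variants.put("builder", () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 2_000; i++) sb.append(i);
            sink = sb.length();
        });
        variants.put("concat", () -> {
            String s = "";
            for (int i = 0; i < 2_000; i++) s += i;
            sink = s.length();
        });

        // Report the median per variant; medians are less sensitive to one-off stalls.
        run(variants, 20, 42L).forEach((name, nanos) -> {
            List<Long> sorted = new ArrayList<>(nanos);
            Collections.sort(sorted);
            System.out.println(name + " median ns: " + sorted.get(sorted.size() / 2));
        });
    }
}
```

Because the variants alternate in random order, effects such as branch-predictor training, clock frequency changes, or a noisy neighbour are spread across all variants instead of systematically favouring whichever one happens to run last.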
Jack Shirazi
Last Updated: 2024-08-26
Copyright © 2000-2024 Fasterj.com. All Rights Reserved.
All trademarks and registered trademarks appearing on JavaPerformanceTuning.com are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. JavaPerformanceTuning.com is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.
URL: http://www.JavaPerformanceTuning.com/news/newtips278.shtml
RSS Feed: http://www.JavaPerformanceTuning.com/newsletters.rss