Java Performance Tuning
Tips May 2024
https://www.youtube.com/watch?v=hzATjYXplbk
Finding Java's Hidden Performance Traps (Page last updated May 2024, Added 2024-05-27, Author Victor Rentea, Publisher Devoxx UK). Tips:
- To find the cause of a performance issue, follow this sequence: 1. look at distributed traces to see where requests are delayed; 2. look at the metrics of the component identified in step 1 and look for anomalous values; 3. profile (eg with JFR) the components identified.
- Distributed traces let you see inefficiencies such as repeated identical queries, unnecessary extra hops, and child requests that could be parallelized.
- Common causes of performance problems include: thread pools that are too small, GC pauses, waiting too long to get a connection from a pool, too low a cache-hit ratio, too many errors, and inefficient logging (eg building log strings that are never used - see the logging sketch after this list). Check your metrics for these common issues.
- Once you fix a performance problem, expose a metric that would identify a similar problem recurring, and add an alarm for when it could happen again.
- Avoid network calls in methods that hold locks or transactions, because the resources held by the method (eg JDBC connections) sit idle but locked while it calls out (see the transaction sketch after this list).
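For the inefficient-logging tip above, here is a minimal sketch (not from the talk) using java.util.logging's Supplier overloads, so the message string is only built when the level is actually enabled; the Order class and the log message are made up for illustration.

import java.util.logging.Level;
import java.util.logging.Logger;

public class LazyLogging {
    private static final Logger LOG = Logger.getLogger(LazyLogging.class.getName());

    void handle(Order order) {
        // Eager version: the string is built even when FINE is disabled.
        // LOG.fine("Processing " + order.describe());

        // Lazy version: the Supplier is only invoked if FINE is enabled,
        // so no string is built (and describe() is never called) otherwise.
        LOG.log(Level.FINE, () -> "Processing " + order.describe());
    }

    // Hypothetical domain class, just to make the example self-contained.
    record Order(String id) {
        String describe() { return "order " + id; }  // imagine this is expensive
    }
}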
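And for the tip about network calls inside locked/transactional methods, a hedged sketch with plain JDBC: the remote lookup (PriceService here is a hypothetical client, not from the talk) happens before a connection is borrowed from the pool, so the pooled connection is not held idle during the network round trip.

import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import javax.sql.DataSource;

public class OrderWriter {
    private final DataSource pool;            // connection pool
    private final PriceService priceService;  // hypothetical remote service client

    OrderWriter(DataSource pool, PriceService priceService) {
        this.pool = pool;
        this.priceService = priceService;
    }

    void saveOrder(String orderId) throws SQLException {
        // Do the network call first, while no pooled connection is held.
        BigDecimal price = priceService.fetchPrice(orderId);

        // Only now borrow a connection and keep the transaction short.
        try (Connection con = pool.getConnection()) {
            con.setAutoCommit(false);
            try (PreparedStatement ps =
                     con.prepareStatement("UPDATE orders SET price = ? WHERE id = ?")) {
                ps.setBigDecimal(1, price);
                ps.setString(2, orderId);
                ps.executeUpdate();
            }
            con.commit();
        }
    }

    interface PriceService {                  // hypothetical remote client
        BigDecimal fetchPrice(String orderId);
    }
}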
https://www.youtube.com/watch?v=9-S_nZ5gzGE
Pushing Java to the Limits: Processing a Billion Rows in under 2 Seconds (Page last updated May 2024, Added 2024-05-27, Author Roy Van Rijn, Publisher Devoxx UK). Tips:
- Integer math is much faster than floating-point math. If your values have a fixed number of digits after the decimal point, you can represent them as integers and divide by the appropriate power of ten only when you need the final results (see the fixed-point sketch after this list).
- If processing data in parallel, it's more efficient to load a chunk per core and process each chunk on its own core, rather than load one big chunk and process that in parallel, because the one big chunk gets shared across the CPU caches, which is much less efficient.
- Loading file data by memory mapping it is much faster than reading the file with conventional file-reading I/O (see the memory-mapping sketch after this list).
- Reading 8 bytes at a time from memory (not disk) is more efficient than reading byte by byte, as the CPU can process a whole 8-byte word per instruction (see SWAR, SIMD Within A Register).
- A bit shift is faster than a divide if you are dividing by a power of 2.
- Representing strings as longs can be extremely efficient if you only need the actual string at the end of a sequence of tasks.
- Branchless programming (no if statements, no alternate class implementations) is very CPU-pipeline friendly so can be extremely efficient.
- Open-addressing hash maps using forward (linear) probing can be faster than hash maps with linked nodes, because you avoid the additional node creation and linking operations (see the open-addressing sketch after this list).
- Using a data structure per core, then merging the data structures at the end of processing, means the CPU caches work on separate data and don't contend, which is faster (see the per-core sketch after this list).
- Avoiding object creation is very efficient but difficult to code and tends to reduce maintainability.
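A minimal sketch of the integer-math tip above, assuming values with exactly one digit after the decimal point (as in the billion-row challenge's temperatures), stored as tenths; the parser is simplified and skips validation.

public class FixedPoint {
    // "12.3" stored as 123: one decimal place means a scale factor of 10.
    static int parseTenths(String s) {
        // e.g. "-5.7" -> -57 ; "12.3" -> 123
        boolean negative = s.charAt(0) == '-';
        int start = negative ? 1 : 0;
        int value = 0;
        for (int i = start; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '.') {
                value = value * 10 + (c - '0');
            }
        }
        return negative ? -value : value;
    }

    public static void main(String[] args) {
        int sumTenths = parseTenths("12.3") + parseTenths("-5.7");  // all integer math
        // Divide by the scale factor only when producing the final result.
        System.out.println(sumTenths / 10.0);                       // prints 6.6
    }
}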
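The memory-mapping and 8-bytes-at-a-time tips combined in one hedged sketch: the file is mapped with FileChannel.map and scanned one long at a time (counting newlines is just a stand-in workload, and the per-byte check inside the long is a simplification of real SWAR tricks). A single MappedByteBuffer is limited to 2 GB, so larger files would need multiple mappings or the newer MemorySegment API.

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedScan {
    public static void main(String[] args) throws IOException {
        Path file = Path.of(args[0]);
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // Map the file into memory rather than reading it through streams.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            long newlines = 0;
            // Read 8 bytes per iteration instead of one.
            while (buf.remaining() >= 8) {
                long word = buf.getLong();
                // Check each of the 8 bytes packed in the word.
                for (int shift = 0; shift < 64; shift += 8) {
                    if (((word >>> shift) & 0xFF) == '\n') newlines++;
                }
            }
            // Handle the tail that is smaller than 8 bytes.
            while (buf.hasRemaining()) {
                if (buf.get() == '\n') newlines++;
            }
            System.out.println("lines ~ " + newlines);
        }
    }
}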
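A hedged sketch of the open-addressing tip: an int-to-long map using forward (linear) probing over plain arrays, so no per-entry node objects are created. Resizing, deletion and hash mixing are omitted for brevity, and it assumes far fewer entries than the fixed capacity.

public class OpenAddressingIntMap {
    // Power-of-two capacity so a bit mask replaces a modulo (and its divide).
    private static final int CAPACITY = 1 << 16;
    private static final int MASK = CAPACITY - 1;

    private final int[] keys = new int[CAPACITY];
    private final long[] values = new long[CAPACITY];
    private final boolean[] used = new boolean[CAPACITY];

    void put(int key, long value) {
        int i = key & MASK;            // initial slot (a real version would mix the hash bits)
        while (used[i] && keys[i] != key) {
            i = (i + 1) & MASK;        // forward probe to the next slot
        }
        keys[i] = key;
        values[i] = value;
        used[i] = true;
    }

    Long get(int key) {
        int i = key & MASK;
        while (used[i]) {
            if (keys[i] == key) return values[i];
            i = (i + 1) & MASK;        // keep probing until an empty slot
        }
        return null;                   // not present
    }
}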
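And a hedged sketch of the data-structure-per-core tip: each task (the pool is sized to the core count) fills its own private HashMap, and the partial maps are merged only once at the end; the word-count workload is illustrative.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PerCoreAggregation {
    public static Map<String, Long> countWords(List<List<String>> chunks)
            throws InterruptedException, ExecutionException {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        try {
            List<Future<Map<String, Long>>> partials = new ArrayList<>();
            for (List<String> chunk : chunks) {
                // Each task fills its own private map: no locking, no shared writes.
                partials.add(pool.submit(() -> {
                    Map<String, Long> local = new HashMap<>();
                    for (String word : chunk) {
                        local.merge(word, 1L, Long::sum);
                    }
                    return local;
                }));
            }
            // Merge the per-task maps once, at the end of processing.
            Map<String, Long> result = new HashMap<>();
            for (Future<Map<String, Long>> f : partials) {
                f.get().forEach((k, v) -> result.merge(k, v, Long::sum));
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }
}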
https://www.youtube.com/watch?v=tn1Q7-dAHd8
Practical Performance Analysis (Page last updated March 2024, Added 2024-05-27, Author Simone Bordet, Publisher Devoxx). Tips:
- When measuring performance, you need a target before starting so that you know what to measure and what the goal is.
- Steady-state load testing is testing at normal production loads; limit load testing is increasing load until a resource is saturated, to find the limits of the application and determine which resource limits the scale. It also tells you whether the system recovers when the load subsequently reduces, or whether the limit load broke the service.
- Useful OS level changes: increase number of open files, extend the ephemeral port range, change the CPU governor to performance (which stops the CPU frequency being dynamically altered).
- A good JVM configuration is JDK 21 or later with generational ZGC (-XX:+UseZGC -XX:+ZGenerational) and -XX:+DebugNonSafepoints (and of course an -Xmx large enough for your application so that it doesn't get throttled by too small a heap).
- Track at least these resources: CPU, network, SSD, perf, hiccups (see the hiccup-meter sketch after this list), GC, threads, connections, response (time, type, errors).
- Always measure the baseline - a test measuring normal load after a gradual ramp up, with no issues in the system - so that you have something to compare future anomalous measurements against.
- USE method: Utilization, Saturation, Errors. Start by looking for errors, then saturation (each resource reaching its maximum capacity), and utilization last. Utilization is an average over time of how much a resource is used; saturation identifies when a resource reaches capacity for a period. A resource can be saturated while its utilization is below 100%, because the saturation happens over a shorter period than the averaging window used for utilization (see the saturation sketch after this list).
- If there is an identified shortfall from load tests, you should profile the component that is causing the shortfall. Use a profiler which has no safepoint bias.
- Make sure the clients generating load in load testing are not themselves saturated, which would throttle the load they can generate.
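For the "hiccups" metric in the resource list above, a minimal sketch of the usual measurement trick (in the spirit of tools like jHiccup, not taken from the talk): sleep for a fixed interval and record how much longer the wake-up actually took; the excess is stall time the whole JVM experienced (GC pauses, OS scheduling, etc.).

public class HiccupMeter {
    public static void main(String[] args) throws InterruptedException {
        final long intervalMs = 1;
        long maxHiccupMs = 0;
        long end = System.currentTimeMillis() + 10_000;   // run for 10 seconds

        while (System.currentTimeMillis() < end) {
            long before = System.nanoTime();
            Thread.sleep(intervalMs);
            long elapsedMs = (System.nanoTime() - before) / 1_000_000;
            // Anything beyond the requested sleep is a stall ("hiccup") the JVM saw.
            long hiccupMs = Math.max(0, elapsedMs - intervalMs);
            maxHiccupMs = Math.max(maxHiccupMs, hiccupMs);
        }
        System.out.println("worst hiccup over the run: " + maxHiccupMs + " ms");
    }
}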
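And a small sketch of the utilization-versus-saturation point: with made-up per-second samples, the average over the window looks moderate even though the resource was pinned at capacity for a burst, which is where the queuing and latency damage happens.

public class UtilizationVsSaturation {
    public static void main(String[] args) {
        // One utilization sample per second over a 10-second window (percent busy).
        int[] samples = {20, 25, 30, 100, 100, 100, 30, 25, 20, 20};

        double sum = 0;
        int saturatedSeconds = 0;
        for (int s : samples) {
            sum += s;
            if (s >= 100) saturatedSeconds++;   // at capacity for that second
        }
        double averageUtilization = sum / samples.length;

        // The average looks moderate, yet the resource was saturated for 3 seconds.
        System.out.printf("average utilization = %.0f%%, saturated for %d of %d seconds%n",
                averageUtilization, saturatedSeconds, samples.length);
    }
}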
Jack Shirazi