Java Performance Tuning
Tips October 2024
https://www.youtube.com/watch?v=xFb_LcapbXw
Java Performance Update 2024 (Page last updated October 2024, Added 2024-10-29, Author Per Minborg, Publisher Devoxx). Tips:
- Relatively recently delivered performance projects include CDS (class data sharing) for startup time, lightweight virtual threads for scalability, the foreign function and memory API, and vector support.
- Common performance metrics are: throughput; average latency; worst case (99.99%) latency; startup time (time until first request can be served); warmup time (start to when peak performance is reached); memory usage, access patterns and cache locality; thread usage; CPU usage; contention; power efficiency.
- Don't run performance tests on laptops; performance varies too much because of power management, even within a single test.
- For benchmarks you must run at least 30,000 iterations, and use System.nanoTime() (not System.currentTimeMillis(), which can be shifted by the OS). JMH is the best tool for microbenchmarks (see the JMH sketch after this list).
- Iterating across multiple bytes can be made more efficient by converting sets of 8 bytes to longs, then using long or int operations (see the long-at-a-time sketch after this list).
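A minimal JMH microbenchmark sketch, assuming the standard org.openjdk.jmh dependency and annotation-processor setup; the class name and the measured operation are illustrative only. JMH handles warmup, iteration counts and nanosecond-resolution timing for you.

import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

// Illustrative only: measures a trivial sum over an array.
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5)
@Measurement(iterations = 5)
@Fork(1)
@State(Scope.Thread)
public class SumBenchmark {
    int[] data = new int[1024];

    @Benchmark
    public void sum(Blackhole bh) {
        int total = 0;
        for (int v : data) total += v;
        bh.consume(total); // prevent dead-code elimination of the loop
    }
}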
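And a sketch of the long-at-a-time idea, reading 8 bytes per step through a byte-array view VarHandle; the zero-byte scan uses a well-known SWAR trick chosen purely for illustration, it is not an example from the talk.

import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

public class LongAtATime {
    private static final VarHandle LONGS =
        MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    // Returns true if any byte in the array is zero, processing 8 bytes per iteration.
    static boolean containsZeroByte(byte[] bytes) {
        int i = 0;
        for (; i + Long.BYTES <= bytes.length; i += Long.BYTES) {
            long word = (long) LONGS.get(bytes, i);
            // SWAR test: the expression is non-zero iff some byte in 'word' is zero
            if (((word - 0x0101010101010101L) & ~word & 0x8080808080808080L) != 0) {
                return true;
            }
        }
        for (; i < bytes.length; i++) {   // tail: remaining bytes one at a time
            if (bytes[i] == 0) return true;
        }
        return false;
    }
}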
https://medium.com/towards-data-engineering/apache-spark-wtf-stranded-on-dates-rows-74f0d9788b8b
Apache Spark WTF??? - Stranded On Dates Rows (Page last updated August 2024, Added 2024-10-29, Author Angel Alvarez Pascua, Publisher Towards Data Engineering). Tips:
- Locks are synchronization points used to control access to a shared resource by multiple threads. Whenever a program gets stuck, locks are a likely culprit.
- A deadlock is where two or more threads are blocked, each waiting for a resource held by another; the deadlocked threads show no CPU activity. A livelock is where threads are active but can't progress because the resource they need is effectively unavailable; livelocked threads can show (potentially lots of) CPU activity. A minimal deadlock example is sketched after this list.
- To see why threads are blocked from progressing, get a thread stack dump, e.g. with the jstack utility. For deadlock identification one thread stack dump is sufficient, but for livelocks you might benefit from getting multiple thread stack dumps and looking for changes.
- A very slow-running app might look like a livelocked app - look for any sign of progress at all over a longer period to tell which it is.
- Ensure you test with data that reflects real data variety and volumes.
- Creating Throwable instances is a costly operation (mainly because of filling in the stack trace); see the sketch after this list.
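A minimal sketch (not from the article) of a classic lock-ordering deadlock; a thread dump of this process, e.g. jstack <pid>, reports "Found one Java-level deadlock". The class, lock and thread names are illustrative.

public class DeadlockDemo {
    private static final Object LOCK_A = new Object();
    private static final Object LOCK_B = new Object();

    public static void main(String[] args) {
        // Thread 1 takes A then B; thread 2 takes B then A.
        // With the sleeps, each ends up waiting for the lock the other holds.
        new Thread(() -> lockBoth(LOCK_A, LOCK_B), "thread-1").start();
        new Thread(() -> lockBoth(LOCK_B, LOCK_A), "thread-2").start();
    }

    static void lockBoth(Object first, Object second) {
        synchronized (first) {
            try { Thread.sleep(100); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            synchronized (second) {
                System.out.println(Thread.currentThread().getName() + " got both locks");
            }
        }
    }
}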
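And a sketch of one common way to avoid the fillInStackTrace() cost for exceptions that are thrown very frequently, assuming the stack trace is genuinely not needed; the class name is illustrative.

public class CheapException extends RuntimeException {
    public CheapException(String message) {
        // enableSuppression=false, writableStackTrace=false: skips filling in
        // the stack trace, which is the expensive part of creating a Throwable
        super(message, null, false, false);
    }
}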
https://www.youtube.com/watch?v=hnJHlHqHqsI
Accelerating performance of Java applications on Arm64 (Page last updated October 2024, Added 2024-10-29, Author Dave Neary, Publisher Devoxx). Tips:
- Use a more recent JVM - just by moving from JVM 8 to JVM 21, you get a 300% increase in performance on ARM (from better GC, intrinsics and aarch64-specific ops) - -XX:+UnlockDiagnosticVMOptions -XX:+PrintIntrinsics lets you see the intrinsics.
- -XX:+AlwaysPreTouch -XX:+UseTransparentHugePages loads real memory for the virtual memory your JVM wants, and gives large pages. This can be quite efficient during application runtime, but has a startup overhead (you need to let user space advise the OS about hugepages with
echo madvise > /sys/kernel/mm/transparent_hugepage/enabled
).
- Default ergonomics are terrible for cloud workloads. Set explicit values rather than relying on the defaults: heap size (-XX:MaxRAMPercentage, or 75%-85% of container memory), the garbage collection algorithm, the system resources provided to the container, and -XX:ActiveProcessorCount (see the ergonomics check after this list).
- Most tuning depends on the app-specific requirements; there is a tradeoff between latency, throughput and cost. More resources add cost but improve throughput and latency; a GC like ZGC that targets tail latencies can improve throughput and latency but needs more resources - without more resources it can improve worst-case latencies at the cost of average latency and overall throughput; etc.
- If you have not tuned your OS, you are leaving out performance optimizations that could (easily) be applied. For Linux, you can use tuned, i.e. tuned-adm profile SOMEPROFILE. The default profile is "balanced", which is aimed at laptops: it balances performance and power management, which is not optimal for app performance. Profiles better suited to server applications include "network-throughput" and "network-latency". Other OS tuning options include kernel page sizes and CPU tick length.
- Turning off tiered compilation and giving an adequate code cache can be a good optimization for some apps: -XX:-TieredCompilation -XX:ReservedCodeCacheSize=64M -XX:InitialCodeCacheSize=64M.
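A small sketch (not from the talk) for checking what the JVM ergonomics actually decided inside a container, so that the effect of flags like -XX:MaxRAMPercentage and -XX:ActiveProcessorCount can be verified rather than assumed:

public class ErgonomicsCheck {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // What the JVM believes it has been given - in a container these come
        // from the cgroup limits unless overridden with explicit flags
        System.out.println("Available processors: " + rt.availableProcessors());
        System.out.println("Max heap (MB): " + rt.maxMemory() / (1024 * 1024));
    }
}

Running it with and without explicit -XX:ActiveProcessorCount and -XX:MaxRAMPercentage settings shows whether the defaults match what you intended.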
Jack Shirazi
Back to newsletter 287 contents
Last Updated: 2024-12-27
Copyright © 2000-2024 Fasterj.com. All Rights Reserved.