Java Performance Tuning
Tips March 2024
Back to newsletter 280 contents
https://www.youtube.com/watch?v=H3dAE_MkFa0
Calibrate Garbage Collection on the Ground and Run Your Java App in the Cloud (Page last updated February 2024, Added 2024-03-29, Author Ana Mihalceanu, Publisher Porto Tech Hub). Tips:
- If you get the performance you want with the default GC settings, there is no need to change them!
- Useful GC metrics include both the number of GCs and the total GC pause time, along with the distribution of pause times.
- For HotSpot JVMs from JDK 11 onwards, the default GC is SerialGC if the JVM has less than 1792MB of memory or only 1 CPU (either condition is enough), otherwise G1GC; the default maximum heap is a quarter of physical memory.
- Serial GC is optimal for small heaps and/or single-CPU JVMs. You can fine-tune it with -XX:MaxGCPauseMillis, -XX:GCTimeRatio and -XX:+UseStringDeduplication.
- G1 GC aims to balance latency and throughput while keeping pause times within a few hundred milliseconds. You can fine-tune it with -XX:MaxGCPauseMillis (default 200ms) and -XX:+UseStringDeduplication. For pause time tuning, set MaxGCPauseMillis lower; for throughput tuning, set MaxGCPauseMillis higher and add -XX:+UseLargePages.
- To avoid heap resizing overheads where footprint doesn't matter, set -Xms and -Xmx to the same value.
- The Parallel GC's goals, in priority order, are to meet the pause time goal, then maximize throughput, then minimize footprint. It resizes the heap to achieve these goals using the -XX:MaxGCPauseMillis and -XX:GCTimeRatio values, which are therefore its main tuning knobs. The throughput target is 1-(1/(1+GCTimeRatio)). Resizing is controlled by -XX:YoungGenerationSizeIncrement, -XX:TenuredGenerationSizeIncrement and -XX:AdaptiveSizeDecrementScaleFactor.
- The generational ZGC collector (-XX:+UseZGC -XX:+ZGenerational) is tuned mainly through the heap size (-Xmx): the more memory it has, the better it performs.
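To see which collector the ergonomics actually picked on a given machine, a small check can help. The sketch below (class and method names are hypothetical) prints the CPU count, the maximum heap, and each collector's name, collection count and accumulated collection time via the standard java.lang.management API. The expectedDefaultGc method is only a simplified model of the default-GC rule described above, not the JVM's actual selection code.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: shows which collectors the running JVM selected.
final class GcErgonomicsCheck {

    // Simplified model of the rule in the tip (an assumption for illustration):
    // Serial if under 1792MB of memory or a single CPU, otherwise G1.
    static String expectedDefaultGc(long memoryMb, int cpus) {
        return (memoryMb < 1792 || cpus < 2) ? "Serial" : "G1";
    }

    // Default maximum heap per the tip: a quarter of physical memory.
    static long expectedMaxHeapBytes(long physicalMemoryBytes) {
        return physicalMemoryBytes / 4;
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("Available CPUs: " + cpus);
        System.out.println("Max heap (bytes): " + maxHeap);
        // Each collector bean reports its name, the number of collections so far,
        // and the approximate accumulated collection time in milliseconds.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: count=%d, time=%dms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

Running it with and without -XX:+UseSerialGC shows different collector names (typically "Copy" and "MarkSweepCompact" for Serial, names starting with "G1" for G1). `java -XX:+PrintFlagsFinal -version` also lists which Use...GC flag ended up true.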
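The Parallel GC throughput formula can be checked with a little arithmetic: with GCTimeRatio=99 the target is 1 - 1/100 = 0.99, meaning at most about 1% of time spent in GC. The sketch below (hypothetical class name) computes the target and its inverse, and reads the live GCTimeRatio and MaxGCPauseMillis values through the HotSpot-specific com.sun.management.HotSpotDiagnosticMXBean, so it assumes a HotSpot-based JVM.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper for reasoning about the Parallel GC throughput goal.
final class ThroughputGoal {

    // Fraction of time the collector aims to spend outside GC: 1 - 1/(1 + GCTimeRatio).
    static double throughputTarget(int gcTimeRatio) {
        return 1.0 - 1.0 / (1.0 + gcTimeRatio);
    }

    // Inverse of the formula: the GCTimeRatio giving a desired throughput t, i.e. t / (1 - t).
    static double gcTimeRatioFor(double throughput) {
        return throughput / (1.0 - throughput);
    }

    public static void main(String[] args) {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // Read the live values of the flags discussed above (HotSpot-specific API).
        int ratio = Integer.parseInt(hotspot.getVMOption("GCTimeRatio").getValue());
        String pause = hotspot.getVMOption("MaxGCPauseMillis").getValue();
        System.out.printf("GCTimeRatio=%d -> throughput target %.1f%%%n",
                ratio, 100 * throughputTarget(ratio));
        System.out.println("MaxGCPauseMillis=" + pause);
    }
}
```

For example, a 95% throughput goal corresponds to GCTimeRatio=19, since 0.95 / 0.05 = 19.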
https://foojay.io/today/12-lessons-learned-from-doing-the-one-billion-row-challenge/
12 Lessons Learned From Doing The One Billion Row Challenge (Page last updated February 2024, Added 2024-03-29, Author Anthony Goubard, Publisher foojay). Tips:
- Different hardware can have very different performance. For tests, target as close to production hardware as possible; for production, benchmark different hardware against each other to see which is best (either for performance or performance-vs-cost).
- Be aware that an optimization on one type of hardware may produce very different performance on other types (eg optimizing for spinning disks is likely de-optimizing for SSD).
- Files read from the disk cache perform very differently from files read from disk. This is particularly relevant to testing, since a previous test run is likely to have left files in the disk cache.
- Working with bulk data (eg many bytes) is almost always more efficient than working with small amounts of data (one byte at a time).
- Hashing can be made more efficient, especially by avoiding collisions. A hash function with no collisions for a known key set is a perfect hash; a perfect hash with no wasted space is called a minimal perfect hash.
- Some interesting command-line JVM flags you may not know about, which are occasionally useful in edge-case applications: AlwaysPreTouch, InlineSmallCode, FreqInlineSize, UseTransparentHugePages, TrustFinalNonStaticFields, CompileThreshold, UseNUMA.
- JVM options can get you better performance without the need to change any line of code.
- There are many JVM distributions available with different performance characteristics, so it can be worth testing on different JVMs.
- Different data dramatically affects performance for an algorithm. Make sure in testing that sufficient data variation is used.
- Some micro-optimizations: FileInputStream plus new String(byte[], StandardCharsets.UTF_8) is faster than FileReader; filling a HashMap and then sorting its keys is faster than using a TreeMap; for building a map, one HashMap per thread with the results combined at the end is faster than one global ConcurrentHashMap; a try {} catch {} outside a loop is faster than one inside the loop.
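The bulk-versus-byte-at-a-time point and the FileInputStream tip can be illustrated with a line counter. This is a sketch with hypothetical names; the 64 KiB buffer size is an arbitrary assumption, and any speedup should be measured (ideally with a harness such as JMH) on your own hardware, keeping the disk cache caveat above in mind.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helpers contrasting per-byte and bulk reads.
final class BulkRead {

    // One read() call per byte; on an unbuffered FileInputStream each call goes to the OS.
    static long countLinesByteAtATime(String path) throws IOException {
        long lines = 0;
        try (InputStream in = new FileInputStream(path)) {
            int b;
            while ((b = in.read()) != -1) {
                if (b == '\n') lines++;
            }
        }
        return lines;
    }

    // Fill a 64 KiB buffer per call, then scan it in a tight loop.
    static long countLinesBulk(String path) throws IOException {
        long lines = 0;
        byte[] buf = new byte[64 * 1024];
        try (InputStream in = new FileInputStream(path)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                for (int i = 0; i < n; i++) {
                    if (buf[i] == '\n') lines++;
                }
            }
        }
        return lines;
    }

    // Decode a raw byte range [from, to) directly to a String, rather than
    // reading characters through a FileReader.
    static String decode(byte[] bytes, int from, int to) {
        return new String(bytes, from, to - from, StandardCharsets.UTF_8);
    }
}
```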
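The per-thread map and sort-the-keys tips can be sketched as follows: each task counts words into its own private HashMap, the partial maps are merged once at the end, and keys are sorted only once when ordered output is needed. Names are hypothetical and the input is assumed to be already split into chunks; this is not the author's actual challenge code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical word counter using one HashMap per task instead of a shared ConcurrentHashMap.
final class PerThreadAggregation {

    static Map<String, Long> countWords(List<List<String>> chunks, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Map<String, Long>>> partials = new ArrayList<>();
            for (List<String> chunk : chunks) {
                // Each task fills a private map, so there is no contention while counting.
                partials.add(pool.submit(() -> {
                    Map<String, Long> local = new HashMap<>();
                    for (String w : chunk) local.merge(w, 1L, Long::sum);
                    return local;
                }));
            }
            // Merge the partial results once, on the calling thread.
            Map<String, Long> total = new HashMap<>();
            for (Future<Map<String, Long>> f : partials) {
                f.get().forEach((k, v) -> total.merge(k, v, Long::sum));
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    // Sort once at the end instead of paying TreeMap's O(log n) cost on every insert.
    static List<String> sortedKeys(Map<String, Long> counts) {
        List<String> keys = new ArrayList<>(counts.keySet());
        keys.sort(null); // null comparator means natural ordering
        return keys;
    }
}
```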
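For a small, fixed key set, a minimal perfect hash can be found by brute force: try seeds until every key maps to a distinct slot in a table of exactly n slots. The sketch below is illustrative only (hypothetical names, with a murmur3-style mixing step chosen arbitrarily). Practical minimal perfect hash constructions scale far better, and this version requires keys with distinct hashCode() values because it only mixes the hash code.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical brute-force minimal perfect hash for a small, fixed key set.
final class MinimalPerfectHash {
    private final int seed;
    private final int size;

    private MinimalPerfectHash(int seed, int size) {
        this.seed = seed;
        this.size = size;
    }

    // murmur3-style mixing of (hash ^ seed); every step is a bijection on int,
    // so distinct hash codes remain distinct before the final modulo.
    private static int mix(int h, int seed) {
        h ^= seed;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    private static int slot(String key, int seed, int n) {
        return Math.floorMod(mix(key.hashCode(), seed), n);
    }

    // Slot in [0, size) for a key from the original set (meaningless for other keys).
    int indexOf(String key) {
        return slot(key, seed, size);
    }

    static MinimalPerfectHash build(List<String> keys) {
        int n = keys.size();
        Set<Integer> hashes = new HashSet<>();
        for (String k : keys) hashes.add(k.hashCode());
        if (n == 0 || hashes.size() != n) {
            throw new IllegalArgumentException("need a non-empty key set with distinct hashCode() values");
        }
        // Expected number of seeds tried grows roughly like e^n, so keep n small.
        for (int seed = 0; ; seed++) {
            boolean[] used = new boolean[n];
            boolean collision = false;
            for (String k : keys) {
                int i = slot(k, seed, n);
                if (used[i]) { collision = true; break; }
                used[i] = true;
            }
            if (!collision) return new MinimalPerfectHash(seed, n);
        }
    }
}
```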
https://www.youtube.com/watch?v=pf73bn5_Fx4
Practical Performance Analysis (Page last updated January 2024, Added 2024-03-29, Author Simone Bordet, Publisher Devoxx). Tips:
- "What is the current performance" is not a question that can be answered without having specific performance goals in mind - request times, throughput, business transaction times, etc.
- Steady-state and limit load tests are quite different. A steady-state test models your expected traffic, including peaks. A limit test increases the load until some resource is saturated; this identifies the resource that will bottleneck your system, and also whether the system can recover when the load is then reduced.
- To measure client times during a load test, use a separate client exhibiting "normal behaviour", distinct from the load-generating clients, so that the load generation mechanism doesn't affect the client measurements.
- Typical OS changes needed for load tests include: increasing the number of open files the system allows, increasing the ephemeral port range on client generators, and setting the CPU governor to "performance".
- A useful load test JVM configuration for Java 21+ is -XX:+UseZGC -XX:+ZGenerational -XX:+DebugNonSafepoints (DebugNonSafepoints is a diagnostic flag, so it also needs -XX:+UnlockDiagnosticVMOptions), and set -Xmx large enough!
- Track the main shared resources: CPU, network, SSD, GC, threads, connections, etc. And of course monitor response times, throughput and errors.
- Analyze errors, then saturation, then utilization (this order is the most useful). Errors may be test artefacts, and those should be eliminated if possible.
- Once you have identified the limits, the limiting components, and how much improvement is needed, profiling lets you identify what can be improved.
- Ensure your client generation tool is not itself overloaded.
- Modern machines vary the CPU frequency dynamically, and this can have huge impacts (eg a factor of 2) on performance. Watch for this in tests and/or set frequencies (eg set CPU governor to "performance").
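Several of the shared resources listed above can be sampled from inside the JVM. The sketch below (hypothetical class name) records the system load average, the live thread count, and the cumulative GC count and time once per second using the standard java.lang.management beans. Network, SSD and connection metrics need OS-level tools, so this would complement rather than replace external monitoring.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.lang.management.ThreadMXBean;

// Hypothetical sampler for a few JVM-visible shared resources.
final class ResourceSnapshot {
    final double systemLoadAverage; // negative if the OS does not provide it
    final int liveThreads;
    final long gcCount;             // total collections across all collectors
    final long gcTimeMillis;        // approximate accumulated GC time

    private ResourceSnapshot(double load, int threads, long count, long time) {
        this.systemLoadAverage = load;
        this.liveThreads = threads;
        this.gcCount = count;
        this.gcTimeMillis = time;
    }

    static ResourceSnapshot take() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long count = 0;
        long time = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters may return -1 when undefined for a collector; treat that as 0.
            count += Math.max(0, gc.getCollectionCount());
            time += Math.max(0, gc.getCollectionTime());
        }
        return new ResourceSnapshot(os.getSystemLoadAverage(), threads.getThreadCount(), count, time);
    }

    public static void main(String[] args) throws InterruptedException {
        ResourceSnapshot prev = take();
        for (int i = 0; i < 5; i++) {
            Thread.sleep(1000);
            ResourceSnapshot now = take();
            // Per-interval GC deltas are more informative than lifetime totals.
            System.out.printf("load=%.2f threads=%d gcs=+%d gcTime=+%dms%n",
                    now.systemLoadAverage, now.liveThreads,
                    now.gcCount - prev.gcCount, now.gcTimeMillis - prev.gcTimeMillis);
            prev = now;
        }
    }
}
```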
Jack Shirazi
Last Updated: 2025-01-27
Copyright © 2000-2025 Fasterj.com. All Rights Reserved.
All trademarks and registered trademarks appearing on JavaPerformanceTuning.com are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. JavaPerformanceTuning.com is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.
URL: http://www.JavaPerformanceTuning.com/news/newtips280.shtml