Java Performance Tuning
Tips May 2025
Back to newsletter 294 contents
https://www.youtube.com/watch?v=u7-S-Hn-7Do
Advanced performance analysis with async-profiler (Page last updated May 2025, Added 2025-05-29, Author Andrei Pangin, Publisher Devoxx). Tips:
- CPU bottlenecks are relatively easy to analyse, almost any profiler can find these. More complex problems like high resource consumption, latency outliers, long startup time, and mysterious exceptions are harder to analyse with many profilers.
- Async-profiler does CPU profiling, wall clock profiling, heap profiling, allocation profiling, lock profiling, accesses low-level hardware performance counters, JVM-specific profiling, memory leak detection, displays native stack traces and non-Java threads, and also has instrumentation capabilities.
- Async-profiler can automatically switch to a per-thread CPU clock as an alternative sampling mechanism for CPU profiling when perf events are restricted (e.g., inside a container).
- Async-profiler can instrument any JVM function or Java method with `asprof -e FUNCTION_NAME ...`. For example, to see what code calls System.gc() you can profile the `JVM_GC` function (`asprof -e JVM_GC ...`), and to find where an OutOfMemoryError is created you can profile the OutOfMemoryError constructor, e.g. `asprof -e "java.lang.OutOfMemoryError.<init>" ...`.
- Async-profiler can help identify native memory leaks (memory outside the Java heap) using its `nativemem` event (`asprof -e nativemem ...`). The `nativemem` event instruments `malloc` and related calls, providing a flame graph of allocations from JVM code, application code, and third-party libraries. To distinguish between allocated and freed native memory, `nativemem` profiling can output to JFR format, which stores all allocation and free events. Async-profiler's native memory profiling can be enabled at runtime without JVM startup options. A threshold can be set for native memory sampling (e.g., `--nativemem 2m`) to collect samples only for larger allocations, reducing overhead.
- You can monitor the number of open files for a Unix process with pid PID using `ls /proc/PID/fd | wc -l`. Async-profiler can trace functions inside the Linux kernel using `kprobes` to find where file descriptors are created (profiling Linux kernel functions requires `sudo` and the `fdtransfer` option, which offloads perf event creation to a separate privileged process). For example, to trace the creation of file descriptors you could use `sudo asprof --fdtransfer -e kprobe:fd_install ...`.
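As a cross-check from inside the process itself, you can count your own open descriptors by listing `/proc/self/fd` — a minimal sketch, Linux-only, with a class name (`FdCount`) of my choosing:

```java
import java.io.File;

public class FdCount {

    // Each entry in /proc/self/fd is one open file descriptor (Linux only).
    // Returns -1 if the proc filesystem is not available.
    static int countOpenFds() {
        String[] fds = new File("/proc/self/fd").list();
        return (fds == null) ? -1 : fds.length;
    }

    public static void main(String[] args) {
        System.out.println("open file descriptors: " + countOpenFds());
    }
}
```

This is the same number `ls /proc/PID/fd | wc -l` reports from outside, and it makes a cheap health-check metric.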
- Async-profiler supports continuous profiling with the `loop` option, which restarts profiling periodically and produces JFR recordings. Interactive searchable heat maps (use the `jfrconv` utility to convert the JFR files to heat maps) are a new visualization mode (inspired by Brendan Gregg's FlameScope) for analyzing continuous profiling data over long periods. Heat maps reduce recording size by two orders of magnitude while retaining information, and can provide differential flame graphs which highlight frames that appear in an anomaly period but not in a baseline.
https://www.youtube.com/watch?v=7BHsDuL7Hro
LLM generated - useless visuals but the list it is reading out is a good basic list - don't watch, just use the tips list (Page last updated March 2025, Added 2025-05-29, Author Some LLM, Publisher LLM Generated). Tips:
- Throughput is the amount of work a system can handle in a given time (e.g., requests per second); Latency is the time it takes to process a single request; Resource Utilization is how efficiently resources like CPU and memory are used (high utilization can indicate a bottleneck); Concurrency is the number of requests a system can handle simultaneously.
- A common scalability bottleneck is from thread Contention: Multiple threads competing for shared resources. Mitigate with concurrent data structures (e.g., `ConcurrentHashMap`).
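To illustrate the `ConcurrentHashMap` mitigation, a minimal sketch (class and method names are mine): `merge` is atomic per key, so a shared counter needs no external lock.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ContentionDemo {

    // Increment a shared counter from several threads with no explicit lock;
    // ConcurrentHashMap.merge performs the read-modify-write atomically per key.
    static int countWithThreads(int threads, int perThread) throws InterruptedException {
        Map<String, Integer> counts = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.submit(() -> {
                for (int j = 0; j < perThread; j++) {
                    counts.merge("hits", 1, Integer::sum);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        return counts.get("hits");
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(countWithThreads(4, 10_000)); // 40000
    }
}
```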
- A common scalability bottleneck is from too long duration `synchronized` blocks/methods. Mitigate by reducing the scope of `synchronized` blocks.
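A sketch of what "reducing the scope" means in practice (names are mine): do the slow, lock-free work outside the monitor and hold it only for the shared mutation.

```java
import java.util.ArrayList;
import java.util.List;

public class NarrowLock {

    static final Object lock = new Object();
    static final List<Integer> shared = new ArrayList<>();

    // Stand-in for a slow computation that touches no shared state
    static int expensive(int x) {
        return x * x;
    }

    static void wide(int x) {
        synchronized (lock) {       // monitor held during the slow part too
            shared.add(expensive(x));
        }
    }

    static void narrow(int x) {
        int r = expensive(x);       // compute outside the lock
        synchronized (lock) {       // lock covers only the shared mutation
            shared.add(r);
        }
    }

    public static void main(String[] args) {
        narrow(7);
        System.out.println(shared); // [49]
    }
}
```

Both versions are correct; the narrow one simply keeps other threads waiting for far less time.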
- A common scalability bottleneck is from lock contention on explicit `Lock` objects: similar to `synchronized` contention. Mitigate by using lock-free algorithms (e.g., `AtomicInteger`).
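For comparison, a lock-free counter sketch using compare-and-swap via `AtomicInteger` (class name is mine):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class LockFreeCounter {

    // Count from several threads using CAS-based increments, with no monitor at all
    static int count(int threads, int perThread) throws InterruptedException {
        AtomicInteger counter = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.incrementAndGet(); // lock-free compare-and-swap
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) t.join();
        return counter.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(count(4, 10_000)); // 40000
    }
}
```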
- A common scalability bottleneck is from I/O Bound Operations: threads blocked too long waiting for I/O (disk, network, database, web service). Mitigate by using thread pools (`ThreadPoolExecutor`).
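A sketch of a bounded pool for blocking I/O work (all sizes here are illustrative, not recommendations): a bounded queue plus `CallerRunsPolicy` gives backpressure instead of unbounded task pile-up.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class IoPool {

    static String runOne() throws Exception {
        // Bounded threads and a bounded queue: blocked I/O tasks cannot pile up
        // without limit; CallerRunsPolicy throttles submitters when the queue fills.
        ExecutorService pool = new ThreadPoolExecutor(
                4, 16, 60, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(100),
                new ThreadPoolExecutor.CallerRunsPolicy());
        Future<String> f = pool.submit(() -> {
            Thread.sleep(50);       // stand-in for a blocking I/O call
            return "done";
        });
        String result = f.get();
        pool.shutdown();
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runOne()); // done
    }
}
```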
- A common scalability bottleneck is from database bottlenecks: slow queries (inefficient SQL); lack of indexes (slow full table scans); connection pool exhaustion (delays when the pool is empty); database locking (contention within the database). Mitigate by optimizing queries (e.g., using `EXPLAIN` in MySQL); Creating indexes; Using connection pooling; caching frequently accessed data; properly sizing and monitoring connection pools.
- A common scalability bottleneck is from memory leaks: gradual memory consumption leading to `OutOfMemoryError`; holding onto unused objects; forgetting to release object references; unbounded caches; improper use of `String.intern()`. Mitigate by using memory profiling tools; reusing objects (object pooling); using `WeakReference` or `SoftReference`; releasing resources (file handles, connections) in `finally` blocks or try-with-resources; avoiding large static collections.
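Two of those mitigations in one sketch (names are mine): try-with-resources guarantees the file handle is released even on exceptions, and a `SoftReference` lets the GC reclaim cached data under memory pressure.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.lang.ref.SoftReference;
import java.nio.file.Files;
import java.nio.file.Path;

public class LeakAvoidance {

    // Write then read back a temp file; the reader is closed even if an
    // exception is thrown, so no file handle leaks
    static String readBack(String content) throws IOException {
        Path tmp = Files.createTempFile("tip", ".txt");
        try {
            Files.writeString(tmp, content);
            try (BufferedReader r = Files.newBufferedReader(tmp)) {
                return r.readLine();
            }
        } finally {
            Files.delete(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readBack("hello")); // hello
        // A softly referenced cache entry can be reclaimed by the GC instead of
        // contributing to an OutOfMemoryError
        SoftReference<byte[]> cached = new SoftReference<>(new byte[1024]);
        System.out.println(cached.get() != null);
    }
}
```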
- A common scalability bottleneck is from inefficient algorithms and data structures. Mitigate by choosing appropriate data structures and avoiding unnecessary object creation.
- A common scalability bottleneck is from excessive garbage collection (GC): frequent GC cycles; high object allocation rates; long-lived objects filling the heap. Mitigate by increasing heap size (but tune carefully); tuning garbage collection algorithms; minimizing object lifespans.
- A common scalability bottleneck is from network bottlenecks: latency, bandwidth limitations, and packet loss. Mitigate by compressing data; caching data (client-side or CDN); using load balancing; using more efficient protocols.
- A common scalability bottleneck is from caching issues: incorrect configuration or insufficient cache size. Mitigate by tuning cache size; choosing appropriate cache eviction policies (e.g., LRU, LFU); ensuring proper cache invalidation; using distributed caching.
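A minimal bounded LRU cache can be had from the JDK itself via `LinkedHashMap` in access order — a sketch, with the class name mine:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruCache<K, V> extends LinkedHashMap<K, V> {

    private final int maxEntries;

    public LruCache(int maxEntries) {
        super(16, 0.75f, true);     // accessOrder=true: iteration order is LRU
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict the least-recently-used entry
    }

    public static void main(String[] args) {
        Map<Integer, String> cache = new LruCache<>(100);
        for (int i = 0; i < 250; i++) cache.put(i, "v" + i);
        System.out.println(cache.size()); // 100: oldest 150 entries were evicted
    }
}
```

For multi-threaded or distributed caching you would reach for a dedicated library instead, but the eviction-policy idea is the same.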
- A common scalability bottleneck is from external dependencies: slow or unreliable external services. Mitigate by using asynchronous operations; implementing circuit breaker pattern; setting timeouts; implementing retry mechanisms.
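A minimal retry-with-backoff sketch (names are mine); a production service would also add timeouts and a circuit breaker, e.g. via a library such as Resilience4j:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

public class Retry {

    // Retry a flaky operation with exponential backoff between attempts
    static <T> T withRetry(int attempts, long baseBackoffMs, Callable<T> op) throws Exception {
        Exception last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e;
                Thread.sleep(baseBackoffMs << i); // e.g. 10ms, 20ms, 40ms, ...
            }
        }
        throw last; // all attempts exhausted
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger calls = new AtomicInteger();
        // Simulated transient failure: fails twice, then succeeds
        String result = withRetry(5, 10, () -> {
            if (calls.incrementAndGet() < 3) throw new RuntimeException("transient");
            return "ok";
        });
        System.out.println(result + " after " + calls.get() + " calls"); // ok after 3 calls
    }
}
```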
- Tools and techniques for identifying bottlenecks: profiling tools eg VisualVM for CPU usage, memory allocation, etc; APM tools eg Elastic Observability (for end-to-end visibility, transaction tracing); logging to track requests; monitoring system metrics (CPU, memory, etc.); load testing, simulating user traffic (e.g., using JMeter, Gatling) to find bottlenecks; thread dumps to find blocked threads (e.g., using `jstack`); heap dumps to find memory leaks (e.g., using `jmap`, Eclipse MAT).
- Scalability design principles: Statelessness (easier horizontal scaling); Horizontal Scalability (load balancing, distributed caching); Microservices (independently scalable services); Asynchronous Communication (decouple components); Caching; Monitoring and Alerting (detect issues early).
https://www.youtube.com/watch?v=vSpnDL_dzm8
Scooby RAM, where are you? (Page last updated May 2025, Added 2025-05-29, Author Andrzej Grzesik, Publisher Devoxx). Tips:
- CPUs read memory in "cache lines" (usually 64 bytes), not individual bytes, meaning reading a single byte from memory is an illusion. The processor/memory performance gap continues to increase, making memory access patterns an increasingly important concern for software developers.
- JVM ergonomics determine default heap sizes based on hardware. In Java 8, a server-grade machine (two or more physical processors and 2+ GB RAM) defaults to a quarter of memory for the heap. Java 11 uses G1 GC by default for machines with more than 2GB of RAM and at least two cores. A "core" is what `Runtime.availableProcessors()` shows, regardless of CPU quota. `ActiveProcessorCount` can be explicitly set and influences default thread pool sizing.
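You can see what the JVM believes from inside the process — a small sketch; the common `ForkJoinPool`'s parallelism is derived from the same number:

```java
import java.util.concurrent.ForkJoinPool;

public class Cores {
    public static void main(String[] args) {
        // Influenced by container CPU limits and by -XX:ActiveProcessorCount=N
        int cores = Runtime.getRuntime().availableProcessors();
        // Default thread pools size themselves from the same value
        int parallelism = ForkJoinPool.commonPool().getParallelism();
        System.out.println(cores + " cores, common pool parallelism " + parallelism);
    }
}
```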
- Modern JDKs have container support, allowing the JVM to see its running environment. JVM flags like `InitialRAMPercentage`, `MaxRAMPercentage`, `MinRAMPercentage`, and `ActiveProcessorCount` allow expressing memory limits as percentages, useful for varying container sizes. While percentages are useful, setting explicit `Xmx` (maximum heap size) values for each container class is recommended for predictable behavior.
- JVMs generally prefer more than one CPU and a reasonable amount of RAM; deploying many small Java containers (e.g., 0.5 CPU) is possible, but you often get better performance from fewer, beefier JVMs.
- Setting `Xmx` equal to `Xms` (initial heap size) is a common practice to prevent heap resizing during application runtime and promote predictability.
- The `AlwaysPreTouch` flag ensures that the JVM touches all heap pages at startup, forcing the OS to allocate memory immediately, which can provide early feedback on memory allocation issues.
- The `UseAdaptiveSizePolicy` flag allows memory to be returned to the OS; it's often preferable to disable this explicitly (`-XX:-UseAdaptiveSizePolicy`).
- An "OOMKilled" container error means the entire container is terminated, not just the Java process. To take a heap dump when a container dies, a flag like `-XX:+HeapDumpOnOutOfMemoryError` won't work, as the process is killed before it can execute. `jcmd` is a powerful command-line tool for taking heap dumps, but you need sufficient writable space for the dump, and taking a heap dump can cause the application to appear unresponsive to health checks, so adjust health check aggressiveness. Finally, you need a way to access the heap dump file from outside the container.
- Heap dumps can be larger than the actual heap size; opening a heap dump often requires adjusting the heap size of the analysis tool; it's normal to use multiple tools to analyze heap dumps, as some may fail on certain dumps; loading large heap dumps can take a long time and may require a server-grade machine; Eclipse Memory Analyzer is a free and open-source tool for heap dump analysis; heap dumps may not immediately reveal native memory leaks, such as those caused by unreleased Java strings passed to native code that retain references.
- Metaspace is where class information is loaded; it's native memory. By default it's unlimited; setting an explicit limit is recommended.
- Direct Memory is used by `ByteBuffer`s; `MaxDirectMemorySize` defaults to `Xmx`.
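A direct buffer in action (a small sketch): the backing memory lives off-heap, so a heap dump won't show it, but it counts against `MaxDirectMemorySize`.

```java
import java.nio.ByteBuffer;

public class DirectBuffers {

    // Round-trip an int through an off-heap buffer
    static int roundTrip(int value) {
        ByteBuffer direct = ByteBuffer.allocateDirect(1024); // off-heap allocation
        direct.putInt(value);
        direct.flip(); // switch from writing to reading
        return direct.getInt();
    }

    public static void main(String[] args) {
        System.out.println(ByteBuffer.allocateDirect(16).isDirect()); // true
        System.out.println(roundTrip(42)); // 42
    }
}
```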
- Native Memory Tracking (`-XX:NativeMemoryTracking`) only tracks `malloc` calls made by the JVM, not those from JNI or native code. Tools like `jemalloc` or `async-profiler` can help troubleshoot native memory issues.
- By default, Java does not trim the native heap (`-XX:TrimNativeHeapInterval=0`); setting this value to something other than zero enables periodic trimming. You can also trigger it at runtime using `jcmd PID System.trim_native_heap`.
- Enabling GC logging (`-Xlog:gc*`) is crucial for early detection and troubleshooting of GC-related issues.
- "GC induced application pause time" (from GC logs) indicates how much the application paused due to garbage collection. The `GC overhead limit exceeded` error occurs when GC runs excessively (historically, 98% of the time) relative to application code.
- The `OnOutOfMemoryError` flag can execute a script upon an OutOfMemoryError.
- `UseStringDeduplication` (available with G1 GC) helps reduce heap usage when dealing with many identical `String` objects, though it incurs a performance penalty.
- Using specialized collections (eg from Eclipse collections) that avoid boxing primitives (eg `long` instead of `Long`) can significantly reduce allocations and improve performance for intensive computations.
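The boxing cost is easy to see even with plain JDK types (a sketch, names mine): a `List<Long>` allocates one `Long` object per element, while a `long[]` is a single contiguous allocation.

```java
import java.util.ArrayList;
import java.util.List;

public class BoxingCost {

    // One Long object per element, plus the list's internal reference array
    static long sumBoxed(int n) {
        List<Long> boxed = new ArrayList<>(n);
        for (long i = 0; i < n; i++) boxed.add(i); // autoboxes each value
        long sum = 0;
        for (Long v : boxed) sum += v;             // unboxes on every iteration
        return sum;
    }

    // A single contiguous allocation, no per-element objects
    static long sumPrimitive(int n) {
        long[] values = new long[n];
        for (int i = 0; i < n; i++) values[i] = i;
        long sum = 0;
        for (long v : values) sum += v;
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumBoxed(100_000) == sumPrimitive(100_000)); // true
    }
}
```

Primitive-specialized collection libraries such as Eclipse Collections give you the `long[]`-style memory layout with a richer collection API.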
- Epsilon GC (`-XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC`) is a "no-op" garbage collector useful for benchmarking and for verifying whether an application fits within a certain memory limit without GC overhead.
- Allocation Profiling is essential for understanding where allocations occur. Tools like VisualVM/NetBeans can be used, but are not recommended for production. Async-profiler is recommended for production environments.
- Run benchmarks for any changes.
- Java Flight Recorder (JFR) and Java Mission Control (JMC) are powerful tools for monitoring and profiling Java applications, and JMC is open source and can be used in production.
Jack Shirazi
Last Updated: 2025-05-29
Copyright © 2000-2025 Fasterj.com. All Rights Reserved.
All trademarks and registered trademarks appearing on JavaPerformanceTuning.com are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. JavaPerformanceTuning.com is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.
URL: http://www.JavaPerformanceTuning.com/news/newtips294.shtml
RSS Feed: http://www.JavaPerformanceTuning.com/newsletters.rss