Java Performance Tuning
Tips March 2025
https://www.infoq.com/presentations/optimizing-java-app-kubernetes/
Optimizing Java Applications on Kubernetes: Beyond the Basics (Page last updated November 2024, Added 2025-03-28, Author Bruno Borges, Publisher InfoQ). Tips:
- Reducing image size: use slim Linux distributions (Debian slim, Alpine) for the base image, distroless images, or custom-built images (Alpine is small but has potential compatibility issues (musl libc) and may lack commercial support from cloud vendors); include only the dependencies the application needs; separate the dependency layer from the application code layer for better caching during builds; use modular runtimes built with jlink to include only the required modules (see the jlink module sketch after this list).
- Improving startup time: use Class Data Sharing (CDS), which creates a binary archive of loaded classes for faster loading; CRaC uses checkpoint/restore for very fast restarts, but requires the framework/runtime to be aware of checkpoints.
- JVM defaults are conservative and designed for shared environments. They may not be optimal for containerized environments.
- The JVM rounds up fractional CPU limits (e.g. 1.2 CPUs becomes 2 perceived processors). The ActiveProcessorCount JVM flag allows setting the perceived number of processors, even above the CPU limit, which is useful for I/O-bound applications.
- The JVM's default maximum heap is 50% of available RAM up to 256MB, then a fixed 120MB up to 512MB, and 25% above that. Manually configure JVM settings (heap size, GC) for containerized environments; don't rely on the defaults (see the container check sketch after this list).
- Epsilon GC (a no-op collector that allocates but never reclaims memory, enabled with -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC) is useful for benchmarking without GC overhead.
- Metaspace and code cache consume a fixed amount of memory regardless of heap size.
- Be aware of the different garbage collectors and their characteristics (throughput vs. latency). A recommended starting point is setting the heap size to 75% of the memory limit.
- Kubernetes scaling: Horizontal Pod Autoscaler (HPA) scales out by adding more replicas - not always the most efficient solution; Vertical Pod Autoscaler (VPA) scales resources of existing pods. The InPlacePodVerticalScaling feature gate allows resource updates without restarts, but the JVM doesn't fully utilize this yet (up to JVM 23); kube-startup-cpu-boost allows containers to have more resources during startup.
- CPU throttling is a major problem in Kubernetes. It occurs when CPU usage exceeds the limit within the CPU period, and it can significantly impact GC performance and increase latency. 1000 millicores is not 1 vCPU: it is that amount of CPU time per scheduling period (typically 100 milliseconds) across all JVM/application threads; once it is used up, the container gets NO MORE CPU until the next period begins - this is CPU throttling. For example, with a 1000-millicore limit, eight runnable threads (application plus GC) can consume the period's 100ms of CPU time in about 12.5ms of wall-clock time, then stall for the remaining ~87.5ms.
- Recommended GC choices: serial for 1 CPU, parallel up to 2 CPUs and 4GB, G1 up to 4 CPUs, then Z or Shenandoah.
- Consolidating replicas and allocating more resources per pod can improve performance and reduce costs.
- Use A/B testing of different performance configurations (GCs, resource limits, scaling strategies) in production - straightforward with Kubernetes (assuming you have or can use multiple replicas).
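A sketch of the jlink tip above: declare only the platform modules the application actually uses in module-info.java, and jlink can assemble a minimal runtime containing just those. The module name and dependencies here are hypothetical.
```java
// module-info.java - hypothetical application module; only the modules
// listed here (plus their transitive dependencies) end up in the jlink image
module com.example.app {
    requires java.net.http;  // HTTP client
    requires java.sql;       // JDBC
}
// Build the trimmed runtime with something like:
//   jlink --module-path mods --add-modules com.example.app --output custom-runtime
```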
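To verify what the JVM actually perceives inside a container (the CPU rounding and heap defaults above), a quick check along these lines can be run in the pod; the flags in the comments are standard HotSpot options.
```java
import java.lang.management.ManagementFactory;

public class ContainerCheck {
    public static void main(String[] args) {
        // With a CPU limit of 1.2 this typically prints 2 (rounded up);
        // -XX:ActiveProcessorCount=<n> overrides the perceived count
        System.out.println("Perceived CPUs: "
                + Runtime.getRuntime().availableProcessors());

        // The maximum heap the JVM derived from the container memory limit;
        // set it explicitly with -Xmx or -XX:MaxRAMPercentage=75
        System.out.println("Max heap MB: "
                + Runtime.getRuntime().maxMemory() / (1024 * 1024));

        // Which garbage collector was actually selected
        ManagementFactory.getGarbageCollectorMXBeans().forEach(
                gc -> System.out.println("GC: " + gc.getName()));
    }
}
```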
https://www.youtube.com/watch?v=hDQ4TDDP_Qw
Handling concurrent access to shared resources in Java (Page last updated August 2024, Added 2025-03-28, Author Sebastian Konieczek, Publisher OpenValue). Tips:
- The primary concurrency challenge is managing *shared resources*. Race conditions arise when multiple threads access and modify the same resource concurrently, leading to non-deterministic behaviour.
- The `synchronized` keyword uses intrinsic locks (monitors) associated with every object. Only one thread can hold the lock on a monitor at a time.
- The Lock interface (ReentrantLock, ReentrantReadWriteLock) is more flexible than `synchronized`. Reentrant means the same thread can acquire the lock multiple times without blocking itself. `ReentrantReadWriteLock` allows multiple threads to hold the read lock concurrently, but only one thread can hold the write lock at a time, which makes it great for read-heavy scenarios (see the read/write lock sketch after this list).
- The `volatile` keyword guarantees visibility: writes to a `volatile` variable are immediately visible to other threads. Note that it does not make compound operations such as `count++` atomic (see the stop-flag sketch after this list).
- Virtual threads are lightweight threads introduced in Java 21. They can maximize throughput for workloads dominated by blocking I/O: a virtual thread parks when blocked, freeing its carrier (platform) thread to run other tasks (see the virtual thread sketch after this list).
- Thread pinning is when a blocked virtual thread holds onto its carrier thread, preventing other virtual threads from using it. From Java 21 to 23 inclusive, `synchronized` blocks and methods pin virtual threads, so you may want to avoid them in virtual-thread code (JDK 24 removes this limitation via JEP 491).
- Common distributed synchronization mechanisms are: Database or file system locks; Optimistic vs. pessimistic locking - optimistic locking uses version columns or timestamps to detect concurrent modifications; Message queues (single-threaded processing of messages).
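A minimal sketch of the read-heavy scenario described above; the cache class is hypothetical. Many readers proceed concurrently, while a writer gets exclusive access.
```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadMostlyCache<K, V> {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<K, V> map = new HashMap<>();

    public V get(K key) {
        lock.readLock().lock();        // shared: many threads may hold this at once
        try {
            return map.get(key);
        } finally {
            lock.readLock().unlock();  // always release in finally
        }
    }

    public void put(K key, V value) {
        lock.writeLock().lock();       // exclusive: blocks readers and other writers
        try {
            map.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```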
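The classic `volatile` use is a stop flag; a minimal sketch:
```java
public class Worker implements Runnable {
    // Without volatile, the loop might never observe the write from stop()
    private volatile boolean running = true;

    public void stop() { running = false; }  // immediately visible to run()

    @Override
    public void run() {
        while (running) {
            // do work; note that compound updates (e.g. counter++) on a
            // volatile field would still need locking or an AtomicInteger
        }
    }
}
```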
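A minimal sketch of blocking tasks on virtual threads, using Java 21's `Executors.newVirtualThreadPerTaskExecutor`; the `Thread.sleep` stands in for real blocking I/O.
```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.IntStream;

public class VirtualThreadDemo {
    public static void main(String[] args) {
        // One new virtual thread per task: when a task blocks, the virtual
        // thread parks and its carrier thread is freed to run other tasks
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            IntStream.range(0, 10_000).forEach(i ->
                    executor.submit(() -> {
                        Thread.sleep(100);  // parks the virtual thread, not the carrier
                        return i;
                    }));
        }  // close() waits for all submitted tasks to complete
    }
}
```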
https://www.youtube.com/watch?v=46b4SALICyA
Memory API: Patterns, Use Cases, and Performance (Page last updated October 2024, Added 2025-03-28, Authors Jose Paumard & Remi Forax, Publisher Devoxx). Tips:
- ByteBuffer uses 32-bit indexing, which limits access to 2GB of memory; has unpredictable deallocation (relying on the garbage collector); has unintuitive operations, such as needing to `flip()` between writing and reading; and allows accessing mapped memory after the file is closed.
- Memory segments have a number of advantages over ByteBuffers: they prevent memory access after freeing; memory is explicitly managed, giving developers control rather than relying on the GC as ByteBuffer does; performance is comparable to `Unsafe`; "unsafe" memory regions can be accessed, but only when explicitly requested; they are ByteBuffer-compatible; and they offer more flexible ways to access memory (direct, array-like, stream-like). ByteBuffer is now implemented on top of memory segments.
- Memory segment basics: segments are always off-heap; `Arena` objects are used to create and manage memory segments; segments can be printed to reveal their actual memory address for debugging; access is done via `set` and `get` methods, using a layout to specify the data type (e.g. `ValueLayout.JAVA_INT`); direct offset access (using longs) is possible, but unaligned access (e.g. reading an `int` at an odd address) fails with a runtime exception; `setAtIndex` and `getAtIndex` multiply the index by the layout size to calculate the offset (see the Arena sketch after this list).
- Memory segments cannot be directly used for file I/O. Instead a `ByteBuffer` acts as an intermediary - you get the ByteBuffer *view* of the MemorySegment.
- Arenas manage the lifecycle of memory segments. Different Arena types offer different behaviours and performance characteristics. Arenas track related memory segments for collective management (especially deallocation). Segments from different arenas might be interspersed in memory. Closing an Arena deallocates its segments, potentially creating holes in memory. Subsequent allocations may need to find a hole large enough, leading to fragmentation and allocation overhead.
- Arena types: `ofConfined()` - confined to the thread that created the Arena, fastest allocation and deallocation; `ofShared()` - can be accessed by multiple threads, but closing has synchronization overhead that grows linearly with the thread count; `ofGlobal()` - never closed; `ofAuto()` - deallocated by the GC like legacy direct ByteBuffers, best avoided because of the unpredictable GC-based deallocation.
- `FileChannel.map()` now accepts an `Arena`, allowing a file to be mapped to a memory segment managed by that Arena. This is the preferred way to work with file-backed memory segments (see the mapped-file sketch after this list).
- `PaddingLayout` is used to align struct members correctly. It's important for performance and correctness, especially when creating arrays of structs; incorrect padding can lead to crashes or incorrect behaviour. Java 23 provides better error reporting for padding issues than Java 22 (see the padding sketch after this list).
- The JVM is aware of VarHandles and can optimize access. The compiler doesn't do type checking for VarHandles - you have to ensure correct types yourself, and runtime errors will occur if types are mismatched. VarHandle performance is close to `Unsafe`, but it relies heavily on the JIT compiler, so steady-state performance takes time to reach.
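A minimal sketch of the segment basics above, using the final API (Java 22+):
```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class SegmentBasics {
    public static void main(String[] args) {
        // Confined arena: only the creating thread may access its segments,
        // and close() deallocates them deterministically
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment segment = arena.allocate(ValueLayout.JAVA_INT, 10); // 10 off-heap ints

            segment.set(ValueLayout.JAVA_INT, 0, 42);        // offset in bytes, must be aligned
            segment.setAtIndex(ValueLayout.JAVA_INT, 3, 7);  // index 3 * 4 bytes = offset 12

            System.out.println(segment.get(ValueLayout.JAVA_INT, 0));        // 42
            System.out.println(segment.getAtIndex(ValueLayout.JAVA_INT, 3)); // 7
            System.out.println(segment);  // printing reveals the native address
        }  // memory freed here; any later access throws
    }
}
```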
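A minimal sketch of mapping a file into an Arena-managed segment via `FileChannel.map()` (Java 22+); the file name is hypothetical.
```java
import java.io.IOException;
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedSegment {
    public static void main(String[] args) throws IOException {
        try (FileChannel channel = FileChannel.open(Path.of("data.bin"),
                     StandardOpenOption.READ, StandardOpenOption.WRITE,
                     StandardOpenOption.CREATE);
             Arena arena = Arena.ofConfined()) {
            // The mapped segment's lifetime is tied to the arena, not the GC
            MemorySegment segment =
                    channel.map(FileChannel.MapMode.READ_WRITE, 0, 4096, arena);
            segment.set(ValueLayout.JAVA_INT, 0, 12345);  // written through to the file
        }  // closing the arena unmaps the region
    }
}
```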
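A minimal sketch of `PaddingLayout` aligning struct members, equivalent to C's struct { char tag; int value; }:
```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class StructPadding {
    // Three padding bytes keep the int 4-byte aligned; without them,
    // structLayout rejects the misaligned int member
    static final MemoryLayout ENTRY = MemoryLayout.structLayout(
            ValueLayout.JAVA_BYTE.withName("tag"),
            MemoryLayout.paddingLayout(3),
            ValueLayout.JAVA_INT.withName("value"));

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment entries = arena.allocate(ENTRY, 4);        // array of 4 structs
            System.out.println("struct size: " + ENTRY.byteSize());  // 8, not 5
        }
    }
}
```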
Jack Shirazi