Java Performance Tuning
Tips January 2026
Back to newsletter 302 contents
https://www.youtube.com/watch?v=MtTZ9FjXjAQ
How to build distributed concurrency primitives (Page last updated December 2025, Added 2026-01-26, Author Andrii Rodionov, Publisher GeeCON). Tips:
- Local concurrency techniques include: keywords like volatile for memory visibility, synchronized for blocks and methods; and classes like Atomics, Semaphores, and CountDownLatches.
- Spin locks implement optimistic locking: threads continuously loop (spin) while attempting to atomically change a variable's state (eg via a CAS operation). This is efficient for very short lock durations as it avoids the overhead of parking threads.
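A spin lock of the kind described above can be sketched in a few lines of Java; the class and method names here are illustrative, not from the talk:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Minimal spin lock: threads loop on a CAS until they flip the flag.
// Efficient only for very short critical sections, since waiters burn CPU
// instead of parking.
class SpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    void lock() {
        // Spin until the CAS succeeds in changing false -> true.
        while (!locked.compareAndSet(false, true)) {
            Thread.onSpinWait(); // hint to the CPU that this is a busy-wait
        }
    }

    void unlock() {
        locked.set(false);
    }
}
```

Thread.onSpinWait() (Java 9+) lets the runtime emit a pause-style instruction, which reduces the cost of the busy loop on contended cores.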
- One distributed locking technique is RPC wrapping: traditional Java data structures are wrapped in Remote Procedure Calls (RPC) to allow external microservices to access lock APIs.
- One distributed locking technique is Heartbeats and Leases: to prevent a lock from being held indefinitely if a client fails, clients must periodically send heartbeats to the server. If heartbeats stop for a set duration, the server automatically releases the lock.
- One distributed locking technique is Fencing (fenced locks): to handle "false positives" where a client is alive but delayed (e.g., due to a GC pause), the server issues a monotonically increasing fence number with each lock. The shared resource only accepts requests with a fence number equal to or greater than the last one it processed, effectively blocking "old" lock holders.
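The resource-side check for fenced locks can be sketched as follows, assuming the lock server hands out a monotonically increasing token with each grant (names are illustrative):

```java
// The shared resource rejects requests carrying a fence number lower than
// the highest it has already processed, blocking "old" lock holders that
// were delayed (e.g. by a GC pause) and have since lost the lock.
class FencedResource {
    private long highestSeen = -1;

    synchronized boolean accept(long fencingToken) {
        if (fencingToken < highestSeen) {
            return false; // stale holder: a newer token was already seen
        }
        highestSeen = fencingToken;
        return true;
    }
}
```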
- One distributed locking technique is State replication: to avoid a single point of failure, the internal state of the lock (owner, waiting queue) is replicated across a cluster of servers so that the system remains functional if one server restarts.
- One distributed locking technique is to use Consensus protocols, eg Raft or Paxos: These protocols ensure linearizable behaviour, meaning all members of a cluster return the same state. They require a majority (quorum) of nodes to agree on an operation before it is executed.
- One distributed locking technique is to replicate state machines: this architecture involves persisting commands into a replicated log. Once a majority of nodes acknowledge the log entry, the command is executed across all machines to ensure identical state.
- One distributed locking technique is flipping between optimistic and pessimistic loops. Choosing between a CAS (Compare-and-Swap) loop and pessimistic locking depends on contention; CAS may require significantly more RPC calls under high contention.
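An optimistic CAS retry loop looks like the sketch below; in the distributed setting each read and each failed CAS is a separate RPC, which is why high contention (many retries) can make a pessimistic lock cheaper (class and method names are illustrative):

```java
import java.util.concurrent.atomic.AtomicLong;

// Optimistic read-compute-CAS loop. Under contention, the CAS fails and the
// whole read/compute/CAS cycle repeats -- locally that is cheap, but over
// RPC every retry doubles the round trips.
class OptimisticCounter {
    private final AtomicLong value = new AtomicLong();

    long addClamped(long delta, long max) {
        while (true) {
            long current = value.get();                 // "read" call
            long next = Math.min(current + delta, max); // local computation
            if (value.compareAndSet(current, next)) {   // "CAS" call
                return next;
            }
            // CAS failed: another writer got in first; retry from the read.
        }
    }
}
```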
- One distributed locking technique is In-Place Server Modification: rather than a client reading, modifying, and writing data back (three RPC calls), the modification logic (function) is sent to the server. The server performs the update locally, which can be three times faster by eliminating network latency.
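The idea of shipping the modification function to the data rather than the data to the client can be sketched locally, with a map standing in for the remote store (all names here are illustrative assumptions, not from the talk):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;

// Instead of read -> modify -> write (three round trips), the client sends
// one function and the "server" applies it next to the data in one call.
class InPlaceUpdate {
    private final ConcurrentHashMap<String, Integer> serverState =
            new ConcurrentHashMap<>();

    // One "RPC": the function executes where the data lives, atomically.
    int applyOnServer(String key, UnaryOperator<Integer> fn) {
        return serverState.compute(key, (k, v) -> fn.apply(v == null ? 0 : v));
    }
}
```

Locally this is exactly what ConcurrentHashMap.compute does; the distributed version gains the further benefit of collapsing three network hops into one.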
- One distributed locking technique is using Fairness Queues: implementing a fairness log or queue ensures that external clients are served in the order they requested the lock, preventing them from being blocked indefinitely by "faster" services.
https://www.fasterj.com/articles/concurrencybug.shtml
Finding the root cause of a rarely occurring race condition (Page last updated December 2025, Added 2026-01-26, Author Jack Shirazi, Publisher fasterj.com). Tips:
- When a map lookup returns null unexpectedly, there are only a handful of general causes: The map "read" saw a stale or inconsistent view (map not thread-safe); The key was removed; The value was removed or set to null; The key was never added; The key changed in a way that breaks the lookup (e.g., mutated hashcode or identity).
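The last cause above, a key mutated after insertion, can be reproduced in a few lines (the class here is a made-up illustration, not from the article):

```java
import java.util.HashMap;
import java.util.Map;

// A key whose hashCode depends on mutable state. Mutating it after put()
// makes HashMap probe the wrong bucket, so get() returns null even though
// the entry is still in the map.
class MutableKey {
    String name;
    MutableKey(String name) { this.name = name; }
    @Override public int hashCode() { return name.hashCode(); }
    @Override public boolean equals(Object o) {
        return o instanceof MutableKey && ((MutableKey) o).name.equals(name);
    }
}
```

Usage showing the broken lookup:

```java
Map<MutableKey, String> map = new HashMap<>();
MutableKey key = new MutableKey("a");
map.put(key, "value");
key.name = "b";        // hashCode changes after insertion
map.get(key);          // now probes the wrong bucket and returns null
```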
- Java objects cannot be garbage collected and later "re-acquired".
- Fundamentally, concurrency bugs come from shared mutable state. All three words need to be active: There needs to be state, ie some data structure; The data structure needs to mutate, ie it gets updated (as well as accessed); The data structure needs to be shared by more than one thread.
- "Thread-safe" does not mean "free of races". Thread-safety depends not just on technical implementation but also how the class is used.
- Rare failures are often caused by shared state.
- Rare CI-only failures are often caused by shared static test state.
- "Thread-safe" does not prevent races between components with overlapping lifecycles.
- When you've eliminated all root causes, revisit your assumptions - especially around isolation.
- Focus on exactly how the shared mutable state is used by all the components and instances that use it.
- Minimize the shared mutable state being used. The more pieces of shared mutable state there are in a workflow, the exponentially harder it is to reason about how they are being used.
https://www.youtube.com/watch?v=AYnf2kPlQuQ
How Java can power low-latency systems (Page last updated November 2025, Added 2026-01-26, Author Lakshminarasimhan Sudarshan, Publisher Hasgeek TV). Tips:
- Use JMH to measure performance.
- Use Linux perf utility from the command line to gain visibility into low-level hardware metrics that impact performance.
- Monitor instruction counts - compare the number of instructions generated by different code implementations; for instance, object arrays can require 3x more instructions than primitive arrays, leading to slower performance.
- Analyze L1 cache misses to identify when the CPU is forced to fetch data from main memory, which can be 60x slower than fetching from the L1 cache.
- Track branch prediction misses as these can degrade performance by wasting CPU pipelining efforts, even in code that does not appear to have many explicit "if" statements.
- Measure operations per millisecond to identify significant deltas between different data structures or processing methods.
- Benchmark contiguous memory allocation (like primitive arrays) vs. non-contiguous allocation (like object arrays) to see the effect on cache hits.
- Operate on batches of objects at a time, and model the batch in primitive arrays rather than an array of Objects.
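The batch-of-primitives layout can be sketched as below: instead of an array of objects (pointer chasing into scattered memory), the batch is modeled as parallel primitive arrays so the hot loop walks contiguous memory. Class and field names are illustrative:

```java
// "Structure of arrays" layout: each field of the logical record gets its
// own primitive array, so iterating one field is a linear, cache-friendly
// scan rather than a pointer dereference per element.
class TradeBatch {
    final double[] prices;
    final int[] quantities;

    TradeBatch(int size) {
        prices = new double[size];
        quantities = new int[size];
    }

    double totalValue() {
        double total = 0;
        for (int i = 0; i < prices.length; i++) {
            total += prices[i] * quantities[i]; // contiguous memory access
        }
        return total;
    }
}
```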
- Monitor for false sharing: different threads contend on the same cache line even though the data each thread uses is different, because the different data happens to fall into the same cache line.
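One common mitigation for false sharing is padding hot fields onto separate cache lines (64 bytes assumed here); the sketch below shows the portable manual-padding approach. The JDK-internal @Contended annotation achieves the same effect but is restricted to JDK classes unless -XX:-RestrictContended is set:

```java
// Padding fields so that two counters updated by different threads do not
// land in the same 64-byte cache line. The seven longs after "value" push
// the next object's fields onto a different line.
class PaddedCounter {
    volatile long value;
    // Padding: unused, exists only to occupy the rest of the cache line.
    long p1, p2, p3, p4, p5, p6, p7;
}
```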
- Aim for: contiguous memory access, simple code, few branches, batch processing, and use of vector APIs.
Jack Shirazi
Last Updated: 2026-01-25
Copyright © 2000-2026 Fasterj.com. All Rights Reserved.
All trademarks and registered trademarks appearing on JavaPerformanceTuning.com are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. JavaPerformanceTuning.com is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.
URL: http://www.JavaPerformanceTuning.com/news/newtips302.shtml
RSS Feed: http://www.JavaPerformanceTuning.com/newsletters.rss
Trouble with this page? Please contact us