Tips June 2016

http://www.javamagazine.mozaicreader.com/MayJune2016#&pageSet=26&page=0
Building a Massive Off-Heap Data Queue (Page last updated May 2016, Added 2016-06-28, Author Peter Lawrey, Publisher Java Magazine). Tips:

An unbounded queue removes the need for flow control between producers and consumers, as spikes can't cause queue overflow.
An unbounded queue detaches the performance of producers from consumers, so they can be tested independently.
Retaining all requests lets you performance test repeatedly using the same production data.
Using shared memory between JVMs eliminates TCP/IP overheads.
Using shared memory gives you the benefit of operating system efficient buffering and asynchronous writes, though at the cost of having potentially delayed persistence.
If the shared memory working set overflows main memory size, performance can drop by an order of magnitude.
Memory-mapped files can be used in a mode where there is a single copy in memory, so the same memory is shared between processes. Writes are visible across processes in the time it takes the L2 cache to synchronize (eg ~25ns).
Writing metadata (eg message size) first to shared memory, followed by writing the data, allows multiple threads to write data at the same time as the threads can read the metadata of other threads and avoid the "claimed" region.
Sequential writes and reads are more efficient when using persisted storage.
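
As a rough illustration of the metadata-first, sequential layout described above, here is a minimal single-writer sketch using a memory-mapped file. The class name MappedLog and the record layout (a 4-byte length header followed by the payload) are assumptions made for this example, not the format used in the article. Claiming a region safely from several writers would additionally need an atomic update of the header, which is not shown.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: each record is [int length][payload bytes], appended sequentially.
// Single writer only; no cross-process coordination is attempted here.
final class MappedLog {
    private final MappedByteBuffer buffer;
    private int writePos = 0;

    MappedLog(Path file, int capacityBytes) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // The mapping remains valid after the channel is closed.
            buffer = ch.map(FileChannel.MapMode.READ_WRITE, 0, capacityBytes);
        }
    }

    /** Appends one message and returns the offset of its length header. */
    int append(String message) {
        byte[] payload = message.getBytes(StandardCharsets.UTF_8);
        int start = writePos;
        if (start + Integer.BYTES + payload.length > buffer.capacity()) {
            throw new IllegalStateException("log full");
        }
        buffer.putInt(start, payload.length);                 // metadata (size) first
        for (int i = 0; i < payload.length; i++) {            // then the data
            buffer.put(start + Integer.BYTES + i, payload[i]);
        }
        writePos = start + Integer.BYTES + payload.length;
        return start;
    }

    /** Reads the message whose length header is at the given offset. */
    String read(int offset) {
        int length = buffer.getInt(offset);
        byte[] payload = new byte[length];
        for (int i = 0; i < length; i++) {
            payload[i] = buffer.get(offset + Integer.BYTES + i);
        }
        return new String(payload, StandardCharsets.UTF_8);
    }
}
```

A reader that knows the offset of the first record can walk the log sequentially by adding 4 plus the stored length to reach the next header.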

https://www.youtube.com/watch?v=SjC9bFjeR1k
Understanding Microservice Performance (Page last updated June 2016, Added 2016-06-28, Author Rob Harrop, Publisher Spring I/O). Tips:

Define exactly what you mean by latency - is it to first byte, last byte, etc.
Record all requests with times and timestamps and success/error, so you can validly analyse the data. Interesting resulting stats include min, max, mean (throughput) over windows (10s, 30s, 1m, ...) and min, max, 95%, 99%, 99.9% (latency).
Latency distribution is long-tailed, not normal nor exponential.
The most useful latency is the latency that the customer sees, not the service time.
The probability that a customer will see at least one request at the nth centile or worse is 1-(n/100)^requests, eg if the average page makes 42 requests to display the page, then 1-(0.95)^42 = 88% of pages see the 95th centile or worse in latency!
Throughput needs to be related to the latency of those requests for that throughput. Occupancy = latency x throughput (Little's Law) and your occupancy for the system is limited, so beyond some point throughput and latency form an inverse relationship - typically at high utilisation.
Services sharing the same infrastructure cause a more than additive increase in latency because of contention.
Guesstimate will simulate the addition (stacking) of latency distributions.
Amdahl's law says that the speedup you get in the overall system from speeding up a subsystem is limited by the proportion of time that all the other subsystems take - so focus on the subsystems that take a lot of time.
The Universal Scalability law says that the system throughput capability increases by adding capacity but is reduced by the amount of additional contention and crosstalk that the additional capacity adds (specifically, relative capacity = N / (1 + a(N - 1) + b N (N - 1)) where N is capacity (eg servers), a (alpha) is contention overhead and b (beta) is crosstalk overhead (agreement between systems)). If beta is 0, this is Amdahl's law. Both alpha and beta factors limit scalability. Fitting your data into the equation will give you the alpha and beta, and let you extrapolate for higher load and capacity.
Use the system utilisation as an early warning sign that throughput or latency may begin to suffer soon.
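
The formulas in the tips above (the centile probability, Little's Law and the Universal Scalability Law) can be checked with a few lines of arithmetic. The sketch below is only an illustration; the class and method names are made up for this example, and it assumes requests are independent of each other.

```java
// Hypothetical helper class; names are illustrative, not from the talk.
final class LatencyMath {
    /**
     * Probability that a page making `requests` independent requests sees
     * at least one request at or beyond the given centile (as a fraction, eg 0.95).
     */
    static double probabilityOfTailHit(double centileFraction, int requests) {
        return 1.0 - Math.pow(centileFraction, requests);
    }

    /** Little's Law: occupancy = latency x throughput. */
    static double occupancy(double latencySeconds, double throughputPerSecond) {
        return latencySeconds * throughputPerSecond;
    }

    /**
     * Universal Scalability Law: relative capacity = N / (1 + a(N - 1) + b N (N - 1)),
     * where alpha is the contention overhead and beta is the crosstalk overhead.
     * With beta = 0 this reduces to Amdahl's law.
     */
    static double relativeCapacity(int n, double alpha, double beta) {
        return n / (1.0 + alpha * (n - 1) + beta * n * (n - 1));
    }

    public static void main(String[] args) {
        System.out.println(probabilityOfTailHit(0.95, 42));   // roughly 0.88
        System.out.println(occupancy(0.2, 500));              // requests in flight
        for (int n : new int[] {1, 8, 31, 100}) {
            System.out.println(n + " -> " + relativeCapacity(n, 0.05, 0.001));
        }
    }
}
```

With a non-zero beta the relative capacity peaks and then falls as N grows, which is why adding capacity past a certain point can reduce throughput.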

http://videlalvaro.github.io/2015/12/learning-about-distributed-systems.html
What We Talk about When We Talk about Distributed Systems (Page last updated December 2015, Added 2016-06-28, Author Alvaro Videla, Publisher Videla). Tips:

In distributed systems, it can be difficult to detect whether a process has crashed, or whether it is just taking a very long time to reply.
Consider which kind of process failures you want to cope with: crash failures (a process crashes and either stops or recovers to a good state) are far more common than arbitrary failures (processes can send wrong information to their peers, impersonate other processes, corrupt their local database contents, etc), so you may want to focus automatic error handling on the common cases and leave the rare cases to manual intervention.
Timeouts and regular pings (heartbeats) are essential to efficiently handle failures in distributed systems.
Having leader nodes based on consensus introduces bottlenecks for many operations, so you need to carefully decide whether they are appropriate for the problem being solved.
There is no simultaneity in a distributed system, so the system needs to be designed with that in mind.
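
The heartbeat and timeout tip above can be sketched as a simple failure detector. This is a minimal illustration with hypothetical names (HeartbeatMonitor, recordHeartbeat, isSuspected). It can only report that a node is suspected, since a slow node and a crashed node look the same from the outside.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a timeout-based failure detector.
// Timestamps are passed in explicitly so the logic is easy to test.
final class HeartbeatMonitor {
    private final long timeoutMillis;
    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();

    HeartbeatMonitor(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Called whenever a ping or reply arrives from a node. */
    void recordHeartbeat(String nodeId, long nowMillis) {
        lastSeen.put(nodeId, nowMillis);
    }

    /** A node is suspected if it was never seen or has been silent longer than the timeout. */
    boolean isSuspected(String nodeId, long nowMillis) {
        Long seen = lastSeen.get(nodeId);
        return seen == null || nowMillis - seen > timeoutMillis;
    }
}
```

In a real deployment the caller would typically pass System.currentTimeMillis() or a monotonic clock reading, and choose the timeout from observed latency percentiles rather than a fixed guess.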

https://www.infoq.com/presentations/low-latency-concurrrent-java-8
The Quest for Low-latency with Concurrent Java (Page last updated March 2016, Added 2016-06-28, Author Martin Thompson, Publisher QCon). Tips:

Causes of blocking include: JVM safepoints (typically GC related, but others too like JIT events and lock events); OS related freezes (usually from various IO events including paging); hardware related; notifications; contention; synchronization.
If you don't queue, you're very responsive - but the only way to avoid queuing is to provision a high amount of capacity so that there is no need to queue, which can be expensive.
Amdahl's law says that the speedup is limited by the part of the job that can't be parallelized (eg if 95% of it can be parallelized, then 5% can't be, which means the maximum theoretical speedup is 20x); the Universal Scalability law continues from Amdahl to point out that you additionally have coherency effects across the tasks, which reduce the potential speedup even more and also mean that there is an optimal number of parallel tasks, after which the speedup actually decreases.
Enqueueing and dequeueing time can be longer than the time spent queueing for small queues, especially in blocking queues. Signalling the thread that it is unblocked (eg with java.util.concurrent.locks.Condition) takes at least a few microseconds, with outliers of up to milliseconds even on a quiet system.
ConcurrentLinkedQueue has better performance than ArrayBlockingQueue and LinkedBlockingQueue, but an optimised pre-allocated ring-buffer implementation (like the Disruptor, ManyToOneConcurrentArrayQueue, ManyToOneRingBuffer, Aeron IPC) can be two orders of magnitude faster.
Experience indicates that logging is one of the biggest performance problems.
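
To give a feel for what a pre-allocated ring buffer looks like, here is a minimal single-producer, single-consumer sketch. It is not the Disruptor or any of the Agrona/Aeron classes named above; the class SpscRingBuffer is hypothetical and leaves out the padding, batching and wait strategies that the real implementations rely on for their performance.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: safe only with exactly one producer thread and one consumer thread.
final class SpscRingBuffer<T> {
    private final Object[] slots;                      // pre-allocated once, never resized
    private final int mask;
    private final AtomicLong head = new AtomicLong();  // next slot to read (written by consumer)
    private final AtomicLong tail = new AtomicLong();  // next slot to write (written by producer)

    SpscRingBuffer(int capacityPowerOfTwo) {
        if (capacityPowerOfTwo <= 0 || Integer.bitCount(capacityPowerOfTwo) != 1) {
            throw new IllegalArgumentException("capacity must be a positive power of two");
        }
        slots = new Object[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
    }

    /** Non-blocking enqueue; returns false if the buffer is full. */
    boolean offer(T item) {
        long t = tail.get();
        if (t - head.get() == slots.length) {
            return false;
        }
        slots[(int) (t & mask)] = item;
        tail.lazySet(t + 1);                           // ordered store publishes the slot write
        return true;
    }

    /** Non-blocking dequeue; returns null if the buffer is empty. */
    @SuppressWarnings("unchecked")
    T poll() {
        long h = head.get();
        if (h == tail.get()) {
            return null;
        }
        int index = (int) (h & mask);
        T item = (T) slots[index];
        slots[index] = null;                           // release the reference for GC
        head.lazySet(h + 1);                           // frees the slot for the producer
        return item;
    }
}
```

Neither method blocks or signals another thread, so the cost of Condition-style wakeups mentioned above is avoided; the caller decides whether to spin, yield or back off when offer returns false or poll returns null.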

https://www.infoq.com/presentations/cloud-anti-patterns
Microservices Antipatterns (Page last updated April 2016, Added 2016-06-28, Author Tammer Saleh, Publisher QCon). Tips:

Scale only those services that need to be scaled, rather than all services.
Boring is beautiful. Microservices may not be the answer, so don't just start there - start monolithic and extract as needed.
No DB should be shared amongst microservices, otherwise the microservices must upgrade at the same time with any schema change. Use a versioned gatekeeper service which supports both old and new schemas.
Use queueing to smooth traffic so that the system can handle peaks with degraded service rather than failure.
A Discovery Service or (DNS) router helps you autoscale services by reducing configuration overheads.
Use Circuit Breakers to prevent ailing services from being overloaded (only let requests gradually increase again when the service is fully operational again).
Use correlation IDs for each request so that a request can be tracked across services to identify where issues (including bottlenecks) occur.
Use client wrappers for APIs - this provides flexibility to mock and also change the communications layer to a more efficient one without changing the clients.
Every difference between services (from hardware layer up) introduces more unique changes that may need fixes applied for any issue - try to commonise as much as possible.
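
The Circuit Breaker tip above can be sketched as a small state machine. This is a simplified illustration with hypothetical names and thresholds, not the API of any particular library. In particular, the half-open state here lets every request through as a probe, whereas a production breaker would normally ramp traffic up gradually.

```java
// Hypothetical sketch of a circuit breaker; time is passed in to keep it testable.
final class CircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;   // consecutive failures before opening
    private final long openMillis;        // how long to reject calls before probing
    private State state = State.CLOSED;
    private int consecutiveFailures;
    private long openedAt;

    CircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    /** Returns true if the caller may send a request to the protected service. */
    synchronized boolean allowRequest(long nowMillis) {
        if (state == State.OPEN && nowMillis - openedAt >= openMillis) {
            state = State.HALF_OPEN;      // cool-down elapsed: allow probe requests
        }
        return state != State.OPEN;
    }

    synchronized void recordSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    synchronized void recordFailure(long nowMillis) {
        consecutiveFailures++;
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;           // stop sending load to the ailing service
            openedAt = nowMillis;
        }
    }

    synchronized State state() {
        return state;
    }
}
```

A client wrapper would call allowRequest before each remote call, fail fast when it returns false, and report the outcome with recordSuccess or recordFailure.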

https://dzone.com/articles/concurrency-and-how-to-avoid-it
Concurrency and How to Avoid It (Page last updated April 2016, Added 2016-06-28, Author Bozhidar Bozhanov, Publisher DZone). Tips:

Selecting the safest isolation levels (serializable and repeatable read) may result in performance that is too slow. But other isolation levels can give lost updates, phantom reads and other concurrency issues.
On a single machine you can use language concurrency features (Java locks, concurrent collections, etc) to control concurrency conflicts, but when you deploy to more than one machine you need distributed solutions: distributed locks; message queueing; clustering frameworks; data-server locks; conflict-free replicated data types (CRDT); journaling (insert-only model) with versioning.
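
For the single-machine case mentioned above, a concurrent collection can prevent lost updates without explicit locks. The sketch below uses ConcurrentHashMap.merge, which applies the update atomically per key; the HitCounter class and its methods are hypothetical names for this example. It does not address the multi-machine case, which needs one of the distributed approaches listed.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: an in-process counter that avoids lost updates.
final class HitCounter {
    private final ConcurrentHashMap<String, Long> counts = new ConcurrentHashMap<>();

    /** Atomic read-modify-write; a separate get() followed by put() could lose updates. */
    void increment(String key) {
        counts.merge(key, 1L, Long::sum);
    }

    long get(String key) {
        return counts.getOrDefault(key, 0L);
    }

    /** Runs several threads incrementing the same key and returns the final count. */
    static long runConcurrently(int threads, int incrementsPerThread) {
        HitCounter counter = new HitCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    counter.increment("page");
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get("page");
    }
}
```

Because every increment is applied atomically, the final count equals threads x incrementsPerThread regardless of how the threads interleave.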

Jack Shirazi

Last Updated: 2025-12-25
Copyright © 2000-2025 Fasterj.com. All Rights Reserved.
All trademarks and registered trademarks appearing on JavaPerformanceTuning.com are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. JavaPerformanceTuning.com is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.
URL: http://www.JavaPerformanceTuning.com/news/newtips187.shtml
RSS Feed: http://www.JavaPerformanceTuning.com/newsletters.rss