Java Performance Tuning
Tips May 2018
Back to newsletter 210 contents
https://www.youtube.com/watch?v=q1mMgsvAkF8
The Computer Science behind a modern distributed data store (Page last updated May 2018, Added 2018-05-28, Author Max Neunhoeffer, Publisher Devoxx). Tips:
- Distributed system consensus is hard when: the network has an outage or dropped/delayed/duplicated packets; disks fail (and corrupt data); machines fail (and return with old data or no data)
- Paxos and Raft are consensus protocols; Raft was designed to be easier to understand
- The limit on sorting speed on modern systems is not the comparison computations but the data movement, so the old established sorting algorithms are no longer necessarily the best. Sorting is fastest when the data reaches the parallel cores quickly: mergesort parallelizes well using a min-heap data structure for the merge step, with the heaps sized to fit into the CPU cache.
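The min-heap merge step mentioned above can be sketched in Java: each parallel worker sorts a cache-sized run, then a PriorityQueue-backed min-heap merges the sorted runs. The class and method names here are illustrative, not from the talk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

public class HeapMerge {
    // Merge already-sorted runs using a min-heap of the current head of each run.
    // Heap entries are {value, runIndex, positionInRun}.
    static int[] mergeRuns(List<int[]> runs) {
        PriorityQueue<int[]> heap =
                new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        int total = 0;
        for (int i = 0; i < runs.size(); i++) {
            total += runs.get(i).length;
            if (runs.get(i).length > 0) heap.add(new int[]{runs.get(i)[0], i, 0});
        }
        int[] out = new int[total];
        int n = 0;
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            out[n++] = top[0];                 // smallest remaining element
            int run = top[1], pos = top[2] + 1;
            if (pos < runs.get(run).length) {  // push the next element of that run
                heap.add(new int[]{runs.get(run)[pos], run, pos});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<int[]> runs = new ArrayList<>();
        runs.add(new int[]{1, 4, 9});
        runs.add(new int[]{2, 3, 8});
        runs.add(new int[]{5, 6, 7});
        System.out.println(Arrays.toString(mergeRuns(runs))); // prints [1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}
```

With k runs the heap holds only k entries, so it stays resident in cache while the bulk data streams through sequentially.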
- Log structured merge trees allow fast bulk inserts into large datasets (by making most disk writes sequential), and provide a hot set that fits into RAM to give fast reads. A Bloom/Cuckoo filter helps find items faster (or tell you the item doesn't exist faster) than just searching directly
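A minimal Bloom filter sketch of the read-path optimization above: a definite "no" answer avoids a disk lookup entirely, while a "maybe" falls through to the real store. Sizes and the hash-mixing scheme here are illustrative, not tuned.

```java
import java.util.BitSet;

public class TinyBloom {
    private final BitSet bits;
    private final int size;
    private final int hashes;

    TinyBloom(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    // Derive k positions from two base hashes (the Kirsch-Mitzenmacher technique).
    private int position(Object key, int i) {
        int h1 = key.hashCode();
        int h2 = Integer.rotateLeft(h1, 16) ^ 0x9E3779B9;
        return Math.floorMod(h1 + i * h2, size);
    }

    void add(Object key) {
        for (int i = 0; i < hashes; i++) bits.set(position(key, i));
    }

    // false => definitely absent (skip the disk); true => possibly present (check the store)
    boolean mightContain(Object key) {
        for (int i = 0; i < hashes; i++) {
            if (!bits.get(position(key, i))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        TinyBloom filter = new TinyBloom(1024, 3);
        filter.add("user:42");
        System.out.println(filter.mightContain("user:42"));   // true
        System.out.println(filter.mightContain("user:9999")); // very likely false
    }
}
```

In an LSM tree one such filter per on-disk level lets a read skip every level that definitely does not contain the key.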
- Hybrid Logical Clocks ensure that timestamps for events linked by causality are correctly ordered. This is done by sending a timestamp with every message, and using the later of the local clock (already NTP-synchronized) or the largest timestamp of any received message.
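The Hybrid Logical Clock update rule above can be sketched as follows: every timestamp pairs a physical component with a logical counter, and the receive path folds in the sender's timestamp so causally later events always compare larger. This is a sketch of the standard HLC algorithm; the class name is illustrative.

```java
public class HybridLogicalClock {
    private long physical; // physical component of the last timestamp we issued
    private long logical;  // tie-breaking logical counter

    // Local event or message send: advance with the wall clock if possible.
    synchronized long[] tick(long wallClockMillis) {
        if (wallClockMillis > physical) {
            physical = wallClockMillis;
            logical = 0;
        } else {
            logical++; // wall clock didn't advance; bump the counter instead
        }
        return new long[]{physical, logical};
    }

    // Message receive: take the max of local clock, our state, and the message.
    synchronized long[] update(long wallClockMillis, long msgPhysical, long msgLogical) {
        long maxPhysical = Math.max(wallClockMillis, Math.max(physical, msgPhysical));
        if (maxPhysical == physical && maxPhysical == msgPhysical) {
            logical = Math.max(logical, msgLogical) + 1;
        } else if (maxPhysical == physical) {
            logical++;
        } else if (maxPhysical == msgPhysical) {
            logical = msgLogical + 1;
        } else {
            logical = 0; // the wall clock alone moved us forward
        }
        physical = maxPhysical;
        return new long[]{physical, logical};
    }

    public static void main(String[] args) {
        HybridLogicalClock clock = new HybridLogicalClock();
        clock.tick(100);
        long[] t = clock.update(100, 150, 2); // message from a node with a faster clock
        System.out.println(t[0] + "." + t[1]); // prints 150.3
    }
}
```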
http://performantcode.com/gc/gc-explained-times/
User, Sys and Real Times in GC Log (Page last updated September 2017, Added 2018-05-28, Author Grzegorz Mirek, Publisher performantcode). Tips:
- User, sys and real times in the GC log mean the same as the output of the unix 'time' command: real = the elapsed real (wall clock) time of the GC; user = the CPU time the GC spent in non-kernel user mode; sys = the CPU time the GC spent in the kernel.
- You would expect real time to be less than user+sys time for any GC executed with parallel threads (most GCs), since each thread accumulates CPU time and these sum to the final user and sys totals. The ratio '(user+sys)/real' therefore gives a good measure of the parallel efficiency of the GC; with the serial collector this ratio is close to 1.
- If the ratio '(user+sys)/real' is significantly less than 1 for a GC log, this is a good indication of a problem with the system, either IO causing the GC to be blocked for a while, or CPU being saturated on the system by activity from other processes.
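The '(user+sys)/real' ratio above can be computed directly from the times a GC log line reports, e.g. "[Times: user=0.61 sys=0.04, real=0.09 secs]". A small sketch (the class name and warning threshold are illustrative):

```java
public class GcEfficiency {
    // Parallel efficiency of a GC pause: how many CPUs' worth of work
    // was done per unit of wall-clock time.
    static double ratio(double user, double sys, double real) {
        return (user + sys) / real;
    }

    public static void main(String[] args) {
        // Times from a sample parallel-collector log line.
        double r = ratio(0.61, 0.04, 0.09);
        System.out.printf("(user+sys)/real = %.2f%n", r); // roughly 7.22, ~7 threads busy

        if (r < 1.0) {
            // Less than one CPU-second per wall-clock second: the GC threads
            // were blocked (IO) or starved of CPU by other processes.
            System.out.println("Suspicious: GC likely blocked on IO or starved of CPU");
        }
    }
}
```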
- For latency-sensitive Java applications, you should move Java log files to a separate or high-performing disk drive (e.g., SSD, tmpfs)
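Redirecting the GC log as suggested above is a startup-flag change; a sketch with illustrative paths (the tmpfs mount point and jar name are examples):

```shell
# Java 8: write the GC log to a tmpfs mount so a slow disk
# cannot stall the logging write inside a GC pause.
java -Xloggc:/mnt/tmpfs/gc.log -XX:+PrintGCDetails -jar app.jar

# Java 9+ unified logging equivalent:
java -Xlog:gc*:file=/mnt/tmpfs/gc.log -jar app.jar
```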
https://raygun.com/blog/java-performance-optimization-tips/
Java performance optimization tips: how to avoid common pitfalls (Page last updated September 2017, Added 2018-05-28, Author Taylor, Publisher Raygun). Tips:
- Before trying any optimizations, check that your assumptions about the performance are correct by profiling the code
- Take baseline measurements before and after each optimization to confirm the improvement.
- Before deciding on optimizing a code section, consider whether the basic approach is right - maybe changing the whole approach, algorithm, or data structure would result in much better performance.
- Make sure you consider realistic amounts of data; large data volumes often need to be handled completely differently from small data volumes
- The data access pattern matters - random access and sequential access can be optimized differently.
- Streams have a performance cost compared to hand-coded loops (eg increased memory allocations), but their greater clarity reduces errors. Parallel streams should only be used in rare scenarios, and only after you've measured both the parallel and serial operations to confirm the parallel one is in fact faster.
- On smaller data sets the cost of splitting up work, scheduling it on other threads and stitching it back together once the stream has been processed will dwarf any speedup from running computations in parallel.
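The "measure both before choosing" advice above can be sketched as a quick probe: time the same reduction sequentially and in parallel. On small inputs the parallel version often loses. Numbers vary by machine, so treat this as a probe rather than a benchmark (use a harness like JMH for real measurements).

```java
import java.util.stream.IntStream;

public class StreamTiming {
    static long sumSequential(int n) {
        return IntStream.rangeClosed(1, n).asLongStream().sum();
    }

    static long sumParallel(int n) {
        return IntStream.rangeClosed(1, n).asLongStream().parallel().sum();
    }

    public static void main(String[] args) {
        int n = 1_000; // small on purpose: splitting/scheduling overhead dominates
        long t0 = System.nanoTime();
        long a = sumSequential(n);
        long t1 = System.nanoTime();
        long b = sumParallel(n);
        long t2 = System.nanoTime();
        System.out.println("sequential: " + (t1 - t0) + " ns, parallel: " + (t2 - t1) + " ns");
        System.out.println("same result: " + (a == b));
    }
}
```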
- Don't underestimate the cost of parsing into objects and formatting objects into strings. If you can avoid the conversions, you can gain 2 orders of magnitude!
- Use String formatting rather than concatenation when the message is unlikely to be needed (eg debug code), as passing the format string and arguments avoids unnecessary object creation most of the time; but use concatenation when the message will likely be needed, because formatting is much less efficient than concatenation.
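The pattern above can be sketched with a hypothetical debug(String, Object...) method (the class and counters here are illustrative): passing the format string and arguments defers all string building until the log level check passes, whereas concatenation always builds the string before the call is even made.

```java
public class LazyLogging {
    static boolean debugEnabled = false;
    static int formatCalls = 0; // instrumented so we can see when formatting happens

    // Formatting only runs when the message will actually be emitted.
    static void debug(String format, Object... args) {
        if (debugEnabled) {
            formatCalls++;
            System.out.println(String.format(format, args));
        }
    }

    public static void main(String[] args) {
        int orderId = 42;

        // Eager: the concatenated String is allocated even though debug is off.
        String eager = "processing order " + orderId;
        debug("%s", eager);

        // Lazy: with debug off, no string work happens at all.
        debug("processing order %d", orderId);

        System.out.println("format calls while disabled: " + formatCalls); // prints 0
    }
}
```

Logging frameworks such as SLF4J use the same idea with {} placeholders.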
https://www.pluralsight.com/guides/get-rid-of-that-bottleneck-using-modern-queue-techniques
Get Rid of That Bottleneck Using Modern Queue Techniques (Page last updated January 2018, Added 2018-05-28, Author Kobi Hikri, Publisher Pluralsight). Tips:
- With push, you need to consider the scenario where the receiver has to process more requests than it can handle: either the producer waits for the consumer to be ready to receive, or it waits for the result from the consumer. In both cases there is an implicit queue (perhaps a set of threads). It's better to make the queue explicit, ideally between the producer and the consumer, so that each can operate at its own maximum efficiency.
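An explicit queue as described above can be sketched in-process with a bounded BlockingQueue: producer and consumer each run at their own pace, and the bound applies backpressure instead of letting an implicit queue of blocked threads grow. (A broker like RabbitMQ plays the same role between services.)

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ExplicitQueue {
    public static void main(String[] args) throws InterruptedException {
        // The explicit, bounded queue between producer and consumer.
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(100);

        Thread producer = new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                try {
                    queue.put(i); // blocks when the queue is full (backpressure)
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        });

        Thread consumer = new Thread(() -> {
            long sum = 0;
            for (int i = 0; i < 1000; i++) {
                try {
                    sum += queue.take(); // blocks when the queue is empty
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
            System.out.println("consumed sum: " + sum); // prints 499500
        });

        producer.start();
        consumer.start();
        producer.join();
        consumer.join();
    }
}
```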
- Queues can decouple services so that they are not reliant on knowing each other's APIs. This allows for many optimizations such as new implementations taking over seamlessly from older ones. Service discovery, load balancing, dynamic system scaling and reliable messaging can be handled by the queue infrastructure instead of the services, making the system more robust and scalable.
- A message queue can act as a load balancer with inverted responsibility: a typical load balancer maintains a list of services it sends messages to and continuously checks that they are available, and adding or removing services from the cluster has to go through the load balancer as controller. A message queue instead lets service instances attach and get/send messages on demand; the services effectively do the load balancing themselves simply by being available.
Jack Shirazi
Last Updated: 2024-08-26
Copyright © 2000-2024 Fasterj.com. All Rights Reserved.
All trademarks and registered trademarks appearing on JavaPerformanceTuning.com are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. JavaPerformanceTuning.com is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.
URL: http://www.JavaPerformanceTuning.com/news/newtips210.shtml