Java Performance Tuning
Tips April 2019
Back to newsletter 221 contents
https://www.youtube.com/watch?v=Jl-lybDC3h8
Don't Make it a Race: The Four Common Concurrency Control Patterns (Page last updated April 2019, Added 2019-04-29, Author Jack Shirazi, Publisher JAXLondon). Tips:
- Low level concurrency building blocks (in order of sophistication) are: Runnable, Thread, ThreadGroup, synchronized, Object.wait(), Object.notify(), Object.notifyAll(), NIO, java.util.concurrent.locks, volatile, java.util.concurrent.atomic, java.util.concurrent, some of java.lang.invoke (eg VarHandles), @Contended
- Thread-safe collections include: java.util, Clojure collections (all available in Java), PCollections, Chronicle, Agrona, Guava, Eclipse, Fastutil, Vavr, Apache, Trove, ObjectLayout/StructuredArray, Roaring Bitmaps, LMAX Disruptor, JCTools, high-scale-lib
- The four common concurrency data control patterns are: pessimistic locking, optimistic transactions, queue to a single thread, partitioning the data. The majority of frameworks are built using combinations of these. Pessimistic Locking is easy, simple to understand and probably fast enough at reasonable concurrency; Optimistic Transaction is most likely the way to go if you need to be able to handle higher concurrency, but you need to work hard to avoid subtle bugs; Partitioned Data is the fastest highly concurrent option but really hard to achieve so a very long way to go to get your implementation mature (ie low on bugs); Sending to a Queue is the easiest for adding asynchronous support and also for adding distributed support.
- The biggest problem with concurrency is Shared Mutable State. EACH word matters because eliminating ANY word stops concurrency being difficult. Stateless is the jackpot; Immutable data is great, you can use it across all threads with no problems.
- High level concurrency models are designed to reduce concurrency problems by avoiding sharing or mutability. The ten most common concurrency models in Java are: DIY - Threads & Locks (eg Thread class, Runnable, synchronized, java.util.concurrent.locks, Executors); Functional Programming (eg Streams & Lambda expressions, RxJava); Atomic & Thread-local (eg java.util.concurrent.atomic, Clojure collections, VarHandle, Software Transactional Memory); Actors (eg Akka framework); Communicating Sequential Processes (eg Parallel Universe's Quasar framework, Project Loom, Apache Camel); Data Parallelism (eg using GPUs, "Java on the GPU" by Dmitry Aleksandrov); MapReduce (eg Apache Spark framework); Single-threaded Event-driven (eg Vert.x framework); Multi-threaded Event-driven (eg Kafka framework); Grid Computing (eg Apache Ignite, Hazelcast).
- A procedure for handling Shared Mutable Data: 1. Determine shared mutable data early (and the expected concurrency level); 2. See if you can eliminate any word: Shared|Mutable|State; 3. If you can't, try to use a concurrency model; 4. If you can't then you need to understand concurrency really well; 5. Encapsulate and Minimize Touchpoints.
- Encapsulate and Minimize Touchpoints: Concurrency patterns are not mutually exclusive, choose according to need, try to keep it as simple as possible; Use Good OOP - encapsulate and present component APIs for client classes to use, encapsulation is spectacularly important for maintainable concurrency management; Avoid letting any data structures escape the class, that's a recipe for creating concurrency bugs, eg don't have getMap(){return mySharedMap;} ; Use immutable classes returning new instances for any change to minimize bugs creeping in during maintenance; Use mature thread-safe collections.
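Two of the four control patterns above can be sketched in a few lines of Java. These counter classes are illustrative names, not from the talk: the first takes the pessimistic route (every access holds the lock), the second is an optimistic transaction (read, compute, then commit with compareAndSet, retrying if another thread got there first).

```java
import java.util.concurrent.atomic.AtomicLong;

// Pessimistic locking: simple to understand, correct, and often fast enough.
class PessimisticCounter {
    private long value;
    public synchronized long increment() { return ++value; }
    public synchronized long get() { return value; }
}

// Optimistic transaction: no lock held; retry the commit on contention.
class OptimisticCounter {
    private final AtomicLong value = new AtomicLong();
    public long increment() {
        long current;
        do {
            current = value.get();                       // read
        } while (!value.compareAndSet(current, current + 1)); // commit or retry
        return current + 1;
    }
    public long get() { return value.get(); }
}
```

Note how the optimistic version matches the talk's warning: the retry loop is where the subtle bugs creep in (eg forgetting to re-read `current` inside the loop).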
https://www.youtube.com/watch?v=OYpTn0nWKR4
10,000 Java performance tips over 15 years - what did I learn? (Page last updated May 2017, Added 2019-04-29, Author Jack Shirazi, Publisher Devoxx). Tips:
- Apart from outliers and concurrency, almost every performance & memory problem in Java has a known technique to identify the cause. Know what is achievable performance (with reasonable effort) vs extreme outliers (that take huge effort to achieve).
- With Stateless Concurrency, any level of concurrency is achievable; it is completely horizontally scalable.
- For Stateful Concurrency, distribution matters most! The CAP theorem applies (you can't have all 3 of consistency, availability and partition tolerance), so globally consistent transactions can only be responsive if they are eventually consistent. You can either have long slow globally consistent transactions (eg bank account balances); or local transactions that are globally eventually consistent (your Facebook updates take a while to show around the world even after they show on your feed).
- For Immediately consistent Stateful Concurrency, you can reasonably achieve (on 1 box, see TPC-C) 2000 transactions per second with a 5 second response time.
- For Eventually consistent Stateful Concurrency, you can reasonably achieve (on 1 box) 300 transactions per second with a 100 millisecond response time.
- Data has a set of boundaries, after each of which performance decreases dramatically. Data fitting into: CPU cache; NUMA node; total RAM; local disk; remote persistent storage (take your time to determine which of these is right for your application).
- Responsiveness tends to be limited by your application request processing time and GC pause time.
- For Stateful applications, under 100 milliseconds response time is achievable at scale and seems to be the accepted target. To achieve this you need to avoid operations that create many objects or generate a lot of garbage in one go, eg reloading large XML configs, loading and dumping large documents, or replacing large graphs and collections instead of editing their elements in place.
- For Stateless applications: 40 milliseconds is the human perception limit, and is easily achievable; 20 milliseconds is not too hard to achieve, often with just garbage collection tuning; around 5 milliseconds is achievable with garbage collection tuning together with object lifecycle tuning; consistent 2 millisecond pause times are really hard but not impossible to achieve.
- To achieve below 2 millisecond pauses needs a complete change in coding style, and most likely a complete rewrite of the app - basically you can't have any GC pauses: avoid object churn; use zero-copy; use very large heaps (so it takes a long time before they fill) that can be restarted before any garbage collection occurs.
- For pauses of 10 microseconds and lower, you need to architect: for cores to run independently; tune the data for the shared CPU cache; and use thread affinity to keep specific operations on specific cores.
- For good performance, designs to avoid include: Fine-grained communication; Treating local calls and remote calls equivalently; Designs that are difficult to add caching to; A non-parallelizable workflow; No load balancing in the architecture; Long transactions; Big differences between the data model and object model; Slow extensive embedded security checking; Non intuitive user interfaces; Lack of feedback in the user interface; Locking shared resources for non-short periods; Not paginating data.
- Having a bottleneck is NOT the problem because there is always a bottleneck (otherwise it would take no time at all). So failing to achieve target times is the REAL problem - Which means you need targets!
- The bottleneck is almost always a shared resource.
- The most common non-manual-error causes of downtime are resource leaks, mainly: memory leaks; forgetting to close() resources (JDBC connections, file handles); disk space filling up.
- For Heap analysis and GC tuning, you really need GC logs.
- To tune the heap: 1. Avoid Swapping; 2. Remove serious memory leaks; 3. Choose a Xmx value; 4. Choose your GC algorithm; 5. If you still have problems, fine tune a) Young gen size, b) GC algorithm fine tuning options, c) Be prepared to read GC logs and do a LOT of research
- Stack traces list deadlocks
- Stack traces show synchronized locked monitors
- Identify contended methods by comparing repeated stack traces, looking for methods at the top - these are contended if locked
- Make sure to monitor queue sizes and thread pools for threads in-use
- The ThreadMXBean getThreadInfo() method is useful (see the Thread Contention Monitoring documentation).
- jstack -l gives java.util.concurrent lock info
- Check CPU utilization & context switches, specifically looking for decreases from the baseline not caused by context switches.
- For slow database queries use JDBC monitoring (eg p6spy) and fix by tuning the requests
- For inefficient application code use an execution profiler (eg jvisualvm) and fix by tuning the code
- For too many db queries use JDBC monitoring (eg p6spy), refactor calls
- For concurrency issues use stack trace analysis
- For memory leaks use a Heap dump analyzer (eg eclipse MAT), identify the objects retaining memory and fix the code to stop that
- For configuration issues (pooling thresholds, request throttling), keep a record of changes, check diffs, reconcile against test configs
- For a slow DB use JDBC monitoring (eg p6spy), find clusters of slow queries, correlate against other events and DB stats
- For GC pauses use GC logging and gc logs analyzer (eg GCViewer), then tune the GC
- For Memory churn look for object allocation in a memory profiler (eg jvisualvm)
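The stack-trace and ThreadMXBean tips above can be combined into one diagnostic routine. This sketch uses only the standard java.lang.management API; the class name ContentionCheck is illustrative. It enables thread contention monitoring where supported, reports any deadlocked threads, then prints blocked counts and times per thread - the live-JVM equivalent of comparing repeated stack traces for contended monitors.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

class ContentionCheck {
    // Dumps deadlock and lock-contention information for the current JVM.
    static void dumpContention() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Blocked-time accounting is off by default; turn it on if the JVM supports it.
        if (mx.isThreadContentionMonitoringSupported()) {
            mx.setThreadContentionMonitoringEnabled(true);
        }
        long[] deadlocked = mx.findDeadlockedThreads(); // null if no deadlock
        if (deadlocked != null) {
            for (ThreadInfo info : mx.getThreadInfo(deadlocked, Integer.MAX_VALUE)) {
                System.out.println("DEADLOCK: " + info.getThreadName()
                        + " waiting on " + info.getLockName());
            }
        }
        // High blockedCount relative to the baseline indicates a contended monitor.
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds())) {
            if (info != null && info.getBlockedCount() > 0) {
                System.out.println(info.getThreadName()
                        + " blockedCount=" + info.getBlockedCount()
                        + " blockedTimeMs=" + info.getBlockedTime()); // -1 if monitoring unsupported
            }
        }
    }
}
```

For java.util.concurrent locks, which this bean's monitor accounting misses, use jstack -l as noted above.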
https://www.youtube.com/watch?v=S-awUjTOK60
How to analyze the most common performance problems in Java (Page last updated June 2018, Added 2019-04-29, Author Jack Shirazi, Publisher Java2Days). Tips:
- Having a bottleneck is not a problem, because there is always a bottleneck (otherwise it takes no time at all). Failing to achieve targets is the problem - which means you need targets to know if you have a problem.
- About half of configuration issues that cause problems can be eliminated by manually diffing the new deployment and the current deployment and finding obviously erroneous configuration changes.
- For datastores: indexes always matter; caching in memory always makes it faster; faster disks always make it faster; and the schema matters enormously.
- For individual queries that take a long time, improve the query. This likely needs an index or schema change.
- If you find multiple queries that are identical, ask: is it an inefficiency? Can they be cached or reduced in frequency?
- If you find multiple queries that differ only in a parameter, can they be combined or use a parametrized query?
- External requests are always more expensive than logging that request, so log them all with timing for analysis. Look for individual requests with large times - these are inefficient requests; look for lots of entries on the same connection with very close timestamps - these are "chatty requests" and should be combined or simplified; look for lots of individual big times across many unrelated requests over a time window - these show the external server is overloaded.
- You absolutely MUST monitor the waiting time to acquire the connection from any connection pools.
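A minimal way to get that acquisition-wait metric is to time the acquire call in a wrapper around the pool. This is a sketch, not from the article: TimedPool and its nested Pool interface are hypothetical stand-ins for whatever pool type you actually use (eg a HikariCP or DBCP DataSource's getConnection()).

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

class TimedPool<C> {
    // Stand-in for the real pool's acquire method.
    interface Pool<C> { C acquire(); }

    private final Pool<C> delegate;
    private final LongAdder totalWaitNanos = new LongAdder();
    private final LongAdder acquisitions = new LongAdder();

    TimedPool(Pool<C> delegate) { this.delegate = delegate; }

    C acquire() {
        long start = System.nanoTime();
        try {
            return delegate.acquire();
        } finally {
            // Record the wait even when acquisition fails - timeouts are the
            // most interesting data points.
            totalWaitNanos.add(System.nanoTime() - start);
            acquisitions.increment();
        }
    }

    long averageWaitMillis() {
        long n = acquisitions.sum();
        return n == 0 ? 0 : TimeUnit.NANOSECONDS.toMillis(totalWaitNanos.sum() / n);
    }
}
```

In practice you would export averageWaitMillis() (or better, a histogram) to your monitoring system; a rising wait time is an early warning that the pool is undersized or connections are leaking.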
https://medium.com/hotels-com-technology/optimizing-your-server-by-limiting-request-overheads-e2ddc25d25e5
Optimizing your server by limiting request overheads (Page last updated January 2019, Added 2019-04-29, Author Jack Shirazi, Publisher Hotels.com). Tips:
- There are 3 legs to managing successful servers: make each server more efficient; scale your servers horizontally; and limit the impact of each request
- The business processing needed to satisfy a request is the only essential processing, everything else - including marshaling and unmarshaling data, and sending over the network - is overhead which you want to minimize.
- For requests that have different overheads (eg because some return much more data than others) create different APIs or APIs with different parameters, and make sure that the cost to the client is clearly differentiated for making the request that has higher overhead (higher latency, higher memory, etc) so that clients prefer the lower overhead requests.
- Minimize the size of the data transferred: consider compression, minifying, binary format, and a shared dictionary (lets you send codes rather than constants eg A stands for "fieldname1", B for "fieldname2", etc).
- Paginate and/or lazily populate the data. The choice of chunk size to send is a balance between the number of additional requests this generates and the amount of unnecessary data transferred.
- Streaming data can be very efficient and is especially optimal where the data can be consumed as a stream
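The payoff from compression is easy to measure before committing to it. This sketch (PayloadSize and its repeated-fieldname payload are illustrative, not from the article) compares the raw and gzipped sizes of a response body using the JDK's GZIPOutputStream; payloads with repeated field names, like typical JSON, compress particularly well - which is also why the shared-dictionary trick above works.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

class PayloadSize {
    // Returns the gzipped byte count of a payload, for comparing wire sizes.
    static int gzippedSize(String payload) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(payload.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // cannot occur with in-memory streams
        }
        return out.size();
    }
}
```

Remember compression trades CPU for bandwidth on both ends, so measure the end-to-end latency, not just the byte count.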
Jack Shirazi
Last Updated: 2024-10-29
Copyright © 2000-2024 Fasterj.com. All Rights Reserved.
All trademarks and registered trademarks appearing on JavaPerformanceTuning.com are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. JavaPerformanceTuning.com is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.
URL: http://www.JavaPerformanceTuning.com/news/newtips221.shtml
RSS Feed: http://www.JavaPerformanceTuning.com/newsletters.rss