Java Performance Tuning
Tips September 2016
Back to newsletter 190 contents
Concurrent Programming with Java (Page last updated April 2016, Added 2016-09-28, Author Ankireaddy Poly, Publisher NightHacking). Tips:
- Give each thread a name, so that you can identify which code a thread is executing when looking at stack traces.
- A synchronized block acquires a lock for exclusive access: only one thread at a time can execute code guarded by the same lock.
- Checking conditions outside of the synchronized block does not guarantee that the condition is still valid inside the block - you need to do the conditional check inside the block.
- If you synchronize on the "this" object (whether explicitly using "this", or implicitly by synchronizing methods rather than blocks), any other code holding a reference to that object can also synchronize on it, which is outside your control. So don't synchronize on "this"; instead use an internal private object, so that you fully control which code can synchronize on it.
- Uncontrolled thread creation is a problem waiting to happen - resource exhaustion will happen when you get more successful! Instead use controlled size thread pools.
- ReentrantLock can be used instead of a synchronized block. It has a "fairness" capability that you don't get with a synchronized block; you can also determine the number of threads waiting to acquire the lock, and you can try to get the lock with a timeout, neither of which you can do with a synchronized block.
- ReentrantReadWriteLock is like ReentrantLock but has two locks - multiple readers can simultaneously hold the readlock as long as there are no holders of the writelock; but only one writelock can be held and if held, no readlocks can be held simultaneously (ie either N readlocks or 1 writelock can be held at any specific time).
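- Callable is like Runnable, but its call() method returns a result and can throw an exception. When a Callable is submitted to an ExecutorService, the returned Future lets you get the result, cancel the task, and receive any exception thrown by call() (wrapped in an ExecutionException).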
- CountDownLatch lets you synchronize execution timing across multiple threads.
- CyclicBarrier lets you execute something based on multiple threads reaching a particular state.
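Several of the tips above can be combined in one small sketch. The class `TicketOffice`, its methods, and the thread name prefix are hypothetical names made up for illustration; the sketch assumes a simple counter is the shared state. It uses a private lock object, checks the condition inside the synchronized block, runs the work on a fixed-size pool with named threads, and waits for the workers with a CountDownLatch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical example: a fixed stock of tickets sold by several threads. */
class TicketOffice {
    // Private lock object: no code outside this class can synchronize on it.
    private final Object lock = new Object();
    private int remaining;

    TicketOffice(int tickets) {
        this.remaining = tickets;
    }

    /** Sells one ticket if any are left; returns true on success. */
    boolean sell() {
        synchronized (lock) {
            // The check happens inside the block, so it is still valid when we act on it.
            if (remaining > 0) {
                remaining--;
                return true;
            }
            return false;
        }
    }

    /** Runs the workers on a fixed-size pool and returns how many tickets were sold. */
    static int sellConcurrently(int tickets, int workers, int attemptsPerWorker) {
        TicketOffice office = new TicketOffice(tickets);
        AtomicInteger sold = new AtomicInteger();
        AtomicInteger threadNumber = new AtomicInteger();
        // Bounded pool with named threads, so stack traces show which code a thread runs.
        ExecutorService pool = Executors.newFixedThreadPool(workers,
                r -> new Thread(r, "ticket-seller-" + threadNumber.incrementAndGet()));
        CountDownLatch done = new CountDownLatch(workers);
        try {
            for (int w = 0; w < workers; w++) {
                pool.execute(() -> {
                    try {
                        for (int i = 0; i < attemptsPerWorker; i++) {
                            if (office.sell()) {
                                sold.incrementAndGet();
                            }
                        }
                    } finally {
                        done.countDown(); // signal that this worker has finished
                    }
                });
            }
            done.await(); // wait until every worker has counted down
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        } finally {
            pool.shutdown();
        }
        return sold.get();
    }

    public static void main(String[] args) {
        // 200 sale attempts compete for 100 tickets; the lock prevents overselling.
        System.out.println(sellConcurrently(100, 4, 50));
    }
}
```

If the check `remaining > 0` were made outside the synchronized block, two threads could both pass it for the last ticket, which is the failure mode the tip warns about.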
Laws of Performant Software (Page last updated September 2016, Added 2016-09-28, Author Crista Lopes, Publisher Tagide). Tips:
- Check how any libraries you use scale - there is always a better way to do it if you need to.
- Small differences in code can make huge differences in performance. Examples include: producing too much garbage; not caching; not filtering early enough; slow or excessive serialization; too large data structures.
- There is a very high correlation between performance degradation, and the unbounded use of resources. If you don't limit the use of a resource, chances are it will be exhausted. Examples include: ad hoc threads instead of pools; unbounded pools, caches, queues, instead of fixed sized ones.
- Unless you are 100% sure the lines are always of reasonable size, do not use readline!
- Don't use resources in an unbounded manner, or the operation of your program will degrade very quickly after a certain threshold.
- Most parts of the code contribute very little to performance, so you need to identify and focus on the parts of code that do matter.
- No amount of nodes, cores, or memory will save you from code with performance design flaws.
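As an illustration of bounding resources, here is a minimal sketch of a thread pool with a fixed number of threads and a fixed-size queue. The class name `BoundedExecutorSketch`, its method names, and the choice of CallerRunsPolicy (which makes the submitting thread run the task when the queue is full, slowing down submission) are assumptions for this example, not something the talk prescribes.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical example: a pool whose threads and queue are both bounded. */
class BoundedExecutorSketch {

    /** Creates a pool with a fixed thread count and a fixed-capacity work queue. */
    static ThreadPoolExecutor newBoundedPool(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads,                 // core and maximum size are the same
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                // When the queue is full the submitter runs the task itself,
                // which acts as back-pressure instead of unbounded queuing.
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    /** Submits taskCount trivial tasks and returns how many completed. */
    static int runTasks(int taskCount, int threads, int queueCapacity) {
        ThreadPoolExecutor pool = newBoundedPool(threads, queueCapacity);
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            pool.execute(completed::incrementAndGet);
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println(runTasks(1000, 2, 10));
    }
}
```

No task is dropped, but at most 2 threads and 10 queued tasks exist at any time, so a burst of submissions cannot exhaust memory or threads.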
JVM Tuning in a rush! (Page last updated September 2016, Added 2016-09-28, Author Tomasz Borek, Publisher Space Video). Tips:
- You need your application monitored or you'll have huge problems finding what is causing issues.
- Understand your traffic patterns, the infrastructure handling it, and how the application handles that traffic.
- Determine the limits of the traffic that your system can handle.
- A spike in valid traffic looks similar to a DDOS attack - you need to be able to handle both.
- Changes delivered to production are often the cause of performance issues - make sure you know and record all changes being delivered.
- Hitting connection pool and thread pool limits is often the cause of performance issues - make sure the configuration is known and the pools are monitored.
- You need to turn on GC logging, or it is much more difficult to tune.
- The JVM is a process, so it is limited by RAM and ulimit (on Linux).
- OOMEs in your JVM application typically are from hitting the heap limit, or the PermGen/Metaspace limit, or hitting RAM or filedescriptor limits.
- Each thread requires additional OS memory above the heap setting.
- The quick fix for OOME of a heap is to increase the size - but if you have a leak that will just delay the time until OOME is hit, so you need to monitor the heap used.
- A larger heap tends to have longer worst case pauses.
- Try to have objects be very short-lived, apart from a few very long-lived ones - this works best with the garbage collection.
- When the application becomes unresponsive, eg from a GC pause, this not only creates a backlog of requests, but human nature may also cause many more requests than normal - when people see a program hanging, they tend to retry.
- Using the wrong data structure tends to cause massive performance problems.
- Use "The Box" methodology: consider traffic; then code (threads, algorithms, data structures); then JVM flags; then OS resources (ulimit, config, architecture, other processes); then any virtualization; then the hardware.
- Use the "USE" methodology, Utilization, Saturation, Errors: how utilized is your resource; how big is the wait queue; how many errors is it generating.
- sar is useful for generating baselines that you can compare on a daily basis.
- jinfo lets you change any manageable flags on the fly (without restarting).
- Three very useful plugins for jvisualvm are OQL syntax support, thread inspector, and gcvisualizer.
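For the tips about monitoring heap use, GC, and threads, the standard java.lang.management API exposes the basic numbers from inside the JVM. The following is a minimal sketch; the class `HeapSnapshot` and its method names are hypothetical, and a real deployment would typically feed these values into whatever monitoring system is already in place rather than print them.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Hypothetical example: reads heap, GC, and thread figures from the running JVM. */
class HeapSnapshot {

    /** Fraction of the heap limit currently in use (between 0 and 1). */
    static double heapUsedFraction() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax(); // -1 when the maximum is undefined
        long limit = max > 0 ? max : heap.getCommitted();
        return (double) heap.getUsed() / limit;
    }

    /** Approximate accumulated GC time in milliseconds across all collectors. */
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long time = gc.getCollectionTime(); // -1 when the collector does not report it
            if (time > 0) {
                total += time;
            }
        }
        return total;
    }

    /** Number of live threads; each one needs OS memory outside the heap. */
    static int liveThreadCount() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    public static void main(String[] args) {
        System.out.printf("heap used: %.1f%%%n", heapUsedFraction() * 100);
        System.out.println("total GC time (ms): " + totalGcTimeMillis());
        System.out.println("live threads: " + liveThreadCount());
    }
}
```

Sampling the heap fraction over time is what distinguishes a leak (steadily rising after each collection) from a heap that is simply too small.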
Thinking in Parallel (Page last updated September 2016, Added 2016-09-28, Author Brian Goetz, Stuart Marks, Publisher JavaOne). Tips:
- for loops are explicitly sequential. Streams are declarative: there is no concept of ordering within any one Stream operation. Streams are also easily parallelizable.
- Some for loops create an artificial data dependency between successive elements that forces sequential processing. Using Streams avoids this artificial dependency, making the code clearer and parallelizable.
- Parallelism is purely an optimization (using more resources to get to the answer faster) - you can always run anything sequentially. You are always using more work to do something in parallel. So given this, you should only run in parallel if it speeds up the computation! Measure, and leave it sequential if no speedup - sequential is less error prone and uses fewer resources.
- The sequence for deciding to make something parallel is: first, try it sequentially, it might be fast enough; if not, measure whether there is any speedup from going parallel (there might not be if the overheads are sufficient).
- The simplest way to decompose a problem for parallel execution is divide-and-conquer: recursively split the problem until you reach sub-problems that are more efficient to solve sequentially than by splitting; then combine the solutions of the sub-problems to reach the aggregate solution. This uses no shared mutable state, which means no locks and no concurrency issues.
- For processing small amounts of data, a sequential solution is usually faster than a parallel one because of parallelism overheads.
- An efficient parallel algorithm divides the problem as quickly as possible (to use otherwise idle resources).
- Parallelising data processing has costs which can reduce the potential speedup: splitting the data (how easily does it split? arrays split well, linked lists don't); managing the split data; dispatching to different processors (including context switches); possible data copying (splitting/merging collections is expensive); non-locality of data on the cores (the aim is to keep the CPU busy, and waiting on cache misses doesn't; array-based structures are good, linked structures are not).
- Just because it's easy to switch a stream process to using parallelism doesn't mean it's efficient to do so.
- If the CPU is waiting for data (because it's not already in the CPU cache), it's much slower than if the data is already available - array based data sources get into the cache very efficiently.
- Operations like limit(), skip(), findFirst() are inefficient for parallelism - either avoid them, or call .unordered() if the order of encountering elements is not meaningful (eg with HashSet) or not important, and the stream will optimize those operations correspondingly.
- sum() and max() operations are really efficient to merge, but groupingBy() on a HashMap is very expensive (a lot of copying) and could overwhelm any parallelism advantage.
- A rule of thumb is that if N is the number of data items and Q is the amount of work (eg operations) per item, then NQ < 10,000 means there will not be a speedup. Above 10,000 there may be a speedup - measure! Factors that can prevent a speedup include: NQ too low; cache-miss ratio too high; data source expensive to split; combining the computation results too expensive; stream pipeline uses encounter-order-sensitive operations.
- Stop optimizing as soon as the measured performance achieves the performance requirements. By definition if you have no performance requirements, you should stop before you start.
- Stream pipelines (in Java 8) are intended for efficiently parallelizing compute intensive tasks, not IO intensive ones.
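The divide-and-conquer decomposition described above can be sketched with the fork/join framework. The class `ArraySum` and the THRESHOLD value are assumptions chosen for illustration (the right cut-off has to be measured); the sketch sums an array, which splits cheaply by index, and offers a parallel stream version of the same computation for comparison.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

/** Hypothetical example: divide-and-conquer sum of a long[] with no shared mutable state. */
class ArraySum extends RecursiveTask<Long> {
    // Assumed cut-off below which a plain loop is used; tune by measuring.
    static final int THRESHOLD = 10_000;

    private final long[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    ArraySum(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            // Small enough: solving sequentially is cheaper than splitting further.
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        ArraySum left = new ArraySum(data, from, mid);
        ArraySum right = new ArraySum(data, mid, to);
        left.fork();                        // run the left half asynchronously
        long rightResult = right.compute(); // compute the right half in this thread
        return left.join() + rightResult;   // combine the two partial results
    }

    /** Sums the array using the common fork/join pool. */
    static long forkJoinSum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new ArraySum(data, 0, data.length));
    }

    /** The same computation expressed as a parallel stream. */
    static long streamSum(long[] data) {
        return Arrays.stream(data).parallel().sum();
    }
}
```

Each task only reads its own slice of the array and returns a value, so no locks are needed; whether either version beats a plain loop for a given array size still has to be measured.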
Collections Refueled (Page last updated September 2016, Added 2016-09-28, Author Stuart Marks, Publisher JavaOne). Tips:
- Default methods added to the collections interfaces include Iterable.forEach() for streams; Iterator.remove(), which throws an UnsupportedOperationException, so you no longer need to implement that method if you don't want it (just override it if you do want to support it); Collection.removeIf(), which gives simpler code and can also be more efficient than the hand-written equivalent because implementations can apply optimizations; List.replaceAll(); List.sort(), which sorts the list in-place, so is more efficient than the old Collections.sort(); Map.forEach() with a two-arg lambda (key, value); Map.replaceAll() to modify the values in a map.
- Map has additional default methods Map.computeIfAbsent(), Map.computeIfPresent(), Map.getOrDefault(). eg a Java 8 Multimap implementation is very concise, eg put(k, v) is map.computeIfAbsent(k, x -> new HashSet<>()).add(v); remove(k, v) is map.computeIfPresent(k, (k1, set) -> set.remove(v) && set.isEmpty() ? null : set); etc
- Comparator has default and static methods that make it simpler to create correct Comparators, eg Comparator.comparing(), Comparator.thenComparing(), Comparator.nullsFirst(), Comparator.nullsLast(), Comparator.naturalOrder().
- JEP 269, available in Java 9, adds factory methods for collections which return immutable collections, eg List.of(a, b, c, ...), Map.of(k1, v1, k2, v2, ...). Note that immutable collections are automatically threadsafe. They are also space efficient.
- varargs is expensive because it creates an array that then gets thrown away, for each call.
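The Multimap one-liners and the Comparator helpers above can be put together in a short sketch. The class `MultimapSketch` and the method `sortByLengthThenName` are hypothetical names for illustration; only Java 8 APIs are used, so the Java 9 factory methods are not shown.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical example: a minimal multimap built on the Java 8 Map default methods. */
class MultimapSketch<K, V> {
    private final Map<K, Set<V>> map = new HashMap<>();

    /** Adds v to the set for k, creating the set on first use. */
    void put(K k, V v) {
        map.computeIfAbsent(k, x -> new HashSet<>()).add(v);
    }

    /** Removes v from the set for k; returning null drops the key once its set is empty. */
    void remove(K k, V v) {
        map.computeIfPresent(k, (k1, set) -> set.remove(v) && set.isEmpty() ? null : set);
    }

    /** Returns the values for k, or an empty set if the key is absent. */
    Set<V> get(K k) {
        return map.getOrDefault(k, Collections.emptySet());
    }

    int keyCount() {
        return map.size();
    }

    /** Sorts in place: nulls first, then by length, then alphabetically. */
    static void sortByLengthThenName(List<String> names) {
        Comparator<String> byLengthThenName =
                Comparator.comparing(String::length).thenComparing(Comparator.naturalOrder());
        names.sort(Comparator.nullsFirst(byLengthThenName));
    }
}
```

A usage example: after put("a", 1), put("a", 2) and put("b", 3), calling remove("b", 3) leaves only the key "a", because the lambda returns null when the last value for "b" is removed.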
The Diabolical Developer's Guide to Performance Tuning (Page last updated September 2016, Added 2016-09-28, Author Kirk Pepperdine, Publisher JavaOne). Tips:
- Execution profiling tells you what your application is doing - but doesn't tell you what your application is NOT doing, which makes it useless when the problem is that your application is waiting for something.
- If you don't have production equivalent data in your testing, you won't see the issues that you will see in production. As a consequence, lots of enterprises have decided that testing in production (eg with a subset of traffic) is the only realistic approach.
- Determine where most of the request time is being spent - that's where you need to focus your effort to get the biggest improvement (Amdahl's law).
- Turn on GC logging, -Xloggc:... -XX:+PrintGCDetails -XX:+PrintTenuringDistribution.
- Make sure you are monitoring the system and hardware resources.
- Ask which category dominates CPU consumption: Application, JVM, Kernel, Nothing.
- A performance issue decision tree: Is the system cpu > 10% of user cpu? If yes, the system is the issue, look at system cpu, memory, disk IO, network IO, locks, context switching, wait queues. If not then is there spare CPU capacity (idle > 0%)? If yes then you aren't using the CPU fully, check your thread pools are big enough, look for locks that are preventing you doing work. If not, check the GC logs to see if you need to tune GC or memory (eg allocation rates); if you don't, then your application is the cause of the issues and you need to profile that, and improve the data structures and algorithms.
- You need a performance requirement! Otherwise why do you need to tune?
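The performance issue decision tree above can be written down as a small function. This is only a sketch of one reading of the tip: the class `TuningDecisionTree`, the enum values, and the parameters are hypothetical, and it assumes "system cpu > 10% of user cpu" means system CPU time exceeds one tenth of user CPU time, with all three CPU figures given as percentages of total CPU.

```java
/** Hypothetical example: the decision tree from the tips, encoded as a function. */
class TuningDecisionTree {

    /** Where to look first, following the order of questions in the tip. */
    enum Focus { SYSTEM, UNDERUSED_CPU, GC_OR_MEMORY, APPLICATION }

    /**
     * @param userCpuPct        user CPU as a percentage of total CPU
     * @param systemCpuPct      system (kernel) CPU as a percentage of total CPU
     * @param idleCpuPct        idle CPU as a percentage of total CPU
     * @param gcLogsShowProblem true if the GC logs suggest GC or memory needs tuning
     */
    static Focus diagnose(double userCpuPct, double systemCpuPct,
                          double idleCpuPct, boolean gcLogsShowProblem) {
        // Assumed reading: system CPU is more than 10% of user CPU.
        if (systemCpuPct > 0.10 * userCpuPct) {
            return Focus.SYSTEM;        // look at disk IO, network IO, locks, context switching
        }
        if (idleCpuPct > 0) {
            return Focus.UNDERUSED_CPU; // check thread pool sizes and lock contention
        }
        if (gcLogsShowProblem) {
            return Focus.GC_OR_MEMORY;  // tune GC or reduce allocation rates
        }
        return Focus.APPLICATION;       // profile the code, data structures, algorithms
    }

    public static void main(String[] args) {
        System.out.println(diagnose(60, 20, 20, false));
    }
}
```

The order of the checks matters: high system CPU is ruled out before spare capacity is considered, and GC is examined only once the CPU is known to be fully used.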
Last Updated: 2017-10-01
Copyright © 2000-2017 Fasterj.com. All Rights Reserved.
All trademarks and registered trademarks appearing on JavaPerformanceTuning.com are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. JavaPerformanceTuning.com is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.
RSS Feed: http://www.JavaPerformanceTuning.com/newsletters.rss