Java Performance Tuning
Tips December 2015
Back to newsletter 181 contents
High Performance Lists in Java (Page last updated October 2015, Added 2015-12-29, Author Thomas Mauch, Publisher JavaZone). Tips:
- LinkedList has a big performance disadvantage compared to ArrayList.
- Boxing primitive data types to store them in Object-based collections is inefficient in speed, memory use and garbage collection.
- To insert elements into an array efficiently, you can use a circular (cyclic) buffer structure - changing the index at which the array logically starts instead of moving array elements.
- Allowing a gap in an array structure lets you efficiently operate near an index where many operations are happening.
- GapList is a List structure which is as efficient as ArrayList for fast ArrayList operations, but is also efficient for operations like insertion which are inefficient in ArrayList.
- IntGapList is an efficient List structure for both array and linked list operations, but which operates directly on ints. There are also LongGapList, ByteGapList, CharGapList, DoubleGapList, FloatGapList.
- ArrayList and GapList don't scale to very large collection sizes. BigList addresses this by storing elements in small blocks making growing the collection efficient, and allows blocks to be shared between BigList instances which makes copying very efficient (a copy-on-write approach ensures shared blocks can be modified in separate instances). Recently accessed blocks have a faster access path.
- If you are only adding to and removing from the end of a list, use ArrayList. For any other combination of list operations, GapList is faster than ArrayList, LinkedList and ArrayDeque, except for very large lists (from a few thousand elements up, depending on how random the insertions and removals are), where BigList becomes the most efficient general-purpose list. If you are operating on a primitive data type X, use the corresponding XGapList or XBigList primitive collection implementation.
- Adding elements in random order to a sorted data structure will be slow - you are usually better off pre-sorting the elements then adding them.
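The circular buffer idea from the tips above can be sketched as follows. This is a minimal illustration only, assuming a hypothetical class named IntRingList; it is not the GapList implementation. It stores ints directly (so no boxing) and inserts at the front by moving the logical start index rather than shifting elements.

```java
import java.util.Arrays;

/** Hypothetical illustration of a circular buffer of ints; not the GapList implementation. */
final class IntRingList {
    private int[] data = new int[8];
    private int start = 0; // physical index of logical element 0
    private int size = 0;

    int size() {
        return size;
    }

    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return data[(start + index) % data.length];
    }

    /** Inserts at the front by moving the start index back one slot; no elements are shifted. */
    void addFirst(int value) {
        growIfFull();
        start = (start - 1 + data.length) % data.length;
        data[start] = value;
        size++;
    }

    /** Appends at the logical end, wrapping around the physical array if needed. */
    void addLast(int value) {
        growIfFull();
        data[(start + size) % data.length] = value;
        size++;
    }

    /** Doubles the capacity when full, copying elements out in logical order. */
    private void growIfFull() {
        if (size < data.length) {
            return;
        }
        int[] bigger = new int[data.length * 2];
        for (int i = 0; i < size; i++) {
            bigger[i] = data[(start + i) % data.length];
        }
        data = bigger;
        start = 0;
    }

    int[] toArray() {
        int[] out = new int[size];
        for (int i = 0; i < size; i++) {
            out[i] = get(i);
        }
        return out;
    }

    public static void main(String[] args) {
        IntRingList list = new IntRingList();
        for (int i = 1; i <= 5; i++) {
            list.addLast(i);
        }
        list.addFirst(0); // constant time: only the start index changes
        System.out.println(Arrays.toString(list.toArray())); // [0, 1, 2, 3, 4, 5]
    }
}
```

Insertion in the middle would still need elements to be moved; that is the case the gap technique described above targets, and it is not shown here.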
High-Concurrency HTTP Clients on the JVM (Page last updated December 2015, Added 2015-12-29, Author Fabio Tudone, Publisher Parallel Universe). Tips:
- The number of in-progress requests your app can support depends on the language runtime, the OS and the hardware; modern OSes can support hundreds of thousands of open TCP connections. However, a modern OS can only support 5000-15000 threads, so for maximum concurrent request handling you cannot use a one-thread-per-connection model.
- If there are only a few concurrent connections then the "one-thread-per-connection" model is perfectly fine. For maximum concurrent connection handling, you need to use an NIO model with a few threads handling many connections [the article lists several implementations for HTTP/S].
- Any method consuming asynchronous data must be asynchronous itself or it will block and nullify the advantages of asynchrony.
- Quasar fibers are very efficient threads implemented in userspace, so you can have millions of them.
- When only a few sockets are open, the OS kernel can wake up blocked threads with very low latency. But OS threads add considerable overhead, so you can't have hundreds of thousands of them, and even with thousands, context switching becomes very expensive. OS threads are not the best choice for fine-grained concurrency on highly concurrent systems.
- With an NIO framework, you can handle over 40k active connections making simple HTTP requests using just 16 threads.
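The point that consumers of asynchronous data must themselves be asynchronous can be sketched with the JDK's CompletableFuture. This is a rough illustration, not code from the article: fetchAsync is a hypothetical stand-in for a non-blocking HTTP call (a timer completes each "response" later, so no thread is held while waiting), and the class and method names are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: many in-flight "requests" served by a single timer thread. */
final class AsyncConsumers {

    /** Stand-in for a non-blocking HTTP call; completes later without occupying a thread while waiting. */
    static CompletableFuture<String> fetchAsync(ScheduledExecutorService timer, int id, long delayMillis) {
        CompletableFuture<String> response = new CompletableFuture<>();
        timer.schedule(() -> { response.complete("body-" + id); }, delayMillis, TimeUnit.MILLISECONDS);
        return response;
    }

    /** Asynchronous consumer: returns immediately and runs when the response arrives. */
    static CompletableFuture<Integer> bodyLengthAsync(CompletableFuture<String> response) {
        return response.thenApply(String::length);
    }

    /** Starts all requests first, then blocks only once at the edge of the program. */
    static int totalLength(int requests, long delayMillis) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        try {
            List<CompletableFuture<Integer>> lengths = new ArrayList<>(requests);
            for (int i = 0; i < requests; i++) {
                lengths.add(bodyLengthAsync(fetchAsync(timer, i, delayMillis)));
            }
            int total = 0;
            for (CompletableFuture<Integer> length : lengths) {
                total += length.join();
            }
            return total;
        } finally {
            timer.shutdown();
        }
    }

    public static void main(String[] args) {
        // 1000 requests are in flight at once, yet only one timer thread exists.
        System.out.println(totalLength(1000, 50));
    }
}
```

Had bodyLengthAsync called response.join() instead of thenApply, each request would have blocked the calling thread for the full delay, and the 1000 requests would have run one after another.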
Weak, Soft and Phantom references: Impact on GC (Page last updated December 2015, Added 2015-12-29, Author Gleb Smirnov, Publisher plumbr). Tips:
- Whenever the garbage collector (GC) discovers that an object is weakly reachable (reachable only through weak references), the weak references to it are cleared and enqueued on their ReferenceQueue (if one was registered), and the object becomes eligible for finalization. After finalization, the GC has to check again that the object is not reachable, so objects held by weak references cause extra GC overhead.
- Many caching and other 3rd party libraries use weak referencing so, even if you are not directly creating any in your code, your application could still be using weakly referenced objects in large quantities.
- Soft references are collected much less eagerly than weak ones, typically in response to memory pressure (weakly referenced objects could be collected at any time). This means a double hit when memory pressure increases: the GC is already trying hard to free up memory, and it now also starts to look at softly referenced objects, meaning even longer or more frequent GCs.
- You have to clear() phantom references manually, unless you drop all references to the PhantomReference object itself; otherwise both the PhantomReference and its referent remain in memory. Be careful: one unexpected exception in the thread that processes the reference queue could kill the thread, leaving objects unexpectedly in memory and resulting in an OOME.
- Consider enabling the -XX:+PrintReferenceGC JVM option to see the impact that references have on garbage collection.
- Normally, the number of references cleared during each GC cycle is quite low. If this is not the case and the application is spending a significant period of time clearing references, then investigate further.
- Generic solutions where references are causing GC problems are: increase the heap size; make sure that any application level phantom reference queue processing threads have not died; change the application's use of references.
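The advice about reference queue processing can be sketched as below. This is a minimal illustration, assuming a hypothetical helper named ReferenceQueueDrainer; the cleanup callback stands in for whatever the application does per reference. Each cleanup runs inside a try/catch so that one failure cannot stop the processing, and every reference is cleared explicitly. The example calls enqueue() by hand to simulate what the GC would do, so it runs deterministically.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.function.Consumer;

/** Hypothetical helper: drains a reference queue without letting a cleanup failure stop processing. */
final class ReferenceQueueDrainer {

    /** Processes every reference currently in the queue and returns how many were handled. */
    static int drain(ReferenceQueue<Object> queue, Consumer<Reference<?>> cleanup) {
        int processed = 0;
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            try {
                cleanup.accept(ref);
            } catch (RuntimeException e) {
                // Report and carry on; an escaped exception here would end the processing loop.
                System.err.println("cleanup failed: " + e);
            } finally {
                ref.clear(); // explicit clear, as the tips above recommend for phantom references
                processed++;
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object resource = new Object();
        // The application must keep the PhantomReference itself reachable until it is processed.
        PhantomReference<Object> ref = new PhantomReference<>(resource, queue);
        ref.enqueue(); // simulates the GC enqueueing the reference

        int processed = drain(queue, r -> {
            throw new IllegalStateException("simulated cleanup bug");
        });
        System.out.println("processed " + processed + " reference(s) despite the failure");
    }
}
```

In a long-running application the same loop body would typically sit in a dedicated daemon thread that blocks on queue.remove() instead of polling.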
Everything You Know About Latency Is Wrong (Page last updated December 2015, Added 2015-12-29, Author Tyler Treat, Publisher Brave New Geek). Tips:
- Latency rarely follows a normal, Gaussian, or Poisson distribution; so looking at averages, medians, and even standard deviations is not greatly useful. Median latency is irrelevant.
- Freezes, for whatever reason (GC pauses, context switching, interrupts, IO flushes, etc.), are unpredictable and can also affect your measuring tool, which disguises the freeze unless you take care to ensure the tool is not affected.
- Latency graphs from most tools can be useless because they average the outliers, but the outliers are very important for latency measurements.
- The maximum latencies are important and you need to explain them. They might be for a well-known reason, like a restart, but they all need itemizing.
- If a request is composed of multiple subrequests, the longest subrequest time is likely to be seen by a significant portion of requests.
- If a load generation tool waits for each response before sending the next request, and the request it is waiting on hits a freeze, then the requests that would have been sent during that wait are never sent and so never measured. Those requests would also have been frozen, so the measured overall response times are incorrectly optimistic.
- If your monitoring system is affected by the same freezes as the system it monitors, then a freeze can stop a measurement being initiated, so the freeze may not be seen by the monitoring system.
- Induce a long simulated freeze in your system to confirm that the monitoring and/or load testing tools correctly measure the freeze time.
- Define your SLAs and then determine how many machines you need to meet them. Establish the limits.
- Use histograms for latencies so that you can see outlier data easily.
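A small worked example may help show why averages and medians hide the latencies that matter. The sketch below uses made-up sample data (98 requests at 10 ms and 2 at 2000 ms) and a hypothetical LatencyStats class; the percentile method uses the simple nearest-rank definition, which is one of several in common use. It also computes the chance that a request made of N subrequests sees at least one latency beyond a given quantile, using 1 - q^N under the assumption that subrequest latencies are independent.

```java
import java.util.Arrays;

/** Hypothetical sketch with made-up data: mean and median versus tail percentiles. */
final class LatencyStats {

    /** Nearest-rank percentile of an ascending-sorted array; p is in (0, 100]. */
    static long percentile(long[] sortedMillis, double p) {
        int rank = (int) Math.ceil(p * sortedMillis.length / 100.0);
        return sortedMillis[Math.max(rank, 1) - 1];
    }

    /** Chance that at least one of N independent subrequests is slower than the given quantile. */
    static double chanceOfSeeingTail(double quantile, int subrequests) {
        return 1.0 - Math.pow(quantile, subrequests);
    }

    /** Made-up sample: 98 fast requests and 2 that hit a freeze; already in ascending order. */
    static long[] sampleLatencies() {
        long[] samples = new long[100];
        Arrays.fill(samples, 10L);
        samples[98] = 2000L;
        samples[99] = 2000L;
        return samples;
    }

    public static void main(String[] args) {
        long[] sorted = sampleLatencies();
        double mean = Arrays.stream(sorted).average().orElse(0);
        System.out.println("mean   = " + mean + " ms");                   // 49.8
        System.out.println("median = " + percentile(sorted, 50) + " ms"); // 10
        System.out.println("p99    = " + percentile(sorted, 99) + " ms"); // 2000
        System.out.println("max    = " + sorted[sorted.length - 1] + " ms");
        System.out.println("chance of >p99 with 100 subrequests = " + chanceOfSeeingTail(0.99, 100));
    }
}
```

With this data neither the mean nor the median hints at the 2000 ms freezes, while a request fanning out to 100 subrequests has roughly a 63% chance of hitting a latency beyond the 99th percentile.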
5 Tips for Reducing Your Java Garbage Collection Overhead (Page last updated December 2015, Added 2015-12-29, Author Niv Steingarten, Publisher takipi). Tips:
- Wherever possible, size collections for the maximum capacity they will need in their lifetime. This improves both collection processing speed and garbage collection efficiency.
- Avoid creating data in memory to process when you can instead process it as a stream, keeping the in-memory data to a minimum.
- Use immutable objects wherever feasible. When marking, the garbage collector can skip immutable objects in older generations, since they cannot reference anything in the younger generation (the objects an immutable object can reference must be alive when it is created in the young gen, so those other objects can never be younger than it).
- Dynamic string concatenation can generate a lot of garbage. Use string builders.
- Avoid boxing primitive data in Object based collections by using collections that are implemented to handle primitive data efficiently (like the Trove primitive collections).
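Three of the tips above (presizing collections, streaming instead of loading, and using string builders) can be sketched with plain JDK classes. The class and method names below are hypothetical and the data is made up; this is an illustration, not code from the article.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of three garbage-reducing habits using only the JDK. */
final class LessGarbage {

    /** Presized list: the backing array is allocated once instead of being regrown. */
    static List<String> labels(int count) {
        List<String> result = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            result.add("item-" + i);
        }
        return result;
    }

    /** One presized StringBuilder instead of a new String per concatenation in the loop. */
    static String joinWithCommas(List<String> parts) {
        int capacity = 0;
        for (String part : parts) {
            capacity += part.length() + 1;
        }
        StringBuilder sb = new StringBuilder(capacity);
        for (int i = 0; i < parts.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(parts.get(i));
        }
        return sb.toString();
    }

    /** Streams lines one at a time rather than reading the whole input into memory first. */
    static long countNonBlankLines(BufferedReader reader) {
        return reader.lines().filter(line -> !line.trim().isEmpty()).count();
    }

    public static void main(String[] args) {
        System.out.println(joinWithCommas(labels(3))); // item-0,item-1,item-2
        BufferedReader reader = new BufferedReader(new StringReader("a\n\nb\n"));
        System.out.println(countNonBlankLines(reader)); // 2
    }
}
```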
Which thread executes CompletableFuture's tasks and callbacks? (Page last updated November 2015, Added 2015-12-29, Author Tomasz Nurkiewicz, Publisher nurkiewicz). Tips:
- CompletableFuture.supplyAsync() by default uses ForkJoinPool.commonPool(), the thread pool shared between all CompletableFutures, all parallel streams and all applications deployed on the same JVM.
- Use CompletableFuture.supplyAsync(Supplier<U>, Executor) with your own thread pool instead of CompletableFuture.supplyAsync(Supplier<U>), which uses the same thread pool as all parallel streams and any CompletableFutures which use the default pool.
- Use CompletableFuture.thenApplyAsync(Function, Executor) with your own thread pool instead of CompletableFuture.thenApply() to control which threads the function executes on.
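The two overloads named above can be tried out with a short sketch. The pool name prefix "app-pool-" and the class OwnPoolFutures are assumptions made for illustration; the example records the name of the thread that runs each stage so you can confirm that both the task and the callback stay on your own pool rather than on ForkJoinPool.commonPool().

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: running a CompletableFuture task and its callback on an application-owned pool. */
final class OwnPoolFutures {

    /** Fixed pool whose threads carry a recognisable name prefix. */
    static ExecutorService newNamedPool(String prefix, int threads) {
        AtomicInteger counter = new AtomicInteger();
        return Executors.newFixedThreadPool(threads, runnable -> {
            Thread thread = new Thread(runnable, prefix + counter.incrementAndGet());
            thread.setDaemon(true);
            return thread;
        });
    }

    /** Returns the names of the threads that ran the supplier and the callback. */
    static String[] threadsUsed() {
        ExecutorService pool = newNamedPool("app-pool-", 2);
        try {
            return CompletableFuture
                    .supplyAsync(() -> Thread.currentThread().getName(), pool)
                    .thenApplyAsync(first -> new String[] {first, Thread.currentThread().getName()}, pool)
                    .join();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        String[] names = threadsUsed();
        System.out.println("supplier ran on " + names[0]);
        System.out.println("callback ran on " + names[1]);
    }
}
```

Dropping the Executor argument from either call would move that stage to the default pool (or, for thenApply, to whichever thread completes the previous stage).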
Last Updated: 2017-11-28
Copyright © 2000-2017 Fasterj.com. All Rights Reserved.