Java Performance Tuning
Tips January 2014
Back to newsletter 158 contents
Implementing High Performance Parsers in Java (Page last updated December 2013, Added 2014-01-30, Author Jakob Jenkov, Publisher InfoQ). Tips:
- Sequential access parsers only access a "window" in the document sequence; random access parsers allow you to navigate through the data as you please. Random access parsers are generally slower and have higher memory requirements, but are easier to use.
- Instead of constructing an object tree from parsed data, a more efficient approach is to construct a buffer of indices into the original data buffer.
- If your data contains elements that are independent of each other, you can pull in a chunk that contains at least one full element at a time to restrict memory requirements.
- When indexing into a structure, you can use the maximum size of the elements to reduce memory usage. For example, if the elements are always shorter than 65,536 bytes, you can store their lengths in shorts (read as unsigned values) instead of ints; or if 2 bytes are too small but 3 bytes are sufficient, you can use ints and use the fourth byte for other information related to the element. Packing like this has a computation overhead, so it is a space vs time tradeoff.
- Only instantiating objects when they are definitely needed avoids creating objects that are never used. Many parsers instantiate all objects as they parse, whereas an indexed approach (holding indexes to the start and end of each element, plus its type) lets you minimize object creation by only instantiating objects on request.
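The indexed approach described in these tips can be sketched as follows. This is a minimal illustration, not the article's parser: it assumes a hypothetical input format of comma-separated tokens, and the class `IndexOverlayParser` and its type codes are made up for the example. Each element's start offset goes into one `int[]`, while its length (upper 3 bytes) and a type code (lowest byte) are packed into a second `int[]`; a `String` is only created when `text(i)` is called.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical index-overlay parser for comma-separated tokens.
class IndexOverlayParser {
    static final int TYPE_NUMBER = 1;        // stored in the lowest byte of the packed int
    static final int TYPE_WORD = 2;
    static final int MAX_LENGTH = 0xFFFFFF;  // lengths must fit in the upper 3 bytes

    private final byte[] data;               // the original input; never copied
    private int[] starts = new int[16];      // start offset of each element
    private int[] packed = new int[16];      // (length << 8) | type for each element
    private int count;

    IndexOverlayParser(byte[] data) {
        this.data = data;
        int i = 0;
        while (i < data.length) {
            int start = i;
            boolean digitsOnly = true;
            while (i < data.length && data[i] != ',') {
                if (data[i] < '0' || data[i] > '9') digitsOnly = false;
                i++;
            }
            int length = i - start;
            addElement(start, length, digitsOnly && length > 0 ? TYPE_NUMBER : TYPE_WORD);
            i++; // skip the separating comma
        }
    }

    private void addElement(int start, int length, int type) {
        if (length > MAX_LENGTH) throw new IllegalArgumentException("element too long to index");
        if (count == starts.length) { // grow both index arrays together
            starts = Arrays.copyOf(starts, count * 2);
            packed = Arrays.copyOf(packed, count * 2);
        }
        starts[count] = start;
        packed[count] = (length << 8) | type;
        count++;
    }

    int size() { return count; }
    int type(int index) { return packed[index] & 0xFF; }
    int length(int index) { return packed[index] >>> 8; }

    // A String is only created when the caller asks for this element's text.
    String text(int index) {
        return new String(data, starts[index], length(index), StandardCharsets.US_ASCII);
    }
}
```

For example, parsing `42,abc,7` produces three index entries, and no `String` exists until `text(1)` returns `"abc"`. Unpacking with `>>> 8` and `& 0xFF` is the small computational cost that the packing tip mentions.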
Java EE 7 - Concurrency Utilities (Page last updated October 2013, Added 2014-01-30, Author Rahman Usta, Publisher DZone). Tips:
- The Java 1.5+ Executor API (e.g. ExecutorService and ScheduledExecutorService) provides efficient thread management by offering a variety of thread pool environments. The Java EE 7 Concurrency Utilities standard makes these object types injectable and manageable by container services (as ManagedExecutorService and ManagedScheduledExecutorService).
- Container resources are special objects that are managed by application servers. DataSources, JMS resources and the concurrency objects defined by the Concurrency Utilities standard (ManagedExecutorService and ManagedScheduledExecutorService) are examples of container resources, accessible through the @Resource annotation or through lookups on Context objects (e.g. InitialContext).
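A minimal Java SE sketch of the Executor API mentioned above (the class and method names are illustrative): a fixed thread pool runs submitted `Callable` tasks and a `ScheduledExecutorService` runs a delayed one. In a Java EE 7 container you would instead inject the managed equivalents, e.g. with `@Resource`, and leave their lifecycle to the container rather than calling `shutdown()` yourself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

class ExecutorSketch {
    // Runs n small tasks on a fixed pool of reusable threads and combines their results.
    static int sumOfSquares(int n) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int i = 1; i <= n; i++) {
                final int x = i;
                results.add(pool.submit(() -> x * x)); // submitted as a Callable<Integer>
            }
            int total = 0;
            for (Future<Integer> f : results) total += await(f);
            return total;
        } finally {
            pool.shutdown(); // in plain Java SE the application owns the pool's lifecycle
        }
    }

    // Runs a single task after a delay on a scheduled executor.
    static String runLater(long delayMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        try {
            ScheduledFuture<String> f = scheduler.schedule(() -> "done", delayMillis, TimeUnit.MILLISECONDS);
            return await(f);
        } finally {
            scheduler.shutdown();
        }
    }

    // Waits for a result, converting checked exceptions into unchecked ones.
    private static <T> T await(Future<T> future) {
        try {
            return future.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}
```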
Caching In: Understand, Measure and Use your CPU Cache more effectively (Page last updated October 2013, Added 2014-01-30, Author Richard Warburton, Publisher InsightfulLogic). Tips:
- Common sources of performance issues are: networking; database; other external systems; disk I/O; garbage collection; insufficient parallelism; CPU load. Eliminate the other issues before targeting CPU load.
- The fastest CPU execution maximizes the rate of Instructions Retired (instructions completed per cycle), so that is a good metric to measure when doing low-level tuning of CPU execution.
- System cache profilers include: perf; rdmsr/wrmsr.
- Try to fit your working set (data needed for processing) into the CPU caches.
- To assist the CPU prefetching mechanism: Use smaller data types (e.g. use -XX:+UseCompressedOops); Avoid holes in your data (arrays rather than linked lists); make accesses linear (loop over sequential elements).
- Multidimensional arrays (in Java) are arrays of arrays, so their rows are not necessarily close together in physical memory. You can convert a multidimensional array to a linear array, e.g. using a calculated index like array[ROWS * col + row]. Make sure the inner loop iterates over sequential elements; if you get this the wrong way round you'll just get cache misses all the time (i.e. when the inner loop is for (int row=0; row < ROWS; row++), index with array[ROWS * col + row], NOT array[COLS * row + col]).
- To optimise CPU cache access through data locality, use: primitive collections (like Trove, GS Collections, fastutil); arrays rather than linked lists; array-backed data structures like hashtables rather than search trees; avoid loop unrolling (it bloats the code, and you want to keep the code small); custom data structures like Judy arrays (associative array/map), kD-Trees (generalised binary space partitioning) and Z-order curves (multidimensional data in one dimension).
- You can use sun.misc.Unsafe to allocate memory directly and align it to cache lines, but this is not recommended: classes using this technique need to manage access to and updates of that memory themselves.
- HotSpot object allocations are 8-byte aligned. Data that sits within a single cache line is significantly faster to access than data that straddles multiple cache lines.
- Bigger memory pages waste more space but give you faster address lookups (fewer TLB misses). Too much wasted space will cause paging to disk, dramatically slowing down performance. -XX:+UseLargePages is faster if you have enough memory. (You can also set the page size in the OS and in the BIOS.)
- Data items that are next to each other are likely to be in the same cache line. Mostly this is good, but if different threads write to those neighbouring items, the cache line has to be passed back and forth between the cores' caches (false sharing), which reduces performance. You can pad the data to avoid this contention (an @Contended annotation may become available in the future to mark variables that need padding). Padding variables must appear to be used, otherwise the JIT compiler can eliminate them. Also, the JVM can reorder fields in memory (typically grouping them by type with the longer types first, e.g. all longs first).
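The flattened-array tip can be sketched like this, assuming the column-major layout from the text, where element (row, col) is stored at `array[ROWS * col + row]`. `FlatMatrix` is a hypothetical helper; its two sum methods visit the same elements, but only `sumSequential()` has its inner loop over `row`, so only it walks the backing array sequentially. No timings are shown; on large arrays the strided version would typically incur far more cache misses.

```java
// A 2D grid stored in one contiguous double[] (column-major layout).
class FlatMatrix {
    final int rows;
    final int cols;
    final double[] data;

    FlatMatrix(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
        this.data = new double[rows * cols];
    }

    // Element (row, col) lives at rows * col + row, i.e. array[ROWS * col + row].
    int index(int row, int col) { return rows * col + row; }

    void set(int row, int col, double value) { data[index(row, col)] = value; }

    // Cache-friendly order: the inner loop varies row, so data[] is walked sequentially.
    double sumSequential() {
        double sum = 0;
        for (int col = 0; col < cols; col++) {
            for (int row = 0; row < rows; row++) {
                sum += data[rows * col + row];
            }
        }
        return sum;
    }

    // Same elements in a different order: the inner loop varies col,
    // so each step jumps 'rows' elements through data[].
    double sumStrided() {
        double sum = 0;
        for (int row = 0; row < rows; row++) {
            for (int col = 0; col < cols; col++) {
                sum += data[rows * col + row];
            }
        }
        return sum;
    }
}
```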
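Manual padding can be sketched with the class-hierarchy trick used by libraries such as the LMAX Disruptor: HotSpot normally lays out superclass fields before subclass fields, so the `long` fields in the hypothetical `LhsPadding` and `RhsPadding` classes surround `value` with 56 bytes on each side, enough to keep it off a 64-byte cache line shared with another counter's value. This relies on HotSpot layout behaviour rather than any language guarantee, and `sumPadding()` exists only so the padding fields are read somewhere.

```java
// Hypothetical padded counter, laid out in the style of the Disruptor's Sequence class.
abstract class LhsPadding {
    protected long p1, p2, p3, p4, p5, p6, p7; // 56 bytes before the hot field
}

abstract class PaddedValue extends LhsPadding {
    protected volatile long value; // the field each thread writes
}

abstract class RhsPadding extends PaddedValue {
    protected long p9, p10, p11, p12, p13, p14, p15; // 56 bytes after the hot field
}

final class PaddedCounter extends RhsPadding {
    // Safe only because each counter has a single writing thread; ++ on a volatile is not atomic.
    void increment() { value++; }

    long get() { return value; }

    // Reads the padding so the fields are "used" somewhere.
    long sumPadding() {
        return p1 + p2 + p3 + p4 + p5 + p6 + p7 + p9 + p10 + p11 + p12 + p13 + p14 + p15;
    }
}

class FalseSharingDemo {
    // Two threads, each incrementing its own counter; the padding is intended
    // to stop the two value fields from sharing a cache line.
    static long[] run(int iterations) {
        PaddedCounter a = new PaddedCounter();
        PaddedCounter b = new PaddedCounter();
        Thread ta = new Thread(() -> { for (int i = 0; i < iterations; i++) a.increment(); });
        Thread tb = new Thread(() -> { for (int i = 0; i < iterations; i++) b.increment(); });
        ta.start();
        tb.start();
        try {
            ta.join();
            tb.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new long[] { a.get(), b.get() };
    }
}
```

To see the effect, you would time `run` against a variant without the padding classes; the sketch itself only demonstrates the layout technique.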
Java VM - Beware of the YoungGen space (Page last updated November 2013, Added 2014-01-30, Author Pierre-Hugues Charbonneau, Publisher Java EE Support Patterns). Tips:
- Profiling (including memory leak detection) and performance & load testing are what you need to do to gather the data and facts about your application's memory footprint and JVM runtime health.
- You should be able to answer these questions: Is your Java heap leaking; Do you have large and/or frequent JVM GC pauses; Is your overall pause time higher than 5% of runtime; Are GC pauses impacting response times beyond tolerance; Have you had any OOMEs or crashes in the last three months; Does the JVM need manual restarts? If any answers are "yes", you should be tuning the app.
- Young generation garbage collections ARE stop-the-world - apparently many people think they are not.
- Solutions to excessive young generation garbage collection pause times include: reducing the object allocation rate; splitting the application into multiple JVMs so that each one has a sufficiently low garbage collection rate; tuning the young generation sizes.
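One way to check the "pause time higher than 5% of runtime" question from inside the JVM is sketched below using the standard management beans: sum the accumulated collection time reported by each `GarbageCollectorMXBean` and divide by the JVM uptime. Treat the result as approximate: the reported figure is an elapsed time per collector and, for concurrent collectors, may not match stop-the-world pause time. `GcOverhead` is a hypothetical helper, and the threshold is the rule of thumb quoted above.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcOverhead {
    // Accumulated GC time as a percentage of JVM uptime (approximate).
    static double gcTimePercent() {
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) gcMillis += t; // -1 means the collector does not report a time
        }
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptimeMillis <= 0 ? 0.0 : 100.0 * gcMillis / uptimeMillis;
    }

    // True if GC time exceeds the given percentage of runtime (e.g. 5.0).
    static boolean exceeds(double percentThreshold) {
        return gcTimePercent() > percentThreshold;
    }
}
```

For example, a periodic health check could log `GcOverhead.gcTimePercent()` and raise an alert when `GcOverhead.exceeds(5.0)` returns true.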
A Case Study of JVM HotSpot Flags (Page last updated December 2013, Added 2014-01-30, Author Kirk Pepperdine, Publisher java.net). Tips:
- Use -XX:+PrintFlagsFinal to get the values for flags used by the JVM.
- You can programmatically trigger a garbage collection by calling System.gc(), but this is not advised. -XX:+DisableExplicitGC makes the JVM ignore calls to System.gc(). -XX:+ExplicitGCInvokesConcurrent makes System.gc() calls use the concurrent collector instead of a stop-the-world full GC (this only applies if -XX:+DisableExplicitGC is not enabled).
- RMI triggers a Full GC every hour by default - you can alter the frequency by setting the properties sun.rmi.dgc.client.gcInterval and sun.rmi.dgc.server.gcInterval (values in milliseconds).
- The downside of setting the maximum heap size equal to the minimum heap size (-Xmx equal to -Xms) is that the JVM cannot use ergonomics to adapt to changes in load. Sometimes this gives better performance, but you need to test to see.
- If you know your allocation rate, you can tune the Eden size to control the frequency of young gen GCs.
- You are better off setting your initial perm size (-XX:PermSize) to the size you know you need, and the max perm size (-XX:MaxPermSize) higher, rather than having the two the same.
- Setting -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70 tells the concurrent garbage collector to start a collection when more than 70% of the tenured space is used. If the concurrent collector cannot finish the collection before the heap fills up, it produces a concurrent mode failure and falls back to a serial collector, causing a long stop-the-world GC, so getting CMSInitiatingOccupancyFraction right can be important (the default is 69%).
- Remark, the 4th phase of the CMS, is single-threaded. You can make it parallel with -XX:+CMSParallelRemarkEnabled, but this is not necessarily faster due to some contention in the concurrent collector, so you need to measure whether this improves performance.
- -XX:+CMSClassUnloadingEnabled will unload classes that are no longer needed from the perm space, but will make the collection take longer.
- -XX:+AggressiveOpts is experimental and the flags it turns on can change with any release, so if you use it you need to test its effect, and retest with every change of JVM version. However, it often provides better performance.
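-XX:+PrintFlagsFinal lists every flag when the JVM starts. To read individual flag values from inside a running HotSpot JVM, you can use the HotSpot-specific `com.sun.management.HotSpotDiagnosticMXBean`. A minimal sketch; `FlagInspector` is a hypothetical helper, and the bean is a HotSpot/OpenJDK extension rather than part of the standard Java SE API.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

class FlagInspector {
    // Returns the current value of a HotSpot flag and where that value came from
    // (e.g. DEFAULT, VM_CREATION for command-line settings, ERGONOMIC).
    static String flagValue(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        VMOption option = bean.getVMOption(name); // IllegalArgumentException if no such flag
        return option.getValue() + " (origin: " + option.getOrigin() + ")";
    }
}
```

For example, `FlagInspector.flagValue("DisableExplicitGC")` reports whether explicit `System.gc()` calls are being ignored in the current JVM.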
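The Eden-sizing tip is simple arithmetic: young collections happen roughly every (Eden size / allocation rate) seconds, so for a target interval you need an Eden of about (allocation rate x interval). A back-of-the-envelope sketch; `EdenSizing` is a hypothetical helper, and real intervals also depend on survivor spaces, allocation spikes and the collector's own sizing policies.

```java
class EdenSizing {
    // Approximate seconds between young collections: Eden fills at the allocation rate.
    static double youngGcIntervalSeconds(double edenMb, double allocMbPerSec) {
        return edenMb / allocMbPerSec;
    }

    // Eden size (MB) needed for young collections to occur about every targetSeconds.
    static double edenMbForInterval(double allocMbPerSec, double targetSeconds) {
        return allocMbPerSec * targetSeconds;
    }
}
```

With made-up figures, a 600 MB Eden filled at 200 MB/s gives a young GC about every 3 seconds, and a 5-second target at the same rate needs roughly 1000 MB of Eden. In HotSpot, Eden is sized indirectly through the young generation size (-Xmn, or -XX:NewSize/-XX:MaxNewSize) and -XX:SurvivorRatio.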
Caching Techniques (Page last updated November 2013, Added 2014-01-30, Author Jakob Jenkov, Publisher Jenkov.com). Tips:
- Caching is the technique of speeding up data lookups by reading a copy from a local (or at least closer) data structure (the cache) instead of reading the data directly from its real source in another (remote) system.
- There are three issues to think about with a cache: Populating the cache; Keeping the cache and source data in sync; Managing cache size.
- There are two options for populating a cache: upfront population and lazy population. Upfront population eliminates delays on the first access of the data, but the population may take a long time and you may cache data you never need. Lazy population only caches the data you need, but each item incurs a delay on its first access (as it is retrieved from the master source). An efficient option is to load up front the data items you're certain to need, and lazily cache the rest.
- A write-through cache is a cache that allows writes to it, with the data then written to the master source. This only really works well if the cache is the only route to updating the master source.
- Expiring cache entries based on how long they've been there is an option where the data can acceptably be out-of-date with no impact. Many types of data have this type of property.
- Active expiry of cache entries requires the system that updates the master source (or the master source system itself) to notify the cache when an entry has expired. This keeps the cache data as up-to-date as possible, but is more complex and leads to unpredictable performance.
- A cache should be size limited. Techniques for evicting data from the cache due to size constraints include: Time based eviction (how long the entry has been in the cache); First in, first out (FIFO); First in, last out (FILO); Least accessed (the entries that have been accessed the fewest times, but bear in mind that older entries are likely to have been accessed more often, so you might want to count accesses in the last N time intervals); Least time between accesses (based on the average time between accesses; values that were once accessed a lot but fade in popularity will see their average time between accesses gradually increase).
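The mixed population strategy (load up front what you know you need, lazily cache the rest) can be sketched as below. The `loader` function is a hypothetical stand-in for the master source, and `ConcurrentHashMap.computeIfAbsent` is a Java 8 API, newer than the article.

```java
import java.util.Collection;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class PopulatingCache<K, V> {
    private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // hypothetical lookup in the master source

    PopulatingCache(Function<K, V> loader, Collection<K> preload) {
        this.loader = loader;
        for (K key : preload) {
            map.put(key, loader.apply(key)); // upfront population of known-needed keys
        }
    }

    // Lazy population: the loader only runs on the first miss for a key.
    V get(K key) {
        return map.computeIfAbsent(key, loader);
    }

    int size() { return map.size(); }
}
```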
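A write-through cache can be sketched as a wrapper that writes to the master source first and then updates the cached copy. Here the master source is modelled as a plain `Map`, a hypothetical stand-in for a database or remote system; as the tip says, this only stays consistent if every write goes through this class.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class WriteThroughCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Map<K, V> masterSource; // hypothetical stand-in for the real source

    WriteThroughCache(Map<K, V> masterSource) {
        this.masterSource = masterSource;
    }

    // Write to the master source first, then update the cached copy.
    // The two steps are not atomic together; concurrent writers to one key would need locking.
    void put(K key, V value) {
        masterSource.put(key, value);
        cache.put(key, value);
    }

    // Read misses fall back to the master source and are cached (null results are not cached).
    V get(K key) {
        return cache.computeIfAbsent(key, masterSource::get);
    }
}
```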
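Time-based expiry can be sketched by storing a load timestamp with each entry and reloading entries older than a time-to-live. The `loader` function, the TTL value and the injectable clock (a `LongSupplier` of milliseconds, used so the behaviour can be checked without sleeping) are all assumptions of this sketch; two threads may occasionally reload the same expired key, which this simple version accepts.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

class ExpiringCache<K, V> {
    // A cached value plus the clock time (ms) at which it was loaded.
    private static final class Entry<T> {
        final T value;
        final long loadedAt;
        Entry(T value, long loadedAt) { this.value = value; this.loadedAt = loadedAt; }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // hypothetical lookup in the master source
    private final long ttlMillis;
    private final LongSupplier clock;    // e.g. System::currentTimeMillis in production

    ExpiringCache(Function<K, V> loader, long ttlMillis, LongSupplier clock) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    V get(K key) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.get(key);
        if (entry == null || now - entry.loadedAt >= ttlMillis) {
            // Missing or older than the TTL: reload from the source and restamp.
            entry = new Entry<>(loader.apply(key), now);
            entries.put(key, entry);
        }
        return entry.value;
    }
}
```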
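Size-limited FIFO eviction maps neatly onto `java.util.LinkedHashMap`, which can iterate in insertion order and calls `removeEldestEntry` after each insertion. A minimal, non-thread-safe sketch (`FifoCache` is a hypothetical name; wrap it with `Collections.synchronizedMap` or add locking for concurrent use):

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FifoCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    FifoCache(int maxEntries) {
        super(16, 0.75f, false); // accessOrder = false: entries stay in insertion order
        this.maxEntries = maxEntries;
    }

    // Called after each put; evicts the oldest inserted entry once the limit is exceeded.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

Because reads do not change insertion order, an entry is evicted when it is the oldest one inserted, however often it has been read.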
Last Updated: 2023-08-28
Copyright © 2000-2023 Fasterj.com. All Rights Reserved.
All trademarks and registered trademarks appearing on JavaPerformanceTuning.com are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. JavaPerformanceTuning.com is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.
RSS Feed: http://www.JavaPerformanceTuning.com/newsletters.rss