Java Performance Tuning
Java(TM) - see bottom of page
Tips February 2014
Back to newsletter 159 contents
The Principles of Java Application Performance Tuning (Page last updated December 2013, Added 2014-02-26, Author Se Hoon Park, Publisher Cubrid). Tips:
- To fully tune a Java application you need at least a basic level of understanding of: Hardware; OS processes; The JVM; Garbage collection; JIT compilation; Locks; Concurrency; Class loading; Object creation.
- One procedure for Java performance tuning is to repeatedly: Specify target performance; Specify the JVM configuration(s); Check that OS CPU, memory, and IO are acceptable, or tune; Check that response times are acceptable, or tune; Check that throughput is acceptable, or tune; After any change from tuning, start again from the beginning of this sequence.
- Throughput and response times often impact each other: tuning to optimize one frequently adversely affects the other, so you need to balance them to get the best overall performance.
- Currently (around when 1.7.0_51 JVM is current), the concurrent collector is the best collector for low pause times in Oracle JVMs.
- Set the New area (young generation) size sufficiently large to hold all short-lived objects, so that they are collected before they get promoted to the old generation. But too large a New area size can cause longer pauses from copying live objects, so try to avoid oversizing this.
- Recommended JVM options for a webserver: -server -Xms -Xmx -XX:NewRatio= or -XX:NewSize= -XX:MaxNewSize= -XX:PermSize=256m -XX:MaxPermSize=256m -Xloggc: -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction= -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath= -XX:OnOutOfMemoryError= (stop or restart).
- If stop-the-world GC times are too long, tune the GC and use a profiler and/or heap dump to identify objects that can be eliminated.
- If CPU usage is low, analyse the concurrency behaviour using thread profilers or stack dumps to reduce wait times.
- If CPU usage is high, use a profiler to determine the execution bottlenecks and improve the algorithms.
- A tuning approach: first decide if tuning is even necessary; focus on the biggest bottleneck; much tuning is a balancing exercise so bear in mind that if you are improving one thing, you are probably making something else worse.
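As a rough first check on whether GC is worth tuning at all, the JVM's own counters can be read through the standard java.lang.management API. The following is a minimal sketch, not a substitute for GC logs or a profiler: the class and method names (GcTimeReport, totalGcTimeMillis, gcTimeFraction) are hypothetical, and the accumulated collection time reported by the JVM only approximates stop-the-world pause time, especially for concurrent collectors.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: reports how much time the JVM says it has spent collecting.
final class GcTimeReport {

    /** Sums the accumulated collection time (ms) over all collectors. */
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Fraction of JVM uptime attributed to GC (approximate). */
    static double gcTimeFraction() {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptime <= 0 ? 0.0 : (double) totalGcTimeMillis() / uptime;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": count=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        System.out.println("Approximate GC time fraction: " + gcTimeFraction());
    }
}
```

If the fraction is small and response times are acceptable, GC is probably not the biggest bottleneck and effort is better spent elsewhere.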
FAQ: Testing mobile app performance (Page last updated January 2014, Added 2014-02-26, Author Caroline de Lacvivier, Publisher SearchSoftwareQuality). Tips:
- Mobile testing needs to account for variable network conditions, limited battery power and device constraints. You are better off purchasing real devices for testing rather than relying on emulators and simulators.
- Consider network variability early in the development process - apps designed for speed and simplicity will be less vulnerable to network interference.
- Define how the app should behave when a phone switches from WiFi to 3G, and test it.
- Walk your test devices into a dead zone to see how they maintain functionality with a lost or weak connection.
- Run updates in the background without any user initiation, and make them frequent so that post-release improvements are easier.
- See also: Supercharge Your Slow Android Emulator (http://www.developer.com/ws/android/development-tools/supercharge-your-android-emulator-speed-with-intel-emulation-technologies.html).
Beyond Averages (Page last updated December 2013, Added 2014-02-26, Author Dan Kuebrich, Publisher DZone). Tips:
- There are three averages: mode (the data value that occurs most often, e.g. the value corresponding to the high point on a histogram or statistical distribution); median (the value in the middle of the distribution, with an equal number of data points higher and lower than it); mean (the sum of all values divided by the number of values for the arithmetic mean, or the nth root of the product of n values for the geometric mean). For a "normal" distribution, these tend to be similar, but for a long tail distribution ("log normal") - which is much more common for networked applications - they can be spread far apart and the mean can even be above the 95th percentile (40% of customers using the TraceView monitoring application showed this behaviour).
- With just statistical data, it's difficult to account for the causes of outliers. You may not even be able to determine if the data is single behaviour or multi-modal (i.e. generated from multiple separate behaviours).
- Averages from multi-modal data are misleading, e.g. mixing cold and warm cache hit performance data together produces averages that will never be seen by any users (they'll see one or the other behaviour).
- Percentiles (e.g. 90th percentile, 90% of requests were faster or the same, while 10% were slower) are useful data to retain with averages.
- The Apdex (Application Performance Index) is one standard performance measure going from 0 (worst) to 1 (best): Apdex = (Satisfied Count + Tolerating Count / 2) / Total Samples, e.g. if there are 100 samples with a target time of 3 seconds, where 60 are below 3 seconds, 30 are between 3 and 12 seconds (assuming 4x target time = tolerated time), and the remaining 10 are above 12 seconds, the Apdex score is (60 + (30 / 2)) / 100 = 0.75.
- Histograms and heat maps provide ways to identify multi-modal data - multi-modal data are revealed when there are multiple humps in the data.
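The measures in these tips can be sketched in a few lines of Java. This is an illustrative sketch only: the class name LatencyStats and its methods are hypothetical, the percentile uses the simple nearest-rank definition (others exist), and treating a sample exactly equal to the target as "satisfied" is an assumption.

```java
import java.util.Arrays;

// Hypothetical helper illustrating mean, median, percentile and Apdex.
final class LatencyStats {

    static double mean(double[] values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    static double median(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        // Even sample counts average the two middle values.
        return (sorted.length % 2 == 1) ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    /** Nearest-rank percentile: smallest value with at least p% of samples at or below it. */
    static double percentile(double[] values, double p) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Apdex = (satisfied + tolerating / 2) / total, with tolerated time = 4 x target. */
    static double apdex(double[] values, double target) {
        int satisfied = 0;
        int tolerating = 0;
        for (double v : values) {
            if (v <= target) {
                satisfied++;          // assumption: boundary counts as satisfied
            } else if (v <= 4 * target) {
                tolerating++;
            }                         // anything slower is "frustrated" and scores 0
        }
        return (satisfied + tolerating / 2.0) / values.length;
    }

    public static void main(String[] args) {
        // Long-tail sample: 19 fast requests and one very slow outlier.
        double[] longTail = new double[20];
        Arrays.fill(longTail, 1.0);
        longTail[19] = 1000.0;
        System.out.println("mean=" + mean(longTail)
                + " median=" + median(longTail)
                + " p95=" + percentile(longTail, 95));
    }
}
```

With the long-tail sample in main, a single outlier pulls the mean far above both the median and the 95th percentile, which is why percentiles are worth keeping alongside averages.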
The state of String in Java (Page last updated February 2014, Added 2014-02-26, Author Attila Balazs, Publisher jaxenter). Tips:
- Until Java 7u6 String held a char array which could be longer than the characters in the string - the String object maintained offset and count fields that provided the subset of characters that made the string. That made creating a substring very efficient - no character copying, just a new object with new offsets. However it did mean that objects could be much larger than expected as the original character array was always retained even if the substring was much smaller. From Java 7u6 String holds just the characters in the string, so substrings now require character copying but character array sizes now correspond to the string characters.
- Before Java 7u6 String objects that were created as substrings from other strings could reference much larger character arrays than would be expected without knowing about String internals and copying mechanisms. After Java 7u6 String sizes are as you would naively expect. This change was to reduce unexpected memory retention from normal String processing.
- Before Java 7u6 String.intern() was slow and not particularly efficient, nor was it tunable (and the internal String map was held in perm space). You were recommended to use your own WeakHashMap<String, WeakReference<String>> to create your own version of intern'ed strings. After Java 7u6 String.intern() is efficient, the map is now in the heap, and you can tune its bucket size with -XX:StringTableSize= (default 1009 in Java 7, 60013 in Java 8).
- The -XX:+PrintStringTableStatistics option prints out statistics about the internal string cache during shutdown and to jmap.
- In Java 7, String compression using byte arrays for ASCII characters was removed to reduce code complexity. The compression option prior to Java 7 enabled memory savings of up to 30%. You can explicitly use an implementation of CharSequence backed by byte arrays (e.g. BlobBackedCharSequence) to get a similar saving in memory if needed.
- If a field can have only a limited number of values, use enums instead of strings.
- From Java 7u6 String contains a hash32 field - a new hash code based on Murmur hash, which gives a better dispersion of hash values. You can access it through sun.misc.Hashing.stringHash32(). Its usage for hashmaps isn't enabled by default; you need to set the "jdk.map.althashing.threshold" property - when this is set to a value X, HashMap and related classes with a capacity of at least X will use the alternative hashing algorithm.
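The home-grown interning approach mentioned above for older JVMs can be sketched as follows. This is one possible shape, not the article's code: the class name WeakInterner is hypothetical, and a single coarse lock is assumed for simplicity. The value is wrapped in a WeakReference because a WeakHashMap holds its values strongly, and a strong reference to the value (the same String as the key) would stop the entry from ever being collected.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical sketch of a do-it-yourself interner for pre-7u6 JVMs.
final class WeakInterner {

    // Keys are held weakly; values are wrapped so they do not pin their own keys.
    private final Map<String, WeakReference<String>> pool =
            new WeakHashMap<String, WeakReference<String>>();

    /** Returns the canonical instance equal to s, registering s if none exists. */
    synchronized String intern(String s) {
        WeakReference<String> ref = pool.get(s);
        String canonical = (ref == null) ? null : ref.get();
        if (canonical == null) {
            pool.put(s, new WeakReference<String>(s));
            canonical = s;
        }
        return canonical;
    }

    public static void main(String[] args) {
        WeakInterner interner = new WeakInterner();
        String a = new String("order-status");
        String b = new String("order-status");
        System.out.println("distinct objects before interning: " + (a != b));
        System.out.println("same object after interning: "
                + (interner.intern(a) == interner.intern(b)));
    }
}
```

On Java 7u6 and later the tips suggest String.intern() is efficient enough that a custom pool like this is usually unnecessary.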
Java 8 Streams API - Laziness and Performance Optimization (Page last updated January 2014, Added 2014-02-26, Author Amit Phaltankar, Publisher amitph.com). Tips:
- Lazily loading in the background allows for a more responsive interface (though at the cost of reduced details on first display).
- Eager processing will always process all the data - if you end up utilizing only a small chunk of it, this is a waste of resources and takes longer too. Lazy processing is a 'process only on demand' strategy - just process the subset of data you need.
- In the Java 8 Streams API the intermediate operations are lazy; that is, the operations defined in the arguments are executed as needed rather than when passed to the method. E.g. when calling Stream.map(instance -> block).collect(...), the "block" is not executed when it is passed to the map() method; it is evaluated when collect() is called.
- In the Java 8 Streams API the operations set up a pipeline, so earlier operations can reject elements or pass them through to later operations. The operations can also short-circuit the full set of operations, ending stream processing before all elements are processed if the appropriate short-circuit conditions are reached. Stream.limit(N) would stop processing after "N" number of results had been obtained. This can be a huge performance gain. anyMatch(), allMatch(), noneMatch(), findFirst(), findAny(), and substream() are other short-circuit Stream methods.
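The laziness and short-circuiting described in these tips can be observed by counting how often the mapping function actually runs. A minimal sketch, assuming a sequential stream over a List; the class and method names (LazyStreamDemo, mapCallsBeforeTerminal, mapCallsWithLimit) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo: counts invocations of the function passed to map().
final class LazyStreamDemo {

    /** Builds a pipeline but never runs a terminal operation on it. */
    static int mapCallsBeforeTerminal(List<Integer> input) {
        AtomicInteger calls = new AtomicInteger();
        Stream<Integer> pipeline = input.stream()
                .map(x -> { calls.incrementAndGet(); return x * 2; });
        // No terminal operation has been invoked on 'pipeline', so nothing has executed.
        return calls.get();
    }

    /** Runs the pipeline, but asks for only 'limit' results. */
    static int mapCallsWithLimit(List<Integer> input, int limit) {
        AtomicInteger calls = new AtomicInteger();
        List<Integer> firstFew = input.stream()
                .map(x -> { calls.incrementAndGet(); return x * 2; })
                .limit(limit)                    // short-circuits the sequential pipeline
                .collect(Collectors.toList());   // terminal operation triggers execution
        return calls.get();
    }

    public static void main(String[] args) {
        List<Integer> input = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            input.add(i);
        }
        System.out.println("map calls before terminal op: " + mapCallsBeforeTerminal(input));
        System.out.println("map calls with limit(3): " + mapCallsWithLimit(input, 3));
    }
}
```

For a sequential stream the mapping function runs only for the elements needed to satisfy limit(), not for all 1000 inputs; parallel streams may process more elements than strictly required.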
Demystifying Protocols and Serialization Performance (Page last updated December 2013, Added 2014-02-26, Author Todd Montgomery, Harry Brumleve, Publisher InfoQ). Tips:
- Serialization's main problem is performance. In low latency systems quite often there is a lot of communication and data processing is kept to the minimum necessary, and here it's not unusual to see 30% of the latency taken up by serialization.
- Being efficient with encoding and decoding (both within and outside of serialization) is essential to achieve low latency systems.
- String parsing is very CPU-intensive compared to what you can achieve with efficient protocols. JSON is much more efficient than XML and similar protocols that are verbose and require string parsing.
- Open address map implementations are more efficient than hash maps with node lists; and map implementations that directly handle data primitives without wrapping and unwrapping them are much more efficient than those that need to wrap.
- Optimization is about measuring, and determining whether one technique is better than another.
- Be careful with micro-benchmarks: it's very easy to measure something unnatural that doesn't matter when the real application is running. But micro-benchmarks can be useful if you get them right.
- A binary communication protocol is much more efficient than non-binary ones, as you can read and write primitive data types extremely efficiently - there need be no parsing involved, just memory copies.
- Whenever any conversion is necessary to and from string formats, you have overhead that may be unnecessary - see if you can move to a binary format in these cases. If human reading capability is necessary, you can always provide a reader.
- Having a version field in the first piece of data that is read makes it much more efficient to read data types that have multiple versions.
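The binary-format and version-field tips can be illustrated with java.nio.ByteBuffer. This is a sketch under stated assumptions, not a protocol from the talk: the message layout, the QuoteCodec and Quote names, and the two versions are all hypothetical. The version byte is written first, so the reader can pick the right layout before touching the rest of the message, and primitives are read directly with no string parsing.

```java
import java.nio.ByteBuffer;

// Hypothetical binary codec: [version:1][instrumentId:8][price:8][quantity:4, v2 only]
final class QuoteCodec {

    static final byte VERSION_1 = 1; // id + price
    static final byte VERSION_2 = 2; // id + price + quantity

    static final class Quote {
        final long instrumentId;
        final double price;
        final int quantity;

        Quote(long instrumentId, double price, int quantity) {
            this.instrumentId = instrumentId;
            this.price = price;
            this.quantity = quantity;
        }
    }

    static ByteBuffer encodeV1(long instrumentId, double price) {
        ByteBuffer buf = ByteBuffer.allocate(1 + 8 + 8);
        buf.put(VERSION_1).putLong(instrumentId).putDouble(price);
        buf.flip(); // switch from writing to reading
        return buf;
    }

    static ByteBuffer encodeV2(long instrumentId, double price, int quantity) {
        ByteBuffer buf = ByteBuffer.allocate(1 + 8 + 8 + 4);
        buf.put(VERSION_2).putLong(instrumentId).putDouble(price).putInt(quantity);
        buf.flip();
        return buf;
    }

    /** Reads the version first, then decodes the matching layout. */
    static Quote decode(ByteBuffer buf) {
        byte version = buf.get();
        long id = buf.getLong();
        double price = buf.getDouble();
        switch (version) {
            case VERSION_1:
                return new Quote(id, price, 0); // v1 carries no quantity; 0 is an assumed default
            case VERSION_2:
                return new Quote(id, price, buf.getInt());
            default:
                throw new IllegalArgumentException("Unknown version: " + version);
        }
    }

    public static void main(String[] args) {
        Quote q = decode(encodeV2(42L, 101.5, 7));
        System.out.println(q.instrumentId + " " + q.price + " " + q.quantity);
    }
}
```

If humans need to inspect the messages, a small reader tool built on decode() can print them, which keeps the wire format binary.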
Last Updated: 2018-06-28
Copyright © 2000-2018 Fasterj.com. All Rights Reserved.
All trademarks and registered trademarks appearing on JavaPerformanceTuning.com are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. JavaPerformanceTuning.com is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.
RSS Feed: http://www.JavaPerformanceTuning.com/newsletters.rss