Java Performance Tuning
Tips January 2018
https://www.infoq.com/presentations/performance-openj9
Performance beyond Throughput: An OpenJ9 Case Study (Page last updated January 2018, Added 2018-01-29, Author Marius Pirvu, Publisher QCon). Tips:
- OpenJ9 pays attention to startup time, footprint, ramp-up, response time and CPU usage, not just throughput
- Caching improves response time at the expense of footprint
- Profiling during runtime improves throughput at the cost of startup and footprint
- OpenJ9 speeds up startup with -Xquickstart (a specialized running mode), and with a shared class cache and AOT compilation enabled by -Xshareclasses (the options are combined in a sketch after these tips)
- Many small JVMs (eg with microservices) use more memory in total than fewer larger JVMs, because each JVM has memory overhead (runtime, data structures, JIT, GC structures etc) beyond the application's memory requirement.
- A large percentage of global JVM time is spent idling, but "idle" JVMs are not actually idle. OpenJ9 detects the idle state and minimizes CPU and memory consumption in it by reducing sampling frequency and JIT activity, running a GC to discard garbage (with -XX:+IdleTuningGcOnIdle), and releasing free memory pages.
- OpenJ9 -Xtune:virtualized reduces CPU consumption from the JIT thread, targeted at cloud JVMs constrained to one core (best used with -Xshareclasses).
- OpenJ9's Metronome GC is a configurable soft real-time GC policy that can achieve 1ms pauses (OpenJ9 also supports pause-less GC on z/OS with -Xgc:concurrentScavenge).
- Linux low level performance tools include: CPU - top, htop, vmstat, pidstat, mpstat, sar, nmon; Memory - sar, dstat, slabtop, free, nmon; Disk - iotop, iostat, sar, nmon; Network - ping, iftop, netstat, tcpdump, nicstat; Profilers - perf, oprofile, tprof
- OpenJ9 performance tools are Health Center and GCMV.
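A minimal sketch of how the startup and cloud options above could be combined on an OpenJ9 JVM (the cache name, cache directory and jar name are placeholders):

    # First run populates the shared class cache (including AOT-compiled code);
    # subsequent runs reuse it for faster startup
    java -Xquickstart \
         -Xshareclasses:name=mysvc,cacheDir=/tmp/shcc \
         -Xtune:virtualized \
         -XX:+IdleTuningGcOnIdle \
         -jar myservice.jar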
https://www.youtube.com/watch?v=w2zaqhFczjY
Java 9 VarHandles: Best practices, and why? (Page last updated November 2017, Added 2018-01-29, Author Tobi Ajila, Publisher Devoxx). Tips:
- Compiler reordering and weak hardware memory models mean that you need to correctly use memory fences (synchronized, volatile, locks, atomics) to guarantee consistent outcomes in concurrent applications.
- Over-synchronization makes programs slow, but under-synchronization results in incorrect data.
- Java 9+ has array-element VarHandles (from MethodHandles.arrayElementVarHandle) which allow you to treat an array element as volatile.
- VarHandle has the following memory ordering modes from weakest to strongest: Plain, Opaque, Release/Acquire, Volatile. Plain can be reordered and eliminated like an ordinary r/w for a field. Opaque writes are eventually seen. Release/Acquire ensures ordering is maintained for the field. Volatile ensures sequential consistency across multiple fields.
- Get a VarHandle using a lookup then access, eg MethodHandles.lookup().findVarHandle(Example.class, "field", int.class), using findVarHandle/findStaticVarHandle/unreflectVarHandle (the last converts a reflected Field into a VarHandle); see the first sketch after these tips.
- VarHandles can be shared safely, so you can use a static final declaration to hold VarHandle instances (making them private is best practice).
- Declare fields that are used by VarHandles as volatile so that if they are accidentally set with a weak update, the program is still correct.
- VarHandle.get/set is plain access: the compiler can reorder and eliminate accesses, eg with 'while(xVarHandle.get(this)) ...' the compiler can hoist the read out of the loop to 's = xVarHandle.get(this); while(s) ...', so the loop may never see an update.
- Opaque writes are eventually seen and updates are coherent, but opaque mode doesn't enforce ordering constraints. The compiler cannot hoist reads out of loops (it knows the variable may be written by other threads), and another thread's write to the field will eventually be seen, but the order of execution can still be changed. So updates from another thread will become visible, but if you need to guarantee ordering of writes across multiple fields, this mode is insufficient.
- Release/Acquire ensures ordering is maintained between accesses and updates of each field in this ordering mode, which is sufficient for most concurrency. If a field is updated, the next read will see that value, but accesses to two such fields can still be reordered relative to each other.
- Volatile ensures sequential consistency. All reads and writes are ordered.
- Memory ordering Volatile vs Release/Acquire example: Thread1: X.setM(this, 1); int ry = Y.getM(this); Thread2: Y.setM(this, 1); int rx = X.getM(this). If M is Volatile, at least one of rx and ry must be 1 (the statements in each thread cannot be reordered). If M is Release/Acquire and no reordering happens, at least one of rx and ry would be 1, but the compiler is allowed to reorder the statements, in which case both could be 0 (this example is sketched in code after these tips).
- Memory ordering Release/Acquire vs Opaque example: Thread1: dinner = 17; READY.setM(this, 1); Thread2: if (READY.getM(this) == 1) { int d = dinner; }. With M = Release/Acquire, d is guaranteed to be 17 (or you can't enter the clause); with M = Opaque, d can be 0 (assuming dinner was 0 before Thread1 executes).
- Memory ordering Opaque vs Plain example: Thread1: 'while(X.getM(this)) ...'; Thread2: 'X.setM(this, false)'. If M = Opaque, Thread1 is guaranteed to terminate at some point; if M = Plain, Thread1 may never terminate.
- Fences: LoadLoad prevents any load from being reordered across the fence in either direction; StoreStore does the same for stores; Release prevents loads and stores before the fence moving to after it, and stores (only) after the fence moving to before it; Acquire prevents loads (only) before the fence moving to after it, and loads and stores after the fence moving to before it; Full prevents any load or store from being reordered across the fence.
- 'x.a=1; x.b=2; VarHandle.releaseFence()' is equivalent to 'x.a=1; x.b=2; X.setRelease(..., x)', but with 'x.a=1; y.a=2; VarHandle.releaseFence()' you can't make the x and y changes visible to other threads in one operation using setRelease. In this case you could also downgrade VarHandle.releaseFence() to VarHandle.storeStoreFence(), since there are no dependent reads in the example (sketched in code after these tips).
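A minimal sketch of the recommended VarHandle declaration pattern from the tips above (Example and "field" are the names used in the tips; the accessor methods are assumptions for illustration):

    import java.lang.invoke.MethodHandles;
    import java.lang.invoke.VarHandle;

    class Example {
        // volatile, so an accidental weak (plain) update still leaves the program correct
        private volatile int field;

        // VarHandles are safely shareable: private static final is best practice
        private static final VarHandle FIELD;
        // Array elements can be accessed the same way
        private static final VarHandle INT_ARRAY =
                MethodHandles.arrayElementVarHandle(int[].class);
        static {
            try {
                FIELD = MethodHandles.lookup()
                        .findVarHandle(Example.class, "field", int.class);
            } catch (ReflectiveOperationException e) {
                throw new ExceptionInInitializerError(e);
            }
        }

        void publish(int v) {
            FIELD.setRelease(this, v);           // release write
        }

        int read() {
            return (int) FIELD.getAcquire(this); // acquire read, pairs with setRelease
        }
    }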
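And a sketch of the Volatile vs Release/Acquire example above (a Dekker-style test; the class name and fields x and y are hypothetical):

    import java.lang.invoke.MethodHandles;
    import java.lang.invoke.VarHandle;

    class Dekker {
        volatile int x, y;
        int rx, ry;
        static final VarHandle X, Y;
        static {
            try {
                MethodHandles.Lookup l = MethodHandles.lookup();
                X = l.findVarHandle(Dekker.class, "x", int.class);
                Y = l.findVarHandle(Dekker.class, "y", int.class);
            } catch (ReflectiveOperationException e) {
                throw new ExceptionInInitializerError(e);
            }
        }

        void thread1() { X.setVolatile(this, 1); ry = (int) Y.getVolatile(this); }
        void thread2() { Y.setVolatile(this, 1); rx = (int) X.getVolatile(this); }
        // With setVolatile/getVolatile, at least one of rx and ry must end up 1.
        // With setRelease/getAcquire instead, the two statements in each thread
        // may be reordered, so rx == 0 && ry == 0 is a permitted outcome.
    }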
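Finally, a sketch of the fence variant in the last tip (Holder and publishBoth are made-up names):

    import java.lang.invoke.VarHandle;

    class Holder {
        int a;
    }

    class Publisher {
        // One fence publishes plain writes to two unrelated objects; setRelease
        // could only attach release semantics to a single variable's write.
        static void publishBoth(Holder x, Holder y) {
            x.a = 1;
            y.a = 2;
            // storeStoreFence suffices here because only store-store ordering is
            // needed (no dependent reads); releaseFence would also order earlier loads
            VarHandle.storeStoreFence();
        }
    }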
https://www.youtube.com/watch?v=hjpzLXoUu1Y
An Introduction to JVM Performance (Page last updated November 2017, Added 2018-01-29, Author Rafael Winterhalter, Publisher Devoxx). Tips:
- The standard method call is a virtual method call - the runtime needs to find which class in the hierarchy implements the method so that it calls the correct one. This indirect call overhead makes things slower than ideal, yet most methods (90%) are never overridden, so in most cases the overhead brings no benefit. The JVM optimizes it away by analysing the hierarchy and replacing the call with a direct jump in the JIT compiled code if there is only one implementation of the method (a monomorphic call site). Bimorphic call sites (2 implementations of a method in the hierarchy) are also optimized. Above 2 there is no effective optimization (apart from ones that can apply to all call sites), although a special case applies when the JVM identifies a polymorphic call site where most of the calls go to one implementation: this is treated as monomorphic with a check, jumping back to the default polymorphic call path when the check fails. So if you are writing for optimal virtual calling, have a single implementation of a method, or at most 2, or failing that aim for one highly dominant implementation amongst those available. (The assumptions are always checked, and when they become untrue the calls are re-optimized, so this doesn't affect functionality, only performance.)
- An ugly but effective call site optimization is to change a polymorphic call over multiple subclasses into a switch (or chain of instanceof checks) that dispatches to monomorphic call sites - see the sketch after these tips.
- Use ArrayList as your preferred List - avoid LinkedList.
- Double-brace initialization, eg new ArrayList<String>(){{add("foo");}}, generates a new subclass, and this adds subclass implementations to what could otherwise be a monomorphic or bimorphic call site. Removing these can improve speed by avoiding polymorphic call sites.
- Inlining not only improves performance directly by avoiding indirection and improving code proximity (so avoiding extra code cache paging), it also enables further optimizations because it provides larger code chunks that the JIT compiler can optimize together. Be careful about the inlineability of code, including already optimized inlined code (eg loop unrolled code).
- Write cleanly structured programs and avoid excessive layers of indirection and virtual dispatch
- Adding extra steps can make some code faster by increasing predictability. The example given operates on a set of numbers, choosing the operation by whether each number is above or below a cutoff: if the numbers are sorted first, the code runs faster because the branch becomes predictable (benchmarked in the JMH sketch after these tips).
- Minimize the scope of objects; if the scope is small enough the JVM can apply escape analysis and avoid the object creation.
- Use System.nanoTime() to measure elapsed time rather than System.currentTimeMillis().
- Use JMH for microbenchmarks.
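A hypothetical sketch of the switch-style devirtualization tip above (Shape, Circle and Square are made-up types):

    interface Shape { double area(); }

    final class Circle implements Shape {
        final double r;
        Circle(double r) { this.r = r; }
        public double area() { return Math.PI * r * r; }
    }

    final class Square implements Shape {
        final double side;
        Square(double side) { this.side = side; }
        public double area() { return side * side; }
    }

    class Areas {
        // With 3+ Shape implementations loaded, s.area() is a megamorphic call
        // site. Explicit dispatch turns each branch into a monomorphic call
        // that the JIT can inline:
        static double area(Shape s) {
            if (s instanceof Circle) return ((Circle) s).area();
            if (s instanceof Square) return ((Square) s).area();
            return s.area(); // rare remaining types take the virtual path
        }
    }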
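And a minimal JMH sketch combining the sorting and measurement tips (class and method names are placeholders; requires the JMH dependency):

    import java.util.Arrays;
    import java.util.Random;
    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.BenchmarkMode;
    import org.openjdk.jmh.annotations.Mode;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.State;

    @State(Scope.Thread)
    public class BranchBenchmark {
        int[] unsorted = new Random(42).ints(10_000, 0, 256).toArray();
        int[] sorted = sortedCopy(unsorted);

        static int[] sortedCopy(int[] a) {
            int[] copy = a.clone();
            Arrays.sort(copy);
            return copy;
        }

        // Same work on both arrays; the sorted one is typically faster
        // because the cutoff branch becomes predictable
        @Benchmark @BenchmarkMode(Mode.AverageTime)
        public long sumUnsorted() { return sumAbove(unsorted, 128); }

        @Benchmark @BenchmarkMode(Mode.AverageTime)
        public long sumSorted() { return sumAbove(sorted, 128); }

        static long sumAbove(int[] a, int cutoff) {
            long sum = 0;
            for (int v : a) if (v >= cutoff) sum += v;
            return sum;
        }
    }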
https://dzone.com/articles/memory-leaks-fallacies-and-misconceptions
Memory Leaks: Fallacies and Misconceptions (Page last updated December 2017, Added 2018-01-29, Author Vladimir Sor, Publisher DZone). Tips:
- OOME is not always caused by a leak. Sometimes the heap is just not big enough. This is particularly true if the OOME happens during the initialization phase of the application.
- Sometimes a leak is functionality that is unbounded (by design or lack of thought); this is typically easy to identify because it directly causes the OOME, and the OOME stack trace shows where it happened (see the sketch after these tips).
- If you are not getting an OOME and not getting major or full or old generation garbage collections, then you don't have a leak.
- If the heap used after successive major/full/old generation garbage collections is increasing, that's indicative of a memory leak; if the heap is not increasing in that way, there is no memory leak.
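A classic unbounded-collection leak of the kind described above (a hypothetical example; SessionRegistry is a made-up name):

    import java.util.HashMap;
    import java.util.Map;

    class SessionRegistry {
        // Entries are added but never removed, so live heap after each full
        // GC keeps rising until an OOME - an unbounded-by-lack-of-thought leak
        private static final Map<String, byte[]> SESSIONS = new HashMap<>();

        static void register(String sessionId) {
            SESSIONS.put(sessionId, new byte[64 * 1024]); // never evicted
        }
    }

Bounding the collection (eg an LRU cache) or evicting entries when sessions expire fixes this kind of leak.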
Jack Shirazi