Java Performance Tuning
Tips November 2011
http://www.oraclejavamagazine-digital.com/javamagazine/20111112#pg42
Stress Testing Java EE 6 Applications (Page last updated November 2011, Added 2011-11-28, Author Adam Bien, Publisher Oracle). Tips:
- It is impossible to predict all nontrivial bottlenecks, deadlocks, and potential memory leaks from design or code review; it's also impossible to find memory leaks with unit and integration tests.
- Bottlenecks are caused by locks and I/O problems that are hard to identify in a single-threaded scenario. You need to run stress tests.
- Application servers usually cannot handle load with default factory settings, which are optimised for development - you need to change the settings for production.
- Load tests should take into account the expected number of concurrent users and realistic user behavior (including "think times" - pauses in user activity between actions).
- RESTful Services are easily stress-testable by executing several HTTP requests concurrently.
- The open source load testing tool Apache JMeter comes with built-in HTTP support.
- Non-production Java processes can be easily monitored with VisualVM (VisualVM ships with the JDK, though it is worth checking the VisualVM website for updates).
- Your load generator should run on a dedicated machine or at least in an isolated (virtual) environment, otherwise the resource usage of the load generator will confuse other measurements.
- Stress tests do not generate a realistic load; they try to break the system.
- A steadily increasing number of loaded classes might indicate problems with class loading and can lead to an OutOfMemoryError due to a shortage of PermGen space.
- A steadily increasing number of threads indicates slow asynchronous methods.
- An unbounded ThreadPool can lead to an OutOfMemoryError.
- A steady increase in memory consumption can eventually lead to an OutOfMemoryError (possibly caused by memory leaks).
- VisualVM comes with a sampling profiling tool called Sampler - sampling overhead is about 20 percent.
- Using System.out.println can lead to significant performance degradation.
- Generate reference measurements, and perform automated stress tests as often as possible (e.g. nightly) and compare the results - this makes identification of potential bottlenecks easy.
- Misconfigured application servers are a common cause of bottlenecks.
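The load-test tips above (concurrent users, think times, reference measurements) can be illustrated with a minimal sketch. This is not the article's code: the class name MiniLoadTest, its run method, and the fake request are all hypothetical, and a real test would replace the Runnable with an HTTP call against the service under test (or simply use JMeter).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

/** Hypothetical mini load generator: simulated users that pause ("think") between requests. */
final class MiniLoadTest {

    /**
     * Runs `users` concurrent simulated users, each issuing `requestsPerUser` requests and
     * sleeping a random time of up to `maxThinkMillis` after each one.
     * Returns one latency value (in milliseconds) per completed request.
     */
    static List<Long> run(int users, int requestsPerUser, long maxThinkMillis, Runnable request) {
        ConcurrentLinkedQueue<Long> latencies = new ConcurrentLinkedQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(users); // bounded: one thread per user
        for (int u = 0; u < users; u++) {
            pool.execute(() -> {
                for (int i = 0; i < requestsPerUser; i++) {
                    long start = System.nanoTime();
                    request.run();
                    latencies.add(TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start));
                    try {
                        // Think time: random pause between 0 and maxThinkMillis inclusive.
                        Thread.sleep(ThreadLocalRandom.current().nextLong(maxThinkMillis + 1));
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt(); // keep the interrupt flag and stop this user
                        return;
                    }
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(latencies);
    }

    public static void main(String[] args) {
        // Stand-in for a real request; assumed to take about 5 ms.
        Runnable fakeRequest = () -> {
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        List<Long> latencies = run(10, 20, 50, fakeRequest);
        long max = latencies.stream().mapToLong(Long::longValue).max().orElse(0);
        System.out.println("requests=" + latencies.size() + " maxLatencyMs=" + max);
    }
}
```

Storing the returned latencies from each nightly run gives the reference measurements that later runs can be compared against.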
http://www.informit.com/articles/article.aspx?p=1758378
Java Application Profiling Tips and Tricks (Page last updated November 2011, Added 2011-11-28, Author Charlie Hunt and Binu John, Publisher informIT/Prentice Hall). Tips:
- Most Java performance improvements come from: Using a more efficient algorithm; Reducing lock contention; Generating more efficient code for a given algorithm.
- Many performance issues come from using inappropriate data structures. Use the best data structures and algorithms for optimal performance.
- CPU clock cycles spent executing operating system or kernel code are clock cycles that cannot be used to execute your application code. Consequently you should monitor the system OS CPU utilization and target reducing the amount of time it spends consuming system or kernel CPU clock cycles.
- To tune I/O, reduce the frequency of I/O calls. A typical way to do this is to buffer data so that larger chunks of data are read or written during each I/O operation.
- NIO nonblocking data structures allow reads and writes of as much data as possible in a single operation.
- Project Grizzly (https://grizzly.dev.java.net) and Apache MINA (http://mina.apache.org) are Java NIO frameworks which can help reduce the complexity of what you need to write when using NIO.
- Only severe lock contention shows high system CPU utilization as JVMs now only use operating system locking primitives as a last resort, preferring user-level locking where possible.
- High values of voluntary thread context switches are an indicator of high lock contention.
- Use ThreadLocal objects rather than globally accessible objects where the internal state of the object is not really needed globally, but has to be synchronized if used globally (e.g. thread local random number generation is probably more efficient than global random number generation).
- Atomic and concurrent data structures rely on CAS operations, which use a form of optimistic synchronization. High contention around atomic variables can lead to poor performance or scalability even though a concurrent or lock-free data structure is being used.
- Design your application to minimize the scope, size, and amount of data that needs to be synchronized.
- Where data needs synchronized access and updates, consider partitioning the data such that the partitioning allows higher throughput by reducing the subsets of data that need locking for accesses and updates.
- A thread that reads a volatile field is guaranteed to read the value that was last written to that volatile field, regardless of which thread and core is doing the read or write. A volatile field's value will be kept in sync across all application threads and CPU caches. This can be a performance problem; where you would otherwise have all operations occurring within a core and its caches, the volatile field access requires a memory barrier to be passed so that CPU caches are updated in case another CPU updated the field.
- A high number of CPU cache misses on a volatile field where the code frequently writes to that volatile field may be a performance issue from that volatile - in this case try to either avoid the volatile or reduce the frequency of updates to it.
- Avoid having your collections dynamically resize themselves if possible, by correctly sizing them first.
- If operations in a loop iteration are independent of operations in other loop iterations (for the same loop), it is likely that loop can be parallelized.
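The partitioning tip above can be sketched as follows. This is an illustrative example only, not code from the article: StripedCounters and its methods are hypothetical names, the stripe count and initial map capacity are arbitrary assumptions, and in practice the JDK's ConcurrentHashMap or LongAdder would often be the ready-made choice. Each key maps to one stripe with its own lock, so threads updating keys in different stripes do not contend. The loop in main has independent iterations, so it is run as a parallel stream.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.stream.IntStream;

/** Hypothetical striped counter: data is partitioned so each lock guards only a subset of keys. */
final class StripedCounters {

    /** One partition of the data, guarded by its own monitor. */
    private static final class Stripe {
        // Presized (assumed capacity) to avoid early dynamic resizing.
        private final Map<String, Long> counts = new HashMap<>(64);

        synchronized void increment(String key) {
            counts.merge(key, 1L, Long::sum);
        }

        synchronized long get(String key) {
            return counts.getOrDefault(key, 0L);
        }
    }

    private final Stripe[] stripes;

    StripedCounters(int stripeCount) {
        stripes = new Stripe[stripeCount];
        for (int i = 0; i < stripeCount; i++) {
            stripes[i] = new Stripe();
        }
    }

    /** floorMod keeps the index non-negative even for negative hash codes. */
    private Stripe stripeFor(String key) {
        return stripes[Math.floorMod(key.hashCode(), stripes.length)];
    }

    void increment(String key) {
        stripeFor(key).increment(key);
    }

    long get(String key) {
        return stripeFor(key).get(key);
    }

    public static void main(String[] args) {
        StripedCounters counters = new StripedCounters(16);
        String[] keys = {"home", "search", "cart", "checkout"};
        // Iterations are independent of each other, so the loop can be parallelized.
        IntStream.range(0, 100_000).parallel()
                 .forEach(i -> counters.increment(keys[i % keys.length]));
        for (String key : keys) {
            System.out.println(key + "=" + counters.get(key));
        }
    }
}
```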
http://www.jfokus.se/jfokus11/preso/jf11_TheSecretsOfConcurrency.pdf
The Secrets Of Concurrency (Page last updated February 2011, Added 2011-11-28, Author Heinz Kabutz, Publisher The Java Specialists Newsletter/jfokus). Tips:
- Concurrency bugs can be hidden until the "right" performance conditions occur - often when production load is particularly heavy.
- Do not catch exceptions without actually handling the exceptional case correctly - or rethrowing the exception.
- InterruptedException should be rethrown or handled like this:
while (running) { work(...); try { Thread.sleep(...); } catch (InterruptedException e) { Thread.currentThread().interrupt(); break; } }
- Passing out immutable objects makes the code thread-safe and hugely limits the possible concurrency bugs.
- Having too many threads can be bad for performance and make it difficult to understand what is happening. Target not more than around 4 threads per available core. Use thread pools.
- If you avoid synchronization, you often cannot tell when threads will update shared data.
- Debugging concurrency bugs affects the sequence of operations and can stop the bug occurring during debugging - thus making it effectively undebuggable!
- volatile, final, and synchronized can each ensure that shared data changes are visible across all threads as soon as they are done.
- Code can be reordered by the compiler or system, unless you have used correct synchronization.
- Non-atomic operations (e.g. more than one statement; and operators like +=) can be interleaved when performed simultaneously by multiple threads, easily leading to corrupt data.
- synchronized locks and unlocks automatically (from the code perspective) but doesn't timeout and can't be interrupted. java.util.concurrent locks need to be explicitly locked and unlocked, and can be timed out and interrupted.
- java.util.concurrent locks should be unlocked in a finally block to ensure they are unlocked regardless, e.g.
lock.lock(); try { doAsLittleAsPossible(); } finally { lock.unlock(); }
- ReentrantReadWriteLock lets you get multiple read locks, but only one write lock, which is optimal for performance in most types of code.
- Thread contention is difficult to spot - typically it is found by elimination: the bottleneck is not CPU, I/O, or garbage collection, which leaves contention.
- Do not use a string as a lock object, as this could be interned and shared across the whole JVM.
- Shared longs and doubles can be corrupted if they are updated outside of synchronized blocks and are not declared volatile.
- Too much synchronization causes contention but not enough can lead to corrupt data.
- Changing the hardware or any resources can expose previously hidden concurrency bugs and make a previously stable system unstable.
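The difference between synchronized and java.util.concurrent locks noted above (timeouts and interruptibility) can be sketched with ReentrantLock.tryLock. The class TimedLockExample and its methods are hypothetical names chosen for illustration; only ReentrantLock and its tryLock, lock, and unlock calls are JDK API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical example: a guarded counter whose updates give up instead of blocking forever. */
final class TimedLockExample {
    private final ReentrantLock lock = new ReentrantLock();
    private long balance;

    /** Adds `amount`, or returns false if the lock is not acquired in time or the thread is interrupted. */
    boolean tryDeposit(long amount, long timeoutMillis) {
        try {
            if (!lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return false; // timed out: something synchronized cannot do
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            return false;
        }
        try {
            balance += amount; // do as little as possible while holding the lock
            return true;
        } finally {
            lock.unlock(); // always released, even if the body throws
        }
    }

    /** Runs `action` while holding the lock (used here only to simulate a slow lock holder). */
    void withLock(Runnable action) {
        lock.lock();
        try {
            action.run();
        } finally {
            lock.unlock();
        }
    }

    long balance() {
        lock.lock();
        try {
            return balance;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        TimedLockExample account = new TimedLockExample();
        // While this thread holds the lock, another thread's timed attempt gives up.
        account.withLock(() -> {
            boolean ok = CompletableFuture.supplyAsync(() -> account.tryDeposit(10, 50)).join();
            System.out.println("deposit while lock is held succeeded: " + ok);
        });
        System.out.println("deposit when lock is free succeeded: " + account.tryDeposit(10, 50));
        System.out.println("balance=" + account.balance());
    }
}
```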
http://highscalability.com/blog/2011/5/11/troubleshooting-response-time-problems-why-you-cannot-trust.html
Troubleshooting response time problems - why you cannot trust your system metrics (Page last updated May 2011, Added 2011-11-28, Author Michael Kopp, Publisher highscalability). Tips:
- 99% CPU utilization can either be optimal or indicate impending disaster. It really only indicates that there is no spare capacity.
- System load tells you how many threads or processes are currently executing or waiting to execute. Loads showing more runnable processes than cores should be accompanied by full CPU utilization unless processes are waiting on other resources (I/O, etc).
- Unix operating systems normally show close to 100% memory utilization as they fill the memory up with buffers and caches which get discarded if that memory is needed. In order to get the "real" memory usage, subtract these (in Linux use 'free').
- If a portion of a Java process gets swapped out of system memory, this can impact performance dramatically as Java memory is random access so the swapped out portion can easily be required at any time, and swapping back in is time-consuming.
- Use response time of requests to identify if there is a performance problem, then use system and JVM metrics to identify where the problem is.
- Monitor the response time between layers of a system so you can isolate where the system is taking up time.
- Monitoring garbage collection is important, but estimate the likely benefit of tuning before spending time on it - if the pauses only amount to 5% of the response time, it may not be the top priority.
- Many database calls per request is a common performance anti-pattern.
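Measuring response time per layer, as suggested above, can be sketched with a small timing wrapper. LayerTimer and the layer names "service" and "db" are hypothetical; a production system would more likely rely on an APM tool or a metrics library. The call count per layer also makes the "many database calls per request" anti-pattern visible.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

/** Hypothetical per-layer timer: accumulates elapsed time and call counts by layer name. */
final class LayerTimer {
    private final Map<String, LongAdder> totalNanos = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> calls = new ConcurrentHashMap<>();

    /** Runs `work`, recording its elapsed time under `layer` even if it throws. */
    <T> T time(String layer, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            totalNanos.computeIfAbsent(layer, k -> new LongAdder()).add(System.nanoTime() - start);
            calls.computeIfAbsent(layer, k -> new LongAdder()).increment();
        }
    }

    long totalNanos(String layer) {
        LongAdder adder = totalNanos.get(layer);
        return adder == null ? 0 : adder.sum();
    }

    long callCount(String layer) {
        LongAdder adder = calls.get(layer);
        return adder == null ? 0 : adder.sum();
    }

    public static void main(String[] args) {
        LayerTimer timer = new LayerTimer();
        // One simulated request: the service layer makes several (fake) database calls.
        int result = timer.time("service", () -> {
            int sum = 0;
            for (int i = 0; i < 5; i++) {
                final int row = i;
                sum += timer.time("db", () -> row * 2); // stand-in for a query
            }
            return sum;
        });
        System.out.println("result=" + result);
        System.out.println("service calls=" + timer.callCount("service")
                + " nanos=" + timer.totalNanos("service"));
        System.out.println("db calls=" + timer.callCount("db")
                + " nanos=" + timer.totalNanos("db"));
    }
}
```

Because the inner layer is timed inside the outer one, subtracting the "db" total from the "service" total gives the time spent in the service layer itself.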
http://www.parleys.com/d/2657
The Ghost in the Virtual Machine: A Reference to References (Page last updated October 2011, Added 2011-11-28, Author Bob Lee, Publisher Oracle/Square/parleys). Tips:
- An object is reachable if any thread can get to it. System roots include: classes with static fields; thread stacks; current exceptions; JNI globals; the finalizer queue; the interned string pool.
- Things the garbage collector cannot free up include: listeners, file descriptors, native memory, state held outside of the JVM.
- Finalizers (defined in finalize()) can be run concurrently, and their ordering is not defined.
- Objects with finalize() methods defined (other than the finalize() defined in the Object class) will live longer than otherwise, as they cannot be reclaimed until the finalize() method is first called by the JVM.
- Soft references are for quick and dirty caching; weak references are for fast cleanup; phantom references are for safe cleanup.
- -XX:SoftRefLRUPolicyMSPerMB specifies how long to retain soft refs in ms per free MB of heap; the default is 1000ms.
- Weak references are cleared as soon as possible after no strong or soft refs remain to the referent.
- Phantom references are enqueued after no other references remain, post-finalizer; they must be cleared manually.
- Guava provides automatic generation of finalizer reference management.
- WeakHashMap releases the key and value when the key becomes garbage collectable.
- Guava MapMaker generates useful maps, including maps with weak or soft keys and values, on-demand creation of values, and size-limited maps (which automatically discard entries once the limit is exceeded).
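The "quick and dirty caching" use of soft references mentioned above can be sketched with plain JDK classes. SoftCache is a hypothetical name and this is a simplified illustration, not the presenter's code or Guava's implementation: values are held through SoftReference, so the garbage collector may clear them under memory pressure, after which they are recomputed on the next access.

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical soft-value cache: entries may be reclaimed by the GC when memory is tight. */
final class SoftCache<K, V> {
    private final ConcurrentHashMap<K, SoftReference<V>> map = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    SoftCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        SoftReference<V> ref = map.get(key);
        // ref.get() returns null once the collector has cleared the referent.
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            // Not cached or already cleared: recompute. Concurrent callers may both
            // compute the value, which is assumed acceptable for this simple cache.
            value = loader.apply(key);
            map.put(key, new SoftReference<>(value));
        }
        return value;
    }

    public static void main(String[] args) {
        SoftCache<String, byte[]> cache = new SoftCache<>(name -> {
            System.out.println("loading " + name);
            return new byte[1024]; // stand-in for an expensive-to-build value
        });
        byte[] first = cache.get("report");
        byte[] second = cache.get("report"); // served from the cache while `first` is strongly held
        System.out.println("same instance: " + (first == second));
    }
}
```

One limitation of this sketch: keys whose values have been cleared stay in the map until they are requested again. A fuller version would register the references with a ReferenceQueue and purge stale entries.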
http://niklasschlimm.blogspot.com/2011/09/your-web-applications-work-by-sheer.html
Your web applications work - by sheer coincidence! (Page last updated September 2011, Added 2011-11-28, Author Niklas Schlimm, Publisher niklasschlimm). Tips:
- Pool your threads in categories to prevent CPU overload.
- Cache the results of each CPU-intensive task so that it only executes once.
- What if your (few hundred) inactive threads suddenly become active?
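The first two tips above can be combined in a short sketch: a dedicated, bounded pool for the CPU-heavy task category, plus a cache of futures so each task runs at most once per key. CpuTaskCache is a hypothetical name, and sizing the pool to the number of available processors is an assumption rather than a recommendation from the article.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

/** Hypothetical cache of CPU-intensive results, computed on a dedicated bounded pool. */
final class CpuTaskCache<K, V> {
    // Separate pool for the CPU-heavy category, so these tasks cannot occupy more
    // threads than there are cores (assumed sizing).
    private final ExecutorService cpuPool =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
    private final ConcurrentHashMap<K, CompletableFuture<V>> results = new ConcurrentHashMap<>();
    private final Function<K, V> expensive;

    CpuTaskCache(Function<K, V> expensive) {
        this.expensive = expensive;
    }

    /**
     * Returns the (possibly still running) computation for `key`.
     * computeIfAbsent ensures the task is submitted at most once per key.
     */
    CompletableFuture<V> compute(K key) {
        return results.computeIfAbsent(key,
                k -> CompletableFuture.supplyAsync(() -> expensive.apply(k), cpuPool));
    }

    void shutdown() {
        cpuPool.shutdown();
    }

    public static void main(String[] args) {
        CpuTaskCache<Integer, Long> cache = new CpuTaskCache<>(n -> {
            long sum = 0; // stand-in for a CPU-intensive calculation
            for (int i = 1; i <= n; i++) {
                sum += (long) i * i;
            }
            return sum;
        });
        System.out.println(cache.compute(1_000).join());
        System.out.println(cache.compute(1_000).join()); // reuses the cached future
        cache.shutdown();
    }
}
```

A failed computation stays cached as a failed future in this sketch; removing such entries so they can be retried is left out for brevity.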
Jack Shirazi
Last Updated: 2021-03-29
Copyright © 2000-2021 Fasterj.com. All Rights Reserved.
All trademarks and registered trademarks appearing on JavaPerformanceTuning.com are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. JavaPerformanceTuning.com is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.
URL: http://www.JavaPerformanceTuning.com/news/newtips132.shtml
RSS Feed: http://www.JavaPerformanceTuning.com/newsletters.rss