Java Performance Tuning
Tips February 2009
Back to newsletter 099 contents
Measuring Web application response time (Page last updated November 2008, Added 2009-02-25, Author Srijeeb Roy, Publisher JavaWorld). Tips:
- Client-side execution time is just as important as server-side execution time, and if you're measuring from the end-user perspective, you should also be looking at network time.
- Capturing the server-side execution time of a Web request is easy: use a Filter to store the current time at the start of the request; when the call down the filter chain returns to the Filter, you can calculate the elapsed time.
- It is important to capture response time from an end-user perspective.
- The time sequence for a browser request is: it is initiated from the browser at t0; it reaches the server and hits the Filter at t1; it is handled (by a servlet, JSP, POJO or EJB), returns to the Filter and leaves it at t2; the response reaches the browser at t3; and any web page methods (client-side scripts) are executed and complete at t4. From the user's perspective, the time taken is t4-t0, of which only t2-t1 is server-side execution time; (t1-t0)+(t3-t2) is mostly network time, and t4-t3 is client-side execution time.
- To capture the time a request was initiated from the browser, you must intercept every way a request can be initiated: clicking a Submit button, clicking a link, and calls to form.submit, window.open, window.showModalDialog and location.replace. Requests that replace the current page can be intercepted in the window.onbeforeunload event. You can store the time in a cookie.
- A few initiation points cannot be captured using onbeforeunload, e.g. window.open and window.showModalDialog, because they do not unload the current page. In these cases you need to wrap these methods so that they record the time and then call the original method.
- To capture the time at which a client method ends, use the onload event.
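The server-side part of this (t1 to t2) can be sketched in a few lines. The tip refers to a servlet Filter; because the servlet API is not part of the JDK, the sketch below substitutes the JDK's built-in com.sun.net.httpserver.Filter, whose doFilter(exchange, chain) has the same shape as a servlet Filter's doFilter(request, response, chain). The class names TimingFilter and TimingFilterDemo, the /hello context and the queue of samples are hypothetical, chosen only for illustration.

```java
import com.sun.net.httpserver.Filter;
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Measures server-side time per request: from entering the filter (t1) to leaving it (t2). */
final class TimingFilter extends Filter {
    // Elapsed nanoseconds per request; a real filter would log or export these instead.
    final BlockingQueue<Long> samples = new LinkedBlockingQueue<>();

    @Override
    public void doFilter(HttpExchange exchange, Chain chain) throws IOException {
        long start = System.nanoTime();             // t1: request reaches the filter
        try {
            chain.doFilter(exchange);               // the handler (or next filter) runs
        } finally {
            samples.add(System.nanoTime() - start); // t2: request leaves the filter
        }
    }

    @Override
    public String description() {
        return "records server-side elapsed time";
    }
}

final class TimingFilterDemo {
    /** Starts a throwaway server on an ephemeral port, sends one GET and returns the filter's timing. */
    static long measureOneRequest() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            TimingFilter timing = new TimingFilter();
            server.createContext("/hello", exchange -> {
                byte[] body = "hello".getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            }).getFilters().add(timing);
            server.start();
            try {
                URL url = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/hello").toURL();
                HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                try (InputStream in = conn.getInputStream()) {
                    in.readAllBytes();
                }
                // The response can reach the client before the filter's finally block runs,
                // so wait for the recorded sample rather than reading it immediately.
                Long nanos = timing.samples.poll(5, TimeUnit.SECONDS);
                if (nanos == null) {
                    throw new IllegalStateException("no timing recorded");
                }
                return nanos;
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

In a servlet container the body of doFilter would follow the same pattern: take System.nanoTime() before chain.doFilter(request, response) and compute the difference in a finally block, so that failed requests are timed too. This covers only t2-t1; the browser-side timestamps (t0, t3, t4) still have to come from the page itself.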
Data-Race-Ful Lazy Initialization for Performance (Page last updated December 2008, Added 2009-02-25, Author Jeremy Manson, Publisher jeremymanson). Tips:
- A race condition is when you don't know the order in which two actions are going to occur.
- A data race is when you have one or more writes, and potentially some reads; they are all to the same memory location; they can happen at the same time; and there is nothing in the program to prevent it.
- A data race can be harmless when the variable is anything other than a long or double (writes to non-volatile long and double fields may be split into two 32-bit halves), and it doesn't matter how many times the variable is updated, e.g. because every thread computes and writes the same value.
- Avoid data races wherever possible. The Java memory model allows extensive reordering of operations: a second read of a variable can actually be performed by your processor before the first, so in a data race the second read may see an older value than the first. This is particularly important for racy lazy initialization: read the shared field once into a local variable and use only the local.
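The racy lazy-initialization idiom these tips describe can be sketched as follows, assuming a class that caches an expensive, deterministic hash (the same pattern java.lang.String uses for hashCode). CachedHashKey and computeHash are hypothetical names. The essential detail is that the shared field is read exactly once into a local; the version shown in the comment reads it twice, and the second read may legally return 0 even after the first returned the computed value.

```java
/** Caches a deterministic hash lazily, tolerating a benign data race (the String.hashCode pattern). */
final class CachedHashKey {
    private final String text; // final field, so it is safely visible to every thread
    private int hash;          // 0 means "not computed yet"; deliberately not volatile

    CachedHashKey(String text) {
        this.text = text;
    }

    // BROKEN alternative: if (hash == 0) hash = computeHash(text); return hash;
    // The two reads of 'hash' may be reordered, so that version can return 0.
    @Override
    public int hashCode() {
        int h = hash;              // read the racy field exactly once
        if (h == 0) {
            h = computeHash(text); // several threads may do this; all compute the same value
            hash = h;              // an int write cannot be torn
        }
        return h;                  // return the local, never a second read of 'hash'
    }

    // Hypothetical stand-in for an expensive, side-effect-free computation.
    // If it ever returns 0 the value is simply recomputed on each call, which is still correct.
    private static int computeHash(String s) {
        int h = 17;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CachedHashKey && ((CachedHashKey) o).text.equals(text);
    }
}
```

The race is benign only because every thread that loses it writes the identical int; the same shape would be unsafe for a long or double field, or for a value that differs between threads.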
Utilizing a Multi-Core System with the Actor Model (Page last updated November 2008, Added 2009-02-25, Author James Leigh, Publisher devX). Tips:
- Contended synchronized blocks and other blocking operations are slow: they require the OS to put threads to sleep and use interrupts to wake them. This puts pressure on the scheduler, which can result in significant performance degradation.
- The actor model of concurrent computation takes full advantage of multi-core and multi-processor computing by using "actors": every method call (or message) to an actor is executed in that actor's own thread, avoiding the contended locking issues typically found in concurrent applications. This allows more efficient concurrent processing while keeping actor implementations simple, as there is no need to consider concurrent execution within each actor.
- In the actor model, each method call is placed in a queue waiting until the actor is available to process the call - messages are received at any time and are acted on when time permits. Calls are usually asynchronous and do not block, so the calling thread continues execution and avoids any need to rely on thread interrupts. Results are passed as callback objects.
- Calls to an actor carry overhead compared to sequential calls: they must be queued for a separate thread and cannot be optimized by compilers in the same way as sequential calls. Smaller, faster objects are better implemented as ordinary immutable or stateful objects than as actors. Actors do have an advantage: because each runs in a dedicated thread, it can avoid the "synchronized" and "volatile" keywords, so thread memory does not need to sync up with main memory as often.
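A minimal actor can be sketched with standard java.util.concurrent classes, assuming a single-thread executor serves as the actor's mailbox and dedicated thread, and a CompletableFuture serves as the callback object for the result. CounterActor and ActorDemo are hypothetical names, not an API from the article.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** A minimal actor: one dedicated thread drains a FIFO mailbox, so its state needs no locks. */
final class CounterActor implements AutoCloseable {
    // Single-thread executor = mailbox (queue) + the actor's dedicated thread.
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();
    private long count; // confined to the mailbox thread: no synchronized, no volatile

    /** Sends an "increment" message; returns immediately with a future for the reply. */
    CompletableFuture<Long> increment(long delta) {
        return CompletableFuture.supplyAsync(() -> count += delta, mailbox);
    }

    @Override
    public void close() {
        mailbox.shutdown(); // messages already queued are still processed
    }
}

final class ActorDemo {
    /** Four sender threads fire 250 messages each without blocking, then one query reads the total. */
    static long run() {
        try (CounterActor actor = new CounterActor()) {
            Thread[] senders = new Thread[4];
            for (int t = 0; t < senders.length; t++) {
                senders[t] = new Thread(() -> {
                    for (int i = 0; i < 250; i++) {
                        actor.increment(1); // fire-and-forget: the sender never waits on a lock
                    }
                });
                senders[t].start();
            }
            for (Thread s : senders) {
                s.join();
            }
            // The mailbox is FIFO, so this message is handled after all 1000 increments.
            return actor.increment(0).join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

A caller that wants the result without blocking can attach a callback instead, e.g. actor.increment(5).thenAccept(total -> System.out.println(total)). The overhead mentioned above is visible here too: every call allocates a task and a future and hops to another thread, which is why small, fast objects are better left as ordinary objects.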
Lock Options (Page last updated December 2008, Added 2009-02-25, Author Bartosz Milewski, Publisher Dr. Dobb's Journal). Tips:
- Races occur when two or more threads are accessing shared memory without proper synchronization.
- Deadlocks occur when synchronization is based on locking and multiple threads block each other's progress.
- Data races are the opposite of deadlocks. The former result from not enough synchronization, the latter from too much synchronization not properly ordered.
- If you can ensure that all threads take all locks in the same order, you'd know they will not deadlock.
- Order all locks, and ensure that once a lock is acquired, only locks further along in the sequence can be acquired by that thread (or throw an exception).
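One way to enforce the lock-ordering rule at run time is sketched below, assuming each lock is given a fixed integer level and each thread tracks the levels it currently holds in a ThreadLocal. OrderedLock and LockOrderDemo are hypothetical names; the sketch wraps a plain ReentrantLock and throws when a thread asks for a lock that is not further along in the sequence than every lock it already holds.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.ReentrantLock;

/** A lock with a fixed position ("level") in a global order; acquiring out of order throws. */
final class OrderedLock {
    // Levels held by the current thread. Pushes are strictly increasing, so the top is the highest.
    private static final ThreadLocal<Deque<Integer>> HELD = ThreadLocal.withInitial(ArrayDeque::new);

    private final int level;
    private final ReentrantLock lock = new ReentrantLock();

    OrderedLock(int level) {
        this.level = level;
    }

    void lock() {
        Deque<Integer> held = HELD.get();
        Integer highest = held.peek();
        // Only locks further along in the sequence may be taken (this also rejects re-entry).
        if (highest != null && level <= highest) {
            throw new IllegalStateException(
                "lock order violation: requested level " + level + " while holding level " + highest);
        }
        lock.lock();
        held.push(level);
    }

    void unlock() {
        HELD.get().removeFirstOccurrence(level); // remaining levels stay in increasing order
        lock.unlock();
    }
}

final class LockOrderDemo {
    /** Takes 'outer' then 'inner'; returns true if the ordering check rejected the nested acquisition. */
    static boolean rejected(OrderedLock outer, OrderedLock inner) {
        outer.lock();
        try {
            inner.lock();
            inner.unlock();
            return false;
        } catch (IllegalStateException e) {
            return true;
        } finally {
            outer.unlock();
        }
    }
}
```

Because the check fails fast on the first out-of-order request, an ordering bug shows up as an exception in testing rather than as an occasional deadlock in production.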
Run-time performance and availability monitoring for Java systems (Page last updated July 2008, Added 2009-02-25, Author Nicholas Whitehead, Publisher IBM). Tips:
- If you only have operating-system statistics such as CPU utilization or memory size, it's difficult to diagnose a garbage-collection or thread-synchronization problem.
- Synthetic transactions can play a key role in monitoring continuity to confirm a system's health.
- The standard metrics that you can access through a JVM's MXBeans are: ClassLoadingMXBean (the class loading system); CompilationMXBean (the compilation system); GarbageCollectorMXBean (garbage collection); MemoryMXBean (heap and non-heap memory); MemoryPoolMXBean (memory pools); RuntimeMXBean (start time, uptime, ...); ThreadMXBean (threads).
- Request multiple attributes in one call using getAttributes(ObjectName name, String[] attributes).
- Reduce the polling overhead of the JMX exposed memory pools by implementing the listening collector pattern using thresholds instead of a polling pattern (but at the cost of some granularity of data).
- Any more than 10 percent of any 15-minute period spent in garbage collection indicates a potential issue.
- The ThreadMXBean lets you collect the following metrics: system and user CPU time; number of waits and total wait time; number of blocks and total blocked time (the wait and block times are only reported when thread contention monitoring is enabled).
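Two of these tips can be sketched with the platform MXBeans alone: batching attribute reads with getAttributes, and checking the share of wall-clock time spent in garbage collection against the 10 percent guideline. JvmHealthProbe and its method names are hypothetical; the sketch reads the local platform MBeanServer, whereas a remote monitor would call the same getAttributes method on an MBeanServerConnection obtained from a JMX connector, which is where saving round trips matters most.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.management.Attribute;
import javax.management.AttributeList;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/** Hypothetical health probe built only on the platform MXBeans. */
final class JvmHealthProbe {
    static final double GC_BUDGET_PERCENT = 10.0; // guideline from the tips above

    private long lastGcMillis = totalGcMillis();
    private long lastWallNanos = System.nanoTime();

    /** Reads several RuntimeMXBean attributes (e.g. "Uptime", "StartTime") in one getAttributes call. */
    static Map<String, Object> runtimeAttributes(String... names) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName runtime = new ObjectName(ManagementFactory.RUNTIME_MXBEAN_NAME);
            AttributeList values = server.getAttributes(runtime, names);
            Map<String, Object> result = new LinkedHashMap<>();
            for (Attribute a : values.asList()) {
                result.put(a.getName(), a.getValue());
            }
            return result; // attributes that could not be read are simply absent
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Accumulated collection time over all collectors, in milliseconds. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Percentage of wall-clock time spent in GC since the previous sample (or construction). */
    double sampleGcPercent() {
        long gcNow = totalGcMillis();
        long wallNow = System.nanoTime();
        double wallMillis = (wallNow - lastWallNanos) / 1_000_000.0;
        double percent = wallMillis > 0 ? 100.0 * (gcNow - lastGcMillis) / wallMillis : 0.0;
        lastGcMillis = gcNow;
        lastWallNanos = wallNow;
        return percent;
    }

    static boolean exceedsGcBudget(double gcPercent) {
        return gcPercent > GC_BUDGET_PERCENT;
    }
}
```

To apply the 15-minute guideline, a monitor could call sampleGcPercent() every 15 minutes (for example from a ScheduledExecutorService) and raise an alert whenever exceedsGcBudget returns true.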
Four Paths to Java Parallelism (Page last updated December 2008, Added 2009-02-25, Author Matt Walker, Kevin Irwin, Publisher JDJ). Tips:
- If your problem fits into the heap of a single JVM on an SMP machine and can be decomposed using divide-and-conquer, then it could be appropriate for the fork/join framework.
- The fork/join framework is appropriate for algorithms that can solve problems by splitting them into smaller subproblems, solving them, then merging the results to form the solution to the original problem - the subproblems can be computed independently of one another, allowing the computer to work on them simultaneously.
- ParallelArray is built on top of the fork/join framework to handle parallelised sort, search, and summarization on in-memory arrays.
- Dataflow programming operates on data as it streams through a graph of operations. Because the data is streaming, only the data required by the currently active operations needs to be in memory at any given time, allowing very large data sets to be processed.
- Pervasive DataRush is a library and dataflow engine, allowing you to construct and execute dataflow graphs in Java.
- Terracotta is an open source solution that allows multiple JVMs, potentially on different machines, to cluster and behave as a single JVM.
- Hadoop is an open source implementation of MapReduce: Map takes a key and a value and produces a list of (key, value) pairs, potentially of a different type (Map :: (K,V) -> [(K', V')]). Reduce then takes a list of values all corresponding to the same key and produces a final list of values (Reduce :: (K', [V']) -> [V']). Behind the scenes, the framework spreads your data over multiple machines and orchestrates the distributed computation using the map and reduce you provide.
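The fork/join pattern from the first two tips can be sketched with the framework as it later shipped in java.util.concurrent (Java 7; ForkJoinPool.commonPool() needs Java 8). ParallelArray was not included in the JDK, so it is not used here. The sketch sums an in-memory array; ParallelSum and the THRESHOLD value are hypothetical choices, and the threshold, which bounds per-task overhead, would need tuning for real workloads.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

/** Divide-and-conquer sum of an in-memory array using the fork/join framework. */
final class ParallelSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000; // below this size, sum sequentially (tuning assumption)

    private final long[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    ParallelSum(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        ParallelSum left = new ParallelSum(data, from, mid);
        ParallelSum right = new ParallelSum(data, mid, to);
        left.fork();                     // split: schedule the left half asynchronously
        long rightSum = right.compute(); // solve the right half in this thread
        return left.join() + rightSum;   // merge the independent sub-results
    }

    static long sum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new ParallelSum(data, 0, data.length));
    }
}
```

The two halves touch disjoint ranges of the array, so they can be computed simultaneously without locking; only the final merge step combines their results.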
Last Updated: 2018-02-27
Copyright © 2000-2018 Fasterj.com. All Rights Reserved.
All trademarks and registered trademarks appearing on JavaPerformanceTuning.com are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. JavaPerformanceTuning.com is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.