Java Performance Tuning
Java(TM) - see bottom of page
Newsletter no. 8, July 20th, 2001
The performance tuning articles published over the last month
emphasize where the focus in Java is today. The majority of
articles have looked at performance for the enterprise web server,
considering design patterns, J2EE, servlets and dynamic web pages.
Smaller in number but still significant is the set of articles
looking at performance for embedded Java applications. In
addition, we have some interesting articles on fundamentals of
Java performance including synchronization and the WeakHashMap.
And finally, we seem to be getting one article each month that
is targeted at Java games programming: this month the
VolatileImage class from SDK 1.4 is covered.
This month Kirk tells us about discussions on writing a test
harness, performance tuning an applet, the performance of sin
and cos, serialization performance, Java's productivity,
stateless session beans and clusters, and much more. We also
get that promised update on his bananas, and a tour de force
on the Tour de France (I just couldn't resist that).
Finally here's my usual reminder to our Japanese readers that
Yukio Andoh's translation should be available at
http://www.hatena.org/JavaPerformanceTuning/ in a week or so.
Java performance tuning related news.
All the following page references have their tips extracted below.
The 88th running of cycling's crown jewel, the Tour de France, is now
well underway. The cyclist with the shortest accumulated time over
20 stages (or individual races) that lead 209 elite athletes through
France, is awarded the highly coveted yellow jersey also known as
the "golden fleece". Each day of the 3450 km race brings the
challenge to remain ahead of the sweeper, a van with a broom
attached to it that picks up the stragglers when they fall too far
behind the race leaders.
Many riders end their tour after desperately struggling to stay
ahead of the sweeper as they climb the Alps or the Pyrenees. Over
the years, the drive for performance has changed the nature of the
race. Long gone are the days when participants were forbidden to
accept assistance. The drive for performance then forced participants to
carry a redundant system (a second bicycle) on their backs. In the
modern version, cycling's star performers are supported by team
vehicles, domestics, and team mechanics. It is the domestics'
responsibility to shuttle food and water to the team's leaders. If
a star's bicycle fails, the domestic will surrender his and then
wait for the team vehicle.
Overall time is not the only metric. The tour's best sprinter is
awarded the green jersey. The tour's best climber is awarded the
coveted red and white polka-dotted jersey. In the midst of the chaos
that surrounds each day's race, cycling's heroes provide us with
insight into the effort and resources needed to support peak
performances. Last year Lance Armstrong rode the 3630 km course in a
time of 92:33:08. This works out to an average speed of 39.2 km/h.
Now that's what I call performance.
On now to this month's look at the Java Ranch.
A question asked was: can you specify tuning parameters in the
Java Plug-in Control Panel that affect CPU and I/O? Though no one
answered that question, it is interesting to note that the -X
parameters can be set in the control panel. With these, you can tune
things such as heap size, gc, etc.
One of the bartenders is writing a test harness and was asking
questions about how to log. His idea was to create channels from
large StringBuffers. Thoughts from respondents included that the
item be handed off to some asynchronous thread, process, or server
to complete the logging. It was pointed out that one should avoid
situations where the harness may interfere with your timings.
Another interesting suggestion was to predefine event types and
then just write event codes instead of text messages.
Onto Java Gaming at www.javagaming.org
where we find our own Jack,
performance tuning an applet for a developer in the thread
JavaGaming.Org Message Board: Performance Tuning: Profilers: HPROF.
The density of information in this thread is such that I don't feel
it can be summarized. I strongly recommend that you check it out for
yourself. The thread included coverage of -Xprof, the HotSpot
profiler, and how it differs from -Xrunhprof; what might or might
not get inlined; what to target when tuning loops; and more.
In another thread, a developer was concerned about the amount of
memory that was being used. It was 3x more than a similar
application written in C. The response was that Java is lazy about
reclaiming memory and that one could use Runtime.freeMemory() and
Runtime.totalMemory() to determine the real footprint. These numbers
could be used to constrain the heap size of the application.
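A minimal sketch of that measurement, assuming the standard java.lang.Runtime methods freeMemory() and totalMemory() are the ones meant in the thread. The class and method names are illustrative only and do not come from the discussion.

```java
// Hypothetical helper, not from the thread: estimates the heap currently in use.
class HeapFootprint {

    /** Bytes of heap currently occupied: allocated heap minus its free part. */
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.gc(); // only a hint; the JVM is free to ignore it
        System.out.println("used  = " + usedBytes() / 1024 + " KB");
        System.out.println("total = " + rt.totalMemory() / 1024 + " KB");
        System.out.println("max   = " + rt.maxMemory() / 1024 + " KB");
    }
}
```

Sampling usedBytes() at the application's busiest point gives a rough figure to compare against the maximum heap setting (-Xmx). The value still includes garbage that has not yet been collected, so it is an upper estimate.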
One final thread for those looking to future technologies concerned
Java processors. ARM is developing a Java processor named Jazelle.
For further information, check out
Also at Java Gaming, but in the "2D Graphics Programming" discussion
group rather than the "Performance" discussion group, was a long
thread about performance when using sin and cos
functions. The basic advice was to use look up tables rather than
on the fly calculations, or even re-write the routines to avoid
trig functions if possible. But more interesting was a long
explanation of possible performance trade-offs, written by Sun's
JDK math expert and posted in the thread. Once again, this is too
dense to summarize, so you might want to check the thread out
JavaGaming.Org Message Board: 2D Graphics Programming in Java: Other Stuff: Sin and Cos
Finally, let's check out the Middleware Company at
The first thread that I pulled up presented
an interesting problem. A developer asked why serializing a string
was significantly faster than serializing his own custom class
(which in this case were almost identical). A participant responded
with a nice concise answer which, when implemented, resulted in a
20x speed up. That answer was that the default implementations of
the readObject and writeObject methods use reflection. By supplying
your own versions of these methods, you can eliminate the need for
reflection, which nets a nice performance improvement.
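The poster's class was not shown, so the following is only a sketch of the technique with a hypothetical Label class. The fields are marked transient and written by hand in private writeObject and readObject methods, which the serialization machinery calls in place of its default field handling.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class for illustration; the poster's actual class was not shown.
class Label implements Serializable {
    private static final long serialVersionUID = 1L;

    // transient: skipped by default serialization and written by hand below
    private transient String text; // assumed non-null in this sketch
    private transient int count;

    Label(String text, int count) { this.text = text; this.count = count; }
    String text() { return text; }
    int count() { return count; }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject(); // no non-transient fields, so this writes no field data
        out.writeUTF(text);
        out.writeInt(count);
    }

    // Must read the values in exactly the order writeObject wrote them.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        text = in.readUTF();
        count = in.readInt();
    }

    /** Serializes and deserializes a Label in memory. */
    static Label roundTrip(Label original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Label) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Whether this reproduces the 20x figure quoted in the thread depends on the class and the JVM, so it is worth timing both versions before committing to hand-written serialization.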
On a related thread, the question was: to clone or serialize? In the
end, a respondent answered that when benchmarking, he found
serialization to be much slower than cloning. I found this result
surprising until I looked through the source for Object and found
protected native Object clone() throws CloneNotSupportedException.
Once again, the C++ vs. Java question was asked. Many of the
thoughtful responses questioned the ability for a large project to
contain the many problems (such as classical memory leaks) when
using C++. In the end C++ is faster but, if your application does a
lot of DB access, is that extra speed noticeable? The current rule
of thumb is that people are 4 times faster deploying in Java than
they are in C++. I'll take a performance penalty for an increase in
productivity of that magnitude any day.
Another question centered on the use of static objects in Stateless
Session Beans running in a WebLogic cluster. A WebLogic cluster is
a group of WebLogic EJB servers that can communicate and stay in
sync with each other. A stateless session bean is an EJB that does
not carry any client or session specific state and is equally
sharable by each and every client. Thus the fact that this developer
was suggesting that he was carrying state in the stateless session
bean is a violation of the EJB specification. Curiously enough, he
was wondering why that state did not get replicated to the other
servers. The Server Side seems to have a lot of coverage of WebLogic
clusters. I'll spend more time covering EJB servers in future
editions of this column.
And finally, as promised, here is the report on my Orinoco bananas.
After the initial burst, the growth rate did slow. Last week, I
removed about 8 kg of bananas from the tree. The finger-sized
bananas look like mini versions of the traditional store bananas.
The taste is slightly heavier and much sweeter. I am looking forward
to my next bunch.
Using the WeakHashMap class (Page last updated June 2001, Added 2001-07-20, Author Jack Shirazi). Tips:
- WeakHashMap can be used to reduce memory leaks. Keys that are no longer strongly referenced from the application will automatically make the corresponding value reclaimable.
- To use WeakHashMap as a cache, the keys that evaluate as equal must be recreatable.
- Using WeakHashMap as a cache gives you less control over when cache elements are removed compared with other cache types.
- Clearing elements of a WeakHashMap is a two stage process: first the key is reclaimed, then the corresponding value is released from the WeakHashMap.
- String literals and other objects like Class which are held directly by the JVM are not useful as keys to a WeakHashMap, as they are not necessarily reclaimable when the application no longer references them.
- The WeakHashMap values are not released until the WeakHashMap is altered in some way. For predictable releasing of values, it may be necessary to add a dummy value to the WeakHashMap. If you do not call any mutator methods after populating the WeakHashMap, the values and internal WeakReference objects will never be dereferenced.
- WeakHashMap wraps an internal HashMap adding an extra level of indirection which can be a significant performance overhead.
- Every call to get() creates a new WeakReference object.
- WeakHashMap.size() iterates through the keys, making it an operation that takes time proportional to the size of the WeakHashMap.
- WeakHashMap.isEmpty() iterates through the collection looking for a non-null key, so a WeakHashMap which is empty requires more time for isEmpty() to return than a similar WeakHashMap which is not empty.
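The key-related tips above can be sketched as follows. The class and method names are hypothetical, and the second half of main depends on the garbage collector actually running, so its printed count is not guaranteed.

```java
import java.util.Map;
import java.util.WeakHashMap;

// Illustrative sketch; the class and method names are hypothetical.
class WeakCacheDemo {

    /** While the caller holds a strong reference to the key, the entry stays mapped. */
    static boolean retainedWhileReferenced() {
        Map<Object, String> cache = new WeakHashMap<>();
        Object key = new Object();
        cache.put(key, "value");
        System.gc();                           // only a hint to the JVM
        return "value".equals(cache.get(key)); // key is still strongly reachable here
    }

    public static void main(String[] args) throws InterruptedException {
        Map<Object, String> cache = new WeakHashMap<>();
        Object key = new Object();             // not a literal, so it can be reclaimed
        cache.put(key, "value");
        System.out.println("held key mapped: " + retainedWhileReferenced());

        key = null;                            // drop the only strong reference
        System.gc();                           // only a hint; collection is not guaranteed
        Thread.sleep(100);
        // The result depends on whether the collector actually ran.
        System.out.println("entries after GC hint: " + cache.size());
    }
}
```

Replacing new Object() with a string literal such as "key" would typically leave the entry in place indefinitely, because the JVM itself keeps literals reachable.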
When synchronization is required (Page last updated July 2001, Added 2001-07-20, Author Brian Goetz). Tips:
- Synchronization means mutual exclusion (if the same monitor is used), atomicity of the synchronized block (again with respect to other threads using the same monitor) and synchronization of thread memory to main memory.
- Because synchronization synchronizes thread memory with main memory, there is a cost to synchronization beyond simply acquiring a lock.
- Too little synchronization can lead to corrupt data; too much can lead to reduced performance and deadlock.
- The costs of synchronization vary with JVMs, with more recent JVMs being more efficient.
- The cost of synchronization depends on whether threads are actually contending for locks (more expensive, slower) or the synchronization is uncontended, where the thread is basically acting in single-threaded mode (cheaper, faster).
- You need to synchronize access to, or declare as volatile, variables holding data that will be shared between threads.
- Composite operations may need synchronizing to make them atomic even if each individual operation is already synchronized.
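The last tip is easy to overlook, so here is a small sketch of it. HitCounter is a hypothetical class: its Hashtable synchronizes every individual call, yet the get-then-put sequence still needs a lock of its own to be atomic.

```java
import java.util.Hashtable;
import java.util.Map;

// Hypothetical example class, not taken from the article.
class HitCounter {
    private final Map<String, Integer> hits = new Hashtable<>(); // each call is synchronized

    // Not atomic: another thread can run between get() and put(), losing an update.
    void incrementUnsafe(String page) {
        Integer old = hits.get(page);
        hits.put(page, old == null ? 1 : old + 1);
    }

    // The whole read-modify-write runs under this object's monitor.
    synchronized void increment(String page) {
        Integer old = hits.get(page);
        hits.put(page, old == null ? 1 : old + 1);
    }

    synchronized int count(String page) {
        Integer n = hits.get(page);
        return n == null ? 0 : n;
    }

    /** Runs several threads against the synchronized increment and returns the final count. */
    static int demo(int threads, int perThread) {
        HitCounter counter = new HitCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) counter.increment("home");
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter.count("home");
    }

    public static void main(String[] args) {
        System.out.println("final count: " + demo(4, 10000));
    }
}
```

Swapping increment for incrementUnsafe in demo will usually, though not on every run, produce a total below threads * perThread.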
Using VolatileImage (Page last updated May 2001, Added 2001-07-20, Author Someone@sun). Tips:
- Graphics performance in 1.2 is worse than 1.1. 1.3 is better, and 1.4 should be the fastest yet.
- From 1.2 direct access to image pixels was available, but was too slow to be usable because it involved copying many bits around in memory.
- Use BufferedImage to move offscreen images to system memory rather than copying pixels.
- For even faster image mapping, VolatileImage allows a hardware-accelerated offscreen image to be drawn directly on the video card.
- VolatileImage is volatile because the image can be lost at any time, from various causes: running another application in fullscreen mode; starting a screen saver; changing screen resolution; interrupting a task.
- Only constantly re-rendered images need to be explicitly created as VolatileImage objects to be hardware accelerated. Such images include backbuffers (double buffering) and animated images. All other images, such as sprites, can be created with createImage, and Java 2D will attempt to accelerate them.
- If an image, such as a sprite, is drawn once and copied from many times, Java 2D makes a copy of it in accelerated memory and future copies from the image can perform better.
- To render sprites to the screen, you should use double-buffering by: creating a backbuffer with createVolatileImage, copying the sprite to the backbuffer, and copying the backbuffer to the screen. If content loss occurs, Java 2D re-copies the sprite from software memory to accelerated memory.
- Only some graphics operations (e.g. curved shapes) are accelerated on some platforms. Use profiling to determine what works best for your situation.
- From 1.4 Swing uses VolatileImage for its double buffering.
- VolatileImage.getCapabilities() provides an ImageCapabilities object which gives details of the runtime VolatileImage. The ImageCapabilities allows the application to decide to use fewer images, images of lower resolution, different rendering algorithms, or various other means to attempt to get better performance from the current situation and platform.
Servlet Filters (Page last updated June 2001, Added 2001-07-20, Author Jason Hunter). Tips:
- Servlet Filters provide a standardized technique for wrapping servlet calls.
- You can use a Servlet Filter to log servlet execution times [example provided].
- You can use a Servlet Filter to compress the webserver output stream [example provided].
Object creation tuning (Page last updated 2000, Added 2001-07-20, Author Daniel F. Savarese). Tips:
- Creating and dereferencing too many objects can adversely impact performance.
- Avoid holding on to objects for too long by explicitly dereferencing them (setting variables to null) and by using weak references.
- Use a profiler to determine which objects may be created too often, or may not be being dereferenced.
- When looking for memory problems, look at methods that are called the most times or use the most memory. Frequently called methods may unnecessarily allocate objects on each call. Methods that use a lot of memory may not need to use as much memory or they may be a source of memory leaks.
- Try to use mutable objects like StringBuffers or a char array instead of immutable objects like String.
- Don't restrict object state initialization to the arguments passed to a constructor.
- Provide a zero-argument constructor that creates reasonable default values and include setter methods or an init method to allow objects of that class to be reused.
- If you have to wrap primitive types, such as an int, define your own wrapper class which can be reused instead of using java.lang.Integer.
- If you need to create many instances of a wrapper class like Integer, consider writing your algorithm to accept primitive types.
- Use a factory class instead of directly calling the "new" operator, to allow easier reuse of objects.
- Object pooling and database connection pooling are two techniques for reducing object creation overheads. Object pools can be sources of memory leaks and can themselves be inefficient.
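A minimal sketch of the reuse tips above, using a hypothetical Vec2 class with a zero-argument constructor and a setter that returns the same instance. It illustrates the pattern only; on JVMs that allocate short-lived objects cheaply the gain may be small, so measure before adopting it.

```java
// Hypothetical value class designed for reuse; not taken from the article.
class Vec2 {
    double x, y;

    Vec2() { } // reasonable default state: the origin

    /** Re-initializes this instance so it can be reused instead of reallocated. */
    Vec2 set(double x, double y) {
        this.x = x;
        this.y = y;
        return this;
    }

    double length() {
        return Math.sqrt(x * x + y * y);
    }

    /** Sums the lengths of the given vectors while allocating a single Vec2. */
    static double totalLength(double[] xs, double[] ys) {
        Vec2 scratch = new Vec2();
        double total = 0;
        for (int i = 0; i < xs.length; i++) {
            total += scratch.set(xs[i], ys[i]).length();
        }
        return total;
    }
}
```

The trade-off is that a reusable object is mutable, so it must not be shared with code that expects its state to stay fixed.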
The Optimistic Locking pattern (Page last updated July 2001, Added 2001-07-20, Author Yasmin Akbar-Husain and Eoin Lane). Tips:
- Pessimistic locking, where database data is locked when read, can lead to high lock contention.
- Optimistic locking only checks data integrity at update time, so has no lock contention [but can have high rollback costs]. This Optimistic Locking pattern is usually more scalable than pessimistic locking.
- Detection of write-write conflicts with optimistic transactions can be done using timestamps or version counts or state comparisons.
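The version-count variant can be sketched in memory as below. VersionedRow is a hypothetical stand-in for a database row. In a real database the same check is commonly expressed as an UPDATE whose WHERE clause includes the version column, with the update count telling you whether the write won.

```java
// Hypothetical in-memory stand-in for a database row with a version column.
class VersionedRow {
    private String data;
    private int version;

    VersionedRow(String data) {
        this.data = data;
    }

    synchronized int version() { return version; }
    synchronized String data() { return data; }

    /**
     * Applies the update only if nobody has written since expectedVersion was read.
     * Returns false on a write-write conflict; the caller then re-reads and retries.
     */
    synchronized boolean update(int expectedVersion, String newData) {
        if (version != expectedVersion) {
            return false;
        }
        data = newData;
        version++;
        return true;
    }

    public static void main(String[] args) {
        VersionedRow row = new VersionedRow("original");
        int seenByA = row.version();
        int seenByB = row.version();
        System.out.println("A commits: " + row.update(seenByA, "A's edit"));
        System.out.println("B commits: " + row.update(seenByB, "B's edit"));
    }
}
```

No lock is held between reading the version and attempting the update, which is where the scalability comes from. The cost is that the losing writer has to redo its work.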
Using java.lang.reflect.Proxy (Page last updated July 2001, Added 2001-07-20, Author Tom Harpin). Tips:
- The java.lang.reflect.Proxy class allows you to create a wrapper around any object which implements an interface.
- Interposing proxy objects is a useful approach to trace or profile method calls.
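As a rough illustration of the tracing idea, the sketch below wraps any object behind one of its interfaces and records each method call with its elapsed time. TracingHandler and its trace method are hypothetical names; only java.lang.reflect.Proxy, InvocationHandler and Method come from the JDK.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical tracing wrapper; not taken from the article.
class TracingHandler implements InvocationHandler {
    private final Object target;
    private final List<String> log;

    private TracingHandler(Object target, List<String> log) {
        this.target = target;
        this.log = log;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        long start = System.nanoTime();
        try {
            return method.invoke(target, args);   // forward to the real object
        } catch (InvocationTargetException e) {
            throw e.getCause();                    // rethrow what the real method threw
        } finally {
            log.add(method.getName());
            System.out.println(method.getName() + ": " + (System.nanoTime() - start) + " ns");
        }
    }

    /** Returns a proxy for target that records every call made through iface. */
    static <T> T trace(T target, Class<T> iface, List<String> log) {
        Object proxy = Proxy.newProxyInstance(
                TracingHandler.class.getClassLoader(),
                new Class<?>[] { iface },
                new TracingHandler(target, log));
        return iface.cast(proxy);
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        CharSequence text = trace("hello", CharSequence.class, log);
        System.out.println(text.length() + " chars, calls seen: " + log);
    }
}
```

The reflective dispatch adds its own overhead to every call, so timings gathered this way are best used for comparing methods with each other and not as absolute figures.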
Rules and Patterns for Session Facades (Page last updated June 2001, Added 2001-07-20, Author Kyle Brown). Tips:
- Use the Facade pattern, and specifically Value objects, to transfer the whole subset of data needed from an entity bean in one transfer.
Scaling web services (Page last updated June 2001, Added 2001-07-20, Author Simeon Simeonov). Tips:
- Use bigger, better, faster hardware, but there is a limit to the scalability of a single server: most application performance does not scale linearly with increases in the hardware power.
- Use more than one server in a cluster that services requests as if it were a single server using: OS-level clustering (OS level built in failover mechanisms); Software load balancing (using a load-balancing front-end dispatcher); Hardware load balancing (e.g. DNS round-robin to different servers).
- A basic load-balancing scheme is achievable by sending documents with varying binding addresses (different URL hosts)
- Use faster communication protocols (e.g. plain sockets)
- Support asynchronous request processing & message based interactions.
Sun community discussion on "Optimizing Entity Beans" with Akara Sucharitakul (Page last updated June 2001, Added 2001-07-20, Author Edward Ort). Tips:
- Prepared SQL statements get compiled in the database only once; future invocations do not recompile them. The result of this is a decrease in the database load, and an increase in performance of up to 5x.
- Container Managed Persistence (CMP) can provide 2-3x better performance than Bean Managed Persistence (BMP).
Optimizing dynamic web pages (Page last updated July 2001, Added 2001-07-20, Author Helen Thomas). Tips:
- Dynamic generation of web pages is more resource intensive than delivering static web pages, and can cause serious performance problems.
- Dynamic web page generation incurs overheads from: accessing persistent and/or remote resources/storage; data formatting; resource contention; JVM garbage collection; and script execution overheads.
- Dynamic content caching tries to mitigate Dynamic web page generation overheads by reusing content that has already been generated to service a request.
- JSP cache tagging solutions allow page and fragment level JSP output to be automatically cached.
- On highly personalized sites page-level caching results in low cache hit rates since each page instance is unique to a user.
- Component-level caching applies more extensively when components are reused in many pages, but requires manual identification of bottleneck components.
J2ME apps, with a discussion of the needs to balance performance (Page last updated June 2001, Added 2001-07-20, Author Glenn Coates). Tips:
- J2ME devices have limited processing power, so performance is important and must be considered for the target device.
- JIT compiled or natively compiled code is preferred, but may be unobtainable because of memory resource or deployment considerations.
- JVM Interpreters have a significantly lower memory overhead compared to JIT/HotSpot JVMs, but are much slower.
- Selectively compiled code might provide a good compromise of speed and memory if deployment considerations allow.
- The application does not need to be lightning fast in order to have a responsive user interface. The perception of speed is important: for example, the user interface should give immediate feedback.
- JVM selection for the J2ME device is pivotal to achieving the required performance.
- Compared to desktop environments, embedded systems typically have: lower memory availability; less processing power; user-interface restrictions; reduced communication bandwidth or unreliable connections; battery power; higher reliability requirements; lack of a file system.
J2EE challenges (Page last updated June 2001, Added 2001-07-20, Author Chris Kampmeier). Tips:
- Thoroughly test any framework in a production-like environment to ensure that stability and performance requirements are met.
- Each component should be thoroughly reviewed and tested for its performance and security characteristics.
- Using the underlying EJB container to manage complex aspects such as transactions, security, and remote communication comes with the price of additional processing overhead.
- To ensure good performance use experienced J2EE builders and use proven design patterns.
- Consider the impact of session size on performance.
- Avoid the following common mistakes: Failure to close JDBC result sets, statements, and connections; Failure to remove unused stateful session beans; Failure to invalidate HttpSession.
- Performance test various options: for example, test both Type 2 and Type 4 JDBC drivers; use a load-generation tool to simulate moderate loads; monitor the server to identify resource utilization.
- Perform code analysis and profiling.
- Performance requirements include: the required response times for end users; the perceived steady state and peak user loads; the average and peak amount of data transferred per Web request; the expected growth in user load over the next 12 months.
- Note that peak user loads are the number of concurrent sessions being managed by the application server, not the number of possible users using the system.
- Larger loads require greater amounts of hardware to satisfy that load.
- Applications that perform very little work can typically handle many users for a given amount of hardware, but can scale poorly as they spend a large percentage of time waiting for shared resources.
- Applications that perform a great number of computations tend to require much more hardware per user, but can scale much better than those performing a small number of computations.
- Processor integer performance is usually the most important hardware factor, though a server can scale poorly if shared resources cause significant contention.
- Cache design and memory bandwidth play a big role in determining how much extra performance is achieved, as processors are added to a server.
- Additional capacity should be designed into the system.
- Extrapolate from known performance test results to predict the performance of the system when varying amounts of resources are available.
Reusing objects in embedded Java (Page last updated July 2001, Added 2001-07-20, Author Angus Muir and Roman Bialach). Tips:
- A lot of object creation and destruction can lead to a fragmented heap, which reduces the ability to create further objects.
- Define the bulk of memory you need (buffers, etc.) up-front at initialization, and use object pooling to avoid further creation or destruction of objects.
- Throwing and catching exceptions is tremendously expensive.
- Pooling is not always faster than object creation.
Chapter 2, "Java: Fat and Slow?", of "Java 2 Micro Edition: Professional Developer's Guide" referenced from http://www.microjava.com/articles/techtalk/giguere (Page last updated May 2001, Added 2001-07-20, Author Eric Giguere). Tips:
- Reduce compiled code size by using implicit instruction bytecodes wherever possible. For example, limiting a method to four or fewer local variables (three on non-static methods as "this" takes the first slot), allows the compiler to use the one-byte implicit forms of instructions (aload_0 to aload_3, iload_0 to iload_3, and likewise for fload, astore, istore, fstore, and so on).
- Similarly, the numbers -1, 0, 1, 2, 3, 4, 5 have special bytecodes (iconst_m1 through iconst_5).
- Java class files are standalone - no data is shared between class files. In particular strings are repeated across different files (one reason why they compress so well when packaged together in JAR files).
- An empty class compiles to about 200 bytes, of which only 5 bytes are bytecode.
- There are no instructions for initializing complete arrays in the Java VM. Instead, compilers must generate a series of bytecodes that initialize the array element by element. This can make array initialization slow, and adds bytecode to the class.
- You can reduce bytecode bloat from array initialization by encoding values in strings and using those strings to initialize the arrays.
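The string-encoding tip can be sketched like this. The Tables class and its decode method are hypothetical, and the approach as written only suits values that fit in a char (0 to 65535).

```java
// Hypothetical lookup-table holder; not taken from the book chapter.
class Tables {
    // Conventional form: the compiler emits a store sequence for every element.
    static final int[] PRIMES_LITERAL = { 2, 3, 5, 7, 11, 13, 17, 19 };

    // Encoded form: one string constant (octal escapes) plus a small decoding loop.
    static final int[] PRIMES_DECODED = decode("\002\003\005\007\013\015\021\023");

    /** Expands each char of the packed string into one int element. */
    static int[] decode(String packed) {
        int[] out = new int[packed.length()];
        for (int i = 0; i < out.length; i++) {
            out[i] = packed.charAt(i);
        }
        return out;
    }
}
```

For a table of eight entries the saving is negligible. The technique pays off for tables with hundreds or thousands of elements, at the cost of source that is harder to read and a decoding pass at class initialization.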
- Explicitly set references to null when they are no longer needed to ensure that the objects can be garbage collected.
- Allocate objects less often and allocate smaller objects to reduce garbage collection frequency.
Performance tuning report in German, recently updated. Thanks to Peter Kofler for extracting the tips. (Page last updated June 2001, Added 2001-07-20, Author Sebastian Ritter). Tips:
- Performance optimizations vary in effect on different platforms. Always test for your platforms.
- Reasons not to optimize: can lead to unreadable source code; can cause new errors; optimizations are often compiler/JVM/platform dependent; can lose object orientation.
- Reasons to optimize: application uses too much memory/processor/I/O; application is unacceptably slow.
- Don't optimize before you have at least a functioning prototype and some identified bottlenecks.
- Try to optimize the design first before targeting the implementation.
- Profile applications. Use the 80/20 rule, which suggests that 80% of the work is done in 20% of the code.
- Target loops in particular.
- Monitor running applications to maintain performance.
- Plan and budget for some resources to optimize the application. Try to have or develop a couple of performance experts.
- Specify performance in the project requirements, and specify separate performance requirements for the various layers of the application.
- Consider the effects of performance at the analysis stage, and include testing of 3rd party tools.
- Use a benchmark harness to make repeatable performance tests, varying the number of users, data, etc. Use profilers and logging to measure performance and identify performance problems.
- Optimize the runtime system if the optimization does not require alterations to the application design or implementation.
- Test various JVMs and choose the optimal JVM.
- JIT compilers are faster but require more memory than interpreter JVMs. HotSpot can provide better performance and a faster startup and maintain a relatively low memory requirement.
- Design in asynchronous operations so tasks are not waiting for others to finish when they don't need to.
- use the right VM
- use the right threading model (native vs. green)
- use native compilers
- give more ram to the VM
- give all ram to short-lived applications to completely avoid GC
- use alternate/optimizing compilers
- use the right database driver
- use direct JDBC drivers
- expand all JDK classes into the filesystem to increase access to classes
- use slot-local variables (1st 128 bit = 4 slots) (applies for interpreters only)
- use int
- use ArrayList instead of Vector
- use own Hashtable implementations for atoms (i.e. int)
- use caches
- use object pools
- avoid remote method calls
- use callbacks to avoid blocking remote method calls
- use batching for remote method calls
- use the flyweight pattern to reduce object creation
- use the right access modifier: static > private > final > protected > public
- use inlining
- use shallow hierarchies (to avoid long instantiation chains)
- use empty default constructors
- use direct variable access (not recommended, breaks OO)
- mix model with view (not recommended, breaks OO)
- use better algorithms
- remove redundant code
- optimize loops
- unroll loops
- use int as loop counter
- count/test loops towards 0
- use Exception terminated loops for long loops
- use constants for expressions with known results, e.g. replace
x = 3; ... (x does not change) ...; x += 3; with
x = 3; ... (x does not change) ...; x = 6;
- move code outside loops
- how to optimize: 1st check for better algorithms, 2nd optimize loops
- use shift for *2 and /2
- do not initialize with default values (0, null)
- use char arrays for mutable Strings
- use arrays instead of collections
- use the "private final" modifier
- use System.arraycopy() to copy arrays
- use Hashtable keys with fast hashCode()
- do not use Strings as keys for Hashtables
- use new Hashtable() instead of Hashtable.clear() for very large Hashtables
- inspect JDK source
- use methods in order: static > final > instance > interface > synchronized
- use own specialized methods instead of JDK's generalized ones
- avoid synchronization
- avoid new objects
- reuse objects
- use the original instead of overloaded constructors (supply the default parameters yourself)
- avoid inner classes
- use + for concatenating 2 Strings, use StringBuffer for concatenating more Strings
- use clone to create new objects (instead of new)
- use instance.hashCode() to test for equality of instances
- use native JDK implemented methods (such as System.arraycopy())
- avoid Exceptions (use Exceptions only for cases with probability < 50%, else use error flags)
- combine multiple small try-catchs to one larger block
- use Streams instead of Readers, use Reader and Writer only if you need internationalization
- use buffering for io
- use EOFException and ArrayIndexOutOfBoundsException for terminating io reading loops
- use transient fields to speedup serialisation
- use externalization instead of serialisation
- use multiple threads to increase perceived performance
- use awt instead of swing for speed
- use swing instead of awt for less memory
- use super.paint() to initially draw something (i.e. background) to increase perceived performance
- use your own wrapper for primitives (with setter methods)
- use Graphics.drawPolygon() (native implemented) instead of several Graphics.drawLine() calls.
- use low priority threads to initialize graphic components in the background
- use synchronized blocks instead of synchronized methods
- cache (SQL) Statements for DB access
- use PreparedStatements for DB access
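Several tips in the list above are micro-optimizations whose results depend on the inputs, so they are worth checking before use. As one example, the "use shift for *2 and /2" tip is sketched below with hypothetical method names: for negative odd values a right shift does not give the same answer as division.

```java
// Hypothetical demo class; shows where x >> 1 and x / 2 disagree.
class ShiftVsDivide {
    static int halveByDivision(int x) { return x / 2; }  // rounds toward zero
    static int halveByShift(int x)    { return x >> 1; } // rounds toward negative infinity

    public static void main(String[] args) {
        for (int x : new int[] { 8, 7, -7, -8 }) {
            System.out.println(x + ": x/2 = " + halveByDivision(x)
                    + ", x>>1 = " + halveByShift(x));
        }
    }
}
```

For non-negative values, and for multiplication by two where no overflow occurs, the shifted and arithmetic forms agree. Many compilers perform this substitution themselves where it is safe, so the hand-written shift may buy nothing.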
Last Updated: 2017-03-01
Copyright © 2000-2017 Fasterj.com. All Rights Reserved.
All trademarks and registered trademarks appearing on JavaPerformanceTuning.com are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. JavaPerformanceTuning.com is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.
RSS Feed: http://www.JavaPerformanceTuning.com/newsletters.rss