Java Performance Tuning
Newsletter no. 7, June 18th, 2001
This month I've added a new section on Java performance related news
items. In the future this section will include any updates to existing
tools, vendor announcements, benchmark reports, etc. Also newsworthy
is that the resources page has been
restructured to add two profiling tools sections,
(Free) Profiling Tools
(Not Free) Profiling Tools.
I was surprised to find out how many free resources there are.
This month's articles have no real theme, except perhaps to show how
widely used Java is. Kirk adds the newish discussion group at
JavaGaming.org to his roundup of the Java performance tuning
discussion groups. But there's no update on his banana tree this month.
Finally, my usual reminder to our Japanese readers that Yukio
Andoh's translation should be available at
http://www.hatena.org/JavaPerformanceTuning/ in a week or so.
Java performance tuning related news.
All the following page references have their tips extracted below.
Other additions to the website
If there is any group of developers that knows how to squeeze out every
ounce of performance, it has to be gamers. Anyone remember the original
flight simulator? How about Snipes, one of the first network-based,
interactive shoot-em-up games? The guy with the bigger hardware always
seemed to have the upper hand. Because I always seemed to be the one
with the slowest hardware, I quickly came to appreciate how performance
affects the gaming experience. So, it is not surprising to find that a
site such as www.JavaGaming.org
supports a discussion group focused on performance. I'll start this
month's roundup with a review of this site.
The discussion group at www.JavaGaming.org is organized a little
differently from JavaRanch and TheServerSide. At the outset, it
resembles wiki but, once you tunnel down, you'll find the familiar
"by topic" message flow. My first random walk through the links led me
to a nice description of garbage collection (GC) from Jeff Kesselman.
In a very dense message, Jeff dispelled the notion that reference
counting was being used in the HotSpot VM. Jeff explained that a full
GC is performed before the VM will throw an out-of-memory error. He
also provided a nice recipe on how to find object leaks in your
application. But that wasn't the end of Jeff's contribution to the
list (as we'll see later on).
My wanderings through www.JavaGaming.org got less random when I saw
a thread entitled "Ugly memory tricks in 1.4 beta". I must say that
even though the thread was short and simple, I needed to walk away from
my desk and think about it for a while. The code segment presented
below prints "2" and "os".
String osv = "Osvaldo";
sun.misc.Unsafe.getUnsafe().putInt(osv, 16, 2);
Jeff Kesselman pointed out that the Unsafe class is supposed to provide
direct access to native memory without needing to use the JNI.
Apparently, the implementation can only be used from privileged code
and does perform bounds checking. Even so, this is a class I'm sure I'm
not going to be using any time soon. While you're out checking
"Java's Newest Trick -- THE UNSAFE CLASS!" posted on the Java lobby
site by Osvaldo Doederlein, I'll be brushing up on my SEGV debugging
skills (core dump territory, for those of you who have had the good
luck to miss out on SIGSEGV - ed.).
One thread started off with a discussion of pooling/synchronization vs.
object creation in which Jeff provided some valuable information on the
effects of the CPU's local cache on multi-threaded applications. The
V9 Sparc introduced four new fetch/cache instructions to help with
threading and performance. This change broke several of the pre-V9 VM
implementations as it caused different threads (or in some cases, the
same thread) to maintain different copies of data in local cache,
which in turn created cache coherency bugs. As it turns out, this
problem is not isolated to the Sparc CPU. As is generally the case when
there is a discussion on pooling, the subject quickly turned to the
cost of synchronization. Jack described the differences between method
and block synchronization (which I covered in last month's newsletter)
and some interesting information on cache coherency (see Art Jolin's
"Java's Atomic assignment" article in Java Report Aug 1998 p27). Jack
commented that method synchronization was faster than block
synchronization in every VM that he has tested. It was at this point
that Jeff laid down the gauntlet and a spirited and most informative
debate between Jeff and Jack began. The primary question was: should one
make VM/hardware specific optimizations? Jeff took the position that
one should focus on WORA optimizations. Jack argued that many useful
optimizations often take advantage of VM/hardware specific features.
IMHO, the clinching argument was made when Jack took an I/O example from
Jeff's book and explained how it was a hardware specific optimization.
My personal experience is from a number of target environments that
are quite diverse. Aside from making sure that I use strong algorithms,
I often implement hardware/VM performance enhancements as these frequently
provide the best performance boost. Using plug-in points (pg 25,
"Java Color Modeling with UML", Peter Coad, Eric Lefebvre, Jeff DeLuca)
one can often isolate these optimizations.
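The plug-in point idea can be sketched in a few lines: put the hot routine behind an interface and select the platform-specific implementation at one place. This is my own minimal illustration, not code from the discussion; the class names and the unrolled-loop "optimization" are purely illustrative.

```java
public class PlugInPointDemo {
    public interface Checksum { int checksum(byte[] data); }

    // portable default implementation (the WORA path)
    public static class PortableChecksum implements Checksum {
        public int checksum(byte[] data) {
            int sum = 0;
            for (int i = 0; i < data.length; i++) sum += data[i] & 0xFF;
            return sum;
        }
    }

    // a hypothetical tuned variant, e.g. unrolled for a specific CPU
    public static class UnrolledChecksum implements Checksum {
        public int checksum(byte[] data) {
            int sum = 0, i = 0;
            for (; i + 4 <= data.length; i += 4)
                sum += (data[i] & 0xFF) + (data[i + 1] & 0xFF)
                     + (data[i + 2] & 0xFF) + (data[i + 3] & 0xFF);
            for (; i < data.length; i++) sum += data[i] & 0xFF;
            return sum;
        }
    }

    // the plug-in point: the platform test is isolated here, once
    public static Checksum select() {
        return "sparc".equals(System.getProperty("os.arch"))
            ? (Checksum) new UnrolledChecksum()
            : (Checksum) new PortableChecksum();
    }

    public static void main(String[] args) {
        byte[] data = "hello".getBytes();
        System.out.println(select().checksum(data));
    }
}
```

The rest of the application only ever sees the interface, so swapping in (or removing) a VM/hardware-specific implementation touches exactly one method.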
Let's look at what's happening down on the
Java Ranch. The question that
caught my eye concerned an out-of-memory error caused by a heap that
kept growing. Of course, increasing the heap size helped delay the
problem, but did not prevent it. The obvious advice given was that objects
were not being released. It was suggested that a profiler be used. One
important fact: the application was running in Tomcat, which implies that
the client is running a browser. So, is the answer really that simple?
What happens if the objects are collecting in the HTTP session state?
How do you know when a user is finished with his/her session state? VM
bloat is common in systems where the interface between the client and the
data is a servlet. So, the profiler will tell you that you're holding onto
too much data. It won't tell you whether the user is finished with it (unless the
user logs out of course).
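One mitigation for session-state bloat, when you cannot know that a user is finished, is to bound the amount of per-user state the server will hold and evict the least recently used entries. The sketch below is my own illustration (the class name and capacity are invented, not from the JavaRanch thread), using an access-ordered LinkedHashMap:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A bounded, LRU-evicting cache for per-user state: one way to keep
// servlet session data from growing without limit when users never
// explicitly log out. maxEntries is an arbitrary illustrative bound.
public class BoundedSessionCache {
    private final Map cache;

    public BoundedSessionCache(final int maxEntries) {
        // access-order map: iteration order is least- to most-recently used
        cache = new LinkedHashMap(16, 0.75f, true) {
            protected boolean removeEldestEntry(Map.Entry eldest) {
                return size() > maxEntries; // evict the stalest entry
            }
        };
    }

    public void put(String sessionId, Object state) { cache.put(sessionId, state); }
    public Object get(String sessionId) { return cache.get(sessionId); }
    public int size() { return cache.size(); }
}
```

Evicted state can be discarded or spooled to disk; either way the heap stops growing with the number of abandoned sessions.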
Here is an interesting but as yet unanswered question concerning the KVM.
Are sockets slower on the KVM? Although my experience with the KVM is
limited, it does include experimentation with sockets, albeit on a
Palm Vx wired to a PC. Since the amount of data that I was passing was
small, I didn't notice any performance problems. The Palm Vx did not open
a socket. A server daemon did this for it. The Palm Vx used a hot-sync like
connection to connect with the server. I may be wrong, but I suspect that
a moderately higher load would strain this type of link. Maybe someone with
more experience could comment on this.
What's new from The Server Side?
We start with a question concerning the scalability of the BSD VM.
Unfortunately, it doesn't do so well according to the Volano benchmark.
You can find these and other results at
It is common when using EJBs to map an Entity Bean to a row in a table.
This can result in the EJB server creating a large number of Entity
Beans when the table is large, or when you need to look at the entire
table. One such participant was asking how one should deal with this
problem. A solution that is commonly used is known as bi-modal access.
In a bi-modal architecture, Session Beans provide read-only access to the
db. All read/write operations are performed by Entity Beans. A follow
up question was, why use Entity Beans at all? The response: writing all
the JDBC code was time consuming. Using Entity Beans solves this problem
and a few others.
This column wouldn't be complete if we didn't somehow mention JavaOne.
I did not attend JavaOne, but I did watch the keynotes. It was particularly
interesting to see James Gosling introduce asserts and generics. I must
say that I'm not a big fan of writing code to downcast. I also don't like
the performance hit that casting exacts. Given this, one would think
that I would applaud the introduction of generics into the language. But,
unlike Mr. Gosling, I'm not happy that generics are being introduced into
the language. Why? Because, Java does not need any more syntax and what
do generics do? Introduce more syntax. The object already is being returned
to a typed holder. The object itself is typed. If the types don't match,
then throw a ClassCastException. The only loss is that you can't restrict
which objects get inserted into a collection. Is this feature worth the
extra syntax? You know my opinion; let's hear yours.
Comparing the performance of LinkedLists and ArrayLists (and Vectors) (Page last updated May 2001, Added 2001-06-18, Author Jack Shirazi). Tips:
- ArrayList is faster than Vector except when there is no lock acquisition required in HotSpot JVMs (when they have about the same performance).
- Vector and ArrayList implementations have excellent performance for indexed access and update of elements, since there is no overhead beyond range checking.
- Adding elements to, or deleting elements from the end of a Vector or ArrayList also gives excellent performance except when the capacity is exhausted and the internal array has to be expanded.
- Inserting and deleting elements to Vectors and ArrayLists always require an array copy (two copies when the internal array must be grown first). The number of elements to be copied is proportional to [size-index], i.e. to the distance between the insertion/deletion index and the last index in the collection. The array copying overhead grows significantly as the size of the collection increases, because the number of elements that need to be copied with each insertion increases.
- For insertions to Vectors and ArrayLists, inserting to the front of the collection (index 0) gives the worst performance, inserting at the end of the collection (after the last element) gives the best performance.
- LinkedLists have a performance overhead for indexed access and update of elements, since access to any index requires you to traverse multiple nodes.
- LinkedList insertions/deletion overhead is dependent on the how far away the insertion/deletion index is from the closer end of the collection.
- Synchronized wrappers (obtained from Collections.synchronizedList(List)) add a level of indirection which can have a high performance cost.
- Only List and Map have efficient thread-safe implementations: the Vector and Hashtable classes respectively.
- List insertion speed is critically dependent on the size of the collection and the position where the element is to be inserted.
- For small collections ArrayList and LinkedList are close in performance, though ArrayList is generally the faster of the two. Precise speed comparisons depend on the JVM and the index where the object is being added.
- Pre-sizing ArrayLists improves performance significantly. LinkedLists cannot be pre-sized.
- ArrayLists can generate far fewer objects for the garbage collector to reclaim, compared to LinkedLists.
- For medium to large sized Lists, the location where elements are to be inserted is critical to the performance of the list. ArrayLists have the edge for random access.
- A dedicated List implementation designed to match data, collection types and data manipulation algorithms will always provide the best performance.
- ArrayList internal node traversal from the start to the end of the collection is significantly faster than LinkedList traversal. Consequently queries implemented in the class can be faster.
- Iterator traversal of all elements is faster for ArrayList compared to LinkedList.
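The insertion-cost tips above are easy to see in a tiny comparison: inserting at index 0 forces ArrayList to shift its whole internal array on every call, while LinkedList just links a new node. This is my own sketch; absolute timings vary by JVM and machine, so only the resulting contents are checked, not the times.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class FrontInsertDemo {
    // Inserts n elements at the front of the list and returns elapsed millis.
    public static long fill(List list, int n) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < n; i++)
            list.add(0, new Integer(i)); // worst case for ArrayList: full array copy
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        List array = new ArrayList();
        List linked = new LinkedList();
        System.out.println("ArrayList:  " + fill(array, 10000) + " ms");
        System.out.println("LinkedList: " + fill(linked, 10000) + " ms");
        System.out.println("same contents: " + array.equals(linked));
    }
}
```

Swap `add(0, ...)` for `add(...)` (append at the end) and the ranking typically reverses, which is exactly the point of the "insertion position is critical" tip.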
"Cutting Edge Java Game Programming". Oldish but still useful intro book to games programming using Java. (Page last updated 1996, Added 2001-06-18, Authors Neil Bartlett, Steve Simkin). Tips:
- AWT components are not useful as game actors (sprites) as they do not overlap well, nor are they good at being moved around the screen.
- Celled image files efficiently store an animated image by dividing an image into a rectangular grid of cells, and allocating a different animation image to each cell. A sequence of similar images (as you would have for an animation) will be stored and transferred efficiently in most image formats.
- Examining pixels using PixelGrabber is slow.
- drawImage() can throw away and re-load images in response to memory requirements, which can make things slow.
- Pre-load and pre-scale images before using them to get a smoother and faster display.
- The more actors (sprites), the more time it takes to draw and the slower the game appears.
- Use double-buffering to move actors (sprites), by redrawing the actor and background for the relevant area.
- Redraw speed depends on: how quickly each object is drawn; how many objects are drawn; how much of each object is drawn; the total number of drawing operations. You need to reduce some or all of these until you get to about 30 redraws per second.
- Don't draw actors or images that cannot be seen.
- If an actor is not moving then incorporate the actor as part of the background.
- Only redraw the area that has changed, e.g. the old area where an actor was, and the new area where it is. Redrawing several small areas is frequently faster than drawing one large area. For the redraws, eliminate overlapping areas and merge adjacent (close) areas so that the number of redraws is kept to a minimum.
- Put slow and fast drawing requirements in separate threads.
- Bounding-box detection can use circles for the bounding box which requires a simple radii detection.
- Load sounds in a background thread.
- Make sure you have a throttle control that can make the game run slower (or pause) when necessary.
- The optimal network topology for network games depends on the number of users.
- If the cumulative downloading of your applet exceeds the player's patience, you've lost a customer.
- The user interface should always be responsive. A non-responsive window means you will lose your players. Give feedback on necessary delays. Provide distractions when unavoidable delays will be lengthy [more than a few seconds].
- Transmission time varies, and is always slow compared to operations on the local hardware. You may need to decide the outcome of the action locally, then broadcast the result of the action. This may require some synchronization resolution.
- Latency between networked players can easily lead to de-synchronized action and player frustration. Displays should locally simulate remote action as continuing current activities/motions, until the display is updated. On update, the actual current situation should be smoothly resolved with the simulated current situation.
- Sending activity updates more frequently ensures smoother play and better synchronization between networked players, but requires more CPU effort and so affects the local display. In order to avoid adversely affecting local displays, send activity updates from a low priority thread.
- Discard any out-of-date updates: always use the latest dated update.
- A minimum broadcast delay of one-third the average network connection travel time is appropriate. Once you exceed this limit, the additional traffic can cause more grief than benefit.
- Put class files into a (compressed) container for network downloading.
- Avoid repeatedly evaluating invariant expressions in a loop.
- Take advantage of inlining where possible (using final, private and static keywords, and compiling with javac -O)
- Profile the code to determine the expensive methods (e.g. using the -prof option)
- Use a dissassembler (e.g. like javap) to determine which of various alternative coding formulations produces smaller bytecode.
- To reduce the number of class files and their sizes: use the SDK classes as much as possible; and implement common functionality in one place only.
- To optimize speed: avoid synchronized methods; use buffered I/O; reuse objects; avoid unnecessary screen painting.
- Raycasting is faster than raytracing. Raycasting maps 2D data into a 3D world, drawing entire vertical lines using one ray. Use precalculated values for trigonometric and other functions, based on the angle increments chosen for your raycasting.
- In the absence of a JIT, the polygon drawing routines from the AWT are relatively efficient (compared to array manipulation) and may be faster than texture mapping.
- Without texture mapping, walls can be drawn faster with one call to fillPolygon (rather than line by line).
- An exponential jump search algorithm can be used to reduce ray casts - by quickly finding boundaries where walls end (like a binary search, but doubling increments until you overshoot, then halving increments from the last valid wall position).
- It is usually possible to increase performance at the expense of image quality and accuracy. Techniques include reducing pixel depth or display resolution, field interlacing, aliasing. The key, however, is to degrade the image in a way that is likely to be undetectable or unnoticeable to the user. For example a moving player often pays less attention to image quality than a resting or static player.
- Use information gathered during the rendering of one frame to approximate the geometry of the next frame, speeding up its rendering.
- If the geometry and content is not too complicated, binary space partition trees map the view according to what the player can see, and can be faster than ray casting.
Tutorial on the full screen capabilities in the 1.4 release (5 pages plus example pages under the top page) (Page last updated June 2001, Added 2001-06-18, Author Michael Martak). Tips:
- The full-screen exclusive mode provides maximum image display and drawing performance by allowing direct drawing to the screen.
- Use java.awt.GraphicsDevice.isFullScreenSupported() to determine if full-screen exclusive mode is available. If it is not available, full-screen drawing can still be used, but better performance will be obtained by using a fixed size window in normal screen mode. Full-screen exclusive applications should not be resizable.
- Turn off decoration using the setUndecorated() method.
- Change the screen display mode (size, depth and refresh rate), to the best match for your image bit depth and display size so that scaling and other image alterations can be avoided or minimized.
- Don't define the screen painting code in the paint() method called by the AWT thread. Define your own rendering loop for screen drawing, to be executed in any thread other than the AWT thread.
- Use the setIgnoreRepaint() method on your application window and components to turn off all paint events dispatched from the operating system completely, since these may be called during inappropriate times, or worse, end up calling paint, which can lead to race conditions between the AWT event thread and your rendering loop.
- Do not rely on the update or repaint methods for delivering paint events.
- Do not use heavyweight components, since these will still incur the overhead of involving the AWT and the platform's windowing system.
- Use double buffering (drawing to an off-screen buffer, then copying the finished drawing to the screen).
- Use page-flipping (changing the video pointer so that an off-screen buffer becomes the on-screen buffer, with no image copying required).
- Use a flip chain (a sequence of off-screen buffers which the video pointer successively points to one after the other).
- java.awt.image.BufferStrategy provides getDrawGraphics() (to get an off-screen buffer) and show() (to display the buffer on screen).
- Use java.awt.BufferCapabilities to customize the BufferStrategy for optimizing the performance of your application.
- If you use a buffer strategy for double-buffering in a Swing application, you probably want to turn off double-buffering for your Swing components.
- Multi-buffering is only useful when the drawing time exceeds the time spent to do a show.
- Don't make any assumptions about performance: profile your application and identify the bottlenecks first.
Various performance tips (Page last updated May 2001, Added 2001-06-18, Author Asha Balasubramanyan). Tips:
- Use buffered I/O. Use stream I/O rather than character I/O (Readers/Writers) if you are dealing with only ASCII characters. Avoid premature flushing of buffers.
- Recycle objects. Try to minimize the number of objects you create in your Java programs.
- Factor out constant computations from loops. Push one-time computations into methods called once only.
- Use StringBuffer when dealing with mutable strings. Initialize the StringBuffer with the proper size.
- Comparison of two string objects is faster if they differ in length.
- Avoid converting Strings to bytes and back.
- StringTokenizer is slow. Write your own tokenizer.
- Use charAt() instead of startsWith() when you are looking for a single character within a String.
- Avoid premature object creation. Creation should be as close to the actual place of use as possible.
- Avoid initializing twice.
- Zeroing buffer contents is not usually required.
- Be careful about the order of evaluation of expressions with OR and AND conditions.
- Use ArrayList for non-synchronized Vectors.
- Minimize JNI calls in your code.
- Minimize calls to Date and related classes.
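Two of the tips above (use StringBuffer for mutable strings, and initialize it with the proper size) can be shown in a few lines. This is my own sketch of the technique; the `join` helper is invented for illustration:

```java
public class BufferDemo {
    // Joins parts with a separator using a pre-sized StringBuffer, so the
    // buffer never has to grow (and no intermediate Strings are created,
    // as they would be with repeated '+' concatenation).
    public static String join(String[] parts, char sep) {
        int len = 0;
        for (int i = 0; i < parts.length; i++)
            len += parts[i].length() + 1; // +1 per separator, slight over-estimate
        StringBuffer buf = new StringBuffer(len);
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) buf.append(sep);
            buf.append(parts[i]);
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(new String[] {"a", "b", "c"}, ','));
    }
}
```

Done with `result = result + sep + part` in a loop, each iteration would allocate a fresh String (and an implicit temporary buffer), which is exactly the garbage the tip warns about.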
Timing out sockets (Page last updated 2000, Added 2001-06-18, Author David Reilly). Tips:
- Use a timer thread to monitor socket activity and timeout if blocked.
- Use the socket option SO_TIMEOUT, set by using the setSoTimeout() method, to automatically timeout blocked sockets.
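The SO_TIMEOUT tip in action: with setSoTimeout() set, a blocked read() throws SocketTimeoutException instead of hanging forever. The sketch below is my own self-contained illustration, using a loopback server that deliberately never sends any data:

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class SocketTimeoutDemo {
    // Connects to a local server that never writes, so the read must either
    // block forever or, with SO_TIMEOUT set, give up after timeoutMillis.
    public static boolean readTimesOut(int timeoutMillis) {
        try {
            ServerSocket server = new ServerSocket(0); // ephemeral local port
            try {
                Socket client = new Socket("127.0.0.1", server.getLocalPort());
                try {
                    client.setSoTimeout(timeoutMillis); // SO_TIMEOUT bounds blocking reads
                    client.getInputStream().read();     // no data is ever coming
                    return false;                       // unreachable in this setup
                } catch (SocketTimeoutException e) {
                    return true;                        // the read gave up as requested
                } finally {
                    client.close();
                }
            } finally {
                server.close();
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out: " + readTimesOut(250));
    }
}
```

This is usually simpler and cheaper than the separate watchdog-thread approach, which is mainly needed on VMs or operations where SO_TIMEOUT is unavailable.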
Load testing of web applications (Page last updated June 2001, Added 2001-06-18, Author Frank Cohen). Tips:
- Current Web-application architectures consist of many small servers accessed through a load balancer, providing a front-end to a powerful database server. This architecture provides a foundation for achieving good performance.
- Load testing of web applications should include: State machine testing (entries in a shopping basket should still be there when checked out); Really long session testing (session started then continued several hours later); Hordes of savage users testing (users do lots of nonsensical activity); Privileged testing (only some users should be able to access some functionality); Speed testing (do tasks complete within the required times?). Each type of test should be run with several different user loads.
- Test suites should be automated and easily changed.
- [Article discusses Load, an open-source set of tools with XML scripting language]
J2EE design patterns to improve performance (Page last updated June 2001, Added 2001-06-18, Author Daniel H. Steinberg). Tips:
- Combine multiple remote calls for state information into one call using a value object to wrap the data (the Value Object pattern, superseded by local interfaces in EJB 2.0).
- Where long lists of data are returned by queries, use the Page-by-Page Iterator pattern: a server-side object that holds data on the server and supplies batches of results to the client.
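The Page-by-Page Iterator pattern reduces both the data sent per remote call and the client-side memory needed to hold results. A minimal sketch (class and method names are mine, not from the article): the server-side object keeps the full result list and hands out one page of rows per call.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PageIterator {
    private final List results;   // full query results, held server-side
    private final int pageSize;

    public PageIterator(List results, int pageSize) {
        this.results = results;
        this.pageSize = pageSize;
    }

    // Page numbers start at 0; the last page may be short, and pages past
    // the end are empty. A copy is returned so it can be safely serialized
    // to the client independently of the backing list.
    public List getPage(int page) {
        int from = page * pageSize;
        if (from >= results.size()) return Collections.EMPTY_LIST;
        int to = Math.min(from + pageSize, results.size());
        return new ArrayList(results.subList(from, to));
    }

    public static void main(String[] args) {
        List data = new ArrayList();
        for (int i = 0; i < 7; i++) data.add("row" + i);
        PageIterator it = new PageIterator(data, 3);
        System.out.println(it.getPage(0) + " / " + it.getPage(2));
    }
}
```

In a real EJB setting the iterator would live behind a session bean, and `results` might itself be fetched lazily in batches from the database rather than materialized up front.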
Moving from JSP to EJB (Page last updated June 2001, Added 2001-06-18, Author Patrick Sean Neville). Tips:
- Entity EJBs should contain aggregate get/set methods that return chunks of data rather than fine-grained get/set methods for individual attributes, to reduce unnecessary database, transactional, and network communication overheads.
- Avoid stateful session beans as they are resource-heavy, since one instance is maintained for each client.
- Under heavy loads, entity beans should do more than merely represent a table in a database. If you are merely retrieving and updating data values, consider using JDBC within session beans instead.
- If you have one large database host but only a small Web and middleware host, consider moving much of your logic into stored procedures and calling them via JDBC in session beans.
- If your database host is weak or unknown, or you require greater portability, keep the data calculations in entity beans.
- Consider using a single stateless session bean to provide access to other EJBs (this is a façade pattern). This optimizes multiple EJB references and calls by keeping them in-process.
- Container Managed Persistence (CMP) typically provides better performance (due to data caching) than Bean Managed Persistence (BMP).
Judging various aspects of Java, including performance (Page last updated May 2001, Added 2001-06-18, Author Brian Maso). Tips:
- J2EE defines component models with high scalability potential. Maximizing scalability requires sticking to stateless session beans and handling all database interactions programmatically (through pooled JDBC connections).
- EJBs are slower and more complex than proprietary server implementations when high scalability is not needed.
- Java (to 1.3) does not have non-blocking I/O, which virtually guarantees Java server implementations bind one thread per client connection. This limits communication throughput. Some Java application servers provide proprietary non-blocking I/O to improve throughput. From the 1.4 SDK, Java includes non-blocking I/O.
Experiences building a servlet (Page last updated June 2001, Added 2001-06-18, Author Asif Habibullah, Jimmy Xu). Tips:
- Keep the size of the client tier small so that downloads are fast.
- Use the servlet init() and destroy() methods to start and stop limited and expensive resources, such as database connections.
- Make the servlets thread-safe and use connection pooling.
- Use PreparedStatements rather than plain Statement objects.
- Use database stored procedures.
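The init()/destroy() plus connection-pooling advice boils down to: create expensive resources once, hand them out per request, and return them afterwards. Below is my own toy pool to show the shape of the technique; a plain Object stands in for a real java.sql.Connection, and the class name is invented. A production servlet would build one of these (or use the container's pool) in init() and drain it in destroy().

```java
import java.util.LinkedList;

public class SimplePool {
    private final LinkedList idle = new LinkedList();

    // Pre-create the resources once (in a servlet, from init()).
    public SimplePool(int size) {
        for (int i = 0; i < size; i++) idle.add(createResource());
    }

    // Stand-in for something expensive, e.g. DriverManager.getConnection(...).
    protected Object createResource() { return new Object(); }

    // synchronized: servlets must be thread-safe, and many request
    // threads will share this pool
    public synchronized Object checkOut() {
        return idle.isEmpty() ? createResource() : idle.removeFirst();
    }

    public synchronized void checkIn(Object resource) { idle.add(resource); }

    public synchronized int idleCount() { return idle.size(); }
}
```

Each request does `Object c = pool.checkOut(); try { ... } finally { pool.checkIn(c); }`, so the cost of creating the resource is paid once rather than per request.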
Sun presentation on J2SE performance strategies (originally accessed from Reginald Hutcherson's page) (Page last updated May 2001, Added 2001-06-18, Author Reginald Hutcherson). Tips:
- The Sun 1.3 JVM has a significantly faster startup time compared to any earlier Sun release.
- Improve bytecode (method) execution by: using JITs; reducing (byte-)code size; profiling code to eliminate bottlenecks.
- Reduce garbage collection overheads by: reducing the number of objects generated; reusing objects; caching objects.
- Reduce multithreading overheads by targeting the granularity of locks, and managing synchronization correctly.
- Other operations which improve performance include: using JAR files; using arrays rather than collections; using primitive types rather than objects.
- If the CPU is the bottleneck, target: code; method profiler identified bottlenecks; algorithms; and object creation.
- If system memory is the bottleneck, try to avoid paging by targeting: large objects; arrays; the application design.
- If disk I/O is the bottleneck, identify the problem and eliminate it.
- Ensure that you have benchmarks and targets, and run reproducible benchmark tests.
- Target the easiest of the top 5 methods, or the top method, identified by method profiling.
- Repeat profile, fix, benchmark iterative process.
- Avoid runtime String concatenation. Use StringBuffer instead.
- Local variables (method arguments and temporaries) remain on the stack and are much faster than heap variables (static, instance & new objects).
- Use strength reduction: "x = x + 5" -> "x += 5"; "y = x/2" -> "y = x >> 1"; "z = x * 4" -> "z = x << 2".
- Reuse threads by pooling threads.
- Use Buffered I/O classes.
- Method synchronization is slightly faster than block synchronization (and produces smaller bytecode).
- Optimize after profiling the functional application, not before.
- Obfuscators can make class files smaller.
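The strength-reduction tip above deserves one caveat worth verifying for yourself: the shift forms are only exact equivalents of multiply/divide by powers of two for non-negative values (for negative odd numbers, `x >> 1` rounds toward negative infinity while `x / 2` rounds toward zero). A quick check, with method names of my own invention:

```java
public class StrengthReduction {
    public static int half(int x)      { return x >> 1; } // replaces x / 2
    public static int quadruple(int x) { return x << 2; } // replaces x * 4

    public static void main(String[] args) {
        System.out.println(half(12) + " " + quadruple(12)); // 6 48
        // rounding difference for negative odd values:
        System.out.println((-3 >> 1) + " vs " + (-3 / 2));  // -2 vs -1
    }
}
```

Modern JITs generally perform these reductions themselves, so hand-applying them mainly matters where no JIT is available.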
Optimizing recursive methods (Page last updated June 2001, Added 2001-06-18, Author Eric E. Allen). Tips:
- Try to convert recursive methods into tail-recursive methods.
- You can test whether a particular JIT converts tail-recursive calls into loops with a dummy tail-recursive method which never terminates. If the JVM crashes with a stack overflow, no conversion is done (if the conversion is made, the JVM loops and never terminates).
- The HotSpot JVM with the 1.3 release does not convert tail-recursive methods into loops. The IBM JVM with the 1.3 release does.
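Since (per the tips above) the 1.3 HotSpot JVM does not perform the conversion, deep recursions are safer rewritten as loops by hand. My own illustration of the transformation, using the standard accumulator-style factorial:

```java
public class TailRecursion {
    // Tail-recursive: the recursive call is the very last action, with the
    // running result carried in the accumulator argument.
    public static long factorial(long n, long acc) {
        if (n <= 1) return acc;
        return factorial(n - 1, acc * n);
    }

    // The mechanical loop conversion: the accumulator becomes a local
    // variable and the recursive call becomes the next loop iteration.
    public static long factorialLoop(long n) {
        long acc = 1;
        for (; n > 1; n--) acc *= n;
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(factorial(10, 1) + " " + factorialLoop(10));
    }
}
```

The loop form uses constant stack space regardless of n, which is exactly what the missing JIT conversion would have bought you.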
Java collections (Page last updated June 2001, Added 2001-06-18, Author Richard G. Baldwin). Tips:
- Choose the right structure for the right job.
- ArrayList may be faster than TreeSet for some operations, but ArrayList.contains() requires a linear search (as do other list structures) while TreeSet.contains() is an O(log n) tree lookup (and HashSet.contains() a hashed lookup), so the latter are much faster for large collections.
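A small sketch of my own to make the contains() comparison concrete: all three structures give the same answer, but ArrayList scans every element in the worst case, TreeSet walks a balanced tree, and HashSet does a hashed bucket lookup.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class ContainsDemo {
    // Builds collections of 0..n-1 and returns the {list, tree, hash}
    // contains() results for probe.
    public static boolean[] lookups(int n, int probe) {
        List list = new ArrayList(n);
        Set tree = new TreeSet();
        Set hash = new HashSet();
        for (int i = 0; i < n; i++) {
            Integer v = new Integer(i);
            list.add(v); tree.add(v); hash.add(v);
        }
        Integer p = new Integer(probe);
        // list: linear scan; tree: O(log n) comparisons; hash: bucket lookup
        return new boolean[] { list.contains(p), tree.contains(p), hash.contains(p) };
    }

    public static void main(String[] args) {
        boolean[] r = lookups(1000, 999);
        System.out.println(r[0] + " " + r[1] + " " + r[2]);
    }
}
```

Time the three calls with n in the millions and the "choose the right structure for the right job" tip becomes vivid: the list lookup degrades linearly while the set lookups barely change.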
Computational planning and scheduling problem solving (not performance tuning) (Page last updated June 2001, Added 2001-06-18, Author Irvin Lustig). Tips:
- [Article introduces the solving of planning and scheduling problems in Java]
Last Updated: 2021-04-28
Copyright © 2000-2021 Fasterj.com. All Rights Reserved.
All trademarks and registered trademarks appearing on JavaPerformanceTuning.com are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. JavaPerformanceTuning.com is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.
RSS Feed: http://www.JavaPerformanceTuning.com/newsletters.rss