Java Performance Tuning
Tips March 25th, 2002
Back to newsletter 016 contents
Microtuning (Page last updated March 2002, Added 2002-03-25, Author Jack Shirazi). Tips:
- Performance is dependent on data as well as code. Different data can make identical code perform very differently.
- Always start tuning with a baseline measurement.
- The System.currentTimeMillis() method is the most basic measuring tool for tuning.
- You may need to repeatedly call a method in order to reliably measure its average execution time.
- Minimize the possibility that CPU time will be allocated to anything other than the test while it is running by ensuring no other processes are running during the test and that the test remains in the foreground.
- Baseline measurements normally show some useful information, e.g. the average execution time for one call to a method.
- Multiplying the average time taken to execute a method or sequence of methods by the number of times that sequence will be called in a time period gives you an estimate of the total time the sequence takes in that period, and hence of the fraction of the period it uses.
- There are three routes to tuning a method: Consider unexpected differences in different test runs; Analyze the algorithm; Profile the method.
- Creating an exception is costly, because the stack trace has to be filled in.
- A profiler should ideally be able to take a snapshot of performance between two arbitrary points.
- Tuning is an iterative process: you normally find one bottleneck, make changes that improve performance, test those changes, and then start again.
- Algorithm changes usually provide the best speedup, but can be difficult to find.
- Examining the code for the causes of the differences in speed between two variations of test runs can be useful, but is restricted to those tests for which you can devise alternatives that show significant timing variations.
- Profiling is always an option and almost always finds something that can be sped up. But the law of diminishing returns kicks in after a while, leaving you with bottlenecks that are not worth speeding up, because the potential speedup is too small for the effort required.
- Generic integer parsing (as with the Integer constructors and methods) may be overkill for converting simple integer formats.
- Simple static methods are probably best left to be inlined by the JIT compiler rather than by hand.
- String.equals() is expensive if you are only testing for an empty string. It is quicker to test if the length of the string is 0.
- Set a target speedup to reach. With no target, tuning can carry on for much longer than is needed.
- A generic tuning procedure is: Identify the bottleneck; Set a performance target; Use representative data; Measure the baseline; Analyze the method; Test the change; Repeat.
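The measurement and micro-tuning tips above can be sketched in Java as follows. This is a minimal illustration, not the article's code: the names MicroBench, averageMillis, parseSimpleInt and isEmpty are hypothetical, and parseSimpleInt assumes its input is a non-empty run of ASCII digits with no sign and no overflow, which is exactly the restriction that lets it skip the generality of Integer.parseInt(). Because System.currentTimeMillis() is coarse, the harness divides one long run by the iteration count.

```java
/**
 * Hypothetical microtuning helpers: a currentTimeMillis() baseline timer,
 * a restricted integer parser, and a length-based empty-string test.
 */
final class MicroBench {
    // Results are accumulated here so the JIT cannot discard the timed work as dead code.
    static volatile long sink;

    /** Average elapsed milliseconds per call, measured over many iterations. */
    static double averageMillis(Runnable task, int iterations) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < iterations; i++) {
            task.run();
        }
        long elapsed = System.currentTimeMillis() - start;
        return (double) elapsed / iterations;
    }

    /**
     * Parses a non-empty string of ASCII digits only: no sign, no whitespace,
     * no overflow check. Far less general than Integer.parseInt().
     */
    static int parseSimpleInt(String s) {
        int value = 0;
        for (int i = 0; i < s.length(); i++) {
            value = value * 10 + (s.charAt(i) - '0');
        }
        return value;
    }

    /** Empty-string test by length instead of "".equals(s). */
    static boolean isEmpty(String s) {
        return s.length() == 0;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        double generic = averageMillis(() -> sink += Integer.parseInt("12345"), n);
        double simple = averageMillis(() -> sink += parseSimpleInt("12345"), n);
        System.out.println("Integer.parseInt: " + generic + " ms/call");
        System.out.println("parseSimpleInt:   " + simple + " ms/call");
    }
}
```

Run main() a few times on an otherwise idle machine and treat the figures as the baseline; on HotSpot the numbers may only settle once the code has been compiled.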
JMS redelivery (Page last updated March 2002, Added 2002-03-25, Author Prakash Malani). Tips:
- Both auto-acknowledge mode (Session.AUTO_ACKNOWLEDGE) and duplicates-OK mode (Session.DUPS_OK_ACKNOWLEDGE) guarantee delivery of messages, but duplicates-OK mode can have higher throughput, at the cost of occasionally delivering a message more than once.
- A maximum redelivery count should be specified to avoid messages being redelivered indefinitely.
Caching SOAP services (Page last updated March 2002, Added 2002-03-25, Author Ozakil Azim and Araf Karsh Hamid). Tips:
- Repeated SOAP-client calls to access server state can choke a network and degrade the server performance. Cache data on the client whenever possible to avoid requests to the server.
- Ensure the client data remains up to date by using a call to a server service which blocks until data is changed.
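A sketch of a client-side cache kept fresh by a blocking change notification, assuming an in-process stand-in for the server: VersionedService, Snapshot, CachingClient and awaitChange() are hypothetical names, and a real client would make a SOAP call where this code calls awaitChange(). Reads are answered from the local copy; data only crosses the "network" when the server reports a change.

```java
/** Immutable snapshot of the server data plus a version number. */
final class Snapshot {
    final String value;
    final long version;

    Snapshot(String value, long version) {
        this.value = value;
        this.version = version;
    }
}

/** Hypothetical in-process stand-in for the server-side SOAP service. */
final class VersionedService {
    private Snapshot current;

    VersionedService(String initial) {
        current = new Snapshot(initial, 0);
    }

    synchronized Snapshot snapshot() {
        return current;
    }

    synchronized void update(String value) {
        current = new Snapshot(value, current.version + 1);
        notifyAll(); // wake any clients blocked in awaitChange()
    }

    /** Blocks until the version differs from knownVersion or the timeout expires. */
    synchronized Snapshot awaitChange(long knownVersion, long timeoutMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (current.version == knownVersion) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                break;
            }
            wait(remaining);
        }
        return current;
    }
}

/** Client that serves reads from its cache and refreshes only when the server reports a change. */
final class CachingClient {
    private final VersionedService server;
    private volatile Snapshot cached;

    CachingClient(VersionedService server) {
        this.server = server;
        this.cached = server.snapshot();
    }

    /** Served locally: no request to the server. */
    String read() {
        return cached.value;
    }

    /** One iteration of the refresh loop; returns true if newer data arrived. */
    boolean refreshOnce(long timeoutMillis) {
        try {
            Snapshot latest = server.awaitChange(cached.version, timeoutMillis);
            boolean changed = latest.version != cached.version;
            cached = latest;
            return changed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }
}
```

In a real client refreshOnce() would run in a loop on a background thread, so keeping the cache current costs one waiting request rather than a stream of polling requests.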
String concatenation, and IO performance. (Page last updated March 2002, Added 2002-03-25, Author Glen McCluskey). Tips:
- String concatenation with '+' is implemented by the Sun compiler using StringBuffer, but each concatenation statement creates a new StringBuffer, so repeated concatenation (for example in a loop) is inefficient compared with appending to a single explicit StringBuffer.
- Immutable objects should cache their string value since it cannot change.
- Operating systems can keep files in their own file cache in memory, and accessing such a memory-cached file is much faster than accessing from disk. Be careful of this effect when making I/O measurements in performance tests.
- Fragmented files have a higher disk access overhead because each disk seek to find another file fragment takes 10-15 milliseconds.
- Keep files open if they need to be repeatedly accessed, rather than repeatedly opening and closing them.
- Use buffering when accessing file contents.
- Explicit buffering (reading data into an array) gives you direct access to the array of data which lets you iterate over the elements more quickly than using a buffered wrapper class.
- Counting lines can be done faster using explicit buffering (rather than the readLine() method), but this requires line endings to be identified explicitly, rather than relying on the library method to recognize line endings in a system-independent way.
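Explicit buffering for line counting might look like the sketch below (LineCounter is a hypothetical name). It reads raw bytes into its own array and scans for '\n', so it assumes '\n' or "\r\n" line endings in a byte-oriented encoding such as ASCII or UTF-8; a lone '\r' line ending would not be recognized, which is the system dependence the tip warns about. The IOException is wrapped only to keep the example's signature short.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class LineCounter {
    /**
     * Counts lines by scanning an explicit byte[] buffer for '\n'.
     * A final line with no trailing '\n' is counted, as readLine() would count it.
     */
    static int countLines(InputStream in) {
        byte[] buf = new byte[8192];
        int lines = 0;
        boolean partialLine = false; // true if bytes were seen since the last '\n'
        try {
            int n;
            while ((n = in.read(buf)) != -1) {
                for (int i = 0; i < n; i++) {
                    if (buf[i] == '\n') {
                        lines++;
                        partialLine = false;
                    } else {
                        partialLine = true;
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // keeps the example's signature simple
        }
        return partialLine ? lines + 1 : lines;
    }
}
```

A FileInputStream can be passed in directly: the method does its own block reads, so wrapping the stream in a BufferedInputStream would add nothing.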
Sun community chat on EJBs with Pravin Tulachan (Page last updated March 2002, Added 2002-03-25, Author Edward Ort). Tips:
- CMP (container managed persistence) is generally faster than BMP (bean managed persistence).
- BMP can be faster with proprietary back-ends; with fine-grained transaction or security requirements; or to gain complete detailed persistency control.
- Scalability is improved by passing primary keys rather than passing the entities across the network.
- EJB 2.0 CMP is far faster than EJB 1.1 CMP. EJB 1.1 CMP was not necessarily capable of scaling to high transaction volumes.
- If entity beans provide insufficient performance, session beans should be used instead.
- Don't make fine-grained method calls across the network. Use value object and session facade design patterns instead.
Double-if on multi-CPU (Page last updated February 2002, Added 2002-03-25, Author Phil Vickers). Tips:
- Double-if logic (double-checked locking) fails on multiple-CPU machines. You need to synchronize around the whole double-if check for consistent results, though this may be inefficient.
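The fix described above, synchronizing the whole check, is sketched below next to the initialization-on-demand holder idiom, a different technique that relies on the JVM's thread-safe class initialization instead of an explicit lock on every call. ExpensiveConfig, SynchronizedAccess and HolderAccess are hypothetical names.

```java
/** Hypothetical expensive-to-create object. */
final class ExpensiveConfig {
    ExpensiveConfig() {
        // costly setup would go here
    }
}

final class SynchronizedAccess {
    private static ExpensiveConfig instance;

    // Broken "double-if" form, for comparison only (do not use):
    //   if (instance == null) { synchronized (lock) { if (instance == null) instance = new ExpensiveConfig(); } }
    // The unsynchronized first read can see a non-null reference on one CPU
    // before the constructor's writes are visible to it.

    /** Correct: every check and the assignment happen under the same lock. */
    static synchronized ExpensiveConfig getInstance() {
        if (instance == null) {
            instance = new ExpensiveConfig();
        }
        return instance;
    }
}

/** Alternative: Holder is initialized once, thread-safely, on first access. */
final class HolderAccess {
    private static final class Holder {
        static final ExpensiveConfig INSTANCE = new ExpensiveConfig();
    }

    static ExpensiveConfig getInstance() {
        return Holder.INSTANCE;
    }
}
```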
Stateful to Stateless Bean (Page last updated February 2002, Added 2002-03-25, Author Brett McLaughlin). Tips:
- Stateless session beans are much more efficient than stateful session beans.
- Stateless session beans have no state. Most containers keep pools of stateless beans. Each stateless bean instance can serve multiple clients (one after another), so the bean pool can be kept small and doesn't need to change in size, avoiding the main pooling overheads.
- A separate stateful bean instance must exist for every client, making bean pools larger and more variable in size.
- [Article discusses how to move a stateful bean implementation to a stateless bean implementation].
Alternatives to using 'new'. (Page last updated March 2002, Added 2002-03-25, Author Jonathan Amsterdam). Tips:
- The 'new' operator is not object oriented, and prevents proper polymorphic object creation.
- To limit the number of objects of a class, its constructors must be made non-public, preferably private.
- The Singleton pattern and the Flyweight (object factory) pattern are useful to limit numbers of objects of various types and to assist with object reuse and reduce garbage collection.
- The real-time specification for Java allows 'new' to allocate objects in a 'current memory region', which may be other than the heap. Each such region is a type of MemoryArea, which can manage allocation.
- Using variables to provide access to limited numbers of objects is efficient, but a maintenance problem if you need to change the object access pattern, for example from a global singleton to a ThreadLocal Singleton.
- A non-static factory method is polymorphic and so provides many advantages over static factory methods.
- The Abstract Factory design pattern uses a single class to create more than one kind of object.
- An alternative to the Flyweight pattern is the Prototype pattern, which allows polymorphic copies of existing objects. The Object.clone() method signature provides support for the Prototype pattern.
- Prototypes are useful when object initialization is expensive, and you anticipate few variations on the initialization parameters. Then you could keep already-initialized objects in a table, and clone an existing object instead of expensively creating a new one from scratch.
- Immutable objects can be returned directly when using Prototyping, avoiding the copying overhead.
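A minimal Prototype registry, assuming a hypothetical Report class whose construction is expensive; Report, PrototypeRegistry and their methods are illustrative names only. create() clones a pre-initialized prototype instead of running the constructor again, and clone() deep-copies the mutable array so the copies do not share state.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical object whose initialization is expensive. */
final class Report implements Cloneable {
    private int[] table; // not final: clone() replaces it with a copy

    Report(int size) {
        table = new int[size];
        for (int i = 0; i < size; i++) {
            table[i] = i * i; // stands in for costly initialization
        }
    }

    @Override
    public Report clone() {
        try {
            Report copy = (Report) super.clone(); // shallow field copy, no constructor run
            copy.table = table.clone();           // deep-copy the mutable array
            return copy;
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: Report implements Cloneable
        }
    }

    int entry(int i) {
        return table[i];
    }

    void setEntry(int i, int value) {
        table[i] = value;
    }
}

/** Keeps already-initialized prototypes and hands out copies of them. */
final class PrototypeRegistry {
    private final Map<String, Report> prototypes = new HashMap<>();

    void register(String key, Report prototype) {
        prototypes.put(key, prototype);
    }

    Report create(String key) {
        Report prototype = prototypes.get(key);
        if (prototype == null) {
            throw new IllegalArgumentException("no prototype registered for " + key);
        }
        return prototype.clone();
    }
}
```

For an immutable class, create() could return the registered prototype itself, since no caller can change it.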
Tuning JVMs for servers. (Page last updated February 2002, Added 2002-03-25, Author Timothy Dyck). Tips:
- Multiple JVMs are often available for a particular platform. Choose the JVM that best suits your needs.
- The test here found that setting the minimum and maximum heap sizes (-Xms and -Xmx) to the same value provided the best performance.
- Limiting each Sun 1.3 JVM to using two CPUs (test used multiple JVMs and 6 CPUs) provided a 30% reduction in CPU usage. IBM JVMs did not require (or benefit from) this optimization.
Object Resource Pooling (Page last updated March 2002, Added 2002-03-25, Author Paul King). Tips:
- If the overhead associated with creating a sharable resource is expensive, that resource is a good candidate for pooling.
- A pool creates resources in advance and stores them away so they can be reused over and over.
- Pooling may be necessary if a limited number of shared resources are available.
- Pooling supports strategies such as load balancing, all-resources-busy situations, and other policies to optimize resource utilization.
- [Article discusses pooling characteristics].
- Load balancing is possible by varying how pooled objects are handed out.
- Pool size can be tuned using low-water and high-water marks.
- Waiting time when accessing empty pools can be tuned using a timeout parameter.
- Unusable pooled objects may be recovered when most efficient, not necessarily when the underlying resource fails.
- The Recycler pattern fixes only the broken parts of a failed object, to minimize the replacement cost.
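A small pool showing advance creation, a high-water limit on the number of objects, and a timeout when the pool is exhausted. SimplePool and its methods are hypothetical; it does not implement load balancing, shrinking back to the low-water mark, or recovery of broken objects, and it assumes callers release what they acquire.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Minimal bounded pool: lowWater objects are pre-created, at most highWater are ever created. */
final class SimplePool<T> {
    private final BlockingQueue<T> idle;
    private final Supplier<T> factory;
    private final int highWater;
    private final AtomicInteger created = new AtomicInteger();

    SimplePool(Supplier<T> factory, int lowWater, int highWater) {
        if (lowWater < 0 || highWater < 1 || lowWater > highWater) {
            throw new IllegalArgumentException("need 0 <= lowWater <= highWater, highWater >= 1");
        }
        this.factory = factory;
        this.highWater = highWater;
        this.idle = new ArrayBlockingQueue<>(highWater);
        for (int i = 0; i < lowWater; i++) { // create resources in advance
            idle.add(factory.get());
            created.incrementAndGet();
        }
    }

    /** Returns an idle object, creates one if below highWater, else waits up to the timeout (null on timeout). */
    T acquire(long timeout, TimeUnit unit) {
        T obj = idle.poll();
        if (obj != null) {
            return obj;
        }
        for (int n = created.get(); n < highWater; n = created.get()) {
            if (created.compareAndSet(n, n + 1)) {
                return factory.get(); // grow the pool, still below the high-water mark
            }
        }
        try {
            return idle.poll(timeout, unit); // pool exhausted: wait for a release
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    /** Hands an object back for reuse. */
    void release(T obj) {
        idle.offer(obj);
    }

    int createdCount() {
        return created.get();
    }
}
```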
Using NIO (Page last updated March 2002, Added 2002-03-25, Author Aruna Kalagnanam and Balu G.). Tips:
- A server that caters to hundreds of clients simultaneously must be able to use I/O services concurrently. Prior to 1.4, an almost one-to-one ratio of threads to clients made servers written in Java susceptible to enormous thread overhead, resulting in both performance problems and lack of scalability.
- The Reactor design pattern demultiplexes events and dispatches them to registered object handlers. (The Observer pattern is similar, but handles only a single source of events where the Reactor pattern handles multiple event sources).
- [Article covers the changes needed to use java.nio to make a server efficiently multiplex non-blocking I/O from SDK 1.4].
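A sketch of the Reactor pattern on java.nio: one Selector demultiplexes readiness events from many channels and dispatches each to the handler stored as the key's attachment. Reactor, ReadyHandler and pipeDemo() are hypothetical names; the demo uses a java.nio.channels.Pipe in place of client sockets so it runs without a network. A server would register a ServerSocketChannel for OP_ACCEPT and each accepted SocketChannel for OP_READ in the same way, all serviced by one thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

/** Callback invoked by the reactor when its channel is ready. */
interface ReadyHandler {
    void handle(SelectionKey key) throws IOException;
}

final class Reactor {
    private final Selector selector;

    Reactor() throws IOException {
        selector = Selector.open();
    }

    /** Registers a channel in non-blocking mode, attaching its handler to the key. */
    void register(SelectableChannel channel, int ops, ReadyHandler handler) throws IOException {
        channel.configureBlocking(false);
        channel.register(selector, ops, handler);
    }

    /** Waits up to timeoutMillis, then dispatches every ready key to its handler. */
    int runOnce(long timeoutMillis) throws IOException {
        selector.select(timeoutMillis);
        int dispatched = 0;
        Iterator<SelectionKey> it = selector.selectedKeys().iterator();
        while (it.hasNext()) {
            SelectionKey key = it.next();
            it.remove(); // the selector never clears the selected-key set itself
            if (key.isValid()) {
                ((ReadyHandler) key.attachment()).handle(key);
                dispatched++;
            }
        }
        return dispatched;
    }

    void close() throws IOException {
        selector.close();
    }

    /** Demo: one pipe, one handler that collects whatever arrives. */
    static String pipeDemo(String message) {
        try {
            Reactor reactor = new Reactor();
            Pipe pipe = Pipe.open();
            StringBuilder received = new StringBuilder();
            reactor.register(pipe.source(), SelectionKey.OP_READ, key -> {
                ByteBuffer buf = ByteBuffer.allocate(256);
                int n = ((Pipe.SourceChannel) key.channel()).read(buf);
                if (n > 0) {
                    received.append(new String(buf.array(), 0, n, StandardCharsets.UTF_8));
                }
            });
            pipe.sink().write(ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8)));
            reactor.runOnce(1000);
            pipe.sink().close();
            pipe.source().close();
            reactor.close();
            return received.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```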
J2EE best practices. (Page last updated February 2002, Added 2002-03-25, Author Chris Peltz). Tips:
- Executing a search against the database calls one of the entity bean's finder() methods. finder() methods must return a collection of remote interfaces, not ValueObjects, so the client would need to make a separate remote call on each remote interface received to acquire its data. The SessionFacade pattern suggests using a session bean to encapsulate the query and return a collection of ValueObjects, making the request a single transfer each way.
- The Value Object Assembler pattern uses a Session EJB to aggregate all required data as various types of ValueObjects. This pattern is used to satisfy one or more queries a client might need to execute in order to display multiple data types.
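Plain Java can show the shape of the ValueObject and SessionFacade idea without an EJB container. CustomerValue, CustomerFacade and InMemoryCustomerFacade are hypothetical; in a J2EE application the facade would be a session bean's remote interface. The point is that one call runs the query and returns all matching data as serializable value objects, instead of one remote call per row.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical value object: plain serializable data, copied to the client in one transfer. */
final class CustomerValue implements Serializable {
    private static final long serialVersionUID = 1L;
    final String id;
    final String name;

    CustomerValue(String id, String name) {
        this.id = id;
        this.name = name;
    }
}

/** Hypothetical coarse-grained facade: one call per query, not one per entity. */
interface CustomerFacade {
    List<CustomerValue> findByNamePrefix(String prefix);
}

/** In-memory stand-in for the session bean that would run the query server-side. */
final class InMemoryCustomerFacade implements CustomerFacade {
    private final List<CustomerValue> rows = new ArrayList<>();

    void add(CustomerValue customer) {
        rows.add(customer);
    }

    @Override
    public List<CustomerValue> findByNamePrefix(String prefix) {
        List<CustomerValue> result = new ArrayList<>();
        for (CustomerValue c : rows) {
            if (c.name.startsWith(prefix)) {
                result.add(c);
            }
        }
        return result; // the whole result set travels back in one response
    }
}
```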
MIDP GUI programming (Page last updated March 2002, Added 2002-03-25, Author Qusay Mahmoud). Tips:
- Applications with high screen performance needs, like games, need finer control over MIDP screens and should use the low-level API of the javax.microedition.lcdui package (the Canvas and Graphics classes).
- Always check the drawing area dimensions using Canvas.getHeight() and Canvas.getWidth() [so that you don't draw unnecessarily off screen].
- Not all devices support color. Use Display.isColor() and Display.numColors() to determine color support and avoid color-mapping [overheads].
- Double buffering is possible by using an offscreen Image the size of the screen:
    creating the image: Image i = Image.createImage(width, height);
    getting the Graphics context for drawing: Graphics offscreen = i.getGraphics();
    copying to the screen (in paint(Graphics g)): g.drawImage(i, 0, 0, Graphics.TOP | Graphics.LEFT);
- Check with Canvas.isDoubleBuffered(), and don't double-buffer if the MIDP implementation already does it for you.
- To avoid deadlock, paint() should not synchronize on any object that is already locked when serviceRepaints() is called.
- Entering alphanumeric data through a handheld device can be tedious. If possible, provide a list of choices from which the user can select.
The Eight Fallacies of Distributed Computing (Page last updated 2000, Added 2002-03-25, Author Peter Deutsch). Tips:
- The network can fail to deliver at any time.
- Latency is significant.
- Bandwidth is always limited.
Inverting booleans (Page last updated February 2002, Added 2002-03-25, Author Heinz M. Kabutz). Tips:
- The fastest way to invert a boolean is to XOR it (bool ^= true).
- Be careful when making performance measurements with HotSpot, because the optimizing compiler can kick in partway through a measurement and change the results.
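The XOR form and its usual alternatives, for comparison (BooleanFlip is a hypothetical name). All three invert the value; which is fastest is a measurement question, and given the HotSpot caution above it should be timed only after the JIT compiler has warmed up.

```java
final class BooleanFlip {
    static boolean viaXor(boolean b) {
        b ^= true; // XOR with true inverts the value
        return b;
    }

    static boolean viaNot(boolean b) {
        return !b;
    }

    static boolean viaTernary(boolean b) {
        return b ? false : true;
    }
}
```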
The Proxy design pattern. (Page last updated February 2002, Added 2002-03-25, Author David Geary). Tips:
- Creating images is expensive.
- ImageIcon instances create their images when they are constructed.
- If an application creates many large images at once, it could cause a significant performance hit.
- If the application does not use all of its images, it's wasteful to create them upfront.
- Using a proxy, you can delay image loading until the image is required.
- The Proxy pattern often instantiates its real object; the Decorator pattern (which can also use proxy objects) rarely does.
- The java.lang.reflect package provides three types to support the Proxy and Decorator patterns: the Proxy and Method classes and the InvocationHandler interface.
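A lazy-loading (virtual) proxy built with java.lang.reflect.Proxy and InvocationHandler. Picture and LazyProxy are hypothetical names; the Supplier stands in for expensive image loading, which is deferred until the first method call on the proxy. The sketch is not thread-safe, and calls to toString(), equals() or hashCode() on the proxy also trigger creation.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.function.Supplier;

/** Hypothetical interface for something expensive to create, such as a large image. */
interface Picture {
    int width();
}

final class LazyProxy {
    /** Returns a proxy that creates the real object on the first method call (not thread-safe). */
    static <T> T lazy(Class<T> type, Supplier<? extends T> factory) {
        InvocationHandler handler = new InvocationHandler() {
            private T target; // the real object, created on demand

            @Override
            public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                if (target == null) {
                    target = factory.get();
                }
                try {
                    return method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    throw e.getCause(); // rethrow the real object's own exception
                }
            }
        };
        Object proxy = Proxy.newProxyInstance(type.getClassLoader(), new Class<?>[] {type}, handler);
        return type.cast(proxy);
    }
}
```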
Java Transaction Service (Page last updated March 2002, Added 2002-03-25, Author Brian Goetz). Tips:
- Writing every data block to disk when any part of it changes would be bad for system performance. Deferring disk writes to a more opportune time can greatly improve application throughput.
- Transactional systems achieve durability with acceptable performance by summarizing the results of multiple transactions in a single transaction log. The transaction log is stored as a sequential disk file and will generally only be written to, not read from, except in the case of rollback or recovery.
- Writing an update record to a transaction log requires less total data to be written to disk (only the data that has changed needs to be written) and fewer disk seeks.
- Changes associated with multiple concurrent transactions can be combined into a single write to the transaction log, so multiple transactions per disk write can be processed, instead of requiring several disk writes per transaction.
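The group-commit idea (records from several transactions combined into one sequential log write) can be sketched as below. GroupCommitLog is a hypothetical name and the OutputStream stands in for the log file; a real transaction manager would also force the data to disk (for example with FileChannel.force(false)) before reporting the batch as committed.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class GroupCommitLog {
    private final OutputStream log; // append-only, written sequentially
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
    private int pendingRecords = 0;
    private int physicalWrites = 0;

    GroupCommitLog(OutputStream log) {
        this.log = log;
    }

    /** Buffers one transaction's update record; nothing reaches the log yet. */
    synchronized void append(String txId, String change) {
        byte[] record = (txId + ":" + change + "\n").getBytes(StandardCharsets.UTF_8);
        pending.write(record, 0, record.length);
        pendingRecords++;
    }

    /** Writes all pending records with a single write; returns how many records were committed. */
    synchronized int commitBatch() {
        if (pendingRecords == 0) {
            return 0;
        }
        try {
            pending.writeTo(log); // one write call covering every buffered transaction
            log.flush();          // a real log would also force the data to disk here
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        physicalWrites++;
        int committed = pendingRecords;
        pending.reset();
        pendingRecords = 0;
        return committed;
    }

    synchronized int physicalWrites() {
        return physicalWrites;
    }
}
```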
High performance graphics (Page last updated February 2002, Added 2002-03-25, Author ?). Tips:
- The large number of extra features and the increased cross-platform compatibility added to the Java graphics framework in SDK 1.2 made graphics slower than in 1.1. SDK 1.4 targeted these performance issues head on.
- VolatileImage allows you to create hardware-accelerated offscreen images, resulting in better performance of Swing and gaming applications in particular and faster offscreen rendering.
- When filling a shape with a complex paint, Java 2D must query the Paint object every time it needs to assign a color to a pixel whereas a simple color fill only requires iterating through the pixels and assigning the same color to all of them.
- The graphics pipeline (from SDK 1.4) only gets invalidated when an attribute is changed to a different type of value, rather than when an attribute is changed to a different value of the same type. For example, rendering one opaque color is the same as rendering another opaque color, so switching between them does not invalidate the pipeline. But changing an opaque color to a transparent color would invalidate the pipeline.
- Smaller fonts are rendered faster than larger fonts.
- Hardware-accelerated scaling is currently (1.4.0 release) disabled on Win32 because of quality problems, but you can enable it with a runtime flag, -Dsun.java2d.ddscale=true.
- From SDK 1.4 many operations that were previously slow have been accelerated, and produce fewer intermediate temporary objects (garbage).
- Alpha blending and anti-aliasing adversely affect performance.
- Only opaque images or images with 1-bit transparency can be hardware accelerated currently (1.4.0).
- Use 1-bit transparency to make the background color of a sprite rectangle transparent so that the character rendered in the sprite appears to move through the landscape of your game, rather than within the sprite box.
- Create images with the same depth and type as the screen to avoid pixel format conversions. Use either Component.createImage() or GraphicsConfiguration.createCompatibleImage(), or use a BufferedImage created with the ColorModel of the screen.
- Rectangular fills--including horizontal and vertical lines--tend to perform better than arbitrary or non-rectangular shapes whether they are rendered in software or with hardware acceleration.
- If your application must repeatedly render non-rectangular shapes, draw the shapes into 1-bit transparency images and copy the images as needed.
- If you experience low frame rates, try commenting out pieces of your code to find the particular operations that are causing problems, and replace these problem operations with something that might perform better.
- Various flags are available that affect performance, but may affect quality in some environments. These include: NO_J2D_DGA (environment variable: no Solaris hardware acceleration); USE_DGA_PIXMAPS (environment variable: use Solaris DGA acceleration of pixmaps); -Dsun.java2d.noddraw=true (turn off DirectDraw); -Dsun.java2d.ddoffscreen=false (disable DirectDraw offscreen acceleration); -Dsun.java2d.ddscale=true (enable hardware-accelerated scaling on Win32); -Dsun.java2d.pmoffscreen=true/false (store images in pixmaps under Unix).
- You can trace graphics performance using the flag -Dsun.java2d.trace=<optionname>,<optionname>,... where the options are:
    log (print primitives on execution)
    timestamp (timestamp log entries)
    count (print total calls of each primitive used)
    out:<filename> (send logs to filename)
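Pre-rendering a non-rectangular shape once into an image with 1-bit transparency, then copying it as needed, could look like this. SpriteCache and renderEllipseSprite() are hypothetical names. When a display is available the sketch asks the screen's GraphicsConfiguration for a compatible Transparency.BITMASK image; in a headless environment it falls back to a TYPE_INT_ARGB BufferedImage, which is not hardware accelerated but still ends up with pixels that are either fully opaque or fully transparent, because anti-aliasing is turned off.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.GraphicsEnvironment;
import java.awt.RenderingHints;
import java.awt.Transparency;
import java.awt.image.BufferedImage;

final class SpriteCache {
    /** Renders a filled ellipse once into an image whose pixels are fully opaque or fully transparent. */
    static BufferedImage renderEllipseSprite(int w, int h, Color color) {
        BufferedImage img;
        if (GraphicsEnvironment.isHeadless()) {
            img = new BufferedImage(w, h, BufferedImage.TYPE_INT_ARGB); // starts fully transparent
        } else {
            img = GraphicsEnvironment.getLocalGraphicsEnvironment()
                    .getDefaultScreenDevice()
                    .getDefaultConfiguration()
                    .createCompatibleImage(w, h, Transparency.BITMASK); // screen format, 1-bit alpha
        }
        Graphics2D g = img.createGraphics();
        try {
            // No anti-aliasing, so edge pixels never get partial alpha.
            g.setRenderingHint(RenderingHints.KEY_ANTIALIASING, RenderingHints.VALUE_ANTIALIAS_OFF);
            g.setColor(color);
            g.fillOval(0, 0, w, h);
        } finally {
            g.dispose();
        }
        return img;
    }
}
```

Each frame then only needs g.drawImage(sprite, x, y, null), a rectangular image copy, instead of rasterizing the ellipse again.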
Minimizing bytecode size for J2ME (Page last updated February 2002, Added 2002-03-25, Author Eric Giguere). Tips:
- Eliminate unnecessary features.
- Avoid inner classes: make the main class implement the required Listener interfaces and handle the callbacks there.
- Use built-in classes if functionality is close enough, and work around their limitations.
- Collapse inheritance hierarchies, even if this means duplicating code.
- Shorten all names (packages, classes, methods, data variables). Some obfuscators can do this automatically. MIDP applications are completely self-contained, so you can use the default package with no possible name-clash.
- Replace array initialization code with code that extracts the data from a binary string or data file. Array initialization generates many bytecodes, as each element is separately initialized.
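Moving array data out of bytecode might look like this, assuming every value fits in 16 bits: each char of a string literal stores one value, and a loop unpacks them at run time. PackedData and its contents are hypothetical. An equivalent array initializer compiles to separate store instructions for every element, whereas the string is stored once in the class file's constant pool.

```java
final class PackedData {
    // One 16-bit value per char. Never use the Unicode escapes for line feed or
    // carriage return in this literal: Java translates them before compiling.
    private static final String LEVEL_DATA = "\u0001\u0002\u0003\u00ff\u1234";

    /** Unpacks one int per char of the packed string. */
    static int[] unpack(String packed) {
        int[] out = new int[packed.length()];
        for (int i = 0; i < out.length; i++) {
            out[i] = packed.charAt(i);
        }
        return out;
    }

    static int[] levelData() {
        return unpack(LEVEL_DATA);
    }
}
```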
GC performance tuning (Page last updated February 2002, Added 2002-03-25, Author Alka Gupta and Michael Doyle). Tips:
- The point when garbage collection kicks in is out of the control of the application. This can cause a sequential overhead on the application, as the garbage collector suspends all application threads when it runs, causing inconsistent and unacceptable application pauses, leading to high latency and decreased application efficiency.
- Verbose GC output (-verbose:gc) provides detailed logs of garbage collector activity.
- The live "transient memory footprint" of an application is (garbage generated per call) * (duration of the call) * (number of calls per second).
- GC pause time caused by two-space collection of short-lived objects is directly proportional to the size of the memory space allocated to holding short-lived objects. But smaller available space can mean more frequent GCs.
- Higher-frequency GC of short-lived objects can inadvertently promote short-lived objects to "old" space where longer-lived objects reside [because if the object stays in the short-lived object area for several GCs, the GC decides it is long-lived]. The promoteAll option forces the GC to assume that any object surviving GC of young space is long-lived, so it is immediately promoted to old space.
- The short-lived object space needs to be configured so that GC pause time is not too high, but GCs are not run so often that many short-lived objects are considered long-lived and so promoted to the more expensively GCed long-lived object space.
- The long-lived object space needs to be large enough to avoid an out-of-memory error, but not so large that a full GC of old space pauses the JVM for too long.
- [Article covers 1.2 and 1.3 GC memory space models].
- A significant GC value to focus on is the GC sequential overhead, which is the percentage of the system time during which GC is running and application threads are suspended: (sequential GC pause time added together) * 100 / (total application run time).
- The concurrent garbage collector runs only most of the "old" space GC concurrently: part of the "old" space GC, and all of the "young" space GC, is still sequential.
- GC activity can take hours to settle down to its final pattern. Fragmentation of old space can cause GC times to degrade, and it may take a long time for the old space to become sufficiently fragmented to show this behavior.
- Some GC options can reduce fragmentation (such as bestFitFirst).
- The promoteAll option produced a significant improvement in performance [which I find curious].
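The two GC formulas above, written out as code with hypothetical figures (50,000 bytes of garbage per call, 200 ms per call, 100 calls per second; 6 seconds of total pause in a 120-second run). GcMath is a hypothetical name; durations are in milliseconds, so the footprint calculation divides by 1000 to keep the result in bytes.

```java
final class GcMath {
    /** Live transient footprint (bytes) = garbage per call * call duration * calls per second. */
    static long transientFootprintBytes(long garbageBytesPerCall, long callDurationMillis, long callsPerSecond) {
        // Duration is in ms, so divide by 1000 to convert it to seconds.
        return garbageBytesPerCall * callDurationMillis * callsPerSecond / 1000;
    }

    /** Sequential overhead (%) = total stop-the-world pause time * 100 / total run time. */
    static double sequentialOverheadPercent(long totalPauseMillis, long totalRunMillis) {
        return totalPauseMillis * 100.0 / totalRunMillis;
    }

    public static void main(String[] args) {
        // Hypothetical figures, for illustration only.
        System.out.println(transientFootprintBytes(50_000, 200, 100)); // 1000000
        System.out.println(sequentialOverheadPercent(6_000, 120_000)); // 5.0
    }
}
```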
Last Updated: 2017-10-01
Copyright © 2000-2017 Fasterj.com. All Rights Reserved.
All trademarks and registered trademarks appearing on JavaPerformanceTuning.com are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. JavaPerformanceTuning.com is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.