Java Performance Tuning
Tips November 26th 2002
Back to newsletter 024 contents
WebSphere performance testing (Page last updated November 2002, Added 2002-11-27, Author Alexandre Polozoff, Publisher IBM). Tips:
- Performance testing is the only way to determine the optimal settings (for JVM, connection pooling, etc.)
- The test environment should come as close to the production environment as possible. At the very least, the two environments should have the same machine and OS level configurations.
- Common test environment mistakes include putting one application server on a different OS patch or fixpack level than the other, or giving them different memory configurations, resulting in inconsistent results and/or behavior. Double-check that the TCP/IP stack settings are identical as well, particularly the duplex settings on the NICs.
- If HTTP Session persistence is enabled, make sure that the Sessions table is isolated from other databases and marked as VOLATILE.
- Testing should be done in isolation from other activity, to minimize unexpected competition for resources from non-application sources.
- Problems of shared test environments include: someone else using a large portion of network capacity, e.g. from a network backup; and other applications utilizing or changing backend resources.
- When the application is running at full capacity, introducing still more requests actually decreases performance and throughput, so at that point requests should be queued at the front entry point (called gating).
- Tune the maximum number of servlet engine threads by analyzing performance test results. There is no valid "general" value to set this to (the WebSphere default of 25 is normally too low for high volume applications).
- Tune the maximum Connection Pool Size by monitoring the size and adjusting it in different tests. The generally accepted maximum value for data source connections, even in high volume installations, is 40, with the typical application somewhere between 10 and 20.
- Correlate test results with expectations, and make sure that the test results match what you are expecting. If there is a maximum of 75 servlet engine threads and the servlet response time is sub-second, then you would expect to see at least 75 requests per second. You would also not expect to see 8-second response times on the client side. Take into account the number of HTTP connections for static data being made at the same time.
- Make sure that the load test client is not running hot (100% CPU) or low on memory, and that it has not hit some network bottleneck.
- The load test client must be capable of generating the appropriate type of request needed for loading the application in question.
- Always start performance testing by recording a baseline set of results.
- Ensure you are monitoring the application during load testing. Compare measurements with previous ones.
- Monitor as many resources as possible, including CPU, ports in the ESTABLISHED state, and bandwidth. Some key JVM parameters to monitor are: the number of active servlet engine threads; the number of active ORB threads for applications with EJBs; the amount of free and used memory, and the number and duration of garbage collection cycles; the servlet response times.
- Resolve incorrect configurations first (e.g. mis-configured firewalls, reverse proxies, throughput set to half duplex instead of full duplex, routing taking different hops to/from the same set of devices, firewall set up for proxy instead of passthrough).
- Specify performance expectations: servlet response time; client response time; requests per second throughput; etc.
- Plan to fix problems. Note that the two generic solutions have costs: throwing more hardware at the problem can be expensive; fixing the application bottlenecks can take a long time.
- The performance test phase of the development lifecycle can easily take several months to complete, even if the application has few problem issues. Testing early and often within the performance test environment is strongly recommended.
- Waiting until the very end of the development lifecycle to begin load testing is probably the worst performance testing scenario.
- [Article describes some testing procedures for various generic server configurations]
- Adjust the JVM heap size settings in reasonable increments to determine the optimal memory settings for the application. Make sure that the JVM heap size settings are within the physical memory limits of the machine, including all other process memory requirements.
- For each set of JVM heap size settings, run one test with garbage collection turned on, and another with it turned off.
- Monitor the garbage collection cycles by watching how the free and used memory is utilized by the JVM.
- Test the application with a variety of minimum and maximum servlet thread pool sizes to determine which settings move as much work as possible through the application.
- Once the CPU utilization of the application approaches 80%, you are hitting the limits of the CPU.
- Once CPU utilization has reached saturation, increased load only increases response time.
- Tune the size of the ORB thread pool by monitoring the thread pool and EJB activity/response times.
- The client simulation should accurately reflect expected user group usage patterns.
- An application suffering from bottlenecks or excessive synchronization typically exhibits poor response times and low CPU utilization.
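The gating tip above can be sketched in plain Java with a counting semaphore in front of the request handler. This is a minimal sketch, not WebSphere's own mechanism: the class `RequestGate`, its parameters, and the use of `java.util.concurrent.Semaphore` (Java 5+) are assumptions for illustration. Requests beyond `maxConcurrent` wait in a fair queue for up to `maxWaitMillis` and are then turned away with a fallback response instead of overloading the server.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical front-door gate: admits at most maxConcurrent requests at once. */
public class RequestGate {
    private final Semaphore permits;
    private final long maxWaitMillis;

    public RequestGate(int maxConcurrent, long maxWaitMillis) {
        // fair = true: queued requests are admitted in arrival order
        this.permits = new Semaphore(maxConcurrent, true);
        this.maxWaitMillis = maxWaitMillis;
    }

    /**
     * Runs request if a slot frees up within maxWaitMillis; otherwise returns
     * the result of rejected (e.g. a "server busy" response).
     */
    public <T> T handle(Supplier<T> request, Supplier<T> rejected) throws InterruptedException {
        if (!permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS)) {
            return rejected.get();
        }
        try {
            return request.get();
        } finally {
            permits.release(); // always free the slot, even if the request throws
        }
    }

    /** Number of free slots, useful for monitoring during load tests. */
    public int available() {
        return permits.availablePermits();
    }
}
```

The maximum concurrency would be tuned from load-test results, like the servlet thread count.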
Efficient MIDP Programming (Page last updated June 2002, Added 2002-11-27, Author Forum Nokia, Publisher Nokia). Tips:
- Don't try to make all of your code efficient, find the bottlenecks and focus on making them more efficient.
- Careful design and algorithm choice yield greater benefits than line-by-line code optimizations.
- Get to know the performance of the libraries (especially graphics libraries), and choose carefully how you call them.
- Different phones' MIDP implementations vary significantly in their performance characteristics, sometimes even between versions of the same model.
- The best-performing approach on one phone may not be the best on another.
- Profiling a MIDlet running in an emulator may not tell you much, as emulators can have very different bottlenecks from actual phones.
- You normally can't run a profiler on a MIDlet running in a phone. Explicitly determine timings by adding calls to System.currentTimeMillis(). Make sure to check the resolution of the phone's system clock, and adjust the test to be sufficiently longer than the minimum clock resolution.
- Call System.gc() before starting a test.
- If only a small part of the screen needs to change, you should request a repaint using the method Canvas.repaint(int x, int y, int width, int height)
- If you issue repaint requests faster than the device can process them, it may merge several requests into one by calling the paint method with a clip rectangle covering all their rectangles; if the rectangles are widely spaced this will include much area that doesn't need repainting.
- Use an off-screen image if your screen changes only slightly between repaints, then copy the area specified by the Graphics parameter's clip rectangle.
- Avoid creating unnecessary garbage objects on the heap. Often it is easy to reuse existing objects instead.
- A MIDP virtual machine can comfortably garbage-collect thousands of objects per second.
- Java threading is not guaranteed to be pre-emptive, but may be cooperative. Your code should not wait for a condition in a tight loop, but should call Thread.yield() or wait() every time around the loop.
- Both bandwidth and latency have average values and variations. Even if the average value is acceptable, if the variation is large, the user will frequently experience unacceptable values.
- For large amounts of data, bandwidth usually has the most effect on networking speed. For small amounts of data, it is often latency that is more important.
- For current phones long latency rules out the possibility of highly interactive real-time multi-player arcade games, as you can't see and respond to other players' actions in real time.
- Use threading to execute network communications in the background without blocking the interface, wherever appropriate.
- Minimize network round trips by trying to get everything required in one remote request, possibly using a proxy servlet.
- SOAP can be very inefficient. Design any XML-based protocol to be as simple as possible. A custom protocol is likely to be even more efficient.
- Keep the JAR file size as small as possible: have as few classes as possible; avoid interfaces; use the unnamed package; use a pre-processor instead of static final constants; limit the use of static initializers; use an obfuscator.
- Cut-and-paste reuse rather than library calls can help to make the JAR smaller.
- Keep resources (such as PNG) as small as possible. Different tools give different compression factors.
- Combine image files into one image and extract images at runtime.
- Release unneeded screens (e.g. splash screens) for garbage collection.
- Delay creation of rarely used screens (e.g. Help, Options) until needed, and release them as soon as possible.
- Set references to null when they are no longer needed.
- Design the MIDlet to avoid network communications unless absolutely necessary, e.g. only send packets when something changes.
- If you do something that takes a long time, show a visible and animated indicator.
- Keep the user-interface responsive. Make sure that your event call-backs (e.g., Canvas.keyPressed or CommandListener.commandAction) return quickly.
- Make sure that there is a visible (or audible) reaction to each key press.
- Hide unavoidable delays with some other activity.
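The timing advice above (explicit System.currentTimeMillis() calls, checking the clock resolution, calling System.gc() first) might be put together as follows. This is a sketch in Java SE syntax; `MicroTimer`, its methods and the doubling strategy are hypothetical, and on a real MIDP device the operation would be passed as an anonymous Runnable class rather than a lambda.

```java
/** Hypothetical helper for timing code with System.currentTimeMillis(). */
public class MicroTimer {

    /** Estimates the clock tick by measuring the gap between two consecutive changes. */
    public static long clockResolutionMillis() {
        long start = System.currentTimeMillis();
        long first;
        do {
            first = System.currentTimeMillis();
        } while (first == start);          // wait for a tick boundary
        long next;
        do {
            next = System.currentTimeMillis();
        } while (next == first);           // measure one full tick
        return Math.max(1, next - first);
    }

    /**
     * Returns average milliseconds per call of op, doubling the iteration count
     * until one timed run lasts at least minTicks clock ticks.
     */
    public static double timePerCall(Runnable op, int minTicks) {
        long minElapsed = clockResolutionMillis() * minTicks;
        System.gc(); // reduce the chance of a collection during the measurement
        for (long iterations = 1; ; iterations *= 2) {
            long start = System.currentTimeMillis();
            for (long i = 0; i < iterations; i++) {
                op.run();
            }
            long elapsed = System.currentTimeMillis() - start;
            // the iteration cap only guards against an operation too cheap to measure
            if (elapsed >= minElapsed || iterations >= (1L << 40)) {
                return (double) elapsed / iterations;
            }
        }
    }
}
```

Because the result is only trustworthy when each run spans many clock ticks, a coarse phone clock (tens of milliseconds per tick) simply forces more iterations.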
1.4 HotSpot GC (Page last updated November 2002, Added 2002-11-27, Author Alka Gupta and Michael Doyle, Publisher Sun). Tips:
- HotSpot heap is split into Eden (where new objects are created), two Survivor spaces (the three spaces collectively are the young generation space), and an old generation space. The young generation was collected using a copying collector (copy live objects from one space to another, everything left can be reclaimed), optimal for short-lived objects. The old generation used a mark-compact collector (mark all live objects, reclaim what's left then compact the resulting fragmented space).
- Before 1.4.1, GC was single-threaded and stop-the-world in nature. This could cause long pauses in application activity, even on multi-processor systems.
- 1.4.1 includes a new young generation GC algorithm, parallel GC, enabled using -XX:+UseParNewGC (or -XX:+UseParallelGC for applications with very large young generation heaps and no concurrent mark-sweep GC). Parallel GC uses essentially the same algorithm as before, but the stop-the-world phase of the young generation collection is multi-threaded, so the GC can complete more quickly on multiprocessor machines (as the GC algorithm is itself mostly parallelizable). The -XX:ParallelGCThreads=n flag allows the number of GC threads to be explicitly specified; the default is one thread per processor.
- 1.4.1 includes a new old generation GC algorithm, concurrent mark-sweep GC, enabled using -XX:+UseConcMarkSweepGC. Concurrent mark-sweep GC changes how the old generation is collected to minimize the amount of time that the other non-GC threads are suspended. The -XX:CMSInitiatingOccupancyFraction=x flag specifies how full the old generation can get before this GC kicks in, and the parameter should be tuned for the application.
- The flag -XX:MaxTenuringThreshold=n specifies how many times (generations) the objects in the young generation are copied before being moved to the old generation.
- The flag -XX:TargetSurvivorRatio=n specifies how full the survivor space gets before objects are moved to the old generation.
- The following flags control the information printed out to stderr during GC: -verbose:gc (turns on GC logging); -Xloggc:filename (sends logging to filename instead of stderr); -XX:+PrintGCTimeStamps (timestamps GC log entries); -XX:+PrintGCDetails (logs extra GC information); -XX:+PrintTenuringDistribution (logs even more GC information).
- [Article gives an example of interpreting the output from 1.4.1 GC logging].
- [Article provides and describes a tool which can mine the 1.4.1 GC log data].
- On Solaris 8, use the alternate thread library by setting LD_LIBRARY_PATH=/usr/lib/lwp:/usr/lib (this library is the default on Solaris 9).
- Use prstat -Lm -p <jvm process id> to analyze the resource usage of a process on a per-lightweight-process (LWP) basis.
- To get a full thread dump of a running Java application, send a SIGQUIT signal to the JVM process (e.g. kill -QUIT <JVM process pid>).
- Use the -Xrunhprof command line flag available in the JVM to help identify unnecessary object retention (sometimes imprecisely called "memory leaks").
- Minimize the number of times the DatagramSocket.connect(InetAddress address, int port) and DatagramSocket.disconnect() methods are called.
- [Article describes tuning GC for a particular application].
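The 1.4.1 flags above write GC activity to a log. On later JVMs (Java 5 and up) the same counters can also be read in-process through the java.lang.management API, which is handy when a load-test harness needs to record GC counts and times alongside response times. A minimal sketch; `GcSnapshot` is a hypothetical name, and collector names vary by JVM version and GC choice.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Hypothetical helper that summarizes heap usage and GC activity so far. */
public class GcSnapshot {

    public static String describe() {
        StringBuilder sb = new StringBuilder();
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        sb.append("heap used=").append(heap.getUsed() / 1024).append("K committed=")
          .append(heap.getCommitted() / 1024).append("K\n");
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // one bean per collector, e.g. a young and an old generation collector
            sb.append(gc.getName()).append(": collections=").append(gc.getCollectionCount())
              .append(" time=").append(gc.getCollectionTime()).append("ms\n");
        }
        return sb.toString();
    }

    /** Total collections across all collectors; -1 ("undefined") counts are skipped. */
    public static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }
}
```

Sampling these values before and after each test run gives the number and total duration of collections for that run.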
User Interface Design (Page last updated November 2002, Added 2002-11-27, Author Mauro Marinilli, Publisher Developer.com). Tips:
- Focus on the needs of end users. Know your user.
- Slow response is a cause of error and user frustration in using the application.
- If somebody is used to having a task completed in a given amount of time, either an excessive completion time or too short a time can confuse the user.
- Short response times help the user to explore the UI more easily.
- Operations that can be processed using only a person's short-term memory (STM) are easier and faster to solve than those that require long-term memory or some external cognitive help. STM holds 5 to 9 items, each retained for about 15-30 seconds. STM works best when the user feels at ease with the application, has a reassuringly predictable idea of how it works, is not afraid of performing catastrophic operations, and does not feel pressured by the system.
- Following standard designs leverages the knowledge users have gained from other applications.
- Minimize the set of tasks the user needs to carry out to complete their activity.
The final keyword (Page last updated October 2002, Added 2002-11-27, Author Brian Goetz, Publisher IBM). Tips:
- Don't declare things final for performance reasons until as late as possible.
- Declaring methods final does NOT allow the compiler to inline that method. JITs can inline methods whether or not they are final, so there is usually no performance benefit to declaring methods final.
- If you want to optimize your code, stick to optimizations that will make a big difference, like using efficient algorithms and not performing redundant calculations -- and leave the cycle-counting optimizations to the compiler and JVM.
- Unlike with final methods, declaring a final field helps the optimizer make better optimization decisions, because if the compiler knows the field's value will not change, it can safely cache the value in a register.
J2ME benchmarking (Page last updated October 2002, Added 2002-11-27, Author Wang Yi, C.J. Reddy, and Gavin Ang, Publisher JavaWorld). Tips:
- J2ME benchmarking should focus on the total user experience
- The benchmarks presented in the article can help determine which part of a J2ME application is a bottleneck
- Most J2ME devices seem to have similar graphics painting performance.
- Avoid creating multiple LCDUI objects.
- XML parsing is resource intensive, and even a simple XML file needs seconds to parse.
Multiple layers of dynamic proxies (Page last updated November 2002, Added 2002-11-27, Author crazybob, Publisher crazybob.org). Tips:
- By specializing a proxy's invocation handler to detect when it is handing off a method invocation to another proxy, you can wrap proxies around other proxies with negligible additional overhead.
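One way to read the proxy tip: an InvocationHandler checks whether its delegate is itself a dynamic proxy and, if so, calls that proxy's handler directly instead of going through `Method.invoke` on the proxy. This sketch assumes that interpretation; `ProxyChaining`, `Greeter`, `ForwardingHandler` and the call log are hypothetical, while `Proxy.isProxyClass` and `Proxy.getInvocationHandler` are standard `java.lang.reflect` methods.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

public class ProxyChaining {

    /** Hypothetical interface used for the demonstration. */
    public interface Greeter {
        String greet(String name);
    }

    /** Hypothetical handler: records its name, then forwards the call to target. */
    static final class ForwardingHandler implements InvocationHandler {
        private final String name;
        private final Object target;
        private final StringBuilder log;

        ForwardingHandler(String name, Object target, StringBuilder log) {
            this.name = name;
            this.target = target;
            this.log = log;
        }

        @Override
        public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
            log.append(name).append(',');   // this layer's own work
            if (Proxy.isProxyClass(target.getClass())) {
                // Target is another proxy: call its handler directly, skipping the
                // reflective Method.invoke and the generated proxy method.
                return Proxy.getInvocationHandler(target).invoke(target, method, args);
            }
            try {
                return method.invoke(target, args);   // innermost layer: the real object
            } catch (InvocationTargetException e) {
                throw e.getCause();                    // rethrow the original exception
            }
        }
    }

    /** Wraps target in a new proxy layer implementing iface. */
    public static <T> T wrap(Class<T> iface, String name, T target, StringBuilder log) {
        Object proxy = Proxy.newProxyInstance(iface.getClassLoader(),
                new Class<?>[] {iface}, new ForwardingHandler(name, target, log));
        return iface.cast(proxy);
    }
}
```

With two layers, a call on the outer proxy runs both handlers but performs only one reflective `Method.invoke`, on the real object.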
Handling unexpected thread death (Page last updated November 2002, Added 2002-11-27, Author Roy M. Pueschel, Publisher JavaWorld). Tips:
- Create one (or two) watcher thread(s) to restart or log thread death for thread services when they could unexpectedly terminate.
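A watcher thread of the kind described can block on `join()` of the service thread and start a replacement when it dies. A minimal sketch: `ThreadWatcher`, `watch` and the restart limit are hypothetical; it treats any termination of the worker as unexpected and logs to stderr. On Java 5 and later, `Thread.setUncaughtExceptionHandler` is an alternative hook for the logging part.

```java
import java.util.function.Supplier;

public class ThreadWatcher {
    /**
     * Hypothetical watcher: runs a worker thread, waits for it to die (for any reason),
     * logs the death and starts a fresh worker, giving up after maxRestarts restarts.
     */
    public static Thread watch(String name, Supplier<Runnable> workerFactory, int maxRestarts) {
        Thread watcher = new Thread(() -> {
            int restarts = 0;
            while (!Thread.currentThread().isInterrupted()) {
                Thread worker = new Thread(workerFactory.get(), name);
                worker.start();
                try {
                    worker.join();                 // blocks until the worker terminates
                } catch (InterruptedException e) {
                    worker.interrupt();            // watcher asked to stop: stop the worker too
                    return;
                }
                System.err.println("Thread '" + name + "' died");
                if (restarts >= maxRestarts) {
                    System.err.println("Not restarting '" + name + "' after " + restarts + " restarts");
                    return;
                }
                restarts++;
            }
        }, name + "-watcher");
        watcher.setDaemon(true);
        watcher.start();
        return watcher;
    }
}
```

A second watcher, as the tip suggests, could monitor the first in the same way.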
Nybbles of Development Wisdom (Page last updated October 2002, Added 2002-11-27, Author Terence Parr, Publisher ?). Tips:
- Don't worry about writing super efficient code until you know there is or will be a speed problem. Use a profiler to know rather than deduce where the inefficient hot spots are.
- Do expensive operations either up front or in the background (load data, snoop or search other sites, sort, ...).
- Use memory if you have it. If your sizeof(database) < sizeof(RAM), cache the whole damn thing.
- Cache pages that don't change or change infrequently to reduce server load.
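A sketch of caching pages that rarely change, combined with doing the expensive rendering up front. It assumes a page can be rendered from its path alone; `PageCache`, `preload` and the renderer function are hypothetical, and real use would also need a size bound or expiry policy.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Hypothetical cache for pages that change rarely; render runs once per path. */
public class PageCache {
    private final ConcurrentMap<String, String> pages = new ConcurrentHashMap<>();
    private final Function<String, String> render;

    public PageCache(Function<String, String> render) {
        this.render = render;
    }

    /** Returns the cached page, rendering it only on the first request for path. */
    public String get(String path) {
        return pages.computeIfAbsent(path, render);
    }

    /** Call when the underlying content changes so the next request re-renders. */
    public void invalidate(String path) {
        pages.remove(path);
    }

    /** Renders paths ahead of time, e.g. at startup, so early users never wait. */
    public void preload(Iterable<String> paths) {
        for (String p : paths) {
            get(p);
        }
    }
}
```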
Capacity Planning (Page last updated November 2002, Added 2002-11-27, Author Arunabh Hazarika and Srikant Subramaniam, Publisher BEA). Tips:
- Capacity planning is achieved by measuring or estimating the number of requests the server processes, calculating the demand each request places on the server resources, then using this data to calculate the computing resources (CPU, RAM, disk space, and network bandwidth) necessary to support current and future usage levels.
- When traffic increases on a major Web site that isn't adequately equipped to handle the surge, response time deteriorates significantly.
- Studies have shown that if a site's response time is more than 10 seconds, users tend to leave.
- SSL is a very computing-intensive technology and the overhead of cryptography can significantly decrease the number of simultaneous connections that a system can support. Typically, for every SSL connection the server can support, it can handle up to three non-SSL connections.
- Typically, a good application will require a database three to four times more powerful than the application server hardware.
- Increasing user numbers with no increase in CPU consumption usually indicates that a bottleneck exists.
- Additional processes running on the same machine can significantly affect the capacity (and performance) of the application server. The database and Web servers are two popular choices for hosting on a separate machine.
- User behaviour is unpredictable. When estimating the peak load, it's advisable to plan for demand spikes and focus on the worst-case scenario.
- If the response time doesn't improve after adding servers to a cluster, and the Web server machine shows a CPU usage of over 95%, consider clustering the Web server or running it on more powerful hardware.
- It's essential to optimize the application by eliminating or reducing the hot spots and considering the working set/concurrency issues.
- In a fully loaded tuned system, the CPU utilization is usually in the 90-95% range. Throughput won't increase with the addition of more load, but response times will increase as more clients are added. The throughput at this point determines the capacity of the hardware.
- During load testing it's essential to use a transaction scenario that closely resembles the real-world conditions the application deployment will be subjected to.
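The measure-then-calculate approach can be illustrated with the Utilization Law from operational analysis (U = X * D: utilization equals throughput times per-request CPU demand). This is a swapped-in textbook model, not the article's own method, and it assumes CPU utilization grows linearly with load below saturation. `CapacityEstimate` and all the numbers are hypothetical.

```java
/**
 * Hypothetical capacity estimate based on the Utilization Law U = X * D,
 * where U is CPU utilization (0..1, across all CPUs), X is throughput in
 * requests/second and D is CPU demand per request in seconds.
 */
public class CapacityEstimate {

    /** D = U / X, from one measured load-test data point. */
    public static double demandPerRequest(double cpuUtilization, double requestsPerSecond) {
        return cpuUtilization / requestsPerSecond;
    }

    /** X_max = U_target / D: throughput one server sustains at the target utilization. */
    public static double maxThroughput(double demandPerRequest, double targetUtilization) {
        return targetUtilization / demandPerRequest;
    }

    /** Servers needed for a peak load, rounding up. */
    public static int serversNeeded(double peakRequestsPerSecond, double perServerThroughput) {
        return (int) Math.ceil(peakRequestsPerSecond / perServerThroughput);
    }

    public static void main(String[] args) {
        double d = demandPerRequest(0.60, 120);    // measured: 60% CPU at 120 req/s
        double perServer = maxThroughput(d, 0.90); // cap at 90% CPU
        System.out.printf("D=%.4f s, per server=%.0f req/s, servers=%d%n",
                d, perServer, serversNeeded(500, perServer));
    }
}
```

For example, a test measuring 60% CPU at 120 requests/second gives D = 0.005 CPU-seconds per request; capping utilization at 90% allows about 180 requests/second per server, so a hypothetical 500 requests/second peak needs 3 servers. Worst-case spikes and SSL overhead would push the estimate up.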
Socket connection timeouts (Page last updated November 2002, Added 2002-11-27, Author John Zukowski, Publisher Sun). Tips:
- From 1.4, the connect call allows an additional timeout parameter that can timeout connection attempts. (This differs from the setSoTimeout method which sets the timeout for read() calls only).
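With the 1.4 API, the connect timeout and the read timeout are set separately. A minimal sketch; `TimedConnect.open` is a hypothetical helper around the standard `Socket.connect(SocketAddress, int)` and `Socket.setSoTimeout(int)` methods.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class TimedConnect {
    /**
     * Hypothetical helper: connects with a bounded wait, then bounds each read.
     * connect(...) throws java.net.SocketTimeoutException if the connection is not
     * established within connectTimeoutMs (0 means wait indefinitely).
     */
    public static Socket open(String host, int port, int connectTimeoutMs, int readTimeoutMs)
            throws IOException {
        Socket socket = new Socket();                // created unconnected
        try {
            socket.connect(new InetSocketAddress(host, port), connectTimeoutMs);
            socket.setSoTimeout(readTimeoutMs);      // applies to reads only, not connect
            return socket;
        } catch (IOException e) {
            socket.close();                          // don't leak the descriptor on failure
            throw e;
        }
    }
}
```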
WebLogic Server Performance Tuning (Page last updated October 2002, Added 2002-11-27, Author Arunabh Hazarika & Srikant Subramaniam, Publisher BEA). Tips:
- The first step in performance tuning is isolating the hot spots, determining which system component is contributing to the performance problem.
- The most expensive operations that the EJB container executes are probably database calls to load and store entity beans. Some cases reduce database calls: read-only beans only require one load; unmodified beans don't need to execute the store operation.
- Pooling (e.g. beans and connections) and caching are important performance improving features of application servers, and should be used and tuned.
- Disabling synchronous JMS writes can improve performance.
- The JMS message acknowledgement interval should be tuned: too small may slow message processing; too large may cause lost or duplicate messages.
- A single JMS server is adequate until scalability peaks, then multiple servers should be used.
- Asynchronous JMS consumers typically scale and perform better than synchronous consumers.
- JSP cache tags can be used to cache data in a JSP page.
- It is advisable not to store too much information in the HTTP session.
- In JVMs that organize the heap into generational spaces, the new space or the nursery should typically be set to between a third and half of the total heap size.
Fast authentication (Page last updated November 2002, Added 2002-11-27, Author Larry Ashworth, Publisher ZeroPoint). Tips:
- Cookie based authentication is simple and quick. Dynamically rewriting links to include a session id can impose a high load on the server.
- The best time to check for expired sessions is during the authentication process.
- Heavy-weight object creations (e.g. SimpleDateFormat) should only occur once if possible, during your authentication class's initialization process. (SimpleDateFormat is not thread-safe, so a shared instance must be synchronized or replaced by one instance per thread.)
- Use StringBuffer.append() and not String concatenation.
- Use object methods that empty an existing object rather than recreating the object. String arrays are an exception.
- Don't create an object within a looping structure if you can help it.
- Larger database table sizes decrease performance. Cache data where possible.
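The object-reuse tips above can be combined as follows. Because `SimpleDateFormat` is not thread-safe, this sketch keeps one instance per thread with `ThreadLocal.withInitial` (Java 8+) rather than a single shared field, and reuses a per-thread `StringBuilder` (the unsynchronized counterpart of `StringBuffer`) by resetting its length. `AuthTokenFormatter` and the token layout are hypothetical.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

/** Hypothetical formatter for authentication tokens of the form "user|timestamp". */
public class AuthTokenFormatter {
    // Expensive to create and not thread-safe: one instance per thread, created once.
    private static final ThreadLocal<SimpleDateFormat> FORMAT = ThreadLocal.withInitial(() -> {
        SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f;
    });

    // Reused buffer per thread instead of a new one (or String +) per call.
    private static final ThreadLocal<StringBuilder> BUFFER =
            ThreadLocal.withInitial(() -> new StringBuilder(64));

    public static String token(String user, long epochMillis) {
        StringBuilder sb = BUFFER.get();
        sb.setLength(0); // empty the existing object rather than recreating it
        sb.append(user).append('|').append(FORMAT.get().format(new Date(epochMillis)));
        return sb.toString();
    }
}
```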
Speeding JVM startup (Page last updated November 2002, Added 2002-11-27, Author ?, Publisher IBM). Tips:
- Disabling JIT compilation improves JVM startup time.
Thread pooling (Page last updated November 2002, Added 2002-11-27, Author Vishal Goenka, Publisher JDJ). Tips:
- Each thread has a memory overhead, and also adds scheduling overhead. Thread pooling limits these overheads.
- Thread creation also has an overhead that can be higher than the overhead of managing a thread pool. However in the latest JVMs thread creation is much cheaper than previous JVMs.
- Thread pooling has a number of drawbacks: ThreadLocals are not as useful; the pool can be fully used stalling further requests from being processed; managing a pool can be more expensive than simply creating and discarding threads on demand.
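On Java 5 and later, `java.util.concurrent.ThreadPoolExecutor` provides pooling with an explicit policy for the "pool fully used" drawback mentioned above. In this sketch, which is an assumption rather than the article's design, a bounded queue absorbs bursts, and once it is full `CallerRunsPolicy` runs the task in the submitting thread, which slows submitters down instead of stalling or dropping requests. `BoundedPool` is a hypothetical name.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedPool {
    /**
     * Hypothetical factory: a fixed number of worker threads, a bounded queue for bursts,
     * and CallerRunsPolicy so a full pool pushes work back onto the submitting thread.
     */
    public static ThreadPoolExecutor create(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads,                        // core == max: fixed-size pool
                60L, TimeUnit.SECONDS,                   // idle timeout (unused when core == max)
                new ArrayBlockingQueue<>(queueCapacity), // bounded, so memory use is bounded
                new ThreadPoolExecutor.CallerRunsPolicy());
    }
}
```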
HTTP sessions vs. stateful EJB (Page last updated October 2002, Added 2002-11-27, Author Peter Zadrozny, Publisher BEA). Tips:
- The comparative costs of storing data in an HTTP session object are roughly the same as storing the same data in a stateful session bean.
- Failure to remove an EJB that should have been removed (from the HTTP session) carries a very high performance price: the EJB will be passivated which is a very expensive operation.
HashSet, LinkedHashSet, and TreeSet (Page last updated November 2002, Added 2002-11-27, Author Glen McCluskey, Publisher Sun). Tips:
- HashSet is faster than LinkedHashSet which is in turn faster than TreeSet for access/updates (iteration speed has a different order).
- Iteration over a LinkedHashSet is generally faster than iteration over a HashSet.
- Tree-based data structures (like TreeSet) get slower as the number of elements grows.
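The three Set implementations differ mainly in ordering and in the cost of each operation. The complexities in the comments follow the JDK documentation; `SetOrder.fill` is a hypothetical helper showing each set's iteration order.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.TreeSet;

public class SetOrder {
    // HashSet:       add/contains/remove expected O(1); iteration order unspecified,
    //                iteration time proportional to size plus bucket capacity.
    // LinkedHashSet: same expected O(1) plus linked-list upkeep; iterates in insertion
    //                order, in time proportional to size only.
    // TreeSet:       add/contains/remove O(log n); iterates in sorted order.

    /** Hypothetical helper: adds items to set and returns its iteration order. */
    public static String fill(Set<String> set, String... items) {
        for (String item : items) {
            set.add(item);
        }
        return String.join(",", set);
    }

    public static void main(String[] args) {
        String[] fruit = {"pear", "apple", "fig"};
        System.out.println(fill(new HashSet<>(), fruit));       // order not specified
        System.out.println(fill(new LinkedHashSet<>(), fruit)); // pear,apple,fig
        System.out.println(fill(new TreeSet<>(), fruit));       // apple,fig,pear
    }
}
```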
WebSphere performance tips (Page last updated October 2002, Added 2002-11-27, Author Arvind Shukla, Publisher ). Tips:
- Deliver compressed documents where possible.
- Use the document expiration settings to let the client browser retrieve documents from its own cache.
- Move static content from the application server to the web server.
The effect of Slow download speeds (Page last updated October 2002, Added 2002-11-27, Author Steve Sampsell, Publisher EurekaAlert). Tips:
- Slower page downloads can sometimes increase user interest. [Probably when the wait enhances interest in some way; the example given was an erotic image].
Last Updated: 2017-11-28
Copyright © 2000-2017 Fasterj.com. All Rights Reserved.
All trademarks and registered trademarks appearing on JavaPerformanceTuning.com are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. JavaPerformanceTuning.com is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.
RSS Feed: http://www.JavaPerformanceTuning.com/newsletters.rss