Java Performance Tuning
Tips January 29th, 2003
Back to newsletter 026 contents
Scaling Server Performance (Page last updated January 2003, Added 2002-12-29, Author Brian Neal, Publisher AcesHardware). Tips:
- Java servlet based site served 250,000 page visits in a day (average of 3 visits per second), each page required an average of 2.36 requests to be handled by the server. Peak rate was 677 requests per minute (11.28/second). The Java Web application used only 20% of CPU AT PEAK TIME! (database took another 7.5% CPU).
- Cached content is key to excellent performance. Caching pages (and page fragments) in-memory can easily provide an order of magnitude speedup.
- For optimal performance, you need to isolate the bottlenecks in your software and eliminate them.
- Keepalive means that each HTTP connection must wait idle for a certain period of time, perhaps 10 or 15 seconds. With Keepalive, a large number of concurrent requests from unique clients can result in a large number of idle HTTP connections waiting to timeout, potentially depleting resources.
- You can mitigate performance issues by throwing hardware at them.
- If database performance is lacking, even solid state disks (SSDs) can't outperform a system that doesn't have to run the query in the first place.
- Serving compressed pages to browsers that can handle them alleviates bandwidth problems.
- Serving static content is the least demanding workload a modern day HTTP server will encounter.
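The caching tip above can be illustrated with a small in-memory fragment cache. This is a minimal sketch, not the implementation used by the site described in the article: the class name FragmentCache, the renderer function, and the render counter are all hypothetical, and the sketch assumes fragments stay valid until explicitly invalidated.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical in-memory cache for rendered pages or page fragments.
class FragmentCache {
    private final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> renderer;          // assumed expensive (e.g. runs queries)
    private final AtomicInteger renderCount = new AtomicInteger();

    FragmentCache(Function<String, String> renderer) {
        this.renderer = renderer;
    }

    // Returns the cached fragment, rendering it only when the key is not yet cached.
    String get(String key) {
        return cache.computeIfAbsent(key, k -> {
            renderCount.incrementAndGet();
            return renderer.apply(k);
        });
    }

    // Drops a fragment so that the next request re-renders it (e.g. after a content update).
    void invalidate(String key) {
        cache.remove(key);
    }

    // Number of times the renderer has actually run; useful for checking the hit rate.
    int renderCount() {
        return renderCount.get();
    }
}
```

Repeated calls to get() with the same key run the renderer once; only invalidate() forces a new render.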
Implementing High Availability for WebSphere Application Servers (Page last updated November 2002, Added 2002-12-29, Author John Lamb, Michael Laskey, Publisher e-ProMag.com). Tips:
- For most Web servers, high availability (i.e., system accessibility on as close to a 24/7 basis as possible) is of paramount importance.
- Reliable hardware and software provide the base level of availability.
- Advanced features such as RAID devices enhance availability.
- Fault-tolerant systems ensure the constant availability of the entire system but at a higher cost.
- OS/hardware clustering provides high availability but not necessarily load balancing.
- If the failover system uses the same disk as the main system it is important to have a mirrored (RAID 1) shared disk to use for OS clustering.
- Application code should be prepared to rollover to an alternative database server if the current server stops providing data.
- Web sites that require massive scalability need IP Sprayers which provide for load distribution and failover for incoming HTTP requests.
- [Article details the configuration and architecture of an actual website which is highly available, scalable, fast and reliable].
- Use: the application server failover features; OS clustering; Web/IP sprayer; RAID disks; SMP; redundant power supplies; round-robin DNS.
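The tip about rolling over to an alternative database server can be sketched in application code as follows. This is an illustrative sketch only, not WebSphere's failover mechanism: the class Failover and the method firstAvailable are hypothetical names, each "server" is modelled as a Supplier that either returns data or throws an unchecked exception, and the sketch assumes the operation is safe to retry on another server.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical helper: try each data source in order until one answers.
class Failover {
    static <T> T firstAvailable(List<Supplier<T>> sources) {
        RuntimeException last = null;
        for (Supplier<T> source : sources) {
            try {
                return source.get();          // first source that succeeds wins
            } catch (RuntimeException e) {
                last = e;                     // remember the failure and try the next source
            }
        }
        // Every source failed (or the list was empty): report the last failure as the cause.
        throw new IllegalStateException("all data sources failed", last);
    }
}
```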
Optimizing the performance of JAWS Webserver (Page last updated 1999, Added 2002-12-29, Author James C. Hu, Irfan Pyarali, Douglas C. Schmidt, Publisher Distributed Object-Oriented Systems). Tips:
- Caching disk data in-memory is the primary improvement possible for webservers. The overhead of disk I/O is the primary factor in webserver performance, but this can be reduced to a negligible constant factor via memory caches.
- After disk I/O the primary determinants of Web server performance are its concurrency and event dispatching strategies.
- Asynchronous event dispatching can increase server throughput by reducing the context switching and synchronization overhead incurred from multi-threading by reducing the number of threading resources required to handle client requests concurrently.
- Request dispatching occupies a large portion (50%) of non-I/O related Web server overhead. The choice of concurrency strategy, such as thread/process pool vs. thread/process per-request, and dispatching strategy, such as asynchronous vs. synchronous, has a major impact on performance.
- Synchronous Event Dispatching uses one thread to accept client connections and hands off each resulting client socket to one other processing thread for request handling. Asynchronous Event Dispatching multiplexes all client connection I/O through the same I/O subsystem (usually consisting of one or two threads), with processing threads handling only non I/O tasks, i.e. request processing only excluding low-level reads and writes.
- The time required to create a new thread is an order of magnitude smaller (faster) than the time required to create a new system process (on Windows NT).
- A synchronous thread pool server strategy is useful for bounding the number of OS resources consumed by a Web server. Each thread waits in accept(), and the operating system selects one thread to unblock when a client connection request is received. Then the thread synchronously processes that client connection, returning to accept() when request handling is complete. If there are more pending requests than pooled threads, connection requests are queued in the TCP/IP stack, preventing the server from being overloaded (of course response times degrade from the client's point of view, since that response time includes time spent in the queue).
- An asynchronous thread pool server strategy consists of having threads wait on completion ports (equivalent to the Selector from the NIO packages). The operating system selects one thread to handle any non-blocking request (accept/read/write). [In Java you'd probably do it slightly differently, with one thread running the Selector and explicitly handing off requests to other threads].
- Dedicating a thread for each client connection provides good support for prioritization of client requests, allowing differentiation of quality of service. However if certain connections receive considerably more requests than others, they can become a performance bottleneck.
- Memory mapping files can boost their transmission speed. However under heavy server loads the extra kernel load produced by using I/O with memory mapped files may degrade performance.
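The resource-bounding idea behind the synchronous thread pool strategy can be sketched in Java with a fixed-size executor. This differs from the JAWS design described above: there, pooled threads block in accept() and excess connections queue in the TCP/IP stack, whereas in this sketch excess work queues inside the executor. The class name BoundedHandlerPool and its counters are hypothetical and exist only to make the bound observable.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical request-handler pool: at most `threads` handlers run at once.
class BoundedHandlerPool {
    private final ExecutorService pool;
    private final AtomicInteger active = new AtomicInteger();
    private final AtomicInteger maxActive = new AtomicInteger();
    private final AtomicInteger completed = new AtomicInteger();

    BoundedHandlerPool(int threads) {
        pool = Executors.newFixedThreadPool(threads);
    }

    // Queues a handler; it runs as soon as one of the pooled threads is free.
    void submit(Runnable handler) {
        pool.execute(() -> {
            int now = active.incrementAndGet();
            maxActive.accumulateAndGet(now, Math::max);   // track peak concurrency
            try {
                handler.run();
            } finally {
                active.decrementAndGet();
                completed.incrementAndGet();
            }
        });
    }

    // Stops accepting work and waits for queued handlers; returns false on timeout or interrupt.
    boolean shutdownAndWait(long timeoutMillis) {
        pool.shutdown();
        try {
            return pool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    int maxActive() { return maxActive.get(); }
    int completed() { return completed.get(); }
}
```

However many handlers are submitted, maxActive() never exceeds the pool size; the rest wait in the queue, which is the same trade of response time for bounded resource use that the tip describes.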
Sorting (Page last updated December 2002, Added 2002-12-29, Author Alex Blewitt, Publisher JavaWorld). Tips:
- When sorting large amounts of data extracted from a database, it is usually more efficient to let the database perform the sort using the ORDER BY clause, rather than to extract the data unordered and sort in Java.
Make Object Pooling Simple (Page last updated November 2002, Added 2002-12-29, Author Karthik Rangaraju, Publisher JavaPro). Tips:
- Object pooling is one efficient way to manage access to a finite set of objects among competing clients.
- By limiting object access to only the period when the client requires it, you can free resources for use by other clients. Increasing utilization through pooling usually increases system performance.
- Reuse objects to minimize costly initializations (pooling objects is often associated with such reuse).
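A pool of this kind can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the design from the article: the class SimplePool and its method names are hypothetical, the pool is fixed-size and pre-filled, and clients are trusted to release each object exactly once and to reset its state themselves.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Hypothetical fixed-size object pool backed by a thread-safe queue.
class SimplePool<T> {
    private final BlockingQueue<T> idle;

    // Pays the (assumed costly) initialization once per object, up front.
    SimplePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    // Returns an idle object, or null if every object is currently borrowed.
    T tryBorrow() {
        return idle.poll();
    }

    // Hands an object back so that other clients can use it; ignored if the pool is already full.
    void release(T obj) {
        idle.offer(obj);
    }

    int available() {
        return idle.size();
    }
}
```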
Servlet Best Practices 1 (Page last updated December 2002, Added 2002-12-29, Author Jason Hunter, Publisher OnJava). Tips:
- OutputStream has less overhead than PrintWriter, so use it for characters as well as bytes, except when there's a charset mismatch between the stored encoding and the required encoding.
- Use ResultSet getAsciiStream( ) method instead of getCharacterStream( ) to avoid conversion overhead for ASCII strings.
- You can pre-encode static String contents with String.getBytes( ) so that they're encoded only once.
- Don't Use SingleThreadModel.
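The pre-encoding tip can be sketched without the servlet API by writing to a plain OutputStream (in a servlet this would be the response's output stream). The class StaticPage, the page content, and the choice of UTF-8 are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical page writer: static parts are encoded once, not on every request.
class StaticPage {
    // Encoded a single time when the class is loaded.
    private static final byte[] HEADER =
            "<html><body><h1>Welcome</h1>".getBytes(StandardCharsets.UTF_8);
    private static final byte[] FOOTER =
            "</body></html>".getBytes(StandardCharsets.UTF_8);

    // Only the dynamic part is encoded per request.
    static void writePage(OutputStream out, String dynamicPart) throws IOException {
        out.write(HEADER);
        out.write(dynamicPart.getBytes(StandardCharsets.UTF_8));
        out.write(FOOTER);
    }
}
```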
Servlet Best Practices 2 (Page last updated January 2003, Added 2002-12-29, Author Jason Hunter, Publisher OnJava). Tips:
- Pre-generate pages as much as possible offline to reduce online overheads.
- Cache as much as possible.
- Use the Last-Modified HTTP header and the associated If-Modified-Since header to enable use of the browser cache.
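The Last-Modified / If-Modified-Since check can be sketched as a small decision function. The class and method names here are hypothetical, and the sketch follows the convention of HttpServletRequest.getDateHeader(), which returns -1 when the header is absent. When the function returns true, the server can answer 304 Not Modified with no body, and the browser reuses its cached copy.

```java
// Hypothetical helper deciding whether a conditional GET can be answered with 304.
class ConditionalGet {
    // ifModifiedSinceMillis: value of the If-Modified-Since header, or -1 if the header is absent.
    // lastModifiedMillis: when the resource last changed on the server.
    static boolean notModified(long ifModifiedSinceMillis, long lastModifiedMillis) {
        if (ifModifiedSinceMillis < 0) {
            return false;                       // unconditional request: send the full page
        }
        // HTTP dates have one-second resolution, so compare whole seconds.
        return lastModifiedMillis / 1000 <= ifModifiedSinceMillis / 1000;
    }
}
```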
Designing Distributed Systems (Page last updated October 2002, Added 2002-12-29, Author Bill Venners, Publisher Artima.com). Tips:
- When designing for distributed systems, failure should be your number one concern.
- Local/remote transparency is misleading: there is no real transparency. Local and remote are fundamentally different.
- Simplicity is one key solution. The more things you do, the more you have to think about recovering from failure modes.
- Transactions are one way to deal with potential failure, but transactions add overhead.
- Make remote method calls idempotent, so that reinvoking them is harmless (e.g. attach an ID to each call, so that a call either invokes the method or is ignored if it has already been invoked).
- Latency is important for distributed systems. Batching (combining) multiple requests into one request is one way of reducing the impact of latency.
- Stateless interactions are much more efficient than stateful interactions because stateless interactions reduce the amount of work required to recover from failure.
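The call-ID approach to idempotency mentioned above can be sketched as follows. The names IdempotentAccount, deposit, and callId are hypothetical, and the sketch assumes the set of seen IDs fits in memory and is never pruned; a real service would also need to persist it.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical receiver that applies each remote call at most once.
class IdempotentAccount {
    private final Set<String> seenCallIds = new HashSet<>();
    private long balance;

    // Applies the deposit once per callId. Returns true if it was applied,
    // false if this callId was already processed (a retry), in which case nothing changes.
    synchronized boolean deposit(String callId, long amount) {
        if (!seenCallIds.add(callId)) {
            return false;
        }
        balance += amount;
        return true;
    }

    synchronized long balance() {
        return balance;
    }
}
```

A client that is unsure whether a call arrived can simply resend it with the same ID: the second delivery is ignored rather than depositing twice.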
New IO API (Page last updated December 2002, Added 2002-12-29, Author Todd Stewart, Publisher OCIWeb). Tips:
- The NIO packages enable you to write high-performance, scalable socket applications, and improve performance when reading and writing data to and from files.
- High-performance server applications which manipulate Strings tend to create a lot of garbage, causing significant overhead. Buffers are fixed size and can significantly reduce the amount of garbage produced.
- Direct buffers can use the operating system's native I/O mechanisms, which are very efficient at moving and copying data.
- Direct buffers have higher allocation and de-allocation costs than non-direct Buffers. Use direct buffers for large, long-lived buffers.
- FileChannel allows a file to be mapped into memory, which may provide better performance than the typical read or write methods.
- The FileChannel allows bytes to be transferred from one file to another file (via its channel). Many operating systems can perform this in a very efficient manner by transferring directly from the file system cache.
- Multiplexed non-blocking I/O, possible with selectable channels and selectors, is more scalable and efficient than the standard thread-oriented, blocking I/O (one socket per thread).
- [Article runs through a pattern matching example which shows no performance improvement from using NIO I/O operations, and even shows a significant performance decrease when memory mapping the file].
- Memory mapped files are unlikely to improve performance of sequentially accessed, line-based data.
- Memory mapped files can be much more efficient when dealing with large binary data sets or randomly accessed data.
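The file-to-file transfer tip can be sketched with FileChannel.transferTo(). This is a minimal sketch: the class name ChannelCopy is hypothetical, and it uses FileChannel.open() and java.nio.file.Path, which were added to Java after the article was written. Whether the copy actually bypasses user-space buffers depends on the operating system.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper that copies a file channel-to-channel.
class ChannelCopy {
    // Copies source to target and returns the number of bytes transferred.
    static long copy(Path source, Path target) {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(target,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            long position = 0;
            // transferTo may move fewer bytes than requested, so loop until done.
            while (position < size) {
                position += in.transferTo(position, size - position, out);
            }
            return position;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```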
Simulate discrete simultaneous events (Page last updated December 2002, Added 2002-12-29, Author David Mertz, Publisher IBM). Tips:
- [Article describes a Python package that lets you simulate systems].
- Simulations and tests help you determine the limits and bottlenecks of a system.
- Determine what it is that you need to optimize before starting to optimize.
Designing and Implementing J2EE Clients (Page last updated June 2002, Added 2002-12-29, Author Mark Johnson, Inderjeet Singh, Beth Stearns, Publisher informIT). Tips:
- From a user's point of view, the client is the application. It must be useful, usable, and responsive.
- Always keep in mind that the client depends on the network, and the network is imperfect: latency is non-zero; bandwidth is finite; and the network is not always reliable.
- The ideal client connects to the server only when it has to, transmits only as much data as it needs to, and works reasonably well when it cannot reach the server.
- Every client platform's capabilities influence an application's design. For example, a browser client cannot generate most types of graphics, so the server must generate them. A programmable client, on the other hand, could generate such graphics, reducing server and network load.
- Typical client responsibilities include: presenting the user interface; validating user inputs; communicating with the server; managing conversational state. The more responsibilities you place on the client, the more responsive it can be.
- A cost of using browser clients is potentially low responsiveness. The server handles logic, so many remote connections can be required which is a problem when latency is high or bandwidth capacity is reached.
- Client-side validation is an optimization to improve user experience and decrease load, but you should re-validate on the server: never rely on the client exclusively to enforce data consistency.
- Full programmability enables Java clients to be potentially much more responsive than browser clients. Java clients can use less bandwidth and fewer remote calls to manage the same activity as a browser.
- Binary messages consume little bandwidth, an attractive feature in low-bandwidth environments.
- Java clients (as opposed to browser clients) have the ability to work while disconnected, which is beneficial when latency is high or when each connection consumes significant bandwidth.
- The primary consideration throughout the design of a J2EE client should be the network.
Intro to MicroJava Game creation (Page last updated December 2002, Added 2002-12-29, Author David Fox, Publisher OnJava). Tips:
- MIDP 2.0 includes better support for audio, animations, sprites, tiled backgrounds, transparency, and better graphics.
- Preverification sets up hints so that the actual verification of bytecode will happen much more quickly, saving you valuable startup time.
- Double-buffering guarantees that your animation will occur at the same rate no matter which phone your game is running on.
- Many phones automatically double-buffer your graphics for you: you can check whether double-buffering is supported within a Canvas using the isDoubleBuffered() method.
- Rolling your own double-buffering can make for smoother animations, but copying images can be very slow and memory-intensive.
- Some systems repaint the screen slower than the image can be copied -- these phones will show graphic flickering.
- The refresh rate of most mobile phone screens is much slower than you may be used to for PC game programming. Frame rates of 4 fps (frames per second) and even slower are common.
- The best way to smoothly animate sprites is to create a game loop as a separate thread, and use the System.currentTimeMillis() method to adjust the sprite's position over real time.
- Remember that in the micro world every byte counts. Avoid hashtables and vectors, and recycle any objects you no longer need.
- Instead of creating two buttons on two separate screens, try to merely change the label on an existing button.
- Avoid using costly operations like string concatenations. Use a StringBuffer instead.
- For user interface design, use few, large, simple components that require as few keypad presses as possible.
- Garbage collect whenever resources fall too low, using: Runtime.getRuntime().gc()
- Use a code packer or obfuscator to compress your bytecode as much as possible.
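The tip about adjusting a sprite's position over real time can be sketched as a pure function that a game loop thread would call with System.currentTimeMillis(). The class SpriteMotion and its parameters are hypothetical; the sketch uses integer arithmetic only and plain Java rather than any MIDP class.

```java
// Hypothetical helper: sprite position derived from elapsed time, not from frame count.
class SpriteMotion {
    // startX: position in pixels when the movement began.
    // pixelsPerSecond: speed; negative values move left.
    // startMillis / nowMillis: timestamps such as those from System.currentTimeMillis().
    static int positionAt(int startX, int pixelsPerSecond, long startMillis, long nowMillis) {
        long elapsedMillis = nowMillis - startMillis;
        return startX + (int) (pixelsPerSecond * elapsedMillis / 1000);
    }
}
```

Because the position depends only on the clock, a phone that manages 4 frames per second and one that manages 20 show the sprite in the same place at the same moment; the slower phone just draws fewer intermediate frames.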
J2EE Enterprise Bean Basics (Page last updated August 2002, Added 2002-12-29, Author Dale Green, Kim Haase, Eric Jendrock, Stephanie Bodoff, Monica Pawlan, Beth Stearns, Publisher informIT). Tips:
- Use enterprise beans when an application needs to be scalable.
- Because stateless session beans can support multiple clients, they can offer better scalability for applications that require large numbers of clients. Typically, an application requires fewer stateless session beans than stateful session beans to support the same number of clients.
- An EJB container may write a stateful session bean to secondary storage. However, stateless session beans are never written to secondary storage. Therefore, stateless beans may offer better performance than stateful beans.
- Use stateless session beans when: the bean retains no client specific data; the bean performs a generic task for all clients (e.g. sending emails); the bean fetches read-only data from a database.
- Only Message-driven beans can receive messages asynchronously. Asynchronous processing helps to avoid tying up server resources.
- Local calls are usually faster than remote calls, so entity beans designed using local interfaces provide better performance. Local access means that the beans must run in the same JVM and the location of the bean is not transparent (unlike remote interface beans).
- Distributing beans over multiple servers can improve performance for some applications, and these beans would need to have remote interfaces (rather than or as well as local interfaces).
- Remote calls should be coarse-grained to reduce the number of calls required, since coarse-grained objects contain more data than fine-grained ones.
Benchmarking Method Devirtualization and Inlining (Page last updated December 2002, Added 2002-12-29, Author Osvaldo Pinali Doederlein, Publisher JavaLobby). Tips:
- [Article discusses the overheads of polymorphic method calls].
- Tricks employed to avoid compiler optimizations (for microbenchmarks): Parsing, like "Integer.parseInt("3")", to hide a constant from the compiler; Avoid compile-time constants in loop bounds to prevent easy loop unrolling; Run test loops enough times to avoid the timer's precision error, e.g. a ~10ms precision requires that each test runs for at least 1000ms; Repeat the full test within a VM invocation a few times to be nice to dynamic optimizers; Test loops should produce some data which is consumed after the loop ends so that code and values cannot be optimized away; Instead of do-nothing code (like empty methods), use minimal code that's still difficult to compute at compile time; Know your compiler theory.
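Several of these tricks can be combined in one small harness. This is a sketch of the listed precautions, not the benchmark from the article: the names PolyBench, Shape, and Square are hypothetical, the iteration count is arbitrary, and the timings it prints will vary by JVM and machine.

```java
// Hypothetical microbenchmark harness applying the tricks listed above.
class PolyBench {
    interface Shape {
        int area();
    }

    static final class Square implements Shape {
        private final int side;
        Square(int side) { this.side = side; }
        public int area() { return side * side; }
    }

    // Test loop: calls through the interface and accumulates a checksum,
    // so the work produces a value that is consumed after the loop ends.
    static long loop(Shape shape, int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += shape.area() + i;   // minimal work that is not a do-nothing call
        }
        return sum;
    }

    public static void main(String[] args) {
        // Parsed values hide the constants from the compiler (operand and loop bound).
        int side = Integer.parseInt("3");
        int iterations = Integer.parseInt("50000000");
        Shape shape = new Square(side);
        // Repeat the whole test a few times in one VM so the dynamic optimizer can warm up.
        for (int round = 0; round < 5; round++) {
            long start = System.nanoTime();
            long checksum = loop(shape, iterations);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            // Printing the checksum consumes the result of the loop.
            System.out.println("round " + round + ": " + elapsedMs + " ms, checksum " + checksum);
        }
    }
}
```

If a round finishes in well under a second, raising the iteration count keeps the measurement clear of timer precision error, as the tip recommends.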
Performance and Scalability of EJB Applications (Page last updated November 2002, Added 2002-12-29, Author Emmanuel Cecchet, Julie Marguerite, Willy Zwaenepoel, Publisher OOPSLA). Tips:
- EJB applications with session beans perform as well as a Java servlets-only implementation and an order-of-magnitude better than most of the implementations based on entity beans.
- Fine-grained access to entity beans limits scalability.
- Session façade beans improve performance for entity beans but only if local communication is very efficient or EJB 2.0 local interfaces are used.
- Using session beans, communication costs form the major component of the execution time on the EJB server.
- Using entity beans, the cost of reflection affects performance severely.
- Using session façade beans, local communication cost is critically important.
- EJB 2.0 local interfaces improve the performance by avoiding the communication layers for local communications.
- Set the thread stack size to 32 KB using -Xss32k instead of the default thread stack size to avoid running out of space. (A 32 KB stack size is sufficient for non-recursive applications). (JDK 1.4.0_01 requires a 96 KB minimum stack size, -Xss96k).
Large-Scale Financial Applications & Service-Oriented Architectures (Page last updated December 2002, Added 2002-12-29, Author Anwar Ludin, Publisher BEA). Tips:
- [Article describes various types of enterprise architectures].
- The web server parses the SOAP message and invokes a remote EJB session facade, which handles local calls to session and entity beans.
- Clustering at different nodes enhances reliability and availability.
- Horizontal scaling is achieved through clustering.
- Adding resources such as CPUs and memory to the server machines achieves vertical scaling.
- J2EE design patterns are an invaluable source of information for avoiding performance pitfalls.
Last Updated: 2020-10-28
Copyright © 2000-2020 Fasterj.com. All Rights Reserved.
All trademarks and registered trademarks appearing on JavaPerformanceTuning.com are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. JavaPerformanceTuning.com is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.