Java Performance Tuning
Java(TM) - see bottom of page
Tips March 31st, 2003
Back to newsletter 028 contents
Servlets, stateless session beans, or both? (Page last updated February 2003, Added 2003-03-31, Author Kyle Gabhart, Publisher IBM). Tips:
- The strength of the servlet architecture is in its overall efficiency and relative simplicity. EJB components, on the other hand, are more robust, and consequently more complicated to develop, maintain, and debug.
- Servlet scalability is smooth and efficient: a given servlet instance handles simultaneous requests by spawning a new thread for every request and executing the service() method within each thread.
- An EJB container requires a substantial amount of server power and memory.
- In exchange for this higher overhead, however, EJB components provide effective management of enterprise resources, transactions, and security checks, without sacrificing too much in the way of response times and overall scalability.
- The most common scalable alternative to pure servlets is to use the combination of servlets talking to stateless session beans which in turn communicate with the enterprise beans.
Choosing A Collections Framework Implementation/Providing a Scalable Image Icon (Page last updated February 2003, Added 2003-03-31, Author John Zukowski, Publisher Sun). Tips:
- Use interfaces to define variables, allowing the actual class implementation used to be easily changed.
- The additional features provided by TreeSet and LinkedHashSet add to the runtime costs.
- ArrayList provides quick indexed access to its elements, and works best when elements are only added and removed at the end.
- LinkedList is best when add and remove operations happen anywhere, not only at the end.
- LinkedList has slower indexed operations than ArrayList.
- TreeMap maintains the keys of the map in a sorted order. However, if you only need the keys sorted when you are done with the Map, sometimes it's better to simply keep everything in a HashMap while adding elements, and create a TreeMap when required since maintaining the element order when elements are changing has overhead.
- On-the-fly scaling of images is practically as fast as caching and reusing a scaled image for some JVM and hardware configurations. On-the-fly scaling also requires less memory, as there is no cached instance.
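The interface and HashMap/TreeMap tips above can be sketched as follows. This is a minimal illustration, not code from the article; the class name WordCounts and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class WordCounts {
    // The variable and return type use the Map interface, so the
    // implementation class can be swapped without touching callers.
    static Map<String, Integer> count(String[] words) {
        Map<String, Integer> counts = new HashMap<>(); // no ordering cost while updating
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    // Pay the sorting cost once, only when an ordered view is needed.
    static Map<String, Integer> sortedCopy(Map<String, Integer> counts) {
        return new TreeMap<>(counts);
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count(new String[] {"pear", "apple", "pear", "fig"});
        System.out.println(sortedCopy(counts)); // keys print in natural (alphabetical) order
    }
}
```

Whether building the TreeMap at the end is actually cheaper depends on how often the sorted view is needed, so it is worth measuring for the workload in question.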
Thread safety (Page last updated February 2003, Added 2003-03-31, Author Vladimir Roubtsov, Publisher Javaworld). Tips:
- Atomic update is insufficient to ensure timely access in multiple threads. Synchronized or volatile may be necessary depending on context.
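A minimal sketch of the visibility point, assuming a hypothetical StoppableWorker class: the write to a boolean is atomic, but the flag is declared volatile so that the worker thread is guaranteed to observe the change.

```java
class StoppableWorker implements Runnable {
    // Writing a boolean is atomic, but without volatile (or synchronization)
    // the worker thread is not guaranteed to see the update promptly, or at all.
    private volatile boolean running = true;
    private long iterations;

    @Override
    public void run() {
        while (running) {
            iterations++; // stand-in for real work
        }
    }

    void stop() {
        running = false;
    }

    // Starts a worker, asks it to stop, and reports whether it exited in time.
    static boolean stopsWithin(long timeoutMillis) {
        StoppableWorker worker = new StoppableWorker();
        Thread thread = new Thread(worker);
        thread.setDaemon(true);
        thread.start();
        worker.stop();
        try {
            thread.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !thread.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("worker stopped: " + stopsWithin(2000));
    }
}
```

A synchronized getter and setter pair for the flag would give the same visibility guarantee; volatile is simply the lighter option when only a single flag is shared.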
Optimizing (Page last updated February 2003, Added 2003-03-31, Author Ted Neward, Publisher Ted Neward). Tips:
- Remember the 80/20 rule: 20% of the code is responsible for 80% of the time taken by the application. Focus on that 20%.
- Use a profiler to find the bottlenecks and speed them up, rather than other, irrelevant parts of the application.
- When profiling, run the system under "normal load" conditions.
HttpSession Objects (Page last updated January 2003, Added 2003-03-31, Author Brian Russell, Publisher JavaDevelopersJournal). Tips:
- The most common type of session persistence is database persistence. It provides an efficient way of saving session data and it's usually fairly easy to set up on the application server. Memory persistence in a cluster is also easy to use, if your application server supports it. The only drawback is that sessions can sometimes hold large amounts of data. Storing the session in memory reduces the amount of memory available to the other processes on the server. File system persistence can be slow at times and the file system may not always be accessible to multiple servers. (The remaining alternative, cookie persistence, is highly insecure).
- The session is convenient but not always the best place to put information. Single page data should use the request scope and then forward the request from the servlet to the JSP. This causes the objects to be destroyed after the request has ended, which is after the data is displayed by the JSP. If you put the objects into the session, you would either have to remove them in your code or leave them there. Leaving objects in the session is not a good idea because you're using up valuable resources for no reason. This becomes even more of an issue when your Web site has hundreds or thousands of visitors, all of whom have a session that's loaded with objects.
- Anything that needs to exist longer than one request can be stored in the session, as long as these objects are removed as soon as they're no longer needed.
- Session timeout (default usually 30 minutes of inactivity) is configured in the deployment descriptor. The HttpSession API also provides a setMaxInactiveInterval() method (and getMaxInactiveInterval() method).
JVM Shutdown Hooks (Page last updated January 2003, Added 2003-03-31, Author Frank Jennings, Publisher JavaDevelopersJournal). Tips:
- Don't run Runtime.runFinalizersOnExit() in a shutdown hook.
- Add a shutdown hook thread which times the shutdown and terminates the process (using Runtime.halt()) after a reasonable interval (a few seconds at most).
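One way the timing hook could look is sketched below. The class name ShutdownWatchdog and the three-second timeout are assumptions, not taken from the article. In this sketch the hook starts a daemon timer rather than sleeping itself, so a shutdown that finishes quickly is not delayed by the watchdog.

```java
class ShutdownWatchdog {
    // Builds the hook thread. When the hook runs it starts a daemon timer;
    // if the JVM is still shutting down after timeoutMillis, onTimeout runs.
    static Thread newHook(long timeoutMillis, Runnable onTimeout) {
        return new Thread(() -> {
            Thread timer = new Thread(() -> {
                try {
                    Thread.sleep(timeoutMillis);
                } catch (InterruptedException e) {
                    return; // interrupted: give up without forcing termination
                }
                onTimeout.run();
            }, "shutdown-watchdog");
            timer.setDaemon(true); // the timer itself must not keep the JVM alive
            timer.start();
        });
    }

    // Registers a watchdog that halts the JVM if shutdown takes too long.
    static void install(long timeoutMillis) {
        Runtime.getRuntime().addShutdownHook(
                newHook(timeoutMillis, () -> Runtime.getRuntime().halt(1)));
    }

    public static void main(String[] args) {
        install(3000); // assumed value for "a few seconds at most"
        System.out.println("watchdog installed");
    }
}
```

Runtime.halt() skips any remaining hooks and finalizers, so the timeout should be long enough for well-behaved hooks to finish.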
Optimized web services (Page last updated January 2003, Added 2003-03-31, Author Howard D'Souza, Publisher WebservicesDevelopersJournal). Tips:
- WSDL provides a language- and protocol-neutral way to describe a service. It is possible to have multiple definitions for a single service, allowing sophisticated clients running on a fast local network to use the alternate binding to get significantly improved performance.
- [Article gives an example of optimizing a Java web service by providing a definition optimized for Java clients.]
J2ee best practices (Page last updated January 2003, Added 2003-03-31, Author TheMiddlewareCompany, Publisher TheServerSide). Tips:
- Modern methodologies have placed more emphasis on the design phase because it contributes tremendously to the quality, scalability and reliability of an application.
- Create a limited number of expensive resources, and share them from a common pool.
- Cache remote data locally. A distributed cache needs synchronization which may have high (excessive) overheads. Examples of objects worth caching include: the results of JNDI lookups; EJB home objects; JDBC datasource objects; and JMS topics and queues.
- Use design patterns. Many assist performance, such as: Session Façade (provide a service access layer that hides the complexity of underlying interactions, and consolidates many logical communications into one larger physical communication); Value Objects (create a single representation of all the data that an EJB call needs. By aggregating all of the needed data for a series of remote calls, you can execute a single remote call instead of many remote ones, increasing overall application performance); Service Locator (efficiently manage and obtain resources through centralizing distributed object lookups, reducing the difficulty and expense of obtaining handles to components).
- OPTIMIZE COMMUNICATION COSTS: Use local calls as opposed to remote; aggregate data; batch requests (e.g. use JDBC batched statements); cache data.
- BUILD A PERFORMANCE PLAN: work with your customers, both internal and external, to specify reasonable performance criteria; build in a little time during each design iteration to work the bottlenecks out for each scenario; understand which tools you'll use to analyze and tune performance, e.g. load simulators, profiler generators, profile analyzers; create or rent a simulation environment; build critical test cases; in the build phase don't spend time tuning components that are not bottlenecks, only fix major performance problems; leave extensive tuning to the later test phases; stop tuning when the system achieves performance targets.
- Typical iterative steps for effective performance tuning: Establish performance criteria from requirements; Identify major performance problems; Identify hot spots; Find and solve major bottlenecks only; Use regression tests; Set base lines; Verify production performance; Monitor production performance.
- Plug memory leaks (i.e. inappropriate object retention by leaving references from roots). Close resources; ensure long-lived objects stop referencing short-lived objects; ensure that any potential exception doesn't prevent cleanups from proceeding. Typical leaks include: putting an object in a collection and not removing it; not flushing stale data from caches; forgetting to unsubscribe objects; singletons holding references to objects that should be short-lived; seldom used objects being kept alive for the application duration; leaving session state alive for too long.
- The major symptom of a memory leak is that used heap space continually increases and doesn't stabilize. In this situation, use a memory profiler to find the cause of the leak.
- Use the 80/20 rule (80% of an application's execution time is spent in 20% of the code); focus on performance hot spots.
- Measure all of the major components of your infrastructure to locate the bottlenecks.
- Improve your deployment configuration, e.g. single system to a clustered architecture, upgrade hardware, or upgrade other infrastructure (like your network).
- If your database platform is the bottleneck, then it won't make any difference how your J2EE server is deployed.
- When you require high availability and a more scalable performance environment, you will need to go with a clustered approach. Clustering options include: adding additional JVMs, and load balance between them; adding additional CPUs, and cluster multiple JVMs with connection routing, failover and load balancing between them; adding multiple machines, with the Application Server providing failover, clustering, load balancing and connection routing.
- Clustering can degrade your caching performance because of distribution overheads (communications overhead for distributed invalidation of dirty data; with N different caches you have less chance of getting a hit).
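The tip about creating a limited number of expensive resources and sharing them from a common pool might be sketched like this. SimplePool is a hypothetical name, and a production pool (for example, an application server's JDBC connection pool) would also need validation, timeouts and shutdown handling.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

class SimplePool<T> {
    private final BlockingQueue<T> idle;

    // Creates all 'size' resources up front; none are created afterwards.
    SimplePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    // Blocks until a resource is free.
    T acquire() throws InterruptedException {
        return idle.take();
    }

    // Returns null immediately if every resource is in use.
    T tryAcquire() {
        return idle.poll();
    }

    // Hands a resource back so other callers can reuse it.
    void release(T resource) {
        idle.offer(resource);
    }

    public static void main(String[] args) throws InterruptedException {
        SimplePool<StringBuilder> pool = new SimplePool<>(2, StringBuilder::new);
        StringBuilder sb = pool.acquire();
        try {
            sb.append("work");
        } finally {
            pool.release(sb); // always return the resource, even on failure
        }
    }
}
```

Releasing in a finally block matches the later advice to make sure exceptions do not prevent cleanup.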
JMS security options (Page last updated February 2003, Added 2003-03-31, Author Steve Trythall, Publisher JavaDevelopersJournal). Tips:
- SSL is expensive because of a high cost of establishing a connection (five remote messages between the client and broker and two very costly asymmetric cryptographic operations). For a small number of long-lasting connections, this may be acceptable, but may be too expensive for multiple short connections.
- Two options to reduce this security cost are: Fewer connection initiations by altering your design to reuse connections; and Message-level security, that is avoiding SSL and encrypting the messages instead.
- The choice of cipher can have a large impact on messaging performance. Typically the broker will choose the strongest cipher available to it, which is often triple DES (3DES). This will seriously impact performance and may be considerably stronger security than the user ever intended. For example, triple DES is consistently double the cost of 56-bit DES.
- If performance is a concern, an RC4-based cipher is a considerably better choice than DES or 3DES. If cipher strength is of paramount importance, then 128- or 256-bit AES is a better choice when it's supported by the JMS vendor.
- Using message-level encryption allows individual messages to be encrypted once, before sending to all subscribers. This is much more efficient than SSL, which will separately encrypt every message sent to each subscriber.
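A minimal sketch of message-level encryption using the standard javax.crypto API follows. The choice of AES in GCM mode, the 128-bit key, and the MessageCrypto class are assumptions made for illustration, not the article's code; how the key reaches subscribers is out of scope here.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

class MessageCrypto {
    private static final int IV_BYTES = 12;
    private static final int TAG_BITS = 128;

    static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns IV followed by ciphertext. The message is encrypted once and
    // the same bytes can then be published to every subscriber.
    static byte[] encrypt(SecretKey key, String text) {
        try {
            byte[] iv = new byte[IV_BYTES];
            new SecureRandom().nextBytes(iv); // fresh IV for every message
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] body = cipher.doFinal(text.getBytes(StandardCharsets.UTF_8));
            byte[] payload = new byte[iv.length + body.length];
            System.arraycopy(iv, 0, payload, 0, iv.length);
            System.arraycopy(body, 0, payload, iv.length, body.length);
            return payload;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reads the IV from the front of the payload and decrypts the rest.
    static String decrypt(SecretKey key, byte[] payload) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key,
                    new GCMParameterSpec(TAG_BITS, payload, 0, IV_BYTES));
            byte[] plain = cipher.doFinal(payload, IV_BYTES, payload.length - IV_BYTES);
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The encrypted byte array would typically travel as the body of a JMS BytesMessage, so the broker and transport never need to see the plaintext.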
Submillisecond precision from Java (Page last updated January 2003, Added 2003-03-31, Author Vladimir Roubtsov, Publisher JavaWorld). Tips:
- [Article provides code for evaluating the timer resolution of the underlying machine.]
- Measuring times near or smaller than the system resolution can give misleading results.
- System.currentTimeMillis() is suitable only for profiling relatively long-lasting (100 ms and longer) operations.
- Timer-based approaches using Object.wait(long) or Thread.sleep(long) are alternatives that may have better resolution than System.currentTimeMillis(), but they are unreliable.
- [Article provides a native call implementation for sub-millisecond timer resolution].
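The article's own resolution test is not reproduced here. As a rough sketch of the idea, under the assumption that the smallest observed step of System.currentTimeMillis() approximates the timer resolution, a hypothetical TimerResolution class could look like this:

```java
class TimerResolution {
    // Spins until currentTimeMillis() changes value, 'samples' times over,
    // and returns the smallest positive step seen, in milliseconds.
    static long estimateMillis(int samples) {
        long smallest = Long.MAX_VALUE;
        for (int i = 0; i < samples; i++) {
            long start = System.currentTimeMillis();
            long next;
            do {
                next = System.currentTimeMillis();
            } while (next == start);
            long step = next - start;
            if (step > 0) { // ignore backwards clock adjustments
                smallest = Math.min(smallest, step);
            }
        }
        return smallest;
    }

    public static void main(String[] args) {
        System.out.println("approximate resolution (ms): " + estimateMillis(20));
    }
}
```

Any operation shorter than a few multiples of the reported value cannot be timed meaningfully with this clock.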
Stack Trace Decoding (Page last updated January 2003, Added 2003-03-31, Author Heinz Kabutz, Publisher JavaSpecialists). Tips:
- JDK 1.4 stack trace classes provide faster stack tracing than JDK 1.3.
- [Article provides version-polymorphic stack tracing that uses the most efficient available methodology.]
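For reference, the JDK 1.4 mechanism is Throwable.getStackTrace(), which returns StackTraceElement objects directly instead of requiring the printed trace to be parsed. The sketch below uses a hypothetical CallerInfo class and is not the article's version-polymorphic code.

```java
class CallerInfo {
    // Frame 0 is this method; frame 1 is whoever called it.
    static String callerName() {
        StackTraceElement[] frames = new Throwable().getStackTrace();
        return frames.length > 1 ? frames[1].getMethodName() : "<unknown>";
    }

    // Small helper so the example has a known caller.
    static String demo() {
        return callerName();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```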
Chapter 12 of "Java Performance Tuning", "Distributed computing". (Page last updated January 2003, Added 2003-03-31, Author Jack Shirazi, Publisher O'Reilly). Tips:
- Use a relay server to examine data transfers.
- Reduce the number of messages transferred.
- Cache data and objects to change distributed requests to local ones.
- Batch messages to reduce the number of messages transferred.
- Compress large transfers.
- Partition the application so that methods execute where their data is held.
- Multiplex communications to reduce connection overhead.
- Stub out data links to reduce the amount of data required to be transferred.
- Design the various components so that they can execute asynchronously from each other.
- Anticipate data requirements so that data is transferred earlier.
- Split up data so that partial results can be displayed.
- Avoid creating distributed garbage.
- Optimize database communications. Application partitioning is especially important with databases.
- Use statically defined database queries.
- Avoid database transactional modes if possible.
- Use JDBC optimizations such as prepared statements, specific SQL requests, etc.
- Try to break down the time to execute a Web Service into client, server, and network processing times, and extract the marshalling and unmarshalling times from client and server processing.
- Don't forget about DNS resolution time for a Web Service.
- Try to load-balance high-demand Web Services or provide them asynchronously.
- The granularity of a Web Service is important. For more scalable and better performing Web Services, create coarser services that require fewer network requests to complete.
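As one illustration of the "compress large transfers" tip, the standard java.util.zip classes can shrink a payload before it goes over the wire. The TransferCompression class below is a hypothetical sketch; the actual saving depends on the data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class TransferCompression {
    // Compresses a payload on the sending side.
    static byte[] compress(byte[] data) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray(); // complete only after the gzip stream is closed
    }

    // Restores the original payload on the receiving side.
    static byte[] decompress(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzip =
                new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int read;
            while ((read = gzip.read(chunk)) != -1) {
                out.write(chunk, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

Compression costs CPU time on both ends and can make very small payloads larger, so it is best reserved for large transfers over slow links.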
Create objects in scope (Page last updated February 2003, Added 2003-03-31, Author Ashutosh Shinde, Publisher DevX). Tips:
- Create objects in the scope that they are needed to avoid redundantly creating objects when they might never be used.
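A small sketch of the idea, using a hypothetical ScopedCreation class: the buffer is created inside the branch that needs it, so the common path allocates nothing. The counter exists only to make the effect visible.

```java
class ScopedCreation {
    static int buffersCreated = 0; // counts allocations, for illustration only

    static StringBuilder newBuffer() {
        buffersCreated++;
        return new StringBuilder();
    }

    // The buffer is created only after the early return, so the
    // "nothing to report" path never pays for it.
    static String describe(int[] errorCodes) {
        if (errorCodes.length == 0) {
            return "ok";
        }
        StringBuilder sb = newBuffer();
        for (int code : errorCodes) {
            sb.append("error ").append(code).append(';');
        }
        return sb.toString();
    }
}
```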
Enterprise Java Performance: Best Practices (Page last updated February 2003, Added 2003-03-31, Authors Kingsum Chow, Ricardo Morin, Kumar Shiv, Publisher Intel). Tips:
- Components of enterprise applications requiring tuning: network topology; system configuration; network I/O; disk I/O; operating system; database; clustering; application design; application server; driver choice; persistence implementation; JVM choice; JVM parameters; JVM code & JIT; cache architecture; SMP scaling.
- The primary element for good performance is the application design.
- Testing should take place in a performance test environment that mimics the production environment. Workloads must be representative (e.g. good user simulations, fully populated databases with realistic data), measurable (e.g. throughput, response times, and concurrency or injection rate), and repeatable (e.g. databases need to be reinstated to the original states for each test). Variations in the primary metrics should not exceed a 5% margin across measurements.
- Establish a baseline, including recording all configurable parameters for that baseline.
- Define performance goals for the system (e.g. the desired throughput within certain response time constraints).
- The iterative tuning process is as follows: Collect data (use stress tests and performance-monitoring tools to capture performance data as the system is exercised); Analyze the collected data to identify performance bottlenecks; Identify, explore, and select alternatives to address the bottlenecks; Apply the solution; Evaluate the new performance.
- Use the measured data to drive performance improvement actions (don't guess).
- Make sure only one performance improvement action is applied at a time.
- Follow a top-down approach to identifying bottlenecks. At the top are system-level items such as disk subsystem configuration, network devices, and database configuration; in the middle are application-level items such as transaction configuration, persistence strategies, and JDBC drivers; and at the bottom are machine-level items such as JVM configuration, multi-processor configurations, and processor caches.
- Performance tools fall under the following categories: Stress test tools (repeatably simulate user activity); System monitoring tools such as Windows perfmon and Unix/Linux sar and iostat utilities (e.g. to measure CPU utilization, % processor time, disk I/O, % disk time, read/write queue lengths, I/O rates, latencies, network I/O); Application server monitoring tools (measure statistics such as queue depths, utilization of thread pools, and database connection pools); Database monitoring tools (measure statistics such as cache hit ratios, sort rates, table scan rates, SQL response times, and database table activity); Application profilers (identify application-level hotspots and drill down to the code-level); JVM monitoring tools (such as Garbage Collection monitoring with verbosegc).
- Beware the overheads of measurement tools, and try to measure one component at a time (some tools can slow application processing by one or more orders of magnitude, i.e. 10-100X).
- For batch processing the raw throughput (the amount of work done in a period of time) is the only real metric of interest.
- For interactive processing the response time (the time taken for each unit of work) is very important and in some cases it may be of higher importance than the throughput.
- Pipelining (breaking down the required work into many parts) allows component sections to simultaneously work on multiple parts of transactions, thereby maximizing use of system components. Pipelining is extensively used to increase throughput, but its effect on the time taken for an individual transaction is not a primary consideration.
- Parallelism throws multiple resources at a task so that the task completes faster. Its primary effect is to reduce response time. Multi-threaded code is a way to achieve this in software. Hardware examples would include mirrored disks and multiple network cards.
- Drawing a throughput curve can be very valuable in understanding system-level bottlenecks and helping identify potential solutions.
- Reaching maximum throughput without full saturation of the CPU is an indicator of a performance bottleneck such as I/O contention, over-synchronization, or incorrect thread pool configuration.
- Hitting a high response time metric with an injection rate well below CPU saturation indicates latency issues such as excessive disk I/O or improper database configuration.
- Reaching application server CPU saturation indicates that there are no system-level bottlenecks outside of the application server. Further tuning may involve tweaking the application to address specific hotspots, adjusting garbage collection parameters, or adding application server nodes to a cluster.
- Reaching CPU saturation is a goal for the performance tuning process, not an operational goal. An operational CPU utilization goal would be that there is sufficient capacity available to address usage surges.
- For databases: Isolate log files to dedicated devices to reduce conflicts between the sequential nature of log operations and random access to data tables; Size the sort area memory to minimize disk sort operations; Allocate sufficient database cache memory (but avoid swapping); Index frequently used, highly selective keys; Index foreign keys frequently used in joins; Use full-text retrieval keys where appropriate; Use disk striping (e.g., RAID 1+0) to spread I/O operations and to avoid device contention.
- Use the Composite Entity and Value Object design patterns. Composite Entity lets coarse-grained entity beans manage a set of subordinate persistent objects, reducing the number of fine-grained remote calls. Value Object assembles data requests into aggregated data objects to reduce remote calls to individual field get methods, allowing the transfer of more data with fewer remote calls.
- Use the Session Façade design pattern. This encapsulates business logic and data access, eliminating the need for clients to access fine-grained business and data objects, thus reducing the number of remote calls. It is often used in combination with the Value Object pattern.
- Use the Service Locator design pattern. This encapsulates directory access through JNDI and caches retrieved initial contexts and factory objects (e.g., EJB Homes), reducing expensive accesses to JNDI.
- Use the Value List Handler design pattern. This encapsulates access and traversal of database-generated lists of items, improving performance by providing low-overhead list population mechanisms and implementing caching strategies.
- Enterprise JavaBeans (EJB) homes and data sources should be cached to avoid repeated JNDI lookups.
- Try to minimize HTTP session use. JSPs create HTTP sessions by default: override this with session="false" where appropriate.
- Release database connections when they are no longer needed.
- Explicitly remove unused stateful session beans.
- Asynchronous messaging can enhance the scalability of the system by supporting pipelining of operations, though at a latency cost (longer response times for individual requests).
- Key Application Server Parameters to tune: socket multiplexing (use where available); thread pool size (increase until fully utilized); transaction queues; database connection pools (sized one per active thread); JDBC prepared statement cache (increase size until fully utilized); session timeouts (too long will use too much memory); initial bean pool size (to optimize initial response times); bean cache size (try to avoid passivations); transaction isolation levels (use the least restrictive but still valid); JSP fragment or full-page caching (minimize dynamic page generation).
- Tune the waiting queue size (server entry point at TCP level) to a high enough level that the server is at full capacity, but not so high that the server is swamped.
- Tune the heap configuration.
- Select the best garbage collection algorithm.
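The caching half of the Service Locator tip (and the advice to cache EJB homes and data sources) can be sketched without a container. In the hypothetical CachingLocator below, the Function passed to the constructor is a stand-in for the real JNDI lookup, which is an assumption made so the example runs on its own.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

class CachingLocator {
    private final ConcurrentMap<String, Object> cache = new ConcurrentHashMap<>();
    private final Function<String, Object> slowLookup; // stand-in for a JNDI lookup

    CachingLocator(Function<String, Object> slowLookup) {
        this.slowLookup = slowLookup;
    }

    // The expensive lookup runs at most once per name;
    // later calls for the same name are served from the cache.
    Object lookup(String name) {
        return cache.computeIfAbsent(name, slowLookup);
    }
}
```

In a real deployment the stand-in would wrap a call such as InitialContext.lookup(name) and translate its NamingException; cached entries may also need invalidating if the server rebinds a name.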
Last Updated: 2017-11-28
Copyright © 2000-2017 Fasterj.com. All Rights Reserved.
All trademarks and registered trademarks appearing on JavaPerformanceTuning.com are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. JavaPerformanceTuning.com is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.
RSS Feed: http://www.JavaPerformanceTuning.com/newsletters.rss