Java Performance Tuning
Tips December 27th 2002
Back to newsletter 025 contents
Does nulling variables help garbage collection (Page last updated November 2002, Added 2002-12-27, Author Jack Shirazi, Publisher Kabutz). Tips:
- In some rare situations, explicitly nulling a variable reference can allow the garbage collector to reclaim an object in the young generation, avoiding promoting that object and having to reclaim it more slowly in the old generation.
- Object cycling (creation then deletion) is an order of magnitude faster in the young generation than the old generation.
- Objects that are too large to fit into the young generation must be created directly in the old generation.
- Correctly scoping objects avoids having to rely on nulling variables to dereference objects.
- Tuning the heap is worthwhile and can save you from having to make code changes.
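A minimal sketch of the scoping point, assuming a hypothetical method that fills a temporary array and then does unrelated work. In the first version the array stays referenced from a local variable of the still-running method unless it is nulled. In the second, the array is confined to a helper method, so it becomes unreachable as soon as the helper returns, with no nulling needed. Whether the nulling changes anything in practice depends on the JVM (a JIT compiler may already treat the dead variable as unreachable), which fits the tip that the situation is rare.

```java
import java.util.Arrays;

// Hypothetical example contrasting explicit nulling with correct scoping.
final class ScopingExample {

    // Explicit nulling: without 'buffer = null', the array may stay reachable
    // from this method's frame until the method returns.
    static long sumThenWorkWithNulling(int n) {
        int[] buffer = new int[n];
        Arrays.fill(buffer, 1);
        long sum = 0;
        for (int v : buffer) sum += v;
        buffer = null; // make the array unreachable before the later work
        return sum + laterWork();
    }

    // Correct scoping: the array only exists inside sumOfOnes, so it is
    // unreachable once that call returns.
    static long sumThenWorkScoped(int n) {
        long sum = sumOfOnes(n);
        return sum + laterWork();
    }

    private static long sumOfOnes(int n) {
        int[] buffer = new int[n];
        Arrays.fill(buffer, 1);
        long sum = 0;
        for (int v : buffer) sum += v;
        return sum;
    }

    // Stand-in for later work that allocates short-lived objects
    // (and so may trigger young generation collections).
    private static long laterWork() {
        long acc = 0;
        for (int i = 0; i < 1_000; i++) {
            acc += new StringBuilder().append(i).length();
        }
        return acc;
    }
}
```

Both methods compute the same result; they differ only in how long the temporary array remains reachable.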
The util.concurrent package (Page last updated November 2002, Added 2002-12-27, Author Brian Goetz, Publisher IBM). Tips:
- wait(), notify(), and synchronized are tricky to use correctly and there are many performance, deadlock, fairness, resource management, and thread-safety hazards to avoid.
- Doug Lea's util.concurrent is a well-tried, free package of concurrency utilities that will form the basis of the java.util.concurrent package in JDK 1.5. util.concurrent supports lock acquisition that times out; interruptible lock acquisition attempts; shared locks; multi-mode locking, such as concurrent-read with exclusive-write; acquiring a lock in one method and releasing it in another; and more.
- A real-world task scheduler should deal with threads that die, kill excess pool threads so they don't consume resources unnecessarily, manage the pool size dynamically based on load, and bound the number of tasks queued. util.concurrent includes efficient thread pool management.
- util.concurrent includes efficient asynchronous process scheduling.
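The features listed above carried over into java.util.concurrent. The sketch below uses that JDK 5+ API rather than the original util.concurrent classes, so its class names differ from Doug Lea's package; SharedCounter and its methods are hypothetical. It shows a lock acquisition that times out, a read-write lock for concurrent-read/exclusive-write, and a thread pool that retires idle extra threads and bounds its task queue.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical counter guarded by a read-write lock (java.util.concurrent, JDK 5+).
final class SharedCounter {
    private final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
    private int value;

    // Exclusive write that gives up after a timeout instead of blocking forever.
    boolean tryIncrement(long timeoutMillis) {
        Lock w = rw.writeLock();
        try {
            if (!w.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return false; // timed out; the caller decides what to do
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // acquisition was interrupted
            return false;
        }
        try {
            value++;
            return true;
        } finally {
            w.unlock();
        }
    }

    // Shared read: many threads may hold the read lock at the same time.
    int get() {
        Lock r = rw.readLock();
        r.lock();
        try {
            return value;
        } finally {
            r.unlock();
        }
    }

    // At most 4 threads; threads above the core size of 2 are retired after
    // 30 s idle; at most 100 queued tasks, after which the submitter runs the task.
    static ThreadPoolExecutor newBoundedPool() {
        return new ThreadPoolExecutor(2, 4, 30, TimeUnit.SECONDS,
                new ArrayBlockingQueue<Runnable>(100),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    // Runs 'tasks' increments on a bounded pool and returns the final count.
    static int incrementConcurrently(int tasks) {
        SharedCounter counter = new SharedCounter();
        ThreadPoolExecutor pool = newBoundedPool();
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> counter.tryIncrement(1_000));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return counter.get();
    }
}
```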
Optimizing EJB validation code (Page last updated December 2002, Added 2002-12-27, Author Brett McLaughlin, Publisher IBM). Tips:
- Place all your data validation code (ensuring data is the correct type/format/range) in the business delegate methods (which execute on the client). Keep your data validation logic as close to the client as possible to minimize unnecessary requests to the server.
- Business-specific validation (testing against business rules, e.g. a specific book exists) should be in the EJB layer optimized to use local calls.
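A sketch of client-side validation in a business delegate, assuming a hypothetical BookService interface in place of a real EJB remote interface. Only type/format/range checks run in the delegate, so a malformed request fails before any remote call is made; the business-rule check (does the book exist?) is left to the server. The ISBN pattern and quantity range are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the EJB's remote business interface.
interface BookService {
    boolean reserve(String isbn, int quantity);
}

// Business delegate: runs on the client and rejects malformed input locally.
final class BookDelegate {
    private final BookService service;
    private int remoteCalls; // counts server round trips, for illustration

    BookDelegate(BookService service) {
        this.service = service;
    }

    boolean reserve(String isbn, int quantity) {
        List<String> errors = validate(isbn, quantity);
        if (!errors.isEmpty()) {
            // No request is sent to the server for bad data.
            throw new IllegalArgumentException(String.join("; ", errors));
        }
        remoteCalls++;
        return service.reserve(isbn, quantity); // business rules checked server-side
    }

    int remoteCalls() {
        return remoteCalls;
    }

    // Type/format/range checks only; no server state is needed.
    static List<String> validate(String isbn, int quantity) {
        List<String> errors = new ArrayList<>();
        if (isbn == null || !isbn.matches("\\d{9}[\\dX]|\\d{13}")) {
            errors.add("malformed ISBN");
        }
        if (quantity < 1 || quantity > 100) {
            errors.add("quantity must be between 1 and 100");
        }
        return errors;
    }
}
```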
Long-running tasks (Page last updated November 2002, Added 2002-12-27, Author Serguei Eremenko, Publisher Builder.com). Tips:
- [Article describes a class designed for efficiently managing long running tasks].
CPU usage monitoring (Page last updated November 2002, Added 2002-12-27, Author Vladimir Roubtsov, Publisher JavaWorld). Tips:
- Programmatically querying for CPU usage is not possible in pure Java; you need to use a JNI call.
- [Article describes implementing a JNI call to getProcessCPUTime() to get CPU usage of the system].
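Once a JNI method such as getProcessCPUTime() is available, CPU usage over an interval is the CPU time consumed divided by the elapsed wall-clock time. This sketch assumes the native method returns process CPU time in milliseconds and divides by the number of CPUs so that a fully busy machine reads 100%; both are assumptions, not details taken from the article. The native method is only declared here, never called, so no native library is needed to run the arithmetic.

```java
// Hypothetical helper; only the arithmetic is implemented here.
final class CpuUsage {

    // Assumed JNI method returning this process's CPU time in milliseconds.
    // Not implemented here: calling it without a native library would fail.
    static native long getProcessCPUTime();

    // Percentage of available CPU used between two samples:
    // CPU time consumed / (elapsed wall time * number of CPUs).
    static double usagePercent(long cpuStartMs, long cpuEndMs,
                               long wallStartMs, long wallEndMs, int cpus) {
        long wall = wallEndMs - wallStartMs;
        if (wall <= 0 || cpus <= 0) {
            throw new IllegalArgumentException("empty interval or no CPUs");
        }
        return 100.0 * (cpuEndMs - cpuStartMs) / ((double) wall * cpus);
    }
}
```

In use, you would sample getProcessCPUTime() and System.currentTimeMillis() at the start and end of an interval and pass Runtime.getRuntime().availableProcessors() as the CPU count.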
Application server performance (Page last updated November 2002, Added 2002-12-27, Author Rohit Valia and Marina Sum, Publisher Sun). Tips:
- HTTPS is characterized by light inbound/heavy outbound traffic. SSL requires heavy CPU usage.
- Each type of application has its own performance profile. Benchmark your application using measurement toolkits.
J2ME portability (Page last updated December 2002, Added 2002-12-27, Author Jeremy Wakefield, Keith Braithwaite & Tony Robinson, Publisher JavaDevelopersJournal). Tips:
- It's possible for a MIDP implementation to pass the TCK tests and not perform well at all.
- MIDlets allocate resources in startApp() and release them in destroyApp(). MIDlets PAUSED by the device's operating system must release expensive or volatile resources. Some platforms call destroyApp() when the MIDlet is PAUSED, and create a new instance when it returns to the foreground. A MIDlet should expect the possibility of being DESTROYED when PAUSED, and manage resources accordingly.
J2ME design patterns (Page last updated December 2002, Added 2002-12-27, Author Ben Hui, Publisher JavaWorld). Tips:
- [Article presents several design patterns suitable for J2ME applications].
- It takes a lot of memory and processing power to show 1,000 strings on a mobile phone. Use the Pagination pattern to divide data into pages.
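A minimal sketch of the Pagination pattern: the full data set stays in one place and the UI asks for one page at a time. It uses Java SE generics and List.subList for brevity; CLDC/MIDP devices lack these, so a real MIDlet would need an equivalent over Vector or arrays. Paginator is a hypothetical name.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical paginator exposing one fixed-size page of a list at a time.
final class Paginator<T> {
    private final List<T> items;
    private final int pageSize;

    Paginator(List<T> items, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.items = items;
        this.pageSize = pageSize;
    }

    // Number of pages, using ceiling division.
    int pageCount() {
        return (items.size() + pageSize - 1) / pageSize;
    }

    // A view of the given 0-based page, or an empty list if out of range.
    List<T> page(int page) {
        int from = page * pageSize;
        if (page < 0 || from >= items.size()) {
            return Collections.emptyList();
        }
        int to = Math.min(from + pageSize, items.size());
        return items.subList(from, to);
    }
}
```

For 1,000 strings and a page size of 20, the screen only ever needs to render 20 items.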
Tuning 1.4.1 heap and GC (Page last updated November 2002, Added 2002-12-27, Author Nagendra Nagarajayya and J. Steven Mayer, Publisher Sun). Tips:
- 1.4.1 has three young generation collectors (copying collector; parallel copying collector; parallel scavenge collector) and two old generation collectors (mark-compact collector and concurrent collector).
- The parallel copying collector uses multiple threads (one per CPU by default) to garbage collect the young generation, thus should be faster than the default single-threaded copying collector on multiple-CPU machines. Enabled by using the -XX:+UseParNewGC option (though on a single CPU machine the default copy collector is still used since it is more efficient in that case). The -XX:ParallelGCThreads=<num threads> option can be used to specify a different number of threads from the default (one per CPU), and can force the use of this collector on a single CPU machine.
- The concurrent collector minimizes the stop-the-world portions of old generation garbage collection, thus minimizing application pause times. Most of the concurrent collector's work occurs concurrently with the application. It is enabled by using the -XX:+UseConcMarkSweepGC option. The concurrent collector's background thread starts running when the percentage of allocated space in the old generation goes above the value of -XX:CMSInitiatingOccupancyFraction=<percent> (default 68%). If "the rate of creation" of objects is too high and the concurrent collector cannot keep up, it falls back to the traditional mark-sweep collector.
- The parallel scavenge collector is similar to the parallel copying collector but is optimized for large systems: multiple CPUs and heap sizes above 10 gigabytes. It is enabled by using the -XX:+UseParallelGC option, which forces old generation GC to use the original mark-compact old generation collector. With large heaps, -XX:TargetSurvivorRatio should probably be increased beyond the default of 50. The -XX:ParallelGCThreads=<num threads> option can be used to specify a different number of threads from the default (one per CPU), and can force the use of this collector on a single CPU machine. Adaptive sizing can be turned on with the -XX:+UseAdaptiveSizePolicy option.
- The mark compact (original) old generation garbage collector is very efficient where pause time is not a big criterion.
- The 1.4.1 heap consists of: the permanent generation, used to store class objects and related metadata (sized using -XX:PermSize and -XX:MaxPermSize; use -Xnoclassgc to prevent GC here); the old generation, used to hold old objects promoted from the young generation (sized using -Xms and -Xmx, which specify the total size of the young and old generations combined); and the young generation, which is further divided into an Eden and two semispaces (sized using -XX:NewSize, -XX:MaxNewSize and -XX:SurvivorRatio, such that
EdenSize = NewSize - ((NewSize / (SurvivorRatio + 2)) * 2) and
SemispaceSize = (NewSize - EdenSize) / 2).
- The -XX:MaxTenuringThreshold=<N> option specifies how many young generation collections an object can survive in the young generation before it is promoted to the old generation. The promoteall option (last available in 1.2.2), which immediately promoted any object that survived a young generation collection to the old generation, can be simulated using -XX:MaxTenuringThreshold=0. In that case the semispaces are not needed, so they should also be sized extremely small, e.g. -XX:SurvivorRatio=20000.
- [Article gives an example of using Amdahl's law to predict the parallelism efficiency of an application from its GC costs].
- Options that generate GC logging are -verbose:gc, -XX:+PrintGCDetails, -XX:+PrintGCTimeStamps, -XX:+PrintHeapAtGC. Note that -Xloggc does not log data from the -XX switches. [Article describes the information available when using these options].
- [Article provides an analysis script to analyze the GC logging output available from 1.4.1 GC logging options].
- If the application has few intermediate objects (objects surviving one or more young generation GCs), using the promoteall modifier decreases the amount of time the application spends copying in young GC. But the collection cost is moved to old generation GC, which may be worse.
- The cost of copy collection is directly proportional to the size of the live objects.
- Dereference objects (perhaps by nulling the variable pointing to them) as soon as they are no longer needed.
- Avoid creating unnecessary objects.
- Pooling objects can easily cost more to use than the time saved in GC. Objects are suitable for pooling when they: take a long time to create or use up a lot of memory (threads, db connections); are static objects, especially when they have no state.
- Temporary objects referenced from objects in the old generation (e.g. long-lived objects) have a higher scanning cost (young generation GC phase).
- Flattened objects are cheaper to garbage collect than the equivalent data in nested objects.
- The -XX:PretenureSizeThreshold=<byte size> option (default 0) specifies the size above which objects are created directly in the old generation. This option can easily degrade performance.
- NIO direct buffers are stored outside the heap, so they avoid GC and can be used to store long-lived data such as lookup tables and caches. [Except presumably there is some GC when the buffer is released].
- Reference objects are expensive for the garbage collectors.
- Finalizers are very expensive for the garbage collectors.
- Finding the optimal size for the young generation is pretty easy. The rule of thumb is to make it about as large as possible, given acceptable collection (pause) times. There is a certain amount of fixed overhead with each collection, so their frequency should be minimized. If pauses are not an issue, young generation space should be made as large as possible.
- The old generation heap size needs to be balanced to minimize overall GC times. Too small a size leads to an increased rate of GC; too large a size leads to long GC times, as old generation GC time is proportional to the size of the heap.
- Old generation is not compacted by default, leading to a fragmented heap. The -XX:+UseCMSCompactAtFullCollection option can be used to enable compaction of the old generation heap at the cost of performance.
- Heaps larger than physical memory incur operating system paging overheads, which significantly degrade performance.
- [Article includes Solaris-specific tuning options, sections 22.214.171.124 and 18].
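The young generation sizing formulas given in these tips can be checked with simple arithmetic. Each semispace gets NewSize / (SurvivorRatio + 2) and Eden gets the remainder, so Eden is SurvivorRatio times the size of one semispace. For example, a 64 MB young generation with -XX:SurvivorRatio=6 gives two 8 MB semispaces and a 48 MB Eden. The class name is hypothetical, and the real JVM may round these sizes to its own alignment.

```java
// Hypothetical helper evaluating the EdenSize and SemispaceSize formulas.
final class YoungGenSizing {

    // EdenSize = NewSize - ((NewSize / (SurvivorRatio + 2)) * 2)
    static long edenSize(long newSize, int survivorRatio) {
        return newSize - (newSize / (survivorRatio + 2)) * 2;
    }

    // SemispaceSize = (NewSize - EdenSize) / 2
    static long semispaceSize(long newSize, int survivorRatio) {
        return (newSize - edenSize(newSize, survivorRatio)) / 2;
    }
}
```

With -XX:SurvivorRatio=20000, as suggested for simulating promoteall, the same 64 MB young generation leaves each semispace only a few kilobytes.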
Java Performance for DB2 Applications (Page last updated November 2002, Added 2002-12-27, Author John Campbell, Publisher IBM). Tips:
- SQLJ can be faster than JDBC because some of the JDBC runtime overheads can be shifted to compilation time.
- Ensure the SQLJ serialized profile is customized correctly, and request online checking.
- Use the same ConnectionContext for corresponding SQLJ statements.
- Close resources (ResultSets, Statements, ...) when you are finished with them.
- Disable autocommit.
- SELECT and UPDATE only those columns you need.
- Tune the JVM heap.
- Use the latest JVM and JDBC driver.
- Turn on DB2 dynamic SQL statement caching with CACHEDYN=YES in your subsystem parameters (DSNZPARM).
Caching web services (Page last updated December 2002, Added 2002-12-27, Author Brian D. Goodman, Publisher IBM). Tips:
- Web service performance depends on: the network transaction time; the time it takes to handle the message (XML parsing/generation, etc.); and the time the service itself takes to execute.
- Cache where possible [Article describes a few caching scenarios].
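A sketch of one simple caching scenario: responses from a slow web service are kept for a fixed time-to-live and reused until they expire. TtlCache is a hypothetical class, and the clock is injected so expiry can be tested; a production cache would also need a size bound or eviction policy.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical time-to-live cache for service responses.
final class TtlCache<K, V> {

    private static final class Entry<V> {
        final V value;
        final long expiresAt;

        Entry(V value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<K, Entry<V>> map = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // e.g. System::currentTimeMillis

    TtlCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    // Returns the cached value, or calls 'loader' (the remote service)
    // if the entry is missing or has expired.
    V get(K key, Function<K, V> loader) {
        long now = clock.getAsLong();
        Entry<V> e = map.get(key);
        if (e != null && now < e.expiresAt) {
            return e.value; // cache hit: no network or XML cost
        }
        V fresh = loader.apply(key);
        map.put(key, new Entry<>(fresh, now + ttlMillis));
        return fresh;
    }
}
```

Two threads that miss at the same moment may both call the service; that is usually acceptable for a cache and avoids holding a lock across a slow remote call.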
Statistical analysis to help identify performance bottlenecks (Page last updated November 2002, Added 2002-12-27, Author ProactiveNet, Publisher ebizQ). Tips:
- A user's perception of a quality application experience is predominantly determined by the application's response time and availability.
- Use statistical quality control to manage the huge volume of performance data: periodic statistical sampling, creation of "normal" control charts and focusing additional analysis on "abnormal" data points. Accurate baselining is critical.
- A series of analyses helps identify the causes of performance problems: (1) detect abnormal performance (outside expected ranges); (2) find abnormalities, relative to their baselines, in all other variables at about the same time as the alerted performance abnormality, and correlate them with it; (3) eliminate detected abnormalities that are unlikely causes; (4) score the remaining abnormalities according to degree of abnormality and likely causation.
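Steps 1 and 4 can be sketched with a basic control chart: compute the mean and standard deviation of baseline samples, flag new samples that fall outside the mean plus or minus k standard deviations, and use the distance from the mean in standard deviations as an abnormality score. The choice of k (3 is conventional) and the class name are assumptions, not details from the article.

```java
// Hypothetical control chart built from baseline measurements.
final class ControlChart {
    private final double mean;
    private final double stdDev;

    ControlChart(double[] baseline) {
        if (baseline.length < 2) {
            throw new IllegalArgumentException("need at least 2 baseline samples");
        }
        double sum = 0;
        for (double x : baseline) sum += x;
        mean = sum / baseline.length;
        double sq = 0;
        for (double x : baseline) sq += (x - mean) * (x - mean);
        stdDev = Math.sqrt(sq / (baseline.length - 1)); // sample standard deviation
    }

    // Abnormality score: distance from the baseline mean in standard deviations.
    double score(double sample) {
        if (stdDev == 0) {
            return sample == mean ? 0.0 : Double.POSITIVE_INFINITY;
        }
        return Math.abs(sample - mean) / stdDev;
    }

    // True if the sample lies outside mean +/- k standard deviations.
    boolean isAbnormal(double sample, double k) {
        return score(sample) > k;
    }
}
```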
Compile time assists for JITs (Page last updated October 2002, Added 2002-12-27, Author Vivek Haldar, Publisher Sun). Tips:
- [Article describes how the compiler can add optimization hints while still satisfying the Java security model].
Building a lightweight XML parser (Page last updated November 2002, Added 2002-12-27, Author Guang Yang, Publisher DevX). Tips:
- A full-featured XML parser can be too resource-intensive for your particular application. SimpleDOMParser (described in the article) is a highly simplified parser: under 4 KB and fewer than 400 lines of source.
- SAX parsing provides better performance than DOM because SAX processes one tag at a time, while DOM maintains global state. However, SimpleDOMParser provides a simplified DOM that is not resource-intensive.
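The SimpleDOMParser API is not reproduced here. Instead, this sketch uses the standard JAXP SAX parser to show the one-tag-at-a-time style: the handler sees each start tag as it is read and keeps only a running count per element name, never a document tree. ElementCounter is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical SAX handler that counts elements without building a tree.
final class ElementCounter extends DefaultHandler {
    final Map<String, Integer> counts = new HashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        counts.merge(qName, 1, Integer::sum); // only per-name counts are kept
    }

    static Map<String, Integer> count(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            ElementCounter handler = new ElementCounter();
            parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler.counts;
        } catch (Exception e) { // parser configuration, SAX or I/O failure
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }
}
```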
Safe data writing with memory mapped files (Page last updated November 2002, Added 2002-12-27, Author Greg Travis, Publisher DevX). Tips:
- An efficient way to update file-based data is to cache changes in memory and write them to disk periodically (called checkpointing).
- Using memory mapped files can be efficient because the underlying operating system manages disk pages on demand, only loading the data that is actually accessed.
- Any change to a memory-mapped file immediately changes the file, so care must be taken to avoid corrupt states. The article describes a class that keeps changes in memory, duplicates them in a disk-based "change log", and applies them atomically to avoid corrupt states. In case of a crash, the change log allows recovery to the last non-corrupt state.
- For efficiency, writes to a memory-mapped file are not necessarily flushed to disk immediately. You can force data to disk using MappedByteBuffer.force(), but this can slow down processing.
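A sketch of the basic memory-mapped write path using java.nio, not the article's change-log class: an int is written into a mapped region of a file and then forced to disk. MappedWriter is a hypothetical name. It maps the file on every call for simplicity, whereas a real cache would keep the mapping open and call force() only at checkpoints.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for reading and writing ints in a memory-mapped file.
final class MappedWriter {

    // Maps 'capacityInts' ints of the file read/write (extending the file if
    // needed), writes 'value' at int slot 'index', then forces it to disk.
    static void writeInt(Path file, int index, int value, int capacityInts) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4L * capacityInts);
            buf.putInt(index * 4, value); // visible in the file's pages immediately
            buf.force();                  // flush to the storage device now (can be slow)
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the int at slot 'index' through a read-only mapping.
    static int readInt(Path file, int index) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            return buf.getInt(index * 4);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```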
Comparing The Performance of J2EE Servers (Page last updated November 2002, Added 2002-12-27, Author Christopher L Merrill, Publisher Web Performance Inc). Tips:
- [Report comparing the performance of servlet servers. Describes how to build a load test].
Last Updated: 2020-03-30
Copyright © 2000-2020 Fasterj.com. All Rights Reserved.
All trademarks and registered trademarks appearing on JavaPerformanceTuning.com are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. JavaPerformanceTuning.com is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.