Java Performance Tuning
Tips January 2008
Back to newsletter 086 contents
Checklist/Tuning Guide for Optimizing the JRockit JVM (Page last updated December 2007, Added 2008-01-30, Author Steven Pozarycki, Publisher BEA). Tips:
- Confirm the exact version you are using with java -version
- Gather the exact JVM flags that are being used so that changes are clearly identified
- Determine what the goal of the application is, e.g. "short response times" or "high application performance".
- For high application performance regardless of pause times, set the Dynamic Garbage Collector to -Xgcprio:throughput. For short pause times, use -Xgcprio:pausetime -Xpausetarget=XXX, with the pausetarget value set to your target maximum pause time.
- To diagnose performance problems, first gather a JRockit Runtime Analyzer (JRA) recording covering about 10 minutes of runtime while the problem is occurring. You can do this with the jrcmd.sh utility or with JRockit Mission Control (JRMC).
- To diagnose garbage collection problems, first gather GC stats using -Xverboselog:perTestGC.log -Xverbose:opt,memory,gcpause,memdbg,compaction,gc,license -Xverbosetimestamp -Xgcreport
- Explicit garbage collection requests (e.g. calls to System.gc() in the code) can sometimes cause problems; they can be disabled with the -XXnoSystemGC flag.
- To ensure that profiles and measurements are taken in the steady state, allow the application enough time to warm up.
- See whether -XXtlasize and -XXlargeobjectlimit need tuning (keeping in mind that, for most applications, the thread local area size should be at least twice the large object limit).
- Run a test with -Djrockit.lockprofiling enabled and no other JVM logging enabled. This flag adds an overhead of around 5 to 10 percent, so use it only for profiling.
- Monitor the operating system with top and iostat and, if necessary, take thread dumps.
- Test -XXsetGC:singleparpar and -XXsetGC:genparpar to see which is better for your application.
- Tune the nursery size; try with -Xgc:gencon -Xns50m; try with -Xgc:parallel -XXcompactratio:1.
- Test whether setting the -XXgcthreads flag to the actual number of physical CPUs helps.
- If there is lock contention on fat locks, spinning on fat locks can be disabled with -XXdisableFatSpin, or JRockit can be left to disable it adaptively with -Djrockit.useAdaptiveFatSpin=true.
- If you are running on Xeon hardware, try adding -XXallocPrefetch and -XXallocRedoPrefetch to decrease the cost of memory allocation.
- Try the -XlargePages option - amongst other things this will lock the heap into memory so it does not get swapped out by the operating system
- See if -XXaggressive helps - this uses more resources during startup but should stabilize to a more optimal system
- Try -XX:+UseNewHashFunction, which enables a new, faster hash function for HashMap.
- Test the generational GCs (-Xgc:gencon or -Xgc:genpar) and try changing the compaction ratio (-XXcompactionRatio=nn).
- Blocks on the heap that are smaller than the minimum block size count as wasted space, so lowering the minimum block size with -XXminBlockSize:<memSize> can reduce wasted space (the default is 2 KB).
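The first two tips, confirming the exact version and gathering the exact flags, can also be checked from inside the running application, which helps when the launch script is not to hand. A minimal sketch using only the standard java.lang.management API; the class name JvmInfo is hypothetical, and the exact strings reported vary by JVM vendor.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;
import java.util.List;

// Hypothetical helper: reports the JVM version and the flags it was started with,
// so that a tuning change can be compared against a recorded baseline.
public class JvmInfo {

    // One-line summary of the JVM vendor, name and version.
    static String versionSummary() {
        return System.getProperty("java.vm.vendor") + " "
                + System.getProperty("java.vm.name") + " "
                + System.getProperty("java.vm.version")
                + " (java.version=" + System.getProperty("java.version") + ")";
    }

    // The arguments passed to the JVM itself (e.g. -Xmx, -XX and -D options);
    // the main class and program arguments are not included.
    static List<String> jvmFlags() {
        RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();
        return runtime.getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println(versionSummary());
        for (String flag : jvmFlags()) {
            System.out.println("  " + flag);
        }
    }
}
```

Logging this once at startup gives a record of the configuration each measurement was taken under.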
Asynchronous, High-Performance Login for Web Farms (Page last updated December 2007, Added 2008-01-30, Author Udi Dahan, Publisher InfoQ). Tips:
- In a synchronous login solution, the load on the app server and, consequently, on the database will be proportional to the number of logins.
- In a synchronous login architecture the database is the bottleneck; many large sites have numerous read-only databases for this kind of data, with one master that takes the updates and replicates them out to the read-only replicas.
- In an "asynchronous" login solution, you cache username/hashed-password pairs in memory on your web servers and authenticate against that cache. This reduces the load on the database (which would otherwise be the bottleneck), trading memory on the web servers for database communication.
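A minimal sketch of the asynchronous approach, assuming each web server is told about new or changed credentials by some push mechanism (for example a message from the database tier) that is not shown. The class LoginCache and its methods are hypothetical, and PBKDF2 is our choice of password hash rather than something the article specifies.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.security.spec.InvalidKeySpecException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

// Hypothetical in-memory credential cache held on each web server.
// Logins are checked locally, with no database round trip on the login path.
public class LoginCache {

    // Salt plus derived hash for one user; the password itself is never stored.
    private static final class Credential {
        final byte[] salt;
        final byte[] hash;

        Credential(byte[] salt, byte[] hash) {
            this.salt = salt;
            this.hash = hash;
        }
    }

    private static final int ITERATIONS = 10_000; // assumption: tune for your hardware
    private static final int KEY_BITS = 256;

    private final Map<String, Credential> cache = new ConcurrentHashMap<>();
    private final SecureRandom random = new SecureRandom();

    // Derives a PBKDF2 hash; PBKDF2WithHmacSHA256 ships with Java 8 and later.
    static byte[] derive(char[] password, byte[] salt) {
        try {
            PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, KEY_BITS);
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                    .generateSecret(spec).getEncoded();
        } catch (NoSuchAlgorithmException | InvalidKeySpecException e) {
            throw new IllegalStateException(e);
        }
    }

    // Stores a user's salt and hash. In practice the web server would receive the
    // already-hashed record from the master database; hashing here keeps the
    // sketch self-contained.
    public void put(String username, char[] password) {
        byte[] salt = new byte[16];
        random.nextBytes(salt);
        cache.put(username, new Credential(salt, derive(password, salt)));
    }

    // Authenticates purely from memory.
    public boolean authenticate(String username, char[] password) {
        Credential c = cache.get(username);
        if (c == null) {
            return false; // unknown user, or not yet replicated to this server
        }
        // Constant-time comparison avoids leaking timing information.
        return MessageDigest.isEqual(c.hash, derive(password, c.salt));
    }
}
```

The memory cost is one salt and one 32-byte hash per user on every web server, which is the memory-for-database-traffic trade described above.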
The Law of the Leaked Memo (Page last updated October 2007, Added 2008-01-30, Author Heinz Kabutz, Publisher JavaSpecialists). Tips:
- A Java compiler can reorder independent statements inside a statement block (e.g. x=a; y=b;), so other threads cannot rely on observing statements that are independent of each other in the order in which they appear in the block.
- A Java compiler can move statements that lie outside a synchronized block into the block.
- A Java compiler cannot move code that is inside a synchronized block to outside the block - this restriction lets you prevent early writes from happening in critical sections of your code.
- Volatile fields have special semantics: a read of a volatile field in any thread is guaranteed to see the last write to that field made by any thread.
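To illustrate the volatile rule, here is a small sketch of the common "publish through a volatile flag" idiom. The class is hypothetical; the guarantee it relies on is the Java Memory Model's happens-before ordering (Java 5 and later): the plain write to data cannot be moved after the volatile write to ready, so a reader that sees ready == true must also see data == 42. Without volatile on ready, the read could be hoisted out of the loop (so the reader spins forever) or the reader could see a stale value of data.

```java
// Demonstrates safe publication through a volatile flag.
public class VolatilePublication {

    private int data;               // plain field, published via 'ready'
    private volatile boolean ready; // the volatile write/read is the publication point

    // Writer: the plain write is ordered before the volatile write.
    void writer() {
        data = 42;
        ready = true;
    }

    // Reader: once it sees ready == true, it is guaranteed to see data == 42.
    int reader() {
        while (!ready) {
            Thread.yield(); // give the writer a chance to run
        }
        return data;
    }

    // Runs one writer/reader pair and returns what the reader saw.
    static int runOnce() {
        VolatilePublication p = new VolatilePublication();
        int[] seen = new int[1];
        Thread r = new Thread(() -> seen[0] = p.reader());
        r.start();
        p.writer();
        try {
            r.join(); // join() also makes seen[0] visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(runOnce()); // prints 42
    }
}
```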
Comparing the Google Collections Library with the Apache Commons Collections (Page last updated December 2007, Added 2008-01-30, Author A.R. Narayanan, Publisher DevX). Tips:
- com.google.common.base.FinalizablePhantomReference, FinalizableSoftReference and FinalizableWeakReference provide Reference objects that are automatically reference queued, and only require overriding finalizeReferent() in a subclass to automatically process finalized objects.
- com.google.common.collect.BiMap and org.apache.commons.collections.BidiMap define maps that allow bidirectional lookup between keys and values (both keys and values must be unique).
- HashBiMap from the Google library and DualTreeBidiMap from Apache Commons Collections have very similar insertion and seek times. The Commons Collections TreeBidiMap takes a little longer to insert but saves memory, because it does not need a second map for the value-to-key mapping.
- org.apache.commons.collections.map.LRUMap removes the least recently used entry when the map is full and a new entry is added.
- com.google.common.collect.ConcurrentMultiset uses a ConcurrentMap and keeps a count of the copies of objects added to and removed from a set. The Apache Commons Collections HashBag and SynchronizedBag are functionally similar, but the former is not thread-safe and the latter locks the whole bag on every method call.
- The org.apache.commons.collections.Buffer interface (implemented by the classes in org.apache.commons.collections.buffer) defines a contract for removing objects from a collection in an order based on: insertion order (e.g. a FIFO queue or a LIFO stack); access order (e.g. an LRU cache); an arbitrary comparator (e.g. a priority queue); or any other well-defined ordering.
- Apache Commons Collections also provides FixedSizeList, LazyList, FixedSizeMap, Flat3Map, LazyMap, LRUMap, and ListOrderedSet.
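The LRUMap and ConcurrentMultiset behaviour can be approximated with the JDK alone, which is a useful way to see how these structures work without either library on the classpath. This sketch is not either library's implementation: the LRU map uses LinkedHashMap's access-order mode, and the bag uses ConcurrentHashMap's atomic per-key updates (Java 8 or later). The class names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CollectionSketches {

    // LRU map in the spirit of LRUMap: accessOrder=true moves each accessed entry
    // to the tail, and removeEldestEntry evicts the head (least recently used)
    // once an insertion takes the map past maxSize.
    static final class LruMap<K, V> extends LinkedHashMap<K, V> {
        private final int maxSize;

        LruMap(int maxSize) {
            super(16, 0.75f, true); // true = iterate in access order
            this.maxSize = maxSize;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > maxSize;
        }
    }

    // Thread-safe multiset (bag) in the spirit of ConcurrentMultiset: counts live
    // in a ConcurrentHashMap and are updated atomically per key, so there is no
    // single lock over the whole collection.
    static final class ConcurrentBag<E> {
        private final ConcurrentHashMap<E, Integer> counts = new ConcurrentHashMap<>();

        void add(E element) {
            counts.merge(element, 1, Integer::sum);
        }

        // Removes one copy; returns false if the element was absent.
        boolean remove(E element) {
            boolean[] removed = {false};
            counts.computeIfPresent(element, (k, n) -> {
                removed[0] = true;
                return n == 1 ? null : n - 1; // returning null deletes the mapping
            });
            return removed[0];
        }

        int count(E element) {
            return counts.getOrDefault(element, 0);
        }
    }
}
```

For example, with an LruMap of size 2, putting "a" and "b", reading "a", then putting "c" evicts "b", because "a" was used more recently.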
A Glassfish Tuning Primer (Page last updated December 2007, Added 2008-01-30, Author Scott Oaks, Publisher java.net). Tips:
- Many default configurations are optimized for development, where the trade-offs are different: you'll give up a few seconds here and there to make starting the app server, or deploying something, faster. In production you'll make the opposite trade-offs, so products generally need a different tuning setup for production.
- A general GC configuration to start with is the throughput collector with a large heap and a moderately sized young generation, which makes young GCs quite fast. This will lead to periodic full GCs, but their impact on total throughput is usually quite small (e.g. -server -Xmx3500m -Xms3500m -Xmn1500m -XX:+UseParallelGC -XX:+UseParallelOldGC -XX:+AggressiveOpts).
- If you absolutely cannot tolerate a pause of a few seconds, you can look at the concurrent collector, but be aware that this will impact your total throughput.
- On a CMT machine like the SunFire T5220 server, you'll want to use large pages of 256m, and a heap that is a multiple of that (e.g. -server -XX:LargePageSizeInBytes=256m -Xmx2560m -Xms2560m -Xmn1024m -XX:+UseParallelGC).
- For Glassfish you should aim to remove the -client option, include the -Dcom.sun.enterprise.server.ss.ASQuickStartup=false flag, and include -DAllowMediatedWriteInDefaultFetchGroup=true if you are using CMP 2.1 entity beans.
- Acceptor threads are used to accept new connections and to schedule existing connections when a new request comes in. In general, you'll need 1 of these for every 1-4 cores on your machine.
- Request threads run HTTP requests. You want "just enough" of them: with too few, the machine has idle cycles while requests queue up unnecessarily; with too many, they compete for CPU and your throughput decreases, because the machine spends time context switching (overhead work) rather than actually processing requests. Too many request-processing threads can be a big performance problem.
- For CPU-bound requests there should be no more than one request thread per CPU. For requests that do significant I/O, more than one thread per CPU will be optimal; the exact number depends on the amount of I/O and is best found by trial and error.
- Use JDBC drivers that perform statement caching; this lets you reuse prepared statements and is a huge performance win. The JDBC drivers bundled with the Sun Java System Application Server provide such caching; Oracle's standard JDBC drivers do as well, as do recent drivers for Postgres and MySQL. Configure the JDBC driver properties to use statement caching when you set up the JDBC connection pool, e.g. ImplicitCachingEnabled=true MaxStatements=200.
- If you serve a lot of static content, make sure to enable the HTTP file cache.
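The "one thread per CPU, more with I/O" rule can be turned into a first estimate with the sizing heuristic from Java Concurrency in Practice, threads ≈ CPUs × (1 + wait time / compute time), here assuming a target CPU utilization of 100%. This is a minimal sketch for a generic Java thread pool, not Glassfish's own request-thread configuration; the class and method names are hypothetical, and the result is only a starting point for the trial and error the tip recommends.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Starting-point sizing for a request-processing pool.
public class RequestPoolSizing {

    /**
     * @param cpus               number of CPUs available to the server
     * @param waitToComputeRatio average time a request spends blocked on I/O divided
     *                           by the time it spends on the CPU (0 = purely CPU-bound)
     */
    static int suggestedThreads(int cpus, double waitToComputeRatio) {
        if (cpus < 1 || waitToComputeRatio < 0) {
            throw new IllegalArgumentException("invalid inputs");
        }
        // CPU-bound: one thread per CPU. I/O-heavy: proportionally more threads,
        // so the CPUs stay busy while other requests wait on I/O.
        return (int) Math.round(cpus * (1 + waitToComputeRatio));
    }

    static ExecutorService newRequestPool(double waitToComputeRatio) {
        int cpus = Runtime.getRuntime().availableProcessors();
        int threads = suggestedThreads(cpus, waitToComputeRatio);
        // Fixed-size pool: excess requests queue instead of creating threads
        // that would only compete for the CPU.
        return new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>());
    }

    public static void main(String[] args) {
        System.out.println(suggestedThreads(8, 0.0)); // 8: CPU-bound
        System.out.println(suggestedThreads(8, 3.0)); // 32: mostly waiting on I/O
    }
}
```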
When can I start performance tuning? How do I monitor? (Page last updated November 2007, Added 2008-01-30, Author Charlie Weiblen, Publisher performanceengineer). Tips:
- The focus of performance testing and analysis should be on understanding performance behavior and its causes.
- Tuning should be a secondary task that results from testing and identifying problems, not a primary goal on its own.
- If you try to monitor too much all at once, you will have too much data - 90% of it will probably be useless. You need to be targeted in what you monitor.
- A general approach to performance testing and analysis is: monitor the basics (CPU, memory, I/O); understand the performance and scalability; identify the bottlenecks, adding monitoring as necessary; create hypotheses for the root causes of performance and/or scalability problems; identify fixes for the hypothesized causes, or add more monitoring; test the hypotheses and the fixes.
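For the first step, monitoring the basics, a JVM can report some of its own numbers. A minimal sketch using only the standard java.lang.management API (the class name BasicsSnapshot is hypothetical); it covers CPU count, system load average, heap and thread count, but not I/O, which still needs operating-system tools such as iostat.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.OperatingSystemMXBean;
import java.lang.management.ThreadMXBean;

// Takes one snapshot of basic CPU, memory and thread figures from inside the JVM.
public class BasicsSnapshot {

    static String snapshot() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();

        // getSystemLoadAverage() returns a negative value where unavailable (e.g. Windows).
        double load = os.getSystemLoadAverage();
        long usedMb = heap.getUsed() / (1024 * 1024);
        long maxMb = heap.getMax() / (1024 * 1024); // getMax() is -1 if undefined (shown as 0)

        return String.format("cpus=%d load=%.2f heapUsed=%dMB heapMax=%dMB threads=%d",
                os.getAvailableProcessors(), load, usedMb, maxMb, threads.getThreadCount());
    }

    public static void main(String[] args) throws InterruptedException {
        // Sample at a fixed interval; in a real test this would run for the whole
        // load-test window and write to a log rather than stdout.
        for (int i = 0; i < 3; i++) {
            System.out.println(snapshot());
            Thread.sleep(1000);
        }
    }
}
```

Keeping the snapshot to a handful of figures follows the advice above about targeted monitoring; more detail can be added once a bottleneck is suspected.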
Two Ways To Boost Your Flagging Web Site (Page last updated November 2007, Added 2008-01-30, Author Michael Nygard, Publisher michaelnygard). Tips:
- Caches can take up a lot of space, and if they fill the heap they can lengthen garbage collection times, making performance slower.
- Moving the cache out of the app server process uses less heap and reduces garbage collection pauses. If you make the cache distributed as well as external, you can also reduce duplication.
- Accesses to cached objects usually follow a power law: popular items are requested hundreds or thousands of times as often as the average item. So make sure popular items are "hot", i.e. the fastest to retrieve.
- If your database does many more reads than writes (which is usually the case), use a read pool of DBs: configure the write master to ship its archive logs to the read pool databases, and they will sync up from the logs (note: database clustering overheads would void the benefits of read pools).
- A read DB pool will be some amount out of date, so is only appropriate for data that does not need to be absolutely current.
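In application code, the read-pool split usually appears as a routing decision. A minimal sketch, assuming the replication itself (log shipping from the master) is configured in the database; the class and method names are hypothetical. Writes, and reads that must be absolutely current, go to the master; reads that can tolerate replication lag are spread round-robin over the read pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical master/read-pool router. T would typically be a javax.sql.DataSource;
// it is generic here so the sketch has no database dependency.
public class ReadWriteRouter<T> {

    private final T master;
    private final List<T> readPool;
    private final AtomicInteger next = new AtomicInteger();

    public ReadWriteRouter(T master, List<T> readPool) {
        this.master = master;
        this.readPool = new ArrayList<>(readPool);
    }

    // All writes go to the master.
    public T forWrite() {
        return master;
    }

    // Reads that must see the latest data also go to the master.
    public T forCurrentRead() {
        return master;
    }

    // Reads that can be somewhat out of date go to the read pool, round-robin.
    public T forStaleTolerantRead() {
        if (readPool.isEmpty()) {
            return master; // no replicas configured: fall back to the master
        }
        // floorMod keeps the index non-negative after the counter overflows.
        int i = Math.floorMod(next.getAndIncrement(), readPool.size());
        return readPool.get(i);
    }
}
```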
Last Updated: 2018-04-29
Copyright © 2000-2018 Fasterj.com. All Rights Reserved.
All trademarks and registered trademarks appearing on JavaPerformanceTuning.com are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. JavaPerformanceTuning.com is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.