Java Performance Tuning
Tips November 2006
The Garbage Collector is too aggressive!!! (Page last updated November 2006, Added 2006-11-29, Author Tony Printezis, Publisher Sun). Tips:
- The JIT can identify when variables are no longer usable, and reduce the scope of those variables, allowing objects to be garbage collected earlier than you might think.
- Passing a long that holds the value of a pointer to an object through the JNI layer does not keep that object "live".
- JNI calls from the native side can be expensive. Getting the pointer to an object on the Java side and passing it to native code is a performance optimization, but you must also pass a reference to the Java object, even if the native code never uses that reference, to ensure that the Java object is not prematurely collected. [SWIG has been updated to provide this pattern in a backwards-compatible way.]
- Passing an extra parameter through JNI has a small overhead, but one that is acceptable if it helps avoid manipulating the Java object from the native side.
Java2D Gradients Performance (Page last updated September 2006, Added 2006-11-29, Author Romain Guy, Publisher java.net). Tips:
- Use the clipping rectangle to minimize what is drawn.
- Painting large areas with gradients can be time-consuming. You can optimize by painting the gradient into a buffer image once and reusing that image every time a repaint event occurs.
- If the buffered gradient uses too much memory, paint the gradient into a 1-pixel-wide image for vertical gradients or a 1-pixel-high image for horizontal gradients. Then, at runtime, paint the image with drawImage(image, x, y, w, h, null). Passing the size of the component to this call stretches the image to fill it, giving the same result quickly and without wasting memory.
- DirectDraw is the standard pipeline on Windows. You can set sun.java2d.noddraw=true and sun.java2d.d3d=true to use Direct3D, or sun.java2d.opengl=true to use OpenGL.
- When timing painting operations, call Toolkit.sync() after each paint to ensure the drawing commands are flushed to the graphics card.
- An alternative to stretching a 1-pixel gradient fill is to copy the image multiple times until it fills the area to be painted.
- If the GradientPaint specifies an area that fully covers the area you're rendering to, so that it never has a chance to cycle, GradientPaint performs much faster with the cyclic property set to true. The difference is between a simple lookup (cyclic) and constantly checking whether you're past the area defined in the GradientPaint (non-cyclic).
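The 1-pixel-strip tip above can be sketched roughly as follows. This is a minimal illustration rather than the article's code: the class name GradientStrip and its helper methods are hypothetical, and the colors and sizes are arbitrary. It renders a vertical gradient once into a 1-pixel-wide BufferedImage, then stretches that strip over the target area with drawImage(image, x, y, w, h, null).

```java
import java.awt.Color;
import java.awt.GradientPaint;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// Hypothetical helper class, for illustration only.
class GradientStrip {

    // Renders a vertical gradient once into an image 1 pixel wide and `height` pixels tall.
    static BufferedImage createVerticalStrip(int height, Color top, Color bottom) {
        BufferedImage strip = new BufferedImage(1, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = strip.createGraphics();
        try {
            g.setPaint(new GradientPaint(0f, 0f, top, 0f, height, bottom));
            g.fillRect(0, 0, 1, height);
        } finally {
            g.dispose();
        }
        return strip;
    }

    // Stretches the cached strip horizontally over a w-by-h area, as a paint method would.
    static void paintStretched(Graphics2D g, BufferedImage strip, int w, int h) {
        g.drawImage(strip, 0, 0, w, h, null);
    }

    public static void main(String[] args) {
        BufferedImage strip = createVerticalStrip(200, Color.RED, Color.BLUE);
        // An off-screen image stands in for the component being painted.
        BufferedImage target = new BufferedImage(300, 200, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = target.createGraphics();
        paintStretched(g, strip, target.getWidth(), target.getHeight());
        g.dispose();
        System.out.printf("top-left=%08x bottom-left=%08x%n",
                target.getRGB(0, 0), target.getRGB(0, 199));
    }
}
```

In a real component the strip would be created once (and recreated only when the height changes), and paintStretched would be called from paintComponent with the component's current width and height.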
Ask the Experts on Swing (Page last updated October 2006, Added 2006-11-29, Author Scott Violet, Shannon Hickey, Chris Campbell, and Oleg Sukhodolsky, Publisher Sun). Tips:
- If you want to create a multi-threaded Swing app, then we've made it easier with SwingWorker in 1.6.
- JSR 296 will address how best to structure resources in your application, how main should look, how to use threads, how to persist application state, and more. (Making it easy for new Java developers to get a solid Swing application up and running is an overall goal for JSR-296, and a corollary is that the API it defines has to be modest in size and easy to learn.)
- The Swing package documentation has been updated: ALL access to Swing components should now be done only on the event dispatching thread. (The previous policy was that operations could happen on any thread until the components were realized; a call to setVisible(true) realizes a frame and the components it contains.)
- Stick with ImageIO for all your image reading/writing needs, as that will offer you the best decoding/encoding performance.
- When drawing or scaling the same image to the screen many times, first copy it into a "managed image" created via GraphicsConfiguration.createCompatibleImage(), then use that image for all further rendering. createCompatibleImage() returns an image that is better suited for rendering to the screen (or Swing back buffer), which reduces the number of internal conversions and therefore improves performance.
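As a rough sketch of the SwingWorker tip above: the class below does its work off the event dispatching thread in doInBackground() and leaves done(), which Swing invokes on the event dispatching thread, as the place for component updates. The SumWorker class and its computeBlocking helper are hypothetical names invented for this illustration; the computation is a placeholder for real long-running work.

```java
import java.util.concurrent.ExecutionException;
import javax.swing.SwingWorker;

// Hypothetical worker, for illustration only: sums 1..n off the event dispatching thread.
class SumWorker extends SwingWorker<Long, Void> {
    private final int n;

    SumWorker(int n) {
        this.n = n;
    }

    // Runs on a background worker thread, so it must not touch Swing components.
    @Override
    protected Long doInBackground() {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += i;
        }
        return sum;
    }

    // Runs on the event dispatching thread once doInBackground() has finished;
    // this is the place to update Swing components with the result of get().
    @Override
    protected void done() {
    }

    // Convenience for non-GUI callers: starts the worker and blocks for the result.
    // Do not call this from the event dispatching thread, since it would freeze the UI.
    static long computeBlocking(int n) {
        SumWorker worker = new SumWorker(n);
        worker.execute();
        try {
            return worker.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    public static void main(String[] args) {
        System.out.println(computeBlocking(100)); // prints 5050
    }
}
```

In a GUI application you would normally call execute() from an event handler and read the result inside done(), rather than blocking on get() as the demo helper does.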
Scaling Enterprise Java on 64-bit Multi-Core X86-Based Servers (Page last updated November 2006, Added 2006-11-29, Author Michael Juntao Yuan, Dave Jaffe, Publisher OnJava). Tips:
- To take advantage of multi-core CPUs, a software application must be able to execute tasks in parallel across CPUs.
- For enterprise applications, allocate as much memory as possible to the JVM using the -Xms<size> (minimum heap size) and -Xmx<size> (maximum heap size) flags; otherwise the JVM uses the default maximum (64MB on many platforms).
- With a large heap memory, the garbage collection (GC) operation could become a major performance bottleneck. It could take more than ten seconds for the GC to sweep through a multi-gigabyte heap.
- If your priority is to increase the total throughput of the application and you can tolerate occasional GC pauses, you should use the -XX:+UseParallelGC and -XX:+UseParallelOldGC flags.
- If you need to minimize the GC pause, you can use the -XX:+UseConcMarkSweepGC flag to turn on the concurrent GC, but this does reduce the overall GC throughput.
- Using the -XX:ThreadStackSize=256k flag, you can decrease the thread stack size to 256k, which allows more threads to be created.
- Use -XX:+DisableExplicitGC flag to ignore explicit application calls to System.gc().
- If your application generates lots of short-lived objects, you might improve GCs dramatically by increasing the young generation memory space using the -Xmn<size> flag.
- The young generation size should almost never be more than 50% of heap.
- Make sure that you start the JVM with the -server flag. It optimizes the Just-In-Time (JIT) compiler to trade slower startup time for faster runtime performance.
- The java.util.concurrent.ConcurrentHashMap (and Doug Lea's earlier open source version) is a thread-safe alternative to HashMap that you can read and write without a synchronized block.
- With NIO, a thread no longer needs to block on a socket while reading or writing data.
- java.util.logging and Log4j let you change the logging level dynamically at runtime via configuration files. Reducing logging helps because logging involves slow I/O operations and is a major cause of CPU waiting.
- The only way to make sure that your application is optimized for your hardware is through extensive performance testing.
- Use -verbose:gc, or -Dcom.sun.management.jmxremote with jconsole, to monitor garbage collection statistics.
- When the application is fully loaded, the CPU should run between 80% and 100% of its capacity. If the CPU usage is substantially lower, you should look for other bottlenecks, such as whether the network or disk I/O is saturated. However, an underutilized CPU could also indicate contention points inside the application.
- To find contention points, take a thread dump (on Windows, press Ctrl-Break in the console window where the JVM was started; on Linux/Unix, run kill -QUIT process_id) and look at what the threads are waiting for.
- Running everything on the same physical machine is much more efficient than using a distributed architecture, especially if you can eliminate network latency, serialization and inter-process communication.
- A load balancer should be configured to forward all requests from the same user session to the same node (i.e., use sticky sessions) to avoid the costs of session replication.
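To illustrate the ConcurrentHashMap tip above, here is a small sketch in which several threads update a shared map with no synchronized block. The HitCounter class, its method names and the "/home" key are hypothetical, chosen only for this example. It also assumes Java 8 or later for merge() and getOrDefault(), which postdate the article; on older JVMs putIfAbsent() with AtomicLong values would be one substitute.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical per-page hit counter shared by many request threads.
class HitCounter {
    private final ConcurrentHashMap<String, Long> hits = new ConcurrentHashMap<>();

    // merge() performs the read-modify-write atomically for the key; no synchronized block needed.
    void record(String page) {
        hits.merge(page, 1L, Long::sum);
    }

    long count(String page) {
        return hits.getOrDefault(page, 0L);
    }

    // Hammers one key from several threads and returns the final count.
    static long countConcurrently(int threads, int hitsPerThread) {
        HitCounter counter = new HitCounter();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < hitsPerThread; i++) {
                    counter.record("/home");
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.count("/home");
    }

    public static void main(String[] args) {
        System.out.println(countConcurrently(4, 10_000));
    }
}
```

Note that only individual map operations are atomic; a sequence such as get() followed by put() would still need an atomic method like merge() or external coordination.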
Are you really Multi-Core? (Page last updated November 2006, Added 2006-11-29, Author Heinz Kabutz, Publisher cretesoft). Tips:
- [Heinz implements a multi-CPU load tester using ThreadMXBean.getCurrentThreadCpuTime() to get per-thread CPU time, System.nanoTime() to get elapsed time, and CountDownLatch and AtomicLong to avoid synchronization contention across threads.]
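The following is not Heinz's implementation, only a minimal sketch that combines the same four building blocks under assumptions of my own. The CoreProbe class and the cpuToElapsedRatio method are hypothetical names. Each thread busy-spins for a fixed time; the ratio of summed per-thread CPU time to elapsed wall-clock time then gives a rough idea of how many cores were actually running the threads in parallel.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical probe, for illustration only.
class CoreProbe {

    // Returns (sum of per-thread CPU time) / (elapsed time) for `threads` busy-spinning threads.
    static double cpuToElapsedRatio(int threads, long spinMillis) {
        final ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isCurrentThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            throw new UnsupportedOperationException("per-thread CPU time is not available on this JVM");
        }
        final long spinNanos = spinMillis * 1_000_000L;
        final CountDownLatch start = new CountDownLatch(1);      // releases all threads together
        final CountDownLatch done = new CountDownLatch(threads); // signals that every thread finished
        final AtomicLong totalCpuNanos = new AtomicLong();       // lock-free accumulation of CPU time

        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                try {
                    start.await();
                    long cpuBefore = mx.getCurrentThreadCpuTime();
                    long begin = System.nanoTime();
                    while (System.nanoTime() - begin < spinNanos) {
                        // busy-wait to keep this thread on a CPU
                    }
                    totalCpuNanos.addAndGet(mx.getCurrentThreadCpuTime() - cpuBefore);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
            t.setDaemon(true);
            t.start();
        }

        long wallBefore = System.nanoTime();
        start.countDown();
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        long elapsed = System.nanoTime() - wallBefore;
        return (double) totalCpuNanos.get() / elapsed;
    }

    public static void main(String[] args) {
        int n = Runtime.getRuntime().availableProcessors();
        System.out.printf("threads=%d ratio=%.2f%n", n, cpuToElapsedRatio(n, 500));
    }
}
```

A ratio close to the thread count suggests the threads really ran in parallel; a ratio near 1 suggests they were time-sliced on a single core. Other load on the machine will skew the number.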
Real-Time Tracking and Tuning for Busy Tomcat Servers (Page last updated October 2006, Added 2006-11-29, Author Edmon Begoli, Publisher devX). Tips:
- Tomcat's default installation is configured to handle medium loads - for high-load environments it requires further tuning.
- The configuration and size of resource pools has a significant effect on the overall scalability and performance of a server. Too small a pool and requests have to wait to get served; too large a pool uses up extra resources that may limit other threads or services.
- A resource pool is a pool of reusable objects that are vital for application processing yet expensive to instantiate on demand: connector thread pools and database connection pools are two such pools.
- You can monitor MBeans externally (e.g. via jconsole) and internally within an application.
- [Article shows an implementation of an in-process Tomcat Valve that monitors performance of thread and connector pools for requests.]
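Monitoring an MBean from inside the application, as mentioned above, can be sketched as follows. This is not the article's Valve. The MBeanProbe class and readAttribute method are hypothetical, and because the object names of Tomcat's pool MBeans depend on the Tomcat version and configuration (the source does not list them), the example reads the JVM's own java.lang:type=Threading MBean as a stand-in.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical helper, for illustration only.
class MBeanProbe {

    // Reads one attribute of an MBean registered in this JVM's platform MBean server.
    static Object readAttribute(String objectName, String attribute) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            return server.getAttribute(new ObjectName(objectName), attribute);
        } catch (JMException e) {
            throw new IllegalStateException("cannot read " + attribute + " of " + objectName, e);
        }
    }

    public static void main(String[] args) {
        // Stand-in target: the JVM's standard threading MBean.
        Object liveThreads = readAttribute("java.lang:type=Threading", "ThreadCount");
        System.out.println("live threads: " + liveThreads);
    }
}
```

Inside a servlet container the same call could be pointed at the container's thread pool or data source MBeans, using the object names that jconsole shows for your installation.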
Garbage collection tuning in Java 5.0 (Page last updated August 2006, Added 2006-11-29, Author Peter Mikhalenko, Publisher Builder.com). Tips:
- If the garbage collector has become a bottleneck, you can tune generation sizes. Use -verbose:gc to measure whether each change actually improves GC behavior.
- Throughput is the percentage of total time not spent in garbage collection, considered over long periods of time.
- Pauses are the times when an application appears unresponsive because garbage collection is occurring.
- Footprint is the working set of a process, measured in pages and cache lines.
- Promptness is the time between when an object becomes dead and when the memory becomes available.
- A very large young generation may maximize throughput, but it does so at the expense of footprint, promptness, and pause times.
- You can minimize young generation pauses by using a small young generation at the expense of throughput.
- You can enable the throughput collector by using the command-line flag -XX:+UseParallelGC.
- You can control the number of garbage collector threads with the ParallelGCThreads command-line option -XX:ParallelGCThreads=<desired number>.
- The maximum pause time goals are specified with the command-line flag -XX:MaxGCPauseMillis=<nnn> - this is a hint, not a guarantee.
- Use the concurrent low pause collector if your application would benefit from shorter garbage collector pauses and can afford to share processor resources with the garbage collector when the application is running.
- A concurrent collection will start if the occupancy of the tenured generation grows above the initiating occupancy (i.e., the percentage of the current heap that is used before a concurrent collection is started). The initiating occupancy by default is set to about 68%. You can set it with the parameter -XX:CMSInitiatingOccupancyFraction=<nn> where <nn> is a percentage of the current tenured generation size.
- -verbose:gc prints information at every collection.
- -XX:+PrintGCDetails prints additional information about GC collections.
- -XX:+PrintGCTimeStamps will additionally print a timestamp at the start of each collection.
- The -Xmx option specifies the maximum heap size, which is the amount of address space reserved for the heap when the JVM initializes.
- If the value of the -Xms parameter is smaller than the value of the -Xmx parameter, not all of the reserved space is immediately committed to the virtual machine.
- For a bounded heap size, a larger young generation implies a smaller tenured generation, which will increase the frequency of major collections. The optimal choice depends on the lifetime distribution of the objects allocated by the application.
- The throughput collector uses a parallel version of the young generation collector. It is used if the -XX:+UseParallelGC option is passed on the command line.
- The concurrent low pause collector is used if the -Xincgc or -XX:+UseConcMarkSweepGC option is passed on the command line. In this case, the application is paused for short periods during the collection.
- The incremental low pause collector is used only if -XX:+UseTrainGC is passed on the command line. It will not be supported in future releases, but if you want more information, please see Sun's documentation on using this collector.
- Do not use -XX:+UseParallelGC with -XX:+UseConcMarkSweepGC.
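The throughput definition given above (the percentage of total time not spent in garbage collection) can also be estimated from inside a running application. The sketch below is an approximation under stated assumptions, not a method from the article: the GcThroughput class is a hypothetical name, it relies on the standard GarbageCollectorMXBean.getCollectionTime() values, and for concurrent collectors that reported time is not purely pause time, so the result is only indicative.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, for illustration only.
class GcThroughput {

    // Accumulated collection time in milliseconds across all collectors.
    // A collector that reports -1 (undefined) is skipped.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // Approximate percentage of JVM uptime not spent in garbage collection, clamped to 0..100.
    static double throughputPercent() {
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        if (uptimeMillis <= 0) {
            return 100.0;
        }
        double percent = 100.0 * (1.0 - (double) totalGcMillis() / uptimeMillis);
        return Math.max(0.0, Math.min(100.0, percent));
    }

    public static void main(String[] args) {
        System.out.printf("GC time: %d ms, throughput: %.2f%%%n",
                totalGcMillis(), throughputPercent());
    }
}
```

As the definition says, throughput is meaningful only over long periods, so a reading like this is best sampled periodically on a long-running, loaded JVM and compared before and after a tuning change.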
Last Updated: 2021-08-29
Copyright © 2000-2021 Fasterj.com. All Rights Reserved.
All trademarks and registered trademarks appearing on JavaPerformanceTuning.com are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. JavaPerformanceTuning.com is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.