Java Performance Tuning
Tips January 2009
Back to newsletter 098 contents
Enhance performance with class sharing (Page last updated September 2008, Added 2009-01-28, Author Adam Pilkington Graham Rawson, Publisher IBM). Tips:
- The IBM Java 5 JVM introduced the ability to share classes between JVMs through a cache. In IBM Java 6, the cache can be made persistent and can also be used to share compiled code.
- The IBM Java 6 JVM introduces the ability to compile Java methods using Ahead of Time (AOT) compilation technology to create native code that can not only be used in the current JVM but can also be stored into a shared class cache.
- Ahead of Time (AOT) code is native code and generally executes faster than interpreted code but not as fast as JIT-generated code.
- In IBM Java 6 the default implementation of the shared memory cache has been changed to use a memory mapped file. This gives cache persistence across an operating system restart.
- The primary objective of Ahead of Time (AOT) compilation is to accelerate application startup by providing a precompiled version of Java methods.
- A method that has been Ahead of Time (AOT) compiled may also be JIT-compiled if it meets the necessary recompilation criteria. However, the aim of AOT compilation is to select methods required at application startup and the aim of JIT compilation is to optimise frequently used methods, so it is possible that a method that is AOT-compiled may not be subsequently invoked sufficiently to trigger a JIT compilation.
- The Virtual Address Dump utility or vadump is part of the Microsoft resource kit and can be used to provide information on the memory usage of an application - use: vadump -os -p <pid>
- Examples using Eclipse and Tomcat show a 25% improvement in startup time and a multi-megabyte reduction in footprint when multiple copies run on the same machine, as a result of IBM Java 6 shared memory and AOT.
Premature Optimizations, The Rest of the Story (Page last updated December 2008, Added 2009-01-28, Author Kirk Pepperdine, Publisher kodework). Tips:
- "Forget about small efficiencies about 97% of the time": is it important that you used a bubble sort instead of the more efficient quicksort? Well, 97% of the time the answer is ... no.
- There are many more important places to apply optimizations in an enterprise-sized application: even tiny reductions in network latency, or in the use of any technology that is slower than the CPU (which is about all of them), will give you a bigger bang for your buck than spending time optimizing execution times.
- Go for optimizations when they are going to make a difference, not just because you can.
- Chips and Java are now fast enough that most applications have problems that can't be solved with an execution profiler; instead you need to profile GC, JDBC, memory, contention, I/O, etc.
What Volatile Means in Java (Page last updated November 2008, Added 2009-01-28, Author Jeremy Manson, Publisher jeremymanson). Tips:
- Memory writes that happen in one thread can "leak through" and be seen by another thread, but this is by no means guaranteed; it is non-deterministic in the absence of synchronization.
- Without explicit communication (such as by using volatile), you can't guarantee which writes get seen by other threads, or even the order in which they get seen.
- When one thread writes to a volatile variable, and another thread sees that write, the first thread is telling the second about all of the contents of memory up until it performed the write to that volatile variable.
- Given a volatile variable written by thread 1 and then read by thread 2, all subsequent reads in thread 2 are guaranteed to see the data thread 1 wrote before the volatile write. Without the volatile write, thread 2 could see its own values of the variables, or thread 1's values (or yet another thread's); the outcome is not specified, nor is it likely to be consistent across platforms.
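The publication guarantee described above can be sketched in a few lines. This is a minimal illustration rather than code from the article; the class name VolatilePublication and its methods are hypothetical.

```java
// Hypothetical illustration of publishing a plain field through a volatile flag.
class VolatilePublication {
    private int payload;             // plain (non-volatile) field
    private volatile boolean ready;  // volatile flag used to publish the payload

    // Called by the writer thread.
    void publish(int value) {
        payload = value; // ordinary write, ordered before the volatile write below
        ready = true;    // volatile write: makes the earlier write visible to readers of 'ready'
    }

    // Called by the reader thread; spins until it observes the volatile write.
    int awaitPayload() {
        while (!ready) {
            Thread.yield();
        }
        return payload;  // guaranteed to see the value written before 'ready = true'
    }

    // Runs one writer (the calling thread) and one reader thread.
    static int demo() {
        VolatilePublication box = new VolatilePublication();
        int[] seen = new int[1];
        Thread reader = new Thread(() -> seen[0] = box.awaitPayload());
        reader.start();
        box.publish(42);
        try {
            reader.join(); // join() makes the reader's write to seen[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }
}
```

If the ready field were not volatile, the reader would have no guarantee of ever seeing the flag change, nor of seeing the payload written before it.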
Serialization Cache (Page last updated November 2008, Added 2009-01-28, Author Dr. Heinz M. Kabutz, Publisher javaspecialists). Tips:
- When ObjectOutputStream serializes an object it is cached in an identity hash table. If it writes the object again, only a pointer to it is written. Similarly on reading, it is put in a local identity hash table, mapping the pointer to the object. Future reads of the pointer simply return the first object that was read. This minimizes the data that needs to be written and solves the circular dependency problem.
- If you modify the contents of an object and then write that object again to an ObjectOutputStream, the serialization mechanism sees that it is the same object again and just writes a pointer to the object. This means changes are lost in the serialization.
- Never serialize mutable objects, or at least don't re-serialize modified objects without first resetting the stream.
- Serializing lots of objects can lead to a memory leak as the ObjectOutputStream caches all objects it serializes in its identity hash table. You need to reset the ObjectOutputStream to avoid this (but then you can lose object identity back references in the serialized stream, so be careful).
- An approach used by RMI is to serialize parameters into a byte array and then to send that across the socket. This avoids ObjectOutputStream memory leaks and ensures that changes to objects are sent.
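The lost-update behaviour and the effect of reset() can be reproduced with in-memory streams. The following is a minimal sketch; SerializationCacheDemo and Counter are hypothetical names used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerializationCacheDemo {
    // Hypothetical mutable class used only for this illustration.
    static class Counter implements Serializable {
        private static final long serialVersionUID = 1L;
        int value;
    }

    // Writes the same Counter twice, changing it in between,
    // and returns the two values seen when reading the stream back.
    static int[] roundTrip(boolean resetBetweenWrites) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                Counter c = new Counter();
                c.value = 1;
                out.writeObject(c);   // full object is written and cached
                c.value = 2;
                if (resetBetweenWrites) {
                    out.reset();      // clears the stream's identity cache
                }
                out.writeObject(c);   // back reference only, unless reset() was called
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                Counter first = (Counter) in.readObject();
                Counter second = (Counter) in.readObject();
                return new int[] { first.value, second.value };
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling roundTrip(false) reads back the value 1 twice, because the second write is only a reference to the first object. Calling roundTrip(true) reads back 1 and then 2, at the cost of losing the shared identity between the two objects read.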
A Better, Faster Way to Stress and Load Test Today's Web Applications (Page last updated December 2008, Added 2009-01-28, Author Daniel Baloche, Publisher JDJ). Tips:
- A load test aims to find out if an application is stable and won't crash when a certain number of users are accessing it. A load test simulates users at a constant rate over a period of time.
- Load test results should answer: the number of simultaneous users the server can handle; the average response times; the variation in behaviour under different loads and load variation.
- Define what an acceptable response time is before you run tests (so that you know whether it has been achieved or whether tuning is needed).
- Monitor the CPU and memory usage throughout a load test - the server is considered overloaded if these usage figures regularly exceed 90%.
- A stress test progressively increases the load of the application to discover the different breaking points.
- A stress test should answer: How many users can the application handle while maintaining an acceptable response time? What is the load threshold above which the server begins to generate errors and/or refuse connections? Does the server remain functional under high load or does it crash?
- To do a stress test, testers should ramp up the load, starting from normal and up to the maximum predicted limit, and monitor the response times and error rates. A sudden change indicates that a threshold has been passed.
- Make sure that the load testing tool you use can compare, or enables you to compare, performance between two sets of test results.
- Define success and failure criteria for tests before doing them.
- A realistic test considers: the number of virtual users; their simulated pattern of activity and think times; different types of users; values and variables and datasets and how they will realistically populate caches and simulate data volumes; realistic load balancing to simulate actual distributions.
- Playing the same requests with the same values produces an unrealistically high performance due to the use of various caches: preloading into memory cache, connection pools, system swap, etc. But completely disabling the caches (when available) will produce an unrealistically poor performance.
- To obtain a realistic load balance, configure your scenario to have several IP addresses used to play the load.
- Analyse test results for the location of bottlenecks, then modify if necessary. Make sure to modify one variable at a time and redo the tests, otherwise multiple changes can have subtle interactions that are hard to qualify and repeat.
- Crash testing runs regression tests against the application at a specified maximum load to see where it breaks.
- If your main purpose in testing was to make sure that the system fails and recovers gracefully, make sure your stress tests take unpredictability into account. For example, how does the system react when you double the baseline number for concurrent users/HTTP connections?
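The ramp-up analysis described above (raise the load, watch response times and error rates, and note where a threshold is passed) can be sketched as a small helper. This is an illustration under stated assumptions, not the article's tool: the names StressRampAnalysis, Sample and maxAcceptableLoad are hypothetical, and the success criteria are passed in because they should be defined before the test runs.

```java
import java.util.List;
import java.util.OptionalInt;

class StressRampAnalysis {
    // One measurement taken at a given load level during the ramp-up (hypothetical shape).
    static final class Sample {
        final int users;
        final double avgResponseMillis;
        final double errorRate; // fraction of failed requests, 0.0 to 1.0

        Sample(int users, double avgResponseMillis, double errorRate) {
            this.users = users;
            this.avgResponseMillis = avgResponseMillis;
            this.errorRate = errorRate;
        }
    }

    // Walks the ramp in increasing load order and returns the last user count that
    // met both criteria before the first failing step; empty if even the first step failed.
    static OptionalInt maxAcceptableLoad(List<Sample> ramp,
                                         double maxResponseMillis,
                                         double maxErrorRate) {
        OptionalInt best = OptionalInt.empty();
        for (Sample s : ramp) {
            boolean acceptable = s.avgResponseMillis <= maxResponseMillis
                    && s.errorRate <= maxErrorRate;
            if (!acceptable) {
                break; // threshold passed: stop at the first breaking point
            }
            best = OptionalInt.of(s.users);
        }
        return best;
    }
}
```

The samples are assumed to be ordered by increasing load. Stopping at the first failing step, rather than taking the best passing step overall, reflects the idea that a sudden change marks a threshold.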
Exploring Scalable Data Processing with Apache Hadoop (Page last updated November 2008, Added 2009-01-28, Author Tom Wheeler, Publisher OCIWeb). Tips:
- MapReduce is a programming model pioneered by Google which makes use of two functions: map (performs some operation on key/value pairs to produce zero or more key/value pairs) and reduce (called once for each unique key from the map function's output, passed a list of all values associated with that key, performs some merge operation to yield a smaller set of values).
- MapReduce programs are inherently simple - the difficult part is the infrastructure which distributes the MapReduce calculations and associated data across a series of machines in a cluster, monitors the progress of active jobs and transparently handles machine failure. Hadoop provides the infrastructure to do all of those things.
- Hadoop's scalability comes from the fact that the map and reduce operations can be run in parallel across several machines by breaking the input into smaller chunks.
- Although typical filesystems like NTFS, ZFS, etc, can span multiple disks, they are not intended to span multiple computers as the Hadoop Distributed File System (HDFS) does.
- HDFS can replicate data across multiple machines in a cluster providing fault tolerance and high capacity storage given that the overall capacity will be based on all usable space of all disks across all machines.
- HDFS assumes that the data will be written only once and is able to gain extra performance by optimizing for subsequent reads while disallowing subsequent writes.
- Moving large amounts of data will be constrained by either network transfer speed or disk write speed, so HDFS operates on the principle that moving computation is cheaper than moving data. In other words, HDFS makes it possible for calculations to be run on the machine where the data resides.
- HDFS is said to be "rack aware" meaning that it can be configured to know about the proximity of machines to one another and replicate data near the nodes which might need it.
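The map and reduce functions described above can be modelled in plain Java with a word count. This sketch deliberately does not use the Hadoop API: the class WordCountModel and its run method are hypothetical stand-ins for the grouping step a framework such as Hadoop would perform across a cluster.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class WordCountModel {
    // map: takes one input record and emits zero or more (key, value) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // reduce: called once per unique key with all values emitted for that key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Single-process stand-in for the framework: run map, group by key, run reduce.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            counts.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return counts;
    }
}
```

Only map and reduce are application code here. Everything in run is the part that the infrastructure would distribute, monitor and recover on failure.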
MapReduce programming with Apache Hadoop (Page last updated September 2008, Added 2009-01-28, Author Ravi Shankar and Govindu Narendra, Publisher JavaWorld). Tips:
- Google's MapReduce framework supports searching millions of pages with results in milliseconds. Hadoop is the open source implementation.
- The MapReduce algorithm segments the data into multiple buckets, processes each bucket in parallel, then aggregates the results. The buckets can be distributed across multiple machines as well as multiple processors.
- The MapReduce algorithm steps are: divide the data into buckets; map each bucket to a set of results ideally processing where the data is rather than moving the data; process the results to produce aggregate results.
- HDFS is designed to be highly fault tolerant, to not require any high-end hardware, and to be scalable and independent of any specific hardware or software platform, hence easily portable.
- [Article presents an extended example of using Hadoop].
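The three steps listed above (divide into buckets, process each bucket in parallel, aggregate) can be sketched on a single machine with a thread pool. This is only an analogy for the distributed case and does not use Hadoop; the class BucketedSum and the choice of a sum of squares as the workload are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BucketedSum {
    // Computes the sum of squares of 'data' using up to 'bucketCount' parallel tasks.
    // bucketCount must be at least 1.
    static long sumOfSquares(int[] data, int bucketCount) {
        ExecutorService pool = Executors.newFixedThreadPool(bucketCount);
        try {
            // Step 1: divide the data into contiguous buckets.
            int bucketSize = (data.length + bucketCount - 1) / bucketCount;
            List<Callable<Long>> tasks = new ArrayList<>();
            for (int start = 0; start < data.length; start += bucketSize) {
                final int from = start;
                final int to = Math.min(start + bucketSize, data.length);
                // Step 2: map each bucket to a partial result.
                tasks.add(() -> {
                    long partial = 0;
                    for (int i = from; i < to; i++) {
                        partial += (long) data[i] * data[i];
                    }
                    return partial;
                });
            }
            // Step 3: aggregate the partial results.
            long total = 0;
            for (Future<Long> f : pool.invokeAll(tasks)) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Each task reads only its own slice of the array, which mirrors the idea of processing data where it resides instead of moving it.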
Last Updated: 2018-03-27
Copyright © 2000-2018 Fasterj.com. All Rights Reserved.
All trademarks and registered trademarks appearing on JavaPerformanceTuning.com are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. JavaPerformanceTuning.com is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.