I had a bit of a rant about the difference between being able to use a profiler and being able to performance tune an application. It came from the fact that every site I have ever been to in the last five years has owned a profiler, but still had performance problems.
Of course, I have what is known as "experience bias". Doctors don't tend to see healthy people visiting their offices to get a diagnosis, so consequently they can have a distorted view of how healthy the overall population is. People who have just lost their job and are finding it difficult to find another feel that the economy is not doing well, even when it is. The same type of "experience bias" afflicts me: I will not be asked to come and performance tune a project that is doing fine. But that doesn't change the fact that all those projects that I do get to already have a profiler. And they know how to use it, having almost always sent at least one developer to attend the vendor course. In fact I'm often presented with several saved profiles to look at, as one of my first tasks!
Knowing how to use the profiler doesn't seem to be quite enough. Knowing how to performance tune is a wider skill, and it is one we here at JavaPerformanceTuning.com help you acquire: whether by using our 3,000+ tips; by reading our recommended book "Java Performance Tuning, 2nd ed" with its 300 tuning techniques; by attending our excellent Java Performance Training Courses; or simply by keeping up to date with our monthly newsletters and their comprehensive focus on Java performance.
This month we've selected a diverse and interesting set of articles to extract tips from. GUIs (wait cursors); J2EE (session scopes, messaging EJBs, app-server tuning); performance management; several JVM and core classes tips; and a look forward to several of the classes due to become available with 1.5 and its inclusion of the java.util.concurrent package.
In addition, we have our usual sections. Kirk's roundup covers micro-benchmarks, adaptive JVM advantages, McCabe Complexity, the shelf-life of tips, obfuscation, and more. Our interview this month is with Steven Haines, J2EE architect for Quest Software, who shares with us his expert knowledge of current performance issues. Our question of the month asks about the effectiveness of pooling objects. Javva The Hutt details how he set up his constant response-time production server; and, of course, we have many new performance tips extracted in concise form.
Every month, Jack has been posting about 100 performance-tuning tips that he extracts from performance tuning articles and other sources on the net. Even at that rate, it has been impossible for Jack to keep pace with the number of tips being generated from all of the available sources. And though we do a cursory vetting of the tips that we post, that still leaves a large number of tips that one can only hope have some validity. For example, I recently read an article published at developerWorks. Given the difficulty of getting an article through the editorial process before DW will publish it, I was quite surprised to find that I could not reproduce the results stated in the article. Does this mean that those results are invalid? Since neither the conditions under which the tests were run nor the source code were published, that's a question I can't answer.
Producing a micro-performance benchmark that you can trust is difficult. The reason it is so difficult is the JIT optimizing compiler. If the optimizing compiler determines that it can throw away a set of instructions, it will! It might take 2 or 3 recompiles, but that code will disappear. Now, if the effect that you are trying to measure is embodied in code that the optimizing compiler thinks it can dispose of, you won't be able to measure the effect. Worse yet, if the optimizing compiler decides to do its magic in the middle of your benchmark, then again your results will be invalid. Now, this in itself is not too big a problem -- as long as you know about it. And herein lies the problem: the optimizing compiler is just going to quietly do what it does, and you are not going to know about it.
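To make the danger concrete, here is a minimal sketch (my own construction, not code from the article in question) of a benchmark loop the JIT can silently gut: the naive version discards its results, so the compiler is free to eliminate the work entirely, while the guarded version feeds every result into a value that is eventually printed.

    public class DeadCodeBenchmark {

        // The computed value is never used, so a JIT compiler may
        // eliminate the entire loop -- the timing then measures nothing.
        static void naive() {
            long start = System.currentTimeMillis();
            for (int i = 0; i < 1000000; i++) {
                Math.sqrt(i);  // result discarded: a candidate for dead-code elimination
            }
            System.out.println("naive:   " + (System.currentTimeMillis() - start) + " ms");
        }

        // Accumulating and returning the results makes it much harder
        // for the compiler to discard the work being measured.
        static double guarded() {
            long start = System.currentTimeMillis();
            double sink = 0.0;
            for (int i = 0; i < 1000000; i++) {
                sink += Math.sqrt(i);
            }
            System.out.println("guarded: " + (System.currentTimeMillis() - start) + " ms");
            return sink;
        }

        public static void main(String[] args) {
            // Repeat the measurement: early runs mix interpreted and freshly
            // compiled code, so only the later runs are worth trusting.
            for (int run = 0; run < 5; run++) {
                naive();
                System.out.println(guarded());
            }
        }
    }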
About the best you can do is use statistics and common sense to determine whether your benchmark has run without any interference. The key numbers to consider are the minimum run time, the average run time, and the 95th percentile run time (a one-tailed probability distribution). If all of these numbers are "close" to each other, then you can be fairly sure that the minimum runtime is the value you are looking for -- unless the instructions have been optimized completely out of the test run.
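As a rough illustration, the check can be automated; the helper below is my own devising, not a standard API.

    import java.util.Arrays;

    // Given the wall-clock times of repeated benchmark runs, report the
    // minimum, mean, and 95th percentile. If the three are close together,
    // interference from GC or recompilation was probably minimal.
    public class RunStats {
        public static void report(long[] times) {
            long[] sorted = new long[times.length];
            System.arraycopy(times, 0, sorted, 0, times.length);
            Arrays.sort(sorted);
            long sum = 0;
            for (int i = 0; i < sorted.length; i++) {
                sum += sorted[i];
            }
            double mean = (double) sum / sorted.length;
            // one-tailed 95th percentile: 95% of runs completed within this time
            long p95 = sorted[(int) Math.ceil(0.95 * sorted.length) - 1];
            System.out.println("min=" + sorted[0] + "ms mean=" + mean
                    + "ms 95th=" + p95 + "ms");
        }

        public static void main(String[] args) {
            report(new long[] {102, 100, 101, 99, 180, 100, 101});
        }
    }

And now for performance tips that you can count on.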
In one of the most interesting posts that I've seen in any of the performance discussion groups, the bartender at the Java Ranch offered the best explanation that I've seen to date of how a Java program could run faster than its C/C++ equivalent. First, the optimizing compiler compiles bytecode down to native assembler. OK, no big deal here; so do the C/C++ compilers, and if that were the end of the story... The interesting part comes with the realization that the native code produced by the optimizing compiler will have undergone optimizations based on the current runtime conditions, something that static C/C++ compilers are incapable of doing. Thus, the native code produced by the JIT may actually be more highly optimized for the task it is performing. To complete the story, a greenhorn stepped in and noted that when he used JRockit on Windows (BEA's VM optimized for server performance), he was able to measure a 28% increase in performance over the highly optimized C version of the benchmark.
In another post, we see a good example of where software metrics play a role in the decision-making process. In this post, a greenhorn has a piece of code that he needs to performance tune. Problem is, the code has an extremely high McCabe Complexity number. Now, McCabe Complexity is a measure of the number of paths that one can take through the code (a count of branch statements). The higher the number, the more complex the code, and the lower the level of abstraction to be found in it. These characteristics often lead to code that is more error prone and much more difficult to refactor, which is of course what the greenhorn wanted to do. Now, having a high complexity measurement should not scare you off refactoring, but it should be a warning that you are about to embark on a journey that will most likely be as difficult as it is unpredictable.
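As a reminder of what the metric counts, here is a toy example (mine, not the greenhorn's code): each decision point adds an independent path, and the McCabe number is roughly the count of decision points plus one.

    public class Complexity {
        // Two decision points + 1 = a McCabe Complexity of 3, meaning
        // three independent paths need testing. Every extra if, case,
        // or loop in a method pushes the number higher.
        static int sign(int n) {
            if (n < 0) return -1;  // decision point 1
            if (n > 0) return 1;   // decision point 2
            return 0;
        }

        public static void main(String[] args) {
            System.out.println(sign(-5) + " " + sign(0) + " " + sign(7));
        }
    }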
From the server side, we come across a discussion questioning the usefulness of an external cache over an internal one. The discussion wandered through the various issues of GC, IPC [inter-process communication], and memory footprint until it finally clicked: the overhead of serialization and deserialization would render the technique [external caching] ineffective. The participants concluded that it would be much better to just add RAM to the system. This point becomes even more compelling once you consider the improvements to GC in the JDK 1.4.
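A quick sketch of the tax being discussed (my own simulation, not code from the thread): an in-process cache hit is just a reference lookup, whereas anything out-of-process must pay a serialize/deserialize round trip before it even reaches the wire.

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.ObjectInputStream;
    import java.io.ObjectOutputStream;
    import java.util.HashMap;
    import java.util.Map;

    public class CacheCost {
        public static void main(String[] args) throws Exception {
            Map cache = new HashMap();
            cache.put("key", new int[1000]);

            // Internal cache: a hit is a plain reference lookup.
            Object direct = cache.get("key");

            // External cache, simulated: the value is serialized on the way
            // out and deserialized on the way back, copying every byte --
            // and a real external cache adds IPC on top of this.
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(buf);
            out.writeObject(cache.get("key"));
            out.close();
            ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buf.toByteArray()));
            Object roundTripped = in.readObject();
            in.close();
            System.out.println("serialized copy: " + buf.size() + " bytes");
        }
    }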
In another post on TheServerSide, the question was raised of whether to use a second VM or to increase the allocation of threads and heap in a single VM. As with any performance question, the solution often involves a trade-off. In this instance, the trade-off is between having all of the data in a single VM and having it spread out between the two. The next trade-off is the effect on GC: running GC over a smaller space results in shorter application pauses and eases the burden on the thread scheduler. So, with this in mind, we must look to the application to see if it points us in one direction or the other.
This application is running in WebLogic 6.1, which pins us down to the JDK 1.3. Details about the application indicate that it's stateless. This is great news, as it makes the recommended solution of using two VMs easy to implement in this case. But one other point: most performance tuning tips have a shelf-life, and this one is no different. Improvements in heap and thread management in the JDK 1.4 might be enough to reverse the advice, in favor of running the application in a single VM. Certainly if the improvements in the JDK 1.5 prove to be just as significant, these types of performance tuning techniques will become less prevalent.
This month, we feature the re-emergence of Java Gaming. As was reported a couple of months ago, Java Gaming was moved to the new java.net site. We at JavaPerformanceTuning.com are quite happy to see that the community has successfully moved and has retained its character and fervor.
Not really to do with performance, but an interesting thread was started by someone asking the question: is doing this (referring to the code snippet below, where an object lock is reassigned during a synchronized block) a bad idea?
    SomeObject myObject;
    synchronized (myObject) {       // grabs the monitor of the object myObject currently refers to
        // ...stuff...
        if (somecondition)
            myObject = new SomeObject();   // reassigns the reference, not the lock
        // ...more stuff...
        myObject.methodthis(blah);
    }
Clearly, there is a problem with this code as myObject has not been initialized, so it won't even compile (the compiler rejects the use of an uninitialized local variable). But let's just say that myObject has been properly initialized: what then? The answer is nothing. Not that this seems like something you should be doing, but there will be no side effects on synchronization if you do. This is because the synchronized statement forces the thread to grab the monitor of the object that myObject refers to at that point; myObject itself is just a reference. You will, however, no longer be able to send a notify message through myObject after the assignment. If you do, the application will throw an exception, because you don't own the monitor of the object now being referenced.
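A minimal demonstration of that failure mode (my own, not code from the thread):

    public class MonitorDemo {
        public static void main(String[] args) {
            Object myObject = new Object();
            synchronized (myObject) {        // locks the original object
                myObject = new Object();     // the reference now points elsewhere
                try {
                    myObject.notify();       // we do not own the new object's monitor
                } catch (IllegalMonitorStateException e) {
                    System.out.println("notify failed: " + e);
                }
            }
        }
    }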
Ever see a claim stating that an obfuscating tool can actually have a positive impact on performance? Well, it used to be true, and it most likely is still true to a certain extent. The trouble is, it was most true back in the days when most people were still developing with the JDK 1.1. Now, with JIT/optimizing compilers built right into the VM, performance is far less affected by long method names, large numbers of methods, and the whole host of other issues that can interfere with an application's performance. Best bet: get an evaluation license and check it out for yourself before investing in the technology.
Finally, Jack and I have been experimenting with the new JRockit release. This VM, based on the JDK 1.4 specification, contains a really interesting monitoring feature. If you turn on monitoring but don't attach a client, the monitoring adds no overhead to the cost of operating the VM. If you do attach a monitoring client, you'll find that the monitoring creates less than a 5% drag on performance, maybe as low as 1%. It's not often that I will plug a product in this column, but this is certainly one technology that is worth investigating.
When a colleague pointed me to this story about turning phones into sex toys, I couldn't decide whether it was a joke. But then I realized that it was technically feasible, and in fact quite easy to implement, so consequently whether this particular story was a joke or not didn't matter. Sooner or later it would be an available product. There's progress for you. It's only a matter of time before we have the orgasmatron (check Woody Allen's "Sleeper" to find the technical specs for that gadget).
June 4. Server needs a seeing to. We have monitors in place to give us early warning signs, and our weekly trend analysis indicates that we may soon start breaching response times. Seems to be down to increased workload.
June 11. Publicity time. I need to make sure that we get maximum exposure for our efforts. It always helps for lots of people to see you doing good things. Caught Parsons at lunch and started being expansive about our fabulous monitoring setup. Naturally he wanted to see it. So I got Boris to go through all the logs with the graphics. Gotta have graphics if you want to show something to the executive level; without graphics, it seems like you don't actually have data for some people. Parsons set up a meet with some of the other Big D's to repeat the display. Boris was loving the attention. I was delighted to be getting the exposure. Parsons was pushing the proactive nature of his department. He'd obviously done his sums, because he gave the other Big D's a detailed breakdown of exactly what the downtime would have cost if we hadn't been monitoring the system and the workload had become too heavy to handle. Afterwards he came round to me with a worried look on his face: "You will be able to maintain performance without breaching the thresholds, won't you?" Yes of course we will, you silly suit. Why would I be broadcasting this otherwise?
June 18. Showed Boris and Brainshrii the monitoring framework. More to the point, showed them the "slack" in the monitor that meant we didn't have much to do to maintain performance. Boris grunted his usual noncommittal grunt, but I could see he was impressed. Brainshrii went away looking thoughtful.
June 25. Brainshrii hooked up the monitor to the log analyser and adjusted the monitor delay code to automatically adjust the performance according to the workload and current response times, flattening out the response time curve. He also added a threshold to the delay so that we get warned if it goes too low. This means that the server now has auto-regulated performance. Response times should appear to be completely flat no matter the workload for the foreseeable future. Wish I'd thought of that, so nice. Of course we can't detail the mechanism to anyone; I imagine someone or other would go ballistic. But how many complex client/server systems do you know of that have completely predictable and deterministic response times under all workloads?
BCNU
This month we interviewed Steven Haines, J2EE architect for Quest Software, Inc.
JPT: Can you tell us a bit about yourself and what you do?
I am a J2EE Architect working for Quest Software as its J2EE Domain Expert. Quest's strength in the monitoring field is derived from the fact that we seek out experts in specific disciplines and bring them in to apply their knowledge toward developing monitoring and performance management software. My role has been to evaluate various application servers (BEA WebLogic, IBM WebSphere, Oracle Application Server, JBoss, etc.) and to determine what performance information we can gather from them. We are then able to define the criteria by which we determine whether an application server is running optimally. To date, we have delivered 24-by-7 diagnostics tools for WebLogic and WebSphere.
JPT: What do you consider to be the biggest Java performance issue currently?
Focusing on the J2EE arena, the biggest problems I have encountered are architectural. Most project teams do not spend enough time designing their applications for performance and scalability from the beginning. Luckily, design patterns have been formalized, and more organizations are paying closer attention to them. The key is to use these design patterns correctly.
JPT: Do you know of any requirements that potentially affect performance significantly enough that you find yourself altering designs to handle them?
Probably the biggest factor that affects the performance of application servers is memory. Regardless of the application domain, the largest hurdle when deploying applications is tuning memory and reducing that "pause" experienced when a major garbage collection runs. Therefore, projects that I have worked on have had significant requirements placed on memory, and specifically on HTTP Sessions. Reducing the amount of memory in HTTP Sessions can greatly improve performance. In the past, I have adjusted my designs to scrutinize every potential byte that may enter a session: I identify global data to move to the application scope and data that can be stored in cookies in the user's browser. Finally, most remaining stateful information is shifted to the EJB tier where it can be more easily managed.
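As a rough sketch of that kind of re-homing (the servlet and attribute names here are illustrative assumptions, not from any of Steven's projects), the same data can be split across the cheaper scopes:

    import javax.servlet.http.Cookie;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import javax.servlet.http.HttpSession;

    public class SessionSlimming extends HttpServlet {
        protected void doGet(HttpServletRequest req, HttpServletResponse resp) {
            // Shared, read-mostly data: one copy in application scope
            // rather than one copy in every user's session.
            if (getServletContext().getAttribute("catalog") == null) {
                getServletContext().setAttribute("catalog", new java.util.ArrayList());
            }

            // Small, non-sensitive per-user data: push it to the browser.
            resp.addCookie(new Cookie("theme", "plain"));

            // Only what genuinely must live per-user on the server belongs
            // in the session -- a few bytes, not 100K.
            HttpSession session = req.getSession();
            session.setAttribute("userId", "u123");
        }
    }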
JPT: What are the most common performance-related mistakes that you have seen projects make when developing Java applications?
The most prevalent mistake I have seen is the misuse of memory, specifically with respect to HTTP Sessions. Many development teams work in isolated environments, performing unit tests on their work, but seldom performing load tests. A memory-hungry application can stand up to a minimal load (20 users, or so) with great performance. But, as soon as the load is increased significantly, performance rapidly degrades. For example, storing 100K of data in a session object only yields a couple megabytes for 20 users, but an increase to 1,000 users pushes that count up to 100MB, just for user sessions. Furthermore, if the application is running in a clustered environment, the problem is compounded by the replication of session data between servers.
JPT: Which change have you seen applied in a project that gained the largest performance improvement?
The most significant gain I have seen applied in a project has been the result of adopting a pre-built architecture. Most developers want to build everything from scratch ("we can all do it better ourselves, right?"), but pre-built architectures, like Struts, have been tested by thousands of users, so we can be comfortable with their performance and scalability. The drawback of using a pre-built architecture is that developers have to learn how to use someone else's code. Once they overcome that learning curve, they are much more productive, and when deployment time comes, they are pleased with the results.
JPT: Have you found any Java performance benchmarks useful?
Unfortunately, no. Standard benchmarks, such as ECperf and SPECjAppServer, only measure the performance of an application server. Plus, most of the results posted by application server vendors come from servers tweaked to maximize the performance of the benchmark on their system. This is not necessarily going to reflect the performance of every organization's applications. The only benchmark that matters is one that is generated for each organization's particular applications on those application servers, running transactions representative of that organization's end-users. I always recommend running a load tester, such as Benchmark Factory or LoadRunner, against the application to determine throughput, and then watching the application with a tracer, such as PerformaSure, to determine where the bottlenecks are.
JPT: Do you know of any performance related tools that you would like to share with our readers?
At Quest, we have some great tools that I have worked with extensively. If developers would like to see how well their WebLogic Server is running internally, and to isolate and diagnose configuration problems, I would highly recommend looking at Spotlight on WebLogic. For diagnosing application issues, we have PerformaSure, which allows users to trace requests "from HTML to SQL", identify "hot" transactions, and pinpoint their locations. I would never deploy a real-time application without first running it through PerformaSure.
JPT: Do you have any particular performance tips you would like to tell our readers about?
Start with a solid architecture and be mindful of performance considerations from the beginning by minimizing the amount of data being stored in session objects. Developers should cache as much data as possible (limit the hops between tiers in the environment) and invest the money and time in tools that help identify bottlenecks before deploying the application into production.
JPT: Thank you for the interview.
Thanks for the opportunity to answer your questions. -Steve
(End of interview).
Should I use object pooling to avoid objects with a short life?
In the early JVMs, the answer to this was a clear-cut "yes". But now even the embedded and J2ME
JVMs can handle short-lived objects much more efficiently.
In their HotSpot FAQ, Sun engineering states that pooling should definitely not be used any more,
that pooling actually gives worse performance with the latest HotSpot engines. This is rather a
sweeping statement. Object pools are still useful even with HotSpot, but presumably not as often
as previously. Certainly for shared resources, pooling will always be an option if the overhead
associated with creating a sharable resource is expensive. And for scaled applications, huge numbers of object creations and their associated garbage collections are still a very significant issue, one which can be completely bypassed by using object pools.
Various recent tests have shown that the efficiency of pooling objects compared to creating and
disposing of objects is highly dependent on the size and complexity of the objects. And in some
applications where deterministic behavior is important, especially embedded applications, it is worth
noting that object pools have deterministic access and reclamation costs for both CPU and memory,
whereas object creation and garbage collection can be less deterministic (you cannot specify when garbage collection will occur, nor how long it will last).
When recycling container objects, you need to dereference all the elements previously in the
container so that you don't prevent them from being garbage collected. Because there is this
extra overhead in recycling, it may not always be worth recycling containers. As usual for tuning, pooling is a technique best applied to ameliorate an object-creation bottleneck
that has already been identified, not one that should be applied up front. However, to make it easy to add pooling to an application later, you should use the factory pattern to allocate the objects. This way you can change the allocation implementation at any time. Use a simple create ("new") with garbage collection reclamation initially for your factory method, then measure performance. If you identify a bottleneck caused by the object creation and garbage collection of a certain set of objects, you can easily change the underlying allocation implementation to use (thread-local) pools, as in the sketch below.
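A minimal sketch of that factory approach (class and method names are mine, for illustration): the caller never sees whether the object came from "new" or from a pool, so the strategy can be flipped after measurement.

    import java.util.ArrayList;
    import java.util.List;

    public class BufferFactory {
        private static final boolean USE_POOL = false;  // flip only after profiling
        private static final List pool = new ArrayList();

        // Callers always allocate through the factory...
        public static synchronized byte[] acquire() {
            if (USE_POOL && !pool.isEmpty()) {
                return (byte[]) pool.remove(pool.size() - 1);
            }
            return new byte[4096];  // default: plain "new" plus GC reclamation
        }

        // ...and hand objects back, so a pool can be introduced later. For
        // container objects, remember to clear their contents before reuse
        // so the old elements can be garbage collected.
        public static synchronized void release(byte[] buffer) {
            if (USE_POOL) {
                pool.add(buffer);
            }
        }
    }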
The JavaPerformanceTuning.com team
At JavaPerformanceTuning.com, we scan the internet for any articles with interesting Java performance information. When we find one, we add it to our huge article list, but we also extract the performance tips from those articles, and list those extracted tips in our newsletter. Below, you can see this month's extracted tips.
http://www-106.ibm.com/developerworks/java/library/j-pj2ee6.html
Proper handling of the four session scopes (Page last updated 2003 July, Added 2003-08-26, Author Kyle Gabhart, Publisher IBM). Tips:
The include directive combines multiple pages into one, so they share a single page scope; with the include action you would instead need to use request scope to share data between the pages.
The forward and include actions can use request scope to share data.
http://www.javaspecialists.co.za/archive/Issue075.html
An Automatic Wait Cursor: WaitCursorEventQueue (Page last updated 2003 July, Added 2003-08-26, Author Nathan Arthur, Publisher Kabutz). Tips:
http://www.javaworld.com/javaworld/jw-07-2003/jw-0718-mdb.html
Add concurrent processing with message-driven beans (Page last updated 2003 Jul, Added 2003-08-26, Author Amit Poddar, Publisher JavaWorld). Tips:
http://www-106.ibm.com/developerworks/java/library/j-jtp07233.html
ConcurrentHashMap and CopyOnWriteArrayList offer thread safety and improved scalability (Page last updated 2003 Jul, Added 2003-08-26, Author Brian Goetz, Publisher IBM). Tips:
http://www.devx.com/Java/Article/16755
Continuous Performance (Page last updated 2003 Jul, Added 2003-08-26, Author Cliff Sharples, Publisher DevX). Tips:
http://www-106.ibm.com/developerworks/library/j-perf07303.html
Compilation speed, exceptions, and heap size (Page last updated 2003 Jul, Added 2003-08-26, Authors Jack Shirazi, Kirk Pepperdine, Publisher IBM). Tips:
http://builder.com.com/article.jhtml;jsessionid=?id=u00220020814R4B01.htm
Deciding between iterators and lists for returned values (Page last updated 2003 Jul, Added 2003-08-26, Author Ryan Brase, Publisher builder.com). Tips:
http://www.ebizq.net/topics/real_time_enterprise/features/2299.html
Performance Tuning To Make WebSphere App Servers Sing (Page last updated 2003 June, Added 2003-08-26, Author Gian Trotta, Publisher Candle). Tips:
http://www.fawcette.com/javapro/2002_07/magazine/columns/weblication/default.asp
The Necessity of Performance Profiling (Page last updated 2002 Jul, Added 2003-08-26, Author Peter Varhol, Publisher JavaPro). Tips: