Back to newsletter 020 contents
Has anyone noticed that the -O option no longer does anything? Where has the -O gone? Why has it been neutered and, finally, will we miss it? It used to be that static analysis was the mainstay of performance tuning. During my bit-jockeying days on the Cray Research line of computers, I relied a lot on the ability of the Cray C optimizing compiler to produce the best machine code possible. This was not an easy task. For one thing, there was the requirement that code run in vector mode must produce the same results as if it were run in scalar mode. Consequently, some very mundane techniques commonly used in C would force the optimizing compiler to generate scalar code instead of vector code. Those of you who are familiar with Cray supercomputers know that most desktops will outperform an XMP when the XMP is running in scalar mode. The real power was in its vector processing features. You could force the compiler to optimize the code using a pragma statement, but the consequence was that your results might not be correct.
The real problem was that the compiler had no idea how you were going to use a particular function. With pointers in C, there was no guarantee that you wouldn't overwrite memory before accessing it in a tight loop. Of course, this determination could be made at runtime, but since that check would come at what was judged to be an unacceptable performance penalty, only static analysis was ever employed. Consequently, performance tuning involved a lot of profiling and tweaking of code to ensure that the resulting executable was optimal.
Now, along comes Java and, like the Cray C compiler, it only performed static analysis of the code. This offered very little gain for all the effort. Granted, JITs improved the picture by caching the machine instructions for commonly used code. But the addition of HotSpot has truly added value with its ability to perform dynamic analysis of an application's code base. In most cases, the cost of performing the analysis is paid back in spades, as most Java applications now run much faster. This result called into question the value of the -O parameter, given that Java is supposed to be platform independent. As I described earlier, in order to get C code to vectorize, you needed to change your style of coding. Consequently, how the code performed became dependent on the platform. This suggests that static analysis produces byte code that performs differently on different platforms [and I can confirm that, I've had exactly that experience for a small subset of optimization techniques - Jack].
The optimizations derived through the dynamic analysis of running code are far superior to those that could be produced with the -O option. This has reduced the need to statically optimize the byte code. With each successive release of HotSpot, the dynamic optimizations are only improving. So, will you miss the -O option? Given all of the other bottlenecks that exist in the JVM, probably not. Will the -O return? It's hard to say, but it seems clear that the separation of optimization from compilation has improved the performance characteristics of most applications. This leads one to believe that the best scenario is for the compiler to produce solid byte code and for HotSpot to perform the platform-specific optimizations.
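To see what dynamic optimization looks like in practice, here is a minimal sketch of my own (not taken from any of the discussions below) that times the same method over several passes. The class and method names, the array size and the pass count are all illustrative assumptions; the point is simply that later passes typically run faster once HotSpot has identified and compiled the hot method.

public class WarmupSketch {
    // hypothetical compute-bound method; any hot loop would do
    static long sumArray(int[] data) {
        long sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = new int[1000000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        // time the same work several times; HotSpot optimizes as it goes
        for (int pass = 1; pass <= 5; pass++) {
            long start = System.currentTimeMillis();
            long result = sumArray(data);
            long elapsed = System.currentTimeMillis() - start;
            System.out.println("pass " + pass + ": " + elapsed + "ms (sum=" + result + ")");
        }
    }
}

Running the same class with the -Xint switch (interpreted-only mode on HotSpot VMs) makes the contribution of the dynamic compiler obvious.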
Now let's move on to this month's summary of the discussion groups.
From the Saloon down on the JavaRanch we have the usual array of performance-related questions that lead in some very interesting directions. The first question I'm going to deal with asked which of two pieces of code was faster. The only answer that was provided was this excellent advice.
Rules of optimization:
Rule 1: Don't do it.
Rule 2 (for experts only): Don't do it yet.
The last bit of advice was this useful URL, http://c2.com/cgi/wiki?OptimizeLater. If you've not been to a wiki web before, then you're in for a special treat. With the right query, the wiki web will auto-generate HTML pages with the information you're looking for. The best part is that if you've anything to add, there are facilities for that also.
Next on the list is the age-old question of whether C++ is faster than Java. After the usual barrage of claims, there suddenly appeared a gem. A greenhorn replied that he had implemented a Matlab-like engine in Java. The development took two months. His employers then had him rewrite the package in C++. The Java code took up 1/3 of the disk space and ran about 20% slower. The interesting bit is that the conversion took six months! As a colleague of mine used to say, "I'd rather waste the computer's time than mine".
Are two collections better than one? Well, that all depends. If you're trying to remove duplicates from an ArrayList, like one greenhorn was attempting, then the answer is yes. That is, of course, if maintaining the order of the elements is important. Here's a code fragment.
ArrayList a = new ArrayList(a_master);
long cur = System.currentTimeMillis();
Set set = new HashSet();
List newList = new ArrayList();
for (Iterator iter = a.iterator(); iter.hasNext(); ) {
    Object element = iter.next();
    if (set.add(element))       // add() returns false for a duplicate
        newList.add(element);   // so only first occurrences are kept, in order
}
a.clear();
a.addAll(newList);
Now, if maintaining order is not important, then once you're dealing with more than 100,000 elements, it's best to switch techniques to:
ArrayList a = new ArrayList(a_master);
long cur = System.currentTimeMillis();
HashSet h = new HashSet(a);    // duplicates are dropped, but ordering is lost
a.clear();
a.addAll(h);
This really shows the usefulness of a properly constructed micro benchmark.
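For anyone who wants to try it, here is a rough sketch of such a benchmark (my own construction, not the code posted in the thread). The list contents, the element count and the use of System.currentTimeMillis() are all assumptions, and the absolute timings will of course vary with the VM and the data.

import java.util.*;

public class DedupBenchmark {
    public static void main(String[] args) {
        // build a master list with plenty of duplicates
        List a_master = new ArrayList();
        for (int i = 0; i < 200000; i++) {
            a_master.add(new Integer(i % 50000));
        }

        // technique 1: a HashSet used as a filter, original order preserved
        ArrayList a = new ArrayList(a_master);
        long start = System.currentTimeMillis();
        Set set = new HashSet();
        List newList = new ArrayList();
        for (Iterator iter = a.iterator(); iter.hasNext(); ) {
            Object element = iter.next();
            if (set.add(element)) {
                newList.add(element);
            }
        }
        a.clear();
        a.addAll(newList);
        System.out.println("order preserving: " + (System.currentTimeMillis() - start) + "ms");

        // technique 2: dump everything into a HashSet, ordering lost
        a = new ArrayList(a_master);
        start = System.currentTimeMillis();
        HashSet h = new HashSet(a);
        a.clear();
        a.addAll(h);
        System.out.println("order discarding: " + (System.currentTimeMillis() - start) + "ms");
    }
}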
And finally, the debate over which is faster, dot net or Java, has spilled into the Performance discussion forum. It seems too early in dot net's life for this question to be answered fairly. The original poster had his own ideas, but I have questions about the validity of his micro benchmark. It will become more interesting once some real comparisons come into play.
Ever probing the boundaries of performance, a gamer at www.javagaming.org posted a question on how to profile Java3D. Apparently, there is currently no good means to have the Java3D package reveal its performance secrets. Since gamers live and die by being able to measure just about every aspect of a gaming experience, this will certainly need to change.
The next question of interest was how to measure down to the nanosecond in Java. Out of the discussion came a number of salient points. First, the current resolution of timings in Java is 10ms, and this does not vary between System.currentTimeMillis() and the TimeStamp class. Although portability was posted as the reason why, the retort was that one could provide the highest possible resolution in a portable manner. Jeff Sutherland (Sun performance expert) remarked that Sun was thinking of adding this capability, though no timeframe was given. The last point was a link to an article that discusses high-resolution timers: http://www.fawcette.com/archives/premier/mgznarch/javapro/2001/08aug01/km0108/km0108-1.asp.
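You can see that coarse granularity for yourself with a small sketch like the one below (mine, not from the discussion). It simply spins until System.currentTimeMillis() ticks over and reports the size of the step; on many VM/OS combinations of the day it reports something in the region of 10ms, though the figure is platform dependent.

public class TimerGranularity {
    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            // spin until the clock ticks so we start right on a boundary
            long t1 = System.currentTimeMillis();
            long t2 = t1;
            while (t2 == t1) {
                t2 = System.currentTimeMillis();
            }
            // now measure how long it takes to reach the next boundary
            long t3 = t2;
            while (t3 == t2) {
                t3 = System.currentTimeMillis();
            }
            System.out.println("observed tick: " + (t3 - t2) + "ms");
        }
    }
}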
To my surprise, logging is an issue for gamers as well. To this end, there was an interesting discussion on the best means of logging. There was a vote for the new 1.4 logging package as well as for Log4J. There was also another interesting link, this one to zero-cost logging. You can find that article at http://www.netbeans.org/servlets/ReadMsg?msgId=319106&listName=nbdev.
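In the same spirit, here is a minimal sketch (an illustration of mine, not the technique from the linked article) of keeping logging cheap with the 1.4 java.util.logging package: guard expensive message construction behind isLoggable() so that a disabled level costs little more than a boolean test. The logger name, level and message are made up for the example.

import java.util.logging.Level;
import java.util.logging.Logger;

public class CheapLogging {
    // hypothetical logger name, purely for illustration
    private static final Logger logger = Logger.getLogger("game.render");

    static void renderFrame(int frame, long elapsedMillis) {
        if (logger.isLoggable(Level.FINE)) {
            // the string concatenation only happens when FINE is enabled
            logger.fine("frame " + frame + " rendered in " + elapsedMillis + "ms");
        }
    }

    public static void main(String[] args) {
        logger.setLevel(Level.INFO);   // FINE disabled: the guard keeps the cost down
        renderFrame(1, 16);
    }
}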
Last but certainly not least is www.theserverside.com. The first post I'll discuss here concerns a 5x slowdown of an application running in WAS 3.5.1 when the transactions were changed from TX_REQUIRED to TX_SUPPORTS. One possible explanation is that TX_SUPPORTS causes WAS to call ejbLoad() and ejbStore() for every call to a particular bean. With the TX_REQUIRED option, if no changes are made to the bean, these calls can be avoided. Of course, this is an implementation-specific detail.
A few weeks ago, I had the pleasure of watching Martin Fowler deliver a talk centered on the material that can be found on his website and will be appearing in his next book. The topic was, of course, patterns. During the course of the talk, the question of pattern abuse came up. Martin's answer to that question was education. You may be wondering how this is relevant to the server side discussion groups. Well, a post asked how a particular architecture would perform, and one of the questions posed was: "Is performance worse if you use design patterns?" The responses played true to Martin's comment as they educated the poster on the judicious use of patterns. Amongst the responses was one that suggested that the number of layers in an application was a question of good design, not performance. Yet another suggested that the poster resist the temptation to use every design pattern possible. I would add that one should only build what the requirements call for. I've witnessed a few projects that delivered late by not following that edict.
If you're willing to ante up the fee (of US$1100), there is an interesting report published at http://www.cmis.csiro.au/ADSaT/j2eev2.htm on J2EE server performance. A more thorough check of the site revealed some performance tidbits that one could readily view. You may want to check out the new ECPerf results published by BEA. They now have the highest BBOP rating (37381), but it comes at a price of $38/BBOP. That price is without clustering, a configuration that IBM has always contested as being unreasonable for a production environment. [More recent results are now available, see this month's news section].
On a final note, I was in a meeting with a vendor this week. What was interesting was that they claimed their Java component ran faster on Windows NT 4.0 using IBM's VM than it did on any other VM/platform combination (including RedHat Linux 7.1). Since this runs counter to the benchmarks that I've seen, I questioned them on it, but they stuck to their story. It seems the truth will be told in an upcoming benchmark.
Back to newsletter 020 contents