Back to newsletter 017 contents
I was busily preparing this month's Round-Up when a blizzard of postings appeared on the ECPerf website. First, BEA posts a set of numbers, then IBM follows up with their own. Not to be outdone, BEA counters with an even better BBop/$ ratio. Wow! Then in drops Pramati and, at last, a note from Borland. I'm thinking, what gives? Why are the J2EE vendors feeling a sudden need to awaken the normally dormant ECPerf site? And what does all of this BBop stuff mean anyway? Can we really trust a single number to tell the story? It reminds me of my college analytical chemistry class. My partner and I had diligently worked out a five-point calibration curve. Our professor came by, looked at the curve, and commented, "Why do you have so many data points? Use only one data point and zero. That will give you a perfectly straight line." We, of course, did not follow his advice, as we did not trust ourselves to get that one data point in the right spot. Even though obtaining these "extra" data points was time consuming, it gave us the confidence that our test results were correct and that our conclusions were well founded. Though the makers of the ECPerf specification may have learned this lesson in a different manner, they too have demonstrated that they understand that a single number is but one dimension of a multi-dimensional world.
The ECPerf expert group specified that each result set must be contained within a report that conforms to a standardized format. Moreover, no report can be published without being scrutinized by the ECPerf review committee. The committee has strict guidelines on what is acceptable and on when a submission is to be rejected. If we look at the ECPerf website, we'll find that BEA is ahead with a posted result of $7/BBop. Wow, that is a significant improvement on their first posting of $18/BBop, and it smashes IBM's posting of $13/BBop. So it's clear: WebLogic is currently the cheapest J2EE server to operate. Now, IBM's response (on www.theserverside.com) is that BEA's test does not represent a real-world production system. The ECPerf specification seems pretty clear on this point. Is this just sour grapes from IBM, or is there something to it? And if there is something to IBM's outcry, why did the ECPerf committee approve the results? Clearly BEA sacrificed throughput (7,539.90 BBops/min vs. 16,696.17 BBops/min) to achieve their goal. Even so, this point is totally overlooked in IBM's statement. So what is IBM objecting to?
If you read through the complete report posted by BEA, you'll see that they ran their application on a single Dell PowerEdge 4600. Now, if you turn to the golden rules found in table 1 in clause 1 of the ECPerf specification, you'll find the characteristic "Services should be redundant and fully available with the runtime mapping unknown to the application developer." Is this the basis of IBM's complaint? IBM does claim that a customer would not deploy an application on such a system and that, consequently, the test is not real world. Well, I thought about it for a bit and ran through a little calculation that added some redundancy costs to BEA's test. The new number worked out to be $12/BBop. I then looked at the Q&A posted by Sun after they withdrew their test results and suddenly, my professor's long-lost words jumped off the page at me.
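The $/BBop metric itself is just total system cost divided by sustained throughput. A minimal sketch of the calculation above, using BEA's published 7,539.90 BBops/min figure; the dollar amounts are hypothetical round numbers chosen only to reproduce the $7 and $12 results quoted in the text, not BEA's actual pricing:

```java
// Illustrative price/performance calculation for the $/BBop metric.
// Throughput is BEA's published 7,539.90 BBops/min; the system costs
// below are hypothetical, picked only to illustrate the arithmetic.
public class PricePerformance {
    static double dollarsPerBBop(double totalSystemCost, double bbopsPerMin) {
        return totalSystemCost / bbopsPerMin;
    }

    public static void main(String[] args) {
        double throughput = 7539.90;     // BBops per minute
        double singleBox = 52800.0;      // hypothetical single-box cost (~$7/BBop)
        double redundant = 90500.0;      // hypothetical cost with redundancy (~$12/BBop)
        System.out.printf("single box: $%.0f/BBop%n", dollarsPerBBop(singleBox, throughput));
        System.out.printf("redundant:  $%.0f/BBop%n", dollarsPerBBop(redundant, throughput));
    }
}
```

The point is simply that adding redundant hardware raises the numerator while leaving the throughput unchanged, so the ratio climbs.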
In that Q&A document, Sun makes the statement "We feel the testing results do not really prove anything other than the spirit of the ECPerf has been hijacked to show how fast competing hardware/OS/VM products are. Borland, along with the Giga Information Group recommend testing between application servers be conducted on a standardized platform. This is the only way to truly know what the differences are between competing application servers." In other words, ECPerf is sensitive to its environment. Since your environment will likely be quite different from anyone else's, the best way for you to measure your BBop/$ and BBop/minute metrics is to measure them yourself. On that note, let's look at the interesting performance tidbits that have been discussed this month.
Last month, we introduced JavaDevTalk. This month's discussion includes a note and a micro-benchmark pointing out a performance bug caused by HotSpot not enforcing eight-byte alignment of eight-byte quantities on the stack. The result is that your code may run up to four times slower. The suggested workaround is to add an extra int to force the proper byte alignment. Running with the -server option (JDK 1.4) seems to fix the problem. For more information, see bug #4490869.
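A sketch of the kind of micro-benchmark that exposes this sort of issue: a tight loop over doubles, with the suggested extra int local as the alignment nudge. The method and variable names are mine, and whether the padding actually helps depends entirely on your JVM and platform:

```java
// Sketch of a double-summing micro-benchmark of the sort used to expose
// bug #4490869. The unused "padding" int is the workaround described in
// the posting: an extra local that can shift stack alignment. Its effect
// is JVM- and platform-dependent.
public class AlignmentBench {
    static double sumDoubles(double[] data) {
        int padding = 0;              // extra local to nudge stack alignment
        double sum = 0.0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i];
        }
        return sum + padding;         // padding is zero; result is unchanged
    }

    public static void main(String[] args) {
        double[] data = new double[1000000];
        java.util.Arrays.fill(data, 1.0);
        long start = System.nanoTime();
        double sum = sumDoubles(data);
        long elapsed = System.nanoTime() - start;
        System.out.println("sum=" + sum + " in " + elapsed + " ns");
    }
}
```

As always with micro-benchmarks, run it many times and compare with and without the extra local before drawing any conclusions.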
In another high quality posting, a Java programmer is pondering the performance hit of using accessor methods instead of accessing variables directly. In his tests, he could not see a difference until he was performing more than 1,000,000 iterations. The conclusion: write clean, readable code and then optimize the trouble spots. The responses contain a fair warning about running micro-benchmarks.
And now partners, let's mosey over to the Saloon at the JavaRanch and listen to the boys wag their tongues on the latest performance news. The first table is asking which is better: to place a synchronized block around a static shared DateFormat, or to create a new instance for each thread. Well, the friendly ranch hand steps in to point out that regardless of whether the data is static or not, synchronizing causes you to take hold of the monitor and lock the entire object. This lock is held until the method exits or wait() is called. If wait() is called, then the thread remains inactive until it's lucky enough to receive a notify() or notifyAll(). Given this behavior, best be creating a DateFormat per thread.
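The per-thread approach can be packaged up with a ThreadLocal so that each thread lazily gets its own SimpleDateFormat and no synchronization is ever needed. A sketch in modern Java (class and pattern are mine):

```java
import java.text.SimpleDateFormat;
import java.util.Date;

// Per-thread DateFormat, as the ranch hand recommends. SimpleDateFormat
// is not thread-safe, so sharing one instance requires synchronization;
// a ThreadLocal gives every thread its own instance instead.
public class PerThreadFormat {
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
            ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd"));

    public static String format(Date d) {
        return FORMAT.get().format(d);   // no lock taken, no contention
    }
}
```

The trade-off is one formatter object per thread instead of one lock shared by all of them, which is almost always the right trade for a hot code path.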
Over at the bar, a couple of greenhorns are pondering why some JSP tags are hanging around for so long in their JVM. In the end they conclude that the tags are being cached using a soft or weak reference, which would account for their disappearance during a forced GC. A good profiler would put this one to rest once and for all.
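For readers unfamiliar with the pattern, here is a sketch of the kind of soft-reference cache the greenhorns suspect; the class and values are mine. The entry stays reachable through the SoftReference until the collector decides to clear it, at which point get() returns null and the value must be rebuilt:

```java
import java.lang.ref.SoftReference;

// Sketch of a soft-reference cache entry. The collector may clear the
// reference under memory pressure (or during a forced GC), which is
// exactly the "disappearing tags" behavior described in the posting.
public class SoftCacheEntry {
    private SoftReference<String> cached;

    public String get() {
        String value = (cached == null) ? null : cached.get();
        if (value == null) {                        // first use, or cleared by GC
            value = loadExpensiveValue();
            cached = new SoftReference<String>(value);
        }
        return value;
    }

    private String loadExpensiveValue() {
        return "rebuilt tag data";                  // stand-in for the real work
    }
}
```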
On the question of whether or not very long variable and method names can affect performance, the bartender and some ranch hands step forward to set the greenhorn straight. Longer names do take slightly longer to resolve and do increase the size of the class file. Having said this, they all go on to point out that the benefits of readability far outweigh the possible benefits of using shorter names. And of course, one can use an obfuscator if one is interested in cutting down class file size.
Once the bartender had dispensed with this task, he was immediately put back to work explaining what a code coverage tool offers. His answer (borrowed from Borland) was that code coverage tools identify dead code that can be removed. Dead code is code that is unreachable. It also shows execution frequencies. That is, which methods are executed and how much time is spent in each one. This information is the basis of many a performance tuning exercise.
This month, a member of Sun's HotSpot team, Ken Russell, has visited JavaGaming. It would seem that a portion of the HotSpot team is now involved with high performance graphics applications. Ken's contribution is in response to a posting regarding the age-old problem of GC kicking in to cause a pause during redraws. What follows is a great question and answer session between Ken and one of the list members on future directions and thoughts within the HotSpot team. [see http://www.javagaming.org/discus/messages/27/1141.html, a fascinating discussion, still in progress.]
Also, Ken was nice enough to post the URL for the Grand Canyon demo. It's at http://java.sun.com/products/fjc/tsc/articles/jcanyon. Finally, you can find Ken's JavaOne talk at http://java.sun.com/javaone/; look for session number 3167.
Another question: is System.arraycopy faster than a for loop for moving data between two int buffers? The moderator, Jeff, provides the short answer that arraycopy is as fast as it gets, since it uses special hooks to call directly back into the VM, avoiding the JNI data-copying overhead. Since Jeff plugs the relevant chapter in his book, I'll include the URL here: try http://java.sun.com/docs/books/performance.
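The two techniques side by side, for reference; the class and method names are mine. System.arraycopy is a call the VM can treat as a bulk memory move, while the loop copies one element at a time, and both produce identical results:

```java
import java.util.Arrays;

// The two int-buffer copy techniques from the question. The VM can turn
// System.arraycopy into a bulk memory move; the loop copies element by
// element. Results are identical either way.
public class CopyCompare {
    static int[] copyWithLoop(int[] src) {
        int[] dest = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            dest[i] = src[i];
        }
        return dest;
    }

    static int[] copyWithArraycopy(int[] src) {
        int[] dest = new int[src.length];
        System.arraycopy(src, 0, dest, 0, src.length);
        return dest;
    }

    public static void main(String[] args) {
        int[] src = {1, 2, 3, 4, 5};
        System.out.println("same contents: "
                + Arrays.equals(copyWithLoop(src), copyWithArraycopy(src)));
    }
}
```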
Last but certainly not least is the venerable TheServerSide. As you can imagine, there are a number of postings requesting help getting ECPerf going. It seems as if people really are interested in seeing it work for themselves. I applaud their efforts. There was an interesting posting regarding the upper limits of Tomcat. The poster was noticing performance problems when traffic reached 400 users. One response included a clustering suggestion. Unfortunately, bugs in the available plug-in may force one to consider the commercial alternative available from Borland. The estimated cost: $400/CPU.
For those of you who can read French, there is an article posted in the magazine Le Monde Informatique. For those of us who do not speak French, the essence of the article is summarized here. The article claims that CMP 2.0 is one to two years away from being ready for prime time. Although they offer few technical details, they claim that they were unable to achieve acceptable performance using CMP 2.0 in WebLogic 6.1, even after two months of performance tuning help from BEA. In the end, they could not support 50 users. Comparisons were carried out against RDBMS access using servlets and JDBC, access via WebLogic and TopLink, and Versant.
And finally, there is a question regarding the possibility of keeping a transaction open over multiple pages. One piece of advice offered was to cache the information in the HTTP session. Failing that, one could use brute force and store the state information in a local DB.
This month I'd like to introduce the Meadow Muffin Award, designed to honor a vendor, group, or individual who publicly makes outlandish performance-related statements. The award honors two comedians, Delmar McGregor and Cecil Wiggins, whose antics have forever altered the minds of Silicon Valley North. The rules for the award are fairly basic: it will be presented on an as-needed basis. The Round-Up is looking for your nominations. Our fully staffed research team will verify all nominations. Please send all nominations to us.
The Java Pet Store (JPS) is an application designed to demonstrate J2EE technologies. As such, the code is easy to read and understand. Microsoft's .NET group decided to take the JPS and use it to "prove" that .NET outperforms J2EE. In the process, they collapsed the JPS from three tiers to two, lost two-thirds of the code, and claimed their version ran 28 times faster. All was well until Oracle and IBM picked up the gauntlet and tuned the JPS (whilst maintaining the original architecture) to obtain an 18-fold improvement over the .NET times (http://otn.oracle.com/tech/java/oc4j/pdf/9ias_net_bench.pdf), with an even larger performance increase for IBM, as reported by James Gosling at JavaOne. In recognition of their outstanding efforts, the first Meadow Muffin shall be awarded to Microsoft's .NET group. Congratulations to all involved. [That would be the Microsoft marketing department, who clearly believe that a tutorial is a real enterprise application -ed].