Newsletter no. 29, April 28th, 2003

Java Performance Tuning

You may notice a new page layout on the JavaPerformanceTuning site. You can thank Kirk for dragging our look into the 21st century. Yes, I finally broke down and agreed to use tables in the web pages. Until now, we have pretty much stuck to HTML 1.0: no embedded graphics (except for the tool reports), no frames, no JavaScript, in fact almost nothing that slows down the download and page presentation.

Using tables doesn't slow down the presentation enough for me to worry about though. And there are no extra remote calls, just a little extra HTML code and a little more CPU required to lay out the page. But the look is much better (I hope you'll agree).

We are still tweaking the look, and any feedback is welcome. As a consequence of his hard work, we decided to inflict our interview on Kirk this month.

You might also notice, if you look along the menu bar or down the boxes on the left of any page on the site, that we have a new training section. We've put together a set of excellent Java Performance Tuning training courses, and encourage you to look over the course outlines, accessible from our training page.

A note from this newsletter's sponsor

Measure, analyze and maximize J2EE application performance
during load testing with PerformaSure. Read the Aberdeen
white paper "Honing In on J2EE Performance Assurance".

Well, sad to say, but I've given up trying to keep up. There are so many good articles coming out, it's like trying to hold back the tide. So instead, I'm giving you access to my backlog page. In fact, I'm even backlogged on my backlog page. I have lists of other URLs and I haven't even had time to check whether they should go on the backlog page!

I did manage to find the time to extract tips from articles across the board this month: J2EE, J2ME, bytecode, garbage collection, object design and Java 3D. That even included a three-part series on J2EE performance tuning.

Of course all our regular sections are still here too. Kirk's roundup covers the interesting recent Java performance discussions, not to mention McDonald's performance metrics; and carrying on with Kirk, we interview him in this month's interview. Our question of the month follows up on last month's fairly controversial question of the month benchmark.

Javva The Hutt rants about the future of computer languages; we have a new tool report on Quest's JProbe profiler; and, of course, we have over 70 new performance tips extracted in concise form.

A note from this newsletter's sponsor

Get a free download of GemStone Facets 2.0 and see how this
patented technology supports JCA 1.0 to seamlessly integrate
with J2EE application servers to enhance performance.

Tool Reports

Quest's JProbe performance profiler

News

Java performance tuning related news.

Reports

Survey of J2ME devices looking at performance, from last July

A note from this newsletter's sponsor

Java/J2EE performance or scalability problems? Do NOT buy additional
hardware, do NOT buy additional software, do NOT refactor your app.
JTune(TM) GUARANTEES improvements with none of the above costs.

Tools

Older Articles

Jack Shirazi

Kirk's Roundup

I'm writing this column sitting at 37 thousand feet (11277 meters for the rest of the world) hurtling past Melbourne, Florida at Mach 0.81. For those of you who may not be all that familiar with the east coast of Florida, Melbourne is just south of the historic Cape Canaveral, home of the Kennedy Space Center. I've flown over this site many times and I always enjoy the bird's eye view of the launch sites. It is from this site that the crews of the Challenger and the Columbia enjoyed their last minutes on terra firma as they prepared for their journey into space. It was only a short 500 years ago that Ponce de Leon explored this area during his quest for the fountain of youth. At that time, these arduous voyages to the "new world" were fraught with danger. Many vessels and their crews were lost as the old world crossed the chasm between it and the new world. But the dangers did not deter these adventurous souls.

Just as the Spanish galleon represented one of the most complex machines of its time, there is no doubt that the Space Shuttle is, to date, one of the most complex machines ever built. And just as even the most minor of failures on a galleon often had catastrophic consequences, so too is this the case for the Space Shuttle. And just as the loss of these galleons did not stop the flow of explorers from crossing the expanse between the continents, I firmly believe that the latest tragedy will not stop the flow of explorers ever eager to cross the expanse that separates us from the unknown.

JavaGaming.org

A while back, I asked if anyone had noticed that the -O option stopped doing anything. Well, it comes as no surprise that a participant in the Java gaming discussion group noticed that, because the byte codes were no longer optimized, class files were often larger than the original source files. It was upon making this observation that he went looking for a better compiler. The most promising possibility is GCJ from the GNU project. GCJ is a compiler that can generate byte codes or native code from either source or byte codes. What follows is a lengthy review of the GCJ project in which the author describes the problems that were encountered and how they were solved. In the end one can conclude that GCJ is still not ready for prime time.

On a different thread, a poster asked for help in locating a class optimizer. The resulting discussion yielded two useful tips. First, class optimizers are pretty much only good for reducing the size of a class file. Secondly, most of the optimizations are done by the JIT. Now if you don't know how extreme game programmers are, consider this. The originator of the post was looking for a 135 byte reduction in size! I can't imagine why, but then again, I'm not on the gaming front line.

The JavaRanch

Over at the saloon at the Java Ranch, a greenhorn asked if there was any speed difference between ++j and j++. Though the answer would appear to be fairly obvious, the members of the discussion group did the prudent thing: they tested it. In one test, ++j ran in 8800ms and j++ ran in 5700ms. Here's the code:

            // Time an empty loop that uses the prefix increment.
            long start = System.currentTimeMillis();
            for (int i = 0; i < Integer.MAX_VALUE; ++i);
            System.out.println(System.currentTimeMillis() - start);
            // Time the same empty loop using the postfix increment.
            start = System.currentTimeMillis();
            for (int i = 0; i < Integer.MAX_VALUE; i++);
            System.out.println(System.currentTimeMillis() - start);

These results would seem to suggest that the prefix and postfix operators are not equal in expense. But the member then switched the code about and reran the tests.

            // Same test with the order swapped: postfix increment first.
            long start = System.currentTimeMillis();
            for (int i = 0; i < Integer.MAX_VALUE; i++);
            System.out.println(System.currentTimeMillis() - start);
            // Prefix increment second.
            start = System.currentTimeMillis();
            for (int i = 0; i < Integer.MAX_VALUE; ++i);
            System.out.println(System.currentTimeMillis() - start);

The results were 8800ms for the first loop and 5700ms for the second, even though the operators had swapped places. The difference is almost certainly due to HotSpot optimization rather than to the operators themselves. So, although micro-benchmarks can be useful, they can lead one to erroneous conclusions if one is not careful.

In another discussion thread following on the same micro-benchmarking theme, the discussion centered on the speed of HashMap vs Hashtable. This discussion was mentioned in an earlier roundup, so I will not repeat it. What is new is that the sheriff started speculating. In this case, a greenhorn stepped in to remind everyone that garbage collection is non-deterministic and can interfere with timings. Also, it does take some time for HotSpot to decide that it's worth the time to optimize a section of code. As mentioned before, this does have the effect of skewing the timings.
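One hedged way to see the ordering effect for yourself is to time both loops repeatedly in alternating order, so that any warm-up cost shows up in the early rounds instead of being blamed on one operator. This is only a sketch: the class and method names are made up for illustration, it uses System.nanoTime (which arrived in a later JDK than the ones discussed in the thread), and an optimizing JIT may remove the empty loops altogether, in which case the later rounds will simply report times near zero.

```java
// Hypothetical sketch; names are illustrative and not taken from the JavaRanch thread.
class IncrementTiming {

    // Times an empty loop using the prefix increment; returns elapsed nanoseconds.
    static long timePrefix(int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; ++i) { }
        return System.nanoTime() - start;
    }

    // Times an empty loop using the postfix increment; returns elapsed nanoseconds.
    static long timePostfix(int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) { }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int iterations = 100000000;
        // Several rounds: the first rounds include interpretation and JIT
        // compilation time, so only the later rounds are worth comparing.
        for (int round = 1; round <= 5; round++) {
            long prefixNanos = timePrefix(iterations);
            long postfixNanos = timePostfix(iterations);
            System.out.println("round " + round
                    + ": ++i " + (prefixNanos / 1000000) + "ms"
                    + ", i++ " + (postfixNanos / 1000000) + "ms");
        }
    }
}
```

If the two operators really differed in cost, the gap should persist in every round regardless of which loop runs first.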

Having said this, micro-benchmarks can provide valuable information when they are conducted with care. But since care needs to be taken when running any benchmark, the results are not all that surprising.

A greenhorn posted a note about his application running the VM out of memory. In this case, the greenhorn was storing ResultSets in a Vector. During the discussion, it became clear not only that the amount of data being returned from the server should be restricted, but also that the Vector class was a less than optimal solution for the overall problem. Trouble is, Vector had been deeply embedded throughout the entire application. Consequently, the expense of replacing this class with a more appropriate one was prohibitive. One comment that came out was that the greenhorn should not be looking to preserve a rigid architecture that was causing the problems. I can't comment on the architecture but it does seem to me that the people who made the small decision to refer to a class instead of an interface did not take into consideration the long term effects of their seemingly trivial choice. But more on that in a future column.
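To illustrate the class-versus-interface point, here is a minimal sketch, assuming a hypothetical RowStore class that is not from the thread. Because the field and constructor are typed as the List interface, the backing collection can be swapped without touching any code that uses RowStore. The sketch uses generics, which postdate the JDKs of this newsletter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Vector;

// Hypothetical example class; not taken from the discussion.
class RowStore {

    // Typed as the List interface, so callers never depend on Vector itself.
    private final List<String> rows;

    RowStore(List<String> backing) {
        this.rows = backing;
    }

    void add(String row) {
        rows.add(row);
    }

    int size() {
        return rows.size();
    }

    public static void main(String[] args) {
        // The same class works with the legacy Vector and with ArrayList.
        RowStore legacy = new RowStore(new Vector<String>());
        RowStore replacement = new RowStore(new ArrayList<String>());
        legacy.add("row-1");
        replacement.add("row-1");
        System.out.println(legacy.size() + " " + replacement.size()); // prints "1 1"
    }
}
```

Had the application referred to List rather than Vector from the start, replacing the collection would have been a change at the point of construction only.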

The Server Side

From the server side we have the typical number of application server centric questions. The first we'll deal with comes from a list member who is trying to get JBoss to scale. The architecture in question is running with a webserver, JBoss and a database. The post offered no information on where the problem (if any) might lie, thus making it difficult to provide a specific answer. Even so, some advice was offered that, if followed, would allow one to scale this architecture. The first step is to determine where the bottleneck exists. This can be achieved by running a stress test. Once the bottleneck has been located, one can try to load balance the system so that more of the constrained resource is available, enlarging capacity beyond the critical choke point. The suggestion put forth was that this choke point would most likely be the database. In this case, moving the database off to its own piece of hardware is the solution of the day.

In another discussion that contained about as much information as you could ask for, the member was having trouble with his application hitting a hard performance wall. The application relied heavily on JMS. The client uses RMI to hit a stateless session bean. In turn, the bean passes the request to an MDB. The MDB then processes the request. In all, the request passes through three queues. Though the processing times stay fairly constant, the rate cannot be increased. This result had been seen on several different systems using several different application servers and JMS products. One hypothesis that was put forth, and which has merit, is that the system has maxed out on the number of requests it can marshal and un-marshal. The poster noted a rate of 50-60 messages per second. For the sake of argument, let's assume that 50 messages per second is the rate. If we assume a single CPU system, then we get a service time of 20ms per message (1000ms divided by 50 messages) for marshalling, transference, and un-marshalling. If the queue is synchronized, then one must consider that cost also in the 20ms. What is known is that messaging systems are sensitive to both the size of their payloads and the rate of requests. What is not known is how large this payload is. So, is this number unreasonable? It's hard to say. What is certain is that it's not adequate.
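The back-of-the-envelope arithmetic can be written down as a tiny sketch. It assumes, as the argument above does, a single CPU handling one message at a time; the class and method names are made up for illustration.

```java
// Hypothetical sketch of the service-time estimate; names are illustrative.
class ServiceTime {

    // Average time budget per message, in milliseconds, when a single
    // server processes messages one after another at the given rate.
    static double millisPerMessage(double messagesPerSecond) {
        return 1000.0 / messagesPerSecond;
    }

    public static void main(String[] args) {
        // The poster reported 50-60 messages per second.
        System.out.println("50 msg/s: " + millisPerMessage(50) + "ms per message");
        System.out.println("60 msg/s: " + millisPerMessage(60) + "ms per message");
    }
}
```

Whatever marshalling, transfer, un-marshalling and queue synchronization cost individually, together they have to fit inside that per-message budget for the rate to be sustained.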

Finally, performance at the bottom line

Do you know the effects of performance on your company's bottom line? (Now ex) CEO Mr. Greenberg of McDonald's does! His estimate is that a six second reduction in service time at the drive-through window results in a 1% increase in sales.

Kirk Pepperdine.

Javva The Hutt

Paul Graham's talk on the evolution of computer languages is the kind of article that I call an "idea generator". The article itself would be pretty mediocre if you took out the basic idea: that of trying to predict the future of languages. But that's being mean, it's like saying that a sandwich is pretty bland if you take out the filling. Instead, I guess I should say that this is a great article because it really gets you thinking about the future of computer languages.

Is Java a dead-end like he thinks? I think that Java, as an offspring of Smalltalk, took one major feature which will be seen in languages of the future, the virtual machine. In that sense, I think you'll see Java as a core predecessor to any future language. Another major feature is HotSpot technology. Running an automated profiler on the code while it is running, automatically determining the bottlenecks, and optimizing them, all automatically. Of course you might say that this is not part of the language, but I think it is.

I don't think Strings being a separate data structure from lists is a case of premature optimization. Strings are fundamental to people because we do so much talking, writing, etc. In fact, one of the really great things about Perl is the way that you can treat numbers as strings and strings as numbers, largely according to the context. Because Strings are fundamental to people, I don't see any successful future language treating Strings as just another list. It is likely to stay a special type. It is extremely convenient to have it as a special type. A vast number of programs out there are essentially string manipulation programs, simply because so much useful data is managed as Strings.

Actually, I disagree with almost all of his specific predictions. For example, parallelism. Years and years ago I worked on a research project that looked at automatically parallelizing programs. We found that it was feasible for a large class of very complex cases. And it will only be the really long-running programs that need it in 100 years time. I expect it would be a parallel version of HotSpot technology. After a few seconds the VM identifies that the program needs to be parallelised, so it runs a parallel task of parallelising the running program. Just like HotSpot, it would probably spawn off parallel tasks bit by bit, as it recognized them.

What's funny is that he says

our ideas about what's possible tend to be so limited by whatever language we think in that easier formulations of programs seem very surprising

then goes on to use the idea of how we currently write a program as an example of how programs may be written. I think that programs in a hundred years will be written the way we handle instructions to other people. Simply put, when you want someone else to do something, you have two routes: you explain what you want done, or you give an example of how to do something and tell them to do the same but more or slightly different. You often do both of these. I think that typically you'll run through an example of what you want, then leave the computer to get on with it. Like the recordable macros we have now. One of the great things that turned me on to Emacs was the ability to record a sequence of edits, then just tell the thing to get on and do it to all the rest of the doc. Couple that with a complete record of all the changes made, together with the ability to back out of any single change, and something that understands what the expected result should be like so that anomalies can be detected, and we would be where I expect programming to get to.

So, my prediction is clearly that the primary programming language in 100 years will be an intelligent macro recorder, running on a VM with a built-in optimizing parallelizing profiler, with every operation executing as an atomic transaction. And there will be only three data types: Strings, Numbers, and Composites.

All in all, a really great article for its main idea. It got me thinking, and just because I disagreed with a lot of what he said shouldn't change the benefits I gained from it.

Diary of a Hutt

March 5. Project Xenon. Who thinks up these stupid names, I ask you? In this case, it was probably Frezian. In what way this project resembles a noble gas, I have no idea. Perhaps Frezian liked the idea of having a "noble" project. Perhaps someone else thought he was full of gas. In any case, why didn't they start with Helium or at least Neon? That way they could have worked their way up to Xenon across project versions or new projects. Sort of like those stupid nineties project names from Microsoft which were cities on the way to Mecca. Boy, that was sure calculated to give offence at some point. Did you know that it's a secret project? Gotta give credit where credit is due, Frezian really knows how to get internal publicity. I would guess that he sent the photocopied memo to half a dozen people, carefully leaving one or two extra photocopies "accidentally" by the departmental photocopier. Every conversation that I heard that day seemed to start with "Hey, have you heard of the hotshot new hush-hush project in I.T.? I hear it's called Project Xenon ..."

March 12. Naturally, I'm involved. It's a Java project, and I'll be handling the performance specifications. That little tick Weevil is doing the QA program. We've already had our first run-in, about testing resources. He suckered both Frezian and HasntGotAClue into believing him. "As we can have only one dedicated test environment, of course we'll have to share this between the QA testers and the performance testers". Fair enough, I agreed, but then came the sucker punch: "Of course the QA testing needs manual oversight because of the frequent errors that break the tests, whereas performance tests can easily be run unattended overnight and analyzed the next day". I couldn't try to deny we could run automated tests; after all, that's one of the productivity benefits of my management that I've pushed often enough. So there I was, committed to years of night time access to the test environment. I could see Weevil was savouring that. The word "Slimeball" popped into my mind. Well, I'll have to come up with some strategy for future projects to avoid this, but for the moment I guess I'm up the proverbial creek without a paddle.

March 19. I've been working with HR to see if we can offer Boris and Brainshrii the option of flexi-time. Being a progressive kind of person, I feel that certainly I trust these excellent employees to adjust their working patterns to their natural bio-rhythms. In fact, I feel sure that their productivity will rise as a result of moving to more flexible working practices. Brainshrii hasn't started yet, but he seemed quite positive towards the idea when I spoke to him. He's even less intelligible on the phone than in person, but he's clear as crystal in an email. Boris seemed quite positive towards the proposed change in employee terms and conditions.

March 26. Of course, it's not yet a done deal. Apparently, these things need to go through legal checks, and the employee needs to be made fully aware of how any changes in contract may affect him. But it looks like I'll be able to sleep comfortably at night knowing that my efficient lieutenants will be ensuring that any performance testing is going according to plan. Unfortunately, I probably won't be able to be present as my non-flexi-time contract would place any night-time activity into my over-time bracket. And that would be an unacceptable expense for HasntGotAClue.

BCNU

Javva The Hutt.

The Interview: Kirk Pepperdine (Mr. Roundup)

Kirk Pepperdine is our Roundup columnist, lurking and occasionally thrusting answers into discussion groups. Behind the scenes, Kirk contributes more to the JavaPerformanceTuning.com website, and this month, we took some time from his schedule to ask him some questions.

JPT: Tell us a little bit about yourself.

Kirk: I started my career as a Biochemist. I wasn't all that good at biochemistry but I managed to get interesting jobs because of my abilities with computers. At that time, small computers were just finding their way into laboratories. So, after a brief stint as a Biochemist, I returned to school just in time to be included in the first stream of a CS program that was focused on OOP. After that, I started working with large distributed systems and in the process was part of a group that introduced objects into an organization in the Canadian government. That organization had a real focus on application performance, something that I found very appealing. It was originally a bit of a culture shock to move to industry and find that ROI concerns drove performance to a secondary status. ROI is much more measurable in industry than it is in research, and the focus switches from discovery to profit.

JPT: How did you come to be involved with JPT?

Kirk: It all started when Jack and I were engaged to performance tune a banking application. During that time, Jack and I combined our experiences to devise a process to follow. We found that the dynamics of the environment forced us to plan not only for technical challenges but for managerial challenges also. At the beginning, there was the usual mindset that tuning would just somehow happen. Once that process failed, Jack and I were able to put forward the plan which, to their credit, management accepted with a fair degree of enthusiasm. From this experience, Jack was able to provide what I believe is the differentiating material between his and all other performance tuning books that I've read. That material covers management aspects of planning for performance. When Jack started the website and subsequently the newsletter, I was only too happy to contribute in some small way. More recently, Jack has allowed me to take a bigger role helping him revamp the site and to create a number of specific services that focus on helping organizations build high performance applications and/or do more work with fewer resources.

JPT: What are the most common problems that you see organizations struggle with as they attempt to scale their applications?

Kirk: Well, this is a multi-faceted question. Let me start by saying that most organizations that I've had involvement in have the raw talent to produce efficient applications, though I must emphasize the adjective raw. Another factor is focus. Those most capable of performance tuning an application are often those whose overall skills are already in high demand by the organization. Thus, they are not readily available to performance tune an application unless the situation is critical. Even if the resources are available, they often lack the experience to properly plan for the exercise: they under-estimate the size of the task and only discover after starting that they need resources that they don't have, have not budgeted for, or otherwise will have difficulty obtaining. Another facet is having staff that have an understanding of the specific products being used. In fact, Candle just recently held a webinar in which they polled the (well over 100) attendees as to what their most pressing concern was. More than 54% responded that lack of product specific training was their number one concern. With the push to develop business requirements being what it is, it is understandable why it is difficult for application programmers to find the time to learn about a specific application or environment to the extent required to technically tune a complex piece of software such as a relational database or an application server.

JPT: So, you don't feel technical problems are a risk factor?

Kirk: For sure technical obstacles are at the root of the risk. But, as with everything, the exact level of risk depends upon the capabilities of the person dealing with them. This is where training and experience come into play. For a manager to be able to mitigate risk, he first has to recognize that risk. If the experience level is low, then it becomes difficult to filter out real from perceived risk. Some of that can be made up with training, but as helpful as training can be in jump-starting a process, having the right experience can make a bigger difference.

JPT: In your years of performance tuning, have you seen a change in the technical risk factors?

Kirk: Well, yes and no. I'll address the no answer first. In the number of years that I've been addressing performance problems, the underlying technology has basically looked the same. You've got bits of memory, each with a different associated cost of storing and retrieving data. This ranges from I/O devices such as networks, disk drives, tape drives to SSD, and various types of random access memory up to on-CPU cache memory. Though the size and access speeds have changed on all of these devices, they still suffer from about the same rates of latency in relation to each other. In the case of the CPU, we've seen a dramatic increase in speed relative to the increase in drive speeds. At any rate, all of these physical devices represent scarce (or expensive) resources and this has not changed all that much. On the yes side, we're now seeing much more sophisticated use of hardware. ASIC technology is now cheap enough for it to be economically embedded into price sensitive consumer electronics. Things like FPGAs and Xilinx's chips allow one to move critical functionality down to a hardware level without having to incur the expense of running a Fab. But this technology is still pretty exotic. From a software perspective, to this day, Fortran compilers are still regarded as the best for scientific computing. I remember when Cray started producing their C compiler. The memory management headaches led to some interesting performance problems, which they had to resort to using pragmas to solve. This required the programmers, who in many cases were primarily mathematicians, to have a very strong knowledge of the underlying hardware. I think that it's generally accepted that people don't want to have to understand when their code may cause excessive instruction buffer faults. To write a large business application in C required a level of expertise that is just difficult to come by. This is but one of the reasons that Visual Basic has been so successful.
With the general acceptance of Smalltalk, it looked as if the business community had finally found the elusive environment they had been looking for to replace creaky Cobol. It was quite interesting to watch Java knock out its growth curve before it reached escape velocity. From a programming point of view, I still prefer the normalized view that Smalltalk presents, but Java introduces a number of concepts that were lacking in Smalltalk. They also both use a virtual machine, which further removes your application from the hardware. So, as each technology has been introduced, it has solved some problems and, quite naturally, supplanted them with its own.

JPT: What are you personally bringing to Java Performance Tuning?

Kirk: Actually nothing. What I am doing is injecting a new energy into things that Jack and I have been discussing for a couple of years. Our different career tracks have prevented us from really developing these ideas. Though I said nothing at the start, at the moment, it's difficult to ascertain who's developed what. This is because Jack and I have enjoyed a certain synergy that has allowed us to take each other's ideas to that next level. From that standpoint, it's been a very enjoyable working relationship.

JPT: What do you hope to achieve with Java Performance Tuning?

Kirk: There are so many things to be done that it's hard to remember that one can't do it all. First and foremost, I would like to create a forum in which I can continue to do what I really enjoy doing, providing reductions that allow applications to scale. If I can do this, then I know that Java Performance Tuning can grow into an organization that can make at least a small difference in the industry. Jack and I see this coming through the introduction of boot camps, training courses, mentoring and plain old getting our hands dirty rooting around in code and environments. The first step is a revamping of the website. Right now the only commercial look is the three sponsor slots that are used to help offset the costs of running the site. Though Jack and I are very concerned about maintaining the site's unique character, we feel that in order to support our greater vision, we need to institute some changes that reflect that vision. Having said this, Jack wrote an open letter admonishing JavaWorld for closing their archives (they subsequently reopened them). In that letter Jack pledged to keep JavaPerformanceTuning.com open, and to keep adding fresh material to it. We are both totally committed to maintaining that pledge. So, although the site's look and feel will change a bit, it will still be totally open as it is today.

JPT: You mentioned boot camps. What are they?

Kirk: Boot camps are a mixture of course material and lectures. The idea of a boot camp is to use mentoring to reinforce materials delivered in short, more traditional course environments. The difference here is that students get a real chance at understanding a lot more of the material than they normally would. In a standard course, material is presented; hopefully about half really understand what is going on and the rest struggle. This is not to say that courses are bad, it's just that learning takes think time. By stretching out the training session and reinforcing it with on-site mentoring, we can achieve a much greater boost to a team.

JPT: So, you don't see traditional courses as being effective?

Kirk: I wouldn't say that they are not effective, they are just not as effective as a format where the material is reinforced. Having said this, the traditional course format has its uses. For one thing, traditional courses are effective in situations where mentoring is impractical. Certainly I prefer the boot camp approach to training, but there are many situations where a traditional course is best.

JPT: In addition to training, you talk about service products. Can you tell me a little about the ideas behind these products?

Kirk: After doing a number of tuning engagements, you start to see patterns develop. The tuning service products take advantage of these patterns to provide managers with something they understand. After all, tuning is a voyage of discovery. You profile, see what's wrong, try to figure out why it's that way, and then embark upon fixing it, after which you need to repeat the process until the performance characteristics are within tolerance. This is the equivalent of a while (!fastEnough) loop. So, how many iterations will it take for the exit condition to be met? It's often a difficult question to answer, something that managers don't like to hear because it means we can't predict something, and therefore we can't really control it. In other words, it's a risk that we don't know how to assess. What the service products do is try to provide a bound for the time frame. Because we have experience doing this over many projects, we can take advantage of patterns we've seen. Most project teams simply lack the number of data points required to ferret out these patterns.
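As a rough sketch of that loop with a bound added, consider the following. Everything in it is a simplifying assumption made for illustration: the names are hypothetical, and a real tuning iteration does not shave off a fixed fraction of response time. The point is only the shape of the loop: the exit condition is "fast enough", and the iteration budget turns an open-ended exercise into one that either finishes or reports that the budget ran out.

```java
// Hypothetical model of a bounded tuning loop; names and numbers are illustrative.
class TuningLoop {

    // Returns the number of tuning iterations needed to reach the target,
    // or -1 if the iteration budget runs out first.
    // Assumption: each iteration removes a fixed fraction of the response time.
    static int tune(double responseMillis, double targetMillis,
                    double gainPerIteration, int maxIterations) {
        int iterations = 0;
        while (responseMillis > targetMillis) {          // while (!fastEnough)
            if (iterations == maxIterations) {
                return -1;                               // bound reached, still too slow
            }
            responseMillis *= (1.0 - gainPerIteration);  // profile, diagnose, fix
            iterations++;
        }
        return iterations;
    }

    public static void main(String[] args) {
        // From 800ms down to a 200ms target, removing 30% per iteration.
        System.out.println(tune(800, 200, 0.3, 10)); // prints 4
        System.out.println(tune(800, 200, 0.3, 3));  // prints -1: budget too small
    }
}
```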

JPT: Do you see yourself pushing out into any other areas?

Kirk: As a matter of fact, yes. I have just started an effort to build a tool with the help of another developer that I'm pretty excited about. It should really allow one to get a unified look at the entire Java runtime environment. But you'll have to wait a little while before I give out the details on that tool. When it's ready, you'll find it at JavaPerformanceTuning.com.

JPT: Thanks for your time. (End of interview).

Is the empty loop benchmark a: cheat/trick/April fool joke/real/... ?

Extraordinarily, the Java JIT compiler optimization which enabled last month's benchmark to work as listed there has been backed out in version 1.4.2 of the JDK. So sadly, that benchmark only works with the 1.4.1 JDK, which unfortunately makes that already controversial page somewhat useless.

Last month, our question of the month on the performance of Java against other languages showed an empty loop benchmark. (Note that if you tested the initially published version, there was an editing mistake that meant it didn't work as advertised, but the version listed now should.) This benchmark led to some expressions of annoyance, disbelief and downright disgust. This month we'll consider those concerns.

The empty loop benchmark from last month's QOTM is NOT an April fool joke. It is completely serious (sort of). Try it, you'll see it works. Read the code, it is simple and comprehensible in all the languages. There is no coding trick involved.

The source of any confusion you might have about it is that, like all benchmarks, this benchmark is misleading. It is not measuring quite what you expect. If you read the code, you probably assume that the benchmark measures the time taken for any language to iterate a loop counter. In fact the benchmark measures the ability of a language to optimize an application by eliminating inefficient instructions. The Java server mode JIT compiler eliminates the empty loops completely from the JIT-compiled code. Had I chosen to use the client mode 1.4.1 JVM (client is the default mode), the loops would not be optimized away. Had I chosen to use Microsoft's C++ compiler, the C version would take no time. Clearly, the choice of compiler is critical.

The infamous PetStore .Net vs J2EE benchmark actually measured whether Microsoft's technicians or those from TheMiddlewareCompany were better at optimizing a toy version of an enterprise application. The empty loop benchmark effectively measures the maturity of Java's HotSpot compiler. It shows how Java's HotSpot compiler is now mature enough to start competing with the mature compilers of other older languages.

You might say that this is a trick, that empty loops are a special case which Java specially recognizes. Certainly it is true that in the past C compilers have been built to recognize well known benchmarks and produce specially optimized code for those benchmarks. Has Java done the same here? No, the HotSpot compiler does not have a "special case" optimization for this benchmark. At a high level HotSpot can look for code which does not affect the state of a program, and can optimize that code to eliminate redundant instructions. For example, calls to empty methods are quickly eliminated by HotSpot. Instructions which simply assign values to variables can be eliminated by HotSpot if those values are not subsequently used. The empty loop benchmark demonstrates a general class of optimizations that HotSpot can apply. Note that the empty loop is not a no-op. There is an integer increment. The compiler needs to recognize that the variable does not affect the state of the program before it can safely eliminate the loop.
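
To make the distinction concrete, here is a minimal sketch (the class and method names are my own illustration, not last month's benchmark code). The first loop increments a counter that is never read again, so a compiler that can prove this is free to remove the loop. The second loop returns its result, so the work is observable and has to be done somehow. No timings are shown because they depend entirely on the JVM version and mode.

```java
// Illustrative sketch only; DeadLoopDemo and its methods are hypothetical names.
class DeadLoopDemo {

    // The counter is incremented but never read afterwards, so the loop does
    // not affect program state and an optimizing JIT may eliminate it.
    static void emptyLoop(int iterations) {
        for (int i = 0; i < iterations; i++) {
            // intentionally empty
        }
    }

    // The running total is returned to the caller, so the result is
    // observable and cannot simply be thrown away.
    static long summingLoop(int iterations) {
        long total = 0;
        for (int i = 0; i < iterations; i++) {
            total += i;
        }
        return total;
    }

    public static void main(String[] args) {
        int iterations = 100000000;

        long start = System.currentTimeMillis();
        emptyLoop(iterations);
        long emptyMillis = System.currentTimeMillis() - start;

        start = System.currentTimeMillis();
        long total = summingLoop(iterations);
        long summingMillis = System.currentTimeMillis() - start;

        System.out.println("empty loop:   " + emptyMillis + " ms");
        System.out.println("summing loop: " + summingMillis + " ms (total " + total + ")");
    }
}
```

Running it once with java -client DeadLoopDemo and once with java -server DeadLoopDemo is one way to see whether your particular JVM applies the optimization.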

Can other languages apply these optimizations? Yes, of course. Every programming language ultimately converts source code from that language into instructions that some computer understands. One way of characterizing the efficiency of a language is to take comparable source codes, and determine the efficiency of the subsequent machine level instructions issued by the language runtime execution. Comparable source codes could produce identical machine level instructions if the languages have support for the optimizations applied. In the benchmark that is what I did. The results are that some languages with some compilers do not provide optimal machine level instructions. Java with server mode HotSpot does. Microsoft's C++ compiler does too, and since this optimization is fairly basic I would actually expect many other compilers to have the capability too.

So does all this mean that Java is really faster than other languages? Hundreds of times faster? Well, that was just being cheeky. The truth is that Java is faster for some things. I have no doubt that there are other things it may not be faster for. Personally, I now find Java fast enough for everything (though like any program in any language you may need to do some tuning to get there). And I expect that sooner or later, it will be faster for everything that matters. Which sounds pretty good to me.

Finally, there were some of you who wanted a complex application comparison of Java vs C/C++ for some reason or other. You can find one over at http://www.rolemaker.dk/articles/evaljava/. But you're fooling yourself if you think such a comparison makes any difference. If you are trying to justify your language (whichever) based on its speed compared to other languages, then have fun. The only language you can justify is Assembler (which is basically prettified machine code). Which is why game coders have traditionally written their tightest loops in Assembler. If you intend to switch to become an Assembler programmer on the basis of its speed advantage, good luck to you. Perhaps I can recommend SIMPLE instead:

SIMPLE

SIMPLE is an acronym for Sheer Idiot's Programming Linguistic Environment. This language, developed at the Hanover College for Technological Misfits, was designed to make it impossible to write code with errors in it. The statements are therefore confined to BEGIN, END, and STOP. No matter how you arrange the statements, you cannot make a syntax error. Programs written in Simple do nothing useful. They thus achieve the results of programs written in other languages without the tedious, frustrating process of testing and debugging.

From an article which appeared in the November 2, 1984 edition of the Waterloo mathNEWS, Author unknown.

The JavaPerformanceTuning.com team

Tips

http://www.informit.com/isapi/product_id~{AFB2EC69-5AF9-46A4-BC64-5AB282552FF2}/content/index.asp
J2EE perf tuning 1 (Page last updated 2003 March, Added 2003-04-28, Author Steven Haines, Publisher informIT). Tips:

The goal of performance tuning is to service more users faster and not break down in the process. This requires maximizing: Concurrent users; Throughput (transactions performed per second); Reliability.
Load test the application running in your application server and estimate the number of concurrent users it can support before requests start failing and/or the response time does not meet your requirements.
Ensure that you have representative transactions that reflect the real use of your application; If it is not representative, there are no guarantees that your application will stand up to real users.
Tune to maximize the throughput of the server.
Tuning includes minimizing the number of failed requests. All servers will produce some failures mostly due to network latency or timeouts.
You need to tune both the application code and the application server configuration. You also need to tune any resources these two depend on, so that they are not waiting on external resources (such as the database).

http://www.informit.com/content/articlex.asp?product_id={F35A459F-C8EF-49D5-9F14-01A25768DBE3}
J2EE perf tuning 2 (Page last updated 2003 March, Added 2003-04-28, Author Steven Haines, Publisher informIT). Tips:

Analyse (expected) user activity and generate load tests based on that activity.
Measure performance metrics from the application, application server, underlying platform, and any external resources.
A load tester can: perform user-defined transactions on a system with a given frequency; control the number of simultaneous users in the load; simulate user think-time between requests; and increase the number of users in a test according to a defined rate.
Statistics to gather include: Application Server memory usage, database connection usage, thread usage; Application class or method response times, call paths, exceptional methods; Application total transaction rates and requests rates; Platform CPU, processes; Database performance; legacy system performance.
JMX provides a standard mechanism for obtaining configuration and runtime information for Java products. Application code needs to be instrumented to obtain relevant application performance statistics.
Tuning is an iterative process: Start with a configuration that "looks good," load test the system, observe performance, change parameters, and start over.
Pay particular attention to the concurrent user load and the transaction throughput of the system.
The greater the throughput, the better the performance.
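
As a rough sketch of what the load tester described in the tips above does, the following runs a fixed number of simulated users, each issuing requests with think time in between, and reports successes, failures and throughput. MiniLoadTest is a hypothetical name of my own, not a real tool, and the stand-in transaction must be replaced with a call into your application. It assumes java.util.concurrent is available (Java 5 or later).

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only; MiniLoadTest is a hypothetical name, not a real tool.
class MiniLoadTest {
    final AtomicInteger succeeded = new AtomicInteger();
    final AtomicInteger failed = new AtomicInteger();
    long elapsedMillis;

    // Runs `users` simulated users, each performing `requestsPerUser`
    // transactions with `thinkMillis` of think time after each request.
    void run(final Callable<Boolean> transaction, int users,
             final int requestsPerUser, final long thinkMillis) {
        final CountDownLatch done = new CountDownLatch(users);
        long start = System.currentTimeMillis();
        for (int u = 0; u < users; u++) {
            new Thread(new Runnable() {
                public void run() {
                    try {
                        for (int r = 0; r < requestsPerUser; r++) {
                            try {
                                if (transaction.call().booleanValue()) {
                                    succeeded.incrementAndGet();
                                } else {
                                    failed.incrementAndGet();
                                }
                            } catch (Exception e) {
                                // Any exception counts as a failed request.
                                failed.incrementAndGet();
                            }
                            Thread.sleep(thinkMillis);
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        done.countDown();
                    }
                }
            }).start();
        }
        try {
            done.await(); // wait for every simulated user to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        elapsedMillis = System.currentTimeMillis() - start;
    }

    // Successful transactions per second over the whole run.
    double throughput() {
        return succeeded.get() * 1000.0 / Math.max(1, elapsedMillis);
    }

    public static void main(String[] args) {
        MiniLoadTest test = new MiniLoadTest();
        // Stand-in transaction; a real test would call the application here.
        Callable<Boolean> transaction = new Callable<Boolean>() {
            public Boolean call() {
                return Boolean.TRUE;
            }
        };
        test.run(transaction, 10, 20, 5);
        System.out.println("succeeded=" + test.succeeded.get()
                + " failed=" + test.failed.get()
                + " throughput=" + test.throughput() + "/s");
    }
}
```

Ramping the user count up between runs and watching where failures start, or where throughput stops rising, gives a first estimate of the concurrent user load the system can support.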

http://www.informit.com/content/index.asp?product_id={5E454C58-FD93-4080-937C-EA4372B9D337}
J2EE Performance Tuning, Part 3: Application Server Architecture (Page last updated 2003 April, Added 2003-04-28, Author Steven Haines, Publisher informIT). Tips:

The size of your connection pools can greatly impact the performance of your application. When tuning, JDBC connection pool size is one of the factors with the highest impact on your application's overall performance.
Tune to minimize activation and passivation of cached beans, and to maximize cache hits.
The size of bean pools must be large enough to service the requests from business processes without having to wait for a bean before it can complete its work. If the pool size is too small, there are too many processes waiting for beans; if the pool size is too large, you are using more system resources than you actually need.
Two of the most influential tuning parameters of Stateless Session Bean and Message Driven Bean pools are the size of the pools that support them and the number of beans preloaded into the pools.
Tune the session timeout to avoid retaining unused resources too long in the server, while avoiding inconveniencing slow users.
If JMS thresholds are too low, messages will be lost; if JMS thresholds are too high and the server is used to an excessive upper limit, it can degrade the performance of your entire system.
JMS tuning parameters include: Message Delivery Mode (persistent or non-persistent); Time-to-live (defines an expiration time on a message); Transaction States; Acknowledgments.
EJB choice of method transaction levels affects performance: Supports is the lowest cost; Required is safe, but a little more costly; and RequiresNew is probably the most expensive.
The size of the application server thread pool limits the amount of work your application server can do; the tradeoff is that there is a point at which the context-switching (giving the CPU to each of the threads in turn) becomes so costly that performance degrades.
The JVM heap size is important to performance. The rule-of-thumb is to give the application server all the memory that you can afford to give it on any particular machine.
Your goal when tuning garbage collection is to size the generations to maximize minor collections and minimize major collections.
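
The pool-sizing trade-off in the tips above can be seen in miniature with a bounded pool: when every resource is checked out, the next caller has to wait, and a count of timed-out waits is one signal that the pool is too small. This is a sketch only; BoundedPool is a hypothetical class of my own, not an application server API (real servers size their pools through configuration), and it assumes java.util.concurrent (Java 5 or later).

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only; BoundedPool is a hypothetical class.
class BoundedPool<T> {
    private final BlockingQueue<T> idle;
    private int timeouts; // acquire() calls that gave up because the pool stayed empty

    BoundedPool(List<T> resources) {
        // The pool holds exactly the resources it is given; it never grows.
        idle = new ArrayBlockingQueue<T>(resources.size(), false, resources);
    }

    // Returns a pooled resource, or null if none became free within the timeout.
    T acquire(long timeoutMillis) {
        try {
            T resource = idle.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            if (resource == null) {
                synchronized (this) {
                    timeouts++;
                }
            }
            return resource;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    // Hands a resource back so that a waiting caller can use it.
    void release(T resource) {
        idle.offer(resource);
    }

    synchronized int timeouts() {
        return timeouts;
    }

    int available() {
        return idle.size();
    }

    public static void main(String[] args) {
        BoundedPool<String> pool =
                new BoundedPool<String>(Arrays.asList("conn-1", "conn-2"));
        String first = pool.acquire(50);
        String second = pool.acquire(50);
        String third = pool.acquire(50); // pool is empty, so this waits and then gives up
        System.out.println(first + ", " + second + ", " + third
                + " (timeouts=" + pool.timeouts() + ")");
        pool.release(first);
        System.out.println("available after release: " + pool.available());
    }
}
```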

http://www.javaworld.com/javaworld/jw-03-2003/jw-0307-j2segc.html
1.4.1 garbage collectors (Page last updated 2003 March, Added 2003-04-28, Author Greg Holling, Publisher Javaworld). Tips:

[Article describes the six garbage collectors available from 1.4.1].
Incremental garbage collection (-Xincgc) provides shorter, but more frequent pauses for garbage collection. Overall garbage collection may take longer.
The parallel (multithreaded) algorithms (-XX:+UseParNewGC, -XX:+UseParallelGC) are optimized for machines with multiple CPUs.
The parallel scavenging collector (-XX:+UseParallelGC) is optimized for very large (gigabyte) heaps.
The concurrent garbage collector (-XX:+UseConcMarkSweepGC) allows the stop-the-world phase to be as short as possible, which means that application pauses for garbage collection should be minimized.
If you have a single-processor client machine and are having problems with pause times in your application, try the incremental garbage collector.
If you have a single-processor server machine with lots of memory and experience trouble with application pause times, try the concurrent garbage collector.
If you have a multiprocessor machine, especially with four or more processors, try one of the parallel garbage collection algorithms. These should significantly decrease pause times. If you have lots of memory (gigabytes), use the scavenging collector; otherwise, use the copying collector.
Don't even consider changing GC parameters until you've profiled and optimized your application.
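
Before and after changing any of the collector flags above, it helps to confirm which collectors the JVM is actually running and how much work they have done. The sketch below reads that from the java.lang.management API, which is an assumption on my part: it was added in Java 5, so it is not available on the 1.4.1 JVM the article covers. GcReport is a hypothetical class name.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

// Illustrative sketch only; GcReport is a hypothetical name.
// Assumes Java 5 or later for java.lang.management.
class GcReport {

    // Builds one line per collector that the running JVM exposes.
    static String describe() {
        StringBuilder report = new StringBuilder();
        List<GarbageCollectorMXBean> beans =
                ManagementFactory.getGarbageCollectorMXBeans();
        for (GarbageCollectorMXBean bean : beans) {
            report.append(bean.getName())
                  .append(": collections=").append(bean.getCollectionCount())
                  .append(", time=").append(bean.getCollectionTime())
                  .append(" ms\n");
        }
        return report.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

Which collector names appear depends on the JVM version and the flags it was started with, so no sample output is given.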

http://www.javaspecialists.co.za/archive/Issue064.html
Disassembling Java Classes (Page last updated 2003 January, Added 2003-04-28, Author Heinz Kabutz, Publisher Kabutz). Tips:

Disassembling i++ and ++i shows that these have identical code, so neither one can be faster than the other.
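
You can check this yourself with a small class like the one below (IncrementDemo is a name I made up for the illustration). Compile it and run javap -c IncrementDemo: for loops like these, where the value of the increment expression is discarded, javac would typically be expected to emit the same increment instruction in both methods. If the value of the expression is used, for example int j = i++ versus int j = ++i, the two forms mean different things and the bytecode differs.

```java
// Illustrative sketch only; IncrementDemo is a hypothetical name.
// Compile with javac, then compare the two methods with: javap -c IncrementDemo
class IncrementDemo {

    // Loop counter advanced with the postfix form; its value is not used.
    static int postIncrement(int n) {
        int count = 0;
        for (int i = 0; i < n; i++) {
            count += 2;
        }
        return count;
    }

    // Loop counter advanced with the prefix form; its value is not used.
    static int preIncrement(int n) {
        int count = 0;
        for (int i = 0; i < n; ++i) {
            count += 2;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(postIncrement(5) + " " + preIncrement(5));
    }
}
```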

http://www-106.ibm.com/developerworks/java/library/j-jtp02183.html
Immutable objects (Page last updated 2003 February, Added 2003-04-28, Author Brian Goetz, Publisher IBM). Tips:

You can share and cache references to immutable objects without having to copy or clone them.
Immutable objects are inherently thread-safe, so you don't have to synchronize access to them across threads.
The Flyweight pattern employs a factory method to provide immutable objects, giving out the same instance for objects of the same value.
util.concurrent.CopyOnWriteArrayList creates a new array when the list is modified. This allows iterators to be immutable and therefore traversed without synchronization or risk of concurrent modification, eliminating the need to either clone the list before traversal or synchronize on the list during traversal. If traversals are much more frequent than insertions or removals such as with event listener classes, CopyOnWriteArrayList offers better performance than ArrayList.
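
A minimal sketch of the last two tips follows. CurrencyCode and ListenerRegistry are hypothetical names of my own. The tip refers to the util.concurrent package; the sketch assumes the java.util.concurrent.CopyOnWriteArrayList that later became part of the JDK (Java 5 onwards), which behaves the same way for this purpose.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;

// Immutable value class with a flyweight-style factory (hypothetical example).
final class CurrencyCode {
    private static final Map<String, CurrencyCode> CACHE =
            new HashMap<String, CurrencyCode>();

    private final String code; // never changes after construction

    private CurrencyCode(String code) {
        this.code = code;
    }

    // Hands out the same shared instance for equal values.
    static synchronized CurrencyCode valueOf(String code) {
        CurrencyCode cached = CACHE.get(code);
        if (cached == null) {
            cached = new CurrencyCode(code);
            CACHE.put(code, cached);
        }
        return cached;
    }

    String code() {
        return code;
    }
}

// Event listener list that is traversed far more often than it is modified.
class ListenerRegistry {
    private final CopyOnWriteArrayList<Runnable> listeners =
            new CopyOnWriteArrayList<Runnable>();

    void addListener(Runnable listener) {
        listeners.add(listener); // copies the underlying array
    }

    // Iterates over a snapshot of the list, so no lock or clone is needed,
    // and listeners added during the traversal are not visited until next time.
    void fire() {
        for (Runnable listener : listeners) {
            listener.run();
        }
    }

    public static void main(String[] args) {
        System.out.println(CurrencyCode.valueOf("USD") == CurrencyCode.valueOf("USD"));
        ListenerRegistry registry = new ListenerRegistry();
        registry.addListener(new Runnable() {
            public void run() {
                System.out.println("event received");
            }
        });
        registry.fire();
    }
}
```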

http://www.theserverside.com/resources/articles/RodJohnsonInterview/article.html
Rod Johnson interview (Page last updated 2003 February, Added 2003-04-28, Author TheServerSide, Publisher TheServerSide). Tips:

Several benchmarks have indicated that using entity beans usually leads to poor performance, yet many developers simply ignore this. Use entity beans when required, not just in every case.
Performance is an important business requirement, and systems that don't perform don't meet business expectations.
Sometimes a stored procedure can provide much more efficient persistence (and more concise code) than Java code.

http://www.theserverside.com/resources/articles/RodJohnsonInterview/JohnsonChapter4.pdf
Chapter 4 of Expert1on1, Design Techniques and Coding Standards (Page last updated 2003 January, Added 2003-04-28, Author Rod Johnson, Publisher Wrox). Tips:

Program to an interface, not an implementation. There is a slight performance penalty for calling an object through an interface, but this is seldom an issue in practice, whereas the ability to change the implementing class of any application object without affecting calling code allows performance improvements to be made easily and selectively.
The disadvantage of parameter consolidation is the potential creation of many objects, which increases memory usage and the need for garbage collection. Objects consume heap space; primitives don't.
Consolidating method parameters in a single object can occasionally cause performance degradation in J2EE applications if the method call is potentially remote (a call on the remote interface of an EJB), as marshaling and unmarshaling several primitive parameters will always be faster than marshaling and unmarshaling an object. However, this isn't a concern unless the method is invoked particularly often (which might indicate poor application partitioning; we don't want to make frequent remote calls if we can avoid it).
Code that uses reflection is usually slower than code that uses normal Java object creation and method calls; however, this seldom matters in practice, and the overhead of reflection is usually far outweighed by the time taken by the operations the invoked methods actually do.
Unnecessary optimization that prevents us from choosing superior design choices is harmful.
The overhead added by the use of reflection to populate a JavaBean when handling a web request won't be detectable.
In some cases such as when replacing a lengthy chain of if/else statements, reflection will actually improve performance.
StringBuffer is more efficient than concatenating strings with the + operator.
Console output may also seriously degrade performance when running in some servers.
Correct use of a logging framework should have negligible effect on performance.
It's important to ensure that generating log messages doesn't slow down the application. If a log message might be slow to generate, it's important to check whether or not it will be displayed before generating it.
Log settings that show the class, method and line number should be switched off in production, as it's very expensive to generate this information.
Writing log messages to the console or to a database will probably be much slower than writing to a file.
All logging packages should allow automatic rollover to a new log file when the existing log file reaches a certain size. Allowing too large a maximum file size may significantly slow logging, as each write to the file may involve substantial overhead.
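
The advice to check whether a message will be displayed before generating it might look like the sketch below, which uses java.util.logging (part of the JDK since 1.4). The class name, logger name and the messagesBuilt counter are my own additions for illustration; the counter exists only to show when the expensive message is actually built. The message itself is assembled with StringBuffer, in line with the earlier tip.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Illustrative sketch only; GuardedLogging and "example.guarded" are hypothetical names.
class GuardedLogging {
    private static final Logger LOG = Logger.getLogger("example.guarded");

    // Counts how often the expensive message was generated (for demonstration).
    static int messagesBuilt;

    // Stand-in for a log message that is slow to generate.
    static String expensiveDescription(int[] values) {
        messagesBuilt++;
        StringBuffer text = new StringBuffer("values:");
        for (int i = 0; i < values.length; i++) {
            text.append(' ').append(values[i]);
        }
        return text.toString();
    }

    static void process(int[] values) {
        // Check the level first, so the message is only built if it will be logged.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine(expensiveDescription(values));
        }
    }

    public static void main(String[] args) {
        process(new int[] {1, 2, 3});
        // With the default configuration FINE is not enabled, so nothing was built.
        System.out.println("messages built: " + messagesBuilt);
    }
}
```

Without the isLoggable check, the argument to LOG.fine would be evaluated on every call, whether or not the message is ever written.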

http://wireless.java.sun.com/midp/ttips/optimize/
J2ME Optimization Tips and Tools (Page last updated 2002 November, Added 2003-04-28, Author Eric D. Larson, Publisher Sun). Tips:

Obfuscation is a great way to reduce the size of your finished product.
Code optimization should be postponed until the very end of the development cycle.
Use System.currentTimeMillis() to accurately determine the amount of time it takes to execute a given block of code.
The J2ME Wireless Toolkit version 1.0.4 includes a profiler tool. The profiler is the main tool used to pinpoint performance problems.
java.lang.Runtime class's totalMemory() and freeMemory() methods are useful for monitoring your heap size.
The 1.0.4 version of the J2ME Wireless Toolkit allows you to set the heap size, along with storage size (for RMS), and also the VM speed emulation and network throughput emulation to help you gauge performance of your application in your development environment before you deploy.
Version 1.0.4 of the toolkit also includes a real-time memory monitor.
To see when the system is performing garbage collection, enable the toolkit's Trace Garbage Collection option.
To keep heap space free, be sure to set objects to null as soon as you're done with them.
Set an Image to null after a paint to free up a good chunk of memory.
(J2ME only tip) Explicitly call the System.gc() method to manage the GC schedule. Instead of just letting the system garbage-collect at its own discretion, try calling System.gc() when you know the user will be reading a screen and thus won't be interacting with the application immediately.
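
The timing and heap-monitoring calls mentioned in the tips above can be combined as in the sketch below. It is written as plain J2SE code so that it runs anywhere; to my understanding the same System and Runtime calls are also available on J2ME, but check your profile's API documentation. TimingAndMemory and its methods are hypothetical names. Bear in mind that the clock behind System.currentTimeMillis() can be coarse, so time blocks that run long enough for the difference to be meaningful.

```java
// Illustrative sketch only; TimingAndMemory is a hypothetical name.
class TimingAndMemory {

    // Heap currently in use: total heap minus the free part of it.
    static long usedHeapBytes() {
        Runtime runtime = Runtime.getRuntime();
        return runtime.totalMemory() - runtime.freeMemory();
    }

    // Wall-clock time taken to run the given block, in milliseconds.
    static long timeMillis(Runnable block) {
        long start = System.currentTimeMillis();
        block.run();
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        long before = usedHeapBytes();
        long elapsed = timeMillis(new Runnable() {
            public void run() {
                // Stand-in workload: build a moderately large string.
                StringBuffer text = new StringBuffer();
                for (int i = 0; i < 200000; i++) {
                    text.append(i);
                }
            }
        });
        long after = usedHeapBytes();
        System.out.println("elapsed: " + elapsed + " ms");
        System.out.println("used heap before: " + before + " bytes, after: " + after + " bytes");
    }
}
```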

http://www.sys-con.com/java/article.cfm?id=1855
Java 3D (Page last updated 2002 July, Added 2003-04-28, Author Dan Pilone, Publisher JavaDevelopersJournal). Tips:

Be aware of your memory allocation, particularly in behaviors. Too many garbage collections will destroy the 3D flow.
Carefully lay out your scenegraph and be aware that rendering is happening in a separate thread. Java 3D could render frames between updates, causing strange inter-frame effects.

Jack Shirazi

Last Updated: 2026-03-30
Copyright © 2000-2026 Fasterj.com. All Rights Reserved.
All trademarks and registered trademarks appearing on JavaPerformanceTuning.com are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. JavaPerformanceTuning.com is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.
URL: http://www.JavaPerformanceTuning.com/newsletter029.shtml