The Roundup September 2003

With all of the emphasis on Intel and gigahertz clock speeds, you might think that the need for super computers (such as those that were produced by Cray) is a thing of the past. Though still a niche market, super computers and super computing research are alive and kicking. You can find the evidence at www.top500.org, a site that lists the top 500 super computers. Take, for example, the number one super computer according to the ISC: the Earth Simulator.

In March of 2000, the Japan Atomic Energy Research Institute (JAERI), the National Space Development Agency of Japan (NASDA), and the Japan Marine Science and Technology Center (JAMSTEC) started construction on the Earth Simulator super computer. This computer is dedicated to modeling changes in the atmosphere, the oceans, and the solid earth. By February 2002, all 640 processor nodes of the simulator were operational and the Earth Simulator Research and Development Center verified a sustained performance of 7.2 Teraflops. According to ISC at their 2003 super computing conference, the Earth Simulator, at 35.86 teraflops sustained, is now the fastest computer on the planet.

Even more interesting, Linux-based clusters now comprise more than one third of the top 500 super computers, and the MCR Linux cluster (Xeon 2.4 GHz Quadrics cluster) running at 11.2 Teraflops at Lawrence Livermore National Laboratory ranks number 3 on the list. Though most feel that super computers based on off-the-shelf components will dominate the list, at least one expert, Horst Simon, has demonstrated that this strategy does have its limits. To further this point, companies like Acconovis, Cray and NEC still rely on proprietary processors, and even many of the Linux clusters rely on proprietary interconnect hardware.

As is always the case, the work done in this arena eventually does trickle down to our world, the business domain. Currently, much of this technology is being applied to Grid computing. This important move has not been lost on many of the software vendors, including Oracle Corporation. Though the 10g label is clearly just a clever marketing ploy, Oracle is in the process of rebranding all of their technology to match the march into this new form of high performance computing. Clearly, they see the future in the commoditization of these exotic clusters of hardware. But beware of marketing: in order for software to take advantage of this new breed of hardware, it will need to undergo radical changes in design and architecture. This is not news; it's a continuation of a trend that has always been with us. And on that note, let's move back to our reality by pulling up a chair at the Big Moose Saloon at the Java Ranch.

The JavaRanch

What is the name of that theory that after so many postings, the topic of discussion will have changed? Well anyway, what starts out as a discussion on how to store lots of integers turns into an interesting observation on the effects of garbage collection on a micro-benchmark. In this case, it was the code fragment below that brought an abrupt change in the topic of conversation.

  public long getFreeMemory() {
    // make sure to garbage collect before the measurement
    for (int i = 0; i < 10; i++) {
      System.gc();
    }
    Runtime r = Runtime.getRuntime();
    long freeMem = r.freeMemory();
    return freeMem;
  }

In this code fragment, we see that System.gc() is called 10 times before free memory is measured. The question is, why should one repeatedly call System.gc()? After all, isn't once enough? It turns out that the answer is a bit complicated. The first point is that a programmatic call for garbage collection is only taken as a hint. The VM is not actually required to perform a gc. Secondly, the call for garbage collection may only result in a partial collection. Consequently, many partial collections may be needed before garbage collection can be eliminated as a factor in your micro-benchmark. The one thing that I found to be missing is a call to sleep. Since garbage collection runs in its own thread, I've often found that sleeping for a short period of time (500ms to 1 second) allows the garbage collector to complete its job before I start in and take some measurements. One last point: using the -verbosegc flag is quite necessary in the experimental stages, as it will tell you if garbage collection is interfering with your benchmark.
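The sleep suggested above could be folded into the measurement along the following lines. This is only a sketch: the class name MemoryProbe, the method name, and both parameters are hypothetical and do not come from the original thread, and because System.gc() is only a hint, nothing here guarantees that a full collection has actually run.

```java
// Hypothetical helper for illustration; not the code posted in the thread.
final class MemoryProbe {

    private MemoryProbe() {}

    /**
     * Requests garbage collection gcRequests times, pauses once to give the
     * collector time to finish, then reports the free heap memory in bytes.
     * System.gc() is only a hint, so a collection is not guaranteed.
     */
    static long freeMemoryAfterGc(int gcRequests, long pauseMillis) {
        for (int i = 0; i < gcRequests; i++) {
            System.gc();
        }
        try {
            // The pause length is an assumption; the text suggests 500ms to 1 second.
            Thread.sleep(pauseMillis);
        } catch (InterruptedException e) {
            // Restore the interrupt flag and fall through to the measurement.
            Thread.currentThread().interrupt();
        }
        return Runtime.getRuntime().freeMemory();
    }

    public static void main(String[] args) {
        long free = freeMemoryAfterGc(10, 500);
        System.out.println("Free memory after GC requests: " + free + " bytes");
    }
}
```

Running this with GC logging switched on (for example, java -verbose:gc MemoryProbe on a HotSpot VM) shows whether collections really happened before the measurement was taken.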

Sometimes the best code is the code that we don't write. That is the theme of a discussion that was triggered when someone asked about removing duplicates from a CSV file containing email addresses. In this case, performance over "time to market" was even less critical as this was billed as being a one-time task. Here, performance is defined as the quickest solution that one can produce to solve the problem. To solve the problem, one could use a quick Java program (put the values into a HashMap), rely on an external tool such as a macro in Excel (after all, the data is CSV), or even use Cygwin or Unix tools such as sort and uniq. In the end, this question is not about performance; it's about getting results. Performance is about meeting a user's requirement for delivering results in a timely manner. This may or may not involve a lot of effort. One thing is for sure: the payback must be worth the effort or you're certainly misallocating resources, in this case, your time!
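For comparison with the one-line sort and uniq approach, the "quick Java program" option might look something like the sketch below. It is not the code from the discussion. The class name EmailDeduper is hypothetical, a LinkedHashSet stands in for the HashMap mentioned above so that first-seen order is kept, and it assumes that every comma-separated field is an address and that addresses differing only in case are duplicates.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical one-off utility for illustration only.
final class EmailDeduper {

    private EmailDeduper() {}

    /** Returns each distinct address once, in the order it was first seen. */
    static List<String> uniqueAddresses(List<String> csvLines) {
        Set<String> seen = new LinkedHashSet<>();
        for (String line : csvLines) {
            for (String field : line.split(",")) {
                // Assumption: case-insensitive comparison is acceptable for this data.
                String address = field.trim().toLowerCase(Locale.ROOT);
                if (!address.isEmpty()) {
                    seen.add(address); // a Set silently ignores repeats
                }
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java EmailDeduper <file.csv>");
            return;
        }
        for (String address : uniqueAddresses(Files.readAllLines(Paths.get(args[0])))) {
            System.out.println(address);
        }
    }
}
```

Even this small program is noticeably more work than piping the file through sort and uniq, which is the point of the discussion.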

The Server Side

Though our first posting from the ServerSide is loosely related to performance, it does bring up some interesting points about EJBs. The question is posed around the observation that a lot of high performance computing relies on asynchronous calls and threading. Why does the EJB specification specifically prohibit these mechanisms? One of the first responses suggested that message beans were there to provide this asynchronous mechanism. Though messaging can be used in this manner, it is a very heavy protocol to achieve this effect. And it does not solve the problem of threading. The issue with threading stems from the need for the server to be able to manage its own resources. With threading, there is a distinct possibility that a caller may return and leave a thread running. This running thread (which is outside of the control of the container) would leave the server in a state where it was unable to manage resources. On top of this, EJB was designed to be a service-based architecture. The caller requests a service and then waits around until the service completes. In this model, one can still thread, just not in the server itself. Just to show how the technology is maturing, IBM is now working on asynchronous beans. Though they may add additional complexity, it's clear that both threading and asynchronous capabilities may offer some nice performance gains for EJB technology.
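The "caller returns and leaves a thread running" problem is easy to reproduce in plain Java. The sketch below is not EJB code and involves no container. The class RunawayThreadDemo and its handleRequest method are hypothetical stand-ins for a piece of server-side logic that spawns its own thread.

```java
// Plain-Java illustration; no EJB container or API is involved.
final class RunawayThreadDemo {

    private RunawayThreadDemo() {}

    /**
     * Stand-in for request-handling code that starts its own thread and
     * returns immediately. Whoever invoked it has no handle on the work
     * unless the thread is handed back, as it is here for the demo.
     */
    static Thread handleRequest(long workMillis) {
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(workMillis); // stand-in for long-running work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "unmanaged-worker");
        worker.start();
        return worker; // the request is finished, the thread is not
    }

    public static void main(String[] args) throws InterruptedException {
        Thread worker = handleRequest(2000);
        System.out.println("Request returned; worker alive: " + worker.isAlive());
        worker.join();
        System.out.println("Worker finished; worker alive: " + worker.isAlive());
    }
}
```

A container that only tracks the request sees it complete while the worker is still consuming resources, which is the management problem described above.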

If you believe that capacity/availability planning is a black art, then you're only somewhat correct. As is shown in one post, if you follow a number of guidelines, you can effectively plan for capacity and availability. As was posted in a response, one needs to characterize "the nature of the workload." This is usually quantified by determining the number of concurrent users, the type of work that they do, and how often they perform work. It was pointed out that the number of concurrent users may need to be translated into the number of requests made to the system. For example, one business operation may result in four round trips to the server. Once the average unit of work has been determined from the aforementioned information, one can estimate how much bandwidth, CPU, and I/O is required. From the answers to these questions, you can now begin to estimate how much hardware is required to host your application. One last point: be sure to include some room for growth. Effective monitoring of a system can also help with future capacity planning exercises.
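The translation from concurrent users to server requests is simple arithmetic, sketched below. Apart from the four round trips per operation mentioned above, every number is made up for illustration, and the class and method names are hypothetical. A real plan would use measured workload figures.

```java
// Back-of-the-envelope workload arithmetic; all inputs are illustrative.
final class CapacityEstimate {

    private CapacityEstimate() {}

    /** users x operations per user per minute x round trips per operation, per second. */
    static double requestsPerSecond(int concurrentUsers,
                                    double operationsPerUserPerMinute,
                                    int roundTripsPerOperation) {
        return concurrentUsers * operationsPerUserPerMinute * roundTripsPerOperation / 60.0;
    }

    /** Adds room for growth, expressed as a fraction (0.25 means 25%). */
    static double withHeadroom(double load, double growthFraction) {
        return load * (1.0 + growthFraction);
    }

    public static void main(String[] args) {
        // Assumed workload: 300 users, 2 operations per minute each, 4 round trips per operation.
        double load = requestsPerSecond(300, 2.0, 4);
        System.out.println("Estimated load: " + load + " requests/second");
        System.out.println("With 25% growth: " + withHeadroom(load, 0.25) + " requests/second");
    }
}
```

With these assumed inputs the estimate works out to 40 requests per second, or 50 with 25% headroom. That figure then feeds the bandwidth, CPU, and I/O estimates.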

JavaGaming.org

At www.javagaming.org (now a java.net community), there is a fascinating thread in which the results of a micro-benchmarking exercise are discussed. In this thread, the originator lays out results for a micro-benchmark that he performed. The test uses 7 different techniques to calculate the cross product of 2 vectors. The thread is too long to effectively summarize here, but some of the main points are:

The same synchronized code running in MS Windows appears to run slower on a multi-processor machine than it does on a single CPU, even if only one CPU is being utilized.
The -server flag is ignored by the VM running on MacOS.
There was little difference between using a local variable and using a field (some of this effect was shown to be due to inlining by HotSpot).
Though some results did vary, most of the results held on different machines.
VTune allows you to see the native code produced by HotSpot, giving one a nice insight into the capabilities of that technology.
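To make the local variable versus field comparison concrete, here is a sketch of two ways to compute the same cross product. These are not the seven techniques from the thread, which are not reproduced here. The class CrossProduct and both variants are hypothetical illustrations of the kind of difference being measured.

```java
// Two illustrative variants of the same calculation; not the code from the thread.
final class CrossProduct {

    // Field-based variant: operands and result live in instance fields.
    double ax, ay, az, bx, by, bz;
    double rx, ry, rz;

    void crossIntoFields() {
        rx = ay * bz - az * by;
        ry = az * bx - ax * bz;
        rz = ax * by - ay * bx;
    }

    // Local-variable variant: operands are copied into locals first.
    static double[] crossWithLocals(double[] a, double[] b) {
        double ax = a[0], ay = a[1], az = a[2];
        double bx = b[0], by = b[1], bz = b[2];
        return new double[] {
            ay * bz - az * by,
            az * bx - ax * bz,
            ax * by - ay * bx
        };
    }

    public static void main(String[] args) {
        double[] r = crossWithLocals(new double[] {1, 2, 3}, new double[] {4, 5, 6});
        System.out.println(r[0] + ", " + r[1] + ", " + r[2]);
    }
}
```

Both variants do the same arithmetic, so any measured difference comes from how the VM treats field access versus locals, which is where HotSpot's inlining enters the picture.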

In another thread, a link to a benchmark that reportedly compares the relative speeds of C/C++, Java, and C# is published. This sparks a long discussion on the relevance of benchmarks in general when it comes to writing real world applications. One of Jack's benchmarks even gets a mention (more on that later). The thread ends with two summaries that say about the same thing. Benchmarks stress a small number of aspects of a computing environment. On the other hand, applications may or may not place the same level of stress on that environment. So although benchmarks can be helpful in determining which techniques may be effective in certain circumstances, they should not be the overall driving factor when making technology/design decisions. In other words, one should use the tools that are most appropriate for the job.

April Fools Joke Finally Revealed

In a side note on the benchmarking thread found at www.javagaming.org, Jack's micro-benchmark that was published in the April edition of this newsletter was cited. Proving that you can't get anything past these guys, it was quickly pointed out that the benchmark is "a rather witty practical joke." Well, since Jack and I have had several emails on this benchmark, it's time to come clean. Yes, the results from the benchmark are real and yes, one can interpret them as being misleading (and even a witty practical joke given that they were published on April 1st). But from this benchmark we can learn several valuable lessons, the first being that dynamic optimization is a very powerful tool. The second is that you should know what you are actually benchmarking: it is not always what you intended to benchmark.
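One common way for a micro-benchmark to measure something other than what was intended is to throw away the result of the work being timed. The sketch below is not Jack's benchmark, and all of its names are hypothetical. It only contrasts a timing loop whose result is discarded with one whose result is kept. Whether a given VM actually optimizes the discarded work away depends on the VM and its settings, so no particular timings are claimed.

```java
// Illustrative pitfall only; not the April benchmark from this newsletter.
final class BenchmarkPitfall {

    private BenchmarkPitfall() {}

    static long sumOfSquares(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    /** Times work whose result is never used; an optimizing VM is free to skip it. */
    static long timeDiscarded(int iterations, int n) {
        long start = System.nanoTime();
        for (int k = 0; k < iterations; k++) {
            sumOfSquares(n); // result thrown away
        }
        return System.nanoTime() - start;
    }

    /** Times the same work but stores the total in sink[0], so it has an observable effect. */
    static long timeConsumed(int iterations, int n, long[] sink) {
        long start = System.nanoTime();
        long total = 0;
        for (int k = 0; k < iterations; k++) {
            total += sumOfSquares(n);
        }
        sink[0] = total;
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long[] sink = new long[1];
        System.out.println("discarded: " + timeDiscarded(10_000, 10_000) + " ns");
        System.out.println("consumed:  " + timeConsumed(10_000, 10_000, sink) + " ns (total " + sink[0] + ")");
    }
}
```

If the two timings differ widely on a particular VM, the first loop is probably measuring the optimizer rather than the arithmetic.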

Kirk Pepperdine.


Last Updated: 2025-10-27
Copyright © 2000-2025 Fasterj.com. All Rights Reserved.
All trademarks and registered trademarks appearing on JavaPerformanceTuning.com are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. JavaPerformanceTuning.com is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.
URL: http://www.JavaPerformanceTuning.com/news/roundup034.shtml