|
|
|
If you haven’t yet seen it, TheServerSide has a new look. But the real news is that The Middleware Company has decided to launch www.theserverside.net, the .NET counterpart to www.theserverside.com. The effort is being headed up by none other than Ted Neward. Ted has a depth of experience in both Java and .NET, having authored several titles on both technologies. In addition, he is a regular speaker on the “No Fluff, Just Stuff” Java Symposium put together by Jay Zimmerman, and he is also involved with the MetaData expert committee in the Java Community Process. So, Ted is neither a Java nor a .NET zealot. Instead, he has a balanced understanding of the technologies and how best to apply them. The only question is, will this effort be a successful bridging of the Microsoft and Java technologies, or will it be Java Pet Shop part deux for The Middleware Company? With Ted at the helm, my bet is on the former.
From TheServerSide, we have a couple of postings dealing with session beans. The first asks about the performance implications of using stateful session beans; the second asks about using stateless session beans and how one would attain server affinity. When clients interact with a server in the context of a session, they often perform a number of actions which build up state. The question is, where is that state maintained? If one can avoid maintaining state on the server, then one can use stateless session beans. If that state is either partially or wholly maintained on the server, then one must use stateful session beans. A consequence of having to store state within the session bean is that the bean is no longer shareable, and the server is required to maintain a session between the client and its stateful session bean. If a stateless session bean is all that is required, then the server is free to offer up any bean to any client.
From this description, we can start to understand the performance implications of maintaining state on the server. First and foremost, if only stateful sessions are used, then the server is forced to maintain one stateful session bean for each client. In a large system, this can consume a large amount of resources. If, instead, stateless session beans can be used, then the server need only maintain as many beans as are needed to service its clients. Since clients typically make a request and then think for some time, the required number of stateless session beans can be far fewer than the number of clients being serviced. A final note: it may have become obvious that when one uses session beans, server affinity is not something you need to arrange yourself. It is automatic with stateful session beans and not needed with stateless ones.
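To make the distinction concrete, here is a minimal EJB 2.x style sketch. It is not code from the postings: ShoppingCartBean and QuoteBean are hypothetical names, and the home and remote interfaces and deployment descriptors are omitted. The stateful bean keeps conversational state in instance fields, so the container must tie one instance to one client; the stateless bean holds no client state, so any pooled instance can serve any request.

// ShoppingCartBean.java -- stateful: state built up across calls lives in instance
// fields, so the container binds this instance to one client for the session.
import java.util.ArrayList;
import java.util.List;
import javax.ejb.SessionBean;
import javax.ejb.SessionContext;

public class ShoppingCartBean implements SessionBean {
    private List items = new ArrayList();   // conversational state

    public void ejbCreate() {}              // called when the client starts its session

    public void addItem(String item) { items.add(item); }
    public List getItems() { return items; }

    // Lifecycle callbacks required by the SessionBean interface.
    public void ejbActivate() {}
    public void ejbPassivate() {}
    public void ejbRemove() {}
    public void setSessionContext(SessionContext ctx) {}
}

// QuoteBean.java (imports as above) -- stateless: nothing is remembered between
// calls, so the container can hand any instance from its pool to any client.
public class QuoteBean implements SessionBean {
    public void ejbCreate() {}

    public double quote(String symbol) {
        // Everything needed arrives as a parameter; no per-client state is kept.
        return lookupPrice(symbol);
    }

    private double lookupPrice(String symbol) { return 0.0; }  // placeholder

    public void ejbActivate() {}
    public void ejbPassivate() {}
    public void ejbRemove() {}
    public void setSessionContext(SessionContext ctx) {}
}

Because the stateless instances are interchangeable, the container can service many clients from a small pool, which is exactly the resource saving described above.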
In another posting from TheServerSide, a question is asked about how one might measure network round-trip times when more than one server is involved. On the surface, the problem seems to be a very tricky one indeed. Though we often measure network round-trip time, it is usually only between two machines. The code often looks like this:
long beginTime = System.currentTimeMillis();
otherSystem.remoteCall();
long endTime = System.currentTimeMillis();
Now, that elapsed time includes the cost of the method call, the time taken to convert the parameters into a byte array, the time taken to make the RMI call to the other system, the network transfer time, the time to read the byte array and convert it back into parameters, the time to perform the task at hand, and then the time to reverse the whole process to return the calculated value. What we would like to eliminate is the time taken to perform the calculation in the remote method. No problem: we just add this code on the server:
public Object remoteCall() {
    long beginRemoteCall = System.currentTimeMillis();
    Object result = doTheWork();
    long endRemoteCall = System.currentTimeMillis();
    // report (endRemoteCall - beginRemoteCall) as the server-side processing time
    return result;
}
With these two numbers, we can calculate the network round-trip time (which, of course, still includes marshalling and other overheads). But what can we do when more than one machine is involved? The recommendation posted was to continue to apply the same technique to the downstream remote calls. Once all of the numbers have been collected, performance trends can be extracted from the data. With those trends, one should be able to determine which parts of the system need to be tuned.
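The posted advice amounts to doing the same subtraction at every hop. A minimal sketch of that bookkeeping might look like the following (the HopTiming class and its names are hypothetical, not from the posting): each tier records its own elapsed time, and the caller subtracts the callee’s elapsed time from its own to estimate the network and marshalling cost of that hop.

// Hypothetical helper for aggregating per-hop timings.
public class HopTiming {
    private final String hop;
    private final long clientElapsed;   // measured around the remote call
    private final long serverElapsed;   // reported back by the remote server

    public HopTiming(String hop, long clientElapsed, long serverElapsed) {
        this.hop = hop;
        this.clientElapsed = clientElapsed;
        this.serverElapsed = serverElapsed;
    }

    // Network transfer plus marshalling overhead for this hop.
    public long overhead() {
        return clientElapsed - serverElapsed;
    }

    public String toString() {
        return hop + ": total=" + clientElapsed + "ms, remote work=" + serverElapsed
                + "ms, network+marshalling=" + overhead() + "ms";
    }
}

Collecting one of these per call, per hop, is enough raw data to spot the trends the posting talks about.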
With that, we move on to the JavaRanch, where we find a question concerning the overhead of using finally. Though none of the responses answered the question directly, they did make a couple of really interesting points. We use finally blocks when we need code to be executed whether or not an exception is thrown inside the method. For example, it is good practice to place calls that close streams and connections in a finally block. If one tried to ensure that connections are closed when an exception is thrown without using finally, one would need to duplicate the cleanup code in every exception handler. This is a violation of the DRY (Don’t Repeat Yourself) principle. Simply put, not following DRY tends to make code more difficult to maintain, it adds bulk to the code base, and it makes it more difficult for HotSpot to perform a complete analysis of your code, thus reducing its ability to optimize it. Yet more proof that the engineers at Sun realized that optimizers such as HotSpot are sensitive to coding style and, with that realization, built HotSpot with good coding practices in mind.
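As a quick illustration of the practice being discussed (a generic sketch, not code from the thread), the stream here is closed in a finally block so that the cleanup runs on both the normal and the exceptional paths, with no duplication:

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadWithFinally {
    public static int countBytes(String fileName) throws IOException {
        InputStream in = new FileInputStream(fileName);
        try {
            int count = 0;
            while (in.read() != -1) {
                count++;
            }
            return count;
        } finally {
            // Runs whether the reads succeed or throw, so the stream is
            // closed exactly once and the cleanup code appears only once.
            in.close();
        }
    }
}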
Why would one get better performance out of a single-threaded application than out of its multi-threaded counterpart? Along with the question, the posting detailed the conditions so that we might all share in the mystery. A thread, which performed only read-only operations, typically took 500ms to complete its task. The interesting thing is that with 10 threads running, each thread would complete in 5000ms. This very suspicious result would seem to indicate that each thread needed to wait for the other threads to run to completion. But there was no synchronization in the application, so what could be preventing the other threads from running? One suggestion was that the application was effectively running on a single CPU. A quick check showed that this was indeed the case. Mystery solved. The next mystery: why was the application, which was running on a multi-CPU machine, only utilizing a single CPU? Though no definitive answer appeared in the posting, it was assumed that the VM was somehow using green threads (simulated threads) rather than native (OS) threads. Even if this is not the answer, it was still useful to discover that only a single CPU was being utilized.
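A small experiment along these lines is easy to set up (a generic sketch, not the original poster’s code): time the same CPU-bound, read-only work in 1 and then N threads, and check how many processors the VM actually reports. If per-thread times scale linearly with the thread count on a multi-CPU box, the threads are being serialized somewhere below the application.

public class ThreadScalingCheck {
    // CPU-bound, read-only work standing in for the poster's task.
    static long busyWork() {
        long sum = 0;
        for (int i = 0; i < 50000000; i++) {
            sum += i % 7;
        }
        return sum;
    }

    public static void main(String[] args) throws InterruptedException {
        int threadCount = args.length > 0 ? Integer.parseInt(args[0]) : 10;
        System.out.println("Available processors: "
                + Runtime.getRuntime().availableProcessors());

        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            final int id = i;
            threads[i] = new Thread(new Runnable() {
                public void run() {
                    long start = System.currentTimeMillis();
                    busyWork();
                    long elapsed = System.currentTimeMillis() - start;
                    System.out.println("Thread " + id + " took " + elapsed + "ms");
                }
            });
        }
        for (int i = 0; i < threadCount; i++) threads[i].start();
        for (int i = 0; i < threadCount; i++) threads[i].join();
    }
}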
Is there any group more interested in the performance of the garbage collector (GC) than this one? Certainly there are more questions focusing on GC and object creation at www.javagaming.org than at any of the other discussion groups. True to form, our first post is about how the garbage collector and the VM’s internal memory spaces handle the creation of very large objects. At issue: the game in question creates a large JPEG frame every 1/30th of a second. The question is, since collection is much cheaper in Eden, should one size Eden to be large enough to contain all of the frames until a GC is triggered? What came out of the thread is a great, if sometimes confused, explanation of GC in general and the train collector in particular. The final advice is just as confusing but, given the source, does warrant serious consideration. The advice given: video games should use a very small Eden. Following this advice would result in the objects being allocated directly in the old generation, where the application would suffer from longer GC pauses. One reason for using a smaller Eden might be that it is impossible to keep these large objects in Eden anyway, and the time needed to copy them to a survivor space or to the old generation may be more expensive than the cost of the collection itself. In that case, the apparently bad advice may not be all that bad.
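For anyone wanting to experiment, the young generation can be sized explicitly with the standard HotSpot flags, and -verbose:gc gives a rough indication of how often collections occur and how quickly the old generation grows. A sketch of such an invocation (the heap and Eden sizes are placeholders, not recommendations from the thread, and MyGame is a hypothetical main class):

java -verbose:gc -Xms256m -Xmx256m -XX:NewSize=64m -XX:MaxNewSize=64m -XX:SurvivorRatio=8 MyGame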
In another post, we continue to see just how difficult it is to get a micro-benchmark right in Java. At the root of many of the difficulties are the dynamic optimizations performed by HotSpot. In this series of postings, we follow a couple of very experienced developers as they try to develop a micro-benchmark to measure the performance of a new technique for performing some trig calculations. As is typical, the first set of numbers published included the cost of compiling and optimizing the code, GC activity, and a number of other effects that we definitely do not want included in the measurement. The first clue that the numbers were flawed was the fact that the -Xcomp flag produced different results. This is a clear indication that method compilation times were being included in the benchmark. To avoid this anomaly, one must pre-heat (warm up) the methods under test so that they are compiled before any timing begins.
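A common shape for such a harness is shown below (a generic sketch, not the code from the thread; fastSin() is a hypothetical stand-in for the technique being measured): run the work enough times to let HotSpot compile it, then time a separate measurement run.

public class TrigBenchmark {
    // Hypothetical stand-in for the trig technique being measured.
    static double fastSin(double x) {
        return Math.sin(x);  // placeholder implementation
    }

    static double runOnce(int iterations) {
        double sum = 0.0;
        for (int i = 0; i < iterations; i++) {
            sum += fastSin(i * 0.001);
        }
        return sum;  // returned so the work cannot be optimized away entirely
    }

    public static void main(String[] args) {
        // Warm-up: give HotSpot a chance to compile runOnce() and fastSin()
        // before any timing is recorded.
        for (int i = 0; i < 20; i++) {
            runOnce(100000);
        }

        // Timed run.
        long start = System.currentTimeMillis();
        double result = runOnce(1000000);
        long elapsed = System.currentTimeMillis() - start;
        System.out.println("elapsed=" + elapsed + "ms, result=" + result);
    }
}

With the warm-up in place, running with and without -Xcomp should produce much closer numbers.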
The one thing that was missing from the postings was the use of statistics. One can clearly identify when a micro-benchmark is flawed by looking at the average, median, highest value, lowest value, standard deviation and variance. In a good benchmark that is free of interference, the variance and standard deviation are very small (0 being the ideal target), and the average, median, highest and lowest values are all very close (all being the same is the ideal). Once these numbers are in line, the odds are that you have a good micro-benchmark.
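Collecting those statistics over repeated timed runs takes only a few lines (again a generic sketch; the samples would be the elapsed times from repeated runs of a harness like the one above):

import java.util.Arrays;

public class BenchmarkStats {
    // Prints min, max, average, median, variance and standard deviation of timings (ms).
    public static void report(long[] samples) {
        long[] sorted = (long[]) samples.clone();
        Arrays.sort(sorted);

        double sum = 0.0;
        for (int i = 0; i < sorted.length; i++) {
            sum += sorted[i];
        }
        double mean = sum / sorted.length;

        double sqDiff = 0.0;
        for (int i = 0; i < sorted.length; i++) {
            double d = sorted[i] - mean;
            sqDiff += d * d;
        }
        double variance = sqDiff / sorted.length;
        double stdDev = Math.sqrt(variance);
        double median = (sorted.length % 2 == 1)
                ? sorted[sorted.length / 2]
                : (sorted[sorted.length / 2 - 1] + sorted[sorted.length / 2]) / 2.0;

        System.out.println("min=" + sorted[0] + " max=" + sorted[sorted.length - 1]
                + " mean=" + mean + " median=" + median
                + " variance=" + variance + " stddev=" + stdDev);
    }
}

Feeding in the times from, say, 30 separate runs makes interference from GC or compilation show up immediately as a large spread between the minimum and the maximum.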