Back to newsletter 040 contents
This month I had the pleasure of helping a friend (Henry) performance tune his website. The site is fairly well trafficked, so I was surprised to learn that it ran entirely on some fairly antique hardware: 512MB of memory, a single IDE drive, and a single Pentium III CPU. On this hardware ran Red Hat Linux, a ColdFusion engine, and MySQL. With such meager hardware, one would think an upgrade was in order. The fact is, before the exercise even ended, we realized that this pittance of memory, CPU, and disk was all that was needed to support the application.
As it stood, the application supporting the website was memory bound. Applications suffering from this type of problem tend to look I/O bound, because they put tremendous stress on the I/O subsystems as they force the OS to swap pages in and out. In this instance, the major source of the problem was a combination of caching and running MySQL in the same limited memory space. The solution was to eliminate the cache, which in turn reduced the dependency on memory and, consequently, on I/O.
But the story doesn't end here, because while Henry was working to eliminate the cache, he needed to keep the website up and running. I recommended a few changes to the garbage collection and heap parameters that might ease the problem. After a couple of hours of setting parameters and observing, we were able to radically change the GC profile and, in doing so, stabilize the application enough that we could focus on the cache. The reason we were able to identify which settings worked and which didn't is that we were methodical in our approach: we made only one change at a time, and we re-measured after each change.
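The article doesn't list the exact settings Henry used, but as a hedged illustration, tuning a JVM on a memory-starved box of that era might look something like the following (all flag values here are hypothetical, not taken from the exercise):

```shell
# Hypothetical example -- the actual values are not given in the article.
# Pin the heap so it cannot grow into memory needed by MySQL and the OS,
# and log each collection so the effect of every single change can be
# measured before the next change is made.
java -Xms96m -Xmx96m \
     -XX:NewRatio=3 \
     -verbose:gc \
     MyApplication
```

The key habit is the `-verbose:gc` flag: without a GC log, there is no way to tell whether the last parameter change helped or hurt.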
At the time of writing, the site still suffered from a memory leak, so there is more for Henry to do. That said, not having to babysit an ailing system has freed him to focus on the next step.
Down in the Saloon at JavaRanch, a discussion was started by the age-old question: which is faster, this code fragment or that code fragment? The code fragment itself is not all that interesting, as all of the proposed solutions were acceptable and would produce almost identical results. What was interesting was a discussion thread started by the statement "faster is not always better". The discussion develops this nicely into a longer explanation: one needs to consider the cost of performance as well. After all, every performance improvement we make has both a cost and a benefit, which allows us to assign a value to it. If there were little or no net benefit in making a performance improvement, why would we do so?
One of the techniques used to avoid trips to the database and network is caching. With caching we try to hold the values we need close to where we need them. The questions become: which values do we need, and how, and for how long, should we hold on to them? In the motivating posting at JavaRanch, the desire was to reduce the number of round trips to the database. An interesting point to come out of the discussion was the light treatment that the question of queuing receives in performance books. Whether or not that criticism is justified, the thread does a great job of classifying and explaining the various techniques used by caching technologies. The post is far too long to repeat here, so if you are interested in further exploration, I suggest you visit http://saloon.javaranch.com/cgi-bin/ubb/ultimatebb.cgi?ubb=get_topic&f=15&t=000768.
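The thread surveys several caching strategies; as a minimal sketch of one of the simplest, a least-recently-used (LRU) cache that bounds its own memory use can be built on java.util.LinkedHashMap (the class and capacity below are illustrative, not taken from the post):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A bounded LRU cache: once 'capacity' entries are held, adding a new
// entry evicts the least-recently-accessed one. Bounding the cache caps
// the memory it can consume -- the very resource Henry's site ran short of.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        // accessOrder=true makes iteration order reflect access recency,
        // so the "eldest" entry is the least recently used one.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<>(2);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.get("a");       // touch "a", making "b" the eldest entry
        cache.put("c", "3");  // capacity exceeded: "b" is evicted
        System.out.println(cache.containsKey("b")); // false
        System.out.println(cache.containsKey("a")); // true
    }
}
```

The design choice worth noting is that eviction policy is the whole game: an unbounded cache is exactly what turned Henry's memory-bound application into an apparent I/O problem.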
Our focus on TheServerSide shifts to the edge of the J2EE world. The first post opens with a performance problem in an MVC architecture: an application that suffers serious degradation in the MVC (Servlet) layer as more users log in. The application creates big objects and stores them in Servlet session variables. The thread winds its way through a couple of suggestions until it strikes gold: session state can be serialized to disk between calls, and it is this activity that is most likely causing the difficulty. The post lists a couple of remedies, such as customizing serialization. In tests I've done, I've found that this step alone can drop serialization times by 40% or more. The second suggestion was to reduce the size of the session object by seeing if state is sharable. The third was to save some of the state in the database, and the final idea was to place some of the state in hidden fields in the HTML forms. Quite possibly the best solution is a combination of all of these techniques, as they all tackle the same problem: reducing the amount of work needed to serialize and store session state.
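One common way to customize serialization is to mark recomputable state transient and rebuild it on deserialization, so the container has far less to write. The posting does not show the original application's code, so the class and fields below are invented purely for illustration:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Sketch: shrink a session object's serialized form by excluding a
// large, derivable field from serialization and rebuilding it locally.
public class SessionData implements Serializable {
    private final String userId;
    // Large but recomputable -- 'transient' keeps it out of the stream:
    private transient StringBuilder renderedMenu;

    public SessionData(String userId) {
        this.userId = userId;
        this.renderedMenu = buildMenu();
    }

    private StringBuilder buildMenu() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) sb.append("menu-item-").append(i);
        return sb;
    }

    // Rebuild the transient state after the container deserializes us.
    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        this.renderedMenu = buildMenu();
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = serialize(new SessionData("henry"));
        // Only userId travels over the wire; the menu is rebuilt locally.
        System.out.println("serialized bytes: " + bytes.length);
    }
}
```

The trade-off is CPU for I/O: the menu is recomputed on every restore, which is only a win when serialization is the bottleneck, as it was in this posting.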
In yet another edge case, the posting is more about databases than J2EE, but that's fine because the discussion brings out an interesting point. The posting is quite detailed in describing the problem; at its core, it is about having to gather information from three tables. The response is general enough that I don't believe the exact details are necessary. It pointed out that using three queries with inner joins might be faster, because the results of outer joins may contain many null fields that the JDBC driver does not optimize away. The posting goes on to tackle the problem of conducting full-text searches. As is standard for text search, all text is converted to a single case (typically upper case). However, using UPPER() in the SQL will cause the database to perform a table scan, because the index on the original mixed-case column cannot be used. The solution offered was to create a trigger that inserts the upper-case data into a new column alongside the original string. Once the new column is indexed and the queries are changed to work against it, things should be much better.
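The two query shapes the post contrasts look roughly like this on the Java side (table and column names here are invented; the shadow column is assumed to be indexed and kept current by a trigger):

```java
import java.util.Locale;

// Illustrative contrast of the two query shapes from the post.
public class CaseInsensitiveSearch {

    // Forces a full table scan: the index on NAME stores mixed-case
    // values, so UPPER(name) cannot use it.
    static final String SLOW =
        "SELECT id FROM customer WHERE UPPER(name) = ?";

    // Uses the index on the trigger-maintained shadow column instead:
    static final String FAST =
        "SELECT id FROM customer WHERE name_upper = ?";

    // The application normalizes the search term once, in Java,
    // instead of asking the database to normalize every row:
    static String normalize(String term) {
        return term.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(normalize("O'Brien")); // O'BRIEN
        // With a real connection, the fast form would be used as:
        // PreparedStatement stmt = conn.prepareStatement(FAST);
        // stmt.setString(1, normalize(userInput));
    }
}
```

The general principle is to move work from per-row query time (UPPER() over every row) to per-write time (the trigger), which pays off whenever reads outnumber writes.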
Moving over to Java Gaming, we continue to find information that you just can't get anywhere else. For instance, have you seen the -XX:-OptoScheduling and -XX:-Inline flags? I have to admit that the OptoScheduling flag is certainly new to me. A quick search on Google and, bingo, I got a hit: there is one mention of it in the JDK 1.4.2 release notes' list of bug fixes. That said, I still don't know what it does, but the gamers seem to have had some success using it to speed up their floating-point calculations. As with every performance tuning tip, your mileage may vary, so make sure you take some measurements.
The gamers are reporting good news on the enhanced loop syntax found in JDK 1.5: there is no performance penalty for using the new syntax. Tests that appeared to show improved performance turned out to be flawed (beware the micro-performance benchmark), and after a bit of confusion, it was confirmed that HotSpot can eliminate bounds checking of array indices.
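For readers who haven't seen it yet, the two forms being compared look like this; over an array, the enhanced syntax compiles down to an equivalent indexed loop, which is why neither should be favored on performance grounds:

```java
public class LoopStyles {
    public static void main(String[] args) {
        int[] values = {3, 1, 4, 1, 5};

        // Classic indexed loop:
        int sumIndexed = 0;
        for (int i = 0; i < values.length; i++) {
            sumIndexed += values[i];
        }

        // JDK 1.5 enhanced loop -- over an array this compiles to the
        // same indexed pattern, so there is no penalty (and no gain):
        int sumEnhanced = 0;
        for (int v : values) {
            sumEnhanced += v;
        }

        System.out.println(sumIndexed == sumEnhanced); // true
    }
}
```

Pick the enhanced form for readability, not speed: that is exactly the conclusion the gamers reached once the flawed benchmarks were discarded.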
Finally, we have a quote from Jeff Kesselman [a Sun engineer with a great deal of performance expertise] that backs everything that Jack and I have been preaching about performance tuning. In response to a complaint, Jeff writes "EVERY step we've taken with Java performance has been to improve the performance of exactly what you are talking about: writing clear, clean, well encapsulated code." What Jeff is saying is that in order to test HotSpot and its JIT, they need to run some test code. Since that test code follows these principles, you'll get the best results if you do too.