Back to newsletter 066 contents
This month was an interesting one for me as I was asked to present with Dr. Heinz Kabutz at Sun Tech Days in Johannesburg, South Africa. We co-presented on the topic of (you guessed it) performance tuning. The theme, "measure, don't guess", played large in the talk. Though I've used this theme in previous talks, this edition had a very different feel to it. What made it so was the inclusion of a fictional case study. Though I say fictional, it was based on an aggregation of engagements from over the years.
We called the mythical client Joegoslow and, as you can imagine, Joegoslow was suffering from a performance problem that their staff was having trouble getting a handle on. Throughout the talk we discovered why they couldn't get a handle on the problem and what I did to help them move from making wild guesses to actually being able to measure and, in measuring, see the problem.
The talk also contained the sub-theme that you should know what you are looking for before you go looking. This sub-theme was inspired by the problems that I often run into when I try to read profiling data. If you are dealing with anything beyond a very small application, you quickly realize that, unlike what you saw in the vendor demo, you're going to be staring at a huge amount of data trying to make sense of it all. Well, don't worry, you are not alone.
A number of years ago I was looking at profiling data with the client's contracted lead developer looking over my shoulder. Without really thinking things through, we had just thrown an execution profiler on the application and, of course, I didn't have a clue as to what I was looking for. I took a quick look back at the lead only to see that he had a look of complete confusion on his face. His look at me said, "is this speaking to you?" Under those circumstances what else could I do but reply, "oh yeah, these are great". Fortunately he didn't ask any more questions, any of which would have exposed me for what I was. Why were we in such a state? We were looking at the profiling output from a very large Java program, yet we didn't have any idea of what we needed to be looking for.
The point illustrates what I believe is one of the biggest difficulties when trying to performance tune a system: knowing what to look for. It sounds like something that should be obvious but then again, if you knew what you were looking for, wouldn't you just be able to go and fix it? The short answer is: while I may know what I'm looking for, I am not so sure that I'm going to find it. If I don't happen to find it, that means either there was something wrong with the test or the problem that I was looking for isn't really a problem at all! In other words, what I'm looking for is derived from my hypothesis of the problem. That hypothesis was in turn suggested to me by observing the external symptoms exhibited by the application. A hypothesis is just a two dollar word for a guess.
However, it is important that we make a hypothesis (or that guess, as it may be) as it allows us to design and conduct a test. It also tells us what we should be looking for as a result of the test. If the result isn't there, then the hypothesis was wrong. If you find what you are looking for then the hypothesis moves from being a guess to being a fact that is backed up with real evidence. Having strong evidence in these situations will give you the political capital that you'll need to get the problem fixed.
This month we will start our look at performance discussions with a question posted at Java Gaming. The first post illustrates that sizing perm space matters. The question initially asked about some very strange GC behavior, illustrated in the output from setting the verbose garbage collection option.
[Full GC [CMS (concurrent mode failure): 261308K->261317K(516096K), 1.3826536 secs] 261493K->261317K(524224K), [CMS Perm : 65535K->65534K(65536K)], 1.3828157 secs]
This record states that the Concurrent Mark and Sweep garbage collector has failed. But just exactly what does this mean? The clues to this come in the rest of the observation. In fact, there is enough information in this small post to spawn off a series of articles.
The poster's comment is that GC keeps running over and over without allowing the application to progress. This is happening because the application is starved for memory. The tricky evidence is contained in this portion of the GC log: 261493K->261317K(524224K). It says that memory in use was 261493K before GC and 261317K after it. So GC runs and no significant space is recovered. However, the value of 524224K suggests that there is plenty of memory available, yet the problem persists. The answer is that this is a classic case of perm space being too small, as can be seen from the perm space GC stats, which show perm using all of its 64MB (65534K of 65536K). No word on whether resizing perm space fixed the problem.
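The arithmetic in that reading can be checked mechanically. What follows is only a minimal sketch: the class name GcRecordCheck and its helper methods are made up for illustration, and the regular expression assumes the before->after(capacity) layout of the single record quoted above, not every GC log format.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; it is not part of any JVM tooling.
final class GcRecordCheck {
    // Matches "<before>K-><after>K(<capacity>K)" as it appears in the quoted record.
    private static final Pattern TRIPLE = Pattern.compile("(\\d+)K->(\\d+)K\\((\\d+)K\\)");

    /** Returns {before, after, capacity} in KB for the triple at the given 0-based position. */
    static long[] triple(String line, int index) {
        Matcher m = TRIPLE.matcher(line);
        for (int i = 0; m.find(); i++) {
            if (i == index) {
                return new long[] {
                    Long.parseLong(m.group(1)),
                    Long.parseLong(m.group(2)),
                    Long.parseLong(m.group(3))
                };
            }
        }
        throw new IllegalArgumentException("no triple at position " + index);
    }

    /** Space freed by the collection, in KB. */
    static long recoveredKb(long[] t) {
        return t[0] - t[1];
    }

    /** Occupancy after the collection as a percentage of capacity. */
    static double occupancyPercent(long[] t) {
        return 100.0 * t[1] / t[2];
    }

    public static void main(String[] args) {
        String line = "[Full GC [CMS (concurrent mode failure): 261308K->261317K(516096K), 1.3826536 secs] "
                + "261493K->261317K(524224K), [CMS Perm : 65535K->65534K(65536K)], 1.3828157 secs]";
        long[] heap = triple(line, 1); // second triple in this record: the whole heap
        long[] perm = triple(line, 2); // third triple in this record: perm space
        System.out.printf("heap: recovered %dK, %.1f%% full after GC%n",
                recoveredKb(heap), occupancyPercent(heap));
        System.out.printf("perm: recovered %dK, %.1f%% full after GC%n",
                recoveredKb(perm), occupancyPercent(perm));
    }
}
```

For the record above, the heap triple works out to only 176K recovered with roughly half of the heap still free, while perm recovers 1K and sits at 65534K of 65536K. That contrast is what points at perm space rather than the heap.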
From theserverside.com we have a claim that Thread.sleep() is evil and that it lies. More precisely, the poster was claiming that his code, which contained a sleep, ran for much longer than he expected it to. However, all Thread.sleep() can do is put your thread to sleep for the specified number of milliseconds. It cannot tell you when the thread will run again, as that is a function of the operating system. So if you write Thread.sleep(1000); your thread will sleep for 1 second, after which it will be available to be executed. That doesn't mean it will be executed right away, which implies that your application may take a little bit longer than expected. http://www.theserverside.com/discussions/thread.tss?thread_id=38490
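In keeping with "measure, don't guess", the gap between the requested and the actual sleep time is easy to observe. This is a minimal sketch, not the poster's code: the class name SleepOvershoot and the method timedSleepNanos are hypothetical, and the numbers it prints will vary with the operating system, the timer resolution, and the load on the machine.

```java
// Hypothetical helper for illustration only.
final class SleepOvershoot {
    /** Sleeps for the requested time and returns how long the call actually took, in nanoseconds. */
    static long timedSleepNanos(long requestedMillis) {
        long start = System.nanoTime();
        try {
            Thread.sleep(requestedMillis);
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can still see the interruption.
            Thread.currentThread().interrupt();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long requestedMillis = 1000;
        long elapsedNanos = timedSleepNanos(requestedMillis);
        long overshootMicros = (elapsedNanos - requestedMillis * 1_000_000L) / 1_000;
        System.out.printf("requested %d ms, measured %.3f ms, overshoot %d us%n",
                requestedMillis, elapsedNanos / 1e6, overshootMicros);
    }
}
```

On a lightly loaded machine the overshoot is usually small. On a busy one the scheduler may leave the thread waiting noticeably longer before it runs again, which is the effect the poster was seeing.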
Finally, at the JavaRanch we have a posting describing an application whose performance slowly degrades over time. The application starts off being able to process 200 records/minute. That slowly degrades to a tiny 25 records/minute. The question asked: is this degradation due to a memory leak? This very astute question can be easily answered by turning on garbage collection logging in the virtual machine. This simply requires that you set the -verbose:gc flag on the command line. If the memory reported as in use after each collection keeps climbing, you are most likely looking at a leak. This measurement will give you a definitive answer to the question. Once you have that, you can shift your activities to focusing completely on identifying the source of the bug.
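The -verbose:gc log is the measurement I would reach for first. If restarting with a new flag is awkward, similar numbers can be sampled from inside the JVM through the standard java.lang.management beans. This is a minimal sketch under that assumption: the class name GcSnapshot is hypothetical, and the heap figure it reports is the current usage rather than the occupancy right after a collection, so it is noisier than the log.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper for illustration only.
final class GcSnapshot {
    final long heapUsedBytes;
    final long gcCount;
    final long gcTimeMillis;
    final long uptimeMillis;

    private GcSnapshot(long heapUsedBytes, long gcCount, long gcTimeMillis, long uptimeMillis) {
        this.heapUsedBytes = heapUsedBytes;
        this.gcCount = gcCount;
        this.gcTimeMillis = gcTimeMillis;
        this.uptimeMillis = uptimeMillis;
    }

    /** Reads current heap usage and the cumulative GC counters of this JVM. */
    static GcSnapshot take() {
        long count = 0;
        long time = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the collector does not report the value.
            count += Math.max(0, gc.getCollectionCount());
            time += Math.max(0, gc.getCollectionTime());
        }
        long used = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        return new GcSnapshot(used, count, time, uptime);
    }

    /** Accumulated collection time divided by JVM uptime. */
    double gcTimeFraction() {
        return uptimeMillis == 0 ? 0.0 : (double) gcTimeMillis / uptimeMillis;
    }

    public static void main(String[] args) {
        GcSnapshot s = take();
        System.out.printf("heap used: %d KB, collections: %d, GC time: %d ms (%.2f%% of uptime)%n",
                s.heapUsedBytes / 1024, s.gcCount, s.gcTimeMillis, 100.0 * s.gcTimeFraction());
    }
}
```

Taking a snapshot every minute or so while the application processes records gives a series to compare against the throughput figures. Heap usage that ratchets upward while the GC time fraction grows is consistent with the leak hypothesis. A flat series suggests looking elsewhere.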