Back to newsletter 100 contents | All Javva's articles
Point of view. It depends so much on who or what you ask. For example, if you asked me about my health, I'd say I wasn't feeling so great most of the week, with a cold that is making me feel tired, sweaty, sore, and generally not up to anything intensive. I'm definitely not giving 100% this week, and I even took a day off when I felt particularly drained. On the other hand, if you ask my timesheet, it will tell you that I was sick for exactly seven and a half hours this week.
Which is right? Both, of course. It all depends on what you are measuring, on your point of view. What are you looking for? Are you a bean counter, wanting to tally the cost to the company of time lost to sickness? Are you a friend concerned about my wellbeing? Are you a reader of this column saying to yourself 'yeah, yeah, I got the point, now move on or I will'?
Have you noticed that people have shorter concentration timespans than they used to? If so, that suggests you are able to concentrate long enough to notice that, so well done you.
Anyway, when looking for delays to system throughput, point of view is critical. You have to keep shifting it if you want to get anywhere. Take my current search. I start with a global point of view. The data comes in here, it gets processed and ends up with a result over there. I can tell how long it takes from here to there. If I flood here with too much data, I can tell how many results I get over there every second. Basic throughput and latency.
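The global view above boils down to two numbers, and a rough sketch of how they might be computed is below. It assumes you can record a timestamp when each item goes in "here" and another when its result comes out "over there"; the class `FlowStats` and its method names are made up for illustration, not part of any real monitoring tool.

```java
import java.util.Arrays;

/** Hypothetical helper: end-to-end throughput and latency from entry/exit timestamps. */
final class FlowStats {

    /** Results per second, measured from the first input to the last output. */
    static double throughputPerSecond(long[] inMillis, long[] outMillis) {
        long first = Arrays.stream(inMillis).min().getAsLong();
        long last = Arrays.stream(outMillis).max().getAsLong();
        if (last <= first) {
            throw new IllegalArgumentException("measurement window must be longer than zero");
        }
        double windowSeconds = (last - first) / 1000.0;
        return outMillis.length / windowSeconds;
    }

    /** Mean time from "here" to "there"; assumes inMillis[i] and outMillis[i] are the same item. */
    static double meanLatencyMillis(long[] inMillis, long[] outMillis) {
        if (inMillis.length == 0 || inMillis.length != outMillis.length) {
            throw new IllegalArgumentException("need one exit timestamp per entry timestamp");
        }
        long total = 0;
        for (int i = 0; i < inMillis.length; i++) {
            total += outMillis[i] - inMillis[i];
        }
        return (double) total / inMillis.length;
    }
}
```

Throughput is only meaningful when the input is flooded, as described above; at a trickle, the same calculation just tells you how fast the data arrived.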
Now, I can see that over there I got 100 results in a minute. That seems a bit slow to me, but what do I know? I need to shift my point of view to that of the business. Is it slow? Do they care? Hey, Mr. Business, how much data could you chuck in here, and what happens if I can only give you 100 results a minute? That's my primary question. If Mr. Business says, no prob that's fine Javva, stop bothering me now I have money I need to lose, then obviously I look around for another problem. Or another job.
This time, Mr. Business said nothing. However, my boss says 'look, whatever it is, assume we are going to get 10 times as much. Now tell me where the problems are and how to get rid of them'. Such is the unsatisfactory world of throughput tuning. Nobody can ever seem to give you an answer on what is needed, and when. Have a guess. Your guess is, apparently, as good as anyone's.
So time to shift my point of view. I need to follow the data and see where it is delayed. But there is no way to do that: lots of black box Java processes do the processing. So instead I have to use a different point of view. I'll monitor those transit points that I can identify. I'll monitor all the basics: CPU, memory, thread-level and process-level, stack traces, JDBC I/O, locks. And I'll try to infer where the bottleneck is by eliminating where it can't be. CPU not fully utilised? Not a CPU problem. GC sequential load less than 5% and no full GCs? Not GC. Memory stable and easily in limits? Not Memory. DB server not particularly loaded, JDBC times per call reasonably small, no outstanding very long JDBC calls? Not DB I/O. Disk I/O very small? Not Disk I/O. Network I/O - hmmm, that's a difficult one. That's very application specific. I can see bandwidth is nowhere close to being loaded, but really I need to see network I/O latencies, and I can't know what is slow without application specific monitoring. Well, I'll leave that for the moment. So what is left?
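For the record, the elimination game above can be written down as a checklist. This is only a sketch: the `Snapshot` fields, the `Suspect` names and the 95% CPU cut-off are my assumptions for illustration, and only the "less than 5% GC load and no full GCs" rule is taken from the text. How you fill in the snapshot depends on whatever monitoring you have.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical checklist: strike off every suspect the measurements rule out. */
final class BottleneckElimination {

    enum Suspect { CPU, GC, MEMORY, DB_IO, DISK_IO, NETWORK_IO, CONTENTION, PARALLELISM }

    /** Illustrative snapshot of monitored values; the field names are assumptions. */
    static final class Snapshot {
        double cpuUtilisation;  // 0.0 to 1.0 across all cores
        double gcLoad;          // fraction of elapsed time spent in sequential GC
        int fullGcCount;        // full collections seen during the test
        boolean memoryStable;   // heap steady and comfortably within limits
        boolean dbCallsFast;    // DB server unloaded, JDBC calls short, none outstanding
        boolean diskIoSmall;    // disk I/O negligible
    }

    static final double CPU_SATURATED = 0.95; // assumed threshold, tune to taste
    static final double GC_LOAD_LIMIT = 0.05; // the 5% rule from the text

    /** Returns the suspects that the snapshot could NOT eliminate. */
    static Set<Suspect> remaining(Snapshot s) {
        Set<Suspect> suspects = EnumSet.allOf(Suspect.class);
        if (s.cpuUtilisation < CPU_SATURATED) suspects.remove(Suspect.CPU);
        if (s.gcLoad < GC_LOAD_LIMIT && s.fullGcCount == 0) suspects.remove(Suspect.GC);
        if (s.memoryStable) suspects.remove(Suspect.MEMORY);
        if (s.dbCallsFast) suspects.remove(Suspect.DB_IO);
        if (s.diskIoSmall) suspects.remove(Suspect.DISK_IO);
        // Network I/O, contention and parallelism cannot be ruled out from
        // these basic measurements, so they always stay on the list.
        return suspects;
    }
}
```

With a healthy-looking snapshot, the only suspects left are network I/O, contention and parallelism, which is exactly where this search ends up.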
Shift my point of view. Throughput is like pumping water through a hose. Either something isn't pumping fast enough, or somewhere the hose is squeezed really small, restricting the flow. In both cases, somewhere we have a problem with CPU or Memory or I/O or contention or parallelism. Suddenly, the light bulb lights. I can see contention in one component, and not much parallelism in another. Switch point of view - to contended stack traces for one, to the architecture for the other.
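One way to get the "contended stack traces" point of view from inside a JVM is the standard `java.lang.management` API. The sketch below lists threads that are BLOCKED on a monitor at the moment of the snapshot, with the lock they want and who holds it. The class name `ContentionSnapshot` and the `demo()` self-check are illustrative inventions; a single snapshot proves little, so in practice you would take several and look for the same lock turning up repeatedly.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

/** Hypothetical helper: reports threads blocked on monitors right now. */
final class ContentionSnapshot {

    /** One line per BLOCKED thread: who waits, for which lock, held by whom, and where. */
    static List<String> blockedThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        List<String> report = new ArrayList<>();
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            if (info.getThreadState() == Thread.State.BLOCKED) {
                StackTraceElement[] stack = info.getStackTrace();
                String top = stack.length > 0 ? stack[0].toString() : "(no frames)";
                report.add(info.getThreadName() + " waiting for " + info.getLockName()
                        + " held by " + info.getLockOwnerName() + " at " + top);
            }
        }
        return report;
    }

    /** Self-check: forces one thread to block on a lock, then takes a snapshot. */
    static List<String> demo() {
        final Object lock = new Object();
        final CountDownLatch held = new CountDownLatch(1);
        final CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            synchronized (lock) {
                held.countDown();
                try {
                    release.await(); // keep the lock until the snapshot is taken
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "demo-holder");
        Thread waiter = new Thread(() -> {
            synchronized (lock) { /* only wants the lock */ }
        }, "demo-waiter");
        holder.setDaemon(true);
        waiter.setDaemon(true);
        try {
            holder.start();
            held.await();
            waiter.start();
            while (waiter.getState() != Thread.State.BLOCKED) {
                Thread.sleep(1);
            }
            return blockedThreads();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return new ArrayList<>();
        } finally {
            release.countDown(); // let both demo threads finish
        }
    }
}
```

Note that this only sees `synchronized` monitor contention; threads parked on `java.util.concurrent` locks show up as WAITING rather than BLOCKED and would need a different filter.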
Point of view. You have to keep shifting or you won't see the full picture.
BCNU - Javva The Hutt.