Java Performance Tuning
Tips June 2008
Back to newsletter 091 contents
Upside down benchmarking (Page last updated April 2008, Added 2008-06-30, Author Kirk Pepperdine, Publisher Kodework). Tips:
- Increasing throughput on an upstream component can overload a downstream component and actually increase overall service time.
- Decreasing throughput on an upstream component can reduce the load on an overloaded downstream component enough to bring it below the "knee" of the response time-load curve, and so improve overall service time.
- Using the concurrent collector allows the application threads to progress faster, which can increase the overall rate at which garbage is created. That forces the collector to work harder and can end up decreasing application throughput. The stop-the-world collector can act as a throttle on the application that optimises overall application throughput (at the cost of pause times).
- Clustering to improve scalability is okay if the bottleneck is within the main body of the application running inside the JVM. If the bottleneck is external to the main body of the application, or is a resource that will be shared by the nodes in the cluster, clustering may result in no benefit or even a decrease in scalability.
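The "knee" of the response time-load curve can be illustrated with a simple queueing model. The sketch below assumes an M/M/1 queue, which is a textbook simplification and not something the article itself specifies; the class name, the 10 ms service time, and the arrival rates are all hypothetical.

```java
// Hypothetical illustration: an M/M/1 queue model, not taken from the article.
final class KneeSketch {
    /** Mean response time of an M/M/1 queue: R = S / (1 - utilization). */
    static double responseTime(double serviceTimeMs, double arrivalsPerSec) {
        double utilization = arrivalsPerSec * serviceTimeMs / 1000.0;
        if (utilization >= 1.0) {
            return Double.POSITIVE_INFINITY; // saturated: the queue grows without bound
        }
        return serviceTimeMs / (1.0 - utilization);
    }

    public static void main(String[] args) {
        double serviceTimeMs = 10.0; // assumed service time of the downstream component
        for (double rate : new double[] {50, 80, 90, 95, 99}) {
            System.out.printf("%.0f req/s -> %.1f ms%n", rate, responseTime(serviceTimeMs, rate));
        }
    }
}
```

Under this model, response time rises slowly at moderate load and very steeply as utilization approaches 100%, which is why throttling an upstream component can shorten overall service time.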
Rock Star Joshua Bloch (Page last updated May 2008, Added 2008-06-30, Author Janice J. Heiss, Publisher Sun). Tips:
- Branch free code runs fast, cache effects can be deadly, and inlining heuristics have a huge effect on performance.
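As a small illustration of what "branch free" means, the sketch below computes an absolute value with bit operations instead of a conditional. The class and method names are hypothetical, and this is not a claim that it beats `Math.abs`: the JIT compiler may already produce branch-free code for the standard method, so measure before adopting anything like this.

```java
// Hypothetical sketch of a branch-free computation; names are illustrative only.
final class BranchFreeSketch {
    /** Absolute value without a conditional branch (same overflow behaviour as Math.abs). */
    static int absBranchFree(int x) {
        int mask = x >> 31;       // 0 for non-negative x, -1 (all ones) for negative x
        return (x ^ mask) - mask; // negative x: ~x + 1 == -x; non-negative x: unchanged
    }

    public static void main(String[] args) {
        for (int x : new int[] {-5, 0, 7, Integer.MIN_VALUE}) {
            System.out.println(x + " -> " + absBranchFree(x) + " (Math.abs: " + Math.abs(x) + ")");
        }
    }
}
```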
Rock Star Tony Printezis (Page last updated May 2008, Added 2008-06-30, Author Janice J. Heiss, Publisher Sun). Tips:
- Don't try to code for the garbage collector; it can actually make GC worse. Keep your code simple and understandable.
- If you want the job done as quickly as possible and don't care how long your application is going to be stopped by the garbage collector, the throughput collector is the best choice.
- If you have an interactive job that needs to interact with people, other applications, or users through web pages, then a low latency garbage collector is the best choice.
- The low latency garbage collector is not optimized to maximize throughput so might not be the best choice for an application in which throughput is of most importance.
- The throughput garbage collector is not optimized to provide low latencies.
- There are three basic components to garbage collector performance: throughput (garbage collector overhead); responsiveness (garbage collection pauses); and footprint (how much memory space the garbage collector requires). Typically, for any collector, you have to pick two and sacrifice the third.
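One way to get a rough number for the throughput component is to compare cumulative collection time with JVM uptime using the standard `java.lang.management` beans. This is a minimal sketch, not a method from the interview: the class name and the `gcOverhead` helper are hypothetical, and for concurrent collectors the reported collection time is only an approximation of time the application was actually stopped.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical sketch: approximate GC overhead as (time in GC) / (JVM uptime).
final class GcOverheadSketch {
    /** Fraction of elapsed time spent in GC; returns 0.0 when uptime is not positive. */
    static double gcOverhead(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 0.0;
        }
        return (double) gcMillis / uptimeMillis;
    }

    public static void main(String[] args) {
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                gcMillis += t;
            }
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + t + " ms");
        }
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("Approximate GC overhead: %.2f%%%n", 100.0 * gcOverhead(gcMillis, uptime));
    }
}
```

Responsiveness (individual pause lengths) and footprint are not visible in this single ratio; they need GC logs or memory pool measurements.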
A systematic approach to problem solving (Page last updated June 2008, Added 2008-06-30, Authors Kevin Grigorenko, Daniel Julin, Carolyn Norton, John Pape, Publisher IBM). Tips:
- Characterize the problem by asking Who, What, When, Where, and Why.
- Ask exactly What happened - what are the symptoms, what caused an alert or would have caused an alert if the appropriate thing was being monitored, and what is different from the baseline, that is, how the system would normally behave (you do have a baseline, right?)
- Determine details of Where the problem occurred - machines, applications, processes, etc. Pinpoint the exact location. Identify and retain any relevant logs and screenshots - which actions were happening and what should you look at to see the problem. Specify the full topology and configuration of the system in as much detail as possible.
- When did the problem occur - obtain exact timestamps, note offsets, time zones, clock synchronization. Does it occur at any regular intervals? Ensure that any regular scheduled or batch processes are identified and noted, as these can easily cause issues.
- Ask what has changed - why did the problem occur when it did and not another time; why did it occur where it did and not elsewhere.
- Scan the entire system after a problem to identify any error messages and dump files. A large percentage of problems can be identified by doing this rigorously.
- Select specific symptoms for more detailed investigation, and repeat the procedure until the issue is narrowed down.
- All troubleshooting exercises boil down to watching a software system that exhibits an abnormal or undesirable behavior, making observations and performing a sequence of steps to obtain more information about the contributing factors, and then devising and testing a solution.
- In the isolation approach, rather than focusing on one particular symptom and analyzing it in ever greater detail, you look at the context in which each symptom occurs and then attempt to simplify and eliminate factors until you are left with a set of factors so small and simple that it is clear what caused the problem.
- A skilled troubleshooter rarely performs any investigative step without a very specific objective that is rooted either in drilling in deeper, or in isolating factors, to determine one more factor that is causing the problem.
- It is useful to maintain an issue table of current (not historical) issues. This should include: problem ID; problem description; symptoms; actions to be performed; fixes that may eliminate or workaround the problem; and theories about what is causing the problem.
- A timeline or log book of an investigation is useful for reference. It should include one entry with a precise date and time stamp for: each problem occurrence; each significant change made; each major diagnostic step; each system involved; and where each diagnostic artifact (logs, traces, dumps, and so on) is saved.
- When looking at problems, avoid tunnel vision by: systematically holding checkpoint meetings to assess the current state of progress; working in parallel where more than one theory could explain the issues; and regularly re-asking the "big picture" questions - "what is the problem?" and "are we solving the right problem?"
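The investigation log book described above can be kept in any form, including a plain document. As one possible shape, the following is a minimal sketch in Java; the class, field names, entry kinds, and the sample system name and artifact path are all hypothetical and not taken from the article.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of an investigation log book; all names are illustrative.
final class InvestigationTimeline {
    enum Kind { OCCURRENCE, CHANGE, DIAGNOSTIC_STEP }

    static final class Entry {
        final Instant when;        // precise timestamp, held in UTC to avoid time zone confusion
        final Kind kind;
        final String system;       // system involved
        final String note;
        final String artifactPath; // where logs, traces, or dumps were saved; may be null

        Entry(Instant when, Kind kind, String system, String note, String artifactPath) {
            this.when = when;
            this.kind = kind;
            this.system = system;
            this.note = note;
            this.artifactPath = artifactPath;
        }

        @Override
        public String toString() {
            return when + " [" + kind + "] " + system + ": " + note
                    + (artifactPath == null ? "" : " (artifacts: " + artifactPath + ")");
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    void record(Instant when, Kind kind, String system, String note, String artifactPath) {
        entries.add(new Entry(when, kind, system, note, artifactPath));
    }

    /** Entries in chronological order, regardless of the order in which they were recorded. */
    List<Entry> chronological() {
        List<Entry> sorted = new ArrayList<>(entries);
        sorted.sort(Comparator.comparing((Entry e) -> e.when));
        return sorted;
    }

    public static void main(String[] args) {
        InvestigationTimeline log = new InvestigationTimeline();
        log.record(Instant.parse("2008-06-30T10:15:00Z"), Kind.DIAGNOSTIC_STEP,
                "app-server-1", "Collected thread dump", "dumps/threads-1015.txt");
        log.record(Instant.parse("2008-06-30T09:58:12Z"), Kind.OCCURRENCE,
                "app-server-1", "Response times exceeded baseline", null);
        for (Entry e : log.chronological()) {
            System.out.println(e);
        }
    }
}
```

Sorting by timestamp rather than by insertion order matters because entries are often added after the fact, once a log or dump reveals when something actually happened.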
12 ways you can prepare for effective production troubleshooting (Page last updated August 2007, Added 2008-06-30, Author Daniel Julin, Publisher IBM). Tips:
- Create and maintain a system architecture diagram
- Create and track an inventory of all problem determination artifacts (logs, dumps, screenshots, etc)
- Pay special attention to dumps and other artifacts that are only generated when a problem occurs
- Review and optimize the level of diagnostics during normal operation - very detailed diagnostics can cause a substantial performance overhead, but you need enough to be able to diagnose problems.
- JVM verboseGC logging is very useful, and usually relatively low overhead on a well-tuned system.
- Crash dumps should always be enabled, as they are invaluable for identifying problems.
- Heap dumps and system dumps can involve significant overhead, so consider carefully before setting them up to be triggered automatically.
- Increase request logging at the HTTP server to show not just a single log entry for each request, but a separate log entry for the start and end of each request.
- A moderate level of performance counters monitoring is always a good idea.
- Use minimal tracing to capture one or a few entries only for each transaction (e.g. web requests or EJB requests).
- Keep application level tracing and logging turned on for all essential work, but ensure that it does not impose a significant overhead.
- Monitor low-level operating system and network metrics.
- Be prepared to actively generate additional diagnostics when a problem occurs
- Define a diagnostic collection plan -- and practice it
- Establish baselines that will allow you to answer the question "What's different now compared to yesterday when the problem was not occurring?"
- Keep copies of the various log files, trace files, and so on, over a representative period of time in the normal operation of the system, such as a full day, as well as copies of any dumps that are generated on demand.
- Retain information about the normal transaction rates in the system, response times, and so on for historical comparisons.
- For your baseline, keep operating system level statistics on a healthy system, such as CPU usage for all processes, memory usage, network traffic, and so on.
- Periodically purge, archive, or clean-up old logs and dumps
- Eliminate spurious errors and other "noise" in the logs
- Keep a change log: a rigorous log of all changes that have been applied to the system over time. When a problem occurs, you can look back through the log for any recent changes that might have contributed to the problem. You can also map these changes to the various baselines that have been collected in the past to ascertain how to interpret differences in these baselines.
- Set up ongoing system health monitoring. Monitor at least: significant errors in logs; metrics produced by each component; spontaneous appearance of dumps and crash logs; pings through various system components.
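The baseline tips above amount to being able to answer "what's different now?" mechanically. The following is a minimal sketch of such a comparison, assuming metrics are available as simple name-to-value maps; the class name, metric names, sample values, and the 25% tolerance are hypothetical choices for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: flag metrics that differ from a recorded baseline.
final class BaselineCheck {
    /**
     * Returns the relative change for each metric whose current value deviates
     * from its baseline by more than the given tolerance (0.25 means 25%).
     */
    static Map<String, Double> deviations(Map<String, Double> baseline,
                                          Map<String, Double> current,
                                          double tolerance) {
        Map<String, Double> flagged = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : baseline.entrySet()) {
            Double now = current.get(e.getKey());
            double base = e.getValue();
            if (now == null || base == 0.0) {
                continue; // cannot compute a relative change for this metric
            }
            double change = (now - base) / base;
            if (Math.abs(change) > tolerance) {
                flagged.put(e.getKey(), change);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        Map<String, Double> baseline = new LinkedHashMap<>();
        baseline.put("cpuPercent", 40.0);
        baseline.put("responseMs", 120.0);
        Map<String, Double> current = new LinkedHashMap<>();
        current.put("cpuPercent", 44.0);
        current.put("responseMs", 300.0);
        System.out.println(deviations(baseline, current, 0.25));
    }
}
```

A real comparison would also account for time of day and scheduled batch work, since a healthy system's metrics vary over a full day.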
JXInsight Software Performance Engineering (Page last updated May 2008, Added 2008-06-30, Author William Louth, Publisher JInspired). Tips:
- JXInsight Software Performance Engineering consists of 12 major activities that manage and monitor the performance of a software application: Assess Performance Risk; Identify Use Cases; Select Performance Scenarios; Specify Performance Objectives; Construct Performance Models; Determine Resource Needs; Evaluate Performance Models; Monitor Software Performance; Analyze Performance Data; Confirm Performance Expectations; Tune Software and System Performance; Manage Capacity.
- Identify and order performance risks in terms of impact severity and probability with possible risk reduction treatments prescribed.
- Create a use case catalog containing those use cases deemed to have performance risks (useful for regression testing).
- Use a use case catalog together with execution frequency and user perception to create performance tests
- Define clearly the objective of each performance test in terms of system and user goals.
- Identify the resource requirements for each performance test, detailing key performance indicators such as remote procedure call count, CPU consumption, memory allocation, and IO reads and writes.
- Evaluate different options using benchmarking and simulation, to determine optimal configurations, topology, peak volumes, and arrival rates.
- Obtain baselines for key performance indicators together with trend, cause and effect analysis.
- Specify service level agreements in association with application users so that expectations are fully matched.
- Manage both the software and system in terms of cataloging, classifications of workloads, demand forecasting, optimization and overhead reduction.
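Ordering performance risks by impact severity and probability can be sketched as a simple ranking. The example below assumes a score of severity multiplied by probability, which is a common convention but an assumption here rather than the method the JXInsight material prescribes; the class name, scales, and sample risks are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: rank performance risks by severity x probability.
final class RiskRanking {
    static final class Risk {
        final String name;
        final int severity;       // assumed scale: 1 (minor) to 5 (critical)
        final double probability; // assumed estimate between 0.0 and 1.0
        final String treatment;   // proposed risk reduction treatment

        Risk(String name, int severity, double probability, String treatment) {
            this.name = name;
            this.severity = severity;
            this.probability = probability;
            this.treatment = treatment;
        }

        double exposure() {
            return severity * probability;
        }
    }

    /** Returns a new list ordered from highest to lowest exposure. */
    static List<Risk> rank(List<Risk> risks) {
        List<Risk> ordered = new ArrayList<>(risks);
        ordered.sort(Comparator.comparingDouble(Risk::exposure).reversed());
        return ordered;
    }

    public static void main(String[] args) {
        List<Risk> risks = Arrays.asList(
                new Risk("Report generation timeout", 5, 0.1, "Precompute aggregates"),
                new Risk("Search latency under peak load", 3, 0.5, "Add result caching"),
                new Risk("Slow login", 2, 0.4, "Pool directory connections"));
        for (Risk r : rank(risks)) {
            System.out.println(r.name + " -> treatment: " + r.treatment);
        }
    }
}
```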
Last Updated: 2020-08-28
Copyright © 2000-2020 Fasterj.com. All Rights Reserved.
All trademarks and registered trademarks appearing on JavaPerformanceTuning.com are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. JavaPerformanceTuning.com is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.
RSS Feed: http://www.JavaPerformanceTuning.com/newsletters.rss