Java Performance Tuning
Tips May 2011
Back to newsletter 126 contents
http://queue.acm.org/detail.cfm?id=1854041
Thinking Clearly about Performance (Page last updated September 2010, Added 2011-05-30, Author Cary Millsap, Publisher ACMQueue). Tips:
- Response time is the execution duration of a task, measured in time per task.
- Showing response times with the response can be seen as evidence to the end user that the application owner values the user's perception of the application performance.
- Usually people who are responsible for the performance of groups of people are concerned with throughput (count of task executions that complete within a specified time interval); individuals working solo are more concerned with response times.
- Response time and throughput are reciprocally related - but subtly, modified by the factor of the number of parallel tasks that can be concurrently processed. This means you must measure throughput and response times separately, you cannot infer one from the other.
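The relationship above is the standard Little's law (concurrency = throughput × response time), which is not named in the article but formalizes why throughput alone cannot tell you the response time without knowing the concurrency. A minimal illustrative sketch:

```java
public class LittlesLaw {
    // Little's law: concurrency = throughput * response time.
    // Given two of the three quantities, the third follows - which is why
    // the same throughput can correspond to very different response times.
    static double throughput(double concurrency, double responseTimeSec) {
        return concurrency / responseTimeSec;
    }

    public static void main(String[] args) {
        // Same 100 tasks/sec throughput, very different user experience:
        System.out.println(throughput(10, 0.1));  // 10 concurrent tasks, 100ms each
        System.out.println(throughput(100, 1.0)); // 100 concurrent tasks, 1s each
    }
}
```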
- Performance targets should specify both response time and throughput.
- Response times give an indication of what the end-user sees, and throughput gives an indication of how the overall system is handling the current set of tasks.
- Response-time variance is very important; specifically, targets should be couched in percentile response times rather than averages, e.g. 90% (or 99.9%) of response times must be less than a second. This type of response-time target matches much more closely to user perceptions.
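One way to check such a percentile target against measured samples is the nearest-rank method; a minimal sketch (the class name and sample values are illustrative, not from the article):

```java
import java.util.Arrays;

public class Percentile {
    // Nearest-rank percentile: the smallest sample value such that
    // at least p% of all samples are less than or equal to it.
    static long percentile(long[] samplesMs, double p) {
        long[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) {
        long[] responseTimesMs = {120, 95, 110, 3000, 105, 98, 102, 115, 99, 101};
        // The average is badly skewed by the single 3000ms outlier;
        // the 90th percentile reflects what most users actually see.
        System.out.println(percentile(responseTimesMs, 90.0)); // prints 120
    }
}
```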
- A sequence diagram is a good tool for conceptualizing flow of control and the corresponding flow of time, and can show how simultaneous processing threads work in parallel, all of which can help analyze performance. But they don't scale well where there is a lot of activity to analyze.
- A profile is a tabular decomposition of response time, typically listed in descending order of component response time contribution, and is an excellent mechanism for analyzing performance.
- Amdahl's law should direct you where to concentrate effort. It states: Performance improvement is proportional to how much a program uses the thing you improve.
- Work on what gives you "the biggest bang for the buck", i.e. the biggest improvement given the costs of the possible improvements.
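Amdahl's law can be sketched numerically; a hypothetical example, assuming a fraction p of total runtime is sped up by a factor s:

```java
public class Amdahl {
    // Amdahl's law: overall speedup when a fraction p of the runtime
    // is improved by a factor s. The untouched (1 - p) fraction bounds the gain.
    static double speedup(double p, double s) {
        return 1.0 / ((1.0 - p) + p / s);
    }

    public static void main(String[] args) {
        // A 10x improvement to a component taking 20% of runtime: ~1.22x overall.
        System.out.println(speedup(0.2, 10.0));
        // A mere 2x improvement to a component taking 80% of runtime: ~1.67x overall,
        // the bigger "bang for the buck".
        System.out.println(speedup(0.8, 2.0));
    }
}
```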
- When everyone is happy except for you, make sure your local setup is in order before you go messing around with the global settings that affect everyone else, too.
- Efficiency is an inverse measure of waste: the more waste you eliminate, the more efficient the application becomes. Reducing total service time without adding capacity or sacrificing business functionality is eliminating waste.
- Load is competition for a resource induced by concurrent task executions.
- Utilization is resource usage divided by resource capacity for a specified time interval.
- Response time equals service time plus queuing delay. On a lightly loaded system the queuing delay is low; as load increases, queuing delay increases hyperbolically.
- The knee (or elbow) is the point at which throughput is maximized with minimal negative impact to response times. The knee occurs at the utilization value where a line through the origin is tangential to the response-time curve.
- The knee in response times is important on a system with random arrivals because these tend to cluster and cause temporary spikes in utilization. These spikes need enough spare capacity to absorb them so that users don't have to endure noticeable queuing delays. For example, the spare capacity needed to handle request clusters is at least 50% on a 1-core system; 43% on 2 cores; 34% on 4 cores; 26% on 8 cores; 14% on 32 cores; 11% on 64 cores; and 8% on 128 cores.
- Spike utilization beyond the knee should not last for more than 8 seconds.
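The hyperbolic growth of queuing delay can be sketched with the single-server (M/M/1) approximation, response time = service time / (1 - utilization). This is a simplified model, not from the article, but it shows why delay explodes as utilization approaches 100%:

```java
public class QueueingSketch {
    // Single-server (M/M/1) approximation:
    // response time = service time / (1 - utilization).
    static double responseTime(double serviceTimeMs, double utilization) {
        if (utilization >= 1.0) throw new IllegalArgumentException("saturated");
        return serviceTimeMs / (1.0 - utilization);
    }

    public static void main(String[] args) {
        double s = 10.0; // ms of service time per request
        // Delay grows slowly at first, then hyperbolically near saturation.
        for (double u : new double[]{0.1, 0.5, 0.8, 0.9, 0.95}) {
            System.out.printf("utilization %.2f -> response time %.1f ms%n",
                    u, responseTime(s, u));
        }
    }
}
```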
- Coherency delay comes from waiting for a resource to process your task - contended locks and non-empty queues typically cause coherency delay.
- The random nature of coherency delays means that you can never catch all your problems in preproduction testing. So you need a reliable and efficient method for identifying and solving the problems that will inevitably occur in production.
- People tend to measure what's easy to measure, which is not necessarily what they should be measuring. You need to measure what actually matters - what the users see or what you are paying for, etc.
- You need to write your application so that it's easy to fix performance in production - and the first step is to make it easy to measure in production.
http://www.javacodegeeks.com/2011/01/10-tips-proper-application-logging.html
10 Tips for Proper Application Logging (Page last updated January 2011, Added 2011-05-30, Author Fabrizio Chami, Publisher Java Code Geeks). Tips:
- SLF4J pattern substitution ({} denotes an argument converted and substituted only when the message is actually logged) is highly efficient, avoiding string conversions and concatenations until required. If you use pattern substitution you can dispense with the if(log.isDebugEnabled()) log.debug("..."); guard, as that idiom is only needed to prevent unnecessary object creation.
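To illustrate why pattern substitution makes the guard unnecessary, here is a minimal home-grown sketch of the {} idiom (this is not SLF4J's actual implementation, just the principle):

```java
public class LazyLogger {
    private final boolean debugEnabled;

    LazyLogger(boolean debugEnabled) { this.debugEnabled = debugEnabled; }

    // Mimics SLF4J-style {} substitution: each {} is replaced by the
    // next argument; toString() conversion happens only here.
    static String format(String pattern, Object... args) {
        StringBuilder sb = new StringBuilder();
        int from = 0;
        for (Object arg : args) {
            int at = pattern.indexOf("{}", from);
            if (at < 0) break;
            sb.append(pattern, from, at).append(arg);
            from = at + 2;
        }
        return sb.append(pattern.substring(from)).toString();
    }

    // Formatting is skipped entirely when the level is disabled,
    // so the caller needs no isDebugEnabled() guard.
    void debug(String pattern, Object... args) {
        if (!debugEnabled) return;
        System.out.println(format(pattern, args));
    }

    public static void main(String[] args) {
        new LazyLogger(true).debug("processed {} records in {} ms", 42, 17);
    }
}
```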
- Perf4J or equivalent is useful to log performance statistics.
- Be careful when logging a collection - the conversion to string is potentially hugely inefficient and could even cause memory issues.
- If you log too much or improperly use toString() and/or string concatenation, logging can cause performance issues.
- From experience the ideal logging pattern should include: current time, logging level, name of the thread, simple logger name (not fully qualified) and the message.
- Log at TRACE level every method that accesses external systems (including the database), blocks, waits, etc. - i.e. methods whose execution time is significantly larger than the cost of a log line. This allows you to identify badly performing code just from the logs.
- Log data that is communicated to and from external systems - this makes identification of precise causes of performance issues (and errors) much easier.
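A sketch of timing and logging calls to external systems, per the two tips above. The timed wrapper and the log format are hypothetical, not from the article; a real application would call log.trace rather than println:

```java
import java.util.function.Supplier;

public class ExternalCallTiming {
    // Wrap an external call, measure its duration, and log it with the
    // thread name - enough to spot slow external interactions from logs alone.
    static <T> T timed(String what, Supplier<T> call) {
        long start = System.nanoTime();
        try {
            return call.get();
        } finally {
            long ms = (System.nanoTime() - start) / 1_000_000;
            // In a real application: log.trace("{} took {} ms", what, ms);
            System.out.println("TRACE [" + Thread.currentThread().getName()
                    + "] " + what + " took " + ms + " ms");
        }
    }

    public static void main(String[] args) {
        // Stand-in for a real database call.
        String result = timed("db.loadUser", () -> "user-42");
        System.out.println(result);
    }
}
```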
http://kirk.blog-city.com/100_cpu_with_output_from_jstat.htm
100% CPU with output from jstat (Page last updated February 2011, Added 2011-05-30, Author Kirk Pepperdine, Publisher kodewerk). Tips:
- jstat gives command line sampled statistical output of all the heap spaces.
- A full perm space will result in full GCs - repeated full GCs if the perm cannot expand and the full GC clears up just enough for the next cycle (e.g. if there is generated code from reflection causing perm to fill up with generated classes that can then be unloaded each time).
- Perm space leaks are quite often found in applications running in application servers such as Tomcat or GlassFish.
http://79.136.112.58/ability/show/xaimkwdli/a4_20110216_1400/mainshow.asp?STREAMID=1
Yet More Performance Tuning (Page last updated February 2011, Added 2011-05-30, Author Kirk Pepperdine, Publisher kodewerk). Tips:
- Performance testing needs representative data, a test harness, and monitoring.
- benchmarking & tuning process: Identify performance targets; Run test; If targets are not met then analyse profiles; identify the dominating performance issue; fix the dominating performance issue; start the next cycle.
- The activity that is dominating how the CPU is utilized is the current biggest performance issue - this could be I/O, memory management, algorithm calculation, context switching, etc.
- If system CPU utilization is greater than 10%, or system utilization equals user utilization, then the bottleneck is in the OS.
- If the CPU is not near 100% utilization there is no dominator
- If the CPU is near 100% utilization then if the object lifecycles are inefficient the problem is the JVM (GC), else it is the application
- monitor both system and user utilization when looking at CPU utilization.
- Examples of OS monitoring tools are Activity Monitor, vmstat, and Task Manager.
- VisualVM (jvisualvm) has a generational-count object allocation memory profiler which lets you detect memory leaks. A memory leak is caused by objects that never go away and that you keep creating more of - so there are many different generations of them, hence a memory profiler with generational counting is ideal for detecting these.
- Use a memory profiler with generational counting recording allocation stack traces, and take a snapshot and a heap dump; together these identify the memory leak - the objects with a high number of generations are leaking; the allocation stack trace shows where they are being created in the code; and the heap dump shows which objects are retaining references to the leaking objects and hence keeping them alive.
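A classic example of the kind of leak a generational-counting profiler catches - a static collection that keeps every "generation" of objects reachable (the class and sizes are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public class LeakSketch {
    // Classic leak: a static collection that is added to but never cleared,
    // so every batch of "processed" objects stays reachable forever.
    static final List<byte[]> CACHE = new ArrayList<>();

    static void handleRequest(int requestId) {
        byte[] payload = new byte[1024];
        payload[0] = (byte) requestId;
        CACHE.add(payload); // reference retained forever -> many live generations
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) handleRequest(i);
        // A generational-counting profiler would show byte[] instances spanning
        // many distinct generations, allocated in handleRequest; a heap dump
        // would show CACHE as the reference retaining them.
        System.out.println(CACHE.size());
    }
}
```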
- Bad response times while the CPU is underutilized, where the thread pool size is smaller than the number of cores, suggest the thread pool is too small.
- tda - thread dump analyzer - does a nice job of organizing a thread dump, and plugs into jvisualvm.
- If there are enough threads and the CPU is not fully utilized and response time is too high, thread contention is a prospective candidate. Counting the locks in a stack dump (or several stack dumps) will identify the resource causing the concurrency limitation. Thread dump analyzer will quickly identify any contended lock from a stack trace.
- HPjmeter is a nice GC log viewer, though GC logs from the Sun JVM sometimes require a little conversion.
- Premature promotion can cause too-frequent full GCs.
- To size survivor spaces, look at the tenuring distribution - if the age distribution is tight (e.g. almost all objects are at age zero) then survivor spaces are too small.
http://java.sys-con.com/node/1793969
Application Performance Monitoring in Production (Page last updated April 2011, Added 2011-05-30, Author Michael Kopp, Publisher JDJ). Tips:
- Define performance targets like first impression and page load times (generally users will tolerate up to 3-4 seconds for these but will get frustrated after that).
- Response times and the number of concurrent users, are two good metrics to start with when defining targets.
- Transaction oriented applications will typically target throughput as a key metric.
- Measure not just averages but also volatility, as the user experience is affected more by volatility and higher-percentile response times than by the averages.
- Count errors as part of performance measurements, as typically errors lead to slower average (and individual) responses.
- Identify the flow of the data for transactions so that you can identify where and what to measure.
- The hierarchy of performance monitoring is: define targets (e.g. required response times or throughputs with transaction window times); determine where and what to measure to ensure the targets are monitored (e.g. browser page load time, HTTP server request service time, etc.); measure averages and volatility.
Jack Shirazi
Last Updated: 2024-08-26
Copyright © 2000-2024 Fasterj.com. All Rights Reserved.
All trademarks and registered trademarks appearing on JavaPerformanceTuning.com are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. JavaPerformanceTuning.com is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.
URL: http://www.JavaPerformanceTuning.com/news/newtips126.shtml
RSS Feed: http://www.JavaPerformanceTuning.com/newsletters.rss