Java Performance Tuning
Tips March 2013
http://java.dzone.com/articles/taming-performance-beast-%E2%80%93
Taming the Performance Beast - a Practitioner's Way (Part 1) (Page last updated January 2013, Added 2013-03-27, Author Gopal Sharma, Publisher DZone). Tips:
- Performance improvement through code changes is limited by the architecture and design of the system. This means that if you get the architecture or design wrong, small code changes will be insufficient to achieve the desired performance.
- Time spent on low-level code optimization early in development generally turns out to be a waste of time.
- Most performance problems originate in inter-component communication - so this should be addressed at the design and architecture stages.
- Normalized databases can be a performance bottleneck - many systems now embrace non-normalized, unstructured storage for improved performance.
- Coding practices that are good for performance include: avoid creating unnecessary objects; be careful with Strings; prefer lazy initialization; try to minimize the mutability of a class; try to use standard library classes instead of creating your own; try to use primitive types instead of wrapper classes; use the collections appropriate for the way the object is used; be careful with looping (see the sketch after this list).
- Care should be taken over the selection and implementation of core components like security and logging, which will be used extensively across the application.
- If the system is very far off the target performance, start tuning at a much lower load where performance is nearly adequate; then iteratively tune and increase the load until the target is reached. When using this iterative "tune and step up load" procedure, increase the load as soon as the performance target is achieved, as further effort may be wasted on a bottleneck that matters less at the next higher load.
- Performance tuning is iterative: test & measure; analyse; target improvements and loads. Repeat until the performance goals are met.
- Iterative performance tuning steps are: Test performance and compare to the previous version; Collect performance data; Identify bottlenecks; Identify changes needed to improve performance; Apply changes.
- A performance tuning environment should be as close to the actual production environment as possible.
- The easiest performance issues are solved by substituting an alternative implementation, e.g. better logic, a better algorithm, a more appropriate data structure, a more suitable collection, a more appropriate utility object, increased parallelism from tweaking thread pool sizes, an optimized SQL query, changed looping logic.
- Design changes needed to achieve performance targets are often expensive, requiring extensive changes across the code.
- JVM overheads (class loading, interpretation, profiling, hot methods detection, compilation and garbage collection) can compete with the application for system resources, so often need tuning.
- To avoid frequent GCs, one can (in order of increasing complexity): Increase the heap size; Change the GC policy/algorithm; change code implementation to reduce object churn.
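Several of the coding practices above are concrete enough to sketch. Here is a minimal illustration (mine, not the article's; the class and method names are invented) of three of them: being careful with Strings, preferring primitives to wrapper classes, and lazy initialization via the holder idiom:

    public class CodingPracticesSketch {
        // Careful with Strings: concatenation inside a loop creates a
        // new String per iteration; a StringBuilder reuses one buffer.
        static String join(java.util.List<String> items) {
            StringBuilder sb = new StringBuilder();
            for (String item : items) {
                sb.append(item).append(',');
            }
            return sb.toString();
        }

        // Primitives instead of wrappers: summing into a long avoids
        // auto-boxing a new Long object on every iteration.
        static long sum(int[] values) {
            long total = 0;  // not Long total = 0L;
            for (int v : values) {
                total += v;
            }
            return total;
        }

        // Lazy initialization via the holder idiom: the expensive object
        // is only created on first use, thread-safely and without locking.
        static class ExpensiveResource { }  // hypothetical placeholder
        private static class Holder {
            static final ExpensiveResource INSTANCE = new ExpensiveResource();
        }
        static ExpensiveResource expensive() {
            return Holder.INSTANCE;
        }
    }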
http://architects.dzone.com/articles/how-analyze-java-thread-dumps
How to Analyze Java Thread Dumps (Page last updated October 2012, Added 2013-03-27, Author Tae Jin, Publisher DZone). Tips:
- If two or more threads utilize the same resources, contention between the threads is possible and deadlocks can occur.
- Thread contention is where one thread is waiting for a resource (e.g. a lock or time on the CPU), while another thread currently uses that resource.
- Deadlock is a special type of thread contention in which two or more threads hold some locks but are each waiting for the other threads to release their locks in order to complete their tasks.
- Statuses of threads (java.lang.Thread.State) are: NEW (created but not yet started); RUNNABLE (executing or ready to execute on the CPU); BLOCKED (waiting to acquire a lock that a different thread holds); WAITING (waiting in a wait(), join() or park() method); TIMED_WAITING (waiting in a sleep(), wait(), join() or park() method with a timeout - can be exited by the timeout as well as by the resource becoming available).
- Daemon threads terminate when there are no other non-daemon threads left.
- Common methods for obtaining a stack dump are: jstack; jVisualVM; sending a signal to the JVM (e.g. kill -3/SIGQUIT on unix).
- If a thread is blocking other threads, you should see one or more threads in a BLOCKED state, waiting for monitor entry with a "waiting to lock ADDRESS" entry at the top of the stack (where ADDRESS is an identifier that uniquely identifies the lock waiting to be acquired), while one running thread holds that lock - it will have a "locked ADDRESS" entry in its stack. Note that one thread could be waiting for lockA that another thread holds, while that thread in turn waits for lockB that a third thread holds, so there can be chains of locks blocking threads.
- When a deadlock occurs, you should see one thread blocked ("waiting to lock ADDRESS1") in trying to acquire a lock that another thread holds ("locked ADDRESS1"), and that first thread itself will be holding another lock ("locked ADDRESS2") that the second thread is trying to acquire ("waiting to lock ADDRESS2"). There could be a chain of these so it might not be between two threads, but the simplest deadlock is with two threads like this.
- A thread blocked waiting for remote input will be listed as being in a RUNNABLE state, not a blocked state, but will be in some kind of socket read method call. It's difficult to say whether such a thread is receiving data or is just infinitely blocked, as intermittent data reception is unlikely to show up in a random stack trace unless it takes a long time to process, and most remote input handling threads are designed to accept the data and pass it on to other threads for processing.
- A thread in the WAITING state is typically waiting for a signal or input from other threads before carrying on processing. Worker threads that are currently idle will usually be in this state, often waiting on some kind of queue.
- If CPU usage is abnormally high, determine which threads have the highest CPU usage using OS tools (like ps on Linux or perfmon on Windows), then map them to Java threads using the thread nid values in the stack dumps; alternatively, find the highest CPU-consuming threads directly from Java, e.g. via the JTop plugin for JConsole or the ThreadMXBean.getThreadCpuTime() call. Take a few traces - a high-CPU thread will typically spend most of its time in the calls consuming the CPU, so those will be the stack traces that show up most often.
- If application processing is slow but there doesn't seem to be an abnormally high CPU usage, look for blocked threads across several stack dumps. These are likely showing lock contention causing a bottleneck in an otherwise parallel application. The stack traces and lock information show exactly where the contention is occurring.
- You are strongly recommended to name threads whenever a new thread is created, as this makes stack traces much easier to understand. For thread pools, use a thread factory which provides custom naming [the article has an example implementation of this].
- The ThreadMXBean has methods that let you get blocked time and waited time, giving more detailed information about which threads are the most contended (see the sketch below).
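The diagnostic tips above translate into a few lines of standard java.lang.management calls. A minimal sketch (my own, not from the article; it assumes a JVM such as HotSpot that supports contention and CPU-time monitoring):

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;

    public class ThreadDiagnostics {
        public static void main(String[] args) {
            ThreadMXBean tmx = ManagementFactory.getThreadMXBean();

            // Deadlock detection: ids of threads deadlocked on monitors
            // or ownable synchronizers, or null if there are none.
            long[] deadlocked = tmx.findDeadlockedThreads();
            if (deadlocked != null) {
                for (ThreadInfo info : tmx.getThreadInfo(deadlocked, true, true)) {
                    System.out.println("Deadlocked: " + info);
                }
            }

            // Blocked/waited times are only recorded once contention
            // monitoring is enabled (it is usually off by default).
            if (tmx.isThreadContentionMonitoringSupported()) {
                tmx.setThreadContentionMonitoringEnabled(true);
            }

            // Per-thread CPU time, to map high CPU usage to a thread.
            for (long id : tmx.getAllThreadIds()) {
                ThreadInfo info = tmx.getThreadInfo(id);
                if (info == null) continue;  // thread terminated meanwhile
                System.out.printf("%s cpu=%dns blocked=%dms%n",
                        info.getThreadName(),
                        tmx.getThreadCpuTime(id),
                        info.getBlockedTime());
            }
        }
    }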
http://jaxenter.com/high-speed-multi-threaded-virtual-memory-in-java.1-46188.html
High-speed, multi-threaded virtual memory in Java (Page last updated February 2013, Added 2013-03-27, Author Alexander Turner, Publisher jaxenter). Tips:
- Memory-mapped files can map tera- or even petabytes of memory into a process' address space. The process does not need to bother itself about whether the memory is in RAM or on disk; the operating system takes care of that.
- You can access memory-mapped files from Java using the MappedByteBuffer class. The memory accessible this way may be on disk or in memory at any time, depending on the memory pressures on the operating system - so this is a non-deterministic access time solution to large memory requirements; but if you are already using disk mapping, this solution provides a simpler cleaner way to handle such data.
- The maximum size of a single MappedByteBuffer is 2GB (Integer.MAX_VALUE bytes), so to map more than this you need to use multiple mappings (see the sketch after this list).
- If two threads access the same stream or RandomAccessFile at the same time, this can easily cause data corruption and even file conflict errors. Using multiple MappedByteBuffers mapped to the same memory is perfectly okay (though naturally you still need to take care that your data doesn't get corrupted by uncontrolled interleaved writes - but that is a normal Java concurrency issue).
- The memory mapped in by MappedByteBuffer is not subject to garbage collection (the MappedByteBuffer object itself is, but the mapped memory is not part of the heap so doesn't get scanned by the GC). This can be an advantage if you want to process data while avoiding GC of that data; but it can be a drawback: if you are making changes in your memory-mapped area, you have to manage your own custom cleanup of those areas.
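A minimal sketch of the multiple-mappings approach (my illustration, not the article's code; the file name and 1GB chunk size are arbitrary assumptions):

    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class ChunkedMapping {
        static final long CHUNK = 1L << 30;  // 1GB per mapping, under the per-buffer limit

        public static void main(String[] args) throws Exception {
            try (RandomAccessFile file = new RandomAccessFile("huge.dat", "rw");
                 FileChannel channel = file.getChannel()) {
                long size = channel.size();
                int chunks = (int) ((size + CHUNK - 1) / CHUNK);
                MappedByteBuffer[] maps = new MappedByteBuffer[chunks];
                for (int i = 0; i < chunks; i++) {
                    long pos = i * CHUNK;
                    long len = Math.min(CHUNK, size - pos);
                    // each map() call is limited to Integer.MAX_VALUE bytes
                    maps[i] = channel.map(FileChannel.MapMode.READ_WRITE, pos, len);
                }
                // Reading a byte at an absolute offset means picking the
                // right buffer and using the remainder as the index:
                long off = 3L << 30;  // arbitrary example offset
                if (off < size) {
                    byte b = maps[(int) (off / CHUNK)].get((int) (off % CHUNK));
                    System.out.println("byte at " + off + " = " + b);
                }
            }
        }
    }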
http://www.ibm.com/developerworks/library/co-websphere-sterling-performance-tuning/index.html
Integration performance tuning for WebSphere Commerce and Sterling Order Management (Page last updated January 2013, Added 2013-03-27, Author Charek Chen, Vijaya Bashyam, Lei Sun, Publisher IBM). Tips:
- Performance goals are normally defined in non-functional requirements (NFR) or service level agreements (SLA). Latency (time to process a request) and throughput (number of requests processed per second) are basic measures to target.
- Synchronous communication is best for minimal latency; asynchronous communication is best for throughput (e.g. batch processing).
- As more work is added, throughput increases until it reaches a maximum rate. Beyond this point the rate will either effectively level off (sometimes showing minimal increases) or decrease, because processing is bottlenecked on one or more resources. Additional work is (effectively or explicitly) queued, leading to increased latency and/or reduced throughput.
- The main factors that influence the throughput and response time are workload (number of requests per second being pushed into the system), system capacity (total capability of all the resources), and the configuration and tuning parameters (how the resource usages are balanced).
- Key resources include: heap memory allocation (sizes of the different generations); thread pools; connection pools; JMS connection pools; CPU cores (see the pool-sizing sketch after this list).
- Some performance issues encountered: bottlenecking updates on one thread through over-synchronisation; bad or missing indexing (or, more generally, accessing data inefficiently through scanning rather than by index); retrying from too early in a processing sequence because error handling was too simplistic; low cache-hit ratios.
- A single-tiered system limits the maximum capacity and throughput - when you meet a bottleneck, consider clustering each tier with load balancing to achieve higher throughput.
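The workload/latency/capacity relationship in these tips is essentially Little's law (not named in the article): concurrency = throughput x latency. A back-of-the-envelope sketch with invented numbers, sizing a bounded thread pool from a target throughput and a measured latency:

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class PoolSizing {
        public static void main(String[] args) {
            double targetThroughput = 200.0;  // requests/second (assumed target)
            double avgLatencySecs = 0.05;     // 50ms per request (assumed measurement)
            int threads = (int) Math.ceil(targetThroughput * avgLatencySecs);  // = 10

            // A bounded pool with a bounded queue: beyond this capacity,
            // additional work queues up and latency grows, as described above.
            ThreadPoolExecutor pool = new ThreadPoolExecutor(
                    threads, threads, 60, TimeUnit.SECONDS,
                    new ArrayBlockingQueue<Runnable>(1000));
            System.out.println("Sized pool to " + threads + " threads");
            pool.shutdown();
        }
    }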
http://searchsoftwarequality.techtarget.com/feature/Application-performance-manager-Making-the-most-of-limited-resources
Application performance manager: Making the most of limited resources (Page last updated February 2013, Added 2013-03-27, Author Jennifer Lent, Publisher SearchSoftwareQuality). Tips:
- Developers are always surprised when they see how poorly their applications perform under load. Getting developers to conduct scalability tests early in the application lifecycle is an important best practice.
- Monitoring mission critical applications, and investigating alerts that could indicate performance problems or failure, should be your top support priority.
- Narrow down your troubleshooting time: don't let alerts fire too often; specify precise conditions that will trigger an alert.
- If the same error occurs in two different locations within ten minutes of each other, you have a problem and should alert (see the sketch after this list).
- Sketch a diagram of your application, mapping out the points at which each database and legacy system connects. This helps you understand the complexity of the application network and creates awareness of potential performance pitfalls.
- With load testing and simulation in preproduction, you can prevent performance problems from happening in production.
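The two-locations-within-ten-minutes rule is simple enough to sketch. This is a toy illustration (nothing from the article; the class name and the idea of an error "signature" string are invented):

    import java.util.HashMap;
    import java.util.Map;

    public class ErrorCorrelator {
        private static final long WINDOW_MS = 10 * 60 * 1000;  // ten minutes
        // error signature -> location and timestamp of its last sighting
        private final Map<String, String> lastLocation = new HashMap<String, String>();
        private final Map<String, Long> lastSeen = new HashMap<String, Long>();

        // Returns true when the same error has now been seen in two
        // different locations within the ten-minute window.
        public synchronized boolean shouldAlert(String error, String location, long nowMs) {
            String prevLoc = lastLocation.put(error, location);  // put() returns the previous value
            Long prevTime = lastSeen.put(error, nowMs);
            return prevLoc != null && !prevLoc.equals(location)
                    && nowMs - prevTime <= WINDOW_MS;
        }
    }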
http://www.dodgycoder.net/2012/04/scalability-lessons-from-google-youtube.html
Scalability lessons from Google, YouTube, Twitter, Amazon, eBay, Facebook and Instagram (Page last updated April 2012, Added 2013-03-27, Author Dodgy Coder, Publisher Dodgy Coder). Tips:
- Automate everything, including failure recovery. Failures are more likely as the system scales - target self-healing, simple failover, fast reboots and fast recovery approaches.
- Be prepared to throw away and rewrite a working component when you want to scale it up to the next level.
- Sometimes data availability is more important than data consistency.
- By building all applications on a common core platform, an improvement in the core helps all applications.
- The system should be built so that incremental changes can be made to subsystems without impacting the whole system.
- If you have spare CPU capacity, run redundant operations in parallel and take the fastest result.
- Compression is good when you have a lot of spare CPU and limited IO.
- Use approximate data where that suffices (e.g. a page counter).
- If you get spikes of requests, randomise cache expiry times so that cache refreshes don't all happen together (see the sketch after this list).
- A globally completely consistent view may not be needed: views that are slightly out-of-date with respect to each other may suffice - and that is much easier to achieve with good performance.
- Know, or analyse to find, where caches are most effective.
- Split everything into manageable chunks by function and data.
- Connect independent components through event-driven queues and pipelines.
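The expiry-randomisation tip amounts to adding jitter to the time-to-live. A minimal sketch (invented names; the 20% jitter factor is an arbitrary assumption):

    import java.util.concurrent.ThreadLocalRandom;

    public class ExpiryJitter {
        // Returns baseTtlMs plus up to 20% random jitter, so entries
        // cached at the same time do not all expire at the same time.
        static long jitteredTtlMs(long baseTtlMs) {
            long jitter = (long) (baseTtlMs * 0.2 * ThreadLocalRandom.current().nextDouble());
            return baseTtlMs + jitter;
        }

        public static void main(String[] args) {
            // a nominal 60-second TTL becomes something between 60s and 72s
            System.out.println(jitteredTtlMs(60000));
        }
    }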
Jack Shirazi