Java Performance Tuning
Java(TM) - see bottom of page
Tips February 2015
Back to newsletter 171 contents
Performance Impact of an IO-Intensive Application (Page last updated November 2014, Added 2015-02-28, Author Ross Mason, Publisher mulesoft). Tips:
- Application performance degrades when disks get stressed.
- Disabling file system journaling is unlikely to provide any speedup because the journal file is in the same block as the file to be written.
- The bottleneck of an IO-intensive app is normally when the system flushes the dirty pages to disk.
- A 15K RPM disk can reach a bandwidth of 120MB/sec in the best case of sequential access. A typical flush policy interval is 30 seconds. If an application wrote 600MB before triggering a flush, the flush would take 5 seconds (600MB / 120MB/sec, best case) or more, during which the page cache is used exclusively and disk IO is maxed out.
- On Linux the disk queue length is the "Writeback" value in /proc/meminfo or the "avgqu-sz" in sar.
- If an application thread is writing while the kernel is flushing and a GC starts at this time, the GC cannot proceed until the flush completes, because only then can the application thread reach a safepoint. This can cause a long pause (in the GC log you would see a long pause but very little usr or sys time).
- On Linux, various swappiness values (including the likely default) cause IO when the system comes under memory pressure, to make room in the page cache, and the swapped-out pages have to be swapped back in too. Disabling swapping (by setting /proc/sys/vm/swappiness to 0) makes the kernel flush dirty data more frequently, which in turn increases IO pressure and can make IO worse.
- Review the kernel's flushing policy for your specific IO workload: adjust /proc/sys/vm/dirty_expire_centisecs and /proc/sys/vm/dirty_background_ratio.
- Storing frequently accessed files on different devices can help avoid the problem of a single congested device queue.
- An SSD has 6 to 7 times the bandwidth of a spinning disk, but it occasionally needs to compact data, which can have a severe impact on performance.
- A logical volume manager introduces a small latency overhead that is not negligible when SSD is used.
- Always avoid unnecessary disk access.
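The flush arithmetic and the /proc/meminfo check from the tips above can be sketched in Java. This is a minimal illustration, not taken from the original article: the class name `WritebackMonitor` and its method names are hypothetical, and it assumes the usual Linux /proc/meminfo line layout ("Field:   value kB").

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper, for illustration only.
class WritebackMonitor {

    /** Returns the value in kB of the named /proc/meminfo field, or -1 if it is absent. */
    static long parseMeminfoKb(List<String> lines, String field) {
        String prefix = field + ":";
        for (String line : lines) {
            if (line.startsWith(prefix)) {
                // Assumed layout: "Writeback:           128 kB"
                String[] parts = line.substring(prefix.length()).trim().split("\\s+");
                return Long.parseLong(parts[0]);
            }
        }
        return -1;
    }

    /** Best-case time to flush the given amount of data at a sequential bandwidth. */
    static double bestCaseFlushSeconds(double megabytes, double megabytesPerSecond) {
        return megabytes / megabytesPerSecond;
    }

    public static void main(String[] args) throws IOException {
        Path meminfo = Paths.get("/proc/meminfo"); // exists on Linux only
        if (Files.isReadable(meminfo)) {
            List<String> lines = Files.readAllLines(meminfo);
            System.out.println("Dirty kB:     " + parseMeminfoKb(lines, "Dirty"));
            System.out.println("Writeback kB: " + parseMeminfoKb(lines, "Writeback"));
        }
        // The worked figure from the tips: 600MB at 120MB/sec.
        System.out.println("Best-case flush seconds: " + bestCaseFlushSeconds(600, 120));
    }
}
```

Sampling these values periodically while the application is under load gives a rough picture of how much dirty data is queued for the disk when pauses occur.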
Top 10 Most Common Java Performance Problems (Page last updated February 2015, Added 2015-02-28, Author Theodora Fragkouli, Publisher javacodegeeks). Tips:
- Eager fetching produces fewer remote requests but they are more complex so individually slower.
- Lazy fetching produces more remote requests but each is simple and fast.
- In-memory data is faster to access than persisted data, so caching improves performance. Caches must be properly configured so as not to exhaust memory; and hit/miss ratios monitored to ensure they are effective.
- Distributed caching requires cache updates to be propagated but this has overheads which in some situations can make the cache less effective than no cache.
- Pool size is important: too few connections make transactions wait; too many connections can overload the database, giving longer response times. Check how long the application waits to acquire a connection from the pool, and optimise database communications and database structure to minimize communications.
- Basic bad garbage collection (GC) symptoms are CPU spikes and bad application performance. Produce and monitor GC logs, and configure heap size and where necessary schedule JVM restarts.
- Monitor the heap for memory leaks - increasing the heap may be a sufficient solution, otherwise you need to analyse the heap to determine the leak.
- Deadlocks occur when two or more threads are trying to access the same set of resources and they are each waiting for another one to release a resource. Diagnose deadlocks by getting a thread dump.
- Thread gridlocks occur when too much synchronization makes the threads wait in turn for a single resource. Symptoms include slow response times and low CPU utilization. Diagnose gridlocks by getting a thread dump to see where threads are waiting to acquire a resource.
- Check thread pool utilization and CPU utilization and decide whether to increase or decrease pool sizes: too small a pool will make requests wait to be served; too large a pool will overload the CPU, again slowing down overall request service time.
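The caching tip above (bound the cache so it cannot exhaust memory, and monitor the hit/miss ratio) can be illustrated with a small sketch. The class `CountingLruCache` is hypothetical and not from the original article; it assumes a simple least-recently-used eviction policy built on `LinkedHashMap`, and it counts a lookup that returns null as a miss.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded cache with hit/miss counters, for illustration only.
class CountingLruCache<K, V> {
    private final Map<K, V> entries;
    private long hits;
    private long misses;

    CountingLruCache(final int maxEntries) {
        // accessOrder=true keeps the least recently used entry first.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries; // evict to bound memory use
            }
        };
    }

    synchronized V get(K key) {
        V value = entries.get(key);
        if (value == null) {
            misses++;
        } else {
            hits++;
        }
        return value;
    }

    synchronized void put(K key, V value) {
        entries.put(key, value);
    }

    synchronized int size() {
        return entries.size();
    }

    /** Fraction of lookups served from the cache; 0.0 before any lookup. */
    synchronized double hitRatio() {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }
}
```

A persistently low hit ratio suggests the cache is too small or is caching the wrong data, in which case it may cost more than it saves.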
Efficient data transfer through zero copy (Page last updated September 2008, Added 2015-02-28, Author Sathish K. Palaniappan, Pramod B. Nagaraja, Publisher IBM). Tips:
- Each time data traverses the user-kernel boundary it must be copied, which consumes CPU cycles and memory bandwidth.
- A zero-copy request (e.g. FileChannel.transferTo) has the kernel copy data directly from disk to a socket without passing it through the application, improving application performance and reducing context switches.
- In applications that do a great deal of copying of data between channels, a zero-copy technique can offer a significant performance improvement if one of the channels is a FileChannel.
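A minimal sketch of the FileChannel.transferTo approach described above. The class `ZeroCopyTransfer` and method `sendFile` are hypothetical names, and the sketch assumes the target is a blocking channel (for example a `SocketChannel` in blocking mode). Whether the transfer is truly zero-copy depends on the operating system and the channel types involved.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, for illustration only.
class ZeroCopyTransfer {

    /** Sends the whole file to the target channel and returns the bytes transferred. */
    static long sendFile(Path source, WritableByteChannel target) throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ)) {
            long size = in.size();
            long position = 0;
            // transferTo may move fewer bytes than requested, so loop until done.
            while (position < size) {
                long transferred = in.transferTo(position, size - position, target);
                if (transferred == 0) {
                    break; // nothing more could be moved (e.g. the file was truncated)
                }
                position += transferred;
            }
            return position;
        }
    }
}
```

The application never reads the file contents into its own buffers; it only tells the channel which byte range to send.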
Tuning Java Servers (Page last updated November 2014, Added 2015-02-28, Author Srinath Perera, Publisher InfoQ). Tips:
- Use profiling to achieve three goals: improve throughput; improve latency; find and fix leaks.
- Your goal is to achieve maximum throughput while keeping the latencies within acceptable limits.
- Application performance is decided by the scarcest resource in the system (the bottleneck).
- Server performance is limited by one of: CPU; IO; waiting to acquire resources.
- Unix "load average" represents the number of processes waiting in the OS scheduler queue. Load average will increase when any resource is limited (e.g. CPU, network, disk, memory). A load average of more than 4 times the number of cores is too high.
- If performance targets are not being met and the machine has unused capacity, you should: test with an increased concurrent request load; check for locks; increase thread pool size; check the network has additional capacity.
- If performance targets are not being met and the machine is fully loaded, you should: check for other processes loading the machine; CPU profile the application if its CPU usage is high; check if garbage collection is taking more than 10% of application elapsed time; check for IO load; check if the machine is paging.
- If you have tuned a server and still not reached acceptable performance for given concurrent loads, you can either consider scaling horizontally or a redesign.
- Disk access, network access, and locks are common causes of long-running operations causing high latency.
- To reduce IO impact on latency: Avoid unnecessary IO operations; batch IO operations; prefetch data.
- Avoid synchronized blocks and locks as much as possible - concurrent data structures from java.util.concurrent package can help.
- Try to release locks as soon as possible after acquiring them; minimize long-running operations such as IO while holding a lock.
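The load-average rule of thumb above (more than 4 times the number of cores is too high) can be checked from inside the JVM. This is a sketch under stated assumptions: the class `LoadCheck` and method `isLoadTooHigh` are hypothetical names, while `OperatingSystemMXBean.getSystemLoadAverage()` is the standard JDK call, which returns a negative value on platforms where the load average is unavailable.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper, for illustration only.
class LoadCheck {

    /** Rule of thumb from the tips: a load above 4x the core count is too high. */
    static boolean isLoadTooHigh(double loadAverage, int cores) {
        return loadAverage > 4.0 * cores;
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        double load = os.getSystemLoadAverage(); // negative if not available
        int cores = os.getAvailableProcessors();
        if (load < 0) {
            System.out.println("Load average is not available on this platform");
        } else {
            System.out.printf("load=%.2f cores=%d tooHigh=%b%n",
                    load, cores, isLoadTooHigh(load, cores));
        }
    }
}
```

A high load average only says that some resource is limited; the checks in the tips above are still needed to find out which one.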
Improving lock performance in Java (Page last updated January 2015, Added 2015-02-28, Author Vladimir Sor, Publisher plumbr). Tips:
- Lock contention occurs when thread A is trying to enter a synchronized block/method currently executed by thread B; thread A has to wait until thread B exits the synchronized block, thus releasing the lock.
- Synchronization in the JVM is optimized for the uncontended case (a thread entering a synchronized block whose lock is free or already owned by that thread), and this case adds almost no overhead during execution.
- Surround data access and updates with locking, not whole blocks of code. In particular, synchronizing a whole method might hold the lock for too long; instead, synchronize just the block that handles the data.
- Lock only what is necessary, minimize the scope of the locked block.
- Use a lock that is specific to the data being operated on; a separate lock for each data item is more scalable than one lock shared across multiple different structures.
- Concurrent data structures (like ConcurrentHashMap), which are designed to minimize or avoid locks, are a useful way to reduce locking in your own code.
- Only expose access to locks for the code that needs that access.
- Atomic operations let you avoid the need for a lock.
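As an illustration of the last few tips, the sketch below counts events per key using a ConcurrentHashMap and atomic counters instead of a synchronized method. The class `RequestStats` and its methods are hypothetical and not from the original article; it is one possible way to apply the advice, assuming that per-key counts and a running total are all that needs protecting.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical per-path request counter, for illustration only.
class RequestStats {
    private final ConcurrentHashMap<String, AtomicLong> countsByPath = new ConcurrentHashMap<>();
    private final AtomicLong total = new AtomicLong();

    /** Records one request without any explicit lock in this class. */
    void record(String path) {
        // computeIfAbsent is atomic per key, so each path gets exactly one counter.
        countsByPath.computeIfAbsent(path, p -> new AtomicLong()).incrementAndGet();
        // An atomic increment replaces a synchronized "total++".
        total.incrementAndGet();
    }

    long count(String path) {
        AtomicLong counter = countsByPath.get(path);
        return counter == null ? 0 : counter.get();
    }

    long total() {
        return total.get();
    }
}
```

Threads recording different paths do not wait on one shared monitor, which is the scalability benefit the tips describe. Note that the per-path count and the total are updated separately, so a reader may briefly see one without the other.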
BigList: a Scalable High-Performance List for Java (Page last updated November 2014, Added 2015-02-28, Author Thomas Mauch, Publisher DZone). Tips:
- A memory efficient data structure should: minimize the overhead of the structure itself; store primitives efficiently; avoid copying large chunks.
- It is not possible to make a copy of a huge collection because of memory limitations; such collections need to efficiently provide views on the underlying data without copying or altering the data.
- If building a segmented block data structure where elements can be added, it is efficient to leave spare capacity in a block to allow inserts to be performed without having to split the block on each insert.
- The JVM is fast at boxing and unboxing primitive values, but the resulting garbage collection overhead can make code that boxes dramatically slower.
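To make the points about storing primitives efficiently and leaving spare capacity concrete, here is a minimal growable list of `int` values. It is not BigList and does not reflect its implementation; the class `IntArrayList` is a hypothetical single-block sketch that only shows the general idea of avoiding boxed `Integer` objects.

```java
import java.util.Arrays;

// Hypothetical primitive-backed list, for illustration only.
class IntArrayList {
    private int[] values = new int[8]; // spare capacity avoids a resize on every add
    private int size;

    void add(int value) {
        if (size == values.length) {
            // Grow geometrically so the cost of copying is amortized across adds.
            values = Arrays.copyOf(values, values.length * 2);
        }
        values[size++] = value;
    }

    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return values[index];
    }

    int size() {
        return size;
    }

    /** Sums the elements without creating any Integer objects. */
    long sum() {
        long total = 0;
        for (int i = 0; i < size; i++) {
            total += values[i];
        }
        return total;
    }
}
```

Compared with an `ArrayList<Integer>`, this stores 4 bytes per element in one array and creates no per-element objects for the garbage collector to track.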
Last Updated: 2018-11-28
Copyright © 2000-2018 Fasterj.com. All Rights Reserved.
All trademarks and registered trademarks appearing on JavaPerformanceTuning.com are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. JavaPerformanceTuning.com is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.