Java Performance Tuning
Tips March 2012
Back to newsletter 136 contents
Optimizations Can Have Side Effects (Page last updated February 2012, Added 2012-03-27, Author Andrew Koenig, Publisher Dr. Dobb's). Tips:
- Optimizations can affect performance and code paths in an unpredictable way - and can unexpectedly cause worse performance for some inputs.
- Memory lookups are affected by where the memory is stored, and any underlying optimizations and padding. If memory is (inadvertently) stored so that some items block concurrent access to others, this will cause degraded performance.
- Because performance tuning involves making the most common operations faster, if you are doing something common, it's usually faster than if you are doing an uncommon operation.
- Doubling the size of the input could produce very different performance, as the code path might encounter inefficiencies that are not present for the lower volume input.
- Make sure you measure your performance for multiple types of possible scales.
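The advice to measure at several scales can be sketched as a loop over doubling input sizes. This is a minimal illustration, not taken from the article: the class name ScaleProbe, its methods, and the sort workload are all hypothetical stand-ins for whatever operation you are tuning.

```java
import java.util.Arrays;

final class ScaleProbe {
    // Input sizes to measure: start, 2*start, 4*start, ...
    static int[] doublingSizes(int start, int steps) {
        int[] sizes = new int[steps];
        for (int i = 0; i < steps; i++) {
            sizes[i] = start << i;
        }
        return sizes;
    }

    // Hypothetical workload: sort a reversed array of n ints and
    // return the smallest element so the work cannot be skipped.
    static long workload(int n) {
        int[] data = new int[n];
        for (int i = 0; i < n; i++) {
            data[i] = n - i;
        }
        Arrays.sort(data);
        return data[0];
    }

    // Crude single-shot timing of the workload at size n.
    static long timeNanos(int n) {
        long start = System.nanoTime();
        workload(n);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        for (int n : doublingSizes(1 << 12, 8)) {
            workload(n); // one warm-up run before timing
            long nanos = timeNanos(n);
            System.out.printf("n=%d  total=%d ns  per-element=%.2f ns%n",
                    n, nanos, (double) nanos / n);
        }
    }
}
```

If the per-element cost jumps between two sizes, that is the scale at which a different code path or memory effect has probably kicked in. Single-shot timings like these are noisy; a benchmark harness such as JMH would give more trustworthy numbers.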
Java Concurrency / Multithreading - Tutorial (Page last updated March 2012, Added 2012-03-27, Author Lars Vogel, Publisher vogella). Tips:
- Concurrency can improve the throughput and performance of a program if some tasks can be performed in parallel. The theoretical maximum speedup from parallelising can be calculated by Amdahl's Law as 1 / (F + ((1 - F) / N)), where F is the fraction of the program which cannot run in parallel and N is the number of processors.
- Concurrency problems are of visibility and access: visibility/read-write - one thread reads shared data which is changed by another thread - what data does the thread read? access/write-write - multiple threads try to write to the same shared memory - what value does the memory have? Problems that can result are: Liveness failure - e.g. deadlocks; Safety failure - e.g. corrupt data.
- The synchronized keyword ensures that only one thread at a time can execute a block of code guarded by a given monitor, and that a thread acquiring the lock sees the effects of all previous modifications that were guarded by the same lock.
- A variable declared as volatile guarantees that any thread which reads the field will see the value most recently written by any thread.
- The operation i++ is not atomic.
- AtomicInteger, AtomicLong, etc provide methods like getAndDecrement(), getAndIncrement() and getAndSet() which are atomic operations.
- The simplest way to avoid concurrency problems is to share only immutable data between threads.
- Creating a new thread has some performance overhead; too many threads can lead to reduced performance as the CPU needs to switch between the threads.
- Non-blocking algorithms, which are usually based on the compare-and-swap operation, are typically faster than blocking algorithms.
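Two of the tips above lend themselves to a short sketch: the Amdahl's Law speedup, 1 / (F + (1 - F) / N), and the use of AtomicInteger in place of a non-atomic i++. The class and method names here (ConcurrencySketch, amdahlSpeedup, countWithAtomic) are hypothetical and chosen only for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class ConcurrencySketch {
    // Amdahl's Law: theoretical maximum speedup on n processors when
    // serialFraction (F) of the work cannot run in parallel.
    static double amdahlSpeedup(double serialFraction, int n) {
        return 1.0 / (serialFraction + (1.0 - serialFraction) / n);
    }

    // Increments one shared counter from several threads and returns
    // the final value. With a plain int and i++ some updates could be
    // lost; getAndIncrement() is an atomic read-modify-write.
    static int countWithAtomic(int threads, int incrementsPerThread) {
        AtomicInteger counter = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counter.getAndIncrement();
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println("speedup with F=0.1, N=8: " + amdahlSpeedup(0.1, 8));
        System.out.println("atomic count: " + countWithAtomic(4, 100_000));
    }
}
```

With F = 1 the speedup stays at 1 however many processors are added, and with F = 0 it equals N, which is why the serial fraction is usually the first thing to attack.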
Dealing with real-time data in UIs (and others) (Page last updated February 2012, Added 2012-03-27, Author Gabriele Carcassi, Publisher java.net). Tips:
- You need to schedule updates to the GUI using a Runnable dispatched onto the UI thread, or you will generate a race condition.
- An update scheduled to run in the UI thread can happen at any time, not under your control, so you need to ensure that the update references the correct data, and that changes reschedule updates to the UI.
- Passing immutable copies to the screen update Runnable is good practice to ensure that you deterministically specify what will be displayed.
- Screen updates faster than a couple of times per second may be precise but are probably counterproductive (unless you are generating moving pictures).
- Queueing tasks too fast for the event thread can cause performance and memory problems.
- When processing asynchronous events, decouple the rate at which they are received from the rate at which they are processed and the rate at which they are dispatched; aggregate where necessary so that a number of events on one side generates a single event on the other.
- Avoid computation on the UI thread so it only needs to do simple screen updates and nothing intensive.
- UI updates above 50 times per second (50 Hz) are completely unnecessary.
- Options for when queued data grows too fast: keep only the last; keep the last N; keep the last N seconds; queue indefinitely; save to disk; send an alert.
- If the target system is not able to handle the rate of data you are pushing to it, you need to skip or aggregate data pushes so that you don't overwhelm it.
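The "keep only the last" option, combined with decoupling the receive rate from the display rate, can be sketched as a small conflating holder. This is an illustration under assumptions, not the article's code: LatestValueConflator is a hypothetical name, and the UI thread is represented by a plain Executor so the sketch runs without a toolkit.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

final class LatestValueConflator<T> {
    private final Executor uiExecutor;   // stands in for the UI thread
    private final Consumer<T> display;   // the simple screen update
    private final AtomicReference<T> latest = new AtomicReference<>();
    private final AtomicBoolean scheduled = new AtomicBoolean();

    LatestValueConflator(Executor uiExecutor, Consumer<T> display) {
        this.uiExecutor = uiExecutor;
        this.display = display;
    }

    // Called from any thread, at any rate. Only one UI task is queued
    // at a time, so the event queue cannot grow without bound.
    void publish(T value) {
        latest.set(value);
        if (scheduled.compareAndSet(false, true)) {
            uiExecutor.execute(this::drain);
        }
    }

    // Runs on the UI executor: shows the newest value, skipping any
    // intermediate values that arrived since the task was queued.
    private void drain() {
        scheduled.set(false);
        T value = latest.getAndSet(null);
        if (value != null) {
            display.accept(value);
        }
    }

    // Demo with a fake UI queue: three fast updates, one screen update.
    static List<String> demo() {
        Queue<Runnable> fakeUiQueue = new ArrayDeque<>();
        List<String> shown = new ArrayList<>();
        LatestValueConflator<String> conflator =
                new LatestValueConflator<String>(fakeUiQueue::add, shown::add);
        conflator.publish("price=1");
        conflator.publish("price=2");
        conflator.publish("price=3");
        while (!fakeUiQueue.isEmpty()) {
            fakeUiQueue.remove().run();
        }
        return shown;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [price=3]
    }
}
```

In a Swing application the executor could be `SwingUtilities::invokeLater`. Values passed to publish should be immutable so that what is displayed is exactly what was published.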
Attila Szegedi on JVM and GC Performance Tuning at Twitter (Page last updated February 2012, Added 2012-03-27, Author Charles Humble, Publisher infoq). Tips:
- Include service discovery capability in the client and enable fast failover when the current server doesn't respond in time.
- Finagle lets you write scalable non-blocking IO clients and servers with failover, load balancing, retries and other capabilities baked right in.
- For most data storage needs, you need to have a sharding strategy.
- Tune the GC by inspecting the GC logs: look at the overall utilization of memory, memory patterns and GC frequencies, and tune using that data.
- Log GC in production; GC logging has low overhead. The GC only logs when it does something: if it logs too frequently then your problem is not the logging but that the GC is doing something too frequently, and when it logs infrequently the cost of writing a few lines to the log is negligible.
- The best way to minimize the frequency of your GCs is to give your JVM as much memory as possible. The frequency of minor GCs is inversely proportional to the size of the new generation; you want to avoid old generation GCs altogether if possible.
- If you have tuned the JVM as well as you could and your GC performance is still unacceptable, you need to go to the application and try to see whether you can ease the pressure on the GC by tuning the code for reduced GCs.
- Code level improvements that can reduce GC pressure include: more memory efficient representations of data; choosing the right data types to represent the data; minimizing garbage generation; (desperate measure) using larger chunks of memory either on-heap or off-heap.
- Weak references are eagerly cleared by the garbage collector when the referent is no longer strongly reachable, so they don't normally contribute to memory pressure.
- Soft references contribute to memory pressure but throughput collectors clear them all at once when memory fills up while CMS gradually clears them, so while you do get this memory sensitive gradual eviction of soft reference data, you also get increased unpredictability of your garbage collectors and that's not really what you want with CMS.
- The best thing you can do to improve the concurrent performance of programs is to have completely isolated threads that never need to communicate. If you need to access shared data but most of it is immutable, you're again fine because multiple threads can just access them without synchronization.
- When you actually identify performance bottlenecks with the synchronized keyword, then it makes sense to try to think about whether you can apply techniques like removing synchronized blocks and using volatile and java.util.concurrent atomic constructs.
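GC logs remain the primary data source recommended above. As an in-process complement (a swapped-in technique, not something the interview describes), collection counts and accumulated collection times can also be read from the standard java.lang.management API. The class name GcStats and its methods are hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcStats {
    // One line per collector: its name, how many collections it has
    // run, and the accumulated collection time in milliseconds.
    // A value of -1 means the JVM does not report that figure.
    static String snapshot() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": collections=").append(gc.getCollectionCount())
              .append(", totalMillis=").append(gc.getCollectionTime())
              .append('\n');
        }
        return sb.toString();
    }

    // Total collections across all collectors, ignoring undefined (-1) counts.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionCount());
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.print(snapshot());
        System.out.println("total collections so far: " + totalCollections());
    }
}
```

Sampling these figures periodically gives a rough view of GC frequency per collector, which can then be checked against the detail in the GC logs.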
Fun with my-Channels Nirvana and Azul Zing (Page last updated March 2012, Added 2012-03-27, Author Martin Thompson, Publisher Mechanical Sympathy). Tips:
- Always work on the most limiting factor, because when it is removed the list below it can change radically.
- For inter-process communication on the same machine, you can use shared memory via memory-mapped files in Java and this will be faster than TCP loopback.
- Avoid locks because they cause context switches, which cause pauses and jitter.
- For tightest latency, don't have more threads than cores.
- Use affinity of threads to cores and sockets to minimize context switching.
- Long call stacks cause longer GC pauses as the garbage collector has to walk the stacks.
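The shared-memory tip can be sketched with FileChannel.map. In this minimal illustration two mappings of the same file stand in for two processes on one machine; the class name SharedMemorySketch, the region size and the temporary file are assumptions made for the demo.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class SharedMemorySketch {
    static final int REGION_BYTES = 64; // assumed size of the shared region

    // Maps the first REGION_BYTES of the file read-write. The mapping
    // stays valid after the channel that created it is closed.
    static MappedByteBuffer map(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            return channel.map(FileChannel.MapMode.READ_WRITE, 0, REGION_BYTES);
        }
    }

    // Writes a value through one mapping and reads it back through a
    // second mapping of the same file, as a second process would.
    static long demo() {
        try {
            Path file = Files.createTempFile("ipc-sketch", ".bin");
            file.toFile().deleteOnExit();
            MappedByteBuffer writer = map(file);
            MappedByteBuffer reader = map(file);
            writer.putLong(0, 42L);
            return reader.getLong(0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read through second mapping: " + demo());
    }
}
```

Plain puts and gets give no ordering guarantees between writer and reader, and the FileChannel documentation leaves some propagation behaviour unspecified, so a real exchange needs an agreed layout and a signalling protocol on top of this.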
Ultimate Sets and Maps for Java, Part I (Page last updated December 2011, Added 2012-03-27, Author Ilya Katsov, Publisher highlyscalable). Tips:
- Open address hashing is usually more memory efficient than hashing with chained nodes; libraries that use open address hashing include fastutil, Trove, and Colt.
- With open address hashing, the greater the load factor, the worse the performance.
- The performance of a contains() search operation on an open address table may be improved by additional reordering of elements during insertion; one possible approach is the Brent algorithm.
- Open address hashing shows bad worst case search performance; Cuckoo Hashing offers a trade-off of worse memory consumption (compared to open address hashing) but more consistent performance of lookups.
- For small sets an array backed set with binary search provides good performance and an excellent memory footprint.
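The array-backed set with binary search mentioned in the last tip can be sketched as follows. This is a minimal immutable version for int values, with a hypothetical class name; it is not the implementation from the article.

```java
import java.util.Arrays;

final class SortedIntArraySet {
    private final int[] elements; // sorted, no duplicates

    SortedIntArraySet(int... values) {
        int[] sorted = values.clone();
        Arrays.sort(sorted);
        // Compact duplicates in place; 'unique' counts the kept elements.
        int unique = 0;
        for (int value : sorted) {
            if (unique == 0 || sorted[unique - 1] != value) {
                sorted[unique++] = value;
            }
        }
        elements = Arrays.copyOf(sorted, unique);
    }

    // O(log n) lookup with no per-element objects and no empty slots.
    boolean contains(int value) {
        return Arrays.binarySearch(elements, value) >= 0;
    }

    int size() {
        return elements.length;
    }

    public static void main(String[] args) {
        SortedIntArraySet set = new SortedIntArraySet(5, 1, 3, 3, 1);
        System.out.println(set.size());      // prints 3
        System.out.println(set.contains(3)); // prints true
        System.out.println(set.contains(4)); // prints false
    }
}
```

The memory footprint is one int per element plus the array header, which is where the excellent footprint comes from. Insertion into a mutable variant would cost O(n) for the array shift, which is why this layout suits small or mostly-read sets.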
Last Updated: 2017-10-01
Copyright © 2000-2017 Fasterj.com. All Rights Reserved.
All trademarks and registered trademarks appearing on JavaPerformanceTuning.com are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. JavaPerformanceTuning.com is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.
RSS Feed: http://www.JavaPerformanceTuning.com/newsletters.rss