Java Performance Tuning
Tips July 2014
The Beginner's Guide to Mobile Performance Testing (Page last updated May 2014, Added 2014-07-18, Author Tim Hinds, Publisher neotys). Tips:
- Most mobile users expect sites to load at least as fast as on their desktop
- Check the performance and behavior of the mobile application under conditions such as heavy user load, peak usage times, low battery, bad network coverage, low available memory, simultaneous access to application servers by several users.
- Mobiles can suffer from poor network conditions compared to broadband and wifi, including packet loss and higher latency. Network conditions, device type, packet loss, latency and bandwidth all matter.
- Test your mobile application under limited bandwidth to see if your users will get the best user experience and ensure your servers won't have a problem under load.
- Browser-based mobile apps can't be accessed without an internet connection, don't have access to all the device features, and they tend to appear slower and more sluggish to end users compared to native apps.
- To performance test effectively, you need to replicate the server load generated by users on mobile browsers.
- You should test Web page rendering on target mobile devices, because device characteristics (like CPU, memory or browser rendering engine) can affect end-user performance perception.
- Since the code on each mobile device type is in fact different, testing a native application requires performance testing on each target platform as well as maintaining a set of test devices in each selected platform.
- To test the performance of your mobile app on a specific device you need to use a real device. However, to understand the performance of your application under load with various network conditions, you will probably be best served by an emulation tool.
- Testing mobile app performance requires factoring in the geographic locations of the target devices - they can be distant from your data center, and crossing mobile boundaries as they operate.
Java Performance: The Definitive Guide - Free Sampler (Page last updated April 2014, Added 2014-07-18, Author Scott Oaks, Publisher O'Reilly). Tips:
- Good algorithms are the most important thing for fast performance.
- Simply pruning code can make a significant contribution to performance: it is generally true that a small well-written program will run faster than a large well-written program.
- Performance test all features regularly: new features can impact the performance of older features.
- Avoid known bad coding patterns, e.g. you should guard a log line with an if(log is loglevel) statement to avoid constructing the arguments to a log call that would subsequently be discarded.
- You need to measure and analyse CPU usage, I/O latencies and throughput of all parts of the system before you can determine which component is causing the performance bottleneck.
- The performance test harness can give misleading results, so its results need validating in another way.
- The database communication is often the bottleneck.
- Tuning one part of a system to make it faster may result in more load to another part of the system that is already under load, causing the overall system to run slower.
- Optimize code by profiling and focusing on the operations that take the most time.
- The simplest explanation for a performance issue is the most likely cause - don't look for obscure causes unless you have ruled out simpler causes.
- Optimize code paths for the most common cases.
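The log-guard tip above can be sketched as follows, using java.util.logging. This is a minimal illustration rather than code from the book: the class name GuardedLogging, the counter summaryCalls and the helper expensiveSummary() are hypothetical names introduced only to make the cost of argument construction visible.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class GuardedLogging {
    private static final Logger LOG = Logger.getLogger(GuardedLogging.class.getName());

    // Hypothetical counter: how many times the costly argument was built.
    static int summaryCalls = 0;

    // Hypothetical stand-in for an expensive log argument.
    static String expensiveSummary() {
        summaryCalls++;
        return "summary";
    }

    static void unguarded() {
        // The string is concatenated even though FINE is disabled.
        LOG.fine("state: " + expensiveSummary());
    }

    static void guarded() {
        // The level check skips argument construction when FINE is disabled.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("state: " + expensiveSummary());
        }
    }

    public static void main(String[] args) {
        LOG.setLevel(Level.INFO); // FINE messages will be discarded
        unguarded();              // builds the argument anyway
        guarded();                // does not build the argument
        System.out.println("expensiveSummary calls: " + summaryCalls);
    }
}
```

With the level set to INFO, only the unguarded call pays for building the message. On Java 8 and later, the Supplier overload (LOG.fine(() -> "state: " + expensiveSummary())) defers construction in a similar way.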
High-speed, multi-threaded virtual memory in Java (Page last updated February 2013, Added 2014-07-18, Author Alexander Turner, Publisher jaxenter). Tips:
- Combine memory and disk with memory-mapped files.
- On a 64-bit JVM you can map petabytes of memory into non-heap addressable space, and let the OS worry about whether the memory is in RAM or on disk at any time.
- MappedByteBuffer instances look like normal ByteBuffers, but their memory is virtual: at any moment it might be on disk or in RAM; the OS manages that transparently to the application.
- Normal IO is a threading nightmare: two threads cannot access the same stream or RandomAccessFile at the same time without causing chaos. You can thread using non-blocking IO, but this is complex and requires extensive code changes. Multi-threaded IO using a memory mapped file is straightforward.
- If multi-threading with MappedByteBuffer, use separate instances in each thread backed onto the same memory to avoid issues with position state.
- When using memory mapped files, your performance is at in-memory speeds as long as the memory mapped segments you are working on fit in memory. Once you go outside this area the OS needs to page out and in the pages you are addressing, so there is IO overhead. This is still highly efficient if you need to address more memory than you have available in RAM, but you should be aware of the tradeoff.
- When using memory mapped files, the OS will try to keep memory pages which have been recently accessed in RAM, thus providing a RAM cache without any engineering.
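The tip about giving each thread its own buffer instance over the same mapped memory can be sketched like this. It is a minimal example under stated assumptions, not the article's code: the class name MappedSlices, the constant SLOTS and the one-long-per-thread layout are hypothetical, and a temporary file stands in for a real data file.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

public class MappedSlices {
    static final int SLOTS = 4; // hypothetical layout: one long per worker thread

    static long[] run() throws IOException, InterruptedException {
        Path file = Files.createTempFile("mapped", ".bin");
        file.toFile().deleteOnExit();
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Mapping in READ_WRITE mode grows the file to the mapped size.
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_WRITE, 0, SLOTS * Long.BYTES);
            Thread[] workers = new Thread[SLOTS];
            for (int i = 0; i < SLOTS; i++) {
                final int slot = i;
                // duplicate() shares the mapped memory but has its own position and limit.
                final ByteBuffer view = map.duplicate();
                workers[i] = new Thread(() -> {
                    view.position(slot * Long.BYTES);
                    view.putLong(slot * 100L); // each thread writes only its own slot
                });
                workers[i].start();
            }
            for (Thread t : workers) {
                t.join(); // join makes the workers' writes visible to this thread
            }
            long[] result = new long[SLOTS];
            for (int i = 0; i < SLOTS; i++) {
                result[i] = map.getLong(i * Long.BYTES); // absolute read, no position change
            }
            return result;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(Arrays.toString(run()));
    }
}
```

Because every thread works through its own duplicate, no thread disturbs another's position. The threads still have to coordinate if they touch the same bytes; here each one writes a disjoint slot.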
Lock Free Queue Evolution II: Over The Edge (Page last updated May 2014, Added 2014-07-18, Author Nitsan Wakart, Publisher SkillsMatter). Tips:
- Most applications have bursts of input activity rather than steady input - so you should test with bursts of input to get realistic performance profiles.
- You need to be aware if your benchmark is hyperthreading, running on cores on the same socket, or running on cores across different sockets, as the performance can be very different.
- ConcurrentLinkedQueue is a non-locking multi-producer/multi-consumer queue which has a lower overhead than an ArrayBlockingQueue.
- An enhancement of multithreaded ring buffer queueing is to ensure the index is monotonically increasing, so a read miss by the consumer doesn't matter because it knows that the producer index it has is a minimum index (Martin Thompson's addition).
- The FastFlow enhancement of ring buffer queueing is to use null/non-null entries in the elements as signifying entries the producer can write/the consumer can read.
- A bitwise & is faster than a modulo when the divisor is a power of two, e.g. X%4 is slower than X&(4-1), and both give the same result for non-negative X. So using capacities that are powers of two can help if you need to frequently calculate offsets (like with ring buffers). The modulo instruction is an expensive instruction on many CPUs, not just for its time but also because it isn't parallelised on some CPUs.
- JOL (Java Object Layout) tool shows the in-memory structured layout of a particular class. This is useful for optimizations targeted at cache line hits and misses (avoiding false sharing). Two instance variables in the same class that will be written to by multiple threads can cause cache misses because both variables are in the same cache line, so each cache has to invalidate its cache line every time the thread on the other cache writes. Moving the two variables into different cache lines (by padding the class) allows both variables to be written to by multiple threads with fewer cache invalidations (because each thread writes to a different cache line).
- Declaring an array volatile does not make the elements of the array volatile; only the array reference is.
- The LMAX disruptor has better throughput for steady input, but a FastFlow queue implementation is faster for bursty inputs.
- System.nanoTime called from multiple threads causes a bottleneck.
- Java 7 update 45 is the first Java version to correctly implement memory barriers. You're strongly recommended to be on this or a later version if using concurrency (and note: you are).
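Several of the ring buffer tips above (null/non-null slots as the full/empty signal, power-of-two capacity with a bitwise mask, and volatile access to array elements) can be combined in one small sketch. This is a simplified single-producer/single-consumer illustration, not the talk's or FastFlow's actual implementation: the class name NullMarkedRing is hypothetical, AtomicReferenceArray is used here to get volatile element semantics, and the cache-line padding discussed above is left out.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

public class NullMarkedRing<E> {
    // AtomicReferenceArray gives volatile semantics per element,
    // which a volatile E[] field would not.
    private final AtomicReferenceArray<E> slots;
    private final int mask;
    private long head; // only ever touched by the single consumer thread
    private long tail; // only ever touched by the single producer thread

    public NullMarkedRing(int capacity) {
        if (capacity < 1 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        slots = new AtomicReferenceArray<>(capacity);
        mask = capacity - 1;
    }

    // Producer side: a non-null slot means the consumer has not read it yet.
    public boolean offer(E e) {
        if (e == null) {
            throw new NullPointerException("null marks an empty slot");
        }
        int i = (int) (tail & mask); // same as tail % capacity for a power-of-two capacity
        if (slots.get(i) != null) {
            return false; // queue is full
        }
        slots.set(i, e);
        tail++;
        return true;
    }

    // Consumer side: a null slot means the producer has not written it yet.
    public E poll() {
        int i = (int) (head & mask);
        E e = slots.get(i);
        if (e == null) {
            return null; // queue is empty
        }
        slots.set(i, null); // hand the slot back to the producer
        head++;
        return e;
    }
}
```

Neither side reads the other's index, so the only shared state is the slot contents. The sketch is only safe with exactly one producer thread and one consumer thread.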
Java 8 Friday: 10 Subtle Mistakes When Using the Streams API (Page last updated June 2014, Added 2014-07-18, Author Lukas Eder, Publisher Java, SQL and jOOQ). Tips:
- You can consume streams only once. Reusing them will produce an exception.
- Use Stream.limit() to limit the number of elements processed.
- An infinite stream with a set of filters that fails to correctly filter can run forever without producing anything.
- If you use Stream.parallel() you can use up all your CPUs - possibly incorrectly.
- The order of filtering on a Stream is significant.
- Files.walk() is not efficient if you want to filter out subtrees from the walk.
- Stream.parallel() does not prevent deadlocks if your downstream consumers lock; you need to code to avoid deadlocks as usual.
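Three of the stream tips above (single use, limit() on infinite streams, and the significance of operation order) can be seen in a short sketch. The class and method names (StreamPitfalls, reuseFails, firstEvens, evensAmongFirst) are hypothetical and chosen only for this illustration.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StreamPitfalls {
    // A stream can be consumed once; a second terminal operation throws.
    static boolean reuseFails() {
        Stream<String> s = Stream.of("a", "b");
        s.forEach(x -> { });
        try {
            s.forEach(x -> { });
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    // filter then limit: keeps pulling from the infinite stream until n evens are found.
    static List<Integer> firstEvens(int n) {
        return Stream.iterate(0, i -> i + 1)
                .filter(i -> i % 2 == 0)
                .limit(n)
                .collect(Collectors.toList());
    }

    // limit then filter: looks only at the first n elements, then keeps the evens.
    static List<Integer> evensAmongFirst(int n) {
        return Stream.iterate(0, i -> i + 1)
                .limit(n)
                .filter(i -> i % 2 == 0)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(reuseFails());
        System.out.println(firstEvens(3));
        System.out.println(evensAmongFirst(3));
    }
}
```

For n = 3 the first pipeline yields 0, 2, 4 and the second yields 0, 2. If the filter in firstEvens could never match, limit() would never be satisfied and the pipeline would run forever.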
9 Principles of High Performance Programs (Page last updated May 2014, Added 2014-07-18, Author Todd Hoff, Publisher highscalability). Tips:
- Work with the system CPU caches: modern computers have idle CPUs while they wait for data to come from main memory.
- Avoid context switching as much as possible - consider how to avoid your hot threads being evicted from the core they're on.
- Batching can help improve performance significantly. Adapt batch sizes to available resources.
- Magic numbers don't scale: they don't adapt to changing circumstances.
- Allocate memory up front.
- Avoid copying.
- Try to maintain predictable memory usage.
- Complete all work queued up for a thread before it goes back to sleep.
- Only signal a worker thread to wake up when its queue becomes non-empty. Any other signal is redundant and a waste of resources.
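The batching and "finish queued work before sleeping" tips above can be sketched with a standard BlockingQueue. This is one possible shape, not the article's code: the class name BatchDrain, the method nextBatch and the maxBatch parameter are hypothetical, with maxBatch passed in by the caller rather than hard-coded so it can be adapted to available resources.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class BatchDrain {
    // Blocks (sleeps) only when the queue is empty, then takes everything
    // already queued, up to maxBatch items, in a single pass.
    static List<Integer> nextBatch(BlockingQueue<Integer> queue, int maxBatch)
            throws InterruptedException {
        if (maxBatch < 1) {
            throw new IllegalArgumentException("maxBatch must be at least 1");
        }
        List<Integer> batch = new ArrayList<>(maxBatch);
        batch.add(queue.take());            // wait for the first item
        queue.drainTo(batch, maxBatch - 1); // grab the rest without blocking
        return batch;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        for (int i = 1; i <= 5; i++) {
            queue.add(i);
        }
        System.out.println(nextBatch(queue, 3));
        System.out.println(nextBatch(queue, 3));
    }
}
```

A worker looping on nextBatch goes back to waiting only once the queue is empty, so a producer's insert into an empty queue is the only event that needs to wake it.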
Last Updated: 2020-03-30
Copyright © 2000-2020 Fasterj.com. All Rights Reserved.
All trademarks and registered trademarks appearing on JavaPerformanceTuning.com are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. JavaPerformanceTuning.com is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.