Java Performance Tuning
Tips March 2014
Back to newsletter 160 contents
Using SharedHashMap (Page last updated March 2014, Added 2014-03-31, Author Jack Shirazi, Peter Lawrey, Publisher Fasterj.com). Tips:
- SharedHashMap is a high performance persisted off-heap hash map, shareable across processes.
- ProcessInstanceLimiter allows you to limit the number of processes running for a class of processes.
- Off-heap memory storage is useful for low latency applications, to avoid GC overheads on that data.
- Low latency applications benefit from "no-copy" implementations, where you avoid copying objects as much as possible, thus minimizing or even avoiding GC.
- Techniques to support "no-copy" implementations include: methods which atomically assign references to existing objects or create the object if absent; using primitive data types wherever possible; writing directly to shared objects; using off-heap memory.
- Externalizable is usually more efficient than Serializable, and can be massively more efficient.
- SharedHashMap is thread-safe across threads spanning multiple processes.
- SharedHashMap supports concurrency control mechanisms for data objects referenced by threads spanning multiple processes.
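The off-heap, primitive-typed and "no-copy" techniques above can be illustrated with plain JDK classes. The sketch below is not SharedHashMap's API: it is in-process only, with no persistence and no cross-process locking, and it is not thread-safe. It assumes a hypothetical fixed-size record (a `long` id and a `double` price) stored in a direct `ByteBuffer`, so the data lives outside the Java heap and fields are read and written in place without creating objects. The class and method names are illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical off-heap store of fixed-size (long id, double price) records. Not thread-safe. */
public final class OffHeapRecords {
    private static final int ID_OFFSET = 0;     // long, 8 bytes
    private static final int PRICE_OFFSET = 8;  // double, 8 bytes
    private static final int RECORD_SIZE = 16;

    private final ByteBuffer buffer;
    private final int capacity;

    public OffHeapRecords(int capacity) {
        this.capacity = capacity;
        // allocateDirect reserves memory outside the Java heap, so the GC does not copy these bytes
        this.buffer = ByteBuffer.allocateDirect(capacity * RECORD_SIZE).order(ByteOrder.nativeOrder());
    }

    private int base(int index) {
        if (index < 0 || index >= capacity) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return index * RECORD_SIZE;
    }

    /** Writes primitives directly into the shared buffer: no wrapper objects, no intermediate copy. */
    public void set(int index, long id, double price) {
        int b = base(index);
        buffer.putLong(b + ID_OFFSET, id);
        buffer.putDouble(b + PRICE_OFFSET, price);
    }

    public long id(int index) {
        return buffer.getLong(base(index) + ID_OFFSET);
    }

    public double price(int index) {
        return buffer.getDouble(base(index) + PRICE_OFFSET);
    }

    /** Updates one field in place instead of reading an object, copying it, and writing it back. */
    public void addToPrice(int index, double delta) {
        int p = base(index) + PRICE_OFFSET;
        buffer.putDouble(p, buffer.getDouble(p) + delta);
    }

    public static void main(String[] args) {
        OffHeapRecords records = new OffHeapRecords(1_000);
        records.set(42, 7L, 100.5);
        records.addToPrice(42, 0.25);
        System.out.println(records.id(42) + " " + records.price(42)); // prints "7 100.75"
    }
}
```

SharedHashMap additionally maps such memory to a file so it can persist and be shared between processes; the flyweight-style access pattern shown here is only the in-process core of the idea.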
11 Best Practices for Low Latency Systems (Page last updated January 2014, Added 2014-03-31, Author Benjamin Darfler, Publisher CodeDependents). Tips:
- Scripting languages are not appropriate for low latency. You need something that gets compiled (JIT compilation is fine) and has a strong memory model to enable lock free programming. Obviously Java satisfies these requirements.
- For low latency, ensure that all data is in memory - I/O will kill latency. Typically, use in-memory data structures with persistent logging to allow the state to be rebuilt after a crash.
- For low latency keep data and processing colocated - network latency is an overhead you want to avoid if at all possible.
- Low latency requires always having free resources to process the request, so the system should normally be underutilized.
- Context switches impact latency. Limit the thread count to the number of cores available to the application and pin each thread to its own core.
- Keep your reads sequential: All forms of storage perform significantly better when used sequentially (prefetching is a feature of most OSs and hardware).
- For low latency, following pointers through linked lists or arrays of objects should be avoided at all costs, as this requires random access to memory, which is much less efficient than sequential access. Arrays of primitive data types or structs are far better.
- Batch writes by having a continuous writing thread which writes all data that is passed to its buffer. Threads producing data should not pause to hand it over; they should pass it to the buffer immediately.
- Try to work with the caches - fitting data into caches is ideal. Cache-oblivious algorithms (ones that work with the cache regardless of its size) work by recursively breaking down the data until it fits in cache and then doing any necessary processing.
- Non-blocking and wait-free data structures and algorithms are friends to low latency processing. Lock-free is probably good enough if wait-free is too difficult.
- Any processing (especially I/O) that is not absolutely necessary for the response should be done outside the critical path.
- Parallelize as much as possible.
- If there is a garbage collector, work with it to ensure pauses are absent or within acceptable parameters (this may require highly constrained object management, off-heap processing, etc.).
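The batching-writer tip can be sketched with `java.util.concurrent`. This is a minimal sketch, assuming a hypothetical `BatchingWriter` class whose `sink` callback stands in for the real write (for example one `write` plus `flush` per batch). Producers call `submit` and return immediately; a single writer thread takes the first waiting record, then uses `drainTo` to grab everything else already queued, and hands the whole batch to the sink in one call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical single writer thread that batches whatever producers have enqueued. */
public final class BatchingWriter implements AutoCloseable {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Consumer<List<String>> sink; // must not keep a reference to the list it is given
    private final int maxBatch;                // must be at least 1
    private final Thread writer;
    private volatile boolean running = true;

    public BatchingWriter(Consumer<List<String>> sink, int maxBatch) {
        this.sink = sink;
        this.maxBatch = maxBatch;
        this.writer = new Thread(this::loop, "batching-writer");
        this.writer.start();
    }

    /** Producers hand data off immediately and never wait for I/O. */
    public void submit(String record) {
        queue.add(record);
    }

    private void loop() {
        List<String> batch = new ArrayList<>(maxBatch); // reused to avoid per-batch garbage
        try {
            // keep draining after close() until everything submitted earlier has been written
            while (running || !queue.isEmpty()) {
                String first = queue.poll(10, TimeUnit.MILLISECONDS);
                if (first == null) {
                    continue;
                }
                batch.add(first);
                queue.drainTo(batch, maxBatch - 1); // take whatever else is already waiting
                sink.accept(batch);                 // one write for the whole batch
                batch.clear();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Stops accepting work once the queue is drained and waits for the writer to finish. */
    @Override
    public void close() {
        running = false;
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The batch size adapts to load: under light load each write carries one record, while under heavy load a single write absorbs many records, which is where the saving comes from. Records submitted after `close()` has been called are not guaranteed to be written.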
Java Marshalling Performance (Page last updated February 2014, Added 2014-03-31, Author Todd Montgomery, Publisher InfoQ). Tips:
- When marshalling: aim to scan through primitive data type arrays; consider boundary alignments; control garbage, recycle if necessary; compute the data layout before runtime; work with the compiler and OS and hardware; be very careful about the code you generate, keeping it minimal and efficient.
- Having data together in cache lines can make the data access an order of magnitude faster.
- Order fields in their structure (probably naturally ordered by declaration order) according to the order in which they will be accessed.
- Use a (direct) buffer correctly sized and sequentially read/write the elements in the buffer using data primitives and no object creation for highly efficient marshalling.
- FIX/SBE (Simple Binary Encoding) marshalling can be 50 times faster than Google Protocol Buffers.
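The marshalling tips (layout computed ahead of time, fields ordered for alignment, a correctly sized direct buffer, primitives only, no object creation) can be combined into a small flyweight codec. This is a sketch in the spirit of SBE, not SBE's generated code or API: the `OrderCodec` class, its fields and its offsets are hypothetical, and little-endian byte order is an assumption chosen for the example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical fixed-layout order message; offsets are constants, largest fields first. */
public final class OrderCodec {
    public static final int ORDER_ID_OFFSET = 0;  // long, 8 bytes
    public static final int PRICE_OFFSET = 8;     // long price in ticks, 8 bytes
    public static final int QUANTITY_OFFSET = 16; // int, 4 bytes
    public static final int SIDE_OFFSET = 20;     // byte, 1 byte
    public static final int BLOCK_LENGTH = 24;    // padded so each message starts on an 8-byte boundary

    private ByteBuffer buffer;
    private int offset;

    /** Points this reusable flyweight at a message slot; nothing is allocated. */
    public OrderCodec wrap(ByteBuffer buffer, int offset) {
        this.buffer = buffer;
        this.offset = offset;
        return this;
    }

    public OrderCodec orderId(long v) { buffer.putLong(offset + ORDER_ID_OFFSET, v); return this; }
    public OrderCodec price(long v) { buffer.putLong(offset + PRICE_OFFSET, v); return this; }
    public OrderCodec quantity(int v) { buffer.putInt(offset + QUANTITY_OFFSET, v); return this; }
    public OrderCodec side(byte v) { buffer.put(offset + SIDE_OFFSET, v); return this; }

    public long orderId() { return buffer.getLong(offset + ORDER_ID_OFFSET); }
    public long price() { return buffer.getLong(offset + PRICE_OFFSET); }
    public int quantity() { return buffer.getInt(offset + QUANTITY_OFFSET); }
    public byte side() { return buffer.get(offset + SIDE_OFFSET); }

    public static void main(String[] args) {
        // One direct buffer sized for exactly 1024 messages
        ByteBuffer buf = ByteBuffer.allocateDirect(1024 * BLOCK_LENGTH).order(ByteOrder.LITTLE_ENDIAN);
        OrderCodec codec = new OrderCodec(); // a single instance reused for every message
        for (int i = 0; i < 1024; i++) {     // sequential writes through the buffer
            codec.wrap(buf, i * BLOCK_LENGTH).orderId(i).price(10_000L + i).quantity(100).side((byte) 'B');
        }
        codec.wrap(buf, 5 * BLOCK_LENGTH);
        System.out.println(codec.orderId() + " " + codec.price() + " " + codec.quantity()); // prints "5 10005 100"
    }
}
```

Because every field sits at a fixed offset, encoding and decoding are just absolute primitive reads and writes, and the whole loop produces no garbage.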
Top 10 - Performance Folklore (Page last updated January 2014, Added 2014-03-31, Author Martin Thompson, Publisher InfoQ). Tips:
- Sequential disk and memory access is much faster than random access.
- Working off-heap allows you to avoid putting pressure on the garbage collector.
- Decoupled classes and components are more efficient. Keep your code cohesive.
- Choose your data structures for the use required.
- Making the procedure parallel is not necessarily faster - you need to use more threads, locks, pools, etc. Make sure you measure the difference. Single threaded solutions can be faster, or at least fast enough.
- If going parallel, use message passing and pipelining to keep the implementation simple and efficient.
- Logging is usually slow. Efficient logging would be asynchronous, and log in binary.
- Beware that parsing libraries can be very inefficient.
- Performance test your application.
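The message-passing and pipelining tip can be sketched with two threads connected by a bounded queue. This is a minimal sketch, assuming a hypothetical `Pipeline.run` method that parses numeric strings in one stage and sums them in another. Each stage owns its own state and they communicate only through the queue, so no locks are written by hand. It assumes well-formed input that never contains `Long.MIN_VALUE`, which is used here as an end-of-stream marker, and it boxes values into `Long` for simplicity, which a garbage-sensitive version would avoid.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical two-stage pipeline: parse, then aggregate, connected only by a queue. */
public final class Pipeline {
    private static final long END = Long.MIN_VALUE; // end-of-stream marker

    public static long run(String[] lines) {
        BlockingQueue<Long> parsed = new ArrayBlockingQueue<>(1024); // bounded, so a fast parser waits for a slow aggregator
        long[] total = new long[1]; // written only by the aggregator thread

        Thread parser = new Thread(() -> {
            try {
                for (String line : lines) {
                    parsed.put(Long.parseLong(line.trim()));
                }
                parsed.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "parser");

        Thread aggregator = new Thread(() -> {
            try {
                long v;
                while ((v = parsed.take()) != END) {
                    total[0] += v;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "aggregator");

        parser.start();
        aggregator.start();
        try {
            parser.join();
            aggregator.join(); // join() makes total[0] visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the pipeline", e);
        }
        return total[0];
    }
}
```

As the tip on parallelism warns, whether this beats a simple single-threaded loop depends on how expensive each stage is; measure both before choosing.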
Performance tuning legacy applications (Page last updated December 2013, Added 2014-03-31, Author Nikita Salnikov-Tarnovski, Publisher Plumbr). Tips:
- Applications can be optimized forever - you must set performance goals appropriate to the business or you'll waste time and resources tuning beyond the performance needed.
- User response times can be generally categorized as: 0.1 seconds feels like an instantaneous response (normal browsing should be in this range); 1 second is a noticeable delay but allows the user to continue their flow and they still feel in control (searches can fall in this range); 10 seconds makes the user feel at the mercy of the computer, but can be handled (e.g. generating a PDF); over 10 seconds and the user's flow is completely disrupted (so should only be targeted for things the user would expect to reasonably wait for, like end of day report generation).
- Without sensible categorization of performance goals, everything tends to get dumped into the "we want instant response" bucket, but this would be very expensive to achieve, so it's important to categorize correctly.
- A site should be optimized against the "Google Page Speed" and "YSlow" recommendations.
- Measure ideally from the user perspective, but at least for full service times of requests. If profiling tools are unavailable or impractical, a detailed logging trail of requests can help you identify underperforming components.
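The response-time categories and the logging-trail advice can be combined into a small helper that times each request against a budget. This is a sketch under stated assumptions: the `ResponseBudget` class and its enum constant names are hypothetical, the thresholds are the 0.1 s / 1 s / 10 s categories listed above, and `java.util.logging` stands in for whatever logging the application already uses.

```java
import java.time.Duration;
import java.util.function.Supplier;
import java.util.logging.Logger;

/** Hypothetical response-time budgets following the 0.1 s / 1 s / 10 s categories. */
public final class ResponseBudget {
    public enum Target {
        INSTANT(Duration.ofMillis(100)),   // normal browsing
        IN_FLOW(Duration.ofSeconds(1)),    // e.g. searches
        TOLERABLE(Duration.ofSeconds(10)), // e.g. generating a PDF
        BATCH(null);                       // e.g. end of day reports: no interactive limit

        private final Duration limit;

        Target(Duration limit) {
            this.limit = limit;
        }

        public boolean isMetBy(Duration elapsed) {
            return limit == null || elapsed.compareTo(limit) <= 0;
        }
    }

    private static final Logger LOG = Logger.getLogger(ResponseBudget.class.getName());

    /** Runs one request and leaves a trail entry: name, elapsed time, target, and whether it was met. */
    public static <T> T timed(String request, Target target, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            Duration elapsed = Duration.ofNanos(System.nanoTime() - start);
            LOG.info(String.format("%s took %d ms (target %s, %s)",
                    request, elapsed.toMillis(), target, target.isMetBy(elapsed) ? "met" : "MISSED"));
        }
    }
}
```

For example, `ResponseBudget.timed("search", ResponseBudget.Target.IN_FLOW, () -> runSearch(query))` (where `runSearch` is a placeholder for the application's own code) returns the search result and logs whether it stayed within one second; grepping the log for `MISSED` then points at the underperforming requests.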
Combining Agile with Load and Performance Testing: What am I in for? (Page last updated January 2014, Added 2014-03-31, Author Tim Hinds, Publisher Neotys). Tips:
- Load and performance testing can determine how much load an application can handle before it crashes, when to add another server, when to reconfigure the network, where code needs to be optimized, and more.
- Performance testing during continuous integration lets you catch issues early, when they are cheaper to fix and won't impact release dates. Use SLAs (Service Level Agreements) to provide targets for the continuous integration performance tests.
- Sometimes a small change can have a disproportionate effect that nobody anticipated. Integrating load testing into the continuous integration process stops small, apparently innocuous changes from being pushed out to production before their performance effect on the system has been seen, ensuring that performance targets remain satisfied.
- Performance testing during continuous integration gives developers quick feedback on how their change has affected performance, allowing for changes to be handled in a more "normal" development way, rather than having to wait for results on code that they haven't worked on for a while.
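An SLA-driven check in continuous integration can start as a test that fails the build when a measured latency percentile exceeds its target. The sketch below assumes a hypothetical `PerfGate` helper that warms the code up so the JIT compiles it, times individual calls with `System.nanoTime`, and computes a percentile with the nearest-rank method. Wall-clock timings on shared CI machines are noisy, so a dedicated harness such as JMH and generous thresholds are advisable; the thresholds here are placeholders, not recommended values.

```java
import java.util.Arrays;

/** Hypothetical CI performance gate: fails when a latency percentile exceeds its SLA. */
public final class PerfGate {
    /** Runs the task warmup + samples times; returns the given percentile of sample latencies in ns. samples must be > 0. */
    public static long percentileNanos(Runnable task, int warmup, int samples, double percentile) {
        for (int i = 0; i < warmup; i++) {
            task.run(); // give the JIT a chance to compile the hot path before measuring
        }
        long[] latencies = new long[samples];
        for (int i = 0; i < samples; i++) {
            long start = System.nanoTime();
            task.run();
            latencies[i] = System.nanoTime() - start;
        }
        Arrays.sort(latencies);
        int index = (int) Math.ceil(percentile / 100.0 * samples) - 1; // nearest-rank method
        return latencies[Math.max(0, Math.min(index, samples - 1))];
    }

    /** Throws AssertionError, which a CI test run reports as a failure, if the SLA is exceeded. */
    public static void assertWithinSla(String name, Runnable task, double percentile, long slaNanos) {
        long observed = percentileNanos(task, 1_000, 10_000, percentile);
        if (observed > slaNanos) {
            throw new AssertionError(String.format(
                    "%s p%.1f = %d ns exceeds SLA of %d ns", name, percentile, observed, slaNanos));
        }
    }
}
```

Called from an ordinary unit test, for example `PerfGate.assertWithinSla("price-lookup", () -> cache.lookup(key), 99.0, 50_000L)` with `cache.lookup` standing in for the code under test, this gives developers a failing build, and therefore fast feedback, on the same change that caused a regression.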
Last Updated: 2017-10-01
Copyright © 2000-2017 Fasterj.com. All Rights Reserved.
All trademarks and registered trademarks appearing on JavaPerformanceTuning.com are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. JavaPerformanceTuning.com is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.