Java Performance Tuning
Tips October 2013
Writing and Testing High Frequency Trading Engines in Java (Page last updated September 2013, Added 2013-10-28, Author Peter Lawrey, Publisher DZone). Tips:
- For low latency you need simplicity, the critical path must do less.
- For low latency a key driver is how easy it is to take operations out of the critical path.
- In low latency you need to understand completely what the critical code is doing, and combine layers to minimise the work done while achieving the same outcome.
- For low latency you need to avoid GCs during critical path operation.
- For low latency you need to avoid context switching - don't yield the critical threads; instead spin (or at least busy wait) to retain the CPU on the core, ready for the next operation. Binding specific threads to a core is also a good idea (if 50 microseconds matter).
- Lock-free coding avoids the very bad outliers you can get with locks.
- With low latency you should use direct memory access for critical structures (allows precise control of layout and garbage collection).
- In low latency systems, you need to specifically monitor for outliers and build in tools which will identify what causes them.
- You only need to use "unnatural" Java in the low latency critical code sections, most of the code should be coded normally.
- Recycling mutable objects can eliminate GCs - but that's only useful in low latency critical code.
- Kernel bypass network adapters can be useful for low latency.
- double and long arithmetic is 100x faster than BigDecimal arithmetic (and faster than working with Date objects too).
- Use primitive-specific implementations of collections (like Trove) for primitive data collection manipulation.
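The tip about preferring long arithmetic to BigDecimal can be sketched as fixed-point arithmetic, where a price is held as a scaled long. This is a minimal illustration, not code from the article: the class name FixedPointPrice, the four implied decimal places and the method names are all assumptions made for the example.

```java
import java.math.BigDecimal;

// Hypothetical example class; the names and the scale are illustrative assumptions.
final class FixedPointPrice {
    // Number of implied decimal places: a price is stored as a count of 1/10000ths.
    static final int DECIMALS = 4;

    // Converts decimal text such as "101.25" to the scaled long form.
    // BigDecimal is used only here, outside the critical path.
    static long parse(String text) {
        return new BigDecimal(text).movePointRight(DECIMALS).longValueExact();
    }

    // Notional value = price * quantity, using primitive arithmetic only.
    // multiplyExact throws ArithmeticException on overflow instead of wrapping.
    static long notional(long scaledPrice, long quantity) {
        return Math.multiplyExact(scaledPrice, quantity);
    }

    // Formats a scaled value back to text, for logging or display.
    static String format(long scaled) {
        return BigDecimal.valueOf(scaled, DECIMALS).toPlainString();
    }

    public static void main(String[] args) {
        long price = parse("101.25");
        long total = notional(price, 300);
        System.out.println(format(total));
    }
}
```

Only the conversion at the edges touches BigDecimal; the calculation itself allocates nothing, which also fits the tip about avoiding GCs in the critical path.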
All Scalability Problems Have Only a Few Solutions (Page last updated September 2013, Added 2013-10-28, Author Frank Kelly, Publisher softarc). Tips:
- Instrument your code and environment so you can identify where the bottlenecks are.
- Scalability problems are solved by: 1. Doing less work (simplest option); 2. Tuning what you have (simple to moderate difficulty); 3. Throwing more resources at it (moderately difficult); 4. Fixing the code (difficult); 5. Rearchitecting (most difficult option).
- Do less work: cut out unnecessary operations; use faster inaccurate solutions if that is acceptable; go for eventual consistency instead of immediate consistency.
- Use tuning options (usually configuration) already available to you: OS memory and heap tuning; NIC and TCP stack tuning; set OS priorities; increase isolation of your app from others; compression; disk speed options; more threads; indexes; etc.
- Throwing more hardware at a performance problem is often a cheap option that quickly solves the issue.
- Code tuning is expensive, but cheaper than re-architecting and redesigning. Look at: algorithm efficiency; batching; database comms; caching; decompose problems to simpler ones; avoid repeating calculations; avoid locks.
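As one illustration of the "caching" and "avoid repeating calculations" items above, a small memoizing wrapper can be sketched with ConcurrentHashMap.computeIfAbsent. The Memoizer class and the wordCount stand-in for an expensive calculation are hypothetical names chosen for this example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical helper: remembers the result of a calculation per key.
final class Memoizer<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> compute;

    Memoizer(Function<K, V> compute) {
        this.compute = compute;
    }

    // Runs the calculation the first time a key is seen and
    // returns the stored result on later calls.
    V get(K key) {
        return cache.computeIfAbsent(key, compute);
    }

    // Stand-in for an expensive calculation (an assumption for the demo).
    static int wordCount(String text) {
        return text.trim().split("\\s+").length;
    }

    public static void main(String[] args) {
        Memoizer<String, Integer> counts = new Memoizer<>(Memoizer::wordCount);
        System.out.println(counts.get("avoid repeating calculations")); // computed
        System.out.println(counts.get("avoid repeating calculations")); // served from the cache
    }
}
```

This sketch never evicts entries, so it only suits a small, bounded set of keys; an unbounded cache trades the repeated work for memory pressure.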
What are Reentrant Locks? (Page last updated September 2013, Added 2013-10-28, Author Anirudh Bhatnagar, Publisher DZone). Tips:
- The "synchronized" keyword uses intrinsic locks (monitors) associated with every object. These locks are not interruptible, nor can you try to acquire one with a timeout; and they are limited in scope to the blocks which acquire them.
- ReentrantLock allows you to: interrupt a thread waiting for the lock; try to obtain it with a timeout; acquire and release locks across multiple methods; poll for locks; apply a fairness policy in acquiring locks.
- You can avoid deadlocks by applying lock ordering (all threads always acquire locks in the same order); but using ReentrantLock.tryLock() methods with retries or timeouts or both is a simpler mechanism.
- Use ReentrantLock.lockInterruptibly() to allow locked operations to be cancelled by being interruptible.
- ReentrantLocks can be fair or non-fair locks. Fair locks are acquired in the order in which they were requested; unfair locks have non-deterministic acquisition ordering. Fair locks are less efficient because of the context switching overheads; unfair locks will often have better performance (the current thread can acquire the lock without being suspended off the CPU).
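The tryLock-with-timeout approach to avoiding deadlocks can be sketched as follows. The class TwoLockAction and its method runWithBoth are hypothetical names for this example; only ReentrantLock.tryLock(long, TimeUnit) and unlock() come from the JDK.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical helper showing timed acquisition of two locks.
final class TwoLockAction {
    // Runs the action while holding both locks. Returns false if either lock
    // could not be obtained within the timeout, leaving no lock held, so the
    // caller can back off and retry instead of deadlocking.
    static boolean runWithBoth(ReentrantLock first, ReentrantLock second,
                               long timeout, TimeUnit unit, Runnable action)
            throws InterruptedException {
        if (!first.tryLock(timeout, unit)) {
            return false;
        }
        try {
            if (!second.tryLock(timeout, unit)) {
                return false; // the outer finally releases 'first'
            }
            try {
                action.run();
                return true;
            } finally {
                second.unlock();
            }
        } finally {
            first.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ReentrantLock accounts = new ReentrantLock();
        ReentrantLock ledger = new ReentrantLock();
        boolean done = runWithBoth(accounts, ledger, 100, TimeUnit.MILLISECONDS,
                () -> System.out.println("holding both locks"));
        System.out.println("completed: " + done);
    }
}
```

Because tryLock(long, TimeUnit) also responds to interruption, a waiting thread can be cancelled; passing true to the ReentrantLock constructor would request the fair policy described above.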
5 Coding Hacks to Reduce GC Overhead (Page last updated July 2013, Added 2013-10-28, Author Tal Weiss, Publisher takipiblog). Tips:
- Reducing the number of temporary allocations increases throughput.
- Strings are immutable - when you concatenate strings you are implicitly creating temporary strings, and sometimes even implicitly creating StringBuilder objects. Build your strings explicitly with your own StringBuilder objects and avoid intermediate strings.
- Allocate collections with sufficiently large initial capacities to avoid them having to resize.
- If holding primitive datatypes in collections, use specialised primitive collections (like Trove).
- Most persistence libraries, such as Java's native serialization, Google's Protocol Buffers, etc., are able to process data as a stream, without having to keep the full data in memory. If available, go for that approach vs. loading the data into memory.
- Rather than building up collections using intermediate temporary collections, pass the fully-sized final collection to the processing methods and have them aggregate as they go into the one collection.
- Don't keep or cache objects unnecessarily - they're long-lived so put extra stress on the garbage collector.
- Improve performance only through testing, not guesswork.
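Two of the tips above, building strings with an explicit StringBuilder and allocating collections with a sufficient initial capacity, can be sketched together. The class CsvJoiner and its methods are hypothetical names for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example of reducing temporary allocations.
final class CsvJoiner {
    // Builds a list whose backing array is sized once, so it never has to resize.
    static List<String> numbers(int count) {
        List<String> result = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            result.add(Integer.toString(i));
        }
        return result;
    }

    // Joins the values with commas using one pre-sized StringBuilder,
    // avoiding the intermediate strings that repeated '+' would create.
    static String join(List<String> values) {
        int length = Math.max(0, values.size() - 1); // room for the separators
        for (String value : values) {
            length += value.length();
        }
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(values.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(numbers(5)));
    }
}
```

Whether this matters for a given application is something to confirm by measurement, in line with the last tip in the list.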
The Anatomy of APM (Page last updated September 2013, Added 2013-10-28, Author Larry Dragich, Publisher JDJ). Tips:
- End-User-Experience (EUE) measurements are a key vehicle for demonstrating performance.
- Top Down Monitoring, also referred to as Real-time Application monitoring, focuses on the End-User-Experience (EUE). Passive EUE monitoring consists of tracking application traffic through the system and using that data to infer EUE; Active EUE monitoring consists of synthetic requests made identical to real user requests.
- Bottom Up Monitoring, also referred to as Infrastructure monitoring, provides system monitoring including, at minimum, up/down monitoring of system nodes and trend metrics. It also supports event correlation.
- Collecting Metrics is essential for an APM strategy to be successful. Use 5 minute averages for real-time performance alerting, and percentiles for Service Level Management. Use the metrics to create baselines that make it simpler to identify anomalies.
- Incident Management (with change management and problem management) is a pillar of APM, bringing together the various other strands, alerts, trends and metrics analysis to identify issues.
- Events can become alerts which can become incidents which cause problem tickets to be raised that get resolved.
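To illustrate why the metrics tip distinguishes averages from percentiles, here is a minimal sketch that computes both over one window of response times. The class ResponseTimeStats, the sample values and the choice of the nearest-rank percentile definition are assumptions for this example; APM tools may use other definitions.

```java
import java.util.Arrays;

// Hypothetical example: average versus percentile over one window of samples.
final class ResponseTimeStats {
    // Arithmetic mean of the samples.
    static double average(long[] samplesMillis) {
        long sum = 0;
        for (long sample : samplesMillis) {
            sum += sample;
        }
        return (double) sum / samplesMillis.length;
    }

    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    static long percentile(long[] samplesMillis, double p) {
        long[] sorted = samplesMillis.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Made-up response times in milliseconds, with one slow outlier.
        long[] window = {100, 110, 120, 130, 140, 150, 160, 170, 180, 2000};
        System.out.println("average = " + average(window));
        System.out.println("median  = " + percentile(window, 50));
        System.out.println("p90     = " + percentile(window, 90));
    }
}
```

With these made-up samples the single outlier pulls the average well above the median, while the 90th percentile still reflects what most requests experienced.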
Write Optimization: Myths, Comparison, Clarifications (Page last updated September 2011, Added 2013-10-28, Author Leif Walsh, Publisher tokutek). Tips:
- There is a tradeoff between write and read speed in B-trees, but at present it is an implementation tradeoff. Current implementations are well away from the theoretical performance limits, so it is theoretically possible to improve both at the same time; only at those limits would improving one carry a definite penalty for the other.
- The most write-optimized structure is to simply log all the data in the format it is received. The read performance is now terrible as the entire dataset needs to be examined to identify required data.
- The most read-optimized data structure would re-sort and reorganize data including indexes so that any request for data is a simple lookup and retrieval. This makes insertions extremely slow.
- B-trees are one compromise between read and write speed, but as the data structure grows, their performance degrades. Write optimization of B-trees involves maximizing the useful work accomplished each time a leaf is touched.
- B-trees are a couple of orders of magnitude faster at inserting data in sequential order compared to random order because the last insertion leaves the proximate leaf in memory for the next insertion.
- Using a write buffer allows you to batch together insertions that target the same leaf. The larger the batch, the bigger the potential speedup.
- The more indexes you have associated with the B-Tree data, the slower insertions will be.
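The write-buffer tip can be sketched with a toy model that counts how often a leaf is touched. This is not a B-tree implementation: the class LeafWriteBuffer, the fixed KEYS_PER_LEAF range and the use of a TreeSet as the backing store are all simplifying assumptions made to show the batching effect.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model (an assumption, not a real B-tree): each leaf covers a fixed key range.
final class LeafWriteBuffer {
    static final int KEYS_PER_LEAF = 100;

    private final Map<Integer, List<Integer>> pendingByLeaf = new TreeMap<>();
    private final TreeSet<Integer> stored = new TreeSet<>();
    private final int capacity;
    private int pending;
    private int leafTouches;

    LeafWriteBuffer(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be >= 1");
        }
        this.capacity = capacity;
    }

    // Buffers the key under its target leaf and flushes when the buffer is full.
    void insert(int key) {
        int leaf = Math.floorDiv(key, KEYS_PER_LEAF);
        pendingByLeaf.computeIfAbsent(leaf, k -> new ArrayList<>()).add(key);
        if (++pending >= capacity) {
            flush();
        }
    }

    // Applies all buffered keys, touching each target leaf once per flush.
    void flush() {
        for (List<Integer> keysForLeaf : pendingByLeaf.values()) {
            stored.addAll(keysForLeaf);
            leafTouches++;
        }
        pendingByLeaf.clear();
        pending = 0;
    }

    int leafTouches() { return leafTouches; }

    int size() { return stored.size(); }

    public static void main(String[] args) {
        int[] keys = {5, 105, 6, 106, 7, 107, 8, 108}; // alternates between two leaves
        LeafWriteBuffer unbuffered = new LeafWriteBuffer(1);
        LeafWriteBuffer buffered = new LeafWriteBuffer(keys.length);
        for (int key : keys) {
            unbuffered.insert(key);
            buffered.insert(key);
        }
        System.out.println("touches without batching: " + unbuffered.leafTouches());
        System.out.println("touches with batching:    " + buffered.leafTouches());
    }
}
```

In this model a capacity of 1 behaves like unbuffered insertion, and a larger capacity lets more keys share each leaf touch, which is the effect the tip describes.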
Last Updated: 2017-10-01
Copyright © 2000-2017 Fasterj.com. All Rights Reserved.
All trademarks and registered trademarks appearing on JavaPerformanceTuning.com are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. JavaPerformanceTuning.com is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.