Java Performance Tuning
Tips November 2013
Back to newsletter 156 contents
Reddit: Lessons Learned from Mistakes Made Scaling to 1 Billion Pageviews a Month (Page last updated August 2013, Added 2013-11-29, Author Todd Hoff, Publisher highscalability). Tips:
- Use SSDs as cheap RAM, not expensive disk.
- Make it easy for users to give feedback, and use that feedback to drive improvements.
- A site doesn't need to scale immediately, as long as it's built so that its components can be replaced modularly. Finding the bottlenecks as the site becomes successful, and improving scaling there, is a valid strategy.
- Logged-in users should be prioritised over users who are not logged in.
- Cloud-based services suffer from higher network latency and noisy neighbors (so plan to work around them), but they let you grow more easily as you need to.
- The more modular and API-defined your system is, the easier it is to isolate and fix issues.
- When passing work between components, putting it into a queue gives you buffering, queue-size monitoring, and a host of other possible optimisations.
- Event-based handling can hit a wall that requires rearchitecting; the alternative, thread-based handling, requires thread pool tuning but could scale more easily.
- Data is the hardest thing to move. The bigger the dataset the harder it is to move. Consequently, the data processing components need to be near the data.
- Page rendering is most efficiently done in the client (it won't scale well if done on the server).
- You need sufficient monitoring, and a monitoring system that is virtualization-friendly.
- Use consistent hashing across the system; otherwise caching, and resizing caches, can become a problem (with naive modulo hashing, changing the number of cache nodes remaps most keys).
- Use a proxy to allow you to redirect traffic - this allows you to send slow and fast traffic to different targets, and also to fix parts of the system more modularly.
- Automate everything - everything should come up and configure itself automatically.
- Limit everything so that you can see when those limits are exceeded, and upstream systems are protected from overload.
- Always assume you are going to have more than one of any component - it will be much easier to scale horizontally in the future.
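The queue and limit tips above can be illustrated with a minimal Java sketch, assuming an in-process hand-off between two components. A bounded ArrayBlockingQueue buffers the work, its size() can be exported to monitoring, and offer() refuses work once the limit is reached, so the upstream producer learns about overload instead of piling up an unbounded backlog. The class and method names (WorkChannel, submit, backlog, rejectedCount) are hypothetical, not from the article.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical hand-off channel between a producer component and a consumer component.
class WorkChannel<T> {
    private final BlockingQueue<T> queue;
    // Counts work refused because the limit was hit; a candidate metric for monitoring.
    private final AtomicLong rejected = new AtomicLong();

    WorkChannel(int capacity) {
        // Bounded queue: the capacity is the explicit limit that protects the consumer.
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // Non-blocking submit: returns false instead of letting the backlog grow without bound.
    boolean submit(T work) {
        boolean accepted = queue.offer(work);
        if (!accepted) {
            rejected.incrementAndGet();
        }
        return accepted;
    }

    // Consumer side: waits up to the timeout for work; returns null if none arrives.
    T take(long timeout, TimeUnit unit) throws InterruptedException {
        return queue.poll(timeout, unit);
    }

    // Current backlog, e.g. for a queue-size gauge in the monitoring system.
    int backlog() {
        return queue.size();
    }

    long rejectedCount() {
        return rejected.get();
    }
}
```

For example, with a capacity of 2 a third submit() returns false, rejectedCount() becomes 1 and backlog() stays at 2; the producer can then shed, delay or redirect that work.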
Write Optimization: Myths, Comparison, Clarifications, Part 2 (Page last updated October 2011, Added 2013-11-29, Author Leif Walsh, Publisher tokutek). Tips:
- The most common thing to do when faced with an insertion bottleneck is to use fewer indexes, but that kills query performance.
- A write buffer gets you a small insertion speedup but doesn't really hurt query times.
- For B-trees, sequential insertions are orders of magnitude faster than random insertions.
- Log-Structured Merge trees use a combined structure consisting of multiple B-trees, each larger than the last; when a smaller B-tree gets full, it gets dumped into the next bigger one. As the B-tree contents are ordered, this dumping is efficient, and the smaller B-trees can be kept in memory, providing fast read query performance. The tradeoff gives some improvement in write performance at the cost of some read performance, increased complexity, and more data copying.
- A Cache-Oblivious Lookahead Array is like a Log-Structured Merge tree for insertion performance, but gains back the full read performance of simple B-trees at the cost of maintaining some extra information across B-tree levels. Cache-Oblivious Lookahead Arrays are on the theoretically optimal write/read tradeoff curve for data structures.
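The leveled structure described above can be sketched in Java as a toy, assuming sorted arrays as stand-ins for the B-trees and an in-memory TreeMap as the smallest level. Writes land in the buffer; a full buffer is flushed as a sorted run and merged (a cheap linear merge, because both inputs are ordered) into levels whose capacity grows by a factor of 4. TinyLsm, Run and GROWTH are hypothetical names; there is no persistence and no deletion, and values are assumed non-null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy log-structured merge sketch: in-memory sorted buffer plus geometrically growing sorted levels.
class TinyLsm {
    // A sorted, immutable run standing in for an on-disk B-tree or file.
    private static final class Run {
        final long[] keys;
        final String[] values;
        Run(long[] keys, String[] values) { this.keys = keys; this.values = values; }

        String get(long key) {
            int i = Arrays.binarySearch(keys, key);
            return i >= 0 ? values[i] : null;
        }
    }

    private static final int GROWTH = 4; // each level may hold 4x more entries than the one before
    private final int bufferLimit;
    private final TreeMap<Long, String> buffer = new TreeMap<>();
    private final List<Run> levels = new ArrayList<>(); // index 0 = smallest/newest; null = empty

    TinyLsm(int bufferLimit) { this.bufferLimit = bufferLimit; }

    void put(long key, String value) {
        buffer.put(key, value);
        if (buffer.size() >= bufferLimit) {
            flush();
        }
    }

    String get(long key) {
        // Search newest to oldest, so the latest write for a key wins.
        String v = buffer.get(key);
        for (int i = 0; v == null && i < levels.size(); i++) {
            Run run = levels.get(i);
            if (run != null) {
                v = run.get(key);
            }
        }
        return v;
    }

    private void flush() {
        long[] keys = new long[buffer.size()];
        String[] values = new String[buffer.size()];
        int n = 0;
        for (Map.Entry<Long, String> e : buffer.entrySet()) { // already in key order
            keys[n] = e.getKey();
            values[n++] = e.getValue();
        }
        buffer.clear();
        Run carry = new Run(keys, values);
        long capacity = bufferLimit;
        for (int level = 0; ; level++) {
            capacity *= GROWTH;
            if (level == levels.size()) {
                levels.add(null);
            }
            Run existing = levels.get(level);
            Run merged = existing == null ? carry : merge(carry, existing);
            if (merged.keys.length <= capacity) {
                levels.set(level, merged);
                return;
            }
            levels.set(level, null); // level overflowed: push its contents down a level
            carry = merged;
        }
    }

    // Linear merge of two sorted runs; on equal keys the newer value is kept.
    private static Run merge(Run newer, Run older) {
        int a = 0, b = 0, n = 0;
        long[] k = new long[newer.keys.length + older.keys.length];
        String[] v = new String[k.length];
        while (a < newer.keys.length || b < older.keys.length) {
            if (b == older.keys.length || (a < newer.keys.length && newer.keys[a] < older.keys[b])) {
                k[n] = newer.keys[a];
                v[n++] = newer.values[a++];
            } else if (a == newer.keys.length || older.keys[b] < newer.keys[a]) {
                k[n] = older.keys[b];
                v[n++] = older.values[b++];
            } else {
                k[n] = newer.keys[a];
                v[n++] = newer.values[a++];
                b++; // drop the superseded older value
            }
        }
        return new Run(Arrays.copyOf(k, n), Arrays.copyOf(v, n));
    }
}
```

Note that get() may have to probe every level; that extra read cost is the tradeoff the tips attribute to Log-Structured Merge trees, and which the Cache-Oblivious Lookahead Array reduces with its extra per-level information.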
A Painless Introduction to Java's ThreadLocal Storage (Page last updated July 2013, Added 2013-11-29, Author Patson Luk, Publisher AppNeta). Tips:
- Each instance of ThreadLocal can independently store separate values for each thread.
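- With InheritableThreadLocal, a child thread inherits the parent thread's values by default when the child is created (override childValue() to change what the child receives).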
- ThreadLocals are held as keys wrapped in WeakReferences in a map held by the thread, with the value being the value you set against that ThreadLocal. Since the map is a different map instance in each thread, the values are specific to each thread.
- When a thread becomes eligible for garbage collection, its ThreadLocal values can be garbage collected too; otherwise they stay around for as long as the thread does.
- You can subclass ThreadLocal and override the initialValue() method to assign a non-null initial value that applies across all threads. Otherwise you need to initialise the ThreadLocal value in each thread that will access it.
- Because threads can be reused (e.g. pool threads), unless you directly control the thread throughout its life you should clean up thread locals (with remove()) when you have finished with them; otherwise other tasks may inherit your thread-local values when the thread is reused.
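The initialValue() and cleanup tips above can be combined in a small Java sketch. RequestContext, TRACE and runTask are hypothetical names; the ThreadLocal methods used (initialValue(), get(), remove()) are the standard API. The cleanUp flag exists only to show what happens when remove() is skipped.

```java
// Hypothetical per-thread trace, using the ThreadLocal subclass form described above.
class RequestContext {
    // initialValue() gives every thread its own fresh, non-null StringBuilder on first get().
    static final ThreadLocal<StringBuilder> TRACE = new ThreadLocal<StringBuilder>() {
        @Override
        protected StringBuilder initialValue() {
            return new StringBuilder();
        }
    };

    // Records a step in this thread's trace and returns everything the thread has recorded so far.
    static String runTask(String step, boolean cleanUp) {
        try {
            StringBuilder trace = TRACE.get();
            trace.append(step).append(';');
            return trace.toString();
        } finally {
            if (cleanUp) {
                // remove() discards this thread's value, so the next task on a reused
                // pool thread starts again from initialValue().
                TRACE.remove();
            }
        }
    }
}
```

On a single-thread pool, a task that skips remove() leaves its StringBuilder behind, so the next task on that thread sees "a;b;" rather than "b;"; once a task calls remove() in its finally block, the following task starts from an empty trace.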
Lightweight Contention Management for Efficient Compare-and-Swap Operations (Page last updated May 2013, Added 2013-11-29, Author Dave Dice, Danny Hendler, Ilya Mirsky, Publisher arxiv). Tips:
- Software-based contention management can improve the performance of hardware-based compare-and-swap operations. By using contention management and backoff techniques, you can improve throughput by an order of magnitude at high contention, with only a small overhead at low contention.
- A simple and nearly-optimal technique (of the algorithms tested) for improving compare-and-swap performance at high contention is the constant backoff algorithm. This simply inserts a short wait on compare-and-swap failures before retrying. This algorithm also maintains fairness of updates across threads.
- An alternative implementation of ConcurrentLinkedQueue that uses constant-backoff compare-and-swap can improve throughput by more than a factor of 3 on Intel architectures, but underperforms by a factor of 2 on SPARC architectures.
- [A Java implementation of AtomicReference wrapped in a constant backoff algorithm is available at http://java.dzone.com/articles/wanna-get-faster-wait-bit ]
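The constant backoff algorithm can be sketched in Java around AtomicLong.compareAndSet, assuming Java 9+ for the Thread.onSpinWait() spin hint. This is an illustration of the retry structure only, not the paper's code or the implementation linked above; BackoffCounter and BACKOFF_SPINS are hypothetical, and the spin count would need measuring on the target hardware. For a plain counter, AtomicLong.incrementAndGet() or LongAdder would normally be the first choice.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of constant backoff: after a failed compareAndSet, wait a fixed
// number of spin iterations before retrying, instead of retrying immediately.
class BackoffCounter {
    // Hypothetical tuning knob; the best value is hardware-dependent.
    private static final int BACKOFF_SPINS = 64;
    private final AtomicLong value = new AtomicLong();

    long incrementAndGet() {
        while (true) {
            long current = value.get();
            long next = current + 1;
            if (value.compareAndSet(current, next)) {
                return next;
            }
            // Another thread won the CAS: back off for a constant period, easing contention.
            for (int i = 0; i < BACKOFF_SPINS; i++) {
                Thread.onSpinWait(); // Java 9+ busy-wait hint to the processor
            }
        }
    }

    long get() {
        return value.get();
    }
}
```

As the SPARC result above shows, whether backoff helps depends on the architecture and the contention level, so any such change should be benchmarked rather than assumed to be faster.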
A Beginner's Guide to Perceived Performance: 4 Ways to Make Your Mobile Site Feel Like a Native App (Page last updated September 2013, Added 2013-11-29, Author Kyle Peatt, Publisher Mobify). Tips:
- It's not about how fast your site is; it's about how fast your users think it is. All that matters is how the user perceives the speed of your app.
- Improving performance doesn't mean that much to the user unless they actually notice the improvement.
- An improvement in load performance that was accompanied by a visual "spinner" (showing users when a load was running) led users to complain about slower performance: although actual performance was faster, the spinner drew attention to how long loading took, so it was perceived as slower. An animated transition can instead hide a wait by distracting the user from noticing that there was one. But repetitive animations are annoying, so use them sparingly.
- Momentum scrolling is faster native scrolling that can be enabled with the property
- Animations should run at 60fps (about 16.7ms per frame). Responses should occur within 100ms to avoid feeling slow, and within 1 second to avoid frustrating the user.
- Gesture support (e.g. Side-to-Side Swiping, Pull-to-Refresh, Long Press, Pinch-Zoom) allows your app to present more functionality directly to the user without them having to use menu entries, which makes the app feel more natural and faster.
The Scalable Array (Page last updated August 2013, Added 2013-11-29, Author Peter Karussell, Publisher DZone). Tips:
- For very large data structures, a list of lists allows you to scale to the limits of the machine.
- You can create a list that scales indefinitely by using a list of segments, where each segment is itself a list. Expansion is efficient because only the segment references are copied, not the segments' contents. [Deleting internal elements is more problematic, but by adjusting only the size of the segment being deleted from, this might be kept relatively efficient.]
- If your internal data structures have a size which is a power of 2, you can use slightly faster bit operations (shift and mask instead of division and modulo) to calculate indexes.
- A list-type data structure that consists internally of a list of segments lets you specialise the segment implementation to how the data structure is used, e.g. int arrays, byte arrays, ByteBuffers, etc.
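The segmented-list and power-of-2 tips above can be sketched in Java as an int-only list built from fixed-size int[] segments. SegmentedIntList and the 1024-element segment size are assumptions for illustration; the article's own implementation may differ. Growing only appends a new segment (ArrayList may copy its array of segment references, never the ints), and index arithmetic uses shift and mask because the segment size is a power of 2.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical "list of segments" for primitive ints.
class SegmentedIntList {
    private static final int SEGMENT_BITS = 10;                  // 2^10 = 1024 ints per segment
    private static final int SEGMENT_SIZE = 1 << SEGMENT_BITS;
    private static final int SEGMENT_MASK = SEGMENT_SIZE - 1;

    private final List<int[]> segments = new ArrayList<>();
    private long size;

    void add(int value) {
        int offset = (int) (size & SEGMENT_MASK);
        if (offset == 0) {
            // Last segment is full (or none exists yet): append a new segment.
            // Existing element data is never copied.
            segments.add(new int[SEGMENT_SIZE]);
        }
        segments.get(segments.size() - 1)[offset] = value;
        size++;
    }

    int get(long index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        // Power-of-2 segment size: shift replaces division, mask replaces modulo.
        int segment = (int) (index >>> SEGMENT_BITS);
        int offset = (int) (index & SEGMENT_MASK);
        return segments.get(segment)[offset];
    }

    long size() {
        return size;
    }
}
```

With 1024-int segments, index 1024 maps to segment 1, offset 0 (1024 >>> 10 is 1, and 1024 & 1023 is 0). Swapping int[] for byte[] or ByteBuffer segments is the kind of specialisation the last tip describes.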
Last Updated: 2018-12-26
Copyright © 2000-2018 Fasterj.com. All Rights Reserved.
All trademarks and registered trademarks appearing on JavaPerformanceTuning.com are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. JavaPerformanceTuning.com is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.