Java Performance Tuning
Tips January 2013
Back to newsletter 146 contents
When Premature Optimization Isn't (Page last updated November 2012, Added 2013-01-28, Author Dustin Marx, Publisher marxsoftware). Tips:
- There is nothing wrong with early optimization if neither readability nor maintainability is damaged and the time taken is negligible.
- There is a difference between design optimization and code optimization - optimization is usually necessary at the design phase, and is not premature; not optimizing at the design phase can easily lead to bad architecture.
- A coarse-grained service API is a valid design optimisation over a fine-grained API because it is well known that making many fine-grained calls is likely to lead to bad performance.
- Inappropriate code optimization may reduce maintainability and introduce bugs.
- Some forms of "premature" code optimisation are so well known that they are probably valid - for example, using a StringBuilder/StringBuffer instead of repeated string concatenation when building a string that would take many concatenations, and short-circuiting evaluations in conditionals. [OTOH, I've seen an article that shows short-circuit evaluation can stall the later evaluations on pipelined multi-core systems, making it slower in the average case; and were the JVM ever to implement String concatenation analysis, it's possible that multiple code-level concatenations could actually be more efficient than using a StringBuilder].
- Appropriate choices of data structures and algorithms are optimisations that are NOT premature.
- A warning sign of premature optimisation is when you start sacrificing clarity and reliability to chase some vague notion of performance.
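The StringBuilder tip above can be illustrated with a minimal sketch. The class and method names here (JoinExample, joinWithConcat, joinWithBuilder) are made up for illustration and do not come from the article; both methods return the same string, and the difference lies only in how much intermediate copying a simple compilation of the loop would do.

```java
import java.util.List;

class JoinExample {
    // Repeated concatenation: each += builds a new String,
    // copying every character accumulated so far.
    static String joinWithConcat(List<String> parts) {
        String result = "";
        for (String part : parts) {
            result += part;
        }
        return result;
    }

    // StringBuilder appends into one growable buffer and
    // creates the final String once, at the end.
    static String joinWithBuilder(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }
}
```

For a handful of concatenations the difference is unlikely to matter; the tip applies when a string is built from many pieces, typically in a loop.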
Scalability is Easy! (To Get Wrong) (Page last updated December 2012, Added 2013-01-28, Author Eli Weinstock-Herman, Publisher lessthandot). Tips:
- Identify the bottleneck before you (try to) fix it - or you are "fixing" the wrong thing and potentially making performance worse.
- You can potentially improve the rate of a process by identifying and exploiting the constraints on that process.
- If you can switch to an API that allows you to submit several requests in a batch, that is likely to be much more efficient when making high volume external requests.
- Queue bursty requests feeding into a rate-limited system, so that the rate limit is not exceeded.
- You need to test using real intended scenarios, data rates and concurrency, or you'll find the wrong bottlenecks.
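The batching and queueing tips above can be combined in a small sketch. This is an illustrative assumption rather than the article's design: the BatchingQueue class, its method names and the idea of a caller-supplied scheduler are hypothetical. Producers submit requests in bursts, and a scheduler running at whatever rate the downstream system permits pulls a bounded batch each tick.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class BatchingQueue {
    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();
    private final int maxBatchSize;

    BatchingQueue(int maxBatchSize) {
        this.maxBatchSize = maxBatchSize;
    }

    // Producers may call this in bursts; nothing is sent downstream here.
    void submit(String request) {
        pending.add(request);
    }

    // Intended to be called by a scheduler at the rate the downstream
    // system allows. Returns at most maxBatchSize requests, in arrival
    // order, to be sent as one batch call; empty if nothing is waiting.
    List<String> nextBatch() {
        List<String> batch = new ArrayList<>();
        pending.drainTo(batch, maxBatchSize);
        return batch;
    }
}
```

A caller might invoke nextBatch() from a ScheduledExecutorService once per rate-limit interval and pass the returned list to the batch API in a single request.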
Three fundamental tricks for developers writing distributed systems (Page last updated December 2012, Added 2013-01-28, Author Pedro Belo, Publisher herokuapp). Tips:
- Use a database-based queue to run jobs that propagate data to other systems, sharing the connection between your queuing library and ORM to maintain the ACID properties.
- Enqueue IDs, not data; this lets the consumer look up the current record when it processes the job, so it operates on coherent data.
- Ensure your distributed architecture is idempotent - network failures, broken sockets and timeouts happen only too often, so make sure you can retry any job without consequences if it gets received more than once.
- An idempotent system can easily have multiple consumers processing the same request, using the fastest result.
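A minimal sketch of the "enqueue IDs" and idempotency tips above, under stated assumptions: the PropagationJob class is hypothetical, and two in-memory maps stand in for the primary database and the downstream system. The job carries only a record ID, the consumer reads the current state at processing time, and the downstream write is an upsert keyed by that ID, so a redelivered job leaves the same end state.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class PropagationJob {
    // Stand-in for the primary database: record ID -> current value.
    private final Map<Long, String> source = new ConcurrentHashMap<>();
    // Stand-in for the other system that receives propagated data.
    private final Map<Long, String> downstream = new ConcurrentHashMap<>();
    // Job IDs already applied; a real system would persist this.
    private final Set<String> completedJobIds = ConcurrentHashMap.newKeySet();

    void save(long recordId, String value) {
        source.put(recordId, value);
    }

    // Returns true if the job did work, false if it was a duplicate delivery.
    boolean handle(String jobId, long recordId) {
        if (completedJobIds.contains(jobId)) {
            return false; // already applied, safe to ignore
        }
        // Read the record at processing time, not at enqueue time.
        String current = source.get(recordId);
        if (current != null) {
            // Upsert keyed by ID: if two consumers race on the same job,
            // a repeated write still leaves the same end state.
            downstream.put(recordId, current);
        }
        completedJobIds.add(jobId);
        return true;
    }

    String downstreamValue(long recordId) {
        return downstream.get(recordId);
    }
}
```

The completed-job set is only a shortcut to skip repeated work; the retry safety comes from the upsert itself.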
Efficient concurrent long set and map (Page last updated November 2012, Added 2013-01-28, Author Walter Bauer, Publisher Censhare). Tips:
- Copy-on-write data structures that copy all the data, like CopyOnWriteArrayList, perform an expensive data copy on every write, so they should only be used where writes are infrequent.
- A copy-on-write data structure based on a tree can create a new tree with a new root node that references the unchanged nodes of the "old" tree, thus reusing existing data without creating a full copy of all the data. Such a structure can be used efficiently even when writes are frequent.
- [Article describes building a concurrent trie based map for long keys and values].
- Long.bitCount is fairly efficient, and since Java version 1.6.0_18 it is an intrinsic - which means the JIT compiler directly uses the corresponding machine instruction if that is available.
- A technique to cut down on the number of objects held is to store the primitive data that would otherwise be held by many separate objects in a single object holding one primitive array, using offsets to access the data. This has the drawback that the object holding the array must implement its own memory management.
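One common use of Long.bitCount in trie nodes is bitmap compression: a 64-bit mask records which child slots are occupied, and only the occupied slots are stored in a compact primitive array. Whether the article's map uses exactly this layout is an assumption; the BitmapNode class below is a hypothetical sketch of the general technique, and slot numbers must be in the range 0 to 63.

```java
import java.util.NoSuchElementException;

class BitmapNode {
    private long bitmap;                  // bit i set => slot i is occupied
    private long[] values = new long[0];  // occupied slots only, in slot order

    // Position in the compact array = number of occupied slots below 'slot'.
    private int compactIndex(int slot) {
        return Long.bitCount(bitmap & ((1L << slot) - 1));
    }

    boolean contains(int slot) {
        return (bitmap & (1L << slot)) != 0;
    }

    long get(int slot) {
        if (!contains(slot)) {
            throw new NoSuchElementException("empty slot " + slot);
        }
        return values[compactIndex(slot)];
    }

    void put(int slot, long value) {
        int idx = compactIndex(slot);
        if (contains(slot)) {
            values[idx] = value;          // overwrite in place
            return;
        }
        // Grow by one and shift later entries right to keep slot order.
        long[] grown = new long[values.length + 1];
        System.arraycopy(values, 0, grown, 0, idx);
        grown[idx] = value;
        System.arraycopy(values, idx, grown, idx + 1, values.length - idx);
        values = grown;
        bitmap |= 1L << slot;
    }

    int size() {
        return Long.bitCount(bitmap);
    }
}
```

A sparse node with three children stores three longs plus the mask instead of a 64-element array, at the cost of one bitCount per lookup.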
The fork/join framework in Java 7 (Page last updated December 2012, Added 2013-01-28, Author Patrick Peschlow, Publisher The H Developer). Tips:
- ThreadPoolExecutor has a central inbound queue for new tasks which is shared by all worker threads. It doesn't provide support for having multiple threads collaborate to compute tasks (you would need to implement sharing outside of the executor pool). ForkJoinPool does support multiple threads collaborating on compute tasks.
- Each worker thread in the ForkJoinPool has its own task queue (a double-ended queue, termed a deque) to which it can add and from which it can take tasks with no contention. When their own task queues are empty, threads steal tasks from other worker thread queues efficiently with almost no contention (they steal from the back of the queue). If no tasks are present in any worker thread queue, they access the shared inbound queue.
- If local worker threads never add (sub)tasks to their own queue, ForkJoinPool is just a ThreadPoolExecutor with an extra overhead.
- RecursiveTask returns a result; RecursiveAction does not - so using RecursiveAction will be more efficient if you don't need the result of the task.
- ForkJoinPool task stealing provides in-built load balancing across worker threads.
- The ForkJoinPool number of threads is not a hard limit, because if a thread has to wait for results from other tasks before it can complete its task, the ForkJoinPool recognises that this leaves a temporarily idle, unschedulable thread (called resting), and so it allows itself to create a new worker thread to maintain the overall active thread pool size.
- If input tasks are already split (or are splittable) into tasks of approximately equal computing load, then the additional overhead of ForkJoinPool's splitting and work stealing makes it less efficient than just using a ThreadPoolExecutor directly. But if tasks have variable computing load and can be split into subtasks, then ForkJoinPool's in-built load balancing is likely to make it more efficient than using a ThreadPoolExecutor.
- With many very short tasks, competition for accessing the ThreadPoolExecutor's shared inbound queue means that the ThreadPoolExecutor's threads can spend a lot of their time waiting rather than processing.
- ForkJoinPool provides an asyncMode parameter. When set, each worker thread processes its local task queue in the order in which the tasks were scheduled (FIFO), rather than the default stack based (LIFO) mode used in the local worker's queues. This can cause more overhead but increases "fairness" of task scheduling.
- If worker threads are constantly generating new tasks for themselves, new tasks on the central inbound queue can be left waiting; the latest ForkJoinPool implementation (available from Doug Lea's "Concurrency JSR-166 Interest Site"; changes may or may not be in Java 8) has done away with the central inbound queue and instead distributes tasks directly to the threads' task queues.
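The fork/join tips above can be made concrete with a short sketch of a RecursiveTask that sums an array. The SumTask class and its THRESHOLD value are hypothetical choices for illustration, not taken from the article. The task splits its range in half, forks one half onto the worker's own deque (where an idle worker may steal it) and computes the other half directly.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class SumTask extends RecursiveTask<Long> {
    // Hypothetical cut-off: below this size, splitting costs more than it saves.
    private static final int THRESHOLD = 1_000;

    private final long[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    SumTask(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();                      // queued locally; may be stolen
        long rightSum = right.compute();  // work on the other half ourselves
        return left.join() + rightSum;
    }

    static long sum(long[] data) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new SumTask(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }
}
```

To try the FIFO scheduling described above, the pool could instead be created with the four-argument constructor, for example new ForkJoinPool(4, ForkJoinPool.defaultForkJoinWorkerThreadFactory, null, true), where the final argument is asyncMode and the parallelism of 4 is an arbitrary example value.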
Best Practices for Load Testing Mobile Applications (Part I & II) (Page last updated December 2012, Added 2013-01-28, Author Steve Weisfeldt, Publisher neotys). Tips:
- Load testing mobile applications has most of the same requirements as any load testing - you need to correctly simulate many users concurrently performing both similar and different activities.
- You need to simulate lower bandwidth and higher packet loss when load testing mobile services. To do this you need to record and reproduce the network traffic between the device and the server, probably by using a proxy, with a root certificate provided so that https interactions can be recorded.
- If a mobile device is connected for longer, front-end servers hold sockets longer, load balancers have more active TCP sessions, and application servers use more threads. The lower bandwidths and higher packet losses typical of mobile network connections will tend to make socket connections last longer, as communication needs longer to complete.
- Bandwidth simulation support should be integrated in the load test itself (either via a tool or by coding the tests to provide limited throughput from the client end).
- Efficient server-side applications tailor the content delivered to the client mobile device according to its limitations, e.g. by using the user-agent header in browser-based applications. So the load test needs to simulate multiple mobile identities so that it correctly models varied user devices.
- Load tests need to replicate the parallel access behavior of mobile browsers, correctly simulating the appropriate number of parallel connections from any one mobile browser.
- Generating load spikes similar to those produced by a successful promotion or new release can be difficult with in-house tools - you might consider a cloud-based load test to be able to generate sufficient load (and from different geographical regions).
- Taking an average of results with significant variation does not provide an accurate picture of what is really happening. To gain meaningful insights and to validate your SLAs and performance requirements you need to analyze the results for each kind of user in more depth.
- Recording mobile test scenarios, conducting realistic tests that simulate real-world bandwidth and browser characteristics, and properly analyzing the results are some of the key areas that require different capabilities for load testing mobile applications compared with load testing other applications.
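The point above about averages hiding variation can be shown with a small sketch. The LatencyStats class is hypothetical, and the nearest-rank percentile is one of several percentile definitions, chosen here for simplicity. Computing the mean and a high percentile per kind of user, rather than one overall mean, shows where the slow responses actually are.

```java
import java.util.Arrays;

class LatencyStats {
    // Arithmetic mean of response times in milliseconds.
    static double mean(long[] millis) {
        long total = 0;
        for (long m : millis) {
            total += m;
        }
        return (double) total / millis.length;
    }

    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    static long percentile(long[] millis, double p) {
        long[] sorted = millis.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

With made-up samples of 100, 110, 120 and 130 ms for Wi-Fi users and 900, 1000, 1100 and 3000 ms for 3G users, the combined mean is 807.5 ms, a figure that describes neither group: the Wi-Fi mean is 115 ms and the 3G 95th percentile is 3000 ms.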
Last Updated: 2019-04-29
Copyright © 2000-2019 Fasterj.com. All Rights Reserved.
All trademarks and registered trademarks appearing on JavaPerformanceTuning.com are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. JavaPerformanceTuning.com is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.