Java Performance Tuning
Java(TM) - see bottom of page
Tips August 2008
Back to newsletter 093 contents
http://www.ibm.com/developerworks/java/library/j-jtp11137.html
Stick a fork in it, Part 1 (Page last updated November 2007, Added 2008-08-28, Author Brian Goetz, Publisher IBM). Tips:
- From Java 5, java.util.concurrent provides a set of useful components for building concurrent applications: concurrent collections, queues, semaphores, latches, thread pools, and so on. These mechanisms are well-suited to programs with a reasonably coarse task granularity; applications using them need to partition their work so that the number of concurrent tasks is not consistently less than the number of processors available.
- Going forward, the hardware trend is clear: Moore's Law will not be delivering higher clock rates, but instead delivering more cores per chip. Programmers need to work with this trend of ever-increasing numbers of cores rather than expecting faster cores, which implies being able to use an arbitrary number of cores for optimal efficiency.
- Java 7 will include a framework for representing finer-grained parallel algorithms: the fork-join framework, usable for scaling to thousands of CPUs. You can download it separately and use it with Java 6 or later. It will eventually live in the package java.util.concurrent.forkjoin.
- Use additional memory to reduce the number of I/Os by caching the results of previous I/O operations.
- Shorten the amount of time CPU-intensive requests take by parallelizing them. This improves the user's perception of performance even though it may require more total work to be performed to service the request, because results are received faster.
- Divide-and-conquer algorithms are often useful in sequential environments but can become even more effective in parallel environments because the subproblems can often be solved concurrently.
- The divide-and-conquer algorithm is basically recursive: split the problem in two; solve each half; combine the two half solutions. Stop splitting a (sub-)problem when it is small enough to solve very quickly in one thread.
- It is typical of fork-join implementations that the process of recursively dividing the problem will create a potentially large number of new objects. This needs to be checked for and minimized.
- A ForkJoinExecutor is like an Executor in that it is designed for running tasks, except that it is specifically designed for computationally intensive tasks that never block except to wait for another task being processed by the same ForkJoinExecutor.
- The fork-join framework is fairly insensitive to the exact number of cores available: as long as you avoid choosing completely unreasonable parameters for the problem or the underlying hardware, you will get good results with little tuning. Choosing Runtime.getRuntime().availableProcessors() as the number of worker threads generally offers close to optimal results.
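The divide-and-conquer tips above can be sketched roughly as follows. This is a minimal illustration, not the article's code: it assumes the fork-join classes as they eventually shipped in java.util.concurrent (ForkJoinPool, RecursiveTask) rather than the pre-release ForkJoinExecutor API the article describes, and the class name MaxTask and the THRESHOLD value are hypothetical. Subproblems are described by index ranges over one shared array, which keeps the number of objects created while splitting small.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical task: find the largest element of an int array by divide-and-conquer.
class MaxTask extends RecursiveTask<Integer> {
    // Assumed cut-off: ranges at or below this size are solved sequentially.
    private static final int THRESHOLD = 1_000;

    private final int[] data;
    private final int start; // inclusive
    private final int end;   // exclusive

    MaxTask(int[] data, int start, int end) {
        this.data = data;
        this.start = start;
        this.end = end;
    }

    @Override
    protected Integer compute() {
        if (end - start <= THRESHOLD) {
            // Small enough: solve directly in this thread.
            int max = Integer.MIN_VALUE;
            for (int i = start; i < end; i++) {
                max = Math.max(max, data[i]);
            }
            return max;
        }
        // Split in two; only index ranges are created, the array is never copied.
        int mid = (start + end) >>> 1;
        MaxTask left = new MaxTask(data, start, mid);
        MaxTask right = new MaxTask(data, mid, end);
        left.fork();                     // make the left half available to other workers
        int rightMax = right.compute();  // solve the right half in the current thread
        int leftMax = left.join();       // wait for the left half
        return Math.max(leftMax, rightMax);
    }

    // Convenience entry point: one worker thread per available processor.
    // Returns Integer.MIN_VALUE for an empty array.
    static int max(int[] data) {
        ForkJoinPool pool = new ForkJoinPool(Runtime.getRuntime().availableProcessors());
        try {
            return pool.invoke(new MaxTask(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling MaxTask.max(someArray) runs the whole computation; the task never blocks except in join(), which waits for a task running in the same pool.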
http://www.ibm.com/developerworks/java/library/j-jtp03048.html
Stick a fork in it, Part 2 (Page last updated March 2008, Added 2008-08-28, Author Brian Goetz, Publisher IBM). Tips:
- Coarse-grained task boundaries (such as processing a single request in a Web application) and executing tasks in thread pools usually provide sufficient parallelism to achieve acceptable hardware utilization. But as cores multiply we'll need to get more sophisticated to find enough parallelism to keep the hardware busy.
- [Article describes using the fork-join framework to spread merge sort efficiently across multiple cores].
- Even with more processors, parallelization still can't turn an O(n log n) problem into an O(n) one, but a more parallelizable problem spread across multiple cores can reduce the total elapsed time even if it takes more total CPU cycles to do the work in parallel than it does sequentially.
- In Java 7, a ParallelArray object represents a collection of structurally similar data items, and you use the methods on ParallelArray to create a description of how you want to slice and dice the data. You then use the description to actually execute the array operations in parallel (they use the fork-join framework under the hood).
- Multiple ParallelArray filter operations are permitted, though it is often more efficient to combine them into a single compound-filter operation.
- If you can express a transformation in terms of the operations provided by ParallelArray, you should get reasonable parallelization for no effort.
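A parallel merge sort of the kind the article describes might look roughly like this. It is a sketch under assumptions, not the article's implementation: it uses the fork-join classes as shipped in java.util.concurrent, and the class name MergeSortTask and the THRESHOLD value are hypothetical. Parallelism shortens elapsed time here but does not change the O(n log n) total work.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Hypothetical fork-join merge sort over the range [lo, hi) of an int array.
class MergeSortTask extends RecursiveAction {
    // Assumed cut-off: ranges at or below this size are sorted sequentially.
    private static final int THRESHOLD = 512;

    private final int[] a;
    private final int[] scratch; // shared buffer; each task only touches [lo, hi)
    private final int lo;
    private final int hi;

    MergeSortTask(int[] a, int[] scratch, int lo, int hi) {
        this.a = a;
        this.scratch = scratch;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected void compute() {
        if (hi - lo <= THRESHOLD) {
            Arrays.sort(a, lo, hi); // small enough: sort in this thread
            return;
        }
        int mid = (lo + hi) >>> 1;
        // Sort both halves, potentially in parallel, and wait for both to finish.
        invokeAll(new MergeSortTask(a, scratch, lo, mid),
                  new MergeSortTask(a, scratch, mid, hi));
        merge(mid);
    }

    // Combine step: merge the sorted halves [lo, mid) and [mid, hi).
    private void merge(int mid) {
        System.arraycopy(a, lo, scratch, lo, hi - lo);
        int i = lo, j = mid, k = lo;
        while (i < mid && j < hi) {
            a[k++] = (scratch[i] <= scratch[j]) ? scratch[i++] : scratch[j++];
        }
        while (i < mid) { a[k++] = scratch[i++]; }
        while (j < hi)  { a[k++] = scratch[j++]; }
    }

    // Convenience entry point: sorts the array in place.
    static void sort(int[] a) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            pool.invoke(new MergeSortTask(a, new int[a.length], 0, a.length));
        } finally {
            pool.shutdown();
        }
    }
}
```

A single scratch buffer is allocated once and shared by all subtasks, since sibling tasks work on disjoint ranges and a parent merges only after both children have completed.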
http://java.dzone.com/news/javaone-brian-goetz-concurrenc
Brian Goetz on concurrency in Java 7 (Page last updated May 2008, Added 2008-08-28, Author Alex Miller, Publisher JavaLobby). Tips:
- The graph at http://www.gotw.ca/images/concurrency-ddj.gif, from Herb Sutter's article "The Free Lunch is Over" (http://www.gotw.ca/publications/concurrency-ddj.htm), plots the clock speed and transistor count of Intel chips over time. It shows CPU speeds linearly increasing until 2003, then flattening out, while transistor counts (in the form of multiple cores) continue to increase linearly along the previous incline, implying that Moore's law will continue specifically as more cores rather than faster cores.
- Current frameworks in Java do not scale up to many-core boxes, which will become increasingly prevalent. The shared queues and other infrastructure used by executors and thread pools become a point of contention and reduce scalability.
- The fork-join framework is designed to address fine-grained parallelism that will be needed to keep all your cores cranking away on CPU-intensive tasks. Fork-join is a divide-and-conquer style framework that is easy to execute and provides for a high degree of fine-grained parallelism.
- The ForkJoinExecutor allows you to submit a task for processing. Each task is broken (recursively) into smaller pieces until some minimum threshold is reached at which point processing occurs.
- Fork-join is implemented using "work-stealing": every thread has its own deque (double-ended queue) and only that thread reads from the head of the queue. If any thread runs out of work, it steals work from the tail of another thread's queue. Because the initial, biggest jobs are placed at the tail of each queue, workers steal the biggest task available, which keeps them busy for longer. This reduces queue contention and also provides built-in load balancing.
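The head/tail discipline behind work-stealing can be illustrated with a toy model. This is only a single-threaded sketch, not how the framework is actually implemented: the class WorkStealingSketch and its method names are hypothetical, and each pending task is represented simply by its size.

```java
import java.util.concurrent.ConcurrentLinkedDeque;

// Toy model of one worker's deque; each element is the size of a pending subtask.
class WorkStealingSketch {
    private final ConcurrentLinkedDeque<Integer> deque = new ConcurrentLinkedDeque<>();

    // Owner side: newly split-off subtasks go on the head.
    void push(int taskSize) {
        deque.addFirst(taskSize);
    }

    // Owner side: take the most recently pushed (smallest) subtask, or null if empty.
    Integer takeOwn() {
        return deque.pollFirst();
    }

    // Thief side: an idle worker takes from the tail, or null if empty.
    Integer steal() {
        return deque.pollLast();
    }

    // Returns {size taken by the owner, size taken by the thief}.
    static int[] demo() {
        WorkStealingSketch busyWorker = new WorkStealingSketch();
        // Recursive splitting pushes ever smaller pieces, so the oldest and
        // largest piece ends up at the tail.
        busyWorker.push(8);
        busyWorker.push(4);
        busyWorker.push(2);
        int own = busyWorker.takeOwn();  // owner gets the newest, smallest piece
        int stolen = busyWorker.steal(); // thief gets the oldest, largest piece
        return new int[] {own, stolen};
    }
}
```

In demo() the owner takes the task of size 2 while the thief takes the task of size 8. The two ends of the deque are touched by different threads, which is why contention stays low.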
http://video.google.com/videoplay?docid=-3975461488578314796
The New NIO, aka JSR-203 (Page last updated May 2008, Added 2008-08-28, Author Alan Bateman, Carl Quinn, Publisher Google). Tips:
- Java 7 java.nio.file new filesystem API - a richer interface to the filesystem including symbolic links, inode info, file attributes, copying files, moving files, path, directory and file operations.
- Java 7 newInputStream and newOutputStream, compatible with old Streams, but with new functionality and NIO channel support.
- Java 7 SeekableByteChannel is a richer NIO version of RandomAccessFile.
- Java 7 DirectoryStream iterates over directory contents without requiring large amounts of memory for directories with many files (as compared to File.list(), which always creates an array holding all file names and so can be very big). Includes filters to efficiently find specific files.
- Java 7 Files.walkFileTree allows iteration over all files under a specific directory node, to all depths of the file system.
- Java 7 java.nio.file supports file change notifications, uses OS level notifications where possible.
- Java 7 can interpose on filesystem access (allowing you to add a logging layer if desired).
- Java 7 adds MulticastChannel support.
- Java 7 adds asynchronous I/O for operating system level asynchronous I/O event generation, with support for callbacks and java.util.concurrent.Future, including for sockets. AsynchronousChannels are associated with Groups, which in turn are associated with a thread pool that handles the asynchronous events. Includes timeout and cancellation support.
- Java 7 adds support for 64-bit buffers, and a management pool bean for buffers.
- Java 7 SecureDirectoryStream enables iteration with avoidance of race conditions, for example when deleting recursively and another process is creating in the same hierarchy.
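The directory iteration and tree walking tips above might be used roughly as follows. This sketch assumes the java.nio.file API as it shipped in Java 7, which may differ in detail from the pre-release API shown in the talk; the class NioFileSketch, its method names, and the sample file names are all hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class NioFileSketch {
    // Lists entries of one directory whose names match a glob, e.g. "*.log".
    // Entries are streamed one at a time instead of being loaded into one big array.
    static List<String> listMatching(Path dir, String glob) {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path entry : stream) {
                names.add(entry.getFileName().toString());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(names); // iteration order is unspecified, so sort for stable output
        return names;
    }

    // Counts regular files at any depth below root using Files.walkFileTree.
    static long countRegularFiles(Path root) {
        final long[] count = {0};
        try {
            Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                    if (attrs.isRegularFile()) {
                        count[0]++;
                    }
                    return FileVisitResult.CONTINUE;
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count[0];
    }

    // Builds a small temporary tree for experimenting: a.log, b.txt, sub/c.log.
    static Path makeSampleTree() {
        try {
            Path root = Files.createTempDirectory("nio-sketch");
            Files.createFile(root.resolve("a.log"));
            Files.createFile(root.resolve("b.txt"));
            Path sub = Files.createDirectory(root.resolve("sub"));
            Files.createFile(sub.resolve("c.log"));
            return root;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The glob passed to listMatching applies only to the immediate entries of the directory, whereas countRegularFiles descends into every subdirectory.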
http://www.javaspecialists.eu/archive/Issue159.html
The Law of Sudden Riches (Page last updated May 2008, Added 2008-08-28, Author Dr. Heinz M. Kabutz, Publisher The Java Specialists' Newsletter). Tips:
- Additional resources (faster CPU, disk or network, more memory) for a seemingly stable system can make it unstable.
- A faster system can expose race conditions that were previously hidden, actually degrading system performance and stability (until the newly exposed race condition is fixed).
http://java.dzone.com/news/design-scalability
Design for Scalability (Page last updated May 2008, Added 2008-08-28, Author riho, Publisher JavaLobby). Tips:
- Scalability is ensuring that performance does not degrade beyond acceptable response times when load increases.
- The workload consists of: Number of users, Transaction volume, Data volume, Response time, Throughput.
- Rank the importance of traffic so you know what to sacrifice in case you cannot handle all of it.
- Scale the system horizontally (adding more cheap machines), rather than vertically (upgrading to a more powerful machine).
- Keep your code modular and simple
- Don't guess the bottleneck, measure it
- Write performance unit tests so you can collect fine-grained performance data at the component level
- Set up a performance lab so you can easily conduct end-to-end measurements of performance improvements
- Do regular capacity planning. Collect usage statistics, predict the growth rate
- Split the work into smaller chunks and assign each chunk to a pool of worker machines - when the work grows, you just need to add more workers into the pool
- Google's Map/Reduce and the open source Java framework Hadoop are helpful for parallelizing tasks
- Reuse previous results - Memcached and EHCache are popular caching packages
- To avoid idle threads, use asynchronous processing where possible.
- A wrong concurrent access model can have a huge impact on your system's scalability. Try to use lock-free data structures.
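As one small illustration of the lock-free advice above, the following sketch tracks a running maximum with a compare-and-set retry loop instead of a lock. The class LockFreeMax and its methods are hypothetical names chosen for this example; only java.util.concurrent.atomic.AtomicLong comes from the JDK.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical lock-free tracker for the largest value seen by any thread.
class LockFreeMax {
    private final AtomicLong max = new AtomicLong(Long.MIN_VALUE);

    // Retries the compare-and-set until it succeeds or the candidate is no
    // longer larger than the current maximum. No thread ever blocks.
    void offer(long candidate) {
        long current = max.get();
        while (candidate > current && !max.compareAndSet(current, candidate)) {
            current = max.get(); // another thread won the race; re-read and retry
        }
    }

    long get() {
        return max.get();
    }

    // Starts the given number of threads, each offering a distinct range of
    // values, and returns the maximum once all threads have finished.
    static long runDemo(int threads, int perThread) {
        LockFreeMax tracker = new LockFreeMax();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    tracker.offer(base + i);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return tracker.get();
    }
}
```

With 4 threads offering 1000 values each, runDemo(4, 1000) returns 3999, the largest value offered, regardless of how the threads interleave.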
Jack Shirazi
Last Updated: 2024-08-26
Copyright © 2000-2024 Fasterj.com. All Rights Reserved.
All trademarks and registered trademarks appearing on JavaPerformanceTuning.com are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. JavaPerformanceTuning.com is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.
URL: http://www.JavaPerformanceTuning.com/news/newtips093.shtml