Java Performance Tuning
Java(TM) - see bottom of page
Tips December 2009
The problem with SLA monitoring in virtualized environments (Page last updated September 2009, Added 2009-12-28, Author Andreas Grabner, Publisher dynaTrace). Tips:
- In a virtualized environment hardware interrupts are consumed by the virtualization infrastructure which keeps track of "Real Time". The interrupts are then forwarded to hosted virtual machines - but when the VMs are descheduled, interrupts are queued and then played to the VM at a higher rate when it is again scheduled. This means that while time is on average correct, drift frequently occurs and for any given two points in time, the interval measured by the VM is likely incorrect to some extent.
- In a virtualized environment Pseudo Performance Counters are made available via virtual processor registers that can be accessed from any application within the Virtual Machine - and these are the most accurate way of determining time measurements.
WalkingCollection (Page last updated November 2009, Added 2009-12-28, Author Dr. Heinz M. Kabutz, Publisher The Java Specialists' Newsletter). Tips:
- One approach to safe iteration is to synchronize every method of the collection (as Vector does). However, another thread can still change the vector whilst you are iterating through it, resulting in unpredictable behaviour.
- To iterate unsynchronized collections, you need to lock the entire collection whilst iterating to avoid a ConcurrentModificationException.
- CopyOnWriteArrayList copies the underlying array every time it is modified. This is expensive, but guarantees lock-free iteration.
- Controlling the iteration from within the collection makes concurrency easier. The internal iterator can use ReadWriteLock to differentiate between methods that modify the state of the collection and those that do not.
- If you want to use bulk update methods like addAll(), it is more efficient to only acquire the lock a single time rather than on each addition.
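The internal-iteration idea in the tips above can be sketched as follows. This is a minimal illustration rather than the article's actual implementation: the class name WalkingList and the method forEachItem are hypothetical, and java.util.function.Consumer (Java 8+) stands in for whatever callback interface the original uses.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Consumer;

// Hypothetical sketch: the collection controls iteration itself,
// so callers never hold an Iterator that another thread can invalidate.
final class WalkingList<E> {
    private final List<E> items = new ArrayList<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Modifying method: takes the exclusive write lock.
    public void add(E element) {
        lock.writeLock().lock();
        try {
            items.add(element);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Bulk update: the write lock is acquired once for the whole batch,
    // not once per element.
    public void addAll(Collection<? extends E> elements) {
        lock.writeLock().lock();
        try {
            items.addAll(elements);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Internal iteration: many readers may walk the list concurrently,
    // but no writer can modify it until every reader has finished.
    public void forEachItem(Consumer<? super E> action) {
        lock.readLock().lock();
        try {
            for (E element : items) {
                action.accept(element);
            }
        } finally {
            lock.readLock().unlock();
        }
    }

    // Non-modifying method: the shared read lock is sufficient.
    public int size() {
        lock.readLock().lock();
        try {
            return items.size();
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

One caveat of this design: ReentrantReadWriteLock does not support upgrading a read lock to a write lock, so an action passed to forEachItem must not call add() or addAll() on the same list, or the calling thread will block forever.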
Multithreading and the Java Memory Model (Page last updated October 2009, Added 2009-12-28, Author Chris Wong, Publisher JavaLobby). Tips:
- When thread A writes something before thread B reads it, it does not mean thread B will read the correct value. You could ensure that threads A and B are ordered with locking, but this doesn't guarantee correct reads because the memory is not necessarily written and read in order, or can be read from a partially written state.
- Code using a variable to detect termination, like
    while (!asleep) ++sleep;
  requires the test variable to be volatile; otherwise the compiler can hoist the test out of the loop, as in
    if (!asleep) while (true) ++sleep;
- Between threads, if you don't use synchronized or volatile, there are no visibility guarantees for variable data.
- Thread synchronization is enabled by: the synchronized keyword; the volatile keyword; static initialization (happens in the classloader, JVM guarantees thread safety).
- If you read or write a field that is read/written by another thread, you must synchronize. This must be done by both the reading and writing threads, and on the same lock.
- Don't try to reason about ordering in undersynchronized programs.
- Avoiding synchronization can cause subtle bugs that only blow up in production. Do it right first, then make it fast.
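The two basic tools named above, volatile and synchronized, can be illustrated with a small sketch. The class and field names here (VisibilityDemo, stopRequested, SharedCounter) are hypothetical and chosen only for this example.

```java
// Hypothetical sketch of a termination flag made safe with volatile.
final class VisibilityDemo {
    // volatile: a write by one thread is guaranteed to become visible to
    // other threads, and the JIT may not hoist the read out of the loop.
    private volatile boolean stopRequested = false;

    // Written only by the worker; read by the caller only after join().
    private long iterations = 0;

    long runFor(long millis) {
        Thread worker = new Thread(() -> {
            while (!stopRequested) {
                iterations++;
            }
        });
        worker.start();
        try {
            Thread.sleep(millis);
            stopRequested = true;
            // join() establishes happens-before with everything the
            // worker did, so reading iterations afterwards is safe.
            worker.join();
        } catch (InterruptedException e) {
            stopRequested = true; // still let the worker terminate
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return iterations;
    }
}

// Hypothetical sketch of a field shared between threads: both the
// reading and the writing method synchronize on the same lock (this).
final class SharedCounter {
    private long value = 0;

    synchronized void increment() {
        value++;
    }

    synchronized long get() {
        return value;
    }
}
```

Without volatile on stopRequested, the loop would be allowed to spin forever, because nothing obliges the worker thread to observe the write. Whether that happens on a given run depends on the JVM and the JIT compiler, which is why such bugs tend to surface only in production.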
Why are Facebook, Digg, and Twitter so hard to scale? (Page last updated October 2009, Added 2009-12-28, Author Todd Hoff, Publisher highscalability). Tips:
- Real-time graphs of information require many requests to combine, so are difficult to scale.
- Traditional websites usually access only single-user data and common cached data, and typically only 1-2% of users are active on the site at one time. Social websites can have a high proportion of users simultaneously active, with many items of realtime data required to be consistently updated across multiple users.
- It is hard to partition a system where many users are connected and have constant update requirements. (Facebook, because of the interconnectedness of the data, didn't find any clustering scheme that worked in practice.)
- Facebook keeps data normalized and randomly distributes data amongst thousands of databases. This approach requires a very fast cache (Facebook uses memcached): all data is kept in cache, and their caching tier services 120 million queries every second.
- Two alternative approaches to handle combining realtime graphs are: Pull On Demand - gather all data when a request is initiated by a user (this is easier to program and more suited to the web, but limits scalability); and Push on Change - any change is pushed out to all interested users and held until the user requests to view updates (so making that last action very fast).
- Pushing data out to users on each change seems to be a scalable approach for really large numbers of followers. It does take a lot of work to push all the changes around, but that can be handled by a job queuing system so the work can be distributed across a cluster.
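The difference between Pull On Demand and Push on Change can be made concrete with a toy in-memory model. This is only a sketch under simplifying assumptions: the class FeedSketch and all of its methods are hypothetical, it is single-threaded, and the fan-out happens inline, whereas a real system would hand that work to a job queue as the tip above describes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical single-threaded model of the two feed strategies.
final class FeedSketch {
    private final Map<String, Set<String>> followersOf = new HashMap<>();
    private final Map<String, Set<String>> followedBy = new HashMap<>();
    private final Map<String, List<String>> postsBy = new HashMap<>();
    private final Map<String, List<String>> inboxOf = new HashMap<>();

    void follow(String follower, String author) {
        followersOf.computeIfAbsent(author, k -> new LinkedHashSet<>()).add(follower);
        followedBy.computeIfAbsent(follower, k -> new LinkedHashSet<>()).add(author);
    }

    void post(String author, String text) {
        postsBy.computeIfAbsent(author, k -> new ArrayList<>()).add(text);
        // Push on Change: the write fans out to every follower's inbox,
        // so the cost is paid once per follower at write time.
        for (String follower : followersOf.getOrDefault(author, Collections.emptySet())) {
            inboxOf.computeIfAbsent(follower, k -> new ArrayList<>()).add(author + ": " + text);
        }
    }

    // Pull On Demand: the feed is assembled at read time by visiting
    // every followed author, so the cost is paid on each request.
    List<String> pullFeed(String user) {
        List<String> feed = new ArrayList<>();
        for (String author : followedBy.getOrDefault(user, Collections.emptySet())) {
            for (String text : postsBy.getOrDefault(author, Collections.emptyList())) {
                feed.add(author + ": " + text);
            }
        }
        return feed;
    }

    // Push on Change read path: the feed is already materialized,
    // so viewing it is a single lookup.
    List<String> readInbox(String user) {
        return new ArrayList<>(inboxOf.getOrDefault(user, Collections.emptyList()));
    }
}
```

In this model pullFeed touches one entry per followed author on every read, while readInbox is a single map lookup. The price of the fast read is the loop inside post, which grows with the number of followers.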
Designs, Lessons and Advice from Building Large Distributed Systems (Page last updated October 2009, Added 2009-12-28, Author Jeff Dean, Publisher Google). Tips:
- Assume things will crash - have an implementation that automatically deals with that.
- Implement your systems as distributed services with few dependencies and clearly defined interfaces and protocols.
- Ensure future proofing by having systems ignore tags they don't understand but still pass the information through.
- Numbers Everyone Should Know: L1 cache reference - 0.5 ns; Branch mispredict - 5 ns; L2 cache reference - 7 ns; Mutex lock/unlock - 25 ns; Main memory reference - 100 ns; Compress 1K bytes with Zippy - 3,000 ns; Send 2K bytes over 1 Gbps network - 20,000 ns; Read 1 MB sequentially from memory - 250,000 ns; Round trip within same datacenter - 500,000 ns; Disk seek - 10,000,000 ns; Read 1 MB sequentially from disk - 20,000,000 ns; Send packet CA->Netherlands->CA - 150,000,000 ns.
- CPUs are fast and usually not the bottleneck, so compression and encoding can efficiently reduce the pressure on memory and bandwidth.
- Don't build infrastructure just for its own sake: Identify common needs and address them; Don't imagine unlikely potential needs that aren't really there; use your own simple infrastructure at first.
- Ensure your design works if scale changes by 10X or 20X - but the right solution for X is often not optimal for 100X.
- Aim for low average times (happy users!), but the 90th and 99th percentiles are also important; variance matters and can be reduced with redundancy and timeouts.
- Use caching, higher priorities for interactive requests, and parallelism.
- Make your apps do something reasonable even if not all is right - better to give users limited functionality than an error page.
- Support low-overhead online profiling: cpu profiling; memory profiling; lock contention profiling.
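The "Numbers Everyone Should Know" above are most useful for back-of-envelope estimates. The sketch below plugs a few of the listed figures into simple arithmetic; the class and method names are hypothetical, and the scenarios are illustrative choices rather than examples taken from the talk.

```java
// Hypothetical back-of-envelope calculator using the latencies listed above.
final class LatencyEstimates {
    static final long MEMORY_READ_1MB_NS = 250_000L;
    static final long DATACENTER_ROUND_TRIP_NS = 500_000L;
    static final long DISK_SEEK_NS = 10_000_000L;
    static final long DISK_READ_1MB_NS = 20_000_000L;
    static final long CA_NETHERLANDS_CA_NS = 150_000_000L;

    // One seek followed by a sequential 1 MB read from disk.
    static long diskRead1MbWithSeekNs() {
        return DISK_SEEK_NS + DISK_READ_1MB_NS;
    }

    // How many times slower the disk read is than the same read from memory.
    static long diskVsMemoryRatio() {
        return diskRead1MbWithSeekNs() / MEMORY_READ_1MB_NS;
    }

    // How many round trips inside one datacenter fit into the time of a
    // single CA->Netherlands->CA packet.
    static long datacenterRoundTripsPerIntercontinentalTrip() {
        return CA_NETHERLANDS_CA_NS / DATACENTER_ROUND_TRIP_NS;
    }

    public static void main(String[] args) {
        System.out.println("1 MB from disk incl. seek: "
                + diskRead1MbWithSeekNs() / 1_000_000 + " ms");
        System.out.println("Disk vs memory for 1 MB: "
                + diskVsMemoryRatio() + "x slower");
        System.out.println("Datacenter round trips per CA->Netherlands->CA packet: "
                + datacenterRoundTripsPerIntercontinentalTrip());
    }
}
```

With the listed figures, a seek plus a 1 MB disk read takes 30 ms, which is 120 times the in-memory read, and one intercontinental round trip costs as much as 300 round trips within a datacenter.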
Java Performance Tuning, Profiling, and Memory Management (Page last updated September 2009, Added 2009-12-28, Author vikaswaters, Publisher JavaLobby). Tips:
- An object is created in the heap and is garbage-collected after there are no more references to it.
- -server or -client must be the first argument to the java command.
- [Article describes various garbage collection spaces and garbage collection processes.]
- The perm space stores: Class information, constant strings, Strings created with String.intern(), and reflective objects (classes, methods, etc.).
- If your application is spending too much time in garbage collection, you need to tune the JVM parameters to reduce pause time and frequency.
- java.lang.OutOfMemoryError can occur for three possible reasons: out of heap space (simple solution: increase -Xmx); permanent generation too small (simple solution: increase -XX:MaxPermSize); out of swap space.
- You can run out of native memory (swap space) if you are making lots of heavy JNI calls, even though the Java heap objects occupy only a little space; and also if you use too many or too big DirectBuffers (allocation uses the native heap; increase the space available using -XX:MaxDirectMemorySize).
- The '-verbose:gc' flag helps you see the memory footprint as the application progresses.
- Profiling options include the hprof profiler (e.g. java -Xrunhprof:heap=sites,cpu=samples,depth=10,thread=y,doe=y); -XX:+HeapDumpOnOutOfMemoryError (analysed with jhat).
- Tuning options include: code changes to reduce objects to solve memory leaks; JVM parameter changes to alter heap sizes; GC parameter changes to adjust how the garbage collection generations work; selecting the garbage collection algorithm.
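As a complement to the -verbose:gc flag mentioned above, heap usage and garbage collection activity can also be read from inside the application through the standard java.lang.management API. The sketch below is one possible way to do that and is not taken from the article; the class name GcSnapshot and its methods are hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper that reports heap usage and cumulative GC activity.
final class GcSnapshot {
    // Bytes of heap currently in use.
    static long heapUsedBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed();
    }

    // Approximate accumulated GC time across all collectors, in milliseconds.
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long time = gc.getCollectionTime(); // -1 if the collector does not report it
            if (time > 0) {
                total += time;
            }
        }
        return total;
    }

    static String describe() {
        StringBuilder sb = new StringBuilder();
        sb.append("heap used: ").append(heapUsedBytes() / 1024).append(" KB\n");
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": count=").append(gc.getCollectionCount())
              .append(", time=").append(gc.getCollectionTime()).append(" ms\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Sampling these values periodically gives a rough view of how much time the application spends in garbage collection, which can help decide whether heap sizes or the collector choice need tuning. The collector names reported depend on the JVM and on the garbage collection algorithm selected.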
Last Updated: 2018-09-22
Copyright © 2000-2018 Fasterj.com. All Rights Reserved.
All trademarks and registered trademarks appearing on JavaPerformanceTuning.com are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. JavaPerformanceTuning.com is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.
RSS Feed: http://www.JavaPerformanceTuning.com/newsletters.rss