Java Performance Tuning
Tips December 2012
Tuesday at Java One 2012 (Page last updated October 2012, Added 2012-12-26, Author Steve Millidge, Publisher C2B2). Tips:
- Methodology for performance tuning is: monitor the OS; monitor the JVM; Profile the application; Tune the JVM; Tune the code.
- -XX:+TieredCompilation is not on by default. Turning it on gives a performance benefit for the large majority of applications, but in large applications (hundreds of thousands of classes) it puts pressure on the reserved code cache, so you may need to increase that.
- Use -XX:+AlwaysPreTouch if you see paging hits with large heaps - it touches every heap page at JVM startup, so you don't later take the hit of faulting in pages that were reserved but not yet (or not recently) used.
- If you have many interned strings, you may benefit from setting -XX:StringTableSize=n (default is 1009). -XX:+PrintStringTableStatistics will give you diagnostics.
- Use -XX:+PrintCompilation to detect if you've hit the code cache limit - if you do, compilation stops. Expand the cache with -XX:ReservedCodeCacheSize=n (default size is 64MB, or 96MB when running tiered compilation).
- If there are multiple JVMs on one box, consider reducing the number of ParallelGCThreads so that the total of all GC threads is not more than the number of hardware threads on the host.
- set java.io.tmpdir to the operating system temp directory (especially if you do a lot of bytecode manipulation).
- The number one GC goal is to collect as many objects in the young generation as possible. Sizing the young gen to achieve this is the first step; then size the old gen to hold the working set. Size survivor spaces for medium-lived objects, but don't make them too big or you are wasting space.
- There is a tradeoff between incremental and non-incremental collectors: Incremental collectors have more uniform pauses but throughput suffers.
- -XX:+PrintGCDetails, -XX:+PrintAdaptiveSizePolicy and -XX:+PrintTenuringDistribution will help you view and tune the JVM ergonomics.
- G1 is targeted at throughput - don't set aggressive pause time targets (the max pause time target defaults to 200ms). Asking for low pause times increases the GC overhead, because concurrent collection will use more of your CPU.
- If you are seeing too long pause times reduce your young gen size.
- If pauses are coming too frequently increase the young gen size.
- Promotion of objects in CMS is expensive. The more objects that are promoted and subsequently collected, the more fragmentation occurs, and fragmentation should be avoided.
- CMS occupancy flags often need tuning - the JVM tries to optimise these, but you need to make sure the CMS collector starts early enough not to run out of heap. If it starts too late, it will fail over to a long stop-the-world collection.
- When tuning G1 for low latency use -XX:MaxGCPauseMillis. Monitor MixedGCs to make sure they aren't killing your pause time goals. You can tune using the HeapOccupancy factor flags.
- GC tradeoffs are throughput, footprint and low latency - you typically have to sacrifice one to improve the others.
- Cassandra uses CMS in preference to G1 - it allocates too fast, and G1 falls back to stop-the-world collections that are worse than those of CMS. To avoid compaction, large allocations were eliminated so the free list can find holes for the objects (arrays are broken up into arrays of arrays with offsets). The Row Cache was also moved off the heap into NIO off-heap storage - it takes a serialisation hit but saves a lot of JVM memory.
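The "arrays of arrays with offsets" idea from the Cassandra tip can be sketched roughly as follows. This is a minimal illustration and not Cassandra's actual code: the class name ChunkedLongArray and the 1024-element chunk size are assumptions made for the example. The point is that no single allocation is larger than one chunk, so a free-list allocator such as CMS only ever needs to find small holes.

```java
/** Hypothetical sketch: a logical long[] stored as many small fixed-size chunks. */
final class ChunkedLongArray {
    // Assumed chunk size for illustration: 1024 elements (8KB of longs) per chunk.
    private static final int CHUNK_SHIFT = 10;
    private static final int CHUNK_SIZE = 1 << CHUNK_SHIFT;
    private static final int CHUNK_MASK = CHUNK_SIZE - 1;

    private final long[][] chunks;
    private final int length;

    ChunkedLongArray(int length) {
        if (length < 0) {
            throw new IllegalArgumentException("negative length: " + length);
        }
        this.length = length;
        // Round up so a partially filled final chunk is included.
        int chunkCount = (length + CHUNK_SIZE - 1) >>> CHUNK_SHIFT;
        chunks = new long[chunkCount][];
        for (int i = 0; i < chunkCount; i++) {
            int remaining = length - (i << CHUNK_SHIFT);
            chunks[i] = new long[Math.min(CHUNK_SIZE, remaining)];
        }
    }

    long get(int index) {
        checkIndex(index);
        // High bits select the chunk, low bits are the offset within it.
        return chunks[index >>> CHUNK_SHIFT][index & CHUNK_MASK];
    }

    void set(int index, long value) {
        checkIndex(index);
        chunks[index >>> CHUNK_SHIFT][index & CHUNK_MASK] = value;
    }

    int length() {
        return length;
    }

    int chunkCount() {
        return chunks.length;
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + length);
        }
    }
}
```

The cost is one extra indirection per access; whether that is worth paying depends on how badly large allocations fragment the old generation in your application.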
How to Monitor Java Garbage Collection (Page last updated March 2012, Added 2012-12-26, Author Sangmin Lee, Publisher Cubrid). Tips:
- Methods to obtain Garbage Collection information include: jstat (command line utility); -verbosegc (command line flag); jconsole (GUI); jvisualvm (GUI); VisualGC (GUI).
- Run jstat with the command "jstat -gc PROCESSID 1000" to see stats every second, where PROCESSID is obtained from jps.
- jstat column headers are consistently named. The letters "GC" in the header mean it refers to GC events; the first part of the header name refers to the heap area: S0/S1 (Survivor Spaces), E (Eden), O (Old), P (Perm), Y (Young), F (Full), N (New); the last letters refer to the units being measured (none of these means it's a count): C (current configured size in KB), U (current used size in KB), T (accumulated GC time in seconds), MN (minimum config size in KB), MX (maximum config size in KB). LGCC and GCC are the last and current causes of the GC; MTT and TT are the (max) tenuring threshold; DSS is the desired survivor size in KB.
- Options to use with -verbosegc include -XX:+PrintGCDetails, -XX:+PrintGCTimeStamps, -XX:+PrintHeapAtGC, -XX:+PrintGCDateStamps. Output consists of multiply nested entries of the form "[collector: startingoccupancy->endingoccupancy, pausetime secs]".
- HPJMeter is a good GUI application for analyzing -verbosegc output results; The Visual GC in VisualVM is a good GUI application for analyzing jstatd output.
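Alongside the external tools listed above, similar counters can be read from inside the JVM through the standard java.lang.management API. The following is a minimal sketch, not something taken from the article: the class name GcSnapshot and the output format are invented for illustration. It reports collection counts and accumulated GC time per collector, plus used and committed sizes per memory pool, loosely mirroring the count, T, U and C columns of jstat.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

/** Hypothetical helper: an in-process, jstat-like snapshot of GC activity. */
final class GcSnapshot {

    static String describe() {
        StringBuilder sb = new StringBuilder();
        // One bean per collector (typically a young and an old collector).
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(String.format("collector=%s count=%d timeMs=%d%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        // One bean per memory pool (eden, survivor, old, and non-heap pools).
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            sb.append(String.format("pool=%s usedKB=%d committedKB=%d%n",
                    pool.getName(), usage.getUsed() / 1024, usage.getCommitted() / 1024));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        // Print three snapshots one second apart, like "jstat -gc PROCESSID 1000".
        for (int i = 0; i < 3; i++) {
            System.out.print(describe());
            System.out.println("---");
            Thread.sleep(1000);
        }
    }
}
```

Collector and pool names differ between collectors and JVM versions, so treat the names as labels to display rather than values to match on.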
Thinking Methodically about Performance (Page last updated December 2012, Added 2012-12-26, Author Brendan Gregg, Publisher acmqueue). Tips:
- Unacceptable latency for one user may be acceptable for another - to clearly identify issues you need to measure performance and have targets that show you how much of a problem an issue might be.
- The problem statement methodology for identifying performance issues asks: What makes you think there is a performance problem; Has this system ever performed well; What has changed recently (Software? Hardware? Load?); Can the performance degradation be expressed in terms of latency or runtime; Does the problem affect other people or applications (or is it just you); What is the environment? What software and hardware are used? Versions? Configuration?
- The workload characterization methodology for identifying performance issues helps to separate problems of load from problems of architecture, asking: Who is causing the load - Process ID, user ID, remote IP address; Why is the load being called, what code path; What are other characteristics of the load - IO operations per sec, throughput, type; How is the load changing over time?
- The best performance wins can arise from eliminating unnecessary work, e.g. a thread stuck in a loop or system-wide backups running during the day.
- The drill-down analysis methodology for identifying performance issues proceeds by: continually monitoring and recording high-level statistics over time across systems; Given a subsystem with a suspected problem, narrow the investigation to particular resources or areas of interest using monitoring tools and identifying possible bottlenecks; identify the root cause(s) by drilling down further and further, quantifying the issue.
- The USE (utilization, saturation, and errors) methodology for identifying performance issues is for early use in a performance investigation to identify systemic bottlenecks quickly. It can be summarized as: For every resource, check utilization, saturation, and errors. Utilization is the percentage of time that the resource is busy servicing work during a specific time interval; Saturation is the degree to which a fully utilized resource has extra work pending; errors include retries as well as failures (retries often indicate a performance issue).
- CPU utilization can vary dramatically from second to second; a five-minute average can disguise short periods of 100-percent utilization and, therefore, saturation.
- A generic list of server hardware resources is: CPU (Sockets, cores, hardware threads/virtual CPUs); Main memory (DRAM); Network interfaces (Ethernet ports); Storage devices (Disks); Controllers (Storage, network); Interconnects (CPU, memory, I/O).
- Resources to monitor include: CPU utilization per core; CPU run-queue length; CPU faults; available free memory; paging/swapping; failed mallocs; network IO utilization; NIC saturation; storage controller utilization; storage device utilization; storage device wait queue length; storage device error count; lock utilization (how long the locks were held); lock saturation (the number of threads waiting to acquire the lock); thread pool utilization (busy vs idle threads); thread pool saturation (waiting requests count); process/thread errors (e.g. can't fork); file descriptor errors (e.g. can't allocate/create).
- When looking for resource bottlenecks you can rule out resources which show low utilization (e.g. less than 60%), no saturation (no queues or wait times) and no errors.
- When using a problem identification methodology which iterates over resources or subsystems, a problem identified may not be the one causing the current issue, so it is important to iterate over all resources/subsystems to get a complete list of problems before using other methodologies to identify which is associated with the current performance issue.
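The USE rule-out step described above (low utilization, no saturation, no errors) can be expressed as a small filter. This is only a sketch under stated assumptions: the class UseCheck, its Sample type and the way the measurements are obtained are all hypothetical, and the 60% threshold is just the example figure quoted in the tips. Note that it returns every resource that cannot be ruled out, in line with the advice to iterate over all resources before deciding which one matters.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the USE rule-out step over a set of resource measurements. */
final class UseCheck {
    // Example threshold from the tips: below 60% utilization counts as "low".
    static final double LOW_UTILIZATION = 0.60;

    /** One measurement interval for one resource (the fields are assumptions). */
    static final class Sample {
        final String resource;
        final double utilization; // fraction of the interval the resource was busy, 0.0 to 1.0
        final int queued;         // saturation: work waiting on the resource
        final int errors;         // failures and retries seen in the interval

        Sample(String resource, double utilization, int queued, int errors) {
            this.resource = resource;
            this.utilization = utilization;
            this.queued = queued;
            this.errors = errors;
        }

        /** A resource is ruled out only if all three USE checks are clean. */
        boolean ruledOut() {
            return utilization < LOW_UTILIZATION && queued == 0 && errors == 0;
        }
    }

    /** Returns the names of all resources that cannot be ruled out, in input order. */
    static List<String> suspects(List<Sample> samples) {
        List<String> result = new ArrayList<>();
        for (Sample s : samples) {
            if (!s.ruledOut()) {
                result.add(s.resource);
            }
        }
        return result;
    }
}
```

For example, a CPU at 95% with a run queue of 4 and a NIC at 20% with 3 errors would both be reported, while a disk at 10% with no queue and no errors would be ruled out. The samples should cover short intervals, since a long average can hide brief periods of full utilization.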
Six Ways You're Using Responsive Design Wrong (Page last updated December 2012, Added 2012-12-26, Author Matthew Carver, Publisher java.net). Tips:
- Take a "mobile-first" approach to developing a site and UI: standard pixel widths and font sizes won't do anymore, you need something more adaptive. Start with a small-screen design, expand it in the browser, then adjust as needed.
- Layouts should automatically adapt to variations in screen size technology. If a new web-enabled product hits the market with an uncommon screen size, then you are already prepared for it.
- By optimizing for mobile first, you prioritize load times from the beginning of development. Faster sites are always better.
- Minify and compress CSS files. Minimize the number of plugins added to your download.
- As well as letting your window scale to the user's device/browser, allow the user to select font and sizes, and serve images appropriate to the device size (both image size and density, e.g. increased pixel density and a smaller size image for a high quality small display mobile).
- Reduce additional buttons needed to render the page - or use asynchronous loading.
- Apple's design standards recommend a minimum 44 px by 44 px target for anything intended to be tapped - bear that in mind for buttons, etc.
Saving the Failwhale: The Art of Concurrency (Page last updated December 2012, Added 2012-12-26, Author Dhanji R. Prasanna, Publisher informit). Tips:
- Contention is unavoidable - some resources are just slower, and you must wait for them. The secrets to good concurrency are 1) ensuring that these slower resources are rarely used, and 2) during such waiting periods, giving the faster tiers other work to do so that they continue to be utilized well.
- Overuse of synchronization constructs such as locks and mutexes leads to systems that perform poorly under load.
- ConcurrentHashMap is an efficient thread-safe map while HashMap is not thread-safe.
- ConcurrentHashMap doesn't do away with locks, it still uses them but it uses more than the single global lock, so that threads gain some measure of concurrency. It uses separate locks for partitions, so that multiple threads writing to the map are likely to access different partitions, using separate locks and therefore process their data simultaneously. This technique is known as lock-striping. Efficient striping uses a number of locks proportional to the number of CPU cores in a system.
- The asynchronous processing model smooths resource spikes by adding requests to a queue which is serviced by a pool of workers - spikes in requests make the queue grow rather than overloading the workers. (The ExecutorService is essentially a thread pool accompanied by a task queue.)
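Lock striping, as described in the ConcurrentHashMap tip, can be sketched in a few lines. This is an illustration of the technique only, not the JDK's implementation: the class StripedCounterMap and its demo method are hypothetical, and sizing the stripes to the number of available processors is an assumption following the "proportional to CPU cores" remark.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of lock striping: one lock per partition, not one global lock. */
final class StripedCounterMap {
    private final Object[] locks;
    private final Map<String, Long>[] partitions;

    @SuppressWarnings("unchecked")
    StripedCounterMap(int stripes) {
        if (stripes < 1) {
            throw new IllegalArgumentException("need at least one stripe");
        }
        locks = new Object[stripes];
        partitions = (Map<String, Long>[]) new Map[stripes];
        for (int i = 0; i < stripes; i++) {
            locks[i] = new Object();
            partitions[i] = new HashMap<>(); // plain HashMap, guarded by locks[i]
        }
    }

    /** Maps a key to its stripe; the mask keeps the hash non-negative. */
    private int stripeFor(String key) {
        return (key.hashCode() & 0x7fffffff) % locks.length;
    }

    /** Increments the counter for key; only threads hitting the same stripe contend. */
    long increment(String key) {
        int s = stripeFor(key);
        synchronized (locks[s]) {
            return partitions[s].merge(key, 1L, Long::sum);
        }
    }

    long get(String key) {
        int s = stripeFor(key);
        synchronized (locks[s]) {
            return partitions[s].getOrDefault(key, 0L);
        }
    }

    /** Runs several writer threads over 8 keys and returns the sum of all counters. */
    static long demo(int threads, int incrementsPerThread) {
        StripedCounterMap map = new StripedCounterMap(Runtime.getRuntime().availableProcessors());
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    map.increment("key-" + (i % 8));
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        long total = 0;
        for (int k = 0; k < 8; k++) {
            total += map.get("key-" + k);
        }
        return total;
    }
}
```

In real code you would normally just use ConcurrentHashMap; the sketch is only meant to show why writers to different partitions do not block each other.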
Java Heap Dump: Are you up to the task? (Page last updated November 2012, Added 2012-12-26, Author Pierre-Hugues Charbonneau, Publisher Java EE Support Patterns). Tips:
- Heap dump analysis is useful when troubleshooting Java heap memory leaks and java.lang.OutOfMemoryError problems. A JVM heap dump is a snapshot of the Java heap memory at a given time.
- It is usually better to generate a heap dump after a full GC in order to eliminate unnecessary noise from objects no longer referenced by the application but still present in the heap as they haven't yet been garbage collected.
- Load testing, profiling your application and analyzing Java heap dumps enable you to achieve performance goals and effectively analyse any problems.
- Generating a JVM heap dump is a compute-intensive task which will hang your JVM until it completes. Ensure that the hang is not going to impact the running system before deciding to proceed.
- HotSpot JVMs produce hprof format heapdumps; IBM ones produce phd format dumps. (IBM needs the following environment variables set: export IBM_HEAPDUMP=true export IBM_HEAP_DUMP=true or the -Xdump:heap flag).
- Ways of dumping the heap include: including the -XX:+HeapDumpOnOutOfMemoryError flag, which will produce a heap dump on the first out of memory error; including the -XX:+HeapDumpOnCtrlBreak flag, which will produce a heap dump on typing Ctrl-Break in the command window; using jmap to trigger a dump; using jconsole to trigger a dump; using visualvm to trigger a dump; a kill -3 against an IBM JVM.
- The best tool for analysing a heap dump is the Eclipse Memory Analyzer (Eclipse MAT).
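In addition to the triggers listed above, a HotSpot JVM can dump its own heap programmatically through com.sun.management.HotSpotDiagnosticMXBean, which is a HotSpot-specific API and will not be present on every JVM. The sketch below is not from the article; the class name HeapDumper and the temporary file names are assumptions. Passing true for the "live" argument dumps only reachable objects, which matches the advice about removing noise from objects that are no longer referenced.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: trigger an hprof heap dump from inside a HotSpot JVM. */
final class HeapDumper {

    /**
     * Writes a heap dump to target. The file must not already exist, and recent
     * JDKs require the name to end in ".hprof". The JVM pauses while it dumps.
     */
    static File dump(File target, boolean liveObjectsOnly) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(target.getAbsolutePath(), liveObjectsOnly);
        return target;
    }

    /** Demo: dumps to a fresh temp directory, returns the dump size, then cleans up. */
    static long dumpToTempDir() {
        try {
            Path dir = Files.createTempDirectory("heapdump-demo");
            File target = dir.resolve("demo.hprof").toFile();
            dump(target, true); // true = reachable objects only
            long sizeInBytes = target.length();
            Files.delete(target.toPath());
            Files.delete(dir);
            return sizeInBytes;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

As with the other triggers, the dump stalls the application while it runs, so the same caution about impact on the running system applies.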
Last Updated: 2020-08-28
Copyright © 2000-2020 Fasterj.com. All Rights Reserved.
All trademarks and registered trademarks appearing on JavaPerformanceTuning.com are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. JavaPerformanceTuning.com is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.