Java Performance Tuning
Tips June 2012
Back to newsletter 139 contents
Garbage Collectors Available In JDK 1.7.0_04 (Page last updated June 2012, Added 2012-06-27, Author Jack Shirazi, Publisher fasterj.com). Tips:
- There are seven primary garbage collectors in the Oracle JVM: Copy (young), PS Scavenge (young), ParNew (young), MarkSweepCompact (old), PS MarkSweep (old), ConcurrentMarkSweep (old), G1 (young & old).
- All of the Oracle JVM garbage collection algorithms except ConcurrentMarkSweep are stop-the-world - the stop is known as 'pause' time. ConcurrentMarkSweep tries to do most of its work in the background to minimize the pause time, but it also has a stop-the-world phase and can fail over into MarkSweepCompact, which is fully stop-the-world.
- The parallel scavenge (PS Scavenge & PS MarkSweep) pair of collectors are effectively two different algorithms depending on whether you turn on or off the Adaptive Size Policy.
- The "train" (incremental) garbage collector is no longer available - it was in various JVMs as -Xincgc and -XX:+UseTrainGC and these flags are no longer useful.
- There are seven primary combinations of garbage collection algorithms for the Oracle JVM: -XX:+UseSerialGC (Copy+MarkSweepCompact); -XX:+UseG1GC (G1); -XX:+UseParallelGC -XX:+UseParallelOldGC -XX:+UseAdaptiveSizePolicy (PS Scavenge+PS MarkSweep+adaptive sizing); -XX:+UseParallelGC -XX:+UseParallelOldGC -XX:-UseAdaptiveSizePolicy (PS Scavenge+PS MarkSweep+no adaptive sizing); -XX:+UseParNewGC (ParNew+MarkSweepCompact); -XX:+UseConcMarkSweepGC -XX:+UseParNewGC (ParNew+ConcurrentMarkSweep); -XX:+UseConcMarkSweepGC -XX:-UseParNewGC (Copy+ConcurrentMarkSweep).
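The collector names used in these tips (Copy, PS Scavenge, ParNew and so on) appear to be the names the JVM reports through its management beans, so one way to check which combination a given set of flags actually selected is to ask the running JVM. The following is a minimal sketch using the standard `java.lang.management` API; the class name `ListCollectors` is made up for illustration, and newer JVMs will likely report different names than the JDK 1.7.0_04 ones listed above.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original article.
class ListCollectors {
    // Returns one line per collector active in this JVM:
    // its reported name, collection count and accumulated collection time.
    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        describeCollectors().forEach(System.out::println);
    }
}
```

Running it with, for example, -XX:+UseSerialGC and then with a different combination should show the reported young and old collector names changing accordingly.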
Big List of 20 Common Bottlenecks (Page last updated May 2012, Added 2012-06-27, Author Todd Hoff, Publisher highscalability). Tips:
- Common memory bottleneck issues include: exceeding hardware memory and overflowing to disk; subsystems which are not memory limited using up too much memory; memory fragmentation; cache buffers filling up; syncing to disk; hitting open handle limits (sockets, files, pipes, shared memory, ...); not caching; out-of-memory issues; disk thrashing from too much memory; virtual machine garbage collections from badly configured or over-stressed memory; malloc calls in C (and equivalents) taking a long time because of memory and table fragmentation and management overheads.
- Common transactional bottleneck issues include: mixing long and short queries; too many write-write conflicts; suboptimal isolation levels.
- Common disk I/O bottleneck issues include: "seek death" where too many different concurrent processes are trying to use the disk causing it to seek all over the place; random I/O (causing too many seeks); disk fragmentation; disk performance dropping dramatically when certain limits are exceeded (acceptable operational range capacity limits); using remote disks inappropriately (high volume/chatty/low latency); disk thrashing from too much memory; disk subsystems partly failing (e.g. battery low on the controller stops using onboard cache).
- Common network I/O bottleneck issues include: network capacity fluctuations from other (uncontrollable) services using the same bandwidth; TCP buffers too small; NIC maxed out; DNS lookup delays; dropped packets; bad routing; shared remote disk traffic; server failure.
- Common code-level and configuration bottleneck issues include: deadlocks; race conditions; debug code incorrectly left on; event-driven complexity; storing state incorrectly; not caching; not compressing; writing code that defeats hardware optimisations, e.g. doesn't keep hot data in L1/L2 caches; memory leaks; badly configured object relational mapping; not parallelising code and not using asynchronous code.
- Common architecture & design bottleneck issues include: sublinear scalability; a single subsystem or external dependency that is a bottleneck (e.g. DNS lookups); single points of failure; not scaling horizontally; statefulness; algorithms that are too complex; too high throughput; badly performing data models.
- Common operations & testing bottleneck issues include: not profiling enough; not monitoring; not logging; logging too much; exceeding system power capabilities; not caching; not compressing; CPU overload; too many context switches; too much I/O waiting; logging to the wrong locations (e.g. database logs should be written to a different location than the data).
Voldemort on Solid State Drives (Page last updated May 2012, Added 2012-06-27, Author Vinoth Chandar, Lei Gao, Cuong Tran, Publisher linkedin). Tips:
- If JVM pages are not locked into memory, Linux will page out pages to swap. Whenever promotion occurs during a young generation garbage collection, some pages will have to be mapped back into memory - this can result in multi-second minor GCs. To avoid this, mlock the server heap in memory.
- Using -XX:+AlwaysPreTouch will cause the JVM to touch every page on initialization, thus getting all the pages into memory before entering main() (this will increase your startup time). The AlwaysPreTouch flag is useful in testing when simulating a long-running system which has all the virtual memory already mapped into physical memory.
- Cleaning a large set of objects from a large cache can produce enough garbage to overwhelm the concurrent garbage collector.
- Applications that were previously I/O bound can find their bottleneck shifting to garbage collection once they are moved to SSDs.
- Fast SSDs cause memory fragmentation to be created at a much faster rate, subsequently increasing the cost of defragmentation.
- Performance tests really need to simulate real-world dataflows, or they will miss bottlenecks that will occur in production.
- When a set of objects must be processed before being dereferenced, it's much better to process and dereference each one on the fly than to gather them all into a large collection, process them, and then dereference the collection. The collection and its objects will be promoted to the old generation and carry a much higher garbage collection overhead, whereas objects dereferenced on the fly can all be reclaimed in the young generation.
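The last tip can be illustrated with a small sketch that contrasts the two styles. Everything here is hypothetical and is not the Voldemort code: the class `OnTheFly`, the `lazySource` helper and the byte-array items are made up for illustration, and the sketch assumes the source does not itself retain references to the items it hands out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.ToLongFunction;

// Hypothetical illustration, not taken from the original article.
class OnTheFly {
    // Collect-then-process: every item stays reachable through 'batch' until
    // the whole batch has been processed, so a large batch is likely to
    // survive young collections and be promoted to the old generation.
    static <T> long collectThenProcess(Iterator<T> source, ToLongFunction<T> work) {
        List<T> batch = new ArrayList<>();
        while (source.hasNext()) {
            batch.add(source.next());
        }
        long total = 0;
        for (T item : batch) {
            total += work.applyAsLong(item);
        }
        batch.clear(); // items only become garbage here
        return total;
    }

    // Process-on-the-fly: each item is unreachable as soon as it has been
    // processed, so it can be reclaimed while still in the young generation.
    static <T> long processOnTheFly(Iterator<T> source, ToLongFunction<T> work) {
        long total = 0;
        while (source.hasNext()) {
            total += work.applyAsLong(source.next());
        }
        return total;
    }

    // Made-up source that allocates each item lazily and keeps no reference to it.
    static Iterator<byte[]> lazySource(int count, int size) {
        return new Iterator<byte[]>() {
            private int produced = 0;

            @Override
            public boolean hasNext() {
                return produced < count;
            }

            @Override
            public byte[] next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                produced++;
                return new byte[size];
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(collectThenProcess(lazySource(1000, 16), b -> b.length));
        System.out.println(processOnTheFly(lazySource(1000, 16), b -> b.length));
    }
}
```

Both methods compute the same result; the difference is only in how long each item stays reachable, which is what determines whether it can die young.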
Java High CPU troubleshooting guide (Page last updated May 2012, Added 2012-06-27, Author Pierre-Hugues Charbonneau, Publisher Java EE Support Patterns). Tips:
- Gather information before restarting an application with a performance issue or you will not be able to determine the cause.
- Determine the baseline normal performance of the application so you can quickly identify what aspect is abnormal when there is a performance issue.
- Try to gather at least the following information: physical & virtual host configuration and capacity (cores, RAM, etc); OS, JVM and application component version details; monitoring tools available; history of the environment and known issues; business transaction and dataflow per application along with average & peak.
- Use top or prstat or ps with thread breakdowns on Unix (prstat -L -p ... on Solaris, top -H on Linux, ps -mp on AIX) and Process Explorer or perfmon on Windows to identify thread-level CPU usage.
- Generate thread dumps and correlate the thread CPU stats with the thread dumps to find what methods are creating the CPU load.
- Heavy or excessive garbage collection will often show up as high CPU - the threads identified in high CPU will be the GC Threads.
- Excessive disk I/O can cause high CPU usage, though this typically shows up together with visibly high disk activity.
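As a complement to the OS tools above, per-thread CPU time can also be read from inside the JVM and paired with each thread's current top stack frame. The sketch below swaps in the standard `ThreadMXBean` API rather than the top/prstat plus thread dump procedure the tips describe; the class `HotThreads` and its method names are made up for illustration. It only sees Java threads, so GC threads will not appear and the OS-level tools are still needed for those. The `toNid` helper reflects the fact that HotSpot thread dumps print the native thread id in hexadecimal (`nid=0x...`), whereas top -H shows it in decimal.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original article.
class HotThreads {
    // Converts a decimal OS thread id (as shown by top -H) to the
    // hexadecimal form used in the nid= field of HotSpot thread dumps.
    static String toNid(long osThreadId) {
        return "0x" + Long.toHexString(osThreadId);
    }

    // Returns up to 'limit' lines for live Java threads, highest accumulated
    // CPU time first, each with the thread's current top stack frame.
    static List<String> topCpuThreads(int limit) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        List<String> out = new ArrayList<>();
        if (!threads.isThreadCpuTimeSupported() || !threads.isThreadCpuTimeEnabled()) {
            return out; // CPU timing unavailable on this JVM
        }
        List<long[]> usage = new ArrayList<>(); // pairs of {threadId, cpuNanos}
        for (long id : threads.getAllThreadIds()) {
            long cpuNanos = threads.getThreadCpuTime(id); // -1 if the thread has died
            if (cpuNanos >= 0) {
                usage.add(new long[] {id, cpuNanos});
            }
        }
        usage.sort((a, b) -> Long.compare(b[1], a[1]));
        for (long[] u : usage.subList(0, Math.min(Math.max(limit, 0), usage.size()))) {
            ThreadInfo info = threads.getThreadInfo(u[0], 1); // stack depth 1
            if (info == null) {
                continue; // thread ended between the two calls
            }
            StackTraceElement[] stack = info.getStackTrace();
            String top = stack.length > 0 ? stack[0].toString() : "(no Java frames)";
            out.add(info.getThreadName() + " cpuMs=" + (u[1] / 1_000_000) + " at " + top);
        }
        return out;
    }

    public static void main(String[] args) {
        topCpuThreads(5).forEach(System.out::println);
        System.out.println(toNid(4660)); // decimal id from top -H, in thread dump form
    }
}
```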
Multithreading Problems In Game Design (Page last updated May 2012, Added 2012-06-27, Author Erik McClure, Publisher blackhole12). Tips:
- In games, having a dedicated thread per major piece of functionality (e.g. graphics, physics, etc) means that most functionality will lag the scene rendering by at least one frame, which is not optimal.
- Using a dedicated thread per major piece of functionality does not scale to more cores - you only use one core for any function currently in process, typically one or two cores no matter how many are available.
- The author recommends parallelizing each component rather than each major piece of functionality, and using a thread pool to distribute the component processing across as many cores as are available to the application.
- Networking, and possibly audio, subcomponents do lend themselves to having dedicated threads for processing.
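The thread pool recommendation above might look roughly like the following in Java. This is a minimal sketch under stated assumptions, not the author's implementation: the `Body` component, the `FrameUpdate` class and the simple chunking scheme are all hypothetical, and `workers` is assumed to be at least 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration, not taken from the original article.
class FrameUpdate {
    // Made-up game component with trivially simple physics.
    static final class Body {
        double position;
        double velocity;

        Body(double position, double velocity) {
            this.position = position;
            this.velocity = velocity;
        }

        void step(double dt) {
            position += velocity * dt;
        }
    }

    // Splits the bodies into roughly one chunk per worker, steps all chunks in
    // parallel on the pool, and returns once the whole frame has been processed.
    static void stepAll(List<Body> bodies, double dt, ExecutorService pool, int workers)
            throws Exception {
        List<Callable<Void>> tasks = new ArrayList<>();
        int chunk = (bodies.size() + workers - 1) / workers; // ceiling division
        for (int start = 0; start < bodies.size(); start += chunk) {
            List<Body> slice = bodies.subList(start, Math.min(start + chunk, bodies.size()));
            tasks.add(() -> {
                for (Body b : slice) {
                    b.step(dt);
                }
                return null;
            });
        }
        for (Future<Void> f : pool.invokeAll(tasks)) {
            f.get(); // rethrows any failure from a worker
        }
    }

    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        try {
            List<Body> bodies = new ArrayList<>();
            for (int i = 0; i < 10_000; i++) {
                bodies.add(new Body(0.0, i));
            }
            stepAll(bodies, 1.0 / 60.0, pool, cores);
            System.out.println("stepped " + bodies.size() + " bodies on " + cores + " workers");
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the work inside one component is spread across the pool, the same code uses however many cores are available, rather than being pinned to one core per subsystem.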
Fitting Performance into the Software Development Lifecycle (Page last updated June 2012, Added 2012-06-27, Author Sasha Goldshtein, Publisher JavaLobby). Tips:
- Ensure you have application performance goals and define the important metrics to monitor, factoring in maintenance overheads, user loads, and requirement changes.
- Performance target definitions should be part of the requirements gathering phase.
- The architecture phase should refine performance targets and specify metrics to monitor.
- During development you should frequently performance test critical code and near-to-completion components for compliance to performance targets.
- During testing, and after deployment whenever anything changes, you should repeatedly load test and performance test the full system to determine whether performance targets have been met or need further work.
- Set up an automated performance regression test suite and environment which alerts on any failures to achieve system-wide performance targets.
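One building block of such a regression suite is a check that compares measured latencies against a stated target. The sketch below is one possible shape for that check, assuming a latency target expressed as a percentile; the class `PerfGate`, its method names and the nearest-rank percentile choice are all illustrative assumptions, and a real suite would also need a stable environment and realistic load.

```java
import java.util.Arrays;

// Hypothetical helper class, not part of the original article.
class PerfGate {
    // Nearest-rank percentile of the samples, for p in (0, 100].
    static long percentile(long[] samplesNanos, double p) {
        long[] sorted = samplesNanos.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    // Runs the operation 'warmups' times untimed, then returns the
    // wall-clock duration in nanoseconds of each of 'runs' timed executions.
    static long[] measure(Runnable operation, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            operation.run();
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            operation.run();
            samples[i] = System.nanoTime() - start;
        }
        return samples;
    }

    // True if the p-th percentile latency is within the target.
    static boolean meetsTarget(long[] samplesNanos, double p, long targetNanos) {
        return percentile(samplesNanos, p) <= targetNanos;
    }

    public static void main(String[] args) {
        long[] samples = measure(() -> Math.sqrt(12345.678), 1_000, 10_000);
        long targetNanos = 1_000_000; // made-up target of 1 ms at the 95th percentile
        System.out.println("p95 nanos: " + percentile(samples, 95));
        System.out.println("meets target: " + meetsTarget(samples, 95, targetNanos));
    }
}
```

An automated job could run a check like this after each build and raise an alert whenever `meetsTarget` returns false.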
The Changing World of Application Performance Management - And What It Means for Testing (Page last updated May 2012, Added 2012-06-27, Author Andreas Grabner, Publisher softwaretestpro). Tips:
- Conversion rates increase 74% when page load times decrease from eight to two seconds.
- The average web shopper expects pages to load in two seconds or less.
- Up to 40% of browsing shoppers abandon sites after three seconds.
- Performance testing should be done from the perspective of the end-user.
- The slower your website or web application, the fewer pages your visitors will view - a third fewer pages for the slowest 10% compared to the fastest 10% according to an AOL study.
- Even if your tools inside the firewall indicate that everything is running OK, that's no guarantee your end users are happy because there are many things that can impact performance outside the firewall, including externally embedded content and network paths and congestion.
- Use real-user monitoring by embedding latency monitoring in the client and gathering the statistics.
- Use synthetic end-user monitoring by simulating users using agents dispersed to locations similar to end-users and simulating user behaviour to get valid performance test metrics.
- Ensure that you have sufficient coordinated monitoring in place within the application such that you can track slow end-user requests to precise performance issues within the application.
Last Updated: 2018-02-27
Copyright © 2000-2018 Fasterj.com. All Rights Reserved.
All trademarks and registered trademarks appearing on JavaPerformanceTuning.com are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. JavaPerformanceTuning.com is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.