Java Performance Tuning
Tips October 2014
Back to newsletter 167 contents
http://www.takipiblog.com/how-to-instantly-improve-your-java-logging-with-7-logback-tweaks/
How to Instantly Improve Your Java Logging With 7 Logback Tweaks (Page last updated July 2014, Added 2014-10-29, Author Alex Zhitnitsky, Publisher takipiblog). Tips:
- Small changes and configuration tweaks on the logging framework can have a large impact on logging throughput.
- Asynchronous appenders with a discarding threshold of 0 appear to be the fastest way to log, across all appender types.
- Avoiding the logger having to work out the originating class name gives the biggest single benefit in pattern handling (e.g. using a logger per class instead of one single logger); logging only the level and the message was the fastest combination overall (see the sketch after this list).
- Prudent mode allows multiple JVMs to write to the same file safely, but has an overhead of ~10%.
- Network and pipe appenders have higher throughput than disk writing ones (not factoring in network overhead).
- Splitting logging to multiple files with SiftingAppender appears to provide a significant benefit.
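To illustrate the logger-per-class tip above, here is a minimal sketch (the class name OrderService is illustrative, not from the article) of the standard SLF4J idiom: the logger name is fixed once per class, so a %logger pattern costs almost nothing, whereas a single shared logger combined with %class/%method patterns forces a stack walk on every log call.

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class OrderService {

        // One logger per class: the name is resolved once, when the class loads.
        private static final Logger log = LoggerFactory.getLogger(OrderService.class);

        // A single shared logger (e.g. LoggerFactory.getLogger("app")) would need
        // a %class/%method pattern to identify the caller, which is far slower.

        public void placeOrder(String id) {
            log.debug("placing order {}", id);  // parameterized: no string building
                                                // when DEBUG is disabled
        }
    }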
http://psy-lob-saw.blogspot.co.uk/2014/07/poll-me-maybe.html
Poll me, maybe? (Page last updated July 2014, Added 2014-10-29, Author Nitsan Wakart, Publisher Psychosomatic, Lobotomy, Saw). Tips:
- In designing multithreaded components, you need to consider exactly how the component responds to interaction with other threads. Do you give hard or soft guarantees? For example, a view of the size of a multithreaded collection depends on whether you want to be quick and just return the local thread's current impression of the collection size, or to be thorough but slow and give the completely accurate size across all threads (which means suspending element additions and removals until the sizing operation completes). This is a design decision (not an implementation detail), and it has definite performance consequences. It may be worth providing multiple methods, some with hard guarantees and others with soft guarantees, so the developer can choose what is best for them (see the sketch after this list).
- In implementing multithreaded components for the agreed interface, you need to consider how much locking, waiting and spinning to apply for the particular contract; there are multiple options, and your choices will produce different performance profiles in different situations (there is unlikely to be one best implementation for all configurations of producers and consumers).
- In some implementations, a Queue.poll() call can return null even when the queue isn't empty - if other threads are concurrently adding to the queue and the poll method doesn't give a hard guarantee to see all other threads' additions.
- Reading data written by other threads introduces the possibility of cache misses, which have an unknown and potentially very large cost.
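As a concrete sketch of the hard-versus-soft-guarantee design choice (illustrative code, not from the article; the class and method names are made up), one component can expose both a cheap estimated size and a locked, accurate size so callers pick the trade-off themselves:

    import java.util.concurrent.ConcurrentLinkedQueue;
    import java.util.concurrent.locks.ReentrantLock;

    public class SizedQueue<E> {
        private final ConcurrentLinkedQueue<E> queue = new ConcurrentLinkedQueue<>();
        private final ReentrantLock lock = new ReentrantLock();

        public void add(E e) {
            lock.lock();             // writers pay for the hard guarantee below
            try { queue.add(e); } finally { lock.unlock(); }
        }

        public E poll() {
            lock.lock();
            try { return queue.poll(); } finally { lock.unlock(); }
        }

        // Soft guarantee: fast to call, but may be stale while other threads are
        // adding/removing (ConcurrentLinkedQueue.size() is O(n) and approximate).
        public int sizeEstimate() {
            return queue.size();
        }

        // Hard guarantee: accurate, because additions and removals are suspended
        // while counting - thorough but slow under contention.
        public int sizeExact() {
            lock.lock();
            try { return queue.size(); } finally { lock.unlock(); }
        }
    }

Locking every write obviously costs throughput; the point is that the guarantee each method offers is part of the interface contract, a design decision rather than an implementation detail.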
https://www.packtpub.com/application-development/java-ee-7-performance-tuning-and-optimization
Chapter 1 of Java EE 7 Performance Tuning and Optimization (Page last updated June 2014, Added 2014-10-29, Author Osama Oransa, Publisher Packt Publishing). Tips:
- If you failed to improve the performance of a bottleneck or to discover the root cause of some performance issue, you have at least learnt that the techniques you used were insufficient.
- Differentiate between consistently slow responses and sudden, gradual, or intermittent changes in response times; these will have different causes. Design issues are the most common reason behind consistently slow behaviour; these would normally be found in performance tests prior to production, if the tests are adequate. Sudden deteriorations are more commonly caused by a specific isolated issue fixable with a small patch.
- The design stage is the cheapest stage at which to fix performance issues, and typically you need SLAs to identify them (via proof-of-concept tests). SLAs should be realistic!
- Make sure you test with: resources equal (or simulated to be equivalent) to the expected production environment; database data that is large and varied enough; external dependencies with realistic latencies; and a realistic number and variety of simulated users.
- The main reasons for bad performance in production are: inadequate performance testing; testing at an inadequate scale; insufficient capacity planning; a non-scalable architecture; insufficient database maintenance (archiving, indexing); lack of optimizations applied to the configurations of the OS, application server, database and JVM; changes applied with insufficient testing; and failure to monitor the application.
- Proactive performance management means: having a working process in place for performance tuning; having clear performance SLAs; good capacity planning; considering performance during design; following best coding practices; automated and manual code reviews; following best logging practices; having a dedicated performance environment (similar to the production environment); good quality performance tests; a trained team to handle performance issues; tools required for monitoring and analysing performance; continuous monitoring of different application layers.
- A common mistake is to focus on a single layer (e.g. code) and neglect other layers; avoid this. Monitor and analyse all layers.
- Make sure you have baselines from your tools (when performance is adequate) so you can compare against the baselines to identify and analyse issues.
http://www.informit.com/articles/article.aspx?p=2233979
The Pitfalls of Parallelism (Page last updated July 2014, Added 2014-10-29, Author David Chisnall, Publisher InformIT). Tips:
- Any time you force two processors to have the same view of memory, they will spend a lot of time sending messages and waiting for the results. Without contention (on x86) an atomic add instruction is only three or so times slower than the non-atomic version; but when another thread has the cache line, it can be more than 300 times slower.
- Lockless structures can sacrifice immediacy of memory visibility for throughput - allowing stale values to be seen for a period.
- A typical cache line size for a modern processor is 64 bytes. If your data crosses cache lines, it takes longer to read. If different data items (e.g. counters) updated by two different threads are on the same cache line, then the cache line keeps getting invalidated as the two threads contend for access to it - this latter issue ("false sharing") can be eliminated by padding (see the sketch after this list).
- Performance discontinuities occur whenever data no longer fits in a cache (L1, L2, L3, TLB, main memory) - just as when main memory (RAM) is full and the application has to use disk-based virtual memory.
- The OS thread scheduling can be wrong for your application, for example where a thread has to keep waiting on other threads because of cache conflicts. You may benefit from dedicating cores to specific threads.
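A minimal sketch of the padding fix for the two-counter case above (assumes roughly 64-byte cache lines; note that the JVM may reorder fields, so manual padding is only a heuristic - the @Contended annotation, available from Java 8 with -XX:-RestrictContended, is the supported alternative):

    public class PaddingDemo {

        static class Counters {
            volatile long a;                      // updated only by thread 1
            long p1, p2, p3, p4, p5, p6, p7;      // ~56 bytes of padding
            volatile long b;                      // updated only by thread 2
        }

        public static void main(String[] args) throws InterruptedException {
            Counters c = new Counters();
            Thread t1 = new Thread(() -> { for (int i = 0; i < 100_000_000; i++) c.a++; });
            Thread t2 = new Thread(() -> { for (int i = 0; i < 100_000_000; i++) c.b++; });
            long start = System.nanoTime();
            t1.start(); t2.start();
            t1.join();  t2.join();
            // With the padding fields removed, both counters typically land on the
            // same cache line and this loop runs several times slower.
            System.out.println((System.nanoTime() - start) / 1_000_000 + " ms");
        }
    }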
http://www.infoq.com/presentations/g1-gc-logs
Are Your G1GC Logs Speaking to You? (Page last updated August 2014, Added 2014-10-29, Author Kirk Pepperdine, Publisher InfoQ). Tips:
- Do log GC to files - but log them to local files, not network attached storage!
- Use -verbose:gc -Xloggc:log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1g -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution
- The current 'regional' type garbage collectors are: G1GC, IBM Balanced, Azul C4, and Red Hat Shenandoah.
- G1GC breaks the heap into approximately 2000 regions; each region can be eden, survivor, old, humongous (several contiguous regions combined to hold huge objects), or unused; it uses predictive algorithms to estimate how long collecting a region will take, and combines regions into a collection set to try to reach the pause time goal.
- The only things to normally set for G1GC are the max heap size and the pause time goal; it is mostly self-tuning. But if you get frequent humongous allocations causing full GCs, you may need to set a larger region size (see the example command line after this list).
- A concurrent-mark-reset-for-overflow indicates the max heap is probably too small.
- InitiatingHeapOccupancyPercent (default 45%) is the heap occupancy at which G1GC starts its concurrent marking cycle.
- Survivor sizes are very important when tuning the CMS GC, but seem to be unimportant for G1GC.
- G1GC flags to consider using, or explicitly avoiding (with defaults): -XX:+UseG1GC, -Xmn, -Xmx, -XX:MaxGCPauseMillis=200, -XX:GCPauseIntervalMillis=1000, -XX:InitiatingHeapOccupancyPercent=45, -XX:NewRatio=2, -XX:SurvivorRatio=8, -XX:MaxTenuringThreshold=15, -XX:ParallelGCThreads=n, -XX:G1ReservePercent=n, -XX:G1HeapRegionSize=n
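Putting those flags together, a hedged example launch command (the heap size, region size, log path, log file size and jar name are illustrative choices, not recommendations from the talk):

    java -Xmx4g -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:G1HeapRegionSize=16m \
         -verbose:gc -Xloggc:/var/log/myapp/gc.log \
         -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100m \
         -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution \
         -jar myapp.jar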
http://blog.c2b2.co.uk/2013/07/glassfish-4-performance-tuning.html
Glassfish 4 - Performance Tuning, Monitoring and Troubleshooting (Page last updated July 2013, Added 2014-10-29, Author Andy Overton, Publisher C2B2). Tips:
- A simple but effective JVM change is to use the -server flag, rather than the default -client; the server VM has been tuned to maximise peak operating speed at the cost of startup time and memory footprint.
- Set the maximum memory with the -Xmx flag; additionally consider setting the initial memory -Xms to the same value (but test that this doesn't adversely affect garbage collection).
- Ensure you are logging garbage collection statistics: -verbose:gc -Xloggc:/path_to_log_file -XX:+PrintGCDetails -XX:+PrintGCDateStamps
- Stop explicit code-level calls to the GC (System.gc()) with -XX:+DisableExplicitGC
- Get heap dumps on out-of-memory errors with -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/dumps/mydump.hprof (a combined set of these JVM options is sketched after this list)
- Set the number of threads handling incoming connections to the number of available cores. Set the number of worker threads processing requests to a value that keeps the machine busy but not saturated so that other threads get access to the CPU too (too high a thread count and the competition for CPU resource will badly affect overall throughput).
- Monitor used heap sizes, the number of loaded classes, thread counts, and thread pool counts.
- Useful JVM monitoring tools include jstat, jmap, jconsole, jvisualvm (but these tools should not be on permanently, only for analysis).
- Useful JVM troubleshooting tools include: jps -v, jstack, jmap, jrcmd and jrmc (JRockit Mission Control), jvisualvm, GCViewer, and jhat
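Combining the JVM-level tips above, a hedged example set of options for a GlassFish domain (the heap sizes and paths are illustrative; in GlassFish these are typically added one per line via the admin console or asadmin create-jvm-options):

    -server
    -Xms2g
    -Xmx2g
    -verbose:gc
    -Xloggc:/var/log/glassfish/gc.log
    -XX:+PrintGCDetails
    -XX:+PrintGCDateStamps
    -XX:+DisableExplicitGC
    -XX:+HeapDumpOnOutOfMemoryError
    -XX:HeapDumpPath=/opt/dumps/mydump.hprof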
Jack Shirazi
Back to newsletter 167 contents