Java Performance Tuning
Tips July 2008
Back to newsletter 092 contents
Understand the pitfalls of benchmarking Java code (Page last updated June 2008, Added 2008-07-30, Author Brent Boyer, Publisher IBM). Tips:
- System.currentTimeMillis can have resolution times as bad as 60ms on some systems, and in any case resolution is very system dependent. Don't use it to try to reliably measure short times.
- System.nanoTime returns the number of nanoseconds since some arbitrary offset. It is useful for differential time measurements; its accuracy and precision should never be worse than System.currentTimeMillis, and on many modern systems it can deliver accuracy and precision in the microsecond range.
- ThreadMXBean.getCurrentThreadCpuTime offers the possibility of measuring CPU time used by the current thread. But it may not be available, and may have significant overheads. It should be used carefully.
- On Windows, System.nanoTime involves an OS call that executes in microseconds, so it should not be called more than once every 100 microseconds or so to keep the measurement impact under 1 percent.
- A task's first execution time includes loading all the classes it uses (disk I/O, parsing, and verification), greatly inflating the first measurement. Benchmarks need to account for this and handle it.
- JVMs can decide to unload classes that have become garbage - likely not a major performance hit, but it is still less than ideal to have happen in the middle of your benchmark.
- You can check whether or not class loading/unloading is occurring in the middle of your benchmark by calling the getTotalLoadedClassCount and getUnloadedClassCount methods of ClassLoadingMXBean.
- Benchmarking steady-state performance requires something like: load all classes; execute the task until steady-state execution has emerged (past compilation and other overheads); repeat the test several times for a first timing estimate; calculate the number of task executions needed for a sufficiently large measurement time; then measure the overall execution time of that many repeats and take the average execution time.
- A 10-second warmup phase should suffice for most short benchmarks.
- There is no perfect way to detect that JIT compilation has occurred - CompilationMXBean.getTotalCompilationTime is one way, but it is not always accurate; parsing stdout when the -XX:+PrintCompilation JVM option is used is another.
- The JVM can stop using a compiled method and return to interpreting it for a while before recompiling it (when assumptions made by an optimizing dynamic compiler have become outdated). This needs to be handled in a benchmark, at least by determining if further JIT compilations have occurred.
- The optimization quality of on-stack-replacement (OSR) JITed code is suboptimal (OSR sometimes cannot do loop-hoisting, array-bounds check elimination, or loop unrolling), so artificial OSRs should be avoided in benchmarks.
- Try to recognize on-stack-replacement (OSR) where it can occur and restructure your code to avoid it if possible. Typically this involves putting key inner loops in separate methods. OSR is usually only an issue in benchmarks.
- It's best to try to clean up repeatedly with System.gc and System.runFinalization before tests, until memory use stabilizes.
- Benchmarks need to take account of garbage collection interruptions. But artificially invoking full GCs during the benchmark can skew measured times.
- If you want to benchmark random file reads, you likely need to ensure that different files are read to avoid caching.
- Use large representative data sets.
- Hardware power state changes; other programs running concurrently; and many JVM options can all impact performance measurements.
- Some relevant JVM options that can affect benchmarking are: -server, -client, -Xmx, -Xnoclassgc, -XX:+DoEscapeAnalysis, -XX:+UseLargePages, -Xss128k, -Xcomp, -Xint, -Xmixed, -XX:CompileThreshold, -Xbatch, -XX:+TieredCompilation, -XX:+UseBiasedLocking, -XX:+AggressiveOpts, -enableassertions, -enablesystemassertions, -Xcheck:jni, -XX:+UseNUMA.
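The timing tips above can be sketched as a small helper class. This is an illustrative sketch, not the article's code; the summing-loop task in main is an invented placeholder:

```java
// Differential timing with System.nanoTime, plus per-thread CPU time
// via ThreadMXBean.getCurrentThreadCpuTime when the JVM supports it.
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class TimingSketch {
    // Wall-clock time of one task execution; nanoTime is only valid differentially.
    static long elapsedNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    // CPU time consumed by the current thread, or -1 if unsupported on this JVM.
    static long cpuNanos(Runnable task) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!bean.isCurrentThreadCpuTimeSupported()) {
            return -1;
        }
        long start = bean.getCurrentThreadCpuTime();
        task.run();
        return bean.getCurrentThreadCpuTime() - start;
    }

    public static void main(String[] args) {
        // Placeholder workload for illustration only.
        Runnable task = () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) sum += i;
        };
        System.out.println("wall: " + elapsedNanos(task) + " ns");
        System.out.println("cpu:  " + cpuNanos(task) + " ns");
    }
}
```

Remember the article's caveat: on Windows, each nanoTime call costs microseconds, so calls like these should be spaced out.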
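The class-loading and JIT-detection checks described above could be combined into one guard around a measurement, roughly as follows (a sketch; the decision to discard a run on churn is this example's policy, not the article's):

```java
// Detect class loading/unloading and JIT compilation occurring during a
// benchmark run, using ClassLoadingMXBean and CompilationMXBean.
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

public class JvmActivityCheck {
    // Returns true if no class churn or JIT compilation was observed around the run.
    static boolean ranCleanly(Runnable benchmark) {
        ClassLoadingMXBean cl = ManagementFactory.getClassLoadingMXBean();
        CompilationMXBean comp = ManagementFactory.getCompilationMXBean();

        long loadedBefore = cl.getTotalLoadedClassCount();
        long unloadedBefore = cl.getUnloadedClassCount();
        long compBefore = (comp != null && comp.isCompilationTimeMonitoringSupported())
                ? comp.getTotalCompilationTime() : 0;

        benchmark.run();

        boolean noClassChurn = cl.getTotalLoadedClassCount() == loadedBefore
                && cl.getUnloadedClassCount() == unloadedBefore;
        // Note the article's warning: compilation time is not always accurate.
        boolean noJit = comp == null || !comp.isCompilationTimeMonitoringSupported()
                || comp.getTotalCompilationTime() == compBefore;
        return noClassChurn && noJit; // if false, discard or rerun the measurement
    }
}
```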
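The steady-state protocol (memory cleanup, warmup, sizing the run, then averaging) might look like this minimal sketch; the iteration caps and the warmup/measurement durations passed in are the caller's assumptions, not values from the article:

```java
// Steady-state benchmarking skeleton: clean memory until usage stabilizes,
// warm up past compilation overheads, size the measured run, then average.
public class SteadyStateBenchmark {
    // Repeated GC/finalization until used memory stops shrinking.
    static void cleanMemory() {
        Runtime rt = Runtime.getRuntime();
        long used = Long.MAX_VALUE;
        for (int i = 0; i < 10; i++) {
            System.gc();
            System.runFinalization();
            long now = rt.totalMemory() - rt.freeMemory();
            if (now >= used) break; // stabilized
            used = now;
        }
    }

    static double meanNanos(Runnable task, long warmupNanos, long measureNanos) {
        cleanMemory();
        // Warm up until compilation and other start-up overheads have passed
        // (the article suggests ~10 seconds for most short benchmarks).
        long end = System.nanoTime() + warmupNanos;
        while (System.nanoTime() < end) task.run();

        // First estimate: how many executions fill the measurement window?
        long t0 = System.nanoTime();
        task.run();
        long estimate = Math.max(1, measureNanos / Math.max(1, System.nanoTime() - t0));

        // Measure that many repeats and report the average.
        long start = System.nanoTime();
        for (long i = 0; i < estimate; i++) task.run();
        return (System.nanoTime() - start) / (double) estimate;
    }
}
```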
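The OSR-avoiding restructuring suggested above typically looks like the following before/after sketch (the summing workload is invented for illustration):

```java
// Avoiding on-stack replacement (OSR): move the key inner loop into its own
// method so HotSpot compiles it as a normal method rather than OSR-compiling
// a long-running loop body, which can miss loop optimizations.
public class OsrExample {
    // Prone to OSR: all the hot looping happens inside one long-running method.
    static long monolithic(int reps, int n) {
        long total = 0;
        for (int r = 0; r < reps; r++)
            for (int i = 0; i < n; i++)
                total += i;
        return total;
    }

    // Restructured: the key inner loop lives in a separate, normally-JITed method.
    static long innerLoop(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) total += i;
        return total;
    }

    static long restructured(int reps, int n) {
        long total = 0;
        for (int r = 0; r < reps; r++) total += innerLoop(n);
        return total;
    }
}
```

Both versions compute the same result; the difference is purely in how the JIT compiler handles them.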
Performance Tuning a Web Shop (Part 1) (Page last updated July 2008, Added 2008-07-30, Author Jeroen Borgers, Publisher dZone). Tips:
- Slow site response can cause a site to lose customers to competitors with faster pages.
- Performance requirements should be stated explicitly; representative load testing needs to occur regularly; performance monitoring should be present in production; and tuning should only take place based on facts.
- The biggest web shop in The Netherlands (Wehkamp.nl) has around 50,000 product types, 1.8 million active customers, up to 1 million page views an hour, and up to 2,000 orders per hour. Online selling is growing around 15% per year.
- [Article describes the architecture of the Java based solution used for Wehkamp.nl online store].
- Minimize the number of database calls by batching updates and writing asynchronous data.
- Ensure database indexes are present; use prepared statements instead of unprepared statements; use Oracle materialized views; and target the most expensive database queries for optimization.
- Combine high latency remote calls wherever possible.
- Find bottlenecks by measuring with tools, not guessing.
- [Article describes using Apache JMeter to load test a web app].
- Integrating load tests into the daily build enables you to quickly see the performance impact of code changes in the test environment - facilitating continuous performance testing.
- It is important to have a representative test database, one that contains the fully sized and up-to-date data, otherwise performance tests can give highly misleading results.
- Subtle differences in configuration (of databases, caches, memory and load) between test and production can cause bad predictions.
- It's important to model queries and processing so that caches are used realistically in tests, otherwise a high rate of cache hits can produce much better performance in tests compared to the production system.
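The prepared-statement and batching tips above can be sketched with plain JDBC. This is an illustrative sketch: the table and column names (orders, customer_id, total) are invented, and a real connection is assumed to be supplied by the caller:

```java
// Minimize database round trips: use a PreparedStatement (parsed/planned once,
// parameters bound per row) and batch the updates into a single round trip.
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class BatchedInsert {
    // Each long[] holds {customerId, total} for one order (hypothetical schema).
    static void insertOrders(Connection conn, List<long[]> orders) throws SQLException {
        String sql = "INSERT INTO orders (customer_id, total) VALUES (?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (long[] order : orders) {
                ps.setLong(1, order[0]);
                ps.setLong(2, order[1]);
                ps.addBatch();          // queue locally instead of one call per row
            }
            ps.executeBatch();          // single round trip for the whole batch
        }
    }
}
```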
8 Simple Rules for Designing Threaded Applications (Page last updated November 2007, Added 2008-07-30, Author Clay Breshears, Publisher devX). Tips:
- Be sure you identify truly independent computations - to determine which can be effectively run concurrently.
- Implement concurrency at the highest level possible - identify the segments of your code that take the most execution time. If you are able to run those code portions in parallel, you will have the best chance of achieving the maximum performance possible.
- Fine-grained parallelism runs the danger of not having enough work assigned to threads to overcome the overhead costs of using threads.
- Plan early for scalability to take advantage of increasing numbers of cores.
- Designing and implementing concurrency by data decomposition methods will be more scalable than functional decompositions.
- Make use of thread-safe libraries wherever possible.
- Use the right threading model.
- Never assume a particular order of execution.
- Use thread-local storage whenever possible; associate locks to specific data, if needed.
- Don't be afraid to change the algorithm for a better chance of concurrency.
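The thread-local-storage and lock-per-data tips above can be sketched as follows (an illustrative example; the scratch-buffer and counter uses are invented):

```java
// Prefer thread-local storage to avoid sharing; when data must be shared,
// guard it with a lock dedicated to that specific data.
public class ThreadLocalVsLock {
    // Per-thread scratch buffer: each thread gets its own, so no locking at all.
    static final ThreadLocal<StringBuilder> SCRATCH =
            ThreadLocal.withInitial(StringBuilder::new);

    // Shared counter guarded by a lock object associated with this data only,
    // rather than a broad lock (e.g. on the whole class) that serializes more.
    private static final Object COUNTER_LOCK = new Object();
    private static long counter;

    static long increment() {
        synchronized (COUNTER_LOCK) {
            return ++counter;
        }
    }

    static String buildLocally(String s) {
        StringBuilder sb = SCRATCH.get();
        sb.setLength(0);          // reuse this thread's private buffer
        return sb.append(s).reverse().toString();
    }
}
```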
Realistically real-time (Page last updated April 2008, Added 2008-07-30, Author Jean-Marie Dautelle, Publisher JavaWorld). Tips:
- -XX:+UseConcMarkSweepGC activates the concurrent mark sweep (CMS) collector (also known as the concurrent low pause collector). This collector attempts to minimize the pauses due to garbage collection by doing most of the garbage collection work concurrently with the application threads.
- -XX:+CMSIncrementalMode allows CMS to run even if you are not reaching the limits of memory. It prevents CMS from kicking in too late and being unable to finish its collection before available memory is exhausted.
- Setting the compilation threshold to one (-XX:CompileThreshold=1) forces the code to be compiled at first execution.
- A major problem with concurrent collection is that it cannot always keep up if too much garbage is generated too fast, resulting in a "stop the world" collection.
- Smaller young generation sizes make each minor GC quicker, but also make them happen more frequently.
- Javolution's concurrent contexts are an efficient way to easily run concurrent code across multiple cores.
- Javolution's ClassInitializer.initializeAll() will force the initialization of all classes at startup, but it may take several minutes as it initializes every class on your classpath, including the runtime library classes.
- Many core Java collection classes have a non-deterministic update time due to resizing, which can kick in at unexpected times. The Javolution collections implement the standard collection interfaces and can be used as drop-in replacements for the standard Java utility classes, and have time-deterministic behavior (achieved through incremental capacity increases).
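The JVM flags discussed above might be combined on a launch command line something like this (an illustrative fragment; MyRealTimeApp is a hypothetical application, -Xmn sizes the young generation, and exact flag support depends on your JVM version):

```
java -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:CompileThreshold=1 -Xmn64m MyRealTimeApp
```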
Turbo-Charging Applications with Mid-Tier Distributed Caching (Page last updated February 2008, Added 2008-07-30, Author Tim Middleton, Publisher JDJ). Tips:
- Keeping data in the mid-tier ensures fast access, but poses a number of challenges including: consistent views for all members; retaining transactions when they're not immediately written to back-end data stores; managing cluster membership, data partitioning, and workload distribution of the servers.
- Keeping data cached in object form in a mid-tier data grid minimizes the overhead of translating between the relational world of the back-end data sources and the object form required by the mid-tier.
- For relatively small volumes of read-only or rarely updated data, a brute force "replicate everywhere" topology may work.
- Large amounts of volatile data may require a topology that will dynamically spread the load over the members in the cluster and repartition when new members are added.
Polymorphism Performance Mysteries Explained (Page last updated April 2008, Added 2008-07-30, Author Dr. Heinz M. Kabutz, Publisher javaspecialists). Tips:
- Inlining should affect how you code Java: stay clear of long methods; small methods result in less copy-and-paste duplication; and inlining is done for you by the HotSpot server compiler, so there is no cost in calling many small methods compared to one big one.
- The penalty for using polymorphism in Java is extremely small, and often non-existent, as the compiler eliminates unnecessary branches completely.
Last Updated: 2017-10-01
Copyright © 2000-2017 Fasterj.com. All Rights Reserved.
All trademarks and registered trademarks appearing on JavaPerformanceTuning.com are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. JavaPerformanceTuning.com is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.