Java Performance Tuning
Java(TM) - see bottom of page
Tips November 2012
Back to newsletter 144 contents
http://javaeesupportpatterns.blogspot.com/2012/09/outofmemoryerror-unable-to-create-new.html
OutOfMemoryError: unable to create new native thread - Problem Demystified (Page last updated September 2012, Added 2012-11-28, Author Pierre-Hugues Charbonneau, Publisher Java EE Support Patterns). Tips:
- OutOfMemoryError: unable to create new native thread results from native memory depletion, either at the JVM process level or at the OS level. The OS limits the process size to a certain theoretical maximum, and current OS usage reduces the practically available size further. If that size is reached and the process tries to create a new thread, this error is thrown. 32-bit processes are limited to 4GB at most, and to less than 2GB on some Windows OSs.
- A new Java thread is usually backed by a new native thread, which requires additional native memory to be allocated to the JVM. The OS will reject this request for native memory if the JVM process has reached the limit of its addressable memory space, or if the OS doesn't have enough spare unfragmented memory (typically when the OS virtual memory is depleted). If the OS rejects the request, the "OutOfMemoryError: unable to create new native thread" error is thrown.
- Monitor JVM process size, JVM thread counts, process limitations (such as ulimit) and OS virtual memory utilization (including swap space), so that if you get an "OutOfMemoryError: unable to create new native thread" error you can identify whether exhausting the system memory or the process memory limits caused it. The options for solving the issue differ depending on which limit has been reached.
- A crash with an error message similar to "java.lang.OutOfMemoryError: requested 32756 bytes for ChunkPool::allocate. Out of swap space?" is essentially the same problem as an "OutOfMemoryError: unable to create new native thread" error; the difference is that in the crash the resource depletion happened at a point where the JVM was unable to catch the error.
- The most common root cause of "OutOfMemoryError: unable to create new native thread" is the application or JEE container attempting to create too many threads at once, for example when threads are stuck in remote IO calls or hit race conditions. This is best solved by tuning the application to limit the number of threads it creates and making it tolerant of the underlying conditions that cause the thread explosion.
- To investigate "OutOfMemoryError: unable to create new native thread": monitor both your JVM process sizes and OS virtual memory; identify any obvious OS memory (physical and virtual) or process capacity (e.g. ulimit) problem; determine whether your JVM processes are actually the source of the problem or the victim of other processes consuming all the virtual memory; perform a JVM thread dump analysis and determine the source of all the active threads; determine what is causing your application to create so many threads; determine whether your thread configuration and JVM thread stack sizes allow you to create more threads than your JVM process and/or OS can handle; and, if using a 32-bit JVM, determine whether its heap size is too large.
- Solutions to "OutOfMemoryError: unable to create new native thread" include: reducing JVM heap size to give more space to the native side; reducing or limiting the number of threads created by the JVM; increasing OS physical and virtual memory; upgrading your JVM processes to 64-bit.
- You should carry out comprehensive load and performance testing.
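One way to apply the "limit the number of threads" advice is to route work through a bounded pool instead of creating a thread per task. The following is a minimal sketch using java.util.concurrent, assuming a fixed cap on worker threads and a bounded queue; BoundedWorkers is a hypothetical name, and the sizes are placeholders that would have to come from your own thread-dump analysis and load testing. Excess work surfaces as a RejectedExecutionException the caller can handle, rather than as a failed attempt to allocate another native thread.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper that caps how many native threads the application creates.
final class BoundedWorkers {
    // Pool never grows beyond maxThreads; at most queueCapacity tasks wait.
    // Further submissions are rejected (AbortPolicy) instead of spawning threads.
    static ThreadPoolExecutor create(int maxThreads, int queueCapacity) {
        return new ThreadPoolExecutor(
                maxThreads, maxThreads,
                30, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.AbortPolicy());
    }

    // Live JVM thread count, worth logging next to process size and ulimit values.
    static int liveThreadCount() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }
}

class BoundedWorkersDemo {
    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = BoundedWorkers.create(4, 8);
        int rejected = 0;
        for (int i = 0; i < 100; i++) {
            try {
                pool.execute(BoundedWorkersDemo::simulatedRemoteCall);
            } catch (RejectedExecutionException e) {
                rejected++; // back-pressure instead of another native thread
            }
        }
        System.out.println("worker threads: " + pool.getLargestPoolSize()
                + ", rejected: " + rejected
                + ", live JVM threads: " + BoundedWorkers.liveThreadCount());
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }

    // Stands in for a slow remote IO call that would otherwise pin a thread.
    private static void simulatedRemoteCall() {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

However many tasks arrive, the pool itself never holds more than four worker threads. The rejected count shows how much work would otherwise have turned into new threads during a burst.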
http://javarevisited.blogspot.in/2012/10/10-garbage-collection-interview-question-answer.html
10 Garbage Collection Interview Questions and Answers (Page last updated October 2012, Added 2012-11-28, Author Javin Paul, Publisher javarevisited). Tips:
- The (HotSpot) heap is divided into different generations, e.g. new generation, old generation and PermGen space. The PermGen space is used to store class metadata.
- Filling PermGen can cause a "java.lang.OutOfMemoryError: PermGen space" error, which can be fixed by increasing the PermGen size using the -XX:MaxPermSize option. PermGen is garbage collected in a similar way to the old generation.
- In garbage collection logs, minor garbage collections typically start with '[GC'.
- ParNew and DefNew are two young generation garbage collectors; ParNew is a multi-threaded version while DefNew is single-threaded.
- An object becomes eligible for garbage collection when no live object (and no live thread) references it. Liveness is determined by marking every object reachable from the application roots. Circular references do not prevent objects from being garbage collected: if no live object refers to any object in the reference loop, none of them is reached during marking, so none is marked live.
- An overridden finalize() method prevents the garbage collector from reclaiming an object until finalize() has been called.
- System.gc() and Runtime.getRuntime().gc() request the system to schedule a GC, but do not guarantee that a GC will run.
- You can monitor garbage collection using tools such as JConsole and VisualVM, and via log files using -Xloggc:<file> and -verbose:gc. Useful options include -XX:+PrintGCDetails and -XX:+PrintGCTimeStamps.
- A GC log line typically has the format [{Major Tag|Minor Tag} [{GC Type}: {before size}->{after size}({max size}), {time spent} secs], e.g. [GC [ParNew: 1512K->64K(1512K), 0.0635032 secs] ...
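The log line format above can be picked apart with a regular expression. This is a minimal sketch that assumes lines in exactly that shape, optionally preceded by a timestamp (as -XX:+PrintGCTimeStamps adds) and optionally with a parenthesised cause after the tag. GcLogParser and GcEntry are hypothetical names, and output from other collectors or JVM versions may not match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical value class for the collector section of one GC log line.
final class GcEntry {
    final boolean full;       // true for "[Full GC", false for a minor "[GC"
    final String collector;   // e.g. "ParNew" or "DefNew"
    final long beforeKb;      // occupancy before the collection
    final long afterKb;       // occupancy after the collection
    final long capacityKb;    // the "(max size)" figure
    final double seconds;     // time spent in this collection

    GcEntry(boolean full, String collector, long beforeKb, long afterKb,
            long capacityKb, double seconds) {
        this.full = full;
        this.collector = collector;
        this.beforeKb = beforeKb;
        this.afterKb = afterKb;
        this.capacityKb = capacityKb;
        this.seconds = seconds;
    }
}

final class GcLogParser {
    // [{Major|Minor tag} [{GC type}: {before}K->{after}K({max}K), {time} secs]
    // find() skips any leading timestamp; an optional " (cause)" after the tag is allowed.
    private static final Pattern ENTRY = Pattern.compile(
            "\\[(Full GC|GC)(?: \\([^)]*\\))?\\s*\\[(\\w+): (\\d+)K->(\\d+)K\\((\\d+)K\\), ([0-9.]+) secs\\]");

    // Returns the parsed entry, or null if the line is not in the expected format.
    static GcEntry parse(String line) {
        Matcher m = ENTRY.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new GcEntry(
                m.group(1).equals("Full GC"),
                m.group(2),
                Long.parseLong(m.group(3)),
                Long.parseLong(m.group(4)),
                Long.parseLong(m.group(5)),
                Double.parseDouble(m.group(6)));
    }
}

class GcLogParserDemo {
    public static void main(String[] args) {
        GcEntry e = GcLogParser.parse("[GC [ParNew: 1512K->64K(1512K), 0.0635032 secs]");
        // Reclaimed = before - after: 1512K - 64K = 1448K from the young generation.
        System.out.println(e.collector + " freed " + (e.beforeKb - e.afterKb)
                + "K in " + e.seconds + "s");
    }
}
```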
http://www.ibm.com/developerworks/websphere/techjournal/0909_blythe/0909_blythe.html
Case study: Tuning WebSphere Application Server V7 and V8 for performance (Page last updated June 2012, Added 2012-11-28, Author David Hare, Christopher Blythe, Publisher IBM). Tips:
- Improving performance can often involve sacrificing a certain level of feature or function in the application - the tradeoff must be considered carefully when evaluating performance tuning changes.
- Factors beyond the application that can impact performance include: hardware and OS configuration, other processes running on the system, performance of back-end database resources, network latency.
- Increasing the JVM heap size permits more objects to be created before an allocation failure occurs and triggers a garbage collection but can lead to an increase in the amount of time needed to find and process objects that should be garbage collected. JVM heap size tuning often involves a balance between the interval between garbage collections and the pause time needed to perform the garbage collection.
- To analyse the GC, turn on verbose GC.
- If the heap free (or heap used) after GC does not reach a steady size and continues to decrease (increase) over time, then it's likely you have a heap memory leak.
- Monitoring the heap cannot detect native memory leaks (e.g. from libraries using the JNI). Native memory leaks need platform specific tools to analyse (e.g. ps, top, perfmon).
- The percentage of time spent stopped in GC is given by the formula 100 × ('avg pause time')/('avg time between pauses' + 'avg pause time'). This can be used to compare different configuration settings and identify which performs better.
- The IBM JVM benefits from setting the initial heap size (-Xms) equal to the maximum heap size (-Xmx), as this prevents compaction (though initial startup of the JVM will be slightly longer).
- Applications that create a higher-than-average proportion of short-lived objects relative to long-lived objects can improve performance by increasing the young generation size.
- The optimal number of threads per CPU depends on how much time the application spends on non-CPU activities (IO and waiting). A typical starting point is 5 threads per CPU.
- The goal of tuning a connection pool is to ensure that each thread that needs a connection has one, and that requests are not queued up waiting for a connection; while avoiding overloading the backend system with too many concurrent connections across all the systems that connect to it.
- Performance can be improved by increasing the maximum number of persistent requests permitted on a single HTTP connection. SSL performance can be significantly improved by enabling unlimited persistent requests per connection (but this reduces protection against denial of service attacks).
- General tuning considerations include: co-locating services on the same server (they compete for the same resources) vs using different servers (separation of resources but higher communication latency); asynchronous (higher throughput) vs synchronous (lower latency, simpler) processing; local in-memory storage (fastest, but not persistent and with sharing overheads) vs local filesystem (persistent but not shared across servers) vs remote storage (persistent and shared but with higher communication costs); persistent vs non-persistent messages; logging to faster storage (is the logging overhead enough of an impact to justify the cost of faster storage?).
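The stopped-time formula is easy to apply to measurements taken from verbose GC output. This is a minimal sketch; GcOverhead is a hypothetical name, and the two configurations use made-up illustrative numbers rather than results from the article.

```java
// Hypothetical helper implementing the stopped-time formula:
// percentStopped = 100 * avgPause / (avgTimeBetweenPauses + avgPause)
final class GcOverhead {
    static double percentStopped(double avgPauseMs, double avgIntervalMs) {
        if (avgPauseMs < 0 || avgIntervalMs < 0) {
            throw new IllegalArgumentException("times must be non-negative");
        }
        // One full cycle = running time between pauses + the pause itself.
        double cycleMs = avgIntervalMs + avgPauseMs;
        if (cycleMs == 0) {
            throw new IllegalArgumentException("no elapsed time");
        }
        return 100.0 * avgPauseMs / cycleMs;
    }
}

class GcOverheadDemo {
    public static void main(String[] args) {
        // Illustrative measurements for two heap configurations:
        double configA = GcOverhead.percentStopped(50, 950);    // 50 / 1000   = 5%
        double configB = GcOverhead.percentStopped(200, 9800);  // 200 / 10000 = 2%
        System.out.printf("A: %.1f%%  B: %.1f%%%n", configA, configB);
    }
}
```

Configuration B spends less of its time stopped, but each pause is four times longer. This is the same tradeoff between the interval between collections and the pause time that heap sizing involves, so the percentage should be read alongside the maximum acceptable pause.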
http://www.javaworld.com/javaworld/jw-10-2012/121016-maximize-java-nio-and-nio2-for-application-responsiveness.html
Five ways to maximize Java NIO and NIO.2 (Page last updated October 2012, Added 2012-11-28, Author Cameron Laird, Publisher JavaWorld). Tips:
- NIO.2's file change notifier gives you highly efficient notification of changes to the filesystem, without polling. Take care - knowing when a file modification ends is more useful than knowing when it begins.
- NIO selectors support non-blocking multiplexed IO, which can be more efficient than multithreaded IO, though not always (for example, with low thousands of clients it can be slightly slower, with higher CPU overhead, than one blocking thread per client).
- For simple sequential reads and writes of small files a straightforward streams implementation might be two or three times faster than the corresponding NIO event-oriented channel-based implementation.
- Non-multiplexed channels (each channel serviced in its own thread) can be much slower than channels registered with a single selector serviced by one thread.
- Memory mapping is an OS-level service that makes segments of a file appear, for programming purposes, like areas of memory. NIO provides access to the OS memory mapping capability.
- Memory mapped files can have several different readers and writers attached simultaneously to the same file image.
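NIO exposes memory mapping through FileChannel.map. The following is a minimal sketch using a scratch temp file, in which two independent mappings of the same file stand in for two attached readers/writers; MappedRegion is a hypothetical name. The FileChannel.map documentation only promises that writes to a READ_WRITE mapping eventually reach the file, so treat immediate visibility between mappings as typical platform behaviour rather than a guarantee.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: maps the first `size` bytes of a file read-write.
final class MappedRegion {
    static MappedByteBuffer mapReadWrite(Path file, int size) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Grow the file first so the mapped region lies entirely inside it.
            if (ch.size() < size) {
                ch.write(ByteBuffer.allocate(1), size - 1);
            }
            // A mapping stays valid after the channel that created it is closed.
            return ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
        }
    }
}

class MappedRegionDemo {
    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mmap-demo", ".bin");
        file.toFile().deleteOnExit();
        MappedByteBuffer writer = MappedRegion.mapReadWrite(file, 4096);
        MappedByteBuffer reader = MappedRegion.mapReadWrite(file, 4096);
        writer.putInt(0, 42);              // absolute put: no position bookkeeping
        System.out.println(reader.getInt(0));
        writer.force();                    // flush dirty pages to the storage device
    }
}
```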
http://mechanical-sympathy.blogspot.co.uk/2012/10/compact-off-heap-structurestuples-in.html
Compact Off-Heap Structures/Tuples In Java (Page last updated October 2012, Added 2012-11-28, Author Martin Thompson, Publisher Mechanical Sympathy). Tips:
- Direct ByteBuffer memory can be allocated that is not tracked by the garbage collector.
- Writing C-like code, where you manage all the structure, access, garbage collection and lifecycle within a large data structure (like a direct ByteBuffer or a large primitive array), can gain you significant performance, a more compact data representation, and faster serialization and deserialization, at the expense of questioning your sanity when you come to maintain it.
- Going off-heap (e.g. with direct ByteBuffer) is very common in low-latency applications to avoid GC overheads.
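The following is a minimal sketch of the C-like approach, assuming a fixed record layout for a hypothetical trade tuple (id, price in hundredths, quantity, side). Every record is packed at a computed offset inside one direct ByteBuffer, so the store holds a single buffer object instead of one heap object per tuple, and the record bytes sit outside the Java heap, where the collector does not trace or copy them. TradeStore and its field layout are illustrative, not taken from the article.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical off-heap store of fixed-size "trade" records laid out like a C struct.
final class TradeStore {
    // Byte offsets of each field within one record.
    private static final int ID_OFFSET = 0;         // long
    private static final int PRICE_OFFSET = 8;      // long, price in hundredths
    private static final int QUANTITY_OFFSET = 16;  // int
    private static final int SIDE_OFFSET = 20;      // byte, 'B' or 'S'
    private static final int RECORD_SIZE = 24;      // padded so records start at multiples of 8

    private final ByteBuffer buffer;
    private final int capacity;
    private int size;

    TradeStore(int capacity) {
        this.capacity = capacity;
        // Direct allocation: the record bytes live outside the Java heap.
        this.buffer = ByteBuffer.allocateDirect(capacity * RECORD_SIZE)
                .order(ByteOrder.nativeOrder());
    }

    // Appends a record using absolute puts and returns its index.
    int add(long id, long priceCents, int quantity, byte side) {
        if (size == capacity) {
            throw new IllegalStateException("store is full");
        }
        int base = size * RECORD_SIZE;
        buffer.putLong(base + ID_OFFSET, id);
        buffer.putLong(base + PRICE_OFFSET, priceCents);
        buffer.putInt(base + QUANTITY_OFFSET, quantity);
        buffer.put(base + SIDE_OFFSET, side);
        return size++;
    }

    long id(int index)         { return buffer.getLong(base(index) + ID_OFFSET); }
    long priceCents(int index) { return buffer.getLong(base(index) + PRICE_OFFSET); }
    int quantity(int index)    { return buffer.getInt(base(index) + QUANTITY_OFFSET); }
    byte side(int index)       { return buffer.get(base(index) + SIDE_OFFSET); }
    int size()                 { return size; }

    private int base(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return index * RECORD_SIZE;
    }
}

class TradeStoreDemo {
    public static void main(String[] args) {
        TradeStore store = new TradeStore(1_000);
        store.add(1L, 10_050L, 100, (byte) 'B');
        store.add(2L, 9_975L, 40, (byte) 'S');
        long notionalCents = 0;
        // Fields are read in place; no per-record objects are created.
        for (int i = 0; i < store.size(); i++) {
            notionalCents += store.priceCents(i) * store.quantity(i);
        }
        System.out.println("notional (cents): " + notionalCents); // 1404000
    }
}
```

The maintenance cost is visible even at this size: every field access depends on hand-maintained offsets that the compiler cannot check.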
https://devcentral.f5.com/weblogs/macvittie/archive/2012/11/14/back-to-basics-the-theory-of-performance-relativity.aspx
Back to Basics: The Theory of (Performance) Relativity (Page last updated November 2012, Added 2012-11-28, Author Lori MacVittie, Publisher DevCentral). Tips:
- A front-end load balancer gives you horizontal scalability. Distributing load ensures availability and lets you use increased capacity to offset any uptrend in latency.
- The industry standard load balancing algorithm is "fastest response time" - this distributes load based on the historical performance of each instance in the pool/farm.
- The "fastest response time" load balancing algorithm doesn't stop response times from becoming unacceptably high if all nodes are heavily loaded, so it should be combined with an upper connection limit, which prevents too many requests from saturating the system and giving everyone unacceptable response times.
- 100% utilization and consistently well-performing applications do not go hand in hand. The axiom that "as load increases performance decreases" is always true.
- You need to test (with at least three runs) to find that breaking point: stress the application and measure the degradation, noting the number of concurrent connections at which performance starts to degrade into unacceptable territory. That is your connection limit.
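The combination of "fastest response time" with a connection limit can be sketched as a selection rule. This is a minimal, single-process sketch: BackendNode and FastestResponseBalancer are hypothetical names, the 0.2 weight used to smooth the response-time history is an assumption, and in practice this logic lives in the load balancer rather than in application code. maxConnections stands for the limit found by the stress testing described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical backend node tracking its load and response-time history.
final class BackendNode {
    final String name;
    final int maxConnections;   // limit found by stress testing
    int activeConnections;
    double avgResponseMs;       // exponentially weighted moving average
    private long samples;

    BackendNode(String name, int maxConnections) {
        this.name = name;
        this.maxConnections = maxConnections;
    }

    // Assumed smoothing weight of 0.2 for new samples; the first sample seeds the average.
    void recordResponse(double ms) {
        avgResponseMs = (samples == 0) ? ms : 0.8 * avgResponseMs + 0.2 * ms;
        samples++;
    }
}

// Hypothetical "fastest response time" selector with a per-node connection cap.
final class FastestResponseBalancer {
    private final List<BackendNode> nodes;

    FastestResponseBalancer(List<BackendNode> nodes) {
        this.nodes = new ArrayList<>(nodes);
    }

    // Picks the node with the lowest average response time among those below their
    // connection limit (nodes with no history average 0 and are tried first).
    // Returns null when every node is at its limit, so the caller can queue or
    // reject the request instead of degrading response times for everyone.
    synchronized BackendNode acquire() {
        BackendNode best = null;
        for (BackendNode n : nodes) {
            if (n.activeConnections >= n.maxConnections) {
                continue;
            }
            if (best == null || n.avgResponseMs < best.avgResponseMs) {
                best = n;
            }
        }
        if (best != null) {
            best.activeConnections++;
        }
        return best;
    }

    // Called when a request completes: frees the slot and updates the history.
    synchronized void release(BackendNode n, double responseMs) {
        n.activeConnections--;
        n.recordResponse(responseMs);
    }
}

class BalancerDemo {
    public static void main(String[] args) {
        BackendNode a = new BackendNode("a", 1);
        BackendNode b = new BackendNode("b", 1);
        a.recordResponse(10);
        b.recordResponse(50);
        FastestResponseBalancer lb = new FastestResponseBalancer(Arrays.asList(a, b));
        System.out.println(lb.acquire().name); // a: fastest node
        System.out.println(lb.acquire().name); // b: a is at its limit
        System.out.println(lb.acquire());      // null: every node saturated
    }
}
```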
Jack Shirazi
Last Updated: 2024-08-26
Copyright © 2000-2024 Fasterj.com. All Rights Reserved.
All trademarks and registered trademarks appearing on JavaPerformanceTuning.com are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. JavaPerformanceTuning.com is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.
URL: http://www.JavaPerformanceTuning.com/news/newtips144.shtml
RSS Feed: http://www.JavaPerformanceTuning.com/newsletters.rss