Java Performance Tuning
Tips November 2022
https://www.infoq.com/articles/java-virtual-threads/
Virtual Threads: New Foundations for High-Scale Java Applications (Page last updated September 2022, Added 2022-11-29, Authors Brian Goetz, Daniel Bryant, Publisher InfoQ). Tips:
- Shared-state concurrency -- often referred to as "programming with threads and locks" -- can be difficult. Writing safe, performant concurrent code that manages shared mutable state requires understanding subtle concepts like memory visibility, and a great deal of discipline.
- Platform thread creation is relatively expensive and resource-heavy. Thread stack sizes can be tuned, but if stacks are overprovisioned they use even more memory, and if they are underprovisioned they risk StackOverflowError. Concurrent thread count is limited by platform memory, and to reduce risk stacks are typically overprovisioned, limiting the count even further. Virtual threads use vastly fewer resources, so they can scale to millions.
- A thread-per-task approach aligns the application's unit of concurrency (the task) with the platform's (the thread), which maximizes ease of development, debugging, and maintenance, and especially preserves the all-important illusion of sequentiality. But the limitation of platform thread scaling means a lot of effort goes into decoupling tasks from threads, as thread-per-task fails to scale on platform threads. For virtual threads, however, this approach does scale (see the executor sketch after this list).
- Before virtual threads were available, developers who wanted to service large volumes of concurrent requests had several bad choices: constrain how code is written so it can use substantially smaller stack sizes; throw more hardware at the problem; or switch to an "async" or "reactive" style of programming - a highly constrained style that loses many of the benefits threads provide, such as readable stack traces, debugging, and observability.
- Virtual threads are so lightweight that you can have many more inactive threads than with platform threads. This may not sound like a big benefit, but "lots of inactive threads" (usually blocked on IO) actually describes the majority of server applications, so switching to virtual threads can make these much more efficient and scalable.
- Pooling platform threads, both to place a bound on resource utilization (because it's easy to run out of memory otherwise) and to amortize the cost of thread startup over multiple requests, is a good idea. Creating virtual threads, on the other hand, is so cheap that pooling them is actively a bad idea!
- Virtual threads are so lightweight that it is correct to create a virtual thread even for short-lived tasks, and wrong to try to reuse or recycle them.
- Avoid ThreadLocals with virtual threads - you could be creating hundreds of thousands of thread-local objects, the exact opposite of what is usually intended! If there are expensive objects that need to be created sparingly, create and use explicit pools of them rather than storing them in ThreadLocals (see the pool sketch below).
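To make the thread-per-task tips above concrete, here is a minimal sketch, assuming JDK 21 and the standard Executors.newVirtualThreadPerTaskExecutor() API; the task count and the handle() method are illustrative:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class VirtualThreadDemo {
    public static void main(String[] args) {
        // One new virtual thread per task: cheap to create, so no pooling.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 10_000; i++) {
                int requestId = i;
                executor.submit(() -> handle(requestId));
            }
        } // close() waits for all submitted tasks to complete
    }

    static void handle(int requestId) {
        try {
            // Simulated blocking IO: the virtual thread unmounts from its
            // carrier thread while parked, so thousands can block cheaply.
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

Because each task gets its own virtual thread, blocking calls are cheap and the all-important sequential style of the code is preserved.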
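And a minimal sketch of the pool-instead-of-ThreadLocal tip, using a bounded BlockingQueue; the BufferPool name, the byte[] resource, and the sizes are illustrative assumptions:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// A small, fixed pool of expensive objects, sized independently of the
// (potentially millions of) virtual threads that borrow from it.
public class BufferPool {
    private final BlockingQueue<byte[]> pool;

    public BufferPool(int poolSize, int bufferBytes) {
        pool = new ArrayBlockingQueue<>(poolSize);
        for (int i = 0; i < poolSize; i++) {
            pool.add(new byte[bufferBytes]); // the "expensive" resource
        }
    }

    public byte[] borrow() throws InterruptedException {
        return pool.take(); // blocks (cheaply, on a virtual thread) if empty
    }

    public void release(byte[] buffer) {
        pool.offer(buffer); // return the buffer for the next borrower
    }
}

Unlike a ThreadLocal, the number of live pooled objects is bounded by the pool size, not by the (huge) number of virtual threads.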
https://medium.com/@interviewready/what-is-distributed-rate-limiting-e72a6e5b3f67
What is Distributed Rate Limiting? (Page last updated June 2022, Added 2022-11-29, Author Avash Mitra, Publisher InterviewReady). Tips:
- Overloaded services can suffer degraded response times, memory exceptions, and crashes, and can trigger cascading failures. Rate limiting prevents these issues.
- Improved efficiency and reliability can be obtained by: vertical scaling, efficient messaging (compression, multiplexing, reused long-lived connections, pushing vs pulling according to the optimal profile for a request) and graceful degradation.
- Sliding Window Rate Limiting: allow a certain number of requests in a time interval. Very simple to implement, but it has memory overhead and needs repeated checking to evict dead requests from the queue (see the sliding-window sketch after this list).
- Timer Wheel Rate Limiting: requests have a timeout after which they are dropped. The timeout, say N seconds, determines the number of buckets requests can be put into, with each bucket allowing a limited number of requests. An incoming request is allocated to a bucket according to its arrival time (e.g. index = current time % timeout). When the index (the current time) wraps back around to a bucket, all requests still in that bucket are dropped - they have timed out (see the timer-wheel sketch after this list).
- Identify that a service cannot handle any more requests by monitoring for: increasing service response times, how long requests are queued for, and the count of dropped requests.
- You need a strategy for dealing with bad requests that take too many resources to process and block other requests in the queue from proceeding. One strategy is to increase the number of queues, partition requests across them, and process the queues in parallel.
- Request collapsing: combine multiple queued requests for the same data into a single request process (see the collapsing sketch after this list).
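A minimal sketch of the sliding-window approach described above (single-node only; distributing the counts across nodes is a separate concern), assuming millisecond granularity and these illustrative names:

import java.util.ArrayDeque;
import java.util.Deque;

public class SlidingWindowLimiter {
    private final int maxRequests;
    private final long windowMillis;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    public SlidingWindowLimiter(int maxRequests, long windowMillis) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        // The memory and repeated-checking overhead mentioned in the tip:
        // expired timestamps must be purged on every call.
        while (!timestamps.isEmpty() && now - timestamps.peekFirst() >= windowMillis) {
            timestamps.pollFirst();
        }
        if (timestamps.size() < maxRequests) {
            timestamps.addLast(now);
            return true;  // request allowed
        }
        return false;     // request rejected: window is full
    }
}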
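Similarly, a sketch of the timer-wheel bucket scheme, following the index = current time % timeout rule from the tip; the one-second tick, the generic request type, and the method names are assumptions (the per-bucket request limit is omitted for brevity):

import java.util.ArrayList;
import java.util.List;

public class TimerWheel<T> {
    private final int timeoutSeconds;   // N: also the number of buckets
    private final List<List<T>> buckets;

    public TimerWheel(int timeoutSeconds) {
        this.timeoutSeconds = timeoutSeconds;
        buckets = new ArrayList<>(timeoutSeconds);
        for (int i = 0; i < timeoutSeconds; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    private int index(long nowSeconds) {
        return (int) (nowSeconds % timeoutSeconds);
    }

    // Incoming requests go into the bucket for the current second.
    public synchronized void add(T request, long nowSeconds) {
        buckets.get(index(nowSeconds)).add(request);
    }

    // Called once at the start of each second, before new requests are
    // added: the index has wrapped back around to this bucket, so anything
    // still in it is exactly N seconds old and has timed out.
    public synchronized List<T> expire(long nowSeconds) {
        List<T> bucket = buckets.get(index(nowSeconds));
        List<T> timedOut = new ArrayList<>(bucket);
        bucket.clear();
        return timedOut;
    }
}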
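Finally, one common way to implement request collapsing is to key in-flight futures by request, so that concurrent callers share a single fetch. This CompletableFuture-based sketch is illustrative, not the article's implementation:

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

public class RequestCollapser<K, V> {
    private final ConcurrentMap<K, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // the actual (expensive) fetch

    public RequestCollapser(Function<K, V> loader) {
        this.loader = loader;
    }

    public CompletableFuture<V> get(K key) {
        // All callers arriving while a fetch for this key is in flight
        // get the same future instead of issuing duplicate requests.
        return inFlight.computeIfAbsent(key, k ->
                CompletableFuture.supplyAsync(() -> loader.apply(k))
                        // Drop the entry once done so later calls refetch.
                        .whenComplete((v, e) -> inFlight.remove(k)));
    }
}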
https://dzone.com/articles/java-and-low-latency
Java and Low Latency (Page last updated September 2022, Added 2022-11-29, Author George Ball, Publisher DZone). Tips:
- Java applications can achieve the level of performance of any other language, but they still have to rely on the Operating System (OS) to provide access to the underlying hardware - and the OS may limit performance unless it is tuned and appropriately used.
- The JVM is an extremely sophisticated execution platform that can generate machine code at runtime from Java bytecode, while optimising that code based on dynamically gathered metrics - something statically compiled runtimes cannot do.
- The most obvious aspect of the Java runtime that prevents consistent latency is garbage collection - careful choices of data structures and algorithms can minimise, or even eliminate, the need for garbage collection.
- The main issues that affect latency in Java are connected to garbage collection and synchronisation of access to shared resources using locks. Specialist techniques can minimise or eliminate these, but those techniques are not "natural" for normal Java coding styles.
- One approach for low latency Java applications is to bypass the garbage collector by using off-heap memory: memory mapped to persistent storage using operating system mechanisms. This eliminates garbage collection for these objects, but requires explicit management of their lifetimes (see the memory-mapped sketch after this list).
- Serialising and deserialising objects is often a source of object allocation and garbage. To minimize this overhead, you can choose libraries that have been carefully engineered to minimize the creation of new Java objects.
- Concurrent access to shared mutable data is easily done using mutual exclusion locks. However, locks cause blocking when there is contention; you can avoid this blocking by using data structures that allow safe, lock-free concurrent access (see the lock-free sketch after this list).
- Modern Unix and Linux systems allow regions of memory to be marked so that they are never paged out (pinning the memory in RAM). This is important for achieving the lowest latency: otherwise memory segments of the application (including off-heap memory) can be paged out, and paging them back in when required adds several orders of magnitude of latency to the operations that need that memory.
- Traditional scheduling policies in Unix/Linux are designed to favour interactive threads over CPU-bound threads. This impacts latency-sensitive applications, which would prefer that certain threads take priority over other, non-latency-sensitive threads. Modern Unix/Linux systems offer alternative scheduling policies that provide this capability, allowing thread scheduling priorities to be fixed at high levels so those threads always take over CPU resources from other threads when they become runnable, and so respond to events more quickly.
- In modern Unix/Linux systems it is possible to change which CPUs are used by the system scheduler. You can remove CPUs altogether from those available to the scheduler and utilise these exclusively for your specialised threads or groups of threads.
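As an illustration of the off-heap approach above, a minimal sketch using the standard FileChannel.map() API; the file name, size, and layout are assumptions:

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class OffHeapCounter {
    public static void main(String[] args) throws IOException {
        Path file = Path.of("counters.dat"); // illustrative file name
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // The mapped region lives outside the Java heap, backed by the
            // file: the garbage collector never scans or moves it.
            MappedByteBuffer buffer =
                    channel.map(FileChannel.MapMode.READ_WRITE, 0, 1024);
            long count = buffer.getLong(0); // read off-heap, no allocation
            buffer.putLong(0, count + 1);   // update in place, no garbage
            System.out.println("count = " + (count + 1));
        }
    }
}

The trade-off from the tip applies: nothing here is garbage collected, so the region's size and lifetime must be managed explicitly.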
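And a small sketch of lock-free access to shared mutable data using standard classes from java.util.concurrent: a CAS-based counter and a non-blocking queue in place of synchronized blocks (the thread and iteration counts are illustrative):

import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

public class LockFreeDemo {
    private static final AtomicLong counter = new AtomicLong();
    private static final ConcurrentLinkedQueue<String> events =
            new ConcurrentLinkedQueue<>();

    public static void main(String[] args) throws InterruptedException {
        Runnable producer = () -> {
            for (int i = 0; i < 100_000; i++) {
                counter.incrementAndGet();  // atomic CAS: no lock, no block
                events.offer("event-" + i); // lock-free enqueue
            }
        };
        Thread t1 = new Thread(producer);
        Thread t2 = new Thread(producer);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("counter = " + counter.get()); // always 200000
    }
}

Under contention these structures retry internally rather than blocking, which avoids the latency spikes caused by lock convoys.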
Jack Shirazi
Last Updated: 2024-08-26