Java Performance Tuning
Tips October 2015
Back to newsletter 179 contents
Lessons learned writing highly available code (Page last updated October 2015, Added 2015-10-28, Author Jacob Greenleaf, Publisher Imgur Engineering). Tips:
- Put limits on everything: bound queues; timeout remote calls; terminate idle connections; kill long (db) queries; etc.
- Retry with exponential back-off (to avoid a denial of service on your own systems).
- Use supervisor and watchdog processes. If the supervisor detects that a task has quit unexpectedly, it restarts the task from a known good state.
- Add health checks, using them to re-route requests or automate rollbacks.
- Redundancy is a requirement.
- Mature tools are more robust.
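The retry tip above can be sketched as follows. This is a minimal illustration, not code from the article: the class name BackoffRetry, its method names and its parameters are all hypothetical, and the random "jitter" on each delay is an added assumption commonly paired with exponential back-off so that many clients do not retry in lockstep.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper: bounded retries with capped exponential back-off and jitter. */
final class BackoffRetry {
    private BackoffRetry() {}

    /** Upper bound for the delay before retry number 'attempt' (0-based): base * 2^attempt, capped. */
    static long delayMillis(int attempt, long baseMillis, long maxMillis) {
        if (baseMillis < 0 || maxMillis < 0) throw new IllegalArgumentException("negative delay");
        // Cap the shift so that small base values cannot overflow a long.
        long exponential = baseMillis << Math.min(attempt, 20);
        return Math.min(exponential, maxMillis);
    }

    /** Runs the task, retrying failures up to maxAttempts times in total. */
    static <T> T call(Callable<T> task, int maxAttempts, long baseMillis, long maxMillis)
            throws Exception {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (InterruptedException e) {
                throw e; // never retry an interrupt
            } catch (Exception e) {
                last = e;
                if (attempt == maxAttempts - 1) break; // limit reached: give up
                long cap = delayMillis(attempt, baseMillis, maxMillis);
                // Sleep a random time in [0, cap] so clients spread their retries out.
                Thread.sleep(ThreadLocalRandom.current().nextLong(cap + 1));
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        String result = call(() -> {
            if (++calls[0] < 3) throw new IllegalStateException("transient failure");
            return "ok after " + calls[0] + " attempts";
        }, 5, 10, 1_000);
        System.out.println(result);
    }
}
```

Both the attempt count and the delay are bounded, in line with the "put limits on everything" tip; a real caller would normally also put a timeout on the task itself.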
Building Scalable Stateful Services (Page last updated September 2015, Added 2015-10-28, Author Caitie McCaffrey, Publisher Strange Loop). Tips:
- Stateless services scale horizontally very well.
- Load balancing stateless services with a stateful backend means inefficiently copying data into multiple service instances. Sticky sessions reduce the inefficiency.
- For stateful services in a service instance, cache data and have the client talk to the same instance (use sticky sessions).
- Sticky sessions can be implemented using a single persistent connection, but this is not load-balanced because the connections are created without knowing the load each one will carry - so these need back pressure to make the client reconnect when its server is overloaded.
- Sticky sessions can be implemented using routing logic in the cluster that routes the request to the server mandated for that client. Static routing is simple but not very fault tolerant; dynamic routing is better - use gossip protocols (but this can lead to work being routed to different nodes for short periods of uncertainty) or consensus systems (but if nodes are unavailable, work cannot be routed or will get queued, so this is not a good choice for highly available systems).
- Random placement of data (write anywhere in the cluster, but you have to read from everywhere because you don't know where the data is) - good for in-memory indexes and caches (eg Scuba, the fast scalable distributed in-memory database at Facebook - if nodes are unavailable, the result includes the "uncertainty" of the result based on how many nodes failed to return data).
- Consistent hashing (deterministic placement of data/requests, depending on eg the session ID) with the nodes mapped to a hash bucket space. But you can have hotspots because requests are distributed according to the hash, not according to load (so you need extra capacity to handle that). Also re-sizing the cluster is really painful (eg the Uber Ringpop nodejs library always directs the request to the same machine, so that the data for a particular journey is fully available, which reduces latency and db load; it uses gossip to ensure that some machine is available).
- Distributed hash table mapping (requests are looked up in a distributed hash table, which identifies the node to send the request on to) works quite well (eg Orleans from Microsoft, based on an actor model with state machines: a request goes to Orleans on any machine; Orleans uses consistent hashing to find where the hash table is for the actor for that request, and forwards the request to that node. Orleans on that node uses the hash table to find where the actor it needs to send to is, and forwards the request to it. The consistent hashing is deterministic and is not associated with the workload; instead the distributed hash table can be updated when the actor is migrated from an overloaded/failed/terminated machine/actor - this allowed Twitter to use a 90% utilised cluster!).
- Unbounded queues kill distributed systems; ensure all queues/inputs are bounded.
- Be prepared to tune the memory use and garbage collector (even more important in stateful services).
- Be very selective about (re)loading state on first connection/service start/restart or it can cause very long initial request service times.
- Failed requests (eg from timeouts) are likely to be retried, so continue to pull in data for failed requests so that the retry is handled quickly - as long as you have sticky sessions so that the request goes to where you are loading.
- Decouple the memory lifetime from process lifetimes, eg using persisted shared memory rather than shutdown and restart and reload.
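The consistent hashing tip above can be illustrated with a small hash ring. This is a sketch under stated assumptions and is not how Ringpop or Orleans are implemented: the class HashRing, the use of CRC32 as the hash function and the "virtual nodes" per server are all choices made here for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

/** Hypothetical consistent-hash ring: maps keys (eg session IDs) to node names. Not thread-safe. */
final class HashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    HashRing(int virtualNodes) {
        if (virtualNodes < 1) throw new IllegalArgumentException("virtualNodes must be >= 1");
        this.virtualNodes = virtualNodes;
    }

    // CRC32 is an assumption chosen for brevity; any stable hash function would do.
    private static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    /** Places several points per node on the ring to smooth out the distribution. */
    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.putIfAbsent(hash(node + "#" + i), node);
        }
    }

    /** Removes only the points owned by this node; other nodes' points are untouched. */
    void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(node + "#" + i), node);
        }
    }

    /** The owner is the first point at or after the key's hash, wrapping around the ring. */
    String nodeFor(String key) {
        if (ring.isEmpty()) throw new IllegalStateException("no nodes in ring");
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        HashRing ring = new HashRing(64);
        ring.addNode("node-a");
        ring.addNode("node-b");
        ring.addNode("node-c");
        System.out.println("session-42 -> " + ring.nodeFor("session-42"));
    }
}
```

Placement depends only on the hash, so the same session always reaches the same node, and removing a node moves only the keys that node owned. As the tip warns, nothing here accounts for actual load, so hotspots remain possible.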
Breaking and entering: lose the lock while embracing concurrency, Part II (Page last updated September 2015, Added 2015-10-28, Author Tyler Treat, Publisher workiva). Tips:
- You can often replace an update protected by a lock (pessimistic update) with an update using a compare-and-swap in a retry loop (optimistic update) to improve concurrency.
- A read-modify-write operation is typically applied by copying a shared variable to a local variable, performing some speculative work on it, and attempting to publish the changes with a compare-and-swap.
- Consider how a memory structure will be used in terms of processing; if elements that might be reused during processing can be adjusted to fit completely in a CPU cache line (typically 64 bytes) then there can be a good performance gain from doing that. This might require bit-packed structures.
- Concurrent updates of linked structures require a degree of indirection to avoid losing concurrent changes being applied to different nodes.
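The optimistic read-modify-write pattern described above can be sketched with java.util.concurrent.atomic.AtomicReference. The class OptimisticStats and its Snapshot type are hypothetical names invented for this example; the shared state is an immutable object that is replaced as a whole on each update.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical lock-free running total, updated with a compare-and-swap retry loop. */
final class OptimisticStats {

    /** Immutable snapshot of the state; every update publishes a fresh instance. */
    static final class Snapshot {
        final long count;
        final long sum;

        Snapshot(long count, long sum) {
            this.count = count;
            this.sum = sum;
        }
    }

    private final AtomicReference<Snapshot> state =
            new AtomicReference<>(new Snapshot(0, 0));

    void add(long value) {
        while (true) {
            Snapshot current = state.get();                       // copy shared state to a local
            Snapshot next = new Snapshot(current.count + 1,       // speculative work on the copy
                                         current.sum + value);
            if (state.compareAndSet(current, next)) {             // publish if nobody else changed it
                return;
            }
            // Another thread won the race: loop and redo the work on the new state.
        }
    }

    Snapshot snapshot() {
        return state.get();
    }

    public static void main(String[] args) throws InterruptedException {
        OptimisticStats stats = new OptimisticStats();
        Thread[] workers = new Thread[4];
        for (int t = 0; t < workers.length; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 1; i <= 1000; i++) stats.add(i);
            });
            workers[t].start();
        }
        for (Thread worker : workers) worker.join();
        Snapshot s = stats.snapshot();
        System.out.println(s.count + " updates, sum " + s.sum);
    }
}
```

No thread ever blocks holding a lock; the cost of contention is repeated speculative work, so the pattern suits short updates on state that is cheap to copy.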
Elements of Scale: Composing and Scaling Data Platforms (Page last updated April 2015, Added 2015-10-28, Author Ben Stopford, Publisher benstopford). Tips:
- Computers work best with sequential operations because sequential operations can be predicted. Pre-fetching is built in to many underlying handlers, and this means that random operations really suffer compared to sequential operations. Sequential data loading from disk can be four orders of magnitude faster than random data loading; even sequential vs random access performance from main memory can differ by a couple of orders of magnitude.
- Appending to a log (or journaling) is enormously faster than editing the file in-place.
- Keeping in-memory indexes to disk-based data structures allows you to maintain the speed of sequential writes to the disk without losing indexing or causing random writes to the indexing structures. This is limited by available memory.
- It is easier to optimise random reads than it is to optimise random writes.
- Log Structured Merge Trees balance write and read performance with comparatively small memory overhead.
- Splitting data by column allows significantly reduced amounts of data to be brought from disk as long as your query operates on a subset of all columns. This also improves compression and works particularly well for operations that require large scans.
- Partitioning by hash is a very scalable way to spread requests across machines.
- Consistency is expensive - it essentially ensures that all operations appear to occur in sequential order; for a highly concurrent system this will massively limit the system throughput.
- If you must have consistency, isolate it to as few writers and as few machines as possible.
- Command Query Responsibility Segregation optimises reads and writes separately using an intermediate mapping: writes go to a write-optimised (eg journaling) system and reads come from a read-optimised system, with the writes mapped into the read system in between. To get the full set of results including the latest writes, the query also reads the write system, but only the portion of data not yet mapped into the read-optimised system, and the results are merged.
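The tips on append-only logs and in-memory indexes can be combined in a small sketch. This is an illustration only, not code from the article: the class AppendOnlyStore and its record layout (a 4-byte length followed by UTF-8 bytes) are assumptions, the file is assumed to start empty, and rebuilding the index after a restart is deliberately left out.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical key-value store: sequential appends on disk, index kept in memory. Not thread-safe. */
final class AppendOnlyStore implements AutoCloseable {
    private final RandomAccessFile file;
    // key -> file offset of the most recent record for that key
    private final Map<String, Long> index = new HashMap<>();

    /** Assumes 'path' is a new or empty file; index recovery is omitted in this sketch. */
    AppendOnlyStore(Path path) throws IOException {
        this.file = new RandomAccessFile(path.toFile(), "rw");
    }

    void put(String key, String value) throws IOException {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        long offset = file.length();
        file.seek(offset);          // always write at the end: no in-place edits
        file.writeInt(bytes.length);
        file.write(bytes);
        index.put(key, offset);     // older records for this key are simply superseded
    }

    /** Returns the latest value for the key, or null if the key was never written. */
    String get(String key) throws IOException {
        Long offset = index.get(key);
        if (offset == null) return null;
        file.seek(offset);          // a single random read, located via the in-memory index
        byte[] bytes = new byte[file.readInt()];
        file.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

Writes stay sequential while reads need one seek, at the cost of holding every key in memory and of never reclaiming superseded records; a real system would add compaction, which is where structures such as log structured merge trees come in.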
How to load test & tune performance on your API (Part II) (Page last updated May 2015, Added 2015-10-28, Author Victor Delgado, Publisher 3scale). Tips:
- API gateways are a common architectural solution to the problem of managing access to APIs for: access control; exposing different interfaces for combinations of internal APIs; rate limiting.
- The number of simultaneous open connections you can have depends on the underlying system; you may need to increase the system wide limit for "open files", as well as the user limit; you may also need to tune keepalives, tcp timeouts, and (thread) pool sizes.
- Set your objectives before running a test.
- Prepare and measure a realistic and stable baseline environment.
- Ensure that the testing tool is not itself limiting the test.
Java heap analysis - Out of Memory!!! (Page last updated July 2015, Added 2015-10-28, Author Manoj Ramakrishnan, Publisher engineeringislife). Tips:
- Add the -XX:+HeapDumpOnOutOfMemoryError flag to let the JVM dump the heap when it first encounters an OutOfMemoryError. There is no performance overhead to the application for adding this flag - until an OutOfMemoryError is hit, when the application can freeze while dumping the heap. The -XX:HeapDumpPath flag lets you specify where the dumped heap will get written to.
- In the case of an OutOfMemoryError there are two possible solutions: increase the heap size until the OutOfMemoryError is no longer encountered; or fix the code so that the OutOfMemoryError doesn't occur. In the latter case, the cause is probably objects being added to a collection but incorrectly never removed - you need to identify the collection and ensure that the objects do get removed.
- To analyse the heap using Eclipse MAT: open the heap dump; run the Leak Suspects report; consider the large suspects; find the path to the root using the "Path To GC Roots -> exclude soft/weak references" option, which should identify what is holding on to the suspected leak.
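As a hedged sketch, an application can check at startup whether the heap dump flags above are actually in effect. This assumes a HotSpot-based JVM, where the com.sun.management.HotSpotDiagnosticMXBean interface is available; the class name HeapDumpFlags and its methods are hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical startup check; relies on the HotSpot-specific diagnostic MXBean. */
final class HeapDumpFlags {
    private HeapDumpFlags() {}

    /** Returns the current value of a -XX option as a string, eg "true" or a path. */
    static String optionValue(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        if (bean == null) {
            throw new IllegalStateException("not a HotSpot-based JVM");
        }
        return bean.getVMOption(name).getValue();
    }

    static boolean heapDumpOnOomEnabled() {
        return Boolean.parseBoolean(optionValue("HeapDumpOnOutOfMemoryError"));
    }

    public static void main(String[] args) {
        System.out.println("HeapDumpOnOutOfMemoryError = " + heapDumpOnOomEnabled());
        System.out.println("HeapDumpPath = " + optionValue("HeapDumpPath"));
    }
}
```

Running the class with and without -XX:+HeapDumpOnOutOfMemoryError on the java command line should show the flag changing; logging the result at startup is one way to catch a deployment that silently lost the flag.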
Last Updated: 2022-06-29
Copyright © 2000-2022 Fasterj.com. All Rights Reserved.
All trademarks and registered trademarks appearing on JavaPerformanceTuning.com are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. JavaPerformanceTuning.com is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.
RSS Feed: http://www.JavaPerformanceTuning.com/newsletters.rss