Java Performance Tuning
Java(TM) - see bottom of page
Tips December 2014
Back to newsletter 169 contents
Building Nanoservices With Java 8 And JavaEE 7 (Page last updated September 2014, Added 2014-12-29, Author Adam Bien, Publisher adam-bien.com). Tips:
- A distributed service can be scalable or consistent, but not both.
- Deployment can now be done from the OS level up, servers with SSD are fast enough. For larger deployments, you can use Docker.
- Enterprise projects and high-scale web projects often have different requirements, different restrictions. Typically an enterprise project has much more scope for small inefficiencies - using a little more memory, taking a little more time to start, etc. High-scale web projects tend to have hundreds of installations which means small inefficiencies can multiply to large ones across the full system.
- There's little point in sharing an application server between multiple projects any more; a dedicated application server installation per project scales better.
- You can have a good relational data model or a good object database model, but not both.
- Load tests use realistic simulations and tell you whether the system achieves its performance targets; stress tests are deliberately unrealistic and identify where the system will break under extreme load. Stress tests can expose issues earlier, but will also expose many issues that may never need fixing.
- Many web applications are now built with eventual consistency as a primary target: transactions are consistent from the point of view of the initiator, but not across the whole system (for example, if you answer a question on stackoverflow, your view shows the answer immediately in the question context, but other people might not see your answer for minutes).
Deep monitoring with JMX (Page last updated July 2014, Added 2014-12-29, Author Erik Costlow, Publisher Oracle). Tips:
- JMX lets you monitor application CPU, memory and resource usage, and other statistics, including custom statistics.
- The JMX monitoring agent is already available inside the JVM runtime. JMX connections take place at the JVM level rather than inside the application. Monitoring information is available either locally or remotely.
- JMX viewers include jconsole, jvisualvm and Java Mission Control.
- Applications can turn on remote JMX capabilities with -Dcom.sun.management.jmxremote.port=portNum -Djava.rmi.server.hostname=OptionalHostnameConnectionRestriction
- Mission control flight recorder is a minimal-overhead black-box that continually records, and is suitable for production.
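As a rough illustration of the JMX tips above, the following sketch reads two standard statistics from inside the JVM: once through the typed platform MXBeans, and once by MBean name through the platform MBeanServer, which is roughly how a viewer such as jconsole addresses them. The class and method names (JmxSnapshot, heapUsedBytes and so on) are invented for this example; only the java.lang.management and javax.management calls are standard API.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.ThreadMXBean;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical helper class, named for this example only.
final class JmxSnapshot {

    /** Heap bytes currently in use, read through the typed MemoryMXBean. */
    static long heapUsedBytes() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        return memory.getHeapMemoryUsage().getUsed();
    }

    /** Live thread count, read through the typed ThreadMXBean. */
    static int liveThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        return threads.getThreadCount();
    }

    /** The same thread count, looked up by MBean name and attribute name. */
    static int liveThreadsByName() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName("java.lang:type=Threading");
            return (Integer) server.getAttribute(name, "ThreadCount");
        } catch (JMException e) {
            // All checked JMX lookup failures extend JMException.
            throw new IllegalStateException("JMX lookup failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("heap used (bytes): " + heapUsedBytes());
        System.out.println("live threads: " + liveThreads());
        System.out.println("live threads by name: " + liveThreadsByName());
    }
}
```

The same MBeans become visible to remote viewers once the JVM is started with the jmxremote options listed above; nothing in the application code needs to change for that.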
Designing Software in a Distributed World (Page last updated October 2014, Added 2014-12-29, Author Thomas A. Limoncelli, Strata R. Chalup, Christina J. Hogan, Publisher informIT). Tips:
- Failures should be considered normal. Designs must work around them, and software must anticipate them. Automated failure handling mechanisms should be created for any scaled system.
- Visibility into the system is a MUST. The visibility must be actively created by designing systems that draw out the information and make it visible.
- Simplicity is key. The more complex the system, the more difficult it is to have an accurate mental model.
- A standard architecture is a load balancer with multiple backend replicas. The load balancer must always know which backends are alive and ready to accept requests, and only route requests to those.
- A naive Least Loaded algorithm can cause the very problem it's intended to avoid, eg the CNN site in 2001 crashed after one backend became overloaded. When that backend came back up, all traffic was directed at it because it was the least loaded backend at startup, which overloaded it again. This happened successively for each backend until the site was completely unavailable and couldn't restart without careful manual intervention (in this case restarting the entire site simultaneously).
- A standard architecture is to use multiple disparate backend servers to handle partial responses and compose the responses into a whole response at the frontend. Where appropriate a partially complete response can be sent to the requestor, with elements filled in as they come (eg for a web page composed of parts that can be separately updated).
- The fan out problem is one where one incoming query results in multiple new queries to backends. This can need careful management and rate-limiting to avoid congestion.
- A standard architecture is the server tree where solutions are sharded across servers and constructed in parallel. This is especially useful where fast answers are more important than perfect answers.
- The easiest way to store state is to put it on one machine. But this is a single point of failure and limits the total data volume to what can exist on that one machine.
- Sharding data (splitting it amongst multiple machines according to a particular data partitioning algorithm) scales well.
- Having live replicas makes data highly available but means you can have out-of-date data, which needs managing.
- It is not possible to build a distributed system that guarantees consistency, availability, and survival of partial failure - any one or two can be achieved but not all three simultaneously (this is the CAP theorem).
- CAP (consistency, availability, partition tolerance) splits data management systems into CA (eg Relational DBs); CP (eg HBase, Redis, Bigtable); AP (eg Cassandra, Riak, Dynamo).
- Modern distributed systems target high availability, low downtime, and change without disruption. Having a loosely coupled system helps achieve these.
- You can rule out badly performing designs and architecture with prototyping, and even before prototyping by using estimates of processing times based on known IO delays.
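The load balancer tip above (route only to backends that are alive and ready) can be sketched roughly as follows. This is a minimal in-memory illustration, not a production balancer: the class name HealthAwareBalancer and its methods are invented here, health status is assumed to be reported by some external health check, and plain round-robin is used instead of Least Loaded so that a freshly restarted backend is not handed all the traffic at once.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: round-robin over backends currently marked alive.
final class HealthAwareBalancer {
    // Backend name -> alive flag, kept in registration order.
    private final Map<String, Boolean> alive = new LinkedHashMap<>();
    private int next = 0;

    synchronized void addBackend(String name) {
        alive.put(name, true);
    }

    /** Called by an (assumed) external health check when status changes. */
    synchronized void markAlive(String name, boolean isAlive) {
        if (alive.containsKey(name)) {
            alive.put(name, isAlive);
        }
    }

    /** Returns the next live backend in round-robin order, or null if none is alive. */
    synchronized String pick() {
        List<String> live = new ArrayList<>();
        for (Map.Entry<String, Boolean> entry : alive.entrySet()) {
            if (entry.getValue()) {
                live.add(entry.getKey());
            }
        }
        if (live.isEmpty()) {
            return null;
        }
        // floorMod keeps the index valid even if the counter overflows.
        return live.get(Math.floorMod(next++, live.size()));
    }

    public static void main(String[] args) {
        HealthAwareBalancer lb = new HealthAwareBalancer();
        lb.addBackend("a");
        lb.addBackend("b");
        lb.markAlive("b", false);
        System.out.println(lb.pick()); // only "a" is eligible here
    }
}
```

Returning null when no backend is alive is a simplification; a real frontend would more likely fail fast with an error response or queue the request.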
Atomic operations and contention (Page last updated August 2014, Added 2014-12-29, Author Fabian Giesen, Publisher The ryg blog). Tips:
- Cores have a very slow fallback path to guarantee atomicity (only one core can execute atomic operations at a time, by telling all other cores it is operating and waiting for them to finish their memory operations), but normally faster atomic execution can happen within a single cache line. Since cache line operations are already handled well by multi-core systems, this doesn't require much overhead and so is fast. This means that you should try to get atomic operations to execute against data within a single cache line - an important consideration for atomic operations across more than one field.
- Private data (existing in only one thread) and immutable data have no contention. Shared mutable data is one to two orders of magnitude slower to operate on when the cache lines holding that data are contended.
- False sharing is when data is inappropriately intermingled on the same cache line, such that logically there should be no contention when accessing the different data items from different threads, but in practice there is because they are on the same cache line. You can avoid false sharing by ordering the data correctly and/or padding it.
- The way to get scalable multi-processor code is to avoid contention as much as possible and, where unavoidable, to make whatever contention remains pass quickly.
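One way to apply the "avoid contention" advice in Java, offered here as an illustration rather than something taken from the article, is java.util.concurrent.atomic.LongAdder. According to its Javadoc it maintains a set of internal variables so that concurrent updates are spread out, and is expected to outperform a single AtomicLong under high contention. The class name ContendedCounter and its method are made up for this sketch.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: many threads incrementing one logical counter.
final class ContendedCounter {

    /** Starts the given number of threads, each incrementing the shared counter. */
    static long countWithLongAdder(int threads, int incrementsPerThread) {
        LongAdder counter = new LongAdder();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    // Updates are spread over internal cells instead of one hot word.
                    counter.increment();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        // sum() is exact here because all writers have finished.
        return counter.sum();
    }

    public static void main(String[] args) {
        System.out.println(countWithLongAdder(4, 100_000));
    }
}
```

The trade-off is that sum() is only a consistent snapshot when no updates are in flight, so LongAdder suits statistics counters better than values that need an atomic read-modify-write.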
On heap vs off heap memory usage (Page last updated December 2014, Added 2014-12-29, Author Peter Lawrey, Publisher VanillaJava). Tips:
- Object pooling is generally slower than explicitly creating and dereferencing objects. However in the low latency space pooling objects can work with CPU caches to reduce jitter, and also reduce GC pressure.
- final fields provide for thread-safety for primitives and also object references as long as the object referred to is immutable. If the referred object is mutable, you can still get thread-safe access and update as long as the mutability is handled in a thread-safe way (eg the example StringInterner class can have threads overwrite the internal data structure in a non-deterministic way, but the result of any call will still always produce a coherent object adhering to the class API. Specifically, the "interned" String could be interned several times as different objects in different threads but this doesn't violate the class API).
- Off heap memory and object pools both help reduce GC pauses.
- Off heap memory provides: Scalability to large memory (full virtual memory address space); no direct GC costs; sharing between processes on the same machine; persistent memory.
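As a small illustration of off heap storage using only the standard library, the sketch below keeps an array of longs in a direct ByteBuffer, so the data itself is not held in the garbage-collected heap. The class name OffHeapLongArray is invented for this example. A single ByteBuffer is limited to int-sized capacity, so this does not demonstrate the very large memory sizes or the sharing and persistence mentioned above; those typically need memory-mapped files (FileChannel.map) or a dedicated library.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: a fixed-size long[] equivalent stored in a direct buffer.
final class OffHeapLongArray {
    private final ByteBuffer buffer;
    private final int length;

    OffHeapLongArray(int length) {
        this.length = length;
        // multiplyExact throws instead of silently overflowing for huge lengths.
        this.buffer = ByteBuffer.allocateDirect(Math.multiplyExact(length, Long.BYTES));
    }

    /** Absolute write; throws IndexOutOfBoundsException for an invalid index. */
    void set(int index, long value) {
        buffer.putLong(index * Long.BYTES, value);
    }

    /** Absolute read; a freshly allocated buffer is zero-filled. */
    long get(int index) {
        return buffer.getLong(index * Long.BYTES);
    }

    int length() {
        return length;
    }

    boolean isOffHeap() {
        return buffer.isDirect();
    }

    public static void main(String[] args) {
        OffHeapLongArray values = new OffHeapLongArray(1_000);
        values.set(42, 123_456_789L);
        System.out.println(values.get(42) + " offHeap=" + values.isOffHeap());
    }
}
```

Only the small ByteBuffer wrapper object lives on the heap, so the collector has nothing to scan or copy for the stored values themselves.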
How Facebook Makes Mobile Work at Scale for All Phones, on All Screens, on All Networks (Page last updated September 2014, Added 2014-12-29, Author Todd Hoff, Publisher highscalability). Tips:
- 3G penetration and average 3G latency vary enormously (2014 Facebook stats US: 70.6% 3G penetration, 280ms latency; India 6.9% 3G penetration, 500ms latency; Brazil 38.6% 3G penetration, 850ms latency).
- Designing for high-end users means low-end users have poor experiences, and probably vice versa.
- Core performance metrics should include daily usage, cold start times, and reliability.
- For many mobile apps like Facebook, image data dominates the downloads. Reduce image sizes: resize on the server and send appropriately sized images, thumbnails and previews; Google's WebP format is smaller for equivalent quality.
- Facebook adjusts app behaviour according to connection quality. Facebook servers provide a Round Trip Time (RTT) estimate in the HTTP header of every response, while the client maintains a moving average of throughput and RTT to determine network quality. Poor is < 150kbps. Moderate is 150-600kbps. Good is 600-2000kbps. Excellent is > 2000kbps.
- Tuning options based on connection quality (don't overload the network): Increase/decrease compression; Issue more/fewer parallel network requests; Disable/enable auto-play video; Pre-fetch more content.
- Prefetching is especially important on networks with high latency. Don't block foreground network requests on background network requests.
- Monitor for overfetching and excess consumption of device resources.
- Minimize uploading to servers from mobile devices, eg resize images on the client side before sending.
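The connection quality bands above can be turned into a small client-side classifier, sketched below. Everything apart from the four kbps thresholds is an assumption made for illustration: the class and method names, the use of an exponential moving average (the tip only says "moving average"), the smoothing factor, and the treatment of the exact boundary values 150, 600 and 2000.

```java
// Hypothetical sketch: classify network quality from a smoothed throughput estimate.
final class ConnectionQuality {
    enum Level { POOR, MODERATE, GOOD, EXCELLENT }

    private final double alpha;              // assumed smoothing factor, 0 < alpha <= 1
    private double averageKbps = Double.NaN; // NaN means "no samples yet"

    ConnectionQuality(double alpha) {
        if (!(alpha > 0.0 && alpha <= 1.0)) {
            throw new IllegalArgumentException("alpha must be in (0, 1]");
        }
        this.alpha = alpha;
    }

    /** Bands from the tip; boundary handling at 150, 600 and 2000 is a guess. */
    static Level classify(double kbps) {
        if (kbps < 150) return Level.POOR;
        if (kbps < 600) return Level.MODERATE;
        if (kbps <= 2000) return Level.GOOD;
        return Level.EXCELLENT;
    }

    /** Folds one throughput measurement into the exponential moving average. */
    void addSample(double kbps) {
        averageKbps = Double.isNaN(averageKbps)
                ? kbps
                : alpha * kbps + (1 - alpha) * averageKbps;
    }

    Level current() {
        if (Double.isNaN(averageKbps)) {
            throw new IllegalStateException("no throughput samples yet");
        }
        return classify(averageKbps);
    }

    public static void main(String[] args) {
        ConnectionQuality quality = new ConnectionQuality(0.5);
        quality.addSample(1000);
        quality.addSample(100);
        System.out.println(quality.current());
    }
}
```

An app could consult current() before deciding how many parallel requests to issue, how much to prefetch, or whether to auto-play video, in line with the tuning options listed above.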
Last Updated: 2018-10-29
Copyright © 2000-2018 Fasterj.com. All Rights Reserved.
All trademarks and registered trademarks appearing on JavaPerformanceTuning.com are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. JavaPerformanceTuning.com is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.
RSS Feed: http://www.JavaPerformanceTuning.com/newsletters.rss