Java Performance Tuning
Tips February 2018
Back to newsletter 207 contents
https://www.youtube.com/watch?v=lZU6RK0oazM
Scalability Is Quantifiable: The Universal Scalability Law (Page last updated November 2017, Added 2018-02-27, Author Baron Schwartz, Publisher Usenix). Tips:
- At the unacceptable-workload boundary region (it's not a sharp cutoff, it's gradual), complex systems start to run in degraded modes. This is a non-linear response region, and it's very difficult to reason about. When you exceed reasonable capacity limits, system performance degrades sharply.
- It's hard to figure out which component is waiting in real systems.
- Linear scaling: if a cluster can do X work per unit time, then doubling the cluster size lets it do 2X. The equation is X(N) = L x N / 1, where X(N) is the throughput at cluster size N and L is the slope (usually denoted by lambda rather than L); the denominator of 1 is written explicitly to parallel the equations below.
- If some portion of your workload is not parallelizable then you cannot get perfectly linear scalability from increasing cluster size. Factoring this into scalability moves you from linear scalability to Amdahl's Law, ie from X(N) = L x N / 1 (linear) to X(N) = L x N / (1 + S x (N-1)), where S (usually denoted by sigma rather than S) is the fraction of the workload that cannot be done in parallel. For example, if 5% of the workload cannot be parallelized, then the maximum speedup, even with N becoming very large, would be 20 times (in the extreme where the other 95% is parallelized down to nothing).
- If workers have to coordinate amongst themselves, service time can increase with more nodes because they have to spend more time coordinating. N nodes means there are N x (N-1) connections (crosstalk), so the overhead is quadratic.
- Adding in crosstalk costs to Amdahl's Law gives the full Universal Scalability Law: X(N) = L x N / (1 + S x (N-1) + K x N x (N-1)), where K (usually denoted by kappa rather than K) is the crosstalk/coherence penalty factor. This implies there is an optimum number of nodes, above AND below which the system cannot do as much work. The throughput at that optimum is the system's maximum capacity; beyond it the system degrades and then eventually fails.
- Load is concurrency, the number of requests currently in flight.
- Use the Universal Scalability law to capacity plan.
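The three formulas above can be sketched as a small Java program. The coefficient values below are illustrative assumptions, not numbers from the talk; with a positive crosstalk penalty, throughput peaks at a finite cluster size and then declines.

```java
// Sketch of the Universal Scalability Law:
//   X(N) = L*N / (1 + S*(N-1) + K*N*(N-1))
// K = 0 reduces it to Amdahl's Law; S = K = 0 reduces it to linear scaling.
public class UslDemo {
    static double throughput(double n, double lambda, double sigma, double kappa) {
        return lambda * n / (1 + sigma * (n - 1) + kappa * n * (n - 1));
    }

    public static void main(String[] args) {
        double lambda = 100.0; // per-node throughput slope (assumed value)
        double sigma = 0.05;   // 5% serial fraction (assumed value)
        double kappa = 0.001;  // crosstalk/coherence penalty (assumed value)

        int bestN = 1;
        double bestX = 0;
        for (int n = 1; n <= 200; n++) {
            double x = throughput(n, lambda, sigma, kappa);
            if (x > bestX) { bestX = x; bestN = n; }
        }
        // Throughput rises, peaks at the optimum cluster size, then falls.
        System.out.printf("optimum N = %d, peak throughput = %.1f%n", bestN, bestX);
    }
}
```

With kappa set to zero this sweep instead climbs toward the Amdahl ceiling of lambda/sigma (a 20x speedup for a 5% serial fraction), which is one way to check the implementation.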
https://www.youtube.com/watch?v=M54gbffeFhs
Fast and Safe Production Monitoring of JVM Applications with BPF Magic (Page last updated November 2017, Added 2018-02-27, Author Sasha Goldshtein, Publisher Usenix). Tips:
- OpenJDK has tracepoints such as mem_pool_gc_begin, class_loaded, object_alloc and thread_start. tplist -p PID will print the tracepoints.
- trace, bpf, etc. tools do need root access. trace lets you attach to a running process and trace specific tracepoints. Eg trace 'SyS_write (arg1==1) "%s", arg2' -U -p `pidof java` attaches to the "write" syscall tracepoint and, if arg1 (the file descriptor) is 1 (stdout), prints the message (arg2) as a string and emits a stack trace (-U) at the same time. But the stack trace is not readable for the Java frames; to add the mapping for those, use the perf-map-agent project.
- You need to add -XX:+ExtendedDTraceProbes to enable some expensive tracepoints, eg method_entry.
- You can use perf to record CPU profile information from a Java process and generate a flame graph. But this is not real-time analysis - you generate data and analyse it offline - and it generates a very large amount of data for high-frequency events, so it doesn't work for many types of profiles.
- BPF needs a Linux kernel of 4.6+. BPF tracing can be much more efficient than perf because the data can be processed as it is generated, leaving only a small amount of data.
- Useful BPF commands: ustat, profile, uobjnew, stackcount, opensnoop.
https://www.youtube.com/watch?v=QwZF8xQHlxE
Collections.compare: JDK, Eclipse, Guava, Apache (Page last updated October 2017, Added 2018-02-27, Author Leonardo Lima, Publisher JavaOne). Tips:
- Eclipse and Guava have immutable types, Apache and JDK need to dynamically create an unmodifiable wrapper instance.
- Apache, Eclipse and Guava have MultiMap types.
- In the creation performance tests, all frameworks are comparable, but Eclipse was 10% faster than Apache, which was in turn 5%-10% faster than the others. In memory, Eclipse and Guava were 5% smaller than the others.
- In the groupBy performance tests, JDK is significantly faster than the others, actually benefiting from the lack of an immutable collection. Apache, Eclipse and Guava were approx 20%, 40% and 60% slower respectively. Memory-wise they were all similar.
- Mutable to immutable is O(n), but mutable to unmodifiable is O(1) (the latter just creates a wrapper around the existing collection).
- Eclipse does well in performance and memory with Bag structures.
- Also useful to know: Apache has Trie and Tree structures (very fast when these suit the problem).
- Eclipse supports primitive data types (which reduces memory use if the data being held are primitives).
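The O(1) wrapper vs O(n) copy distinction above can be seen directly with the JDK's own APIs: Collections.unmodifiableList returns a view over the original list, while List.copyOf makes a detached immutable copy.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Contrasts an O(1) unmodifiable wrapper with an O(n) immutable copy.
public class WrapVsCopy {
    public static void main(String[] args) {
        List<String> source = new ArrayList<>(List.of("a", "b", "c"));

        // O(1): wraps the existing list; later changes to source show through.
        List<String> unmodifiable = Collections.unmodifiableList(source);

        // O(n): copies the elements; detached from source afterwards.
        List<String> immutable = List.copyOf(source);

        source.add("d");
        System.out.println(unmodifiable.size()); // 4 - the wrapper reflects the change
        System.out.println(immutable.size());    // 3 - the copy does not
    }
}
```

The trade-off: the wrapper is cheap to create but its contents can still change underneath readers, while the copy is stable but costs a full pass over the data.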
http://cdn.oreillystatic.com/pdf/PracticalMonitoringSampleChapter.pdf
Chapter 1 of Practical Monitoring (Page last updated February 2018, Added 2018-02-27, Author Mike Julian, Publisher O'Reilly). Tips:
- Monitoring isn't a single, cut-and-dried problem, it's a huge problem set. Decide on the problems the monitoring needs to solve, then look for the combination of tools that solves them. Don't start with the tool!
- It's 2018, not 1999 - your systems can easily handle the minuscule load of an agent-based monitoring tool, and these are much more flexible than agentless ones.
- Choose tools wisely and consciously, but don't be afraid of adding new tools simply because it's yet another tool - as long as it provides additional metrics that you need.
- Evaluate and prototype monitoring solutions rather than choosing them because someone else uses them or because a team member used them in the past.
- Monitoring is not a job, it's a skill, and it's a skill everyone on your team should have to some degree.
- It's not ready for production until it's monitored.
- Your monitoring solution should provide the answer to "is it working" so you need to define what "working" is.
- OS metrics aren't very useful for alerting. They ARE critical for diagnostics and performance analysis, but most of the time, they aren't worth waking someone up over. Unless you have a specific reason to alert on OS metrics, stop doing it.
- Collect metrics at least every 60 seconds. If you have a high-traffic system, opt for more often, such as every 30 seconds or even every 10 seconds.
- Monitoring does not fix problems. If you find yourself constantly adding more monitoring for a service, stop and invest your effort into making the service more stable and resilient. More monitoring doesn't fix a broken system, and it's not an improvement for a broken service.
- Your monitoring should be 100% automated; services should self-register.
Jack Shirazi
Last Updated: 2024-12-27
Copyright © 2000-2024 Fasterj.com. All Rights Reserved.
All trademarks and registered trademarks appearing on JavaPerformanceTuning.com are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. JavaPerformanceTuning.com is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.
URL: http://www.JavaPerformanceTuning.com/news/newtips207.shtml
RSS Feed: http://www.JavaPerformanceTuning.com/newsletters.rss