Java Performance Tuning

The Interview: Mike Norman

Back to newsletter 030 contents

This month we interviewed Mike Norman, the man in charge of performance at TopLink. The interview turned into a JDBC performance masterclass, to the benefit of us all. Read on.

JPT: Can you tell us a bit about yourself and what you do?

I was employed in the Telecom industry for 13 years, working on a wide variety of technologies ranging from small embedded real-time devices (only 128K ROM!) to large broadband network management systems with hundreds of thousands of lines of code. My major focus was on Object-Oriented systems and distributed computing (DCE/RPC, CORBA, Distributed Smalltalk, etc).

I had just finished working on a very large Smalltalk project and I was looking for an opportunity to switch to Java. I left the Telecom industry (no, I did not foresee the impending collapse of Telecom!) and joined the TopLink development team in September 1998. Since I had a CORBA background, the TopLink Chief Architect gave me a copy of version 0.8 of the EJB specification and said they were thinking about using TopLink to implement the persistence portion of the spec. Our first "EJB container" shipped in mid-1999, with subsequent releases every 4 or 5 months.

I then joined the Professional Services group and traveled extensively for more than two years, teaching courses on EJBs and TopLink, presenting at conferences and consulting directly with customers. After the acquisition by Oracle, I re-joined the TopLink development group and have been focused primarily on performance.

JPT: Since TopLink was acquired by Oracle, your personal focus has been on performance. How did you plan and initiate this effort and what were your expectations?

Since TopLink has two distinct usage patterns - with EJBs and without - it became clear that we could not depend solely upon benchmarks such as SPECjAppServer (formerly known as ECperf). Additionally, "micro"-focused information from profilers does not always translate well into "real-world" scenarios. My main task then was to design a framework that allows developers to write simple, robust and repeatable benchmarks to gauge the performance of any arbitrary TopLink code, not just TopLink-enabled EJB code.

We approached this like any other feature of TopLink: requirements gathering, design specifications, resources, timelines, etc. The main goal of the Performance Benchmark feature is to provide quantitative data to objectively measure TopLink's run-time performance.

The Benchmark Infrastructure framework I designed was heavily influenced by JUnit. A "Workload" has a simple lifecycle - setUp(), work() and tearDown() - and is specified in an XML "WorkloadConfiguration" document (analogous to a TestSuite). A WorkloadConfiguration contains information describing the number of times a workload is run as well as the desired number of threads to schedule. The Benchmark Infrastructure also allows Workloads to be run concurrently to simulate real-world scenarios where multiple threads compete within a JVM for the same resource(s). Here is a sample WorkloadConfiguration document:

<?xml version="1.0"?>
<!DOCTYPE workload-configuration SYSTEM "file:///C:/benchmark/infrastructure/lib/workloads.dtd">
    <name>Example Workload Configuration</name>
                              <value><![CDATA[Hello, World!]]></value>
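The Workload lifecycle and the two WorkloadConfiguration settings (run count and thread count) can be sketched in plain Java. This is a minimal illustration under stated assumptions, not TopLink's Benchmark Infrastructure: the names Workload, WorkloadRunner and WorkloadDemo are hypothetical, and this sketch calls setUp() and tearDown() once per thread around the timed work() calls, which may differ from what the real framework does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;

// Hypothetical interface modelled on the setUp()/work()/tearDown() lifecycle
// described in the interview; it is not TopLink's actual API.
interface Workload {
    void setUp() throws Exception;
    void work() throws Exception;
    void tearDown() throws Exception;
}

// Hypothetical runner: iterations and threads stand in for the "number of times
// a workload is run" and "number of threads" held in a WorkloadConfiguration.
final class WorkloadRunner {
    private final int iterations;
    private final int threads;

    WorkloadRunner(int iterations, int threads) {
        this.iterations = iterations;
        this.threads = threads;
    }

    /** Runs one Workload instance per thread, concurrently; returns every work() duration in ns. */
    List<Long> run(Supplier<Workload> factory) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<Long>>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                futures.add(pool.submit(() -> {
                    Workload w = factory.get();
                    List<Long> times = new ArrayList<>();
                    w.setUp();                       // setup cost is excluded from the timings
                    try {
                        for (int i = 0; i < iterations; i++) {
                            long start = System.nanoTime();
                            w.work();
                            times.add(System.nanoTime() - start);
                        }
                    } finally {
                        w.tearDown();
                    }
                    return times;
                }));
            }
            List<Long> all = new ArrayList<>();
            for (Future<List<Long>> f : futures) {
                all.addAll(f.get());                 // waits for each thread's timings
            }
            return all;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workloads", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("workload failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}

final class WorkloadDemo {
    public static void main(String[] args) {
        List<Long> times = new WorkloadRunner(100, 4).run(() -> new Workload() {
            private StringBuilder sb;
            public void setUp() { sb = new StringBuilder(); }
            public void work() { sb.append("Hello, World!"); }
            public void tearDown() { sb = null; }
        });
        System.out.println(times.size() + " timed work() calls"); // 400 timed work() calls
    }
}
```

Returning raw per-call timings, rather than an average, leaves room for the outlier and clustering analysis discussed at the end of the interview.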

In general, the Benchmark Infrastructure does not provide assistance for 'micro' tuning; however, it acts as a starting point for investigations that can subsequently "drill down" using micro-tuning tools. Conversely, once a round of micro tuning has been completed, Workloads can be re-run to ensure that the gains are realizable in the real world.

JPT: We all know that performance tuning can be a "Voyage of Discovery"; as such, how did your plans and expectations change as the exercise progressed?

I suppose the biggest change has been the acquisition of TopLink by Oracle. Oracle is very focused on performance throughout its product line and TopLink is now subject to a level of scrutiny unlike any in the past. Specifically, TopLink performance must be considered within the context of the whole Oracle9iAS product suite and in a variety of different configurations (Clustering, High Availability, etc.)

JPT: What was the most interesting performance problem that you found and how did you solve it?

My most significant problem is not specific to TopLink, but is generic to all J2EE multi-tiered applications. The interactions between the JVM, the native OS's process scheduler and the JDBC driver (network I/O) make it very difficult to get consistent results. Even separating the tiers onto different machines did not always produce similar results.

In order to minimize the variability, I eliminated the front tier by building a Workload runner that executes solely in the AppServer; additionally, I eliminated the network round-trip cost to the back-end DB tier by using a local process. I was able to create a precise "slice" through the TopLink run-time to get exactly the timing information I required. Of course, future testing will re-introduce those tiers in order to behave like the real world.

JPT: Did you run into any performance issues that were significant enough to warrant reworking portions of TopLink?

What is perhaps more interesting is the opposing question - what portions of TopLink did we think required reworking but actually did not show any performance problems? There are some areas of TopLink that we ourselves have always thought of as expensive, but it turns out not to be the case.

For example, TopLink uses reflection extensively at run-time and "conventional" wisdom says that this is expensive. However, it appears that the cost of reflection is not a large contributor - at run-time, network I/O dominates. Additionally, the improvements to reflection in JDK 1.4 make this even less of an issue.
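The interview does not show TopLink's reflective code. As a rough illustration of the kind of run-time reflection an object-relational mapper performs, here is a minimal sketch that copies column values into private fields by name using java.lang.reflect. The Employee class, the column-to-field mapping and ReflectiveMapper are hypothetical; the column names are borrowed from the EMPLOYEE query later in the interview, and the row values are made up.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical domain class with private fields and no setters.
class Employee {
    private long id;
    private String firstName;
    private String lastName;

    long getId() { return id; }
    String getFirstName() { return firstName; }
    String getLastName() { return lastName; }
}

// Hypothetical mapper: copies row values into the fields named by a column -> field mapping.
final class ReflectiveMapper {
    static <T> T build(Class<T> type, Map<String, String> columnToField, Map<String, Object> row) {
        try {
            Constructor<T> ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true);                      // allow non-public no-arg constructors
            T obj = ctor.newInstance();
            for (Map.Entry<String, String> m : columnToField.entrySet()) {
                Field f = type.getDeclaredField(m.getValue());
                f.setAccessible(true);                     // the fields are private
                f.set(obj, row.get(m.getKey()));           // a Long value is unboxed into a long field
            }
            return obj;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot map row onto " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("EMP_ID", "id");
        mapping.put("F_NAME", "firstName");
        mapping.put("L_NAME", "lastName");

        Map<String, Object> row = new HashMap<>();         // made-up sample values
        row.put("EMP_ID", 558L);
        row.put("F_NAME", "Jane");
        row.put("L_NAME", "Doe");

        Employee e = build(Employee.class, mapping, row);
        System.out.println(e.getId() + " " + e.getFirstName() + " " + e.getLastName()); // 558 Jane Doe
    }
}
```

Each field write here is an in-JVM operation. A single JDBC round-trip to fetch the row typically involves network I/O, which matches Mike's point that network I/O, not reflection, dominates.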

Similarly, we determined that with respect to threading, TopLink scales linearly with the workload: ask TopLink to do twice as much work and it will take twice as long.

JPT: Did you run into legacy code or design that was intended to deal with performance issues which no longer hold?

No - performance-related code is reviewed regularly so that it does not get "stale". Of course, we are always looking for ways to improve TopLink's performance.

JPT: What do you consider the biggest Java performance issue currently?

The cost of garbage collection is still a significant "weight" on run-time performance. Soon we will see 64-bit computing deployed across the enterprise. When a VM has thousands of megabytes of RAM available to it (remember when 640K was a lot?), garbage collection algorithms must be very sophisticated to handle the enormous number of objects to be collected. Similarly, with such large systems the cost of synchronizing amongst many threads is likely to be a significant challenge.

JPT: What are the most common performance related mistakes that you have seen projects make when developing TopLink applications?

I guess the most common performance related mistake is trying to optimize an area of code that in the end will not result in much overall improvement. I use the following rule-of-thumb when thinking about J2EE middle-tier applications:

  1. The cost of "regular" computation within a JVM is 1;
  2. The cost of invoking against EJBs is 10;
  3. The cost of retrieving information from the Database is 100 (with some JDBC drivers it is closer to 1000!)

Thus eliminating even one or two round-trips to the database is far better than all the StringBuffer optimizations one may ever find!
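The rule of thumb can be turned into a back-of-the-envelope cost model. The minimal sketch below assumes the weights 1, 10 and 100 from the list above. The workload mix of 50 computations, 5 EJB calls and 3 round-trips is made up purely for illustration, and RoundTripCost is a hypothetical name.

```java
// Hypothetical cost model using the rule-of-thumb weights quoted in the interview.
final class RoundTripCost {
    static final int COMPUTATION = 1;      // "regular" computation within the JVM
    static final int EJB_INVOCATION = 10;  // invoking against an EJB
    static final int DB_ROUND_TRIP = 100;  // one JDBC round-trip (closer to 1000 with some drivers)

    static int cost(int computations, int ejbInvocations, int dbRoundTrips) {
        return computations * COMPUTATION
                + ejbInvocations * EJB_INVOCATION
                + dbRoundTrips * DB_ROUND_TRIP;
    }

    public static void main(String[] args) {
        int baseline = cost(50, 5, 3);            // 50 + 50 + 300
        int halfTheComputation = cost(25, 5, 3);  // tune away half of the in-JVM work
        int oneFewerRoundTrip = cost(50, 5, 2);   // eliminate a single database round-trip
        System.out.println(baseline + " " + halfTheComputation + " " + oneFewerRoundTrip); // 400 375 300
    }
}
```

Under these weights, halving all the in-JVM work saves 25 units, while removing one round-trip saves 100. With a driver closer to 1000 per round-trip, the gap widens tenfold.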

JPT: Do you have any particular performance tips you would like to tell our readers about? Any TopLink performance tips?

Using my rule-of-thumb guide above, the #1 determiner of performance for J2EE applications will be the number of round-trips to the database via JDBC. In many cases, it is immaterial how many rows are returned (well, obviously after some point it will be significant). TopLink has several performance features that optimize interactions with the database, resulting in dramatic reduction in the number of round-trips.

The primary TopLink performance feature is called "Indirection" (a.k.a. "lazy" or "just-in-time" reading). Consider the following simple Employee model:

[Figure: Employee has an Address and multiple Phones]

When TopLink builds an Employee object, not only does it issue SQL to the EMPLOYEE table, but it must also issue SQL to join to the ADDRESS and PHONE tables:

[Figure: SQL joins of EMP to ADDR and PHONE tables]

If the only thing that you wanted to do with this Employee was to print her name, it is a rather expensive operation - 3 round-trips to the database!

TopLink can "stub out" the "address" and "phones" attributes of the Employee, replacing each with a proxy that holds the SQL we would have sent:

[Figure: the row in the EMP table returned by "select EMP_ID, F_NAME, L_NAME from EMPLOYEE where (EMP_ID = 558)", with a proxy standing in for the Address row in the ADDR table]

The behavior of the proxy is such that if no one asks this Employee object for its Address, then no SQL is sent to the database. Of course, if the Address is already in-memory, then it is returned. The net result is that building this Employee object is now one-third the original cost.
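The proxy behaviour can be sketched with a small lazy holder that remembers how to run its query and executes it only on first access. This is a minimal sketch, not TopLink's indirection API. FakeDb, Lazy, Employee and IndirectionDemo are hypothetical. The ADDRESS and PHONE queries use "select *" because their columns are not given in the interview. The sketch models only the deferred query, not TopLink's in-memory cache.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for JDBC: every query() call counts as one database round-trip.
final class FakeDb {
    int roundTrips = 0;

    String query(String sql) {
        roundTrips++;
        return "rows for [" + sql + "]";
    }
}

// Hypothetical lazy proxy: holds the query it would run and runs it on the first get() only.
final class Lazy<T> {
    private Supplier<T> loader;
    private T value;

    Lazy(Supplier<T> loader) { this.loader = loader; }

    synchronized T get() {
        if (loader != null) {          // first access "triggers" the proxy
            value = loader.get();
            loader = null;             // later calls return the loaded value without a query
        }
        return value;
    }
}

final class Employee {
    final String row;
    final Lazy<String> address;
    final Lazy<String> phones;

    Employee(String row, Lazy<String> address, Lazy<String> phones) {
        this.row = row;
        this.address = address;
        this.phones = phones;
    }
}

final class IndirectionDemo {
    // Builds the Employee with proxies: only the EMPLOYEE query runs now.
    static Employee read(FakeDb db, int id) {
        String row = db.query("select EMP_ID, F_NAME, L_NAME from EMPLOYEE where (EMP_ID = " + id + ")");
        return new Employee(row,
                new Lazy<>(() -> db.query("select * from ADDRESS where (EMP_ID = " + id + ")")),
                new Lazy<>(() -> db.query("select * from PHONE where (EMP_ID = " + id + ")")));
    }

    // Eager variant: triggers both proxies at once, giving the 3 round-trips described above.
    static Employee readEagerly(FakeDb db, int id) {
        Employee e = read(db, id);
        e.address.get();
        e.phones.get();
        return e;
    }

    public static void main(String[] args) {
        FakeDb eager = new FakeDb();
        readEagerly(eager, 558);
        FakeDb lazy = new FakeDb();
        Employee e = read(lazy, 558);
        System.out.println(eager.roundTrips + " vs " + lazy.roundTrips); // 3 vs 1
        e.address.get();
        e.address.get();                     // second access is served by the proxy
        System.out.println(lazy.roundTrips); // 2
    }
}
```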

The above optimization may be reasonable most times, but there may be some application-specific business logic where you know that as soon as the Employee is retrieved, the proxy will be "triggered" and some attribute of the Address is required. TopLink has a feature called "joined-attribute-querying" that allows the designer to retrieve both the Employee and its Address in a single database round-trip (joined queries can be composed either statically at design time, or dynamically at run-time):

[Figure: example Java code and SQL to access the data, and an example row of the pseudo-table EMP+ADDR]

TopLink uses the above "super-row" - essentially the two rows shown above concatenated together - to build both objects in a single database round-trip (Note: the presence of the proxy between the Employee and its Address has no impact on this query).
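Building two objects from one concatenated row can be sketched without a real database. In the minimal sketch below, SuperRowDb returns a canned super-row with made-up values. The STREET and CITY columns and the ADDRESS.EMP_ID join column are assumed names, not from the interview. All class names are hypothetical, and this is not how TopLink's joined-attribute-querying is exposed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical domain classes.
final class Address {
    final String street;
    final String city;
    Address(String street, String city) { this.street = street; this.city = city; }
}

final class Employee {
    final long id;
    final String firstName;
    final String lastName;
    final Address address;
    Employee(long id, String firstName, String lastName, Address address) {
        this.id = id; this.firstName = firstName; this.lastName = lastName; this.address = address;
    }
}

// Hypothetical stand-in for JDBC that returns one canned "super-row" per query.
final class SuperRowDb {
    int roundTrips = 0;

    Map<String, Object> selectOne(String sql) {
        roundTrips++;
        Map<String, Object> row = new HashMap<>();   // made-up values: EMP columns, then ADDR columns
        row.put("EMP_ID", 558L);
        row.put("F_NAME", "Jane");
        row.put("L_NAME", "Doe");
        row.put("STREET", "1 Main St");
        row.put("CITY", "Springfield");
        return row;
    }
}

final class JoinedReadDemo {
    // One statement joins EMPLOYEE to ADDRESS (real JDBC code would bind ? via a PreparedStatement).
    static final String JOINED_SQL =
            "select E.EMP_ID, E.F_NAME, E.L_NAME, A.STREET, A.CITY"
            + " from EMPLOYEE E, ADDRESS A"
            + " where (E.EMP_ID = ?) and (A.EMP_ID = E.EMP_ID)";

    static Employee readJoined(SuperRowDb db, long id) {
        Map<String, Object> r = db.selectOne(JOINED_SQL.replace("?", Long.toString(id)));
        // Both objects are built from the same super-row, so no second round-trip is needed.
        Address a = new Address((String) r.get("STREET"), (String) r.get("CITY"));
        return new Employee((Long) r.get("EMP_ID"), (String) r.get("F_NAME"),
                (String) r.get("L_NAME"), a);
    }

    public static void main(String[] args) {
        SuperRowDb db = new SuperRowDb();
        Employee e = readJoined(db, 558);
        System.out.println(e.firstName + " lives in " + e.address.city
                + " (" + db.roundTrips + " round-trip)"); // Jane lives in Springfield (1 round-trip)
    }
}
```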

Another way of dealing with the round-trip cost of the database is to "bulk shop" for your data. For example, suppose that some business operation requires a list of 100 Employees and their Addresses:

[Figure: 100 Employee-proxy-Address objects]

Let us suppose that the Employees were retrieved from the database using a single SQL call - this has reasonable efficiency: 100 objects/1 db round-trip. However, a simple loop iterating through the list of Employees to retrieve their Addresses will trigger each proxy. The additional SQL calls drop the efficiency to a miserable level: 200 objects/101 db round-trips.

To solve this, TopLink has a feature called batched-reading: the ability to specify that all the Addresses are to be read in a batch. Regardless of which proxy is triggered, a single SQL call reads in the Addresses for all 100 Employees:

[Figure: any one proxy reads all 100 addresses]

We now have a very reasonable level of efficiency: 200 objects/2 db round-trips.
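The round-trip arithmetic for the naive loop and for batched reading can be checked with a counting stand-in for the database. In this minimal sketch, readAddresses represents "one SQL call for every Address in the batch". The interview does not say how TopLink phrases that SQL, so the sketch only counts calls. CountingDb, AddressBatch and BatchReadingDemo are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for JDBC that counts database round-trips.
final class CountingDb {
    int roundTrips = 0;

    List<Long> readEmployeeIds(int n) {           // one query returns the whole list
        roundTrips++;
        List<Long> ids = new ArrayList<>();
        for (long i = 1; i <= n; i++) ids.add(i);
        return ids;
    }

    String readAddress(long empId) {              // one query per Employee
        roundTrips++;
        return "address of " + empId;
    }

    Map<Long, String> readAddresses(List<Long> empIds) {  // one query for the whole batch
        roundTrips++;
        Map<Long, String> result = new HashMap<>();
        for (Long id : empIds) result.put(id, "address of " + id);
        return result;
    }
}

// Hypothetical batch: whichever proxy fires first loads the Addresses for every Employee.
final class AddressBatch {
    private final CountingDb db;
    private final List<Long> ids;
    private Map<Long, String> loaded;

    AddressBatch(CountingDb db, List<Long> ids) { this.db = db; this.ids = ids; }

    synchronized String addressFor(long id) {
        if (loaded == null) loaded = db.readAddresses(ids);
        return loaded.get(id);
    }
}

final class BatchReadingDemo {
    static int naive(int n) {
        CountingDb db = new CountingDb();
        for (long id : db.readEmployeeIds(n)) db.readAddress(id);  // each proxy fires its own query
        return db.roundTrips;
    }

    static int batched(int n) {
        CountingDb db = new CountingDb();
        List<Long> ids = db.readEmployeeIds(n);
        AddressBatch batch = new AddressBatch(db, ids);
        for (long id : ids) batch.addressFor(id);                   // only the first call hits the db
        return db.roundTrips;
    }

    public static void main(String[] args) {
        System.out.println(naive(100) + " vs " + batched(100)); // 101 vs 2
    }
}
```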

JPT: What additional significant thing did you learn from this exercise?

Post-processing the data collected by a benchmarking exercise is yet another area whose complexity people underestimate. You must throw out statistical outliers, even though you really want to keep the extremely fast runs to make your benchmark look good. Additionally, you must plot your data and visually inspect it to see if there is clustering. From here, things get very complicated very quickly; "K-means clustering" is not "everyday" statistics. If there are multiple loci of typical responses, you cannot use simple statistical averages to summarize the results. Whenever I encountered this phenomenon, I was forced to redesign my Workloads so that only a single response cluster would appear in the results.
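The interview does not name a specific outlier rule or clustering procedure. The minimal sketch below makes two assumed choices. It applies Tukey's fences (drop samples outside Q1 - 1.5·IQR .. Q3 + 1.5·IQR), which removes suspiciously fast runs as well as slow ones. It then runs a plain one-dimensional k-means with k = 2 to show whether the timings split into two loci. BenchmarkStats is a hypothetical name, and the code is no substitute for plotting the data.

```java
import java.util.Arrays;

final class BenchmarkStats {
    /** Keeps samples inside Tukey's fences; both fast and slow outliers are dropped. Result is sorted. */
    static double[] trimOutliers(double[] samples) {
        double[] s = samples.clone();
        Arrays.sort(s);
        double q1 = quantile(s, 0.25);
        double q3 = quantile(s, 0.75);
        double iqr = q3 - q1;
        double lo = q1 - 1.5 * iqr, hi = q3 + 1.5 * iqr;
        return Arrays.stream(s).filter(x -> x >= lo && x <= hi).toArray();
    }

    /** Linearly interpolated quantile of an already sorted array. */
    static double quantile(double[] sorted, double p) {
        double pos = p * (sorted.length - 1);
        int i = (int) Math.floor(pos);
        int j = Math.min(i + 1, sorted.length - 1);
        return sorted[i] + (pos - i) * (sorted[j] - sorted[i]);
    }

    /** One-dimensional k-means with k = 2; returns the two centroids, smaller first. */
    static double[] twoMeans(double[] samples) {
        double c0 = Arrays.stream(samples).min().getAsDouble();
        double c1 = Arrays.stream(samples).max().getAsDouble();
        for (int iter = 0; iter < 100; iter++) {
            double sum0 = 0, sum1 = 0;
            int n0 = 0, n1 = 0;
            for (double x : samples) {                 // assign each sample to the nearer centroid
                if (Math.abs(x - c0) <= Math.abs(x - c1)) { sum0 += x; n0++; }
                else { sum1 += x; n1++; }
            }
            double next0 = n0 > 0 ? sum0 / n0 : c0;
            double next1 = n1 > 0 ? sum1 / n1 : c1;
            if (next0 == c0 && next1 == c1) break;     // centroids stopped moving
            c0 = next0;
            c1 = next1;
        }
        return new double[] {c0, c1};
    }

    public static void main(String[] args) {
        double[] timingsMs = {10, 11, 10, 12, 11, 30, 31, 30, 29, 31};  // made-up, two clusters
        double[] kept = trimOutliers(timingsMs);
        System.out.println(Arrays.toString(twoMeans(kept)));
    }
}
```

If the two centroids sit far apart relative to the spread within each group, as in the sample data, a single mean is misleading. That is the situation in which the Workloads were redesigned.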

JPT: Mike, we would like to thank you for taking the time to answer our questions, and we wish you continued success with your effort to tune TopLink.

This interview was conducted by Kirk Pepperdine, a principal at He can be reached from here

(End of interview).


Last Updated: 2017-03-01
Copyright © 2000-2017 All Rights Reserved.