Back to newsletter 041 contents
This month we got to interview Gavin King, founder of the Hibernate open source object/relational mapping project.
JPT: Can you tell us a bit about yourself and what you do?
I am the founder of the Hibernate project, an open source object/relational mapping solution for Java. I recently joined JBoss Inc, so that I could concentrate fulltime on developing Hibernate and building a business around support, training and consulting services. I'm also working hard to bring some of our ideas about persistence into the Java standards, via the JCP process. I've just finished writing a book called "Hibernate in Action" with Christian Bauer.
JPT: What brought you to build Hibernate?
Hibernate was designed as a fix for the well known problems of Entity Beans. I was developing J2EE applications with an Australian company called Cirrus Technologies and was very frustrated by my lack of productivity, and by my inability to apply object modeling techniques to the business problem. I ended up spending more time thinking about persistence than I spent thinking about the user's problem. That's always wrong.
JPT: Given the current buzz around Hibernate, what made you decide to join JBoss?
It's simply not possible to run a project with the kind of scope that Hibernate has, or with the sheer number of users we have to support, in your spare time. There is a limit to how far a project can scale, and to how well it can compete with commercial alternatives, when you have only volunteers working on the project. That is not to diminish the importance of volunteerism in open source; indeed, it is a fundamental aspect of open source, and one of the things that makes the open source model so interesting. But at some point, you want to really compete full tilt with the guys selling licenses. And at that point, you need two things. First, you need to be able to concentrate, and have the open source project as your first priority. Second, you need to be able to provide your users with some of the things that they expect from commercial software. I'm thinking about things like 24/7 support and training courses. Further, you need to be able to get out in the field and speak to people at conferences and other events, without that being your annual leave!
So, why JBoss? Well, JBoss is almost unique in that the company actually pays for the development of the open source software it sells services for. Other open source companies out there are trying to make money either from dual licensing, or by selling packaged software or services for software that they did not actually write. So in that sense, JBoss is virtually the only place on earth where I am able to make a living doing the things I just talked about. More importantly, it's just a great place to work. I have never seen such a dynamic company culture before. I guess that's the result of intersecting entrepreneurial American business culture with the semi-structured approach to development that you find in the open source community. This is just great if you are trying to build innovative technology.
JPT: How would you compare Hibernate to TopLink or one of the JDO implementations?
Hibernate is an open source solution that is as capable as, and usually more capable than, our commercial competitors. By being free, we have attracted a huge number of users. By having a lot of users, we have gained an excellent understanding of what functionality they really need. The big difference is our community-driven development model.
We don't see Hibernate as an "inexpensive" alternative, though some of the commercial products are indeed expensive. Rather, we think people use Hibernate because it is a better fit to their requirements, is more stable, and better supported. And being open source is no excuse for bad documentation. We try to prove that with OSS, you can have your cake and eat it - we will do everything that commercial vendors do well, just as well as they do them, and also do a bunch of things they can't do (like give you the source).
JPT: Have your original ideas about O/R mapping changed much since you embarked on this project?
I went into this knowing very little about ORM, and even very little about databases. One of my first tasks was to go out and buy a book to learn SQL properly. All my understanding of the problem comes from what our users have taught us over the last two years.
JPT: Why, where and when should someone use Hibernate?
You should use Hibernate if you have a nontrivial application (definition of nontrivial varies, but I usually think of Hibernate being less applicable to applications with only ten tables or so) that uses an object-oriented domain model. Not every application needs a domain model, so not every application needs ORM. But if your application does a lot of business logic - rather than just displaying tabular data on a webpage - then a domain model is usually a good thing.
Hibernate really starts to shine in applications with very complex data models, with hundreds of tables and complex interrelationships. For this kind of application, Hibernate will take away a huge amount of coding effort (perhaps up to 25%, for some applications) and will result in an application that performs better than the alternative handcrafted JDBC. This is possible because some kinds of performance optimizations are very difficult to handcode: caching, outer-join fetching, transactional write-behind, etc.
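As an illustration of one optimization named above, transactional write-behind, here is a minimal sketch in plain Java. It is not Hibernate's implementation: `FakeDatabase`, `UnitOfWork` and every method name are hypothetical stand-ins, and the "database" is just a map that counts the statements it receives. The idea shown is that changes are buffered in memory during the transaction and sent only at commit, so repeated edits to the same row collapse into a single statement.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of transactional write-behind; not Hibernate code. */
final class WriteBehindSketch {

    /** Stand-in for a database connection that counts the statements it receives. */
    static final class FakeDatabase {
        final Map<Long, String> rows = new LinkedHashMap<>();
        int statements = 0;

        void update(long id, String value) {
            statements++;
            rows.put(id, value);
        }
    }

    /** Buffers changes in memory and writes them out only at commit time. */
    static final class UnitOfWork {
        private final FakeDatabase db;
        private final Map<Long, String> pending = new LinkedHashMap<>();

        UnitOfWork(FakeDatabase db) {
            this.db = db;
        }

        void set(long id, String value) {
            // A later change to the same row overwrites the earlier pending one.
            pending.put(id, value);
        }

        void commit() {
            // One statement per modified row, regardless of how often it changed.
            for (Map.Entry<Long, String> change : pending.entrySet()) {
                db.update(change.getKey(), change.getValue());
            }
            pending.clear();
        }
    }

    /** Applies four in-memory changes touching two rows and reports the statements sent. */
    static int statementsForRepeatedEdits() {
        FakeDatabase db = new FakeDatabase();
        UnitOfWork work = new UnitOfWork(db);
        work.set(1L, "draft");
        work.set(1L, "reviewed");
        work.set(1L, "final");
        work.set(2L, "other");
        work.commit();
        return db.statements;
    }

    public static void main(String[] args) {
        System.out.println("statements sent: " + statementsForRepeatedEdits());
    }
}
```

Handwritten JDBC would typically issue one UPDATE per change unless the developer built this buffering by hand, which is the kind of effort the answer above describes as difficult to handcode.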
JPT: But we both know that these tiny atomic persistence frameworks start with the idea that O/R mapping is an easy problem. And to be honest, "in the small" it is an easy problem. It's only when you try to scale that these problems show up. Do you see Hibernate having any effect on diminishing this trend?
When I first started this, I thought that what was needed was a simple solution. (You often hear people say "J2EE is too complex", so this seems reasonable.) It turns out that ORM is a difficult problem, in subtle ways. It always looks simpler from the outside than it turns out to be once you start getting your hands dirty. This is why it has taken so incredibly long for decent persistence solutions to really appear. I pretty soon discovered that "simple" just wasn't going to cut it. So now we have a solution that is unashamedly complex. But hopefully no more complex than it needs to be. And it turns out that this is the best way to simplify life for the actual user.
In our book we've tried to really nail down just what the "object/relational mismatch" is, since lots of people agree that it exists, but not many have really tried to define it. I think some of the things we mention will come as a surprise to some people.
JPT: Components that live in the "middle" of an application are tricky in that they tend not to have formal performance requirements. How do you deal with setting performance requirements for Hibernate?
Well, we do have one seemingly easy way to judge our performance: we can compare Hibernate performance against the equivalent JDBC. Now, it turns out that this test can be misleading. Some important performance optimizations (for example, the transaction-level cache) can actually reduce performance for the kind of very trivial benchmarks that people typically write. This is more a problem of the triviality of the benchmarks than anything else.
I have yet to see a single persistence benchmark that comes even remotely close to accounting for the real problems that affect ORM performance in nontrivial usecases. The biggest problems really come from what we call the "problem of graph navigation", where the pattern of data access used by an object-oriented application is a fundamentally inefficient way to access relational data. So, instead, we really rely mostly upon an informal way of testing performance: we put the software out there, and let our users find the problems. Which is just fine, as long as you are prepared to fix problems very quickly. Most of the performance problems we have come up against have been solved not by code optimizations, but by adding new functionality. It turns out that the overhead of Hibernate itself, compared to equivalent direct SQL/JDBC, is almost always so small as to be irrelevant. So we concentrate our effort upon producing more efficient SQL/JDBC. The bottleneck is always the database itself.
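The "problem of graph navigation" can be made concrete with a small plain-Java sketch. Everything here is an assumption for illustration: `FakeDatabase` is a hypothetical in-memory stand-in that only counts round trips, and the order/item model is invented. The sketch contrasts walking an object graph one association at a time with fetching the same data in a single joined query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the graph navigation problem; not Hibernate code. */
final class GraphNavigationSketch {

    /** In-memory stand-in for a database that counts each query (round trip). */
    static final class FakeDatabase {
        private final Map<Integer, List<String>> itemsByOrder = new LinkedHashMap<>();
        int queries = 0;

        FakeDatabase(int orders) {
            for (int id = 0; id < orders; id++) {
                itemsByOrder.put(id, Arrays.asList("item-" + id + "-a", "item-" + id + "-b"));
            }
        }

        /** Like: select id from orders */
        List<Integer> selectOrderIds() {
            queries++;
            return new ArrayList<>(itemsByOrder.keySet());
        }

        /** Like: select * from items where order_id = ? */
        List<String> selectItemsOf(int orderId) {
            queries++;
            return itemsByOrder.get(orderId);
        }

        /** Like: select * from orders left outer join items on ... */
        Map<Integer, List<String>> selectOrdersJoinedWithItems() {
            queries++;
            return new LinkedHashMap<>(itemsByOrder);
        }
    }

    /** Object-style access: load the orders, then follow each association separately. */
    static int queriesWhenNavigating(int orders) {
        FakeDatabase db = new FakeDatabase(orders);
        for (int orderId : db.selectOrderIds()) {
            db.selectItemsOf(orderId); // one extra round trip per order
        }
        return db.queries;
    }

    /** Relational-style access: fetch orders and items together in one query. */
    static int queriesWhenJoining(int orders) {
        FakeDatabase db = new FakeDatabase(orders);
        db.selectOrdersJoinedWithItems();
        return db.queries;
    }

    public static void main(String[] args) {
        System.out.println("navigating: " + queriesWhenNavigating(100) + " queries");
        System.out.println("joining:    " + queriesWhenJoining(100) + " queries");
    }
}
```

With 100 orders, navigation costs 101 round trips in this model while the joined fetch costs one. A trivial benchmark that loads a single row by key never exercises this difference, which is consistent with the complaint above about existing persistence benchmarks.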
JPT: Another aspect of performance is how the product is to be used. Is Hibernate being used as you expected it to be? Are there any usage models that have pleasantly surprised? Can you talk about any usage models that you would discourage?
There are two usage models that are problematic. The first is where people try to apply ORM where it is not really suitable. The main example of where ORM - and indeed Java - is not suitable is the case of processing data in bulk. It is simply never going to be efficient to fetch millions of rows from the database, into your JVM, and then update them one at a time. Don't use Java for this. Use a stored procedure.
The second problem is sort of cultural. Some developers come to using a tool like Hibernate because they are uncomfortable with SQL and with relational databases. We would say that this is exactly the wrong reason to use Hibernate. You should be very comfortable with SQL and JDBC before you start using Hibernate - Hibernate builds on JDBC, it does not replace it. So, when developing business logic that calls Hibernate, you should be monitoring exactly what database requests end up being generated. It is otherwise very easy to build an application that performs badly because you simply have no idea what is happening underneath. That is the cost of extra abstraction. On the other hand, it is usually very easy to come along later and fix the kinds of performance bugs that result from this kind of approach. That is the advantage of the extra abstraction, and it more than mitigates the disadvantage. But you will save yourself effort if you pay attention to the database at all stages of development.
JPT: How did you define a suite of performance testing tools?
We have some standard performance tests that I run regularly, all of which compare Hibernate against handcrafted SQL/JDBC. But again, they turn out to be quite unhelpful in practice. The scalability tests I have done have so far been quite informal, and always confirmed my expectation that the database falls over before Hibernate does. Now that we have access to a real stress testing environment through JBoss, Christian Bauer is putting together some more formal benchmarks. These will include tests for the nontrivial usecases I talked about.
We are considering releasing these benchmarks to the public for the purpose of comparative testing of different ORM solutions (especially since it doesn't seem right to criticize existing benchmarks without providing some alternative). But I'm not sure about that; I don't see how we could stop other groups cheating - and I don't really want to deal with all the fuss that always accompanies benchmark results. Benchmarks are used to mislead, far more often than they are used honestly.
JPT: Most of the comments on the web (in blogs etc.) have been very positive, yet some have commented on the inability to take advantage of caching. Can you comment on that?
I'm not sure what you are referring to. Hibernate has an extremely sophisticated granular two-level cache architecture. It is possible to enable or disable the use of the (second-level) process or cluster level cache for a particular class or collection role. There is support for pluggable cache implementations, including EHCache, JBossCache, SwarmCache, Tangosol Coherence Cache. There is also a granular query result set cache. All this flexibility comes at the cost of complexity and can occasionally be tricky for new users. So caching is an example of something we cover in detail in the training courses.
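The lookup order implied by a two-level cache can be sketched in plain Java. This is only an illustration under stated assumptions: `Session`, `SharedCache` and `FakeDatabase` are hypothetical classes written for this example, not Hibernate's API, and real concerns such as invalidation, per-class configuration and clustering are left out. A session first checks its own transaction-level cache, then the shared process-level cache, and only then goes to the database.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of a two-level cache lookup; not Hibernate code. */
final class TwoLevelCacheSketch {

    /** Stand-in for the database that counts how often a row is actually loaded. */
    static final class FakeDatabase {
        int loads = 0;

        String load(long id) {
            loads++;
            return "row-" + id;
        }
    }

    /** Second level: shared by all sessions in the process. */
    static final class SharedCache {
        final Map<Long, String> entries = new ConcurrentHashMap<>();
    }

    /** First level: private to one session (one unit of work). */
    static final class Session {
        private final Map<Long, String> firstLevel = new HashMap<>();
        private final SharedCache secondLevel;
        private final FakeDatabase db;

        Session(SharedCache secondLevel, FakeDatabase db) {
            this.secondLevel = secondLevel;
            this.db = db;
        }

        String get(long id) {
            String value = firstLevel.get(id);
            if (value == null) {
                value = secondLevel.entries.get(id);
                if (value == null) {
                    value = db.load(id); // miss on both levels
                    secondLevel.entries.put(id, value);
                }
                firstLevel.put(id, value);
            }
            return value;
        }
    }

    /** Reads the same row three times across two sessions and reports database loads. */
    static int databaseLoadsAcrossTwoSessions() {
        FakeDatabase db = new FakeDatabase();
        SharedCache shared = new SharedCache();

        Session first = new Session(shared, db);
        first.get(42L); // goes to the database
        first.get(42L); // served by the first-level cache

        Session second = new Session(shared, db);
        second.get(42L); // served by the second-level cache
        return db.loads;
    }

    public static void main(String[] args) {
        System.out.println("database loads: " + databaseLoadsAcrossTwoSessions());
    }
}
```

In this model the three reads cause a single database load. The sketch also hints at why the answer above mentions complexity: as soon as another process can change the row, the shared level needs an expiry or invalidation policy, which the example deliberately omits.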
JPT: In a recent blog entry you commented on the effect of removing finalization from Hibernate. What prompted you to make this change? What is the expected impact on developers using Hibernate?
I must admit that it's been a few weeks, and I forget exactly how I stumbled onto this. It was the classic case of a performance problem being exactly where you wouldn't ever expect it to be. What's worse, the problem wasn't even very visible in Optimizeit, since it was occurring in the garbage collector's thread! Now, I should caution that in a real application, this kind of performance problem is incredibly unlikely to make an application that would otherwise perform well, perform badly. Indeed, much of the Java folklore about performance is of this variety. The amount of time and effort that is wasted fussing over the occasional use of string concatenation really could be much better spent addressing real performance problems, such as inefficient data access. And this issue certainly falls into that category.
The problem was that I had a finalizer, whose job was to check that the application remembered to close the Hibernate session, and if not, write a log message. Unfortunately, the mere existence of the finalizer meant that it took a full two garbage collection cycles to release the memory associated with that object. I found that for some very trivial transactions, this caused an amazingly high overhead. However, the big picture is that if the application did not close sessions cleanly, we would see a much, much bigger problem with performance. So the "cost" of this finalizer might be worth the benefit after all. Again, we should beware results that come from trivial tests. Trivial code almost always performs "well enough".
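The pattern described above, a finalizer that only checks whether the session was closed, might look roughly like the following sketch. `TrackedSession` and `warnIfLeaked` are hypothetical names chosen for illustration; this is not the Hibernate source. Note also that `Object.finalize()` has since been deprecated in newer Java releases, so the example suppresses the resulting compiler warnings.

```java
/** Hypothetical sketch of a leak-detecting finalizer; not Hibernate code. */
final class TrackedSession implements AutoCloseable {
    private boolean closed = false;

    @Override
    public void close() {
        closed = true;
    }

    /** Logs a warning and returns true if the session was never closed. */
    boolean warnIfLeaked() {
        if (!closed) {
            System.err.println("WARN: session was not closed by the application");
            return true;
        }
        return false;
    }

    /**
     * Runs at some point after the object becomes unreachable. Merely declaring
     * this method makes the object finalizable, so its memory cannot be
     * reclaimed in the same collection that discovers it is unreachable.
     */
    @Override
    @SuppressWarnings({"deprecation", "removal"})
    protected void finalize() {
        warnIfLeaked();
    }

    public static void main(String[] args) {
        // Garbage collection timing is not deterministic, so the demo calls
        // the check directly instead of waiting for the finalizer to run.
        TrackedSession leaked = new TrackedSession();
        System.out.println("leaked session flagged: " + leaked.warnIfLeaked());

        try (TrackedSession tidy = new TrackedSession()) {
            // work with the session; close() runs automatically
        }
    }
}
```

The check itself is trivial; the cost discussed above comes from the object being finalizable at all, which is why it shows up on the garbage collector's side rather than in application code.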
(End of interview).