Back to newsletter 037 contents
This month we interview Jayson Falkner, author of several JSP books and maintainer of several websites built around JSPs.
Can you tell us a bit about yourself and what you do?
I split my time between working as a J2EE consultant for Amberjack Software LLC and pursuing a PhD in Bioinformatics at the University of Michigan. I am a longtime Java developer, and I've been working with Servlets and JavaServer Pages since the very first releases. Part of my work with Amberjack is to act as webmaster for JSP Insider, http://www.jspinsider.com, and the book support site for "Servlets and JavaServer Pages; the J2EE Web Tier", http://www.jspbook.com. I built both of these sites from scratch, and they are meant to benefit the J2EE community. Just recently I participated in the JSP 2.0 expert group, JSR 152, and finished authoring the book previously mentioned.
My friends joke, "Student by day, JavaMan by night." I like the joke in an overly geeky type of way. It describes me well.
You have recently upgraded your sites and made available all site source code for viewing. That is pretty useful. What made you do that?
Yes, this is correct. The JSP Insider website, http://www.jspinsider.com, has been upgraded recently, and both it and jspbook.com provide complete, up-to-date source-code -- you can even read what we keep in the database. There are a few motivating factors for this. One of the primary reasons is that I felt this would be of the most benefit to the JSP and Servlet communities. If you like what you see, you may have it, source-code included. Likewise, I can author articles about really helpful software and point to actual examples of the software being used in the real world. Another primary reason is that I know a lot about the latest J2EE specification, J2EE 1.4, and I want to help developers quickly adopt the new specifications so that they can take advantage of the latest code. What better method than to build a few projects that rely on the new specifications and to allow developers to see how the code works?
Before digressing, another sneaky motive lurks behind providing source-code for everything. Being a longtime developer, I'm familiar with almost all of the popular design patterns and competing web tier technologies, and I'm sick of having people complain about what technology X can do and JSP/Servlets can't. This is plain rubbish that new developers often pick up on because the marketing department gets to them before they have a chance to learn about what they are using. I hope to eventually document JSP Insider to the point where anyone interested in server-side Java can look to get good information about what can be done, how it should be done, and how it performs.
What sort of scalability have you achieved with your site (page hits, concurrent sessions, etc.)?
JSP Insider gets around 500k hits per month, and I've yet to really track jspbook.com. We don't track session information, as we don't require users to ever log on or register for our services. We used to actually disable sessions altogether for performance, but that currently isn't done, for no good reason.
Readers would probably like to know we run everything using the Jakarta Tomcat project (the JSP and Servlet API reference implementation), http://jakarta.apache.org/tomcat, and all databases are kept locally on the same server. The server itself is a Dell 1 GHz box with 512 MB of RAM. Nothing special, but it has a fast internet connection thanks to the folks at hostjsp.com.
You say 500k/month. That translates to an average of over 10 hits per minute. My experience is that peak hit rates can be up to an order of magnitude higher than the average rates, which might put your peak at around 1 to 2 hits per second.
Yes, you are right. The 500k/month jumps around all the time, usually based on when newsletters go out, or if some other big site links to ours, etc.
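For readers who want to check the arithmetic behind those figures, here is a rough sketch in Java. The 30-day month and the 10x peak factor are assumptions based on the round numbers in the question, and the class and method names are made up for illustration.

```java
// Back-of-the-envelope traffic estimate. The 30-day month and the 10x peak
// factor are assumptions; HitRateEstimate is a hypothetical name.
final class HitRateEstimate {
    // Average hits per minute over a month of the given length.
    static double averageHitsPerMinute(long hitsPerMonth, int daysPerMonth) {
        return hitsPerMonth / (daysPerMonth * 24.0 * 60.0);
    }

    // Peak hits per second, assuming the peak is peakFactor times the average.
    static double peakHitsPerSecond(double averagePerMinute, double peakFactor) {
        return averagePerMinute * peakFactor / 60.0;
    }

    public static void main(String[] args) {
        double average = averageHitsPerMinute(500_000L, 30);
        double peak = peakHitsPerSecond(average, 10.0);
        System.out.println("average hits/min: " + average);
        System.out.println("peak hits/sec:    " + peak);
    }
}
```

With these inputs the average works out to between 11 and 12 hits per minute, and a tenfold peak lands just under 2 hits per second.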
Given that your pages load pretty quickly (under 5 seconds), it seems like you don't have to handle a huge concurrency load - concurrent sessions are probably not more than 5 at peak, is that right? Of course this is ideal for any site: get response times low enough that concurrent requests are minimized.
The peak concurrency level of 5 is a fair guess, but it is important to distinguish between how many requests Tomcat is handling and how many web app sessions (i.e. HttpSession and active db connections) are being kept in memory. The web apps are more or less stateless, and content is heavily cached -- for practical purposes consider everything a static page. The helpful point should be that Tomcat is doing a fine job handling 5 HTTP requests at a time with minimal activity going on in the web app, which is reasonable given the default Tomcat configuration has at least 5 threads handling requests.
We are using the default Tomcat config...literally.
Do you have an idea of how high the jspinsider site could scale if necessary?
Using some common sense, I'd say JSP Insider should scale very well. The most obvious reasons are that we aren't keeping session information and that our content doesn't change that frequently (probably once a day, at the most). Tomcat 5 allows for clustering, and it should be straightforward to tack on more servers if the current one were getting maxed out. Plus, remember I mentioned we rely on flat files (i.e. JSP) for content storage. Our database is important, but it doesn't have to do much -- for example, none of the queries even require a join. If we need to add more server power, it really is as trivial as deploying the JSP Insider WAR to a few boxes running Tomcat. No worries about taxing database connectivity, and no worries about shuffling around session information.
In the more practical I-am-stuck-running-one-server scenario, JSP Insider should also scale really well. The caching reduces response times to as fast as Tomcat can send content over the wire. Our hosting company, hostjsp.com, has really big wires (a level 3 network), so that probably won't ever slow us down. And we have already worked hard to reduce the size of what we send back to a client to be about as small as it can be.
Finally, some points that apply in all cases. JSP Insider uses the default Tomcat 5 setup and the default OS setup. It is not optimized for what the site is doing. If I wanted to scale the site up, some easy improvements would be to ensure session tracking is disabled (HttpSession that is), disable unimportant Tomcat logging, disable unimportant web app logging, give the JVM running Tomcat as much RAM as possible, and play around with any other optimization options my Java compiler and JVM offered. I'd also probably make sure my cache setup was keeping as much information in physical RAM as it could, and I'd make sure my server wasn't using virtual memory (not the processor type, the kernel type). I'd probably also buy another stick or two of RAM for the server -- I know we have a few slots open, and RAM is really cheap.
Making up some numbers, I'd be really surprised if we couldn't handle at least double, even triple, the amount of traffic we are handling now.
Are there particular performance optimizations that you regularly apply?
Yes. The site relies heavily on flat files for both simplicity and proper multi-language support. There are obvious concerns for memory usage and the time it takes to read information from both the database and the hard-drive. Here are the best tricks I use:
#1 Database Connection Pooling - This is a well-established technique. Instead of opening a new database connection each time a query is done, a pool of several connections is shared. Tomcat provides fantastic built-in support for this via the Jakarta Commons Database Connection Pool API, http://jakarta.apache.org/tomcat/tomcat-5.0-doc/jndi-datasource-examples-howto.html.
#2 Don't read flat files from the hard disk. Reading information from the hard drive is really slow. In the JSP world there is a trivial fix to this. We save content as a JSP page, properly translated and encoded in any language. Tomcat automatically converts the JSP to an in-memory servlet the first time someone tries to view the page's content. Any dynamic information the page relies on, say links to pages describing the day's news, is passed around using the popular Model 2 design pattern. The result is that flat files are rarely ever read more than once from the hard drive.
#3 Server-Side Caching. A servlet filter that automatically caches content for you is trivial to deploy, and it can save enormous amounts of time. Here is a full writeup I did on this, http://www.onjava.com/pub/a/onjava/2003/11/19/filters.html.
#4 Client-Side Caching. Set the HTTP response headers (e.g. Cache-Control) that tell a client what it can cache. That way users aren't downloading your header/footer images, style sheets, or javascript files multiple times. Here is a write up I did for this, http://www.jspbook.com/faq.jsp#1069699404218.
#5 Simplify everything. This is a trick taken from Google, but one that some sites can't use. The technique is to trim down your content to the smallest amount of stuff you actually need. Ditch flashy DHTML in favor of plainly describing the content. Style things well using an external CSS file; don't embed styles. This results in your server needing to send less information to a client, thus faster page download times.
#6 GZIP compress when possible. Most web browsers will accept GZIP compressed content (about 1/6th the size of normal content). Compress whenever possible and you will have to send fewer bytes to a user to convey the same information. Here is a writeup I did on this, http://www.onjava.com/pub/a/onjava/2003/11/19/filters.html.
#7 Know what you are keeping in memory. Far too often I've seen nicely optimized sites crash because someone is abusing either the session or application scopes. Java can have memory leak(ish) stuff. You can't reference a billion objects and expect your JVM to handle it. Whenever you can, use only the request scope for passing objects between resources in your web application. If you must use application or session scope, ensure you eventually clean up the resource.
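To make trick #1 concrete, here is a minimal sketch of the pooling idea using only the JDK. It is not the Jakarta Commons DBCP API and not the code JSP Insider runs; SimplePool and its methods are hypothetical names, and a real pool would also validate, time out, and close connections.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Hypothetical fixed-size pool that illustrates the idea only; it is not DBCP.
final class SimplePool<T> {
    private final BlockingQueue<T> idle;

    SimplePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get()); // pay the creation cost once, up front
        }
    }

    // Waits until a pooled resource is free instead of creating a new one.
    T borrow() {
        try {
            return idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a pooled resource", e);
        }
    }

    // Hands the resource back so the next caller can reuse it.
    void giveBack(T resource) {
        idle.offer(resource);
    }

    // Number of resources currently available for borrowing.
    int idleCount() {
        return idle.size();
    }
}
```

In a web app the pooled type would be a database connection, and every borrow would be paired with a giveBack in a finally block so a failed query cannot drain the pool.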
In short, I write simple code. I reduce the amount of content that is getting sent over the wire. And I cache whenever possible.
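As an illustration of the client-side caching trick, the sketch below builds the header a servlet or filter could copy onto a response with HttpServletResponse.setHeader(name, value). It uses only the JDK so it runs on its own; ClientCacheHeaders is a hypothetical helper, and the one-day lifetime in the usage line is an assumed value, not a setting taken from the sites discussed here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: returns the caching headers for a static resource.
// A servlet or filter would copy each entry onto the HTTP response.
final class ClientCacheHeaders {
    static Map<String, String> forStaticResource(long maxAgeSeconds) {
        Map<String, String> headers = new LinkedHashMap<>();
        // "public" lets shared caches store the response as well as browsers.
        headers.put("Cache-Control", "public, max-age=" + maxAgeSeconds);
        return headers;
    }

    public static void main(String[] args) {
        // Assumed lifetime of one day (86400 seconds) for images and CSS.
        forStaticResource(86_400L)
                .forEach((name, value) -> System.out.println(name + ": " + value));
    }
}
```

Long lifetimes suit files that rarely change, such as header images and style sheets; pages that change daily would want a shorter max-age.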
What do you consider to be the biggest Java performance issue currently?
Marketing departments and over-worked developers. Way too many people just don't understand real performance issues in Java. I'm sure you've seen the latest performance studies that show Java outperforming C in some numerical method implementations (i.e. hardcore stuff). Not having pointers can be a performance boost if you are using a smart enough virtual machine and compiler. It is sad how many people don't get this concept, yet they love to say byte-code based languages will always be slower than machine code. The world we live in also goes by internet time. Perfectly smart people are required to do a lot of work in very little time...this alone results in poorly performing code, especially since overworked people don't have much time to learn about how they should have coded things in the first place.
In the more practical realm of things for Java buffs, I think not knowing the popular design patterns is the worst performance issue you can have. It is easy enough to tack on caching, or to deploy a good framework, but if you don't understand why you are doing it, it is a moot point. Investing a good chunk of your time studying good design patterns before you even start coding can save you enormous amounts of time in both speed and maintenance.
Do you know of any requirements that potentially affect performance significantly enough that you find yourself altering designs to handle them?
Yes, when working with several Java developers it is key to enforce your interfaces. Speeding up logically divided chunks of code isn't too tough, but making sure other developers don't muck with your code can be. I always make sure to alter loosely defined divisions of labor to be strictly defined divisions of labor. Usually this is as simple as writing up a few Java interfaces or providing the proper abstract base classes. In the JSP/Servlet world, this is usually as easy as using Model 2 and disabling scripting on JSP pages.
I can't even describe how much coding time and code optimization time the above trick has helped me save. Coding by yourself is easy. Coding with others is tough. Make sure poor code won't kill your multi-developer driven project.
What are the most common performance related mistakes that you have seen projects make when developing Java applications?
Failing to do what I described in the previous answer is usually it. It is easier to optimize lots of logically separated code than it is to optimize one massive chunk of code, especially when the code isn't yours.
A more practical answer for the Servlet/JSP world is that people simply don't use things such as a GZIP filter or a Cache filter. Servlet filters automatically enforce a strict separation of code, which makes you automatically use my above performance trick. Thanks to this separation it really is trivial to add caching and compression support to your entire web application. If you aren't doing this, see the links I provided above when describing the common performance techniques I use.
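The compression step at the heart of a GZIP filter can be sketched with the JDK's java.util.zip classes. This is only the compression logic, not a servlet filter and not the code from the linked article. GzipSketch is a hypothetical name. A real filter would compress only when the request's Accept-Encoding header lists gzip, and it would set Content-Encoding: gzip on the response.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper showing GZIP compression of a response body in memory.
final class GzipSketch {
    static byte[] compress(byte[] plain) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the GZIP stream writes the trailer, so read the buffer afterwards.
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(plain);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static byte[] decompress(byte[] compressed) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int read;
            while ((read = gzip.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // Repetitive markup, made up for the demo, compresses well.
        StringBuilder html = new StringBuilder();
        for (int i = 0; i < 500; i++) {
            html.append("<tr><td class=\"row\">item ").append(i).append("</td></tr>\n");
        }
        byte[] plain = html.toString().getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(plain);
        System.out.println(plain.length + " bytes -> " + packed.length + " bytes");
    }
}
```

The actual ratio depends on the content. Repetitive HTML shrinks a lot, while already compressed images gain almost nothing and are usually left alone.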
Which change have you seen applied in a project that gained the largest performance improvement?
The Caching and Compression filters I mentioned. If your web app was coded poorly it may be slow, but caching can speed up responses to be as fast as accessing your server's RAM. If you generate really massive amounts of content to send to a client (say 60k), you can reduce it to about 1/6th of that size (e.g. 10k), which results in a user thinking your web site suddenly became six times faster. Both of these filters work well thanks to proper abstraction of functionality, and they can "save your butt" when dealing with an existing nightmarish project.
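The caching half of that answer can be sketched as a small in-memory store keyed by request URI. This is not the cache filter from the linked article. PageCache is a hypothetical class, the renderer function stands in for whatever slow code generates the page, and the least-recently-used eviction policy is an assumption chosen to keep memory bounded.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical bounded page cache; a cache filter would consult something
// like this before letting a request reach the slow page-generating code.
final class PageCache {
    private final int maxEntries;
    private final Map<String, String> pages;

    PageCache(int maxEntries) {
        this.maxEntries = maxEntries;
        // accessOrder=true keeps the least recently used entry first, so
        // removeEldestEntry evicts it once the cache grows past maxEntries.
        this.pages = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > PageCache.this.maxEntries;
            }
        };
    }

    // Returns the cached page, rendering and storing it on the first request.
    // Rendering happens under the lock, which keeps the sketch simple but
    // serializes cache misses.
    synchronized String get(String uri, Function<String, String> renderer) {
        String cached = pages.get(uri);
        if (cached == null) {
            cached = renderer.apply(uri);
            pages.put(uri, cached);
        }
        return cached;
    }
}
```

A call such as cache.get("/index.jsp", uri -> renderSlowly(uri)), where renderSlowly is a placeholder for the real page logic, pays the rendering cost once and serves later requests from RAM. Content that changes needs some expiry rule, which this sketch leaves out.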
Have you found any Java performance benchmarks useful?
Sort of. I find it really helpful to be able to know how to figure out why code is slow. Knowing how to do performance benchmarking (anything from JVM profiling to simple System.out.println() method calls) really makes you aware of common problems. This results in you thinking about those problems when designing and coding a project, and more often than not, it prevents common performance problems from ever occurring. Benchmarks others provide often lead a naive developer down the roadway to figuring out how the benchmarks were done...I'd actually argue that is one of the few helpful things formally published benchmarks provide.
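The "simple System.out.println()" end of that spectrum can be as small as the sketch below. SimpleTimer is a hypothetical helper, and the string-building loop is just a stand-in workload. A single timed run like this ignores JIT warm-up and garbage collection pauses, so treat the number as a hint, not a measurement.

```java
// Hypothetical helper for quick-and-dirty timing of a block of code.
final class SimpleTimer {
    // Runs the task once and returns the elapsed time in nanoseconds.
    static long timeNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long nanos = timeNanos(() -> {
            // Stand-in workload: build a large string.
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append(i);
            }
        });
        System.out.println("elapsed ms: " + nanos / 1_000_000.0);
    }
}
```

Wrapping a suspect method call this way is often enough to show which step dominates a slow request before reaching for a profiler.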
Do you know of any performance related tools that you would like to share with our readers?
Nope. I primarily live in the world of J2EE, where I consider performance related tools more of an exercise in the black arts rather than practice. I have yet to find something that replaces knowing general performance monitoring techniques and applying those techniques when appropriate.
Cheers, Jayson Falkner
(End of interview).