-
Java’s new keyword
Posted on March 1st, 2010
Creating objects in Java is easy with the new keyword. In fact, it’s one of those things that you don’t think about. Need to access a file? Just create a new File instance: new File("build.properties"). For most Java developers, that’s all they need to know. Life becomes more interesting, though, when you start working with multiple class loaders.
Class loaders? Argh! Run away, run away!
That was pretty much my reaction for many a year. I just didn’t want to know about them. They were some kind of black magic and always Somebody Else’s Problem. It’s strange, because class loaders are actually pretty straightforward. Most Java developers know that you compile Java files to these *.class files and that those compiled classes have to be loaded by the JVM somehow. That’s basically what the class loader does. But like threads, the problem is not understanding what they do, but getting them to work together.
How many times have you heard the phrase “it’s a class loader issue?” I’ve certainly heard (and said) it more times than I’d care to admit. As soon as you have more than one class loader in an application, you have to start worrying about which classes can “see” which others. It can easily become a nightmare. But class loader behaviour is perhaps a post for another time. Let’s get back to new.
So, the first time that you create a new object, the JVM has to first load the class. This happens transparently when you use new. The question is, what class loader is used? And why does it matter?
Consider a scenario from Grails. We have a build system based on Gant that loads build scripts and executes them. In one of them, we instantiate a Jetty server and start it. The sequence of object creation goes like this:

In fact, the above is a simplification of what actually happens, but it suits the purpose of this post.
The JARs for the first three classes are all on the classpath of what we will call the build class loader. This loads all the classes used directly by the build. So what about Jetty’s Server class? The most important thing to understand is that the Server class must be loaded by the same class loader that loads the Grails web application. Although you can pass your own class loader to the embedded server, if it’s different to the one that loads Server you’ll run into those dreaded class loader issues.
Bearing that in mind, let’s look at what happens if the RunApp script uses new to create the server instance:
    def server = new org.mortbay.jetty.Server()
    ...
    server.start()
Right about now, you should be asking yourself “what class loader was used to load the Server class?” It’s a critical question because it determines what class loader is used to load the entire web application and hence what classpath the application’s runtime dependencies should be on. In this case, the class loader used is whichever one loaded the RunApp script. The new operator resolves the class through the loader of the class that contains the new expression, which in most code is what this.getClass().getClassLoader() returns.
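To make that rule concrete, here is a minimal sketch in plain Java. The class names (NewOperatorLoaderSketch, Widget) are made up for illustration and have nothing to do with Grails or Jetty. It assumes both classes are compiled together and sit in the same place on the classpath, so the loader that loaded the calling class is also the one that ends up loading Widget.

```java
public class NewOperatorLoaderSketch {

    /** Hypothetical class that exists only for this sketch. */
    static class Widget { }

    /** The loader that loaded the class containing the 'new' expression below. */
    static ClassLoader loaderOfCaller() {
        return NewOperatorLoaderSketch.class.getClassLoader();
    }

    /** Creates a Widget with 'new' and reports which loader its class came from. */
    static ClassLoader loaderOfNewWidget() {
        return new Widget().getClass().getClassLoader();
    }

    public static void main(String[] args) {
        System.out.println("caller loaded by: " + loaderOfCaller());
        System.out.println("Widget loaded by: " + loaderOfNewWidget());
        System.out.println("same loader: " + (loaderOfCaller() == loaderOfNewWidget()));

        // Core JDK classes are delegated up to the bootstrap loader,
        // which getClassLoader() reports as null.
        System.out.println("String loaded by: " + "x".getClass().getClassLoader());
    }
}
```

Nothing in the new expression names a class loader, which is exactly why it is easy to forget that one is being chosen for you.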
What does that mean for our example? It means that the build class loader is used to load the Server class and therefore must also be used to load the web application classes. In other words, all the application’s runtime dependencies must be included in the build class loader! What’s the problem with that, you may ask. There is one potential problem and one actual.
The potential problem is class conflicts. What if the web application depends on a different version of a library that’s already on the build system? It’s a particular problem if any of the Apache XML API libraries are on the classpath. These cause absolute havoc.
The other problem is that the more JARs you have on the classpath, the longer it takes for the JVM to find the class it’s after. That means longer start up times. It’s one of the problems OSGi was designed to solve (he was told by a man in the pub). Why put JARs on the build classpath that the build itself doesn’t need?
The solution is to work out where you want a class loader boundary and use reflection to instantiate your object:
    def runtimeClassLoader = new URLClassLoader(...)
    def server = runtimeClassLoader.loadClass("org.mortbay.jetty.Server").newInstance()
    ...
    server.start()
This is pretty easy in Groovy because the start() method is resolved at runtime, but Java needs to know the type at compile time. You can’t do this:
    ClassLoader runtimeClassLoader = new URLClassLoader(...);
    Server server = (Server) runtimeClassLoader.loadClass("org.mortbay.jetty.Server").newInstance();
    ...
    server.start();
because you’ll get a ClassCastException on line 2. The declared type of server is resolved by the class loader of the calling code, whereas the class of the new instance is loaded by a different class loader. The same class name loaded by two different class loaders gives you two distinct classes. So you have to use reflection to invoke the methods and access the fields you need. Fortunately, you only have to jump through these hoops at class loader boundaries.
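In Java, the reflective version might look roughly like the sketch below. It is an illustration under a few assumptions, not the Grails code: Jetty isn't on the classpath here, so java.util.ArrayList stands in for Server and its add() and size() methods stand in for start(). The class and method names (ReflectiveBoundarySketch, createAndUse) are made up, and the URLClassLoader has no URLs of its own, so it simply delegates to its parent. It also uses getDeclaredConstructor().newInstance(), since Class.newInstance() has been deprecated since Java 9.

```java
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;

public class ReflectiveBoundarySketch {

    /**
     * Loads className through the given loader, instantiates it with its
     * no-argument constructor, then calls add("started") and size() on it
     * purely by name. The calling code never mentions the loaded type.
     */
    static Object createAndUse(ClassLoader runtimeClassLoader, String className) {
        try {
            Class<?> type = runtimeClassLoader.loadClass(className);
            Object instance = type.getDeclaredConstructor().newInstance();

            // Stand-in for server.start(): look the method up and invoke it.
            Method add = type.getMethod("add", Object.class);
            add.invoke(instance, "started");

            Method size = type.getMethod("size");
            return size.invoke(instance);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not use " + className, e);
        }
    }

    public static void main(String[] args) throws Exception {
        ClassLoader parent = ReflectiveBoundarySketch.class.getClassLoader();
        // An empty URL list keeps the sketch runnable anywhere; a real
        // boundary would list the runtime JARs here instead.
        try (URLClassLoader runtimeClassLoader = new URLClassLoader(new URL[0], parent)) {
            System.out.println(createAndUse(runtimeClassLoader, "java.util.ArrayList"));
        }
    }
}
```

Because every call goes through Method.invoke(), the compiler never needs the loaded type on the calling side, which is what lets the two class loaders stay separate.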
As you’ve seen, the new operator is normally something you don’t have to think about, but as soon as you start dealing with multiple class loaders, you have to be aware of and understand its behaviour. The trick is to work out suitable class loader boundaries and then use reflection to load and instantiate classes at those boundaries. It may sound like unnecessary extra work, but you can gain real improvements in application/framework reliability. If you’re lucky, things may even run a bit faster.
-
A Tomcat gotcha
Posted on February 24th, 2010
Servlet filters. Straightforward, right? A request comes in, goes through each filter in the chain, hits the servlet and then the response goes the other way. Sort of. What could possibly go wrong?
Not much…unless you throw error handling into the mix. The trouble is, the servlet specification doesn’t lay down many rules on how error pages are invoked. My expectation was for the servlet container to transparently offload the request to the error page and then execute the filters in reverse. In other words, the filters shouldn’t be or need to be aware that the request has gone via the error page. The request should behave like any other request.
Tomcat seriously disabused me of that notion. It takes a different approach in which the main request almost completes (all the filters execute in reverse) before the request goes to the error page. The following diagram should make it clear what I mean:
Having read the servlet specification, I can see that this does make a bit of sense. One of the few requirements is that the servlet container should pass the original, unwrapped request and response to the error page, unless a filter is configured to execute for the ERROR dispatcher. Unfortunately, if you add something to the request in your filter, or attach a variable to the thread, it won’t be available to the error page if your filter also removes that data.
That may all be a bit too abstract, so how about a concrete example? Some of the filters in Apache Shiro bind an object called the security manager to the request’s thread. Any security code can then easily grab it on demand. When the request has finished, the filter removes (unbinds) the security manager from the thread. That’s just good practice. But what happens if the error page wants to access the security manager?
Hopefully it’s obvious from the discussion so far that you need to add Shiro’s filter to the ERROR dispatcher as well as the REQUEST one. This should work. When the servlet container triggers the error page, it first executes the filter. But that’s not what happened when I tried it. Instead, the error page threw an exception saying that no security manager was bound to the current thread.
What had gone wrong? After some debugging, I discovered that a super class of the filter, OncePerRequestFilter, was skipping execution of the filter if it detected that it had already been processed. How did it do that? By checking whether a particular request attribute was set. In this case, the attribute was still there from the main request, but the security manager had been unbound from the request thread.
This problem didn’t show up in Jetty at all, which is kind of annoying because you would hope that code that runs on one container would also run on another. Fortunately, there is a simple moral to this story: filters should always clean up after themselves when they finish and never leave stuff in the request, session, or thread.
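The failure is easier to see in a small simulation. The sketch below is not Shiro's code and doesn't use the servlet API: a Map stands in for the request attributes, a ThreadLocal holding a String stands in for the thread-bound security manager, and all the names (FilterCleanupSketch, ALREADY_FILTERED, errorPageSees) are invented for illustration. It runs the same "filter" twice on one request, once for the main request and once for the error dispatch, with and without removing the marker attribute.

```java
import java.util.HashMap;
import java.util.Map;

public class FilterCleanupSketch {

    // Stand-in for the security manager bound to the request thread.
    static final ThreadLocal<String> SECURITY_MANAGER = new ThreadLocal<>();

    // Marker attribute used to detect "this request was already filtered".
    static final String ALREADY_FILTERED = "sketch.alreadyFiltered";

    /** A once-per-request style filter wrapped around the rest of the chain. */
    static void doFilter(Map<String, Object> request, boolean removeMarker, Runnable next) {
        if (request.containsKey(ALREADY_FILTERED)) {
            next.run(); // Skips the binding: assumes an outer pass is still active.
            return;
        }
        request.put(ALREADY_FILTERED, Boolean.TRUE);
        SECURITY_MANAGER.set("securityManager");
        try {
            next.run();
        } finally {
            SECURITY_MANAGER.remove();        // Unbind from the thread.
            if (removeMarker) {
                request.remove(ALREADY_FILTERED); // Clean up the request too.
            }
        }
    }

    /** Main request followed by an error dispatch; returns what the error page finds. */
    static String errorPageSees(boolean removeMarker) {
        Map<String, Object> request = new HashMap<>();
        String[] seen = new String[1];
        doFilter(request, removeMarker, () -> { /* the servlet fails here */ });
        doFilter(request, removeMarker, () -> seen[0] = SECURITY_MANAGER.get());
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("marker left behind: " + errorPageSees(false));
        System.out.println("marker cleaned up:  " + errorPageSees(true));
    }
}
```

With the marker left on the request, the second pass skips the binding and the error page finds nothing on the thread. With the marker removed in the finally block, the second pass binds the security manager again.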
-
Are polyglot systems a good idea?
Posted on August 4th, 2009
A recent tweet pointed me to a post on polyglotism by Bill Burke of JBoss. It comes in the midst of a lot of interest in alternative languages on the JVM, such as JRuby, Groovy, Scala, and many others. Its basic point is that companies indulge in multiple languages at their peril and a Java shop should really stick to Java. His view struck me as over the top, but a little reflection made me realise that software teams should take on board his message or risk some serious long term problems.
Now, I’m not a dyed-in-the-wool Java developer who believes Java is the solution to all problems. In fact, I like a profusion of languages on the JVM. Variety is the spice of life as they say. A competitive field ensures that only the strongest survive, and without new languages we would probably still be stuck with C or even assembly language. I want to see new languages that improve my productivity as a developer, enable me to write more reliable code, or simplify the code I write. Even better are languages that give me all three.
So why am I echoing Bill’s warning? Because there is a difference between having a vibrant ecosystem of competing and complementary languages and actually using many of them in a single software project, team, or even company. Here are some of the things you should consider:
Language libraries
Almost every language comes with its own class or function library these days. That’s often the case even when a language runs on a virtual machine such as the JVM or .Net’s CLR. Look at JRuby and Scala: they both run on the JVM but have their own class libraries. So if you use more than one language in a project, you’ll have to know all the various libraries pretty well too.
Context switching
It’s not uncommon to have both Groovy and Java files in a Grails project and developers often have to work in both languages to implement features or fix bugs. That means developers have to switch contexts when they move from one type of file to the other. Even with languages as close in syntax as Groovy and Java, this can cause errors to creep in, for example when you try to iterate over a list in Java using the each() method. Imagine mixing Java and JRuby, where you have different class libraries as well as wildly different syntax.
Certainly a good developer can manage this, and if he or she switches between languages frequently, the cost in time and mistakes will dwindle. But can you guarantee that a developer will always be working in all the required languages at any given time?
Other developers
Just because you can pick up a new language and be proficient with it in a few weeks, that doesn’t mean everyone will be able to do that. If you have a team, you have to take into account the other team members. Realise that some will have trouble learning new concepts, such as functional programming or dynamic languages. Don’t forget as well that using non-mainstream languages shrinks the pool of developers you can recruit from.
It takes time
So, you got the Hello World example up and running followed by the Fibonacci sequence. Great! It only took you a few minutes! Then what? Learning the basics of a language can often be fairly simple, but to become proficient takes time and you have to use it on a regular basis. Can you really afford that time for all the team members on a critical project?
When you leave
As Bill pointed out, development teams don’t always stick together. What happens when you leave? The company is potentially left with a hole they can’t fill because there aren’t any free developers that know all the languages you put into the system.
Builds
Last but not least, not all languages play nicely together in a single build. How much effort do you really want to put into making different languages fit into your build system? If it’s not much trouble, then great, but otherwise you could end up with an ongoing maintenance headache.
I don’t mean to dismiss the idea of polyglot projects or systems – a new language may provide real benefits that outweigh the problems. Just make sure that a decision is based on a decent risk assessment or back of an envelope cost-benefit analysis rather than a “hey, that language looks cool, let’s try it out” impulse. Of course, feel free to play around and try new things when you’re working on personal projects!





