In a previous post, I discussed some of the pitfalls of scaling ORMs. One option we used for fighting those problems was to apply the SQL concept of a projection within our ORM. I’m not claiming we invented the concept, but our ORM offered no such functionality. For those who’ve lapsed on their database 101 terminology, a projection can be roughly described as a type of select statement that returns a subset of the columns from a table.

The key requirements were that the functionality should be dead-simple for developers to use, and that it should help to mitigate some of the problems we were seeing trying to scale our ORM. Hibernate offers some approximation of the functionality we implemented, but EclipseLink (our ORM) does not.

There are some places in our system where we query for a large number of domain objects but only need a known subset of their columns. The typical case is a dropdown list. For example, when we show a dropdown of releases, we only need the name and the ID of each release. The rest we can drop on the floor.

The implementation for our release projection basically looks like this (Java, as usual):

@Projection(over = Release.class, columns = {"name", "id"})
public interface ReleaseProjection {
    Long getOid();
    String getName();
}

That’s it. Magic in our data access layer allows us to query for it just like we would for releases. It eliminates N+1 selects and reduces the impact of these queries on the second level cache. It also reduces the amount of data we send over the wire and lessens the load on the database. Let’s break the magic into understandable steps.

First, there is no implementation of this interface. That’s handled by the @Projection annotation and the wonder of dynamic proxies on the JVM. Our data access layer knows we want a projection, but it actually tells EclipseLink to issue a query for the underlying class. Once it has the collection of actual objects for which we queried (in this case Releases), it creates proxy implementations of our interface that delegate to the actual objects.
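To make the proxy step concrete, here is a minimal sketch built on the JDK’s java.lang.reflect.Proxy. It is not our production code: the Release class below is a hypothetical stand-in for the real entity, and ProjectionProxies and wrap are names invented for this illustration. It also looks up the target method on every call, which is the naive approach.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Hypothetical stand-in for the real domain class.
class Release {
    private final Long oid;
    private final String name;

    Release(Long oid, String name) {
        this.oid = oid;
        this.name = name;
    }

    public Long getOid() { return oid; }
    public String getName() { return name; }
    // Present on the entity, but deliberately absent from the projection.
    public String getStartDate() { return "not exposed by the projection"; }
}

// The projection exposes only the two properties the dropdown needs.
interface ReleaseProjection {
    Long getOid();
    String getName();
}

// Hypothetical helper: wraps a real object in a proxy that implements the projection.
final class ProjectionProxies {
    private ProjectionProxies() {}

    static <T> T wrap(Class<T> projectionType, Object target) {
        InvocationHandler handler = (self, method, args) -> {
            // Find the same-named public method on the real object and call it.
            Method targetMethod =
                    target.getClass().getMethod(method.getName(), method.getParameterTypes());
            try {
                return targetMethod.invoke(target, args);
            } catch (InvocationTargetException e) {
                // Surface the original exception rather than the reflection wrapper.
                throw e.getCause();
            }
        };
        Object instance = Proxy.newProxyInstance(
                projectionType.getClassLoader(), new Class<?>[] {projectionType}, handler);
        return projectionType.cast(instance);
    }
}
```

Calling ProjectionProxies.wrap(ReleaseProjection.class, someRelease) hands back an object on which only getOid() and getName() are reachable through the interface, even though the wrapped object has more methods.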

I know what you’re thinking: why proxy? That’s the second optimization: we’re not fetching whole objects. We use the columns property of the @Projection annotation to set a fetch-group on our query, which limits the columns returned by the database query. In this case, we are only querying for the name and the id of the releases. By narrowing the set of columns returned from the database, we reduce the database and network I/O required to run our queries. Ideally, the list of columns we fetch would be removed from the @Projection annotation and derived from the metadata of the model instead. It’s a change we’d probably make if we were going to open-source this, but for the relatively small number of developers who use it, it’s not worth the effort.
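As a rough sketch of how the annotation can drive the query, the snippet below declares a runtime-retained @Projection annotation and reads its values reflectively. The annotation’s shape is inferred from the usage shown earlier, the Release class is an empty placeholder, and ProjectionMetadata with its two methods is a hypothetical helper, not our actual data access layer.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Arrays;
import java.util.List;

// Assumed shape of the annotation; it must be retained at runtime to be read reflectively.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Projection {
    Class<?> over();
    String[] columns();
}

// Empty placeholder for the real entity.
class Release {}

@Projection(over = Release.class, columns = {"name", "id"})
interface ReleaseProjection {
    Long getOid();
    String getName();
}

// Hypothetical helper that extracts what the query layer needs from a projection type.
final class ProjectionMetadata {
    private ProjectionMetadata() {}

    // The entity class the real query should be issued against.
    static Class<?> entityType(Class<?> projectionType) {
        return annotationOf(projectionType).over();
    }

    // The attribute names that should make up the fetch group.
    static List<String> fetchAttributes(Class<?> projectionType) {
        return Arrays.asList(annotationOf(projectionType).columns());
    }

    private static Projection annotationOf(Class<?> projectionType) {
        Projection projection = projectionType.getAnnotation(Projection.class);
        if (projection == null) {
            throw new IllegalArgumentException(
                    projectionType.getName() + " is not annotated with @Projection");
        }
        return projection;
    }
}
```

With EclipseLink, each of those names would then presumably be added to a FetchGroup via addAttribute, and the group attached to the read query with setFetchGroup, before the query runs against the entity class.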

By not fetching entire objects, we eliminate several other possible N+1 situations. When EclipseLink loads an object, it will immediately load any non-lazy associations. Depending on the number of these associations, N+1 could actually be 2N+1 or worse. In our system, the association between a user and the user’s profile used to be non-lazy. We’ve since fixed that, but there was a time when every query for a list of users was an N+1.

The proxy also allows us to work around an EclipseLink oddity related to querying for partial objects. If you set the fetch-group and query for releases as in our scenario, asking for any other property of the release (start date, for example) will cause EclipseLink to issue another query to fetch the rest of the object. You could easily end up with an N+1 again, or worse. Because the partial object is hidden behind an interface that only exposes two methods, there is no chance of accidentally calling a method that will issue another query.

The actual proxy code is pretty simple:

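import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;

public class DelegatingProxy implements InvocationHandler {
    // The real, partially fetched domain object that calls are forwarded to.
    private final Object target;

    private DelegatingProxy(Object target) {
        this.target = target;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) {
        // Swap any proxy arguments for their underlying targets before delegating.
        ArrayUtils.inPlaceTransform(args, unProxy());
        return InvocationUtils.invokeMethodWithReturn(target, method.getName(), args);
    }
}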

The unProxy method was omitted for brevity. Using it to transform the argument list makes sure we’re not operating on other proxy instances, which is important when dealing with operations like equality. When we first started using this infrastructure, we were using one of the helpers from either Spring or Apache Commons to perform the method invocation. Once we performance tested it, we learned that dynamically looking up methods on the JVM seems to be more costly than actually invoking them. InvocationUtils is a Rally-defined class, and invokeMethodWithReturn does some smart caching of method lookups.
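The caching idea can be sketched in a few lines. This is not the real InvocationUtils; CachedInvoker, its cache key, and cachedLookups are all invented for illustration. The sketch assumes that matching on method name and argument count is good enough, so it does not distinguish overloads that share an arity.

```java
import java.lang.reflect.Method;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical invoker that pays for the reflective method lookup only once per key.
final class CachedInvoker {
    private static final ConcurrentMap<String, Method> CACHE = new ConcurrentHashMap<>();

    private CachedInvoker() {}

    static Object invoke(Object target, String methodName, Object... args) {
        Object[] actualArgs = (args == null) ? new Object[0] : args;
        Class<?> type = target.getClass();
        // Simplified key: class name, method name and argument count.
        String key = type.getName() + "#" + methodName + "/" + actualArgs.length;
        Method method = CACHE.computeIfAbsent(
                key, k -> findMethod(type, methodName, actualArgs.length));
        try {
            return method.invoke(target, actualArgs);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not invoke " + key, e);
        }
    }

    // Number of distinct lookups performed so far; exposed only to make the caching visible.
    static int cachedLookups() {
        return CACHE.size();
    }

    // The expensive part: scanning the public methods of the class for a match.
    private static Method findMethod(Class<?> type, String name, int argCount) {
        for (Method candidate : type.getMethods()) {
            if (candidate.getName().equals(name) && candidate.getParameterCount() == argCount) {
                return candidate;
            }
        }
        throw new IllegalArgumentException(
                "No public method " + name + " with " + argCount + " argument(s) on " + type.getName());
    }
}
```

Repeated calls such as CachedInvoker.invoke(release, "getName") scan the class once and reuse the cached Method afterwards, which is the cost the performance testing pointed at.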

Finally, this approach allows us to reduce the number of objects that are stored in the second-level cache. Because we’re not issuing N+1s and the amount of data we’re passing across the wire is minimal (thanks to fetch-groups), we can afford to issue the query to the database each time. This method, which Adam and I hid in our persistence layer as an Easter-egg for the other developers, has my favorite name in our codebase:

private static void youBettaDont(ReadAllQuery readAllQuery) {
}