Object-Relational Mappers have their fair share of benefits. One benefit is that they can dramatically simplify data access, specifically the act of writing queries for your data. By writing your queries in the vernacular of your application, rather than in SQL, you reduce the amount of information that a developer must hold in their head at any given time.

For example, it’s basic knowledge within our domain that a user story is associated to its parent using the getParent() method and that the inverse of the relationship is getChildren(). Knowing how that relationship is physically structured within the database is irrelevant for most of our developers on a daily basis. A good ORM can and should allow the object model to speak the language of the domain by hiding the details of whether the parent/child relationship is stored as a foreign key or as some type of intersection table.

As with most powerful tools, ORMs are double-edged swords that allow you enough rope to hang yourself while simultaneously shooting yourself in the foot (TRIPLE-METAPHOR-COMBO). One of the most well-known problems, which is not specific to any particular ORM or any language, is the N+1 query: one query loads a list of N objects, and then N more queries are issued, one per object, to load something associated with each of them. There are plenty of ways to combat N+1, and the solution you choose will vary depending on your circumstances. While a small-scale N+1 will likely go unnoticed even in the tiniest of applications, a bad one will cripple even the largest app.
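To make the shape of the problem concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than any particular ORM. The story table, the CountingConnection wrapper, and both functions are made up for illustration; an ORM would generate similar SQL behind a call like getParent().

```python
import sqlite3


def build_db():
    """Create an in-memory database with a self-referencing story table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE story (id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER)"
    )
    conn.execute("INSERT INTO story VALUES (1, 'Epic', NULL)")
    conn.executemany(
        "INSERT INTO story VALUES (?, ?, 1)",
        [(i, f"Story {i}") for i in range(2, 12)],  # 10 children of the epic
    )
    return conn


class CountingConnection:
    """Hypothetical wrapper that counts statements sent to the database."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def execute(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params)


def parent_names_n_plus_one(db):
    # One query to load the children...
    children = db.execute(
        "SELECT id, parent_id FROM story WHERE parent_id IS NOT NULL"
    ).fetchall()
    names = []
    for _story_id, parent_id in children:
        # ...plus one identity lookup per child: the "N" in N+1.
        row = db.execute(
            "SELECT name FROM story WHERE id = ?", (parent_id,)
        ).fetchone()
        names.append(row[0])
    return names


def parent_names_joined(db):
    # The same information fetched with a single joined query.
    rows = db.execute(
        "SELECT p.name FROM story c JOIN story p ON p.id = c.parent_id"
    ).fetchall()
    return [row[0] for row in rows]


naive = CountingConnection(build_db())
joined = CountingConnection(build_db())
naive_names = parent_names_n_plus_one(naive)
joined_names = parent_names_joined(joined)
```

With ten child stories, the looped version issues eleven statements and the joined version issues one, for the same result. A join is only one of several possible fixes; it is used here because it is the easiest to show in a few lines.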

Another potential problem is the memory pressure that can sometimes occur when using second-level caching. There are tooling and design decisions that can eliminate (or at least mitigate) the negative impact of second-level caching, but there are trade-offs.

Here’s a quick ORM caching refresher: first-level caches within an ORM are short-lived, most often persisting only for the life of a single unit of work (usually a single request within a web app, but your life cycle may vary). Every ORM I know of uses first-level caching 99% of the time or more. Second-level caches are usually longer-lived, and can use a multitude of backing stores from in-memory data structures to out-of-process options like memcached. Both first- and second-level caches act, at a minimum, as Identity Maps. Some second-level caches have “advanced” features like query result caching (as opposed to identity caching) or distributed caching and cache invalidation for distributed environments.
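As a rough illustration of a first-level cache acting as an Identity Map, here is a minimal sketch. The Session and Story classes and the load_story function are hypothetical stand-ins, not the API of any real ORM; load_story takes the place of a SQL query.

```python
class Story:
    def __init__(self, story_id, name):
        self.id = story_id
        self.name = name


def load_story(story_id):
    # Stand-in for "SELECT ... FROM story WHERE id = ?".
    return Story(story_id, f"Story {story_id}")


class Session:
    """One unit of work; its identity map lives and dies with the session."""

    def __init__(self, loader):
        self._loader = loader
        self._identity_map = {}
        self.loads = 0  # how many times we had to go to the "database"

    def get(self, story_id):
        if story_id not in self._identity_map:
            self.loads += 1
            self._identity_map[story_id] = self._loader(story_id)
        return self._identity_map[story_id]


session = Session(load_story)
first = session.get(1)
second = session.get(1)           # served from the identity map

other_session = Session(load_story)
from_other = other_session.get(1)  # a new unit of work starts cold
```

Within one session, both lookups return the very same object and only one load happens. A second session starts with an empty map, which is the gap a second-level cache is meant to fill.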

The particularly insidious aspect of the second-level cache is that in small to medium sized applications, using a second-level cache can be a good band-aid for N+1 problems. As you either knew or learned from my helpful link, N+1 problems are almost always the result of identity lookups. Those lookups are caused by either walking associations within a loop (for each story, getParent()) or by using non-lazy association mappings. Identity maps are great at identity lookups, and as long as your app has enough memory to keep the cache warm under typical load, a second-level cache can be very effective at reducing the impact of N+1.
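Here is a small sketch of why the band-aid works. SecondLevelCache, query_parent, and handle_request are invented names, and the "database" is just a list that records each lookup; the point is only that the N+1 loop is still there, but most of its lookups never leave the process.

```python
class SecondLevelCache:
    """Hypothetical process-wide identity cache shared by every request."""

    def __init__(self):
        self._entries = {}
        self.hits = 0
        self.misses = 0

    def get(self, key, load):
        if key in self._entries:
            self.hits += 1
        else:
            self.misses += 1
            self._entries[key] = load(key)
        return self._entries[key]


db_queries = []  # records every lookup that actually reaches the "database"


def query_parent(parent_id):
    db_queries.append(parent_id)
    return {"id": parent_id, "name": f"Parent {parent_id}"}


def handle_request(cache, stories):
    # Still an N+1 shape: one parent lookup per story in the loop.
    return [
        cache.get(parent_id, query_parent)["name"]
        for _story_id, parent_id in stories
    ]


# 30 stories spread across 3 distinct parents (ids 0, 1 and 2).
stories = [(story_id, story_id % 3) for story_id in range(1, 31)]

cache = SecondLevelCache()
handle_request(cache, stories)
queries_after_first = len(db_queries)
handle_request(cache, stories)
queries_after_second = len(db_queries)
```

The first request reaches the database three times instead of thirty, and the second request does not reach it at all. That only holds while every parent stays in the cache, which is exactly the assumption that breaks down later.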

The crux of the problem is with highly connected domain models, which most ORMs encourage, at least indirectly. They are “encouraged” because they are usually the default means of operation. When I say highly-connected, I mean that each relationship between two objects is bidirectional. Parents have collections of children, and children have pointers to their parents. In many cases, it is possible to walk from any domain object in the graph to any other domain object.
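To see what "highly connected" means for memory, consider this sketch of a bidirectional parent/child model. The Story class and the reachable helper are illustrative only; reachable simply follows every reference the way a garbage collector would when deciding what is still live.

```python
class Story:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # child -> parent pointer
        self.children = []        # parent -> children collection
        if parent is not None:
            parent.children.append(self)


def reachable(start):
    """Return every object that can be reached by walking associations."""
    seen = {}
    stack = [start]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen[id(node)] = node
        if node.parent is not None:
            stack.append(node.parent)
        stack.extend(node.children)
    return list(seen.values())


root = Story("Epic")
features = [Story(f"Feature {i}", root) for i in range(3)]
tasks = [
    Story(f"Task {i}.{j}", feature)
    for i, feature in enumerate(features)
    for j in range(4)
]

from_one_task = reachable(tasks[0])
```

Starting from a single leaf task, all 16 objects in the graph are reachable. If those associations are held with ordinary references, caching that one task is enough to keep the entire graph in memory.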

When an application reaches a certain size, something has to give. The way most ORMs handle associations and second-level caching can cause problems as the application scales. This is true on the JVM and the CLR. I can’t speak for MRI Ruby or other platforms, but I imagine that most of what I’m about to say holds true there as well.

In the next few paragraphs, I’m going to use the terms hard reference and soft reference as generic placeholders for platform-specific terms. The rough idea is that hard references prevent the referenced object from being garbage collected, while soft references allow garbage collection to occur. On the JVM, soft, weak and phantom references all count as what I’m calling soft. It’s been a few years since I used the CLR, but I believe the concept of a weak reference is a rough equivalent there.
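The distinction can be shown in a few lines of Python, with the caveat that this is an analogy rather than JVM or CLR code. Python's weakref module is closest to a JVM weak reference, and the standard library has no direct equivalent of a JVM soft reference. The Story class is made up for the example.

```python
import gc
import weakref


class Story:
    def __init__(self, name):
        self.name = name


story = Story("example")
hard = story                 # an ordinary ("hard") reference
soft = weakref.ref(story)    # does not keep the object alive on its own

del story
gc.collect()
alive_while_hard_ref_exists = soft() is not None

del hard                     # drop the last hard reference
gc.collect()
alive_after_hard_ref_dropped = soft() is not None
```

While the hard reference exists, the object survives collection and soft() returns it. Once the last hard reference is gone, the object is collected and soft() returns None, so whoever needs it next has to load it again.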

Esoteric point 1: ORMs usually replace getters for associations with proxy references to the actual object or collection. Sometimes these proxies are backed with soft references, allowing the value of the field to be garbage collected. This is done using bytecode weaving or other voodoo.

This is a damned if you do, damned if you don’t scenario. Holding hard references to the associations will contribute to memory pressure because the connectedness of objects will continue to grow as more objects are loaded. Second-level caching exacerbates the problem by keeping objects alive longer than they would be otherwise. Using soft references allows the objects to be GCed but issues more queries to the database.

Esoteric point 2: both least recently used (LRU) caches and soft reference based caches have problems:
LRU: Frequently used objects will always be in memory. As additional objects are looked up, they will be pinned in memory assuming frequent access. Highly-connected models will eventually become fully loaded within memory.
Soft reference: Under load, a single object can be queried for and GCed multiple times in the same request. This leads to two problems. First, extra queries are issued when the object is re-loaded. Second, the system can get into a state of constant churn created by repeatedly loading and garbage collecting the same objects.
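The two failure modes above can be sketched side by side. LruCache and WeakCache are hypothetical toy caches, and the weak cache uses Python's weakref.WeakValueDictionary as a stand-in for a soft-reference cache. It collects far more eagerly than JVM soft references would, so it exaggerates the churn, but the pattern is the same.

```python
import gc
import weakref
from collections import OrderedDict


class Story:
    def __init__(self, story_id):
        self.id = story_id


class LruCache:
    """Holds hard references; evicts the least recently used entry when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()
        self.loads = 0

    def __len__(self):
        return len(self._entries)

    def get(self, key):
        if key in self._entries:
            self._entries.move_to_end(key)   # mark as most recently used
            return self._entries[key]
        self.loads += 1                      # stand-in for a database query
        story = Story(key)
        self._entries[key] = story
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)
        return story


class WeakCache:
    """Holds only weak references; entries vanish once nobody else uses them."""

    def __init__(self):
        self._entries = weakref.WeakValueDictionary()
        self.loads = 0

    def get(self, key):
        story = self._entries.get(key)
        if story is None:
            self.loads += 1                  # stand-in for a database query
            story = Story(key)
            self._entries[key] = story
        return story


lru = LruCache(capacity=100)
weak = WeakCache()
for _ in range(3):
    lru.get(1).id     # the caller does not hold on to the result
    weak.get(1).id
    gc.collect()
```

The LRU cache loads the story once and then pins it in memory. The weak cache reloads it on every access, because nothing keeps it alive between lookups. Neither behavior is a bug in the cache; each is the trade-off described above.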

The culmination of high load, N+1 selects, highly connected domain models, and inefficient second-level caching strategies can take a few forms, all bad. In the best case, your application will bog down as queries repeatedly miss the cache and are sent to the database. Hopefully your database can handle the load. In the worst case, the caching strategy is such that more and more objects are faulted into memory until the garbage collector cannot keep up. The application begins to grind to a halt. Objects that are still needed but only held with weak references are garbage collected and immediately re-queried. Ultimately, the application becomes totally unresponsive in the midst of a storm of garbage collection and likely falls down with an OutOfMemoryError (OutOfMemoryException on the CLR).

When using ORMs, it is important even from the beginning to be cognizant of the N+1 problem. Second-level caches are not always a problem, but relying on them as a crutch to reduce the impact of N+1 selects has the potential to lead to bigger problems down the road. At Rally, we’ve used several techniques to work around problematic design decisions from the past. In some upcoming posts, I’ll talk about those techniques and discuss when to use each.