Intelligent Repositories
Background
(find out what I mean by “Intelligent Repository”)
Back during that seemingly formative experience that I call The Project, I did it pretty much the wrong way from the beginning. Instead of starting with the domain model, I started with the design of the database. That was the culture of The Company. I know better now, of course.
This focus on persistence-related concerns drew me to my first application of domain-driven design: repositories.
Having not been exposed to Jimmy Nilsson’s book, I was bound to the gospel of Evans. The examples of repositories there, as far as actual code, are scarce. I was left to mostly interpret the text and associated diagrams.
My initial stab at repositories usually ended up like this:
public class EquipmentRepository
{
protected static DBConnection db;
private Equipment[] retrieveEquipment(String sproc,
long ID, String serialNumber, String description)
{
String sprocCall = "call " + sproc + "(";
sprocCall += "'" + ID + "',";
sprocCall += "'" + serialNumber + "',";
sprocCall += "'" + description + "')";
ResultSet rcds = db.exec(sprocCall);
EquipmentFactory factory = new EquipmentFactory();
while(rcds.hasNext())
{
long ID = (long)rcds.field(0);
String serialNumber = (String)rcds.field(1);
String description = (String)rcds.field(2);
DateTime lastMod = (DateTime)rcds.field(3);
String userID = (String)rcds.field(4);
...
resultList.add(factory.createFrom(ID,
serialNumber, description, lastMod, userID, ...));
}
return resultList.toArray();
}
public long storeNewEquipment(Equipment eq)
{
...
}
public long storeEquipmentInfo(Equipment eq)
{
...
}
public Equipment[] retrieveAllEquipment(long ID,
String serialNumber, String description)
{
return retrieveEquipment("GET_ALL_EQUIP",
ID, serialNumber, description);
}
// JIT = Just In Time = only most recent edit
public Equiment[] retrieveJITEquipment(long ID,
String serialNumber, String description)
{
return retrieveEquipment("GET_JIT_EQUIP",
ID, serialNumber, description);
}
public DataSet loadTypeDetails(...)
{
...
}
...
some other embarassing functions
that returned DataSets that should
actually be <a href="http://moffdub.wordpress.com/2008/05/25/domain-driven-reports-part-1/">domain-driven reports</a>
...
}
|
Some of that code is very close to what I actually did. I, of course, eschewed the error checking, exception throwing, and exception handling that goes into real-world code.
That is not what I want to point out. The interface of the retrieval method, specifically, is what I find interesting…and unfortunate.
Readability: First, if a user of this repository wants to only specify description as a parameter, this is what the call looks like:
(new EquipmentRepository()).retrieveJITEquipment(-1, null, "foo"); |
Can you tell that this call will match all Equipment that contain “foo” somewhere in the description? Yes you can, because the author, me, told you. This is not what I’d call readable code.
Repository code ugliness: Second, the code in the repository itself is kind of…ugly. The factory should be doing more than just object construction. I should be passing the ResultSet itself. Aside from that, I have to take all of the parameters in retrieveEquipment, and one by one, whether they are specified by the caller or not, construct the stored procedure call.
It’s not a problem, just a lot of code. And I don’t much like code.
Silent coupling: Finally, there is a silent, subtle coupling occurring between caller and stored procedure. The parameters essentially mirror the arguments to both GET_JIT_EQUIP and GET_ALL_EQUIP. This seems natural to do, since, after all, those are the parameters those procedures take.
I don’t like this because the whole point of repositories is to achieve de-coupling. Greg Young, the posts of whom I find insightful on the domain-driven design Yahoo! group, described repositories as the foundation of DDD.
The real goal of a repository is for the client to describe what it wants, and get it, if it exists. I think Martin Fowler’s Query Object pattern is a significant step towards this goal.
A Query Object is an interpreter [Gang of Four], that is, a structure of objects that can form itself into a SQL query. You can create this query by refer-ring to classes and fields rather than tables and columns. In this way those who write the queries can do so independently of the database schema and changes to the schema can be localized in a single place.
Indeed, this is very nice. For quite a while, I was skeptical regarding whether implementing a mapping between domain model names and column names, plus a mini-language of criteria operators, would be worth the effort.
Then, during my spurts of downtime at work, I took a stab at it. I finished a small Java package in a couple hundred lines of code that could form “and”, “or”, and “not” of [property][op][value] criteria, where [op] could be “=” or “LIKE”.
Code has a way of clearing things up. Here is a taste:
public interface Criteria
{
public Criteria and(Criteria operand);
public Criteria or(Criteria operand);
public Criteria not();
}
public abstract class EquipmentCriteria implements Criteria
{
// maps domain model name to column name
// e.g. "Equipment Description" --> "Equipment.Desc"
private static Map columnMap;
private String attributeName;
private Object value;
public EquipmentCriteria(String attr, Object val)
{
...
}
}
public class EqualsCriteria extends EquipmentCriteria
{
public EqualsCriteria(String attr, Object val)
{
super(attr, val);
}
public String toString()
{
return "(" + (String)super.columnMap.get(this.attributeName) +
" = " + this.value.toString() + ")";
}
public Criteria and(Criteria operand)
{
return new AndCriteria(this, operand);
}
public Criteria or(Criteria operand)
{
return new OrCriteria(this, operand);
}
public Criteria not()
{
return new NotCriteria(this);
}
}
public class AndCriteria implements Criteria
{
private Criteria lhs;
private Criteria rhs;
public AndCriteria(Criteria left, Criteria right)
{
this.lhs = left;
this.rhs = right;
}
public String toString()
{
return "((" + this.lhs.toString() + ")" +
" AND " + "(" + this.rhs.toString() + "))";
}
public Criteria and(Criteria operand)
{
return new AndCriteria(this, operand);
}
public Criteria or(Criteria operand)
{
return new OrCriteria(this, operand);
}
public Criteria not()
{
return new NotCriteria(this);
}
}
|
I’m actually leaving out a class in this inheritance hierarchy that eliminates some code duplication. I contend, though, that the reader gets the point.
The signature of the retrieval methods of a repository now look like this:
public class EquipmentRepository
{
public Equipment[] retrieveBy(Criteria condition)
{
... execute "SELECT *
FROM Rooms
WHERE condition.toString()" ...
... call factory to instantiate ...
... return array ....
}
}
|
And a caller is now very pleasing to the coder’s eye:
Criteria condition =
new EqualsCriteria("Equipment ID", 17).or(new LikeCriteria("Equipment Description", "foo");
equipRepo.retrieveBy(condition);
|
Ahhh. That almost reads like a sentence. It addresses all three of the concerns:
- readability: self-explanatory
- repository code ugliness: I didn’t write it, but code in the repository is a lot cleaner and less spaghettish
- silent coupling: queries are as flexible as the caller wants them
But wait. This version isn’t using stored procedures. That is how we are able to defeat silent coupling.
Stored procedures, which are procedural by their own admission, don’t mesh too well with Query Objects. You could, if you really wanted to, use them. I won’t write the code here, but essentially, you’d code up a stored procedure “signature” in terms of Criteria objects.
If sproc GET_EQUIP takes an ID, serial number, and description, and it matches ID by exact match and the rest by wildcard, you’d code up a signature that is essentially an array of EqualsCriteria for ID, LikeCriteria for serial number, LikeCriteria for description, in that order.
Then, a caller of a repository method can pass in an array or collection of criteria (order irrelevant). The criteria in the collection must match at least one in the signature of the sproc the repository method is calling. If someone, for example, passes in an EqualsCriteria for serial number, the method would cough on it and refuse the request.
This method addresses the readability problem, but only slightly loosens the silent coupling, and replaces one piece of ugly code with another that is only slightly less ugly.
I don’t think it is entirely unreasonable to accept this as a fact of life. After all, they are stored procedures. They are procedural. Any code that deals with them quite possibly will be procedural as well.
Yeah right. I’m too neurotic for that.
The worst part of this Query Object / Stored Procedure hybrid is the silent coupling. Chances are, there will be some other stored procedure that could be used to, say, match ID by wildcard instead of exact match. Theoretically, a less restrictive kind of sproc call can be used and at least filtered in memory. I know, that example sounds stupid, but it illustrates the point.
Another example is needing to query by values in a date range. If the only sproc available gives you the ability to query for objects before a given date, you can formulate a query that needs results between X and Y as everything before Y, and out of that list, everything before X.
What I’m talking about is an Intelligent Repository. A caller declaratively specifies criteria it wants. It lets the repository dispatch to the method(s) that will get the job done.
In the date range example:
- the client tells the repository “I want all Equipment objects edited between X and Y”
- the repository searches the sproc “signatures” it has for one that will accept two dates in a BetweenCriteria
- if found, the repository calls the sproc and returns the objects
- otherwise, for each “less restrictive” sproc, find a combination that, when executed, fulfills the “between X and Y” condition
- if found, execute the sproc(s) in the right order, filter the results, if necessary, and return the objects
Wouldn’t that be cool? The encapsulation, security, and assorted benefits of stored procedures with the coolness and de-coupling of Query Objects.
The Tar Pit
Unfortunately, Intelligent Repositories are likely to be quite complex.

(according to Brooks, we’re all beasts struggling in a tar pit)
First, you need to establish a “restrictivity” hierarchy. Exact matches are more restrictive than wildcards, which are more restrictive than any match. “Betweens” is more restrictive than greater than or less than, and so on.
Manage that, and now you have a dynamic programming problem on your hands: given a set of stored procedure signatures, find the optimal solution that involves the smallest number of jumps to the database and largest amount of restrictive calls executed as well.
On top of that, we aren’t even considering “and”, “or”, and “not”. If you want to give callers of Intelligent Repositories this flexibility, this information has to be added to your stored procedure “signatures” and considered by the selection algorithm.
Add to that the passing mention I made to having Intelligent Repositories dynamically filter results in memory. I don’t even think this is possible without sophisticated configuration and code generation. Without this, the full automation of the date range example is probably not possible.
While it is a nice ideal, Intelligent Repositories are a sub-domain onto themselves. It is possible that only the largest organizations will find a palatable return on such an investment. Even so, the largest organizations aren’t guaranteed to be DDD shops, or even nominally OO — I know my current employer writes largely procedural Java, and I wasn’t that much better on The Project.
The lesson that Intelligent Repositories demonstrate is the maddening reality that no solution in this industry is ever completely adequate. There is always a trade-off, and no silver bullet, even for technical design decisions.