Nested schema-less-ness

Back to Full Throttle here on the EIP web-ring.

I was browsing Proggit and DZone and came across a post on igvita.com by Ilya Grigorik that compares schema-free SQL to the NoSQL movement. I immediately knew that this post was a post for me. I have demonstrated in the past my propensity for learning things the hard way, like the closure-object equivalency and AOP. I know I have done this with a couple of other topics here, and I am in the process of digging those up.

I am trying to refer to a previous post of mine and I can’t remember it. Is this a sign of longevitity and success, or senility?

Anywho, the premise is simple: can you simulate NoSQL (and I really mean schema-less here) with a SQL database?

Ilya goes on to demonstrate that as long as your data is not nested, you can, and rather easily too:

id field name value
4358 type person
4358 name Jim Bob
4358 weight 150
4358 job bus driver

He goes on to discuss a Ruby API he prototyped to interact with such a database and even create tables on the fly for new attributes. We actually use something similar to this at work on occasion — instead of columns, we use rows in a special table for certain pieces of data, each representing a field, for the very reason that NoSQL advocates cite: getting a new column approved takes too long and is too much work. Now I know I have mentioned this in the past as well, but that is yet another link I can’t find!

(Side note: we also use XML for the same schema-less flexibility, but XML cannot be easily used in other SQL queries, making the fields-as-rows approach preferrable in certain situations)

Aside from the optimization of type-specific value columns, instead of all of the values being of type varchar, I want to go in a different direction.

Why can’t you handle nested data structures here? Why won’t this work:

id field name value
4358 type person
4358 name Jim Bob
4358 weight 150
4358 job bus driver
4358 brother 2195967
2195967 type person
2195967 name Sidney Jones
2195967 weight 172
2195967 job unemployed

This is essentially a table that stores objects. Each object has an ID and can be refered to by other objects. This mimics the actual layout of the object in memory in a lot of languages.

I think storing an object in this way is relatively straightforward. The only twist is that the code that tries to insert into the table will need to start with the most deeply-nested part of that object first. In this example, store the person’s “brother” attribute before storing the rest of the attributes.

Querying, though, is trickier. If I wanted to retrieve all of the data for the person with ID 4358, I can’t just say “SELECT * FROM Objects WHERE id=4358” because you’ll miss out on the attributes for the “brother” attribute of ID 4358. Now of course you could then say “SELECT * FROM Objects WHERE id=2195967“, but that won’t get you very far during performance testing.

I imagine you’d need a stored procedure for this, and perhaps another column to indicate if a field is actually a reference.

id field name value is reference?
4358 type person false
4358 name Jim Bob false
4358 weight 150 false
4358 job bus driver false
4358 brother 2195967 true
2195967 type person false
2195967 name Sidney Jones false
2195967 weight 172 false
2195967 job unemployed false

The stored procedure could then take care of checking for and including referenced data in the result sets as well. Note that in the world where instead of a single value column, we have multiple value columns for several different types for efficiency and optimization purposes, the reference column would simply be another number-based column.

Unlike Ilya’s solution, which appears to treat each attribute as a seperate table and requires table creation upon introduction of a new attribute, here all you’d have to do is add a row and you have a new attribute. This certainly scales from a developer’s point of view. Whether this will scale well from a performance point of view is a case of the “It Depends”TM.

Announcer: You’re reading the EIP web-ring.

Leave a Reply