Tuesday, 12 May 2009

What If SQL Had Abstract Tables And Composite Types?

There are well-documented mismatches between the object-oriented paradigm and the relational database model. Despite the early adopters telling us we should all be using F# running against CouchDB, I’m going to go out on a limb and say that object-oriented languages and relational databases aren’t going away any time soon.

The more I see and learn about ORMs, the more I get this feeling like there’s a real “code vs data” mindset going on. To an object-oriented application developer, the database is a necessarily evil, the DBA is a pain-in-the-ass Luddite who won’t let you write code the way you want to, tables are an inadequate and inferior persistence mechanism for your beautiful object hierarchies, and – well, if someone could wave a magic wand and just make the whole database thing go away, you’d all be very happy. Even in the most wonderfully agile, test-driven greenfield project, there’s still this perception that the more we can abstract the project away from the database, the better off we’ll be. It’s like the Utopian ideal of ORMs is not to embrace the power of the database – they’d rather make the DB completely irrelevant.

I sit on both sides of this particular fence. As a developer, I find persistence as frustrating and time-consuming as everyone else – but as a business stakeholder, I want my data stored in normal, sensibly-named tables that I can get at using SQL-92 queries. Not because that’s what I’m doing now, but because I really have no idea what I’ll be doing with that data in 5-10 years, and I believe the relational model has stood the test of time. 

I’ve spent today at  Ayende’s NHibernate workshops at the Skillsmatter Progressive .NET event, and he’s shown us all sorts of NHibernate magic, including various strategies for modelling inheritance and class hierarchies with polymorphic associations, sparse tables, and the like. There’s been some discussion of stuff like composite keys, searching, indexes, unique constraints – and it’s pretty clear that when it comes to object-relational mapping, it’s the object folks that are doing all the work, and the relational side of things is really not doing anything much to make their lives easier.

There’s two things that stand out as being particular painful when it comes to object-relational mapping – composite keys and inheritance. Inheritance because it has no analogue in the relational model so you have to somehow “fake” it – which makes no difference to the code, but it can leave the DB in a bit of a mess. Composite keys because they have built-in limitations regarding nested queries (WHERE foo IN (SELECT…)) – and complex equality and value semantics, which means most ORMs strongly advise against using composite keys if you can help it.

(From this point forward I’m making stuff up as a sort of thought experiment. Don’t take any of it too literally :))

What if we added ABSTRACT and EXTENDS keywords to SQL? What if you could do this in your database?

CREATE ABSTRACT TABLE Customer (
  ID int identity(1,1) primary key
)

CREATE TABLE Company EXTENDS Customer (
  CompanyName varchar(256)
)

CREATE TABLE Person EXTENDS Customer (
   Forenames varchar(256),
   Surname varchar(256)
)

Here’s the rules:

  • Tables must be abstract or concrete.
  • You can’t insert, update or delete records from an abstract table
  • Both abstract and concrete tables can participate in foreign key relationships
  • One abstract record must have exactly one concrete record (i.e. you can’t insert a Company whose ID already represents a Person)
  • Concrete tables behave like the union of their own columns and their abstract base table’s columns

SELECT * FROM Customer

Id

1
2
3

SELECT * FROM Company

Id CompanyName
1 Monkey Butlers Ltd

SELECT * FROM Person

Id Forenames Surname
2 Eddie Van Halen
3 Jack Sparrow

 

INSERT INTO Person(Id, Forenames, Surname) VALUES(4, “Ayende”, “Rahien”)

-- will work as expected

INSERT INTO Company(Id, Name) VALUES(4, “Rhino Ltd”)

-- will fail with “Distributed key violation” or some such thing, because ID4 is already taken by a Person and so can’t be used for a Company.

You can, e.g. associate Customer with Address (so every address is ‘owned’ by exactly one customer, regardless of whether the customer in question is a person or a company) – but you can also associate Employee with Company directly, so that your data schema enforces the business requirement that a Person cannot have Employees. As a database concept, it doesn’t really add anything – but with a suitably turbocharged ORM, think of what you could do. A simple Customer.ListAll() could return an array of Customer objects – each of which is ACTUALLY a strongly-typed Person or Company. You could do things like Customer.ListAll(typeof(Company)); you could switch on the type of the objects – and all without the compromise of sparse tables or child table inheritance.

Ok, what about composite keys? Imagine we’re booking flights for an airline. There is a real-world business constraint that the same person cannot be on the same flight twice, and composite keys are tailor-made for modelling this sort of “unique combination” scenario:

CREATE TABLE SeatReservation (
   Key COMPOSITE (
      PassengerId int,
      FlightNumber varchar(5)
   ) primary key,
   SeatRow char,
   SeatNumber int
)

CREATE TABLE SpecialMeal (

  -- Notice that the “type” of this column is a SeatReservation.Key – i.e. a reference
  -- to the composite type defined as Key in the SeatReservation table.
   SeatReservation.Key SeatReservation primary key,
   IsVegetarian bit,
   IsVegan bit,
   IsHalal bit
)

SELECT * FROM SeatReservation WHERE Key.PassengerId = 12

SELECT * FROM SeatReservation WHERE Key IN  (SELECT SeatReservationKey FROM SpecialMeal)

Think of this like adding a struct type to SQL, where COMPOSITE defines a collection of columns just as in C# a struct defines a collection of fields. Composites carry field-value semantics for comparison and equality; if all the columns are equal, the struct is equal. Composite literals are defined inline like:

select * from SpecialMeal WHERE SeatReservation.Key = (178189, ‘VS207’)

- using similar syntax to SQL’s familiar INSERT INTO (Column1, Column2) VALUES(Value1, Value2)

This would allow ORMs such as NHibernate to explicitly map a composite key as a struct, benefiting from intrinsic value-object semantics. The resulting database queries could use composite keys in exactly the same way as primary keys – because the database explicitly allows a composite ‘type’ to be used wherever a primitive type is currently supported – and, most importantly, the business meaning of a composite key is explicit in both the database schema and the object model.

4 comments:

NickG said...

So basically you've spent a whole page saying you want VIEWS. Perhaps I've missed your point, but there's very little you've mentioned which you couldn't implement using updatable views :)

Dylan said...

What you've missed is the ability to define a distributed key - a customer HAS to be a Company OR a Person, but cannot be both, and cannot be neither.

I don't know how you'd enforce that with views - sure, you can join two tables to create the view, but there's no way of enforcing the circular dependency between the abstract base table and the concrete table.

Yeah, you can already do object-relational mapping of inheritance - none of this is necessary, I just thought it was interesting...

jamesw said...

You almost get the table inheritance you want with PostgreSQL:

# CREATE TABLE Customer (ID SERIAL, PRIMARY KEY (ID));
CREATE TABLE Company (CompanyName VARCHAR) INHERITS (Customer);
# CREATE TABLE Person (Forenames VARCHAR, Surname VARCHAR) INHERITS (Customer);
# INSERT INTO Company (CompanyName) VALUES ('Monkey Butlers Ltd');
# INSERT INTO Person (Forenames, Surname) VALUES ('Eddie', 'Van Halen'), ('Jack', 'Sparrow');
# SELECT * FROM Person;
id | forenames | surname
----+-----------+-----------
2 | Eddie | Van Halen
3 | Jack | Sparrow
# SELECT * FROM Company;
id | companyname
----+--------------------
1 | Monkey Butlers Ltd
# SELECT * FROM Customer;
id
----
4
1
2
3
#

Where it falls down is inserting duplicate IDs, UNIQUE and PRIMARY KEY constraints aren't inherited (and can't be) so if you needed that you'd have to use a trigger. You also need to re-create the PRIMARY|UNIQUE constraints on your inherited tables. You can, however, once that is done, rely on unique (TABLEOID || ID)s, viz:

# INSERT INTO Company(Id, CompanyName) VALUES (4, 'Rhino Ltd');
INSERT 0 1
# INSERT INTO Person(id, forenames, surname) VALUES (4, 'Ayende', 'Rahien');
INSERT 0 1
# SELECT TABLEOID, ID FROM Customer;
tableoid | id
----------+----
33572 | 1
33572 | 4
33579 | 2
33579 | 3
33579 | 4
#

jamesw said...

As for composite keys, Pg has composite types support:

# CREATE TYPE SeatReservationKey AS (PassengerID INTEGER, FlightNumber VARCHAR(5));
# CREATE TABLE SeatReservation (SeatReservationID SeatReservationKey PRIMARY KEY, SeatRow char, SeatNumber INT);
# CREATE TABLE SpecialMeal (SeatReservationID SeatReservationKey PRIMARY KEY REFERENCES SeatReservation(SeatReservationID), IsVegetarian BIT, IsVegan BIT, IsHalal BIT);
# INSERT INTO SeatReservation (SeatReservationID, SeatRow, SeatNumber) VALUES ((12, 'N1234'), 'C', 15), ((12, 'N1 245'), 'D', 18);
# INSERT INTO SeatReservation (SeatReservationID, SeatRow, SeatNumber) VALUES ((12, 'N1234'), 'C', 16);
ERROR: duplicate key value violates unique constraint "seatreservation_pkey"
# SELECT * FROM SeatReservation;
seatreservationid | seatrow | seatnumber
-------------------+---------+------------
(12,N1234) | C | 15
(12,N1245) | D | 18
(2 rows)
# INSERT INTO SpecialMeal (SeatReservationID, IsVegan) VALUES ((13, 'N1234'), 1::bit);
DETAIL: Key (seatreservationid)=((13,N1234)) is not present in table "seatreservation".
# INSERT INTO SpecialMeal (SeatReservationID, IsVegan) VALUES ((12, 'N1234'), 1::bit);
INSERT 0 1
# SELECT * FROM SeatReservation WHERE (SeatReservationID).PassengerID = 12;
seatreservationid | seatrow | seatnumber
-------------------+---------+------------
(12,N1234) | C | 15
(12,N1245) | D | 18
# SELECT * FROM SeatReservation WHERE SeatReservationID IN (SELECT SeatReservationID FROM SpecialMeal);
seatreservationid | seatrow | seatnumber
-------------------+---------+------------
(12,N1234) | C | 15
#