To GUID or Not to GUID...Schema Design Decision
Posted on 2006-07-02
(newbie alert) Pardon the long intro...I'm getting that a lot goes into schema decisions! (plus, writing it out helps me figure it out.)
I'm building a groupware ASP.NET app in which the biggest chunk of processing resource will be focused on a file I'll call "Parts". The Parts file consists of records viewed and added to (and much less frequently updated) by users in "group sessions, via a custom Gridview.
Parts file records will have many fields (50-60), because there will be many "Part types", with several sets of field requirements (and so there will often be unused fields...I've asked another question on EE about this and decided it was a reasonable trade off for performance and simplicity instead of having several child records for the specialized fields.)
Parts records will be logically grouped into "workproducts" that will usually consist of no more than a few hundred Parts records at a time. These workproducts (sets of Parts records) are what will be displayed on the Gridview. I'm planning to store all the Parts records (current, backup, archive, older version, all parts types, etc) in one table in SQL Server (another EE-helped decision...what a great resource!). So the Parts table could get to be large (see note below).
Parts records will be linked to each other in multiple parent/child "outline" relationships, which will be described in separate index files. Part record keys will also be foreign keys in different types of "Rating" records, one per "group session" participant, that provide ratings on the Parts.
Because the Parts records need a LOT of formatting on the Gridview, using accesses to the index and Ratings tables, I'm planning to read the tables in "work product chunks" into DataTable objects in cache memory and mostly operate on the detailed records there. (Parts records will rarely need to be related to each other between workproducts record sets).
So, an medium-large size example might be: one groupware session = 10 participants, 300 Parts records, 3000 Rating records, 1200 index records. Small per session, but, (hopefully!) thousands of sessions, plus backups and archives, so potentially a big Parts table.
Why GUID's in the first place? Some of the Parts are intended to represent "real world, unique entities" like a particular, specific person, place or thing. There may be multiple, disconnected copies of the app, even with the same large organization. I think this is where I got the idea of using a GUID for primary key came into play, and led to these thoughts (which now seem completely newbish)
> In some cases, people might want to be able to determine (or force) one part record to reference the exact same real world entity as another part record...and the GUID's could be that "connection".
> If someone wanted to transfer a workproduct (primarily a set of Parts Records) from one copy of the app to another, I had the idea that they could just drop it right in, and because I was using GUID's as primary key, they wouldn't have unique key conflicts with existing records.
> Users will be able to export workproducts (parts record sets) to XML (or other standard), and GUID's in the part number keys might help them manage those records better
(also, Workproducts will be organized by Projects, and need to be copied and transfered between Projects. I thought GUID's would help here, but I can see now that a copy is a copy, so I'll have to either add a copy number to the primary key, or generate new keys, and will have to update the parent/child index records anyway (Ratings records probably won't get transferred)....OK, so another little puzzle...later).
Now, later in the design, I'm seeing it this way:
> The use cases for "this Part record points to the actual same entity as that Part record", particularly between separate DB's, will be relatively rare (and will probably be used in special batch or "off-session" functions, only). I can still put a GUID in the Parts records to indicate "this particular real world entity"....it doesn't have to be the primary key.
> I'm reading more about the memory, storage and performance costs of GUID's versus, say, bigint IDENTITY (which will handle all conceivable (and perhaps inconceivable [o) growth). Since the parts table primary key has to be a foreign key in the parent/child index and the ratings, the extra 8 bytes per key, plus GUID generation, could make an important difference (most critically in CPU use...LOTS of formatting of the Gridview, involving accessing both the index and the ratings records).
> transferring/copying these sets of parts records within and between copies of the app appears to be it's own little design challenge, which GUID's won't help. I'll have to have some kind of smart method that's aware of copy and version numbers to create new keys for the Parts records and the foreign key relationships, even if I use GUID's.
> As for the export, if the user needs to import into another app, or combine exported files...any help I can provide there will have to wait for a later release (fingers crossed...)
So, I THINK I've talked myself out of using a GUID as a primary key for the Part record...but my problem is that I'm still stuck with my original thought (hah!) that it was a "good idea".
So I guess what I'm looking for here is why it might be a BAD idea...along with any reality check comments on the thought process (such as it is :-), the pro's and con's of GUID's in this scenario and in general, whether GUID's are indeed costly, and that sort of thing.
Any inputs that would help me get fully resolved about this would be appreciated. (I suspect a small nudge will do the job!)