Solved

To GUID or Not to GUID...Schema Design Decision

Posted on 2006-07-02
5
780 Views
Last Modified: 2008-03-03
(newbie alert) Pardon the long intro...I'm getting that a lot goes into schema decisions!  (plus, writing it out helps me figure it out.)

I'm building a groupware ASP.NET app in which the biggest chunk of processing resource will be focused on a file I'll call "Parts".  The Parts file consists of records viewed and added to (and much less frequently updated) by users in "group sessions, via a custom Gridview.  

Parts file records will have many fields (50-60), because there will be many "Part types", with several sets of field requirements (and so there will often be unused fields...I've asked another question on EE about this and decided it was a reasonable trade off for performance and simplicity instead of having several child records for the specialized fields.)

Parts records will be logically grouped into "workproducts" that will usually consist of no more than a few hundred Parts records at a time.  These workproducts (sets of Parts records) are what will be displayed on the Gridview.  I'm planning to store all the Parts records (current, backup, archive, older version, all parts types, etc) in one table in SQL Server (another EE-helped decision...what a great resource!).  So the Parts table could get to be large (see note below).  

Parts records will be linked to each other in multiple parent/child "outline" relationships, which will be described in separate index files.  Part record keys will also be foreign keys in different types of "Rating" records, one per "group session" participant, that provide ratings on the Parts.  

Because the Parts records need a LOT of formatting on the Gridview, using accesses to the index and Ratings tables, I'm planning to read the tables in "work product chunks" into DataTable objects in cache memory and mostly operate on the detailed records there.  (Parts records will rarely need to be related to each other between workproducts record sets).

So, an medium-large size example might be:  one groupware session = 10 participants, 300 Parts records, 3000 Rating records, 1200 index records.  Small per session, but, (hopefully!) thousands of sessions, plus backups and archives, so potentially a big Parts table.

Why GUID's in the first place?  Some of the Parts are intended to represent "real world, unique entities" like a particular, specific person, place or thing.  There may be multiple, disconnected copies of the app, even with the same large organization.  I think this is where I got the idea of using a GUID for primary key came into play, and led to these thoughts (which now seem completely newbish)
>  In some cases, people might want to be able to determine (or force) one part record to reference the exact same real world entity as another part record...and the GUID's could be that "connection".
>  If someone wanted to transfer a workproduct (primarily a set of Parts Records) from one copy of the app to another, I had the idea that they could just drop it right in, and because I was using GUID's as primary key, they wouldn't have unique key conflicts with existing records.  
>  Users will be able to export workproducts (parts record sets) to XML (or other standard), and GUID's in the part number keys might help them manage those records better

(also, Workproducts will be organized by Projects, and need to be copied and transfered between Projects.  I thought GUID's would help here, but I can see now that a copy is a copy, so I'll have to either add a copy number to the primary key, or generate new keys, and will have to update the parent/child index records anyway (Ratings records probably won't get transferred)....OK, so another little puzzle...later).

Now, later in the design, I'm seeing it this way:
>  The use cases for "this Part record points to the actual same entity as that Part record", particularly between separate DB's, will be relatively rare (and will probably be used in special batch or "off-session" functions, only).   I can still put a GUID in the Parts records to indicate "this particular real world entity"....it doesn't have to be the primary key.
>  I'm reading more about the memory, storage and performance costs of GUID's versus, say, bigint IDENTITY (which will handle all conceivable (and perhaps inconceivable [o) growth).  Since the parts table primary key has to be a foreign key in the parent/child index and the ratings, the extra 8 bytes per key, plus GUID generation, could make an important difference (most critically in CPU use...LOTS of formatting of the Gridview, involving accessing both the index and the ratings records).
>  transferring/copying these sets of parts records within and between copies of the app appears to be it's own little design challenge, which GUID's won't help.   I'll have to have some kind of smart method that's aware of copy and version numbers to create new keys for the Parts records and the foreign key relationships, even if I use GUID's.
>  As for the export, if the user needs to import into another app, or combine exported files...any help I can provide there will have to wait for a later release (fingers crossed...)

So, I THINK I've talked myself out of using a GUID as a primary key for the Part record...but my problem is that I'm still stuck with my original thought (hah!)  that it was a "good idea".  

So I guess what I'm looking for here is why it might be a BAD idea...along with any reality check comments on the thought process (such as it is :-), the pro's and con's of GUID's in this scenario and in general, whether GUID's are indeed costly, and that sort of thing.

Any inputs that would help me get fully resolved about this would be appreciated.  (I suspect a small nudge will do the job!)

Thanks!
0
Comment
Question by:codequest
  • 2
  • 2
5 Comments
 
LVL 12

Expert Comment

by:Einstine98
Comment Utility
Hi, just a comment.. the scenario is too long and will take me some time to read, understand and be able to give an accurate answer... why don't you provide a shorter summary of the problem and the question?
0
 
LVL 30

Accepted Solution

by:
nmcdermaid earned 500 total points
Comment Utility
AGAINST:

It takes 16 bytes to store a GUID as opposed to 4 bytes to store an INT. If you are going to have 60 fields, then the 12 bytes you save by using an INT may come in handy (maximum size of a record is something under than 8K)

An INT can store over 4,000,000,000 unique numbers. A GUID can store a ridiculous amount of unique numbers. However I've never had the issue of running out of numbers in an INT field.




FOR:

As you've mentioned, theoretically you can merge two totally independent databases and the GUID keys will remain unique. The is the main drawcard of GUID's, it saves you a lot of bother when integrating 'data islands'



HOWEVER:

What if that entity is already defined in the destination database (i.e. a cordless red mickey mouse telephone)? Now you have a duplicate entity in the destination database.

You just need to satisfy yourself that this is the kind of integration that you want... are the items truly guaranteed to be unique across systems, do you want any 'recognition' and idenification of elements or are you hape to just slot them in.








However there may be more to the integration issue than you think.... have a think about how you could resolve the issue when an item is exported to a database where it already exists.


0
 
LVL 2

Author Comment

by:codequest
Comment Utility
Thanks for the inputs.   Gets my gears turning already, which is what I need.  

Einstine98: With respect to simplifying the scenario, I wish I could!  I've been looking at this decision for six months, and it seems like all the factors have some weight, and I've never figured out which ones to throw out.

nmcdermaid:  With respect to integration and uniqueness, I appreciate the focus...that perhaps is the crux of the issue, since in fact that is the one thing GUID's are good for (other than simply being very LARGE terms of scope).

I think that the "key" here is that maintaining the uniqueness of an entity in the DB is not "free"...if the user wants to "reuse" the part record that identifies a unique entity, they've got to know how to find it, which slows down their entry process, requires search methods, requires a different data management philosophy (got to treat the parts records as long term items instead of per-session transients) and tilts the app from being about communication and learning to being about knowledge management.   There are alternatives, including selecting the parts you want to work with in the session in advance, or correlating parts records after the session.

I think that nets it out, and clearly indicates "no need for GUID" in primary key.  I've just got to get used to idea.  And I'm pretty sure my development deadlines will clear that up pretty quickly!

Grazie!













0
 
LVL 30

Expert Comment

by:nmcdermaid
Comment Utility
I guess the only other thing I'd like to add is that there is rarely a 'perfect' design, sometimes you just have to 'suck it and see', and wear any consequences, and treat it as a learning experience.

It does sound like you're giving it a lot of thought which can only be a good thing.

Good luck with it anyway.
0
 
LVL 2

Author Comment

by:codequest
Comment Utility
Yeah, I'm getting that real DB design is a far cry from what can be imagined..."Computer, plot coordiates to nearest Star Base."    Still more like an erector set than lego's, too.   More nuts and bolts than you'd expect, not to mention sometimes having to bend the parts a little!
0

Featured Post

Enabling OSINT in Activity Based Intelligence

Activity based intelligence (ABI) requires access to all available sources of data. Recorded Future allows analysts to observe structured data on the open, deep, and dark web.

Join & Write a Comment

Having an SQL database can be a big investment for a small company. Hardware, setup and of course, the price of software all add up to a big bill that some companies may not be able to absorb.  Luckily, there is a free version SQL Express, but does …
International Data Corporation (IDC) prognosticates that before the current the year gets over disbursing on IT framework products to be sent in cloud environs will be $37.1B.
Familiarize people with the process of utilizing SQL Server functions from within Microsoft Access. Microsoft Access is a very powerful client/server development tool. One of the SQL Server objects that you can interact with from within Microsoft Ac…
Viewers will learn how to use the UPDATE and DELETE statements to change or remove existing data from their tables. Make a table: Update a specific column given a specific row using the UPDATE statement: Remove a set of values using the DELETE s…

772 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

11 Experts available now in Live!

Get 1:1 Help Now