Link to home
Start Free TrialLog in
Avatar of Paula DiTallo
Paula DiTalloFlag for United States of America

asked on

staging model: How do I retain integrity and assure uniqueness?

Techies--
 
I need to model a staging database that is used to process incoming xml files from one vendor and api data from another.  The incoming records per load can contain all unique header/detail_line/sub_detail_line data, the same record again with updates to elements, or the same record with an added/removed detail_line/sub_detail_line. I do not, or will not know which.

How do model this? Do I use something like an added_row_timedate column to capture the uniqueness of an instance of a record/data object along with the obvious keys?  Do I create concatenated primary keys that include the added_row_timedate column? Do I not do that and generate synthetic/sequential identifier keys? Do I bother with keys in the staging environment at all, or do I enforce integrity through unique constraints?

The code snippet has a sample XML stream and the obvious portion of the model. Please advise me.


--<AllData>
--<DataObj>
--  <HeaderObj>
--     <hdr_id>1000</hdr_id>
--     <hdr_elem_2>my header element</hdr_elem_2>
--     <DetailObj>
--        <dtl_id>10</dtl_id>
--        <dtl_elem_2>my detail line element</dtl_elem_2>
--        <SubDetailObj>
--          <sub_id>1</sub_id>
--          <sub_elem_2> my subdetail line element</sub_elem_2>
--        </SubDetailObj>
--     </DetailObj>
--  </HeaderObj>
--</DataObj>
--<DataObj>
--  <HeaderObj>
--     <hdr_id>1010</hdr_id>
--     <hdr_elem_2>my other header element</hdr_elem_2>
--     <DetailObj>
--        <dtl_id>10</dtl_id>
--        <dtl_elem_2>my other detail line element</dtl_elem_2>
--        <SubDetailObj>
--          <sub_id>1</sub_id>
--          <sub_elem_2> my other subdetail line element</sub_elem_2>
--        </SubDetailObj>
--     </DetailObj>
--  </HeaderObj>
--</DataObj>
--<DataObj>
--  <HeaderObj>
--     <hdr_id>1000</hdr_id>
--     <hdr_elem_2>my header element</hdr_elem_2>
--     <DetailObj>
--        <dtl_id>10</dtl_id>
--        <dtl_elem_2>my detail line element</dtl_elem_2>
--        <SubDetailObj>
--          <sub_id>1</sub_id>
--          <sub_elem_2> my subdetail line element</sub_elem_2>
--        </SubDetailObj>
--     </DetailObj>
--        <dtl_id>20</dtl_id>
--        <dtl_elem_2>my new detail line element</dtl_elem_2>
--     </DetailObj>
--         <SubDetailObj>
--         </SubDetailObj>
--  </HeaderObj>
--</DataObj>
--</AllData>

create table [academic].[load_xaction_data]
( LoadId int not null,          -- sequential identifier/synthetic key
  ImportSourceId int not null,  -- coming from another table that tracks whether source is api or xml files
  ImportDate datetime not null,
  LoadBeginTime datetime not null,
  LoadEndTime datetime not null);
  
create table [academic].header_obj
 ( hdr_id int not null,                  -- retained exisiting data object key
   hdr_element_2 varchar(50) not null ); -- other header element
   
create table [academic].detail_obj
( hdr_id int not null,					-- retained existing data object key from header
  dtl_id int not null,					-- retained existing data object key from detail
  dtl_elem_2 varchar(50) not null);		-- other detail header element
  
create table [academic].sub_obj
( hdr_id int not null,				    -- same pattern
  dtl_id int not null,
  sub_id int not null,
  sub_elem_2 varchar(50) not null);

Open in new window

ASKER CERTIFIED SOLUTION
Avatar of Christopher Kile
Christopher Kile
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of Paula DiTallo

ASKER

cpkilekofp,
Thanks for responding. I appreciate that very much. I've engaged the Business Analyst (BA) on this project do basically reveal what the workflow expectation can be regarding what the order (aka: priority) of the dataobjects should be once stored. Answer to your first question, yes--the data elements for each data object will need to be stored whether from the consumed data of the web service in the case of one vendor, or from the xml files in the case of the other vendor.

As a first draft, I've decided to assign a synthetic key to incoming dataobjects, while retaining the associated incoming dataobject key--since this is unique, but repeatable. I carry this synthetic key with the vendor key so that I can uniquely identify each transaction as it arrives in any order, with any priority.

I'd like to keep this thread open so that as more info filters in from the BA, futher discussion on the model is possible. I will leave closure up to you.
Don't worry about me.  When you have more info, let me know.  If no one posts to this question in 20 days or so, the moderators will ask us if it's ok to delete it.