Avatar of Paula DiTallo
Paula DiTallo
Flag for United States of America asked on

staging model: How do I retain integrity and assure uniqueness?

Techies--
 
I need to model a staging database that is used to process incoming xml files from one vendor and api data from another.  The incoming records per load can contain all unique header/detail_line/sub_detail_line data, the same record again with updates to elements, or the same record with an added/removed detail_line/sub_detail_line. I do not, or will not know which.

How do model this? Do I use something like an added_row_timedate column to capture the uniqueness of an instance of a record/data object along with the obvious keys?  Do I create concatenated primary keys that include the added_row_timedate column? Do I not do that and generate synthetic/sequential identifier keys? Do I bother with keys in the staging environment at all, or do I enforce integrity through unique constraints?

The code snippet has a sample XML stream and the obvious portion of the model. Please advise me.


--<AllData>
--<DataObj>
--  <HeaderObj>
--     <hdr_id>1000</hdr_id>
--     <hdr_elem_2>my header element</hdr_elem_2>
--     <DetailObj>
--        <dtl_id>10</dtl_id>
--        <dtl_elem_2>my detail line element</dtl_elem_2>
--        <SubDetailObj>
--          <sub_id>1</sub_id>
--          <sub_elem_2> my subdetail line element</sub_elem_2>
--        </SubDetailObj>
--     </DetailObj>
--  </HeaderObj>
--</DataObj>
--<DataObj>
--  <HeaderObj>
--     <hdr_id>1010</hdr_id>
--     <hdr_elem_2>my other header element</hdr_elem_2>
--     <DetailObj>
--        <dtl_id>10</dtl_id>
--        <dtl_elem_2>my other detail line element</dtl_elem_2>
--        <SubDetailObj>
--          <sub_id>1</sub_id>
--          <sub_elem_2> my other subdetail line element</sub_elem_2>
--        </SubDetailObj>
--     </DetailObj>
--  </HeaderObj>
--</DataObj>
--<DataObj>
--  <HeaderObj>
--     <hdr_id>1000</hdr_id>
--     <hdr_elem_2>my header element</hdr_elem_2>
--     <DetailObj>
--        <dtl_id>10</dtl_id>
--        <dtl_elem_2>my detail line element</dtl_elem_2>
--        <SubDetailObj>
--          <sub_id>1</sub_id>
--          <sub_elem_2> my subdetail line element</sub_elem_2>
--        </SubDetailObj>
--     </DetailObj>
--        <dtl_id>20</dtl_id>
--        <dtl_elem_2>my new detail line element</dtl_elem_2>
--     </DetailObj>
--         <SubDetailObj>
--         </SubDetailObj>
--  </HeaderObj>
--</DataObj>
--</AllData>

create table [academic].[load_xaction_data]
( LoadId int not null,          -- sequential identifier/synthetic key
  ImportSourceId int not null,  -- coming from another table that tracks whether source is api or xml files
  ImportDate datetime not null,
  LoadBeginTime datetime not null,
  LoadEndTime datetime not null);
  
create table [academic].header_obj
 ( hdr_id int not null,                  -- retained exisiting data object key
   hdr_element_2 varchar(50) not null ); -- other header element
   
create table [academic].detail_obj
( hdr_id int not null,					-- retained existing data object key from header
  dtl_id int not null,					-- retained existing data object key from detail
  dtl_elem_2 varchar(50) not null);		-- other detail header element
  
create table [academic].sub_obj
( hdr_id int not null,				    -- same pattern
  dtl_id int not null,
  sub_id int not null,
  sub_elem_2 varchar(50) not null);

Open in new window

C#Microsoft SQL Server 2008

Avatar of undefined
Last Comment
Christopher Kile

8/22/2022 - Mon
ASKER CERTIFIED SOLUTION
Christopher Kile

Log in or sign up to see answer
Become an EE member today7-DAY FREE TRIAL
Members can start a 7-Day Free trial then enjoy unlimited access to the platform
Sign up - Free for 7 days
or
Learn why we charge membership fees
We get it - no one likes a content blocker. Take one extra minute and find out why we block content.
Not exactly the question you had in mind?
Sign up for an EE membership and get your own personalized solution. With an EE membership, you can ask unlimited troubleshooting, research, or opinion questions.
ask a question
Paula DiTallo

ASKER
cpkilekofp,
Thanks for responding. I appreciate that very much. I've engaged the Business Analyst (BA) on this project do basically reveal what the workflow expectation can be regarding what the order (aka: priority) of the dataobjects should be once stored. Answer to your first question, yes--the data elements for each data object will need to be stored whether from the consumed data of the web service in the case of one vendor, or from the xml files in the case of the other vendor.

As a first draft, I've decided to assign a synthetic key to incoming dataobjects, while retaining the associated incoming dataobject key--since this is unique, but repeatable. I carry this synthetic key with the vendor key so that I can uniquely identify each transaction as it arrives in any order, with any priority.

I'd like to keep this thread open so that as more info filters in from the BA, futher discussion on the model is possible. I will leave closure up to you.
Christopher Kile

Don't worry about me.  When you have more info, let me know.  If no one posts to this question in 20 days or so, the moderators will ask us if it's ok to delete it.
I started with Experts Exchange in 2004 and it's been a mainstay of my professional computing life since. It helped me launch a career as a programmer / Oracle data analyst
William Peck