Looking for a SQL Term

I have several databases for the same program, and I am using an ETL program to move their data into a brand new database.

Several of the entities (Items, Projects, Employees) have duplicate IDs or keys across the databases, so we are going to have to recode/change/update them so that there are no duplicates in the final database. Many to one.

My question is simply: what is this process called? A co-worker is calling it normalization, but from what I've read, that doesn't seem like the right term. Does anyone know what this procedure is called?

Thanks so much!
red_75116 Asked:
Lowfatspread Commented:
Not ugly; it's a standard procedure to ensure data independence in a "warehouse".

You don't change the source systems; it's the new system that maintains the links between old and new...

Whether that's in the staging area or the main database is up to you. You probably do want to maintain the existing source keys as a source reference (source system / source internal reference) on the data.

The design of the web-based database ideally shouldn't be just a copy of one of the source systems, but rather a specific design for its requirements... it presumably will have its own lifecycle, independent of the source systems.

You haven't indicated whether it's supposed to be historical (a principal driver in the requirement to implement surrogate keys) or some sort of snapshot of the data (surrogate keys are required to ensure data integrity across the source systems).

I am assuming that you are mapping (conforming) the actual attributes of each source system into a set of "common" codes for the new web app?

Normalisation definitely isn't the correct term... (if I'm understanding you correctly).
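
To make that source-reference idea concrete, the target table might look something like this; a minimal sketch with made-up names, assuming SQL Server-style syntax:

-- Hypothetical target table: the new database hands out its own key,
-- while the original key and its source system are kept for reference.
CREATE TABLE Customers (
    CustomerKey  INT IDENTITY(1,1) PRIMARY KEY,   -- new internal key
    SourceSystem VARCHAR(10)  NOT NULL,           -- e.g. 'DIV_A' or 'DIV_B'
    SourceId     VARCHAR(20)  NOT NULL,           -- key from the source database
    Name         VARCHAR(100) NOT NULL,
    CONSTRAINT UQ_Customers_Source UNIQUE (SourceSystem, SourceId)
);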
 
_-W-_ Commented:
SQL Join to Table
 
Pratima Pharande Commented:
This is called normalization; there are 3 normal forms.

For example, 1NF:

Eliminate duplicative columns from the same table.
Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).

See for more details:
http://databases.about.com/od/specificproducts/a/normalization.htm 
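
As a quick illustration of 1NF (the table and column names here are purely hypothetical):

-- Before (violates 1NF): repeating item columns in one table
--   Orders(OrderId, Item1, Item2, Item3)
-- After: a separate table for the repeating group of related data
CREATE TABLE Orders (
    OrderId INT PRIMARY KEY
);
CREATE TABLE OrderItems (
    OrderId INT NOT NULL REFERENCES Orders(OrderId),
    Item    VARCHAR(50) NOT NULL,
    PRIMARY KEY (OrderId, Item)  -- unique column set identifies each row
);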
Sandeepratan Commented:
If you are going to remove the duplicates by changing the data itself and not altering the tables, you can call this "data cleaning". If you are altering the tables, then it's normalisation.
 
GirardAndrew Commented:
Please assess whether you need a normalized or denormalized database. A denormalized database is typically much faster to query, but it can take up a lot of space and be difficult to update.
 
vinurajr Commented:
This is something called data integration.
 
red_75116 (Author) Commented:
Let me try to clarify my question.

Suppose I have two databases, each with a table called Customers, and both tables have a key field like ID that simply increments:

ID    |  Name
001 |  ABC Company
002 |  ACME Brick

and in the other I have:
ID    |  Name
001 |  Checkers
002 |  Home Depot

I am doing an ETL project, and the customers from both of these databases are going to get copied and maintained in another database. I can't have all these records in the new database as-is because the key field won't be unique, so we have to (clean, normalize, align, whatever the word is) rekey some of the data to prevent any duplicates. One simple solution would be to do this:

ID      |  Name
A001 |  ABC Company
A002 |  ACME Brick

===================
ID       |  Name
B001 |  Checkers
B002 |  Home Depot
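
In SQL, the copy step for that idea might look roughly like this (the database and table names are made up just to illustrate):

-- Sketch of the prefix idea above: rekey while copying so the combined
-- IDs stay unique. DivisionA, DivisionB, and NewDb are hypothetical
-- names, and ID is assumed to be a character column (e.g. '001').
INSERT INTO NewDb.dbo.Customers (ID, Name)
SELECT 'A' + ID, Name
FROM   DivisionA.dbo.Customers;

INSERT INTO NewDb.dbo.Customers (ID, Name)
SELECT 'B' + ID, Name
FROM   DivisionB.dbo.Customers;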


I am asking what the task of rekeying this data is called (clean, normalize, align, whatever the word is).

Thanks

 
Lowfatspread Commented:
I believe you are looking for the process of applying surrogate keys to your warehouse/mart...

This is the process where you supply a new set of keys to the data so that, within the database, you have independence from the source systems and can support multiple source systems, some using similar (external) keys... these are resolved by the ETL processing into the relevant new internal key for the warehouse.

It is up to your business logic in the ETL stage to determine whether you can aggregate the different source objects into existing objects already processed from other systems into the database.

You are conforming/standardising the data values to be held.

Normalisation is the process of designing the structures to hold/represent the data/relationships.
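
As a rough sketch of that ETL step (assuming an identity column on the target supplies the new key, and a hypothetical staging table holds the incoming rows; all names are illustrative):

-- The identity column on the target hands out the new internal
-- (surrogate) key; (SourceSystem, SourceId) preserves the link back to
-- the old key, so rows already loaded are not inserted twice.
INSERT INTO dbo.Customers (SourceSystem, SourceId, Name)
SELECT s.SourceSystem, s.SourceId, s.Name
FROM   staging.Customers AS s
WHERE  NOT EXISTS (
    SELECT 1
    FROM   dbo.Customers AS c
    WHERE  c.SourceSystem = s.SourceSystem
       AND c.SourceId     = s.SourceId
);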
 
Lowfatspread Commented:
It's often called/performed as part of the DATA CLEANSE phase of the ETL processing.
 
red_75116 (Author) Commented:
Lowfatspread,

Your answer sounds the closest so far, but these are existing application databases, so I don't have the ability to add new keys; I'm limited to having to change them. Kinda ugly, I know.

This isn't a warehouse or mart, but the concept is really the same. The databases are different divisions of an ERP program, and they get merged into a web-based app every 2-3 hours.

Normalization doesn't seem like the right word, but these might be.
 
red_75116 (Author) Commented:
Thanks!