• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 252

Looking for a SQL Term

I have several databases for the same program, and I am using an ETL tool to move their data into a brand-new database.

Several of the entities (Items, Projects, Employees) have duplicate IDs or keys, and we are going to have to recode/change/update those so there are no duplicates in the final database. It's a many-to-one merge.

My question is simply: what is this process called? A co-worker is calling this normalization, but as I read about that term, it doesn't seem like the right one. Does anyone know what this procedure is called?

Thanks so much!
2 Solutions
Pratima Pharande Commented:
This is called normalization; there are 3 normal forms.

For example, 1NF:

Eliminate duplicative columns from the same table.
Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).

If you are going to remove the duplicates by changing the data itself and not altering tables, you can call this "data cleaning"; if you are altering tables, then it's normalisation.

Please assess whether you need a normalized or denormalized database. A denormalized database is typically much faster to query, but it can take up a lot of space and be difficult to update.
This is something called data integration.
red_75116 (Author) Commented:
Let me try to clarify my question

If I have two databases with tables called customers and both tables have key fields like ID where they simply increment like

ID    |  Name
001 |  ABC Company
002 |  ACME Brick

and in the other I have
ID    |  Name
001 |  Checkers
002 |  Home Depot

And I am doing an ETL project, and the customers from both of these databases are going to get copied and maintained in another database. I can't have all these records in the new database because the key field won't be unique, so we have to (clean, normalize, align, whatever the word is) and rekey some of the data to prevent any duplicates. One simple solution would be to do this:

ID      |  Name
A001 |  ABC Company
A002 |  ACME Brick

ID       |  Name
B001 |  Checkers
B002 |  Home Depot

I am asking what the task of rekeying this data is called (clean, normalize, align, whatever the word is).
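The prefix-based rekeying described above can be sketched in a few lines. This is a minimal illustration using SQLite with made-up table names (`customers_a`, `customers_b`), not the actual ERP schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers_a (id TEXT PRIMARY KEY, name TEXT);  -- division A source
    CREATE TABLE customers_b (id TEXT PRIMARY KEY, name TEXT);  -- division B source
    CREATE TABLE customers   (id TEXT PRIMARY KEY, name TEXT);  -- merged target

    INSERT INTO customers_a VALUES ('001', 'ABC Company'), ('002', 'ACME Brick');
    INSERT INTO customers_b VALUES ('001', 'Checkers'),    ('002', 'Home Depot');
""")

# prefix each source's keys so they cannot collide in the merged target
con.execute("INSERT INTO customers SELECT 'A' || id, name FROM customers_a")
con.execute("INSERT INTO customers SELECT 'B' || id, name FROM customers_b")

rows = con.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
# rows → [('A001', 'ABC Company'), ('A002', 'ACME Brick'),
#         ('B001', 'Checkers'),    ('B002', 'Home Depot')]
```

The prefix acts as a crude source-system identifier baked into the key itself; the answers below describe the more general form of this idea.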


I believe you are looking for the process of applying surrogate keys to your warehouse/mart...

This is the process where you supply a new set of keys to the data, so that within the database you have independence from the source systems and can support multiple source systems, some using similar (external) keys, which will be resolved by the ETL processing into the relevant new internal key for the warehouse.

It is up to your business logic in the ETL stage to determine whether you can aggregate the different source objects into existing objects already processed from other systems into the database.

You are conforming/standardising the data values to be held.

Normalisation is the process of designing the structures to hold/represent the data/relationships.

This is often called/performed as part of the data cleanse phase of the ETL processing.
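A common way to implement surrogate keys is a key-map table that owns the new keys and records which (source system, source key) pair each one came from. This is a hypothetical sketch in SQLite; the table and column names are illustrative, not from the original systems:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- one row per (source system, source key); the surrogate key is owned here
    CREATE TABLE customer_key_map (
        surrogate_id  INTEGER PRIMARY KEY AUTOINCREMENT,
        source_system TEXT NOT NULL,
        source_id     TEXT NOT NULL,
        UNIQUE (source_system, source_id)
    );
""")

# during ETL, each incoming source row is registered once in the map;
# INSERT OR IGNORE leaves an already-registered pair untouched
for system, src_id in [("A", "001"), ("A", "002"), ("B", "001"), ("B", "002")]:
    con.execute(
        "INSERT OR IGNORE INTO customer_key_map (source_system, source_id) VALUES (?, ?)",
        (system, src_id),
    )

# the warehouse row carries the surrogate key; the source key survives as a reference
new_key = con.execute(
    "SELECT surrogate_id FROM customer_key_map "
    "WHERE source_system = 'B' AND source_id = '001'"
).fetchone()[0]
# new_key → 3
```

The UNIQUE constraint is what lets two divisions both use customer `001` without any collision in the new database.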
red_75116 (Author) Commented:

Your answer sounds the closest so far, but there are existing application databases, so I don't have the ability to add new keys; I'm limited to having to change them. Kinda ugly, I know.

This isn't a warehouse or mart, but the concept is really the same. The databases are different divisions from an ERP program, and they're getting merged into a web-based app every 2-3 hours.

Normalization doesn't seem like the right word, but these might be.
Not ugly; it's a standard procedure to ensure data independence in a "warehouse".

You don't change the source systems; it's the new system which maintains the links between old and new...

Whether that's in the staging area or the main database is up to you. You probably do maintain the existing source keys as a source reference (source system/source internal reference) on the data.

The design of the web-based database ideally shouldn't be just a copy of one of the source systems, but rather a specific design for its requirements... it presumably will have its own lifecycle, independent of the source systems.

You haven't indicated whether it's supposed to be historical (a principal driver in the requirement to implement surrogate keys) or some sort of snapshot of the data (surrogate keys are required to ensure data integrity across the source systems).

I am assuming that you are mapping (conforming) the actual attributes of each source system into a set of "common" codes for the new web app?

Normalisation definitely isn't the correct term... (if I'm understanding you correctly)
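Since the merge runs every 2-3 hours, the key assignment needs to be idempotent: a repeat sync must find the existing mapping rather than mint a new key. A hypothetical sketch of that property, again in SQLite with invented names (`key_map`, `ERP_DIV1`, `ERP_DIV2`):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE key_map (
        new_id    INTEGER PRIMARY KEY AUTOINCREMENT,
        system    TEXT NOT NULL,
        source_id TEXT NOT NULL,
        UNIQUE (system, source_id)
    );
""")

def load(batch):
    # registering a source key is idempotent: INSERT OR IGNORE means a
    # repeat run hits the UNIQUE constraint and keeps the existing mapping
    for system, source_id in batch:
        con.execute(
            "INSERT OR IGNORE INTO key_map (system, source_id) VALUES (?, ?)",
            (system, source_id),
        )

batch = [("ERP_DIV1", "001"), ("ERP_DIV2", "001")]
load(batch)   # first sync
load(batch)   # sync again a few hours later: no new rows, keys stay stable
count = con.execute("SELECT COUNT(*) FROM key_map").fetchone()[0]
# count → 2
```

This is why the answer above says the new system, not the sources, maintains the old-to-new links: the map is consulted on every sync.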
red_75116 (Author) Commented:
Question has a verified solution.

