Manage versions in data migration project

Hi All,

What are best practices in managing versions in a data migration project ?
rediscovrAsked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

slightwv (䄆 Netminder) Commented:
Versions of what?

The more specific information you provide in your question, the more precise our responses.
rediscovrAuthor Commented:
Versions of the migration itself. After mapping, transformation- during load we might see some errors due to which changes are to be done to scripts or mapping etc. We want to store this in a snapshot/version (use database schema per version?) so we can always look back at a given phase/snapshot of the migration effort.
slightwv (䄆 Netminder) Commented:
Scripts and associated files can be tracked in any version control software.

I suppose you could also use that to document what 'schema' holds what version?

I'm not sure why you would want to track a data load for a specific run but it is your requirement...
Need More Insight Into What’s Killing Your Network

Flow data analysis from SolarWinds NetFlow Traffic Analyzer (NTA), along with Network Performance Monitor (NPM), can give you deeper visibility into your network’s traffic.

gheistCommented:
That is COMPLETELY ABSURD to keep all data in 10 copies.

*) depuplication vendors may disagree...
rediscovrAuthor Commented:
What are best practices to manage changes done to the data migration in the intermediate db. From intermediate db, data is translated and imported into target db/system. Any best practices to manage this workflow? One option is to maintain different versions of data sets in different database schemas. We would like to look at a particular stage/snapshot of the migration phase at a given point so we want to maintain different versions of data sets. Any better way to manage/track changes ?
slightwv (䄆 Netminder) Commented:
You typically have three different systems in play:
Development, Test and Production.

You normally don't have different versions of code ant any one time in any of them except development.

You create a 'release' that is placed in Test and THOROUGHLY tested.  Once it is good, it is moved into production.

I don't see where you need different copies of different runs of an ETL process.

Once something is in test, it either passes the test cases or it doesn't.  no need to store a 'bad' run.  You capture the necessary data where the code failed then reset the test environment back to baseline.
gheistCommented:
Once you copy data from one system to ather - make original read-only and keep it around until the destination copy is validated.
rediscovrAuthor Commented:
@ Slightwv-

Thanks for your input. I understand your point. However, issue is that we are dealing with constantly changing source (design and data). So it is adding a lot to complexity. After a bad run in dev instance, we want to track changes being done to mapping,  translation and especially changes to 'data'.
slightwv (䄆 Netminder) Commented:
I might look at what you suggested but with a twist:
A new schema per developer.

You can use synonyms to control what objects are in the 'master' schema versus the local tested versions.

For example:
Create a master schema that is a 'copy' of production.  NEVER touch or modify anything in there.
The master schema has (very simplified example):
 two tables: MyTable and AnotherTable.
 one function:  MyFunc
 two procedures:  Proc1 and Proc2.



DeveloperA is making changes to the function MyFunc and table MyTable.
Have the lead person set up a schema DeveloperA:
create synonym DeveloperA.AnotherTable for MasterSchema.AnotherTable;
create synonym DeveloperA.Proc1 for MasterSchema.Proc1;
create synonym DeveloperA.Proc2 for MasterSchema.Proc2;

Then DeveloperA can create personal local copies of MyFunc and MyTable to make and test everything.
If developerA messes it up, nothing in the Master schema has been touched and everything else should function as normal.


Depending on how many developers you have, I might look at each developer getting an entire development database.

You clone a database from the baseline backups:  Do your development and when done, just drop that version.  Mess it up, drop and clone again from baseline.

As things move into production, just create a new baseline backup for future development.

I think this is cleaner but requires more server resources.

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
rediscovrAuthor Commented:
@slightwv - thank you
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Oracle Database

From novice to tech pro — start learning today.