rediscovr

asked on

Manage versions in data migration project

Hi All,

What are best practices for managing versions in a data migration project?
slightwv (Netminder)

Versions of what?

The more specific the information you provide in your question, the more precise our responses can be.

ASKER

Versions of the migration itself. After mapping and transformation, we might see errors during the load that require changes to the scripts or mappings. We want to store each state as a snapshot/version (perhaps one database schema per version?) so we can always look back at a given phase/snapshot of the migration effort.
Scripts and associated files can be tracked in any version control software.

I suppose you could also use that to document which 'schema' holds which version?

I'm not sure why you would want to track a data load for a specific run, but it is your requirement...
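
As a rough illustration of the point above about documenting which 'schema' holds which version: a small metadata table can map each schema to a migration version and to the version-control commit of the scripts that produced it. This is only a sketch; the table, column, and schema names are invented, and sqlite3 is used just so the example runs anywhere.

```python
# Sketch: record which schema holds which migration version, and which
# VCS commit of the mapping/transform scripts produced it.
# All names here are illustrative, not a standard.
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect("migration_meta.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS migration_versions (
        version     TEXT PRIMARY KEY,   -- e.g. 'v07'
        schema_name TEXT NOT NULL,      -- e.g. 'stage_v07'
        vcs_commit  TEXT NOT NULL,      -- commit hash of the scripts
        loaded_at   TEXT NOT NULL,
        notes       TEXT
    )
""")

def record_version(version, schema_name, vcs_commit, notes=""):
    """Register a schema snapshot so it can be traced back to the scripts that built it."""
    conn.execute(
        "INSERT INTO migration_versions VALUES (?, ?, ?, ?, ?)",
        (version, schema_name, vcs_commit,
         datetime.now(timezone.utc).isoformat(), notes),
    )
    conn.commit()

record_version("v07", "stage_v07", "a1b2c3d", "reran load after fixing address mapping")
```
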
It is COMPLETELY ABSURD to keep all the data in 10 copies.

*) deduplication vendors may disagree...
What are best practices for managing changes made to the data migration in the intermediate DB? From the intermediate DB, data is translated and imported into the target DB/system. Are there any best practices for managing this workflow? One option is to maintain different versions of the data sets in different database schemas, since we would like to be able to look at a particular stage/snapshot of the migration at a given point in time. Is there a better way to manage/track changes?
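
For reference, the schema-per-snapshot option mentioned above could look roughly like the sketch below. It assumes PostgreSQL and psycopg2; the schema and table names are placeholders, and the version string is assumed to come from trusted code because it is interpolated into SQL identifiers.

```python
# Sketch of "one schema per snapshot": copy the intermediate tables into a
# frozen, versioned schema after each significant stage of the migration.
# Assumes PostgreSQL + psycopg2; all object names are placeholders.
import psycopg2

TABLES = ["customers", "orders"]    # intermediate tables to freeze

def snapshot_schema(conn, version):
    """Copy the working 'staging' tables into a versioned schema, e.g. stage_v07."""
    snap = f"stage_{version}"       # version must come from trusted code
    with conn.cursor() as cur:
        cur.execute(f"CREATE SCHEMA {snap}")
        for table in TABLES:
            cur.execute(f"CREATE TABLE {snap}.{table} AS TABLE staging.{table}")
    conn.commit()
    return snap

conn = psycopg2.connect("dbname=migration")
print("froze snapshot:", snapshot_schema(conn, "v07"))
```
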
You typically have three different systems in play:
Development, Test and Production.

You normally don't have different versions of code at any one time in any of them except Development.

You create a 'release' that is placed in Test and THOROUGHLY tested. Once it is good, it is moved into Production.

I don't see where you need different copies of different runs of an ETL process.

Once something is in Test, it either passes the test cases or it doesn't. There is no need to store a 'bad' run. You capture the necessary data where the code failed, then reset the test environment back to baseline.
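
A minimal sketch of that capture-then-reset idea, assuming a simple file-based load: rows that fail the transform are written to a reject file together with the error, instead of keeping a whole copy of the bad run. The file names and the transform function are illustrative.

```python
# Sketch: capture only the rows (and errors) from a failed test load,
# then the test environment can be reset to baseline as usual.
# File names and the transform are placeholders.
import csv
import json

def transform(row):
    # Placeholder transform; assume it raises on bad data.
    row["amount"] = float(row["amount"])
    return row

def load_with_reject_capture(src_path, reject_path):
    loaded, rejected = [], 0
    with open(src_path, newline="") as src, open(reject_path, "w") as rej:
        for row in csv.DictReader(src):
            try:
                loaded.append(transform(row))
            except Exception as exc:
                # Keep just the failing row and the reason, not the whole run.
                rej.write(json.dumps({"row": row, "error": str(exc)}) + "\n")
                rejected += 1
    return loaded, rejected

rows, bad = load_with_reject_capture("extract.csv", "rejects_v07.jsonl")
print(f"loaded {len(rows)} rows, captured {bad} rejects for analysis")
```
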
Once you copy data from one system to another, make the original read-only and keep it around until the destination copy is validated.
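
One way to do that validation before retiring the read-only original is to compare row counts and a simple content hash per table between the two databases. The sketch below uses sqlite3 purely for illustration and a deliberately naive row hash; a real check would have to account for type, encoding, and ordering differences between the two systems.

```python
# Sketch: validate a copied table set by comparing row counts and an
# order-independent hash of the rows. sqlite3 is used only for illustration;
# any DB-API connection would work the same way. This check is naive.
import hashlib
import sqlite3

def table_fingerprint(conn, table):
    """Return (row count, XOR of per-row hashes) for a table."""
    count, digest = 0, 0
    for row in conn.execute(f"SELECT * FROM {table}"):
        count += 1
        digest ^= int.from_bytes(hashlib.sha256(repr(row).encode()).digest()[:8], "big")
    return count, digest

def validate_copy(src_conn, dst_conn, tables):
    return all(
        table_fingerprint(src_conn, t) == table_fingerprint(dst_conn, t)
        for t in tables
    )

src = sqlite3.connect("source.db")
dst = sqlite3.connect("target.db")
print("copy validated:", validate_copy(src, dst, ["customers", "orders"]))
```
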
@slightwv -

Thanks for your input. I understand your point. However, the issue is that we are dealing with a constantly changing source (design and data), which adds a lot of complexity. After a bad run in the dev instance, we want to track changes made to the mapping, the translation, and especially to the 'data'.
ASKER CERTIFIED SOLUTION
slightwv (Netminder)

@slightwv - thank you