rediscovr

asked on

Manage versions in data migration project

Hi All,

What are best practices for managing versions in a data migration project?
slightwv (Netminder)

Versions of what?

The more specific the information you provide in your question, the more precise our responses can be.

ASKER

Versions of the migration itself. After mapping and transformation, we might see errors during the load that require changes to the scripts or mappings. We want to store each state as a snapshot/version (perhaps one database schema per version?) so we can always look back at a given phase/snapshot of the migration effort.
Scripts and associated files can be tracked in any version control software.

I suppose you could also use that to document which 'schema' holds which version?

I'm not sure why you would want to track a data load for a specific run, but it is your requirement...
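
As a rough illustration of the point above about documenting which 'schema' holds which version: a small metadata table can map each schema to a migration version and to the version-control commit of the scripts that produced it. This is only a sketch; the table, column, and schema names are invented, and sqlite3 is used just so the example runs anywhere.

```python
# Sketch: record which schema holds which migration version, and which
# VCS commit of the mapping/transform scripts produced it.
# All names here are illustrative, not a standard.
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect("migration_meta.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS migration_versions (
        version     TEXT PRIMARY KEY,   -- e.g. 'v07'
        schema_name TEXT NOT NULL,      -- e.g. 'stage_v07'
        vcs_commit  TEXT NOT NULL,      -- commit hash of the scripts
        loaded_at   TEXT NOT NULL,
        notes       TEXT
    )
""")

def record_version(version, schema_name, vcs_commit, notes=""):
    """Register a schema snapshot so it can be traced back to the scripts that built it."""
    conn.execute(
        "INSERT INTO migration_versions VALUES (?, ?, ?, ?, ?)",
        (version, schema_name, vcs_commit,
         datetime.now(timezone.utc).isoformat(), notes),
    )
    conn.commit()

record_version("v07", "stage_v07", "a1b2c3d", "reran load after fixing address mapping")
```
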
It is COMPLETELY ABSURD to keep all the data in 10 copies.

*) deduplication vendors may disagree...
What are best practices for managing changes made to the data migration in the intermediate DB? From the intermediate DB, data is translated and imported into the target DB/system. Are there any best practices for managing this workflow? One option is to maintain different versions of the data sets in different database schemas, since we would like to be able to look at a particular stage/snapshot of the migration at a given point in time. Is there a better way to manage/track changes?
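
For reference, the schema-per-snapshot option mentioned above could look roughly like the sketch below. It assumes PostgreSQL and psycopg2; the schema and table names are placeholders, and the version string is assumed to come from trusted code because it is interpolated into SQL identifiers.

```python
# Sketch of "one schema per snapshot": copy the intermediate tables into a
# frozen, versioned schema after each significant stage of the migration.
# Assumes PostgreSQL + psycopg2; all object names are placeholders.
import psycopg2

TABLES = ["customers", "orders"]    # intermediate tables to freeze

def snapshot_schema(conn, version):
    """Copy the working 'staging' tables into a versioned schema, e.g. stage_v07."""
    snap = f"stage_{version}"       # version must come from trusted code
    with conn.cursor() as cur:
        cur.execute(f"CREATE SCHEMA {snap}")
        for table in TABLES:
            cur.execute(f"CREATE TABLE {snap}.{table} AS TABLE staging.{table}")
    conn.commit()
    return snap

conn = psycopg2.connect("dbname=migration")
print("froze snapshot:", snapshot_schema(conn, "v07"))
```
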
You typically have three different systems in play:
Development, Test and Production.

You normally don't have different versions of code at any one time in any of them except Development.

You create a 'release' that is placed in Test and THOROUGHLY tested. Once it is good, it is moved into Production.

I don't see where you need different copies of different runs of an ETL process.

Once something is in Test, it either passes the test cases or it doesn't. There is no need to store a 'bad' run. You capture the necessary data where the code failed, then reset the test environment back to baseline.
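
A minimal sketch of that capture-then-reset idea, assuming a simple file-based load: rows that fail the transform are written to a reject file together with the error, instead of keeping a whole copy of the bad run. The file names and the transform function are illustrative.

```python
# Sketch: capture only the rows (and errors) from a failed test load,
# then the test environment can be reset to baseline as usual.
# File names and the transform are placeholders.
import csv
import json

def transform(row):
    # Placeholder transform; assume it raises on bad data.
    row["amount"] = float(row["amount"])
    return row

def load_with_reject_capture(src_path, reject_path):
    loaded, rejected = [], 0
    with open(src_path, newline="") as src, open(reject_path, "w") as rej:
        for row in csv.DictReader(src):
            try:
                loaded.append(transform(row))
            except Exception as exc:
                # Keep just the failing row and the reason, not the whole run.
                rej.write(json.dumps({"row": row, "error": str(exc)}) + "\n")
                rejected += 1
    return loaded, rejected

rows, bad = load_with_reject_capture("extract.csv", "rejects_v07.jsonl")
print(f"loaded {len(rows)} rows, captured {bad} rejects for analysis")
```
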
Once you copy data from one system to another, make the original read-only and keep it around until the destination copy is validated.
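
One way to do that validation before retiring the read-only original is to compare row counts and a simple content hash per table between the two databases. The sketch below uses sqlite3 purely for illustration and a deliberately naive row hash; a real check would have to account for type, encoding, and ordering differences between the two systems.

```python
# Sketch: validate a copied table set by comparing row counts and an
# order-independent hash of the rows. sqlite3 is used only for illustration;
# any DB-API connection would work the same way. This check is naive.
import hashlib
import sqlite3

def table_fingerprint(conn, table):
    """Return (row count, XOR of per-row hashes) for a table."""
    count, digest = 0, 0
    for row in conn.execute(f"SELECT * FROM {table}"):
        count += 1
        digest ^= int.from_bytes(hashlib.sha256(repr(row).encode()).digest()[:8], "big")
    return count, digest

def validate_copy(src_conn, dst_conn, tables):
    return all(
        table_fingerprint(src_conn, t) == table_fingerprint(dst_conn, t)
        for t in tables
    )

src = sqlite3.connect("source.db")
dst = sqlite3.connect("target.db")
print("copy validated:", validate_copy(src, dst, ["customers", "orders"]))
```
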
@slightwv -

Thanks for your input. I understand your point. However, the issue is that we are dealing with a constantly changing source (design and data), which adds a lot of complexity. After a bad run in the dev instance, we want to track changes made to the mapping, the translation, and especially to the 'data'.
ASKER CERTIFIED SOLUTION
slightwv (Netminder)

@slightwv - thank you