Hello - I'm a program manager for a software dev company that deals with very data-intensive software. We import massive amounts of data daily via an NT service from a CSV file.
Typically, nothing ever changes in the file structure from day to day. However, every once in a while, an extra column gets added accidentally, or a new IT guy will put his hands on it and screw things up.
When this does occur, it creates a trickle-down effect that results in substantial database issues. These issues can easily take our support staff anywhere from 30 minutes to an entire day (or more) to resolve.
I know that safeguards can be put in place to help validate the consistency of the data. What are some of the best practices and techniques used to do something like this?
I know that we could look at column data and compare it against the last successful import for things like "are they both numeric columns?", etc. That type of thing isn't foolproof, though: if position 3 is a zip code and position 4 is a numeric CustomerID field, what happens when a new column gets inserted at the front of the data? A numeric-only check won't cut it.

Beyond that, I'm drawing a blank. There has to be a much more proven and tested approach out there than just that. Keep in mind, this data can be ordered in any manner depending on the setup of the export file. A few examples of the fields that could be included: CustomerID, FirstName, LastName, Address, City, State, Zip, Phone, etc.
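To make my current thinking concrete, here's a rough sketch of the kind of pre-import check I have in mind (in Python just for illustration; our actual service isn't Python). It compares the incoming file against a snapshot saved from the last successful import, checking column count and headers, plus simple per-column patterns on a sample of rows. The file name, patterns, and the assumption that the file has a header row are all mine; if there's no header row, the same idea would apply to a stored column-count and per-position pattern snapshot. I'm sure there's something more robust, which is what I'm asking about.

```python
import csv
import json
import re
import sys

# Hypothetical snapshot captured from the last successful import.
SCHEMA_PATH = "last_good_schema.json"

# Illustrative per-column value patterns; these would be tuned to the real fields.
PATTERNS = {
    "CustomerID": re.compile(r"^\d+$"),
    "Zip":        re.compile(r"^\d{5}(-\d{4})?$"),
    "State":      re.compile(r"^[A-Z]{2}$"),
    "Phone":      re.compile(r"^[\d\-\(\) \.]{7,}$"),
}

def validate(csv_path, sample_rows=100):
    """Reject the file before import if its structure drifts from the last good run."""
    with open(SCHEMA_PATH) as f:
        expected_headers = json.load(f)["headers"]

    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        headers = next(reader)

        # 1. Column count and names must match the last successful import exactly.
        if headers != expected_headers:
            return False, f"Header mismatch: expected {expected_headers}, got {headers}"

        # 2. Spot-check the first N data rows against per-column patterns.
        for i, row in enumerate(reader):
            if i >= sample_rows:
                break
            if len(row) != len(headers):
                return False, f"Row {i + 2} has {len(row)} fields, expected {len(headers)}"
            for name, value in zip(headers, row):
                pattern = PATTERNS.get(name)
                if pattern and value and not pattern.match(value):
                    return False, f"Row {i + 2}, column '{name}': unexpected value '{value}'"

    return True, "OK"

if __name__ == "__main__":
    ok, message = validate(sys.argv[1])
    print(message)
    sys.exit(0 if ok else 1)
```

The idea would be to run something like this as a gate before the service touches the database, so a bad file fails loudly up front instead of trickling into the tables.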
If anyone could provide ANY insight or suggestions on this, it would be greatly appreciated! We lose hundreds of hours of labor yearly because of it.
I know it's a very difficult question since it's just string comparison, but someone has got to have come up with a process to help ensure data consistency (or at least detect when something has gone haywire).