Looking for some best-practice advice on a task I'm working on. The tools I'm using are Visual Studio 2015 (C#) and SQL Server.
I need to look up the data type of every column in every table of a source database using the INFORMATION_SCHEMA.COLUMNS view, and then create each column with the same data type in a new database. In addition, there will be other columns created that don't exist in the source DB, so it's not as simple as using the generate-scripts feature in SQL Server Management Studio. My concern is that there are hundreds of tables, and querying the column data type from the source DB one column at a time while creating it in the new DB will not be the most efficient method.
I was considering using the query below to pull everything I'd need into a DataSet and use that as my lookup, but I'm curious whether there are any suggestions on how to improve this approach, and whether I'm missing any critical information in the query. We're not concerned with PKs and FKs for this project.
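Roughly what I have in mind (the column list is just my first guess at what we'd need, and the connection string is a placeholder):

```
using System.Data;
using System.Data.SqlClient;

public static class SchemaLoader
{
    // One query pulls the entire column catalog for the source database
    // into a DataSet, so no per-column round trips are needed later.
    const string SchemaQuery = @"
        SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE,
               CHARACTER_MAXIMUM_LENGTH, NUMERIC_PRECISION, NUMERIC_SCALE,
               IS_NULLABLE
        FROM INFORMATION_SCHEMA.COLUMNS;";

    public static DataSet Load(string connectionString)
    {
        var schema = new DataSet();
        using (var connection = new SqlConnection(connectionString))
        using (var adapter = new SqlDataAdapter(SchemaQuery, connection))
        {
            adapter.Fill(schema, "Columns");
        }
        return schema;
    }
}
```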
zephyr_hex (Megan)
You wouldn't need a DataTable with the info schema if you used Entity Framework. Entity Framework has the field schema. You could look up a field's data type using Reflection.
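Something like this, assuming you've generated an EF model for the source database (the Employee entity below is hypothetical):

```
using System;
using System.Reflection;

// Hypothetical EF entity generated from the source database.
public class Employee
{
    public string NFAMILYNAME { get; set; }
    public DateTime HIREDATE { get; set; }
}

public static class SchemaReflection
{
    // Returns the CLR type of a column's mapped property, e.g.
    // GetColumnType(typeof(Employee), "NFAMILYNAME") -> System.String.
    public static Type GetColumnType(Type entity, string columnName)
    {
        PropertyInfo property = entity.GetProperty(
            columnName,
            BindingFlags.Public | BindingFlags.Instance | BindingFlags.IgnoreCase);
        return property?.PropertyType;
    }
}
```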
I considered scripting the existing tables, but what we're doing is pretty complicated. We're extracting data from an HR system where every row in every table contains a column called AUDITLOG, which stores all historical changes for that row as XML. This XML contains more columns than the source database tables (no idea why; if I had to guess, they're derived fields for the HR system's internal business logic), so there isn't a true 1-to-1 mapping. In some cases these columns are named differently (e.g., DB field = "NFAMILYNAME" and XML field = "Name.FamilyName"), and there are no data type definitions in the XML file. Since we know the naming convention for these mismatched columns (the capital letters before the "." plus everything to the right of the dot), we figured searching for them in the schema by name was the best way to go.
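In code, that convention would look something like this (my own sketch; the helper name is made up):

```
using System;
using System.Linq;

public static class ColumnNameMapper
{
    // Applies the naming convention described above:
    // capital letters before the "." + everything after the dot,
    // upper-cased to match the DB column names.
    // e.g. "Name.FamilyName" -> "NFAMILYNAME"
    public static string XmlFieldToDbColumn(string xmlField)
    {
        int dot = xmlField.IndexOf('.');
        if (dot < 0)
            return xmlField.ToUpperInvariant();   // no prefix to collapse

        string prefixCapitals = new string(
            xmlField.Substring(0, dot).Where(char.IsUpper).ToArray());
        return (prefixCapitals + xmlField.Substring(dot + 1)).ToUpperInvariant();
    }
}
```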
This means for each of the 2,000 or so tables, which contain a total of about 40,000 columns, we're hitting the database about 80,000,000 times just to pull out the data type for one field at a time. My idea was to import INFORMATION_SCHEMA.COLUMNS into a C# DataTable and use that to perform the column lookups for their data types. Granted, we'll only be running this once when the project is finished, but it's still nice to have efficiency :)
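This is what I mean by using the schema table as the lookup (a sketch only; it assumes the DataSet loaded by the query above and keys columns as TABLE_NAME.COLUMN_NAME):

```
using System;
using System.Collections.Generic;
using System.Data;

public static class ColumnTypeLookup
{
    // Builds an in-memory index over the schema DataTable so each
    // data type lookup is a dictionary hit instead of a database query.
    public static Dictionary<string, string> Build(DataTable columns)
    {
        var types = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (DataRow row in columns.Rows)
        {
            string key = row["TABLE_NAME"] + "." + row["COLUMN_NAME"];
            types[key] = (string)row["DATA_TYPE"];
        }
        return types;
    }
}

// Usage: one database round trip up front, then pure in-memory lookups.
// var types = ColumnTypeLookup.Build(schema.Tables["Columns"]);
// string dataType;
// bool known = types.TryGetValue("EMPLOYEE.NFAMILYNAME", out dataType);
```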
zephyr, if I used the DataTable approach, would Entity Framework work in this case? Keep in mind the additional columns pulled from the XML need to be created in the new DB and don't exist in the source DB. Those would just be created as varchar(max), since we'd never know their true data type.
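For those, the column definition we generate would just fall back to varchar(max) whenever the lookup comes up empty; something like (again, just a sketch):

```
using System.Collections.Generic;

public static class TypeResolver
{
    // Returns the T-SQL type for a column: the source database's type when
    // the schema lookup knows it, otherwise varchar(max) for the XML-only
    // columns that have no counterpart in the source DB.
    public static string ResolveSqlType(
        Dictionary<string, string> types, string table, string column)
    {
        string dataType;
        return types.TryGetValue(table + "." + column, out dataType)
            ? dataType
            : "varchar(max)";
    }
}
```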
I always like to hear other people's suggestions for code optimization since it usually teaches me something new, so any ideas are much appreciated!