How to deal with file import into database?, MSSql

Posted on 2012-03-18
Last Modified: 2012-03-21
Purely at a theoretical level, I don't know if there is a "correct" answer. I know this is really general but I would appreciate some advice.

I have a project that requires a file import on a daily basis. I'd like to optimize the database but I'm not sure what's best in this situation. I've never had to deal with a daily import of data before.

The daily list is a .csv file that is a list of vehicles with make, model, year, condition, but also includes stuff like address, city, etc. The database itself will include this imported list and mainly administration type things like: Admin, Users, Offers and then layout/display type info to facilitate searches/display, etc.

Is it worth the time to import the list, then transfer to tables like, Address, Vehicle, Users, Admin and do the foreign key, typical database structure?

Or should I just leave the list as is and only optimize the Admin stuff?

Question by:MHenry
  • 2
  • 2
LVL 42

Expert Comment

ID: 37735512
By "optimize", I think you are mean "normalize".   Optimization means designing the database so that you get good performance and/or space utilization.  Normalization means designing the database to promote data integrity.  

In some cases, a normalization and optimization are at odds; but not always.  The goal is to find a happy balance within the bounds of your stakeholder requirements.

Since the database seems to be for reporting and analysis, I don't much see the point of foreign key constraints (except possibly to identify problems in the source data).  Ditto for normalization.  You may want to consider restructuring the CSV to address specific performance issues however.  That includes decomposition as well as aggregation and indexing.

Assisted Solution

mr_nadger earned 250 total points
ID: 37736646
How much data are you importing?
Is the data is primarily for reporting or searching?
Is the data read-only, or can users amend it through whichever tool will be doing the searching?
How powerful is the server hosting your database?

As Dqmq said, you need to know what the data's for, before you can really plan the database.
Personally, I'd put in an auto-incrementing integer primary key field, if no unique field exists in the data, to help with the searching (users love a simple ID key!).

The vital step would be indexing it, e.g. make/model/year, and state/city, this would help with both searches and reporting (we recently had a very humbling experience with a Microsoft SQL designer giving us a lesson on Indexing for Dummies, so make sure you understand indexing).
After each import, re-index the data through tsql as the built-in re-indexing is a bit dubious, and make sure you run a nice dbcc checkdb every so often to make sure your database stays healthy.

Author Comment

ID: 37742866
It's about 250mb. And is actually being imported twice daily.

I was just thinking that (for example) if there's an address in the import list and an address for users the typical design would be to have an address table and do all the foreign key stuff.

But I've been thinking lately that there is no real reason to try to maneuver all that data into different tables - especially since it's being updated twice a day. It just seems to me that it's just more of an opportunity for something to go wrong.

Mainly just wanted to see if there's a compelling reason to do so...

LVL 42

Accepted Solution

dqmq earned 250 total points
ID: 37743727
>Mainly just wanted to see if there's a compelling reason to do so...

"Compelling reasons" need to come from your business case, not from our outsiders view.  The integrity of a reports/analysis database is dependent on the loading process which can be fairly tightly controlled.   Very often, the overhead of referential integrity on top of that is not cost effective.  So, in that sense, I agree with your "gut":  why add complexity risk, unless  there is a justified benefit.  Is there?

That said, I can tell you that many report/analysis imports are restructured for better alignment with the queries that are anticipated.  That's done both for performance reasons and to simplify the query development.   Your call.

Author Comment

ID: 37743772
I don't think there is a compelling reason. But that's why I'm asking here. ;)

I can move the data into more traditional tables once a customer adds the vehicle to his list. And when it's moved into the user's list I can do all the more complex queries. The imported list is mainly just a "choose from this list" kind of thing.

At least that's the way I'm thinking about it now.

Featured Post

Efficient way to get backups off site to Azure

This user guide provides instructions on how to deploy and configure both a StoneFly Scale Out NAS Enterprise Cloud Drive virtual machine and Veeam Cloud Connect in the Microsoft Azure Cloud.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

It was really hard time for me to get the understanding of Delegates in C#. I went through many websites and articles but I found them very clumsy. After going through those sites, I noted down the points in a easy way so here I am sharing that undeā€¦
Load balancing is the method of dividing the total amount of work performed by one computer between two or more computers. Its aim is to get more work done in the same amount of time, ensuring that all the users get served faster.
Via a live example combined with referencing Books Online, show some of the information that can be extracted from the Catalog Views in SQL Server.
Viewers will learn how to use the SELECT statement in SQL and will be exposed to the many uses the SELECT statement has.

790 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question