Solved

SQL Table Data Mapping Best Practice for ingesting unknown data types?

Posted on 2014-02-04
Medium Priority
890 Views
Last Modified: 2014-02-05
Hi all -
I'm trying to figure out the best way to map significant amounts of data to usable columns/data categories.  Let me lay out an example -

I need to create, more or less, a raw data table that can be manipulated and reported against, populated from ingested Excel spreadsheets.  I don't know what the headers of these spreadsheets are (or there will be new spreadsheets with additional headers).  The first thing I'll need to do is gather the headers and store them so the users can report against them, but how should I store the actual data to make sure the queries run fast and effectively?

A traditional/very simplistic version I guess would be:

Create a Category table that stores the category name and the column ID in the 'data' table that the category coincides with.  So the data table just lists generic columns "Column1, Column2, Column3", etc., and the Category table dictates that Column1 coincides with "Product Name", Column2 with "Product Cost", and so on.  Easy enough, but the obvious downside is that if a specific spreadsheet has only 3 columns, the other 47 columns of every row (say the data table has 50) go unused.
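
Roughly, a sketch of what I mean (all table and column names are just made up for illustration):

-- Option 1 sketch: a fixed, wide raw table plus a mapping table.
CREATE TABLE RawData (
    RowID   INT IDENTITY(1,1) PRIMARY KEY,
    SheetID INT NOT NULL,               -- which spreadsheet the row came from
    Column1 NVARCHAR(255) NULL,
    Column2 NVARCHAR(255) NULL,
    Column3 NVARCHAR(255) NULL
    -- ... up to ColumnN
);

CREATE TABLE Category (
    CategoryID   INT IDENTITY(1,1) PRIMARY KEY,
    SheetID      INT NOT NULL,           -- mappings can differ per spreadsheet layout
    ColumnName   SYSNAME NOT NULL,       -- e.g. 'Column1'
    CategoryName NVARCHAR(100) NOT NULL  -- e.g. 'Product Name'
);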

A second thought I had was:
Just have a data table with three columns: the data value, the data category, and the row ID.  Then you could loop through every cell in the spreadsheet and store what category of data it should be classified as, plus the row ID, so you could recompile the row of data and how the data is related.  The downside, obviously, is that the table gets a new row for EVERY piece of data you ingest; if the spreadsheets have 100 columns, that's 100 database rows to query just for a single spreadsheet row.
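
Again just a rough sketch of that idea (names invented; this is the classic entity-attribute-value layout):

-- Option 2 sketch: one database row per spreadsheet cell.
CREATE TABLE DataPoint (
    RowID      INT NOT NULL,            -- groups cells back into a spreadsheet row
    CategoryID INT NOT NULL,            -- what kind of data the cell holds
    DataValue  NVARCHAR(255) NULL,
    PRIMARY KEY (RowID, CategoryID)
);

-- Recompiling one spreadsheet row means pivoting, e.g.:
SELECT RowID,
       MAX(CASE WHEN CategoryID = 1 THEN DataValue END) AS ProductName,
       MAX(CASE WHEN CategoryID = 2 THEN DataValue END) AS ProductCost
FROM DataPoint
GROUP BY RowID;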

------------------

Bottom line, I don't have any experience doing this.  The database could ingest thousands of rows of data each week and has to be able to run queries quickly to report against specific data points.

Any thoughts from database experts out there?  What way is best to 1) store dynamically mapped data and 2) run queries back on it as quickly as possible?

Thanks much!
Question by:milestonetech
5 Comments
 
LVL 40

Expert Comment

by:lcohan
ID: 39833557
 

Author Comment

by:milestonetech
ID: 39833566
Thanks lcohan, I can't afford to do this quite yet :(.  I need to take the best approach in SQL Server for the time being, until we expand our software side of things a bit and I can get funding for additional technology.  Thanks for the link either way!
 
LVL 16

Accepted Solution

by:
Surendra Nath earned 2000 total points
ID: 39833829
I suggest you go with option 1.

In the current database world there are two types of database storage systems:
 a) row-oriented
 b) column-oriented

Traditionally, relational databases, including SQL Server, are row-oriented, i.e. they are optimized for row-oriented operations.

More recent introductions such as Cassandra (wide-column) and MongoDB (document-oriented) sit on the other side of that divide.

As you are currently using SQL Server, I suggest you go with the row-oriented approach: your option 1.

If in the future you convert to MongoDB, Cassandra, or Hadoop, then go for the column-oriented approach: your option 2.

Now, about your concern over wasted space: as long as you populate NULLs into those columns, SQL Server won't occupy much space for the empty columns (NULLs in variable-length columns are nearly free).  And if you are still concerned about space, you can also employ row-level or page-level compression.
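
For illustration, a minimal sketch of both space-saving options, reusing the hypothetical RawData table from option 1 above (note: SQL Server does not allow SPARSE columns and row/page compression on the same table, so treat these as alternatives):

-- (a) SPARSE columns: NULLs take no storage, at a small cost for non-NULL values.
CREATE TABLE RawDataSparse (
    RowID   INT IDENTITY(1,1) PRIMARY KEY,
    Column1 NVARCHAR(255) SPARSE NULL,
    Column2 NVARCHAR(255) SPARSE NULL
    -- ... up to ColumnN
);

-- (b) Page-level compression on the plain wide table.
ALTER TABLE RawData REBUILD WITH (DATA_COMPRESSION = PAGE);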
 
LVL 66

Expert Comment

by:Jim Horn
ID: 39834057
>SQL Table Data Mapping Best Practice for ingesting unknown data types?

(1)  Use a very large stick to reach out and touch the technical contacts of the source of this data, to make all data types well defined and static.

(2)  If above is not successful, tell your employers that the source of data is not reliable, will frequently cause manual developer intervention when the format changes, and remind them of your billable rate.

(3)  Brainstorm all the ways the file could change, design a way to handle those changes by importing each entire row into a single column and manually parsing it via T-SQL (rough sketch below), and inform your employers of your estimated hours and rates.

This MAY convince them either not to do it, or that it would be much cheaper to force the source of data to be well defined and static.
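
A rough sketch of what (3) could look like (all names hypothetical; the parse obviously depends on the actual delimiter):

-- Land each raw line in a single wide column...
CREATE TABLE StagingRaw (
    LineID  INT IDENTITY(1,1) PRIMARY KEY,
    RawLine NVARCHAR(MAX) NULL
);

-- ...then parse it out with T-SQL, e.g. for a comma-delimited line with two fields:
SELECT LineID,
       LEFT(RawLine, CHARINDEX(',', RawLine) - 1)            AS Field1,
       SUBSTRING(RawLine, CHARINDEX(',', RawLine) + 1, 4000) AS Field2
FROM StagingRaw
WHERE CHARINDEX(',', RawLine) > 0;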

>I don't know what the headers of these spreadsheets are (or there will be new spreadsheets with additional headers).
Excel is a great application as you can manipulate it thousands of ways.
Excel is a terrible source of data for an automated feed as you can manipulate it thousands of ways, and SSIS (I can't speak for other ETL applications) requires a 'contract' schema with a data source in order to map it properly.  

Good luck dude.
 

Author Comment

by:milestonetech
ID: 39836181
Thanks much, guys!  Surendra, your information regarding the row vs. column platforms was quite helpful.  I'll stick with option 1.  Thanks for your insight as well, Jim!
