Hi all -
I'm trying to figure out a way to best map significant amounts of data to usable columns/data categories. Let me lay out an example -
I need to create more or less a raw data table that can be manipulated and reported against from ingested excel spreadsheets. I don't know what the headers of these spreadsheets are (or there will be new spreadsheets with additional headers). The first thing I'm going to need to do is gather the headers and store them so the users can report against them, but how should I best store the actual data that will make sure running the queries are fast and effective?
A traditional/very simplistic version I guess would be:
Create Category table, store name and then what column ID in the 'data' table that category coincides with. So if you have a data table and just list the columns "Column1, Column2, Column3" etc etc, and the category table dictates that Column1 coincides with "Product Name", Column2 coincides with "Product Cost", etc...easy easy, but the downside is obviously if that specific spreadsheet has only 3 columns, 47 cells of data aren't being used for each row.
A second thought I had was:
Just have a data table with 3 columns, the data value, the data category, and the row ID. Then you could loop through every cell in the spreadsheet, store what category of data it should be classified as,and the row ID so you could recompile the row of data and how the data is related. The downside, obviously, is that your table is going to have a new row for EVERY piece of data you ingest, and if the speadsheets have 100 columns, its 100 database rows to query just for a single spreadsheet row.
Bottom line, I don't have any experience doing this. The database could ingest thousands of rows of data each week and has to be able to run queries quickly to report against the specific datapoints.
Any thoughts from database experts out there? What way is best to 1) store dynamically mapped data and 2) run queries back on it as quickly as possible?