Need Data Model for 200 million csv rows
Posted on 2007-11-16
Hi all, I'm working on an app to retrieve records from a monstrous data store. What I need is a data model for storing this data, preferably in SQL Server 2005, which is really not my area of expertise.
I have a comma-delimited text file that documents web page activity, with rows containing a timestamp and data about the action taken, user id...
Fields are timestamp (when the action occurred), application (application requested), user id, op type (command/show/null), instruction (command being executed or page requested)
So the file looks like:
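The sample rows didn't make it into the post, so here's a hypothetical illustration: the values are invented, and only the field order (timestamp, application, user id, op type, instruction) comes from the description above.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample rows -- all values are made up for illustration;
# only the field order comes from the post:
# timestamp, application, user id, op type, instruction
sample = """2007-03-14 09:21:05,appxyz,user123,command,submit_form
2007-03-14 09:21:12,appxyz,user123,show,confirm_page
2007-03-14 09:23:40,appxyz,user456,,login_page
"""

rows = []
for ts, app, user_id, op_type, instruction in csv.reader(io.StringIO(sample)):
    rows.append({
        "timestamp": datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"),
        "application": app,
        "user_id": user_id,
        "op_type": op_type or None,  # a null op type arrives as an empty string
        "instruction": instruction,
    })

print(len(rows))  # 3 parsed records
```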
There are 200 million rows of data in the file, and that's from one year. This is going to be ongoing, so next year the data will total 400 million rows, and so on.
I need a way (that scales) to store all this data so I can efficiently run both large- and small-scale queries, such as the following:
What's the average amount of time a user takes to complete application xyz?
Is there a normal path a user takes to view page xyz?
What do users typically do after performing instruction xyz?
On average, for all users that have submitted instruction xyz, what set of instructions did they execute before, and in what order?
Display the data chronologically.
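A minimal schema sketch for queries like these, using Python's sqlite3 as a stand-in for SQL Server 2005 (the table and column names are my assumption, and the rows are invented). The key idea is a composite index on (user_id, timestamp) so per-user, time-ordered questions don't scan the whole table; the example query takes the first question above, modeled as the gap between each user's first and last action in the application:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE activity (
    ts          TEXT NOT NULL,   -- when the action occurred
    application TEXT NOT NULL,   -- application requested
    user_id     TEXT NOT NULL,
    op_type     TEXT,            -- 'command', 'show', or NULL
    instruction TEXT NOT NULL    -- command executed or page requested
);
-- Composite index to serve per-user, time-ordered queries efficiently.
CREATE INDEX ix_activity_user_ts ON activity (user_id, ts);
""")

# Invented rows, for illustration only.
conn.executemany(
    "INSERT INTO activity VALUES (?, ?, ?, ?, ?)",
    [
        ("2007-03-14 09:21:05", "xyz", "u1", "command", "start"),
        ("2007-03-14 09:25:00", "xyz", "u1", "command", "finish"),
        ("2007-03-14 10:00:00", "xyz", "u2", "command", "start"),
        ("2007-03-14 10:10:00", "xyz", "u2", "command", "finish"),
    ],
)

# Average seconds between each user's first and last action in app 'xyz'.
avg_seconds = conn.execute("""
    SELECT AVG(span) FROM (
        SELECT (julianday(MAX(ts)) - julianday(MIN(ts))) * 86400.0 AS span
        FROM activity
        WHERE application = 'xyz'
        GROUP BY user_id
    )
""").fetchone()[0]
print(avg_seconds)  # ~417.5 (average of 235s and 600s)
```

In SQL Server the DDL would be analogous (DATETIME column, clustered index, and possibly yearly partitioning for the ongoing growth), but the access pattern is the same.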
From an application standpoint, I'm thinking I'll have a window that gives users options pertaining to the selected item (timestamp: average, cumulative, difference; op type: c/s/none), using a DataReader (forward-only/read-only, i.e. a firehose cursor).
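The forward-only/read-only DataReader idea can be sketched as follows; this uses Python's sqlite3 (names and rows are invented stand-ins), but the pattern is the same as an ADO.NET firehose cursor: iterate the cursor one row at a time instead of materializing the whole result set, which matters at 200 million rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activity (ts TEXT, user_id TEXT, instruction TEXT)")
conn.executemany(
    "INSERT INTO activity VALUES (?, ?, ?)",
    [(f"2007-01-01 00:00:{i:02d}", "u1", f"step{i}") for i in range(5)],
)

# Forward-only streaming: iterate the cursor rather than calling fetchall(),
# so only one row is held in memory at a time -- the same idea as a
# read-only/forward-only "firehose" DataReader in ADO.NET.
count = 0
for ts, user_id, instruction in conn.execute(
    "SELECT ts, user_id, instruction FROM activity ORDER BY ts"
):
    count += 1  # display or aggregate each row here, then let it go

print(count)  # 5
```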
I'm really open to any ideas here.
Thanks in advance.