Link to home
Start Free TrialLog in
Avatar of kellyclu
kellyclu

asked on

Need Data Model for 200 million csv rows

Hi all, I'm working on an app to retrieve records from a monstrous data store.  What I need is a data model for how to store this data, preferablly in sql server 2005, which is really not my area of expertise.  

I have a comma delimited text file that documents web page activity with rows that contain a timestamp and data regarding the action taken, user id...

Fields are timestamp (when the action occurred), application (application requested), user id, op type (command/show/null), instruction (command being executed or page requested)

timestamp            
2007-01-01 12:00:00
2007-01-02 12:01:00
...

application
1384438
1483838
...
 
user
238
456
...

optype
C
S
...

instruction
sf
default-navigation


So file looks like:
2007-01-01 12:00:00,1384438,238,C,sf
2007-01-02 12:01:00,1483838,456,S,default-navigation



There are 200 million rows of data in the file, this is from one year. This is going to be ongoing, so next year, the data will be 400 million rows total, and so on.

I need a way (that scales) to store all this data so I can efficiently request large and small scale inquiries, such as the following:

What's the average amount of time a user takes to complete application xyz.

is there a normal path a user takes to view page xyz.

what do users typically do after performing instruction xyz.

on average, for all users that have submitted instruction xyz, what set of instructions did they execute before, and in what order.

display the data chronologically


from an application standpoint, i'm thinking i'll have a window that gives users options pertaining to the item selected, (timestamp - avg, accumulative, difference; optype (c/s/none) by using a datareader (forward only/read only (firehose cursor)).

I'm really open to any ideas here.

Thanks in advanced.
Avatar of Racim BOUDJAKDJI
Racim BOUDJAKDJI
Flag of Algeria image

I suggest you first import data using bulk insert (or DTS/SSIS) *as it is* as far as schema is concerned then build views based on a design effort.  In other words, I suggest separating the ETL process from any design effort.

Hope htis helps...
Yeah and make sure you data model is normalized.  normalization = database design optimization

Hope this helps...
ASKER CERTIFIED SOLUTION
Avatar of Ken Selvia
Ken Selvia
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial