Data Storage

Bearing in mind I am using VB5:

I have a data set which updates daily and consists of a number of currencies (the number could change, but is unlikely to).  For each currency there are a number of future dates, and a number of values corresponding to those dates.
The problem is that with each update the number of dates and values could change for each currency, the dates themselves will change, and the number of dates and values will not be the same for each currency even on the same update.

If you're following this then I'm impressed!

Now, I'm going to be using these figures for historic analysis so I need a good way of storing the data day by day so that it can be easily accessed/referenced in the future.

If anyone can outline a structure for this little lot I would be more than grateful as my brain is now mush!

Thanks in advance

Staplehead Commented:
ok... i still think that the 3nf design is better, although if you want to do the coding with the design with repeating groups, just bear in mind that you'll have to do the 'validity check' coding (that's what i was trying to say, dalin).

something else struck me: you seem overly concerned about the number of records you'll be generating. look at the total amount of space that each design will use, based on some reasonable baseline for the amount of data.  the flexible approach should add reasonably negligible overhead, but pay for itself in terms of flexibility: take advantage of the relational model, and only de-normalize if performance is a show-stopper!

also, with the more flexible design (and i agree with wford here, based on *painful* past experience), when whoever signs your check comes up to you with a new analysis objective, you'll kick yourself if the changes have to include more analysis time, db re-design, migration of data, and then, finally, attacking the task at hand.

Let me repeat your question back to you, to make sure I understand...

This sounds a bit like gambling on exchange rate futures:

Entries look something like this:

On date XXX, Currency YYY's data was updated to the following:
                   date Z1, value W1
                   date Z2, value W2
                   date Zn, value Wn

The DB design would be something like this:

Currencies       Transactions
----------       ------------
CID              TID
CDesc            CID
                 TransDate
                 YieldDate
                 CValue
Then, for each transaction the user creates, you add one or more rows to the Transactions table.  When pulling the data out, you will have the option of showing the value(s) assigned to a given currency over time, or the value(s) of various currencies on a given Transaction Date, etc, etc...
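For concreteness, here is a minimal sketch of that two-table layout and the kinds of queries it supports. It uses Python's sqlite3 purely for illustration (the thread is about VB5/Access, so only the SQL shape matters); the YieldDate/CValue columns follow batdan's worked example later in the thread, and the sample figures are made up.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE Currencies (CID TEXT PRIMARY KEY, CDesc TEXT);
CREATE TABLE Transactions (
    TID       INTEGER,   -- one id per curve/update
    CID       TEXT REFERENCES Currencies(CID),
    TransDate TEXT,      -- date the update was entered
    YieldDate TEXT,      -- future date on the curve
    CValue    REAL       -- value at that future date
);
""")
db.execute("INSERT INTO Currencies VALUES ('GBP', 'Sterling')")
rows = [(1, 'GBP', '1998-10-06', '1998-10-07', 4.76),
        (1, 'GBP', '1998-10-06', '1998-10-08', 4.75),
        (1, 'GBP', '1998-10-06', '1998-10-14', 4.70)]
db.executemany("INSERT INTO Transactions VALUES (?,?,?,?,?)", rows)

# one currency's curve for one update date, in date order
curve = db.execute("SELECT YieldDate, CValue FROM Transactions "
                   "WHERE CID='GBP' AND TransDate='1998-10-06' "
                   "ORDER BY YieldDate").fetchall()
print(curve)
```

Because each date/value pair is its own row, a curve with 7 points and a curve with 20 points need no schema change, just more rows.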

Well, if I understand correctly (StapleHead has just added a somewhat different proposal, as my proposal failed to go through the first time):

Use a database table: one table, 4 columns ("Fields")

A: Date that the record was entered (the date the record was inserted into the table)
B: Future date
C: Currency
D: Number

When you "update" the table, you simply insert today's new values (setting field A in each case to today's date) and leave the historical records untouched.
All of today's records can be found by selecting where A = TODAY.

Historical records can be found by searching on A plus whatever other field
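A quick sketch of that single-table idea (again in Python/sqlite3 for illustration only; the column names stand in for A-D above, and the sample figures are made up):

```python
import sqlite3

db = sqlite3.connect(":memory:")
# A = EntryDate, B = FutureDate, C = Currency, D = Value
db.execute("CREATE TABLE Curves "
           "(EntryDate TEXT, FutureDate TEXT, Currency TEXT, Value REAL)")

today = '1998-10-06'
rows = [(today, '1998-10-07', 'GBP', 4.76),
        (today, '1998-11-07', 'GBP', 4.40),
        (today, '1998-10-07', 'DEM', 5.06)]
db.executemany("INSERT INTO Curves VALUES (?,?,?,?)", rows)

# all of today's records: WHERE A = TODAY
todays = db.execute("SELECT * FROM Curves WHERE EntryDate=?",
                    (today,)).fetchall()

# historical: A plus whatever other field, e.g. one currency's history
gbp_hist = db.execute("SELECT EntryDate, FutureDate, Value FROM Curves "
                      "WHERE Currency='GBP' "
                      "ORDER BY EntryDate, FutureDate").fetchall()
print(len(todays), len(gbp_hist))
```

Each daily update is just an append, so the history builds up for free.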

Well, it is not so hard to construct an Access database and then establish a relationship, which would give you what you need.
Table 1: Currency table; give it a CurrencyID and the currency name.
Table 2: Future table; list the future periods (if it's the stock-market kind, you have one month, two months, ...) and give each an ID.

Now the main table: you will have curDate, currencyID, FutureID, Value.

Now you can retrieve the data any way you like: by curDate (your entry date), by currency, by future date, or any combination of them.

Let me know if I can be of further help.
batdanAuthor Commented:
Staplehead - I think I may need to explain further.  Since you mentioned exchange-traded futures, I'll be more specific.  One record for each currency needs to be added per day, i.e. 30 or so.  Each currency has its own curve associated with it (date/value), but the curves can be made up of any number of data points, irrespective of which currency the curve is for or which day I run the program.

So you can see that if I had 5 days data, each day having 30 currency curves, then I would end up with 150 sets of data points, which could all be different lengths!

My problem is working out how to cope with the dynamicism. (is that a word?)

Thanks for the input so far guys.

No.  You have more than 150 records (because you have a different number of future-date records per day in the main table), but each record has exactly the same length.  No missing values.

You could also give the main table more columns, with one column for each future date.  This would be preferable if you have a fixed number of future dates and they are the same for each currency, although some may not have a value.  In that case you would have to deal with empty cells.

How you design the database may also depend on how you want to retrieve the data.
umm... i think the word is 'dynamicity', but that's neither here nor there...

so, for any given day, any given currency could have any number of date/value pairs, right?

not a problem, the way i outlined the tables.

however, you seem to be asking how you'd deal with data of unknown size: i'm not sure what you'd want to do, but you'd typically grab a recordset from your DB, meeting your criteria (whether that be currency, or transaction date, or both!), and loop through all records. for example:

dim rsTrans as Recordset
dim lsAvg as Single
dim strSQL as String

strSQL = "SELECT * FROM Transactions WHERE CID=5 ORDER BY TransDate"
set rsTrans = db.OpenRecordset(strSQL)

lsAvg = 0
while not rsTrans.EOF
    lsAvg = lsAvg + rsTrans("CValue")
    rsTrans.MoveNext    'don't forget this, or the loop never ends
wend

if rsTrans.RecordCount > 0 then
    lsAvg = lsAvg / rsTrans.RecordCount
end if

so, are you asking *how* to use the data, or how to *design* the db, or how to *access* the data????

Dalin's suggestion is somewhat valid, but it's not 3NF, so keep in mind that if you implement something like that, you're stuck coding for *each* *column* in the table!!!


lsSum = rs("val1") + rs("val2") + ....

and then, you'd have to work out how many of those columns actually held valid values before you could analyze further.. yuck.
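To make that point concrete, here is roughly what the per-column handling looks like once N/A cells enter the picture (a Python sketch for illustration; the tenor names echo the wide design discussed below, and the figures are made up):

```python
# one wide-table row, one column per tenor; None stands in for an N/A cell
row = {"ON": 5.6, "Spot": 5.6, "1wk": 5.4, "F1": None, "F2": None, "Swp1": 3.6}

# every calculation must first filter out the empty columns by hand
valid = [v for v in row.values() if v is not None]
avg = sum(valid) / len(valid) if valid else 0.0
print(avg)
```

With the tall 3NF design, the rows that don't exist simply aren't there, so no such filtering is needed.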

Let me clarify a little more.  The design of the database can be affected by how you would like to look at the data.  What staplehead listed is similar to what I suggested originally, and I still think that is the right design.  However, if you will more often look at all the future values for a given currency at the same time, and you have a limited number of futures, then having each future as a column can be beneficial.  In terms of retrieving data, you pretty much set up a query once and then use it again and again.  I am not sure what Larry was talking about in "stuck coding for *each* *column*"...

Anyway,  I would be happy to help if you desire, batdan.
Regards to you all.
batdanAuthor Commented:
OK, going by my understanding, the two suggestions we have here are:

(Just for the record, I'm constructing yield curves, so the 'dynamicity' :-) stems from a) not being able to obtain futures prices in all currencies, and b) fewer short-end dates being used when the futures roll over.)  Anyway:

Staplehead - I think this is what you're saying:

tbl 1 -
CID     CDesc
ATS    Austrian
AUD   Aussie
BEF   Belgian

tbl 2
TID      YieldDate      CID        CValue        CDate
1         07/10/98        GBP      4.76            6/10/98
1         08/10/98        GBP      4.75            6/10/98
1         14/10/98        GBP      4.70            6/10/98
1         21/10/98        GBP      4.55            6/10/98
1         07/11/98        GBP      4.40            6/10/98
1         07/12/98        GBP      4.15            6/10/98
1         07/01/99        GBP      4.07            6/10/98
2         07/10/98        DEM      5.06            6/10/98

So the above data in tbl 2 would represent a single curve, and I could identify the curve by querying the table for TID = 1, or query by currency, etc.


Dalin -

tbl 1 -
CID     CDesc
ATS    Austrian
AUD   Aussie
BEF   Belgian

tbl 2                   {----------------------------------Dates-----------------------------------------}
CurveID    CID    O/N   Spot   1wk    2wk   1mth    F1    F2     F3    F4    Swp1 etc.
1             GBP   4.6    4.5      4.2    4.0     3.6      3.2    3.7    3.4   3.1   2.6
2             DEM   4.4    4.2      4.0    3.8     3.6      3.2    3.0    2.5   2.1   1.4
3             JPY    7.3    7.2      7.0    6.8     6.5      6.4    6.1    5.8   5.4   N/A
4             ATS    5.6    5.6      5.4    5.2    4.9      N/A   N/A   N/A  N/A   3.6
5             THB    6.5    6.3      6.0    5.5    5.3      N/A   N/A   N/A  N/A   N/A

So each record is a full yield curve data set, but I would have to determine which fields to use in the analysis as some curves would finish shorter than others, and sometimes the 1 mth and the 2 wk values would not be needed (before the futures roll).

hmmm, I think I can now understand how I would use either one.  Both seem to have their own advantages: Staplehead - ease of access, and Dalin - 30 recs added a day rather than 300 or so.  I don't mean to sound dumb here, but you guys may need to confirm that I have understood your suggestions accurately.

I appreciate your continued input & have more points available should you feel they are warranted.
Thanks again

I have done something similar in the past, and I would go with Staplehead's:
Firstly, the number of records entered does not matter; you could write a sweeper program that went back and averaged out each day's/week's/etc. values if the number of records became a problem. Next, it keeps your table nice and rectangular, which is important in Access, as each record uses the same memory whether its fields are empty or not. Finally, ease of access is the most important thing, and a simple structure gives you more flexibility later when someone says "wouldn't it be great to see ...."

well that went on longer than I thought, sorry, only meant to make a quick comment:)
I think you get the idea.  The advantage of the 2nd way is not so much in data entry.  As you can see, you can easily query and plot all the futures of a currency over time... This gives you a two-dimensional analysis of a currency.

    I       ###  #######
    I          ##       ####                    ###### Future1
    I                       #######       ######
    I             +++++            #######
    I      +++++++     ++++
    I                      ++++++++          +++++++++ Future2
    I                              ++++++++++
    I      $$$$$
    I           $$$$$$$$$$$$           $$$$$$$$$$$$$$$ Future3
    I                       $$$$$$$$$$$

If you don't really care to look at the data in this way, then the 1st design would be better.

One thing I need to point out: don't classify the first way as staplehead's and the 2nd way as dalin's.  My first comment suggested the same design.  I threw out the 2nd way just for you to think about, and you should definitely consider how you are going to analyze your data.


batdanAuthor Commented:
You have a point here.  The nature of the analysis plays quite an important role in this scenario.  For EACH currency, I need to look at the previous 500 curves (data sets).  Assuming that each data set has an average of 15 dates with values, and assuming 35 currencies, I will have around 260,000 records with the first suggestion, or 17,500 with the second (but many more fields) - albeit relinquishing some flexibility/ease of use.

I liked wford's idea of the sweeper program to amalgamate data, but I am not sure yet whether the (reasonably complex) analysis will allow me to implement this without compromising results.

Does anyone know the max number of records I can enter into a table in Access if there were only 5 fields or so?

I think I probably have sufficient food for thought from the suggestions provided to try out both methods.  As all input has been of use and of a good standard thus far, I will award the points to Staplehead (for the first response), but I'll also post a dummy question for Dalin to answer due to some useful material.  One more question - where should I post the dummy question?

Staplehead - if you want to answer this question I can close it off.

Thanks to all

BTW - any more input still much appreciated

You can have tables up to 1 gig in size so you should be ok.

But for the sweeper program concept: each day you could go through the data and collate it into 6 records per currency (or even 1) by interpolation between the known points, so you have a standard (minimum) set of data to go on for all currencies.  You could get this fired after close of trading each day, or something like that.
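One possible shape for such a sweeper, collapsing each raw curve onto a fixed set of tenors by linear interpolation (a Python sketch; the tenor offsets and rates are made up for illustration):

```python
from bisect import bisect_left

def interpolate(curve, x):
    """Linear interpolation on a curve of (days, value) pairs sorted by days;
    flat extrapolation beyond either end."""
    days = [d for d, _ in curve]
    i = bisect_left(days, x)
    if i == 0:
        return curve[0][1]
    if i == len(curve):
        return curve[-1][1]
    (d0, v0), (d1, v1) = curve[i - 1], curve[i]
    return v0 + (v1 - v0) * (x - d0) / (d1 - d0)

# raw curve for one currency on one day: (days-to-maturity, rate) pairs
raw = [(1, 4.76), (7, 4.70), (30, 4.40), (90, 4.15), (180, 4.07)]

# standard tenors every curve gets collapsed onto (an assumed set)
tenors = [7, 30, 60, 90, 180]
standard = [(t, interpolate(raw, t)) for t in tenors]
print(standard)
```

Run after close of trading, this turns curves of any length into a fixed number of records per currency, which keeps the table rectangular whichever design is used.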

Pardon, Dalin, you are correct; I was just using batdan's appellations.
batdanAuthor Commented:
Sorry, I've been away for a while.  Cheers.   #D
Question has a verified solution.
