mcs26

asked on

Microsoft Access - How to partition & index a table

Hi,

I have a large Access database (2003) which contains approx 3,500,000 records with 11 fields in one table. It contains one-minute data for the currency pair EURUSD. It will obviously continue to get bigger.

When I query the data for a certain period of time, say 3 months of last year, and order it, it can take quite a while to retrieve the data, which is loaded into a chart in an ASP.NET web application with C# as the scripting language.

I have been told that indexing & partitioning could improve the performance. I know how to index, but partitioning I do not know. Is there a limit to how many times you can partition a table? Can you partition on dates? Also, would I need to change my query if I partition my table?

Thanks
ASKER CERTIFIED SOLUTION
Scott McDaniel (EE MVE)

This solution is only available to members of Experts Exchange.
mcs26

ASKER

Hi LSM,

Thanks for reply.

What would you suggest as a more robust database platform? Also, how much would this cost, and performance-wise would it have a big impact compared to an Access database?

So partitioning would simply be breaking my table into, say, quarters, i.e. 2002 Q1, 2002 Q2 ... 2011 Q3?

Thanks
SQL Server Express is free, and provides much of the functionality of the full versions.

MySQL is also free, and is used for some massive data storage needs.

There are many others out there, but in your case, I'd suggest SSE. It works very well with Access. It is impossible to say what sort of impact it will have, but if you're dealing with 3.5 million rows, then I would strongly suspect it would have a very positive impact. Of course, a lot depends on how the system is set up, and how you manage the data (i.e. indexing, etc.), but in general, as you get into larger datasets, the server-type database engines will provide better all-around performance.

Again, I'm making assumptions as to what you mean by "partitioning", but generally speaking, yes - you would "partition" your data so that it can be queried more quickly. There are different types of "partitioning".

Some refer to moving specific rows to specific tables as "horizontal partitioning". The example I gave earlier would be horizontal partitioning. Another example would be zip/postal codes. In the US, zip codes consist of 5 digits (plus 4 others, if you use "zip+4" notation). The first of those 5 digits represents a group of states. For example, the digit 1 is assigned to Delaware, New York and Pennsylvania. If you need to sort and query for regional zip codes, you might break out all customers whose ZipCode starts with a 1 to a table named "Zips_DE_NY_PA", or simply "ZIPS1". You'd then further break down the data based on that first digit, and you'd end up with 10 distinct tables (0 through 9), and you could then query your data as needed, without querying the single HUGE table.
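That first-digit scheme can be sketched quickly. This is a minimal illustration in Python, using SQLite purely as a stand-in for Jet; the table and column names are invented for the example:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# One partition table per leading zip digit, instead of one huge Customers table.
for digit in range(10):
    con.execute(f"CREATE TABLE Zips{digit} (CustomerID INTEGER, ZipCode TEXT)")

def insert_customer(cust_id, zip_code):
    # Route each row to its partition based on the first digit of the zip.
    con.execute(f"INSERT INTO Zips{zip_code[0]} VALUES (?, ?)", (cust_id, zip_code))

insert_customer(1, "19901")   # Delaware -> Zips1
insert_customer(2, "90210")   # California -> Zips9

# A regional query now only scans the one small partition it needs.
rows = con.execute("SELECT * FROM Zips1").fetchall()
```

The win is that a regional query touches roughly a tenth of the data; the cost is that any query spanning regions now has to UNION the partitions back together.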

There's also "vertical" partitioning, where you keep all the "rows" in the same table, but you move one or more columns to separate tables. For example, if you need to query customers by town, county and income level, you might "vertically partition" the data to include only those columns - in other words, you'd build a new table and import the relevant columns into it.
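A comparable sketch of the vertical case, again Python/SQLite with made-up names - the narrow copy holds only the columns the common query needs:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Wide original table: many columns, most irrelevant to the common query.
con.execute("""CREATE TABLE Customers (
    ID INTEGER PRIMARY KEY, Name TEXT, Address TEXT, Phone TEXT,
    Town TEXT, County TEXT, IncomeLevel TEXT)""")
con.execute("INSERT INTO Customers VALUES "
            "(1, 'Ann', '1 Main St', '555-0100', 'Dover', 'Kent', 'High')")

# Vertical partition: copy only the queried columns into a narrow table.
con.execute("""CREATE TABLE Customers_Demo AS
               SELECT ID, Town, County, IncomeLevel FROM Customers""")

row = con.execute(
    "SELECT Town, IncomeLevel FROM Customers_Demo WHERE County = 'Kent'"
).fetchone()
```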

It's difficult to say what's best for you (or if a combination of both would be best), since I have no knowledge of your data, and don't really know what the true bottleneck might be.

  Move it to SQL Server.  JET's not that efficient with that large of a record set.  Until you do, compact and repair often.

  You may want to play with the JET engine setting CompactByPKey.

  When set to 0, records are copied in base table order (basically order added).  When set to 1, records are ordered by PK.  If no PK is set, base table order is used.

  Try running the DB performance analyzer to see what indexing it suggests (don't follow it blindly though).  You may also want to look at using JET SHOWPLAN to see how your queries are being costed and if indexes you have defined are actually being used.
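For anyone hunting for JET SHOWPLAN: as far as I recall it's switched on with a registry value (creating the Debug key if it doesn't exist), after which Jet writes its query plans to a SHOWPLAN.OUT file in the current directory. A sketch of the .reg fragment for Jet 4.0 - verify the key path for your Jet version:

```
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Debug]
"JETSHOWPLAN"="ON"
```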

  But really, if you expect this to grow, you need to move off JET.

Jim.

 I would also look at setting MaxBuffers to 65535, which will give you a bigger cache.  Depending on your record access pattern (sequential or random), it may or may not help.
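For reference, in the Jet 4.0 registry that value is spelled MaxBufferSize (in KB, under the engine key). A sketch of the .reg fragment, with 65535 = 0xFFFF - double-check the path on your setup:

```
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Jet 4.0]
"MaxBufferSize"=dword:0000ffff
```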

Jim.
mcs26

ASKER

Great, thanks for the advice. Like I say, it's one table with 11 fields, which are listed below. I run a basic select query, have the data ordered on "DTime", and take the average of Close_Bid & Close_Ask.

If I use SSE will it also need the tables to be partitioned? Beginning to find Access is probably not up to what I am doing & it's only going to get bigger!

Fields
DTime (DateTime)
Date (DateTime)
Time (DateTime)
Open_Bid (Double)
Open_Ask (Double)
High_Bid (Double)
High_Ask (Double)
Low_Bid (Double)
Low_Ask (Double)
Close_Bid (Double)
Close_Ask (Double)

"its only going to get bigger!"
As noted, therein lies the problem.  

FWIW ....

IF ... it was going to remain about that many records, there would be hope. I have a 'word' table with two fields: an ID field (Long Integer) from another 'product' table (~300K records) and a 'word' field - text. The word table is built (3-4x/year) from the product description field in the product table. So, if a description has 7 'words' (text separated by a space), then in the word table for that product ID, there will be 7 entries - same ID, 7 different words.

The word table has approx 3.8 million entries, but will *not* grow much beyond that. Both ID and word are indexed, and there is a combined index on both fields to prevent duplicate combinations.

Soooo ... on the product lookup 'search form', user enters a search term. The search term ('word' OR multiple words) is looked up in the word table to get the associated product ID (or IDs). Then, a query is run against the product table using the ID(s), and results are returned.
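The build-and-lookup flow described above can be sketched like this - Python with SQLite standing in for Jet/DAO, and every table/field name invented for the example:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Products (ID INTEGER PRIMARY KEY, Description TEXT)")
con.execute("CREATE TABLE Words (ID INTEGER, Word TEXT)")
# Index the word field, plus a unique combined index to block duplicate pairs.
con.execute("CREATE INDEX ix_word ON Words (Word)")
con.execute("CREATE UNIQUE INDEX ix_id_word ON Words (ID, Word)")

con.execute("INSERT INTO Products VALUES (1, 'blue widget deluxe')")
con.execute("INSERT INTO Products VALUES (2, 'red widget')")

# Rebuild step: split each description on spaces into one row per word.
for pid, desc in con.execute("SELECT ID, Description FROM Products").fetchall():
    for word in desc.split():
        con.execute("INSERT OR IGNORE INTO Words VALUES (?, ?)", (pid, word))

# Search: resolve the term to product IDs via the small indexed word table,
# then fetch the matching products by primary key.
ids = [r[0] for r in con.execute("SELECT ID FROM Words WHERE Word = ?", ("widget",))]
products = con.execute(
    f"SELECT Description FROM Products WHERE ID IN ({','.join('?' * len(ids))})", ids
).fetchall()
```

The point is that the search term only ever hits the indexed word table; the big product table is then read by primary key.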

IF ... both tables are local in an mdb ... results are returned in approx 2-3 seconds (or less in some cases). Not only that, 4 other fields in the product table are searched simultaneously with the same search term!! And any results for those are displayed in the appropriate section of the search form.  2-3 seconds ... think about it. Blazing Hot Fast - Access/VBA/Jet/DAO !!!

Now, if the product table is on a server (but word table remains local), the search is slightly slower (as reported by client).  If both word table and product table are on server, still slower, but acceptable to client.  However, this is only an interim scenario ... as client IS moving to SQL server for other reasons.  So, Access UI will remain, but ultimately tables will be on SQL Server.

And actually, I left a LOT out.  There is a LOT more going on in those 2-3 seconds.  So, it just shows what *can be done* ...

I can thank Marcus Fischer (EE's harfang   -  https://www.experts-exchange.com/M_94825.html) for the word table technique and associated querying code.

mx
<<If i use SSE will it also need the tables to be partitioned>>

There's really no way to tell, but as the first step I'd simply install SSE and move the tables to that platform, and then link those new tables back to your existing Access application. The simplest way to do that would be to build a new, blank Access database and then import everything from the old db except the tables. Once you do that, then link your new SSE-based tables to the new db.
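Whichever engine ends up holding the table, the date-range query itself leans on an index on DTime. A sketch of its shape - SQLite standing in for SSE/Jet, using the field names from the question (Date/Time columns omitted for brevity, and sample values invented):

```python
import sqlite3
from datetime import datetime, timedelta

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE EURUSD (
    DTime TEXT, Open_Bid REAL, Open_Ask REAL, High_Bid REAL, High_Ask REAL,
    Low_Bid REAL, Low_Ask REAL, Close_Bid REAL, Close_Ask REAL)""")
# The index that makes an ordered date-range scan cheap.
con.execute("CREATE INDEX ix_dtime ON EURUSD (DTime)")

# Ten one-minute bars of dummy data.
start = datetime(2011, 1, 1)
for i in range(10):
    t = (start + timedelta(minutes=i)).isoformat(sep=" ")
    con.execute("INSERT INTO EURUSD VALUES (?,?,?,?,?,?,?,?,?)",
                (t, 1.30, 1.31, 1.32, 1.33, 1.29, 1.30, 1.30, 1.32))

# The query described in the question: ordered range scan, with the mid price
# taken as the average of Close_Bid and Close_Ask.
rows = con.execute("""
    SELECT DTime, (Close_Bid + Close_Ask) / 2.0 AS Mid
    FROM EURUSD
    WHERE DTime >= '2011-01-01 00:00:00' AND DTime < '2011-01-01 00:05:00'
    ORDER BY DTime""").fetchall()
```

With the index in place, the WHERE range and the ORDER BY are both satisfied by one index scan instead of a full-table scan plus sort.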
@MX,

<<And actually, I left a LOT out.  There is a LOT more going on in those 2-3 seconds.  So, it just shows what *can be done* ... >>

  I would disagree.  You have a very unique situation which is basically read-only, and my guess would be that you're using Seek on an index, which of course will be extremely fast, as it's a search on a B-tree index.

  But the minute you try to do anything else with a table that size with JET, you're going to have problems.

  For example, why do you only build it 3-4x a year?  Bet you it's too time-consuming of a process.  You probably build the table and then create the index(es) rather than maintaining the whole thing on the fly.

  For "normal" usage and querying on non-indexed fields, JET really can't handle anything more than 3-4K records (you're usually into the minutes range already), and once you get past a million, life is very difficult.

Jim.