Microsoft Access - How to partition & index a table


I have a large Access database (2003) which contains approx 3,500,000 records with 11 fields in one table. It contains one-minute data for the currency pair EURUSD. It will obviously continue to get bigger.

When I query the data for a certain period of time, say three months of last year, and order it, it can take quite a while to retrieve the data, which is uploaded into a chart in an ASP.NET web application with C# as the scripting language.

I have been told that indexing & partitioning could improve the performance. I know how to index, but partitioning I do not know. Is there a limit to how many times you can partition a table? Can you partition on dates? Also, would I need to change my query if I partition my table?


Scott McDaniel (Microsoft Access MVP - EE MVE )Infotrakker SoftwareCommented:
I would assume that by "partitioning" you mean to break the data up into chunks and put those "chunks" into their own tables? For example, you might have data from 2010 in one table, data from 2011 in another, and so on.

If so, then I'd say you'd need to examine what you intend to do with the data and make choices based on that. For example, if your intent is to compare monthly data (i.e. compare June to July), then it might make sense to partition the data into tables which hold that sort of data. If you intend to compare quarterly data, then you'd organize the data to reflect that.
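To make the idea concrete, here is a minimal sketch of quarter-based horizontal partitioning, written against SQLite rather than Access (the SQL is close, but not identical); the table and column names are invented for illustration:

```python
import sqlite3

# Hypothetical example: one-minute EURUSD data split into quarterly tables.
con = sqlite3.connect(":memory:")
cur = con.cursor()
for name in ("EURUSD_2010_Q1", "EURUSD_2010_Q2"):
    cur.execute(f"CREATE TABLE {name} (DTime TEXT, Close_Bid REAL, Close_Ask REAL)")

cur.execute("INSERT INTO EURUSD_2010_Q1 VALUES ('2010-01-04 09:00', 1.4410, 1.4412)")
cur.execute("INSERT INTO EURUSD_2010_Q2 VALUES ('2010-04-05 09:00', 1.3480, 1.3482)")

# Query a single quarter directly -- the point of partitioning is that a
# three-month query only ever touches one small table:
q1_count = cur.execute("SELECT COUNT(*) FROM EURUSD_2010_Q1").fetchone()[0]

# When a date range spans partitions, recombine them with UNION ALL:
rows = cur.execute("""
    SELECT DTime, Close_Bid FROM EURUSD_2010_Q1
    UNION ALL
    SELECT DTime, Close_Bid FROM EURUSD_2010_Q2
    ORDER BY DTime
""").fetchall()
print(q1_count, len(rows))  # 1 2
```

Note the trade-off: queries that cross partition boundaries need the UNION ALL, which is exactly the kind of query change (and UI change) mentioned above.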

The drawback to this is that you often need to modify your UI to reflect the new structure. Sometimes that's a minor change, but quite often it is a major one.

You might also consider moving to a more robust database platform, since you'll rapidly find yourself approaching the limits of Access.

mcs26Author Commented:

Thanks for reply.

What would you suggest as a more robust database platform? Also, how much would this cost, and performance-wise would it have a big impact compared to an Access database?

So partitioning would simply be breaking my table into, say, quarters, i.e. 2002 Q1, 2002 Q2 ... 2011 Q3?

Scott McDaniel (Microsoft Access MVP - EE MVE )Infotrakker SoftwareCommented:
SQL Server Express is free, and provides you much of the functionality of the full versions.

MySQL is also free, and is used for some massive data storage needs.

There are many others out there, but in your case, I'd suggest SSE. It works very well with Access. It is impossible to say what sort of impact it will have, but if you're dealing with 3.5 million rows, then I would strongly suspect it would have a very positive impact. Of course, a lot depends on how the system is set up and how you manage the data (i.e. indexing, etc.), but in general, as you get into larger datasets, the server-type database engines will provide better all-around performance.

Again, I'm making assumptions as to what you mean by "partitioning", but generally speaking, yes - you would "partition" your data so that it can be queried more quickly. There are different types of "partitioning".

Some refer to moving specific rows to specific tables as "horizontal partitioning". The example I gave earlier would be horizontal partitioning. Another example would be zip/postal codes. In the US, zip codes consist of 5 digits (plus 4 more, if you use "zip+4" notation). The first of those 5 digits represents a group of states. For example, the digit 1 is assigned to Delaware, New York and Pennsylvania. If you need to sort and query for regional zip codes, you might break out all customers whose ZipCode starts with a 1 to a table named "Zips_DE_NY_PA", or simply "ZIPS1". You'd then further break down the data based on that first digit, and you'd end up with 10 distinct tables (0 through 9), and you could then query your data as needed, without querying the single HUGE table.
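The zip-code routing above can be sketched as follows; this is a hypothetical SQLite illustration (Access SQL differs slightly), and the customer names and zip codes are made up:

```python
import sqlite3

# Route each customer row to one of ten tables (ZIPS0..ZIPS9) based on the
# first digit of the zip code -- horizontal partitioning by key prefix.
con = sqlite3.connect(":memory:")
cur = con.cursor()
for d in range(10):
    cur.execute(f"CREATE TABLE ZIPS{d} (Customer TEXT, ZipCode TEXT)")

customers = [("Acme", "19901"),     # Delaware  -> first digit 1 -> ZIPS1
             ("Widgets", "90210"),  # California -> ZIPS9
             ("Gadgets", "10001")]  # New York   -> ZIPS1

for name, zipcode in customers:
    cur.execute(f"INSERT INTO ZIPS{zipcode[0]} VALUES (?, ?)", (name, zipcode))

# A regional query now touches only one small table, not the whole dataset:
region1 = [r[0] for r in
           cur.execute("SELECT Customer FROM ZIPS1 ORDER BY Customer")]
print(region1)  # ['Acme', 'Gadgets']
```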

There's also "vertical" partitioning, where you keep all the "rows" in the same table, but you move one or more columns to separate tables. For example, if you need to query customers by town, county and income level, you might "vertically partition" the data to include only those columns; in other words, you'd build a new table and import the relevant columns into it.
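A minimal sketch of that vertical split, again in SQLite with invented table and column names:

```python
import sqlite3

# Vertical partitioning: a wide Customers table is split so the frequently
# queried columns (Town, County, Income) live in a narrow table that shares
# the primary key with the original.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("""CREATE TABLE Customers
               (ID INTEGER PRIMARY KEY, Name TEXT, Address TEXT,
                Town TEXT, County TEXT, Income REAL)""")
cur.execute("INSERT INTO Customers VALUES "
            "(1, 'Smith', '1 Main St', 'Dover', 'Kent', 52000)")

# Build the narrow "partition" from the relevant columns only:
cur.execute("""CREATE TABLE Customers_Demographics AS
               SELECT ID, Town, County, Income FROM Customers""")

# Demographic queries now scan a much narrower table; the full row is
# recovered by joining back on ID only when needed.
row = cur.execute("""SELECT c.Name, d.Town
                     FROM Customers_Demographics d
                     JOIN Customers c ON c.ID = d.ID
                     WHERE d.Income > 50000""").fetchone()
print(row)  # ('Smith', 'Dover')
```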

It's difficult to say what's best for you (or if a combination of both would be best), since I have no knowledge of your data, and don't really know what the true bottleneck might be.

Jim Dettman (Microsoft MVP/ EE MVE)President / OwnerCommented:

  Move it to SQL Server.  JET's not that efficient with that large a record set.  Until you do, compact and repair often.

  You may want to play with the JET engine setting CompactByPKey.

  When set to 0, records are copied in base table order (basically order added).  When set to 1, records are ordered by PK.  If no PK is set, base table order is used.

  Try running the DB performance analyzer to see what indexing it suggests (don't follow it blindly though).  You may also want to look at using JET SHOWPLAN to see how your queries are being costed and if indexes you have defined are actually being used.

  But really, if you expect this to grow, you need to move off JET.

Jim Dettman (Microsoft MVP/ EE MVE)President / OwnerCommented:

 I would also look at setting MaxBuffers to 65535, which will give you a bigger cache.  Depending on your record access pattern (sequential or random), it may or may not help.

mcs26Author Commented:
Great, thanks for the advice. Like I say, it's one table with 11 fields, which are listed below. I run a basic select query, have the data ordered on "DTime", and take the average of Close_Bid & Close_Ask.

If I use SSE, will the tables also need to be partitioned? Beginning to find Access is probably not up to what I am doing, and it's only going to get bigger!

DTime (DateTime)
Date (DateTime)
Open_Bid (Double)
High_Bid (Double)
Low_Bid (Double)
Close_Bid (Double)
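For reference, the query described above can be sketched like this in SQLite (Access SQL is similar; the sample values and date range are made up). The index on DTime is what lets the engine satisfy both the range filter and the ORDER BY without sorting the whole table:

```python
import sqlite3

# Sketch of the asker's query: per-row average of Close_Bid and Close_Ask
# over a date range, ordered on an indexed DTime column.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("""CREATE TABLE EURUSD
               (DTime TEXT, Open_Bid REAL, High_Bid REAL,
                Low_Bid REAL, Close_Bid REAL, Close_Ask REAL)""")
cur.execute("CREATE INDEX idx_dtime ON EURUSD (DTime)")
cur.executemany("INSERT INTO EURUSD VALUES (?, ?, ?, ?, ?, ?)", [
    ("2010-06-01 09:01", 1.22, 1.23, 1.21, 1.2200, 1.2202),
    ("2010-06-01 09:00", 1.22, 1.23, 1.21, 1.2100, 1.2104),
])

rows = cur.execute("""
    SELECT DTime, (Close_Bid + Close_Ask) / 2 AS Mid
    FROM EURUSD
    WHERE DTime BETWEEN '2010-06-01 00:00' AND '2010-08-31 23:59'
    ORDER BY DTime
""").fetchall()
mids = [(d, round(m, 4)) for d, m in rows]
print(mids)  # [('2010-06-01 09:00', 1.2102), ('2010-06-01 09:01', 1.2201)]
```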

DatabaseMX (Joe Anderson - Microsoft Access MVP)Database Architect / Systems AnalystCommented:
"its only going to get bigger!"
As noted, therein lies the problem.  

FWIW ....

IF ... it were going to remain at about that many records, there would be hope.  I have a 'word' table with two fields: an ID field (Long Integer) from another 'product' table (~300K records) and a 'word' field (text). The word table is built (3-4 times a year) from the product description field in the product table. So, if a description has 7 'words' (text separated by spaces), then in the word table for that product ID there will be 7 entries: the same ID, 7 different words.

The word table has approx 3.8 million entries, but will *not* grow much beyond that. Both ID and word are indexed, and there is a combined index on both fields to prevent duplicate combinations.

Soooo ... on the product lookup 'search form', user enters a search term. The search term ('word' OR multiple words) is looked up in the word table to get the associated product ID (or IDs). Then, a query is run against the product table using the ID(s), and results are returned.
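The word-table technique described above can be sketched as follows; this is a hypothetical SQLite illustration (the real implementation is Access/VBA/Jet/DAO), with invented table names and product data:

```python
import sqlite3

# Product descriptions are split into words, each (product ID, word) pair is
# stored in an indexed "word" table, and a search term is resolved to product
# IDs via that table instead of scanning the description column.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE Products (ID INTEGER PRIMARY KEY, Description TEXT)")
cur.execute("CREATE TABLE Words (ID INTEGER, Word TEXT, UNIQUE (ID, Word))")
cur.execute("CREATE INDEX idx_word ON Words (Word)")  # fast term lookup

products = [(1, "stainless steel hex bolt"), (2, "brass hex nut")]
cur.executemany("INSERT INTO Products VALUES (?, ?)", products)
for pid, desc in products:
    # One row per word; the UNIQUE index suppresses duplicate combinations.
    cur.executemany("INSERT OR IGNORE INTO Words VALUES (?, ?)",
                    [(pid, w) for w in desc.split()])

# Search: hit the indexed word table first, then fetch the matching products.
hits = cur.execute("""SELECT p.ID, p.Description
                      FROM Words w JOIN Products p ON p.ID = w.ID
                      WHERE w.Word = 'hex' ORDER BY p.ID""").fetchall()
print(hits)  # [(1, 'stainless steel hex bolt'), (2, 'brass hex nut')]
```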

IF ... both tables are local in an mdb ... results are returned in approx 2-3 seconds (or less in some cases). Not only that, 4 other fields in the product table are searched simultaneously with the same search term!! And any results for those are displayed in the appropriate section of the search form.  2-3 seconds ... think about it. Blazing Hot Fast - Access/VBA/Jet/DAO !!!

Now, if the product table is on a server (but word table remains local), the search is slightly slower (as reported by client).  If both word table and product table are on server, still slower, but acceptable to client.  However, this is only an interim scenario ... as client IS moving to SQL server for other reasons.  So, Access UI will remain, but ultimately tables will be on SQL Server.

And actually, I left a LOT out.  There is a LOT more going on in those 2-3 seconds.  So, it just shows what *can be done* ...

I can thank Marcus Fischer (EE's harfang) for the word table technique and associated querying code.

Scott McDaniel (Microsoft Access MVP - EE MVE )Infotrakker SoftwareCommented:
<If I use SSE will it also need the tables to be partitioned>

There's really no way to tell, but as the first step I'd simply install SSE and move the tables to that platform, and then link those new tables back to your existing Access application. The simplest way to do that would be to build a new, blank Access database and then import everything from the old db except the tables. Once you do that, then link your new SSE-based tables to the new db.
Jim Dettman (Microsoft MVP/ EE MVE)President / OwnerCommented:

<<And actually, I left a LOT out.  There is a LOT more going on in those 2-3 seconds.  So, it just shows what *can be done* ... >>

  I would disagree.  You have a very unique situation which is basically read-only, and my guess would be that you're using Seek on an index, which of course will be extremely fast as it's a search on a B-tree index.

  But the minute you try to do anything else with a table that size with JET, you're going to have problems.

  For example, why do you only build it 3-4 times a year?  I bet it's too time-consuming a process.  You probably build the table and then create the index(es) rather than maintaining the whole thing on the fly.

  For "normal" usage and querying on non-indexed fields, JET really can't handle anything more than 3-4K records (you're usually into the minutes range already), and once you get past a million, life is very difficult.
