Aiysha (United States of America) asked:
Split a large table in Access into smaller tables.

I have a large table in Access and I want to split it into smaller tables based on the unique values in field one. Does anyone know of a query that would do this?

Thank you.
John Tsioumpris (Greece):
First, decide how many tables you want to split the BIG table into.
Copy/paste the structure of the BIG table to create the number of small tables, e.g. Small1, Small2, Small3... they all have the same structure as the BIG table.
You could use a simple query to take a chunk from the big table:
SELECT * FROM BigTable WHERE ID > 1000 AND ID < 2000
and from this build an append query to fill the small table:
INSERT INTO SmallTable1 ( ID, Field1, Field2 )
SELECT BigTable.ID, BigTable.Field1, BigTable.Field2
FROM BigTable
WHERE ID > 1000 AND ID < 2000


Aiysha (Asker):
My query:

INSERT INTO smallertable (id1, f1, f2)
SELECT bigtable.id1, bigtable.f1, bigtable.f2
FROM bigtable
GROUP BY f1

Is there a way to name the table by the grouped f1 value?
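As written, that GROUP BY would be rejected by Access, because id1 and f2 are neither aggregated nor listed in the GROUP BY clause. A make-table query per distinct f1 value is closer to what is being asked; here is a minimal sketch, where 'SomeValue' is a hypothetical placeholder for one of the f1 values:

SELECT bigtable.id1, bigtable.f1, bigtable.f2
INTO [smallertable_SomeValue]
FROM bigtable
WHERE bigtable.f1 = 'SomeValue';

SQL alone cannot parameterize the destination table name, so naming a table after each f1 value would take one such statement per value, or a short VBA loop over the distinct values that builds and executes the statement.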
Aiysha (Asker):
Also, I am trying to make several tables based on the unique "api" value.
Without some insight into your data structure I am only assuming... just give a sample of your data.
What is the purpose of splitting the tables and naming them based on their contents?  The normal request would be how to combine the tables.
I want to split it into smaller tables

Pat is right. This is a dead end.
If your table is large (many records) and you are having performance problems, then the correct approach will usually be to add indexes where relevant.

If your table is wide (many fields) it *might* be that your table is not normalized properly, which is a whole different ballgame.

Please provide more info.
Yeah, you're likely heading in the wrong direction.  Dividing records with the same structure into different tables isn't good design.  Programming and querying will be much more difficult.

Anders is right about the proper indexing helping with performance, if that's the issue.  If that still doesn't help, then consider migrating the back-end to SQL Server - it has a lot more power and even the free Express version can support a larger database than Access.
The concept of splitting a BIG table into smaller ones is not entirely wrong.
Take, for example, a table that holds all the stock movements
and has reached several million records...
It has records from the years 2000-2019... considering that there is actually no chance that something from, e.g., 2001 will come and "bite" us in 2019, it is not that wrong to move records older than, e.g., 2015 to a "history" table and keep only the "necessary" records...
After all, if we need them, they are still there, e.g. for statistics, but for everyday use a smaller table is always better... indexes do help, but at some point we are iterating over 10,000,000 records as opposed to, e.g., 500,000.
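As a rough sketch of that archiving idea (the table name, field name and cutoff date below are hypothetical, and StockMovements_History is assumed to already exist with the same structure), the move is simply an append query followed by a delete:

INSERT INTO StockMovements_History
SELECT *
FROM StockMovements
WHERE MovementDate < #1/1/2015#;

DELETE FROM StockMovements
WHERE MovementDate < #1/1/2015#;

Back up the table (or wrap the two steps in a transaction) so a failed delete cannot lose data.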
Hi John,

You're right, it's possible that dividing a table into a "current" and "history" table is the best approach, but I'd argue that it should be done only after all other solutions have been tried, including moving to a robust database like SQL Server, proper indexing, and good query design.  If the table is divided, then Union queries will be needed to get a complete history with records before some arbitrary cutoff date.  Sure, it's doable, but not optimal.

And anyway, the OP didn't mention anything about dates - they wanted to divide the table based on a "unique value".  This would result in the need for Union queries to put things back together again, which would definitely add complexity and hassle to the database.
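For reference, such a reassembling Union query would look roughly like this, reusing the SmallTable naming from the earlier append example (the tables must all share the same structure):

SELECT * FROM SmallTable1
UNION ALL
SELECT * FROM SmallTable2
UNION ALL
SELECT * FROM SmallTable3;

UNION ALL simply concatenates the rows; plain UNION would also remove duplicates, at extra cost.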
I did a few projects for Readers' Digest in the 90's.  Their customer master file at the time was 60 million rows and divided into 8 separate files.  The file a customer record would be found in was based on the first letter of the last name (which also held company names for company accounts).  The online transactions extracted the first letter and used that to decide which file to read the customer's record from.  The batch processes read each record of each file in sequence.  It was a pain but nowhere near as bad as how they formatted the customer record itself, but I won't go there since it isn't relevant to this discussion.

I asked "why" the split because once we know "why", we can either help to implement the solution or offer one we think would be better.  "Why" is important rather than jumping right in with code because many of the people who post are new to development and they try very hard to solve their own problems before coming to us.  They come up with what they think is the answer but they get stuck in the technical implementation.  I don't think we are doing them any favors by blindly solving the technical problem when we KNOW that the solution is questionable at best and there is quite possibly a superior solution.
@Armen, I don't think that SQL Server is the panacea that everybody expects... there isn't a magic wand it can wave in the air to magically fix everything... it takes quite some work to get the performance that is "promised".
When I was working on similar cases and investigating the magical world of SQL (first releases of SSMA), I found that SQL was way, way slower through the usual linked-table approach...
I had a nice, simple table of around 8 million rows... a few columns, I think 4-5...
The results were:
Access BE (local) --> it would run in 1-2 seconds... that's fast
Access BE (network share) --> it would run in a few seconds... not so fast, but usable
MSSQL BE (linked table) --> sometimes it would get the results, sometimes it wouldn't... slow, really slow; usually I would get the data just before the timeout (180 s)
MySQL BE (linked table) --> it never finished... timeout or "Not Responding"
PostgreSQL BE (linked table) --> it never finished... timeout or "Not Responding"
I have also seen the technique of dumping old "useless" data used in MSSQL... my previous company's ERP used it to offload some of the "load"... given that it was rather poorly designed regarding visualization, it would use every resource it could (my Access continuous forms were flying compared to their DataGridView... they probably never knew about lazy loading... :) )
@John, I agree that SQL Server isn't a panacea.  Of course you have to use it properly and optimize the front-end application.  But when it IS optimized, it has far more raw power, safety and stability than an Access (ACE) back-end, and can solve performance issues like the OP presented.
If it didn't deliver the performance when it IS optimized, then the shrinks would probably have a lot of new clients (frustrated, troubled DBAs)... :)
John, you wrote:
"After all if we need them there are there ..e.g. for statistics uses but for everyday use a smaller table is always better...indexes do help but at some time we will have the fact that we are iterating 10,000,000 records opposed to e.g. 500,000"

The whole point of using an index is that we don't have to iterate over all of the records. That is partly how a "B-tree" index works: it minimizes the data reads required. So with correct indexing, we are not reading the entire table, only the relevant parts.

And yes, just putting the data in SQL Server while keeping your queries local will often cause a performance hit. But if your continuous form is taking 180 seconds to load, that is a design issue in your form/frontend, not in SQL Server.
Near as I can tell, Aiysha never told us why she wanted to separate the data.  There was no mention of performance so we don't have any clue what problem we are trying to solve.
Pat, it's so much more interesting to solve a problem that we invent, right?  :)
@Anders... indexes are great, but the number of records does count... iterating 500 records is not the same as iterating 500,000 records.
Also, you probably misunderstood what I wrote... do you think that I would ever, ever design an application with a 180-second response time? It would probably be 1800 ms, but not 180 s... tests are tests and working conditions are working conditions.
But do take a huge table... upload it to SQL Server and post back the results as a linked table...
@John
Of course the number of records counts. Without an index on the table, the processing time will scale linearly with the number of records.
But if you add an index to the table (and your query uses the index properly!), then the time to select X records will scale logarithmically with the number of records in the table. Which logarithm depends on the type of index the database uses. In essence this means that processing 1,000,000 records might only take 3 times as long as processing 1,000 records, provided you are only processing the records you need out of the 1,000,000.
So with a table going back 19 years (to 2000), if you were to divide it into 19 tables (one for each year), you would reduce the workload by a factor of 19 without an index (remember the linear scaling), but by much, much less if an index is being used.

That said, the above is still a simplification; the actual results will depend on the index type, the order of the fields, the order of the index, whether or not your database supports included columns, the joins, and of course whether the developer ends up writing queries that are not sargable.

And we have plenty of tables on SQL Server with millions of records in them. Our forms load instantly.
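For what it's worth, adding such an index in an Access backend is a single DDL statement (the table and field names below just follow the stock-movements example used earlier in the thread):

CREATE INDEX idxMovementDate
ON StockMovements (MovementDate);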
@Anders... good to know your forms load instantly...
@Armen, not knowing what problem we are trying to solve doesn't stop any of us from offering an opinion :)  And you're right.  It is usually more fun to solve problems of our own making.   I'll be back if Aiysha deigns to clarify.
Hi,

I would say that if someone tries to separate one big table into x smaller ones and give them different names depending on the contents of a field in the big table, then it's a decision like "all records from year 2018, 2019, ... into separate tables", or records from different projects or locations, etc.

There are only rare cases where this design makes sense. Usually, as said, it's better to hold them all in one table.

I would also strongly recommend forgetting about keeping a database of a size that needs a table split in an Access backend. (I always recommend forgetting Access as a backend, as nowadays there is simply no need to use a silly file database when there are so many free, brilliant database servers, including SQL Server (Express in the free case), MySQL, PostgreSQL and many more.) An ACTIVE backend component with a security model is always better than what Access can offer, especially in the case of enterprise data. Staying on an Access backend is mostly down to laziness of the programmers or to trouble getting their local IT to set up a database server.
Access as frontend: really good. Access as backend: it could not be worse. The Readers' Digest example above could not be creepier...

Of course SQL Server (to stay with this example) always performs better; the example with the frontend waiting 180 seconds is clearly a case of wrong Access frontend design and also of a wrong backend migration.
This is exactly wrong:
"do take a huge table...upload it to SQL server and post back the results as a Linked table"

If you do that the performance will of course be bad. It would always need a table scan to return the needed records. More worse: If you decide to design an Access query to query the records from SQL Server only. So just double click on such a linked table would be indeed veeery slow.

SQL Server can do a lot more to speed things up than Access can. First, it is an ACTIVE component which can assemble data without the help of the network or the client computer.
Creating a clustered primary key is of course the first thing to do for better performance.
But an index on SQL Server can do a lot more: if you want to keep the table reads low, you can use included columns in an index, i.e. add fields which are not part of the index key (the sorting part) but whose data is stored in the index too, so that a query can read its data completely from the index (of course this should not be done for all fields in a table and on all indexes...). This is not possible with Access indexes. (A short sketch follows below.)
You can index a view on SQL Server which contains only the needed fields of the tables. This is not possible with Access queries.
You can partition big tables using a filegroup and place specific data into specific files of that filegroup, which can be spread over different file servers if needed to boost performance. This is not possible with Access: you can of course link to different Access backends, but you will not get all the data treated as one table, which SQL Server does automatically.
If it's still not fast enough for you, you can build SQL Server clusters to handle more users, and so on.
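A minimal T-SQL sketch of the clustered-key and included-column ideas above (the table and column names are made up for illustration):

ALTER TABLE dbo.StockMovements
    ADD CONSTRAINT PK_StockMovements PRIMARY KEY CLUSTERED (MovementID);

CREATE NONCLUSTERED INDEX IX_StockMovements_MovementDate
    ON dbo.StockMovements (MovementDate)
    INCLUDE (ItemID, Quantity);

With that second index, a query such as SELECT ItemID, Quantity FROM dbo.StockMovements WHERE MovementDate >= '20190101' can be answered entirely from the index, without touching the table.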

Not to forget that SQL Server can build statistics, cache query plans, show execution plans which can be optimized, and many more things.

So if you have at least a view on a well-indexed table, bind that to an Access frontend, and it still takes 180 s to load, then you really have a design problem...

Cheers,

Christian
@Bitsqueezer, do you have any idea how many people just download an Access --> SQL migrator and expect that everything will work from the moment they click Finish? I did it myself in my first steps in programming.
My manager insisted (we argued a lot back then) that we would use such a migrator and everything would fall into place in good time... just like magic. We had a huge system in Access, and because the mother company took that naive road he thought we should do the same... he wouldn't even hear me out when I proposed that moving to SQL would take several months of solid work to normalize the tables, rewrite queries, create views, SPs, code changes, testing... It has now been a few years since I left and they are still on Access, and probably will be forever... until the system collapses (by the day I left there was over 10 GB of pure data and it was growing, and funnily they wanted more and more... like feeding silly presentation systems to please the bosses).
And some people here even advise that this is a good idea...
I know the performance gains of SQL (and I mean all the engines), but unless you spend quite some time taking advantage of SQL, the performance won't be "stellar"... it will be slow and pathetic.
Hi John,

Of course you're right, there is more work to do than simply using a migration tool like SSMA and expecting everything to be better.

I'm also aware that it's still often the case that there are wrong expectations about the effort needed to do all the optimizations. But it's also wrong to say that simply uploading a big table and linking it back is proof that using a backend database server would result in slower performance. As I said: if at least the most important PK and maybe one or two other important indexes are created, there should already be a big performance boost in comparison to any Access backend solution - and that costs only a few days to create, even for hundreds of tables.
In your example above I would bet that SQL Server always outperforms Access for 8M records just by adding a good index and using the right drivers/settings in the frontend.

Cheers,

Christian
@Christian, I am perfectly aware that SQL can handle not 8 but 80 or even 800 million records, but "internally"... the problem is how the FE will get that data. Take a look at SSMS... the newer versions limit results to 1000 or 2000 rows by default. When you simply migrate the thousands of tables in a few days, the problem is that in the end you have the tables in SQL and you still need to get the data to Access... if you simply link the tables, as all migrators do, anything that holds a decent volume of data will be a bottleneck...
Then comes the part where you have the tables in SQL but you need to write SPs, UDFs, triggers, pass-through queries, change the record sources of most forms/reports (usually in Access we just drop in the table/query and off we go... but what if the table has millions of records? In Access it's more or less OK... just try it with SQL... it will probably die... so you need to filter the data, probably prepare some small temp table... etc., etc.), the row sources of combos/list boxes and every other control that holds data... and then you need to decide how that data is going to get pulled/pushed... usually we have to keep the forms unbound... or use tricks... and the list goes on forever.
Hi John,

I didn't say that there is no additional work to do to get the most out of it. That's all correct. I said that it is simply wrong to claim that migrating a table (big or not) to SQL Server slows down performance, as in your example above. That gives the reader the impression that it's better to stay with an Access database backend, and that's of course a catastrophic decision.

Cheers,

Christian
I am afraid that if the author wants to avoid the hassle, an Access BE is a one-way street...
The funny part of the whole story is that just about everyone is "dreaming" of moving the BE to a more robust solution like SQL, but strangely almost nobody is willing to pay for a complete package that would handle this work... I remember there was once a tool with some potential to make the right moves that vanished into thin air, probably due to lack of "paying" interest... (MUST).
Personally, I have thought of making a similar tool, but given the failure of the above I turned in other directions...
@John, in most cases you don't have to go to the extremes you describe.  We use regular linked tables with SQL BE all the time - but we do so with limited recordsets.  For example, we might have a read-only list form with selection criteria based on a passthrough query, but when the user drills down to edit a record, we open a normal bound form with a regular linked table - but only for that 1 selected record.  Modern drivers communicate with SQL Server very efficiently in this situation and take advantage of server side processing and indexes.  The form runs quickly, even over the internet to a hosted or Azure SQL database.

It sounds like you had a bad experience with your migration project, and I sympathize, but there really is a balance that works well without a complete rewrite using unbound forms, local temp tables, etc.  We do it all the time.
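A sketch of that split, with purely illustrative names (dbo.Orders on the server, dbo_Orders as the default linked-table name in Access). The pass-through query for the read-only list is executed entirely by SQL Server:

SELECT OrderID, OrderDate, CustomerName
FROM dbo.Orders
WHERE OrderDate >= '2019-01-01';

The bound edit form then uses a record source against the linked table that touches only the selected row (the literal 12345 stands in for the ID picked from the list):

SELECT * FROM dbo_Orders WHERE OrderID = 12345;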
@Armen, everything remained in the testing phase... so no harm done...
Maybe after all these years I should recreate the scenario and write down some numbers (I surely misremembered the ODBC timeout; it is 60 s).
For all the years I've been creating Access apps (~25), I've used the same method that Armen uses.  The whole point of using Access is to take advantage of its RAD capabilities. If you're not going to make use of Access' RAD tools (especially bound forms), you are taking all the baggage and getting none of the gain.  You should be writing in your favorite .Net language.   None of my currently running apps has more than 100 users (although they could have thousands as long as the client has the appropriate SQL Server licenses) but some of them have millions of rows in the tables.  All the interactive processes use queries that limit the recordset they work with.  On the occasions where I need to do bulk deletes and bulk updates, I use pass-through queries.  But I can count on my hands the number of times I've needed to create stored procedures to make something work in a reasonable amount of time.   In most cases these were created to build complex reports.  Of course my years in mainframe development using CICS with IMS or DB2 databases taught me how to code efficiently.  Working for insurance companies my CICS/IMS transactions could easily have thousands of concurrent users and our service level targets included sub-second response time for most transactions.  That means that the user hits the submit key and the response is back immediately.  Working for the phone company and companies like Readers' Digest and Pratt & Whitney, my tables contained millions of rows.

My Access apps are always developed with the idea of potential upsizing in mind and that means that I can do a conversion in the amount of time it takes to transfer the data and test everything.  That usually translates to less than an afternoon's work.  Access apps with Jet/ACE BE's work fine when built using good client/server techniques.  Access apps built only with Access in mind perform very poorly with only a straight data conversion and require potentially days or weeks of rework to take advantage of a SQL Server BE.  I even have an app that is sold to the public which allows the client to choose ACE or SQL Server as the BE and the app works great with either BE.
I didn't bother to read all the verbiage above, but I think (based on my experiences) that the author has one BIG table with many fields that is the result of many normalized tables being linked together, with a query run to create the big table, which she probably got from someone else (outside of her control?).

Now, she wants to "re-normalize" the records for some reason.  Perhaps to incorporate the data into her own OLTP database, and she has no experience with doing that, so she's looking for help.  Anyone familiar with situations like this and "re-normalizing" data, other than me?

Since Access is easier to do db design work in than SQL Server, I'd work out the design in Access, then if I need to go to SQL Server, port the schema to SS.

Just thinking out-loud.... ;-)
Aiysha (Asker):
Thank you all. This has turned into a philosophical discussion rather than a technical problem. My colleague suggested that I bring this into TIBCO Spotfire and proceed with the work there.

Kind regards
ASKER CERTIFIED SOLUTION
Aiysha (United States of America)