Solved

Microsoft Access Files Much Larger than Excel Equivalent...Why?

Posted on 2011-02-16
21
1,198 Views
Last Modified: 2012-05-11
have an Excel table that I have imported into Access.  The Excel file is around 5MB, but the Access database is 17MB!

Does anyone know why Access would be so much larger?
0
Comment
Question by:billelev
  • 5
  • 4
  • 4
  • +4
21 Comments
 
LVL 11

Accepted Solution

by:
Runrigger earned 400 total points
ID: 34907757
Some fields in an Access table will have an associated index with them (effectively a subset of the data used to catalog where in a table records of a certain value are).

Indexes may be on more than one field
0
 
LVL 57
ID: 34907960

  And they simply don't store data the same way...you really can't compare the two.

JimD.
0
 
LVL 119

Expert Comment

by:Rey Obrero
ID: 34908033
do a compact and repair to shrink the size of your db.
0
 
LVL 22

Expert Comment

by:rspahitz
ID: 34908064
Two different applications; two different ways to save things.  And as rigger said, additional indexes and other pieces are also added in Access.  Plus, depending on what you've done in Access, it maybe "holding onto trash" and if you "compact and repair" it, it may shrink significantly.
0
 
LVL 74

Assisted Solution

by:Jeffrey Coachman
Jeffrey Coachman earned 100 total points
ID: 34908082
Agree with the above posts.

You can probably reduce the size a bit by changing the default Datatypes/field properties that Access sets.

Also remember that as a database, (Even if all it has is one imported table), Access stores a lot of other information as well.

Comparing the File sizes is like asking why one car gets worse gas mileage than another...

"I drive to work every day.  The Ford Focus I drive gets around 25 MPG, but the BMW gets 17MB!
Does anyone know why the BMW gas mileage would be so much worse?"

;-)

Totally different cars my friend...
Quite Literally:  ..."Your mileage may vary"
;-)

JeffCoachman
0
 
LVL 22

Expert Comment

by:rspahitz
ID: 34908229
If you want to make your Excel 5MB file really small, save it as an unencoded text file...then it might be only 1MB...Excel must be a memory hog. :P
0
 

Author Comment

by:billelev
ID: 34908269
thanks, all.  It's not a compact and repair issue, just an Access vs Excel issue.

I basically have a very large table (40,000 rows and growing) that stores trade information.

There is a lot of repeat information, such as trader, customer, security type etc...I am currently storing that as individual string entries, rather than referencing a separate customer table etc.  Would a reference rather than a direct text entry make any difference?

The data isn't really relational in that all we store is customer name, not additional data such as customer address etc. so I think it is fine as one table.
0
 
LVL 119

Expert Comment

by:Rey Obrero
ID: 34908297
<It's not a compact and repair issue, just an Access vs Excel issue.>

compact and repair was suggested just to shrink the size of your db
0
 
LVL 22

Expert Comment

by:rspahitz
ID: 34908370
"Would a reference rather than a direct text entry make any difference?"

In most cases yes.  However, when you create a reference table, usually the key is an autonumber field, which takes up 8 bytes on the source table. (plus the overhead of a new table and at least 2 fields)
However, this also gains you improved performance and increased ability to update things.


0
 
LVL 11

Assisted Solution

by:Runrigger
Runrigger earned 400 total points
ID: 34908395
A relational db structure will always be better than a flat table especially eliminating repeated reference data.

Its actually a much cleaner solution in terms of user entry errors which could introduce inconsistencies in the data.

Implementation is clearly that much more difficult, especially if the data is being imported.
0
IT, Stop Being Called Into Every Meeting

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

 

Author Comment

by:billelev
ID: 34908662
Runrigger, you hit the nail on the head with your last comment.

I have data being submitted by a large number of people, and as a result I would have to map imported data to existing fields...The end result, of course, is a highly consistent data set...But getting there will be really hard.

For example, say I have a customer table, and one such customer name is "Example Customer"

Say someone submits a trade with customer "ExampleCustomer"...It would be very difficult to try to correct for that, I think.
0
 
LVL 22

Expert Comment

by:rspahitz
ID: 34909344
This is the classic case of properly identifying things.  Allowing arbitrary entry or names makes it very difficult to confirm a match; using a table lookup guarantees a valid entry.

So if someone mis-enters a name in an arbitrary box, it requires extensive AI to try to determine if that is a match to something in the DB.
However, if you can't pick an invalid entry then the only concern is whether the right entry was picked.  And, of course, you will need a way to allow someone to add new entries, but that can be manages in a different way.
0
 
LVL 74

Expert Comment

by:Jeffrey Coachman
ID: 34909576
billelev,

But again, the Second post (by JDettman) here is still relevant.

Q: "Does anyone know why Access would be so much larger?"
A: "And they simply don't store data the same way...you really can't compare the two."

Now it seems that Duplicate data is now introduced...?

So what is your *actual* question here?

A straight import of an Excel sheet into an Access DB will never be the same size, because of the way Access stores data in the Database container.
0
 
LVL 11

Expert Comment

by:Runrigger
ID: 34909928
There are any number of responses here that are relevent to the question, not limited jim's response!
0
 
LVL 45

Expert Comment

by:aikimark
ID: 34910059
Normalize your data
0
 
LVL 74

Expert Comment

by:Jeffrey Coachman
ID: 34910735
Runrigger,

I meant no offense.

I was in no way saying that Jim's post is the only one to consider.
Only that his post addressed the direct question.
(Why the sizes are so radically different)

The exact subtleties of the way Access stores data (of which Indexing is a part) can be discussed at length. and it seems here that all of these aspects have been touched on.
(Paging, System Tables, Indexing, undo data, Datatypes, ...etc)
...and that's a good thing...
;-)

My point here was that the Q was asking why the file size was different:
Again,
Q: "why Access would be so much larger"
A: "they simply don't store data the same way...you really can't compare the two"

It is just that with the introduction of "Duplicate Data" the true focus of the question seemed like it was being shifted.

Or, if the question has now been revised by the asker to be:
Q: What are the differences between the way Access stores data and the way Excel Stores data.
... then this is a slightly different angle.

My guess by the brevity of the Q is that the asker was just surprised by the size difference and just want to know why.

As a side note, exporting an excel file to word (or notepad), yields a file size that is not that much bigger/smaller (in relative terms) to the Excel file.
This may have also been the trigger for the Q.

Again, I meant no offense.

;-)

Jeff



0
 

Author Comment

by:billelev
ID: 34910937
Jeff, the reason we started discussing relational db implementation is because it relates to reducing the file size of the database, but also solves a whole host of other problems also, which Runrigger correctly anticipated.

I actually think that Runrigger's response was incredibly perceptive.
0
 
LVL 57
ID: 34917467
<<Say someone submits a trade with customer "ExampleCustomer"...It would be very difficult to try to correct for that, I think. >>

  FWIW, If your doing a duplicate check on an entry, what I typically do is remove all white space and trim at both ends of a string, which prevents what you outlined.  Duplicate checks in applications is actually a tough nut to crack.  You need to have a minimum amount of attributes to ensure that you don't have say a duplicate customer.  For example, "Roto Rooter" exists in just about every city in the US, but are all independently owned and all unique.  So customer name alone would not cut it.    In fact, what were discussing here is the first step of forming a relation (a table) and the choice of a primary key.

  As far as the question itself, it's the relational DBMS aspect of Access vs the flat file approach that I was alluding to (but didn't say directly) when I said "And they simply don't store data the same way".

  Sure the data types are similar, but outside of that is the fact that Excel is a flat file approach and Access is a RDBMs, so you have a difference not only in physical storage, but a logical one as well.

  In terms of the question, a fairer compairison would be Excel vs an ASCII file or a XML file.  There the logical storage is basically the same and they are only different physically.  But even so they are all different, so all you can really say is that they will be different.

JimD.
0
 
LVL 74

Expert Comment

by:Jeffrey Coachman
ID: 34918065
billelev,

Yes, I agree on both points.
0
 
LVL 22

Expert Comment

by:rspahitz
ID: 34918304
Looking at applications that manage data, they are indeed all different.  Let me dissect with my interpretation of several.

1) Text file: stored as individual bytes of information or double- or quad-bytes depending on how universal you want the file.  So to store the letter A will take either 1 byte, 1 byte plus a storage code, 2 bytes plus a storage code or 4 bytes plus a storage code.  In addition, you have the overhead of the disk to figure out where the file will be located.

2) CSV file: stored as individual bytes with separators and qualifiers.  Depending on how strict you follow, you either get a straight text file (any of the above) or you get quotes around each data element.  So let's just say that it takes 3 bytes to store an "A"

3) Excel file: stored as text plus additional attributes.  Storing an A in cell A1 means that Excel has to identify the text value, the data type, the location, the formula, the font, the name of the cell and a host of other cell attributes.  It also needs to recognize things about the entire sheet, such as how many cells there are, where you left the cursor so it can be restored to that location, how wide you left each column if you changed them, and the named ranges and sheet name, etc.  Beyond that, it also keeps track of settings about the book, such as how many sheets are located in the book and other things like that.  How many bytes does this take to store the letter A?  Well, certainly much more than 3...maybe a few K.

4) Access file: stored as a piece of a table.  Storing an A in a field means that Excel has to have a definition for the field including name, data type and auxiliary things (like max size) and Access also tracks other things like InputMask, Output Format, default value, indexing information, and how it should show this field if you add it to a form (textbox, combo, etc.)  Them of course, you have the information about the table where this field was created, including things like which field is the primary key, are there any relationships between this table and others, which indexes are established, whether there's referential integrity enforced on the indexes, and probably a lot of other things.  Add to that the additional information about how this is used with queries, forms and reports and mix that in with information about the entire database with the name, startup form, etc. and it takes a lot of space to store a single letter "A".  I suppose that it could, theoretically, be smaller than Excel but given all of the moving parts here versus the fixed parts of an Excel cell and this will almost always take up more space.

So why does it take more to put an "A" into a database than a spreadsheet or text file?  I guess the real answer is the capability you gain in using that data.

One last thought...why does a Photoshop file take up much more space than a basic Bitmap file?  Let's just call it overhead.
0
 
LVL 11

Expert Comment

by:Runrigger
ID: 34918424
Bravo, I haven't had a lesson like that since the mid eighties, not quite as long in the tooth as TRS-80 age though - mine was ZX-81 era of introduction!

0

Featured Post

Highfive Gives IT Their Time Back

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

Join & Write a Comment

Suggested Solutions

When you are entering numbers in a speadsheet, and don't remember what 6×7 is, you just type “=6*7" instead. It works in every cell! This is not so in Access. To enter the elusive 42 in a text box, you have to find a calculator, and then copy the re…
Improved? Move/Copy Add-in Replacement - How to avoid the annoying, “A formula or sheet you want to move or copy contains the name XXX, which already exists on the destination worksheet.” David Miller (dlmille)  It was one of those days… I wa…
This Micro Tutorial demonstrates in Microsoft Excel how to consolidate your marketing data by creating an interactive charts using form controls. This creates cool drop-downs for viewers of your chart to choose from.
Polish reports in Access so they look terrific. Take yourself to another level. Equations, Back Color, Alternate Back Color. Write easy VBA Code. Tighten space to use less pages. Launch report from a menu, considering criteria only when it is filled…

760 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

19 Experts available now in Live!

Get 1:1 Help Now