Link to home
Start Free TrialLog in
Avatar of SuraDalbin
SuraDalbinFlag for United States of America

asked on

Best Method to Pull Data From Multiple Workbooks

Hello Experts,

I'm looking to create a master database (in Excel, preferably) that will pull all its information (records) from multiple workbooks that will be saved in the same network drive, but different folder path.  These multiple workbooks will be identical, same number of worksheets, same layout, same format, etc., they will just contain different information.  The master database file will have two main uses:

1. Multi-User

Managers will open file directly and generate reports based on desired criterias (in Excel I'm planning on achieving this by using filters)

2. Daily Automated Reports

At a specified time on weekdays, the file will send e-mails that will contain specific reports, to multiple managers
Although I'm a novice at vba, I believe that all these tasks can be achieved through vba code and/or excel formulas.  

In the office we all have MS Access 2010 as well, but I'm hesitant to use Access because the end-users (managers) to this master database do not know how to use this program.

I'm hoping to get help with the following:

1. Question 1

Does my logic sound right?

2. Question 2

Which function of excel would work best to pull data from closed workbooks on a daily basis?  (I read somewhere that the INDIRECT function works well, but not sure if it'll be a slow process)

3. Question 3

Since we all have MS Access, would this program be a better option to pull the data from the multiple workbooks and somehow link Excel to Access (for purposes of the front-end to the users)
Any help that you can provide me will be greatly appreciated.
Avatar of hypercube
hypercube
Flag of United States of America image

The simplest way to do it is to enter something like +[cell] in a cell of the summary workbook where you need it.

Then "cell" above can include the entire path workbook sheet and cell reference.
No formula is included in this description (except the "+" but these contents can include in formulas.

So far that's not very flexible but it's certainly a way to pull data from the other workbooks.  
They need not be open (or explicitly opened) in order for this to work.

Then, if you need to sort the data, I have a similar workbook/sheet that has this:
Column A: Path
Column B: Sheet
Column C: Filename

Then, there is a row that is filled with formulas that can include Path, Sheet, Filename, Cell, etc.

Then, there is a vba sub that will fill the table with those formulas that include the appropriate Path, Sheet and Filename information in their respective formulas.  
Running this sub is only needed to fill the table once.  It's not needed again unless you mess up the order of things with sorting.  So, if you're not going to mess up with sorts then you don't need it.  

Here's an example of cell contents of K4:
='C:\Users\Fred\Documents\...........\[ABG.xlsm]Model'!$J$2
Here's an example of the template formula for the same column that is stored in K500:
'='Path[Book]Sheet'!$J$2
The sub substitutes the values of columns A-C using specified syntax into the formula for each cell.
Some formulas have no reference to external files and others do.
ok I have several questions but first let me summarize my understanding of your present setup.

SETUP
You have users updating several workbooks that sits on network drives that are in different paths but all these workbooks are exactly the same in format and number of sheets but their data content is different from each user's input. I envision it  as 10 users (UserA, UserB, UserC etc ... having access to
I:\Users\UserA\Workbook.xlsm
I:\Users\UserB\Workbook.xlsm
I:\Users\UserC\Workbook.xlsm
...
and you want to Create 1 database say in I:\Managers\database.xlsm that will contained a collected info from the 10 Workbook.xlsm that the users update regularly under a certain format that the managers will access the database.xlsm to pull info or to generate reports.

Am I correct in understanding your Setup ?

SUGGESTION
1) If my understanding is correct then I would tend to offhand say (as we are not into details yet of what you want exactly) but I would see
either
a) VBA in Workbook.xlsm that will export data to database.xlsm so this way it is 1 code that is written and is duplicated to X number of users.
OR
b) Formulas in database.xlsm that will pull data from each workbook

The choice between a) or b) depends exactly on how you want the data in the file database.xlsm if you wanted clearly split between each User then formulas is easier if you want it to be grouped and summarized then for sure VBA is better

2) If my understanding of your setup is not correct then please clarify it for me.

For sure I would favor an Excel solution as this is the topic here chosen and is more flexible but also Access could be envisioned but is has a different setup and design.

gowflow
You mention the INDIRECT function in your question. This function can indeed be used to string together various parts of a file name and path which vary so that can be included within a formula for referring to another workbook.

INDIRECT does work well for this purpose but it does not work when the source workbooks ie the files from which you are extracting information are closed.

Thanks
Rob H
Avatar of SuraDalbin

ASKER

Fmarshall, Gowflow and Rob, thank you very much for your comments.

Gowflow,

Yes, your understanding is exactly what I need; specifically the option where all the Users' input is grouped and summarized in the database.xls, since once all data is in this file, pivot tables and other reports of the like can be generated.  I too favor the vba approach.

I'm attaching an example of what the Users' workbook file will look like, so you can get a feel for the data and layout.  As you mentioned, all users will have same format and number of sheets, but their data content will be different from each others, and so will the amount of records (Client_Names), where some users might have 100 clients and others 40, 60, etc.

Please note, although all files will contain multiple sheets (Summary, Day 1, Day 2, etc.), I am only interested in pulling the data from the "Summary" Sheet.

(1)  The database.xls file will pull its data from userA.xls, userB.xls, etc. on a daily basis (Or based on your suggestion, userA, userB, etc., will populate database.xls).  Would it be best to copy and paste all data from users to database.xls?  Or to Link all users' workbooks to database.xls?  The purpose of the database.xls file would be to provide a daily snapshot of all users' data, so in this case, whichever approach is less likely to provide complications in the future would be best.

(2) As far as providing help with the vba code, I realize that you have, in essence, answered my question, so please let me know if I woul d need to post a second question to award additional points.

Thank you very much for your help.
userA.xlsx
Copy and paste is a manual process.  I would avoid such things like the plague.  So, I would link them.

I see no particular reason why you could not link the data in some simple form and then structure it into whatever thereafter.

A *very* simple, if gross, example would be to link the entire workbooks into sheets in the database workbook - worksheet for worksheet like this:
In database.xslm define sheets UserA,UserB,UserC, etc. as well as Sheets SummaryA, SummaryB, SummaryC, SummaryALL, etc.
Link all cells in database.xslm/UserA from I:\Users\UserA\Workbook.xlsm\Sheet1!A1:XX999
Link all cells in database.xslm/UserB from I:\Users\UserB\Workbook.xlsm\Sheet1!A1:XX999
Link all cells in database.xslm/UserC from I:\Users\UserC\Workbook.xlsm\Sheet1!A1:XX999
Create database.xslm Summary sheets as needed that link from other integrated sheets above.

I'm not suggesting this as a design but, rather, a design concept in the sense that all the links ever needed are there and any manipulation is done entirely internal to the database workbook.  That's why I called it "gross".  Obviously one can make the summary formulas more complicated and link directly into them.
fmarshall,

Thanks again.  I really like this approach, where one file does the linking and all other data manipulation.  

I guess I could just throw in a small vba code in the database.xls file where it opens all the user workbooks as read-only at the "open workbook" event, to refresh all links and then close.  Also, I'm going to look into having vba code for "linking to" additional users.  

1) Any thoughts or code suggestions for this?  I'm thinking the code should prompt to browse and open the "new" user workbook.xls and then link to the same range as all other workbooks.


Thank you much.
I would NOT open those files.  In my case the system will crash as there are over 200 of them and they are big unto themselves.  Through the process I learned that the files need not be open (or opened and closed even).
The links will update when the summary file is opened and, if it's opened, will update if one of the "feeding" files is updated.

I don't see where you need VBA code to add users.  But maybe your app is different than I understand.
Here's a thought along the lines of a sheet for each User in the Summary:

In a "list" sheet of the database:
Column A has the source paths. Column B has the source filenames. Column C has the corresponding source Sheets.
If they are all the same except for the filename then it could be
A:A=C:\Users\You\Documents\Datafiles
B1=USER1.xlsm; B2=USER2.xlsm; etc.
C:C=Sheet1!

Then in each sheet of the database file corresponding to the filename:
Cell A1 has =A1 & "[" & B1 & "]" & C1 & $A$1
Cell B1 has =A1 & "[" & C1 & "]" & B1 & $B$1
etc.
And you may have to insert single quotes if the path has spaces in it....

If you need to add a user then build a userN.xlsm file.
And in the database "list" sheet add a new row for that new file.
And in the database add a sheet for that new file.
Sorry but we are being very vague here.

The file you posted is fine in sheet SUMMARY it has data from Col A to K which presume is the data you need to replicate on the Database.xlsm.

I would suggest the following:
1) Some automation in the file handed to userA, UserB like the one attached
2) a VBA that will populate basis mutual understanding when each record is added to be populated to the database.xlsm

pls check the file attached I have protected (without any password) the sheet SUMMARY and added the user which is a formula that will pull automatically the filename (if this one is Joe Blow it will pull it as well the project are autoincrement always they both display when data is added to Col C. The columns in grey are formula driven thus protected.

We can change all this but this is first basic attempt.

The idea is as follows:
When the user finishes a row or modifies a row (depending on complexity) the VBA code will lookup the file database.xlsm for that record (project + user) which is unique and replace the existing one by the one just updated. If that record is not found in database it will added after the last existing row.

This way in Database.xlsm you will have 1 sheet called SUMMARY that will hold all records of all users exactly how they have been entered.

LEt me know your thoughts.
gowflow
userA.xlsx
fmarshall,

I began the process of building the database.xls file, per your suggestion.  It seemed to be working, but it so happened that there was a power outage at the office and could not finish the process.  

Anyway, I guess the only concern I was having as I linked about 10 rows to the source files, was the amount of time it took for the database.xls file to update those few links.  I will continue testing and let you know how that approach goes.

Gowflow,

I realize that the excel file I provided was very raw and simple.  In reality, the users' files have some automations, specifically, an automation that will do what you suggested, which is to insert the name of the user to each "record" and assign a project number, exactly they way you did, however I had used a different approach, using the =max formula.  In any case, I do agree with you on both suggestions.

As far as the code to update the database.xls file, this is the part that I honestly have no idea where to begin.

Thanks for your help.
Well we cannot keep talking vague and non specific.

It is useless for you to post something that is not EXACTLY what you have.

I suggest the following - that is if you want my help -

1) You post the exact workbook that your users are working on making sure you change the confidential data
2) I build you a macro that will update the database.xlsm and will hand it to you and you could then test it.

I do not believe that you will get anywhere productive and practical (In your case as you have increasing data) by using a link as proposed otherwhyse in this thread. It may look easy and appealing at first but when you gather several workbook and increasing data it becomes quite intricate.

Let me know of your choice.
gowflow
As I warned, this was a design *approach* and not a design.  It was intended to hasten your learning.
Yes, the more links you have, the longer it will take to update.  
One of my summary files takes a while to open.  But, once opened, it's fine.  And, thereafter, updates to it happen quickly.
One is often faced with making things run faster once they figure out what they need and want.

I hope this is helpful.
Fmarshall,

Agreed, you did warn, I thought I was onto something when I read your suggestion, so yes it did help, much appreciated.

Thank you
Gowflow,

You are correct, and yes please, if you could help me with this, I would greatly appreciate it.  Attached please find the exact files that users will be using.  This file achieves two purposes for the users, it maintains a daily timesheet and it also prepares a year-to-date summary of all projects that the user has worked on.  When the user updates his/her daily timesheet, that amount of hours worked on a given project (task) gets automatically updated in the "Backlog Details" sheet, which is the summary that we previously discussed.  This sheet holds the "records" that I would like to populate in database.xls, specifically, I would like the userN.xlsm files to populate only column A to column AA of the "Backlog Details" sheet to database.xls, since this is the only relevant information all managers need.

Please note the following, related to the files:

1)  All userN.xlsm files and the database.xls will be saved in the same network drive.  The userN.xlsm files will be saved in their corresponding folder for that user.

2) Since multiple managers would need access to the database.xls, this file should be a shared file.

3) Question:  Also, since multiple users will be updating their individual userN.xlsm files to populate the database.xls, at what Excel event would be more efficient to update the database.xls?  (Close, before_save, etc.)

3) On a daily basis, preferably early in the day, the database.xls file will be used to prepare and e-mail various reports, as I mentioned in my initial post, this will probably be done via pivot tables or filters.

I realize that there probably is a much more efficient way for achieving this whole cycle (user -> database -> reports -> managers) in excel, and therefore I would be open to any and all suggestions.  Otherwise, I just need to make the best with what I have.

Again, Gowflow, I thank you for your time and for all your help.

Thanks,
userA.xlsm
ok fine as reading your last comments I thought you had choosen the other route. So please confirm that indeed you prefer the option I proposed as VBA prior to devoting time to develop this if it is not what you really want.
gowflow
at what Excel event would be more efficient to update the database.xls?  (Close, before_save, etc.)
As I mentioned earlier, there is no overt action necessary to update the database.xls file contents if there are inter-file cell references in the database.xls cells.  They will be updated when the database.xls file is opened and they will be updated when changed in the source files.  At least that's my belief and it's easy enough to confirm.  I don't have access to mine right how.

If you're relying on vba code subs to do this then I'm sure that results will vary according to the code.
Gowflow,

Yes, if you could please help me with the vba, i would appreciate it.

Thank you.
Hello Gowflow,

Just following up to see if you'll be able to help me with vba code to make this happen.  Thanks.
yes sure but sorry having here couple of days holidays sorry for the delay will attend shortly tks your patience.
gowflow
Hello Gowflow,

Thanks for your response.  Checking in if you have had a chance to work on this.  Please let me know if this doable sometime soon, that way I can plan accordingly to finish the rest of this project.

Thank you much.

Sura
First thing I would do is change the source spreadsheets to something like the attached file, though you probably don't need the summary page as the pivot table gives you the same answer.  Then follow that up with consolidation.
Once you have the mutiple spreadsheets with sound summaries, I would open a blank Access database and create a table (by linking) for each spreadsheet summary.  So, depending on how many spreadsheets you might have, you could end up with 20 tables for instance.  Then you create a query that "Unions" the data from the 20 tables into one set of data.  Then you link an Excel table or pivot table to the union query (for those who don't want to work in Access) and you should be good to go.  - Tom
Oops...  Think I forgot the attached file I referenced.
EE.xlsx
Hello Tomfarrar,

Thank you for help with this project.  I'm going to give this a try over the weekend and let you know how it worked out.

Thanks again,

Sura
Sure...  Good luck.
Whether you use an Access database or an Excel workbook to summarize the data is a matter of preference and what you intend to do thereafter.  You may find that dealing with a single application (Excel) is more convenient for you in some sense.  So, I stick with my original recommendation and comments.
Hello fmarshall,

Thank you for still following up on my question, I really appreciate it.  So, when I tried the raw layout (or suggestion) that you provided, I mentioned that the only thing I noticed was the time it took for the database file to update, which is not extremely slow, but slow nonetheless, when first opening the database.xls.  Then you mentioned that your suggestion was more of a guideline or raw layout as to how to achieve my goal with this project.

Based on the sample file I provided, the second version, (userA.xlsm), what would you suggest my database.xls should look like.  How should I proceed?  I really would like to finish this project this weekend, if possible.

Thank you for your help.
ASKER CERTIFIED SOLUTION
Avatar of hypercube
hypercube
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
This is working perfectly fmarshall. Thanks again for all your help.