Solved

User-form That Will Enable A Search Against A Massive Spreadsheet Being Used As A Database

Posted on 2013-01-22
Last Modified: 2013-03-09
Greetings Everyone.  I am reaching out for some help on a project that I hope will teach me a lot.  I have a couple of ideas/concepts; however, I am wide open for any new suggestions, ideas, constructive criticism, and/or anything else that will guide this project to fruition.

Okay, the basic concept is to have some sort of user interface for company users to perform searches based on particular item numbers and other attributes.  Currently, I have a Discoverer Report that pulls a "Master Sheet" every night via a DOS batch script, which is programmed to save the file with the date in the file name.  I would like a centralized userform/spreadsheet where users can input some data and search against that nightly file without having to open it.

The cause for this is that the file itself contains nearly 750,000 rows, each containing a specific/distinct item/part number and its attributes.  It takes forever to open the file, plus it is somewhat inefficient to open this large file to find one attribute for a part number.  I have researched the web and found that there are options using ADO and/or DAO methods.  I am not fluent with these at all, hence I call out to the Experts.  

If there is a better way to have this done via Excel, I am all ears.  I greatly appreciate any help everyone.  I can provide sample data/workbooks to whoever is brave enough to accept the challenge.  Thanks again everyone and God bless!
Question by:Christopher Wright
49 Comments
 
LVL 45

Expert Comment

by:aikimark
This might be a good place to use an HTA (HTML Application).  I created several HTA applets for one of my clients to do lightweight things, such as look up information.  In my case, the data is in Access databases, but the ADO code will be similar.

It would be helpful to post a small, but representational, sample of the workbook.

=========
If performance is still a problem, it might be worth considering a tiered approach to the data, where the rows are distributed to different files/workbooks/databases with different names that are related to their content.

At the end of the nightly run, you might post-process the workbook data to create a slimmed down version of the data with just enough data to validate the users' queries and speed up the data retrieval.

You would do your company well to populate a database, rather than use a workbook as a database.  There are several lightweight database engines and some are free if that is a concern.

Author Comment
by:Christopher Wright
I am interested in any option that will make life easier for everyone.  I will get you that spreadsheet first thing in the AM.  Okay?  Thanks for helping me out!  God bless!

Author Comment
by:Christopher Wright
My apologies for the delay.  What would be a sufficient sample, and what in particular would you like to see in it?  Thanks again for the help!

Expert Comment
by:aikimark
A sample somewhere in the 100-200 row range will, hopefully, reveal any anomalies in the data and any inherent patterns.

Author Comment
by:Christopher Wright
Here you are.  Hope this will suffice.  What do you think would be the best direction to go with a userform here?  Thanks
ADO-Test-File.xlsx

Expert Comment
by:aikimark
1. Edit the following code to change the path of the Excel file in the connection string.
2. Open the Excel workbook and remove the leading space character from the worksheet name.  In fact, you will probably have a completely different name in production, so the worksheet name in the From clause will need to match your actual name (with a $ suffix).
3. Save the following code as Q_28004727.HTA
<html>
<head>
<title>Q_28004727</title>

<HTA:APPLICATION 
     ID="Q_28004727"
     APPLICATIONNAME="Q_28004727"
     SCROLL="yes"
     SINGLEINSTANCE="yes"
     WINDOWSTATE="maximize"
>
</head>
<style>
.mono {
	font-family:courier;
}
</style>
<SCRIPT Language="VBScript">
    sub ReturnRows
	Dim cn
	Dim Rs
	Dim strData
	Dim strRow
	Dim fld
	Set Rs = CreateObject("ADODB.Recordset")
	Set cn = CreateObject("ADODB.connection")
	cn.connectionstring = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\Users\AikiMark\Downloads\ADO-Test-File.xlsx;Extended Properties=""Excel 12.0 Xml;HDR=YES;IMEX=1"";"  ' edit the Data Source path to match your file
	cn.Open
	Rs.Open "Select * from [Item Master (Test ADO)$] Where [Part Number] Like '%" & txtFilter.value & "%'", cn, 1, 3
	strData = "<table width=""100%"" border>"
        if not rs.eof then
	  strData = strData & "<tr>"
          strRow = ""
          For each fld in rs.fields
            strRow = strRow & "<th>" & fld.name & "</th>"
          Next
          strData = strData & strRow & "</tr>"
        end if 
	do until Rs.EOF
	  strRow = "<tr>"
	  For Each fld in Rs.Fields
		if fld.name = "Cat Code" then
			strRow = strRow & "<td class=""mono"">" & fld.value & "</td>"
		else
			strRow = strRow & "<td>" & fld.value & "</td>"
		end if
	  Next
	  strData = strData & strRow & "</tr>"
	  Rs.MoveNext
	Loop
	DataArea.InnerHTML = strData & "</table>"

    end sub
</SCRIPT>
<body>
<input type="text" name="txtFilter" size="50"><p>
<input id=runbutton  class="button" type="button" value="Get Data" name="run_button"  onClick="ReturnRows"><p>
<span id=DataArea>This is a span named DataArea.</span>
</body>
</html>



Note: the SQL statement uses wildcard matching of strings.  You will get better performance if you use exact matching.

Author Comment
by:Christopher Wright
This is working quite well so far.  It's fast and brings back the entire row with all the attributes.  Is there a possibility to have this data broken up into separate fields?  For instance, after the search is performed, the info would populate a list such as:

ADS Part Number: ___________________________
Inventory Item ID:___________________________
Item Description:____________________________
Vendor Part Number:________________________
Manufacturer Part Number:_________________

And so on....

What I am trying to ultimately achieve is an interface/environment where a user can search by item and find multiple pieces of information, but that also has a "dashboard"-like interface with aggregates of certain vendor sales, average lead times, and other data.  This would all pull from files such as the one I've provided.  The charts, or whatever is chosen, would auto-run when the file is opened, and the search would be on demand.  Is this a possibility?  Can such files have these types of objects?  Can they pull from multiple sources?  Thank you so much for opening up a new world for me here with the HTA route!

Expert Comment
by:aikimark
Is there only going to be one record retrieved?  If so, then yes -- the output can easily be displayed vertically, rather than in grid/table form.

I'm not sure what you're asking for as far as a dashboard is concerned.  Let's stick to the original question you asked about retrieving data.  After you have an acceptable solution to the current problem, you can ask another question about dashboard.

Expert Comment
by:aikimark
Here is a version of the HTA that produces a list output.  I changed the comparison from wildcard to exact match.  Remember to change the path.
<html>
<head>
<title>Q_28004727</title>

<HTA:APPLICATION 
     ID="Q_28004727"
     APPLICATIONNAME="Q_28004727"
     SCROLL="yes"
     SINGLEINSTANCE="yes"
     WINDOWSTATE="maximize"
>
</head>
<style>
.mono {
	font-family:courier;
}
</style>
<SCRIPT Language="VBScript">
    sub ReturnRows
	Dim cn
	Dim Rs
	Dim strData
	Dim strRow
	Dim fld
	Set Rs = CreateObject("ADODB.Recordset")
	Set cn = CreateObject("ADODB.connection")
	cn.connectionstring = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\Users\AikiMark\Downloads\ADO-Test-File.xlsx;Extended Properties=""Excel 12.0 Xml;HDR=YES;IMEX=1"";"  ' edit the Data Source path to match your file
	cn.Open
	Rs.Open "Select * from [Item Master (Test ADO)$] Where [Part Number] =""" & txtFilter.value & """", cn, 1, 3
        strData = vbNullString
	do until Rs.EOF
	  For Each fld in Rs.Fields
		strData = strData & "<b>" & fld.Name & ": </b>" & fld.value & "<br>"
	  Next
	  Rs.MoveNext
	Loop
	DataArea.InnerHTML = strData

    end sub
</SCRIPT>
<body>
<input type="text" name="txtFilter" size="50"><p>
<input id=runbutton  class="button" type="button" value="Get Data" name="run_button"  onClick="ReturnRows"><p>
<span id=DataArea>This is a span named DataArea.</span>
</body>
</html>


Author Comment
by:Christopher Wright
My apologies for the ambiguity.  As far as the dashboards are concerned, I was only referencing the direction I planned to take after we concluded this project.  Thank you for helping me out.  I greatly appreciate it.  
BTW: This code is awesome.  Now, would I need to ask another question in order to have this formatted to show up in text boxes?  Pretty much to create a display similar to a common search screen.  Thanks for all of the help.

Author Comment
by:Christopher Wright
Just a quick note.  I applied this to my entire file, and it's quite slow.  Would there be an option to speed this up, or a quicker approach?  I'll start looking at databases.  Would I need a multi-user one?

Expert Comment
by:aikimark
Why would you want to display the results in textboxes?  This is read-only data.  A textbox would make the user think they can update the data.

A database would most likely be the best technology to apply to this solution, since databases use indexes.  If you want to try something similar, you might do the following:
1. sort the Excel data in Part Number order
2. copy the Part Number column to a new workbook
3. save the workbook as a text file or tab-delimited file (example: PtNum.txt)
4. Save the original workbook as a text file
5. When the user wants to retrieve the part number row, the program reads the entire PtNum.txt file, uses the SPLIT() function to move the data into an array, and then does a binary search for the data.  With the 750k part number estimate, it will only take about 20 comparisons to find the part.  Since this is in memory, only the first search will be slow.
6. The big data file is opened with a FileSystemObject Textstream and the associated data line is read and displayed.

Note: the PtNum.txt file only needs to be processed once.  If the user does multiple searches, then the array can be searched from the already populated data.  In fact, you might do the I/O when the HTA form loads, so that the user never has a slow search, just a slow page load time.
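The sorted-file/binary-search steps above can be sketched as follows.  This is Python rather than the thread's VBScript, and the `PtNum.txt` file name and sample part numbers are just placeholders:

```python
# Sketch of the sorted-part-number lookup described above.
# Python stand-in for the thread's VBScript; PtNum.txt is a placeholder name.
from bisect import bisect_left

def load_part_numbers(path):
    # Read the whole file once and split it into an in-memory list.
    with open(path) as f:
        return f.read().splitlines()

def find_part(parts, target):
    # Binary search: roughly 20 comparisons for 750k sorted part numbers.
    i = bisect_left(parts, target)
    if i < len(parts) and parts[i] == target:
        return i  # 0-based line number of the matching row in the big file
    return -1
```

Only the first call pays the file-read cost; subsequent searches reuse the in-memory list, which is why only the first search is slow.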

Expert Comment
by:aikimark
If your data pattern is consistent, then based on the 200-row sample you posted, the PtNum.txt file will be about 8.7 MB.  For a brief moment, the raw data string will coexist with the array, roughly doubling the RAM footprint.  After the array is populated, the raw data string can be set to vbNullString.

Expert Comment
by:aikimark
Another approach would be to use a Dictionary object in place of the array.  The key would be the part number and the data would be a sequential number that is the line number of the associated data.

Note: although it is theoretically possible to store all the data in a dictionary object, the memory footprint would most likely be prohibitively large.  Based on the sample workbook, storing the data in a dictionary object would consume 153 MB of RAM.  If your users have sufficient free physical memory, then this would be the best non-database implementation.
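A rough sketch of that index, with a plain Python dict standing in for the Scripting.Dictionary (the tab-delimited layout of the big file is an assumption about the export format):

```python
# Sketch: part number -> line number index, then fetch the matching data row.
# A Python dict stands in for the Scripting.Dictionary described above.

def build_index(lines):
    # Key = part number (assumed first tab-delimited field),
    # value = 0-based line number in the big data file.
    index = {}
    for lineno, line in enumerate(lines):
        part = line.split("\t", 1)[0]
        index.setdefault(part, lineno)  # first occurrence wins on duplicates
    return index

def lookup(index, lines, part):
    # Dictionary hit is O(1); the slow part is reading the data line itself.
    lineno = index.get(part)
    return None if lineno is None else lines[lineno]
```

As noted above, duplicate keys are not permitted, so the sketch keeps only the first occurrence; real data would need the duplicate part numbers resolved.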

Author Comment
by:Christopher Wright
Hi there aikimark.  Thanks for the support and the quick response on this.  Maybe I stated what I wanted incorrectly.  I don't want the data placed in textboxes per se, but in formatted fields.  The intent was only to create a structured user interface.

Using files formatted as .txt would be an option.  The nightly reports I pull have the option of being exported as such.  If that would speed up the search, I'm all for it.  Another export option is HTML.  Would that be a better option?

Just out of curiosity, are you familiar with Oracle Discoverer desktop?  This is what I am using to query the data each night and then export each report in sequence.  If there are options there, I am all ears as well.  Thank you so much for the help!

Author Comment
by:Christopher Wright
Pertaining to the dictionary object, this may be an option.  We have a public directory, accessible by everyone, which has 29.7 GB of 499 GB free.  Would this be doable?

Expert Comment
by:aikimark
Disk space isn't the problem.  Available physical memory on each user's PC is the limiting factor.

Create the PtNum.txt file and see how long it takes to populate a dictionary object, how much of a memory footprint it has, and see how long it takes to find a part number.

Author Comment
by:Christopher Wright
I hope you'd pardon my ignorance here, but I am not well versed in this area at all.  How can I get started on what you stated above?

Create the PtNum.txt file and see how long it takes to populate a dictionary object, how much of a memory footprint it has, and see how long it takes to find a part number.

Expert Comment
by:aikimark
You might create a file with just the data from the first column, or you can open the big workbook, delete all the columns after column A, and Save As a text file.

Author Comment
by:Christopher Wright
I could simply create a new report in Discoverer with just the essential information, then have that export to a text file every night for updated data.  That could work.  

How would it work if I went the database route?  Would I simply download the db locally and grant everyone access or would they need to download as well?  Thanks again!

Expert Comment
by:aikimark
It depends on where your users are located.  If they are all on the same LAN, you only need one database.  Everyone should be able to do queries against it.

Author Comment
by:Christopher Wright
Perfect.  Everyone is on the same network.  What would be the best route to take database-wise?  Obviously, I would like to go the free/cheapest route.  Any suggestions?

Author Comment
by:Christopher Wright
Any names of databases that you think would best suit my needs here?  Also, how would I go about connecting the database to the spreadsheet updated every night, or should I have Oracle Discoverer export directly to the database?  Thanks

Expert Comment
by:aikimark
What about using the Oracle database you already have?

Author Comment
by:Christopher Wright
It is not user-friendly.  Also, users are not allowed to query against the database.

Expert Comment
by:aikimark
But this is a fixed query, not an ad hoc query.
This is something suitable for a view, a daily snapshot of the data, not even live data.

What is their problem?

Author Comment
by:Christopher Wright
Haha - I wish I knew.  I would not be searching for this.  I assume you'd like for me to close this one out?  Do you mind taking a peek at another one of my questions which has had 0 response?

Other Question

Expert Comment
by:aikimark
I didn't know if you were going to try one of the dictionary object solutions in this thread.

Author Comment
by:Christopher Wright
I'm not certain which comment you are referring to.  I beg your forgiveness.

Expert Comment
by:aikimark
Please look through the comments I posted in this thread.

Author Comment
by:Christopher Wright
Okay, I went back and found the thread where you stated:
A database would most likely be the best technology to apply to this solution, since databases use indexes.  If you want to try something similar, you might do the following:
1. sort the Excel data in Part Number order
2. copy the Part Number column to a new workbook
3. save the workbook as a text file or tab-delimited file (example: PtNum.txt)
4. Save the original workbook as a text file
5. When the user wants to retrieve the part number row, the program reads the entire PtNum.txt file, uses the SPLIT() function to move the data into an array, and then does a binary search for the data.  With the 750k part number estimate, it will only take about 20 comparisons to find the part.  Since this is in memory, only the first search will be slow.
6. The big data file is opened with a FileSystemObject Textstream and the associated data line is read and displayed.


I have made all of the suggested modifications as directed.  Now, what do I need to do to the code itself in order for it to search/use the *.txt files instead of the *.xlsx?  Please bear with me.  I apologize for the ignorance.  Thanks for the help aikimark.

Author Comment
by:Christopher Wright
The searches are still taking about 3 minutes to run.  Even after the first search has been conducted, it still maintains an average of around 3-4 minutes per search.  Just wanted to give a heads up.  Thanks again.

Expert Comment
by:aikimark
Put the PtNum.txt file in a zip file and upload it.

Author Comment
by:Christopher Wright
Here it is.
PtNum.zip

Expert Comment
by:aikimark
Alert: Of the 581089 part numbers, you have 1846 duplicate values.  Duplicate key values are not permitted in dictionary objects. You may have a similar problem with a database as well as the larger problem of retrieving data for the wrong part number.

In my VBA test environment, I measured the following performance:
Time to read 581089 lines:   0.9189453 sec

Time to read ONLY line 11:                 0 sec
Time to read ONLY line 101:                9.765625E-04 sec
Time to read ONLY line 1001:               1.953125E-03 sec
Time to read ONLY line 10001:              1.171875E-02 sec
Time to read ONLY line 100001:             0.1142578 sec
Time to read ONLY line 200001:             0.2441406 sec
Time to read ONLY line 300001:             0.3691406 sec
Time to read ONLY line 400001:             0.5 sec
Time to read ONLY line 500001:             0.6230469 sec

Time to fill dictionary object with 581089 part nums:        42.48242 sec
Time to do (above) nine dictionary item retrievals:              3.320313E-02 sec

===================
The file I am reading is the part number file you posted, so it is much smaller than the large file we will be accessing in production. I would expect those file/line read times to be larger than the above performance figures.

Once the dictionary object is filled, any single item lookup in the dictionary object would be about .0037 seconds, followed by another 1-4 seconds to retrieve the data line for that part number.

The big file can be used both to populate the dictionary object and hold the related part data. The PtNum.txt request was meant to minimize the size of the file you uploaded to EE.

===================
Here is my performance test code.  
Note: You need to add a Scripting Runtime reference to your VBA project in order to test this.
Option Explicit

Public Sub ReadPtNum()
    Dim oFS As New Scripting.FileSystemObject
    Dim oDic As New Scripting.Dictionary
    Dim oTS As TextStream
    Dim lngCount As Long
    Dim lngDupCount As Long
    Dim sngStart As Single
    Dim sngTotal As Single
    Dim strLine As String
'    lngCount = 0
'    sngStart = Timer
'    Set oTS = oFS.OpenTextFile("C:\Users\Mark\Downloads\PtNum.txt", ForReading, TristateFalse)
'    Do Until oTS.AtEndOfStream
'        strLine = oTS.ReadLine
'        lngCount = lngCount + 1
'    Loop
'    Debug.Print "Time to read " & lngCount & " lines:", Timer - sngStart
'    oTS.Close
    
'    lngCount = 0
'    sngStart = Timer
'    Set oTS = oFS.OpenTextFile("C:\Users\Mark\Downloads\PtNum.txt", ForReading, TristateFalse)
'    For lngCount = 1 To 500000
'        oTS.SkipLine
'    Next
'    strLine = oTS.ReadLine
'    Debug.Print "Time to read ONLY line " & lngCount & ":", Timer - sngStart, strLine
'    oTS.Close
    
    oDic.RemoveAll
    lngCount = 0
    sngStart = Timer
    Set oTS = oFS.OpenTextFile("C:\Users\Mark\Downloads\PtNum.txt", ForReading, TristateFalse)
    Do Until oTS.AtEndOfStream
        strLine = oTS.ReadLine
        lngCount = lngCount + 1
        If oDic.Exists(strLine) Then
            lngDupCount = lngDupCount + 1
        Else
            oDic.Add strLine, lngCount
        End If
    Loop
    Debug.Print "Time to fill dictionary object with " & lngCount & " part nums:", Timer - sngStart, "Duplicate Count:", lngDupCount
    oTS.Close
    
    sngStart = Timer
    Debug.Print oDic("1408015")
    Debug.Print oDic("D8267")
    Debug.Print oDic("P877-18.5X31")
    Debug.Print oDic("10-0010")
    Debug.Print oDic("CPR OCN1291SF590")
    Debug.Print oDic("E01228-12EW")
    Debug.Print oDic("P2-SP")
    Debug.Print oDic("GEN III MC L4 TOP XLXXL")
    Debug.Print oDic("XX67222S")
    Debug.Print "Time to do (above) nine dictionary item retrievals:", Timer - sngStart

    
End Sub


Author Comment
by:Christopher Wright
Awesome!  I will look into this one now.  I appreciate that!!

Expert Comment
by:aikimark
This might also be solved with XML/XSLT.

Author Comment
by:Christopher Wright
Actually, I was recently introduced to this and considered it, but how would I go about setting it up?  I am thinking it would be quicker too, correct?

Expert Comment
by:aikimark
I'm not sure about performance.  I thought of this because you might place the XML file on a server, and file I/O to a server might be slow.  It is technically possible to do this, but I'm not sure how well it would perform.

Author Comment
by:Christopher Wright
So, in your opinion, what is the best path to take at this point?

Expert Comment
by:aikimark
1.  place a .txt version of the big data file in a server directory
2. run some of the performance tests from within an Office product against the server resident big file.
3. place a copy of the .txt file (1) on your local hard drive
4. repeat the performance tests against the local copy of the .txt file.

You will need to address the duplicate part number problem I mentioned in my earlier comment.

Depending on the performance, you might be able to use this as a temporary solution until you implement a database solution. If the server-resident performance sucks and a local copy isn't feasible, we can look at the performance of an XML/XSLT solution.
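Steps 2 and 4 could share a minimal timing harness along these lines (a Python sketch; you would point it first at the server-resident copy, then at the local copy):

```python
# Hypothetical timing harness for the server-vs-local comparison above.
# Pass it whichever copy of the big .txt file you are testing.
import time

def time_full_read(path):
    # Measure one full sequential read of the big text file and
    # return (line count, elapsed seconds).
    start = time.perf_counter()
    with open(path) as f:
        count = sum(1 for _ in f)
    return count, time.perf_counter() - start
```

Running it against both locations gives the two numbers needed to decide whether the server-resident file is fast enough or a local copy is required.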

Author Comment
by:Christopher Wright
I have tested this as directed.  While it was not the slowest in the world, it was in no way the fastest.  Can we look at testing the XML/XSLT approach?  I can simply compare its performance against the .txt approach and make a call on which performs better and is more dependable.  Thanks

Expert Comment
by:aikimark
What was the local performance like?

Author Comment
by:Christopher Wright
Sorry, I commented on the wrong question.  :(

Author Comment
by:Christopher Wright
Do you want me to provide results, as you did previously?

Expert Comment
by:aikimark
Yes.  I wanted you to run the performance tests, both server and locally.

Author Comment
by:Christopher Wright
Roger that.

 As for the duplicate part number issue, the parts you call duplicates are in truth not duplicated.  They had their leading '0's chopped off by the .csv formatting.  Could we search based on both the part number and the unique ID that our Oracle database assigns to each item?  That would require the user to provide both, though, and oftentimes the user does not have the ID.  Is there a way to keep the '0's from being chopped off by the .csv format?

 I will provide the info requested in the server testing as well.  God bless!

Accepted Solution
by:aikimark earned 500 total points
Have you run the performance tests?

It is usually Excel that chops off leading zero characters when it converts values into numbers.  I expect that an intermediate solution will not use Excel, but a CSV file.  The thing that concerns me is the initialization and retrieval performance, local versus network.  The most convenient configuration would be a single copy of the big file on a file server that everyone reads; however, there are performance issues to consider.  The work-around is to have the code download the big file once a day and do all the I/O locally -- the network only gets 'hit' once per user per day.
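The once-a-day download work-around might look like this (a Python sketch; the server and local paths are hypothetical):

```python
# Sketch of the once-per-user-per-day cache described above.
# The server and local paths passed in are hypothetical examples.
import os
import shutil
from datetime import date, datetime

def ensure_local_copy(server_file, local_file):
    # Copy the big file from the server only if the local copy is missing
    # or was not refreshed today; all subsequent I/O stays local.
    if os.path.exists(local_file):
        copied = datetime.fromtimestamp(os.path.getmtime(local_file)).date()
        if copied == date.today():
            return local_file  # already fresh today, skip the network hit
    shutil.copyfile(server_file, local_file)
    return local_file
```

Calling this once when the lookup tool starts limits each user to a single network transfer per day, matching the work-around described above.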

Author Comment
by:Christopher Wright
I'm going to accept this as resolved since I have killed this project.  Thanks aikimark.