Question

Need fast sorting routine

Asked by: BeauT

Is there a fast way to read the contents (file names) of a directory into a sorted data structure? I am using the API as far as retrieving the file names, but it is the actual sorting which impacts performance. I have tried all sorts of ways and all of them result in long processing times. Just to get a comparison test speed, I have read the contents of a directory containing 5000 files into an unsorted array in almost no time. But as soom as I try and add sorting - either during or after reading all the files, my performance goes down the toilet? Any ideas? Or am I going to be stuck with having to rely on a listview for quick sorting?

Just to let you know a few things I have tried: I have tried reading all of the files directly into an array, then sorting that array into another array. I have tried reading into a memory resident disconnected recordset, which yielded the best, but still unsatisfactory performance. I even tried writing a VB link list, which came close to the recordset in performance, but not quite.

So, I am stumped...

This Question has been solved and asker verified All Experts Exchange premium technology solutions are available to subscription members.

Subscribe now for full access to Experts Exchange and get

Instant Access to this Solution

  • Plus...
  • 30 Day FREE access, no risk, no obligation
  • Collaborate with the world's top tech experts
  • Unlimited access to our exclusive solution database
  • Never be left without tech help again

Subscribe Now

Asked On
2001-02-25 at 00:48:51ID20083210
Topic

Visual Basic Programming

Participating Experts
10
Points
200
Comments
26

Trusted by hundreds of thousands everyday for fast, accurate and reliable tech support.

  • "The time we save is the biggest benefit of Experts Exchange to Warner Bros. What could take multiple guys 2 hours or more each to find is accessed in around 15 minutes on Experts Exchange." Mike Kapnisakis, Warner Bros.
  • "Our team likes having a resource that is more secure than just using Google and most experts using this service really know their stuff. It's nice to look here first versus using Google." Dayna Sellner, Lockheed Martin
  • "Anytime that I've been stumped with a problem, 9 out of 10 times Experts Exchange has either the accepted solution or an open discussion of the potential solution to the problem." Kenny Red, eBay Inc.

See what Experts Exchange can do for you.

Got a question?

We've got the answer.

Experts Exchange has been collecting answers to technology questions since 1996…3 million and counting! If you have a question, chances are we already have your answer.

Screenshot of Experts Exchange Knowledgebase

Need individual assistance?

Our experts are ready to help.

If you can't find the exact answer you're looking for, ask our exclusive community of 50,000 experts. You’ll get a personalized answer from a trusted professional.

Screenshot of Experts Exchange Knowledgebase

Want to learn from the best?

Read articles from industry experts.

Thousands of free tech tips, tricks, how-to’s and tutorials are available in our peer reviewed articles section. See for yourself how smart our experts are, no login required.

Screenshot of an Article

Working on a long term project?

Store your work and research.

Save solutions to your questions, answers you’ve discovered through searching plus helpful articles in your personal knowledgebase for easy future access.

Screenshot of Experts Exchange Knowledgebase

Access the answers to your technology questions today.

Subscribe Now

30-day free trial. Register in 60 seconds.

What Makes Experts Exchange Unique?

Members of the expert community talk about why the experience at Experts Exchange is different than what you will find anywhere else.

Trusted by the world's most respected brands.

image of each brand's logo

Faithfully serving IT professionals since 1996.

Experts Exchange Logo

Try it out and discover for yourself.

Subscribe Now

30-day free trial. Register in 60 seconds.

Related Solutions

  1. Sorting a listview with API calls problem
    I asked this question yesterday, then deleted it when I thought I had it fixed. I downloaded some sample code to sort my listviews using API calls ('cuz it doesn't sort dates or numeric values properly). The sort works fine, but the problem is that when it returns from the ...
  2. Listview
    Dear Sir I am really tired of repeating same question.Please give me an honest reply if it is beyond the scope of.I do expect quick reply as I have been logging onto the site for last 6 days and don't get proper justified answer the question i am going to ask is bottleneck f...
  3. How to sort ListView ?
    In my ListView control there are 3 subitems. one is file name (string), second is filesize(long) third is date of creation (date). How can I sort with this subitems ? (if I mentioned sortkey=.., then it sorts on alphabetic order not in numeric or date wise). plz, help me.
  4. Fasted Array sort routine?
    I'm looking for the fasted array sorting routine for string keys. Thanks, John
  5. 'API Routines'
    I have VFP 5.0 and VC++ 6.0. I want to build API routine. I can write the code but I don't know how to compile it. Is any one who can explain it step by step.

Free Tech Articles

  1. WARNING: 5 Reasons why you should NEVER fix a computer for free.
    It is in our nature to love the puzzle. We are obsessed. The lot of us. We love puzzles. We love the challenge. We thrive on finding the answer. We hate disarray. It bothers us deep in our soul. W...
  2. SCCM OSD Basic troubleshooting
    SCCM 2007 OSD is a fantastic way to deploy operating systems, however, like most things SCCM issues can sometimes be difficult to resolve due to the sheer volume of logs to sift through and the dispe...
  3. Migrate Small Business Server 2003 to Exchange 2010 and Windows 2008 R2
    This guide is intended to provide step by step instructions on how to migrate from Small Business Server 2003 to Windows 2008 R2 with Exchange 2010. For this migration to work you will need the fo...
  4. Create a Win7 Gadget
    This article shows you how to create a simple "Gadget" -- a sort of mini-application supported by Windows 7 and Vista. Gadgets can be dropped anywhere on the desktop to provide instant information, ...
  5. Outlook continually prompting for username and password
    There have been a lot of questions recently regarding Outlook prompting for a username and password whilst using Exchange 2007. There are a few reasons why this would happen and I will try to cover t...
  6. Backup Exchange 2010 Information Store using Windows Backup
    There seems to be quite a lot of confusion around the ability to backup Exchange 2010 using the built in Windows Backup feature. This stems from the omission of this feature prior to Exchange 2007 s...

Cloud Class Webinars

  1. Avoiding Bugs in Microsoft Access
    Alison Balter takes and in-depth look at avoiding bugs in Access. In this webinar you will learn about using the immediate window to debug your applications, invoking the debugger, using breakpoints to troubleshoot, stepping through code, setting the next statement to execute, ...
  2. Top 10 Best New Features in Visio 2010
    Scott Helmers gives live demonstrations of the top 10 new features in Visio 2010. This webinar will teach you how to create compelling diagrams by adding shapes to the page with a single click, linking the shapes in a diagram to data in Excel (or SQL Server, or SharePoint), ...
  3. IT Consultant Business Secrets Revealed
    Michael Munger, Experts Exchange tech pro and IT consultant, pulls back the curtain on his very successful businesses and answers question on every IT consultant and business owner should know about. He shares secrets on what he did to solve the 5 most common problems in IT, ...
  4. Disaster Recovery and Business Continuity
    Quest CTO, Mike Billon, gives an overview of the steps involved in building a dunamic disaster recovery plan. Through case studies and an examination of software/hardware tooles for monitoring and testing, you'll gain a better understandin of where you are, where you want ...
  5. Organize Your Visio Diagrams with Containers and Lists
    Scott Helmers uses cross functional flowcharts, wireframe diagrams, data graphic legends and seating charts to teach you: how to ustilize all three new structured diagram components in Visio 2010, the best practices for organizeing shapes in previous version of Visio, how to organize ...
  6. How to Us Objects, Properties, Events and Methods in Microsoft Access
    Alison Dalter gives an in-depbth look at objects, properties, events and methods in Microsoft Access. In this webinar you will learn about using the object browser, referring to objects, working with properties and methods, working with object variables, understanding the ...

Join the Community

Give a Little. Get a Lot.

Join the community of experts here and help other tech pros by answering question in your area of expertise. You can earn FREE access to all Experts Exchange's premium features and resources.

Join the Community

Answers

 

by: GeoffKellPosted on 2001-02-25 at 02:19:42ID: 5875985

Here area a couple of links from my Favourites related to sorting.

http://www.mvps.org/vbnet/faq/main/sorting.htm

http://www.wiseley.com/matt_old/vb/quicksort.htm

Regards
GK

 

by: mammouthPosted on 2001-02-25 at 05:14:42ID: 5876150

This code will put 500 random number in order

Dim arrShort(500)

For i = 0 To 500
    Randomize
    arrShort(i) = Int(Rnd * 100)
Next

'This will order them
For i = 0 To UBound(arrShort) - 1
    For j = 0 To i
        If arrShort(j) > arrShort(j + 1) Then
            temp = arrShort(j + 1)
            arrShort(j + 1) = arrShort(j)
            arrShort(j) = temp
        End If
    Next
Next

 

by: mammouthPosted on 2001-02-25 at 05:15:09ID: 5876151

i think is the fastest way to sort item

 

by: jgvPosted on 2001-02-25 at 06:07:09ID: 5876181

Same site as GeoffKell, different page:
http://www.mvps.org/vbnet/code/helpers/qsvariations.htm

 

by: nahumdPosted on 2001-02-25 at 07:34:20ID: 5876311

mammouth - this is certainly not the fastest way to sort.
Your method is called bubble sort , and its complexity is n^2, and if we'll be more accurate then the program that beauT is looking for has to iterate 5000^2 / 2 ~= 12,000,000 times to sort.
An efficient sorting would run only n*log(n) times, that's mean 5000 * log(5000) ~= 60,000. (about 200 times faster)

 

by: gcs001Posted on 2001-02-25 at 07:53:03ID: 5876344

Each sorting algorithm has it's pros and cons.
The best would be to write 2 or even three different sorting routines using different algorithms.

A bubble sort, being the easiest to write, would be best for a small amount of elements.
A binary sort would probably be better for a large amount of elements.
I would suggest you actually do some searching to get some good sorting algorithms and then try them.

Not sure, but perhaps there should be some ActiveX controls or DLL's that already do this very thing.
Try searching the net for these.

Regards,
Grant.

 

by: nahumdPosted on 2001-02-25 at 08:10:47ID: 5876379

BeauT - try to change the insertion of one file into a more efficient allgorithm (I think it's called "lion in the desert" or something like that):
Instead of scanning name after name for the proper place - try to jump into the middle of the range where you should insert the item.
e.g. if you have array with 100 items, begin in checking index 50, then if the item is bigger than the item in index 50 , check for index 75, and so on....
This should reduce the complexity of insertion of one item from n to log2(n) , or in your case from 5000 in the worse case (if the item should be placed at the end of the array), to 12 in the worse case.
This algorithm is both simple and efficient.

 

by: maximilianoangelPosted on 2001-02-25 at 08:21:21ID: 5876397

(u say listview??)
The simple but not elegant is an "x" listbox in a invisible (or not) form and set its sort property =true
in this option, the "sort" is very fast, but I dont know about the "additem" overhead (u must test it)
Remember, the listbox can hold up to 32767 elememts.
I'll do some research.
Regards
Max



 

by: JonFish85Posted on 2001-02-25 at 09:20:37ID: 5876559

I've heard that the "shell" sort is the fastest:

Public Sub ShellSort(A$)
Dim NumOfEntries%, Increm%, J%, Temp$
NumOfEntries% = UBound(A$)
Increm%  = NumOfEntries%\ 2

Do Until Increm% < 1
  For I% = Increm% + 1 to NumOfEntries%
    For J% = I% - Increm% To 1 Step -Imcrem%
      If Temp$ >= A$(J%) Then Exit For
      A$(J% + Increm%) = Temp$
    Next J%
    A$(J% + Increm%) = temp$
  Next i%
  Increm% = Increm% \ 2
Loop
End sub

That is straight out of the book VB5 From The Ground Up. I have never used it, but the book claims that it is the fastest.

 

by: BeauTPosted on 2001-02-25 at 10:42:24ID: 5876749

ok, sorry to switch gears, but this should shed some light on a big stumbling block for me - what is the difference bewtween a b-tree and a binary tree?

 

by: andyclapPosted on 2001-02-25 at 11:32:13ID: 5876823

b-tree is shorter to type ;)

There are few differences but the nomenclature is a bit askew here and there: some trees store the data in the nodes, some trees store it in the leaves; there are also sparse trees and ones which maintain the leaves as a linked list...
oh dear, rambling again on a subject I've forgetten since colledge :)

 

by: bpouydogPosted on 2001-02-25 at 13:08:30ID: 5876975

I found a few interesting articles that can help you.

1. You can use QuickSort algorithm to sort your array.
Explanation of this algorithm is in attached file Sorts of All Types.rtf

2. You can use Win32 API function CopyMemory to speed up (500 times) QuickSort algorithm.  Explanation of this method is in attached file Lighting Strings.rtf

3. You can use ADO recordset to store your data and use its Sort method. Comparison with other methods is in attached file ADO Unplugged.rtf

Note that different methods have also different speeds for
searching data. Sometimes a method that is slower for creating and sorting is quicker for searching. It is your decision to choose the best method for your application.

###########################

ADO Unplugged
Rob Macdonald

Have you ever considered using an ADO RecordSet as a data structure in place of an array or collection? Lets face itADO RecordSets provide built-in sorting, searching, and persistence capabilities. In this article, Rob shows you how the RecordSet stands up against more traditional data structures.

Yes, I realize that most people use ADO to interact with databases. But theres nothing stopping you from creating RecordSets in memory just like regular data structures and using your ADO skills to manipulate them. Personally, Ive found the ability to sort, filter, and search RecordSets so valuable that I wanted to see how RecordSets stacked up against arrays and collections for a range of standard data processing tasks. This article shows you the tests I ran and what I discovered. Even if you dont like the idea of replacing more traditional data structures with RecordSets, the results should provide some interesting insights into how RecordSets behave with regular dataat least when youre using a client-side cursor.

From the top
In case youve never done it before, Ill quickly show you how to create a pseudo- or fabricated RecordSetone that you build in code, rather than attain from a Provider. In order to run my performance tests, I defined a data structure with four columns: unique string (different for each record/row), non-unique string, unique long, and non-unique long. Not very imaginative, I know, but at least youll know what Im using in each test. The following code shows you how to define a fabricated RecordSet and stuff it with data:
  Dim rsStandard As Recordset
  Dim vColumns As Variant
  Set rsStandard = New Recordset
  rsStandard.Fields.Append "lUnique", adInteger
  rsStandard.Fields.Append "sUnique", adVarChar, 20
  rsStandard.Fields.Append "lNonUnique", adInteger
  rsStandard.Fields.Append "sNonUnique", adVarChar, 20
  rsStandard.Open
  vColumns = Array("lUnique", "sUnique", _
      "lNonUnique", "sNonUnique")
  For l = 0 To DATA_SIZE _
      rsStandard.AddNew vColumns, _
         Array(usrBase(l).lUnique, usrBase(l).sUnique, _
         usrBase(l).lNonUnique, usrBase(l).sNonUnique)
  Next

As you can see, Ive created a standard RecordSet object and then appended fields to it, specifying the data type and size where appropriate. You can only do this to closed RecordSets, but once the fields are defined, you can open the RecordSet and then use AddNew to add the records. Note that when youre adding many records, its a lot faster to use AddNew with the field names and values as arguments (as I did previously) than to specify each field value as a separate statement DAO/RDO-style.

The contestants
Here are the contenders in the Data Structure Grand Prix:

1. Standard RecordSet: As created previously, with four columns.

2. Indexed RecordSet: This is a standard Recordset, but with client-side indexes defined to speed up searching.

3. Standard collection: Ive defined a VB class called TestObj with four properties to hold the four data fields already described. The Standard Collection is a VB collection of instances of this class.

4. Keyed collection: This is a standard collection, but when objects are added to it, the unique string property value is used as a key in the collection. This will enhance search times when searching by the unique string.

5. Array of user-defined Types: A user-defined type called "TestData" holds the four field values. I then defined an array of this UDT.

6. Variant array: This is a 2-D variant-based data structure.

The tests
I put each data structure through six tests using 10,000 records. Each data structure has the four fields/columns mentioned previously. To get accurate timings, some of the tests have been run 100 times, in which case I show the total, not the average. In each case, Ive run the complete test four times and presented the average of the second, third, and fourth tests.

All of my tests were performed on a Pentium II running at 400 MHz with Windows 2000 Beta 3. Ive also run the tests on smaller machines and older (all right, actually released!) operating systems, and the results scale pretty much as expected. All figures are based on compiled VB6 code. The usual warnings about relying too heavily on benchmarks apply as evernothing beats "suck it and see" when it comes to performance tuning.

Test 1: Creating data
Before you use a data structure, you need to be able to populate it. Each of the following tests place the "same" data into the data structures.
Data set size      10,000
Iterations      1
        
Standard RecordSet      2.87s
Indexed RecordSet      3.32s
Standard collection      2.07s
Keyed collection      2.22s
UDT array      0.03s
Variant array      1.53s
The results speak for themselves. UDT arrays are blisteringly fast to create. The others are all very slow by comparison, and RecordSets come out the worst.

Test 2: Iterating through data
In each of the following tests, the code loops through each record/row, copying its four fields into local variables:
Data set size      10,000
Iterations      1
        
Standard RecordSet      1.06s
Indexed RecordSet      1.08s
Standard collection      0.06s
Keyed collection      0.06s
UDT array      0.02s
Variant array      0.05s
Again, it doesnt look too good for the RecordSets.

Test 3: Finding one record based on one criterion
In each data structure, the sUnique field/column contains a unique string. The test creates a random string known to be one of those used in the data structure and searches the data structure for that string. The test is repeated 100 times to get a good figure.
Data set size      10,000
Iterations      100
        
Standard RecordSet      3.60s
Indexed RecordSet      0.03s
Standard collection      2.12s
Keyed collection      0.01s
UDT array      0.25s
Variant array      1.17s
This very common test puts a different slant on things. The Indexed RecordSet and the Keyed collection search at the speed of light, and even the UDT array is slow by comparison. As for the others, they look like they stumbled into some quicksand en route.
Of course, Keyed collections are designed specifically for this type of retrieval operation, and they appear to be a good bet at this stage. You might be wondering how the Indexed RecordSet performs so well. The time to create the Indexed RecordSet was slightly longer than the time for the Standard RecordSet (see Test 1). The extra time occurred because I asked the RecordSet to create a client-side index for each field. Please understand that this has nothing to do with database indexesits entirely a feature of the ADO client cursor library. When you build a client-side index for an ADO 2.x RecordSet that uses a client cursor, youll benefit from greatly enhanced client-based searching and filtering. Heres the code for creating the four client-side indexes used in my Indexed RecordSet:
rsIndexed.Fields("lUnique").Properties("OPTIMIZE") _
   = True
rsIndexed.Fields("sUnique").Properties("OPTIMIZE") _
   = True
rsIndexed.Fields("lNonUnique").Properties("OPTIMIZE") _
   = True
rsIndexed.Fields("sNonUnique").Properties("OPTIMIZE") _
   = True

Test 4. Finding n records based on one criterion
The lNonUnique field of each record/row is set to a random long between one and one-tenth of the data set size. There will therefore be about 10 records matching each possible value. This means that an iterative search cant stop on the first hitit must search right through the data set to find all of the matches.
Data set size      10,000
Iterations      100
        
Standard RecordSet      3.42s
Indexed RecordSet      0.14s
Standard collection      3.12s
Keyed collection      3.18s
UDT array      0.03s
Variant array      0.81s
Again, the results show a very good performance for the Indexed RecordSet, but the Keyed collection no longer performs well. This is because a key has to be unique, so key-based searching isnt possible when more than one record might match the criteria. One result that surprised me was the staggering performance of the UDT array. In uncompiled code, it was significantly slower than the Indexed RecordSet. This result shows just how efficient a 32-bit compiler is at handling longs. It also shows that none of these compiler optimizations can apply when variants are used.

For this test, the collection and array-based searches must perform an iterative search. However, the RecordSet-based search doesnt require this sort of coding. For RecordSets, all you need to do is to set the Filter property of the RecordSet to the criteria. You can then use standard iteration code to go through the RecordSet, and youll only see the records that meet the criteria. The following code shows how easy this is:
Dim fd as ADODB.Field
rsIndexed.Filter = "lNonUnique = " & _
   CLng(Rnd * (DATA_SIZE / 10))
Set fd = rsStandard("lNonUnique")
While Not rsIndexed.EOF
   ' use the data here
   rsIndexed.MoveNext
Wend
rsIndexed.Filter = adFilterNone
Before the Filter property is set, the RecordCount property of this RecordSet would be equal to the full data set size (10,000 in my trials). After the filter is set, and up until its reset in the last line, the RecordCount property will be approximately 10 (representing the records visible through the filter), depending on how the values were randomly allocated when the RecordSet was created.

Test 5. Finding one record based on two criteria
In this final "find-based" test, Im searching on the two string fields. Theres no guarantee that a match will be foundin fact, in 100 iterations, an average of only about 20 matches are made.
Data set size      10,000
Iterations      100
        
Standard RecordSet      8.44s
Indexed RecordSet      0.09s
Standard collection      4.97s
Keyed collection      4.94s
UDT array      0.87s
Variant array      3.93s
The compiler cant optimize the array handling quite as well as it did in Test 4, and collections can only have one key, so the Keyed collection offers no great advantage. The result: Indexed RecordSet wins hands down. Its also the easiest to program, since the only thing that changes from Test 3 is the Filter criterion that supports a range of Boolean operations to search for multiple field values.

Test 6: Sorting
Lack of decent sorting capability has been one of VBs key weaknesses over the years. However, look at the following table, and see if it changes your mind. The timings include the time to sort the unsorted data and then iterate through it in sorted order:
Data set size      1,000      5,000      10,000
Iterations      1      1      1
                      
Standard RecordSet      0.12s      0.64s      1.29s
Indexed RecordSet      0.12s      0.61s      1.25s
Standard collection      14.07s      >1,500s      ?
Keyed collection      10.28s      1,421.19s      ?
UDT array      0.58s      13.76s      53.02s
Variant array      0.88s      20.08s      76.16s
When considering sorting, Id ask you to consider not only performance, but also the coding effort required to implement it. The speed of RecordSet sorting clearly leaves all of the other approaches in the dust. But just as important is the fact that I only needed to add one line of code to my RecordSet iteration routine to have the data sorted this quickly:
  rsIndexed.Sort = "lNonUnique"
Obviously, for array and collection sorting, theres serious coding involved. Yes, I know you can implement a BubbleSort quickly, but we all know how inefficient it is, especially when sorting large data sets. So Ive taken the QuickSort algorithm from Bruce McKinneys excellent Hard Core Visual Basic and modified it for my own purposes. (Needless to say, if you find anything wrong with my sorting algorithm, it will be me, not Bruce, whos at fault.)

For array processing, Bruces algorithms only take up about two screenfuls of code, but it still took me half an hour to adapt it and get it workingand the end result is hardly snappy. As for the collection sorting, well, yes, there are faster ways to sort collections, but life is just too short . . .

Conclusion
This article has focused on a performance comparison between RecordSets and other, more traditional data structures. In writing it, Im basically encouraging you to think about using ADO RecordSets for more than just retrieving SQL results. What Ive shown is that, while RecordSets do take longer to create and iterate through, they can deliver dazzling performance for some complex operationsand at virtually no programming cost.
No, nothing beats arrays for straight data churningalthough arrays arent that great when it comes to adding and deleting elements. And while UDT arrays can provide amazing performance, theres a definite performance penalty associated with variant arraysespecially in compiled code.

In other words, Im proposing that RecordSets have something to offer as standalone data structures. Bear in mind that these tests show RecordSets can perform quite welleven on whats undoubtedly their weakest ground (an unnetworked PC). A RecordSet, remember, is a data structure that can be saved to file, marshaled efficiently across the Internet, bound to controls, and processed in 101 other ways that leave arrays and collections begging. Have I set you thinking?
###############################

VBA Tech
Lightning Strings
Fast, Undocumented String-handling Techniques
By Steven Roman, Ph.D.
I recently finished writing a book entitled Win32 API Programming with Visual Basic (O'Reilly & Assoc., 1999). The most frequently asked question in connection with this book is "Why would I, as a VB/VBA programmer, want to use the Win32 API?"
There are many ways to answer this question. The following are a few:
7      Using the Win32 API, a VB programmer can manipulate the user interface more completely than with VB alone. For instance, the Win32 API makes it relatively easy to add tab stops or horizontal scroll bars to a list box, or use a bitmap as a menu item.
7      The Win32 API allows the VB/VBA programmer to get more information about the state of the system - information such as the version of the operating system, a list of installed printers and fonts, or the number of buttons on the mouse. It can also be used to get a list of all open windows or all running applications.
7      The Win32 API can be used to dig deeply into the operating system. For instance, it can be used to subclass a control to change its behavior, to hook the operating system in order to watch for keystrokes or mouse actions and possibly alter their behavior, or to extract data from controls in foreign processes. You can even force one application to run code written from another application.
The book delves into all these aspects of the Win32 API and more. In this article, however, I want to show you a very simple, but very important, use of the Win32 API: sorting VB strings.
FIGURE 1 shows the results of a simple program that sorts two string arrays: an array consisting of 100 short strings, each of length 10,000 characters, and an array consisting of 100 long strings, each of length 100,000 characters.
 
FIGURE 1: A sorting program.
The application uses two methods for sorting. Both methods (slow and quick) use the same simple sorting algorithm, which puts the smallest string in the first position, then puts the second smallest string in the second position, and so on. The pseudocode is:
For i = 1 To NumStrings
        For j = i + 1 To NumStrings
          If strings(i) > strings(j) Then
            swap string(i) and string(j)
          End If
        Next
      Next
This sorting algorithm isn't very efficient, but the algorithm isn't the important issue here. Rather, it's the method used to swap adjacent strings when required. Indeed, even more efficient algorithms, such as the venerable quicksort method, require swapping.
The swapping is done two ways. The slow way using VB assignments:
' Swap strings s and t.
      temp = s
      s = t
      t = temp
and the quick way using the Win32 API function CopyMemory:
CopyMemory lng,
      ByVal VarPtr(s), 4
      CopyMemory
      ByVal VarPtr(s), ByVal
      VarPtr(t), 4
      CopyMemory
      ByVal VarPtr(t), lng, 4
As you can see from FIGURE 1, for the long string array, the quick method (using CopyMemory) is 500 times faster than the slow method, even on a rather small 100-item array. Moreover, the time it takes the quick method does not depend upon the length of the strings. Wow!
To understand how the quick method works, we need to take a look at the internal nature of VB strings. Before doing that, however, let's take a look at the Win32 API function CopyMemory.
CopyMemory - A VB Hacker's Dream
The purpose of CopyMemory is simply to copy a block of memory byte-by-byte from one memory address to another. This opens up a whole new set of possibilities for VB programmers, because VB doesn't have this sort of capability, except in the rather restricted form of LSet. Even then, the documentation recommends against using LSet for this purpose.
The simplest VB declaration for CopyMemory is:
Declare SubCopyMemory  Lib "kernel32" _
          Alias"RtlMoveMemory" (lpDest As Any, _
          lpSource As Any, ByValcbCopy As Long)
In this case, lpDest is the address of the first byte of the destination memory, lpSource is the address of the first byte of the source memory, and cbCopy is the number of bytes to copy.
This VB declaration is a bit dangerous, because the As Any form tells VB to skip any type checking, and an invalid type can lead to the dreaded General Protection Fault. Thus, great care must be taken when using this declaration. (Be sure to save all programs before running code containing this function.) We can (and will) override the default ByRef setting by including ByVal in the call to this function, as in:
CopyMemory lng,  AnAddress, 4
VB Strings
Let's now turn to a discussion of VB strings. We'll also discuss the very useful, but undocumented, VB functions VarPtr and StrPtr. I devote a 40-page chapter in my book to VB strings. Here's a very abbreviated version.
The VB string data type, BSTR, is shown in FIGURE 2.
 
FIGURE 2: The BSTR data type.
The string in this figure corresponds to the following VB code:
Dim str As String
      str = "help"
There are several important things to note about the BSTR data type:
7      A BSTR is actually a pointer variable. It has a size of 32 bits, like all pointers, and points to the first byte in a Unicode character array. Thus, a Unicode character array and a BSTR are not the same thing. This can cause great confusion, because the term string sometimes refers to the BSTR and sometimes to the character array. To be absolutely clear, we'll use the term VB string to refer to the BSTR, not the character array.
7      The Unicode character array that is pointed to by a BSTR must be preceded by a 4-byte length field and terminated by a single, null, 2-byte character (ANSI = 0).
7      There may be additional null (2-byte) characters anywhere within the Unicode character array, so we cannot rely on a null character to signal the end of the character array. This is why the length field is vital.
7      The length field contains the number of bytes (not the number of characters) in the character array, excluding the terminating null bytes. Because the array is Unicode, the character count is one-half the byte count.
Let's emphasize that code such as:
Dim str As String
      str = "help"
means that str is the name of a BSTR, not a Unicode character array. In other words, str is the name of the variable that holds the address xxxx, as shown in FIGURE 2. (Of course, the variable str has its own address, denoted by aaaa in FIGURE 2.)
Here is a brief experiment we can do to test the fact that a VB string is a pointer to a character array and not a character array. Consider the following code, which defines a structure whose members are strings:
Private Type utTest
       astring As String
       bstring As String
     End Type
      Dim uTest As utTest Dim s as String
      s = "testing" uTest.astring = "testing" uTest.bstring = "testing"
      Debug.Print Len(s) Debug.Print Len(uTest)
The output from this code is:

7
8
In the case of the string variable s, the Len function reports the length of the character array. In this case, there are seven characters in the character array "testing". In the case of the structure variable uTest, however, the Len function actually reports the length of the structure (in bytes). The return value 8 clearly indicates that each of the two BSTRs has length 4. This is because a BSTR is a pointer.
VarPtr and StrPtr
The functions VarPtr and StrPtr aren't documented by Microsoft, but they can be very useful in manipulating BSTRs.
If var is any variable, then:
VarPtr(var)
is the address of that variable, returned as a long. If str is a BSTR variable then:
StrPtr(str)
is the contents of the BSTR, which, as we've seen, is the address of the Unicode character array pointed to by the BSTR.
Let's verify these statements using the string in FIGURE 2. Note that the variable str has address aaaa, and the character array begins at address xxxx, which is the contents of the pointer variable str.
To see that:
VarPtr(str) = aaaa
      StrPtr(str) = xxxx
run the code in FIGURE 3.
Dim lng As Long, i  As Integer, s As String
      Dim b(1 To 10) As Byte
      Dim sp As Long, vp As Long
       s = "help"
       sp = StrPtr(s) Debug.Print "StrPtr:" & sp
       vp = VarPtr(s) Debug.Print "VarPtr:" & vp
       ' Verify that sp = xxxx and vp = aaaa
      ' by moving the long at address vp
      ' to the variable lng and then comparing it to sp.
      CopyMemory lng, ByVal vp, 4 Debug.Print lng = sp
       ' To see that sp contains address of char array,
      ' copy from that address to a byte array and print
      ' the byte array. We should get "help" in Unicode.
      CopyMemory b(1), ByVal sp, 10
      For i = 1 To 10
         Debug.Print b(i);
      Next
FIGURE 3:Verify that sp = xxxx and vp = aaaa.
A sample of the output is:
StrPtr:1836612
      VarPtr:1243988
      True
       104  0  101  0  108  0  112  0  0  0
Swapping Strings Using CopyMemory
Now we have the necessary background to understand our string sorting application. As mentioned earlier, the only difference between the slow and quick sorting methods lies in how they handle string swapping. The slow method uses the obvious approach:
' Swap strings s and t.
      temp = s
      s = t
     t = temp
Unfortunately, for each string assignment:
str1 = str2
VB must make a copy of the entire Unicode array pointed to by the BSTR str2 and assign a BSTR str1 that points to the copied array. It's clear that this process is very time consuming, and depends on the length of the strings.
On the other hand, the quick method uses the swapping code:
CopyMemory lng,  
      ByVal VarPtr(s), 4
      CopyMemory  
      ByVal VarPtr(s),  ByVal
      VarPtr(t), 4
      CopyMemory  
      ByVal VarPtr(t), lng, 4
This code simply swaps the contents of the BSTR's s and t. That is, it swaps the addresses of the corresponding Unicode arrays. In this way, we only need to swap 4-byte addresses, no matter how long the Unicode arrays may be.
In the first line of code, the long variable lng will receive the address of the first Unicode array. Because this address is stored in the BSTR s, we pass the address of s by value.
Actually, you might think that the code:
CopyMemory lng, s, 4
would also work, but it doesn't. In brief, the reason is that when VB sees that a string is being passed to an API function, it makes a copy of the array in ANSI format (rather than Unicode) and passes the ANSI version to the function. (For a more detailed discussion of this issue, please see my book.)
The remaining two lines of code complete the swapping.
Conclusion
It is usually said that the Win32 API can be useful to VB/VBA programmers who want to delve more deeply into the Windows operating system and do things that cannot be done with VB alone. Here is an example, however, of a case where VB can do the job, but the Win32 API can do it much, much better.
Dr Steven Roman is an Emeritus Professor at the California State University, Fullerton. He has written 35 books, including Access Database Design & Programming [1999], Writing Word Macros [1999], Writing Excel Macros [1999], Developing Visual Basic Add-Ins [1999], and Win32 API Programming with Visual Basic [1999], all published by O'Reilly & Associates, and Concepts of Object-Oriented Programming with Visual Basic [1997] and Understanding Personal Computer Hardware [1998], both published by Springer-Verlag. He has written a special object library browser, called Object Model Browser, that displays a structured view of object libraries, rather than the usual flat view.

######################################33

Sorts of All Types: VBA Algorithms for Getting Things in Order
By Rod Stephens
Sorting is an important and common programming task. Almost any data set is easier to understand and manipulate if it is properly sorted. For example, a dialog box might contain a ListBox control that provides choices for the user to select. If the list of choices is long, it will be much easier for the user to make a selection if the choices are sorted.
The ListBox control in Visual Basic has a Sorted property that makes ordering choices easy. Unfortunately, the ListBox that MSForms allows you to use with UserForm dialog boxes doesn't have this property; you must sort the choices before you add them to the list.
Sorting algorithms also demonstrate important programming techniques that you can use in other VBA subroutines.
This article describes two sorting algorithms that are useful under different circumstances: Selectionsort and Quicksort. These algorithms demonstrate several important programming concepts, including randomized behavior, recursion, and the divide-and-conquer strategy.
Sorting the Hard Way
If you are an experienced Microsoft Office programmer, you probably know that VBA doesn't provide sorting routines. It does, however, provide certain objects that can perform sorting. Word's Range object, for example, has a Sort method. One way to sort a list of items would be to create a new Word document, write the list into the document, sort the document, and read the results back into the original list. Similarly, you could create a table using Access and use a query to recover the records in sorted order.
These methods are slow and cumbersome, however. If you are using some other application, such as PowerPoint, you would also need to open a connection to Word or Access, adding further complication. Fortunately, there are several relatively easy methods for sorting using VBA code alone.
Be Selective
The Selectionsort algorithm begins by searching the list for the smallest item. It then swaps the smallest item with the item in the first position in the list. The algorithm then searches the remaining items for the second smallest. It swaps that item with the item in the second position of the list. The algorithm continues this process, searching for the smallest remaining item and swapping it into its correct position in the list, until the list is completely sorted.
FIGURE 1 shows the Selectionsort subroutine. It takes as parameters an array of strings to sort, and two integers giving the array's lower and upper bounds. The bound parameters are declared with the ByVal keyword so the subroutine cannot accidentally modify their values for the calling routine (see the sidebar "Coding Tips for Bug-Free VB" on page XX). When the Selectionsort subroutine is finished, the list array contains the strings in sorted order.
FIGURE 1: The Selectionsort algorithm implemented as a VBA Sub procedure.
' Sort items in the values array with bounds min and max.
Sub Selectionsort(values() As String, _
                  ByVal min As Long, _
                  ByVal max As Long)
  Dim i As Long
  Dim j As Long
  Dim smallest_value As String
  Dim smallest_j As Long

  For i = min To max - 1
    ' Find the smallest remaining value in entries
    ' i through num.
    smallest_value = values(i)
    smallest_j = i

    For j = i + 1 To max
      ' See if values(j) is smaller.
      If values(j) < smallest_value Then
        ' Save the new smallest value.
        smallest_value = values(j)
        smallest_j = j
      End If
    Next j

    If smallest_j <> i Then
      ' Swap items i and smallest_j.
      values(smallest_j) = values(i)
      values(i) = smallest_value
    End If

  Next i

End Sub
FIGURE 2 shows the SortListBox subroutine. This routine copies the choices in a ListBox control into an array, uses Selectionsort to sort the choices, then copies the sorted list back into the ListBox. You can use this subroutine to sort ListBox choices in dialog boxes you have created. First, fill the ListBox with choices as usual. Then, before you show the dialog box, call SortListBox for any ListBox control you want sorted.
Selectionsort is a fairly simple algorithm, which makes it easy to debug and maintain. It's also quite fast for small lists. On a 133 MHz Pentium, this subroutine can sort 1,000 items in under one second - which is more than fast enough for sorting items in a ListBox. For larger lists, however, Selectionsort can be much slower.
FIGURE 2: Using the Selectionsort Sub procedure to sort the items in a ListBox.
' Sort the items in the ListBox.
Sub SortListBox(list_box As ListBox)

  Dim values() As String
  Dim num_items As Integer
  Dim i As Integer

  ' Put the list choices in a string array.
  num_items = list_box.ListCount

  ReDim values(1 To num_items)

  For i = 1 To num_items
    values(i) = list_box.List(i - 1)
  Next i
   
  ' Sort the list.
  Selectionsort values, 1, num_items
   
  ' Put the items back in the ListBox.
  list_box.Clear
  For i = 1 To num_items
    list_box.AddItem values(i)
  Next i

End Sub
The algorithm begins by looking through its list to find the smallest value. If the list contains N items, the algorithm takes N steps to find the smallest item. It then takes N - 1 steps to find the second smallest, N - 2 steps to find the third smallest, and so forth.
In total, Selectionsort uses N + (N - 1) + ... + 1 steps to arrange the list. If you add those steps you get N * (N + 1) / 2 steps. When N is large, this functions grows like the square of the number N. If you double the number of items in the list, the algorithm takes four times as long. If you increase the number of items by a factor of 10, the algorithm takes 100 times as long.
While Selectionsort can reasonably sort 1,000 items, a list containing 10,000 or 100,000 is simply too big to sort in a reasonable amount of time. For sorting the items in a ListBox, Selectionsort is fine. However, more complicated routines that manage larger lists need a different solution.
A Quicker Sort
Quicksort is a slightly more complex algorithm that, for large lists, is much faster than Selectionsort. Quicksort uses a divide-and-conquer strategy to break the list into smaller and smaller pieces, until the pieces are small enough to handle easily.
The basic idea is to divide the items in the list into two sublists. The subroutine begins by selecting a dividing value. It places all the values smaller than the dividing value in one sublist, and all the other values in another sublist. Quicksort then calls itself to sort the two sublists. It updates its bound parameters min and max to tell the new copy of Quicksort which items to sort.
The process of a subroutine calling itself like this is called recursion. At first, recursion is a strange concept - it's not the way people naturally think. However, it's a powerful programming technique because it lets one routine break a problem into pieces small enough to handle. In this case, the Selectionsort algorithm breaks up the list until the sublists each hold only a single item. Because a list containing a single item must be sorted, the algorithm stops dividing the list at that point.
There are many ways in which the routine might pick a dividing value. For example, it could use the first item in the list. If the items are arranged randomly, this value will probably belong somewhere near the middle of the sorted list. In that case, the routine will divide the list into two sublists of roughly equal size. The sublists will be roughly half the size of the original list, so the algorithm will make good progress in its goal of cutting the sublists down to size.
On the other hand, if the list is initially sorted, the first item in the list belongs before all the other items. If the algorithm uses that item for a dividing value, it will place none of the items in one sublist and all the items in the other. The algorithm will have made little progress, and will be extremely slow.
An alternative strategy for choosing a dividing value is to pick one randomly. In that case, no matter how the list is arranged, it is unlikely the algorithm will pick an item that belongs at the front of the list. The sublists will be significantly smaller than the original list, and the algorithm will make reasonable progress. FIGURE 3 shows the Quicksort algorithm implemented in VBA.
Quicksort is much faster than Selectionsort for sorting large lists. On a 133 MHz Pentium, Selectionsort can sort 1,000 items in under one second. In about the same time, Quicksort can arrange 10,000 items. Even more impressively, Quicksort can sort 100,000 items in about nine seconds, while Selectionsort would take more than two hours.
FIGURE 3: A VBA implementation of the Quicksort algorithm.
' Sort the items in array values() with bounds min and max.
Sub Quicksort(values() As String, _
              ByVal min As Long, _
              ByVal max As Long)

  Dim med_value As String
  Dim hi As Long
  Dim lo As Long
  Dim i As Long

  ' If the list has only 1 item, it's sorted.
  If min >= max Then Exit Sub

  ' Pick a dividing item randomly.
  i = min + Int(Rnd(max - min + 1))
  med_value = values(i)

  ' Swap the dividing item to the front of the list.
  values(i) = values(min)

  ' Separate the list into sublists.
  lo = min
  hi = max
  Do
    ' Look down from hi for a value < med_value.
    Do While values(hi) >= med_value
      hi = hi - 1
      If hi <= lo Then Exit Do
    Loop

    If hi <= lo Then
      ' The list is separated.
      values(lo) = med_value
      Exit Do
    End If

    ' Swap the lo and hi values.
    values(lo) = values(hi)

    ' Look up from lo for a value >= med_value.
    lo = lo + 1
    Do While values(lo) < med_value
      lo = lo + 1
      If lo >= hi Then Exit Do
    Loop

    If lo >= hi Then
      ' The list is separated.
      lo = hi
      values(hi) = med_value
      Exit Do
    End If

    ' Swap the lo and hi values.
    values(hi) = values(lo)
  Loop ' Loop until the list is separated.

  ' Recursively sort the sublists.
  Quicksort values, min, lo - 1
  Quicksort values, lo + 1, max

End Sub
Despite this, Selectionsort has its place. Its greater simplicity makes it faster for sorting small lists. Selectionsort can sort 100 items in about 0.008 seconds, while Quicksort takes 0.014 seconds. Of course, if you need to sort 100 items, it doesn't matter much which algorithm you use. If you need to sort 1,000 lists containing 100 items each, using Selectionsort instead of Quicksort may mean the difference between waiting eight seconds and waiting 14 seconds.
Quicksort in Action
The CountWords subroutine, shown in FIGURE 4, uses the Quicksort algorithm to count the distinct words in a Word document. First, it fills an array with the selected words. If no words are selected, it fills the array with every word in the document. The routine uses only words that do not have the _code style set and that begin with a letter. Copying the words into the array is by far the slowest step in the process.
Next, CountWords uses Quicksort to sort the words. It then searches the array, skipping duplicates. For example, if the word "the" appears 100 times in the document, the routine skips all but the first occurrence. CountWords finishes by displaying the distinct words and the word count in a new document.
FIGURE 4: Using the Quicksort Sub procedure to list the distinct words in a document.
Sub CountWords()

  Dim values() As String
  Dim doc_range As Range
  Dim word_object As Object
  Dim word_text As String
  Dim num_words As Long
  Dim txt As String
  Dim ch As String
  Dim i As Long
  Dim j As Long
  Dim doc As Document
   
  ' Make sure there is a document open.
  If Selection Is Nothing Then
    MsgBox "No document is open."
    Exit Sub
  End If
   
  ' Get the selected range.
  If Selection.Type = wdSelectionIP Then
    ' If there' s no selection, use everything.
    Set doc_range = ActiveDocument.Content
  Else
    ' Otherwise use the selection.
    Set doc_range = Selection.Range
  End If

  ' Copy the selected words into the _words array.
  ' Copy only words that do not have the _code style,
  ' and that begin with a letter.
  ReDim values(1 To doc_range.Words.Count)
  For Each word_object In doc_range.Words
    If word_object.Style <> "_code" Then
      ' Remove spaces and convert to lower case.
      word_text = LCase$(Trim$(word_object.Text))
      ' See if it begins with a letter.
      ch = Left$(word_text, 1)
      If ch >= "a" And ch <= "z" Then
        num_words = num_words + 1
        values(num_words) = word_text
      End If
    End If
  Next word_object

  ' Sort the words.
  Quicksort values(), 1, num_words

  ' Merge duplicates.
  j = 1  ' The last position used.
  For i = 2 To num_words
    If values(i) <> values(j) Then
      j = j + 1
      values(j) = values(i)
    End If
  Next i
  num_words = j

  ' Display the results in a new document.
  Set doc = Documents.Add
  For i = 1 To num_words
    Selection.TypeText values(i) & vbCrLf
  Next i

  Selection.TypeText "----------" & vbCrLf
  Selection.TypeText num_words & " words." & vbCrLf

End Sub
Conclusion
While VBA doesn't provide tools for sorting, you can easily build your own. Using Selectionsort for small lists and Quicksort for large lists, you should be able to handle most of your sorting needs.
And, in case you are wondering, this article contains 417 distinct words.
The procedures referenced in this article are available for download from the Informant Web site at http://www.informant.com/mod/modnewupl.htm. File name: MOD9803RS.ZIP.
Coding Tips for Bug-Free VB
While it's nearly impossible to guarantee a program is completely bug free, there are several steps you can take to improve your chances.
Use Option Explicit. Normally, VBA creates a new variable when you first reference a variable it hasn't seen. This can quickly lead to confusion - especially if you misspell a variable name or try to use one routine's variable in another routine. For example, the following code always displays the value 1, although that's clearly not the intent:
Dim i As Integer
Dim the_num As Integer

For i = 1 To 10
  the_num = teh_num + 1
Next I

MsgBox the_num
If you begin the Declares section with the statement:
Option Explicit
VBA will generate an error message when it encounters teh_num - instead of a mystery.
Type every variable. If you declare a variable, but don't specify a type, VBA makes the variable a Variant. Variants take more memory and longer to process than other data types. The easy way they can switch data types also makes them confusing. To get the best performance and prevent confusion, give every variable an explicit type. If you really want to use a Variant, say so:
Dim i As Integer
Dim v As Variant
Declare variables on separate lines. If you declare two variables on the same line, each must have its own type declaration, or it defaults to Variant. For example, the following line of code declares two variables: variable v2 is an Integer, but v1 is a Variant:
Dim v1, v2 As Integer
This statement shows the correct way to allocate two integers on the same line:
Dim w1 As Integer, w2 As Integer
To avoid confusion, declare variables on separate lines.
Pass arguments by value when appropriate. By default, VBA passes an argument to a subroutine by reference. This means that if the subroutine modifies the argument's value, the value is changed in the calling subroutine's variable as well. Obviously, this isn't always desirable. To prevent changes from affecting the calling subroutine, you can make VBA pass an argument by value using the ByVal keyword. In the following code, argument A is passed by reference and argument B is passed by value. The MsgBox statement reports the value of i as 100, and j as 2:
Sub Caller()
  Dim i As Integer
  Dim j As Integer

  i = 1
  j = 2
  Callee i, j
  MsgBox "i = " & i & ", j = " & j
End Sub

Sub Callee(A As Integer, ByVal B As Integer)
  A = A * 100
  B = B * 100
End Sub
To prevent confusion in the calling routine, pass arguments using ByVal whenever you don't need to modify their values.
Use constants. Use constants to make the meaning of values clear. In large programs, a well named constant is easier to understand than a "magic number." If you later need to change the constant's value, it is also easier to change one definition than to change all the places you use the value:
Function CalculateTax(ByVal price As Single) As Single
  Const TaxRate = 0.09

  CalculateTax = price * TaxRate
End Sub
Test in the Immediate window. You can easily test pieces of your code using the Immediate window. Use the View | Immediate Window command to make the window appear. Enter a command in the window, just as you would in VBA code, and press J. Use ? to make the window print values. You can even invoke VBA subroutines and functions:
?CalculateTax(100.00)
9
This is particularly helpful for testing routines that are only used by other routines and are hard to run directly as macros.

 

by: maximilianoangelPosted on 2001-02-25 at 14:39:25ID: 5877149

BT.
I think (and maybe i have some time so test it) a mix of
- btree (kind of)
- some sort algorithm (the fastest we can have)
- api findfile used in a recursive Find routine
where the "path" is represented by btree nodes (so u must apply the sort routine to the "descending inmediate" nodes or leafs) and all the optimizations we now, will fast things up.
But, before triyng this approach, the Big Q
What do you want to do with the "sorted" files?
(maybe a btree path structure is usefull for you, maybe not!)
Regards
Max



 

by: wsh2Posted on 2001-02-25 at 14:52:09ID: 5877187

Ping

 

by: andyclapPosted on 2001-02-25 at 15:05:33ID: 5877228

Hi wsh2 - Long time no see! We've got a subscribe button now: pinging is sooo last century ;)

 

by: maximilianoangelPosted on 2001-02-26 at 17:49:11ID: 5881660

Yesterday, I rewrote my old API Find File Routine

I check some directory of my PC, and this is the result:
FFONE UNSORT= 16685 Items ; 8,72 seconds (5,231E-04 Items/Sec)
(the second run reports 6 seconds)

But Tonight, I rework what I told you ^^^^ and...
(as u can see no check for the recursive label, no check for pattern, no optimization, no no no)

Option Explicit
Private Type FileItem
    isDir As Boolean
    Name As String
End Type

Private Const MAX_PATH = 260
Private Const FILE_ATTRIBUTE_DIRECTORY = &H10
Private Type FILETIME
        dwLowDateTime As Long
        dwHighDateTime As Long
End Type
Private Type WIN32_FIND_DATA
        dwFileAttributes As Long
        ftCreationTime As FILETIME
        ftLastAccessTime As FILETIME
        ftLastWriteTime As FILETIME
        nFileSizeHigh As Long
        nFileSizeLow As Long
        dwReserved0 As Long
        dwReserved1 As Long
        cFileName As String * MAX_PATH
        cAlternate As String * 14
End Type
Private Declare Function FindFirstFile Lib "kernel32" Alias "FindFirstFileA" (ByVal lpFileName As String, lpFindFileData As WIN32_FIND_DATA) As Long
Private Declare Function FindNextFile Lib "kernel32" Alias "FindNextFileA" (ByVal hFindFile As Long, lpFindFileData As WIN32_FIND_DATA) As Long
Private Declare Function FindClose Lib "kernel32" (ByVal hFindFile As Long) As Long
Dim mRCcounter As Long
Public Function FindFilesTWO(saResults() As String, ByVal sPath As String, Optional sPattern As String = "*.*", Optional bRecurse As Boolean = False)
If mRCcounter = 0 Then ReDim saResults(0 To 100)
Dim fhandle As Long, fdata As WIN32_FIND_DATA, i As Long
Dim fCache() As FileItem, fLocalCounter As Long
ReDim fCache(0 To 25)
sPath = Trim(sPath): If sPath = "" Then Exit Function
If Right(sPath, 1) <> "\" Then sPath = sPath + "\"
fhandle = FindFirstFile(sPath + sPattern + Chr(0), fdata)
If fhandle <> -1 Then
    Do
        If Asc(fdata.cFileName) <> 46 Then  'dot!
            If fLocalCounter >= UBound(fCache) Then ReDim Preserve fCache(0 To UBound(fCache) + 25)
            i = InStr(fdata.cFileName, Chr(0))
            fCache(fLocalCounter).Name = Left(fdata.cFileName, i - 1)
            fCache(fLocalCounter).isDir = fdata.dwFileAttributes And FILE_ATTRIBUTE_DIRECTORY
            fLocalCounter = fLocalCounter + 1
        End If
        DoEvents
    Loop Until (FindNextFile(fhandle, fdata) = 0)
    Call ShellSort(fCache(), 0, fLocalCounter - 1)
    For i = 0 To fLocalCounter - 1
        If fCache(i).isDir Then
            Call FindFilesTWO(saResults(), sPath + fCache(i).Name, sPattern, bRecurse)
        Else
             If mRCcounter >= UBound(saResults) Then ReDim Preserve saResults(0 To UBound(saResults) + 100)
            saResults(mRCcounter) = sPath + fCache(i).Name
            mRCcounter = mRCcounter + 1
        End If
        DoEvents
    Next i
End If
Call FindClose(fhandle)
ReDim Preserve saResults(0 To mRCcounter - 1)
FindFilesTWO = mRCcounter
End Function
Private Sub BubbleSort(list() As FileItem, lowIDX As Long, hiIDX As Long)
Dim i As Long, j As Long
Dim tmp As FileItem
For i = lowIDX To hiIDX - 1
   For j = lowIDX To i
       If UCase(list(j).Name) > UCase(list(j + 1).Name) Then
           tmp = list(j + 1)
           list(j + 1) = list(j)
           list(j) = tmp
       End If
   Next
Next
End Sub
Private Sub ShellSort(list() As FileItem, lowIDX As Long, hiIDX As Long)
If lowIDX >= hiIDX Then Exit Sub
Dim diff As Long, range As Long, tmp As FileItem
Dim high As Long, low As Long, high1 As Long
range = hiIDX - lowIDX: diff = range: diff = Int(CSng(diff) / 1.3):
Do While diff
    high = lowIDX + diff
    For high = high To hiIDX
        Do While high <= hiIDX
            low = high - diff
            If UCase(list(low).Name) > UCase(list(high).Name) Then Exit Do
            tmp = list(low): list(low) = list(high): list(high) = tmp
            If diff = 1 Then
                If low > lowIDX Then high = low
            Else
                high = high + 1
            End If
        Loop
    Next
    diff = Int(CSng(diff) / 1.3):
Loop
End Sub

With the same directory (and subdirectories)
FFTWO API + SHELL!!!!= 16685 Items ; 13,67 seconds (8,20E-04 Items/Sec) aparently the sort overhead is about 50%-100% of the api find file routine.

As u can see, findfileTWO "sort" every folder content, scan the sorted folder (fcache), and recursively call itself or put the files in the result array.
Please, if you improve the routine, let me know (Ill try)
Regards
Max

 

by: maximilianoangelPosted on 2001-02-26 at 17:52:30ID: 5881669

NB Please, note that I do not fully test the routine.
Take it as a almost working but initial idea.

 

by: maximilianoangelPosted on 2001-02-26 at 17:59:33ID: 5881687

NB This is only a allmost working idea, needs testing. (1st check the shellsort routine, it works in descendat mode)


 

by: andyclapPosted on 2001-02-26 at 18:34:45ID: 5881759

Interesting this, the big question is which sort algorithm is best for your needs - a lot depends on how ordered your directory listing starts off. If it's very un-ordered, I rekon building a binary tree as you get each filename would be quick. If it's pretty ordered, a shell or quicksort after you have all the names is good. You could also think of a balanced binary tree, but this is probably the same as a quicksort.

 

by: maximilianoangelPosted on 2001-02-27 at 14:59:44ID: 5885218

Hey BT Hi, but where are u???
I put some hands on.. here it is
(remember the FF1 API UNSORT... 8 or 6 seconds)
FF2 API + SHELL SORT!!!!= 16685 Items ; 10,87906 seconds (6,52026529972304E-04 Items/Sec)
FF2 API + SHELL SORT!!!!= 16685 Items ; 10,76719 seconds (6,45321373600855E-04 Items/Sec)
As u can see, I lcase the path and filenames, and speed up the shell sort routine (and add the sort order)
also add Counter Reset (I forgot)
No pattern check, No recurse Check, and more testing to do...
The entry point is findfiletwo (the bootstrap)

Option Explicit
Private Type FileItem
    isDir As Boolean
    Name As String
End Type

Private Const MAX_PATH = 260
Private Const FILE_ATTRIBUTE_DIRECTORY = &H10
Private Type FILETIME
        dwLowDateTime As Long
        dwHighDateTime As Long
End Type
Private Type WIN32_FIND_DATA
        dwFileAttributes As Long
        ftCreationTime As FILETIME
        ftLastAccessTime As FILETIME
        ftLastWriteTime As FILETIME
        nFileSizeHigh As Long
        nFileSizeLow As Long
        dwReserved0 As Long
        dwReserved1 As Long
        cFileName As String * MAX_PATH
        cAlternate As String * 14
End Type
Private Declare Function FindFirstFile Lib "kernel32" Alias "FindFirstFileA" (ByVal lpFileName As String, lpFindFileData As WIN32_FIND_DATA) As Long
Private Declare Function FindNextFile Lib "kernel32" Alias "FindNextFileA" (ByVal hFindFile As Long, lpFindFileData As WIN32_FIND_DATA) As Long
Private Declare Function FindClose Lib "kernel32" (ByVal hFindFile As Long) As Long
Dim mRCcounter As Long
Public Function FindFilesTWO(saResults() As String, ByVal sPath As String, Optional sPattern As String = "*.*", Optional bRecurse As Boolean = False)
'the bootstrap (all for this)==============
mRCcounter = 0
sPath = Trim(sPath): If sPath = "" Then Exit Function
ReDim saResults(0 To 100)
'===========================
FindFilesTWO = realFindFilesTWO(saResults(), sPath, sPattern, bRecurse)
End Function
Private Function realFindFilesTWO(saResults() As String, ByVal sPath As String, Optional sPattern As String = "*.*", Optional bRecurse As Boolean = False)
'no pattern test, no recurse test (in fact, recurse = true!)
Dim fhandle As Long, fdata As WIN32_FIND_DATA, i As Long
Dim fCache() As FileItem, fLocalCounter As Long
ReDim fCache(0 To 25)
If Right(sPath, 1) <> "\" Then sPath = sPath + "\"
fhandle = FindFirstFile(sPath + sPattern + Chr(0), fdata)
If fhandle <> -1 Then
    Do
        If Asc(fdata.cFileName) <> 46 Then  'dot!
            If fLocalCounter >= UBound(fCache) Then ReDim Preserve fCache(0 To UBound(fCache) + 25)
            i = InStr(fdata.cFileName, Chr(0))
            fCache(fLocalCounter).Name = LCase(Left(fdata.cFileName, i - 1))
            fCache(fLocalCounter).isDir = fdata.dwFileAttributes And FILE_ATTRIBUTE_DIRECTORY
            fLocalCounter = fLocalCounter + 1
        End If
        DoEvents
    Loop Until (FindNextFile(fhandle, fdata) = 0)
    Call ShellSort2(fCache(), 0, fLocalCounter - 1)
    For i = 0 To fLocalCounter - 1
        If fCache(i).isDir Then
            Call realFindFilesTWO(saResults(), sPath + fCache(i).Name, sPattern, bRecurse)
        Else
            If mRCcounter >= UBound(saResults) Then ReDim Preserve saResults(0 To UBound(saResults) + 100)
            saResults(mRCcounter) = sPath + fCache(i).Name
            mRCcounter = mRCcounter + 1
        End If
        DoEvents
    Next i
End If
Call FindClose(fhandle)
ReDim Preserve saResults(0 To mRCcounter - 1)
realFindFilesTWO = mRCcounter
End Function
Private Sub ShellSort2(list() As FileItem, ByVal FromIdx As Long, ByVal ToIdx As Long)
'in this case "a">"A"
Const kOrder = -1       'asc = -1  desc = 1
If FromIdx >= ToIdx Then Exit Sub
Dim tmp As FileItem
Dim diff As Long, range As Long
Dim high As Long, low As Long, high1 As Long
range = ToIdx - FromIdx
If kOrder < 0 Then high = FromIdx: FromIdx = ToIdx: ToIdx = high: high = 0
diff = range: diff = Int(CSng(diff) / 1.3)
Do While diff
    high = FromIdx + diff * kOrder
    For high = high To ToIdx Step kOrder
        Do While high * kOrder <= ToIdx * kOrder
            low = high - diff * kOrder
            If Asc(list(low).Name) > Asc(list(high).Name) Then Exit Do
            If list(low).Name > list(high).Name Then Exit Do
            tmp = list(low): list(low) = list(high): list(high) = tmp
            If diff = 1 Then
                If low * kOrder > FromIdx * kOrder Then high = low
            Else
                high = high + kOrder
            End If
        Loop
    Next
    diff = Int(CSng(diff) / 1.3)
Loop
End Sub

 

by: BeauTPosted on 2001-02-27 at 15:03:31ID: 5885229

WOW!! I have never had this much of a response to a question. Since it seems to be pretty popular, I think I will keep it open for at least a few more days. Besides, I have the tough decision now of figuring out who the points should go to. In any case, mucho mucho gracias to everyone who has responded so far.

 

by: mammouthPosted on 2001-02-28 at 04:19:58ID: 5887059

bpouydog : The link http://www.informant.com/mod/modnewupl.htm. is not working, where can i find this file MOD9803RS.ZIP?

i went to http://www.informant.com/mod but didn't find the file

 

by: maximilianoangelPosted on 2001-03-05 at 10:28:42ID: 5901857

hello, Mr BT
I have a suggestion...
Why dont we try to do this a little more interactive?
(I think that is the way to get the most of this very interesting Q.)
Regards
Max

 

by: BeauTPosted on 2001-03-06 at 10:25:35ID: 5904627

Thanks for the help everybody. What I actually did was use a binary tree, but maximilianoangel seemed to put the most work into it.

 

by: BeauTPosted on 2001-03-06 at 10:26:47ID: 5904634

Thanks ot everyone for the help.

 

by: maximilianoangelPosted on 2001-03-06 at 15:10:08ID: 5905460

Well, thanks. See ya.
(Im a little confused)
Regards
Max

20120131-EE-VQP-002

3 Ways to Join

30-Day Free Trial

The Experts

98% positive feedback on 31,087 answers since March 2000. angeliii is a Microsoft Most Valuable Professional for his work with MS SQL Server & Develoment.

He has also proven his knowledge of Visual Basic Programming, PHP Scripting and Oracle Databases.

The Experts

97% positive feedback on 10,752 answers since July 2000. lrmoore has more than 18 years experience in the networking industry.

The six-time Mircosoft MVPs specialties include firewalls, virtual private networking, and network management.

Testimonials

"...and excellent source for support... Kind of like having your very own IT dept." Electriciansnet

Testimonials

"I was apprehensive at signing up at first. However... it has already made my life as an IT administrator much easier." JaCrews

Testimonials

"WOW! You guys have great, active, and knowledgeable people on here." moore50

Business Clients

Business Clients

In the Press

"If you’ve got a question... Experts Exchange can supply an answer.”

In the Press

"...an invaluable aid for both IT professionals and those who require tech support."

In the Press

"where IT professionals provide quick answers on just about any topic"

Business Account Plans

Loading Advertisement...