Link to home
Start Free TrialLog in
Avatar of linuxrox
linuxroxFlag for United States of America

asked on

How do I create a large array of unique strings?

Hello!  What i need is a function that creates a large array of unique strings.  The array will need to be around 30,000 elements.  each string in each element needs to be unique and composed of A-Za-z0-9
so lets just say each string was something like:

A1F3kj87DE1385    //something like that.  i'd like each string to not be over 32 chars but if that isn't possible then it's okay.  Basically i'll have another loop in my program that adds things to a file and one part of each line in the file will be one of these unique strings in this array.

Can someone provide me with a function that will achieve this goal?
thanks!!
Avatar of Mike Tomlinson
Mike Tomlinson
Flag of United States of America image

Do they have to be "random"...or would you be happy with a sequence of values that "count" using your specified characters?

Try this out...just make a Form with a Button and a TextBox.  Click the TextBox repeatedly and see what happens to the value:
Option Explicit
 
Private Sub Form_Load()
    Text1.Text = "A1F3kj87DE1385"
End Sub
 
Private Sub Command1_Click()
    Text1.Text = NextRevision(Text1.Text)
End Sub
 
Private Function NextRevision(Optional ByVal curRevision As String) As String
    Dim i As Integer
    Dim char As String
    Dim chars As String
    Dim index As Integer
    Dim charArray() As String
    
    ' modify the line below to include the characters you
    ' want in the sequence
    chars = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
    
    ' blank revision passed in, return first letter in sequence
    curRevision = Trim(UCase(curRevision))
    If curRevision = "" Then
        NextRevision = Left(chars, 1)
        Exit Function
    End If
    
    ' make sure current revision has only valid characters in it
    ' otherwise...return first letter in sequence
    For i = 1 To Len(curRevision)
        char = Mid(curRevision, i, 1)
        If InStr(chars, char) = 0 Then
            MsgBox "Returning: " & Left(chars, 1), vbCritical, "Invalid Starting Revision: " & curRevision
            NextRevision = Left(chars, 1)
            Exit Function
        End If
    Next i
    
    ' build a character array from the current revision
    ReDim charArray(Len(curRevision) - 1)
    For i = 1 To Len(curRevision)
        charArray(i - 1) = Mid(curRevision, i, 1)
    Next i
    
    ' see if the just the last letter needs to be "incremented"
    char = charArray(UBound(charArray))
    index = InStr(chars, char)
    If index < Len(chars) Then
        charArray(UBound(charArray)) = Mid(chars, index + 1, 1)
        NextRevision = Join(charArray, "")
        Exit Function
    End If
                
    ' last letter "rolled over"
    ' propagate the "carry over" as far as it needs to go
    charArray(UBound(charArray)) = Left(chars, 1)
    For i = UBound(charArray) - 1 To 0 Step -1
        char = charArray(i)
        index = InStr(chars, char)
        If index < Len(chars) Then
            index = index + 1
            charArray(i) = Mid(chars, index, 1)
            NextRevision = Join(charArray, "")
            Exit Function
        Else
            charArray(i) = Left(chars, 1)
        End If
    Next i
    
    ' "carry over" propagated all the way thru sequence
    ' add first letter of sequence to the beginning of the revision
    NextRevision = Left(chars, 1) & Join(charArray, "")
End Function

Open in new window

Avatar of linuxrox

ASKER

that's pretty cool.  honestly i'd rather it be random.
like some function i throw in a loop like:

for x = 0 to 30000
somearray() = unique_function()  //which makes somearray() have 30,000 unique strings

next x

your code is nifty though and i like the idea!!  i have some code i was using but i have some problems with it...let me see if i can get it..
Try this code below and you'll see that the file that it writes "num.txt" contains duplicates.  after it creates the file just use a text editor like editpad pro or something and sort the results Ascending.

you can then show the duplicates in editpad pro by putting this in the searchbox at the bottom:

^(.*)(\r?\n\1)+$  click "regular expression" at right bottom on editpad pro and it will highlight the duplicates.

Here's what i don't understand about this, if i don't put the values into an array (in this case the array is called id1) and lets say i just print the value from num1 = GenerateCode(32), then i don't get duplicates!  i don't understand that at all!
Option Explicit
Private Sub Form_Load()
Dim num As Long
Dim num1 As String
Dim id1(2000) As String
For num = 0 To 2000
 
num1 = GenerateCode(32)
id1(num) = num1
num1 = ""
Next num
Dim item As Variant
For Each item In id1
Dim nFileNum3 As Integer
nFileNum3 = FreeFile
Open App.Path & "\num.txt" For Append As nFileNum3
Print #nFileNum3, item
Close nFileNum3
Next
End
end sub
 
Public Function GenerateCode(length As Integer)
Dim strInputString As String
Dim intLength As Integer
Dim intNameLength As Integer
Dim strName As String
Dim intStep As Integer
Dim intRnd As Integer
Call Randomize
   strInputString = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" 'these are the characters which will be in the password
  
   intLength = Len(strInputString)
  
   intNameLength = length 'edit this according to how long u want ur password to be
  
   'Call Randomize ' jus to make it random :D
  ' Call Randomize
  
   strName = ""
  
   For intStep = 0 To intNameLength
       intRnd = Int((intLength * Rnd) + 1)
       strName = strName & Mid(strInputString, intRnd, 1)
   Next
  
   GenerateCode = strName
End Function

Open in new window

if i just do this below then i don't get duplicates....makes no sense...but i can't do this because i need the values in an array..
Dim num As Long
Dim num1 As String
Dim id1(2000) As String
For num = 0 To 2000
num1 = GenerateCode(32)
Dim nFileNum4 As Integer
nFileNum4 = FreeFile
Open App.Path & "\num.txt" For Append As nFileNum4
Print #nFileNum4, num1
Close nFileNum4
id1(num) = num1
num1 = ""
Next num

Open in new window

You will get duplicates with that code...if you're not then it is just a matter of not generating enough values and you just got lucky.  =)

Do you really need an Array?  How about a Collection instead?
a collection would be fine as well...as long as i can pull a value out of the collection while in another loop you know?

what is it about that code that causes there to be duplicates?  can that code be modified so it wouldn't have them?
ASKER CERTIFIED SOLUTION
Avatar of aikimark
aikimark
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Your GenerateCode() function simply generates a string of random characters of a specified length.  It has no knowledge of what codes were generated BEFORE.  Given a small sample set, it is possible to generate a list with no repeats but there is no guarantee that it will be so.

To make a list with NO repeats you have to check ALL of the values already in the list to make sure you don't add a repeat.

There are two generic algorithms to do this:

(1) Brute Force - Generate a value.  Is it in the list?  Yes - Pick another value.  No - Add it to the List.  Repeat until you have the desired number of values.  Obviously as your list gets bigger and bigger it will take longer and longer to determine if a value has already been "chosen".  An advantage of the Collection object over an Array is that you can add items with a Key and then quickly tell if a given Key already Exists in the Collection.  (A Dictionary object is even  better in this respect.)

(2) Shuffle - Generate a list of "ordered" values.  Iterate over the list and swap each item with a random item from another slot in the list.  This is just like a card deck shuffle.
i see.  i ran aikimark's code and i don't see any duplicates.
do you all think this would be a reliable way to do this?
Private Sub Form_Load()
Dim somearray(30000) As String
Dim j As Long
For j = 0 To 30000
   somearray(j) = RandString
Next j
MsgBox UBound(somearray)
Dim item1 As Variant
For Each item1 In somearray
Dim nFileNum5 As Integer
nFileNum5 = FreeFile
Open App.Path & "\num.txt" For Append As nFileNum5
Print #nFileNum5, item1
Close nFileNum5
Next
End

Open in new window

Idle_Mind:

why though would i not get any duplicates when i just run the loop and write the file bypassing putting the values into an array?
Simply put...you got lucky!

Standby...I'll post code that generates a collection with no duplicates.
Idle_Mind:  hehe, well honestly, i've never had any dup's when it wasn't in an array!! : )
@linuxrox

The VB Rnd function has a period of 2^24, so you can expect to get 559240 unique 30-character strings.  You will get more unique strings if their lengths are reduced.

If you need more than that:
There are 64k unique seeds available via the Randomize statement and you can start the (PRNG) sequence at some offset other than the first pseudo-random number as long as the offset is not equal to your string output length.

Reference (my analysis of the VB PRNG):
http://www.15seconds.com/issue/051110.htm
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
If you want to ensure that you have no duplicates, you can use a collection, whose items' key property are unique.
Dim somearray(30000) As String
Dim colThing As New Collection
Dim j As Long
Dim strTemp As String
On Error Resume next
j = 0 
Do
   strTemp = RandString
   colThing.Add "a",strTemp
   If Err = 0 Then
       somearray(j) = strTemp
       j = + 1
   Else
       Err.Clear
   End If
Loop Until J > 30000

Open in new window

This is great guys!  i truly appreciate the help on this!! very nice jobs!!
you guys rock!
@linuxrox

Thanks for the points and the interesting question.

* If you are using a collection to guarantee string uniqueness, you might as well use the string to hold the strings as well, replacing the somearray.

* If speed is a consideration, you might use the code in the snippet.  It is about 25% faster than the code in the accepted comment.
Public Function RandString_Fast(Optional parmLen As Long = 30) As String
    Static chars(0 To 61) As Byte
    Static b() As Byte
    Static lngPosn As Long
    Static lngLoop As Long
    If chars(0) = 0 Then
        For lngLoop = 0 To 9
            chars(lngLoop) = Asc(lngLoop)
        Next
        lngPosn = 10
        For lngLoop = Asc("a") To Asc("z")
            chars(lngPosn) = lngLoop
            lngPosn = lngPosn + 1
        Next
        For lngLoop = Asc("A") To Asc("Z")
            chars(lngPosn) = lngLoop
            lngPosn = lngPosn + 1
        Next
        lngLoop = 62
        'Note: move the ReDim statement outside this If...EndIf
        '       construct if different lengths will be requested
        '       in a single execution
        ReDim b(0 To parmLen - 1)
    End If
    
    For lngPosn = 0 To parmLen - 1
        b(lngPosn) = chars(Int(Rnd * lngLoop))
    Next
    
    RandString_Fast = StrConv(b, vbUnicode)
End Function

Open in new window

I get another performance boost (35% better than accepted solution) with:

    Dim lngPosn As Long
    Dim lngLoop As Long
AWESOME!!!  performance boosts rock!!! thanks once again!!