linuxrox
asked on
How do I create a large array of unique strings?
Hello! What i need is a function that creates a large array of unique strings. The array will need to be around 30,000 elements. each string in each element needs to be unique and composed of A-Za-z0-9
so lets just say each string was something like:
A1F3kj87DE1385 //something like that. i'd like each string to not be over 32 chars but if that isn't possible then it's okay. Basically i'll have another loop in my program that adds things to a file and one part of each line in the file will be one of these unique strings in this array.
Can someone provide me with a function that will achieve this goal?
thanks!!
so lets just say each string was something like:
A1F3kj87DE1385 //something like that. i'd like each string to not be over 32 chars but if that isn't possible then it's okay. Basically i'll have another loop in my program that adds things to a file and one part of each line in the file will be one of these unique strings in this array.
Can someone provide me with a function that will achieve this goal?
thanks!!
ASKER
that's pretty cool. honestly i'd rather it be random.
like some function i throw in a loop like:
for x = 0 to 30000
somearray() = unique_function() //which makes somearray() have 30,000 unique strings
next x
your code is nifty though and i like the idea!! i have some code i was using but i have some problems with it...let me see if i can get it..
like some function i throw in a loop like:
for x = 0 to 30000
somearray() = unique_function() //which makes somearray() have 30,000 unique strings
next x
your code is nifty though and i like the idea!! i have some code i was using but i have some problems with it...let me see if i can get it..
ASKER
Try this code below and you'll see that the file that it writes "num.txt" contains duplicates. after it creates the file just use a text editor like editpad pro or something and sort the results Ascending.
you can then show the duplicates in editpad pro by putting this in the searchbox at the bottom:
^(.*)(\r?\n\1)+$ click "regular expression" at right bottom on editpad pro and it will highlight the duplicates.
Here's what i don't understand about this, if i don't put the values into an array (in this case the array is called id1) and lets say i just print the value from num1 = GenerateCode(32), then i don't get duplicates! i don't understand that at all!
you can then show the duplicates in editpad pro by putting this in the searchbox at the bottom:
^(.*)(\r?\n\1)+$ click "regular expression" at right bottom on editpad pro and it will highlight the duplicates.
Here's what i don't understand about this, if i don't put the values into an array (in this case the array is called id1) and lets say i just print the value from num1 = GenerateCode(32), then i don't get duplicates! i don't understand that at all!
Option Explicit
Private Sub Form_Load()
Dim num As Long
Dim num1 As String
Dim id1(2000) As String
For num = 0 To 2000
num1 = GenerateCode(32)
id1(num) = num1
num1 = ""
Next num
Dim item As Variant
For Each item In id1
Dim nFileNum3 As Integer
nFileNum3 = FreeFile
Open App.Path & "\num.txt" For Append As nFileNum3
Print #nFileNum3, item
Close nFileNum3
Next
End
end sub
Public Function GenerateCode(length As Integer)
Dim strInputString As String
Dim intLength As Integer
Dim intNameLength As Integer
Dim strName As String
Dim intStep As Integer
Dim intRnd As Integer
Call Randomize
strInputString = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" 'these are the characters which will be in the password
intLength = Len(strInputString)
intNameLength = length 'edit this according to how long u want ur password to be
'Call Randomize ' jus to make it random :D
' Call Randomize
strName = ""
For intStep = 0 To intNameLength
intRnd = Int((intLength * Rnd) + 1)
strName = strName & Mid(strInputString, intRnd, 1)
Next
GenerateCode = strName
End Function
ASKER
if i just do this below then i don't get duplicates....makes no sense...but i can't do this because i need the values in an array..
Dim num As Long
Dim num1 As String
Dim id1(2000) As String
For num = 0 To 2000
num1 = GenerateCode(32)
Dim nFileNum4 As Integer
nFileNum4 = FreeFile
Open App.Path & "\num.txt" For Append As nFileNum4
Print #nFileNum4, num1
Close nFileNum4
id1(num) = num1
num1 = ""
Next num
You will get duplicates with that code...if you're not then it is just a matter of not generating enough values and you just got lucky. =)
Do you really need an Array? How about a Collection instead?
Do you really need an Array? How about a Collection instead?
ASKER
a collection would be fine as well...as long as i can pull a value out of the collection while in another loop you know?
what is it about that code that causes there to be duplicates? can that code be modified so it wouldn't have them?
what is it about that code that causes there to be duplicates? can that code be modified so it wouldn't have them?
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Your GenerateCode() function simply generates a string of random characters of a specified length. It has no knowledge of what codes were generated BEFORE. Given a small sample set, it is possible to generate a list with no repeats but there is no guarantee that it will be so.
To make a list with NO repeats you have to check ALL of the values already in the list to make sure you don't add a repeat.
There are two generic algorithms to do this:
(1) Brute Force - Generate a value. Is it in the list? Yes - Pick another value. No - Add it to the List. Repeat until you have the desired number of values. Obviously as your list gets bigger and bigger it will take longer and longer to determine if a value has already been "chosen". An advantage of the Collection object over an Array is that you can add items with a Key and then quickly tell if a given Key already Exists in the Collection. (A Dictionary object is even better in this respect.)
(2) Shuffle - Generate a list of "ordered" values. Iterate over the list and swap each item with a random item from another slot in the list. This is just like a card deck shuffle.
To make a list with NO repeats you have to check ALL of the values already in the list to make sure you don't add a repeat.
There are two generic algorithms to do this:
(1) Brute Force - Generate a value. Is it in the list? Yes - Pick another value. No - Add it to the List. Repeat until you have the desired number of values. Obviously as your list gets bigger and bigger it will take longer and longer to determine if a value has already been "chosen". An advantage of the Collection object over an Array is that you can add items with a Key and then quickly tell if a given Key already Exists in the Collection. (A Dictionary object is even better in this respect.)
(2) Shuffle - Generate a list of "ordered" values. Iterate over the list and swap each item with a random item from another slot in the list. This is just like a card deck shuffle.
ASKER
i see. i ran aikimark's code and i don't see any duplicates.
do you all think this would be a reliable way to do this?
do you all think this would be a reliable way to do this?
Private Sub Form_Load()
Dim somearray(30000) As String
Dim j As Long
For j = 0 To 30000
somearray(j) = RandString
Next j
MsgBox UBound(somearray)
Dim item1 As Variant
For Each item1 In somearray
Dim nFileNum5 As Integer
nFileNum5 = FreeFile
Open App.Path & "\num.txt" For Append As nFileNum5
Print #nFileNum5, item1
Close nFileNum5
Next
End
ASKER
Idle_Mind:
why though would i not get any duplicates when i just run the loop and write the file bypassing putting the values into an array?
why though would i not get any duplicates when i just run the loop and write the file bypassing putting the values into an array?
Simply put...you got lucky!
Standby...I'll post code that generates a collection with no duplicates.
Standby...I'll post code that generates a collection with no duplicates.
ASKER
Idle_Mind: hehe, well honestly, i've never had any dup's when it wasn't in an array!! : )
@linuxrox
The VB Rnd function has a period of 2^24, so you can expect to get 559240 unique 30-character strings. You will get more unique strings if their lengths are reduced.
If you need more than that:
There are 64k unique seeds available via the Randomize statement and you can start the (PRNG) sequence at some offset other than the first pseudo-random number as long as the offset is not equal to your string output length.
Reference (my analysis of the VB PRNG):
http://www.15seconds.com/issue/051110.htm
The VB Rnd function has a period of 2^24, so you can expect to get 559240 unique 30-character strings. You will get more unique strings if their lengths are reduced.
If you need more than that:
There are 64k unique seeds available via the Randomize statement and you can start the (PRNG) sequence at some offset other than the first pseudo-random number as long as the offset is not equal to your string output length.
Reference (my analysis of the VB PRNG):
http://www.15seconds.com/issue/051110.htm
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
If you want to ensure that you have no duplicates, you can use a collection, whose items' key property are unique.
Dim somearray(30000) As String
Dim colThing As New Collection
Dim j As Long
Dim strTemp As String
On Error Resume next
j = 0
Do
strTemp = RandString
colThing.Add "a",strTemp
If Err = 0 Then
somearray(j) = strTemp
j = + 1
Else
Err.Clear
End If
Loop Until J > 30000
ASKER
This is great guys! i truly appreciate the help on this!! very nice jobs!!
you guys rock!
you guys rock!
@linuxrox
Thanks for the points and the interesting question.
* If you are using a collection to guarantee string uniqueness, you might as well use the string to hold the strings as well, replacing the somearray.
* If speed is a consideration, you might use the code in the snippet. It is about 25% faster than the code in the accepted comment.
Thanks for the points and the interesting question.
* If you are using a collection to guarantee string uniqueness, you might as well use the string to hold the strings as well, replacing the somearray.
* If speed is a consideration, you might use the code in the snippet. It is about 25% faster than the code in the accepted comment.
Public Function RandString_Fast(Optional parmLen As Long = 30) As String
Static chars(0 To 61) As Byte
Static b() As Byte
Static lngPosn As Long
Static lngLoop As Long
If chars(0) = 0 Then
For lngLoop = 0 To 9
chars(lngLoop) = Asc(lngLoop)
Next
lngPosn = 10
For lngLoop = Asc("a") To Asc("z")
chars(lngPosn) = lngLoop
lngPosn = lngPosn + 1
Next
For lngLoop = Asc("A") To Asc("Z")
chars(lngPosn) = lngLoop
lngPosn = lngPosn + 1
Next
lngLoop = 62
'Note: move the ReDim statement outside this If...EndIf
' construct if different lengths will be requested
' in a single execution
ReDim b(0 To parmLen - 1)
End If
For lngPosn = 0 To parmLen - 1
b(lngPosn) = chars(Int(Rnd * lngLoop))
Next
RandString_Fast = StrConv(b, vbUnicode)
End Function
I get another performance boost (35% better than accepted solution) with:
Dim lngPosn As Long
Dim lngLoop As Long
Dim lngPosn As Long
Dim lngLoop As Long
ASKER
AWESOME!!! performance boosts rock!!! thanks once again!!
Try this out...just make a Form with a Button and a TextBox. Click the TextBox repeatedly and see what happens to the value:
Open in new window