PeterBaileyUk
asked on
Don't write duplicates into an array in VB.NET
I have some duplicate words in my table data.
I identified them with expert help as part of ID: 41760693
I think removing them post-insert is not right, and I wonder if the duplicates can be removed just after the string is split
but before they get inserted into the SQL Server db itself.
The For loop I have:
For Each drAccessRecord As DataRow In dtRecordsFromAccess.Rows
    'create array of words from string
    Dim StrArray() As String = Split(drAccessRecord(FieldDescription))
    'deal with progress
    Form1.ProgressBar1.Step = 1
    Form1.ProgressBar1.Minimum = 1
    Form1.ProgressBar1.Maximum = y
    'for each word in the array
    For index = LBound(StrArray) To UBound(StrArray)
        'don't allow unwanted characters
        StrClientCodeWordPos = drAccessRecord(FieldNameClientCode) & "_" & RemoveUnwantedChr(StrArray(index)) & "_" & index + 1
        StrClientCode = drAccessRecord(FieldNameClientCode)
        StrFull = RemoveUnwantedChr(drAccessRecord(FieldDescription))
        StrClientName = StrClientName
        'get the word
        StrWord = RemoveUnwantedChr(StrArray(index))
        intWordLen = Len(StrArray(index))
        'mark position of word (important to preserve sentence creation)
        IntWordPosition = index + 1
        IntNoOfWords = UBound(StrArray) + 1
        'insert word and other values into sql db
        cmdInsert.CommandText = "INSERT INTO TblWords (ClientCodeWordPosition, ClientCode, ClientName, Word, WordLen, StrFull, WordPosition, NoOfWords) VALUES ('" & StrClientCodeWordPos & "','" & StrClientCode & "','" & StrClientName & "','" & StrWord & "'," & intWordLen & ",'" & StrFull & "'," & IntWordPosition & "," & IntNoOfWords & " )"
        cmdInsert.ExecuteNonQuery()
    Next index
ASKER
I've got a loop comparing each element with the previous. I've kept the same array for now; it's just a question of creating the new array (I think).
I've put this after the split:
Dim pos As Integer = 0
Do Until pos = StrArray.Count
    Dim i As Integer
    For i = StrArray.Count - 1 To pos + 1 Step -1
        If StrArray(pos).ToString = StrArray(i).ToString Then
            'do not add row to new array
        Else
            'add row to new array
        End If
    Next
    pos += 1
Loop
ASKER
I am still trying to solve this; here is the latest fragment:
For Each drAccessRecord As DataRow In dtRecordsFromAccess.Rows
    Dim StrArray() As String = Split(drAccessRecord(FieldDescription))
    Dim StrArrayNoDup() As String
    Dim pos As Integer = 0
    Do Until pos = StrArray.Count
        Dim i As Integer
        For i = StrArray.Count - 1 To pos + 1 Step -1
            If StrArray(pos).ToString = StrArray(i).ToString Then
                ' do not add
            Else
                StrArrayNoDup(i) = StrArray(pos)
            End If
        Next
        pos += 1
    Loop
ASKER
It didn't work, but maybe it's just something silly I did; the idea seems sound.
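For reference, a likely reason the fragment above fails: StrArrayNoDup is declared without a size, so it is Nothing when StrArrayNoDup(i) is assigned, and the Else branch fires for every non-matching pair rather than once per unique word. Below is a minimal sketch of a hand-rolled alternative, assuming a HashSet is used to track the words already seen. RemoveDuplicateWords and the DedupSketch module are hypothetical names for illustration, not part of the original code.

```vbnet
Imports System
Imports System.Collections.Generic

Module DedupSketch
    ' Hypothetical helper: returns the words of a description with repeats dropped,
    ' keeping the first occurrence of each word in its original order.
    Function RemoveDuplicateWords(description As String) As String()
        ' Case-insensitive set, so "BM" and "bm" count as the same word
        Dim seen As New HashSet(Of String)(StringComparer.OrdinalIgnoreCase)
        Dim result As New List(Of String)
        For Each word As String In description.Split(" "c)
            ' HashSet.Add returns False when the word is already present
            If seen.Add(word) Then
                result.Add(word)
            End If
        Next
        Return result.ToArray()
    End Function

    Sub Main()
        Dim words As String() = RemoveDuplicateWords("BM bm 125 roadstar")
        For index As Integer = 0 To words.Length - 1
            ' Prints: BM pos 1, 125 pos 2, roadstar pos 3 (one per line)
            Console.WriteLine("{0} pos {1}", words(index), index + 1)
        Next
    End Sub
End Module
```

Because the loop appends words in reading order, the first occurrence of each word keeps its relative position, which matters here since the positions are stored alongside the words.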
So you are saying that the duplicates are in StrArray? If that is the case, then you can use Enumerable.Distinct to remove duplicates, e.g.:
Dim StrArray = Split(drAccessRecord(FieldDescription)).Distinct(StringComparer.OrdinalIgnoreCase)
Proof of concept -
Module Module1
    Private description As String = "I HAVE duplicates with CAse MiXing the DuPliCates should be removed when I split and use distinct ensuring caSE MixinG is accounted For"
    Sub Main()
        Console.WriteLine("A normal split of: {0}{1}{0}Produces {2} individual words with duplicates.", Environment.NewLine, description, Split(description).Count())
        Console.WriteLine()
        Console.WriteLine("On the other hand, a split of the preceding passed through{0}Enumerable.Distinct() produces {1} individual words not considering case mixing.", Environment.NewLine, Split(description).Distinct().Count())
        Console.WriteLine()
        Console.WriteLine("If case mixing is an issue, a split of the preceding passed through{0}Enumerable.Distinct(StringComparer.OrdinalIgnoreCase) produces {1} individual{0}words with no ordinal duplicates.", Environment.NewLine, Split(description).Distinct(StringComparer.OrdinalIgnoreCase).Count())
        Console.ReadLine()
    End Sub
End Module
Produces the following output: [screenshot not preserved]
But it looks like you need to retain the position of the removed duplicates; is this accurate?
-saige-
ASKER
I take a vehicle description string from an Access db; it's split, then the words are stored along with their positions.
I don't need the duplicated word at all, so:
if FieldDescription is: "BM BM 125 roadstar"
currently it goes to SQL Server and is stored as
bm pos 1
bm pos 2
125 pos 3
roadstar pos 4
The array, if it's possible (which I think it is, from your description),
would be created as
bm pos 1
125 pos 2
roadstar pos 3
Then that can be added to SQL as is, without the duplicate.
That is correct. Using the method I described, Distinct after your split, on "BM BM 125 roadstar" will produce an array { "BM", "125", "roadstar" }.
Which means the resulting entries into sql will be
BM pos 1
125 pos 2
roadstar pos 3
But if the description is "BM bm 125 roadstar", you will end up with an array of { "BM", "bm", "125", "roadstar" }. This is why I add a comparer (in this case StringComparer.OrdinalIgnoreCase), so that I can tell Distinct to treat words that differ only in case as duplicates.
-saige-
ASKER
I am just populating the table; hopefully it really was a relatively simple change. VB.NET is actually quite amazing.
ASKER
I tried it like this:
Dim StrArray = Split(drAccessRecord(FieldDescription)).Distinct(StringComparer.OrdinalIgnoreCase)
It ran but added no records.
If I do this:
Dim StrArray() = Split(drAccessRecord(FieldDescription)).Distinct(StringComparer.OrdinalIgnoreCase)
I get an underline.
Try with your original declaration:
Dim StrArray() As String = Split(drAccessRecord(FieldDescription)).Distinct(StringComparer.OrdinalIgnoreCase).ToArray()
Your project's configuration may not accept implicit variable declarations, or you may have Option Explicit defined at the top of your code file. Make sure you add .ToArray() after the Distinct method or you will get a runtime error.
-saige-
ASKER
It's failing on the For loop: For index = LBound(StrArray) To UBound(StrArray)
    Using cnSql As New SqlClient.SqlConnection("Data Source=MAIN-PC\SQLEXPRESS;Initial Catalog=Dictionary;Integrated Security=True;MultipleActiveResultSets=True")
        Using cmdInsert As New SqlClient.SqlCommand
            cmdInsert.Connection = cnSql
            cnSql.Open()
            y = dtRecordsFromAccess.Rows.Count
            For Each drAccessRecord As DataRow In dtRecordsFromAccess.Rows
                'Dim StrArray() As String = Split(drAccessRecord(FieldDescription))
                Dim StrArray = Split(drAccessRecord(FieldDescription)).Distinct(StringComparer.OrdinalIgnoreCase)
                Form1.ProgressBar1.Step = 1
                Form1.ProgressBar1.Minimum = 1
                Form1.ProgressBar1.Maximum = y
                For index = LBound(StrArray) To UBound(StrArray)
                    StrClientCodeWordPos = drAccessRecord(FieldNameClientCode) & "_" & RemoveUnwantedChr(StrArray(index)) & "_" & index + 1
                    StrClientCode = drAccessRecord(FieldNameClientCode)
                    StrFull = RemoveUnwantedChr(drAccessRecord(FieldDescription))
                    StrClientName = StrClientName
                    StrWord = RemoveUnwantedChr(StrArray(index))
                    intWordLen = Len(StrArray(index))
                    IntWordPosition = index + 1
                    IntNoOfWords = UBound(StrArray) + 1
                    cmdInsert.CommandText = "INSERT INTO TblWords (ClientCodeWordPosition, ClientCode, ClientName, Word, WordLen, StrFull, WordPosition, NoOfWords) VALUES ('" & StrClientCodeWordPos & "','" & StrClientCode & "','" & StrClientName & "','" & StrWord & "'," & intWordLen & ",'" & StrFull & "'," & IntWordPosition & "," & IntNoOfWords & " )"
                    cmdInsert.ExecuteNonQuery()
                Next index
                Form1.ProgressBar1.PerformStep()
                Form1.Label3.Text = "# of Files Read = " & Math.Round((Form1.ProgressBar1.Value.ToString / y) * 100, 2) & "%"
                Form1.Label3.Refresh()
            Next
        End Using
        cnSql.Close()
    End Using
Catch ex As Exception
Finally
    con.Close()
End Try
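A possible explanation for the failure, offered as a sketch rather than a confirmed diagnosis: without .ToArray(), Distinct returns an IEnumerable(Of String) rather than an array, and LBound/UBound expect an array. With Option Strict Off that would surface as a runtime exception, which the empty Catch block above would swallow, consistent with the earlier "ran but added no records" symptom. DistinctWords and ToArraySketch are hypothetical names used only for illustration.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports Microsoft.VisualBasic

Module ToArraySketch
    ' Hypothetical helper: split, drop case-insensitive duplicates, and
    ' materialise the result as a String() so LBound/UBound can be used on it.
    Function DistinctWords(description As String) As String()
        ' Distinct alone yields a lazy IEnumerable(Of String), not an array
        Dim lazyWords As IEnumerable(Of String) =
            Split(description).Distinct(StringComparer.OrdinalIgnoreCase)
        ' ToArray turns the sequence into a real array
        Return lazyWords.ToArray()
    End Function

    Sub Main()
        Dim StrArray() As String = DistinctWords("BM bm 125 roadstar")
        Try
            For index As Integer = LBound(StrArray) To UBound(StrArray)
                Console.WriteLine("{0} pos {1}", StrArray(index), index + 1)
            Next
        Catch ex As Exception
            ' Report the failure instead of silently discarding it
            Console.WriteLine("Insert loop failed: " & ex.Message)
        End Try
    End Sub
End Module
```

Logging the exception message in the Catch block, even temporarily, should make it clear whether the loop is failing for this reason or for another one.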
ASKER CERTIFIED SOLUTION
ASKER
Thank you. I will have a follow-on question about "" words, but I will ask a new question tomorrow.
ASKER
Not sure how to make the new array equal to the one with dups.