Link to home
Start Free TrialLog in
Avatar of perove
peroveFlag for Norway

asked on

Text-string-parsing-problem

perove's-Parsing-problem..

Hi,
I need a function that will format a comma-separated string/list-thing.
This has to be done according to some rules.

An example:

The String:
"TXT1, TXT2(E12, E13), TXT3, TXT4, TXT3, TXT2(E12,E16), TXT6"

should be transformed to
"TXT1, TXT2(E12, E13, E16), TXT3, TXT4, TXT6"


Each of the TXTn elements should only be one time in the new list.
And the elemnts with a "semi element" (the one with parantesis) like TXT2(..) are alos to be only one time, but with
all of the semi-elements included.
Avatar of jjafferr
jjafferr
Flag of Oman image

Looks like a homework to me
Sorry I couldn't resist saying that
Avatar of perove

ASKER

Well ..I agree that it loks like that but....it is'nt.
(I graduated in -94)

I actually need this for priting a labels on a system I developed for bakery-business. Most of my clients has moved the system to SQL server a long time ago where the database is more "sophisticated" but this client has made a lot of modifications in a old-access97 version.

And as alway I'm late with everyting so I turned to this group in hope of getting some quick answer..

The TXTn and Exxx are just my way of simplifing the problem..

perove
Ok, let me give it a shot,
Is the data in the txt file seperated with commas? like you posted
"TXT1, TXT2(E12, E13), TXT3, TXT4, TXT3, TXT2(E12,E16), TXT6"
Is it only 1 line per text File?
Or multipule lines?
So will TXT2() repeat on the 2nd or 5th line in the txt file?

obviously, it is best to post here a few lines from the text file.
Avatar of perove

ASKER

Yes it is separated by commas.

It is not a file,  it is just a string, the fuunction take a string as argument and give back the formatted string..

perove
Avatar of perove

ASKER

<<So will TXT2() repeat on the 2nd or 5th line in the txt file?>>

No..or Yes or..  TXT2 can be repeated several times in the string. It not necceseraly just 2 times

perove


perove

I have to leave now, but will work on it maybe after 3 hours.

jaffer
Is there always a space after the comma? Typically, how many items per string? How many strings?
Avatar of perove

ASKER

Yes there is always a space after the comma.
(don't know if there is one pr now, but we can say that as "a rule")
I can also make my client use another separationcaracther for the Items with semi element if that make things easyer..


10-15 items pr string
100 strings (but this function only handle one at the time ofcource)

and hey ..added some points

Got to go now, will check it it out first thing tomorrow..
(hoping for a solution)


Avatar of perove

ASKER

To clearify:
<<I can also make my client use another separationcaracther for the Items with semi element if that make things easyer..
>>
What I mean is:

So that the imput list is:
"TXT1, TXT2(E12; E13), TXT3, TXT4, TXT3, TXT2(E12;E16), TXT6"

and the output is
"TXT1, TXT2(E12; E13; E16), TXT3, TXT4, TXT6"

perove
Hey perove,
for a fellow Expert, I present this work to you :o)

Please email me so that I can send you the working mdb (if you are interested):

Make 2 Tables, tblTitle, with the Text field myTitle,
tblDetails, with the Text field myTitle, another Text field myDetails.

The string and syntax I used is:
A = "TXT1, TXT2;E12;E13, TXT3, TXT4, TXT3, TXT2;E12;E16, TXT6, TXT3;E15;E16, TXT3;E16;E17"
so you seperate the TXT with commas, but when there is a E, you attach the TXT with the Es through semi colon, like TXT;E1;E2 then normal comma
The result will be

TXT1, TXT2(E12;E13;E16), TXT3(E15;E16;E17), TXT4, TXT6

you can change the semi colon to anything you wish, just change it in the code.



Copy this code and paste it On Click of a command button:

'Make Reference to DAO object from VB editor > tools > Reference

    Dim AB1() As String
    Dim AB2() As String
    Dim dbs As Database
   
'Part 1: Lets Seperate the values and Enter the Txt in a Table called tblTitle, and the Es in tblDetails

    Set dbs = CurrentDb

    A = "TXT1, TXT2;E12;E13, TXT3, TXT4, TXT3, TXT2;E12;E16, TXT6, TXT3;E15;E16, TXT3;E16;E17"
   
    AB1 = Split(A, ",")             'split based on commas
    For i = 0 To UBound(AB1)        'loop to get the values seperated by commas, start from first value
           
      If InStr(AB1(i), ";") Then    'if the value contains semi colon, then we need to split it further
        AB2 = Split(AB1(i), ";")    'do the second split
        For j = 1 To UBound(AB2)    'loop to get the values seperated by semi colon, start from second value
       
        'Check for Title duplicates
        If DCount("[myTitle]", "tblTitle", "[myTitle]= '" & AB2(0) & "'") = 0 Then
            dbs.Execute "INSERT INTO tblTitle VALUES ('" & AB2(0) & "')"                        'write Title in Table tblTitle
        End If
       'Check for Details duplicates
        If DCount("[myDetails]", "tblDetails", "[myDetails]= '" & AB2(j) & "'") = 0 Then
            dbs.Execute "INSERT INTO tblDetails VALUES ('" & AB2(0) & "' , '" & AB2(j) & "')"   'write Title in Table tblDetails
        End If
       
        Next j
     
     
      Else                          'these are normal value without semi colon
        'Check for Title duplicates
        If DCount("[myTitle]", "tblTitle", "[myTitle]= '" & AB1(i) & "'") = 0 Then
            dbs.Execute "INSERT INTO tblTitle VALUES ('" & AB1(i) & "')"            'write Title in Table tblTitle
        End If
       
      End If
     
    Next i
   
'Part 2: Now that we have the values in the Tables, lets put them together the way we want

    Dim d As Database
    Dim r1 As Recordset
    Dim r2 As Recordset
    Set d = CurrentDb
    Set r1 = d.OpenRecordset("select myTitle from tblTitle")        'open tblTitle to read the values

    r1.MoveFirst                                                    'Move the pointer to the First Record
   
    Do Until r1.EOF                                                 'loop till the last Record in tblTitle

    AddTxt = AddTxt & "," & r1!myTitle                              'lets start putting the values in a Row
   
    If IsNull(DLookup("[myTitle]", "tblDetails", "[myTitle]= '" & r1!myTitle & "'")) = False Then   'if NO records exist, go in
        AddTxt = AddTxt & "("                                       'Add a parentheses to the Row before inner values are added
        Set r2 = d.OpenRecordset("select myDetails from tblDetails where [myTitle]= '" & r1!myTitle & "'")  'filter tblDetails based on myTitle
        r2.MoveFirst                                                'Move the pointer to the First Record
        Do Until r2.EOF                                             'loop till the last Record in tblDetails
           
            AddTxt = AddTxt & r2!myDetails & ";"                    'seperate the items between parentheses with ;

            r2.MoveNext                                             'move to the next Record of tblDetails
        Loop                                                        'loop tblDetails
        AddTxt = Mid(AddTxt, 1, Len(AddTxt) - 1) & ")"              'finished the value between paretheses, lets close the parentheses
    End If                                                          'the NO Record checking finishes here
   
    r1.MoveNext                                                     'move to the next Record of tblTitle

    Loop                                                            'loop till the last Record of tblTitle

    MsgBox Mid(AddTxt, 2)                                           'lets take the first comma out
   
'    DoCmd.OpenQuery "qryDeleteDetails"                             if you want to Delete all Data from tblDetails
'    DoCmd.OpenQuery "qryDeleteTitle"                               if you want to Delete all Data from tblTitle


jaffer
a minor change, so I better repost the whole thing again, instead of making you look for it, so here I go:


Make 2 Tables, tblTitle, with the Text field myTitle,
tblDetails, with the Text field myTitle, another Text field myDetails.

The string and syntax I used is:
A = "TXT1, TXT2;E12;E13, TXT3, TXT4, TXT3, TXT2;E12;E16, TXT6, TXT3;E15;E16, TXT3;E16;E17"
so you seperate the TXT with commas, but when there is a E, you attach the TXT with the Es through semi colon, like TXT;E1;E2 then normal comma
The result will be

TXT1, TXT2(E12;E13;E16), TXT3(E15;E16;E17), TXT4, TXT6

you can change the semi colon to anything you wish, just change it in the code.



Copy this code and paste it On Click of a command button:

'Make Reference to DAO object from VB editor > tools > Reference

    Dim AB1() As String
    Dim AB2() As String
    Dim dbs As Database
   
'Part 1: Lets Seperate the values and Enter the Txt in a Table called tblTitle, and the Es in tblDetails

    Set dbs = CurrentDb

    A = "TXT1, TXT2;E12;E13, TXT3, TXT4, TXT3, TXT2;E12;E16, TXT6, TXT3;E15;E16, TXT3;E16;E17"
   
    AB1 = Split(A, ",")             'split based on commas
    For i = 0 To UBound(AB1)        'loop to get the values seperated by commas, start from first value
           
      If InStr(AB1(i), ";") Then    'if the value contains semi colon, then we need to split it further
        AB2 = Split(AB1(i), ";")    'do the second split
        For j = 1 To UBound(AB2)    'loop to get the values seperated by semi colon, start from second value
       
        'Check for Title duplicates
        If DCount("[myTitle]", "tblTitle", "[myTitle]= '" & AB2(0) & "'") = 0 Then
            dbs.Execute "INSERT INTO tblTitle VALUES ('" & AB2(0) & "')"                        'write Title in Table tblTitle
        End If
       'Check for Details duplicates
        If DCount("[myDetails]", "tblDetails", "[myDetails]= '" & AB2(j) & "' and [myTitle]='" & AB2(0) & "'") = 0 Then
            dbs.Execute "INSERT INTO tblDetails VALUES ('" & AB2(0) & "' , '" & AB2(j) & "')"   'write Title in Table tblDetails
        End If
       
        Next j
     
     
      Else                          'these are normal value without semi colon
        'Check for Title duplicates
        If DCount("[myTitle]", "tblTitle", "[myTitle]= '" & AB1(i) & "'") = 0 Then
            dbs.Execute "INSERT INTO tblTitle VALUES ('" & AB1(i) & "')"            'write Title in Table tblTitle
        End If
       
      End If
     
    Next i
   
'Part 2: Now that we have the values in the Tables, lets put them together the way we want

    Dim d As Database
    Dim r1 As Recordset
    Dim r2 As Recordset
    Set d = CurrentDb
    Set r1 = d.OpenRecordset("select myTitle from tblTitle")        'open tblTitle to read the values

    r1.MoveFirst                                                    'Move the pointer to the First Record
   
    Do Until r1.EOF                                                 'loop till the last Record in tblTitle

    AddTxt = AddTxt & "," & r1!myTitle                              'lets start putting the values in a Row
   
    If IsNull(DLookup("[myTitle]", "tblDetails", "[myTitle]= '" & r1!myTitle & "'")) = False Then   'if NO records exist, go in
        AddTxt = AddTxt & "("                                       'Add a parentheses to the Row before inner values are added
        Set r2 = d.OpenRecordset("select myDetails from tblDetails where [myTitle]= '" & r1!myTitle & "'")  'filter tblDetails based on myTitle
        r2.MoveFirst                                                'Move the pointer to the First Record
        Do Until r2.EOF                                             'loop till the last Record in tblDetails
           
            AddTxt = AddTxt & r2!myDetails & ";"                    'seperate the items between parentheses with ; <<< Change semi colon here if you wish ===

            r2.MoveNext                                             'move to the next Record of tblDetails
        Loop                                                        'loop tblDetails
        AddTxt = Mid(AddTxt, 1, Len(AddTxt) - 1) & ")"              'finished the value between paretheses, lets close the parentheses
    End If                                                          'the NO Record checking finishes here
   
    r1.MoveNext                                                     'move to the next Record of tblTitle

    Loop                                                            'loop till the last Record of tblTitle

    MsgBox Mid(AddTxt, 2)                                           'lets take the first comma out
   
'    DoCmd.OpenQuery "qryDeleteDetails"                             'if you want to Delete all Data from tblDetails
'    DoCmd.OpenQuery "qryDeleteTitle"                              ' if you want to Delete all Data from tblTitle


jaffer
Avatar of perove

ASKER

Hi jjafferr.
Thanks for the code, but there are some problems.

As I said this is a Access97 mdb, unfortunatly the Split() function did'nt come into Access's VBA until Access2000 so I dont have this function here..

Also I was hoping for a solution without involving tabels and DAO stuff, just a plain txt-parser thing.
I will leave the q open & increase the points. Will anyway give you something for your efford whien the q is answerd.
Thanks again.
perove
:o(  No split, Back to the drawing board again.
Avatar of perove

ASKER

I have done something simliar before, the problem I had was with the damn  semi elements.
Here is my code for a simliar problem where i format a comma-separated-list so that a element is only shown once

Function CountCSVWords(S As String, Separ As String) As Integer
'
' Counts words in a string separated by Separ.
'
Dim WC As Integer, Pos As Integer
  If VarType(S) <> 8 Or Len(S) = 0 Then
    CountCSVWords = 0
    Exit Function
  End If
  WC = 1
  Pos = InStr(S, Separ)
  Do While Pos > 0
    WC = WC + 1
    Pos = InStr(Pos + 1, S, Separ)
  Loop
  CountCSVWords = WC
End Function


Function GetCSVWord(S As String, Indx As Integer, Separ As String)
'
' Returns the <Indx>th word from a comma-separated string.
' For example, GetCSVWord("Nancy, Bob", 2) returns Bob.
'
Dim WC As Integer, Count As Integer, SPos As Integer, EPos As Integer
  WC = CountCSVWords(S, Separ)
  If Indx < 1 Or Indx > WC Then
    GetCSVWord = Null
    Exit Function
  End If
  Count = 1
  SPos = 1
  For Count = 2 To Indx
    SPos = InStr(SPos, S, Separ) + 1
  Next Count
  EPos = InStr(SPos, S, Separ) - 1
  If EPos <= 0 Then EPos = Len(S)
  GetCSVWord = Mid(S, SPos, EPos - SPos + 1)
End Function

Function ReplaceStr(TextIn, SearchStr, Replacement, CompMode As Integer)
'
' Replaces the SearchStr string with Replacement string in the TextIn string.
' Uses CompMode to determine comparison mode
'
Dim WorkText As String, Pointer As Integer
  If IsNull(TextIn) Then
    ReplaceStr = Null
  Else
    WorkText = TextIn
    Pointer = InStr(1, WorkText, SearchStr, CompMode)
    Do While Pointer > 0
      WorkText = Left(WorkText, Pointer - 1) & Replacement & Mid(WorkText, Pointer + Len(SearchStr))
      Pointer = InStr(Pointer + Len(Replacement), WorkText, SearchStr, CompMode)
    Loop
    ReplaceStr = WorkText
  End If
End Function

Function GetUniqueElementsCSV(s3 As String) As String
'Dim s3 As String


Dim i As Integer
Dim j As Integer
Dim retstr As String
Dim curstr As String
Dim curstr2 As String

For i = 1 To CountCSVWords(s3, ", ")
    curstr = GetCSVWord(s3, i, ", ")
    If InStr(1, retstr, curstr) = 0 Then
    retstr = retstr & curstr & ", "
    End If
Next i
GetUniqueElementsCSV = retstr
End Function


'To test the function:
'?GetUniqueElementsCSV("Txt1, Txt2, Txt2, Txt4, Txt4, Txt6")

perove
Wait Jaffer,

I have a split function "work around" for access 97 at work.  I send it in the morning if it can wait. T minus 12 hours

Dave
Hey Dave,
thanks for accepting the invitation and comming in, I knew your input would be valueble.

jaffer
Avatar of perove

ASKER

Still i really really really  would prefer a solution without putting this into a table before I format the string
I know I should have stated this in my original q but..

sorry ..please don't give up on me...


perove


cant resist..  i think this is it

Function Split(ByVal strIn As String, strDelimiter As String) As Variant
 
  Dim astrItems() As String
  Dim intPos As Integer
  Dim intStart As Integer
  Dim strTemp As String
  Dim intCount As Integer
 
  intStart = 1
  ' Make sure we've got an array here!
  ReDim astrItems(0 To 0)
  Do
    intPos = InStr(intStart, strIn, strDelimiter)
    If intPos > 0 Then
      strTemp = Mid$(strIn, intStart, intPos - intStart)
      intStart = intPos + Len(strDelimiter)
    Else
      strTemp = Mid(strIn, intStart)
    End If
    If Len(strTemp) > 0 Then
      ReDim Preserve astrItems(0 To intCount)
      astrItems(intCount) = strTemp
      intCount = intCount + 1
    End If
  Loop While intPos > 0
  splitString = astrItems
End Function

use like in this example

Sub test()

Dim sString() As String

sString = splitString("a,b,c", ",")

    For i = 0 To UBound(sString)
        Debug.Print sString(i)
    Next
End Sub

perove,

I know your pain with Access 97

No Split, join, replace... i have workarounds for them all at work

Hope all goes well..

Dave
Ill have a play..

See you soon
Hey Dave,
you must be having an Elephants memory, remmembering such a code!
that split code worked with a minor change, the Name of the Function be splitString.

I am trying working on a code too.

jaffer
>> you must be having an Elephants memory, remmembering such a code!

nar, found it on the web again
Avatar of Mourdekai
Mourdekai

Sorry to come in this late in the game, but I've been following from the beginning.  I think I have something that will work for you perove.

Use the code below (I borrowed the sort function from another Q at EE)

I apologize for the lack of comments (and a lot of bad naming), but I didn't think I would really come up with a solution that worked.  Just pass the function a line like this:

?ParseString("TXT1, TXT2(E12; E13), TXT3, TXT4, TXT3, TXT2(E12;E16), TXT6")
TXT1, TXT2(E12;E13;E16), TXT3, TXT4, TXT6

Hope this helps...

'********************************

Function ParseString(strLine As String) As String
   Dim blnDupe As Boolean
   Dim txtArray() As Integer
   Dim eArray() As String
   Dim i As Integer
   Dim intInnerElements As Integer
   Dim iPos As Integer
   Dim iPosSearch As Integer
   Dim intElements As Integer
   Dim intNewNum As String
   ReDim Preserve txtArray(0)
   ReDim Preserve eArray(0)
   Dim strTemp As String
   Dim strResult As String
   
   intElements = 0
   
   For iPos = 1 To Len(strLine)
        If Mid(strLine, iPos, 3) = "TXT" Then
            intNewNum = GetNumAfterText(Mid(strLine, iPos + 3))
            blnDupe = False
            For iPosSearch = 0 To UBound(txtArray)
                If txtArray(iPosSearch) = intNewNum Then
                    blnDupe = True
                    Exit For
                End If
            Next iPosSearch
            If Not blnDupe Then
                If intElements <> 0 Then
                    ReDim Preserve txtArray(UBound(txtArray) + 1)
                End If
                txtArray(intElements) = intNewNum
                intElements = intElements + 1
            End If
        End If
   Next iPos
   ReDim Preserve txtArray(UBound(txtArray))
   Sort txtArray()
   For intElements = 0 To UBound(txtArray)
        For iPos = 1 To Len(strLine)
            If Mid(strLine, iPos, 3) = "TXT" Then
                If GetNumAfterText(Mid(strLine, iPos + 3)) = txtArray(intElements) Then
                    If Mid(strLine, iPos + 3 + Len(Trim(txtArray(intElements))), 1) = "(" Then
                        i = 3
                        strTemp = "("
                        Do Until strTemp = ")"
                                strTemp = Mid(strLine, iPos + i, 1)
                                i = i + 1
                            If strTemp = "E" Then
                                intNewNum = GetNumAfterText(Mid(strLine, iPos + i))
                                blnDupe = False
                                For iPosSearch = 0 To UBound(eArray)
                                    If eArray(iPosSearch) = intNewNum Then
                                        blnDupe = True
                                        Exit For
                                    End If
                                Next iPosSearch
                                If Not blnDupe Then
                                    If intInnerElements <> 0 Then
                                        ReDim Preserve eArray(UBound(eArray) + 1)
                                    End If
                                    eArray(intInnerElements) = intNewNum
                                    intInnerElements = intInnerElements + 1
                                End If
                            End If
                        Loop
                    End If
                End If
            End If
        Next iPos
        Sort eArray()
        strResult = strResult & "TXT" & txtArray(intElements)
        If intInnerElements > 0 Then
            strResult = strResult & "("
            For i = 0 To UBound(eArray)
                strResult = strResult & "E" & eArray(i) & ";"
            Next i
            strResult = Mid(strResult, 1, Len(strResult) - 1)
            strResult = strResult & "), "
        Else
            strResult = strResult & ", "
        End If
        intInnerElements = 0
        ReDim eArray(0)
   Next intElements
   ParseString = Mid(strResult, 1, Len(strResult) - 2)
End Function

Function GetNumAfterText(strNum As String) As Integer
    Dim i As Integer
    Dim blnNumeric As Boolean
    Dim strTemp As String
    i = 1
    strTemp = 1
    Do While IsNumeric(strTemp)
        strTemp = Mid(strNum, i, 1)
        i = i + 1
    Loop
    i = i - 2
    GetNumAfterText = Trim(Mid(strNum, 1, i))
End Function

Function Sort(TempArray As Variant)
    Dim Temp As Variant
    Dim i As Integer
    Dim NoExchanges As Integer

    ' Loop until no more "exchanges" are made.
    Do
        NoExchanges = True

        ' Loop through each element in the array.
        For i = 0 To UBound(TempArray) - 1

            ' If the element is greater than the element
            ' following it, exchange the two elements.
            If TempArray(i) > TempArray(i + 1) Then
                NoExchanges = False
                Temp = TempArray(i)
                TempArray(i) = TempArray(i + 1)
                TempArray(i + 1) = Temp
            End If
        Next i
    Loop While Not (NoExchanges)
End Function
NICE!!!!!!
Although I didn't give up yet, but my mind is scattered know a days.
I hope Mourdekai's solution will work for you perove.

Good job Mourdekai.

jaffer
Avatar of perove

ASKER

Mourdekai, good job.
but..
The only thing is that the Txt(n) was just sample of text.
In the "real world" I cannot search for Txt (hard coded) I need it to work on all kinds of word that is written according to the rules.

So can you re-write it to work on any text..I will thorow in a couple of houndred more words 4 U. Hope that is OK -if not youll get your reward as it is since I should have explaind it a bit better..

This a example on a real string the the function have to parse. (it is in norwegian..)

 "hvetemel, vann, surhetsregulerende middel (E331;E330), sukker, surhetsregulerende middel (E333;E331;E335;E331)"

perove
Avatar of perove

ASKER

POINTS NOT WORDS.....the sentence in my previous comment:
<<..I will thorow in a couple of houndred more words 4 U. >>

should be

<<..I will thorow in a couple of houndred more POINTS 4 U. >>

perove
ASKER CERTIFIED SOLUTION
Avatar of Mourdekai
Mourdekai

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
jaffer

In case you did not notice this, perove posted a q for your help here:

https://www.experts-exchange.com/questions/21113765/Points-for-jjafferr.html
Avatar of perove

ASKER

Mordekai.
Need som changes to this function. Interested in say 200 p
(or anyone else)
The cahges are that.

1. The values inside the paranthesis does not always start with a E nor does it have to be a number
(se sample-string)

2.There is a error if one of the elements contans a substring of another element

3.The "subvalues eg the one inside the paranthesis can contain several words.

So here is a test string
Parsestring("hvetemel, vann, surhetsregulerende middel (Bynne;E330), su, surhetsregulerende middel (Bynne;E331;E335;E331)")

this string should be converted to:

Parsestring("hvetemel, vann, surhetsregulerende middel (Bynne;E330;E331;E335;E331), su