Parsing a text file into 2 listboxes based on strings

Posted on 2003-03-05
Last Modified: 2011-09-20
This may be simpler than I'm making it, but it is really confounding me.

I want to parse this file, a bookmark file, into two listboxes.

Basically, the attributes in the file are thus:
-----------------------------------------

<DT><H3 ADD_DATE="961102203" ID="NC:BookmarksRoot#$b742f58">Developer Information</H3>

<H3 -strings- > - denotes folder name start
'string'</H3> - denotes end of folder name


<DT><A HREF="http://www.faqs.org/rfcs/" ADD_DATE="961104168">RFC Archive</A>

<A href="string"> - denotes URL
'string'</a> - denotes end of URL description

-----------------------------------------

I want to put the URL into listbox one, and the description into listbox2.

What's the best way to search for <a href="string"> and </a>, and to work with both the URL and the description in between the <a> and </a> tags?

My head hurts.  :(
0
Question by:mcdev
15 Comments
 
LVL 10

Expert Comment

by:aeklund
ID: 8073296
This should do it for you:

Private Sub Command1_Click()
  Dim lfnum As Long
  lfnum = FreeFile
 
  Dim sLine As String
  Dim sPage As String
  Open "c:\test.htm" For Input As #lfnum
    Do Until EOF(lfnum)
      Line Input #lfnum, sLine
      sPage = sPage & sLine
    Loop
  Close #lfnum
 
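  Dim lpos1 As Long
  Dim lpos2 As Long
  Dim sUrl As String
 
  Do
    ' Drop everything before the next anchor
    lpos1 = InStr(1, UCase(sPage), "<A HREF")
    If lpos1 = 0 Then Exit Do
    sPage = Right(sPage, Len(sPage) - lpos1 + 1)
   
    ' The URL sits between HREF=" and the next double quote
    lpos1 = InStr(1, UCase(sPage), "HREF=" & Chr(34))
    If lpos1 = 0 Then Exit Do
    lpos1 = lpos1 + 6
    lpos2 = InStr(lpos1, sPage, Chr(34))
    If lpos2 = 0 Then Exit Do
    sUrl = Mid(sPage, lpos1, lpos2 - lpos1)
   
    ' The description sits between the end of the opening tag and </A
    lpos1 = InStr(1, sPage, ">") + 1
    lpos2 = InStr(lpos1, UCase(sPage), "</A")
    If lpos2 = 0 Then Exit Do
    List1.AddItem sUrl
    List2.AddItem Mid(sPage, lpos1, lpos2 - lpos1)
   
    ' Continue after the closing </A>
    sPage = Mid(sPage, lpos2 + 4)
  Loop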
End Sub
0
 
LVL 2

Expert Comment

by:keenez
ID: 8073302
There may be a simpler way but I would use a 2 step approach.

Take the original string and split it with a delimiter of <A HREF=.  You can disregard the first element in the array because you know it precedes the first <A HREF.

Now you can split each of the remaining elements again to separate the URL from the description, and cut the description off at </A>.

Eg.

blah blah blah <A HREF="www.experts-exchange.com"> more
blah </A>

split once for <a HREF="
element 0 - blah blah blah
element 1 - www.experts-exchange.com"> more blah </A>

split element 1 for ">
new element 0 - www.experts-exchange.com
new element 1 - more blah </A>

You can then remove the </A> with Left(new element 1, InStrRev(new element 1, "</A>") - 1).

There's probably a much simpler function, but this is what I would do if I couldn't find it.  Of course, this is closer to pseudo code.
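
For what it's worth, the two-step split could be sketched in VB6 roughly as follows. This is only a sketch under a few assumptions: the whole file has already been read into a string, Split is available (VB6, not earlier versions), and the names ParseAnchors, List1 and List2 are made up for illustration. Because the bookmark lines carry an ADD_DATE attribute after the URL, the sketch cuts the URL at the first double quote and the description at the first > instead of splitting on "> as in the example above.

```vb
' Sketch only: ParseAnchors, List1 and List2 are hypothetical names.
' sPage is assumed to hold the whole bookmark file.
Private Sub ParseAnchors(ByVal sPage As String)
    Dim vParts As Variant
    Dim sPart As String
    Dim lQuote As Long, lGt As Long, lEnd As Long
    Dim i As Long

    ' Step 1: split on the start of each anchor (case-insensitive compare)
    vParts = Split(sPage, "<A HREF=""", -1, vbTextCompare)

    ' Element 0 precedes the first anchor, so start at 1
    For i = 1 To UBound(vParts)
        sPart = vParts(i)
        lQuote = InStr(sPart, """")                    ' closing quote of the URL
        lGt = InStr(sPart, ">")                        ' end of the opening tag
        lEnd = InStr(1, sPart, "</A>", vbTextCompare)  ' start of the closing tag

        ' Step 2: only add the pair when all three markers are in order
        If lQuote > 0 And lGt > lQuote And lEnd > lGt Then
            List1.AddItem Left$(sPart, lQuote - 1)
            List2.AddItem Mid$(sPart, lGt + 1, lEnd - lGt - 1)
        End If
    Next i
End Sub
```

Call it as ParseAnchors sPage once the file contents are in sPage. Anchors that are missing a quote or a closing tag are simply skipped, so the two listboxes stay in step.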

Cheers,

keenez
0
 
LVL 1

Expert Comment

by:Renato102098
ID: 8073332
Hi,
You know that the URL address comes first, followed by the URL description, if one exists.
The URL address is between the double quotes, and the URL description is between ">" and "</A>".

I have two functions (adapted from the Lisp language), CAR and CDR.

This is an example for Visual Basic (3.0 to 6.0).
On the form, draw a command button named Command1.

'CAR returns the part of the string before the first occurrence of caracter
Function car (ByVal Lista As String, ByVal caracter As String) As String
Lista = Trim(Lista)
If InStr(1, Lista, caracter) > 0 Then
    car = Trim(Left(Lista, InStr(1, Lista, caracter) - 1))
Else
    car = Trim(Lista)
End If
End Function

'CDR returns the part of the string after the first occurrence of caracter
Function cdr (ByVal Lista As String, ByVal caracter As String) As String
Lista = Trim(Lista)
If InStr(1, Lista, caracter) > 0 Then
'Skip the whole delimiter, not just its first character
cdr = Trim(Mid(Lista, InStr(1, Lista, caracter) + Len(caracter)))
Else
cdr = ""
End If
End Function


Private Sub Command1_Click()
Dim MiTexto As String, Aux As String
MiTexto = "<DT><A HREF=""http://www.faqs.org/rfcs/"" ADD_DATE=""961104168"">RFC Archive</A>"

MsgBox "The original is" & vbCrLf & MiTexto

'Keep everything after A HREF=" so that MiTexto starts at the URL
MiTexto = cdr(MiTexto, "A HREF=""")
Aux = car(MiTexto, """")
MsgBox "URL Address:" & vbCrLf & Aux
'You can add in list box List1.AddItem Aux

Aux = cdr(MiTexto, ">")
Aux = car(Aux, "</A>")
MsgBox "URL Description:" & vbCrLf & Aux
'You can add in list box List2.AddItem Aux

End Sub

Good Luck
Renato
0
 
LVL 3

Expert Comment

by:vbbuff
ID: 8073679
mcdev,

I don't know the best way, but here is one way that I think works.
Store the line in a string variable, say strTag, so that strTag holds:
<DT><A HREF="http://www.faqs.org/rfcs/" ADD_DATE="961104168">RFC Archive</A>

Modify the code to meet your needs.

Private sub cmdAddtoList_Click()
dim ArrTag() as string
dim strHref as string
dim strDesc as string

ArrTag= split(strTag,chr(34))

for i = 0 to ubound(ArrTag)
    strHref = replace(ArrTag(i)," ","")
    if len (strHref) > 5 then
       if UCASE(right(strHref,5)) = "HREF=" Then
          if i < ubound(ArrTag) Then strHref = ArrTag(i + 1)
          Exit for
       end if
    End If
Next

ArrTag= split(strTag,">")

For i = 0 To UBound(ArrTag)
    strDesc = Replace(ArrTag(i), " ", "")
    If Len(strDesc) > 3 Then
       If UCase(Right(strDesc, 3)) = "</A" Then
             strDesc = Left(ArrTag(i), InStr(ArrTag(i), "<") - 1)
             Exit For
       End If
    End If
Next
listbox1.additem strHref
listbox2.additem strDesc
End sub
0
 

Expert Comment

by:brookd
ID: 8074503
I always do things the hard way, but I'd do something similar to the last post, except use the Left, Right, and Mid string functions:
' <DT><A HREF="http://www.faqs.org/rfcs/" ADD_DATE="961104168">RFC Archive</A>

listbox1.clear
listbox2.clear
open filename for input as #1
while not eof(1)
line input #1,a$ ' read the whole line; Input # would split it at commas and quotes
i=instr(a$,"<A HREF=""")
if i>0 then
a$=mid$(a$,i+9) ' get start of url, just past <A HREF="
i=instr(a$,"""")
strHref=left$(a$,i-1)  ' url

i=instr(a$,">") ' end of the opening tag
a$=mid$(a$,i+1)
i=instr(a$,"<")
strDesc=left$(a$,i-1) ' desc

listbox1.additem strHref
listbox2.additem strDesc
end if
wend
close 1

' didn't test this, but that's how I usually do it ,.. inching my way along..

-- David







0
 
LVL 3

Expert Comment

by:DocM
ID: 8075022
I like it simple with syntax error checking (Just in case):

Open pathname For Input As 1
 strA = Input(FileLen(pathname), 1)
Close 1
 
c = InStr(1, UCase(strA), "<A HREF")
While c > 0
 strA = Mid(strA, c + 1)
 b = InStr(strA, Chr(34))
 If b > 0 Then
  strA = Mid(strA, b + 1)
  b = InStr(strA, Chr(34))
  strURL = Mid(strA, 1, b - 1)
  b = InStr(strA, ">")
  strA = Mid(strA, b + 1)
  b = InStr(strA, "<")
  If b > 0 Then
   strDESCR = Mid(strA, 1, b - 1)
   List1.AddItem strURL
   List2.AddItem strDESCR
  End If
 End If
 c = InStr(1, UCase(strA), "<A HREF")
Wend
0
 
LVL 5

Expert Comment

by:Rhaedes
ID: 8075904
Okay, I'm going to throw my hat into the ring. To my mind, since this is an HTML document it is just begging to be parsed using the DOM, which does all the hard work for you, and provides a rather more elegant and robust solution than instr and split. McDev: I'm not sure what you need from the 'folder name' part of your question. I think everyone has ignored it, so I will too unless you come back to us.
Put two listboxes on a form (List1 and List2) and add a reference to Microsoft Internet Controls. Paste the following code into the form.

Kindest regards,
Rhaedes


Dim IE As SHDocVw.InternetExplorer

Private Sub Form_Load()
Set IE = New InternetExplorer 'create instance of IExplorer
IE.Navigate2 ("c:\WHEREVER\myFile.htm") 'load file

Do While IE.readyState <> READYSTATE_COMPLETE 'wait until fully loaded
 DoEvents
Loop

Dim n As Long
With IE.document.All.tags("A") 'get anchor collection
 For n = 0 To .length - 1
  List1.AddItem .Item(n).getAttribute("href") 'extract href (URL goes in the first listbox)
  List2.AddItem .Item(n).innerText 'extract description
 Next n
End With

IE.Quit 'close the hidden browser process
Set IE = Nothing 'release the reference
End Sub
0
 
LVL 3

Expert Comment

by:vbbuff
ID: 8076861
dear mcdev,

All of them here have overlooked some points.

1) If there is more than one space between "A" and "HREF", their programs may not work as intended.

Their code will work if all your tags have exactly one space between "A" and "HREF", for example:

<A HREF="http://www.faqs.org/rfcs/" ADD_DATE="961104168">RFC Archive</A>

but if there is more than one space, there will be a problem, for example:
<A   HREF="http://www.faqs.org/rfcs/" ADD_DATE="961104168">RFC Archive</A>

2) Some of their conditions are case sensitive. That is, if you type any of the tags in lower or mixed case, for example:
<a HREF="http://www.faqs.org/rfcs/" ADD_DATE="961104168">RFC Archive</a>

or

<a hReF="http://www.faqs.org/rfcs/" ADD_DATE="961104168">RFC Archive</a>
then that code will not find the URL at all.
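
One way to cope with both points without hand-written InStr checks might be a regular expression. The sketch below rests on some assumptions: the VBScript regular expression object is available on the machine (it is created late-bound with CreateObject, so no project reference is needed), sPage already holds the file contents, and every href value is enclosed in double quotes. The name ParseWithRegExp is made up for illustration.

```vb
' Sketch only: ParseWithRegExp is a hypothetical name; List1 and List2 are assumed listboxes.
Private Sub ParseWithRegExp(ByVal sPage As String)
    Dim oRe As Object
    Dim oMatches As Object
    Dim oMatch As Object

    Set oRe = CreateObject("VBScript.RegExp")
    oRe.Global = True        ' find every anchor, not just the first
    oRe.IgnoreCase = True    ' <a href>, <A HREF> and <a hReF> all match

    ' \s+ allows any amount of white space between A and HREF.
    ' Submatch 0 is the URL, submatch 1 is the description.
    oRe.Pattern = "<A\s+[^>]*?HREF\s*=\s*""([^""]*)""[^>]*>([\s\S]*?)</A\s*>"

    Set oMatches = oRe.Execute(sPage)
    For Each oMatch In oMatches
        List1.AddItem oMatch.SubMatches(0)
        List2.AddItem oMatch.SubMatches(1)
    Next
End Sub
```

Anchors written without quotes around the href value would not be matched by this pattern, so it is no substitute for a real HTML parser.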

*******************************************************


0
 
LVL 3

Expert Comment

by:vbbuff
ID: 8076876
There is one problem in the code I gave:

If there is white space between any of the characters in "HREF", as in "H REF" or "HR ef", the code still treats it as HREF and outputs the URL. To overcome that you need an additional check. As before, strTag holds:
<DT><A HREF="http://www.faqs.org/rfcs/" ADD_DATE="961104168">RFC Archive</A>

My modified code is below. Modify it to meet your needs.

Private sub cmdAddtoList_Click()
dim ArrTag() as string
dim strHref as string
dim strDesc as string

ArrTag= split(strTag,chr(34))

for i = 0 to ubound(ArrTag)
   strHref = replace(ArrTag(i)," ","")
   if len (strHref) > 5 then
      if UCASE(right(strHref,5)) = "HREF=" Then
         if instr(ucase(ArrTag(i)),"HREF") > 0 Then
            if i < ubound(ArrTag) Then
               strHref = ArrTag(i + 1)
               Exit for
            end if
         end if
      end if
   End If
Next

ArrTag= split(strTag,">")

For i = 0 To UBound(ArrTag)
   strDesc = Replace(ArrTag(i), " ", "")
   If Len(strDesc) > 3 Then
      If UCase(Right(strDesc, 3)) = "</A" Then
            strDesc = Left(ArrTag(i), InStr(ArrTag(i), "<") - 1)
            Exit For
      End If
   End If
Next
listbox1.additem strHref
listbox2.additem strDesc
End sub
0
 
LVL 5

Expert Comment

by:Rhaedes
ID: 8079138
Vbbuff: You are NOT correct when you say 'All of them here have overlooked some points'! The solution using the DOM by definition works with all good HTML documents, whether or not they contain tags and elements in uppercase, lower case, with extra whitespace, etc, and an endless number of possibilities that your code does not contemplate (tabs, newline characters, nobreak spaces etc. etc.). Also note (no disrespect) that your code contains syntax errors (you appear not to have closed all brackets properly, for example).
Mcdev: Use the code which works best for you or with which you are most comfortable: since your strings appear to be simple, a solution with 'Instr' or similar will work just fine. But in all honesty, the Document Object Model exists precisely so that you can parse HTML simply and robustly with a few lines of code.

Kindest regards,
Rhaedes
0
 
LVL 3

Expert Comment

by:vbbuff
ID: 8081532
dear rhaedes,
I was not referring to you; I stand corrected. I was just pointing out (with no disrespect either) some points that were overlooked but are important. After all, this site is all about providing and gaining knowledge, ain't it?
0
 
LVL 5

Expert Comment

by:Rhaedes
ID: 8081839
Absolutely. And of course you are 100% correct in pointing out the failings of the other methods.
Respect and regards,
Rhaedes
0
 

Expert Comment

by:CleanupPing
ID: 8900636
mcdev:
This old question needs to be finalized -- accept an answer, split points, or get a refund.  For information on your options, please click here-> http:/help/closing.jsp#1 
Experts: Post your closing recommendations!  Who deserves points here?
0
 
LVL 49

Expert Comment

by:DanRollins
ID: 9073962
mcdev, an EE Moderator will handle this for you.
Moderator, my recommended disposition is:

    Refund points and save as a 0-pt PAQ.

DanRollins -- EE database cleanup volunteer
0
 

Accepted Solution

by:
YensidMod earned 0 total points
ID: 9165895
Question is PAQ'd and no points refunded.

YensidMod
Community Support Moderator @Experts Exchange
0
