Solved

parse html

Posted on 2008-11-01
Last Modified: 2010-04-21
I need a function to parse an HTML string (see the attached code snippet).
I need the keywords parsed out.

function ParseOutKeyWords(HtmlStr: String): String;
begin
  Result := ''; // TODO: return the parsed keywords
end;

the result should be a Comma separated delimited string

"A.B.E.L.", "MnGCA","Minnesota"


Note: The attached HTML code snippet is only an example. The HTML string may contain anywhere from 0 to N keywords.

thanks
<a href="gallery.php?gallery_filter=keyword&keyword=A.B.E.L.">A.B.E.L.</a><a href="gallery.php?gallery_filter=keyword&keyword=MnGCA">MnGCA</a><a href="gallery.php?gallery_filter=keyword&keyword=Minnesota">Minnesota</a>

Question by:geocoins-software
15 Comments
 
LVL 45

Expert Comment

by:aikimark
There are two instances of each of these three keywords: one in the href and one in the link text. Which one do you need?
 

Author Comment

by:geocoins-software
<a name="test"> test</a>

I don't think it matters, but for the sake of an answer, let's use the actual value that sits just before the anchor's closing tag:

test </a>

thanks
 
LVL 45

Expert Comment

by:aikimark
it matters quite a bit
* the code or regex pattern will be different
* the rules for the href text differ from those for the displayed text: an href should not include any spaces, whereas the displayed text may
* the two strings can differ from each other
* the two strings serve different purposes: one is a key that the PHP code uses to retrieve data, the other is a description for a human
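
To make the difference concrete, here is a minimal sketch of the href-based variant. It is not code from this thread: HrefKeywordsToCsv is a hypothetical name, and the sketch assumes every link carries its key as a literal '&keyword=' parameter that runs up to the closing double quote of the href, as in the sample string. A source that escapes the ampersand as '&amp;' or URL-encodes the value would need extra handling.

```pascal
uses
  SysUtils, StrUtils;

// Hypothetical helper (not from the thread): collects the value of each
// '&keyword=' parameter as a quoted, comma-separated list.
function HrefKeywordsToCsv(const Html: string): string;
const
  Marker = '&keyword='; // assumption: the parameter is written exactly like this
var
  StartPos, EndPos: Integer;
  Item: string;
begin
  Result := '';
  StartPos := PosEx(Marker, Html, 1);
  while StartPos > 0 do
  begin
    Inc(StartPos, Length(Marker));        // first character of the value
    EndPos := PosEx('"', Html, StartPos); // the value ends at the closing quote
    if EndPos = 0 then
      Break;                              // unterminated attribute: stop rather than guess
    Item := Copy(Html, StartPos, EndPos - StartPos);
    if Item <> '' then
    begin
      if Result <> '' then
        Result := Result + ', ';
      Result := Result + '"' + Item + '"';
    end;
    StartPos := PosEx(Marker, Html, EndPos);
  end;
end;
```

For the sample string in the question this yields "A.B.E.L.", "MnGCA", "Minnesota", the same values as the link text, but only because the two happen to match in that sample.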
 
LVL 45

Expert Comment

by:aikimark
I forgot to ask... is this parsing needed to take place within an entire html document or would be passed strings with links of interest?
 
LVL 45

Expert Comment

by:aikimark
public and third party solutions:
http://htmlp.sourceforge.net/
http://www.torry.net/vcl/internet/html/jshtmpsr.zip
http://www.torry.net/vcl/internet/html/nzhtmlparser.zip
http://www.yunqa.de/delphi/doku.php/products/htmlparser/index?DokuWiki=i0p01nc77vr7pkmd8jfbfufdr0

http://positivesale.com/freePascal/HtmlPars/FastHtmlParse1.0.zip
or http://z505.com/download/pascal/html/fast-html-parser.zip

http://wikitaxi.org/delphi/doku.php/products/htmlparser/index
http://www.torry.net/authorsmore.php?id=4072
https://secure.element5.com/shareit/programs.html?productid=147143

___________________
Roll your own solutions:
http://www.delphipages.com/tips/copyview.cfm?ID=123=

http://www.experts-exchange.com/Programming/Languages/Pascal/Delphi/Q_20892572.html
1. look for next '</a>'
2. look for prior '>'
3. the text is between these two positions
* repeat steps 1-3 until you no longer find any '</a>'

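Alternatively, you can use the parsing capabilities of the TStringList class. Split the string on '</a>' (TStringList.Delimiter holds a single Char, so first replace each '</a>' with one delimiter character or a line break); each item in the resulting TStringList will then end with the text you seek. You only need to find the last '>' in each of these strings.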
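
Steps 1-3 above can be sketched roughly as follows. This is an illustration rather than tested production code: AnchorTextsToCsv is a hypothetical name, the match on '</a>' is case-sensitive, and nested tags inside a link are not handled.

```pascal
uses
  SysUtils, StrUtils;

// Hypothetical helper (not from the thread): returns the text that sits just
// before each '</a>' as a quoted, comma-separated list.
function AnchorTextsToCsv(const Html: string): string;
const
  CloseTag = '</a>'; // assumption: lower-case closing tags only
var
  ClosePos, GtPos, SearchFrom: Integer;
  Item: string;
begin
  Result := '';
  SearchFrom := 1;
  // Step 1: look for the next '</a>'
  ClosePos := PosEx(CloseTag, Html, SearchFrom);
  while ClosePos > 0 do
  begin
    // Step 2: walk backwards to the prior '>'
    GtPos := ClosePos - 1;
    while (GtPos >= SearchFrom) and (Html[GtPos] <> '>') do
      Dec(GtPos);
    // Step 3: the text lies between the two positions
    if GtPos >= SearchFrom then
    begin
      Item := Copy(Html, GtPos + 1, ClosePos - GtPos - 1);
      if Item <> '' then
      begin
        if Result <> '' then
          Result := Result + ', ';
        Result := Result + '"' + Item + '"';
      end;
    end;
    // Repeat from just past the tag until no '</a>' is left
    SearchFrom := ClosePos + Length(CloseTag);
    ClosePos := PosEx(CloseTag, Html, SearchFrom);
  end;
end;
```

Because the items are joined as they are found, an input with no links simply returns an empty string and there is no trailing separator to trim.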
 

Author Comment

by:geocoins-software
"it matters quite a bit"

Not to me it doesn't. My goal is to get the values. You asked which ones, and I told you it didn't matter, as long as you delivered the values.
 

Author Comment

by:geocoins-software
"I forgot to ask... is this parsing needed to take place within an entire html document or would be passed strings with links of interest?"

Just a string - as my question described
 
LVL 13

Accepted Solution

by:ThievingSix (earned 500 total points)
Uhh, why not something that doesn't use a regex?



function ExtractTextFromAnchor(Input: String): String;
var
  Position : Integer;
  Buffer : PChar;
  Buffer2 : PChar;
  Output : PChar;
  Size : DWORD;
begin
  Result := '';
  Buffer := PChar(Input);
  Buffer := StrPos(Buffer,'>');
  While Buffer <> nil Do
    begin
    Buffer2 := StrPos(Buffer,'</a>');
    If Buffer2 = nil Then Exit;
    Inc(Buffer);
    Dec(Buffer2);
    Size := DWORD(Buffer2 - Buffer);
    If Size > 0 Then
      begin
      Inc(Size);
      GetMem(Output,Size);
      Try
        CopyMemory(Output,Buffer,Size);
        Output[Size] := #0;
        Result := Result + '"' + Output + '", ';
      Finally
        FreeMem(Output,Size);
      end;
    end;
    Inc(Buffer2,5);
    Buffer := StrPos(Buffer2,'>');
  end;
  Result := Copy(Result,1,Length(Result) - 2);
end;

 
LVL 45

Expert Comment

by:aikimark
@geocoins-software

If you use TStringList to parse the input string, the last item in the parsed result is NOT guaranteed to end with the text you seek, depending on whether the input string ends with '</a>' or some other text.
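
A rough sketch of that caveat, under stated assumptions: AnchorTextsViaStringList is a hypothetical name, the input is assumed to contain no line breaks of its own, and '</a>' is matched case-sensitively. Each '</a>' is turned into a line break so that the Text property can do the splitting, and the last item is dropped unless the input really ends with '</a>'.

```pascal
uses
  SysUtils, StrUtils, Classes;

// Hypothetical helper (not from the thread). Assumes Html contains no line
// breaks of its own, since line breaks are used as the split marker here.
function AnchorTextsViaStringList(const Html: string): string;
const
  CloseTag = '</a>';
var
  Parts: TStringList;
  I, Count: Integer;
  Item: string;
begin
  Result := '';
  Parts := TStringList.Create;
  try
    // Turn each '</a>' into a line break and let Text split the string.
    Parts.Text := StringReplace(Html, CloseTag, sLineBreak, [rfReplaceAll]);
    Count := Parts.Count;
    // Unless the input ends with '</a>', the last item is trailing text
    // rather than link text, so leave it out.
    if (Count > 0) and not AnsiEndsStr(CloseTag, Html) then
      Dec(Count);
    for I := 0 to Count - 1 do
    begin
      // Keep only what follows the last '>' in the item.
      Item := Copy(Parts[I], LastDelimiter('>', Parts[I]) + 1, MaxInt);
      if Item <> '' then
      begin
        if Result <> '' then
          Result := Result + ', ';
        Result := Result + '"' + Item + '"';
      end;
    end;
  finally
    Parts.Free;
  end;
end;
```

With trailing text after the last link, for example '...>Y</a>tail', the final item 'tail' is skipped instead of being reported as a keyword.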

================
@ThievingSix

How are you going to pull multiple text values when your function only returns a string data type?
 
LVL 13

Expert Comment

by:ThievingSix
"the result should be a Comma separated delimited string"
 
LVL 45

Expert Comment

by:aikimark
sorry.  you're right.  missed that spec.
 

Author Closing Comment

by:geocoins-software
VERY NICE!  Thank You!
 

Author Comment

by:geocoins-software
ThievingSix:

Your code is throwing an "Invalid pointer operation" error.

When I step through the code, it blows up on this line:

        FreeMem(Output,Size);

thanks

Note: It doesn't seem to happen every time, though. I'm still trying to narrow down when it happens.
 
LVL 13

Expert Comment

by:ThievingSix
I couldn't reproduce the error but this should fix it.
function ExtractTextFromAnchor(Input: String): String;
var
  Position : Integer;
  Buffer : PChar;
  Buffer2 : PChar;
  Output : PChar;
  Size : DWORD;
begin
  Result := '';
  Buffer := PChar(Input);
  Buffer := StrPos(Buffer,'>');
  GetMem(Output,255);
  If Output = nil Then Exit;
  Try
    While Buffer <> nil Do
      begin
      Buffer2 := StrPos(Buffer,'</a>');
      If Buffer2 = nil Then Exit;
      Inc(Buffer);
      Dec(Buffer2);
      Size := DWORD(Buffer2 - Buffer);
      If Size > 0 Then
        begin
        Inc(Size);
        CopyMemory(Output,Buffer,Size);
        Output[Size] := #0;
        Result := Result + '"' + Output + '", ';
      end;
      Inc(Buffer2,5);
      Buffer := StrPos(Buffer2,'>');
    end;
  Finally
    FreeMem(Output);
  end;
  Result := Copy(Result,1,Length(Result) - 2);
end;
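
A likely cause of the intermittent "Invalid pointer operation" in the first version: it allocates Size bytes and then writes the terminator to Output[Size], one byte past the end of the block, which can corrupt the heap without failing every time. The fixed 255-byte buffer avoids that for short link text but would overflow on text of 255 characters or more. The sketch below sidesteps manual buffers by letting SetString copy the characters. ExtractTextFromAnchorSafe is a hypothetical name, and the behaviour differs in two small ways: one-character link texts are kept, and no trailing separator is left behind when the loop stops early.

```pascal
uses
  SysUtils;

// Hypothetical variant (not from the thread) of ExtractTextFromAnchor that
// relies on SetString instead of GetMem/CopyMemory/FreeMem.
function ExtractTextFromAnchorSafe(const Input: string): string;
var
  Start, Stop: PChar;
  Item: string;
begin
  Result := '';
  Start := StrPos(PChar(Input), '>');
  while Start <> nil do
  begin
    Stop := StrPos(Start, '</a>');
    if Stop = nil then
      Break;                  // no closing tag left: keep what was collected
    Inc(Start);               // first character after '>'
    // SetString copies (Stop - Start) characters into a managed string,
    // so there is no buffer to size, terminate, or free by hand.
    SetString(Item, Start, Stop - Start);
    if Item <> '' then
    begin
      if Result <> '' then
        Result := Result + ', ';
      Result := Result + '"' + Item + '"';
    end;
    Start := StrPos(Stop + 4, '>'); // continue after '</a>'
  end;
end;
```

Like the original, this takes everything between the first '>' after the previous link and the next '</a>', so nested markup inside a link would be returned as part of the text.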

 

Author Comment

by:geocoins-software
I think that worked - thanks!