Solved

parse html

Posted on 2008-11-01
15
264 Views
Last Modified: 2010-04-21
I need a function to parse an HTML string (see the attached code snippet).
I need the keywords parsed out.

function ParseOutKeyWords(HtmlStr: String): String;
begin
 Result := ''; // comma-separated keywords go here
end;

The result should be a comma-delimited string:

"A.B.E.L.", "MnGCA", "Minnesota"


Note: The attached HTML code snippet is only an example. There may be zero to N keywords in the HTML string.

thanks
<a href="gallery.php?gallery_filter=keyword&keyword=A.B.E.L.">A.B.E.L.</a><a href="gallery.php?gallery_filter=keyword&keyword=MnGCA">MnGCA</a><a href="gallery.php?gallery_filter=keyword&keyword=Minnesota">Minnesota</a>

Question by:geocoins-software
15 Comments
 
LVL 45

Expert Comment

by:aikimark
ID: 22856995
There are two instances of each of these three keywords (one in the href and one as the link text). Which one do you need?
 

Author Comment

by:geocoins-software
ID: 22857779
<a name="test"> test</a>

I don't think it matters, but for the sake of an answer, let's use the actual value that appears just before the closing anchor tag:

test </a>


thanks
 
LVL 45

Expert Comment

by:aikimark
ID: 22857791
It matters quite a bit:
* The code or regex pattern will be different.
* The rules for the href text are different from those for the displayed text. The href should not include any spaces, whereas the displayed text may include space characters.
* These two text strings can be different.
* These two text strings serve different purposes. One is a key the PHP code uses to retrieve data; the other is a description for a human.
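To make the difference concrete, here is a minimal Delphi sketch that reads the keyword from the href query string instead of the link text. It assumes double-quoted href values and a parameter literally named keyword, as in the sample; KeywordsFromHref is a hypothetical name, and no URL-decoding is done.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  SysUtils, StrUtils; // PosEx is declared in StrUtils

// Hypothetical helper: returns the value of each "keyword=" query
// parameter, e.g. A.B.E.L. from
// href="gallery.php?gallery_filter=keyword&keyword=A.B.E.L."
// Assumes double-quoted href values; does no URL-decoding.
function KeywordsFromHref(const HtmlStr: String): String;
const
  Key = 'keyword=';
var
  P, ValueStart, ValueEnd: Integer;
begin
  Result := '';
  P := PosEx(Key, HtmlStr, 1);
  while P > 0 do
  begin
    ValueStart := P + Length(Key);
    // The value runs until the next '&' or the closing quote of the href
    ValueEnd := ValueStart;
    while (ValueEnd <= Length(HtmlStr)) and (HtmlStr[ValueEnd] <> '&')
      and (HtmlStr[ValueEnd] <> '"') do
      Inc(ValueEnd);
    if Result <> '' then
      Result := Result + ', ';
    Result := Result + '"' + Copy(HtmlStr, ValueStart, ValueEnd - ValueStart) + '"';
    P := PosEx(Key, HtmlStr, ValueEnd);
  end;
end;
```

For <a href="g.php?keyword=Foo&page=2">Bar</a> this returns "Foo", while the link text is Bar, so the two choices are not interchangeable in general.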

 
LVL 45

Expert Comment

by:aikimark
ID: 22857824
I forgot to ask: does this parsing need to work on an entire HTML document, or will you be passed strings containing only the links of interest?
 
LVL 45

Expert Comment

by:aikimark
ID: 22857937
Public and third-party solutions:
http://htmlp.sourceforge.net/
http://www.torry.net/vcl/internet/html/jshtmpsr.zip
http://www.torry.net/vcl/internet/html/nzhtmlparser.zip
http://www.yunqa.de/delphi/doku.php/products/htmlparser/index?DokuWiki=i0p01nc77vr7pkmd8jfbfufdr0

http://positivesale.com/freePascal/HtmlPars/FastHtmlParse1.0.zip
or http://z505.com/download/pascal/html/fast-html-parser.zip

http://wikitaxi.org/delphi/doku.php/products/htmlparser/index
http://www.torry.net/authorsmore.php?id=4072
https://secure.element5.com/shareit/programs.html?productid=147143

___________________
Roll your own solutions:
http://www.delphipages.com/tips/copyview.cfm?ID=123=

http://www.experts-exchange.com/Programming/Languages/Pascal/Delphi/Q_20892572.html
1. Look for the next '</a>'.
2. Look for the prior '>'.
3. The text is between these two positions.
* Repeat steps 1-3 until you no longer find any '</a>'.
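The three steps above can be sketched in Delphi roughly as follows. This is an illustration, not code posted in the thread: it reuses the ParseOutKeyWords signature from the question, assumes lowercase '</a>' tags, and uses PosEx from StrUtils to search forward.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  SysUtils, StrUtils; // PosEx is declared in StrUtils

// Sketch of the "next '</a>', then prior '>'" loop.
// Assumes lowercase '</a>' closing tags.
function ParseOutKeyWords(const HtmlStr: String): String;
var
  CloseTag, OpenEnd, SearchFrom: Integer;
begin
  Result := '';
  SearchFrom := 1;
  // Step 1: find the next '</a>'
  CloseTag := PosEx('</a>', HtmlStr, SearchFrom);
  while CloseTag > 0 do
  begin
    // Step 2: walk back to the '>' that ends the opening tag,
    // without going past the previous '</a>'
    OpenEnd := CloseTag - 1;
    while (OpenEnd >= SearchFrom) and (HtmlStr[OpenEnd] <> '>') do
      Dec(OpenEnd);
    // Step 3: the keyword lies between that '>' and the '</a>'
    if OpenEnd >= SearchFrom then
    begin
      if Result <> '' then
        Result := Result + ', ';
      Result := Result + '"' + Copy(HtmlStr, OpenEnd + 1, CloseTag - OpenEnd - 1) + '"';
    end;
    // Repeat from just past this '</a>'
    SearchFrom := CloseTag + Length('</a>');
    CloseTag := PosEx('</a>', HtmlStr, SearchFrom);
  end;
end;
```

On the sample from the question this yields "A.B.E.L.", "MnGCA", "Minnesota"; an input with no '</a>' yields an empty string, which covers the zero-keyword case.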

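You can use the parsing capabilities of the TStringList class. Its Delimiter property is a single character, so it cannot be set to '</a>'; instead, replace each '</a>' with a line break and assign the result to the Text property. Each item in the resulting TStringList will then end with the text you seek, and you only need to find the last '>' in each of these strings.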
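A minimal sketch of the TStringList idea, assuming the input contains no line breaks of its own. Because TStringList.Delimiter is a single Char, this version replaces each '</a>' with sLineBreak and assigns the result to Text. KeywordsViaStringList is a hypothetical name. It also drops the last item when the input does not end with '</a>', because that item is trailing text rather than a keyword.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  SysUtils, Classes;

// Sketch: split on '</a>' via line breaks, then keep the text after
// the last '>' of each piece. Assumes the input has no line breaks.
function KeywordsViaStringList(const HtmlStr: String): String;
var
  Parts: TStringList;
  I, Count, P: Integer;
  Item: String;
begin
  Result := '';
  Parts := TStringList.Create;
  try
    Parts.Text := StringReplace(HtmlStr, '</a>', sLineBreak, [rfReplaceAll]);
    Count := Parts.Count;
    // If the input does not end with '</a>', the last item is trailing
    // text, not a keyword, so leave it out
    if (Count > 0) and (Copy(HtmlStr, Length(HtmlStr) - 3, 4) <> '</a>') then
      Dec(Count);
    for I := 0 to Count - 1 do
    begin
      Item := Parts[I];
      P := LastDelimiter('>', Item); // position of the last '>', 0 if none
      if P > 0 then
      begin
        if Result <> '' then
          Result := Result + ', ';
        Result := Result + '"' + Copy(Item, P + 1, MaxInt) + '"';
      end;
    end;
  finally
    Parts.Free;
  end;
end;
```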
 

Author Comment

by:geocoins-software
ID: 22857986
"it matters quite a bit"

Not to me it doesn't. My goal is to get the values. You asked which ones, and I told you it didn't matter as long as you delivered the values.

 

Author Comment

by:geocoins-software
ID: 22857990
"I forgot to ask... is this parsing needed to take place within an entire html document or would be passed
strings with links of interest?"

Just a string, as my question described.
 
LVL 13

Accepted Solution

by:ThievingSix (earned 500 total points)
ID: 22863486
Uhh, why not something that doesn't use a regex?



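function ExtractTextFromAnchor(Input: String): String;
var
  Buffer : PChar;   // scans the input; points at each '>'
  Buffer2 : PChar;  // points at the next '</a>'
  Output : PChar;   // temporary #0-terminated copy of one keyword
  Size : DWORD;
begin
  Result := '';
  Buffer := PChar(Input);
  Buffer := StrPos(Buffer,'>');               // end of the first opening tag
  While Buffer <> nil Do
    begin
    Buffer2 := StrPos(Buffer,'</a>');
    If Buffer2 = nil Then Exit;               // note: skips the ', ' trim at the end
    Inc(Buffer);                              // first character of the link text
    Dec(Buffer2);                             // last character of the link text
    Size := DWORD(Buffer2 - Buffer);          // text length - 1 (wraps if text is empty)
    If Size > 0 Then                          // note: one-character text is skipped
      begin
      Inc(Size);                              // Size = text length
      GetMem(Output,Size);                    // Size bytes: no room for the #0
      Try
        CopyMemory(Output,Buffer,Size);       // counts bytes, not characters
        Output[Size] := #0;                   // writes just past the allocated block
        Result := Result + '"' + Output + '", ';
      Finally
        FreeMem(Output,Size);
      end;
    end;
    Inc(Buffer2,5);                           // skip past '</a>'
    Buffer := StrPos(Buffer2,'>');            // end of the next opening tag
  end;
  Result := Copy(Result,1,Length(Result) - 2); // drop the trailing ', '
end;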

 
LVL 45

Expert Comment

by:aikimark
ID: 22863585
@geocoins-software

If you use TStringList to parse the input string, the last item in the parsed result is NOT guaranteed to end with the text you seek, depending on whether the input string ends with '</a>' or some other text.

================
@ThievingSix

How are you going to pull multiple text values when your function only returns a string data type?
 
LVL 13

Expert Comment

by:ThievingSix
ID: 22863653
"the result should be a Comma separated delimited string"
 
LVL 45

Expert Comment

by:aikimark
ID: 22863688
Sorry, you're right. I missed that spec.
 

Author Closing Comment

by:geocoins-software
ID: 31512317
VERY NICE!  Thank You!
 

Author Comment

by:geocoins-software
ID: 22869263
ThievingSix:

Your code is throwing an "Invalid pointer operation" error.

When I step through the code, it blows up on this line:

        FreeMem(Output,Size);


thanks


Note: it doesn't seem to happen every time, though. I'm still trying to narrow down when it happens.
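A likely cause, based on the first version of the code above: GetMem(Output,Size) allocates Size bytes, but Output[Size] := #0 then writes at offset Size, just past the end of the block. Writing past an allocation can corrupt the memory manager's bookkeeping, so the failure may only surface later (here in FreeMem) and only for some inputs. Below is a minimal sketch of the same copy with room for the terminator, using a hypothetical helper named CopyChars. Like CopyMemory, Move counts bytes, so the length is multiplied by SizeOf(Char), which matters in Delphi 2009 and later where Char is two bytes.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  SysUtils;

// Hypothetical helper: copies Len characters starting at Source into a
// new string through a manually allocated, #0-terminated buffer.
function CopyChars(Source: PChar; Len: Integer): String;
var
  Output: PChar;
begin
  // Len + 1 characters, so that Output[Len] is still inside the block
  GetMem(Output, (Len + 1) * SizeOf(Char));
  try
    Move(Source^, Output^, Len * SizeOf(Char)); // Move counts bytes
    Output[Len] := #0;                           // last slot of the block
    Result := Output;                            // copies up to the #0
  finally
    FreeMem(Output);
  end;
end;
```

In practice, SetString(S, Source, Len) does the same copy without any manual buffer.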

 
LVL 13

Expert Comment

by:ThievingSix
ID: 22869395
I couldn't reproduce the error, but this should fix it.
function ExtractTextFromAnchor(const Input: String): String;
var
  Buffer : PChar;   // points at the '>' that ends an opening tag
  Buffer2 : PChar;  // points at the matching '</a>'
  Text : String;    // one keyword
begin
  // StrPos is declared in SysUtils
  Result := '';
  Buffer := StrPos(PChar(Input),'>');
  While Buffer <> nil Do
    begin
    Buffer2 := StrPos(Buffer,'</a>');
    If Buffer2 = nil Then Break;              // no closing tag left; still trim below
    Inc(Buffer);                              // first character of the link text
    // SetString copies exactly (Buffer2 - Buffer) characters into a new
    // string: no manual buffer, no #0 written past its end, no fixed
    // 255-character limit and no byte/character mix-up
    SetString(Text,Buffer,Buffer2 - Buffer);
    If Text <> '' Then
      Result := Result + '"' + Text + '", ';
    Buffer := StrPos(Buffer2 + 4,'>');        // continue after '</a>'
  end;
  Result := Copy(Result,1,Length(Result) - 2); // drop the trailing ', '
end;

 

Author Comment

by:geocoins-software
ID: 22869608
I think that worked - thanks!


Question has a verified solution.

