Solved

PARER QUESTION! QUICK EASY POINTS!

Posted on 1998-03-26
5
135 Views
Last Modified: 2010-04-06
I need an example of a parser to parse information pulled off the web.  Can someone please give a good example of that? Such as image, links, pictures, and mailto etc...  I need asap if possible and will possibly increase points for a good example.
0
Comment
Question by:aj85
  • 3
  • 2
5 Comments
 
LVL 4

Expert Comment

by:d003303
ID: 1360879
You get an extendable, good working parser code with sample app for 200 points.
0
 

Author Comment

by:aj85
ID: 1360880
Thanks, I have increased points to 200.

Tony
0
 
LVL 4

Accepted Solution

by:
d003303 earned 200 total points
ID: 1360881
OK, here we go.
This code is a generic parser that searches for a HTML tag. A HTML tag is mainly defined that the tag keyword is embedded in '<' and '>' characters. This is how the function GetBlockFromBuffer works.
The GetTagOptionValues procedure searches a PChar buffer for a tag with the specified tag option and adds all found values to the string list.
The demo app uses a memo where you can paste some HTML source in. In the two edit fields, put in e.g.
Tag To Find : a
Extract From Tag : href
(be sure to get NO spaces in). Press Go! and the list box will be filled with all hrefs in the HTML source. You could also use
Tag To Find : img
Extract From Tag : src
or anything you like. Now here's the source.

////////////////////////
// to be _tagfind.pas

unit _tagfind;

interface

uses
  Windows, Messages, SysUtils, Classes, Graphics, Controls, Forms, Dialogs,
  StdCtrls;

type
  TForm1 = class(TForm)
    Memo1: TMemo;
    ListBox1: TListBox;
    Button1: TButton;
    Edit1: TEdit;
    Label1: TLabel;
    Label2: TLabel;
    Edit2: TEdit;
    procedure Button1Click(Sender: TObject);
  private
    { Private declarations }
  public
    { Public declarations }
  end;

var
  Form1: TForm1;

implementation

{$R *.DFM}

function GetBlockFromBuffer(SearchBuffer, StartTag, KeyTag, EndTag : PChar; var Buffer : string): PChar;
var BufPtr,
    TagStartPtr,
    TagKeyPtr,
    TagEndPtr    : PChar;
    SafeChar     : Char;
begin
  Buffer := '';
  SafeChar := #0;
  TagEndPtr := nil;
  Result := nil;
  BufPtr := SearchBuffer;
  repeat
  // search for start tag
  TagStartPtr := StrPos(BufPtr, StartTag);
  if (TagStartPtr <> nil) and (TagStartPtr[1] <> #0) then
   begin
     // search for end of tag
     TagEndPtr := StrPos(@TagStartPtr[1], EndTag);
     if TagEndPtr <> nil then
      begin
        // tag did close before EOF
        if TagEndPtr[1] <> #0 then
         begin
           // terminate string temporarily
           SafeChar := TagEndPtr[1];
           TagEndPtr[1] := #0;
           BufPtr := @TagEndPtr[1];
         end
        else BufPtr := nil;
        // search for key tag
        TagKeyPtr := StrPos(TagStartPtr, KeyTag);
        if TagKeyPtr <> nil then
         begin
           Result := BufPtr;
           Buffer := StrPas(TagStartPtr);
         end;
        // undo temporarily terminated string
        if BufPtr <> nil
         then TagEndPtr[1] := SafeChar;
      end;
   end;
  until (Buffer <> '') or (TagStartPtr = nil) or (TagEndPtr = nil);
end;

procedure GetTagOptionValues(Buffer : PChar; Tag, Option : string; Strings : TStrings);
var Index       : Integer;
    BufPtr      : PChar;
    CR,
    LF,
    FoundTag,
    FoundOption : string;
begin
  CR := Chr($0A);
  LF := Chr($0D);
  // lowercase
  StrLower(Buffer);
  // kick out CR/LF
  for Index := 0 to StrLen(Buffer) do
   if (Buffer[Index] = CR) or (Buffer[Index] = LF)
    then Buffer[Index] := ' ';
  // truncate multiple spaces, this takes most of the processing time
  // this could also be done before the function is called if you search
  // for several tags, like A HREF, IMG SRC, etc.
  Index := 0;
  while Index < StrLen(Buffer) do
   if (Buffer[Index] = #32) and (Buffer[Index + 1] = #32)
    then StrCopy(@Buffer[Index], @Buffer[Index + 1])
    else Inc(Index);

  BufPtr := Buffer;
  repeat
    // first step : get matching tag
    // added a trailing space because we search for options in a tag
    BufPtr := GetBlockFromBuffer(BufPtr, '<', PChar(Tag + ' '), '>', FoundTag);
    if FoundTag <> '' then
     // second step : validate tag
     if (FoundTag[Pos(Tag, FoundTag) -1] = ' ')
      or (FoundTag[Pos(Tag, FoundTag) -1] = '<') then
      begin
       // tag found, options have to be space limited and end with space or >
       GetBlockFromBuffer(PChar(FoundTag), ' ', PChar(Option), ' ', FoundOption);
       if FoundOption = ''
        then GetBlockFromBuffer(PChar(FoundTag), ' ', PChar(Option), '>', FoundOption);
       if FoundOption <> '' then
        begin
          // trim start and end tag fields
          Index := Pos(Option, FoundOption);
          FoundOption := Copy(FoundOption, Index - 1, Length(FoundOption) - Index + 1);
          Strings.Add(FoundOption);
        end;
     end;
  until BufPtr = nil;
end;

procedure TForm1.Button1Click(Sender: TObject);
var TextBuf   : PChar;
    MemHandle : HGlobal;
begin
  Enabled := false;
  Screen.Cursor := crHourglass;
  Memo1.SelectAll;
  MemHandle := GlobalAlloc(GMEM_MOVEABLE, Memo1.SelLength + 1);
  TextBuf := GlobalLock(MemHandle);
  try
    Memo1.GetSelTextBuf(TextBuf, Memo1.SelLength);
    // terminate string
    TextBuf[Memo1.SelLength] := #0;
    GetTagOptionValues(TextBuf, Edit1.Text, Edit2.Text, ListBox1.Items);
  finally
    GlobalUnLock(MemHandle);
    GlobalFree(MemHandle);
    Screen.Cursor := crDefault;
    Enabled := true;
  end;
end;

end.

////////////////////////
// to be _tagfind.dfm

object Form1: TForm1
  Left = 200
  Top = 108
  Width = 820
  Height = 478
  Caption = 'Form1'
  Font.Charset = DEFAULT_CHARSET
  Font.Color = clWindowText
  Font.Height = -11
  Font.Name = 'MS Sans Serif'
  Font.Style = []
  PixelsPerInch = 96
  TextHeight = 13
  object Label1: TLabel
    Left = 296
    Top = 360
    Width = 54
    Height = 13
    Caption = 'Tag to find:'
  end
  object Label2: TLabel
    Left = 296
    Top = 384
    Width = 77
    Height = 13
    Caption = 'Extract from tag:'
  end
  object Memo1: TMemo
    Left = 8
    Top = 12
    Width = 277
    Height = 425
    TabOrder = 0
  end
  object ListBox1: TListBox
    Left = 296
    Top = 12
    Width = 509
    Height = 329
    ItemHeight = 13
    TabOrder = 1
  end
  object Button1: TButton
    Left = 296
    Top = 412
    Width = 75
    Height = 25
    Caption = 'Go!'
    TabOrder = 2
    OnClick = Button1Click
  end
  object Edit1: TEdit
    Left = 380
    Top = 356
    Width = 121
    Height = 21
    TabOrder = 3
  end
  object Edit2: TEdit
    Left = 380
    Top = 380
    Width = 121
    Height = 21
    TabOrder = 4
  end
end

////////////////////////
// to be tagfind.dpr

program tagfind;

uses
  Forms,
  _tagfind in '_tagfind.pas' {Form1};

{$R *.RES}

begin
  Application.Initialize;
  Application.CreateForm(TForm1, Form1);
  Application.Run;
end.

////////////////////////

Have fun,
Slash/d003303
0
 

Author Comment

by:aj85
ID: 1360882
Thanks very good answer...
0
 
LVL 4

Expert Comment

by:d003303
ID: 1360883
Thx, have fun with it !
0

Featured Post

How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

Join & Write a Comment

Suggested Solutions

This article explains how to create forms/units independent of other forms/units object names in a delphi project. Have you ever created a form for user input in a Delphi project and then had the need to have that same form in a other Delphi proj…
In my programming career I have only very rarely run into situations where operator overloading would be of any use in my work.  Normally those situations involved math with either overly large numbers (hundreds of thousands of digits or accuracy re…
Get a first impression of how PRTG looks and learn how it works.   This video is a short introduction to PRTG, as an initial overview or as a quick start for new PRTG users.
Polish reports in Access so they look terrific. Take yourself to another level. Equations, Back Color, Alternate Back Color. Write easy VBA Code. Tighten space to use less pages. Launch report from a menu, considering criteria only when it is filled…

746 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

16 Experts available now in Live!

Get 1:1 Help Now