Solved

Parsing images from HTML

Posted on 2004-04-29
Last Modified: 2010-04-05
Hi,

I want to parse images, together with the links attached to those images, from an HTML file. Most components I have found parse only images or only links, but I need the link that belongs to each image!

Could anybody help me?

Question by:k4hvd77
23 Comments
 
LVL 7

Expert Comment

by:sftweng
 
LVL 17

Expert Comment

by:mokule
I suggest using regular expressions.

For example, you can download TRegExpr:
http://regexpstudio.com/TRegExpr/TRegExpr.html

It's quite powerful and easy to use.
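
To make the regular-expression suggestion concrete, here is a minimal sketch, not a definitive implementation. It assumes the TRegExpr unit (RegExpr.pas) is on the search path, that attribute values are double-quoted, and that the <img> tag follows its <a> tag directly, as in a page where the image is the only content of the link. The names ExtractImageLinks and ImageLinksRegex are made up for this illustration.

```pascal
program ImageLinksRegex;

{$IFDEF FPC}{$MODE DELPHI}{$H+}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes,
  RegExpr; // TRegExpr unit; assumed to be installed and on the search path

// Hypothetical helper: appends one "[LinkNN] / image= / Link=" section to
// Output for every <a href="..."> that is directly followed by <img src="...">.
procedure ExtractImageLinks(const Html: string; Output: TStrings);
var
  Re: TRegExpr;
  Count: Integer;
begin
  Count := 0;
  Re := TRegExpr.Create;
  try
    Re.ModifierI := True; // tag and attribute names match case-insensitively
    // Group 1 = href of the anchor, group 2 = src of the image inside it.
    Re.Expression :=
      '<a\s[^>]*href\s*=\s*"([^"]*)"[^>]*>\s*<img\s[^>]*src\s*=\s*"([^"]*)"';
    if Re.Exec(Html) then
      repeat
        Inc(Count);
        Output.Add(Format('[Link%.2d]', [Count]));
        Output.Add('image= ' + Re.Match[2]);
        Output.Add('Link= ' + Re.Match[1]);
      until not Re.ExecNext;
  finally
    Re.Free;
  end;
end;

var
  Lines: TStringList;
begin
  Lines := TStringList.Create;
  try
    ExtractImageLinks(
      '<p><a href="http://www.google.de">' + sLineBreak +
      '<img border="0" src="http://www.google.de/intl/de_de/images/logo.gif"></a></p>',
      Lines);
    Write(Lines.Text);
  finally
    Lines.Free;
  end;
end.
```

To run it on a file instead of a literal, one option is to load the file into a TStringList with LoadFromFile and pass its Text property. A pattern like this will miss single-quoted or unquoted attributes and links that wrap other markup around the image, so treat it as a starting point.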
 
LVL 7

Expert Comment

by:sftweng
Re: "LinkGrabber": The whole project is available in the following zip file: http://members.rogers.com/alan.bu/LinkGrabber.zip
 
LVL 4

Author Comment

by:k4hvd77
sftweng,

I cannot understand how LinkGrabber could help me do that!

What I need is the following:

I have an HTML file:


------------------------------------------------------------------------------------------------------------------------------------------

<html>

<head>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
<title>Neue Seite 1</title>
</head>

<body>

<p><a href="http://www.google.de">
<img border="0" src="http://www.google.de/intl/de_de/images/logo.gif" width="800" height="600"></a></p>

</body>

</html>


------------------------------------------------------------------------------------------------------------------------------------------

Clicking on the image redirects me to Google. Now I want to extract the image URL (http://www.google.de/intl/de_de/images/logo.gif) and the link for this image (http://www.google.de), and get output like this:

[Link01]
image= http://www.google.de/intl/de_de/images/logo.gif
Link= http://www.google.de



 
LVL 7

Expert Comment

by:sftweng
You could modify the LinkGrabber "TestForLink" procedure to pull out all "<img>" tags as well.
 
LVL 4

Author Comment

by:k4hvd77
Could you send me some examples?
 
LVL 7

Expert Comment

by:sftweng
I have to go to a meeting for a couple of hours but I'll try to get back to this later today. Sorry.
 
LVL 4

Author Comment

by:k4hvd77
no problem ;)
 
LVL 6

Expert Comment

by:Amir Azhdari
k4hvd77,
Place a TWebBrowser, a memo and two buttons on the form and try this code.
By the way, first navigate the web browser to the page (e.g. www.yahoo.com, an HTML file, ...).

Regards
Azhdari

unit Unit1;

interface

uses
  Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms,

  Dialogs,activex,comctrls,olectrls, mshtml, StdCtrls, SHDocVw,clipbrd;

type
  TForm1 = class(TForm)
    WebBrowser1: TWebBrowser;
    Button1: TButton;
    Memo1: TMemo;
    Button2: TButton;
    procedure Button2Click(Sender: TObject);
    procedure Button1Click(Sender: TObject);
  private
    { Private declarations }
  public
    { Public declarations }
  end;

var
  Form1: TForm1;

implementation

{$R *.dfm}



procedure TForm1.Button2Click(Sender: TObject);
var
  li, i, j: Integer; // Integer, not Word: "0 to -1" must be an empty loop when the page has no images
  s1, s2: string;
begin
  Memo1.Lines.Clear;
  // Run this only after the page has finished loading in WebBrowser1.
  for li := 0 to WebBrowser1.OleObject.document.images.length - 1 do
  begin
    s1 := WebBrowser1.OleObject.document.images.item(li).src;
    with Memo1.Lines do
    begin
      Add('[LINK' + IntToStr(li) + ']');
      Add('image= ' + s1);
      // Note: "Link" here is only the scheme and host of the image URL
      // (everything up to the third '/'), not the href of an enclosing <a>.
      if (Pos('http', s1) > 0) or (Pos('ftp', s1) > 0) then
      begin
        s2 := '';
        j := 0;
        for i := 1 to Length(s1) do
        begin
          if j = 3 then
            Break;
          s2 := s2 + s1[i];
          if s1[i] = '/' then
            Inc(j);
        end;
        Add('Link= ' + s2);
      end
      else
        Add('Link= Load From Drive');
    end;
  end;
end;

procedure TForm1.Button1Click(Sender: TObject);
begin
webbrowser1.Navigate('www.yahoo.com');
end;

end.


 
LVL 4

Author Comment

by:k4hvd77
AmirAzhdari,

That's not what I'm looking for!


 
LVL 22

Expert Comment

by:mnasman
 
LVL 7

Expert Comment

by:sftweng
Change to "LinkGrabber":
{==============================================================================}
procedure TLGForm.Parse1Click(Sender: TObject);
VAR
  il, ic, lc, lb, rb, linkCount : INTEGER;
  s, t, d : String;
  collecting : BOOLEAN;
  parentNode : TTreeNode;
  currNode : TTreeNode;
  currText, hrefText, srcText : String;
  deltaTime : INTEGER;
  lineCount, lineCountTick : INTEGER;
{------------------------------------------------------------------------------}
procedure TestForLink(ts : String; VAR cnt : INTEGER);
var hrefPos : INTEGER;
begin
  currText := UpperCase(ts);
  IF (Pos('<A',currText) = 1) THEN
  BEGIN
    hrefPos := Pos('HREF="',currText);
    IF (hrefPos > 0) THEN
    BEGIN
      Delete(ts,1,hrefPos+5);
      hrefPos := Pos('"',ts);
      currText := UpperCase(ts);
      IF (hrefPos > 0)
      {AND (Pos('.JPG',currText) = Length(currText)-5)} THEN
      BEGIN
        hrefText := Copy(ts,1,hrefPos-1);
        ListBoxLinks.Items.Add('Link='+hrefText);
        INC(cnt);
      END {IF};
    END {IF};
  END {IF};
end {TestForLink};
{------------------------------------------------------------------------------}
procedure TestForImage(ts : String; VAR cnt : INTEGER);
var srcPos : INTEGER;
begin
  currText := UpperCase(ts);
  IF (Pos('<IMG',currText) = 1) THEN
  BEGIN
    srcPos := Pos('SRC="',currText);
    IF (srcPos > 0) THEN
    BEGIN
      Delete(ts,1,srcPos+4);
      srcPos := Pos('"',ts);
      currText := UpperCase(ts);
      IF (srcPos > 0)
      {AND (Pos('.JPG',currText) = Length(currText)-5)} THEN
      BEGIN
        srcText := Copy(ts,1,srcPos-1);
        ListBoxLinks.Items.Add('Image='+srcText);
        INC(cnt);
      END {IF};
    END {IF};
  END {IF};
end {TestForImage};
{------------------------------------------------------------------------------}
begin {Parse1Click}
  startTime := Now;
  rootNode.Text := EditURL.Text;
  parentNode := rootNode;
  ListBoxLinks.Items.Clear;
  linkCount := 0;
  WITH MemoRawHTML DO BEGIN
    lc := Lines.Count;
    lineCount := lc;
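    lineCountTick := lineCount DIV 20;
    IF lineCountTick = 0 THEN lineCountTick := 1; {avoids "MOD 0" below when the input has fewer than 20 lines}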
    collecting := FALSE;
    t := ''; d := '';
    TreeViewParsed.Visible := FALSE;
    ListBoxLinks.Visible := FALSE;
    FOR il := 0 TO lc-1
    DO BEGIN
      s := Lines[il];
//      StatusBar.SimpleText := s;
      FOR ic := 1 TO Length(s) DO
      BEGIN
        IF s[ic] = '<'
        THEN BEGIN
          collecting := TRUE;
          WITH TreeViewParsed.Items DO
          IF d <> '' THEN BEGIN
            AddChild(parentNode,d);
            TestForLink(d,linkCount);
            TestForImage(d,linkCount);
          END {IF};
          d := '';
        END {IF};
        IF NOT collecting THEN d := d+s[ic];
        IF collecting  THEN t := t+s[ic];
        IF s[ic] = '>'
        THEN BEGIN
          collecting := FALSE;
          WITH TreeViewParsed.Items DO
          BEGIN
            IF Pos('</',t) = 1
            THEN AddChild(parentNode,t)
            ELSE parentNode := Add(rootNode,t);
            TestForLink(t,linkCount);
            TestForImage(t,linkCount);
          END {WITH };
          t := '';
        END {IF};
        IF (linkCount >= StrToInt(EditLinkLimit.Text)) THEN Break;
      END {FOR};
      IF ((il MOD lineCountTick) = 0) THEN
      BEGIN
        ProgressBar1.Position := (il * 100 DIV lineCount);
      END {IF};
    END {FOR};
    ListBoxLinks.Visible := TRUE;
    TreeViewParsed.Visible := TRUE;
  END {WITH };
  endTime := Now;
  deltaTime := SecondsBetween(startTime,endTime);
  StatusBar.SimpleText := Format('Done parse in %d seconds',[deltaTime]);
  MemoDiag.Lines.Add(StatusBar.SimpleText);
//  TreeView1.FullExpand;
end {Parse1Click};
 
LVL 4

Author Comment

by:k4hvd77
sftweng,

I'm using Delphi 7 and cannot compile the project!
1. I have no FastNet (NMHTTP) components.
2. I don't have the JVCL.
 
LVL 7

Expert Comment

by:sftweng
I'll have to check carefully but I don't think you need them. Just write a program that puts your HTML into a string, passes it into the procedure (Parse1Click) and stores or uses the results.

Concentrate on the "TestFor" procedures and just pass them HTML strings from whatever source you choose.
 
LVL 7

Expert Comment

by:sftweng
I don't think "Parse1Click" needs either NMHTTP or JVCL.
 
LVL 4

Author Comment

by:k4hvd77
Sorry, I cannot understand how to get it to work!

Could you send me the project at admin@titaniumserver.de?

Thanks
k4hvd77
 
LVL 7

Accepted Solution

by:
sftweng earned 250 total points
k4hvd77, Ex-Ex rules don't allow me to use email. You should have been able to download the original project from the URL I posted earlier and then cut and paste the replacement code from my earlier posting. I'd like to help you on this, but I'm prevented by Ex-Ex rules from using email correspondence.

But if you don't have the NMHTTP and JVCL components, anyway, you should just take the source code for "Parse1Click", written above, and remove all of the component references, e.g., an edit box, treenode and listbox, and replace them with string equivalents.

The core of the code is the "TestFor" procedures - just feed them HTML lines, acquired from whatever source you like, and feed the results (added via Listbox.Add) back to the client (caller) software.
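
As an illustration of that advice, here is a rough component-free sketch in the spirit of the "TestFor" procedures. It is not the LinkGrabber code itself. It works on a plain string, remembers the href of the <a> tag that is currently open, and writes one section per <img> found inside it. The names QuotedAttr and ExtractImageLinks are invented for this sketch, and it assumes double-quoted attribute values and tags of the form "<a href=...>"; images outside a link are skipped.

```pascal
program ImageLinksScan;

{$IFDEF FPC}{$MODE DELPHI}{$H+}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: returns the value of a double-quoted attribute inside
// Tag, or '' if it is missing. Same Pos/Copy idea as the "TestFor" procedures.
function QuotedAttr(const Tag, AttrName: string): string;
var
  P: Integer;
begin
  Result := '';
  P := Pos(UpperCase(AttrName) + '="', UpperCase(Tag));
  if P = 0 then
    Exit;
  Result := Copy(Tag, P + Length(AttrName) + 2, MaxInt); // text after ="
  P := Pos('"', Result);
  if P > 0 then
    Result := Copy(Result, 1, P - 1)
  else
    Result := ''; // no closing quote: treat as missing
end;

// Hypothetical helper: walks Html tag by tag and adds
// "[LinkNN] / image= / Link=" for each <img> inside an open <a href="...">.
procedure ExtractImageLinks(const Html: string; Output: TStrings);
var
  OpenPos, ClosePos, Count: Integer;
  Rest, Tag, UTag, CurrentHref: string;
begin
  Rest := Html;
  CurrentHref := '';
  Count := 0;
  OpenPos := Pos('<', Rest);
  while OpenPos > 0 do
  begin
    Delete(Rest, 1, OpenPos - 1);   // Rest now starts at '<'
    ClosePos := Pos('>', Rest);
    if ClosePos = 0 then
      Break;                        // unterminated tag: stop scanning
    Tag := Copy(Rest, 1, ClosePos);
    Delete(Rest, 1, ClosePos);
    UTag := UpperCase(Tag);
    if Pos('<A ', UTag) = 1 then
      CurrentHref := QuotedAttr(Tag, 'href')
    else if Pos('</A>', UTag) = 1 then
      CurrentHref := ''
    else if (Pos('<IMG', UTag) = 1) and (CurrentHref <> '') then
    begin
      Inc(Count);
      Output.Add(Format('[Link%.2d]', [Count]));
      Output.Add('image= ' + QuotedAttr(Tag, 'src'));
      Output.Add('Link= ' + CurrentHref);
    end;
    OpenPos := Pos('<', Rest);
  end;
end;

var
  Lines: TStringList;
begin
  Lines := TStringList.Create;
  try
    ExtractImageLinks(
      '<p><a href="http://www.google.de">' + sLineBreak +
      '<img border="0" src="http://www.google.de/intl/de_de/images/logo.gif" ' +
      'width="800" height="600"></a></p>', Lines);
    Write(Lines.Text);
  finally
    Lines.Free;
  end;
end.
```

Because the result goes into a TStrings, the caller can pass Memo1.Lines, a list box's Items, or a plain TStringList that is later saved with SaveToFile. Scanning the whole string rather than line by line also means a tag that is split across two lines is still seen as one tag.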
 
LVL 7

Expert Comment

by:sftweng
k4hvd77, when I said "you should have been able to", I meant no criticism and I recognize the fact that we are dealing with different versions of Delphi (6 & 7) and libraries. My intention was to focus on the key software, the "TestFor" procedure, which should be more portable than the rest of the application.

I do recommend, however, that you take a good look at using (at least) the JCL and JVCL components, available from http://www.jedi-delphi.org

Good luck, and do, please, continue to ask questions - I'll be pleased to help.
 
LVL 7

Expert Comment

by:sftweng
Sorry, that should be http://delphi-jedi.org.
 
LVL 7

Expert Comment

by:sftweng
I believe my solution met the requirements.
