Keyword Density

Posted on 2006-04-26
Last Modified: 2010-04-05
Hi,

I'm looking for a Delphi component or code to calculate the keyword density of a document or web page.

Thanks
Question by:jamesr123456

Accepted Solution

by:wildzero
wildzero earned 500 total points
ID: 16548000
For a web page, first remove all the HTML tags, along with any JavaScript and CSS.
Once those are out of the way, remove common words (a, if, the, what, etc.) and punctuation (, . ! @ etc.), then split the document on spaces using this function:

procedure Split(const Delimiter: Char; Input: string; const Strings: TStrings);
begin
  Assert(Assigned(Strings));
  Strings.Clear;
  Strings.Delimiter := Delimiter;
  Strings.DelimitedText := Input;
end;

So,
var
  sl: TStringList;
  i: Integer;
begin
  sl := TStringList.Create;
  Split(' ', Document, sl);
That splits your document on ' ' (spaces) and puts the words in the sl string list.

Loop through sl backwards and remove any blank entries:
for i := sl.Count - 1 downto 0 do
  if Trim(sl[i]) = '' then
    sl.Delete(i);


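Then loop over the list again and group identical words, counting how often each one occurs. The keyword density of a word is its count divided by the total number of words. Free sl when you are done.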
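
The grouping step could be sketched roughly as follows. This is only one way to do it, not necessarily what the answer had in mind: CountWords, Words and Counts are made-up names, the counts are kept as "word=count" pairs in a second string list, and the hard-coded word list stands in for the cleaned-up sl. LowerCase only folds ASCII letters, so accented characters would need different handling.

```pascal
program KeywordDensityDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Tallies each word of Words into Counts as "word=count" pairs.
// Words are lower-cased first so that "Delphi" and "delphi" are grouped.
procedure CountWords(Words, Counts: TStrings);
var
  i: Integer;
  w: string;
begin
  Counts.Clear;
  for i := 0 to Words.Count - 1 do
  begin
    w := LowerCase(Words[i]);
    // A missing name yields '', which StrToIntDef turns into 0
    Counts.Values[w] := IntToStr(StrToIntDef(Counts.Values[w], 0) + 1);
  end;
end;

var
  Words, Counts: TStringList;
  i, n: Integer;
begin
  Words := TStringList.Create;
  Counts := TStringList.Create;
  try
    // Stand-in for the cleaned, split word list (sl in the answer)
    Words.CommaText := 'delphi,code,Delphi,component';
    CountWords(Words, Counts);
    for i := 0 to Counts.Count - 1 do
    begin
      n := StrToInt(Counts.ValueFromIndex[i]);
      // density (%) = occurrences / total words * 100
      WriteLn(Format('%s: %d (%.1f%%)',
        [Counts.Names[i], n, n * 100 / Words.Count]));
    end;
  finally
    Counts.Free;
    Words.Free;
  end;
end.
```

With the sample list, delphi is counted twice out of four words, a density of 50%. For long documents a sorted list or a hash-based container would scale better than the linear lookup behind Values[].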

Hope that helps/gets you started
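
The cleanup steps described at the start of this answer (strip tags, strip punctuation, drop common words) might look something like the sketch below. StripTags, StripPunctuation, IsStopWord and RemoveStopWords are hypothetical helper names, and the stop-word list is just the four examples from the answer. The tag stripper is deliberately naive: it drops anything between '<' and '>' but does not remove the contents of script or style blocks, which would need separate handling.

```pascal
program CleanupDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Drops everything between '<' and '>' and leaves a space in place of each tag.
function StripTags(const S: string): string;
var
  i: Integer;
  InTag: Boolean;
begin
  Result := '';
  InTag := False;
  for i := 1 to Length(S) do
    if S[i] = '<' then
      InTag := True
    else if S[i] = '>' then
    begin
      InTag := False;
      Result := Result + ' ';
    end
    else if not InTag then
      Result := Result + S[i];
end;

// Replaces ASCII punctuation, tabs and line breaks with spaces.
function StripPunctuation(const S: string): string;
var
  i: Integer;
begin
  Result := S;
  for i := 1 to Length(Result) do
    if Result[i] in ['!'..'/', ':'..'@', '['..'`', '{'..'~', #9, #10, #13] then
      Result[i] := ' ';
end;

// True if W is one of the (illustrative) common words, ignoring case.
function IsStopWord(const W: string): Boolean;
const
  StopWords: array[0..3] of string = ('a', 'if', 'the', 'what');
var
  j: Integer;
begin
  Result := False;
  for j := Low(StopWords) to High(StopWords) do
    if SameText(W, StopWords[j]) then
    begin
      Result := True;
      Exit;
    end;
end;

// Deletes blank entries and stop words, walking backwards so that
// deleting an item does not shift the ones still to be visited.
procedure RemoveStopWords(Words: TStrings);
var
  i: Integer;
begin
  for i := Words.Count - 1 downto 0 do
    if (Trim(Words[i]) = '') or IsStopWord(Words[i]) then
      Words.Delete(i);
end;

var
  sl: TStringList;
begin
  sl := TStringList.Create;
  try
    sl.Delimiter := ' ';
    sl.DelimitedText :=
      StripPunctuation(StripTags('<p>What is <b>the</b> density?</p>'));
    RemoveStopWords(sl);
    WriteLn(sl.CommaText);
  finally
    sl.Free;
  end;
end.
```

For the sample input this should leave only the words is and density in sl. Note that replacing apostrophes with spaces splits words like "don't" in two; whether that matters depends on the text being analysed.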


Assisted Solution

by:Eddie Shipman
Eddie Shipman earned 500 total points
ID: 16554843
Here is a Split function that can ignore duplicates, so the resulting list holds only the distinct words in a string and its Count gives the number of distinct words. It does not handle punctuation, so strip that first. It requires PosEx, which is in StrUtils in D6 and above.

procedure Split(S, Delimiter: String; var Strings: TStringList; IgnoreDupes: Boolean);
var
  P, OldP: integer;
  Token: string;
begin
  // Prevent any errors due to bogus parameters
  if (Strings = nil) or (Length(S) = 0) or (Length(Delimiter) = 0) then
    exit;
  P := Pos(Delimiter, S);
  OldP := 1;
  while P > 0 do
  begin
    // Trim first so the duplicate check sees the same text that gets stored
    Token := Trim(Copy(S, OldP, P-OldP));
    if (not IgnoreDupes) or (Strings.IndexOf(Token) = -1) then
      Strings.Add(Token);
    // Don't call delete, instead save off P and search from
    // just past the delimiter to the end of S
    OldP := P + Length(Delimiter);
    P := PosEx(Delimiter, S, OldP);
  end;
  // Add whatever follows the last delimiter, applying the same duplicate check
  Token := Trim(Copy(S, OldP, Length(S)));
  if (not IgnoreDupes) or (Strings.IndexOf(Token) = -1) then
    Strings.Add(Token);
end;