# Keyword Density

Posted on 2006-04-26
I'm looking for a Delphi component or code to calculate the keyword density of a document or web page.

Question by: jamesr123456

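For a web page, first strip out any JavaScript and CSS (the contents of `<script>` and `<style>` blocks), then remove all the remaining HTML tags. Take the scripts and styles out first, because once their tags are gone their contents look like ordinary text.
Once they are out of the way, remove any common words (a, if, the, what, etc.) along with punctuation (, . ! @ etc.). Finally, split the document on spaces using this function: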

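```pascal
// Splits Input on Delimiter into Strings (needs Classes in the uses clause).
// With the default settings DelimitedText also treats any space as a
// separator and honours double quotes, so it suits splitting on spaces
// but is not a general-purpose splitter.
procedure Split(const Delimiter: Char; const Input: string; const Strings: TStrings);
begin
  Assert(Assigned(Strings));
  Strings.Clear;
  Strings.Delimiter := Delimiter;
  Strings.DelimitedText := Input;
end;
```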
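The cleaning steps described above could look something like the sketch below. It is a naive illustration, not a real HTML parser: `RemoveBlocks`, `StripTags` and `RemovePunctuation` are hypothetical helper names, the page is lower-cased first so tag matching stays simple, and punctuation removal keeps only ASCII letters and digits. HTML comments, entities such as `&amp;` and a `>` inside an attribute value are not handled.

```pascal
program CleanPage;

{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils, StrUtils;

// Deletes every <Tag ...> ... </Tag> block, contents included.
// Expects lower-case input; a block with no closing tag is cut to the end.
function RemoveBlocks(const S, Tag: string): string;
var
  StartPos, EndPos: Integer;
  CloseTag: string;
begin
  Result := S;
  CloseTag := '</' + Tag + '>';
  StartPos := Pos('<' + Tag, Result);
  while StartPos > 0 do
  begin
    EndPos := PosEx(CloseTag, Result, StartPos);
    if EndPos = 0 then
      Delete(Result, StartPos, MaxInt)
    else
      Delete(Result, StartPos, EndPos + Length(CloseTag) - StartPos);
    StartPos := PosEx('<' + Tag, Result, StartPos);
  end;
end;

// Drops everything between '<' and '>'; each tag becomes a space so the
// words on either side of it stay apart.
function StripTags(const S: string): string;
var
  i: Integer;
  InTag: Boolean;
begin
  Result := '';
  InTag := False;
  for i := 1 to Length(S) do
    if S[i] = '<' then
      InTag := True
    else if S[i] = '>' then
    begin
      InTag := False;
      Result := Result + ' ';
    end
    else if not InTag then
      Result := Result + S[i];
end;

// Replaces anything that is not a lower-case ASCII letter or a digit with a space.
function RemovePunctuation(const S: string): string;
var
  i: Integer;
begin
  Result := S;
  for i := 1 to Length(Result) do
    if not (((Result[i] >= 'a') and (Result[i] <= 'z')) or
            ((Result[i] >= '0') and (Result[i] <= '9'))) then
      Result[i] := ' ';
end;

var
  Page: string;
begin
  Page := LowerCase('<html><head><style>p {color: red}</style>' +
    '<script>var x = 1;</script></head><body>' +
    '<p>Delphi tips: the Delphi IDE, and Delphi units!</p></body></html>');
  // Scripts and styles first, then the remaining tags, then punctuation
  Page := RemoveBlocks(Page, 'script');
  Page := RemoveBlocks(Page, 'style');
  Page := RemovePunctuation(StripTags(Page));
  WriteLn(Trim(Page));
end.
```

The result still contains runs of spaces where tags and punctuation used to be, which is why blank entries need removing after the split.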

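So, assuming `Document` holds the cleaned-up text (`ProcessDocument` is just a placeholder name), split it on ' ' (spaces) so each word goes into the `sl` string list, then loop through `sl` and remove any blank lines:

```pascal
procedure ProcessDocument(const Document: string);
var
  sl: TStringList;
  i: Integer;
begin
  sl := TStringList.Create;
  try
    // Split the document on spaces; each word becomes one line of sl
    Split(' ', Document, sl);

    // Remove any blank lines, looping backwards so that deleting an
    // item does not shift the items still to be checked
    for i := sl.Count - 1 downto 0 do
      if Trim(sl[i]) = '' then
        sl.Delete(i);

    // ... group and count the words here, before sl is freed ...
  finally
    sl.Free;
  end;
end;
```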

Then just loop through the list again and group identical words, counting how often each one occurs.
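Grouping the words and turning the counts into a density could look like the sketch below. It assumes the usual definition of keyword density (occurrences of a word divided by the total number of words, times 100) and drops common words at this stage using a tiny, illustrative stop-word list. `CountWords`, `IsStopWord` and `TIntArray` are hypothetical names; `Words` must stay unsorted so that `Add` appends and `Counts[i]` stays aligned with `Words[i]`.

```pascal
program KeywordDensity;

{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils, Classes;

type
  TIntArray = array of Integer;

const
  // A tiny stop-word list for illustration only; a real one would be longer
  StopWords: array[0..5] of string = ('a', 'and', 'if', 'of', 'the', 'what');

function IsStopWord(const W: string): Boolean;
var
  i: Integer;
begin
  Result := False;
  for i := Low(StopWords) to High(StopWords) do
    if SameText(W, StopWords[i]) then
    begin
      Result := True;
      Exit;
    end;
end;

// Groups the tokens: on return Words holds each distinct non-stop word once,
// in order of first appearance, and Counts[i] is how often Words[i] occurred.
// Returns the total number of words counted (stop words and blanks excluded).
function CountWords(Tokens, Words: TStringList; var Counts: TIntArray): Integer;
var
  i, Idx: Integer;
  W: string;
begin
  Words.Clear;
  SetLength(Counts, 0);
  Result := 0;
  for i := 0 to Tokens.Count - 1 do
  begin
    W := LowerCase(Trim(Tokens[i]));
    if (W = '') or IsStopWord(W) then
      Continue;
    Inc(Result);
    Idx := Words.IndexOf(W);
    if Idx = -1 then
    begin
      // Words is unsorted, so Add appends and Idx matches the new Counts slot
      Idx := Words.Add(W);
      SetLength(Counts, Idx + 1);
      Counts[Idx] := 0;
    end;
    Inc(Counts[Idx]);
  end;
end;

var
  Tokens, Words: TStringList;
  Counts: TIntArray;
  Total, i: Integer;
begin
  Tokens := TStringList.Create;
  Words := TStringList.Create;
  try
    Tokens.Delimiter := ' ';
    Tokens.DelimitedText := 'delphi tips the delphi ide and delphi units';
    Total := CountWords(Tokens, Words, Counts);
    // Keyword density = occurrences / total words * 100
    for i := 0 to Words.Count - 1 do
      WriteLn(Format('%s: %d of %d words = %.1f%%',
        [Words[i], Counts[i], Total, Counts[i] / Total * 100]));
  finally
    Words.Free;
    Tokens.Free;
  end;
end.
```

For the sample text, "delphi" should come out as 3 of the 6 counted words (50%). `IndexOf` is a linear search, so for large pages a sorted list or a hash table would scale better.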

Hope that helps/gets you started

Here is a Split function that can skip duplicates: with IgnoreDupes set to True each word is added only once, so the resulting Count is the number of distinct words in the string. It does not strip punctuation, so remove that first. This requires PosEx, which is in StrUtils in D6 and above.

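```pascal
// Requires Classes and StrUtils (for PosEx) in the uses clause.
procedure Split(const S, Delimiter: string; Strings: TStrings;
  IgnoreDupes: Boolean);
var
  P, OldP: Integer;
  Token: string;
begin
  // Prevent any errors due to bogus parameters
  if (Strings = nil) or (Length(S) = 0) or (Length(Delimiter) = 0) then
    Exit;
  OldP := 1;
  P := Pos(Delimiter, S);
  while P > 0 do
  begin
    Token := Copy(S, OldP, P - OldP);
    // Skip empty tokens (two delimiters in a row); when IgnoreDupes is set,
    // only add words the list does not already contain. For a TStringList,
    // IndexOf is case-insensitive by default.
    if (Token <> '') and (not IgnoreDupes or (Strings.IndexOf(Token) = -1)) then
      Strings.Add(Token);
    // Don't call Delete; instead save off the position just past the
    // delimiter and search from there to the end of S
    OldP := P + Length(Delimiter);
    P := PosEx(Delimiter, S, OldP);
  end;
  // The text after the last delimiter is the final token
  Token := Copy(S, OldP, MaxInt);
  if (Token <> '') and (not IgnoreDupes or (Strings.IndexOf(Token) = -1)) then
    Strings.Add(Token);
end;
```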
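As an alternative to calling `IndexOf` for every token (a linear search, so the whole pass grows quadratically with the number of words), a sorted `TStringList` with `Duplicates` set to `dupIgnore` can do the de-duplication itself using a binary search. A minimal sketch; `CountDistinctWords` is a hypothetical name, and it splits on spaces only, so punctuation should already have been removed.

```pascal
program DistinctWords;

{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  Classes;

// Counts distinct space-separated words. A sorted TStringList with
// Duplicates = dupIgnore silently skips any word it already holds;
// comparisons are case-insensitive unless CaseSensitive is set.
function CountDistinctWords(const S: string): Integer;
var
  Tokens, Distinct: TStringList;
  i: Integer;
begin
  Tokens := TStringList.Create;
  Distinct := TStringList.Create;
  try
    Distinct.Sorted := True;
    Distinct.Duplicates := dupIgnore;
    Tokens.Delimiter := ' ';
    Tokens.DelimitedText := S;
    for i := 0 to Tokens.Count - 1 do
      if Tokens[i] <> '' then
        Distinct.Add(Tokens[i]);
    Result := Distinct.Count;
  finally
    Distinct.Free;
    Tokens.Free;
  end;
end;

begin
  WriteLn(CountDistinctWords('the cat saw the other cat'));
end.
```

For that sample the distinct words are "the", "cat", "saw" and "other", so it should print 4.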
