
# Keyword Density

Posted on 2006-04-26
Hi,

I'm looking for a Delphi component or code to calculate the keyword density of a document or web page.

Thanks
Question by: jamesr123456


Accepted Solution

For a web page, first remove any JavaScript and CSS (the contents of the <script> and <style> elements), then all the remaining HTML tags.
Once they are out of the way, remove any common words (a, if, the, what, etc.) along with punctuation (, . ! @ etc.). Finally, split the document on spaces using this function:

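```pascal
uses
  Classes;

// Splits Input on Delimiter and stores the pieces in Strings.
// Note: unless StrictDelimiter is set (available in newer Delphi versions),
// DelimitedText also splits on any whitespace and keeps "quoted text" together.
procedure Split(const Delimiter: Char; const Input: string; const Strings: TStrings);
begin
  Assert(Assigned(Strings));
  Strings.Clear;
  Strings.Delimiter := Delimiter;
  Strings.DelimitedText := Input;
end;
```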
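For the clean-up step that comes before Split, here is a rough sketch of stripping <script> and <style> blocks and then the remaining tags from a page. The names StripHtml, RemoveBlocks and StripTags are hypothetical, and this is a naive character scan, not a real HTML parser (HTML comments, CDATA and `>` inside attribute values are not handled). Each removed tag or block is replaced by a space so the words on either side of it do not run together.

```pascal
program StripHtmlDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils, StrUtils; // StrUtils provides PosEx

// Hypothetical helper: replaces every StartTag...EndTag block with a space.
// Matching is case-insensitive; StartTag and EndTag must be given in lower
// case (e.g. '<script', '</script>'). An unterminated block is cut to the end.
function RemoveBlocks(const S, StartTag, EndTag: string): string;
var
  Lower: string;
  P, Q: Integer;
begin
  Result := S;
  repeat
    Lower := LowerCase(Result);
    P := Pos(StartTag, Lower);
    if P = 0 then
      Break;
    Q := PosEx(EndTag, Lower, P);
    if Q = 0 then
      Result := Copy(Result, 1, P - 1)
    else
      Result := Copy(Result, 1, P - 1) + ' ' +
        Copy(Result, Q + Length(EndTag), MaxInt);
  until False;
end;

// Hypothetical helper: drops everything between '<' and '>' and writes a
// space where each tag was.
function StripTags(const S: string): string;
var
  I: Integer;
  InTag: Boolean;
begin
  Result := '';
  InTag := False;
  for I := 1 to Length(S) do
    if S[I] = '<' then
      InTag := True
    else if S[I] = '>' then
    begin
      InTag := False;
      Result := Result + ' ';
    end
    else if not InTag then
      Result := Result + S[I];
end;

// Scripts and styles must go first, while their tags still mark them out
function StripHtml(const Html: string): string;
begin
  Result := RemoveBlocks(Html, '<script', '</script>');
  Result := RemoveBlocks(Result, '<style', '</style>');
  Result := StripTags(Result);
end;

begin
  Writeln(Trim(StripHtml(
    '<html><style>p {color: red}</style><p>Hello <b>world</b></p>' +
    '<script>var x = 1;</script></html>')));
end.
```

The order matters: stripping tags first would leave the script and style contents behind as ordinary text.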

Then call it on the cleaned-up text (held here in a string variable Document):

```pascal
var
  sl: TStringList;
  i: Integer;
begin
  sl := TStringList.Create;
  Split(' ', Document, sl);
```

That splits the document on ' ' (spaces) and puts the individual words into the string list sl.

Loop through sl backwards and remove any blank entries:

```pascal
  for i := sl.Count - 1 downto 0 do
    if Trim(sl[i]) = '' then
      sl.Delete(i); // TStrings.Delete takes the index of the item to remove
```
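Once the words are in a list, the common-word and punctuation removal can also be done per word instead of on the raw text. A minimal sketch of that alternative follows; CleanWordList, IsStopWord and TrimPunctuation are hypothetical names, and the six-word stop list is only illustrative (a real one would be much longer).

```pascal
program CleanWordsDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils, Classes;

const
  // Hypothetical, deliberately tiny stop-word list; extend as needed
  StopWords: array[0..5] of string = ('a', 'if', 'the', 'what', 'and', 'of');
  // Punctuation trimmed from the start and end of each word
  Punctuation = [',', '.', '!', '?', ';', ':', '@', '"', '''', '(', ')'];

function IsStopWord(const W: string): Boolean;
var
  I: Integer;
begin
  Result := False;
  for I := Low(StopWords) to High(StopWords) do
    if SameText(W, StopWords[I]) then
    begin
      Result := True;
      Exit;
    end;
end;

function TrimPunctuation(const W: string): string;
var
  First, Last: Integer;
begin
  First := 1;
  Last := Length(W);
  while (First <= Last) and (W[First] in Punctuation) do
    Inc(First);
  while (Last >= First) and (W[Last] in Punctuation) do
    Dec(Last);
  Result := Copy(W, First, Last - First + 1);
end;

// Trims punctuation from each entry and deletes entries that end up blank
// or are stop words. Iterates backwards so a deletion does not shift the
// items still to be visited.
procedure CleanWordList(Words: TStrings);
var
  I: Integer;
  W: string;
begin
  for I := Words.Count - 1 downto 0 do
  begin
    W := TrimPunctuation(Trim(Words[I]));
    if (W = '') or IsStopWord(W) then
      Words.Delete(I)
    else
      Words[I] := W;
  end;
end;

var
  SL: TStringList;
begin
  SL := TStringList.Create;
  try
    SL.Add('The');
    SL.Add('cat,');
    SL.Add('');
    SL.Add('sat!');
    CleanWordList(SL);
    Writeln(SL.CommaText); // prints: cat,sat
  finally
    SL.Free;
  end;
end.
```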

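Then loop over the list and group identical words, counting how often each one occurs; a word's keyword density is its count divided by the total number of words. Free sl (sl.Free) when you are done.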
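The grouping step could look roughly like the sketch below, assuming the words are already cleaned up in a TStringList. KeywordDensity and PrintDensities are hypothetical names; grouping is done by sorting case-insensitively so equal words become adjacent, and density is reported as a percentage (one common convention).

```pascal
program KeywordDensityDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils, Classes;

// Density of one keyword as a percentage: occurrences / total words * 100
function KeywordDensity(Words: TStrings; const Keyword: string): Double;
var
  I, Hits: Integer;
begin
  Result := 0;
  if Words.Count = 0 then
    Exit;
  Hits := 0;
  for I := 0 to Words.Count - 1 do
    if AnsiSameText(Words[I], Keyword) then
      Inc(Hits);
  Result := 100 * Hits / Words.Count;
end;

// Sorts the list case-insensitively so equal words are adjacent, then walks
// each run of equal words and prints the word, its count and its density.
procedure PrintDensities(Words: TStringList);
var
  I, RunStart, Count: Integer;
begin
  Words.CaseSensitive := False;
  Words.Sort;
  I := 0;
  while I < Words.Count do
  begin
    RunStart := I;
    while (I < Words.Count) and AnsiSameText(Words[I], Words[RunStart]) do
      Inc(I);
    Count := I - RunStart;
    Writeln(Format('%s: %d (%.1f%%)',
      [AnsiLowerCase(Words[RunStart]), Count, 100 * Count / Words.Count]));
  end;
end;

var
  SL: TStringList;
begin
  SL := TStringList.Create;
  try
    SL.Add('delphi');
    SL.Add('code');
    SL.Add('Delphi');
    SL.Add('component');
    Writeln(Format('delphi: %.1f%%', [KeywordDensity(SL, 'delphi')]));
    PrintDensities(SL);
  finally
    SL.Free;
  end;
end.
```

Here "delphi" occurs 2 times in 4 words, so its density is 50%. Sorting keeps the grouping simple; for very large documents a sorted TStringList with Find would avoid re-scanning the list for every keyword.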

Hope that helps/gets you started


Assisted Solution

Here is a Split function that can ignore duplicates, so after calling it Strings.Count gives the number of distinct words in a string (punctuation is not stripped). It requires PosEx, which is in the StrUtils unit in Delphi 6 and above.

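```pascal
uses
  Classes, StrUtils; // StrUtils provides PosEx (Delphi 6 and above)

procedure Split(const S, Delimiter: string; Strings: TStrings; IgnoreDupes: Boolean);
var
  P, OldP: Integer;

  // Adds one token, skipping empty tokens and, if IgnoreDupes is set,
  // tokens that are already in the list
  procedure AddToken(const Token: string);
  begin
    if Token = '' then
      Exit;
    if IgnoreDupes and (Strings.IndexOf(Token) <> -1) then
      Exit;
    Strings.Add(Token);
  end;

begin
  // Prevent any errors due to bogus parameters
  if (Strings = nil) or (Length(S) = 0) or (Length(Delimiter) = 0) then
    Exit;
  OldP := 1;
  P := Pos(Delimiter, S);
  while P > 0 do
  begin
    AddToken(Copy(S, OldP, P - OldP));
    // Don't call Delete; instead save off P and search from just past
    // the delimiter to the end of S
    OldP := P + Length(Delimiter);
    P := PosEx(Delimiter, S, OldP);
  end;
  // Whatever follows the last delimiter is the final token
  AddToken(Copy(S, OldP, MaxInt));
end;
```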
