Keyword Density

Hi,

I'm looking for a Delphi component or code to calculate the keyword density of a document or web page.

Thanks
jamesr123456 Asked:

wildzero Commented:
For a web page, first strip all the HTML tags, then any JavaScript and CSS.
Once those are out of the way, remove common words (a, if, the, what, etc.) and punctuation (, . ! @ etc.). Finally, split the document on spaces using this function:

procedure Split(const Delimiter: Char; Input: string; const Strings: TStrings);
begin
  Assert(Assigned(Strings));
  Strings.Clear;
  Strings.Delimiter := Delimiter;
  Strings.DelimitedText := Input;
end;
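
The clean-up steps that come before the split are not shown in this thread, so here is a minimal sketch of one way they might look. CleanText and RemoveStopWords are hypothetical names, not library routines. The sketch assumes that script and style bodies have already been removed, that a '<' always starts a tag, and that only ASCII punctuation needs stripping. Real-world HTML will need something sturdier.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  SysUtils, Classes;

// Hypothetical helper: lower-cases the text, then overwrites every tag
// ('<' through '>') and every ASCII character that is not a letter or a
// digit with a space. Non-ASCII characters are left alone.
// Assumption: <script> and <style> bodies were removed beforehand.
function CleanText(const Html: string): string;
var
  i: Integer;
  InTag, IsSeparator: Boolean;
  C: Char;
begin
  Result := LowerCase(Html);
  InTag := False;
  for i := 1 to Length(Result) do
  begin
    C := Result[i];
    if C = '<' then
      InTag := True;
    IsSeparator := (Ord(C) < 128) and
      not (((C >= 'a') and (C <= 'z')) or ((C >= '0') and (C <= '9')));
    if InTag or IsSeparator then
      Result[i] := ' ';
    if C = '>' then
      InTag := False;
  end;
end;

// Hypothetical helper: deletes every entry of Words that is empty or
// listed in StopWords. Walks backwards so Delete never skips an entry.
procedure RemoveStopWords(Words, StopWords: TStrings);
var
  i: Integer;
begin
  for i := Words.Count - 1 downto 0 do
    if (Words[i] = '') or (StopWords.IndexOf(Words[i]) >= 0) then
      Words.Delete(i);
end;
```

The idea would be to run CleanText on the page first, split the result on spaces, and then call RemoveStopWords on the resulting list with whatever stop-word list suits you.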

So,

var
  sl: TStringList;
  i: Integer;
begin
  sl := TStringList.Create;
  Split(' ', Document, sl);

That splits your document on ' ' (spaces) and puts the words in the sl string list.

Loop through sl backwards and remove any blank entries:
for i := sl.Count - 1 downto 0 do
  if Trim(sl[i]) = '' then
    sl.Delete(i);


Then loop over the list and group identical words, counting how many times each one occurs.
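
As a rough sketch of that grouping step: sort the list so equal words sit next to each other, then count each run. CountWords and Density are hypothetical names. The thread does not define density, so this takes it to be a word's occurrences divided by the total number of words, times 100. It also assumes the words are already lower-cased and free of '=' characters.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  SysUtils, Classes;

// Hypothetical helper: fills Counts with one 'word=count' entry per
// distinct word in Words. Words is sorted in place so that equal words
// become adjacent and can be counted as runs.
procedure CountWords(Words, Counts: TStringList);
var
  i, RunStart: Integer;
begin
  Counts.Clear;
  Words.Sort;
  RunStart := 0;
  for i := 1 to Words.Count do
    // A run ends at the end of the list or where the word changes.
    if (i = Words.Count) or
       (AnsiCompareText(Words[i], Words[RunStart]) <> 0) then
    begin
      Counts.Add(Words[RunStart] + '=' + IntToStr(i - RunStart));
      RunStart := i;
    end;
end;

// Density of one word as a percentage of all words
// (assumed definition: Count / TotalWords * 100).
function Density(Count, TotalWords: Integer): Double;
begin
  if TotalWords = 0 then
    Result := 0
  else
    Result := Count * 100 / TotalWords;
end;
```

After CountWords(sl, Counts), something like Density(StrToInt(Counts.Values['delphi']), sl.Count) would give the percentage for a single word. Whether the total should include the stop words you removed earlier is a choice you will have to make.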

Hope that helps/gets you started

Eddie Shipman (All-around developer) Commented:
Here is a Split function that can ignore duplicates, so it essentially counts the number of distinct words in a string, punctuation not included. It requires PosEx, which is in StrUtils in D6 and above.

// Needs StrUtils in the uses clause for PosEx
procedure Split(S, Delimiter: String; var Strings: TStringList; IgnoreDupes: Boolean);
var
  P, OldP: Integer;

  // Adds a trimmed token, skipping it when duplicates are being ignored
  // and the list already holds it
  procedure AddToken(const Token: String);
  var
    T: String;
  begin
    T := Trim(Token);
    if (not IgnoreDupes) or (Strings.IndexOf(T) = -1) then
      Strings.Add(T);
  end;

begin
  // Prevent any errors due to bogus parameters
  if (Strings = nil) or (Length(S) = 0) or (Length(Delimiter) = 0) then
    Exit;
  P := Pos(Delimiter, S);
  OldP := 1;
  while P > 0 do
  begin
    AddToken(Copy(S, OldP, P - OldP));
    // Don't call Delete; instead save off P and search from
    // just past the delimiter to the end of S
    OldP := P + Length(Delimiter);
    P := PosEx(Delimiter, S, OldP);
  end;
  // Whatever follows the last delimiter is the final token
  AddToken(Copy(S, OldP, Length(S)));
end;
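
If only the distinct words are needed, TStringList can also drop duplicates on its own. The sketch below is an alternative, not part of the answer above. DistinctWords is a hypothetical name, and it assumes the input is already cleaned and space-separated. Note that Duplicates only takes effect on a sorted list, so the words come back in sorted order rather than document order.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  SysUtils, Classes;

// Hypothetical helper: adds each space-separated word of S to Distinct once.
procedure DistinctWords(const S: string; Distinct: TStringList);
var
  Parts: TStringList;
  i: Integer;
begin
  Distinct.Clear;
  Distinct.Sorted := True;           // Duplicates is only honoured when sorted
  Distinct.Duplicates := dupIgnore;  // silently skip words already present
  Parts := TStringList.Create;
  try
    Parts.Delimiter := ' ';
    Parts.DelimitedText := S;
    for i := 0 to Parts.Count - 1 do
      if Parts[i] <> '' then
        Distinct.Add(Parts[i]);
  finally
    Parts.Free;
  end;
end;
```

Distinct.Count then gives the number of distinct words.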