Managing text

Hi all experts,

Can you supply me with a working, commented code sample that scans a portion of text line by line in order to:
- delete all the double, triple, ... spaces in lines or between words or letters
- delete all the blank lines between paragraphs
- delete all the carriage returns [except the last one between paragraphs] and tabs

Thanks for your help.
I'll give 125 points for the code and 75 points more for the explanation (comments in the code).

Bernani


DavidBirch2dotCom commented:
Hello Bernani, try this. If it doesn't fit the bill, I would be happy to modify it.

Var
 K: integer;
begin
  For K:= (memo1.Lines.Count-1) downto 0 do // memo lines are counted from 0    work backwards to avoid problems deleting items and making the code unstable
  begin
     while pos('  ',Memo1.Lines.Strings[K])<> 0 do // while there are still double spaces
     Memo1.Lines.Strings[K]:=   StringReplace(Memo1.Lines.Strings[K], '  ', ' ',[rfReplaceAll, rfIgnoreCase]); // replace all double spaces with a single space

     If not (K =memo1.Lines.Count) then // to avoid range check error
      If not (trim(Memo1.Lines.Strings[K+1])='') then // if the next line is not blank or has only spaces
        Memo1.Lines.Strings[K]:=   StringReplace(Memo1.Lines.Strings[K], #32, '',[rfReplaceAll, rfIgnoreCase]); // replace all the returns character #32 with nothing

     Memo1.Lines.Strings[K]:=   StringReplace(Memo1.Lines.Strings[K], #13, '',[rfReplaceAll, rfIgnoreCase]); // replace all tabs with nothing

     If trim(Memo1.Lines.Strings[K])='' then   // kill off empty lines or lines with only spaces
      Memo1.Lines.Delete(K);
  end;

David
illusion_chaser commented:
The following function returns the corrected text, just as you wanted:
Regards.

function FormatText(_InputText: string): string;
var
  i, len: Integer;
  boolSpaceFound, boolEnterFound, boolLineFeedFound: Boolean;
begin
  Result := ''; //A string function result is not guaranteed to start empty
  len := Length(_InputText); //Get text length

  //Reset all flags
  boolSpaceFound := False;
  boolEnterFound := False;
  boolLineFeedFound := False;

  for i := 1 to len do
  begin
    case (_InputText[i]) of
      Chr(VK_SPACE): //We found Space character
      begin
        boolEnterFound := False;
        boolLineFeedFound := False;

        if (not boolSpaceFound) then
          boolSpaceFound := True
        else
          Continue; //Will not copy consecutive spaces
      end;

      Chr(VK_RETURN): //We found CarriageReturn (Enter) character
      begin
        boolSpaceFound := False;

        if (not boolEnterFound) then
          boolEnterFound := True
        else
          Continue; //Will not copy consecutive Enters
      end;

      #10: //We found LineFeed character
      begin
        boolSpaceFound := False;

        if (not boolLineFeedFound) then
          boolLineFeedFound := True
        else
          Continue; //Will not copy consecutive line feeds
      end;

      Chr(VK_TAB): //We found a Tab character
      begin
        Continue;
      end;

      else //Any other character, reset all flags
      begin
        boolSpaceFound := False;
        boolEnterFound := False;
        boolLineFeedFound := False;
      end;
    end; //Of case

    Result := Result + _InputText[i];
  end;
end;
bernani (Author) commented:
Hi DavidBirch2dotCom and illusion_chaser,

Thanks for your prompt answers and comments.
I'll take some time to test and analyse the snippet and the function you supplied.
I'll let you know if it's OK or if I have any questions or comments.



bernani (Author) commented:
Hi illusion_chaser,

I've tested your great function in my application and it does nearly everything I'm looking for. :)

Could you improve it, if you're willing, so that:
- it keeps one blank line between paragraphs (2 or more lines without CR, like between the bodies of two procedures in a program or text)
- no line starts with a space (some lines are still preceded by a space, or something like a blank space)

When I surf, I grab text from web pages into my application (inserted into a blank jvRichEdit and saved automatically in RTF format under a unique name in a particular folder) through an item in my application's systray menu, without leaving the page or switching between applications.
I use this function, found on the net (I think it's also available in this EE group, on D-Tnt, SwissDelphi, ...):

function GetSelectedIEtext: string;
var
  x: Integer;
  Sw: IShellWindows;
  IE: HWND;
begin
  IE := FindWindow('IEFrame', nil);
  sw := CoShellWindows.Create;
  for x := SW.Count - 1 downto 0 do
    if (Sw.Item(x) as IWebbrowser2).hwnd = IE then begin
      Result := variant(Sw.Item(x)).Document.Selection.createRange.Text;
      break;
    end;
end;

Sometimes the grabbed text is correct, but sentences seem to be cut at a certain point, and you need to delete a lot of stuff to get clean text that can be formatted as RTF.

I don't yet know much about the differences between Unix text, DOS text, ANSI text, ...

Could your function check which kind of text is present?

I'll increase the points for this to 325:
- 125 (initial) + 125 more for my request above
- 75 for the comments and explanations

Thanks a lot for your precious contribution.

_______________________
To David:
Thanks, David, for your suggestion.
The code works, but all the spaces between words are removed. Nice for naming variables, but not for readable text ;)
The same comments as above apply to some text grabbed from web pages.
The comments in your code were welcome. I'll give you points for those (I'll see how many when I accept an answer).

Thanks a lot for your precious contribution.

Bernani



bernani (Author) commented:
Re illusion_chaser and David,


Sorry, I forgot to increase the points in the box.

Sorry 2: in my question I said "- delete all the blank lines between paragraphs": bad wording.
I should have said "- delete all the blank lines between paragraphs except one blank line between paragraphs".
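
The restated requirement (collapse repeated blanks, join the lines of a paragraph, keep one blank line between paragraphs) can be sketched on a TStrings basis as follows. This is only a minimal sketch and not code from any poster in this thread: the names SqueezeBlanks and NormalizeParagraphs are hypothetical, and it assumes that a tab counts as a blank and that any line containing only blanks ends a paragraph.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF} // only needed when compiling with Free Pascal
uses
  Classes;

// Hypothetical helper: collapses every run of spaces and tabs in S
// to one space and drops leading and trailing blanks.
function SqueezeBlanks(const S: string): string;
var
  i: Integer;
  PrevBlank: Boolean;
begin
  Result := '';
  PrevBlank := False;
  for i := 1 to Length(S) do
    if (S[i] = ' ') or (S[i] = #9) then
      PrevBlank := True // remember the gap, emit nothing yet
    else
    begin
      if PrevBlank and (Result <> '') then
        Result := Result + ' '; // one space stands in for the whole gap
      Result := Result + S[i];
      PrevBlank := False;
    end;
end;

// Hypothetical helper: joins the lines of each paragraph into one line
// and leaves exactly one blank line between paragraphs.
// Source is only read; Dest is cleared and refilled.
procedure NormalizeParagraphs(Source, Dest: TStrings);
var
  i: Integer;
  Line, Para: string;
begin
  Dest.Clear;
  Para := '';
  for i := 0 to Source.Count - 1 do
  begin
    Line := SqueezeBlanks(Source[i]);
    if Line <> '' then
    begin
      // Still inside a paragraph: append the line, separated by one space.
      if Para <> '' then
        Para := Para + ' ';
      Para := Para + Line;
    end
    else if Para <> '' then
    begin
      // A blank line closes the current paragraph.
      if Dest.Count > 0 then
        Dest.Add('');
      Dest.Add(Para);
      Para := '';
    end;
  end;
  // Flush the last paragraph if the text did not end with a blank line.
  if Para <> '' then
  begin
    if Dest.Count > 0 then
      Dest.Add('');
    Dest.Add(Para);
  end;
end;
```

Source and Dest have to be different objects. For a memo, one possible use is to fill a temporary TStringList with NormalizeParagraphs(Memo1.Lines, Tmp) and then call Memo1.Lines.Assign(Tmp).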


DavidBirch2dotCom commented:
Sorry about that, I had put in #32 (space, not return), so that problem is fixed. I have also changed the code to leave a line where there are more than two blank lines (for a paragraph).

Updated Version:

Var
 K: integer;
begin
  // memo lines are counted from 0; work backwards so that deleting a line does not shift the lines still to be processed
  For K:= (memo1.Lines.Count-1) downto 0 do
  begin
     while pos('  ',Memo1.Lines.Strings[K])<> 0 do // while there are still double spaces
       Memo1.Lines.Strings[K]:=   StringReplace(Memo1.Lines.Strings[K], '  ', ' ',[rfReplaceAll, rfIgnoreCase]); // replace all double spaces with a single space

     Memo1.Lines.Strings[K]:=   StringReplace(Memo1.Lines.Strings[K], #09, '',[rfReplaceAll, rfIgnoreCase]); // replace all tabs with nothing

     If (K > 0) and (K < memo1.Lines.Count-1) then // lines K-1 and K+1 must both exist, to avoid a range check error
     begin
       If not (trim(Memo1.Lines.Strings[K])='') then // if the next line (seen from K-1) is not blank or has only spaces
         Memo1.Lines.Strings[K-1]:=   StringReplace(Memo1.Lines.Strings[K-1], #13, '',[rfReplaceAll, rfIgnoreCase]); // replace all the returns character #13 with nothing

       If (trim(Memo1.Lines.Strings[K])='') and not (trim(Memo1.Lines.Strings[K-1])='') then   // kill off empty lines or lines with only spaces
         Memo1.Lines.Delete(K)
       else If (trim(Memo1.Lines.Strings[K+1])='') and (trim(Memo1.Lines.Strings[K])='') then
         Memo1.Lines.Delete(K);        // to avoid leaving two blank lines in a row (else: K+1 may no longer exist after a delete)
     end;
  end;
end;

David
sas13 commented:
function ManageText(const AText: string): string;
var
  _str : TStringList;
begin
 _str := TStringList.Create;
 try
  _str.Delimiter := ' ';
  _str.DelimitedText := AText;
  Result := StringReplace(_str.Text, #13#10, ' ', [rfReplaceAll])
 finally
  _str.Free
 end
end;
illusion_chaser commented:
Hi there.
Here is the updated function code:

function TForm1.FormatText(_InputText: string): string;
var
  i, len: Integer;
  boolSpaceFound: Boolean;
  iEnterCnt, iLineFeedCnt: Integer;
begin
  Result := ''; //A string function result is not guaranteed to start empty
  len := Length(_InputText); //Get text length

  //Reset all flags
  boolSpaceFound := False;
  iEnterCnt := 0;
  iLineFeedCnt := 0;

  for i := 1 to len do
  begin
    case (_InputText[i]) of
      Chr(VK_SPACE): //We found Space character
      begin
        iEnterCnt := 0;
        iLineFeedCnt := 0;

        if (not boolSpaceFound) then
          boolSpaceFound := True
        else
          Continue; //Will not copy consecutive spaces
      end;

      Chr(VK_RETURN): //We found CarriageReturn (Enter) character
      begin
        boolSpaceFound := True;
         
        if (iEnterCnt < 2) then //Will leave one line between paragraphs
          Inc(iEnterCnt) //Count enters
        else
          Continue; //Will not copy consecutive Enters
      end;

      #10: //We found LineFeed character
      begin
        boolSpaceFound := True;

        if (iLineFeedCnt < 2) then //Will leave one line between paragraphs
          Inc(iLineFeedCnt) //Count line feeds
        else
          Continue; //Will not copy consecutive line feeds
      end;

      Chr(VK_TAB): //We found a Tab character
      begin
        Continue;
      end;

      else //Any other character, reset all flags
      begin
        boolSpaceFound := False;
        iEnterCnt := 0;
        iLineFeedCnt := 0;
      end;
    end; //Of case

    Result := Result + _InputText[i];
  end;
end;

And here is a link to an application that deals with Unix/DOS/Windows/Mac text formats:
http://www.chmaas.handshake.de/delphi/freeware/cmsort/cmsort.htm
I couldn't find any source code for it, though.
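
On the question of which kind of text is present: the formats differ mainly in how a line ends. DOS/Windows text uses CR+LF (#13#10), Unix text uses LF (#10) alone, and classic Mac text uses CR (#13) alone. Counting those sequences gives a rough answer. The following is only a sketch under that assumption; the type TLineBreakStyle and the function DetectLineBreaks are hypothetical names, not part of the VCL or of the application linked above.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF} // only needed when compiling with Free Pascal

type
  // Hypothetical result type for the sketch below.
  TLineBreakStyle = (lbNone, lbDos, lbUnix, lbMac, lbMixed);

// Hypothetical helper: reports which line-break convention S uses.
function DetectLineBreaks(const S: string): TLineBreakStyle;
var
  i, CrLf, Lf, Cr: Integer;
begin
  CrLf := 0;
  Lf := 0;
  Cr := 0;
  i := 1;
  while i <= Length(S) do
  begin
    if S[i] = #13 then
    begin
      if (i < Length(S)) and (S[i + 1] = #10) then
      begin
        Inc(CrLf); // CR followed by LF: DOS/Windows line end
        Inc(i);    // skip the LF that belongs to this CR
      end
      else
        Inc(Cr);   // lone CR: classic Mac line end
    end
    else if S[i] = #10 then
      Inc(Lf);     // lone LF: Unix line end
    Inc(i);
  end;

  if (CrLf = 0) and (Lf = 0) and (Cr = 0) then
    Result := lbNone   // no line break at all
  else if (Lf = 0) and (Cr = 0) then
    Result := lbDos
  else if (CrLf = 0) and (Cr = 0) then
    Result := lbUnix
  else if (CrLf = 0) and (Lf = 0) then
    Result := lbMac
  else
    Result := lbMixed; // more than one convention in the same text
end;
```

If the goal is only to bring everything to one convention before formatting, SysUtils.AdjustLineBreaks may already be enough, since it converts lone CR or LF characters to CR/LF pairs; it is worth checking how your Delphi version's help describes it.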

Regards.
bernani (Author) commented:
Hi David and illusion_chaser,

Nice work from both of you, which was of great value to me.

Before accepting your two proposals, I'm increasing the points and splitting them like this:

- explanation: David 75, illusion_chaser 75
- code: David 175, illusion_chaser 175

So the total is David 250 and illusion_chaser 250: both ways of manipulating the strings are interesting for me, even if the function FormatText is more reusable without change.
Tell me if you're OK with this split.

Hi sas13,

Thanks for your function ManageText, but my D5 reports an error: _str.Delimiter and _str.DelimitedText are undeclared identifiers.
I checked the properties and indeed, TStringList has no such properties in D5.
I suppose you're using a newer version than mine ;)
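
For a compiler without Delimiter and DelimitedText, the same idea (treat every run of whitespace as one separator and rebuild the text with single spaces) can be written as a plain loop over the characters. This is a sketch only, not a drop-in replacement for ManageText: the name CollapseWhitespace is hypothetical, and it assumes that spaces, tabs, CR and LF are the only characters that count as whitespace.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF} // only needed when compiling with Free Pascal

// Hypothetical helper: replaces every run of spaces, tabs, CRs and LFs
// with one space and drops leading and trailing whitespace.
function CollapseWhitespace(const AText: string): string;
var
  i: Integer;
  Pending: Boolean;
begin
  Result := '';
  Pending := False;
  for i := 1 to Length(AText) do
    if AText[i] in [#9, #10, #13, ' '] then
      Pending := True // inside a whitespace run, emit nothing yet
    else
    begin
      if Pending and (Result <> '') then
        Result := Result + ' '; // one space replaces the whole run
      Result := Result + AText[i];
      Pending := False;
    end;
end;
```

Because line breaks are treated like any other whitespace, the whole text ends up on one line, so this variant does not keep a blank line between paragraphs.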


Bernani.






illusion_chaser commented:
No problem here. I am glad I could help.

DavidBirch2dotCom commented:
That split would be great. Glad to help.
David