Solved

How do you find and replace just the text in Rich Text Format files, using sed, ignoring useless formatting codes?

Posted on 2004-09-07
Last Modified: 2010-04-21
Here is an outline of the problem:

I am building an application to search and replace rich text files, for example, being able to search for some text in bold formatting and being able to replace it with text in italic formatting.

I looked into using sed addressing to solve the problem, whereby sed searches only between curly brackets { and } within the rich text document file, ***but this doesn't work where the curly brackets span multiple lines***.  For example, the sed script would look like the following:

/{/,/}/{
s/search text/replace text/g
}

I am using a Windows port of the Unix utility sed, but I think that should make little difference.  I am building the application itself in the Windows environment; I posted this question in the UNIX area because sed is a UNIX tool, and the solution to my problem will likely involve it.

I really want some sed script code (or otherwise) that can search through just the bulk of the actual printed text, and replace specific words.  Just searching and replacing outright on an RTF file is a bad idea, as you can end up searching and replacing the RTF codes you don't want to touch.

Any suggestions, ideas, or new approaches to this problem?

Thanks,
Matt (ANSI C++/ANSI C/VB Programmer)
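
The bold-to-italic case described in the question can be sketched roughly as follows. This is a minimal Python illustration, not a full RTF solution: it assumes the bold run is written as a self-contained group of the form {\b text}, which many RTF writers do not guarantee (they may emit \b text\b0 instead). The function name bold_to_italic is hypothetical.

```python
import re

def bold_to_italic(rtf, old, new):
    """Replace the group {\\b old} with {\\i new}; all other text is left alone."""
    # Assumption: the bold run is exactly "{\b " + old + "}".
    pattern = re.compile(r"\{\\b " + re.escape(old) + r"\}")
    # A function replacement avoids backslash-escape processing in `new`.
    return pattern.sub(lambda m: "{\\i " + new + "}", rtf)

sample = r"{\rtf1 plain word and {\b word} here}"
print(bold_to_italic(sample, "word", "term"))
```

Only the occurrence inside the bold group is rewritten; the plain occurrence of the same word is untouched.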
Question by:amadataset
10 Comments
 
LVL 20

Expert Comment

by:Gns
ID: 11996374
Hm, strange. With GNU sed it works as expected....
With file aaa:
$ cat aaa
kjashjkasdhdjklhaskldaskl
söajdklasjdlkjaskl{ ldfkjgkldfj text lslösdflösd}lkdfsljfkals
söajdklasjdlkjaskl{ ldfkjgkldfj text lslösdflösdlkdfsljfkals
söajdklasjdlkjaskl ldfkjgkldfj text lslösdflösd}lkdfsljfkals
kjashjkasdhdjklhaskldaskl
kjashjkasdhdjklhas text kldaskl
söajdklasjdlkjaskl{ ldfkjgkldfj text lstextlösdflösdlkdfsljfkals
söajdklasjdlkjaskl ldfkjgkldfj text lslösdftextlösdlkdfsljfkals
söajdklasjdlkjaskl{ ldfkjgkldfj text lslösdflösd}lkdfsljfkals
$ sed -e '/{/,/}/ s/text/TEXT/g' aaa
kjashjkasdhdjklhaskldaskl
söajdklasjdlkjaskl{ ldfkjgkldfj TEXT lslösdflösd}lkdfsljfkals
söajdklasjdlkjaskl{ ldfkjgkldfj TEXT lslösdflösdlkdfsljfkals
söajdklasjdlkjaskl ldfkjgkldfj TEXT lslösdflösd}lkdfsljfkals
kjashjkasdhdjklhaskldaskl
kjashjkasdhdjklhas text kldaskl
söajdklasjdlkjaskl{ ldfkjgkldfj TEXT lsTEXTlösdflösdlkdfsljfkals
söajdklasjdlkjaskl ldfkjgkldfj TEXT lslösdfTEXTlösdlkdfsljfkals
söajdklasjdlkjaskl{ ldfkjgkldfj TEXT lslösdflösd}lkdfsljfkals
$
... Which is equivalent to what you're trying to do. If your sed implementation doesn't want to play, get the cygwin one from http://www.cygwin.com ... That is GNU sed...

-- Glenn
 
LVL 48

Expert Comment

by:Tintin
ID: 12001057
Using Perl would be more portable.
 
LVL 38

Expert Comment

by:yuzh
ID: 12002353
If you have Perl installed (most systems do these days), you can do:


perl -i -pe "s/oldstr/newstr/" $file

You can put it in a shell script if you want.
 
LVL 20

Expert Comment

by:Gns
ID: 12004004
Perl would be fine... as portable as GNU sed, in a way:-):-)
Note the example above, Greg: we need to find the starting { and ending } ... and only replace the pattern between those... So one might do something like
perl -pe '$f=1 if(/\{/); s/text/TEXT/ if(defined($f)); undef($f) if(defined($f) && /\}/);' aaa
(which might actually fail if you have a line "skaalas} saas text jkas{jsdksd")
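
One way around that line-boundary problem is to track the brace state per character rather than per line. The following is a minimal Python sketch of that idea, under the same simplification used in this thread (a { switches replacing on, a } switches it off, nesting is ignored). The function name replace_between_braces is hypothetical.

```python
def replace_between_braces(text, old, new):
    """Replace `old` with `new` only between a '{' and the next '}'."""
    if not old:
        return text  # nothing to search for
    out = []
    inside = False  # True while we are between '{' and '}'
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "{":
            inside = True
        elif ch == "}":
            inside = False
        if inside and text.startswith(old, i):
            out.append(new)
            i += len(old)  # skip past the matched text
            continue
        out.append(ch)
        i += 1
    return "".join(out)

print(replace_between_braces("skaalas} saas text jkas{jsdksd text", "text", "TEXT"))
```

Because the whole input is handled as one string, it does not matter whether the braces sit on the same line or several lines apart.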

-- Glenn
 
LVL 38

Expert Comment

by:yuzh
ID: 12004097
Good point Glenn, thanks for the correction!
 

Author Comment

by:amadataset
ID: 12004688
OK

Here's an update to the question.

Technically the above should work, so it's probably just a problem with my port of sed.  Not that it matters.  Newlines in rich-text files are purely optional - they have no effect on the Rich Text Format file whatsoever, so I can remove all the newlines in the file and just use sed that way.

However, this still doesn't solve the problem of finding and replacing only the words themselves within the RTF file (none of the formatting codes etc.), which was the essence of this problem.

QUOTE ------------>
     I really want some sed script code (or otherwise) that can search through just
     the bulk of the actual printed text, and replace with specific words.  Just searching
     and replacing outright on an RTF file is a bad idea, as you can end up searching and
     replacing the RTF codes you don't want to touch.
<-------------------
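
A rough way to touch only the printed text is to split the file into RTF tokens (control words, escapes, braces) and plain-text runs, and substitute only in the text runs. The Python sketch below illustrates this under loud assumptions: the token pattern is a simplification rather than a complete RTF grammar, text inside destinations such as the font table is still treated as ordinary text, and a phrase that is interrupted by a control word will not be matched. The names RTF_TOKEN and replace_in_text are hypothetical.

```python
import re

# Simplified token pattern (an assumption, not the full RTF grammar):
#   control words such as \b, \fs24, \li-720 (plus one optional delimiting space),
#   hex escapes such as \'e9, control symbols such as \{ or \\, and bare braces.
RTF_TOKEN = re.compile(r"(\\[a-zA-Z]+-?[0-9]* ?|\\'[0-9a-fA-F]{2}|\\[^a-zA-Z]|[{}])")

def replace_in_text(rtf, old, new):
    """Replace `old` with `new` in text runs only, leaving RTF tokens untouched."""
    # With one capturing group, re.split alternates: text, token, text, token, ...
    parts = RTF_TOKEN.split(rtf)
    return "".join(
        part.replace(old, new) if index % 2 == 0 else part
        for index, part in enumerate(parts)
    )

print(replace_in_text(r"{\b par}\par", "par", "X"))
```

Here the word "par" in the text is replaced while the control word \par is left as it was, which is the failure mode the quoted paragraph warns about.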

Thanks for the input though...

-Matt
 
LVL 20

Expert Comment

by:Gns
ID: 12005151
Ah, we're fighting the "lineorientationed-ness" of the tools, sort of... With sed, we can use "Newlines in rich-text files are purely optional" to insert some well-placed newlines... Like:
---------------------------------------------------------
#### Indata
$ cat aaa
kjashjkasdhdjklhaskldaskl
söajdkl text asjdlkjaskl{ ldfkjgkldfj text lslösdflösd}lkdfstextljfkals
söajdklasjdlkjaskl{ ldfkjgkldfj text lslösdflösdlkdfsljfkals
söajdklasjdlkjaskl ldfkjgkldfj text lslösdflösd}lkdfsljfkals
kjashjkasdhdjklhaskldaskl
kjashjkasdhdjklhas text kldaskl
söajdklasjdlkjaskl{ ldfkjgkldfj text lstextlösdflösdlkdfsljfkals
söajdklasjdlkjaskl ldfkjgkldfj text lslösdftextlösdlkdfsljfkals
söajdklasjdlkjaskl{ ldfkjg}kldfj text lslösdflösd}lkdfsljfkals

#### Bad one, look at the first line.
$ sed -e '/{/,/}/ s/text/TEXT/g' aaa
kjashjkasdhdjklhaskldaskl
söajdkl TEXT asjdlkjaskl{ ldfkjgkldfj TEXT lslösdflösd}lkdfsTEXTljfkals
söajdklasjdlkjaskl{ ldfkjgkldfj TEXT lslösdflösdlkdfsljfkals
söajdklasjdlkjaskl ldfkjgkldfj TEXT lslösdflösd}lkdfsljfkals
kjashjkasdhdjklhaskldaskl
kjashjkasdhdjklhas text kldaskl
söajdklasjdlkjaskl{ ldfkjgkldfj TEXT lsTEXTlösdflösdlkdfsljfkals
söajdklasjdlkjaskl ldfkjgkldfj TEXT lslösdfTEXTlösdlkdfsljfkals
söajdklasjdlkjaskl{ ldfkjg}kldfj TEXT lslösdflösd}lkdfsljfkals

#### "good" one
$ sed -e 's/{/\
{\
/; s/}/\
}\
/' aaa | sed -e '/{/,/}/ s/text/TEXT/g'
kjashjkasdhdjklhaskldaskl
söajdkl text asjdlkjaskl
{
 ldfkjgkldfj TEXT lslösdflösd
}
lkdfstextljfkals
söajdklasjdlkjaskl
{
 ldfkjgkldfj TEXT lslösdflösdlkdfsljfkals
söajdklasjdlkjaskl ldfkjgkldfj TEXT lslösdflösd
}
lkdfsljfkals
kjashjkasdhdjklhaskldaskl
kjashjkasdhdjklhas text kldaskl
söajdklasjdlkjaskl
{
 ldfkjgkldfj TEXT lsTEXTlösdflösdlkdfsljfkals
söajdklasjdlkjaskl ldfkjgkldfj TEXT lslösdfTEXTlösdlkdfsljfkals
söajdklasjdlkjaskl
{
 ldfkjg
}
kldfj text lslösdflösd}lkdfsljfkals
-------------------------------------------------------------------------
Now, with a more proper Perl program we could do this without inserting newlines... Let me frob/tweak a bit and I'll get back to you.

-- Glenn
 
LVL 20

Expert Comment

by:Gns
ID: 12005696
Oh so very crude, but working:-). Large files put a bit of a load on the memory:-):-)...

$ cat aaa
kjashjkasdhdjklhaskldaskl
söajdkl text asjdlkjaskl{ ldfkjgkldfj text lslösdflösd}lkdfstextljfkals
söajdklasjdlkjaskl{ ldfkjgkldfj text lslösdflösdlkdfsljfkals
söajdklasjdlkjaskl ldfkjgkldfj text lslösdflösd}lkdfsljfkals
kjashjkasdhdjklhaskldaskl
kjashjkasdhdjklhas text kldaskl
söajdklasjdlkjaskl{ ldfkjgkldfj text lstextlösdflösdlkdfsljfkals
söajdklasjdlkjaskl ldfkjgkldfj text lslösdftextlösdlkdfsljfkals
söajdklasjdlkjaskl{ ldfkjg}kldfj text lslösdflösd}lkdfsljfkals

$ cat aa.pl
#!/usr/bin/perl -0
$match = "text";
$replace = "TEXT";
open(R,aaa);
$str=<R>;
@s=split(//,$str);
$f=0;
for($i=0;$i<=$#s;$i++)
{
  if($s[$i] =~ /{/) { $f=1; };
  if($s[$i] =~ /}/) { $f=0; };
  if(($f == 1) and ($s[$i] eq 't')) {
    $tmp=substr($str,$i,length($match));
    if($tmp eq $match) {
      print $replace;
      $i+=length($match)-1;
      next;
    }
  }
  print $s[$i];
}


$ ./aa.pl
kjashjkasdhdjklhaskldaskl
söajdkl text asjdlkjaskl{ ldfkjgkldfj TEXT lslösdflösd}lkdfstextljfkals
söajdklasjdlkjaskl{ ldfkjgkldfj TEXT lslösdflösdlkdfsljfkals
söajdklasjdlkjaskl ldfkjgkldfj TEXT lslösdflösd}lkdfsljfkals
kjashjkasdhdjklhaskldaskl
kjashjkasdhdjklhas text kldaskl
söajdklasjdlkjaskl{ ldfkjgkldfj TEXT lsTEXTlösdflösdlkdfsljfkals
söajdklasjdlkjaskl ldfkjgkldfj TEXT lslösdfTEXTlösdlkdfsljfkals
söajdklasjdlkjaskl{ ldfkjg}kldfj text lslösdflösd}lkdfsljfkals

$

Enjoy
-- Glenn
 
LVL 20

Accepted Solution

by:
Gns earned 500 total points
ID: 12005764
Argh. The script should've been:
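#!/usr/bin/perl -0
# -0 sets the input record separator to NUL, so <R> reads the whole file at once.
$match = "text";
$replace = "TEXT";
$firstchar = substr($match,0,1);
open(R, "<", "aaa") or die "cannot open aaa: $!";
$str=<R>;
close(R);
@s=split(//,$str);
$f=0;   # 1 while we are between a { and the next }
for($i=0;$i<=$#s;$i++)
{
  if($s[$i] eq '{') { $f=1; }
  if($s[$i] eq '}') { $f=0; }
  # Strings must be compared with eq; == would compare them numerically.
  if(($f == 1) and ($s[$i] eq $firstchar)) {
    $tmp=substr($str,$i,length($match));
    if($tmp eq $match) {
      print $replace;
      $i+=length($match)-1;
      next;
    }
  }
  print $s[$i];
}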

Sorry for that .... (the hardcoded 't' compare...).

-- Glenn
 

Author Comment

by:amadataset
ID: 12006189
This is, by far, not the most elegant of solutions, but probably the only practical way.  Well done, Glenn.

FYI:  I am giving up on searching and replacing in RTF files, as they are a maze of codes, making the task near-impossible without some long-winded program I don't have time to create.

- Matt