Solved

How do you find and replace just the text in Rich Text Format files, using sed, ignoring useless formatting codes?

Posted on 2004-09-07
Last Modified: 2010-04-21
Here is an outline of the problem:

I am building an application to search and replace rich text files, for example, being able to search for some text in bold formatting and being able to replace it with text in italic formatting.

I looked into using sed addressing to solve the problem, whereby sed searches only between curly brackets { and } within the rich text document file, ***but this doesn't work where the curly brackets span multiple lines***.  For example, the sed script would look like the following:

/{/,/}/{
s/search text/replace text/g
}

I am using a Windows port of the Unix utility sed, but I think that should make little difference.  I am building the application itself in the Windows environment; I posted this question in the UNIX area because sed is a Unix tool, and the solution to my problem will likely involve it.

I really want some sed script code (or otherwise) that can search through just the bulk of the actual printed text and replace specific words.  Searching and replacing outright on an RTF file is a bad idea, as you can end up searching and replacing RTF codes you don't want to touch.

Any suggestions, ideas, or new approaches to this problem?

Thanks,
Matt (ANSI C++/ANSI C/VB Programmer)
Question by:amadataset
10 Comments
 

Expert Comment

by:Gns
ID: 11996374
Hm, strange. With GNU sed it works as expected....
With file aaa:
$ cat aaa
kjashjkasdhdjklhaskldaskl
söajdklasjdlkjaskl{ ldfkjgkldfj text lslösdflösd}lkdfsljfkals
söajdklasjdlkjaskl{ ldfkjgkldfj text lslösdflösdlkdfsljfkals
söajdklasjdlkjaskl ldfkjgkldfj text lslösdflösd}lkdfsljfkals
kjashjkasdhdjklhaskldaskl
kjashjkasdhdjklhas text kldaskl
söajdklasjdlkjaskl{ ldfkjgkldfj text lstextlösdflösdlkdfsljfkals
söajdklasjdlkjaskl ldfkjgkldfj text lslösdftextlösdlkdfsljfkals
söajdklasjdlkjaskl{ ldfkjgkldfj text lslösdflösd}lkdfsljfkals
$ sed -e '/{/,/}/ s/text/TEXT/g' aaa
kjashjkasdhdjklhaskldaskl
söajdklasjdlkjaskl{ ldfkjgkldfj TEXT lslösdflösd}lkdfsljfkals
söajdklasjdlkjaskl{ ldfkjgkldfj TEXT lslösdflösdlkdfsljfkals
söajdklasjdlkjaskl ldfkjgkldfj TEXT lslösdflösd}lkdfsljfkals
kjashjkasdhdjklhaskldaskl
kjashjkasdhdjklhas text kldaskl
söajdklasjdlkjaskl{ ldfkjgkldfj TEXT lsTEXTlösdflösdlkdfsljfkals
söajdklasjdlkjaskl ldfkjgkldfj TEXT lslösdfTEXTlösdlkdfsljfkals
söajdklasjdlkjaskl{ ldfkjgkldfj TEXT lslösdflösd}lkdfsljfkals
$
... Which is equivalent to what you're trying to do. If your sed implementation doesn't want to play, get the cygwin one from http://www.cygwin.com ... That is GNU sed...

-- Glenn

Expert Comment

by:Tintin
ID: 12001057
Using Perl would be more portable.

Expert Comment

by:yuzh
ID: 12002353
If you have Perl installed (most systems have it these days), you can do:


perl -i -pe "s/oldstr/newstr/" $file

You can put it in a shell script if you want.

Expert Comment

by:Gns
ID: 12004004
Perl would be fine... as portable as GNU sed, in a way:-):-)
Note the example above, Greg: we need to find the starting { and the ending } and only replace the RE pattern between them. So one might do something like
perl -pe '$f=1 if(/{/); s/text/TEXT/ if(defined($f)); undef($f) if(defined($f) && /}/);' aaa
(which might actually fail if you have a line "skaalas} saas text jkas{jsdksd")

-- Glenn

Expert Comment

by:yuzh
ID: 12004097
Good point Glenn, thanks for the correction!

Author Comment

by:amadataset
ID: 12004688
OK

Here's an update to the question.

Technically the above should work, so it's probably just a problem with my port of sed.  Not that it matters.  Newlines in rich-text files are purely optional (they have no effect on the Rich Text Format file whatsoever), so I can remove all the newlines in the file and just use sed that way.

However, this still doesn't solve the problem of finding and replacing only the words themselves within the RTF file (none of the formatting codes etc.), which is the essence of this problem.

QUOTE ------------>
     I really want some sed script code (or otherwise) that can search through just
     the bulk of the actual printed text, and replace with specific words.  Just searching
     and replacing outright on an RTF file is a bad idea, as you can end up searching and
     replacing RTF codes you don't want to touch.
<-------------------

Thanks for the input though...

-Matt
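
A note on the newline idea above, offered as a rough sketch rather than a tested recipe. As far as I understand the RTF specification, a raw newline is otherwise ignored but still acts as the delimiter that ends a control word, so deleting newlines outright can glue a control word such as \par onto the text that follows it. The hypothetical helper below (join_rtf_lines is a made-up name, not something from this thread) appends a space to any line that ends in a control word and only then removes the line breaks. It assumes a backslash is never the last character on a line (RTF reads backslash-newline as \par) and it does not try to recognise an escaped backslash in front of a word.

```shell
# join_rtf_lines: read RTF on stdin, write it as a single line on stdout.
# Hypothetical helper; a sketch under the assumptions stated above.
join_rtf_lines() {
  # 1. Drop carriage returns so DOS and Unix line endings look the same.
  # 2. If a line ends in a control word (backslash, letters, optional
  #    "-" and digits), append a space to serve as its delimiter.
  # 3. Delete the newlines.
  tr -d '\r' |
    sed 's/\(\\[A-Za-z][A-Za-z]*-\{0,1\}[0-9]*\)$/\1 /' |
    tr -d '\n'
}
```

Usage would be something like: join_rtf_lines < in.rtf > joined.rtf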

Expert Comment

by:Gns
ID: 12005151
Ah, we're fighting the line-orientedness of the tools, sort of... With sed, we can use "Newlines in rich-text files are purely optional" to insert some well-placed newlines... Like:
---------------------------------------------------------
#### Indata
$ cat aaa
kjashjkasdhdjklhaskldaskl
söajdkl text asjdlkjaskl{ ldfkjgkldfj text lslösdflösd}lkdfstextljfkals
söajdklasjdlkjaskl{ ldfkjgkldfj text lslösdflösdlkdfsljfkals
söajdklasjdlkjaskl ldfkjgkldfj text lslösdflösd}lkdfsljfkals
kjashjkasdhdjklhaskldaskl
kjashjkasdhdjklhas text kldaskl
söajdklasjdlkjaskl{ ldfkjgkldfj text lstextlösdflösdlkdfsljfkals
söajdklasjdlkjaskl ldfkjgkldfj text lslösdftextlösdlkdfsljfkals
söajdklasjdlkjaskl{ ldfkjg}kldfj text lslösdflösd}lkdfsljfkals

#### Bad one, look at the first line.
$ sed -e '/{/,/}/ s/text/TEXT/g' aaa
kjashjkasdhdjklhaskldaskl
söajdkl TEXT asjdlkjaskl{ ldfkjgkldfj TEXT lslösdflösd}lkdfsTEXTljfkals
söajdklasjdlkjaskl{ ldfkjgkldfj TEXT lslösdflösdlkdfsljfkals
söajdklasjdlkjaskl ldfkjgkldfj TEXT lslösdflösd}lkdfsljfkals
kjashjkasdhdjklhaskldaskl
kjashjkasdhdjklhas text kldaskl
söajdklasjdlkjaskl{ ldfkjgkldfj TEXT lsTEXTlösdflösdlkdfsljfkals
söajdklasjdlkjaskl ldfkjgkldfj TEXT lslösdfTEXTlösdlkdfsljfkals
söajdklasjdlkjaskl{ ldfkjg}kldfj TEXT lslösdflösd}lkdfsljfkals

#### "good" one
$ sed -e 's/{/\
{\
/; s/}/\
}\
/' aaa | sed -e '/{/,/}/ s/text/TEXT/g'
kjashjkasdhdjklhaskldaskl
söajdkl text asjdlkjaskl
{
 ldfkjgkldfj TEXT lslösdflösd
}
lkdfstextljfkals
söajdklasjdlkjaskl
{
 ldfkjgkldfj TEXT lslösdflösdlkdfsljfkals
söajdklasjdlkjaskl ldfkjgkldfj TEXT lslösdflösd
}
lkdfsljfkals
kjashjkasdhdjklhaskldaskl
kjashjkasdhdjklhas text kldaskl
söajdklasjdlkjaskl
{
 ldfkjgkldfj TEXT lsTEXTlösdflösdlkdfsljfkals
söajdklasjdlkjaskl ldfkjgkldfj TEXT lslösdfTEXTlösdlkdfsljfkals
söajdklasjdlkjaskl
{
 ldfkjg
}
kldfj text lslösdflösd}lkdfsljfkals
-------------------------------------------------------------------------
Now, with a more proper Perl program we could do this without inserting newlines... Let me frob/tweak a bit and I'll get back to you.

-- Glenn
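
One observation on the "good" command above: the splitting step has no g flag, so only the first { and the first } on each line are moved onto lines of their own, which is why the last output line still carries an unsplit }. Below is a minimal sketch of the same idea with g added, wrapped in a hypothetical function (replace_in_braces is a made-up name). Its two arguments are pasted directly into the sed script, so characters that are special to sed (such as /, &, . or *) would need escaping by the caller. Like the original, it ignores nested braces, leaves the extra newlines in the output, and would mangle RTF's escaped braces \{ and \}.

```shell
# replace_in_braces PATTERN REPLACEMENT
# Reads stdin, writes stdout. Hypothetical helper, not from the thread.
replace_in_braces() {
  # Step 1: put every { and } on a line of its own (note the g flag).
  # Step 2: substitute only inside line ranges that run from { to }.
  sed -e 's/[{}]/\
&\
/g' | sed -e "/{/,/}/ s/$1/$2/g"
}
```

For the sample file this would be run as: replace_in_braces text TEXT < aaa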

Expert Comment

by:Gns
ID: 12005696
Oh so very crude, but working:-). Large files put a bit of a load on memory, since the whole file is read in at once:-):-)...

$ cat aaa
kjashjkasdhdjklhaskldaskl
söajdkl text asjdlkjaskl{ ldfkjgkldfj text lslösdflösd}lkdfstextljfkals
söajdklasjdlkjaskl{ ldfkjgkldfj text lslösdflösdlkdfsljfkals
söajdklasjdlkjaskl ldfkjgkldfj text lslösdflösd}lkdfsljfkals
kjashjkasdhdjklhaskldaskl
kjashjkasdhdjklhas text kldaskl
söajdklasjdlkjaskl{ ldfkjgkldfj text lstextlösdflösdlkdfsljfkals
söajdklasjdlkjaskl ldfkjgkldfj text lslösdftextlösdlkdfsljfkals
söajdklasjdlkjaskl{ ldfkjg}kldfj text lslösdflösd}lkdfsljfkals

$ cat aa.pl
#!/usr/bin/perl -0
$match = "text";
$replace = "TEXT";
open(R,aaa);
$str=<R>;
@s=split(//,$str);
$f=0;
for($i=0;$i<=$#s;$i++)
{
  if($s[$i] =~ /{/) { $f=1; };
  if($s[$i] =~ /}/) { $f=0; };
  if(($f == 1) and ($s[$i] eq 't')) {
    $tmp=substr($str,$i,length($match));
    if($tmp eq $match) {
      print $replace;
      $i+=length($match)-1;
      next;
    }
  }
  print $s[$i];
}


$ ./aa.pl
kjashjkasdhdjklhaskldaskl
söajdkl text asjdlkjaskl{ ldfkjgkldfj TEXT lslösdflösd}lkdfstextljfkals
söajdklasjdlkjaskl{ ldfkjgkldfj TEXT lslösdflösdlkdfsljfkals
söajdklasjdlkjaskl ldfkjgkldfj TEXT lslösdflösd}lkdfsljfkals
kjashjkasdhdjklhaskldaskl
kjashjkasdhdjklhas text kldaskl
söajdklasjdlkjaskl{ ldfkjgkldfj TEXT lsTEXTlösdflösdlkdfsljfkals
söajdklasjdlkjaskl ldfkjgkldfj TEXT lslösdfTEXTlösdlkdfsljfkals
söajdklasjdlkjaskl{ ldfkjg}kldfj text lslösdflösd}lkdfsljfkals

$

Enjoy
-- Glenn

Accepted Solution

by:
Gns earned 500 total points
ID: 12005764
Argh. The script should've been:
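#!/usr/bin/perl -0
# -0 sets the input record separator to NUL, so <R> reads the whole file at once.
$match = "text";
$replace = "TEXT";
$firstchar = substr($match,0,1);
open(R,"aaa") or die "cannot open aaa: $!";
$str=<R>;
close(R);
@s=split(//,$str);   # one array element per character
$f=0;                # 1 while we are between { and }
for($i=0;$i<=$#s;$i++)
{
  if($s[$i] eq '{') { $f=1; }
  if($s[$i] eq '}') { $f=0; }
  # Strings must be compared with eq; == would compare both sides as numbers.
  if(($f == 1) and ($s[$i] eq $firstchar)) {
    $tmp=substr($str,$i,length($match));
    if($tmp eq $match) {
      print $replace;
      $i+=length($match)-1;   # skip the rest of the matched text
      next;
    }
  }
  print $s[$i];
}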

Sorry for that .... (the hardcoded 't' compare...).

-- Glenn

Author Comment

by:amadataset
ID: 12006189
This is, by far, not the most elegant of solutions, but probably the only practical way.  Well done, Glenn.

FYI:  I am giving up on searching and replacing in RTFs, as they are a maze of codes, making the task near-impossible without some long-winded program I don't have time to create.

- Matt