Solved

sed script needed

Posted on 2001-09-05
Last Modified: 2012-08-14
(It can be done with other Unix/Linux utilities such as awk, nawk, etc.)
I need a script that takes a cpp file (or more than one, such as *.cpp) from the command line and does the following:
1. Adds an <endline> at the end of the file (if it does not already end with one).
2. Does "dos2unix" (I need batch processing too, and dos2unix itself does not support more than one input file).
3. Puts new "hard line breaks" in each line.
4. If one of my includes is written as <dds/DDSArray.h>, replaces it with the same path in "" instead of <>
(standard includes remain in <>).


Gad
Question by:gadh98
 
LVL 51

Expert Comment

by:ahoffmann
ID: 6458941
1. Do you mean adding the literal string   <endline>  as the last line of the file?
3. What are "hard line breaks" for you? (Which hex code?)
4. Do your includes have a unique pattern?
0
 
LVL 5

Expert Comment

by:moonbeam012200
ID: 6459217
I'm a bit confused about the "<endline>" as well, but the
following perl script should do the trick.

-->cat cvt
#!/usr/bin/perl -w -i-orig
#
# program to convert dos formatted cpp files and
# perform include file syntax replacements.
#
###
while (<>) {
    $_ =~ y;\r;;d;
    $_ =~ s/<(.*)>/"$1"/;
    print;
}

To run this script enter:

cvt *.cpp

The original cpp files will be retained with "-orig" appended to the filename.

0
 
LVL 5

Expert Comment

by:moonbeam012200
ID: 6459225
Doh! netscape dupped my paste. Well, you get the idea.

0
 
LVL 5

Expert Comment

by:moonbeam012200
ID: 6459243
oops... forgot to account for leaving "standard" includes alone. here is a new version.

-->cat cvt
#!/usr/bin/perl -w -i-orig
#
# program to convert dos formatted cpp files and
# perform include file syntax replacements.
#
###
while (<>) {
    $_ =~ y;\r;;d;
    $_ =~ s/<(.*\/.*)>/"$1"/;
    print;
}

0
 

Author Comment

by:gadh98
ID: 6459403
moonbeam: I'll check your comment right away.
ahoffmann and moonbeam:
When I say <endline> I mean erasing the current <LF> (if I remember right, this is the line ending in Unix/Linux files)
and writing it again. Why?
Because I saw that when I compile cpp code with the Solaris Forte compiler, if I had two lines written in Windows like this:

#include <dds/DDSValue.h>
#include <dds/DDSArray.h>

sometimes the compiler ignored the second line!
Only after I inserted a newline between the two did it recognize it.
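
The "<endline>" step described here (make sure the file ends with a line feed) could be sketched in POSIX shell roughly as follows. This is only an illustration and was not posted in the thread; the function name ensure_final_newline is made up, and it assumes a tail that supports -c.

```shell
#!/bin/sh
# Hypothetical helper: append a line feed to a file only when its last
# byte is not already a line feed. Empty files are left untouched.
ensure_final_newline() {
    file=$1
    [ -s "$file" ] || return 0        # nothing to do for an empty file
    # Command substitution strips a trailing newline, so the result is
    # empty when the last byte of the file is a line feed.
    if [ -n "$(tail -c 1 "$file")" ]; then
        printf '\n' >> "$file"
    fi
}
```

Calling ensure_final_newline foo.cpp a second time changes nothing, so it should be safe to run over the same tree repeatedly.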


0
 
LVL 5

Expert Comment

by:moonbeam012200
ID: 6459802
A line feed (LF) is not used as a Unix EOF. Generally, applications will use <EOD> (^D) as an "end of document".

0
 
LVL 51

Expert Comment

by:ahoffmann
ID: 6460126
UNIX has no End-Of-Document (^D) inside files
0
 
LVL 5

Expert Comment

by:garboua
ID: 6464138
C and C++ compilers treat source as free-form; apart from preprocessor lines such as #include, you can write all your code on one long line and the compiler will recognize it.
You can use the C shell:
$foreach fn (*.cpp)
foreach> cp -f $fn tmpcat
foreach> dos2unix tmpcat >$fn
foreach> cp -f $fn tmpcat
foreach> tr -d "\000"<tmpcat>$fn
foreach> cp -f $fn tmpcat
foreach> sed 's/<\(pattern\)>/"\1"/g' tmpcat >$fn
foreach> end
This will go through all your cpp files,
run dos2unix on them,
strip the null characters left at the ends of lines and at the end of the file by the conversion, and
replace <pattern.h> with "pattern.h".
You can also write a file with all the patterns and substitute sed 's/<\(pattern\)>/"\1"/g' tmpcat >$fn
with sed -f sedfile tmpcat >$fn
where the sed file contains all the patterns, one per line and without shell quoting:
s/<\(pattern\)>/"\1"/g
s/<\(pattern2\)>/"\1"/g
etc.
You can run these commands from the command line or put them in a script file.
You can also take the file list from a tar or zip archive, generating it with a command such as
foreach fn (`gzip -cd som.tar.gz| tar xvf -`)
which unzips and untars the archive and passes the extracted file names to the loop, which cleans up the files and writes them back.
 
0
 
LVL 51

Expert Comment

by:ahoffmann
ID: 6464884
gadh98, would you like to give more information (see the very first comment)? Otherwise you'll get incomplete answers.
0
 

Author Comment

by:gadh98
ID: 6473528
Yes, I'll clarify a bit.
First, thanks for all your efforts.

To ahoffman: your script is not working. This is the error:

./cvt: --: command not found
./cvt: line 8: syntax error near unexpected token `(<>)'
./cvt: line 8: `while (<>) {'


To ahoffman's first comment:

1. I mean like pressing <enter> manually.
2. I do not know the hex codes.
3. My include patterns start and end with angle brackets <> and must have a slash (/) inside.

This is similar to some of the standard includes, like <sys/time.h> for example, so you have to tell them apart.
So I suggest that if you see 'sys/', ignore it, while in all other cases in which you see <.../....> - change it to "".

(If you know other USEFUL include dirs besides sys/, let me know.)

garboua: it would be better if you edited your text so I can just run it...

Gad
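
The rule suggested in this comment (leave <sys/...> alone, put every other <dir/file> include in quotes) might look like the sed filter below. It is a sketch only: the function name quote_local_includes is invented, sys is the only system directory it knows about, and it deliberately touches #include lines only so that expressions such as a < b/c > d are not rewritten.

```shell
#!/bin/sh
# Hypothetical filter: reads C++ source on stdin, writes it to stdout.
# On #include lines only, <dir/file> becomes "dir/file" unless dir is sys.
quote_local_includes() {
    # 1st expression: a <sys/...> include branches to the end of the
    #                 script, so the line is printed unchanged.
    # 2nd expression: any other include with a slash inside <> is quoted.
    sed -e '/^[[:space:]]*#[[:space:]]*include[[:space:]]*<sys\//b' \
        -e 's|^\([[:space:]]*#[[:space:]]*include[[:space:]]*\)<\([^>/]*/[^>]*\)>|\1"\2"|'
}
```

It could be used as quote_local_includes < old.cpp > new.cpp; includes without a slash, such as <string.h>, pass through untouched.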
0
 

Author Comment

by:gadh98
ID: 6473535
and one more comment: i use BASH shell.
0
 
LVL 51

Expert Comment

by:ahoffmann
ID: 6473906
In your question you write:
  3. do a new "hard line breaks" in each line
In your last comment (assuming 2. equals 3.):
  2. i do not know the hex codes.

It's really very hard to imagine what you want if you
   a. cannot tell us what is
   b. cannot tell us what should be
And also: how do you identify the position where a <hard break> should occur, and how should a program identify it? If there is no special character there at the moment (like octal 012 or 015, i.e. LF or CR), it cannot be done.

Could you please give exact answers to questions 1. and 3. (numbering according to your question) in my very first comment?

About the includes you wrote (in the previous comment):
   while in all other cases in which you see <.../....> - change it to ""

Does this mean that all includes of the form:
   #include <file.h>
are to be treated as system includes, meaning you don't have any includes in . but only in subdirectories?
BTW, there are dozens of subdirectories in /usr/include other than sys, for example net and nfs.
   
0
 

Author Comment

by:gadh98
ID: 6474219
About the "hard line break":
Sorry for the misunderstanding.
I'll start from the beginning. My problem is that I edit my files in Windows, so they are kept in PC format, while I compile and run them on Unix/Linux. The main problem is that they compile wrong, because some include lines are not recognized by the compiler, and my executables cannot be executed.
My solutions so far were dos2unix - which is not working, or is doing something that won't solve this problem - or opening the files in an advanced editor in Windows, like "TextPad", and saving them in Unix format.
The last solution is almost enough, but even after it I still get some compilation errors, so I have to insert a line break just before the "disappearing" line in the code for the compiler to recognize it.

What I'm looking for now is a batch utility/script that will do this automatically for me.
I still do not know exactly how to do this, which is the reason I'm telling you this whole story.
If you still do not know what to do, let's say that I will be satisfied if you convert the files to Unix format and also add a simple line break between the include lines. (I'm not sure this is the appropriate solution but, as I said before, I do not know the exact reason for this problem.)


About the include pattern:
<file.h> is to be treated as a system include.

About the BTW:
I saw 25 subdirectories in my Solaris /usr/include.
If you can add all of them to the pattern rules, I'll be glad. Please also make the pattern modular, so that one variable holds all the subdirectory names and I can change it to my needs.

If you can do also the same var. to MY dirs - it'll be perfect. For example:
<dds/file.h>
<poem/file.h>
these are my dirs
while:
<sys/file.h>
is a system dir

And for this I'll give more points, of course.

For all the guys working on a solution:
Please write your script with comments, so everyone can understand it, and make it ONE script that does all this, or one script for each operation.
I will not accept just general instructions with 1-2 lines that I have to alter to my needs.

I need a whole solution...
0
 

Author Comment

by:gadh98
ID: 6474228
One more: I mean one variable for my dirs and one variable for system dirs, of course.
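
The "one variable for my dirs, one variable for system dirs" idea could be sketched in shell like this. Everything here is an assumption made for illustration: the variable names SYSTEM_DIRS and MY_DIRS, the function names, and the sample directory lists. Directory names are assumed to contain no regular-expression metacharacters (so a name like c++ would need escaping first).

```shell
#!/bin/sh
# Both lists are placeholders; edit them to match the real trees.
SYSTEM_DIRS="sys net nfs"   # first path component that must stay in <>
MY_DIRS="dds poem"          # first path component that must move to ""

# Emit one sed command per directory name.
build_sed_script() {
    for d in $SYSTEM_DIRS; do
        # system include: branch to end of script, line stays unchanged
        printf '/^[[:space:]]*#[[:space:]]*include[[:space:]]*<%s\\//b\n' "$d"
    done
    for d in $MY_DIRS; do
        # own include: replace the angle brackets with double quotes
        printf 's|^\\([[:space:]]*#[[:space:]]*include[[:space:]]*\\)<\\(%s/[^>]*\\)>|\\1"\\2"|\n' "$d"
    done
}

# Filter: C++ source on stdin, rewritten source on stdout.
rewrite_includes() {
    sed -e "$(build_sed_script)"
}
```

Because the system rules come first, a name that ends up in both lists is treated as a system directory. Includes from directories in neither list are left as they are.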
0
 

Author Comment

by:gadh98
ID: 6474234
I do not know why, but I cannot raise the points to 400 ...
0
 

Author Comment

by:gadh98
ID: 6474263
sorry about the dos2unix. now i know how to work  with it:
dos2unix <source> <destination>

and have to enter a destination - otherwise it will not work ! (there is no default dest. like i thought)

(i'm on Solaris 7)
0
 
LVL 51

Expert Comment

by:ahoffmann
ID: 6474296
ok, now I'm starting to understand it too ;-)

> .. dos2unix - which is not working
This is usually exactly what you need.
It does all the conversion of line feeds and carriage returns.

If it does not, could you please tell me which OS, which version of the OS, and which dos2unix?
I only know of a problem with DOS' EOF character, ^Z, which is sometimes not converted, but this should not cause your problems.
Also, if the compiler fails, could you please post the message? I cannot believe that this is caused by the DOS-versus-UNIX format problem; a compiler usually reads a byte stream.

If you have perl (either on M$ or UNIX), you may convert like:
    perl -pe 'BEGIN{ $/="\015\012"; $\="\012"} chomp' dosfile >unixfile


> .. like "TextPad",

I highly recommend using such an editor; you'll avoid a lot of other problems. TextPad works for me.

> .. insert a line break just before the "disappearing" line in the code
ok, that's the disadvantage of a byte stream ;-)


About the includes and the (more or less) complete script, I'll work on it. Do you have perl on Solaris? (Solaris usually comes with at least perl 4.0.036.)


Just a question about the current format of the files as you have them before anything is done on UNIX: could you please run

   od -c file|head

You'll see the first few lines of your file printed as characters, with the unprintable characters shown like \n or \r, etc.
I just need to know what the current line break is; it is probably  \r\n
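
Besides reading the od -c output by eye, the same check can be scripted. The sketch below is an illustration only (the function name line_ending_kind is made up): it reports "dos" when at least one line ends in a carriage return and "unix" otherwise.

```shell
#!/bin/sh
# Hypothetical check: print "dos" if any line of the file ends with a
# carriage return (i.e. the line break is \r\n), otherwise print "unix".
line_ending_kind() {
    cr=$(printf '\r')                 # a literal carriage return
    if grep -q "${cr}\$" "$1"; then   # CR immediately before end of line
        echo dos
    else
        echo unix
    fi
}
```

For example, line_ending_kind main.cpp could be run before and after a conversion to confirm that it actually changed the file.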
0

 

Author Comment

by:gadh98
ID: 6476818
OK, dos2unix is working, and I can also do batch converting using a for...do loop.

Yes, I have perl, but the initial script you gave still does not work. I suppose you have to fill the <> with a real example. Please do, and I'll change the value to mine.
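
A for...do batch conversion of the kind mentioned here might look like the sketch below. It is not the loop the poster used (that was never shown); it uses tr instead of dos2unix because dos2unix options differ between systems, and the function name to_unix and the -orig backup suffix (borrowed from the perl -i-orig script earlier in the thread) are choices made for this example.

```shell
#!/bin/sh
# Hypothetical batch converter: for every file named on the command line,
# delete carriage returns and DOS ^Z (octal 032) end-of-file markers.
# The untouched original is kept as FILE-orig; running the function again
# overwrites that backup.
to_unix() {
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "skipping $f: not a regular file" >&2
            continue
        fi
        cp -p "$f" "$f-orig" || return 1
        tr -d '\r\032' < "$f-orig" > "$f" || return 1
    done
}
```

Typical use would be to_unix *.cpp from the source directory. Note that tr removes every carriage return and ^Z, not only those at line ends, which is normally harmless for source code.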


0
 
LVL 51

Expert Comment

by:ahoffmann
ID: 6477405
gadh98, do you mean me by "the initial script you gave do not work."?
I didn't give a script in this thread.
0
 
LVL 5

Expert Comment

by:moonbeam012200
ID: 6477553
I posted the script. From the error output it looks like you included the "-->cat cvt" line in the script. The script should start with the line "#!/usr/bin/perl -w -i-orig". Depending on where you have perl installed, you might need to edit that line. Enter "type perl" to see where perl is installed on your system.
0
 
LVL 51

Expert Comment

by:ahoffmann
ID: 6477572
moonbeam, as you gave the basic perl script, I think it's up to you to add the loop around the files (probably command args) and a hash for the includes that must not be changed, and then collect the points ;-)
0
 

Author Comment

by:gadh98
ID: 6477651
o'k, i'll try moonbeam's comment.

BTW, how can i raise the points above 300 ?
0
 
LVL 5

Expert Comment

by:moonbeam012200
ID: 6478142
The loop around the files is already implicit in the existing script. I am however concerned about the comment:

>If you can do also the same var. to MY dirs - it'll be
>perfect. for example:
>               <dds/file.h>
>               <poem/file.h>
>               these are my dirs
>               while:
>               <sys/file.h>
>               is a system dir

The current version will not change any include that is in the form <anydir/include.h>. It would not be hard to fix this, but I don't have time right now, and will post a new version later tonight. The approach I will take will be to key on any subdirectory name that is relative to the current directory. If, however, gadh98 has installed any "MY" dirs in /usr/include, there isn't any "general" solution.
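
The approach outlined here, keying on subdirectory names relative to the current directory, was not posted in this form; the following shell sketch is one possible reading of it. The function name quote_local_dir_includes is invented, hidden directories are ignored, and directory names are assumed to contain no regular-expression metacharacters.

```shell
#!/bin/sh
# Hypothetical filter: C++ source on stdin, rewritten source on stdout.
# An include <dir/file> is quoted only when "dir" exists as a directory
# directly under the given root (default: the current directory).
quote_local_dir_includes() {
    root=${1:-.}
    script=$(
        for d in "$root"/*/; do
            [ -d "$d" ] || continue      # the glob matched nothing
            d=${d%/}                     # drop the trailing slash
            d=${d##*/}                   # keep only the directory name
            printf 's|^\\([[:space:]]*#[[:space:]]*include[[:space:]]*\\)<\\(%s/[^>]*\\)>|\\1"\\2"|\n' "$d"
        done
    )
    if [ -n "$script" ]; then
        sed -e "$script"
    else
        cat                              # no subdirectories: copy unchanged
    fi
}
```

Run from the top of the source tree, something like quote_local_dir_includes . < a.cpp > a.cpp.new would quote <dds/...> if ./dds exists and leave <sys/...> alone, as long as there is no local directory called sys.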
0
 
LVL 5

Expert Comment

by:moonbeam012200
ID: 6478150
oops...

I said: "current version will not change"

I should have said, "current version will only change"

Sorry
0
 
LVL 5

Expert Comment

by:moonbeam012200
ID: 6481215
Sorry for the delay, it's been a busy week. I think I have a complete solution for you. Here is the script.

#!/usr/bin/perl -w -i-orig
#
# program to convert dos formatted cpp files and
# perform include file syntax replacements.
#
###

#
# generate a set of "system" includes
#
$dirs = join( " ", map($_, </usr/include/*\/>));
$dirs =~  s/\+//g;
$dirs =~ s/\/usr\/include\///g;
@dirs = split /\s+/,$dirs;

#
# loop over every file and perform modifications
#
while (<>) {                            # loop over each line and each file
    $_ =~ y;\r;;d;                      # translate dos line end characters
    foreach $dir (@dirs) {              # loop over each system include
        if ( /<$dir/) {                 # check the line for a system include
            print;                      # preserve the line
            goto LINE;                  # get the next line
        }
    }
    $_ =~ s/<(.*\/.*)>/"$1"/;           # modify local includes
    print;
    LINE: ;                             # a label needs a statement after it
}

In my test case, the original includes looked like this:

#include <iostream.h>
#include <sys/it.h>

#include <my/mitrace.h>
#include <string.h>

After running the utility, they look like this:


#include <iostream.h>
#include <sys/it.h>

#include "my/mitrace.h"
#include <string.h>

To verify that only the necessary changes were made (my test case was a complete c++ program, not just a fragment) I checked the sysname.cpp against the sysname.cpp-orig:
-->diff sysname.cpp-orig sysname.cpp
5c5
< #include <my/mitrace.h>
---
> #include "my/mitrace.h"

Looks good!
perl -e '( $ ,, $ ")=("a".."z")[0,-1]; print "sh", $ ","m\n";;";;"'

William
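
When the original file has DOS line endings, a plain diff against the -orig backup flags every line. A small wrapper along these lines (a sketch only; the name diff_ignoring_cr is made up and it assumes the FILE-orig naming used by the script above) strips the carriage returns from the backup first, so only real edits such as the include change show up.

```shell
#!/bin/sh
# Hypothetical helper: compare FILE-orig with FILE while ignoring the
# carriage returns in the backup. Exit status is that of diff
# (0 = no other differences, 1 = differences found).
diff_ignoring_cr() {
    tr -d '\r' < "$1-orig" | diff - "$1"
}
```

For instance, diff_ignoring_cr sysname.cpp would be expected to list just the rewritten #include lines.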

0
 
LVL 5

Expert Comment

by:moonbeam012200
ID: 6481622
For those type "A" people out there, it is possible to shorten this script by one line.

#!/usr/bin/perl -w -i-orig
#
# program to convert dos formatted cpp files and
# perform include file syntax replacements.
#
###

#
# generate a set of "system" includes
#
$dirs = join( " ", map($_, </usr/include/*\/>));
$dirs =~  s/\+//g;
$dirs =~ s/\/usr\/include\///g;
@dirs = split /\s+/,$dirs;

#
# loop over every file and perform modifications
#
while (<>) {                            # loop over each line and each file
    $_ =~ y;\r;;d;                      # translate dos line end characters
    foreach $dir (@dirs) {              # loop over each system include
        if ( /<$dir/) {                 # check the line for a system include
            goto LINE;                  # get the next line
        }
    }
    $_ =~ s/<(.*\/.*)>/"$1"/;           # modify local includes
    LINE:
    print;
}
0
 
LVL 4

Accepted Solution

by:
kiffney earned 300 total points
ID: 6501647
I think ahoffman has put his finger on it - don't use that editor you were using.  I've seen various so-called windows text editors stick control characters into code that made it unrecognizable to the compiler, although it looks fine on the screen or printed out.  Start with 'textpad' and you won't have these problems.
0
 
LVL 51

Expert Comment

by:ahoffmann
ID: 6502177
BTW, start with the *right version* of textpad, or you will have even more problems than these (as usual on ...)
0
 
LVL 5

Expert Comment

by:moonbeam012200
ID: 6502194
Or better yet, vim (vi improved) is available on win and unix.

vi is my shepherd; i shall not font.
0
 
LVL 51

Expert Comment

by:ahoffmann
ID: 6502200
.. was there any better suggestion than vi[m]?
It even handles the ^M characters properly.
Not bad for something already 30 years old :-]]
0
 

Author Comment

by:gadh98
ID: 6502531
But I cannot switch to another text editor all the time, because I'm porting code from Windows to Unix, and I compile at the same time for Windows using VC++ and on Unix with Solaris Forte 6.
0
 

Author Comment

by:gadh98
ID: 6505220
To moonbeam: what are type "A" people, and what is the difference between the two scripts you gave?
(Only the first works for me.)
0
 
LVL 51

Expert Comment

by:ahoffmann
ID: 6562375
0
 

Author Comment

by:gadh98
ID: 6566693
To moonbeam: I wanted to give you the points, but by accident I gave them to the wrong man.

What can I do to fix it?
0
