anseris

asked on

batch file with sed - escape character, trailing newlines

I'm using ssed (super-sed) version 3.59, based on GNU sed version 3.02.80. sed newbie here, so be prepared for horrors below.

I need to create a batch file to run under Windows XP which will repair broken lines in a text data file. Valid lines in the data file end in one of these three strings:
""" (three double quotes)
TEMPORARY.ID"
NEXT***"
... so I need an ssed command that finds any newline NOT preceded by one of the above strings and removes it.
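The repair rule is easier to see outside sed. Below is a minimal Python sketch, not ssed. It assumes the file matches the description above. It buffers fragments and emits a record only once the accumulated text ends with one of the three valid terminators. The function name `repair_lines` and the sample input are hypothetical.

```python
import io
from typing import Iterable, Iterator

# The three strings that mark the end of a valid record.
TERMINATORS = ('"""', 'TEMPORARY.ID"', 'NEXT***"')

def repair_lines(lines: Iterable[str]) -> Iterator[str]:
    """Join broken lines: drop every newline not preceded by a terminator."""
    buffer = ""
    for line in lines:
        buffer += line.rstrip("\r\n")  # remove the line ending (CR LF or LF)
        if buffer.endswith(TERMINATORS):  # str.endswith accepts a tuple
            yield buffer
            buffer = ""
    if buffer:
        # Trailing fragment with no terminator: emit it rather than lose it.
        yield buffer

if __name__ == "__main__":
    # Made-up sample: the second record is broken across two lines.
    sample = io.StringIO('a"""\nbro\nken NEXT***"\nTEMPORARY.ID"\n')
    for record in repair_lines(sample):
        print(record)
```

Because it reads one line at a time, this approach works on files of any size. That is also the structure an awk solution would follow.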

The naive best I've been able to do so far is:
ssed -R "(?<!\"\"\"/TEMPORARY.ID\"/NEXT\*\*\*\"\n/d" oldfile > newfile

Apart from suspecting there must be a way to escape all the characters in a group at once, I see two problems here.
(1) The escape character isn't working in front of " (it does work in front of *). The command seems to be interpreted as ending at the first ".
(2) It doesn't find the newline character. I found a reference saying that sed strips the trailing newline from each line before doing pattern matching, but I can make no sense of the example given to show how one line should be joined to the next.
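Both problems show up in a small Python sketch. This is not ssed: Python's `re` module stands in here for the Perl-style regex that `ssed -R` accepts.

- Reading the whole file into one string keeps the newlines visible to the pattern, unlike sed's line-at-a-time pattern space.
- `re.escape` escapes every special character in a literal string in one call.
- Python requires each lookbehind to be fixed-width, so the three terminators (3, 13 and 8 characters long) become three separate negative lookbehinds instead of one alternation.

The function name `join_broken_lines` is hypothetical.

```python
import re

TERMINATORS = ['"""', 'TEMPORARY.ID"', 'NEXT***"']

# One negative lookbehind per terminator: a newline matches only if it is
# NOT immediately preceded by any of them. re.escape handles the . and *.
BROKEN_NEWLINE = re.compile(
    "".join(f"(?<!{re.escape(t)})" for t in TERMINATORS) + r"\n"
)

def join_broken_lines(text: str) -> str:
    """Remove every newline that does not follow a valid terminator."""
    return BROKEN_NEWLINE.sub("", text)
```

For example, `open("newfile", "w").write(join_broken_lines(open("oldfile").read()))` mirrors the `oldfile > newfile` redirection in the ssed attempt. Opening the file in text mode also turns Windows CR LF endings into plain LF. Without that, a `\r` before each `\n` would defeat the lookbehinds.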

anseris
ASKER CERTIFIED SOLUTION
Monky42

anseris

ASKER

Hi Monky42,

- Using single quotes as delimiters produces the error message "The system cannot find the file specified", even with an expression that works with double quotes.

- \r\n made no difference.

Please stand by, I'm testing the multiple matching. It inspired a bit of research which has turned up some possibilities.

Thanks
anseris
I guess it is possible to do this with (s)sed using its h, H, g, G, x and n commands, but I'd use (g)awk instead, as it is simpler and more readable for humans :)
anseris

ASKER

I got this far with awk:

>awk "/NEXT\"\"\"$|\"\"\"$/ { print $0 } " inputfile
awk: /NEXT"""$
awk:  ^ unterminated regexp
The network path was not found.

It looks as though I can't escape the " on the left side of |.
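The output above has one plausible reading, which is an assumption and is not confirmed in the thread. cmd.exe does not treat `\"` as an escape: every `"` toggles its quoting. The `|` therefore lands outside quotes and is taken as a pipe. awk receives only `/NEXT"""$`, and the text after the `|` is handed to cmd.exe as a second command.

A workaround often used with awk is to keep literal quotes out of the program text. One way is to write `"` as the octal escape `\042` (check that your awk build supports it). Another is to put the program in a file and run it with `awk -f`.

The Python sketch below only checks that an octal-escaped pattern matches the same line endings as the literal one. Python's `re` also accepts three-digit octal escapes. The names `LITERAL`, `VALID_END` and `is_valid_line` are hypothetical. The pattern uses the three terminators from the question rather than the `NEXT"""` typed in the awk attempt.

```python
import re

# Literal form: the three valid line endings, anchored at end of line.
LITERAL = re.compile(r'("""|TEMPORARY\.ID"|NEXT\*\*\*")$')

# Same pattern with every double quote written as the octal escape \042,
# so the pattern text itself contains no " characters.
VALID_END = re.compile(r'(\042\042\042|TEMPORARY\.ID\042|NEXT\*\*\*\042)$')

def is_valid_line(line: str) -> bool:
    """True if the line (without its newline) ends with a valid terminator."""
    return VALID_END.search(line) is not None

if __name__ == "__main__":
    for line in ['a"""', 'TEMPORARY.ID"', 'NEXT***"', 'broken', 'NEXT"""']:
        # Both spellings of the pattern agree on every sample line.
        assert bool(LITERAL.search(line)) == is_valid_line(line)
        print(repr(line), is_valid_line(line))
```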

Still trying sed, will try the commands you mention.
SOLUTION
anseris

ASKER

Apologies for the time taken to close this. Other, more urgent tasks have prevented me from continuing to work on this problem. I tried splitting the points earlier, but for a while that feature seemed to be unavailable.