anseris
batch file with sed - escape character, trailing newlines
I'm using ssed (super-sed) version 3.59, based on GNU sed version 3.02.80. sed newbie here, so be prepared for horrors below.
I need to create a batch file to run under Windows XP which will repair broken lines in a text data file. Valid lines in the data file end in one of these three strings:
""" (three double quotes)
TEMPORARY.ID"
NEXT***"
... so I need an ssed command that finds any newline NOT preceded by one of the above strings and removes that newline.
The naive best I've been able to do so far is:
ssed -R "(?<!\"\"\"/TEMPORARY.ID\" /NEXT\*\*\ *\"\n/d" oldfile > newfile
Apart from the fact that there's got to be a way of escaping all characters in a group, there seem to be two problems here.
(1) The escape character isn't working ahead of " (it works ahead of *). The command seems to be interpreted as ending at the first ".
(2) It doesn't find the newline character. I found a reference to sed removing trailing newlines before doing pattern matching, but I can make no sense of the example given to demonstrate how one line should be joined to the next.
ASKER CERTIFIED SOLUTION
I guess it is possible to do it with (s)sed using its h, H, g, G, x and n commands, but I'd use (g)awk instead, as it is simpler to write and more readable for humans :)
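A minimal sketch of that awk approach, assuming the three valid line endings listed in the question. The script name join.awk and the sample data are hypothetical; oldfile and newfile are the names used in the question. The program is kept in a file and run with awk -f, so the command line itself contains no double quotes to escape. The wrapper below is POSIX shell, for demonstration only; under Windows, save join.awk with an editor and run just the awk -f line.

```shell
# Write the awk program to a file (hypothetical name: join.awk).
cat > join.awk <<'EOF'
# Append each input line to a buffer; print the buffer only once it
# ends in one of the three valid terminators from the question.
{
    buf = buf $0
    if (buf ~ /("""|TEMPORARY\.ID"|NEXT\*\*\*")$/) {
        print buf
        buf = ""
    }
}
# Flush whatever is left if the file ends on a broken line.
END { if (buf != "") print buf }
EOF

# Made-up sample data: records 1 and 3 are split across two lines.
printf '%s\n' 'a,"x' 'y"""' 'b TEMPORARY.ID"' 'c NEXT' '***"' > oldfile

awk -f join.awk oldfile > newfile
cat newfile
```

Note that in this sketch a blank line is absorbed into the line that follows it, and files with \r\n line endings may need the carriage return stripped first, depending on the awk port.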
ASKER
I got this far with awk:
>awk "/NEXT\"\"\"$|\"\"\"$/ { print $0 } " inputfile
awk: /NEXT"""$
awk: ^ unterminated regexp
The network path was not found.
It looks as though I can't escape the " on the left side of |.
Still trying sed, will try the commands you mention.
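For the sed route, one possible sketch uses N to append the next input line to the pattern space and s/\n// to delete the embedded newline, looping until the line ends in a valid terminator. This assumes a GNU-compatible sed and has not been checked against ssed 3.59. The script name join.sed and the sample data are hypothetical. As with awk, keeping the script in a file and passing it with -f avoids escaping double quotes on the command line. The wrapper is POSIX shell, for demonstration only.

```shell
# Write the sed script to a file (hypothetical name: join.sed).
cat > join.sed <<'EOF'
:a
# A bare "b" jumps to the end of the script, which prints the line.
/"""$/b
/TEMPORARY\.ID"$/b
/NEXT\*\*\*"$/b
# On the last input line there is nothing left to join, so print as is.
$b
# Otherwise append the next line, delete the newline, and re-check.
N
s/\n//
ba
EOF

# Made-up sample data: records 1 and 3 are split across two lines.
printf '%s\n' 'a,"x' 'y"""' 'b TEMPORARY.ID"' 'c NEXT' '***"' > oldfile

sed -f join.sed oldfile > newfile
cat newfile
```

The newline only becomes visible to the pattern after N has pulled in a second line, which is why a single-line expression containing \n never matches.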
SOLUTION
ASKER
Apologies for the length of time taken to close this. Other, more urgent tasks have prevented me from continuing to work on this problem. I tried splitting the points earlier, but that feature seemed to be absent for a while.
ASKER
- Using single quotes as delimiters produces the error message "The system cannot find the file specified", even with an expression which works using double quotes.
- \r\n made no difference.
Please stand by, I'm testing the multiple matching. It inspired a bit of research which has turned up some possibilities.
Thanks
anseris