• Status: Solved
  • Priority: Medium

batch file with sed - escape character, trailing newlines

I'm using ssed (super-sed) version 3.59, based on GNU sed version 3.02.80. sed newbie here, so be prepared for horrors below.

I need to create a batch file to run under Windows XP which will repair broken lines in a text data file. Valid lines in the data file end in one of these three strings:
""" (three double quotes)
TEMPORARY.ID"
NEXT***"
... so I need a ssed command which will find any newline NOT preceded by one of the above strings, and remove the newline.
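The repair rule above can be sketched in Python as an illustrative alternative to ssed, not the poster's tool. The function name `repair_lines` is hypothetical. The idea is to keep appending lines to a buffer until the buffer ends in one of the three valid strings, then emit the buffer as one repaired line.

```python
from typing import Iterable, Iterator

# The three endings that mark a complete (valid) line, taken from the question.
VALID_ENDINGS = ('"""', 'TEMPORARY.ID"', 'NEXT***"')

def repair_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield repaired lines (without line terminators).

    A line that does not end in a valid ending has its newline removed and is
    joined onto the following line(s) until a valid ending is reached.
    """
    pending = ""
    for line in lines:
        # Drop the line terminator (LF or CRLF) before testing the ending.
        pending += line.rstrip("\r\n")
        if pending.endswith(VALID_ENDINGS):  # str.endswith accepts a tuple
            yield pending
            pending = ""
    # A trailing fragment with no valid ending is passed through unchanged
    # (an assumption: it is not silently dropped).
    if pending:
        yield pending
```

Writing each yielded line plus a newline to newfile would reproduce the `oldfile > newfile` step. Working line by line keeps memory use flat even for a large data file.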

The naive best I've been able to do so far is:
ssed -R "(?<!\"\"\"/TEMPORARY.ID\"/NEXT\*\*\*\"\n/d" oldfile > newfile

Apart from the fact that there's got to be a way of escaping all characters in a group, there seem to be two problems here.
(1) The escape character isn't working before " (it works before *). The command seems to be interpreted as ending at the first ".
(2) It doesn't find the newline character. I found a reference to sed removing trailing newlines before doing pattern matching, but I can make no sense of the example given to demonstrate how one line should be joined to the next.
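As the question suspects, sed reads one line at a time into its pattern space and strips the trailing newline, so a \n in the pattern only has something to match once further lines have been appended to the pattern space (sed's N command does that). When the whole file is held in memory as one string, the negative-lookbehind idea from the attempted command works directly. A minimal Python sketch follows; the names `BROKEN_NEWLINE` and `join_broken_lines` are hypothetical. Python's `re` requires each lookbehind to have a single fixed width, so the three alternatives become three consecutive lookbehinds.

```python
import re

# Match every "\n" that is NOT preceded by one of the three valid endings.
# Python's re rejects a lookbehind whose alternatives differ in length,
# so the alternation is split into three consecutive negative lookbehinds.
BROKEN_NEWLINE = re.compile(r'(?<!""")(?<!TEMPORARY\.ID")(?<!NEXT\*\*\*")\n')

def join_broken_lines(text: str) -> str:
    """Remove each newline that ends a broken line, joining it to the next."""
    return BROKEN_NEWLINE.sub("", text)
```

This assumes LF line endings; a file with CRLF endings would need them normalised first, since the \r would sit between the valid ending and the \n.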

Monky42 commented:
Hello, I am no sed expert, but I have some experience with regular expressions. Here are some hints that might be worth trying:
- Your regular expression is delimited by "...". Try '...' (single quotes) instead. This might be the reason for your trouble with \"
- To match a newline it might be necessary to enable "multiline matching" for the regular expression. Some regexp engines only match single lines by default. Quote: "In Perl, you do this by adding an m after the regex code, like this: m/^regex$/m;."
- When you are working in a Windows environment, the lines might end with \r\n instead of the Unix-style plain \n. Add \r? (an optional carriage return) to your expression.

Good luck.
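The multiline and \r\n hints can be illustrated together in Python (a sketch, not ssed behaviour; `count_valid_lines` is a hypothetical name). With `re.MULTILINE`, `$` matches just before every \n, so one anchored pattern can test every line of a file held as a single string. In a CRLF file, however, the \r sits between the valid ending and the \n, so `$` never matches there unless the line endings are normalised first.

```python
import re

# Valid endings anchored with $. With re.MULTILINE, $ matches just before
# every "\n" (and at the very end of the string), not only at the end.
VALID_END = re.compile(r'(?:"""|TEMPORARY\.ID"|NEXT\*\*\*")$', re.MULTILINE)

def count_valid_lines(text: str, normalise_crlf: bool = True) -> int:
    """Count lines that end in one of the three valid strings."""
    if normalise_crlf:
        # Turn Windows CR LF line endings into plain LF so that $ sees the
        # valid ending immediately before the newline.
        text = text.replace("\r\n", "\n")
    return len(VALID_END.findall(text))
```

Without normalisation, a CRLF sample with three valid lines counts as zero, which matches the symptom of a pattern that "finds nothing".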
anseris (Author) commented:
Hi Monky42,

- Using single quotes as delimiters produces the error message "The system cannot find the file specified", even with an expression that works using double quotes.

- \r\n made no difference.

Please stand by, I'm testing the multiline matching. It inspired a bit of research that has turned up some possibilities.

Thanks
anseris
ahoffmann commented:
I guess it is possible to do it with (s)sed using its h, H, g, G, x and n commands, but I'd use (g)awk instead, as it is simpler to do and more readable for humans :)
anseris (Author) commented:
I got this far with awk:

>awk "/NEXT\"\"\"$|\"\"\"$/ { print $0 } " inputfile
awk: /NEXT"""$
awk:  ^ unterminated regexp
The network path was not found.

It looks as though I can't escape the " on the left side of |.

Still trying sed, will try the commands you mention.
ahoffmann commented:
> .. create a batch file to run under Windows XP ..
> .. data file end in one of these three strings:
> """ (three double quotes)

that's a challenge for crappy systems ;-)

You need to write your script (awk, sed, whatever) in a file and then run the tool (awk, sed, ...) with the option that makes it read its commands from that file (-f for both awk and sed, e.g. awk -f scriptfile inputfile). Something (awk) like:

/"""$/{next}
/TEMPORARY\.ID"$/{next}
/NEXT\*\*\*"$/{next}
{ print "bad line ["NR"]: "$0 }
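For comparison, the same check can be sketched in Python (the name `bad_lines` is hypothetical). It reports each line that does not end in one of the three valid strings, numbered from 1 like awk's NR. Here all three strings are anchored at the end of the line, following the question's description. Like the awk script, it only reports bad lines and does not repair them.

```python
import re
from typing import Iterable, List

# A line is valid if it ends in one of the three strings from the question.
VALID_END = re.compile(r'(?:"""|TEMPORARY\.ID"|NEXT\*\*\*")$')

def bad_lines(lines: Iterable[str]) -> List[str]:
    """Return an awk-style report line for every invalid input line."""
    report = []
    for nr, line in enumerate(lines, start=1):  # nr plays the role of awk's NR
        line = line.rstrip("\r\n")  # ignore LF or CRLF terminators
        if not VALID_END.search(line):
            report.append(f"bad line [{nr}]: {line}")
    return report
```

Running a report like this before any repair makes it easy to see how many lines are broken and where.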
anseris (Author) commented:
Apologies for the length of time taken to close this. Other more urgent tasks have prevented me from continuing to work on this problem. I tried splitting the points earlier, but somehow that feature seemed absent for a while.