Solved

batch file with sed - escape character, trailing newlines

Posted on 2008-10-11
Last Modified: 2012-05-06
I'm using ssed (super-sed) version 3.59, based on GNU sed version 3.02.80. sed newbie here, so be prepared for horrors below.

I need to create a batch file to run under Windows XP which will repair broken lines in a text data file. Valid lines in the data file end in one of these three strings:
""" (three double quotes)
TEMPORARY.ID"
NEXT***"
... so I need a ssed command which will find any newline NOT preceded by one of the above strings, and remove the newline.

The naive best I've been able to do so far is:
ssed -R "(?<!\"\"\"/TEMPORARY.ID\"/NEXT\*\*\*\"\n/d" oldfile > newfile

Apart from the fact that there's got to be a way of escaping all characters in a group, there seem to be two problems here.
(1) The escape character isn't working ahead of " (it works ahead of *). The command seems to be being interpreted as finishing at the first ".
(2) It doesn't find the newline character. I found a reference to sed removing trailing newlines before doing pattern matching, but I can make no sense of the example given to demonstrate how one line should be joined to the next.

anseris
Question by:anseris
 
LVL 2

Accepted Solution

by:
Monky42 earned 250 total points
ID: 22694661
Hello, I am no sed expert, but I have some experience with regular expressions. Here are some hints that might be worth trying:
- Your regular expression is delimited by "..."; try '...' (single quotes) instead. This might be the reason for your trouble with \".
- To match a newline, it might be necessary to enable "multiline matching" for the regular expression. Some regexp engines only match single lines by default. Quote: "In Perl, you do this by adding an m after the regex code, like this: m/^regex$/m;."
- When you are working in a Windows environment, the lines might end with \r\n instead of the Unix-style plain \n. Add \r? (an optional carriage return) to your expression.

Good luck.
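
To illustrate the \r\n hint: a trailing carriage return sits between the visible end of the line and the newline, so a pattern anchored with $ will not match until it is removed. The sketch below uses awk rather than sed and keeps the program in a script file so the command interpreter never sees the double quotes. The file names (crlf_sample.txt, check_endings.awk, crlf_report.txt) are made up for illustration, and the demo is driven from a POSIX shell; some Windows ports may already strip the carriage return on input, in which case the sub() call simply does nothing.

```shell
# Build a small sample with Windows-style CRLF line endings.
printf 'one"""\r\ntwo NEXT***"\r\nbroken\r\n' > crlf_sample.txt

# Keep the awk program in its own file; the quoted heredoc leaves it untouched.
cat > check_endings.awk <<'EOF'
# Remove a trailing carriage return so that $ anchors at the visible end of the line.
{ sub(/\r$/, "") }
# Lines that end in one of the three valid strings are skipped.
/"""$/ || /TEMPORARY\.ID"$/ || /NEXT\*\*\*"$/ { next }
# Anything else is reported with its line number.
{ print "bad line [" NR "]: " $0 }
EOF

awk -f check_endings.awk crlf_sample.txt > crlf_report.txt
cat crlf_report.txt
```

With the sample above, only the third line should be reported as bad.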
 

Author Comment

by:anseris
ID: 22696946
Hi Monky42,

- Using single quotes as delimiters gets the error message "The system cannot find the file specified", even with an expression which works using double quotes.

- \r\n made no difference.

Please stand by, I'm testing the multiline matching. It inspired a bit of research which has turned up some possibilities.

Thanks
anseris
 
LVL 51

Expert Comment

by:ahoffmann
ID: 22698338
I guess it is possible to do it with (s)sed using its h H g G x and n commands; but I'd use (g)awk instead as it is simpler to do and more readable for humans :)
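
For reference, a sed-only version might look like the sketch below. It relies on N (append the next input line) and b (branch) rather than the hold-space commands listed above, and it is written as a script file so that the command interpreter never sees the double quotes. The file names joinlines.sed, oldfile.txt and newfile.txt are made up for illustration, the demo is driven from a POSIX shell with plain sed, and it is an assumption that ssed 3.59 accepts the same script through -f. A trailing carriage return is not handled here.

```shell
# Sample data: the second record is broken across two lines.
printf '%s\n' '"a","b","""' '"c","bro' 'ken NEXT***"' '"x TEMPORARY.ID"' 'tail' > oldfile.txt

# Keep the sed program in its own file; the quoted heredoc leaves it untouched.
cat > joinlines.sed <<'EOF'
:a
# A line that ends in one of the three valid strings is complete: print it.
/"""$/b
/TEMPORARY\.ID"$/b
/NEXT\*\*\*"$/b
# Otherwise, unless this is the last line, pull in the next line,
# delete the newline between the two, and test the joined line again.
$!{
N
s/\n//
ba
}
EOF

sed -f joinlines.sed oldfile.txt > newfile.txt
cat newfile.txt
```

The last line of the sample ends in none of the three strings and has nothing after it to join with, so it is printed unchanged.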

Author Comment

by:anseris
ID: 22703360
I got this far with awk:

>awk "/NEXT\"\"\"$|\"\"\"$/ { print $0 } " inputfile
awk: /NEXT"""$
awk:  ^ unterminated regexp
The network path was not found.

It looks as though I can't escape the " on the left side of |.

Still trying sed, will try the commands you mention.
 
LVL 51

Assisted Solution

by:ahoffmann
ahoffmann earned 250 total points
ID: 22706034
> .. create a batch file to run under Windows XP ..
> .. data file end in one of these three strings:
> """ (three double quotes)

that's a challenge for crappy systems ;-)

You need to write your script (awk, sed, whatever) in a file and then use the tool (awk, sed, ...) with a proper option to read its command from that file. Something (awk) like:

/"""$/{next}
/TEMPORARY\.ID"$/{next}
/NEXT\*\*\*"$/{next}
{ print "bad line ["NR"]: "$0 }
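
The script above only reports broken lines. A variant that repairs them, as the question asks, could look like the following sketch: it accumulates input in a buffer and prints the buffer once it ends in one of the three valid strings. The variable name buf and the file names joinlines.awk, oldfile.txt and newfile.txt are made up for illustration, and the demo is driven from a POSIX shell. On Windows the contents of the script file would be the same, and the call would presumably be along the lines of awk -f joinlines.awk oldfile > newfile.

```shell
# Sample data: the second record is broken across two lines.
printf '%s\n' '"a","b","""' '"c","bro' 'ken NEXT***"' '"x TEMPORARY.ID"' 'tail' > oldfile.txt

# Keep the awk program in its own file; the quoted heredoc leaves it untouched.
cat > joinlines.awk <<'EOF'
# Drop a trailing carriage return (if any), then append the line to the buffer.
{
    sub(/\r$/, "")
    buf = buf $0
}
# Once the buffer ends in one of the three valid strings, print it and start over.
buf ~ /"""$/ || buf ~ /TEMPORARY\.ID"$/ || buf ~ /NEXT\*\*\*"$/ {
    print buf
    buf = ""
}
# Whatever is left at end of input never reached a valid ending; print it as is.
END { if (buf != "") print buf }
EOF

awk -f joinlines.awk oldfile.txt > newfile.txt
cat newfile.txt
```

Joining without inserting a space matches the request to simply remove the stray newline; if the break always falls between words, appending a space in the buf assignment would be the obvious change.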
 

Author Closing Comment

by:anseris
ID: 31581153
Apologies for the length of time taken to close this. Other more urgent tasks have prevented me continuing working on this problem. I tried splitting the points earlier, but somehow that feature seemed absent for a while.