Solved

batch file with sed - escape character, trailing newlines

Posted on 2008-10-11
Last Modified: 2012-05-06
I'm using ssed (super-sed) version 3.59, based on GNU sed version 3.02.80. I'm a sed newbie, so be prepared for horrors below.

I need to create a batch file to run under Windows XP which will repair broken lines in a text data file. Valid lines in the data file end in one of these three strings:
""" (three double quotes)
TEMPORARY.ID"
NEXT***"
... so I need an ssed command that finds any newline NOT preceded by one of the above strings and removes that newline.

The naive best I've been able to do so far is:
ssed -R "(?<!\"\"\"/TEMPORARY.ID\"/NEXT\*\*\*\"\n/d" oldfile > newfile

Apart from the fact that there's got to be a way of escaping all characters in a group, there seem to be two problems here.
(1) The escape character isn't working before " (it works before *). The command seems to be interpreted as ending at the first ".
(2) It doesn't find the newline character. I found a reference to sed removing trailing newlines before doing pattern matching, but I can make no sense of the example given to demonstrate how one line should be joined to the next.
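Problem (2) follows from how sed works: it reads one line at a time and strips the trailing newline before matching, so a pattern ending in \n has nothing to match unless the next line is first appended to the pattern space. The Python sketch below (Python swapped in for illustration, not ssed syntax) applies the intended substitution to the whole file held in one string. Python's re module only allows fixed-width lookbehinds, so the three alternatives become three separate negative lookbehinds. repair and the sample data are hypothetical.

```python
import re

# A newline NOT preceded by any of the three valid line endings.
# Python's re needs fixed-width lookbehinds, so each ending gets its own.
BROKEN_NEWLINE = re.compile(r'(?<!""")(?<!TEMPORARY\.ID")(?<!NEXT\*\*\*")\n')

def repair(text):
    """Delete every newline that does not end a valid line (hypothetical helper)."""
    return BROKEN_NEWLINE.sub('', text)

sample = (
    'a "first" """\n'
    'b part one \n'            # broken: no valid ending
    'part two TEMPORARY.ID"\n'
    'c NEXT***"\n'
)
print(repair(sample), end='')
```

Reading the whole file into memory is what makes the \n visible to the regex; for very large files a line-by-line buffer is the usual alternative.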

Question by:anseris
Accepted Solution

by:
Monky42 earned 250 total points
ID: 22694661
Hello, I am no sed expert, but I have some experience with regular expressions. Here are some hints that might be worth trying:
- Your regular expression is delimited by "..."; try '...' (single quotes) instead. This might be the reason for your trouble with \".
- To match a newline it might be necessary to enable "multiline matching" for the regular expression. Some regexp engines only match single lines by default. Quote: "In Perl, you do this by adding an m after the regex code, like this: m/^regex$/m;."
- When you are working in a Windows environment, the lines might end with \r\n instead of the Unix-style plain \n. Add \r? (an optional carriage return) to your expression.
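The last two hints can be tried out with a short Python sketch (Python's re module standing in for ssed's regex engine, so details may differ). A literal \n in a pattern matches a newline with no multiline flag, provided the newline is actually present in the string being searched; in Python, re.MULTILINE only changes where ^ and $ match. The optional \r? lets one pattern accept both Unix and Windows line endings. VALID_END and ends_validly are illustrative names.

```python
import re

# A literal \n matches a newline with no special flags, as long as the
# newline is actually part of the string being searched.
assert re.search(r'"""\n', 'x"""\nnext line') is not None

# One of the three valid endings, optionally followed by \r (CRLF files).
# Without re.MULTILINE, $ matches at the end of the string or just
# before a single trailing \n.
VALID_END = re.compile(r'(?:"""|TEMPORARY\.ID"|NEXT\*\*\*")\r?$')

def ends_validly(line):
    """True if one raw line (with or without its line ending) ends validly."""
    return VALID_END.search(line) is not None

for raw in ['a """\r\n', 'b TEMPORARY.ID"\n', 'c NEXT***"', 'd broken\r\n']:
    print(repr(raw), ends_validly(raw))
```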

Good luck.

Author Comment

by:anseris
ID: 22696946
Hi Monky42,

- Using single quotes as delimiters gets the error message "The system cannot find the file specified", even with an expression which works using double quotes.

- \r\n made no difference.

Please stand by, I'm testing the multiline matching. It inspired a bit of research which has turned up some possibilities.

Thanks
anseris

Expert Comment

by:ahoffmann
ID: 22698338
I guess it is possible to do it with (s)sed using its h, H, g, G, x and n commands, but I'd use (g)awk instead, as it is simpler to do and more readable for humans :)
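The hold-space idea can be sketched without sed: keep a buffer for the current record, append each incoming line to it, and only emit the buffer once it ends with one of the three valid endings. The Python below is a stand-in for that bookkeeping, not a translation of any particular sed script; join_broken_lines is a hypothetical name, and the sketch assumes broken pieces should be concatenated with nothing in between.

```python
import io
import sys

VALID_ENDINGS = ('"""', 'TEMPORARY.ID"', 'NEXT***"')

def join_broken_lines(lines):
    """Yield repaired records, joining each broken line onto the next one.

    `lines` is any iterable of text lines (e.g. an open file). As in sed,
    the line ending is stripped before the record's ending is tested.
    """
    held = ''  # plays the role of sed's hold space
    for line in lines:
        held += line.rstrip('\r\n')
        if held.endswith(VALID_ENDINGS):  # str.endswith accepts a tuple
            yield held
            held = ''
    if held:  # leftover text at end of file with no valid ending
        yield held

data = io.StringIO('a """\nb part one \npart two TEMPORARY.ID"\nc NEXT***"\n')
for record in join_broken_lines(data):
    sys.stdout.write(record + '\n')
```

Unlike the whole-file regex approach, this streams the input, so memory use is bounded by the longest broken record rather than the file size.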

Author Comment

by:anseris
ID: 22703360
I got this far with awk:

>awk "/NEXT\"\"\"$|\"\"\"$/ { print $0 } " inputfile
awk: /NEXT"""$
awk:  ^ unterminated regexp
The network path was not found.

It looks as though I can't escape the " on the left side of |.
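A likely cause, assuming standard cmd.exe parsing: cmd does not treat the backslash as an escape character, so every \" still toggles its quoting state. After the fourth quote, the | is outside quotes and cmd treats it as a pipe. That matches the output above: awk receives only "/NEXT\"\"\"$ (which the C runtime turns into /NEXT"""$, an unterminated regexp), and the text after the | is run as a separate command starting with \, which plausibly explains the "network path" error. The Python sketch below is a simplified model of that quote toggling; it ignores ^ escapes and cmd's other special characters, and split_on_unquoted_pipes is a hypothetical helper.

```python
def split_on_unquoted_pipes(cmdline):
    """Split a command line at | characters that are outside double quotes.

    Simplified model of cmd.exe: every " toggles the quoted state and a
    backslash is an ordinary character. ^ escapes, &, < and > are ignored.
    """
    parts, current, in_quotes = [], [], False
    for ch in cmdline:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == '|' and not in_quotes:
            parts.append(''.join(current))  # unquoted |: cmd would pipe here
            current = []
        else:
            current.append(ch)
    parts.append(''.join(current))
    return parts

command = r'awk "/NEXT\"\"\"$|\"\"\"$/ { print $0 } " inputfile'
for part in split_on_unquoted_pipes(command):
    print(part)
```

This is one reason to keep the awk or sed program in a separate file: the program text then never passes through cmd's parser.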

Still trying sed, will try the commands you mention.

Assisted Solution

by:ahoffmann
ahoffmann earned 250 total points
ID: 22706034
> .. create a batch file to run under Windows XP ..
> .. data file end in one of these three strings:
> """ (three double quotes)

that's a challenge for crappy systems ;-)

You need to write your script (awk, sed, whatever) in a file and then run the tool (awk, sed, ...) with the option that makes it read its commands from that file (-f for both awk and sed). Something (awk) like:

/"""$/{next}
/TEMPORARY\.ID"/{next}
/NEXT\*\*\*"$/{next}
{ print "bad line ["NR"]: "$0 }

Author Closing Comment

by:anseris
ID: 31581153
Apologies for the length of time taken to close this. Other, more urgent tasks prevented me from continuing to work on this problem. I tried splitting the points earlier, but somehow that feature seemed to be unavailable for a while.