Removing unwanted characters and CR-LF line endings from a file in UNIX

Hello Experts,

I have data in a file like this:

XXXX|YYYY|ZSCUST|06-May-2013|31-Dec-9999|XXXXX||bp15lf9 organisation to maxwell$% tec. -- > OK



cmnt .end.
XXXXX|YYYY|BUR021|02-Apr-2008|31-Dec-9999|XXXX|N| -- > OK

The first line is OK, but it is followed by some extra blank lines and then a line of stray characters (cmnt .end.). I also need to remove any special characters in the last field.

I need to remove those extra lines and get output like this:

XXXX|YYYY|ZSCUST|06-May-2013|31-Dec-9999|XXXXX||bp15lf9 organisation to maxwell tec
XXXXX|YYYY|BUR021|02-Apr-2008|31-Dec-9999|XXXX|N|

Any thoughts on how to achieve this are highly appreciated.
r4ramki asked:
woolmilkporc commented:
Not sure if I fully understood all implications, but please try this:

awk 'BEGIN {FS="|"; OFS="|"} /\|.*\|/ {gsub("[^a-zA-Z0-9 ]","",$NF); print}' inputfile

To write the output to "outputfile":

awk 'BEGIN {FS="|"; OFS="|"} /\|.*\|/ {gsub("[^a-zA-Z0-9 ]","",$NF); print}' inputfile > outputfile
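To check the command above against the sample, the sketch below rebuilds the sample data with DOS (CR-LF) line endings in a temporary file (the `mktemp` file and the `result` variable are illustrative assumptions, not part of the original answer) and runs the same awk program over it. Because the CR sits at the end of the last field, the `gsub` removes it together with the `$%` and the period.

```shell
#!/bin/sh
# Recreate the sample input with CR-LF line endings (hypothetical test file).
infile=$(mktemp)
printf '%s\r\n' \
  'XXXX|YYYY|ZSCUST|06-May-2013|31-Dec-9999|XXXXX||bp15lf9 organisation to maxwell$% tec.' \
  '' '' '' \
  'cmnt .end.' \
  'XXXXX|YYYY|BUR021|02-Apr-2008|31-Dec-9999|XXXX|N|' > "$infile"

# Keep only lines with at least two pipes; in the last field, delete every
# character that is not a letter, digit or space (this also drops the CR).
result=$(awk 'BEGIN {FS="|"; OFS="|"} /\|.*\|/ {gsub("[^a-zA-Z0-9 ]","",$NF); print}' "$infile")
printf '%s\n' "$result"

rm -f "$infile"
```

The CR-only blank lines and the `cmnt .end.` line contain no pipes, so the `/\|.*\|/` pattern skips them.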
arnold commented:
Can you post the actual characters? Run
cat -v file | more
to see whether they show up as ^M.
If so, you can pass the file through sed and awk:
sed -e 's/\r//g' filename | awk 'length($0) > 0 { print $0 }'
The above strips out the carriage returns (^M, \r) and drops empty lines.
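A small demonstration of both steps, assuming the stray characters really are CRs. It swaps `tr -d '\r'` in for the `sed` step, since `\r` inside a sed expression is a GNU sed extension that not every sed understands; the sample lines are made up for illustration.

```shell
#!/bin/sh
# A record, a CR-only blank line and the stray line, all with CR-LF endings.
sample=$(mktemp)
printf '%s\r\n' 'XXXXX|YYYY|N|' '' 'cmnt .end.' > "$sample"

# cat -v makes the carriage returns visible as ^M.
shown=$(cat -v "$sample")
printf '%s\n' "$shown"

# Delete every CR, then print only the non-empty lines.
cleaned=$(tr -d '\r' < "$sample" | awk 'length($0) > 0 { print $0 }')
printf '%s\n' "$cleaned"

rm -f "$sample"
```

The `cmnt .end.` line survives: this approach only deals with the CRs and the empty lines.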
gheist commented:
cat file | dos2unix | iconv -f ASCII -t ANSI > result
Daniel McAllister (President, IT4SOHO, LLC) commented:
A few comments -- based on my course content for beginning UNIX:

 1) The best Unix tools are filters (read from STDIN & write to STDOUT)
 2) They also are designed to do ONE task and do it well
 3) The design allows you to PIPELINE filters together to do complex things

So here's how I read your problem:
 A) You need to change the line endings from DOS (CR/LF) to UNIX (LF). The best tool for that is "dos2unix"
 B) You need to remove lines (empty or not) that don't meet a certain criterion. The best tool for that is "grep"
 C) You need to remove special characters. There are several options for that; if it is always the same set of characters, "sed" would suffice, but I would probably go with "tr".
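One way to picture A, B and C as separate filters is to wrap each in a shell function and pipe them together. The function names are hypothetical, `tr -d '\r'` stands in for dos2unix so the sketch runs without it, and the list of special characters (`$`, `%`, `.`) is an assumption taken from the sample line.

```shell
#!/bin/sh
# Hypothetical helpers, one job each; every one reads STDIN and writes STDOUT.
fix_eol()        { tr -d '\r'; }                # A: CR-LF -> LF (stand-in for dos2unix)
keep_records()   { grep -E '^XXXX.*\|YYYY'; }   # B: keep only the data records
strip_specials() { tr -d '$%.'; }               # C: delete an assumed set of special characters

sample=$(mktemp)
printf '%s\r\n' \
  'XXXX|YYYY|ZSCUST|06-May-2013|31-Dec-9999|XXXXX||bp15lf9 organisation to maxwell$% tec.' \
  '' '' '' \
  'cmnt .end.' \
  'XXXXX|YYYY|BUR021|02-Apr-2008|31-Dec-9999|XXXX|N|' > "$sample"

# Pipeline the three filters together.
result=$(fix_eol < "$sample" | keep_records | strip_specials)
printf '%s\n' "$result"

rm -f "$sample"
```

Because `tr -d` works on the whole line, those characters would also be removed from the other fields; that is harmless for this sample but worth checking on real data.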

So, in the general form, a command line might look like:
dos2unix < infile | grep -E '^XXXX.*\|YYYY' | tr -cd '\11\12\40-\176' > outfile

As noted:
 - dos2unix turns the CR/LF at the end of each line into a plain LF. Given no file name, it reads from STDIN (here redirected from infile) and writes to the pipe; given a file name, current dos2unix versions convert that file in place instead
 - grep -E reads from the pipe and prints only the lines that:
    + START WITH (the ^ symbol) XXXX
    + followed by any characters (.*), so a longer first field such as XXXXX still matches
    + followed by a PIPE (which has to be escaped with the \ in front, because in an extended pattern a bare | means "or")
    + followed by YYYY
    + you could make the pattern longer, but this seemed to match enough from what you provided
    + the output goes to the next PIPE
 - tr reads from the second PIPE and deletes every character that is NOT octal \11 (TAB), \12 (LF), or in the range \40 (space) through \176 (tilde), which are the printable ASCII characters
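To see what that `tr` set keeps, the sketch below feeds it a made-up line containing a TAB, the `$%` and `.` from the sample, a BEL control character (octal 007) and a CR-LF ending.

```shell
#!/bin/sh
# Keep TAB (\11), LF (\12) and printable ASCII (\40-\176); delete everything else.
kept=$(printf 'N\tmaxwell$%% tec.\007\r\n' | tr -cd '\11\12\40-\176')
printf '%s\n' "$kept"
```

Only the BEL and the CR are deleted. `$`, `%` and `.` fall inside \40-\176, so this step alone does not remove them.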

Note: if you're paying close attention, you may notice that tr could also remove the pesky CR characters (octal \15 is not in the kept set), eliminating the need for "dos2unix". But there are DOZENS of ways to do these kinds of things, with different tools, in different orders, and with different options. This is just ONE way that should not only get the job done but also be readable (and thus maintainable) in the future.
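A sketch of the dos2unix-free variant, assuming the sample data sits in a temporary file: grep reads the file directly, and tr drops the CRs because octal \15 is outside the kept set.

```shell
#!/bin/sh
sample=$(mktemp)
printf '%s\r\n' \
  'XXXX|YYYY|ZSCUST|06-May-2013|31-Dec-9999|XXXXX||bp15lf9 organisation to maxwell$% tec.' \
  '' '' '' \
  'cmnt .end.' \
  'XXXXX|YYYY|BUR021|02-Apr-2008|31-Dec-9999|XXXX|N|' > "$sample"

# No dos2unix: the pattern is anchored at the start, so the trailing CR does not
# stop grep from matching, and tr then deletes the CR with the other non-printables.
result=$(grep -E '^XXXX.*\|YYYY' "$sample" | tr -cd '\11\12\40-\176')
printf '%s\n' "$result"

rm -f "$sample"
```

The `$%` and the trailing `.` are still present in the output, so a separate step along the lines of point C would still be needed.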

I hope this helps.

Dan
IT4SOHO
CSIA AN commented:
If you are on AIX, simply install tofrodos (http://www.oss4aix.org/download/RPMS/tofrodos/tofrodos-1.7.9-1.aix5.1.ppc.rpm):
(aix):[root] /var/hacmp/log -> rpm -Uvh /tmp/tofrodos-1.7.9-1.aix5.1.ppc.rpm
tofrodos                    ##################################################

It's the same as dos2unix on Linux:

(aix):[root] /tmp -> /opt/freeware/bin/fromdos -h
tofrodos Ver 1.7.9 Converts text files between DOS and Unix formats.
Copyright 1996-2011 Christopher Heng. All rights reserved.
Usage: fromdos [options] [file...]
-a      Always convert (DOS to Unix: kill all CRs;
        Unix to DOS: convert all LFs to CRLFs)
-b      Make backup of original file (.bak).
-d      Convert DOS to Unix.
-e      Abort processing files on error in any file.
-f      Force: convert even if file is not writeable.
-h      Display help on usage and quit.
-l file Log most errors and verbose messages to <file>
-o      Overwrite original file (no backup).
-p      Preserve file owner and time.
-u      Convert Unix to DOS.
-v      Verbose.
-V      Show version and quit.
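If a script has to run on both AIX and Linux, a small wrapper can pick whichever converter is installed. This is a sketch: the `to_unix` name is hypothetical, and it assumes that both dos2unix and fromdos act as filters (STDIN to STDOUT) when given no file, as the `[file...]` in the usage line above suggests.

```shell
#!/bin/sh
# Hypothetical helper: use whichever DOS-to-Unix converter is installed, as a filter.
to_unix() {
  if command -v dos2unix >/dev/null 2>&1; then
    dos2unix               # assumed to read STDIN and write STDOUT when given no file
  elif command -v fromdos >/dev/null 2>&1; then
    fromdos                # tofrodos, e.g. /opt/freeware/bin/fromdos on AIX
  else
    tr -d '\r'             # fallback: delete every carriage return
  fi
}

converted=$(printf 'line one\r\nline two\r\n' | to_unix)
printf '%s\n' "$converted"
```

The `tr` fallback deletes every CR, including any that are not part of a CR-LF pair, which a dedicated converter leaves alone unless told otherwise (compare the -a option above).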
