naga1979

Find the NULL values lines in a file

We have an input data file. We would like to find the lines (along with their data) that contain NULL values.

We couldn't grep for the NULL values directly, so we used the following command. However, we are not sure how to find the line numbers of the NULL value data in the file.

od -b file | grep ' 000'

Any help in this regard is highly appreciated.
ghostdog74

What does your file look like? Attach a sample if possible.
Hi,

grep -n
should give you the line number of the matching string
naga1979

ASKER

grep -n won't work here, as the standard output is already an octal dump.

Here is a sample data line, though not the line with the NULL value.
02293D    1307755910001178480010000          0000000000000CDM831000080397483100008039812009139   2009138   2009109   2009112   00053911MTHLYSVCR+000000000000000+000000000-00000022678+00000000000004324858004+001+0000011F0008T0355C0217                            +000000000000000                                        +2000103257518                               0125049034000                                                 +00002104                                                                                     MIS Port Service Discount for Flex-Burstable T-3  +0000+0000+00000+00000000000+00000000000+00000000000  2503143           ZD    1          +0000000000                                                                  +000000000+000000000                        WN00000003SERV 999SDISC  O                       00048969                                                                R02XXCXC  -000000000000000000  NUBLR OCB+00000001US   N N                                                                 2503143         200440                                              000000000000                                                                  000                                                  000                                                  000                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             00000000                                                               
                                                        0000000000000 +0000000000000                                                             0000000      01PSALTXST+07000STAT-0000000000000000+0000000000000R+0000000000000-0000002267800+0000000000000 +0000000000000+00000+0000000000000000  
od -tx1 | tr ' ' '\n' | awk '
BEGIN{n=1}
($1==0x0a){n++} # count lines by yourself
($1==0x00){print "zero at line",n}'
Probably, the fixed version is better:

od -tx1 | awk '{$1="";print} | tr ' ' '\n' | awk '
BEGIN{n=1}
(0+$1==0x0a){n++} # count lines by yourself
(0+$1==0x00){print "zero at line",n}'


The first awk instance is for filtering out the addresses/offsets.
JIEXA - the command gives a syntax error, and when I tried correcting the syntax error, the required result was not achieved.

Please can you correct it or explain more about what we are trying to do?

thanks
Here is the fixed code.
And the explanations:
1. First, "od" outputs the bytes in hexadecimal, 16 per output line, each line prefixed with its offset (address).
2. The two tr's replace spaces with newlines and convert the hexadecimal letters to lowercase.
3. The awk part counts the lines (the /^0a$/ case) and checks for zero bytes (the /^00$/ case).
od -tx1 YOURFILE | tr ' ' '\n' | tr '[A-Z]' '[a-z]' | awk 'BEGIN{n=1}/^0a$/{n++}/^00$/{print "zero at line",n}'
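
To see the three steps at work, here is a small sketch on a throwaway file. The file contents and the function name find_nul_lines are made up for illustration, and mktemp is assumed to be available.

```shell
# Hypothetical sample: line 2 holds a NUL byte between "cd" and "ef".
sample=$(mktemp)
printf 'ab\ncd\000ef\ngh\n' > "$sample"

# Step 1: od prints an offset, then each byte as two hex digits.
od -tx1 "$sample"

# Steps 2-3: put one token per line, then count the 0a (newline) tokens
# and report each 00 (NUL) token with the current line number.
find_nul_lines() {
  od -tx1 "$1" | tr ' ' '\n' | tr '[A-Z]' '[a-z]' |
    awk 'BEGIN{n=1}/^0a$/{n++}/^00$/{print "zero at line",n}'
}

find_nul_lines "$sample"   # prints: zero at line 2
```

The offset tokens are longer than two characters, so they never match either pattern and are simply ignored.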

JIEXA - That worked well. I can get the line number now. Please can you help find the position within the line where the NULL occurs?

Thanks
Well, I suppose the awk part should be different.
We need to count the bytes after each newline and skip the offsets, i.e. count only the strings of length 2. The count should be accurate, because the length-2 test also matches the newline and zero bytes themselves.
od -tx1 YOURFILE | tr ' ' '\n' | tr '[A-Z]' '[a-z]' | awk 'BEGIN{n=1;col=0;}(length($1)==2){col++}/^0a$/{n++;col=0}/^00$/{print "zero at line",n,"column",col}'
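
A quick trace of the column counter on a made-up sample may help. The file contents and the name nul_positions are assumptions, and the case-folding tr is left out on the assumption that od -tx1 already prints lowercase hex digits.

```shell
sample=$(mktemp)
# Line 1 is "ab"; line 2 is "cd", a NUL byte, then "ef".
printf 'ab\ncd\000ef\n' > "$sample"

nul_positions() {
  od -tx1 "$1" | tr ' ' '\n' |
    awk 'BEGIN{n=1;col=0}
         (length($1)==2){col++}   # count every byte token, skip the longer offsets
         /^0a$/{n++;col=0}        # newline: move to the next line, restart the count
         /^00$/{print "zero at line",n,"column",col}'
}

nul_positions "$sample"   # prints: zero at line 2 column 3
```

The column is 1-based and includes the NUL itself, since the counter is incremented before the /^00$/ rule runs.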

Thanks JIEXA.

In my file, the NULL occurs on line 44315 at column 899. Your command gives
"zero at line 44315 column 771"

The line number is correct, but there is an offset in the column.

I think it's because of tr ' ' '\n'. Any idea how to overcome that, please?

thanks.
The "tr ' ' '\n'" is applied to the hexadecimal dump, so it should not be a problem. I've tested it now with an example.
Can you attach a zip file of the first 45000 lines?
Oh, I've found a problem: the LC_* environment variables affect the "tr '[A-Z]' '[a-z]'" command.
Here is the fixed command.
od -tx1 YOURFILE | tr ' ' '\n' | tr '[:upper:]' '[:lower:]' | awk 'BEGIN{n=1;col=0;}(length($1)==2){col++}/^0a$/{n++;col=0}/^00$/{print "zero at line",n,"column",col}'

Still the same. Attached is the test file. We should have got the result "44315 at column 899"


test-null.zip
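
One possible cause of an offset like this, offered as an assumption since the accepted answer is not shown in the thread: without -v, od replaces runs of identical output lines with a single "*", so long stretches of spaces (the sample line above has many) drop out of the byte count in multiples of 16. The reported gap, 899 - 771 = 128, is 8 x 16, which would fit. The sketch below uses a made-up file and a hypothetical locate helper to compare the two cases.

```shell
sample=$(mktemp)
# 40 spaces, one NUL byte, then a newline: the NUL is byte 41 of line 1.
printf '%40s\000\n' '' > "$sample"

# Reads an od -tx1 dump on stdin and reports line and column of each NUL.
locate() {
  tr ' ' '\n' |
    awk 'BEGIN{n=1;col=0}(length($1)==2){col++}/^0a$/{n++;col=0}/^00$/{print "zero at line",n,"column",col}'
}

without_v=$(od -tx1 "$sample" | locate)      # repeated dump lines collapsed to "*"
with_v=$(od -v -tx1 "$sample" | locate)      # every byte is printed

echo "without -v: $without_v"
echo "with -v:    $with_v"
```

If that is what happens with the real file, adding -v to the od call in the command above may be enough to get the expected column.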
ASKER CERTIFIED SOLUTION
JIEXA
This solution is only available to members.
That worked perfectly. Thanks for your help on this. That was a simple and effective solution. I posted another question on finding files within a date range, if you want to take it up.