Find the lines with NULL values in a file

We have an input data file, and we would like to find the lines (along with their data) that contain NULL values.

We couldn't grep for the NULL values directly, so we used the following command. But we are not sure how to find the line numbers of the NULL values in the file.

od -b file | grep ' 000'

Any help in this regard is highly appreciated.
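
The od -b output above shows byte offsets (in octal by default), not line numbers. As a minimal sketch, assuming a POSIX od and awk (the function name nul_offsets is hypothetical), od can be asked for decimal offsets and one decimal value per byte, so awk can print the exact offset of each NUL:

```shell
# nul_offsets FILE  (hypothetical helper name)
# Prints the 0-based decimal byte offset of every NUL byte in FILE.
#   -A d  : addresses in decimal (od's default is octal)
#   -v    : print every line; without it od replaces repeated lines with '*'
#   -t u1 : one unsigned decimal value per byte, so a NUL shows up as 0
nul_offsets() {
    od -A d -v -t u1 "$1" | awk '
        NF > 1 {                      # skip the last line, which holds only the end offset
            for (i = 2; i <= NF; i++)
                if ($i == 0)
                    print $1 + (i - 2)   # line address + position within the od line
        }'
}
```

Usage: nul_offsets YOURFILE. An offset still has to be turned into a line number by counting the newlines before it, which is what the pipelines in the replies do.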

ghostdog74

What does your file look like? Attach a sample if possible.

joules17

Hi,

grep -n
should give you the line numbers of the matching lines.

naga1979

ASKER

grep -n won't work here, because the standard output is already an octal dump.
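
grep -n can still report line numbers if it reads the file itself instead of an octal dump. A minimal sketch, assuming a cat with the non-POSIX -v option (GNU and BSD cat have it), which prints every NUL as the two visible characters ^@; the function name null_lines is hypothetical, and a line that literally contains the text ^@ would also be reported:

```shell
# null_lines FILE  (hypothetical helper name)
# Prints "LINENUMBER:line" for each line of FILE that contains a NUL byte.
# cat -v turns every NUL into the two characters "^@" (and other control
# bytes into similar visible forms), so grep -F can match it as plain text.
null_lines() {
    cat -v "$1" | grep -n -F '^@'
}
```

Each match is printed as LINENUMBER:line with the NULs shown as ^@, so this also shows the data on those lines.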

Here is a sample data line, though not one that contains a NULL value.
02293D 1307755910001178480010000 0000000000000CDM831000080397483100008039812009139 2009138 2009109 2009112 00053911MTHLYSVCR+000000000000000+000000000-00000022678+00000000000004324858004+001+0000011F0008T0355C0217 +000000000000000 +2000103257518 0125049034000 +00002104 MIS Port Service Discount for Flex-Burstable T-3 +0000+0000+00000+00000000000+00000000000+00000000000 2503143 ZD 1 +0000000000 +000000000+000000000 WN00000003SERV 999SDISC O 00048969 R02XXCXC -000000000000000000 NUBLR OCB+00000001US N N 2503143 200440 000000000000 000 000 000 00000000 0000000000000 +0000000000000 0000000 01PSALTXST+07000STAT-0000000000000000+0000000000000R+0000000000000-0000002267800+0000000000000 +0000000000000+00000+0000000000000000

JIEXA

od -tx1 | tr ' ' '\n' | awk '
BEGIN{n=1}
($1==0x0a){n++} # count lines by yourself
($1==0x00){print "zero at line",n}'

JIEXA

This fixed version is probably better:

od -tx1 | awk '{$1="";print} | tr ' ' '\n' | awk '
BEGIN{n=1}
(0+$1==0x0a){n++} # count lines by yourself
(0+$1==0x00){print "zero at line",n}'

The first awk instance is for filtering out the addresses/offsets.
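
od can also omit the address column itself with -A n, which removes the need for the first awk. The comparisons below are done on the hex tokens as strings: awk converts a field such as "0a" to the number 0, not 10, so numeric tests like 0+$1==0x0a do not reliably identify newline bytes. A minimal sketch (the function name null_line_numbers is hypothetical) that prints the line number of every NUL:

```shell
# null_line_numbers FILE  (hypothetical helper name)
# Prints the 1-based line number of every NUL byte in FILE.
#   -A n : no address column, so every field is a byte value
#   -v   : do not collapse repeated output lines into '*'
null_line_numbers() {
    od -A n -v -t x1 "$1" | tr -s ' ' '\n' | awk '
        BEGIN { n = 1 }            # current line number
        { b = tolower($1) }        # compare hex tokens as lowercase strings
        b == "0a" { n++ }          # newline byte: the next byte starts a new line
        b == "00" { print n }      # NUL byte: report the line it is on
    '
}
```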

naga1979

ASKER

JIEXA - the command gives a syntax error, and when I tried correcting it, I still did not get the required result.

Could you please correct it, or explain in more detail what we are trying to do?

Thanks

JIEXA

Here is the fixed code, with explanations:
1. First, "od" outputs hexadecimal bytes, each output line prefixed by its address (16 bytes per line by default).
2. The two tr commands replace spaces with newlines and convert the hexadecimal letters to lowercase.
3. The awk part counts the lines (the /^0a$/ case) and checks for zero bytes (the /^00$/ case).

od -tx1 YOURFILE | tr ' ' '\n' | tr '[A-Z]' '[a-z]' | awk 'BEGIN{n=1}/^0a$/{n++}/^00$/{print "zero at line",n}'
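
To see what the awk stage of that pipeline actually receives, the first stages can be run on a tiny input. A sketch, assuming GNU od (other implementations may space or label the columns differently); show_tokens is a hypothetical helper, it uses the [:upper:] form of tr from later in the thread, and awk 'NF' only drops the empty lines left by tr:

```shell
# show_tokens  (hypothetical helper name)
# Prints, one per line, the tokens that the awk stage of the pipeline
# receives from od -t x1 and the two tr commands.
show_tokens() {
    od -t x1 | tr ' ' '\n' | tr '[:upper:]' '[:lower:]' | awk 'NF'
}

# Input: "ab", newline, NUL, "c", newline
printf 'ab\n\000c\n' | show_tokens
```

With GNU od the tokens are 0000000 61 62 0a 00 63 0a 0000006. The address tokens are seven characters long, so the anchored patterns /^0a$/ and /^00$/ never mistake them for byte values.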


naga1979

ASKER

JIEXA - That worked well. I can get the line number now; could you please help me find the position within the line where the NULL occurs?

Thanks

JIEXA

Well, I suppose the awk part needs to be different.
We need to count the bytes since the last newline, but not the offsets (i.e. count only the tokens of length 2). This count has to be exact: the length-2 test also matches the newline and zero bytes, so they are counted too.

od -tx1 YOURFILE | tr ' ' '\n' | tr '[A-Z]' '[a-z]' | awk 'BEGIN{n=1;col=0;}(length($1)==2){col++}/^0a$/{n++;col=0}/^00$/{print "zero at line",n,"column",col}'
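
A variant of the line-and-column count, as a minimal sketch (the function name null_positions is hypothetical): it adds od's -v so that repeated 16-byte chunks are printed in full rather than collapsed into a single * line, and -A n so no length test is needed to skip the addresses. Columns are 1-based byte positions, counting the NUL itself:

```shell
# null_positions FILE  (hypothetical helper name)
# Prints "zero at line L column C" for every NUL byte in FILE.
#   -A n : no address column, so every non-empty token is one byte
#   -v   : print repeated lines in full instead of a single '*'
null_positions() {
    od -A n -v -t x1 "$1" | tr -s ' ' '\n' | awk '
        BEGIN { n = 1; col = 0 }
        NF == 0 { next }                       # skip empty lines left by tr
        { col++; b = tolower($1) }             # every token is one byte
        b == "0a" { n++; col = 0; next }       # newline: start counting the next line
        b == "00" { print "zero at line", n, "column", col }
    '
}
```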


naga1979

ASKER

Thanks JIEXA.

In my file, the NULL occurs on line 44315 at column 899. Your command gives
"zero at line 44315 column 771"

The line number is correct, but the column is off.

I think it's because of tr ' ' '\n'. Any idea how to overcome that, please?

Thanks.
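
One possible explanation worth checking (a hypothesis; the thread does not confirm it): unless -v is given, od replaces each run of output lines identical to the previous one with a single *, so the bytes on those lines never reach the counting step. The reported gap is 899 - 771 = 128 bytes, which is exactly eight of od's default 16-byte lines. A minimal sketch to compare what a counting pipeline sees with and without -v (the function name bytes_seen and the /tmp path are illustrative):

```shell
# bytes_seen [OD_OPTIONS] FILE  (hypothetical helper name)
# Counts the byte tokens (two hex digits each) that od -t x1 prints,
# i.e. how many bytes a token-counting pipeline would actually see.
bytes_seen() {
    od -t x1 "$@" | tr -s ' ' '\n' | awk 'length($1) == 2 { c++ } END { print c + 0 }'
}

printf '%0100d' 0 > /tmp/od_repeat_demo.txt   # 100 identical "0" bytes
bytes_seen /tmp/od_repeat_demo.txt            # fewer than 100: repeats became '*'
bytes_seen -v /tmp/od_repeat_demo.txt         # 100
```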

JIEXA

The "tr ' ' '\n'" is applied only to the hexadecimal output, so it should not be a problem. I've just tested it with an example.
Can you attach a zip file of the first 45000 lines?

JIEXA

Oh, I've found a problem: the "tr '[A-Z]' '[a-z]'" command depends on the LC_* environment variables (the locale).
Here is the fixed command.

od -tx1 YOURFILE | tr ' ' '\n' | tr '[:upper:]' '[:lower:]' | awk 'BEGIN{n=1;col=0;}(length($1)==2){col++}/^0a$/{n++;col=0}/^00$/{print "zero at line",n,"column",col}'
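
Since the meaning of a range like [A-Z] can vary with the locale, another option is to pin the locale to C for that single command, where A-Z is exactly the 26 ASCII capital letters. A minimal sketch (lower_hex is a hypothetical name):

```shell
# lower_hex  (hypothetical helper name)
# Lowercases hex digits with the locale forced to C for this one tr call,
# so the range A-Z does not depend on the user's LC_* settings.
lower_hex() {
    LC_ALL=C tr 'A-Z' 'a-z'
}

echo '0A 1F FF' | lower_hex    # prints: 0a 1f ff
```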


naga1979

ASKER

Still the same. Attached is the test file. We should get the result "44315 at column 899".

test-null.zip

ASKER CERTIFIED SOLUTION

JIEXA


To access this solution, you must be a member of Experts Exchange.


naga1979

ASKER

That worked perfectly. Thanks for your help on this; it was a simple and effective solution. I posted another question on finding files within a date range, if you want to take it up.