naga1979
asked on
Find the lines containing NULL values in a file
We have an input data file. We would like to find the lines (along with their data) that contain NULL values.
We couldn't grep for the NULL values directly, so we used the following command. But we are not sure how to find the line numbers of the NULL value data in the file.
od -b file | grep ' 000'
Any help in this regard is highly appreciated.
What does your file look like? Attach a sample if possible.
Hi,
grep -n
should give you the line number of the matching string
ASKER
grep -n won't work here, because the standard output is already an octal dump, so the reported line numbers refer to the dump and not to the original file.
Here is a sample data line, though not the line with the NULL value.
02293D 1307755910001178480010000 0000000000000CDM8310000803 9748310000 8039812009 139 2009138 2009109 2009112 00053911MTHLYSVCR+00000000 0000000+00 0000000-00 000022678+ 0000000000 0004324858 004+001+00 00011F0008 T0355C0217 +000000000000000 +2000103257518 0125049034000 +00002104 MIS Port Service Discount for Flex-Burstable T-3 +0000+0000+00000+000000000 00+0000000 0000+00000 000000 2503143 ZD 1 +0000000000 +000000000+000000000 WN00000003SERV 999SDISC O 00048969 R02XXCXC -000000000000000000 NUBLR OCB+00000001US N N 2503143 200440 000000000000 000 000 000 00000000 0000000000000 +0000000000000 0000000 01PSALTXST+07000STAT-00000 0000000000 0+00000000 00000R+000 0000000000 -000000226 7800+00000 00000000 +0000000000000+00000+00000 0000000000 0
od -tx1 | tr ' ' '\n' | awk '
BEGIN{n=1}
($1==0x0a){n++} # count lines by yourself
($1==0x00){print "zero at line",n}'
Probably, the fixed version is better:
od -tx1 | awk '{$1="";print} | tr ' ' '\n' | awk '
BEGIN{n=1}
(0+$1==0x0a){n++} # count lines by yourself
(0+$1==0x00){print "zero at line",n}'
The first awk instance is for filtering out the addresses/offsets.
ASKER
JIEXA - the command gives a syntax error, and when I tried correcting the syntax error, I did not get the required result.
Please can you correct it or explain in more detail what we are trying to do?
Thanks
Here is the fixed code.
And the explanations:
1. First, "od" outputs hexadecimal bytes, with each output line prefixed by its offset and covering 16 bytes.
2. The two tr's replace spaces with newlines and convert hexadecimal letters to lowercase.
3. The awk part counts the lines (the /^0a$/ case) and checks for zero bytes (the /^00$/ case).
od -tx1 YOURFILE | tr ' ' '\n' | tr '[A-Z]' '[a-z]' | awk 'BEGIN{n=1}/^0a$/{n++}/^00$/{print "zero at line",n}'
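For anyone who wants to try this pipeline without the real data, here is a minimal sketch that wraps it in a shell function and runs it on a tiny test file. The function name nul_lines and the file name sample.dat are made up for this illustration and are not part of the thread.

```shell
# Hypothetical wrapper around the pipeline above; "nul_lines" is an
# illustrative name only.
nul_lines() {
  od -tx1 "$1" | tr ' ' '\n' | tr '[:upper:]' '[:lower:]' |
    awk 'BEGIN{n=1}/^0a$/{n++}/^00$/{print "zero at line",n}'
}

# Build a three-line sample file whose second line contains one NUL byte.
printf 'abc\nde\000f\nghi\n' > sample.dat

# Report every line number that contains a zero byte.
nul_lines sample.dat
```

On this sample the function prints "zero at line 2". The offsets that od prints never match /^00$/ because the pattern is anchored to exactly two characters.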
ASKER
JIEXA - That worked well. I can get the line number now. Please can you help me find the position within the line where the null occurs?
Thanks
Well, I suppose the awk part should be different.
We need to count the bytes after each newline, and not the offsets (i.e. only the strings of length 2). And this counting should be accurate: it also counts the newline and zero bytes themselves.
od -tx1 YOURFILE | tr ' ' '\n' | tr '[A-Z]' '[a-z]' | awk 'BEGIN{n=1;col=0;}(length($1)==2){col++}/^0a$/{n++;col=0}/^00$/{print "zero at line",n,"column",col}'
ASKER
Thanks JIEXA.
In my file, the NULL occurs on line 44315 at column 899. Your command gives
"zero at line 44315 column 771"
The line number is correct, but there is an offset in the column.
I think it's because of tr ' ' '\n'. Any idea how to overcome that, please?
Thanks.
The "tr ' ' '\n'" is applied to the hexadecimal dump only, so it should not be a problem. I've just tested it with an example.
Can you attach a zip file of the first 45000 lines?
Oh, I've found a problem: the LC_* environment variables affect the "tr '[A-Z]' '[a-z]'" command.
Here is the fixed command.
od -tx1 YOURFILE | tr ' ' '\n' | tr '[:upper:]' '[:lower:]' | awk 'BEGIN{n=1;col=0;}(length($1)==2){col++}/^0a$/{n++;col=0}/^00$/{print "zero at line",n,"column",col}'
ASKER
Still the same. Attached is the test file. We should have got the result "44315 at column 899"
test-null.zip
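One possible cause of the column offset, not confirmed anywhere in this thread: without the -v option, od replaces repeated identical dump lines with a single "*", so bytes in long runs of identical characters are never counted. The reported difference (899 - 771 = 128) would be consistent with eight suppressed 16-byte lines. The sketch below adds -v (print every line) and -An (omit offsets), both standard od options; the function name nul_positions and the file name sample2.dat are made up for this illustration.

```shell
# Hypothetical variant of the column-counting pipeline; "nul_positions"
# is an illustrative name only.
nul_positions() {
  # -v  : dump every line, never collapse repeated lines into "*"
  # -An : omit the offset column, so only byte values reach awk
  od -v -An -tx1 "$1" | tr -s ' ' '\n' |
    awk 'BEGIN{n=1;col=0}
         length($1)==2{col++}          # every byte advances the column
         /^0a$/{n++;col=0;next}        # newline: next line, reset column
         /^00$/{print "zero at line",n,"column",col}'
}

# Sample: 40 spaces, then a NUL byte, then "x" and a newline. The run of
# spaces is long enough for plain od to collapse a repeated dump line.
{ printf '%40s' ''; printf '\000x\n'; } > sample2.dat

nul_positions sample2.dat
```

On this sample the function prints "zero at line 1 column 41", which is the 1-based position of the NUL byte within its line.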
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
That worked perfectly. Thanks for your help on this. That was a simple and effective solution. I posted another question on finding the files within a date range, if you want to take it up.