Link to home
Start Free TrialLog in
Avatar of drewrockshard
drewrockshard

asked on

Perl Regex

I have a filehandle that reads in multiple lines of "blocks" of text.  Each block is separated with each iteration of the work TYPE ... or the end of file (eof).

There's a piece of text, however, that looks like the following:

TIFF Directory at offset 0x10eb4

I need it to match any form of this.  The hex output will change at any given time, so I need it to match anything in the sort of:

TIFF Directory at offset * (where * = the hex output)

I use this in a perl script (if statement), not as a one liner, so any help is appriciated.
Avatar of FishMonger
FishMonger
Flag of United States of America image

/(TIFF Directory at offset .+)/

The matched string will be in $1
Avatar of drewrockshard
drewrockshard

ASKER

Thanks, but this isn't exactly what I was looking for; almost though.  It matches (which is good :)), but how do I then take this information and then "remove" it from the file?  How do I match this with input and then actually tell it to delete this in the output.

All my output values are setup, I just need to know how to remove this "match" from it outputting the offset information of the regex.

Thanks.
Is this what you need?

#!/usr/bin/perl

use strict;
use warnings;

my $str = 'TYPE:
C:\Users\Drew\Desktop\Dad\tiffs\filename2.TIF:bkwht:200:200:bkwht:1728:2376:
TIFF Directory at offset 0x4aac
  Bits/Sample: 1
  FillOrder: lsb-to-msb
  Document Name: "Standard Input"
  Image Description: "converted PBM file"
  Orientation: row 0 top, col 0 lhs
  Samples/Pixel: 1
  Rows/Strip: 2376
  Planar Configuration: single image plane';

$str =~ s/(TIFF Directory at offset .+)\n//;

#print $1, "\n\n";
print $str;

Open in new window

Actually, not exactly.  I think you know what I'm needing, just not really knowing how to implement. Let me attach my code to here, so you can see how I have everything laid out.  (see code attachement)
# open the file
open (IN, $infile) or die "Can't open $infile: $!";
open(OUT1,"> \!".$state.$status."INFO.txt") or die "Can't open \!".$state.$status."INFO.txt: $!"; 
open(OUT2,"> \!".$state.$status."INFOspdsht.txt") or die "Can't open \!".$state.$status."INFO.txt $!";

# loop through it
while (<IN>) {
	chomp;
  	# if we've reached the "record-delimiter i.e. TYPE:" or EOF
  	if (/^TYPE:/ or eof) {
    	# check if array "@line" is defined; if so, print it after
    	# appending the additional information after filetype i.e. 2nd element
    	if (@line) {
      		$line[1] .= "\n$fn:$cs:$r[0]:$r[1]:$pi:$iw[0]:$iw[1]:";
      		#my $file = File::Spec->splitpath( $fn );
      		print OUT2 $line[1],"\n"; 
      		foreach (@line) { 
      			#print $_,"\n";
      			print OUT1 $_,"\n";
      		}
      			# Important - reset the array
     			@line = ();
     			$filesize = (-s $fpath.$fn) / (1024 * 1024);
				$size_in_mb = sprintf "%.2f", $filesize;
    			print OUT1 $size_in_mb." MB\n";
    		}		
  		}
  		else {
    		# check for filename, compression scheme, resolution, photometric interpretation,
    		# image width and image length, and set the variables $fn, $cs, $r, $pi, $iw, $il
    		#if (/^\s*(.+)\.tif:\s*$/i) {
			#	$fn = "$1.tif";
			#	#next;
			#}
			if (/^(..[^:]*?\\)([^:\\]*\.tiff?):/im) {
				#print "Path: $1\n";
				#print "File: $2\n";
				$fn = $2;
				$fpath = $1;
				next;
			}
	    	elsif (/^\s*Compression Scheme:\s*(.*?)$/) {
				$cs = $1;
				if ($cs eq "LZW") {
					$cs = "color";
				}
				elsif ($cs eq "Lempel-Ziv & Welch encoding") {
					$cs = "color";
				}
				elsif ($cs eq "CCITT Group 4") {
					$cs = "bkwht";
				}
				elsif ($cs eq "CCITT Group 4 facsimile encoding") {
					$cs = "bkwht";
				}
				elsif ($cs eq "none") {
					$cs = "none";
				}
				next;
			}
    		elsif (/^\s*Resolution:\s*(.*?)$/) {
				$r = $1;
				@r = split(/, /, $r);
				$r[0] =~ s/\D//g;
				$r[1] =~ s/\D//g;		
				next;
			}
	    	elsif (/^\s*Photometric Interpretation:\s*(.*?)$/) {
				$pi = $1;
				if ($pi eq "RGB color") {
					$pi = "color";
				}
				elsif ($pi eq "min-is-white") {
					$pi = "bkwht";
				}
				elsif ($pi eq "min-is-black") {
					$pi = "bkwht";
				}
				elsif ($pi eq "palette color (RGB from colormap)") {
					$pi = "color";
				}
				next;
			}
    		elsif (/^\s*Image Width:\s*(.*?)\s*$/) {
				$iw = $1;
				$iw = $1;
		        @iw = split(/(?<=\d)\D/,$iw);
		        $iw[0] =~ s/\D//g;
		        $iw[1] =~ s/\D//g;
		        next;
			}
		} ## Closing else
  		# add the current line to the array @line
  		push @line, $_;
  		
	} ## closing while loop
# clean up after we're done
close (OUT1) or die "Can't close out1: $!"; 
close (OUT2) or die "Can't close out2: $!"; 
close (IN) or die "Can't close $infile: $!";

Open in new window

What I was hinting at is that you should be reading in the file in record mode instead of line mode.

I need to work on a project of my own, but will give a more complete example in a little while, but for now look over this:



#!/usr/bin/perl

use strict;
use warnings;
use File::Spec;
use Data::Dumper;

$/ = "TYPE:\n";

while (<DATA>) {
    chomp;
    next if $_ eq '';
    s/(TIFF Directory at offset .+)\n//;
    my ($line1, $other) = split(/\n/, $_, 2);
    print Dumper ($line1, $other);
}


__DATA__
TYPE:
C:\Users\Drew\Desktop\Dad\tiffs\filename.TIF:bkwht:200:200:bkwht:1728:2376:
TIFF Directory at offset 0x10eb4
  Bits/Sample: 1
  FillOrder: lsb-to-msb
  Document Name: "Standard Input"
  Image Description: "converted PBM file"
  Orientation: row 0 top, col 0 lhs
  Samples/Pixel: 1
  Rows/Strip: 2376
  Planar Configuration: single image plane
TYPE:
C:\Users\Drew\Desktop\Dad\tiffs\filename2.TIF:bkwht:200:200:bkwht:1728:2376:
TIFF Directory at offset 0x4aac
  Bits/Sample: 1
  FillOrder: lsb-to-msb
  Document Name: "Standard Input"
  Image Description: "converted PBM file"
  Orientation: row 0 top, col 0 lhs
  Samples/Pixel: 1
  Rows/Strip: 2376
  Planar Configuration: single image plane
TYPE:
C:\Users\Drew\Desktop\Dad\tiffs\filename3.TIF:None:31:31:color:124:124:
TIFF Directory at offset 0x8
  Subfile Type: (0 = 0x0)
  Bits/Sample: 8
  Software: "¼"
  Samples/Pixel: 3
  Rows/Strip: 55
  Planar Configuration: single image plane
TYPE:
C:\Users\Drew\Desktop\Dad\tiffs\filename4.TIFF:color:300:300:color:1419:1001:
TIFF Directory at offset 0x8
  Subfile Type: (0 = 0x0)
  Bits/Sample: 8
  Samples/Pixel: 3
  Rows/Strip: 1
  Planar Configuration: single image plane
  Photoshop Data: <present>, 410 bytes

Open in new window

Please do, I don't understand this completely, since I'm new to perl, but come from an intermediate PHP background, and advanced Shell programming.  Does this mean I'll need to rewrite most of my code that I sent you?  Please let me know.

Regards,
Drew
Yes, you will need to rewrite a good portion of that code.  But in the end, it also means that the code will be cleaner, more efficient, and easier to read/maintain.
Sounds good, let me know; I'm willing to learn.
If you just want to remove the phrase " at offset 0x10eb4" from that line, then:

s/ at offset 0x[0-9a-fA-F]+$//;

should do it.  Do it right before the push $_.  (It will make the change to $_.)
I'm wanting to remove "TIFF Directory at offset" and the hex that follows from muliple instances of the match.
To remove that whole phrase but not the line itself?

s/TIFF Directory at offset 0x[0-9a-fA-F]+//g;

To ignore the whole line in your program, in place of simply push $_:

push $_ unless /TIFF Directory at offset 0x[0-9a-fA-F]+/;

Your workaround would remove it, however, it places a "space", how would you remove the line and its "space", so that it basically doesn't exist?

For example, output now looks like:

TYPE:

filename.TIF:bkwht:200:200:bkwht:1728:2376:
  Bits/Sample: 1
  FillOrder: lsb-to-msb
  Document Name: "Standard Input"
  Image Description: "converted PBM file"
  Orientation: row 0 top, col 0 lhs
  Samples/Pixel: 1
  Rows/Strip: 2376
  Planar Configuration: single image plane
0.07 MB
TYPE:

filename2.TIF:bkwht:200:200:bkwht:1728:2376:
  Bits/Sample: 1
  FillOrder: lsb-to-msb
  Document Name: "Standard Input"
  Image Description: "converted PBM file"
  Orientation: row 0 top, col 0 lhs
  Samples/Pixel: 1
  Rows/Strip: 2376
  Planar Configuration: single image plane
0.02 MB

Where I want it to be:

TYPE:
filename.TIF:bkwht:200:200:bkwht:1728:2376:
  Bits/Sample: 1
  FillOrder: lsb-to-msb
  Document Name: "Standard Input"
  Image Description: "converted PBM file"
  Orientation: row 0 top, col 0 lhs
  Samples/Pixel: 1
  Rows/Strip: 2376
  Planar Configuration: single image plane
0.07 MB
TYPE:
filename2.TIF:bkwht:200:200:bkwht:1728:2376:
  Bits/Sample: 1
  FillOrder: lsb-to-msb
  Document Name: "Standard Input"
  Image Description: "converted PBM file"
  Orientation: row 0 top, col 0 lhs
  Samples/Pixel: 1
  Rows/Strip: 2376
  Planar Configuration: single image plane
0.02 MB
Is that with my first version (with s/ )?  Do it the second way then where you don't push if it matches.  i.e.

push $_ unless /TIFF Directory at offset 0x[0-9a-fA-F]+/;
Or if you did both things because I didn't make myself clear, take out the first part.  If you made the first change, the second change won't take effect because the unless condition will never match.  I meant to show two different ways of doing it because I didn't know if you wanted the blank line or if you wanted the whole line deleted.
Well, I've tried both, problem is, it doesn't work 100%.  Here's what happens:

RAW OUTPUT (before the regexs):

TYPE:
C:\Users\Drew\Desktop\Dad\tiffs\filename.TIF:
TIFF Directory at offset 0x10eb4
  Image Width: 1728 Image Length: 2376
  Resolution: 200, 200 pixels/inch
  Bits/Sample: 1
  Compression Scheme: CCITT Group 4
  Photometric Interpretation: min-is-white
  FillOrder: lsb-to-msb
  Document Name: "Standard Input"
  Image Description: "converted PBM file"
  Orientation: row 0 top, col 0 lhs
  Samples/Pixel: 1
  Rows/Strip: 2376
  Planar Configuration: single image plane

MAIN OUTPUT:

TYPE:
  Bits/Sample: 1
filename.TIF:bkwht:200:200:bkwht:1728:2376:
  FillOrder: lsb-to-msb
  Document Name: "Standard Input"
  Image Description: "converted PBM file"
  Orientation: row 0 top, col 0 lhs
  Samples/Pixel: 1
  Rows/Strip: 2376
  Planar Configuration: single image plane
0.07 MB

Its shifting data up, which I'm not wanting that :(.  Do you know why its doing this?
No I don't know why.  What does your program look like at this point?
See code snippet.
# open the file
open (IN, $infile) or die "Can't open $infile: $!";
open(OUT1,"> \!".$state.$status."INFO.txt") or die "Can't open \!".$state.$status."INFO.txt: $!"; 
open(OUT2,"> \!".$state.$status."INFOspdsht.txt") or die "Can't open \!".$state.$status."INFO.txt $!";

# loop through it
while (<IN>) {
	chomp;
  	# if we've reached the "record-delimiter i.e. TYPE:" or EOF
  	if (/^TYPE:/ or eof) {
    	# check if array "@line" is defined; if so, print it after
    	# appending the additional information after filetype i.e. 2nd element
    	if (@line) {
      		$line[1] .= "\n$fn:$cs:$r[0]:$r[1]:$pi:$iw[0]:$iw[1]:";
      		#my $file = File::Spec->splitpath( $fn );
      		print OUT2 $line[1],"\n"; 
      		foreach (@line) { 
      			#print $_,"\n";
      			print OUT1 $_,"\n";
      		}
      			# Important - reset the array
     			@line = ();
     			$filesize = (-s $fpath.$fn) / (1024 * 1024);
				$size_in_mb = sprintf "%.2f", $filesize;
    			print OUT1 $size_in_mb." MB\n";
    		}		
  		}
  		else {
    		# check for filename, compression scheme, resolution, photometric interpretation,
    		# image width and image length, and set the variables $fn, $cs, $r, $pi, $iw, $il
    		#if (/^\s*(.+)\.tif:\s*$/i) {
			#	$fn = "$1.tif";
			#	#next;
			#}
			if (/^(..[^:]*?\\)([^:\\]*\.tiff?):/im) {
				#print "Path: $1\n";
				#print "File: $2\n";
				$fn = $2;
				$fpath = $1;
				next;
			}
			##elsif (s/TIFF Directory at offset 0x[0-9a-fA-F]+//g) {
			##	my $oset = $1;
			##}			
	    	elsif (/^\s*Compression Scheme:\s*(.*?)$/) {
				$cs = $1;
				if ($cs eq "LZW") {
					$cs = "color";
				}
				elsif ($cs eq "Lempel-Ziv & Welch encoding") {
					$cs = "color";
				}
				elsif ($cs eq "CCITT Group 4") {
					$cs = "bkwht";
				}
				elsif ($cs eq "CCITT Group 4 facsimile encoding") {
					$cs = "bkwht";
				}
				elsif ($cs eq "none") {
					$cs = "none";
				}
				next;
			}
    		elsif (/^\s*Resolution:\s*(.*?)$/) {
				$r = $1;
				@r = split(/, /, $r);
				$r[0] =~ s/\D//g;
				$r[1] =~ s/\D//g;		
				next;
			}
	    	elsif (/^\s*Photometric Interpretation:\s*(.*?)$/) {
				$pi = $1;
				if ($pi eq "RGB color") {
					$pi = "color";
				}
				elsif ($pi eq "min-is-white") {
					$pi = "bkwht";
				}
				elsif ($pi eq "min-is-black") {
					$pi = "bkwht";
				}
				elsif ($pi eq "palette color (RGB from colormap)") {
					$pi = "color";
				}
				next;
			}
    		elsif (/^\s*Image Width:\s*(.*?)\s*$/) {
				$iw = $1;
				$iw = $1;
		        @iw = split(/(?<=\d)\D/,$iw);
		        $iw[0] =~ s/\D//g;
		        $iw[1] =~ s/\D//g;
		        next;
			}
		} ## Closing else
  		# add the current line to the array @line
  		push @line, $_ unless /TIFF Directory at offset 0x[0-9a-fA-F]+/; #/TIFF Directory at offset 0x[0-9a-fA-F]+/; #  unless ($_ eq s/\n+/\n/g)
  		
	} ## closing while loop
# clean up after we're done
close (OUT1) or die "Can't close out1: $!"; 
close (OUT2) or die "Can't close out2: $!"; 
close (IN) or die "Can't close $infile: $!";

Open in new window

The dropped lines are where you have "next" as the last thing in processing that condition, so they never get pushed into @line.  Is that what you mean?
Your indentation is not right in a couple places, it might help us figure it out if you cleaned that up.  e.g. from "#Important - reset the array" and down from there should be exdented.  Also where you have a lot of commented-out code it changes the logic of the indenting.
Nope, thats not what I mean.  I know what the "next" parts are doing, and why they "disappear" from the "raw" output.  I'm talking about how, when I remove all the spaces from the output, it moves the next section that is below "filename.TIF:bkwht:200:200:bkwht:1728:2376:" and moves it above it.  Take a look at the raw output, and compare with the main output.  See how "Bits/Sample: 1" is now above  "filename.TIF:bkwht:200:200:bkwht:1728:2376:"?  If I were to remove that, then the next entry below that would move up, for example, if  "Bits/Sample: 1" were removed, then "FillOrder: lsb-to-msb" would go above "filename.TIF:bkwht:200:200:bkwht:1728:2376:".  Not sure why.  I basically want it to say "TYPE:" and the be followed by the section "filename.TIF:bkwht:200:200:bkwht:1728:2376:", and THEN have all the rest of the output.

RAW OUTPUT:

TYPE:
C:\Users\Drew\Desktop\Dad\tiffs\filename.TIF:
TIFF Directory at offset 0x10eb4
  Image Width: 1728 Image Length: 2376
  Resolution: 200, 200 pixels/inch
  Bits/Sample: 1
  Compression Scheme: CCITT Group 4
  Photometric Interpretation: min-is-white
  FillOrder: lsb-to-msb
  Document Name: "Standard Input"
  Image Description: "converted PBM file"
  Orientation: row 0 top, col 0 lhs
  Samples/Pixel: 1
  Rows/Strip: 2376
  Planar Configuration: single image plane

MAIN OUTPUT:

TYPE:
  Bits/Sample: 1
filename.TIF:bkwht:200:200:bkwht:1728:2376:
  FillOrder: lsb-to-msb
  Document Name: "Standard Input"
  Image Description: "converted PBM file"
  Orientation: row 0 top, col 0 lhs
  Samples/Pixel: 1
  Rows/Strip: 2376
  Planar Configuration: single image plane
0.07 MB
Where are "raw output" and "main output" coming from?  Is that OUT1 and OUT2?
I think maybe $line[1] should be $line[0].
"raw output" = input file

"main output" = what is outputted to the screen (my final output that I'm left with).
Changing from $line[0] to $line[1] fixed some issues, but caused another issue.  Now above some output that is:

filename.TIF:bkwht:200:200:bkwht:1728:2376:

I also have the full filename path :(  so it looks like this now:

C:\Users\Drew\Desktop\Dad\tiffs\filename.TIF:
filename.TIF:bkwht:200:200:bkwht:1728:2376:

How can this be removed?

Attached is the current version of the script.


# open the file
open (IN, $infile) or die "Can't open $infile: $!";
open(OUT1,"> \!".$state.$status."INFO.txt") or die "Can't open \!".$state.$status."INFO.txt: $!"; 
open(OUT2,"> \!".$state.$status."INFOspdsht.txt") or die "Can't open \!".$state.$status."INFO.txt $!";

# loop through it
while (<IN>) {
	chomp;
  	# if we've reached the "record-delimiter i.e. TYPE:" or EOF
  	if (/^TYPE:/ or eof) {
    	# check if array "@line" is defined; if so, print it after
    	# appending the additional information after filetype i.e. 2nd element
    	if (@line) {
      		$line[0] .= "\n$fn:$cs:$r[0]:$r[1]:$pi:$iw[0]:$iw[1]:";
      		#my $file = File::Spec->splitpath( $fn );
      		print OUT2 $line[0],"\n"; 
      		foreach (@line) { 
      			#print $_,"\n";
      			print OUT1 $_,"\n";
      		}
      			# Important - reset the array
     			@line = ();
     			$filesize = (-s $fpath.$fn) / (1024 * 1024);
				$size_in_mb = sprintf "%.2f", $filesize;
    			print OUT1 $size_in_mb." MB\n";
    		}		
  		}
  		else {
    		# check for filename, compression scheme, resolution, photometric interpretation,
    		# image width and image length, and set the variables $fn, $cs, $r, $pi, $iw, $il
    		#if (/^\s*(.+)\.tif:\s*$/i) {
			#	$fn = "$1.tif";
			#	#next;
			#}	
			if (/^(..[^:]*?\\)([^:\\]*\.tiff?):/im) {
				#print "Path: $1\n";
				#print "File: $2\n";
				$fn = $2;
				$fpath = $1;
				#next;
			}	
	    	elsif (/^\s*Compression Scheme:\s*(.*?)$/) {
				$cs = $1;
				if ($cs eq "LZW") {
					$cs = "color";
				}
				elsif ($cs eq "Lempel-Ziv & Welch encoding") {
					$cs = "color";
				}
				elsif ($cs eq "CCITT Group 4") {
					$cs = "bkwht";
				}
				elsif ($cs eq "CCITT Group 4 facsimile encoding") {
					$cs = "bkwht";
				}
				elsif ($cs eq "none") {
					$cs = "none";
				}
				next;
			}
    		elsif (/^\s*Resolution:\s*(.*?)$/) {
				$r = $1;
				@r = split(/, /, $r);
				$r[0] =~ s/\D//g;
				$r[1] =~ s/\D//g;		
				next;
			}
	    	elsif (/^\s*Photometric Interpretation:\s*(.*?)$/) {
				$pi = $1;
				if ($pi eq "RGB color") {
					$pi = "color";
				}
				elsif ($pi eq "min-is-white") {
					$pi = "bkwht";
				}
				elsif ($pi eq "min-is-black") {
					$pi = "bkwht";
				}
				elsif ($pi eq "palette color (RGB from colormap)") {
					$pi = "color";
				}
				next;
			}
    		elsif (/^\s*Image Width:\s*(.*?)\s*$/) {
				$iw = $1;
				$iw = $1;
		        @iw = split(/(?<=\d)\D/,$iw);
		        $iw[0] =~ s/\D//g;
		        $iw[1] =~ s/\D//g;
		        next;
			}
		} ## Closing else
  		# add the current line to the array @line
  		push @line, $_ unless /TIFF Directory at offset 0x[0-9a-fA-F]+/ || /TYPE:/; #/TIFF Directory at offset 0x[0-9a-fA-F]+/; #  unless ($_ eq s/\n+/\n/g)
  		
	} ## closing while loop
# clean up after we're done
close (OUT1) or die "Can't close out1: $!"; 
close (OUT2) or die "Can't close out2: $!"; 
close (IN) or die "Can't close $infile: $!";

Open in new window

FishMonger:

Let me know when you have some time to go through that code you wrote. I'll look up some documentation, however, I'm just fed up with editing my source here and there.  If it takes rewriting this from scratch in a more efficient way, then so-be-it.  Just let me know when you have some time.

Thanks,
Drew
Also, FishMonger, I'm not sure what you meant by "record mode", the only thing close to this I found was Perl's "slurping" mode, is this what you meant?

Regards,
Drew
FishMonger,

I think I'm somewhat starting to understand. So, "slurping" this data the way you are doing this, allows me to basically customize any output based on the input; so I have complete control over my output.  This is nice, but actually understanding this is another thing.  I'll await your reply.
Where did the stuff go that we did before where we split that line into three parts, and I showed something like print "$1$2\n" to one file and $line[0]="$2$3" to the other?  That is what was supposed to remove the path ($1) and print the rest, while also printing just the file spec ($1$2) somewhere else.  You have some relic of that where you set $fn and $fpath, but it was dividing it into 3 parts.  I think that should go where you have Splitpath commented out (which probably didn't work because there was more than a filename on that line).
Thats old code that worked, but didn't provide good output and I coded something that worked better.  Sometimes I leave old code in, the I commit (so that I have a backup), and then I remove the comments, etc, and push to a stable branch.  So, that extra commented out stuff is just old code.  I kind of want to go with FishMonger's idea, but I'm not sure if anyone really knows what I'm wanting to do.  Maybe both of you can look at my original post that was started before I even started writing this code, that someone posted on the Unix.com forums.  I didn't know where to start, so they provided starting code for me:

http://www.unix.com/shell-programming-scripting/132365-perl-text-manipulation.html#post302404807

Let me know if we can come up with some sort of solution on this, that would do what that post states, but in a more elegant, efficient way of doing so.

Regards,
Drew
What about all those elsif blocks (lines 42 thru 92)?  The sample data you posted doesn't contain any data that will match any of those conditionals.  Is there additional data that you haven't shown, or are those elsif blocks no longer needed?
I see why I was not getting the results you wanted.  I pulled the source data from your related question here on EE, but that was a preprocessed data set and your real source data was posted the other forum.

This may need a little more tweaking, but give it a try.
#!/usr/bin/perl

use strict;
use warnings;
use File::Spec;

my %config = (
    'LZW' => 'color',
    'Lempel-Ziv & Welch encoding'       => 'color',
    'CCITT Group 4'                     => 'bkwht',
    'CCITT Group 4 facsimile encoding'  => 'bkwht',
    'none'                              => 'none',
    'RGB color'                         => 'color',
    'min-is-white'                      => 'bkwht',
    'min-is-black'                      => 'bkwht',
    'palette color (RGB from colormap)' => 'color',
    'Resolution'                        => sub {
                                            my @r = split(/, /, shift);
                                            $r[0] =~ s/\D//g;
                                            $r[1] =~ s/\D//g;
                                            return @r[0,1];
    },
);

my @config = keys %config;

my $file = 'data.txt'; # set this as needed.

open my $fh, '<', $file or die "can't open <$file> for reading $!";

$/ = "TYPE:\n";
while ( my $record = <$fh> ) {
    chomp $record;
    next if $record eq '';
    $record =~ s/(TIFF Directory at offset .+)\n//;


    my ($fullpath, $data) = split(/\n/, $record, 2);
    $fullpath =~ s/:$//;

    my ($drv, $path, $file) = File::Spec->splitpath($fullpath);

    my $cs = $config{$1} if ($data =~ s/\s+Compression Scheme:\s+(.*?)\n//);
    my $pi = $config{$1} if ($data =~ s/\s+Photometric Interpretation:\s+(.*?)\n//);
    my @r = $config{'Resolution'}->($1) if ($data =~ s/\s+Resolution:\s+(.*?)\n//);
    my $w = $1 if ($data =~ s/\s+Image Width:\s+(\d+)\s*\n//);
    my $l = $1 if ($data =~ s/\s+Image Length:\s+(\S+)\s*//);

    my $filesize = (-s $fullpath) / (1024 * 1024);
    my $size_in_mb = sprintf "%.2f", $filesize;

    print $/, join(':', $file, $cs, @r, $pi, $w, $l), ":\n", $data, $size_in_mb;

}

Open in new window

I'm getting some errors.  See code snippet.
C:\Users\Drew\Desktop\Dad>compileTiffInfoNew.pl
Use of uninitialized value $w in join or string at C:\Users\Drew\Desktop\Dad\compileTiffInfoNew.pl line 52, <$fh> chunk 2.
TYPE:
filename.TIF:bkwht:200:200:bkwht::2376:
  Image Width: 1728Bits/Sample: 1  FillOrder: lsb-to-msb
  Document Name: "Standard Input"
  Image Description: "converted PBM file"
  Orientation: row 0 top, col 0 lhs
  Samples/Pixel: 1
  Rows/Strip: 2376
  Planar Configuration: single image plane
0.07 MB
Use of uninitialized value $w in join or string at C:\Users\Drew\Desktop\Dad\compileTiffInfoNew.pl line 52, <$fh> chunk 3.
TYPE:
filename2.TIF:bkwht:200:200:bkwht::2376:
  Image Width: 1728Bits/Sample: 1  FillOrder: lsb-to-msb
  Document Name: "Standard Input"
  Image Description: "converted PBM file"
  Orientation: row 0 top, col 0 lhs
  Samples/Pixel: 1
  Rows/Strip: 2376
  Planar Configuration: single image plane
0.02 MB
Use of uninitialized value $cs in join or string at C:\Users\Drew\Desktop\Dad\compileTiffInfoNew.pl line 52, <$fh> chunk 4.
Use of uninitialized value $w in join or string at C:\Users\Drew\Desktop\Dad\compileTiffInfoNew.pl line 52, <$fh> chunk 4.
TYPE:
filename3.TIF::31:31:color::124:
  Subfile Type: (0 = 0x0)
  Image Width: 124Bits/Sample: 8  Software: "+"
  Samples/Pixel: 3
  Rows/Strip: 55
  Planar Configuration: single image plane
0.04 MB
Use of uninitialized value $w in join or string at C:\Users\Drew\Desktop\Dad\compileTiffInfoNew.pl line 52, <$fh> chunk 5.
TYPE:
filename4.TIFF:color:300:300:color::1001:
  Subfile Type: (0 = 0x0)
  Image Width: 1419Bits/Sample: 8  Samples/Pixel: 3
  Rows/Strip: 1
  Planar Configuration: single image plane
  Photoshop Data: <present>, 410 bytes
  Predictor: horizontal differencing 2 (0x2)
2.52 MB

Open in new window

ASKER CERTIFIED SOLUTION
Avatar of FishMonger
FishMonger
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Based on the sample data you posted on the other site, here are the results that I get.

C:\TEMP>ee.pl
TYPE:
filename.tif:bkwht:200:200:bkwht:1728:2376:
  Bits/Sample: 1
  FillOrder: lsb-to-msb
  Document Name: "Standard Input"
  Image Description: "converted PBM file"
  Orientation: row 0 top, col 0 lhs
  Samples/Pixel: 1
  Rows/Strip: 2376
  Planar Configuration: single image plane
TYPE:
filename2.tif:bkwht:200:200:bkwht:1728:2376:
  Bits/Sample: 1
  FillOrder: lsb-to-msb
  Document Name: "Standard Input"
  Image Description: "converted PBM file"
  Orientation: row 0 top, col 0 lhs
  Samples/Pixel: 1
  Rows/Strip: 2376
  Planar Configuration: single image plane
TYPE:
filename3.tif:none:31:31:color:124:124:
  Subfile Type: (0 = 0x0)
  Bits/Sample: 8
  Software: "+"
  Samples/Pixel: 3
  Rows/Strip: 55
  Planar Configuration: single image plane
TYPE:
filename4.tif:color:300:300:color:1419:1001:
  Subfile Type: (0 = 0x0)
  Bits/Sample: 8
  Samples/Pixel: 3
  Rows/Strip: 1
  Planar Configuration: single image plane
  Photoshop Data: <present>, 410 bytes
  Predictor: horizontal differencing 2 (0x2)

Open in new window

That was sample data.  Here's real data from images, sorry:

TYPE:
C:\Users\Drew\Desktop\Dad\tiffs\filename.TIF:
TIFF Directory at offset 0x10eb4
  Image Width: 1728 Image Length: 2376
  Resolution: 200, 200 pixels/inch
  Bits/Sample: 1
  Compression Scheme: CCITT Group 4
  Photometric Interpretation: min-is-white
  FillOrder: lsb-to-msb
  Document Name: "Standard Input"
  Image Description: "converted PBM file"
  Orientation: row 0 top, col 0 lhs
  Samples/Pixel: 1
  Rows/Strip: 2376
  Planar Configuration: single image plane
TYPE:
C:\Users\Drew\Desktop\Dad\tiffs\filename2.TIF:
TIFF Directory at offset 0x4aac
  Image Width: 1728 Image Length: 2376
  Resolution: 200, 200 pixels/inch
  Bits/Sample: 1
  Compression Scheme: CCITT Group 4
  Photometric Interpretation: min-is-white
  FillOrder: lsb-to-msb
  Document Name: "Standard Input"
  Image Description: "converted PBM file"
  Orientation: row 0 top, col 0 lhs
  Samples/Pixel: 1
  Rows/Strip: 2376
  Planar Configuration: single image plane
TYPE:
C:\Users\Drew\Desktop\Dad\tiffs\filename3.TIF:
TIFF Directory at offset 0x8
  Subfile Type: (0 = 0x0)
  Image Width: 124 Image Length: 124
  Resolution: 31, 31 pixels/inch
  Bits/Sample: 8
  Compression Scheme: None
  Photometric Interpretation: RGB color
  Software: "¼"
  Samples/Pixel: 3
  Rows/Strip: 55
  Planar Configuration: single image plane
TYPE:
C:\Users\Drew\Desktop\Dad\tiffs\filename4.TIFF:
TIFF Directory at offset 0x8
  Subfile Type: (0 = 0x0)
  Image Width: 1419 Image Length: 1001
  Resolution: 300, 300 pixels/inch
  Bits/Sample: 8
  Compression Scheme: LZW
  Photometric Interpretation: RGB color
  Samples/Pixel: 3
  Rows/Strip: 1
  Planar Configuration: single image plane
  Photoshop Data: <present>, 410 bytes
  Predictor: horizontal differencing 2 (0x2)
This still needs a slight tweak to fix the indentation on the "Bits/Sample" line, but other than that, it appears to do what you need.

#!/usr/bin/perl

use strict;
use warnings;
use File::Spec;

my %config = (
    'LZW'                               => 'color',
    'Lempel-Ziv & Welch encoding'       => 'color',
    'CCITT Group 4'                     => 'bkwht',
    'CCITT Group 4 facsimile encoding'  => 'bkwht',
    'None'                              => 'none',
    'RGB color'                         => 'color',
    'min-is-white'                      => 'bkwht',
    'min-is-black'                      => 'bkwht',
    'palette color (RGB from colormap)' => 'color',
    'Resolution'                        => sub {
                                            my @r = split(/, /, shift);
                                            $r[0] =~ s/\D//g;
                                            $r[1] =~ s/\D//g;
                                            return @r[0,1];
    },
);

my @config = keys %config;

my $file = 'data.txt'; # set this as needed.

open my $fh, '<', $file or die "can't open <$file> for reading $!";

$/ = "TYPE:\n";
while ( my $record = <$fh> ) {
    chomp $record;
    next if $record eq '';
    $record =~ s/(TIFF Directory at offset .+)\n//;


    my ($fullpath, $data) = split(/\n/, $record, 2);
    $fullpath =~ s/:$//;

    my ($drv, $path, $file) = File::Spec->splitpath($fullpath);

    my $cs = $config{$1} if ($data =~ s/\s{2}Compression Scheme:\s+(.*?)\n//);
    my $pi = $config{$1} if ($data =~ s/\s{2}Photometric Interpretation:\s+(.*?)\n//);
    my @r = $config{'Resolution'}->($1) if ($data =~ s/Resolution:\s+(.*?)\n//);
    my ($w, $l) = ($1, $2) if ($data =~ s/Image Width: (\d+) Image Length: (\d+)\n//);

#    my $filesize = (-s $fullpath) / (1024 * 1024);
#    my $size_in_mb = sprintf "%.2f", $filesize;

    print $/, join(':', $file, $cs, @r, $pi, $w, $l), ":\n", $data; #, $size_in_mb;

}

Open in new window

These are the needed corrections.

    my @r = $config{'Resolution'}->($1) if ($data =~ s/\s{2}Resolution:\s+(.*?)\n//);
    my ($w, $l) = ($1, $2) if ($data =~ s/\s{2}Image Width: (\d+) Image Length: (\d+)\n//);


Note: using single letter vars really should be done and I should have fixed those as well, but didn't.
This did fix the indents perfectly, however, I'm still getting those   uninitialized value errors:

C:\Users\Drew\Desktop\Dad>compileTiffInfoNew.pl
Use of uninitialized value $w in join or string at C:\Users\Drew\Desktop\Dad\compileTiffInfoNew.pl line 51, <$fh> chunk 2.
Use of uninitialized value $l in join or string at C:\Users\Drew\Desktop\Dad\compileTiffInfoNew.pl line 51, <$fh> chunk 2.
TYPE:
filename.tif:bkwht:200:200:bkwht:::
  Image Width: 1728
  Image Length: 2376
  Bits/Sample: 1
  FillOrder: lsb-to-msb
  Document Name: "Standard Input"
  Image Description: "converted PBM file"
  Orientation: row 0 top, col 0 lhs
  Samples/Pixel: 1
  Rows/Strip: 2376
  Planar Configuration: single image plane
Use of uninitialized value $w in join or string at C:\Users\Drew\Desktop\Dad\compileTiffInfoNew.pl line 51, <$fh> chunk 3.
Use of uninitialized value $l in join or string at C:\Users\Drew\Desktop\Dad\compileTiffInfoNew.pl line 51, <$fh> chunk 3.
TYPE:
filename2.tif:bkwht:200:200:bkwht:::
  Image Width: 1728
  Image Length: 2376
  Bits/Sample: 1
  FillOrder: lsb-to-msb
  Document Name: "Standard Input"
  Image Description: "converted PBM file"
  Orientation: row 0 top, col 0 lhs
  Samples/Pixel: 1
  Rows/Strip: 2376
  Planar Configuration: single image plane
Use of uninitialized value $w in join or string at C:\Users\Drew\Desktop\Dad\compileTiffInfoNew.pl line 51, <$fh> chunk 4.
Use of uninitialized value $l in join or string at C:\Users\Drew\Desktop\Dad\compileTiffInfoNew.pl line 51, <$fh> chunk 4.
TYPE:
filename3.tif:none:31:31:color:::
  Subfile Type: (0 = 0x0)
  Image Width: 124
  Image Length: 124
  Bits/Sample: 8
  Software: "+"
  Samples/Pixel: 3
  Rows/Strip: 55
  Planar Configuration: single image plane
Use of uninitialized value $w in join or string at C:\Users\Drew\Desktop\Dad\compileTiffInfoNew.pl line 51, <$fh> chunk 5.
Use of uninitialized value $l in join or string at C:\Users\Drew\Desktop\Dad\compileTiffInfoNew.pl line 51, <$fh> chunk 5.
TYPE:
filename4.tif:color:300:300:color:::
  Subfile Type: (0 = 0x0)
  Image Width: 1419
  Image Length: 1001
  Bits/Sample: 8
  Samples/Pixel: 3
  Rows/Strip: 1
  Planar Configuration: single image plane
  Photoshop Data: <present>, 410 bytes
  Predictor: horizontal differencing 2 (0x2)

Attached is my data.txt file.
TYPE:
filename.tif:
TIFF Directory at offset 0x10eb4
  Image Width: 1728 
  Image Length: 2376
  Resolution: 200, 200 pixels/inch
  Bits/Sample: 1
  Compression Scheme: CCITT Group 4
  Photometric Interpretation: min-is-white
  FillOrder: lsb-to-msb
  Document Name: "Standard Input"
  Image Description: "converted PBM file"
  Orientation: row 0 top, col 0 lhs
  Samples/Pixel: 1
  Rows/Strip: 2376
  Planar Configuration: single image plane
TYPE:
filename2.tif:
TIFF Directory at offset 0x4aac
  Image Width: 1728 
  Image Length: 2376
  Resolution: 200, 200 pixels/inch
  Bits/Sample: 1
  Compression Scheme: CCITT Group 4
  Photometric Interpretation: min-is-white
  FillOrder: lsb-to-msb
  Document Name: "Standard Input"
  Image Description: "converted PBM file"
  Orientation: row 0 top, col 0 lhs
  Samples/Pixel: 1
  Rows/Strip: 2376
  Planar Configuration: single image plane
TYPE:
filename3.tif:
TIFF Directory at offset 0x8
  Subfile Type: (0 = 0x0)
  Image Width: 124 
  Image Length: 124
  Resolution: 31, 31 pixels/inch
  Bits/Sample: 8
  Compression Scheme: None
  Photometric Interpretation: RGB color
  Software: "¼"
  Samples/Pixel: 3
  Rows/Strip: 55
  Planar Configuration: single image plane
TYPE:
filename4.tif:
TIFF Directory at offset 0x8
  Subfile Type: (0 = 0x0)
  Image Width: 1419 
  Image Length: 1001
  Resolution: 300, 300 pixels/inch
  Bits/Sample: 8
  Compression Scheme: LZW
  Photometric Interpretation: RGB color
  Samples/Pixel: 3
  Rows/Strip: 1
  Planar Configuration: single image plane
  Photoshop Data: <present>, 410 bytes
  Predictor: horizontal differencing 2 (0x2)

Open in new window

The format of your sample data keeps changing.

Is the "Image Width" and "Image Length" on separate lines as shown in this last post or on the same line as is in the prior post?

Is the actual data in a consistent format.  Meaning, do all records have the same number of "rows" and those rows are in the same order?
Oh man, I messed up, heh.  It works.  To answer your question though, the records will be different, depending on the image that is being read, but the way this is set up, should work perfectly.

Have you been able to look into getting the filesize to work? I basically, just want it at the bottom of each record, indented out, so it's uniform with output.  Let me know.

This works in my test.
    my $filesize = (-s $fullpath) / (1024 * 1024);
    my $size_in_mb = sprintf "%.2f\n", $filesize;

    print $/, join(':', $file, $cs, @r, $pi, $w, $l, "\n"), $data, $size_in_mb;

Open in new window

The last record is on the same line as the last entry:

C:\Users\Drew\Desktop\Dad>compileTiffInfoNew.pl
TYPE:
filename.TIF:bkwht:200:200:bkwht:1728:2376:
  Bits/Sample: 1
  FillOrder: lsb-to-msb
  Document Name: "Standard Input"
  Image Description: "converted PBM file"
  Orientation: row 0 top, col 0 lhs
  Samples/Pixel: 1
  Rows/Strip: 2376
  Planar Configuration: single image plane
0.07
TYPE:
filename2.TIF:bkwht:200:200:bkwht:1728:2376:
  Bits/Sample: 1
  FillOrder: lsb-to-msb
  Document Name: "Standard Input"
  Image Description: "converted PBM file"
  Orientation: row 0 top, col 0 lhs
  Samples/Pixel: 1
  Rows/Strip: 2376
  Planar Configuration: single image plane
0.02
TYPE:
filename3.TIF:none:31:31:color:124:124:
  Subfile Type: (0 = 0x0)
  Bits/Sample: 8
  Software: "+"
  Samples/Pixel: 3
  Rows/Strip: 55
  Planar Configuration: single image plane
0.04
TYPE:
filename4.TIFF:color:300:300:color:1419:1001:
  Subfile Type: (0 = 0x0)
  Bits/Sample: 8
  Samples/Pixel: 3
  Rows/Strip: 1
  Planar Configuration: single image plane
  Photoshop Data: <present>, 410 bytes
  Predictor: horizontal differencing 2 (0x2)2.52       <------------ same line :(
That means that your file doesn't end with a line terminator, like it normally should.

Make this change.

    $data =~ s/\n$//;
    
    my $filesize = (-s $fullpath) / (1024 * 1024);
    my $size_in_mb = sprintf "\n%.2f\n", $filesize;
    
    print $/, join(':', $file, $cs, @r, $pi, $w, $l, "\n"), $data, $size_in_mb;

Open in new window

Sweet! That works.  I'm going to continue to work on the other section of this code.  This was the "main" program, however there is a little user input required, but I kept that from the code at the time.  For all I know, it works great, it has been so far.  I'm going to make sure that the new stuff you wrote can be utilized, and if everything checks out we can go ahead and close out this thread :)  I'll let you know if I run into any snags.  You are one heck of a perl programming; let me know if you have any good books/references you recommend.

I'll let you know about my script shortly.
Yup!  This code is SOO much easier to maintain and feed into output files.  Attached is my final version, and it works great!  Thanks so much for your help on this :)


#!/usr/bin/perl

use strict;
use warnings;
use File::Spec;

# Vars
my $dir_root;
my $state;
my $status;
my @files;
my $tifs;
my $executebat;
my $infile;

system 'cls';
print "Directory Root (no trailing slash!): ";
chomp($dir_root = <STDIN>);
print "State: ";
chomp($state = uc(<STDIN>));
print "Status [nr][hs][tye]: ";
chomp($status = lc(<STDIN>));

# Define the output file, based on user input
my $batOutput = "\!".$state.$status."INFOraw.txt";

open (BATFILE, "> \!".$state.$status."INFOraw.bat");
print BATFILE "\@echo off\n";
close (BATFILE);

open (BATFILE, ">> \!".$state.$status."INFOraw.bat");
print BATFILE "type nul > $batOutput\n";
close (BATFILE);

# Get a list of tif files from dir_root
# No trailing slash is allowed
opendir(DIR, $dir_root);
@files = grep(/\.ti[f]{1,2}$/i,readdir(DIR));
closedir(DIR);

# Check to see if array has data
if (@files) {
	foreach $tifs (@files) {
		open (BATFILE, ">> \!".$state.$status."INFOraw.bat");
		print BATFILE "tiffinfo TYPE $dir_root"."\\".$tifs." \>> ".$batOutput."\n";
		close (BATFILE);
	}
}
# if array is null (no data), then no tif files were found
else {
	print "No Tiff files were found.";
	exit;
}

# Run bat script
print "Attempting to execute .bat script now...\n";
$executebat = system 'call !'.$state.$status.'INFOraw.bat > NUL 2>&1';
if ( $executebat != 0 ) { 
        die "Failed executing .bat script. \n"; 
}
else { print "Ran .bat script successfully.\n"; }

$infile = $batOutput;

# open the file
open(OUT1,"> \!".$state.$status."INFO.txt") or die "Can't open \!".$state.$status."INFO.txt: $!"; 
open(OUT2,"> \!".$state.$status."INFOspdsht.txt") or die "Can't open \!".$state.$status."INFO.txt $!";

my %config = (
    'LZW'                               => 'color',
    'Lempel-Ziv & Welch encoding'       => 'color',
    'CCITT Group 4'                     => 'bkwht',
    'CCITT Group 4 facsimile encoding'  => 'bkwht',
    'None'                              => 'none',
    'RGB color'                         => 'color',
    'min-is-white'                      => 'bkwht',
    'min-is-black'                      => 'bkwht',
    'palette color (RGB from colormap)' => 'color',
    'Resolution'                        => sub {
                                            my @r = split(/, /, shift);
                                            $r[0] =~ s/\D//g;
                                            $r[1] =~ s/\D//g;
                                            return @r[0,1];
    },
);

my @config = keys %config;

my $file = $infile; # set this as needed.

open my $fh, '<', $file or die "can't open <$file> for reading $!";

$/ = "TYPE:\n";
while ( my $record = <$fh> ) {
    chomp $record;
    next if $record eq '';
    $record =~ s/(TIFF Directory at offset .+)\n//;


    my ($fullpath, $data) = split(/\n/, $record, 2);
    $fullpath =~ s/:$//;

    my ($drv, $path, $file) = File::Spec->splitpath($fullpath);

    my $cs = $config{$1} if ($data =~ s/\s{2}Compression Scheme:\s+(.*?)\n//);
    my $pi = $config{$1} if ($data =~ s/\s{2}Photometric Interpretation:\s+(.*?)\n//);
    my @r = $config{'Resolution'}->($1) if ($data =~ s/\s{2}Resolution:\s+(.*?)\n//);
    my ($w, $l) = ($1, $2) if ($data =~ s/\s{2}Image Width: (\d+) Image Length: (\d+)\n//);

    $data =~ s/\n$//;
    
    my $filesize = (-s $fullpath) / (1024 * 1024);
    my $size_in_mb = sprintf "\n%.2f MB\n", $filesize;
    
    print OUT1 $/, join(':', $file, $cs, @r, $pi, $w, $l, "\n"), $data, $size_in_mb;
    print OUT2 join(':', $file, $cs, @r, $pi, $w, $l, "\n");
}
close (OUT1) or die "Can't close out1: $!"; 
close (OUT2) or die "Can't close out2: $!"; 
close ($fh) or die "Can't close $fh: $!";

Open in new window

Oh, one last thing!  Here's my output now:

TYPE:
filename4.TIFF:color:300:300:color:1419:1001:
  Subfile Type: (0 = 0x0)
  Bits/Sample: 8
  Samples/Pixel: 3
  Rows/Strip: 1
  Planar Configuration: single image plane
  Photoshop Data: <present>, 410 bytes
  Predictor: horizontal differencing 2 (0x2)
Size: 2.52 MB

Is there a way to place the value of "Size" (2.52 - numbers only, not MB) on that top line so that its at the end of it, like so:...

TYPE:
filename4.TIFF:color:300:300:color:1419:1001:2.52:
  Subfile Type: (0 = 0x0)
  Bits/Sample: 8
  Samples/Pixel: 3
  Rows/Strip: 1
  Planar Configuration: single image plane
  Photoshop Data: <present>, 410 bytes
  Predictor: horizontal differencing 2 (0x2)

    my $size_in_mb = sprintf "%.2f", $filesize;
    
    print $/, join(':', $file, $cs, @r, $pi, $w, $l, $size_in_mb, "\n"), $data;

Open in new window

Correction:
print $/, join(':', $file, $cs, @r, $pi, $w, $l, $size_in_mb, "\n"), $data, "\n";

Open in new window

Awesome.  That worked.  Before you go, can you actually explain this information:

my $cs = $config{$1} if ($data =~ s/\s{2}Compression Scheme:\s+(.*?)\n//);
    my $pi = $config{$1} if ($data =~ s/\s{2}Photometric Interpretation:\s+(.*?)\n//);
    my @r = $config{'Resolution'}->($1) if ($data =~ s/\s{2}Resolution:\s+(.*?)\n//);
    my ($w, $l) = ($1, $2) if ($data =~ s/\s{2}Image Width: (\d+) Image Length: (\d+)\n//);

And how I can create additional ones that would update that one line, just in case I have to make additional checks on things.  I fought for like 2 hours with a regex, and i kept recieving that uninitialized variable issue.  If you can shed some light on this, I'd better understand how the script works.
If the regex match is successful, it captures/saves a portion of the match in $1 and deletes the entire match from the string.  Then, $1 is used as the key for a hash lookup and assigns the value returned to the var.  Each of those lines replace the messy/verbose elsif blocks that you were previously using.

Here's a detailed explanation of the first regex.

The regular expression:

(?-imsx:s/\s{2}Compression Scheme:\s+(.*?)\n//)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  s/                       's/'
----------------------------------------------------------------------
  \s{2}                    whitespace (\n, \r, \t, \f, and " ") (2
                           times)
----------------------------------------------------------------------
  Compression Scheme:      'Compression Scheme:'
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  \n                       '\n' (newline)
----------------------------------------------------------------------
  //                       '//'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

Open in new window

See:
perldoc perlretut - http://perldoc.perl.org/perlretut.html

Mastering Regular Expressions - http://oreilly.com/catalog/9780596528126
Do you use a regex testing that would make writing RegEx any easier?

This output is killing me...

So your stuff worked, but it's causing my output to be one one line, see:

TYPE:
filename.TIF:bkwht:200:200:bkwht:1728:2376:0.07:
  Bits/Sample: 1
  FillOrder: lsb-to-msb
  Document Name: "Standard Input"
  Image Description: "converted PBM file"
  Orientation: row 0 top, col 0 lhs
  Samples/Pixel: 1
  Rows/Strip: 2376
  Planar Configuration: single image plane0.07TYPE:
filename2.TIF:bkwht:200:200:bkwht:1728:2376:0.02:
  Bits/Sample: 1
  FillOrder: lsb-to-msb
  Document Name: "Standard Input"
  Image Description: "converted PBM file"
  Orientation: row 0 top, col 0 lhs
  Samples/Pixel: 1
  Rows/Strip: 2376
  Planar Configuration: single image plane0.02TYPE:
filename3.TIF:none:31:31:color:124:124:0.04:
  Subfile Type: (0 = 0x0)
  Bits/Sample: 8
  Software: "¼"
  Samples/Pixel: 3
  Rows/Strip: 55
  Planar Configuration: single image plane0.04TYPE:
filename4.TIFF:color:300:300:color:1419:1001:2.52:
  Subfile Type: (0 = 0x0)
  Bits/Sample: 8
  Samples/Pixel: 3
  Rows/Strip: 1
  Planar Configuration: single image plane
  Photoshop Data: <present>, 410 bytes
  Predictor: horizontal differencing 2 (0x2)2.52

IT should be:

TYPE:
filename.TIF:bkwht:200:200:bkwht:1728:2376:0.07:
  Bits/Sample: 1
  FillOrder: lsb-to-msb
  Document Name: "Standard Input"
  Image Description: "converted PBM file"
  Orientation: row 0 top, col 0 lhs
  Samples/Pixel: 1
  Rows/Strip: 2376
  Planar Configuration: single image plane
0.07
TYPE:
filename2.TIF:bkwht:200:200:bkwht:1728:2376:0.02:
  Bits/Sample: 1
  FillOrder: lsb-to-msb
  Document Name: "Standard Input"
  Image Description: "converted PBM file"
  Orientation: row 0 top, col 0 lhs
  Samples/Pixel: 1
  Rows/Strip: 2376
  Planar Configuration: single image plane
0.02
TYPE:
filename3.TIF:none:31:31:color:124:124:0.04:
  Subfile Type: (0 = 0x0)
  Bits/Sample: 8
  Software: "¼"
  Samples/Pixel: 3
  Rows/Strip: 55
  Planar Configuration: single image plane0.04
TYPE:
filename4.TIFF:color:300:300:color:1419:1001:2.52:
  Subfile Type: (0 = 0x0)
  Bits/Sample: 8
  Samples/Pixel: 3
  Rows/Strip: 1
  Planar Configuration: single image plane
  Photoshop Data: <present>, 410 bytes
  Predictor: horizontal differencing 2 (0x2)
2.52

I hate bothering you over these little issues.  This is the LAST thing that needs to be fixed, and it should work fully ...
WOO HOO!

finally fixed something on my own :) don't need help with that :) - it was an issue with how I had the output set :)

Again though, do you use any sort of regex program or utility (paid or free) that you use to do any perl debugging, regexing, etc?

Any suggestions are appriciated.
>> Again though, do you use any sort of regex program or utility (paid or free) that you use to do any perl debugging, regexing, etc?

Perl has a built-in debugger ( the -d switch ) which I use on occasion, but for the most part, I always use the warnings and strict pragmas, which point out lots of errors and I use Data::Dumper to output vars to verify that they hold what I expect.

It looks to me that the original question/problem in this thread has been fully answered, so don't you think it's time to close this tread?
I'd suggest you give them their points, and start another thread on "explain this program".  Then I should be around tomorrow (Friday) most of the day CDT and I'll take a look at it myself.

At the risk of starting a flamewar, if you're teaching yourself programming and regular expressions I'd stay away from Perl unless you have a good reason for learning that language.  In my opinion it's just too ugly and hard-to-read.  I would recommend Python as my first choice of a good language to learn.  If on Windows, JScript under Windows Scripting Host might be an interesting alternative; syntactically slightly more like Perl and can do Windows-specific things like COM and sending keystrokes at other windows.
Superdave,

First off, I'm actually enjoying learning perl and I've been around of alot of other languages.  In fact, I hate Python.  Love it when someone writes a python program FOR me, but I swear, you have to write double the amount of Python code to get it to do the same thing in Perl.  I really like perl and find it easy to read and learn.  The problem was that I dove straight into a language I've never attempted to learn, and create a quite extensive program that is going a long way.  I'm up to over 200 lines of code with the suggestion in this thread.

Thanks for the recommendation though.

P.S. - I'm a Unix guy - WSH/COM - YUCK! :D