drewrockshard
asked on
Perl Regex
I have a filehandle that reads in multiple lines of "blocks" of text. Each block is separated with each iteration of the work TYPE ... or the end of file (eof).
There's a piece of text, however, that looks like the following:
TIFF Directory at offset 0x10eb4
I need it to match any form of this. The hex output will change at any given time, so I need it to match anything in the sort of:
TIFF Directory at offset * (where * = the hex output)
I use this in a perl script (if statement), not as a one liner, so any help is appriciated.
There's a piece of text, however, that looks like the following:
TIFF Directory at offset 0x10eb4
I need it to match any form of this. The hex output will change at any given time, so I need it to match anything in the sort of:
TIFF Directory at offset * (where * = the hex output)
I use this in a perl script (if statement), not as a one liner, so any help is appriciated.
ASKER
Thanks, but this isn't exactly what I was looking for; almost though. It matches (which is good :)), but how do I then take this information and then "remove" it from the file? How do I match this with input and then actually tell it to delete this in the output.
All my output values are setup, I just need to know how to remove this "match" from it outputting the offset information of the regex.
Thanks.
All my output values are setup, I just need to know how to remove this "match" from it outputting the offset information of the regex.
Thanks.
Is this what you need?
#!/usr/bin/perl
use strict;
use warnings;
my $str = 'TYPE:
C:\Users\Drew\Desktop\Dad\tiffs\filename2.TIF:bkwht:200:200:bkwht:1728:2376:
TIFF Directory at offset 0x4aac
Bits/Sample: 1
FillOrder: lsb-to-msb
Document Name: "Standard Input"
Image Description: "converted PBM file"
Orientation: row 0 top, col 0 lhs
Samples/Pixel: 1
Rows/Strip: 2376
Planar Configuration: single image plane';
$str =~ s/(TIFF Directory at offset .+)\n//;
#print $1, "\n\n";
print $str;
ASKER
Actually, not exactly. I think you know what I'm needing, just not really knowing how to implement. Let me attach my code to here, so you can see how I have everything laid out. (see code attachement)
# open the file
open (IN, $infile) or die "Can't open $infile: $!";
open(OUT1,"> \!".$state.$status."INFO.txt") or die "Can't open \!".$state.$status."INFO.txt: $!";
open(OUT2,"> \!".$state.$status."INFOspdsht.txt") or die "Can't open \!".$state.$status."INFO.txt $!";
# loop through it
while (<IN>) {
chomp;
# if we've reached the "record-delimiter i.e. TYPE:" or EOF
if (/^TYPE:/ or eof) {
# check if array "@line" is defined; if so, print it after
# appending the additional information after filetype i.e. 2nd element
if (@line) {
$line[1] .= "\n$fn:$cs:$r[0]:$r[1]:$pi:$iw[0]:$iw[1]:";
#my $file = File::Spec->splitpath( $fn );
print OUT2 $line[1],"\n";
foreach (@line) {
#print $_,"\n";
print OUT1 $_,"\n";
}
# Important - reset the array
@line = ();
$filesize = (-s $fpath.$fn) / (1024 * 1024);
$size_in_mb = sprintf "%.2f", $filesize;
print OUT1 $size_in_mb." MB\n";
}
}
else {
# check for filename, compression scheme, resolution, photometric interpretation,
# image width and image length, and set the variables $fn, $cs, $r, $pi, $iw, $il
#if (/^\s*(.+)\.tif:\s*$/i) {
# $fn = "$1.tif";
# #next;
#}
if (/^(..[^:]*?\\)([^:\\]*\.tiff?):/im) {
#print "Path: $1\n";
#print "File: $2\n";
$fn = $2;
$fpath = $1;
next;
}
elsif (/^\s*Compression Scheme:\s*(.*?)$/) {
$cs = $1;
if ($cs eq "LZW") {
$cs = "color";
}
elsif ($cs eq "Lempel-Ziv & Welch encoding") {
$cs = "color";
}
elsif ($cs eq "CCITT Group 4") {
$cs = "bkwht";
}
elsif ($cs eq "CCITT Group 4 facsimile encoding") {
$cs = "bkwht";
}
elsif ($cs eq "none") {
$cs = "none";
}
next;
}
elsif (/^\s*Resolution:\s*(.*?)$/) {
$r = $1;
@r = split(/, /, $r);
$r[0] =~ s/\D//g;
$r[1] =~ s/\D//g;
next;
}
elsif (/^\s*Photometric Interpretation:\s*(.*?)$/) {
$pi = $1;
if ($pi eq "RGB color") {
$pi = "color";
}
elsif ($pi eq "min-is-white") {
$pi = "bkwht";
}
elsif ($pi eq "min-is-black") {
$pi = "bkwht";
}
elsif ($pi eq "palette color (RGB from colormap)") {
$pi = "color";
}
next;
}
elsif (/^\s*Image Width:\s*(.*?)\s*$/) {
$iw = $1;
$iw = $1;
@iw = split(/(?<=\d)\D/,$iw);
$iw[0] =~ s/\D//g;
$iw[1] =~ s/\D//g;
next;
}
} ## Closing else
# add the current line to the array @line
push @line, $_;
} ## closing while loop
# clean up after we're done
close (OUT1) or die "Can't close out1: $!";
close (OUT2) or die "Can't close out2: $!";
close (IN) or die "Can't close $infile: $!";
What I was hinting at is that you should be reading in the file in record mode instead of line mode.
I need to work on a project of my own, but will give a more complete example in a little while, but for now look over this:
I need to work on a project of my own, but will give a more complete example in a little while, but for now look over this:
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;
use Data::Dumper;
$/ = "TYPE:\n";
while (<DATA>) {
chomp;
next if $_ eq '';
s/(TIFF Directory at offset .+)\n//;
my ($line1, $other) = split(/\n/, $_, 2);
print Dumper ($line1, $other);
}
__DATA__
TYPE:
C:\Users\Drew\Desktop\Dad\tiffs\filename.TIF:bkwht:200:200:bkwht:1728:2376:
TIFF Directory at offset 0x10eb4
Bits/Sample: 1
FillOrder: lsb-to-msb
Document Name: "Standard Input"
Image Description: "converted PBM file"
Orientation: row 0 top, col 0 lhs
Samples/Pixel: 1
Rows/Strip: 2376
Planar Configuration: single image plane
TYPE:
C:\Users\Drew\Desktop\Dad\tiffs\filename2.TIF:bkwht:200:200:bkwht:1728:2376:
TIFF Directory at offset 0x4aac
Bits/Sample: 1
FillOrder: lsb-to-msb
Document Name: "Standard Input"
Image Description: "converted PBM file"
Orientation: row 0 top, col 0 lhs
Samples/Pixel: 1
Rows/Strip: 2376
Planar Configuration: single image plane
TYPE:
C:\Users\Drew\Desktop\Dad\tiffs\filename3.TIF:None:31:31:color:124:124:
TIFF Directory at offset 0x8
Subfile Type: (0 = 0x0)
Bits/Sample: 8
Software: "¼"
Samples/Pixel: 3
Rows/Strip: 55
Planar Configuration: single image plane
TYPE:
C:\Users\Drew\Desktop\Dad\tiffs\filename4.TIFF:color:300:300:color:1419:1001:
TIFF Directory at offset 0x8
Subfile Type: (0 = 0x0)
Bits/Sample: 8
Samples/Pixel: 3
Rows/Strip: 1
Planar Configuration: single image plane
Photoshop Data: <present>, 410 bytes
ASKER
Please do, I don't understand this completely, since I'm new to perl, but come from an intermediate PHP background, and advanced Shell programming. Does this mean I'll need to rewrite most of my code that I sent you? Please let me know.
Regards,
Drew
Regards,
Drew
Yes, you will need to rewrite a good portion of that code. But in the end, it also means that the code will be cleaner, more efficient, and easier to read/maintain.
ASKER
Sounds good, let me know; I'm willing to learn.
If you just want to remove the phrase " at offset 0x10eb4" from that line, then:
s/ at offset 0x[0-9a-fA-F]+$//;
should do it. Do it right before the push $_. (It will make the change to $_.)
s/ at offset 0x[0-9a-fA-F]+$//;
should do it. Do it right before the push $_. (It will make the change to $_.)
ASKER
I'm wanting to remove "TIFF Directory at offset" and the hex that follows from muliple instances of the match.
To remove that whole phrase but not the line itself?
s/TIFF Directory at offset 0x[0-9a-fA-F]+//g;
To ignore the whole line in your program, in place of simply push $_:
push $_ unless /TIFF Directory at offset 0x[0-9a-fA-F]+/;
s/TIFF Directory at offset 0x[0-9a-fA-F]+//g;
To ignore the whole line in your program, in place of simply push $_:
push $_ unless /TIFF Directory at offset 0x[0-9a-fA-F]+/;
ASKER
Your workaround would remove it, however, it places a "space", how would you remove the line and its "space", so that it basically doesn't exist?
For example, output now looks like:
TYPE:
filename.TIF:bkwht:200:200 :bkwht:172 8:2376:
Bits/Sample: 1
FillOrder: lsb-to-msb
Document Name: "Standard Input"
Image Description: "converted PBM file"
Orientation: row 0 top, col 0 lhs
Samples/Pixel: 1
Rows/Strip: 2376
Planar Configuration: single image plane
0.07 MB
TYPE:
filename2.TIF:bkwht:200:20 0:bkwht:17 28:2376:
Bits/Sample: 1
FillOrder: lsb-to-msb
Document Name: "Standard Input"
Image Description: "converted PBM file"
Orientation: row 0 top, col 0 lhs
Samples/Pixel: 1
Rows/Strip: 2376
Planar Configuration: single image plane
0.02 MB
Where I want it to be:
TYPE:
filename.TIF:bkwht:200:200 :bkwht:172 8:2376:
Bits/Sample: 1
FillOrder: lsb-to-msb
Document Name: "Standard Input"
Image Description: "converted PBM file"
Orientation: row 0 top, col 0 lhs
Samples/Pixel: 1
Rows/Strip: 2376
Planar Configuration: single image plane
0.07 MB
TYPE:
filename2.TIF:bkwht:200:20 0:bkwht:17 28:2376:
Bits/Sample: 1
FillOrder: lsb-to-msb
Document Name: "Standard Input"
Image Description: "converted PBM file"
Orientation: row 0 top, col 0 lhs
Samples/Pixel: 1
Rows/Strip: 2376
Planar Configuration: single image plane
0.02 MB
For example, output now looks like:
TYPE:
filename.TIF:bkwht:200:200
Bits/Sample: 1
FillOrder: lsb-to-msb
Document Name: "Standard Input"
Image Description: "converted PBM file"
Orientation: row 0 top, col 0 lhs
Samples/Pixel: 1
Rows/Strip: 2376
Planar Configuration: single image plane
0.07 MB
TYPE:
filename2.TIF:bkwht:200:20
Bits/Sample: 1
FillOrder: lsb-to-msb
Document Name: "Standard Input"
Image Description: "converted PBM file"
Orientation: row 0 top, col 0 lhs
Samples/Pixel: 1
Rows/Strip: 2376
Planar Configuration: single image plane
0.02 MB
Where I want it to be:
TYPE:
filename.TIF:bkwht:200:200
Bits/Sample: 1
FillOrder: lsb-to-msb
Document Name: "Standard Input"
Image Description: "converted PBM file"
Orientation: row 0 top, col 0 lhs
Samples/Pixel: 1
Rows/Strip: 2376
Planar Configuration: single image plane
0.07 MB
TYPE:
filename2.TIF:bkwht:200:20
Bits/Sample: 1
FillOrder: lsb-to-msb
Document Name: "Standard Input"
Image Description: "converted PBM file"
Orientation: row 0 top, col 0 lhs
Samples/Pixel: 1
Rows/Strip: 2376
Planar Configuration: single image plane
0.02 MB
Is that with my first version (with s/ )? Do it the second way then where you don't push if it matches. i.e.
push $_ unless /TIFF Directory at offset 0x[0-9a-fA-F]+/;
push $_ unless /TIFF Directory at offset 0x[0-9a-fA-F]+/;
Or if you did both things because I didn't make myself clear, take out the first part. If you made the first change, the second change won't take effect because the unless condition will never match. I meant to show two different ways of doing it because I didn't know if you wanted the blank line or if you wanted the whole line deleted.
ASKER
Well, I've tried both, problem is, it doesn't work 100%. Here's what happens:
RAW OUTPUT (before the regexs):
TYPE:
C:\Users\Drew\Desktop\Dad\ tiffs\file name.TIF:
TIFF Directory at offset 0x10eb4
Image Width: 1728 Image Length: 2376
Resolution: 200, 200 pixels/inch
Bits/Sample: 1
Compression Scheme: CCITT Group 4
Photometric Interpretation: min-is-white
FillOrder: lsb-to-msb
Document Name: "Standard Input"
Image Description: "converted PBM file"
Orientation: row 0 top, col 0 lhs
Samples/Pixel: 1
Rows/Strip: 2376
Planar Configuration: single image plane
MAIN OUTPUT:
TYPE:
Bits/Sample: 1
filename.TIF:bkwht:200:200 :bkwht:172 8:2376:
FillOrder: lsb-to-msb
Document Name: "Standard Input"
Image Description: "converted PBM file"
Orientation: row 0 top, col 0 lhs
Samples/Pixel: 1
Rows/Strip: 2376
Planar Configuration: single image plane
0.07 MB
Its shifting data up, which I'm not wanting that :(. Do you know why its doing this?
RAW OUTPUT (before the regexs):
TYPE:
C:\Users\Drew\Desktop\Dad\
TIFF Directory at offset 0x10eb4
Image Width: 1728 Image Length: 2376
Resolution: 200, 200 pixels/inch
Bits/Sample: 1
Compression Scheme: CCITT Group 4
Photometric Interpretation: min-is-white
FillOrder: lsb-to-msb
Document Name: "Standard Input"
Image Description: "converted PBM file"
Orientation: row 0 top, col 0 lhs
Samples/Pixel: 1
Rows/Strip: 2376
Planar Configuration: single image plane
MAIN OUTPUT:
TYPE:
Bits/Sample: 1
filename.TIF:bkwht:200:200
FillOrder: lsb-to-msb
Document Name: "Standard Input"
Image Description: "converted PBM file"
Orientation: row 0 top, col 0 lhs
Samples/Pixel: 1
Rows/Strip: 2376
Planar Configuration: single image plane
0.07 MB
Its shifting data up, which I'm not wanting that :(. Do you know why its doing this?
No I don't know why. What does your program look like at this point?
ASKER
See code snippet.
# open the file
open (IN, $infile) or die "Can't open $infile: $!";
open(OUT1,"> \!".$state.$status."INFO.txt") or die "Can't open \!".$state.$status."INFO.txt: $!";
open(OUT2,"> \!".$state.$status."INFOspdsht.txt") or die "Can't open \!".$state.$status."INFO.txt $!";
# loop through it
while (<IN>) {
chomp;
# if we've reached the "record-delimiter i.e. TYPE:" or EOF
if (/^TYPE:/ or eof) {
# check if array "@line" is defined; if so, print it after
# appending the additional information after filetype i.e. 2nd element
if (@line) {
$line[1] .= "\n$fn:$cs:$r[0]:$r[1]:$pi:$iw[0]:$iw[1]:";
#my $file = File::Spec->splitpath( $fn );
print OUT2 $line[1],"\n";
foreach (@line) {
#print $_,"\n";
print OUT1 $_,"\n";
}
# Important - reset the array
@line = ();
$filesize = (-s $fpath.$fn) / (1024 * 1024);
$size_in_mb = sprintf "%.2f", $filesize;
print OUT1 $size_in_mb." MB\n";
}
}
else {
# check for filename, compression scheme, resolution, photometric interpretation,
# image width and image length, and set the variables $fn, $cs, $r, $pi, $iw, $il
#if (/^\s*(.+)\.tif:\s*$/i) {
# $fn = "$1.tif";
# #next;
#}
if (/^(..[^:]*?\\)([^:\\]*\.tiff?):/im) {
#print "Path: $1\n";
#print "File: $2\n";
$fn = $2;
$fpath = $1;
next;
}
##elsif (s/TIFF Directory at offset 0x[0-9a-fA-F]+//g) {
## my $oset = $1;
##}
elsif (/^\s*Compression Scheme:\s*(.*?)$/) {
$cs = $1;
if ($cs eq "LZW") {
$cs = "color";
}
elsif ($cs eq "Lempel-Ziv & Welch encoding") {
$cs = "color";
}
elsif ($cs eq "CCITT Group 4") {
$cs = "bkwht";
}
elsif ($cs eq "CCITT Group 4 facsimile encoding") {
$cs = "bkwht";
}
elsif ($cs eq "none") {
$cs = "none";
}
next;
}
elsif (/^\s*Resolution:\s*(.*?)$/) {
$r = $1;
@r = split(/, /, $r);
$r[0] =~ s/\D//g;
$r[1] =~ s/\D//g;
next;
}
elsif (/^\s*Photometric Interpretation:\s*(.*?)$/) {
$pi = $1;
if ($pi eq "RGB color") {
$pi = "color";
}
elsif ($pi eq "min-is-white") {
$pi = "bkwht";
}
elsif ($pi eq "min-is-black") {
$pi = "bkwht";
}
elsif ($pi eq "palette color (RGB from colormap)") {
$pi = "color";
}
next;
}
elsif (/^\s*Image Width:\s*(.*?)\s*$/) {
$iw = $1;
$iw = $1;
@iw = split(/(?<=\d)\D/,$iw);
$iw[0] =~ s/\D//g;
$iw[1] =~ s/\D//g;
next;
}
} ## Closing else
# add the current line to the array @line
push @line, $_ unless /TIFF Directory at offset 0x[0-9a-fA-F]+/; #/TIFF Directory at offset 0x[0-9a-fA-F]+/; # unless ($_ eq s/\n+/\n/g)
} ## closing while loop
# clean up after we're done
close (OUT1) or die "Can't close out1: $!";
close (OUT2) or die "Can't close out2: $!";
close (IN) or die "Can't close $infile: $!";
The dropped lines are where you have "next" as the last thing in processing that condition, so they never get pushed into @line. Is that what you mean?
Your indentation is not right in a couple places, it might help us figure it out if you cleaned that up. e.g. from "#Important - reset the array" and down from there should be exdented. Also where you have a lot of commented-out code it changes the logic of the indenting.
Your indentation is not right in a couple places, it might help us figure it out if you cleaned that up. e.g. from "#Important - reset the array" and down from there should be exdented. Also where you have a lot of commented-out code it changes the logic of the indenting.
ASKER
Nope, thats not what I mean. I know what the "next" parts are doing, and why they "disappear" from the "raw" output. I'm talking about how, when I remove all the spaces from the output, it moves the next section that is below "filename.TIF:bkwht:200:20 0:bkwht:17 28:2376:" and moves it above it. Take a look at the raw output, and compare with the main output. See how "Bits/Sample: 1" is now above "filename.TIF:bkwht:200:20 0:bkwht:17 28:2376:"? If I were to remove that, then the next entry below that would move up, for example, if "Bits/Sample: 1" were removed, then "FillOrder: lsb-to-msb" would go above "filename.TIF:bkwht:200:20 0:bkwht:17 28:2376:". Not sure why. I basically want it to say "TYPE:" and the be followed by the section "filename.TIF:bkwht:200:20 0:bkwht:17 28:2376:", and THEN have all the rest of the output.
RAW OUTPUT:
TYPE:
C:\Users\Drew\Desktop\Dad\ tiffs\file name.TIF:
TIFF Directory at offset 0x10eb4
Image Width: 1728 Image Length: 2376
Resolution: 200, 200 pixels/inch
Bits/Sample: 1
Compression Scheme: CCITT Group 4
Photometric Interpretation: min-is-white
FillOrder: lsb-to-msb
Document Name: "Standard Input"
Image Description: "converted PBM file"
Orientation: row 0 top, col 0 lhs
Samples/Pixel: 1
Rows/Strip: 2376
Planar Configuration: single image plane
MAIN OUTPUT:
TYPE:
Bits/Sample: 1
filename.TIF:bkwht:200:200 :bkwht:172 8:2376:
FillOrder: lsb-to-msb
Document Name: "Standard Input"
Image Description: "converted PBM file"
Orientation: row 0 top, col 0 lhs
Samples/Pixel: 1
Rows/Strip: 2376
Planar Configuration: single image plane
0.07 MB
RAW OUTPUT:
TYPE:
C:\Users\Drew\Desktop\Dad\
TIFF Directory at offset 0x10eb4
Image Width: 1728 Image Length: 2376
Resolution: 200, 200 pixels/inch
Bits/Sample: 1
Compression Scheme: CCITT Group 4
Photometric Interpretation: min-is-white
FillOrder: lsb-to-msb
Document Name: "Standard Input"
Image Description: "converted PBM file"
Orientation: row 0 top, col 0 lhs
Samples/Pixel: 1
Rows/Strip: 2376
Planar Configuration: single image plane
MAIN OUTPUT:
TYPE:
Bits/Sample: 1
filename.TIF:bkwht:200:200
FillOrder: lsb-to-msb
Document Name: "Standard Input"
Image Description: "converted PBM file"
Orientation: row 0 top, col 0 lhs
Samples/Pixel: 1
Rows/Strip: 2376
Planar Configuration: single image plane
0.07 MB
Where are "raw output" and "main output" coming from? Is that OUT1 and OUT2?
I think maybe $line[1] should be $line[0].
ASKER
"raw output" = input file
"main output" = what is outputted to the screen (my final output that I'm left with).
"main output" = what is outputted to the screen (my final output that I'm left with).
ASKER
Changing from $line[0] to $line[1] fixed some issues, but caused another issue. Now above some output that is:
filename.TIF:bkwht:200:200 :bkwht:172 8:2376:
I also have the full filename path :( so it looks like this now:
C:\Users\Drew\Desktop\Dad\ tiffs\file name.TIF:
filename.TIF:bkwht:200:200 :bkwht:172 8:2376:
How can this be removed?
Attached is the current version of the script.
filename.TIF:bkwht:200:200
I also have the full filename path :( so it looks like this now:
C:\Users\Drew\Desktop\Dad\
filename.TIF:bkwht:200:200
How can this be removed?
Attached is the current version of the script.
# open the file
open (IN, $infile) or die "Can't open $infile: $!";
open(OUT1,"> \!".$state.$status."INFO.txt") or die "Can't open \!".$state.$status."INFO.txt: $!";
open(OUT2,"> \!".$state.$status."INFOspdsht.txt") or die "Can't open \!".$state.$status."INFO.txt $!";
# loop through it
while (<IN>) {
chomp;
# if we've reached the "record-delimiter i.e. TYPE:" or EOF
if (/^TYPE:/ or eof) {
# check if array "@line" is defined; if so, print it after
# appending the additional information after filetype i.e. 2nd element
if (@line) {
$line[0] .= "\n$fn:$cs:$r[0]:$r[1]:$pi:$iw[0]:$iw[1]:";
#my $file = File::Spec->splitpath( $fn );
print OUT2 $line[0],"\n";
foreach (@line) {
#print $_,"\n";
print OUT1 $_,"\n";
}
# Important - reset the array
@line = ();
$filesize = (-s $fpath.$fn) / (1024 * 1024);
$size_in_mb = sprintf "%.2f", $filesize;
print OUT1 $size_in_mb." MB\n";
}
}
else {
# check for filename, compression scheme, resolution, photometric interpretation,
# image width and image length, and set the variables $fn, $cs, $r, $pi, $iw, $il
#if (/^\s*(.+)\.tif:\s*$/i) {
# $fn = "$1.tif";
# #next;
#}
if (/^(..[^:]*?\\)([^:\\]*\.tiff?):/im) {
#print "Path: $1\n";
#print "File: $2\n";
$fn = $2;
$fpath = $1;
#next;
}
elsif (/^\s*Compression Scheme:\s*(.*?)$/) {
$cs = $1;
if ($cs eq "LZW") {
$cs = "color";
}
elsif ($cs eq "Lempel-Ziv & Welch encoding") {
$cs = "color";
}
elsif ($cs eq "CCITT Group 4") {
$cs = "bkwht";
}
elsif ($cs eq "CCITT Group 4 facsimile encoding") {
$cs = "bkwht";
}
elsif ($cs eq "none") {
$cs = "none";
}
next;
}
elsif (/^\s*Resolution:\s*(.*?)$/) {
$r = $1;
@r = split(/, /, $r);
$r[0] =~ s/\D//g;
$r[1] =~ s/\D//g;
next;
}
elsif (/^\s*Photometric Interpretation:\s*(.*?)$/) {
$pi = $1;
if ($pi eq "RGB color") {
$pi = "color";
}
elsif ($pi eq "min-is-white") {
$pi = "bkwht";
}
elsif ($pi eq "min-is-black") {
$pi = "bkwht";
}
elsif ($pi eq "palette color (RGB from colormap)") {
$pi = "color";
}
next;
}
elsif (/^\s*Image Width:\s*(.*?)\s*$/) {
$iw = $1;
$iw = $1;
@iw = split(/(?<=\d)\D/,$iw);
$iw[0] =~ s/\D//g;
$iw[1] =~ s/\D//g;
next;
}
} ## Closing else
# add the current line to the array @line
push @line, $_ unless /TIFF Directory at offset 0x[0-9a-fA-F]+/ || /TYPE:/; #/TIFF Directory at offset 0x[0-9a-fA-F]+/; # unless ($_ eq s/\n+/\n/g)
} ## closing while loop
# clean up after we're done
close (OUT1) or die "Can't close out1: $!";
close (OUT2) or die "Can't close out2: $!";
close (IN) or die "Can't close $infile: $!";
ASKER
FishMonger:
Let me know when you have some time to go through that code you wrote. I'll look up some documentation, however, I'm just fed up with editing my source here and there. If it takes rewriting this from scratch in a more efficient way, then so-be-it. Just let me know when you have some time.
Thanks,
Drew
Let me know when you have some time to go through that code you wrote. I'll look up some documentation, however, I'm just fed up with editing my source here and there. If it takes rewriting this from scratch in a more efficient way, then so-be-it. Just let me know when you have some time.
Thanks,
Drew
ASKER
Also, FishMonger, I'm not sure what you meant by "record mode", the only thing close to this I found was Perl's "slurping" mode, is this what you meant?
Regards,
Drew
Regards,
Drew
ASKER
FishMonger,
I think I'm somewhat starting to understand. So, "slurping" this data the way you are doing this, allows me to basically customize any output based on the input; so I have complete control over my output. This is nice, but actually understanding this is another thing. I'll await your reply.
I think I'm somewhat starting to understand. So, "slurping" this data the way you are doing this, allows me to basically customize any output based on the input; so I have complete control over my output. This is nice, but actually understanding this is another thing. I'll await your reply.
Where did the stuff go that we did before where we split that line into three parts, and I showed something like print "$1$2\n" to one file and $line[0]="$2$3" to the other? That is what was supposed to remove the path ($1) and print the rest, while also printing just the file spec ($1$2) somewhere else. You have some relic of that where you set $fn and $fpath, but it was dividing it into 3 parts. I think that should go where you have Splitpath commented out (which probably didn't work because there was more than a filename on that line).
ASKER
Thats old code that worked, but didn't provide good output and I coded something that worked better. Sometimes I leave old code in, the I commit (so that I have a backup), and then I remove the comments, etc, and push to a stable branch. So, that extra commented out stuff is just old code. I kind of want to go with FishMonger's idea, but I'm not sure if anyone really knows what I'm wanting to do. Maybe both of you can look at my original post that was started before I even started writing this code, that someone posted on the Unix.com forums. I didn't know where to start, so they provided starting code for me:
http://www.unix.com/shell-programming-scripting/132365-perl-text-manipulation.html#post302404807
Let me know if we can come up with some sort of solution on this, that would do what that post states, but in a more elegant, efficient way of doing so.
Regards,
Drew
http://www.unix.com/shell-programming-scripting/132365-perl-text-manipulation.html#post302404807
Let me know if we can come up with some sort of solution on this, that would do what that post states, but in a more elegant, efficient way of doing so.
Regards,
Drew
What about all those elsif blocks (lines 42 thru 92)? The sample data you posted doesn't contain any data that will match any of those conditionals. Is there additional data that you haven't shown, or are those elsif blocks no longer needed?
I see why I was not getting the results you wanted. I pulled the source data from your related question here on EE, but that was a preprocessed data set and your real source data was posted the other forum.
This may need a little more tweaking, but give it a try.
This may need a little more tweaking, but give it a try.
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;
my %config = (
'LZW' => 'color',
'Lempel-Ziv & Welch encoding' => 'color',
'CCITT Group 4' => 'bkwht',
'CCITT Group 4 facsimile encoding' => 'bkwht',
'none' => 'none',
'RGB color' => 'color',
'min-is-white' => 'bkwht',
'min-is-black' => 'bkwht',
'palette color (RGB from colormap)' => 'color',
'Resolution' => sub {
my @r = split(/, /, shift);
$r[0] =~ s/\D//g;
$r[1] =~ s/\D//g;
return @r[0,1];
},
);
my @config = keys %config;
my $file = 'data.txt'; # set this as needed.
open my $fh, '<', $file or die "can't open <$file> for reading $!";
$/ = "TYPE:\n";
while ( my $record = <$fh> ) {
chomp $record;
next if $record eq '';
$record =~ s/(TIFF Directory at offset .+)\n//;
my ($fullpath, $data) = split(/\n/, $record, 2);
$fullpath =~ s/:$//;
my ($drv, $path, $file) = File::Spec->splitpath($fullpath);
my $cs = $config{$1} if ($data =~ s/\s+Compression Scheme:\s+(.*?)\n//);
my $pi = $config{$1} if ($data =~ s/\s+Photometric Interpretation:\s+(.*?)\n//);
my @r = $config{'Resolution'}->($1) if ($data =~ s/\s+Resolution:\s+(.*?)\n//);
my $w = $1 if ($data =~ s/\s+Image Width:\s+(\d+)\s*\n//);
my $l = $1 if ($data =~ s/\s+Image Length:\s+(\S+)\s*//);
my $filesize = (-s $fullpath) / (1024 * 1024);
my $size_in_mb = sprintf "%.2f", $filesize;
print $/, join(':', $file, $cs, @r, $pi, $w, $l), ":\n", $data, $size_in_mb;
}
ASKER
I'm getting some errors. See code snippet.
C:\Users\Drew\Desktop\Dad>compileTiffInfoNew.pl
Use of uninitialized value $w in join or string at C:\Users\Drew\Desktop\Dad\compileTiffInfoNew.pl line 52, <$fh> chunk 2.
TYPE:
filename.TIF:bkwht:200:200:bkwht::2376:
Image Width: 1728Bits/Sample: 1 FillOrder: lsb-to-msb
Document Name: "Standard Input"
Image Description: "converted PBM file"
Orientation: row 0 top, col 0 lhs
Samples/Pixel: 1
Rows/Strip: 2376
Planar Configuration: single image plane
0.07 MB
Use of uninitialized value $w in join or string at C:\Users\Drew\Desktop\Dad\compileTiffInfoNew.pl line 52, <$fh> chunk 3.
TYPE:
filename2.TIF:bkwht:200:200:bkwht::2376:
Image Width: 1728Bits/Sample: 1 FillOrder: lsb-to-msb
Document Name: "Standard Input"
Image Description: "converted PBM file"
Orientation: row 0 top, col 0 lhs
Samples/Pixel: 1
Rows/Strip: 2376
Planar Configuration: single image plane
0.02 MB
Use of uninitialized value $cs in join or string at C:\Users\Drew\Desktop\Dad\compileTiffInfoNew.pl line 52, <$fh> chunk 4.
Use of uninitialized value $w in join or string at C:\Users\Drew\Desktop\Dad\compileTiffInfoNew.pl line 52, <$fh> chunk 4.
TYPE:
filename3.TIF::31:31:color::124:
Subfile Type: (0 = 0x0)
Image Width: 124Bits/Sample: 8 Software: "+"
Samples/Pixel: 3
Rows/Strip: 55
Planar Configuration: single image plane
0.04 MB
Use of uninitialized value $w in join or string at C:\Users\Drew\Desktop\Dad\compileTiffInfoNew.pl line 52, <$fh> chunk 5.
TYPE:
filename4.TIFF:color:300:300:color::1001:
Subfile Type: (0 = 0x0)
Image Width: 1419Bits/Sample: 8 Samples/Pixel: 3
Rows/Strip: 1
Planar Configuration: single image plane
Photoshop Data: <present>, 410 bytes
Predictor: horizontal differencing 2 (0x2)
2.52 MB
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Based on the sample data you posted on the other site, here are the results that I get.
C:\TEMP>ee.pl
TYPE:
filename.tif:bkwht:200:200:bkwht:1728:2376:
Bits/Sample: 1
FillOrder: lsb-to-msb
Document Name: "Standard Input"
Image Description: "converted PBM file"
Orientation: row 0 top, col 0 lhs
Samples/Pixel: 1
Rows/Strip: 2376
Planar Configuration: single image plane
TYPE:
filename2.tif:bkwht:200:200:bkwht:1728:2376:
Bits/Sample: 1
FillOrder: lsb-to-msb
Document Name: "Standard Input"
Image Description: "converted PBM file"
Orientation: row 0 top, col 0 lhs
Samples/Pixel: 1
Rows/Strip: 2376
Planar Configuration: single image plane
TYPE:
filename3.tif:none:31:31:color:124:124:
Subfile Type: (0 = 0x0)
Bits/Sample: 8
Software: "+"
Samples/Pixel: 3
Rows/Strip: 55
Planar Configuration: single image plane
TYPE:
filename4.tif:color:300:300:color:1419:1001:
Subfile Type: (0 = 0x0)
Bits/Sample: 8
Samples/Pixel: 3
Rows/Strip: 1
Planar Configuration: single image plane
Photoshop Data: <present>, 410 bytes
Predictor: horizontal differencing 2 (0x2)
ASKER
That was sample data. Here's real data from images, sorry:
TYPE:
C:\Users\Drew\Desktop\Dad\ tiffs\file name.TIF:
TIFF Directory at offset 0x10eb4
Image Width: 1728 Image Length: 2376
Resolution: 200, 200 pixels/inch
Bits/Sample: 1
Compression Scheme: CCITT Group 4
Photometric Interpretation: min-is-white
FillOrder: lsb-to-msb
Document Name: "Standard Input"
Image Description: "converted PBM file"
Orientation: row 0 top, col 0 lhs
Samples/Pixel: 1
Rows/Strip: 2376
Planar Configuration: single image plane
TYPE:
C:\Users\Drew\Desktop\Dad\ tiffs\file name2.TIF:
TIFF Directory at offset 0x4aac
Image Width: 1728 Image Length: 2376
Resolution: 200, 200 pixels/inch
Bits/Sample: 1
Compression Scheme: CCITT Group 4
Photometric Interpretation: min-is-white
FillOrder: lsb-to-msb
Document Name: "Standard Input"
Image Description: "converted PBM file"
Orientation: row 0 top, col 0 lhs
Samples/Pixel: 1
Rows/Strip: 2376
Planar Configuration: single image plane
TYPE:
C:\Users\Drew\Desktop\Dad\ tiffs\file name3.TIF:
TIFF Directory at offset 0x8
Subfile Type: (0 = 0x0)
Image Width: 124 Image Length: 124
Resolution: 31, 31 pixels/inch
Bits/Sample: 8
Compression Scheme: None
Photometric Interpretation: RGB color
Software: "¼"
Samples/Pixel: 3
Rows/Strip: 55
Planar Configuration: single image plane
TYPE:
C:\Users\Drew\Desktop\Dad\ tiffs\file name4.TIFF :
TIFF Directory at offset 0x8
Subfile Type: (0 = 0x0)
Image Width: 1419 Image Length: 1001
Resolution: 300, 300 pixels/inch
Bits/Sample: 8
Compression Scheme: LZW
Photometric Interpretation: RGB color
Samples/Pixel: 3
Rows/Strip: 1
Planar Configuration: single image plane
Photoshop Data: <present>, 410 bytes
Predictor: horizontal differencing 2 (0x2)
TYPE:
C:\Users\Drew\Desktop\Dad\
TIFF Directory at offset 0x10eb4
Image Width: 1728 Image Length: 2376
Resolution: 200, 200 pixels/inch
Bits/Sample: 1
Compression Scheme: CCITT Group 4
Photometric Interpretation: min-is-white
FillOrder: lsb-to-msb
Document Name: "Standard Input"
Image Description: "converted PBM file"
Orientation: row 0 top, col 0 lhs
Samples/Pixel: 1
Rows/Strip: 2376
Planar Configuration: single image plane
TYPE:
C:\Users\Drew\Desktop\Dad\
TIFF Directory at offset 0x4aac
Image Width: 1728 Image Length: 2376
Resolution: 200, 200 pixels/inch
Bits/Sample: 1
Compression Scheme: CCITT Group 4
Photometric Interpretation: min-is-white
FillOrder: lsb-to-msb
Document Name: "Standard Input"
Image Description: "converted PBM file"
Orientation: row 0 top, col 0 lhs
Samples/Pixel: 1
Rows/Strip: 2376
Planar Configuration: single image plane
TYPE:
C:\Users\Drew\Desktop\Dad\
TIFF Directory at offset 0x8
Subfile Type: (0 = 0x0)
Image Width: 124 Image Length: 124
Resolution: 31, 31 pixels/inch
Bits/Sample: 8
Compression Scheme: None
Photometric Interpretation: RGB color
Software: "¼"
Samples/Pixel: 3
Rows/Strip: 55
Planar Configuration: single image plane
TYPE:
C:\Users\Drew\Desktop\Dad\
TIFF Directory at offset 0x8
Subfile Type: (0 = 0x0)
Image Width: 1419 Image Length: 1001
Resolution: 300, 300 pixels/inch
Bits/Sample: 8
Compression Scheme: LZW
Photometric Interpretation: RGB color
Samples/Pixel: 3
Rows/Strip: 1
Planar Configuration: single image plane
Photoshop Data: <present>, 410 bytes
Predictor: horizontal differencing 2 (0x2)
This still needs a slight tweak to fix the indentation on the "Bits/Sample" line, but other than that, it appears to do what you need.
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;
my %config = (
'LZW' => 'color',
'Lempel-Ziv & Welch encoding' => 'color',
'CCITT Group 4' => 'bkwht',
'CCITT Group 4 facsimile encoding' => 'bkwht',
'None' => 'none',
'RGB color' => 'color',
'min-is-white' => 'bkwht',
'min-is-black' => 'bkwht',
'palette color (RGB from colormap)' => 'color',
'Resolution' => sub {
my @r = split(/, /, shift);
$r[0] =~ s/\D//g;
$r[1] =~ s/\D//g;
return @r[0,1];
},
);
my @config = keys %config;
my $file = 'data.txt'; # set this as needed.
open my $fh, '<', $file or die "can't open <$file> for reading $!";
$/ = "TYPE:\n";
while ( my $record = <$fh> ) {
chomp $record;
next if $record eq '';
$record =~ s/(TIFF Directory at offset .+)\n//;
my ($fullpath, $data) = split(/\n/, $record, 2);
$fullpath =~ s/:$//;
my ($drv, $path, $file) = File::Spec->splitpath($fullpath);
my $cs = $config{$1} if ($data =~ s/\s{2}Compression Scheme:\s+(.*?)\n//);
my $pi = $config{$1} if ($data =~ s/\s{2}Photometric Interpretation:\s+(.*?)\n//);
my @r = $config{'Resolution'}->($1) if ($data =~ s/Resolution:\s+(.*?)\n//);
my ($w, $l) = ($1, $2) if ($data =~ s/Image Width: (\d+) Image Length: (\d+)\n//);
# my $filesize = (-s $fullpath) / (1024 * 1024);
# my $size_in_mb = sprintf "%.2f", $filesize;
print $/, join(':', $file, $cs, @r, $pi, $w, $l), ":\n", $data; #, $size_in_mb;
}
These are the needed corrections.
my @r = $config{'Resolution'}->($1 ) if ($data =~ s/\s{2}Resolution:\s+(.*?) \n//);
my ($w, $l) = ($1, $2) if ($data =~ s/\s{2}Image Width: (\d+) Image Length: (\d+)\n//);
Note: using single letter vars really should be done and I should have fixed those as well, but didn't.
my @r = $config{'Resolution'}->($1
my ($w, $l) = ($1, $2) if ($data =~ s/\s{2}Image Width: (\d+) Image Length: (\d+)\n//);
Note: using single letter vars really should be done and I should have fixed those as well, but didn't.
ASKER
This did fix the indents perfectly, however, I'm still getting those uninitialized value errors:
C:\Users\Drew\Desktop\Dad> compileTif fInfoNew.p l
Use of uninitialized value $w in join or string at C:\Users\Drew\Desktop\Dad\ compileTif fInfoNew.p l line 51, <$fh> chunk 2.
Use of uninitialized value $l in join or string at C:\Users\Drew\Desktop\Dad\ compileTif fInfoNew.p l line 51, <$fh> chunk 2.
TYPE:
filename.tif:bkwht:200:200 :bkwht:::
Image Width: 1728
Image Length: 2376
Bits/Sample: 1
FillOrder: lsb-to-msb
Document Name: "Standard Input"
Image Description: "converted PBM file"
Orientation: row 0 top, col 0 lhs
Samples/Pixel: 1
Rows/Strip: 2376
Planar Configuration: single image plane
Use of uninitialized value $w in join or string at C:\Users\Drew\Desktop\Dad\ compileTif fInfoNew.p l line 51, <$fh> chunk 3.
Use of uninitialized value $l in join or string at C:\Users\Drew\Desktop\Dad\ compileTif fInfoNew.p l line 51, <$fh> chunk 3.
TYPE:
filename2.tif:bkwht:200:20 0:bkwht:::
Image Width: 1728
Image Length: 2376
Bits/Sample: 1
FillOrder: lsb-to-msb
Document Name: "Standard Input"
Image Description: "converted PBM file"
Orientation: row 0 top, col 0 lhs
Samples/Pixel: 1
Rows/Strip: 2376
Planar Configuration: single image plane
Use of uninitialized value $w in join or string at C:\Users\Drew\Desktop\Dad\ compileTif fInfoNew.p l line 51, <$fh> chunk 4.
Use of uninitialized value $l in join or string at C:\Users\Drew\Desktop\Dad\ compileTif fInfoNew.p l line 51, <$fh> chunk 4.
TYPE:
filename3.tif:none:31:31:c olor:::
Subfile Type: (0 = 0x0)
Image Width: 124
Image Length: 124
Bits/Sample: 8
Software: "+"
Samples/Pixel: 3
Rows/Strip: 55
Planar Configuration: single image plane
Use of uninitialized value $w in join or string at C:\Users\Drew\Desktop\Dad\ compileTif fInfoNew.p l line 51, <$fh> chunk 5.
Use of uninitialized value $l in join or string at C:\Users\Drew\Desktop\Dad\ compileTif fInfoNew.p l line 51, <$fh> chunk 5.
TYPE:
filename4.tif:color:300:30 0:color:::
Subfile Type: (0 = 0x0)
Image Width: 1419
Image Length: 1001
Bits/Sample: 8
Samples/Pixel: 3
Rows/Strip: 1
Planar Configuration: single image plane
Photoshop Data: <present>, 410 bytes
Predictor: horizontal differencing 2 (0x2)
Attached is my data.txt file.
C:\Users\Drew\Desktop\Dad>
Use of uninitialized value $w in join or string at C:\Users\Drew\Desktop\Dad\
Use of uninitialized value $l in join or string at C:\Users\Drew\Desktop\Dad\
TYPE:
filename.tif:bkwht:200:200
Image Width: 1728
Image Length: 2376
Bits/Sample: 1
FillOrder: lsb-to-msb
Document Name: "Standard Input"
Image Description: "converted PBM file"
Orientation: row 0 top, col 0 lhs
Samples/Pixel: 1
Rows/Strip: 2376
Planar Configuration: single image plane
Use of uninitialized value $w in join or string at C:\Users\Drew\Desktop\Dad\
Use of uninitialized value $l in join or string at C:\Users\Drew\Desktop\Dad\
TYPE:
filename2.tif:bkwht:200:20
Image Width: 1728
Image Length: 2376
Bits/Sample: 1
FillOrder: lsb-to-msb
Document Name: "Standard Input"
Image Description: "converted PBM file"
Orientation: row 0 top, col 0 lhs
Samples/Pixel: 1
Rows/Strip: 2376
Planar Configuration: single image plane
Use of uninitialized value $w in join or string at C:\Users\Drew\Desktop\Dad\
Use of uninitialized value $l in join or string at C:\Users\Drew\Desktop\Dad\
TYPE:
filename3.tif:none:31:31:c
Subfile Type: (0 = 0x0)
Image Width: 124
Image Length: 124
Bits/Sample: 8
Software: "+"
Samples/Pixel: 3
Rows/Strip: 55
Planar Configuration: single image plane
Use of uninitialized value $w in join or string at C:\Users\Drew\Desktop\Dad\
Use of uninitialized value $l in join or string at C:\Users\Drew\Desktop\Dad\
TYPE:
filename4.tif:color:300:30
Subfile Type: (0 = 0x0)
Image Width: 1419
Image Length: 1001
Bits/Sample: 8
Samples/Pixel: 3
Rows/Strip: 1
Planar Configuration: single image plane
Photoshop Data: <present>, 410 bytes
Predictor: horizontal differencing 2 (0x2)
Attached is my data.txt file.
TYPE:
filename.tif:
TIFF Directory at offset 0x10eb4
Image Width: 1728
Image Length: 2376
Resolution: 200, 200 pixels/inch
Bits/Sample: 1
Compression Scheme: CCITT Group 4
Photometric Interpretation: min-is-white
FillOrder: lsb-to-msb
Document Name: "Standard Input"
Image Description: "converted PBM file"
Orientation: row 0 top, col 0 lhs
Samples/Pixel: 1
Rows/Strip: 2376
Planar Configuration: single image plane
TYPE:
filename2.tif:
TIFF Directory at offset 0x4aac
Image Width: 1728
Image Length: 2376
Resolution: 200, 200 pixels/inch
Bits/Sample: 1
Compression Scheme: CCITT Group 4
Photometric Interpretation: min-is-white
FillOrder: lsb-to-msb
Document Name: "Standard Input"
Image Description: "converted PBM file"
Orientation: row 0 top, col 0 lhs
Samples/Pixel: 1
Rows/Strip: 2376
Planar Configuration: single image plane
TYPE:
filename3.tif:
TIFF Directory at offset 0x8
Subfile Type: (0 = 0x0)
Image Width: 124
Image Length: 124
Resolution: 31, 31 pixels/inch
Bits/Sample: 8
Compression Scheme: None
Photometric Interpretation: RGB color
Software: "¼"
Samples/Pixel: 3
Rows/Strip: 55
Planar Configuration: single image plane
TYPE:
filename4.tif:
TIFF Directory at offset 0x8
Subfile Type: (0 = 0x0)
Image Width: 1419
Image Length: 1001
Resolution: 300, 300 pixels/inch
Bits/Sample: 8
Compression Scheme: LZW
Photometric Interpretation: RGB color
Samples/Pixel: 3
Rows/Strip: 1
Planar Configuration: single image plane
Photoshop Data: <present>, 410 bytes
Predictor: horizontal differencing 2 (0x2)
The format of your sample data keeps changing.
Is the "Image Width" and "Image Length" on separate lines as shown in this last post or on the same line as is in the prior post?
Is the actual data in a consistent format. Meaning, do all records have the same number of "rows" and those rows are in the same order?
Is the "Image Width" and "Image Length" on separate lines as shown in this last post or on the same line as is in the prior post?
Is the actual data in a consistent format. Meaning, do all records have the same number of "rows" and those rows are in the same order?
ASKER
Oh man, I messed up, heh. It works. To answer your question though, the records will be different, depending on the image that is being read, but the way this is set up, should work perfectly.
Have you been able to look into getting the filesize to work? I basically, just want it at the bottom of each record, indented out, so it's uniform with output. Let me know.
Have you been able to look into getting the filesize to work? I basically, just want it at the bottom of each record, indented out, so it's uniform with output. Let me know.
This works in my test.
my $filesize = (-s $fullpath) / (1024 * 1024);
my $size_in_mb = sprintf "%.2f\n", $filesize;
print $/, join(':', $file, $cs, @r, $pi, $w, $l, "\n"), $data, $size_in_mb;
ASKER
The last record is on the same line as the last entry:
C:\Users\Drew\Desktop\Dad> compileTif fInfoNew.p l
TYPE:
filename.TIF:bkwht:200:200 :bkwht:172 8:2376:
Bits/Sample: 1
FillOrder: lsb-to-msb
Document Name: "Standard Input"
Image Description: "converted PBM file"
Orientation: row 0 top, col 0 lhs
Samples/Pixel: 1
Rows/Strip: 2376
Planar Configuration: single image plane
0.07
TYPE:
filename2.TIF:bkwht:200:20 0:bkwht:17 28:2376:
Bits/Sample: 1
FillOrder: lsb-to-msb
Document Name: "Standard Input"
Image Description: "converted PBM file"
Orientation: row 0 top, col 0 lhs
Samples/Pixel: 1
Rows/Strip: 2376
Planar Configuration: single image plane
0.02
TYPE:
filename3.TIF:none:31:31:c olor:124:1 24:
Subfile Type: (0 = 0x0)
Bits/Sample: 8
Software: "+"
Samples/Pixel: 3
Rows/Strip: 55
Planar Configuration: single image plane
0.04
TYPE:
filename4.TIFF:color:300:3 00:color:1 419:1001:
Subfile Type: (0 = 0x0)
Bits/Sample: 8
Samples/Pixel: 3
Rows/Strip: 1
Planar Configuration: single image plane
Photoshop Data: <present>, 410 bytes
Predictor: horizontal differencing 2 (0x2)2.52 <------------ same line :(
C:\Users\Drew\Desktop\Dad>
TYPE:
filename.TIF:bkwht:200:200
Bits/Sample: 1
FillOrder: lsb-to-msb
Document Name: "Standard Input"
Image Description: "converted PBM file"
Orientation: row 0 top, col 0 lhs
Samples/Pixel: 1
Rows/Strip: 2376
Planar Configuration: single image plane
0.07
TYPE:
filename2.TIF:bkwht:200:20
Bits/Sample: 1
FillOrder: lsb-to-msb
Document Name: "Standard Input"
Image Description: "converted PBM file"
Orientation: row 0 top, col 0 lhs
Samples/Pixel: 1
Rows/Strip: 2376
Planar Configuration: single image plane
0.02
TYPE:
filename3.TIF:none:31:31:c
Subfile Type: (0 = 0x0)
Bits/Sample: 8
Software: "+"
Samples/Pixel: 3
Rows/Strip: 55
Planar Configuration: single image plane
0.04
TYPE:
filename4.TIFF:color:300:3
Subfile Type: (0 = 0x0)
Bits/Sample: 8
Samples/Pixel: 3
Rows/Strip: 1
Planar Configuration: single image plane
Photoshop Data: <present>, 410 bytes
Predictor: horizontal differencing 2 (0x2)2.52 <------------ same line :(
That means that your file doesn't end with a line terminator, like it normally should.
Make this change.
Make this change.
$data =~ s/\n$//;
my $filesize = (-s $fullpath) / (1024 * 1024);
my $size_in_mb = sprintf "\n%.2f\n", $filesize;
print $/, join(':', $file, $cs, @r, $pi, $w, $l, "\n"), $data, $size_in_mb;
ASKER
Sweet! That works. I'm going to continue to work on the other section of this code. This was the "main" program, however there is a little user input required, but I kept that from the code at the time. For all I know, it works great, it has been so far. I'm going to make sure that the new stuff you wrote can be utilized, and if everything checks out we can go ahead and close out this thread :) I'll let you know if I run into any snags. You are one heck of a perl programming; let me know if you have any good books/references you recommend.
I'll let you know about my script shortly.
I'll let you know about my script shortly.
ASKER
Yup! This code is SOO much easier to maintain and feed into output files. Attached is my final version, and it works great! Thanks so much for your help on this :)
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;
# Vars
my $dir_root;
my $state;
my $status;
my @files;
my $tifs;
my $executebat;
my $infile;
system 'cls';
print "Directory Root (no trailing slash!): ";
chomp($dir_root = <STDIN>);
print "State: ";
chomp($state = uc(<STDIN>));
print "Status [nr][hs][tye]: ";
chomp($status = lc(<STDIN>));
# Define the output file, based on user input
my $batOutput = "\!".$state.$status."INFOraw.txt";
open (BATFILE, "> \!".$state.$status."INFOraw.bat");
print BATFILE "\@echo off\n";
close (BATFILE);
open (BATFILE, ">> \!".$state.$status."INFOraw.bat");
print BATFILE "type nul > $batOutput\n";
close (BATFILE);
# Get a list of tif files from dir_root
# No trailing slash is allowed
opendir(DIR, $dir_root);
@files = grep(/\.ti[f]{1,2}$/i,readdir(DIR));
closedir(DIR);
# Check to see if array has data
if (@files) {
foreach $tifs (@files) {
open (BATFILE, ">> \!".$state.$status."INFOraw.bat");
print BATFILE "tiffinfo TYPE $dir_root"."\\".$tifs." \>> ".$batOutput."\n";
close (BATFILE);
}
}
# if array is null (no data), then no tif files were found
else {
print "No Tiff files were found.";
exit;
}
# Run bat script
print "Attempting to execute .bat script now...\n";
$executebat = system 'call !'.$state.$status.'INFOraw.bat > NUL 2>&1';
if ( $executebat != 0 ) {
die "Failed executing .bat script. \n";
}
else { print "Ran .bat script successfully.\n"; }
$infile = $batOutput;
# open the file
open(OUT1,"> \!".$state.$status."INFO.txt") or die "Can't open \!".$state.$status."INFO.txt: $!";
open(OUT2,"> \!".$state.$status."INFOspdsht.txt") or die "Can't open \!".$state.$status."INFO.txt $!";
my %config = (
'LZW' => 'color',
'Lempel-Ziv & Welch encoding' => 'color',
'CCITT Group 4' => 'bkwht',
'CCITT Group 4 facsimile encoding' => 'bkwht',
'None' => 'none',
'RGB color' => 'color',
'min-is-white' => 'bkwht',
'min-is-black' => 'bkwht',
'palette color (RGB from colormap)' => 'color',
'Resolution' => sub {
my @r = split(/, /, shift);
$r[0] =~ s/\D//g;
$r[1] =~ s/\D//g;
return @r[0,1];
},
);
my @config = keys %config;
my $file = $infile; # set this as needed.
open my $fh, '<', $file or die "can't open <$file> for reading $!";
$/ = "TYPE:\n";
while ( my $record = <$fh> ) {
chomp $record;
next if $record eq '';
$record =~ s/(TIFF Directory at offset .+)\n//;
my ($fullpath, $data) = split(/\n/, $record, 2);
$fullpath =~ s/:$//;
my ($drv, $path, $file) = File::Spec->splitpath($fullpath);
my $cs = $config{$1} if ($data =~ s/\s{2}Compression Scheme:\s+(.*?)\n//);
my $pi = $config{$1} if ($data =~ s/\s{2}Photometric Interpretation:\s+(.*?)\n//);
my @r = $config{'Resolution'}->($1) if ($data =~ s/\s{2}Resolution:\s+(.*?)\n//);
my ($w, $l) = ($1, $2) if ($data =~ s/\s{2}Image Width: (\d+) Image Length: (\d+)\n//);
$data =~ s/\n$//;
my $filesize = (-s $fullpath) / (1024 * 1024);
my $size_in_mb = sprintf "\n%.2f MB\n", $filesize;
print OUT1 $/, join(':', $file, $cs, @r, $pi, $w, $l, "\n"), $data, $size_in_mb;
print OUT2 join(':', $file, $cs, @r, $pi, $w, $l, "\n");
}
close (OUT1) or die "Can't close out1: $!";
close (OUT2) or die "Can't close out2: $!";
close ($fh) or die "Can't close $fh: $!";
ASKER
Oh, one last thing! Here's my output now:
TYPE:
filename4.TIFF:color:300:3 00:color:1 419:1001:
Subfile Type: (0 = 0x0)
Bits/Sample: 8
Samples/Pixel: 3
Rows/Strip: 1
Planar Configuration: single image plane
Photoshop Data: <present>, 410 bytes
Predictor: horizontal differencing 2 (0x2)
Size: 2.52 MB
Is there a way to place the value of "Size" (2.52 - numbers only, not MB) on that top line so that its at the end of it, like so:...
TYPE:
filename4.TIFF:color:300:3 00:color:1 419:1001:2 .52:
Subfile Type: (0 = 0x0)
Bits/Sample: 8
Samples/Pixel: 3
Rows/Strip: 1
Planar Configuration: single image plane
Photoshop Data: <present>, 410 bytes
Predictor: horizontal differencing 2 (0x2)
TYPE:
filename4.TIFF:color:300:3
Subfile Type: (0 = 0x0)
Bits/Sample: 8
Samples/Pixel: 3
Rows/Strip: 1
Planar Configuration: single image plane
Photoshop Data: <present>, 410 bytes
Predictor: horizontal differencing 2 (0x2)
Size: 2.52 MB
Is there a way to place the value of "Size" (2.52 - numbers only, not MB) on that top line so that its at the end of it, like so:...
TYPE:
filename4.TIFF:color:300:3
Subfile Type: (0 = 0x0)
Bits/Sample: 8
Samples/Pixel: 3
Rows/Strip: 1
Planar Configuration: single image plane
Photoshop Data: <present>, 410 bytes
Predictor: horizontal differencing 2 (0x2)
my $size_in_mb = sprintf "%.2f", $filesize;
print $/, join(':', $file, $cs, @r, $pi, $w, $l, $size_in_mb, "\n"), $data;
Correction:
print $/, join(':', $file, $cs, @r, $pi, $w, $l, $size_in_mb, "\n"), $data, "\n";
ASKER
Awesome. That worked. Before you go, can you actually explain this information:
my $cs = $config{$1} if ($data =~ s/\s{2}Compression Scheme:\s+(.*?)\n//);
my $pi = $config{$1} if ($data =~ s/\s{2}Photometric Interpretation:\s+(.*?)\n/ /);
my @r = $config{'Resolution'}->($1 ) if ($data =~ s/\s{2}Resolution:\s+(.*?) \n//);
my ($w, $l) = ($1, $2) if ($data =~ s/\s{2}Image Width: (\d+) Image Length: (\d+)\n//);
And how I can create additional ones that would update that one line, just in case I have to make additional checks on things. I fought for like 2 hours with a regex, and i kept recieving that uninitialized variable issue. If you can shed some light on this, I'd better understand how the script works.
my $cs = $config{$1} if ($data =~ s/\s{2}Compression Scheme:\s+(.*?)\n//);
my $pi = $config{$1} if ($data =~ s/\s{2}Photometric Interpretation:\s+(.*?)\n/
my @r = $config{'Resolution'}->($1
my ($w, $l) = ($1, $2) if ($data =~ s/\s{2}Image Width: (\d+) Image Length: (\d+)\n//);
And how I can create additional ones that would update that one line, just in case I have to make additional checks on things. I fought for like 2 hours with a regex, and i kept recieving that uninitialized variable issue. If you can shed some light on this, I'd better understand how the script works.
If the regex match is successful, it captures/saves a portion of the match in $1 and deletes the entire match from the string. Then, $1 is used as the key for a hash lookup and assigns the value returned to the var. Each of those lines replace the messy/verbose elsif blocks that you were previously using.
Here's a detailed explanation of the first regex.
Here's a detailed explanation of the first regex.
The regular expression:
(?-imsx:s/\s{2}Compression Scheme:\s+(.*?)\n//)
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
s/ 's/'
----------------------------------------------------------------------
\s{2} whitespace (\n, \r, \t, \f, and " ") (2
times)
----------------------------------------------------------------------
Compression Scheme: 'Compression Scheme:'
----------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ") (1 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
.*? any character except \n (0 or more times
(matching the least amount possible))
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
\n '\n' (newline)
----------------------------------------------------------------------
// '//'
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
See:
perldoc perlretut - http://perldoc.perl.org/perlretut.html
Mastering Regular Expressions - http://oreilly.com/catalog/9780596528126
perldoc perlretut - http://perldoc.perl.org/perlretut.html
Mastering Regular Expressions - http://oreilly.com/catalog/9780596528126
ASKER
Do you use a regex testing that would make writing RegEx any easier?
This output is killing me...
So your stuff worked, but it's causing my output to be one one line, see:
TYPE:
filename.TIF:bkwht:200:200 :bkwht:172 8:2376:0.0 7:
Bits/Sample: 1
FillOrder: lsb-to-msb
Document Name: "Standard Input"
Image Description: "converted PBM file"
Orientation: row 0 top, col 0 lhs
Samples/Pixel: 1
Rows/Strip: 2376
Planar Configuration: single image plane0.07TYPE:
filename2.TIF:bkwht:200:20 0:bkwht:17 28:2376:0. 02:
Bits/Sample: 1
FillOrder: lsb-to-msb
Document Name: "Standard Input"
Image Description: "converted PBM file"
Orientation: row 0 top, col 0 lhs
Samples/Pixel: 1
Rows/Strip: 2376
Planar Configuration: single image plane0.02TYPE:
filename3.TIF:none:31:31:c olor:124:1 24:0.04:
Subfile Type: (0 = 0x0)
Bits/Sample: 8
Software: "¼"
Samples/Pixel: 3
Rows/Strip: 55
Planar Configuration: single image plane0.04TYPE:
filename4.TIFF:color:300:3 00:color:1 419:1001:2 .52:
Subfile Type: (0 = 0x0)
Bits/Sample: 8
Samples/Pixel: 3
Rows/Strip: 1
Planar Configuration: single image plane
Photoshop Data: <present>, 410 bytes
Predictor: horizontal differencing 2 (0x2)2.52
IT should be:
TYPE:
filename.TIF:bkwht:200:200 :bkwht:172 8:2376:0.0 7:
Bits/Sample: 1
FillOrder: lsb-to-msb
Document Name: "Standard Input"
Image Description: "converted PBM file"
Orientation: row 0 top, col 0 lhs
Samples/Pixel: 1
Rows/Strip: 2376
Planar Configuration: single image plane
0.07
TYPE:
filename2.TIF:bkwht:200:20 0:bkwht:17 28:2376:0. 02:
Bits/Sample: 1
FillOrder: lsb-to-msb
Document Name: "Standard Input"
Image Description: "converted PBM file"
Orientation: row 0 top, col 0 lhs
Samples/Pixel: 1
Rows/Strip: 2376
Planar Configuration: single image plane
0.02
TYPE:
filename3.TIF:none:31:31:c olor:124:1 24:0.04:
Subfile Type: (0 = 0x0)
Bits/Sample: 8
Software: "¼"
Samples/Pixel: 3
Rows/Strip: 55
Planar Configuration: single image plane0.04
TYPE:
filename4.TIFF:color:300:3 00:color:1 419:1001:2 .52:
Subfile Type: (0 = 0x0)
Bits/Sample: 8
Samples/Pixel: 3
Rows/Strip: 1
Planar Configuration: single image plane
Photoshop Data: <present>, 410 bytes
Predictor: horizontal differencing 2 (0x2)
2.52
I hate bothering you over these little issues. This is the LAST thing that needs to be fixed, and it should work fully ...
This output is killing me...
So your stuff worked, but it's causing my output to be one one line, see:
TYPE:
filename.TIF:bkwht:200:200
Bits/Sample: 1
FillOrder: lsb-to-msb
Document Name: "Standard Input"
Image Description: "converted PBM file"
Orientation: row 0 top, col 0 lhs
Samples/Pixel: 1
Rows/Strip: 2376
Planar Configuration: single image plane0.07TYPE:
filename2.TIF:bkwht:200:20
Bits/Sample: 1
FillOrder: lsb-to-msb
Document Name: "Standard Input"
Image Description: "converted PBM file"
Orientation: row 0 top, col 0 lhs
Samples/Pixel: 1
Rows/Strip: 2376
Planar Configuration: single image plane0.02TYPE:
filename3.TIF:none:31:31:c
Subfile Type: (0 = 0x0)
Bits/Sample: 8
Software: "¼"
Samples/Pixel: 3
Rows/Strip: 55
Planar Configuration: single image plane0.04TYPE:
filename4.TIFF:color:300:3
Subfile Type: (0 = 0x0)
Bits/Sample: 8
Samples/Pixel: 3
Rows/Strip: 1
Planar Configuration: single image plane
Photoshop Data: <present>, 410 bytes
Predictor: horizontal differencing 2 (0x2)2.52
IT should be:
TYPE:
filename.TIF:bkwht:200:200
Bits/Sample: 1
FillOrder: lsb-to-msb
Document Name: "Standard Input"
Image Description: "converted PBM file"
Orientation: row 0 top, col 0 lhs
Samples/Pixel: 1
Rows/Strip: 2376
Planar Configuration: single image plane
0.07
TYPE:
filename2.TIF:bkwht:200:20
Bits/Sample: 1
FillOrder: lsb-to-msb
Document Name: "Standard Input"
Image Description: "converted PBM file"
Orientation: row 0 top, col 0 lhs
Samples/Pixel: 1
Rows/Strip: 2376
Planar Configuration: single image plane
0.02
TYPE:
filename3.TIF:none:31:31:c
Subfile Type: (0 = 0x0)
Bits/Sample: 8
Software: "¼"
Samples/Pixel: 3
Rows/Strip: 55
Planar Configuration: single image plane0.04
TYPE:
filename4.TIFF:color:300:3
Subfile Type: (0 = 0x0)
Bits/Sample: 8
Samples/Pixel: 3
Rows/Strip: 1
Planar Configuration: single image plane
Photoshop Data: <present>, 410 bytes
Predictor: horizontal differencing 2 (0x2)
2.52
I hate bothering you over these little issues. This is the LAST thing that needs to be fixed, and it should work fully ...
ASKER
WOO HOO!
finally fixed something on my own :) don't need help with that :) - it was an issue with how I had the output set :)
Again though, do you use any sort of regex program or utility (paid or free) that you use to do any perl debugging, regexing, etc?
Any suggestions are appriciated.
finally fixed something on my own :) don't need help with that :) - it was an issue with how I had the output set :)
Again though, do you use any sort of regex program or utility (paid or free) that you use to do any perl debugging, regexing, etc?
Any suggestions are appriciated.
>> Again though, do you use any sort of regex program or utility (paid or free) that you use to do any perl debugging, regexing, etc?
Perl has a built-in debugger ( the -d switch ) which I use on occasion, but for the most part, I always use the warnings and strict pragmas, which point out lots of errors and I use Data::Dumper to output vars to verify that they hold what I expect.
It looks to me that the original question/problem in this thread has been fully answered, so don't you think it's time to close this tread?
Perl has a built-in debugger ( the -d switch ) which I use on occasion, but for the most part, I always use the warnings and strict pragmas, which point out lots of errors and I use Data::Dumper to output vars to verify that they hold what I expect.
It looks to me that the original question/problem in this thread has been fully answered, so don't you think it's time to close this tread?
I'd suggest you give them their points, and start another thread on "explain this program". Then I should be around tomorrow (Friday) most of the day CDT and I'll take a look at it myself.
At the risk of starting a flamewar, if you're teaching yourself programming and regular expressions I'd stay away from Perl unless you have a good reason for learning that language. In my opinion it's just too ugly and hard-to-read. I would recommend Python as my first choice of a good language to learn. If on Windows, JScript under Windows Scripting Host might be an interesting alternative; syntactically slightly more like Perl and can do Windows-specific things like COM and sending keystrokes at other windows.
At the risk of starting a flamewar, if you're teaching yourself programming and regular expressions I'd stay away from Perl unless you have a good reason for learning that language. In my opinion it's just too ugly and hard-to-read. I would recommend Python as my first choice of a good language to learn. If on Windows, JScript under Windows Scripting Host might be an interesting alternative; syntactically slightly more like Perl and can do Windows-specific things like COM and sending keystrokes at other windows.
ASKER
Superdave,
First off, I'm actually enjoying learning perl and I've been around of alot of other languages. In fact, I hate Python. Love it when someone writes a python program FOR me, but I swear, you have to write double the amount of Python code to get it to do the same thing in Perl. I really like perl and find it easy to read and learn. The problem was that I dove straight into a language I've never attempted to learn, and create a quite extensive program that is going a long way. I'm up to over 200 lines of code with the suggestion in this thread.
Thanks for the recommendation though.
P.S. - I'm a Unix guy - WSH/COM - YUCK! :D
First off, I'm actually enjoying learning perl and I've been around of alot of other languages. In fact, I hate Python. Love it when someone writes a python program FOR me, but I swear, you have to write double the amount of Python code to get it to do the same thing in Perl. I really like perl and find it easy to read and learn. The problem was that I dove straight into a language I've never attempted to learn, and create a quite extensive program that is going a long way. I'm up to over 200 lines of code with the suggestion in this thread.
Thanks for the recommendation though.
P.S. - I'm a Unix guy - WSH/COM - YUCK! :D
The matched string will be in $1