Regex help


Results from a Windows registry query come back as somewhat polluted data; the details I'm interested in are toward the end and are space-delimited:
  $var = '8í+??s??í +*V-  mspst.dll     NITA·++? ¬ 7+n    E : \ m a i l \ b a c k u p  ( 1 ) . p s t'
or
  $var = '5í+??s??í +*V-  mspst.dll     NITA·++? ¬ 7+n    \ \ s e r v e r \ u s e r \ b a c k u p  ( 1 ) . p s t'

What regular expression would give me:
  $var ='E:\mail\backup (1).pst'
or
  $var = '\\server\user\backup (1).pst'
Asked by Marketing_Insists
 
sjklein42 commented:
I think we've got it. I rewrote the hex-to-ASCII conversion.

use strict;
use warnings;

my @hexList = (
"0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E000000005C005C00730072007600700072006400610064006D0069006E0033005C0075007300650072005C006200610063006B007500700020002800310029002E007000730074000000",
"0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E0000000045003A005C006D00610069006C005C006200610063006B007500700020002800310029002E007000730074000000"
);

foreach my $var (@hexList) {
	print "Hex: $var \n";

	# Hex to ASCII: each pair of hex digits becomes one character; 00 bytes become blanks
	my $ascii = '';
	while ( $var =~ s/([A-F0-9]{2})//i ) { $ascii .= ( $1 eq '00' ) ? ' ' : chr(hex $1); }
	$var = $ascii;
	print "Hex2ASCII: $var\n";

	# Clean up: drop trailing blanks, then peel "blank + character" pairs off the end
	my $out = '';
	$var =~ s/\s+$//;
	while ( $var =~ s/\s(.)$// ) { $out = $1 . $out; }
	$var = $out;
	$var =~ s/^\s+//;

	print "Clean up: '$var'\n\n";
}

c:\temp>perl foo2.pl
Hex: 0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E000000005C005C00730072007600700072006400610064006D0069006E0033005C0075007300650072005C006200610063006B007500700020002800310029002E007000730074000000
Hex2ASCII:     8í+¿¿s¿¿í +*V-  mspst.dll     NITA·++¿ ¬ 7+n    \ \ s r v p r d a d m i n 3 \ u s e r \ b a c k u p   ( 1 ) . p s t
Clean up: '\\srvprdadmin3\user\backup (1).pst'

Hex: 0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E0000000045003A005C006D00610069006C005C006200610063006B007500700020002800310029002E007000730074000000
Hex2ASCII:     8í+¿¿s¿¿í +*V-  mspst.dll     NITA·++¿ ¬ 7+n    E : \ m a i l \ b a c k u p   ( 1 ) . p s t
Clean up: 'E:\mail\backup (1).pst'
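An alternative sketch, assuming the path at the end of the StoreID is a NUL-terminated UTF-16LE string (each `5C00`, `7300`, ... pair is one character) made of printable ASCII: turn the hex into bytes with `pack 'H*'`, match the trailing UTF-16LE run, and decode it with the core Encode module instead of peeling characters off one at a time. `storeid_path` is a hypothetical helper name, not part of any API.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: return the path stored at the end of a hex-encoded StoreID.
# Assumption: the path is a NUL-terminated UTF-16LE string of printable ASCII characters.
sub storeid_path {
    my ($hex) = @_;
    my $bytes = pack 'H*', $hex;    # "5C00" becomes the two bytes 0x5C 0x00, and so on

    # One or more (printable ASCII byte, 0x00 byte) pairs, followed by the
    # 0x00 0x00 terminator at the very end of the data.
    return unless $bytes =~ /((?:[\x20-\x7E]\x00)+)\x00\x00\z/;
    my $utf16 = $1;
    return decode('UTF-16LE', $utf16);
}

my @hexList = (
    "0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E000000005C005C00730072007600700072006400610064006D0069006E0033005C0075007300650072005C006200610063006B007500700020002800310029002E007000730074000000",
    "0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E0000000045003A005C006D00610069006C005C006200610063006B007500700020002800310029002E007000730074000000",
);

print storeid_path($_), "\n" for @hexList;
```

Paths containing non-ASCII characters would need a looser byte pattern; as written, the sketch returns nothing when the tail does not look like ASCII UTF-16LE.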

sjklein42 commented:
Can you work with this?

while ( <> )
{
	s/[\r\n]//g;

	$out = '';
	while ( s/\s\s?([^\s])$// ) { $out = $1 . $out; }
	print "$out\n";
}	

c:\temp>perl foo.pl foo.dat
E:\mail\backup(1).pst
\\server\user\backup(1).pst

wilcoxon commented:
This should do it...

# first remove the extraneous spaces
$var =~ s{\s(?=\S)}{}g;
# next grab the last item from the line containing no more than single spaces
$var =~ s{^.*?(\S(?:\S+|\s(?=\S))+)$}{$1};
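A runnable version of the two substitutions above, applied to an ASCII-only excerpt of the question's sample (the binary prefix is dropped here only to keep the example readable; the real-space gap is two blanks, as in the original post):

```perl
use strict;
use warnings;

# ASCII-only excerpt of the question's first sample string.
my $var = 'mspst.dll     NITA    E : \ m a i l \ b a c k u p  ( 1 ) . p s t';

# first remove the extraneous spaces (any blank directly followed by a non-blank)
$var =~ s{\s(?=\S)}{}g;
# next grab the last item from the line containing no more than single spaces
$var =~ s{^.*?(\S(?:\S+|\s(?=\S))+)$}{$1};

print "$var\n";   # E:\mail\backup (1).pst
```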
 
sjklein42 commented:
Within your programming context (input and output in $var):

$out = '';
while ( $var =~ s/\s\s?([^\s])$// ) { $out = $1 . $out; }
$var = $out;

wilcoxon commented:
sjklein42, it looks like your solution strips out the space the author wants to keep (per your example output) just before the (1). Since these are Windows paths, spaces in the paths should be preserved.
 
wilcoxon commented:
On the other hand, if the path contains multiple spaces in a row, my solution won't pick it up correctly either (but that's very unusual in my experience).
 
sjklein42 commented:
wilcoxon,

You're right.  Good eyes.  What threw me is that the "significant" space character does not have an extra space going along with it the way all the other characters do.
 
wilcoxon commented:
Yes.  It seems to be that 1 space = garbage but two spaces = 1 real space (whereas you would expect 3 spaces for 1 real space based on the other character behavior).
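A likely explanation for the odd blank counts, assuming the path is stored as UTF-16LE: every ASCII character is followed by a 0x00 byte, and the hex-to-ASCII step turns each 0x00 into a blank. This sketch (using the core Encode module) shows that a real space then appears as three blanks, which matches what wilcoxon found once the actual hex was available:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Assumption: the StoreID path is UTF-16LE, i.e. each ASCII byte is followed by 0x00.
my $bytes = encode('UTF-16LE', 'p (');
print join(' ', map { sprintf '%02X', ord } split //, $bytes), "\n";   # 70 00 20 00 28 00

# Turn every 0x00 into a blank, as the hex-to-ASCII step does.
(my $shown = $bytes) =~ tr/\x00/ /;
print "'$shown'\n";   # 'p   ( ' : one padding blank per character, so the real space sits between two padding blanks
```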
 
sjklein42 commented:
This fixes the bug in my code wilcoxon pointed out.  Thanks.

$out = '';
while ( $var =~ s/\s([^\s]\s?)$// ) { $out = $1 . $out; }
$var = $out;

c:\temp>perl foo.pl foo.dat
E:\mail\backup (1).pst
\\server\user\backup (1).pst
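To see the first loop and this fix side by side, here is a small self-contained script that runs both on an ASCII-only excerpt of the question's sample (two blanks before `(`, as in the original post). `peel_old` and `peel_new` are only illustrative names:

```perl
use strict;
use warnings;

# ASCII-only excerpt of the question's sample; two blanks before "(".
my $sample = 'NITA    E : \ m a i l \ b a c k u p  ( 1 ) . p s t';

# First version: \s\s? also swallows the real space in front of a character.
sub peel_old {
    my ($var) = @_;
    my $out = '';
    while ( $var =~ s/\s\s?([^\s])$// ) { $out = $1 . $out; }
    return $out;
}

# Fixed version: an optional trailing blank is kept together with the character.
sub peel_new {
    my ($var) = @_;
    my $out = '';
    while ( $var =~ s/\s([^\s]\s?)$// ) { $out = $1 . $out; }
    return $out;
}

print peel_old($sample), "\n";   # E:\mail\backup(1).pst
print peel_new($sample), "\n";   # E:\mail\backup (1).pst
```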

Marketing_Insists (author) commented:
Thanks for the help.

No luck yet. Here is the code I'm working with. The catch is that you need Windows and Outlook (preferably open), with .pst files associated with your profile. Initially, the data returned is in hex.

(Some Outlook versions throw a non-fatal object access error if Outlook is closed when the script is run.)

It's bizarre that Microsoft expects anyone to work with such garbage return data.

use Win32::OLE;

$objOutlook = Win32::OLE->new('Outlook.Application');
$objFolders = $objOutlook->Session->Folders;

for ($i= $objFolders->Count; $i >= 1; $i += -1) {
   $objFolder = $objFolders->Item($i);

   if (((index($objFolder->Name, 'Mailbox') + 1) == 0) && ((index($objFolder->Name, 'Public Folders') + 1) == 0)) {

      print $objFolder->StoreID, "\n";  # show hex

      $var = $objFolder->StoreID;
      $var =~ s/([a-fA-F0-9]{2})/chr(hex $1)/eg; 
      
      print $var, "\n";                 # show hex to ascii

      # need help getting usable network or local paths

    }
}

sjklein42 commented:
Let's focus for a minute on the step of converting hex to ascii...

Maybe there is an unprintable character (isn't that a funny expression) at the end of the $var string that is causing the match to fail, but you can't see it because it's invisible.  Like a null or a newline?

Please add a print of the raw hex $var value before the conversion, as well as your converted ascii value.

Also, have you tried both my version of the code and wilcoxon's?
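One way to follow that advice is to print the converted string with every non-printable byte escaped, so NULs and newlines become visible. `visible` is a hypothetical helper, and the input string is made up for the illustration:

```perl
use strict;
use warnings;

# Hypothetical debugging helper: show every byte outside printable ASCII as \xNN.
sub visible {
    my ($s) = @_;
    $s =~ s/([^\x20-\x7E])/sprintf('\\x%02X', ord $1)/ge;
    return $s;
}

my $var = "p\x00s\x00t\x00\x00\x00";   # made-up example: UTF-16LE "pst" plus a NUL terminator
print visible($var), "\n";              # p\x00s\x00t\x00\x00\x00
```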
 
Marketing_Insists (author) commented:
Yes, I did try the versions from both of you.

Here are the two sets of hex data returned:

0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E000000005C005C00730072007600700072006400610064006D0069006E0033005C0075007300650072005C006200610063006B007500700020002800310029002E007000730074000000

0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E0000000045003A005C006D00610069006C005C006200610063006B007500700020002800310029002E007000730074000000
 
sjklein42 commented:
Lots of nulls (00 bytes). I think chr must be returning blanks for them, and there are some at the end of each line.

Try this variation that trims trailing blanks before starting the parse.

$out = '';
$var =~ s/\s+$//;
while ( $var =~ s/\s([^\s]\s?)$// ) { $out = $1 . $out; }
$var = $out; 
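For reference, `chr(hex '00')` actually returns the NUL character rather than a blank, and Perl's `\s` does not match NUL. Trailing NULs therefore survive `s/\s+$//`, the peeling loop finds no match, and `$out` stays empty, which would explain blank "Clean up:" output. Mapping 0x00 to a blank first, as the later replies do, avoids this. A quick check:

```perl
use strict;
use warnings;

# chr(hex '00') is the NUL character, not a blank, and \s does not match it.
my $nul = chr(hex '00');
printf "ord = %d, matches \\s: %s\n", ord($nul), ($nul =~ /\s/ ? 'yes' : 'no');   # ord = 0, matches \s: no

# So trailing NULs survive the usual trailing-whitespace trim ...
my $decoded = "t\x00\x00\x00";
(my $trimmed = $decoded) =~ s/\s+$//;
print length($trimmed), "\n";   # 4

# ... unless the 0x00 bytes are turned into blanks first.
(my $blanked = $decoded) =~ tr/\x00/ /;
$blanked =~ s/\s+$//;
print "'$blanked'\n";   # 't'
```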

wilcoxon commented:
I'd do something like this...

Unfortunately I can't test it. If you find other characters besides 0x00 that you don't want returned, add them to the first return in mychr (or you could use ranges to exclude whole swaths of "unprintable" characters).

use Win32::OLE;

$objOutlook = Win32::OLE->new('Outlook.Application');
$objFolders = $objOutlook->Session->Folders;

for ($i= $objFolders->Count; $i >= 1; $i += -1) {
   $objFolder = $objFolders->Item($i);

   if (((index($objFolder->Name, 'Mailbox') + 1) == 0) && ((index($objFolder->Name, 'Public Folders') + 1) == 0)) {

      print $objFolder->StoreID, "\n";  # show hex

      $var = $objFolder->StoreID;
      $var =~ s/([a-fA-F0-9]{2})/mychr(hex $1)/eg; 

      # first remove the extraneous spaces
      $var =~ s{\s+$}{};
      $var =~ s{\s(?=\S)}{}g;
      # next grab the last item from the line (no more than single spaces)
      $var =~ s{^.*?(\S(?:\S+|\s(?=\S))+)$}{$1};
      
      print $var, "\n";                 # show hex to ascii

      # need help getting usable network or local paths

    }
}

sub mychr {
    my ($chr) = @_;
    return ' ' if ($chr == 0);
    return chr($chr);
}
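The "use ranges" idea could look like the following hypothetical `mychr_range`, which blanks out all ASCII control bytes (0x00 to 0x1F) and 0x7F instead of only 0x00. It is a sketch only; the sample bytes are the `E:\` part of the second StoreID:

```perl
use strict;
use warnings;

# Hypothetical variant of mychr: treat a whole range of unprintable bytes as blanks.
sub mychr_range {
    my ($code) = @_;
    return ' ' if $code < 0x20 || $code == 0x7F;
    return chr($code);
}

my $hex   = '45003A005C00';   # "E:\" in UTF-16LE, taken from the second StoreID
my $ascii = join '', map { mychr_range(hex $_) } $hex =~ /(..)/g;
print "'$ascii'\n";   # 'E : \ '
```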

Marketing_Insists (author) commented:
I seem to be getting blank output when I use it as my last cleanup step. The sample code below demonstrates:

@hexList = (
"0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E000000005C005C00730072007600700072006400610064006D0069006E0033005C0075007300650072005C006200610063006B007500700020002800310029002E007000730074000000",
"0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E0000000045003A005C006D00610069006C005C006200610063006B007500700020002800310029002E007000730074000000"
);

foreach $var(@hexList) {
	print "Hex: $var \n";

	$var =~ s/([a-fA-F0-9]{2})/chr(hex $1)/eg; 
	print "Hex2ASCII: $var\n";


	$out = '';
	$var =~ s/\s+$//;
	while ( $var =~ s/\s([^\s]\s?)$// ) { $out = $1 . $out; }
	$var = $out; 
	print "Clean up: $var\n\n";


}

__END__
The output is:

Hex: 0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E000000005C005C00730072007600700072006400610064006D0069006E0033005C0075007300650072005C006200610063006B007500700020002800310029002E007000730074000000
Hex2ASCII:     8í+¿¿s¿¿í +*V-  mspst.dll     NITA·++¿ ¬ 7+n    \ \ s r v p r d a d m i n 3 \ u s e r \ b a c k u p   ( 1 ) . p s t
Clean up:

Hex: 0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E0000000045003A005C006D00610069006C005C006200610063006B007500700020002800310029002E007000730074000000
Hex2ASCII:     8í+¿¿s¿¿í +*V-  mspst.dll     NITA·++¿ ¬ 7+n    E : \ m a i l \ b a c k u p   ( 1 ) . p s t
Clean up:

wilcoxon commented:
What I had suggested just about worked without modification. The only trouble was that a real space comes out as three blanks when converted from the hex (in your original sample input it appeared as only two). To make this work, I had to modify the final regex. Unfortunately, it now explicitly looks for something shaped like a file path (a drive letter and colon, or \\); if you need to pull data that follows a different format, I can come up with a different regex (I went with this since both your samples are files).

I think this method will also be more efficient than sjklein's as it uses fewer loops (but does use a few s///g regexes).  However, I did not benchmark the times to verify.
#!/usr/local/bin/perl

use strict;
use warnings;

my @hexList = (
"0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E000000005C005C00730072007600700072006400610064006D0069006E0033005C0075007300650072005C006200610063006B007500700020002800310029002E007000730074000000",
"0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E0000000045003A005C006D00610069006C005C006200610063006B007500700020002800310029002E007000730074000000"
);

foreach my $var (@hexList) {
        print "Hex: $var \n";

        $var =~ s/([a-fA-F0-9]{2})/mychr(hex $1)/eg;
        print "Hex2ASCII: $var\n";

        # first remove the extraneous spaces
        $var =~ s{\s+$}{};
        $var =~ s{\s(?=\S|\s\s)}{}g;
        # next grab the last item from the line that looks like a file/path
        $var =~ s{^.*?((?:\w{1,2}:|\\\\)\S(?:\S|\s(?=\S))+)$}{$1};

        print "Clean up: $var\n\n";
}

sub mychr {
    my ($chr) = @_;
    return ' ' if ($chr == 0);
    return chr($chr);
}

Hex: 0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E000000005C005C00730072007600700072006400610064006D0069006E0033005C0075007300650072005C006200610063006B007500700020002800310029002E007000730074000000
Hex2ASCII:     8¦¦¿¿¦¿¦ +*V¦  mspst.dll     NITA¦¦¦ ¦ 7¦n    \ \ s r v p r d a d m i n 3 \ u s e r \ b a c k u p   ( 1 ) . p s t
Clean up: \\srvprdadmin3\user\backup (1).pst

Hex: 0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E0000000045003A005C006D00610069006C005C006200610063006B007500700020002800310029002E007000730074000000
Hex2ASCII:     8¦¦¿¿¦¿¦ +*V¦  mspst.dll     NITA¦¦¦ ¦ 7¦n    E : \ m a i l \ b a c k u p   ( 1 ) . p s t
Clean up: E:\mail\backup (1).pst
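Since the efficiency claim above was not benchmarked, here is a sketch of how the two final cleanups could be compared with Perl's core Benchmark module. Both are wrapped in subs (`sjklein_clean` and `wilcoxon_clean` are illustrative names), checked for identical output on the second StoreID, and then timed; timings depend on the machine, so none are quoted here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Second StoreID from the thread, decoded the same way as in the replies (00 -> blank).
my $hex = '0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E0000000045003A005C006D00610069006C005C006200610063006B007500700020002800310029002E007000730074000000';
my $decoded = join '', map { $_ eq '00' ? ' ' : chr hex $_ } $hex =~ /(..)/g;

# sjklein42's accepted cleanup, wrapped in a sub
sub sjklein_clean {
    my ($var) = @_;
    my $out = '';
    $var =~ s/\s+$//;
    while ( $var =~ s/\s(.)$// ) { $out = $1 . $out; }
    $out =~ s/^\s+//;
    return $out;
}

# wilcoxon's final cleanup, wrapped in a sub
sub wilcoxon_clean {
    my ($var) = @_;
    $var =~ s{\s+$}{};
    $var =~ s{\s(?=\S|\s\s)}{}g;
    $var =~ s{^.*?((?:\w{1,2}:|\\\\)\S(?:\S|\s(?=\S))+)$}{$1};
    return $var;
}

# Make sure both agree before timing them.
die "results differ\n" unless sjklein_clean($decoded) eq wilcoxon_clean($decoded);

# Run each for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    sjklein  => sub { sjklein_clean($decoded) },
    wilcoxon => sub { wilcoxon_clean($decoded) },
});
```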

Marketing_Insists (author) commented:
Thanks guys, both solutions work great, though sj squeezed in a few minutes earlier ;)
Question has a verified solution.
