Solved

reg ex help

Posted on 2011-02-25
Last Modified: 2012-06-27

Results from a Windows registry query come back somewhat polluted, with the details I'm interested in at the end, space-delimited:
  $var = '8í+??s??í +*V-  mspst.dll     NITA·++? ¬ 7+n    E : \ m a i l \ b a c k u p  ( 1 ) . p s t'
or
  $var = '5í+??s??í +*V-  mspst.dll     NITA·++? ¬ 7+n    \ \ s e r v e r \ u s e r \ b a c k u p  ( 1 ) . p s t'

What regular expression would give me:
  $var ='E:\mail\backup (1).pst'
or
  $var = '\\server\user\backup (1).pst'
Question by:Marketing_Insists

Expert Comment

by:sjklein42
Can you work with this?

while ( <> )
{
	s/[\r\n]//g;

	$out = '';
	while ( s/\s\s?([^\s])$// ) { $out = $1 . $out; }
	print "$out\n";
}	


c:\temp>perl foo.pl foo.dat
E:\mail\backup(1).pst
\\server\user\backup(1).pst


Expert Comment

by:wilcoxon
This should do it...

# first remove the extraneous spaces
$var =~ s{\s(?=\S)}{}g;
# next grab the last item from the line containing no more than single spaces
$var =~ s{^.*?(\S(?:\S+|\s(?=\S))+)$}{$1};
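
As a quick check of the two substitutions above, the following sketch runs them on a simplified, ASCII-only stand-in for the sample string. The helper name `last_item` and the shortened sample are assumptions made for illustration; they are not from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the two substitutions from the post above.
sub last_item {
    my ($s) = @_;
    # Drop every blank that is directly followed by a non-blank.
    $s =~ s{\s(?=\S)}{}g;
    # Keep only the last item that contains no more than single blanks.
    $s =~ s{^.*?(\S(?:\S+|\s(?=\S))+)$}{$1};
    return $s;
}

# Simplified sample (assumption): ASCII junk in place of the binary header.
my $sample = 'xx  mspst.dll     NITA    E : \ m a i l \ b a c k u p  ( 1 ) . p s t';
my $path   = last_item($sample);
print "$path\n";    # prints: E:\mail\backup (1).pst
```

The first substitution shortens every run of blanks by one, so a single blank disappears and a run of two leaves one real space behind.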

Expert Comment

by:sjklein42
Within your programming context (input and output in $var):

$out = '';
while ( $var =~ s/\s\s?([^\s])$// ) { $out = $1 . $out; }
$var = $out;


Expert Comment

by:wilcoxon
sjklein42, it looks like your solution strips out the spaces the author wants to keep (per your example) just before the (1).  Given these are windows paths, spaces in the paths should be picked up.

Expert Comment

by:wilcoxon
On the other hand, if the path contains multiple spaces in a row, my solution won't pick it up correctly either (but that's very unusual in my experience).

Expert Comment

by:sjklein42
wilcoxon,

You're right.  Good eyes.  What threw me is that the "significant" space character does not have an extra space going along with it the way all the other characters do.

Expert Comment

by:wilcoxon
Yes.  It seems to be that 1 space = garbage but two spaces = 1 real space (whereas you would expect 3 spaces for 1 real space based on the other character behavior).

Expert Comment

by:sjklein42
This fixes the bug in my code wilcoxon pointed out.  Thanks.

$out = '';
while ( $var =~ s/\s([^\s]\s?)$// ) { $out = $1 . $out; }
$var = $out;

c:\temp>perl foo.pl foo.dat
E:\mail\backup (1).pst
\\server\user\backup (1).pst

Author Comment

by:Marketing_Insists
Thanks for the help.

No luck yet.  Here is the code I'm working with.  The trouble is that to run it you need Windows and Outlook (preferably open), with .pst files associated with your profile.  Initially, the data returned is in hex.

(Some Outlook versions raise a non-fatal object access error if Outlook is closed when the script is run.)

Very bizarre how Microsoft expects anyone to work with such garbage return info.
 
use Win32::OLE;

$objOutlook = Win32::OLE->new('Outlook.Application');
$objFolders = $objOutlook->Session->Folders;

for ($i= $objFolders->Count; $i >= 1; $i += -1) {
   $objFolder = $objFolders->Item($i);

   if (((index($objFolder->Name, 'Mailbox') + 1) == 0) && ((index($objFolder->Name, 'Public Folders') + 1) == 0)) {

      print $objFolder->StoreID, "\n";  # show hex

      $var = $objFolder->StoreID;
      $var =~ s/([a-fA-F0-9]{2})/chr(hex $1)/eg; 
      
      print $var, "\n";                 # show hex to ascii

      # need help getting usable network or local paths

    }
}


Expert Comment

by:sjklein42
Let's focus for a minute on the step of converting hex to ascii...

Maybe there is an unprintable character (isn't that a funny expression) at the end of the $var string that is causing the match to fail, but you can't see it because it's invisible.  Like a null or a newline?

Please add a print of the raw hex $var value before the conversion, as well as your converted ascii value.

Also, have you tried both my version of the code and wilcoxon's?

Author Comment

by:Marketing_Insists
Yes, I did try the versions from both of you.

Here are the two sets of hex data returned:

0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E000000005C005C00730072007600700072006400610064006D0069006E0033005C0075007300650072005C006200610063006B007500700020002800310029002E007000730074000000

0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E0000000045003A005C006D00610069006C005C006200610063006B007500700020002800310029002E007000730074000000

Expert Comment

by:sjklein42
Lots of nulls (00 bytes).  I think chr must be returning blanks for them, and there are some at the end of each line.

Try this variation that trims trailing blanks before starting the parse.

$out = '';
$var =~ s/\s+$//;
while ( $var =~ s/\s([^\s]\s?)$// ) { $out = $1 . $out; }
$var = $out; 


Expert Comment

by:wilcoxon
I'd do something like this...

Unfortunately I can't test it.  If you find other chars that you don't want returned besides 0, add them to the first return in mychr (or you could use ranges to exclude whole swaths of "unprintable" chars).
use Win32::OLE;

$objOutlook = Win32::OLE->new('Outlook.Application');
$objFolders = $objOutlook->Session->Folders;

for ($i= $objFolders->Count; $i >= 1; $i += -1) {
   $objFolder = $objFolders->Item($i);

   if (((index($objFolder->Name, 'Mailbox') + 1) == 0) && ((index($objFolder->Name, 'Public Folders') + 1) == 0)) {

      print $objFolder->StoreID, "\n";  # show hex

      $var = $objFolder->StoreID;
      $var =~ s/([a-fA-F0-9]{2})/mychr(hex $1)/eg; 

      # first remove the extraneous spaces
      $var =~ s{\s+$}{};
      $var =~ s{\s(?=\S)}{}g;
      # next grab the last item from the line (no more than single spaces)
      $var =~ s{^.*?(\S(?:\S+|\s(?=\S))+)$}{$1};
      
      print $var, "\n";                 # show hex to ascii

      # need help getting usable network or local paths

    }
}

sub mychr {
    my ($chr) = @_;
    return ' ' if ($chr == 0);
    return chr($chr);
}
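
The note above about using ranges to exclude whole swaths of unprintable characters could look roughly like the sketch below. The name `mychr_printable` is hypothetical, and treating only 0x20 through 0x7E as printable is an assumption that suits plain ASCII paths, not something stated in the thread.

```perl
use strict;
use warnings;

# Hypothetical variant of mychr: anything outside printable ASCII
# (assumed here to be 0x20..0x7E) is returned as a blank.
sub mychr_printable {
    my ($code) = @_;
    return ' ' if $code < 0x20 || $code > 0x7E;
    return chr($code);
}

# Hex pairs taken from the sample data: 00, F9, 45 ('E'), 3A (':'), 5C ('\').
my $demo = join '', map { mychr_printable( hex $_ ) } qw(00 F9 45 3A 5C);
print "'$demo'\n";    # prints: '  E:\'
```

Mapping the high bytes to blanks changes how many blanks appear in the binary header, so any later regex that counts blanks there may need adjusting.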


Author Comment

by:Marketing_Insists
I seem to be getting blank output when I use it as my last cleanup step.  The sample code below demonstrates:
 
@hexList = (
"0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E000000005C005C00730072007600700072006400610064006D0069006E0033005C0075007300650072005C006200610063006B007500700020002800310029002E007000730074000000",
"0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E0000000045003A005C006D00610069006C005C006200610063006B007500700020002800310029002E007000730074000000"
);

foreach $var(@hexList) {
	print "Hex: $var \n";

	$var =~ s/([a-fA-F0-9]{2})/chr(hex $1)/eg; 
	print "Hex2ASCII: $var\n";


	$out = '';
	$var =~ s/\s+$//;
	while ( $var =~ s/\s([^\s]\s?)$// ) { $out = $1 . $out; }
	$var = $out; 
	print "Clean up: $var\n\n";


}

__END__
The output is:

Hex: 0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E000000005C005C00730072007600700072006400610064006D0069006E0033005C0075007300650072005C006200610063006B007500700020002800310029002E007000730074000000
Hex2ASCII:     8í+¿¿s¿¿í +*V-  mspst.dll     NITA·++¿ ¬ 7+n    \ \ s r v p r d a d m i n 3 \ u s e r \ b a c k u p   ( 1 ) . p s t
Clean up:

Hex: 0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E0000000045003A005C006D00610069006C005C006200610063006B007500700020002800310029002E007000730074000000
Hex2ASCII:     8í+¿¿s¿¿í +*V-  mspst.dll     NITA·++¿ ¬ 7+n    E : \ m a i l \ b a c k u p   ( 1 ) . p s t
Clean up:
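
A likely reason for the empty "Clean up:" lines, sketched below: `chr(hex '00')` yields a NUL character, which a console shows as a blank but which `\s` does not match. The trailing trim then removes nothing and the loop pattern never matches. The variable names are illustrative only.

```perl
use strict;
use warnings;

# Both hex strings end in 74 00 00 00: "t" followed by three NULs.
my $tail = join '', map { chr( hex($_) ) } qw(74 00 00 00);

# NUL is not a whitespace character as far as \s is concerned.
my $nul_is_space = ( "\0" =~ /\s/ ) ? 1 : 0;

# The trailing-blank trim therefore leaves all four characters in place...
( my $trimmed = $tail ) =~ s/\s+$//;

# ...and the loop condition from the cleanup code cannot match at the end.
my $loop_matches = ( $trimmed =~ /\s([^\s]\s?)$/ ) ? 1 : 0;

# Translating NULs to real spaces first lets the same patterns work.
( my $fixed = $tail ) =~ tr/\0/ /;
$fixed =~ s/\s+$//;

print "NUL matches \\s: $nul_is_space\n";           # 0
print "length after trim: ", length($trimmed), "\n"; # 4
print "loop pattern matches: $loop_matches\n";      # 0
print "after tr and trim: '$fixed'\n";              # 't'
```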


Accepted Solution

by:
sjklein42 earned 250 total points
I think we got it.  Rewrote hex-to-ascii.

@hexList = (
"0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E000000005C005C00730072007600700072006400610064006D0069006E0033005C0075007300650072005C006200610063006B007500700020002800310029002E007000730074000000",
"0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E0000000045003A005C006D00610069006C005C006200610063006B007500700020002800310029002E007000730074000000"
);

foreach $var(@hexList) {
	print "Hex: $var \n";


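	# Convert each hex pair to a character; 00 (NUL) becomes a space so \s can match it
	$ascii = '';
	while ( $var =~ s/([A-F0-9]{2})//i ) { $ascii .= ( $1 eq '00' ) ? ' ' : chr(hex $1); }
	$var = $ascii;
	print "Hex2ASCII: $var\n";


	# Peel "blank + character" pairs off the end, rebuilding the path right to left
	$out = '';
	$var =~ s/\s+$//;
	while ( $var =~ s/\s(.)$// ) { $out = $1 . $out; }
	$var = $out; 
	$var =~ s/^\s+//;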

	print "Clean up: '$var'\n\n";

}

c:\temp>perl foo2.pl
Hex: 0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E000000005C005C00730072007600700072006400610064006D00690
06E0033005C0075007300650072005C006200610063006B007500700020002800310029002E007000730074000000
Hex2ASCII:     8í+¿¿s¿¿í +*V-  mspst.dll     NITA·++¿ ¬ 7+n    \ \ s r v p r d a d m i n 3 \ u s e r \ b a c k u p   ( 1 ) . p s t
Clean up: '\\srvprdadmin3\user\backup (1).pst'

Hex: 0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E0000000045003A005C006D00610069006C005C006200610063006B0
07500700020002800310029002E007000730074000000
Hex2ASCII:     8í+¿¿s¿¿í +*V-  mspst.dll     NITA·++¿ ¬ 7+n    E : \ m a i l \ b a c k u p   ( 1 ) . p s t
Clean up: 'E:\mail\backup (1).pst'


Assisted Solution

by:wilcoxon
wilcoxon earned 250 total points
What I had suggested just about worked without modification.  The only trouble was that the space is 3 spaces when converted from the hex (in your original sample input it only appears to have 2 spaces).  To make this work, I had to modify the final regex.  Unfortunately, it now explicitly looks for something that looks like a file/path (drive letter colon or \\) - if you need to pull data that follows a different format, I can come up with a different regex (I went with this since both your samples are files).

I think this method will also be more efficient than sjklein's as it uses fewer loops (but does use a few s///g regexes).  However, I did not benchmark the times to verify.
#!/usr/local/bin/perl

use strict;
use warnings;

my @hexList = (
"0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E000000005C005C00730072007600700072006400610064006D0069006E0033005C0075007300650072005C006200610063006B007500700020002800310029002E007000730074000000",
"0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E0000000045003A005C006D00610069006C005C006200610063006B007500700020002800310029002E007000730074000000"
);

foreach my $var (@hexList) {
        print "Hex: $var \n";

        $var =~ s/([a-fA-F0-9]{2})/mychr(hex $1)/eg;
        print "Hex2ASCII: $var\n";

        # first remove the extraneous spaces
        $var =~ s{\s+$}{};
        $var =~ s{\s(?=\S|\s\s)}{}g;
        # next grab the last item from the line that looks like a file/path
        $var =~ s{^.*?((?:\w{1,2}:|\\\\)\S(?:\S|\s(?=\S))+)$}{$1};

        print "Clean up: $var\n\n";
}

sub mychr {
    my ($chr) = @_;
    return ' ' if ($chr == 0);
    return chr($chr);
}

Hex: 0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E000000005C005C00730072007600700072006400610064006D0069006E0033005C0075007300650072005C006200610063006B007500700020002800310029002E007000730074000000
Hex2ASCII:     8¦¦¿¿¦¿¦ +*V¦  mspst.dll     NITA¦¦¦ ¦ 7¦n    \ \ s r v p r d a d m i n 3 \ u s e r \ b a c k u p   ( 1 ) . p s t
Clean up: \\srvprdadmin3\user\backup (1).pst

Hex: 0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E0000000045003A005C006D00610069006C005C006200610063006B007500700020002800310029002E007000730074000000
Hex2ASCII:     8¦¦¿¿¦¿¦ +*V¦  mspst.dll     NITA¦¦¦ ¦ 7¦n    E : \ m a i l \ b a c k u p   ( 1 ) . p s t
Clean up: E:\mail\backup (1).pst
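
The alternating 00 bytes in the tail of the hex data are what UTF-16LE text looks like for ASCII characters, which would also explain the three blanks around the real space (00, 20, 00). As an alternative sketch, not something proposed in the thread, the tail could be decoded directly with the core Encode module instead of being cleaned up with regexes. The helper name `path_from_store_id` is hypothetical, and the layout assumed here (a run of printable-ASCII/00 byte pairs followed by a 00 00 terminator at the very end) is inferred only from the two samples above, not from any documented StoreID format.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: pull the trailing UTF-16LE string out of a hex StoreID.
# Assumes the path is ASCII-only and terminated by 00 00 at the end of the data.
sub path_from_store_id {
    my ($hex) = @_;
    my $bytes = pack 'H*', $hex;    # hex digits -> raw bytes
    if ( $bytes =~ /((?:[\x20-\x7E]\x00)+)\x00\x00\z/ ) {
        my $tail = $1;              # copy before handing it to decode
        return decode( 'UTF-16LE', $tail );
    }
    return;                         # nothing that looks like a path
}

# Second sample from the thread (the local E: path).
my $hex = '0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E0000000045003A005C006D00610069006C005C006200610063006B007500700020002800310029002E007000730074000000';

my $path = path_from_store_id($hex);
print defined $path ? "$path\n" : "no path found\n";    # prints: E:\mail\backup (1).pst
```

A path containing non-ASCII characters would not match the byte-pair pattern used here, so the character class would need widening for such profiles.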


Author Closing Comment

by:Marketing_Insists
Thanks guys, both solutions work great, though sj squeezed in a few minutes earlier ;)