Solved

reg ex help

Posted on 2011-02-25
Last Modified: 2012-06-27

Results from a Windows registry query come back somewhat polluted. The details I'm interested in are at the end, with a space between every character:
  $var = '8í+??s??í +*V-  mspst.dll     NITA·++? ¬ 7+n    E : \ m a i l \ b a c k u p  ( 1 ) . p s t'
or
  $var = '5í+??s??í +*V-  mspst.dll     NITA·++? ¬ 7+n    \ \ s e r v e r \ u s e r \ b a c k u p  ( 1 ) . p s t'

What regular expression would give me:
  $var ='E:\mail\backup (1).pst'
or
  $var = '\\server\user\backup (1).pst'
Question by:Marketing_Insists
 
LVL 16

Expert Comment

by:sjklein42
ID: 34983183
Can you work with this?

while ( <> )
{
	s/[\r\n]//g;

	$out = '';
	while ( s/\s\s?([^\s])$// ) { $out = $1 . $out; }
	print "$out\n";
}	



c:\temp>perl foo.pl foo.dat
E:\mail\backup(1).pst
\\server\user\backup(1).pst

 
LVL 26

Expert Comment

by:wilcoxon
ID: 34983196
This should do it...

# first remove the extraneous spaces
$var =~ s{\s(?=\S)}{}g;
# next grab the last item from the line containing no more than single spaces
$var =~ s{^.*?(\S(?:\S+|\s(?=\S))+)$}{$1};
 
LVL 16

Expert Comment

by:sjklein42
ID: 34983204
Within your programming context (input and output in $var):

$out = '';
while ( $var =~ s/\s\s?([^\s])$// ) { $out = $1 . $out; }
$var = $out;


 
LVL 26

Expert Comment

by:wilcoxon
ID: 34983236
sjklein42, it looks like your solution strips out the spaces the author wants to keep (per your example) just before the (1).  Given these are windows paths, spaces in the paths should be picked up.
 
LVL 26

Expert Comment

by:wilcoxon
ID: 34983254
On the other hand, if the path contains multiple spaces in a row, my solution won't pick it up correctly either (but that's very unusual in my experience).
 
LVL 16

Expert Comment

by:sjklein42
ID: 34983315
wilcoxon,

You're right.  Good eyes.  What threw me is that the "significant" space character does not have an extra space going along with it the way all the other characters do.
 
LVL 26

Expert Comment

by:wilcoxon
ID: 34983342
Yes.  It seems to be that 1 space = garbage but two spaces = 1 real space (whereas you would expect 3 spaces for 1 real space based on the other character behavior).
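
The pairing described above can be sketched as follows: treat the string as character-plus-filler pairs and drop the filler after each character, so a real space (which carries its own filler) survives as one space. This is only an illustration, not code from the thread. `strip_filler` is a hypothetical helper name, and the sketch assumes the filler is always exactly one space, as in the sample posted in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: drop the single filler space that follows every character.
sub strip_filler {
    my ($text) = @_;
    # (.) consumes one real character (possibly a real space); the literal
    # space after it is the filler and is discarded.
    $text =~ s/(.) /$1/g;
    return $text;
}

# Tail of the first sample from the question.
my $sample = 'E : \ m a i l \ b a c k u p  ( 1 ) . p s t';
print strip_filler($sample), "\n";    # prints E:\mail\backup (1).pst
```

The sketch only handles the path portion; locating where the path starts inside the longer string is a separate step.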
 
LVL 16

Expert Comment

by:sjklein42
ID: 34983410
This fixes the bug in my code wilcoxon pointed out.  Thanks.

$out = '';
while ( $var =~ s/\s([^\s]\s?)$// ) { $out = $1 . $out; }
$var = $out;


c:\temp>perl foo.pl foo.dat
E:\mail\backup (1).pst
\\server\user\backup (1).pst

 

Author Comment

by:Marketing_Insists
ID: 34983663
Thanks for the help.

No luck yet.  Here is the code I'm working with.  The trouble is, you would have to have Windows and Outlook (preferably opened) and have .pst files associated with your profile.  Initially, the data returned is in Hex.

(Some Outlook versions raise a non-fatal object access error if Outlook is closed when the script is run.)

Very bizarre how Microsoft expects anyone to work with such garbage return info.
 
use Win32::OLE;

$objOutlook = Win32::OLE->new('Outlook.Application');
$objFolders = $objOutlook->Session->Folders;

for ($i= $objFolders->Count; $i >= 1; $i += -1) {
   $objFolder = $objFolders->Item($i);

   if (((index($objFolder->Name, 'Mailbox') + 1) == 0) && ((index($objFolder->Name, 'Public Folders') + 1) == 0)) {

      print $objFolder->StoreID, "\n";  # show hex

      $var = $objFolder->StoreID;
      $var =~ s/([a-fA-F0-9]{2})/chr(hex $1)/eg; 
      
      print $var, "\n";                 # show hex to ascii

      # need help getting usable network or local paths

    }
}

 
LVL 16

Expert Comment

by:sjklein42
ID: 34983781
Let's focus for a minute on the step of converting hex to ascii...

Maybe there is an unprintable character (isn't that a funny expression) at the end of the $var string that is causing the match to fail, but you can't see it because it's invisible.  Like a null or a newline?

Please add a print of the raw hex $var value before the conversion, as well as your converted ascii value.

Also, have you tried both my version of the code and wilcoxon's?
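
One way to check for invisible characters, sketched here as an illustration rather than code from the thread, is to escape every non-printable byte before printing. `show_invisible` is a hypothetical helper name, and the input string is a made-up stand-in for the tail of the converted data.

```perl
use strict;
use warnings;

# Hypothetical helper: make non-printable bytes visible as \xNN escapes.
sub show_invisible {
    my ($text) = @_;
    # Anything outside printable ASCII (space through tilde) is escaped.
    $text =~ s/([^\x20-\x7E])/sprintf('\\x%02X', ord $1)/ge;
    return $text;
}

# Made-up input: letters separated by NUL characters.
print show_invisible("p\0s\0t\0\0\0"), "\n";    # prints p\x00s\x00t\x00\x00\x00
```

Printing the converted string through a filter like this would show whether the gaps are real blanks (left as-is) or something else such as NULs (shown as \x00).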
 

Author Comment

by:Marketing_Insists
ID: 34983940
Yes, I did try the versions from both of you.

Here are the two sets of hex data returned:

0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E000000005C005C00730072007600700072006400610064006D0069006E0033005C0075007300650072005C006200610063006B007500700020002800310029002E007000730074000000

0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E0000000045003A005C006D00610069006C005C006200610063006B007500700020002800310029002E007000730074000000
 
LVL 16

Expert Comment

by:sjklein42
ID: 34984055
Lots of nulls (00 bytes).  I think chr must be returning blanks for them, and there are some at the end of each line.

Try this variation that trims trailing blanks before starting the parse.

$out = '';
$var =~ s/\s+$//;
while ( $var =~ s/\s([^\s]\s?)$// ) { $out = $1 . $out; }
$var = $out; 
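
The alternating 00 bytes in the tail of each hex string look like UTF-16LE text, so a different technique from the regex clean-up above is to decode that tail directly. The sketch below is an assumption-laden illustration, not code from the thread: `path_from_hex` is a hypothetical helper, it assumes the path is the last thing in the data, that it is ASCII-only (every character is a non-null byte followed by a null byte), and that it may end with a two-byte null terminator.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: pull the trailing UTF-16LE string out of a hex dump.
sub path_from_hex {
    my ($hex) = @_;
    my $bytes = pack 'H*', $hex;    # hex digits -> raw bytes
    # Trailing run of (non-null byte, null byte) pairs, followed by an
    # optional two-byte terminator. Assumes an ASCII-only path.
    my ($utf16) = $bytes =~ /((?:[^\x00]\x00)+)(?:\x00\x00)?\z/
        or return;
    return decode('UTF-16LE', $utf16);
}

# Second hex sample posted in the thread (the drive-letter path).
my $hex = '0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E0000000045003A005C006D00610069006C005C006200610063006B007500700020002800310029002E007000730074000000';
print path_from_hex($hex), "\n";
```

A path containing non-ASCII characters would not match this pattern, because the second byte of each pair would no longer be null; that case would need the real layout of the StoreID data, which the thread does not describe.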

 
LVL 26

Expert Comment

by:wilcoxon
ID: 34984432
I'd do something like this...

Unfortunately I can't test it.  If you find other chars that you don't want returned besides 0, add them to the first return in mychr (or you could use ranges to exclude whole swaths of "unprintable" chars).
use Win32::OLE;

$objOutlook = Win32::OLE->new('Outlook.Application');
$objFolders = $objOutlook->Session->Folders;

for ($i= $objFolders->Count; $i >= 1; $i += -1) {
   $objFolder = $objFolders->Item($i);

   if (((index($objFolder->Name, 'Mailbox') + 1) == 0) && ((index($objFolder->Name, 'Public Folders') + 1) == 0)) {

      print $objFolder->StoreID, "\n";  # show hex

      $var = $objFolder->StoreID;
      $var =~ s/([a-fA-F0-9]{2})/mychr(hex $1)/eg; 

      # first remove the extraneous spaces
      $var =~ s{\s+$}{};
      $var =~ s{\s(?=\S)}{}g;
      # next grab the last item from the line (no more than single spaces)
      $var =~ s{^.*?(\S(?:\S+|\s(?=\S))+)$}{$1};
      
      print $var, "\n";                 # show hex to ascii

      # need help getting usable network or local paths

    }
}

sub mychr {
    my ($chr) = @_;
    return ' ' if ($chr == 0);
    return chr($chr);
}

 

Author Comment

by:Marketing_Insists
ID: 34984471
I seem to be getting blank output when I use it as my last clean-up step.  Some sample code below demonstrates:
 
@hexList = (
"0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E000000005C005C00730072007600700072006400610064006D0069006E0033005C0075007300650072005C006200610063006B007500700020002800310029002E007000730074000000",
"0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E0000000045003A005C006D00610069006C005C006200610063006B007500700020002800310029002E007000730074000000"
);

foreach $var(@hexList) {
	print "Hex: $var \n";

	$var =~ s/([a-fA-F0-9]{2})/chr(hex $1)/eg; 
	print "Hex2ASCII: $var\n";


	$out = '';
	$var =~ s/\s+$//;
	while ( $var =~ s/\s([^\s]\s?)$// ) { $out = $1 . $out; }
	$var = $out; 
	print "Clean up: $var\n\n";


}

__END__
The output is:

Hex: 0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E000000005C005C00730072007600700072006400610064006D0069006E0033005C0075007300650072005C006200610063006B007500700020002800310029002E007000730074000000
Hex2ASCII:     8í+¿¿s¿¿í +*V-  mspst.dll     NITA·++¿ ¬ 7+n    \ \ s r v p r d a d m i n 3 \ u s e r \ b a c k u p   ( 1 ) . p s t
Clean up:

Hex: 0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E0000000045003A005C006D00610069006C005C006200610063006B007500700020002800310029002E007000730074000000
Hex2ASCII:     8í+¿¿s¿¿í +*V-  mspst.dll     NITA·++¿ ¬ 7+n    E : \ m a i l \ b a c k u p   ( 1 ) . p s t
Clean up:
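
A likely explanation for the empty "Clean up:" lines, offered as a hedged sketch rather than a statement from the thread: chr(hex '00') returns a NUL character, which many consoles display as a blank but which \s does not match. If so, neither the trailing-blank trim nor the loop pattern ever matches, and $out stays empty. The sample tail below is made up to mirror the shape of the converted data.

```perl
use strict;
use warnings;

my $nul   = chr(hex '00');    # what chr() returns for a 00 byte
my $space = ' ';

printf "NUL matches \\s:   %s\n", ( $nul   =~ /\s/ ? 'yes' : 'no' );
printf "space matches \\s: %s\n", ( $space =~ /\s/ ? 'yes' : 'no' );

# Same shape as the tail of the converted data: every character is followed by NUL.
my $tail = "p\0s\0t\0\0\0";
( my $stripped = $tail ) =~ s/\s+$//;
print "trailing NULs stripped by \\s+\$: ", ( $stripped eq $tail ? 'no' : 'yes' ), "\n";
```

Mapping 00 to a real space during the hex conversion, or matching \x00 explicitly, avoids the mismatch.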

 
LVL 16

Accepted Solution

by:
sjklein42 earned 250 total points
ID: 34984587
I think we got it.  Rewrote hex-to-ascii.

@hexList = (
"0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E000000005C005C00730072007600700072006400610064006D0069006E0033005C0075007300650072005C006200610063006B007500700020002800310029002E007000730074000000",
"0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E0000000045003A005C006D00610069006C005C006200610063006B007500700020002800310029002E007000730074000000"
);

foreach $var(@hexList) {
	print "Hex: $var \n";


	$ascii = '';
	while ( $var =~ s/([A-F0-9]{2})//i ) { $ascii .= ( $1 eq '00' ) ? ' ' : chr(hex $1); }
	$var = $ascii;
	print "Hex2ASCII: $var\n";


	$out = '';
	$var =~ s/\s+$//;
	while ( $var =~ s/\s(.)$// ) { $out = $1 . $out; }
	$var = $out; 
	$var =~ s/^\s+//;

	print "Clean up: '$var'\n\n";

}




c:\temp>perl foo2.pl
Hex: 0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E000000005C005C00730072007600700072006400610064006D0069006E0033005C0075007300650072005C006200610063006B007500700020002800310029002E007000730074000000
Hex2ASCII:     8í+¿¿s¿¿í +*V-  mspst.dll     NITA·++¿ ¬ 7+n    \ \ s r v p r d a d m i n 3 \ u s e r \ b a c k u p   ( 1 ) . p s t
Clean up: '\\srvprdadmin3\user\backup (1).pst'

Hex: 0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E0000000045003A005C006D00610069006C005C006200610063006B007500700020002800310029002E007000730074000000
Hex2ASCII:     8í+¿¿s¿¿í +*V-  mspst.dll     NITA·++¿ ¬ 7+n    E : \ m a i l \ b a c k u p   ( 1 ) . p s t
Clean up: 'E:\mail\backup (1).pst'

 
LVL 26

Assisted Solution

by:wilcoxon
wilcoxon earned 250 total points
ID: 34985120
What I had suggested just about worked without modification.  The only trouble was that the space is 3 spaces when converted from the hex (in your original sample input it only appears to have 2 spaces).  To make this work, I had to modify the final regex.  Unfortunately, it now explicitly looks for something that looks like a file/path (drive letter colon or \\) - if you need to pull data that follows a different format, I can come up with a different regex (I went with this since both your samples are files).

I think this method will also be more efficient than sjklein's as it uses fewer loops (but does use a few s///g regexes).  However, I did not benchmark the times to verify.
#!/usr/local/bin/perl

use strict;
use warnings;

my @hexList = (
"0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E000000005C005C00730072007600700072006400610064006D0069006E0033005C0075007300650072005C006200610063006B007500700020002800310029002E007000730074000000",
"0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E0000000045003A005C006D00610069006C005C006200610063006B007500700020002800310029002E007000730074000000"
);

foreach my $var (@hexList) {
        print "Hex: $var \n";

        $var =~ s/([a-fA-F0-9]{2})/mychr(hex $1)/eg;
        print "Hex2ASCII: $var\n";

        # first remove the extraneous spaces
        $var =~ s{\s+$}{};
        $var =~ s{\s(?=\S|\s\s)}{}g;
        # next grab the last item from the line that looks like a file/path
        $var =~ s{^.*?((?:\w{1,2}:|\\\\)\S(?:\S|\s(?=\S))+)$}{$1};

        print "Clean up: $var\n\n";
}

sub mychr {
    my ($chr) = @_;
    return ' ' if ($chr == 0);
    return chr($chr);
}


Hex: 0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E000000005C005C00730072007600700072006400610064006D0069006E0033005C0075007300650072005C006200610063006B007500700020002800310029002E007000730074000000
Hex2ASCII:     8¦¦¿¿¦¿¦ +*V¦  mspst.dll     NITA¦¦¦ ¦ 7¦n    \ \ s r v p r d a d m i n 3 \ u s e r \ b a c k u p   ( 1 ) . p s t
Clean up: \\srvprdadmin3\user\backup (1).pst

Hex: 0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E0000000045003A005C006D00610069006C005C006200610063006B007500700020002800310029002E007000730074000000
Hex2ASCII:     8¦¦¿¿¦¿¦ +*V¦  mspst.dll     NITA¦¦¦ ¦ 7¦n    E : \ m a i l \ b a c k u p   ( 1 ) . p s t
Clean up: E:\mail\backup (1).pst
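
The efficiency question raised in this comment was left unmeasured. A minimal way to check it, sketched here with the core Benchmark module, is to wrap both approaches in subs and compare them. `clean_loop` and `clean_regex` are hypothetical names; the bodies are adapted from the two solutions in this thread (with mychr inlined), and the timings will depend on the machine and Perl version, so no result is claimed here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Second hex sample posted in the thread (the drive-letter path).
my $hex = '0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E0000000045003A005C006D00610069006C005C006200610063006B007500700020002800310029002E007000730074000000';

# Loop-based approach, adapted from the accepted solution.
sub clean_loop {
    my ($var) = @_;
    my $ascii = '';
    while ( $var =~ s/([A-F0-9]{2})//i ) {
        $ascii .= ( $1 eq '00' ) ? ' ' : chr( hex $1 );
    }
    $ascii =~ s/\s+$//;
    my $out = '';
    while ( $ascii =~ s/\s(.)$// ) { $out = $1 . $out; }
    $out =~ s/^\s+//;
    return $out;
}

# Substitution-based approach, adapted from the assisted solution.
sub clean_regex {
    my ($var) = @_;
    $var =~ s/([a-fA-F0-9]{2})/$1 eq '00' ? ' ' : chr(hex $1)/eg;
    $var =~ s{\s+$}{};
    $var =~ s{\s(?=\S|\s\s)}{}g;
    $var =~ s{^.*?((?:\w{1,2}:|\\\\)\S(?:\S|\s(?=\S))+)$}{$1};
    return $var;
}

# Run each sub for about one CPU second and print a comparison table.
cmpthese( -1, {
    loop  => sub { clean_loop($hex) },
    regex => sub { clean_regex($hex) },
} );
```

With inputs this short, either approach is probably fast enough in practice; the comparison matters mainly if many StoreID values are processed.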

 

Author Closing Comment

by:Marketing_Insists
ID: 35063295
Thanks guys, both solutions work great, though sj squeezed in a few minutes earlier ;)