Solved

reg ex help

Posted on 2011-02-25
348 Views
Last Modified: 2012-06-27

Results from a Windows registry query toss back somewhat polluted data; the details I'm interested in are towards the end, with the characters space-delimited:
  $var = '8í+??s??í +*V-  mspst.dll     NITA·++? ¬ 7+n    E : \ m a i l \ b a c k u p  ( 1 ) . p s t'
or
  $var = '5í+??s??í +*V-  mspst.dll     NITA·++? ¬ 7+n    \ \ s e r v e r \ u s e r \ b a c k u p  ( 1 ) . p s t'

What regular expression would give me:
  $var ='E:\mail\backup (1).pst'
or
  $var = '\\server\user\backup (1).pst'
Question by:Marketing_Insists
17 Comments
 
LVL 16

Expert Comment

by:sjklein42
ID: 34983183
Can you work with this?

while ( <> )
{
	s/[\r\n]//g;

	$out = '';
	while ( s/\s\s?([^\s])$// ) { $out = $1 . $out; }
	print "$out\n";
}	



c:\temp>perl foo.pl foo.dat
E:\mail\backup(1).pst
\\server\user\backup(1).pst


0
 
LVL 27

Expert Comment

by:wilcoxon
ID: 34983196
This should do it...

# first remove the extraneous spaces
$var =~ s{\s(?=\S)}{}g;
# next grab the last item from the line containing no more than single spaces
$var =~ s{^.*?(\S(?:\S+|\s(?=\S))+)$}{$1};
0
 
LVL 16

Expert Comment

by:sjklein42
ID: 34983204
Within your programming context (input and output in $var):

$out = '';
while ( $var =~ s/\s\s?([^\s])$// ) { $out = $1 . $out; }
$var = $out;


0

 
LVL 27

Expert Comment

by:wilcoxon
ID: 34983236
sjklein42, it looks like your solution strips out the space the author wants to keep (per your example output) just before the (1).  Given these are Windows paths, spaces in the paths should be preserved.
0
 
LVL 27

Expert Comment

by:wilcoxon
ID: 34983254
On the other hand, if the path contains multiple spaces in a row, my solution won't pick it up correctly either (but that's very unusual in my experience).
0
 
LVL 16

Expert Comment

by:sjklein42
ID: 34983315
wilcoxon,

You're right.  Good eyes.  What threw me is that the "significant" space character does not have an extra space going along with it the way all the other characters do.
0
 
LVL 27

Expert Comment

by:wilcoxon
ID: 34983342
Yes.  It seems to be that 1 space = garbage but two spaces = 1 real space (whereas you would expect 3 spaces for 1 real space based on the other character behavior).
0
 
LVL 16

Expert Comment

by:sjklein42
ID: 34983410
This fixes the bug in my code wilcoxon pointed out.  Thanks.

$out = '';
while ( $var =~ s/\s([^\s]\s?)$// ) { $out = $1 . $out; }
$var = $out;


c:\temp>perl foo.pl foo.dat
E:\mail\backup (1).pst
\\server\user\backup (1).pst


0
 

Author Comment

by:Marketing_Insists
ID: 34983663
Thanks for the help.

No luck yet.  Here is the code I'm working with.  The trouble is, you would have to have Windows and Outlook (preferably opened) and have .pst files associated with your profile.  Initially, the data returned is in Hex.

(Some Outlook versions raise a non-fatal object access error if Outlook is closed when the script is run.)

Very bizarre how Microsoft expects anyone to work with such garbage return info.
 
use Win32::OLE;

$objOutlook = Win32::OLE->new('Outlook.Application');
$objFolders = $objOutlook->Session->Folders;

for ($i= $objFolders->Count; $i >= 1; $i += -1) {
   $objFolder = $objFolders->Item($i);

   if (((index($objFolder->Name, 'Mailbox') + 1) == 0) && ((index($objFolder->Name, 'Public Folders') + 1) == 0)) {

      print $objFolder->StoreID, "\n";  # show hex

      $var = $objFolder->StoreID;
      $var =~ s/([a-fA-F0-9]{2})/chr(hex $1)/eg; 
      
      print $var, "\n";                 # show hex to ascii

      # need help getting usable network or local paths

    }
}


0
 
LVL 16

Expert Comment

by:sjklein42
ID: 34983781
Let's focus for a minute on the step of converting hex to ascii...

Maybe there is an unprintable character (isn't that a funny expression) at the end of the $var string that is causing the match to fail, but you can't see it because it's invisible.  Like a null or a newline?

Please add a print of the raw hex $var value before the conversion, as well as your converted ascii value.

Also, have you tried both my version of the code and wilcoxon's?
0
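One way to check for the invisible characters suggested above is to print the string with every non-printable byte escaped. This is only a minimal sketch; the helper name visible is made up for illustration and is not part of anyone's code in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of $s in which every character outside
# printable ASCII (0x20-0x7E) is replaced by a \xNN escape.
sub visible {
    my ($s) = @_;
    $s =~ s/([^\x20-\x7E])/sprintf('\\x%02X', ord $1)/ge;
    return $s;
}

# Example: a "t" followed by three NUL bytes, which a terminal would
# typically show as blanks or nothing at all.
print visible("t\0\0\0"), "\n";    # prints t\x00\x00\x00
```

Running the converted $var through a helper like this before the cleanup regex would show whether the string really ends in blanks or in something else.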
 

Author Comment

by:Marketing_Insists
ID: 34983940
Yes, I did try the versions from both of you.

Here are the two sets of hex data returned:

0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E000000005C005C00730072007600700072006400610064006D0069006E0033005C0075007300650072005C006200610063006B007500700020002800310029002E007000730074000000

0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E0000000045003A005C006D00610069006C005C006200610063006B007500700020002800310029002E007000730074000000
0
 
LVL 16

Expert Comment

by:sjklein42
ID: 34984055
Lots of nulls (00 bytes).  I think chr must be returning blanks for them, and there are some at the end of each line.

Try this variation that trims trailing blanks before starting the parse.

$out = '';
$var =~ s/\s+$//;
while ( $var =~ s/\s([^\s]\s?)$// ) { $out = $1 . $out; }
$var = $out; 


0
 
LVL 27

Expert Comment

by:wilcoxon
ID: 34984432
I'd do something like this...

Unfortunately I can't test it.  If you find other chars that you don't want returned besides 0, add them to the first return in mychr (or you could use ranges to exclude whole swaths of "unprintable" chars).

use Win32::OLE;

$objOutlook = Win32::OLE->new('Outlook.Application');
$objFolders = $objOutlook->Session->Folders;

for ($i= $objFolders->Count; $i >= 1; $i += -1) {
   $objFolder = $objFolders->Item($i);

   if (((index($objFolder->Name, 'Mailbox') + 1) == 0) && ((index($objFolder->Name, 'Public Folders') + 1) == 0)) {

      print $objFolder->StoreID, "\n";  # show hex

      $var = $objFolder->StoreID;
      $var =~ s/([a-fA-F0-9]{2})/mychr(hex $1)/eg; 

      # first remove the extraneous spaces
      $var =~ s{\s+$}{};
      $var =~ s{\s(?=\S)}{}g;
      # next grab the last item from the line (no more than single spaces)
      $var =~ s{^.*?(\S(?:\S+|\s(?=\S))+)$}{$1};
      
      print $var, "\n";                 # show hex to ascii

      # need help getting usable network or local paths

    }
}

sub mychr {
    my ($chr) = @_;
    return ' ' if ($chr == 0);
    return chr($chr);
}


0
 

Author Comment

by:Marketing_Insists
ID: 34984471
I seem to be getting blank output when I use it as my last cleanup step.  Some sample code below demonstrates:
 
@hexList = (
"0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E000000005C005C00730072007600700072006400610064006D0069006E0033005C0075007300650072005C006200610063006B007500700020002800310029002E007000730074000000",
"0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E0000000045003A005C006D00610069006C005C006200610063006B007500700020002800310029002E007000730074000000"
);

foreach $var(@hexList) {
	print "Hex: $var \n";

	$var =~ s/([a-fA-F0-9]{2})/chr(hex $1)/eg; 
	print "Hex2ASCII: $var\n";


	$out = '';
	$var =~ s/\s+$//;
	while ( $var =~ s/\s([^\s]\s?)$// ) { $out = $1 . $out; }
	$var = $out; 
	print "Clean up: $var\n\n";


}

__END__
The output is:

Hex: 0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E000000005C005C00730072007600700072006400610064006D0069006E0033005C0075007300650072005C006200610063006B007500700020002800310029002E007000730074000000
Hex2ASCII:     8í+¿¿s¿¿í +*V-  mspst.dll     NITA·++¿ ¬ 7+n    \ \ s r v p r d a d m i n 3 \ u s e r \ b a c k u p   ( 1 ) . p s t
Clean up:

Hex: 0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E0000000045003A005C006D00610069006C005C006200610063006B007500700020002800310029002E007000730074000000
Hex2ASCII:     8í+¿¿s¿¿í +*V-  mspst.dll     NITA·++¿ ¬ 7+n    E : \ m a i l \ b a c k u p   ( 1 ) . p s t
Clean up:


0
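A likely reason for the blank "Clean up:" lines: chr(0) produces a NUL character, which the console shows as a blank but which Perl's \s class does not match. The string therefore ends in NULs rather than whitespace, and the s/\s([^\s]\s?)$// loop never matches. A small sketch to confirm this (the sub name is made up for illustration):

```perl
use strict;
use warnings;

# Returns 1 if the character with the given code point matches \s, else 0.
sub matches_space_class {
    my ($code) = @_;
    return chr($code) =~ /\s/ ? 1 : 0;
}

# Compare NUL with a tab and an ordinary space.
for my $code ( 0x00, 0x09, 0x20 ) {
    printf "chr(0x%02X) matches \\s: %s\n", $code,
        matches_space_class($code) ? 'yes' : 'no';
}
```

This should report "no" for 0x00 and "yes" for the tab and the space, which is why the NULs have to be mapped to real blanks (or stripped) before any \s-based cleanup.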
 
LVL 16

Accepted Solution

by:
sjklein42 earned 1000 total points
ID: 34984587
I think we got it.  Rewrote hex-to-ascii.

@hexList = (
"0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E000000005C005C00730072007600700072006400610064006D0069006E0033005C0075007300650072005C006200610063006B007500700020002800310029002E007000730074000000",
"0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E0000000045003A005C006D00610069006C005C006200610063006B007500700020002800310029002E007000730074000000"
);

foreach $var(@hexList) {
	print "Hex: $var \n";


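	# Convert each hex pair to a character; NUL bytes (00) become blanks
	# so that the \s-based cleanup below can see them.
	$ascii = '';
	while ( $var =~ s/([A-F0-9]{2})//i ) { $ascii .= ( $1 eq '00' ) ? ' ' : chr(hex $1); }
	$var = $ascii;
	print "Hex2ASCII: $var\n";

	# Trim trailing blanks, then peel "blank + character" pairs off the end,
	# which drops the filler blank in front of every path character.
	$out = '';
	$var =~ s/\s+$//;
	while ( $var =~ s/\s(.)$// ) { $out = $1 . $out; }
	$var = $out; 
	$var =~ s/^\s+//;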

	print "Clean up: '$var'\n\n";

}




c:\temp>perl foo2.pl
Hex: 0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E000000005C005C00730072007600700072006400610064006D0069006E0033005C0075007300650072005C006200610063006B007500700020002800310029002E007000730074000000
Hex2ASCII:     8í+¿¿s¿¿í +*V-  mspst.dll     NITA·++¿ ¬ 7+n    \ \ s r v p r d a d m i n 3 \ u s e r \ b a c k u p   ( 1 ) . p s t
Clean up: '\\srvprdadmin3\user\backup (1).pst'

Hex: 0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E0000000045003A005C006D00610069006C005C006200610063006B007500700020002800310029002E007000730074000000
Hex2ASCII:     8í+¿¿s¿¿í +*V-  mspst.dll     NITA·++¿ ¬ 7+n    E : \ m a i l \ b a c k u p   ( 1 ) . p s t
Clean up: 'E:\mail\backup (1).pst'


0
 
LVL 27

Assisted Solution

by:wilcoxon
wilcoxon earned 1000 total points
ID: 34985120
What I had suggested just about worked without modification.  The only trouble was that the space is 3 spaces when converted from the hex (in your original sample input it only appears to have 2 spaces).  To make this work, I had to modify the final regex.  Unfortunately, it now explicitly looks for something that looks like a file/path (drive letter colon or \\) - if you need to pull data that follows a different format, I can come up with a different regex (I went with this since both your samples are files).

I think this method will also be more efficient than sjklein42's as it uses fewer loops (but does use a few s///g regexes).  However, I did not benchmark the times to verify.

#!/usr/local/bin/perl

use strict;
use warnings;

my @hexList = (
"0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E000000005C005C00730072007600700072006400610064006D0069006E0033005C0075007300650072005C006200610063006B007500700020002800310029002E007000730074000000",
"0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E0000000045003A005C006D00610069006C005C006200610063006B007500700020002800310029002E007000730074000000"
);

foreach my $var (@hexList) {
        print "Hex: $var \n";

        $var =~ s/([a-fA-F0-9]{2})/mychr(hex $1)/eg;
        print "Hex2ASCII: $var\n";

        # first remove the extraneous spaces
        $var =~ s{\s+$}{};
        $var =~ s{\s(?=\S|\s\s)}{}g;
        # next grab the last item from the line that looks like a file/path
        $var =~ s{^.*?((?:\w{1,2}:|\\\\)\S(?:\S|\s(?=\S))+)$}{$1};

        print "Clean up: $var\n\n";
}

sub mychr {
    my ($chr) = @_;
    return ' ' if ($chr == 0);
    return chr($chr);
}


Hex: 0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E000000005C005C00730072007600700072006400610064006D0069006E0033005C0075007300650072005C006200610063006B007500700020002800310029002E007000730074000000
Hex2ASCII:     8¦¦¿¿¦¿¦ +*V¦  mspst.dll     NITA¦¦¦ ¦ 7¦n    \ \ s r v p r d a d m i n 3 \ u s e r \ b a c k u p   ( 1 ) . p s t
Clean up: \\srvprdadmin3\user\backup (1).pst

Hex: 0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E0000000045003A005C006D00610069006C005C006200610063006B007500700020002800310029002E007000730074000000
Hex2ASCII:     8¦¦¿¿¦¿¦ +*V¦  mspst.dll     NITA¦¦¦ ¦ 7¦n    E : \ m a i l \ b a c k u p   ( 1 ) . p s t
Clean up: E:\mail\backup (1).pst


0
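The efficiency question raised above could be checked with the core Benchmark module. This is a rough sketch, not a measurement: the two subs (names are made up) re-wrap sjklein42's loop and wilcoxon's regexes from this thread so that each takes the hex string and returns the path, with mychr inlined as a ternary. Timings will vary by machine and Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# The E:\ sample StoreID posted earlier in the thread.
my $hex = '0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E0000000045003A005C006D00610069006C005C006200610063006B007500700020002800310029002E007000730074000000';

# sjklein42's approach: NUL -> blank, then peel "blank + char" pairs off the end.
sub loop_cleanup {
    my ($var) = @_;
    my $ascii = '';
    while ( $var =~ s/([A-F0-9]{2})//i ) { $ascii .= ( $1 eq '00' ) ? ' ' : chr( hex $1 ); }
    $ascii =~ s/\s+$//;
    my $out = '';
    while ( $ascii =~ s/\s(.)$// ) { $out = $1 . $out; }
    $out =~ s/^\s+//;
    return $out;
}

# wilcoxon's approach: NUL -> blank, then three substitutions.
sub regex_cleanup {
    my ($var) = @_;
    $var =~ s/([a-fA-F0-9]{2})/$1 eq '00' ? ' ' : chr(hex $1)/eg;
    $var =~ s{\s+$}{};
    $var =~ s{\s(?=\S|\s\s)}{}g;
    $var =~ s{^.*?((?:\w{1,2}:|\\\\)\S(?:\S|\s(?=\S))+)$}{$1};
    return $var;
}

# Run each sub for at least one CPU second and print a comparison table.
cmpthese( -1, {
    loop  => sub { loop_cleanup($hex) },
    regex => sub { regex_cleanup($hex) },
} );
```

Both subs should return the same path for this sample, so the table only compares speed. A single short input says little about longer IDs.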
 

Author Closing Comment

by:Marketing_Insists
ID: 35063295
Thanks guys, both solutions work great, though sj squeezed in a few minutes earlier ;)
0
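A note for later readers: the blank after every character appears to come from the path being stored as UTF-16LE (each ASCII character followed by a 00 byte), which would also explain why a real space shows up as three blanks. An alternative to stripping blanks with regexes is to decode that tail directly. This is a different technique from either solution above, sketched under two assumptions: the path is the last NUL-terminated run in the ID, and it contains only characters below U+0100. The name path_from_store_id is made up.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: take the hex StoreID string and return the path it
# ends with, or nothing if the assumed layout is not found.
sub path_from_store_id {
    my ($hex) = @_;
    my $bytes = pack 'H*', $hex;    # hex digits -> raw bytes

    # Assumed layout: a run of "non-NUL byte, NUL byte" pairs (UTF-16LE
    # text limited to code points below U+0100), then a 00 00 terminator
    # at the very end of the ID.
    return unless $bytes =~ /((?:[^\x00]\x00)+)\x00\x00\z/;
    my $utf16 = $1;
    return decode( 'UTF-16LE', $utf16 );
}

# The E:\ sample StoreID posted earlier in the thread.
my $hex = '0000000038A1BB1005E5101AA1BB08002B2A56C200006D737073742E646C6C00000000004E495441F9BFB80100AA0037D96E0000000045003A005C006D00610069006C005C006200610063006B007500700020002800310029002E007000730074000000';
print path_from_store_id($hex), "\n";    # prints E:\mail\backup (1).pst
```

Because nothing is guessed from spacing, consecutive spaces in a path would survive. Paths with characters outside the assumed range would need a different way of locating the start of the text.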
