Solved

regex help

Posted on 2006-05-11
Last Modified: 2008-01-09
New to regexes, so I'd also like a brief explanation of the answer, but I'm looking for a regex that would match

    SCSI Bus 0
     SCSI Target 0 ..................(+) COMPAQ  BF03685A35
      Capacity ......................(+) 34,732 MBytes (71132000 Blocks)
      Serial Number .................(+) 3HX14GZ300007402A8CA
      Firmware Version ..............(+) HPB4

AND

    SCSI Bus 0
     SCSI Target 0 .................. COMPAQ  BF03685A35
      Capacity ...................... 34,732 MBytes (71132000 Blocks)
      Serial Number ................. 3HX14GZ300007402A8CA
      Firmware Version .............. HPB4

BUT NOT

    SCSI Bus 0
     SCSI Target 0 ..................(-) COMPAQ  BF03685A35
      Capacity ......................(-) 34,732 MBytes (71132000 Blocks)
      Serial Number .................(-) 3HX14GZ300007402A8CA
      Firmware Version ..............(-) HPB4
 
Obviously SCSI Bus 0 would be matched in all three - and that's fine, but I want to exclude lines that contain (-)

This pattern is matching all...

SCSI (Bus|Target) \d+\s*\.*^\(^\-^\)|Capacity (\.+?[^(][^-][^)]|\.+?[(][+][)]) [\d,]+? [kmg]Bytes \(\d+ Blocks\)
Question by:sirbounty
20 Comments
 
LVL 65

Expert Comment

by:rockiroads
ID: 16657643
Never was a fan of regex. What language are you using?
Some examples from O'Reilly here may help you: http://examples.oreilly.com/regex/
Sorry I couldn't be more help.
 
LVL 67

Author Comment

by:sirbounty
ID: 16657722
I'm using vbs/vb6.
The examples are nice - I've been going through a few tutorial sites, but I just don't 'get' it (yet, I hope)... :^)
 
LVL 65

Expert Comment

by:rockiroads
ID: 16657787
Some things to possibly help you along your way, well that is until some expert answers your questions!

There is something called RegexBuddy (http://www.regexbuddy.com), which looks quite good. Unfortunately there is no trial version, just the full version, so it's not worth it unless you have money to spare for tools like this :)


There is a free one, but it's .NET: http://www.sellsbrothers.com/tools/

Sorry I couldn't be more help.

One last thing: out of your tutorials, have you seen the one from MS? Most likely you have, but just in case: http://support.microsoft.com/default.aspx?scid=kb;en-us;818802
 
LVL 67

Author Comment

by:sirbounty
ID: 16657845
I've got .Net as well - perhaps I'll try that one.
I've been just 'trying' different patterns in vb6 (debugger) to try and find a good match - but no luck yet...
Thanx.
 
LVL 35

Assisted Solution

by:mvidas
mvidas earned 1600 total points
ID: 16658306
SB,

Try:

"SCSI Bus \d+|SCSI Target \d+\s*\.*\(?[^-]\)?|Capacity \.+\(?[^-]\)? [\d,]+? [kmg]Bytes \(\d+ Blocks\)"


Since the | is an "OR", I'll break each OR section down:


SCSI Bus \d+

This just gets the Bus lines.. since there is no (+) or (-) part, it will pull the bus for the (-) sections.  But filtering out the (-) later will remove the bad ones.


SCSI Target \d+\s*\.*\(?[^-]\)?

"SCSI Target " part is just a literal match.
"\d+" checks for a number string of at least length one
"\s*" matches space character zero or more times
"\.*" matches the "." character zero or more times
"\(?" matches a "(" symbol zero or one time
"[^-]" matches any character that is not a "-"
"\)?" matches a ")" symbol zero or one time
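The last three combined, "\(?[^-]\)?", are meant to accept (+) and reject (-). Keep in mind that "[^-]" has to consume one character and can settle for a "." or the "(" itself, so the (-) is only really rejected when something right after the group pins the position, as the space does in the Capacity part below; here, at the very end of the Target part, a (-) line can still match. The group would also accept (x), but your data will only ever have nothing, (-), or (+).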


"Capacity \.+\(?[^-]\)? [\d,]+ [kmg]Bytes \(\d+ Blocks\)"

"Capacity " is again just a literal match
"\.+" matches one or more "." characters
"\(?[^-]\)?" is explained above
" " - just a space (after the "." characters and possibly after the (+) if it exists)
"[\d,]+" means "match any digit or comma character one or more times" - to get the capacity
" " - just a space, after the capacity
"[kmg]Bytes" - matches KBytes, MBytes, GBytes after the capacity figure
" " just a space, after the xBytes
"\(" matches a "(" character
"\d+" matches the digits for the blocks amount
" Blocks" literal string match again
"\)" matches the ending ")" character after Blocks
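
For anyone who wants to check the breakdown above, here is a minimal sketch. It is written in Python rather than VBScript (an assumption made purely for illustration; the constructs used here behave the same way in both), and the names PATTERN, SAMPLES and matches are made up for the example. Case-insensitive matching is assumed, since the data says "MBytes" while the class is "[kmg]". It suggests that the Capacity part does reject (-), while the Target part can still match a (-) line by stopping at the "(".

```python
import re

# Pattern from the comment above. IGNORECASE is an assumption; the VBScript
# counterpart would be setting RegEx.IgnoreCase = True.
PATTERN = re.compile(
    r"SCSI Bus \d+|SCSI Target \d+\s*\.*\(?[^-]\)?"
    r"|Capacity \.+\(?[^-]\)? [\d,]+? [kmg]Bytes \(\d+ Blocks\)",
    re.IGNORECASE,
)

# Sample lines copied from the question.
SAMPLES = {
    "target_plus": "     SCSI Target 0 ..................(+) COMPAQ  BF03685A35",
    "target_minus": "     SCSI Target 0 ..................(-) COMPAQ  BF03685A35",
    "capacity_plus": "      Capacity ......................(+) 34,732 MBytes (71132000 Blocks)",
    "capacity_plain": "      Capacity ...................... 34,732 MBytes (71132000 Blocks)",
    "capacity_minus": "      Capacity ......................(-) 34,732 MBytes (71132000 Blocks)",
}


def matches(line):
    """Return the matched text, or None when the line is rejected."""
    found = PATTERN.search(line)
    return found.group(0) if found else None


for name, line in SAMPLES.items():
    print(name, "->", matches(line))
```

If the Target line has to be rejected as well, something needs to follow the optional group there too, for example the space before the model name.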


Matt
 
LVL 35

Expert Comment

by:mvidas
ID: 16658333
Not that it will make a difference, but before the xBytes, I have:

[\d,]+?

When I only explained it as

[\d,]+

The ? isn't necessary here.  +? is a special case (a "lazy" match) that this pattern doesn't need, but I'd be happy to explain why you'd use +? or *? instead of just + or *.
If you used something like:
".*"
That would return anything and everything up to a line feed character.  However, if you only wanted everything up to and including the first "sirbounty", you'd use:
".*?sirbounty"

The ? makes the wildcard match as little as possible, so it stops at the first place where whatever comes after it can match.  Without it, ".*sirbounty" would run on to the last "sirbounty" on the line.
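
A small sketch of the difference, in Python rather than VBScript (an assumption for illustration; greedy and lazy quantifiers work the same way in both). The sample text and the variable names are invented for the example.

```python
import re

# Invented sample text containing "sirbounty" twice.
text = "thanks sirbounty, and thanks again sirbounty!"

# Greedy: .* grabs as much as it can, then backs off to the LAST "sirbounty".
greedy = re.search(r".*sirbounty", text).group(0)

# Lazy: .*? grabs as little as it can, so it stops at the FIRST "sirbounty".
lazy = re.search(r".*?sirbounty", text).group(0)

print(greedy)
print(lazy)
```

With only one "sirbounty" on the line, both forms return the same text, which is why the extra ? made no difference in the capacity pattern.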
 
LVL 5

Assisted Solution

by:Dragon_Krome
Dragon_Krome earned 200 total points
ID: 16659061
The following regex matches the first two examples but not the third (tested with perl compatible regular expressions):

SCSI (Bus|Target) \d+\s+SCSI Target \d+ \.+(\(\+\))? ([\w\s]+) Capacity \.+(\(\+\))? (([\d,]+) ([KMG]bytes)) \((\d+) Blocks\)\s+Serial Number \.+(\(\+\))? \w+\s+Firmware Version \.+(\(\+\))? \w+


 sirbounty, is there any information in particular you want to extract from there, or do you just want to match the normal and (+) samples, while not matching the (-) one? Can you give a few more details on your problem?
 
LVL 41

Assisted Solution

by:HonorGod
HonorGod earned 200 total points
ID: 16662534
Explanation of regexp:
--------------------------
SCSI (Bus|Target) \d+\s*\.*^\(^\-^\)|Capacity (\.+?[^(][^-][^)]|\.+?[(][+][)]) [\d,]+? [kmg]Bytes \(\d+ Blocks\)
-------------------------
The "SCSI " will match exactly those characters (including the space).  Note, they don't have to start at the beginning of the line, because the "beginning of line" meta-character is not present (which is fine).

The "(Bus|Target) " will match either of these words when they immediately follow the "SCSI ".  So, the string needs to be either "SCSI Bus " or "SCSI Target ".

The \d+ will match 1 or more digits.
The \s* will match 0 or more "white space" characters (e.g., blank, tab)
The \.* will match 0 or more periods
The ^ will try to match a beginning of line; outside of square brackets it does not mean "not".
The | has the lowest precedence, so it splits the whole pattern in two: everything before it, ending in "^\(^\-^\)", or everything from "Capacity " onward.  The "^\(^\-^\)" asks for a beginning of line, an open parenthesis, another beginning of line, a dash, and so on, which cannot match once the "SCSI ..." text has already been consumed, so the first half never matches.
The (\.+?[^(][^-][^)]|\.+?[(][+][)]) looks for leading periods (one or more - non-greedy), followed by a non-open parenthesis, followed by a non-dash, followed by a non-close parenthesis "or" leading periods, followed by an open parenthesis, followed by a plus, followed by a close parenthesis.
The " [\d,]+? " looks for a blank, followed by one or more digits or commas, followed by a blank
The "[kmg]Bytes " looks for a "k", or an "m", or a "g" followed by "Bytes "
The "\(\d+ Blocks\)" looks for one or more digits followed by " Blocks" within parentheses
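
To see what the question's pattern does in practice, here is a minimal sketch that runs it against the sample lines. It uses Python instead of VBScript (an assumption for illustration; the constructs involved behave the same way in both), assumes case-insensitive matching, and the names ORIGINAL, LINES and results are invented for the example.

```python
import re

# The pattern from the question, unchanged. IGNORECASE is an assumption,
# needed because the data says "MBytes" while the class is [kmg].
ORIGINAL = re.compile(
    r"SCSI (Bus|Target) \d+\s*\.*^\(^\-^\)"
    r"|Capacity (\.+?[^(][^-][^)]|\.+?[(][+][)]) [\d,]+? [kmg]Bytes \(\d+ Blocks\)",
    re.IGNORECASE,
)

# Sample lines copied from the question.
LINES = {
    "bus": "    SCSI Bus 0",
    "target_plus": "     SCSI Target 0 ..................(+) COMPAQ  BF03685A35",
    "capacity_plus": "      Capacity ......................(+) 34,732 MBytes (71132000 Blocks)",
    "capacity_plain": "      Capacity ...................... 34,732 MBytes (71132000 Blocks)",
    "capacity_minus": "      Capacity ......................(-) 34,732 MBytes (71132000 Blocks)",
}

# True where the pattern finds a match somewhere in the line.
results = {name: ORIGINAL.search(line) is not None for name, line in LINES.items()}

for name, matched in results.items():
    print(name, "->", matched)
```

Run this way, one line at a time, the SCSI half never matches and the Capacity half accepts the (+) and plain lines but not the (-) one. That is not quite what the question reports, so how the pattern is applied in the VB code may matter as well.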

 
LVL 67

Author Comment

by:sirbounty
ID: 16669809
Thanx for the explanation everyone.
Out of close to 200 text files, this is now working (current method) on all but 3 (was all but 10).
So, I'll be going through these suggestions probably on Monday.
Extra thanx for the explanations.  I've read through some of them and I 'think' I'm starting to get it.  :^)
 
LVL 67

Author Comment

by:sirbounty
ID: 16687146
Matt - your first comment seems to work on a few test servers.
"Live" run tomorrow night - so I'll let you know...
 
LVL 67

Author Comment

by:sirbounty
ID: 16724561
Sorry for the slight delay - had some other major bugs to work out.

We do want to pull back the serial number info as well, so I tried the regex (posted by Dragon_Krome):
SCSI (Bus|Target) \d+\s+SCSI Target \d+ \.+(\(\+\))? ([\w\s]+) Capacity \.+(\(\+\))? (([\d,]+) ([KMG]bytes)) \((\d+) Blocks\)\s+Serial Number \.+(\(\+\))? \w+\s+Firmware Version \.+(\(\+\))? \w+

But it doesn't appear to be working for me - it skips right over the test. The value at that point is:
    SCSI Bus 0

I'm not sure how to modify the other two to include serial number info...

 
LVL 35

Accepted Solution

by:
mvidas earned 1600 total points
ID: 16733624
Hi SB,

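DK's suggestion looks alright, though it needs all 5 lines in a row (bus, target, capacity, serial number, firmware version) in the string being tested, so it won't match if you test one line at a time, which is why it skips over "SCSI Bus 0". I'm sure he can modify it accordingly though.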
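
A minimal sketch of that point, in Python rather than VBScript (an assumption for illustration). BLOCK is Dragon_Krome's pattern as posted; case-insensitive matching is assumed because the pattern says "[KMG]bytes" while the data says "MBytes". The helper make_block and the variable names are invented for the example.

```python
import re

# Dragon_Krome's pattern, unchanged. IGNORECASE is an assumption (see above).
BLOCK = re.compile(
    r"SCSI (Bus|Target) \d+\s+SCSI Target \d+ \.+(\(\+\))? ([\w\s]+) "
    r"Capacity \.+(\(\+\))? (([\d,]+) ([KMG]bytes)) \((\d+) Blocks\)\s+"
    r"Serial Number \.+(\(\+\))? \w+\s+Firmware Version \.+(\(\+\))? \w+",
    re.IGNORECASE,
)


def make_block(marker):
    """Build the five-line sample from the question with the given marker."""
    return "\n".join([
        "    SCSI Bus 0",
        f"     SCSI Target 0 ..................{marker} COMPAQ  BF03685A35",
        f"      Capacity ......................{marker} 34,732 MBytes (71132000 Blocks)",
        f"      Serial Number .................{marker} 3HX14GZ300007402A8CA",
        f"      Firmware Version ..............{marker} HPB4",
    ])


one_line = BLOCK.search("    SCSI Bus 0")        # a single line, as in the loop
whole_plus = BLOCK.search(make_block("(+)"))     # all five lines in one string
whole_plain = BLOCK.search(make_block(""))
whole_minus = BLOCK.search(make_block("(-)"))

print(one_line is not None, whole_plus is not None,
      whole_plain is not None, whole_minus is not None)
```

So the pattern only has a chance when the five lines are handed to it together, for example by testing the whole file contents instead of one array element at a time.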

If you wanted to still use mine, but include the serial number line:

"SCSI Bus \d+|SCSI Target \d+\s*\.*\(?[^-]?\)?|Capacity \.+\(?[^-]?\)? [\d,]+ [kmg]Bytes \(\d+ Blocks\)|Serial Number \.+\(?[^-]?\)? [a-z0-9]+"

I just added another OR ("|") then the trap for serial number.  For the individual trap, to extract the serial number (if you're using my method from another question), it would be another Else case

'code up to this point
   Else
    RegEx.Pattern = "Capacity \S+\(?[^-]\)?\s(.+?)\s"
    If RegEx.Test(FileCont(i)) Then
     tempArr(2) = RegEx.Execute(FileCont(i)).Item(0).SubMatches(0)
    Else
     RegEx.Pattern = "Serial Number \.+\(?[^-]?\)? ([a-z0-9]+)"
     If RegEx.Test(FileCont(i)) Then
      tempArr(3) = RegEx.Execute(FileCont(i)).Item(0).SubMatches(0)
      ReDim Preserve DriveArray(iCnt)
      DriveArray(iCnt) = tempArr
      iCnt = iCnt + 1
     Else
'etc

You'll also have to change
 ReDim tempArr(2)

To:
 ReDim tempArr(3)

Matt
 
LVL 67

Author Comment

by:sirbounty
ID: 16733798
This is what I have from before, with the above added...

 If RegEx.Test(FileCont(i)) Then
  tempArr(0) = RegEx.Execute(FileCont(i)).Item(0).SubMatches(0)
 Else
  RegEx.Pattern = "SCSI Target (\d+)"
  If RegEx.Test(FileCont(i)) Then
   tempArr(1) = RegEx.Execute(FileCont(i)).Item(0).SubMatches(0)
  Else
   RegEx.Pattern = "Capacity \S+\s(.+?)\s"
   If RegEx.Test(FileCont(i)) Then
    tempArr(2) = RegEx.Execute(FileCont(i)).Item(0).SubMatches(0)
    ReDim Preserve DriveArray(iCnt)
    DriveArray(iCnt) = tempArr
    iCnt = iCnt + 1
   Else
    RegEx.Pattern = "Serial Number \.+\(?[^-]?\)? ([a-z0-9]+)"
    If RegEx.Test(FileCont(i)) Then
     tempArr(3) = RegEx.Execute(FileCont(i)).Item(0).SubMatches(0)
     ReDim Preserve DriveArray(iCnt)
     DriveArray(iCnt) = tempArr
     iCnt = iCnt + 1
    End If
   End If
  End If
 End If
Next

But should
tempArr(0) = RegEx.Execute(FileCont(i)).Item(0).SubMatches(0)
Be
tempArr(0) = RegEx.Execute(FileCont(i)).Item(0)

Cause the value of the first is "0", while the latter is "SCSI Bus 0" (I'm stepping through it using VB)
 
LVL 67

Author Comment

by:sirbounty
ID: 16733854
Scratch that - I was missing the logic...
 
LVL 35

Expert Comment

by:mvidas
ID: 16733863
That depends, I guess.  I thought you only wanted the "0" returned from it.  If you want the full line, then yes remove the .submatches(0) portion (that goes for each part of it).

You'll want to remove the:

     ReDim Preserve DriveArray(iCnt)
     DriveArray(iCnt) = tempArr
     iCnt = iCnt + 1

From the capacity match block, otherwise you'll get double the data returned, with half having serial numbers and the other half not having them.  Also, don't forget to change the "ReDim tempArr(2)" line to 3, for the serial number.
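
The "commit once per drive" idea can be sketched like this, in Python rather than VBScript (an assumption for illustration). TARGET, CAPACITY and SERIAL are the extraction patterns from this thread; BUS is a guess, since the bus pattern is not shown here. The function name collect_drives and the sample list, assembled from snippets in this thread, are invented for the example.

```python
import re

# TARGET, CAPACITY and SERIAL come from the thread; BUS is an assumption.
BUS = re.compile(r"SCSI Bus (\d+)")
TARGET = re.compile(r"SCSI Target (\d+)")
CAPACITY = re.compile(r"Capacity \S+\s(.+?)\s")
SERIAL = re.compile(r"Serial Number \.+\(?[^-]?\)? ([a-z0-9]+)", re.IGNORECASE)


def collect_drives(lines):
    """Return one "bus|target|capacity|serial" string per drive."""
    drives = []
    record = ["", "", "", ""]  # plays the role of tempArr(0..3)
    for line in lines:
        for slot, pattern in enumerate((BUS, TARGET, CAPACITY, SERIAL)):
            found = pattern.search(line)
            if not found:
                continue
            record[slot] = found.group(1)
            if slot == 3:
                # Commit only here, once the serial number has arrived.
                drives.append("|".join(record))
            break
    return drives


# Sample assembled from lines quoted in this thread (not a real dump).
sample = [
    "    SCSI Bus 0",
    "     SCSI Target 0 ..................(+) COMPAQ  BF03685A35",
    "      Capacity ......................(+) 34,732 MBytes (71132000 Blocks)",
    "      Serial Number .................(+) 3HX14GZ300007402A8CA",
    "      Firmware Version ..............(+) HPB4",
    "     SCSI Target 1 .................. COMPAQ  BF03685A35",
    "      Capacity ...................... 34,732 MBytes (71132000 Blocks)",
    "      Serial Number ................. A0A1P38035R20332",
    "      Firmware Version .............. HPB4",
]

print(collect_drives(sample))
```

Committing in the capacity branch as well would add a second, serial-less row for every drive, which is the doubling described above.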
 
LVL 67

Author Comment

by:sirbounty
ID: 16733913
Nope - it's working fine...except.
This is what I'm getting from the test file I'm using:

0|0|34,732|3HX1TTPK00007424CX52
0|1|34,732|A0A1P38035R20332
1|0|34,732|3HX0M4N400007342W6UF
1|1|34,732|3HX0LC3J00007342BHV7
1|2|34,732|3HX0NLH00000734206NK|
1|3|34,732|3HX0G1AP00007342X1YC|
1|4|34,732|3HX0M2ZR00007342BJCK|
1|5|34,732|3HX0N1MK00007342FV42|
1|5|34,732|9J27FLW1XCZ3|
1|5|34,732|P432A0EBFM5HXR

Problem is - obviously there can't be 3 disks on target 5 (bus 1).
The latter 2 serial numbers come from the s/n of the subsystem and another installed component.

I'm not sure of a way to 'stop' searching after all are retrieved, cause "Serial Number" obviously occurs in more than just those blocks.  This is the data including the last drive and the data blocks following it...perhaps you can spot something - I just hope it's static enough to cover all instances - not every server will have an external storage cabinet, but I 'think' all internals are referred to as Compaq ProLiant Storage Box 0 - so maybe that would be the line to stop scanning at.

    SCSI Target 5 ..................... COMPAQ  BF03685A35
      Capacity ......................... 34,732 MBytes (71132000 Blocks)
      Serial Number .................... 3HX0N1MK00007342FV42
      Firmware Version ................. HPB3
   Compaq ProLiant Storage Box 0 ....... COMPAQ   PROLIANT 4L6I
    Firmware Version ................... 1.84
    Storage Slot 0 ..................... COMPAQ  BF03685A35 (wide)
    Storage Slot 1 ..................... COMPAQ  BF036863B5 (wide)
    Storage Slot 2 ..................... [empty]
    Storage Slot 3 ..................... [empty]
    Storage Slot 4 ..................... [empty]
    Storage Slot 5 ..................... [empty]
    Operating Temperature .............. normal
   Compaq ProLiant Storage Box 1 ....... COMPAQ   PROLIANT 4LEE
    Firmware Version ................... JB4F
    Serial Number ...................... 9J27FLW1XCZ3
    Storage Slot 0 ..................... COMPAQ  BF03685A35 (wide)
    Storage Slot 1 ..................... COMPAQ  BF03685A35 (wide)
    Storage Slot 2 ..................... COMPAQ  BF03685A35 (wide)
    Storage Slot 3 ..................... COMPAQ  BF03685A35 (wide)
    Storage Slot 4 ..................... COMPAQ  BF03685A35 (wide)
    Storage Slot 5 ..................... COMPAQ  BF03685A35 (wide)
    Cooling Fan(s) ..................... operational
    Operating Temperature .............. normal
    Redundant Power Supply ............. operational

I'll also be opening a new question to grab the firmware - I neglected to spot that data was in there.  I don't want it answered here though, cause it appears the scope of this question is just about at its conclusion.
Thanx!
 
LVL 35

Assisted Solution

by:mvidas
mvidas earned 1600 total points
ID: 16733968
If the file is always set up like that (with the same number of spaces before 'Serial Number'), you could add them before it to make sure it grabs the right ones (since the storage box lines look to have fewer spaces):

"SCSI Bus \d+|SCSI Target \d+\s*\.*\(?[^-]?\)?|Capacity \.+\(?[^-]?\)? [\d,]+ [kmg]Bytes \(\d+ Blocks\)|      Serial Number \.+\(?[^-]?\)? [a-z0-9]+"

You could also follow the same logic for the other fields to be safe as well.
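
A quick sketch of the indentation trick, in Python rather than VBScript (an assumption for illustration). It assumes the drive lines are always indented by six spaces and the storage box lines by four, as in the dump posted above; DRIVE_SERIAL, drive_line and box_line are names invented for the example.

```python
import re

# Six literal leading spaces in front of the serial number pattern from this
# thread. The spaces are built with " " * 6 only to make the count visible.
DRIVE_SERIAL = re.compile(
    " " * 6 + r"Serial Number \.+\(?[^-]?\)? ([a-z0-9]+)",
    re.IGNORECASE,
)

# A drive line (six leading spaces) and a storage box line (four leading
# spaces), using the serial numbers from the dump above.
drive_line = " " * 6 + "Serial Number .................... 3HX0N1MK00007342FV42"
box_line = " " * 4 + "Serial Number ...................... 9J27FLW1XCZ3"

drive_hit = DRIVE_SERIAL.search(drive_line)
box_hit = DRIVE_SERIAL.search(box_line)

print(drive_hit.group(1) if drive_hit else None)
print(box_hit.group(1) if box_hit else None)
```

This relies entirely on the indentation staying consistent across servers, so a dump that indents its non-drive serial numbers by six or more spaces would still slip through.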
Matt
 
LVL 67

Author Comment

by:sirbounty
ID: 16734012
Looks to be it.  It's working on this file.  Let me try it on a few more and I'll post the results.
Thanx!
 
LVL 67

Author Comment

by:sirbounty
ID: 16734146
So far so good - works on my test server...190~something servers left to test... :)
I think I'll roll this new function out and see where we stand...hopefully they will all work as smoothly.
If not, I'll post the troublemakers here, if there aren't too many.
Many thanx again.
 
LVL 67

Author Comment

by:sirbounty
ID: 16744447
Of all the ones reporting (I 'think' that's all of them) - we have 100%!
Thanx again Matt - and thanx to everyone else that helped explain this to me.

I 'may' come back for the firmware later - if I do, I'll post an invite here.