• Status: Solved
  • Priority: Medium
  • Security: Public

Need regex help!!!

I need to extract items 3, 5, and 6 from the markup below:

<ul>
    <p id="XXX" XXX="test" class="something title test XXX" id="XXX" XXX="test">tester 2 XXX tester</p>
        <XXX>
           tester 3 XXX Care is a type of care that allows people facing
        </p>
        <XXX>
            testXXXer 4 Care is a type of care that allows people facing
        </p>
        <XXX>
            tester 5 XXX Care is a type of 6 XXX care that allows people facing
        </p>

Currently the regex I have (see below) matches 2, 3, 5, and 6. How do I get it to stop matching 2?
(?<!<[^>]*(class="[^"]*title[^"]*")[^>]*)
(?<=(>[^<>]*))
(\sXXX\s)


Asked by: abemiester
1 Solution
 
ozo commented:
([^<>2]*[356]([^<>2]*)
 
ozo commented:
([^<>2]*[356][^<>2]*)
 
abemiester (author) commented:
I apologize for the confusion. I'm trying to match "XXX"; the numbers I refer to are the ones in front of each "XXX".
 
abemiester (author) commented:
And to clarify, those are the ONLY instances of "XXX" I want to match. If you look at the source, you can see there are several unnumbered instances of XXX as well that must not be matched.
 
ozo commented:
(?:\b[356]\b\s*)(\w+)
 
abemiester (author) commented:
Close. Let me explain what I am trying to achieve.

I want to match any "XXX" that is not inside a tag that uses the class "title". In addition, the XXX cannot be inside a tag itself (for example, as an attribute name or value).

Example (should not match):
<p XXX="id" id="XXX">test</p>

Example (should not match):
<p class="test XXX title">test</p>

Example (should not match):
<p class="test title">XXX</p>

Example (should match):
<p class="test">XXX</p>

Does this help clarify what I'm trying to do?
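For readers who are not tied to a single regex, the four examples above can also be checked with an HTML parser instead. The following is a minimal Python sketch, not the approach used in this thread: it assumes the standard-library html.parser module, and the names TargetFinder and count_matches are made up for illustration. It treats "title" as a whole class name and only looks at text nodes, so attribute names and values are never candidates.

```python
import re
from html.parser import HTMLParser


class TargetFinder(HTMLParser):
    """Collect a word found in text nodes that are not inside a class="title" element."""

    def __init__(self, word="XXX"):
        super().__init__()
        self.word_re = re.compile(r"\b%s\b" % re.escape(word))
        self.stack = []  # (tag name, has class "title") for each open element
        self.hits = []   # every occurrence of the word that qualifies

    def handle_starttag(self, tag, attrs):
        # A class attribute may be absent or valueless, hence the "or ''".
        classes = (dict(attrs).get("class") or "").split()
        self.stack.append((tag, "title" in classes))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray closing tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Text counts only when no enclosing element carries class "title".
        if not any(is_title for _, is_title in self.stack):
            self.hits.extend(self.word_re.findall(data))


def count_matches(html, word="XXX"):
    finder = TargetFinder(word)
    finder.feed(html)
    finder.close()
    return len(finder.hits)
```

With this sketch, only the last of the four examples should report a match. On malformed markup (unclosed or mismatched tags) the element stack is a guess, so results may differ from what a regex gives.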
 
orbitus commented:
How about...

(\d+)\sXXX(?!.+?</p>)
 
abemiester (author) commented:
The digit is really irrelevant. The only reason I put it in the text was so I could refer to it in this post.

I want the XXX that is NEXT to 3, 5, and 6. I don't care about the numbers 3, 5, or 6. I just included them so I could explain which XXX I wanted to match. I think that the original regex I provided is a good place to start.

Also, it can be any HTML tag, not just <p>. I'm trying to match XXX that is not in ANY tag with the class "title" and that is not an attribute (name or value) of ANY tag.
 
ozo commented:
(?:<(\w+)\b[^>]*class="[^"]*title[^"]*"[^>]*>[\s\S]*?<\/\1|[\S\s])*?(XXX)(?![^<>]*>)
 
abemiester (author) commented:
That's almost it, ozo! The only problem is that it matches the XXX with a 4 in front of it. If we can stop matching that one, we're set.
 
ozo commented:
(?:<(\w+)\b[^>]*class="[^"]*title[^"]*"[^>]*>[\s\S]*?<\/\1|[\S\s])*?\b(XXX)\b(?![^<>]*>)
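To see what the \b word boundaries change, here is a small Python sketch that runs both of ozo's last two patterns against the sample markup from the question. Python is an assumption here (the thread never names a regex engine), and the names SAMPLE, SKIP_TITLE, NOT_IN_TAG, loose, strict and preceding are made up for illustration. Note that the lookbehind-based pattern from the question would not compile in Python's re module, which requires fixed-width lookbehind.

```python
import re

# The sample markup from the question, copied as posted (including its odd tags).
SAMPLE = '''<ul>
    <p id="XXX" XXX="test" class="something title test XXX" id="XXX" XXX="test">tester 2 XXX tester</p>
        <XXX>
           tester 3 XXX Care is a type of care that allows people facing
        </p>
        <XXX>
            testXXXer 4 Care is a type of care that allows people facing
        </p>
        <XXX>
            tester 5 XXX Care is a type of 6 XXX care that allows people facing
        </p>
'''

# Lazily consume either a whole class="...title..." element or one character.
SKIP_TITLE = r'(?:<(\w+)\b[^>]*class="[^"]*title[^"]*"[^>]*>[\s\S]*?<\/\1|[\S\s])*?'
# Reject an XXX that sits between "<" and ">" (a tag name or attribute).
NOT_IN_TAG = r'(?![^<>]*>)'

loose = re.compile(SKIP_TITLE + r'(XXX)' + NOT_IN_TAG)       # first version
strict = re.compile(SKIP_TITLE + r'\b(XXX)\b' + NOT_IN_TAG)  # with word boundaries


def preceding(pattern, text):
    """Return the two characters just before each captured XXX (group 2)."""
    return [text[m.start(2) - 2:m.start(2)] for m in pattern.finditer(text)]


print(preceding(loose, SAMPLE))
print(preceding(strict, SAMPLE))
```

On this sample the first pattern should also pick up the XXX embedded in "testXXXer", while the \b version should leave only the three occurrences preceded by 3, 5 and 6. One caveat, as far as I can tell: the title element is skipped only because a valid XXX follows it. Run against `<p class="test title">XXX</p>` on its own, the pattern backtracks into the element and still matches, so it may need further work if a title element can come last.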