David Glover
asked on
How can I alter this regex pattern to match whole words rather than parts of words?
In this example, I am searching for the word MAN in text that sits among HTML markup:
(MAN)(?=[^>]*<)
The word itself is matched correctly, but the pattern also matches MAN inside words such as MANAGEMENT or GERMANY. I can't simply require a space on each side, because punctuation and other non-alphanumeric characters next to the word should also be acceptable.
How can I change the pattern so that an alphanumeric character immediately before or after MAN prevents a match?
Thanks!
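To make the problem concrete, here is a small Python sketch (Python's re module is an assumption; the thread never says which regex engine is in use). It runs the question's pattern and one common whole-word variant over hypothetical sample HTML. The variant uses a negative lookbehind and a negative lookahead so that a letter or digit directly next to MAN blocks the match; it is not necessarily what the accepted answer did. The helper names containing_word and hits are made up for illustration.

```python
import re

# Pattern from the question: "MAN" followed by a '<' with no '>' in between,
# i.e. it sits in text content rather than inside a tag.
ORIGINAL = r"(MAN)(?=[^>]*<)"

# Whole-word sketch (one common technique, not the accepted answer):
# the lookbehind/lookahead reject a letter or digit directly before or
# after "MAN" without consuming the neighbouring characters.
WHOLE_WORD = r"(?<![A-Za-z0-9])(MAN)(?![A-Za-z0-9])(?=[^>]*<)"

# Hypothetical sample markup; the "MAN" in the attribute is inside a tag.
HTML = '<p title="MAN">The MAN from MANAGEMENT in GERMANY, MAN.</p>'


def containing_word(text, start, end):
    """Widen a match over adjacent letters/digits to show which token it hit."""
    while start > 0 and text[start - 1].isalnum():
        start -= 1
    while end < len(text) and text[end].isalnum():
        end += 1
    return text[start:end]


def hits(pattern, text):
    """Return the full token containing each match of pattern in text."""
    return [containing_word(text, m.start(), m.end()) for m in re.finditer(pattern, text)]


print(hits(ORIGINAL, HTML))    # ['MAN', 'MANAGEMENT', 'GERMANY', 'MAN']
print(hits(WHOLE_WORD, HTML))  # ['MAN', 'MAN']
```

A shorter \bMAN\b would behave almost the same, except that \b also treats the underscore (and, for Python 3 str patterns, other Unicode word characters) as part of a word. Not every regex engine supports lookbehind, so this is worth checking for whatever tool the pattern ends up in.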
Try this pattern: ([^a-z\b0-9]?)(MAN)([^a-z\b0-9]?)
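Under the same assumption (Python's re module), this sketch shows why the suggestion above cannot exclude neighbouring letters: both character classes are optional, they exclude only lowercase a-z, and inside a character class \b denotes a backspace character rather than a word boundary.

```python
import re

# The first suggestion, with the stray space inside the last class removed.
SUGGESTED = r"([^a-z\b0-9]?)(MAN)([^a-z\b0-9]?)"

for word in ["GERMANY", "MANAGEMENT"]:
    m = re.search(SUGGESTED, word)
    # Each class is optional ('?'), so it can match nothing next to a letter;
    # it also only excludes lowercase a-z, so uppercase neighbours like 'R'
    # and 'Y' are even consumed. Inside [...], \b is a backspace character.
    print(word, "->", m.groups() if m else None)
# GERMANY -> ('R', 'MAN', 'Y')
# MANAGEMENT -> ('', 'MAN', 'A')
```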
ASKER
I tried it with man, but sadly it did not work:
([^a-z\b0-9]?)(man)([^a-z\b0-9]?)
This still matched man in 'germany' and other words.
My code wraps your expression in ( ) and appends (?=[^>]*<) to skip text inside HTML tags, so it becomes:
(([^a-z\b0-9]?)(man)([^a-z\b0-9]?))(?=[^>]*<)
Would this have broken it?
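A quick Python check (re module assumed, sample string hypothetical) suggests the wrapping is not what broke it. The inner expression already matches man inside 'germany' on its own, and the wrapped form finds the same span, because each optional class simply matches nothing when a letter is adjacent.

```python
import re

INNER = r"([^a-z\b0-9]?)(man)([^a-z\b0-9]?)"
# The asker's combined form: wrap in ( ) and append the HTML lookahead.
WRAPPED = r"(" + INNER + r")(?=[^>]*<)"

html = "<p>germany</p>"
inner_match = re.search(INNER, html)
wrapped_match = re.search(WRAPPED, html)

# Both find "man" inside "germany" at the same position.
print(inner_match.span(), wrapped_match.span())  # (6, 9) (6, 9)
```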
Sorry, I forgot about case sensitivity when I pasted it here. This is the best pattern I can think of (and I have already checked it): (([^a-zA-Z]+)|(\b))(MAN)(([^>a-zA-Z])|(\b))
Hope it helps.
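A Python sketch of this pattern (re module assumed; the spaces that appeared around the last character class are treated as paste debris and removed). It rejects MAN inside GERMANY and MANAGEMENT, but note two side effects: the match includes the neighbouring non-letter characters, and digits are still accepted on either side because the classes exclude only letters.

```python
import re

# Each side needs either one or more non-letters or a word boundary (\b).
EXPERT = r"(([^a-zA-Z]+)|(\b))(MAN)(([^>a-zA-Z])|(\b))"

for text in ["GERMANY", "MANAGEMENT", "THE MAN.", "MAN2", "2MAN"]:
    m = re.search(EXPERT, text)
    print(repr(text), "->", repr(m.group(0)) if m else None)
# 'GERMANY' -> None
# 'MANAGEMENT' -> None
# 'THE MAN.' -> ' MAN.'
# 'MAN2' -> 'MAN2'
# '2MAN' -> '2MAN'
```

If the match is used for replacement, the consumed neighbours (group 1 and group 5) would need to be written back.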
ASKER CERTIFIED SOLUTION
ASKER
Thanks, Ozo, that worked fine!