
Convert regex to regexp

campinam
In .NET the regex (?<=>[^<]*?)\bă works as expected (it finds words that start with ă and are not located within a tag).

But in JavaScript (Chrome), \b won't work with Unicode text.

What would be the correct equivalent in JavaScript for the above .Net expression?

(I am referring to the latest versions of regex/regexp)

Commented:
I'm surprised it works in .NET.
I think this should work:
 (?<=>[^<]*?\W)ă



\W is a non-word (non-letter, non-number) character.

Author

Commented:
\W is a non-word, but it is an ASCII construct in JavaScript. It matches non-ASCII letters as well, even with the u flag. So it won't do.
If JavaScript is not Unicode-aware in its regexes, I can only think of one option: fake your own \W:
 (?<=>[^<]*?[^_all_chars_])ă


Where you replace _all_chars_ with literally all characters (or ranges of characters) that cover what you want counted as a word-character.
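
As a rough sketch of that idea: in engines that support ES2018 lookbehind and Unicode property escapes (with the u flag), the hand-built character class could be written as [\p{L}\p{N}_]. This only approximates .NET's \w, and the function name findWordStarts is hypothetical. The sketch uses a separate negative lookbehind instead of folding the class into the first lookbehind, on the assumption that an ă directly after > should still match.

```javascript
// Sketch only: approximates the .NET pattern (?<=>[^<]*?)\bă in JavaScript.
// Assumes "word character" means Unicode letter, Unicode number, or underscore.
const pattern = /(?<=>[^<]*?)(?<![\p{L}\p{N}_])ă/gu;

// Hypothetical helper: returns the index of each ă that starts a word
// and is preceded by a ">" with no "<" in between (i.e. not inside a tag).
function findWordStarts(html) {
  return [...html.matchAll(pattern)].map((m) => m.index);
}

console.log(findWordStarts("<p>ăsta mără ără</p>"));
console.log(findWordStarts('<a title="ă">x</a>'));
```

The first lookbehind keeps the original "not inside a tag" check; the second rejects any ă that follows a letter, digit, or underscore, which is the part \b cannot do for non-ASCII text in JavaScript.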