regex need explanation

bhomass asked
I am having trouble understanding this piece of regex. Can an expert help, please?

The posting says

            String pattern = "(?i)(<title.*?>)(.+?)(</title>)";
            String updated = EXAMPLE_TEST.replaceAll(pattern, "$2");

will extract the content of the <title> element, and it does work that way.
Please explain what the (?i) is. Isn't it supposed to be group 1? That would make the content group 3, yet the result picks out the content directly with $2. By content I mean the text between the opening and closing title tags.
Next, what difference does the ? at the end of (.+?) make? Without that trailing ?, how would the result be different?


Commented:
The second question: this ? makes the quantifier non-greedy (reluctant), so it captures up to the first closing title tag rather than the furthest one.
Commented:
The first piece, (?i), is an inline flag: an i immediately after the opening parenthesis and question mark means that the whole regex should ignore case. It does not match any text, so it does not form a group. That is why TITLE in upper case also matches.

Author

Commented:
So, without the ? in (.+?), the matcher will end up capturing up to the furthest closing title tag?

Commented:

>so, without the ? in (.+?), the matcher will end up capturing the furthest closing of title?

Yes, without that "?" it will be greedy. The dot matches any character (except line terminators, by default), so the match will extend to the farthest </title>.
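
To see the difference concretely, here is a small sketch. The input string and the class and method names are made up for illustration (they are not from the original posting); only the pattern itself comes from the question.

```java
public class GreedyVsReluctant {
    // Hypothetical input: two title elements on a single line.
    static final String INPUT = "<title>First</title> and <title>Second</title>";

    // Builds the pattern from the question with a chosen middle group
    // and replaces every match with the text captured by group 2.
    static String keepGroupTwo(String middleGroup) {
        String pattern = "(?i)(<title.*?>)" + middleGroup + "(</title>)";
        return INPUT.replaceAll(pattern, "$2");
    }

    public static void main(String[] args) {
        // Reluctant: each match stops at the nearest </title>.
        System.out.println(keepGroupTwo("(.+?)")); // First and Second

        // Greedy: a single match runs to the last </title> on the line.
        System.out.println(keepGroupTwo("(.+)"));  // First</title> and <title>Second
    }
}
```

Note that replaceAll only rewrites the matched parts, so the " and " between the two elements survives in the reluctant case.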

Commented:
And this (?i) turns on case-insensitive matching for the whole subsequent regex, so obviously it does not count as a capturing group.
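
A quick way to check the group numbering yourself is to ask the Matcher. This is only a sketch: the class name, the helper method, and the sample input are invented for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InlineFlagDemo {
    // The pattern from the question. (?i) is an inline flag, not a group.
    static final Pattern TITLE =
            Pattern.compile("(?i)(<title.*?>)(.+?)(</title>)");

    // Returns the text captured by group 2, or null if nothing matches.
    static String titleContent(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        Matcher m = TITLE.matcher("<TITLE>Hello</TITLE>");

        // Only the three parenthesized sub-patterns are counted.
        System.out.println(m.groupCount()); // 3

        if (m.find()) {
            System.out.println(m.group(1)); // <TITLE>
            System.out.println(m.group(2)); // Hello
            System.out.println(m.group(3)); // </TITLE>
        }
    }
}
```

The same flag can be passed as an argument instead: Pattern.compile("(<title.*?>)(.+?)(</title>)", Pattern.CASE_INSENSITIVE) should behave the same way for this pattern.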

Author

Commented:
Great. I wonder whether a novice reader can understand sample code like this. There is generally so little explanation.

Commented:
Regex takes some getting used to.