mmalik15


Regular expression to match a first div and, if it is not found, fall back to a second div

<map from="html"  to="somefield"  match="(?s)<div\s+id='column-middle page-content'>(.*?)</div>" replace="\1" />

The line above finds any div with id='column-middle page-content' and maps its content to "somefield". I am looking for a regular expression that, when no div with id='column-middle page-content' is found, looks for a div with id='landing-left page-content' instead and maps that content.
ASKER CERTIFIED SOLUTION
kaufmed
mmalik15

ASKER

Thanks for all the comments, kaufmed.

Let me explain my requirement. Suppose my HTML contains:

<div id="test1">
some stuff inside test1
</div>

<div id="test2">
some stuff inside test2
</div>


If a div with id="test1" is present, I only need the text "some stuff inside test1". If there is no div with id="test1", my regular expression should look for the div with id="test2" and return the text inside it, which in this case is "some stuff inside test2".
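Outside the crawler, the requirement described above can be sketched as an ordered fallback: try the preferred id first and move on to the next one only if nothing matched. The Python below is a minimal illustration, not the hidden accepted solution; the function name `first_matching_div` is hypothetical, and it assumes the target divs contain no nested divs, because the lazy `(.*?)</div>` stops at the first closing tag.

```python
import re

def first_matching_div(html, ids):
    """Return the inner text of the first div found, trying ids in priority order."""
    for div_id in ids:
        # Accept either quote style around the id value.
        pattern = r"<div\s+id=[\"']" + re.escape(div_id) + r"[\"']>(.*?)</div>"
        match = re.search(pattern, html, re.DOTALL)  # DOTALL plays the role of (?s)
        if match:
            return match.group(1).strip()
    return None  # none of the ids were present

both = (
    '<div id="test1">\nsome stuff inside test1\n</div>\n\n'
    '<div id="test2">\nsome stuff inside test2\n</div>'
)
only_second = '<div id="test2">\nsome stuff inside test2\n</div>'

print(first_matching_div(both, ["test1", "test2"]))
print(first_matching_div(only_second, ["test1", "test2"]))
```

This needs two regex searches rather than one, so it only applies where the fallback logic can live in code instead of in a single match attribute.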

Did the pattern above not work?
The pattern above works fine, but if both test1 and test2 are present in the HTML, it matches both, whereas I only need to capture the first match of the two. I don't know whether that is possible with a regular expression :S
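One way to get a single match with a single capture group is to anchor the pattern at the start of the input and let a negative lookahead decide when test2 is allowed. The sketch below uses Python's `re` module purely for illustration; the names `OPEN_1`, `OPEN_2`, `PATTERN` and `preferred_div_text` are hypothetical, and it is an assumption that any other regex engine (including the crawler's) supports `\A` and lookahead in the same way. It also assumes the divs contain no nested divs.

```python
import re

OPEN_1 = r"<div\s+id=[\"']test1[\"']>"  # preferred div
OPEN_2 = r"<div\s+id=[\"']test2[\"']>"  # fallback div

# Anchored at the start of the input:
#   branch 1 skips ahead to the test1 opening tag, wherever it is;
#   branch 2 is allowed only if test1 occurs nowhere, then skips ahead to test2.
# Both branches end just before the content, so it is always group 1.
PATTERN = re.compile(
    r"\A(?:.*?" + OPEN_1 + r"|(?!.*" + OPEN_1 + r").*?" + OPEN_2 + r")(.*?)</div>",
    re.DOTALL,
)

def preferred_div_text(html):
    """Return test1's text if present, otherwise test2's text, otherwise None."""
    match = PATTERN.search(html)
    return match.group(1).strip() if match else None

div1 = '<div id="test1">\nsome stuff inside test1\n</div>'
div2 = '<div id="test2">\nsome stuff inside test2\n</div>'

print(preferred_div_text(div1 + "\n\n" + div2))
print(preferred_div_text(div2))
```

Because the content always lands in group 1, a replacement such as `\1` would refer to the same group whichever div matched. Note that the match itself starts at the beginning of the input, not at the div.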
What utility is this? You have this categorized under the XML zone, but is this for something specific? Some tool or API?
I am writing a crawling stage for the crawler built into FAST Enterprise Search Platform. After crawling, certain HTML fields are mapped to internal storage fields. The mapping is done in an XML file, and the requirement is to check for <div class='column-middle page-content'>; if that is not present in the HTML, we take the content from <div class='landing-left page-content'> instead.
We usually map it like this:
<map from="html"  to="somefield"  match="(?s)<div\s+id='column-middle page-content'>(.*?)</div>" replace="\1" />
Unfortunately, I am not familiar with that technology, so I won't be able to give you a suitable answer. I certainly advise you to hold out for someone who may be more knowledgeable on the subject.

Sorry  : (
Thanks, mate. Your pattern did help, and I finally sorted it out :)