sirbounty
asked on
vbscript regex against html data
I'm reading some code using msxml2.xmlhttp "GET" method.
A lot of functions and extraneous data are being returned that I'd like to filter out easily - I figure a regex is easiest.
There are basically two bits of information I'd like to pull from this data:
1) user name - this is the text following an <a href...> tag, but the user's real name is found in both alt="John Smith" and title="John Smith". Presumably if that's easier to grab, I could live with it as well...
2) the user's ID: This is inside a link tag, as a parameter (user=xxxxxx) (repeats several times), but also in <img uid="xxxxx"...> which might be easier to get.
I have one other piece of data I'd like to grab, but figure that deserves a new question.
Let me know if sample data is needed.
ASKER
Not working for me...basically what I have for code is below...
Unfortunately, I have loads of data returned - still over 109 lines (should only be one or two if everything is filtered correctly?).
Your assumptions are correct.
Would it be easier to use the selectNode that you mentioned? I don't mind opening a new thread if needed...
set re1=createobject("Vbscript.RegExp")
set re2=createobject("Vbscript.RegExp")
re1.Pattern= "<a[^>]+title=""([^""])+"
re2.Pattern="<img[^>]+uid=""([^""])+"
'.....code to extract xml data
strData = objX.ResponseText
strNewData=re1.replace(strData,"")
strNewerData=re2.replace(strNewData,"")
objFile.Write strnewerdata
ASKER
Bear in mind also - this is meant to be a loop.
There will be approximately 30 users on each page - and there are around 120 pages...all I want is user id & username (any other related user data is okay - I just don't want the html/xml tags)
> strNewData=re1.replace(strData,"")
no no.... that's not what I meant. I didn't know what you were after, so I showed you a way to have a matching regex. If you want to replace all and extract only one uid/name, then prepend and postpend the expr with ".*" and put $1 in the replace expr. You probably also want to set MultiLine to true (don't know exact syntax in VBS).
re1.Pattern = ".*<a[^>]+title=""([^""])+.*"
re2.Pattern = ".*<img[^>]+uid=""([^""])+.*"
strDataForUid = objX.ResponseText
strDataForname = objX.ResponseText
uid = re1.replace(strDataForUid, "$1")
name = re1.replace(strDataForname, "$1")
-- Abel --
Sorry, got those last lines messed up:
name = re1.replace(strDataForname ,"$1")
uid = re2.replace(strDataForUid, "$1")
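A note on the `.*` wrapper: in VBScript's RegExp the MultiLine property is set with `re.MultiLine = True`, but it only changes how `^` and `$` behave. The dot still does not match a line break, so `.*` only swallows the line the match sits on. A minimal sketch of the same replace idea using `[\s\S]*` instead, which does cross line breaks. The sample string is made up for illustration and stands in for objX.ResponseText; the capture group is written as `([^""]+)` so it holds the whole value.

```vbscript
Option Explicit

Dim re, sample, uid

' Hypothetical sample standing in for objX.ResponseText
sample = "<p>header</p>" & vbCrLf & _
         "<img src=""a.png"" uid=""12345"" alt=""John Smith"">" & vbCrLf & _
         "<p>footer</p>"

Set re = New RegExp
re.IgnoreCase = True   ' also accept <IMG ...> or UID=
re.Global = False      ' a single replacement that covers the whole string
' [\s\S]* crosses line breaks, which .* does not
re.Pattern = "[\s\S]*?<img[^>]+uid=""([^""]+)[\s\S]*"

uid = re.Replace(sample, "$1")
WScript.Echo uid   ' prints 12345 for this sample
```

If the pattern does not match at all, Replace returns the input unchanged, which would look like the whole page being echoed back.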
ASKER
Hmm - now the script seems to be hanging...
I'll have to play around with it a bit, perhaps just getting the id first.
ASKER
Hmm - no. Taking longer to run, but still outputting quite a bit of data..
Basically I'm trying with this...
set re1=createobject("Vbscript.RegExp")
set re2=createobject("Vbscript.RegExp")
re1.Pattern= ".*<a[^>]+title=""([^""])+.*"
re2.Pattern=".*<img[^>]+uid=""([^""])+.*"
re1.Multiline=True
re2.Multiline=True
objX.Open "GET", strURL, False
objX.Send
strData = objX.ResponseText
uid = re2.replace(strData,"$1")
wscript.echo uid
msgbox uid
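Since roughly 30 users are expected per page, an alternative to Replace is to collect every match with Execute and read the capture groups from SubMatches. A rough sketch under assumptions: the sample markup is invented, and the pattern assumes uid comes before alt inside the same img tag, which may not hold for the real page.

```vbscript
Option Explicit

Dim re, sample, m

' Hypothetical page fragment; the real input would be objX.ResponseText
sample = "<a href=""profile?user=111"" title=""John Smith"">" & _
         "<img uid=""111"" alt=""John Smith""></a>" & vbCrLf & _
         "<a href=""profile?user=222"" title=""Jane Doe"">" & _
         "<img uid=""222"" alt=""Jane Doe""></a>"

Set re = New RegExp
re.Global = True       ' find every user on the page, not just the first
re.IgnoreCase = True
' Assumption: uid appears before alt within the img tag
re.Pattern = "<img[^>]+uid=""([^""]+)""[^>]*alt=""([^""]+)"""

For Each m In re.Execute(sample)
    ' SubMatches(0) is the uid, SubMatches(1) is the name
    WScript.Echo m.SubMatches(0) & vbTab & m.SubMatches(1)
Next
```

This writes one tab-separated line per user, so the same loop could feed objFile.Write for each of the pages.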
ASKER
I've given up...thanks for the help...
1) the name goes into $1:
<a[^>]+title="([^"])+"
2) same, userid goes into $1:
<img[^>]+uid="([^"])+"
Since the result is XML, you may also use DOM methods or selectNode XPath-style methods to be more precise... But that's another chapter.
I assumed that the a tag is always lowercase a and not A, and that img is always img and not Img, IMG or iMg.
-- Abel --
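For the DOM route mentioned above, a minimal sketch with MSXML, assuming the response really is well-formed XML (ordinary HTML often is not, and loadXML would then fail). The fragment and its element layout are hypothetical.

```vbscript
Option Explicit

Dim doc, node, xml

' Hypothetical well-formed fragment; real HTML often is not valid XML
xml = "<users>" & _
      "<a href=""profile?user=111"" title=""John Smith"">" & _
      "<img uid=""111"" alt=""John Smith""/></a>" & _
      "<a href=""profile?user=222"" title=""Jane Doe"">" & _
      "<img uid=""222"" alt=""Jane Doe""/></a>" & _
      "</users>"

Set doc = CreateObject("MSXML2.DOMDocument.6.0")
doc.async = False

If doc.loadXML(xml) Then
    ' XPath: every img element that carries a uid attribute
    For Each node In doc.selectNodes("//img[@uid]")
        WScript.Echo node.getAttribute("uid") & vbTab & node.getAttribute("alt")
    Next
Else
    WScript.Echo "Parse error: " & doc.parseError.reason
End If
```

If the page does not parse, the parseError message shows where it stopped, and the regex approach is probably the more forgiving one.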