Cole3388

asked on

Scrape Data From Web Page and Add to Variables in ColdFusion

This is a sort of general question. What I'm looking for is a method to scrape data from a web page, such as a few numbers or a few words, and store them in a cfset variable. I was just curious if anyone has a reference or a path I could follow to figure out how to do this. For example, ColdFusion goes to the Fox News website and grabs the front page. Later I can pull data out of that page if it exists, e.g. grab the 10 words that follow the word NASA on the Fox News homepage. Make sense?
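A minimal sketch of the "grab the front page" step, assuming Adobe ColdFusion's cfhttp tag. The URL is an assumption based on the Fox News example, and the variable names pageResult and pageHtml are hypothetical. The point is that the whole page ends up as one string in a cfset variable, which string and regex functions can then search.

```cfml
<!--- Fetch the page over HTTP. The URL is an assumption based on the
      Fox News example in the question. --->
<cfhttp url="http://www.foxnews.com/" method="get" result="pageResult" timeout="30" />

<!--- cfhttp puts the response body in the FileContent key and the
      status line (e.g. "200 OK") in the StatusCode key. --->
<cfif Left(pageResult.StatusCode, 3) EQ "200">
    <cfset pageHtml = pageResult.FileContent>
<cfelse>
    <!--- Request failed (non-200 status or connection failure) --->
    <cfset pageHtml = "">
</cfif>

<!--- pageHtml now holds the raw HTML as a single string --->
<cfoutput>Fetched #Len(pageHtml)# characters</cfoutput>
```

The raw HTML still contains markup, so the usual next step is to strip the tags before searching for specific words.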
erikTsomik

You may need to use a regular expression, for example with the REReplace function.
ASKER CERTIFIED SOLUTION
_agx_

This solution is only available to members.
> Grab every 10 words after the word NASA on the Fox news homepage

  Again, it depends on how the content is formatted. "NASA" could be in the middle of plain text,
  or inside HTML tags such as <span style="....">NASA</span>. One approach is to remove all
  of the HTML tags first (see this UDF from cflib.org):
   http://www.cflib.org/udf.cfm?id=1598

  Then use a regex (or possibly adapt this function) to grab the next 10 words after "NASA":
  http://www.cflib.org/udf/FullLeft
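A rough sketch of those two steps in plain CFML. It uses a crude regex for tag removal as a simplified stand-in for the cflib.org UDF (it does not remove script/style contents or decode entities like &amp;), then REFind with subexpressions to capture up to 10 words after "NASA". It assumes the page HTML is in a variable named pageHtml (a hypothetical name, e.g. filled from cfhttp's FileContent); if that variable is missing, a small sample string is used so the snippet runs on its own.

```cfml
<!--- Sample input so the snippet runs standalone; in practice pageHtml
      would come from a cfhttp call (e.g. its FileContent key). --->
<cfif NOT IsDefined("pageHtml")>
    <cfsavecontent variable="pageHtml"><p>Today <span style="font-weight:bold">NASA</span> announced a new mission to study the outer planets of the solar system.</p></cfsavecontent>
</cfif>

<!--- Step 1: crude tag removal. Replace anything shaped like <...> with a
      space (simplified stand-in for a proper tag-stripping UDF). --->
<cfset plainText = REReplace(pageHtml, "<[^>]*>", " ", "all")>
<!--- Collapse runs of whitespace (spaces, tabs, newlines) to one space --->
<cfset plainText = REReplace(plainText, "\s+", " ", "all")>

<!--- Step 2: find the word NASA followed by 1 to 10 whitespace-separated
      words. With returnSubExpressions=true, pos[1]/len[1] describe the
      whole match and pos[2]/len[2] the outer group (the trailing words). --->
<cfset found = REFind("\bNASA\b((\s+\S+){1,10})", plainText, 1, true)>

<cfif found.pos[1] GT 0>
    <cfset nextWords = Trim(Mid(plainText, found.pos[2], found.len[2]))>
<cfelse>
    <!--- NASA does not appear in the text --->
    <cfset nextWords = "">
</cfif>

<cfoutput>#nextWords#</cfoutput>
```

REFind is case-sensitive here; REFindNoCase would also match "nasa". Pages that put "NASA" inside an attribute value rather than element text will not be found, because the tag removal discards attribute contents.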