• Status: Solved
  • Priority: Medium
  • Security: Public

XMLHTTP to scrape specific content

I'm using XMLHTTP to pull in a page and scrape some specific content, but I'm having trouble doing so. Is there a way to load the content and strip out everything except what lives inside a certain div?

<div id='cal'>
some code
</div>

Asked by: just1coder
 
peterxlane Commented:
This is definitely possible, but it is hard to say what the best approach is without seeing the entire block of code you are scraping. Additionally, it will be prone to breaking if the div's id changes, or if other aspects of the page change dramatically, without you being aware of it.

Can you post the entire block of scraped content?


 
babuno5 Commented:
What exactly is the problem you are facing? Are you getting an error?
 
WMIF Commented:
Regex would be a good tool to use. Do you want the div tag included, or just the text between the tags?
 
just1coder (Author) Commented:
Just the text between would be perfect.
 
WMIF Commented:
<div[ ]id='cal'>\s*([^<]+?)\s*</div>

That pattern should work as long as the div contains only plain text (no nested tags). Are you familiar with using regular expressions? Note that this pattern is specific to that one div tag.
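
As a rough illustration of applying that pattern, here is a minimal sketch in Python. Python is only an assumption for the example (the thread never says which language the XMLHTTP call runs in), and `page_html` and `extract_cal_text` are made-up names standing in for the fetched response text and a helper around the pattern.

```python
import re

# Hypothetical page source standing in for the XMLHTTP response text.
page_html = """<html><body>
<div id='nav'>menu</div>
<div id='cal'>
some code
</div>
</body></html>"""

# The pattern from the comment above, used unchanged.
CAL_PATTERN = re.compile(r"<div[ ]id='cal'>\s*([^<]+?)\s*</div>")


def extract_cal_text(html):
    """Return the text inside <div id='cal'>, or None when nothing matches."""
    match = CAL_PATTERN.search(html)
    # Group 1 is the text between the tags with surrounding whitespace trimmed.
    return match.group(1) if match else None


print(extract_cal_text(page_html))
```

Because the capture group is `[^<]+?`, the helper returns None if the div contains any nested tag, so this only suits divs that hold plain text.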



<div[ ]id='?(.+?)'?>\s*([^<]+?)\s*</div>

This pattern returns two submatches: the first is the div tag's id, and the second is the text in between.
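
To show what the two submatches look like in practice, here is a small sketch, again in Python as an assumed language. `sample_html` and `divs_by_id` are hypothetical names for illustration only.

```python
import re

# The generic pattern from the comment above, used unchanged.
# Group 1 captures the div id, group 2 the text between the tags.
DIV_PATTERN = re.compile(r"<div[ ]id='?(.+?)'?>\s*([^<]+?)\s*</div>")

# Hypothetical markup standing in for the scraped page.
sample_html = "<div id='nav'>menu</div>\n<div id='cal'>\nsome code\n</div>"


def divs_by_id(html):
    """Map each matched div id to the plain text found inside that div."""
    # findall yields one (id, text) tuple per match because the pattern
    # has two capture groups.
    return {div_id: text for div_id, text in DIV_PATTERN.findall(html)}


found = divs_by_id(sample_html)
print(found.get("cal"))
```

Looking the result up by id keeps the calling code independent of the order in which the divs appear on the page.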
 
WMIF Commented:
I never heard back on whether the asker knows how to use regex. The pattern is good, though.