
converting html to a string / page scraping

sgaggerj asked
Last Modified: 2009-12-16
Hi all,

I'm new to PHP programming so please bear with me.

I'm developing a Java app for mobile phones that will take user input, send it to the web and get the response.
The response is formatted HTML that unfortunately has more page elements than I need and does not display well on the phone.

I have no control over the page I send data to and receive data from.

What I was thinking is that I could write my own PHP script that receives the request from the user, sends it on to the correct page,
receives the result from that page, strips all the useless info and returns a simple string, so that when the phone's app
receives the string it is ready to display with no processing required.
Something similar to page scraping, I guess.

My reasons for trying it this way are:
1) minimize the overhead on the phone, leaving the processing to the server, which I think should be faster
2) minimize the data sent to and received from the phone
3) see the difference in lag time and data transmission size between the current implementation (which receives the whole HTML and scrapes it on the phone) and this implementation
4) minimize the size of the app on the phone

How do I go about doing this?

The user enters a string and that is transmitted to the site.

The response that I need is always after a </form> tag.
Two elements later there will always be either
a) "<p align=\"center\">"
(indicating nothing found)
or
b) "<b>"
(indicating something found)
and the data continues until a </div> is encountered.

The rest is junk.

I know how to get the argument passed to the page from the app, but from there I'm kind of lost.

Thoughts, suggestions?

any help is greatly appreciated!

TIA!

J

Solutions Architect
Commented:
This code parses out the response:

<?php
$_input = "I am using the <b>actual
mail text</b> as a way <p>to test the reg ex.
I know it isn't exactly what is in the
html page.</form> tag
two elements later will always be either
a) \"<p align=\"center\">\"
(indicating nothing found)
or
b) \"<b>\"
(indicating something found)
and the data continues until a </div>
";

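// Match from </form> through either marker (<p align="center"> = nothing
// found, <b> = something found) up to the closing </div>.
$_pattern = "/<\/FORM>(.*?(?:<P ALIGN=\"CENTER\">|<B>).*?)<\/DIV>/si";

echo "<PRE>\n";
if (preg_match($_pattern, $_input, $_match)) {
    var_dump($_match[1]);
} else {
    // preg_match() returned 0 (no match) or false (error).
    echo "No match found.\n";
}
echo "</PRE>\n";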
?>
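
Since the goal is a simple string for the phone, the matched fragment probably still needs its tags removed. A rough sketch of that step follows. The helper name extract_result() and the sample page fragment are made up for illustration, and this version starts the capture at the marker itself instead of right after </form>, which is a design choice and not something the code above does.

```php
<?php
// Hypothetical helper: returns the plain-text result, or null when the
// expected markers are not present in the page.
function extract_result($html)
{
    // Capture from the first <p align="center"> or <b> after </form> up to </div>.
    $pattern = '/<\/form>.*?((?:<p align="center">|<b>).*?)<\/div>/si';
    if (!preg_match($pattern, $html, $match)) {
        return null;
    }
    // Drop the tags, decode entities such as &amp;, and collapse whitespace.
    $text = html_entity_decode(strip_tags($match[1]), ENT_QUOTES, 'UTF-8');
    return trim(preg_replace('/\s+/', ' ', $text));
}

// Made-up page fragment, for illustration only.
$html = '<form><input name="q"></form><br><hr><b>Widget</b> costs 5 &amp; ships today</div><p>junk</p>';
echo extract_result($html), "\n";
```

Returning null for the no-match case lets the calling script decide what to send back to the phone when the remote page changes its layout.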

I'm not sure if you are already pushing data to the other site and
getting back the response, yet.

If not, you can use cURL. Let me know, I can give you some examples.
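
A minimal sketch of what the cURL step might look like, assuming the remote page accepts a POST. The URL, the field name 'q', and the function names build_curl_options() and fetch_page() are placeholders, not details from the real site.

```php
<?php
// Hypothetical endpoint; replace with the real form target.
const REMOTE_URL = 'http://example.com/search';

// Build the cURL options for a POST carrying the user's input.
// The field name 'q' is an assumption about the remote form.
function build_curl_options($userInput)
{
    return array(
        CURLOPT_URL            => REMOTE_URL,
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => http_build_query(array('q' => $userInput)),
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects, if any
        CURLOPT_TIMEOUT        => 10,   // seconds
    );
}

// Send the request; returns the HTML body as a string, or false on failure.
function fetch_page($userInput)
{
    $ch = curl_init();
    curl_setopt_array($ch, build_curl_options($userInput));
    return curl_exec($ch);
}
```

The script the phone talks to would then call something like fetch_page($_GET['q']) and run the regex over the returned HTML, checking for false first in case the remote site is unreachable.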
--brian


Author

Commented:
awesome!
thanks Brian - sorry it took me so long to get back to this q.

J