Converting HTML to a string / page scraping

Posted on 2006-03-23
Medium Priority
Last Modified: 2009-12-16
Hi all,

I'm new to PHP programming so please bear with me.

I'm developing a Java app for mobile phones that takes user input, sends it to a web page, and gets the response.
The response is formatted HTML that unfortunately has more page elements than I need and does not display well on the phone.

I have no control over the page I send data to and receive data from.

What I was thinking is that I could write my own PHP script that receives the request from the user, sends it on to the correct page,
receives the result from that page, strips all the useless info, and returns a simple string, so that when the phone's app
receives the string it is ready to display with no processing required.
Something similar to page scraping, I guess.

My reasons for trying it this way are:
1) minimize the overhead on the phone, leaving the processing to the server, which I think should be faster
2) minimize the data sent to and received from the phone
3) see the difference in lag time and data transmission size between the current implementation (which receives the whole HTML and scrapes it on the phone) and this implementation
4) minimize the size of the app on the phone

How do I go about doing this?

The user enters a string and that is transmitted to the site.

The response that I need is always after a </form> tag.
Two elements later there will always be either
a) "<p align=\"center\">"
(indicating nothing found)
b) "<b>"
(indicating something found)
and the data continues until a </div> is encountered.

The rest is junk.

I know how to get the argument passed to the page from the app, but from there I'm kind of lost.

Thoughts, suggestions?

Any help is greatly appreciated!


Question by:sgaggerj

Accepted Solution

Brian Bush earned 2000 total points
ID: 16286316
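This code parses out the response:

$_input = "I am using the <b>actual
mail text</b> as a way <p>to test the reg ex.
I know it isn't exactly what is in the
html page.</form> tag
two elements later will always be either
a) \"<p align=\"center\">\"
(indicating nothing found)
b) \"<b>\"
(indicating something found)
and the data continues until a </div>";

// Capture everything between </form> and </div>, requiring either
// <p align="center"> (nothing found) or <b> (something found) in between.
$_pattern = "/<\/FORM>(.*?(?:<P ALIGN=\"CENTER\">|<B>).*?)<\/DIV>/si";
preg_match($_pattern, $_input, $_match);

// Escape the output so the captured tags are visible in the browser.
echo "<PRE>\n";
echo htmlspecialchars(print_r($_match, true));
echo "</PRE>\n";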
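
Since the phone only needs a plain string, the matched HTML fragment still has to be flattened into text. Here is a minimal sketch of one way to do that, assuming line breaks should survive and everything else should be stripped. The function name htmlFragmentToText and the whitespace rules are my own assumptions, not something the target page dictates.

```php
<?php
// Hypothetical helper: the name and the whitespace rules are assumptions.
function htmlFragmentToText(string $fragment): string
{
    // Turn <br> and closing </p> tags into newlines before the tags are removed.
    $text = preg_replace('/<br\s*\/?>|<\/p>/i', "\n", $fragment);
    // Drop all remaining tags, then decode entities such as &amp;.
    $text = html_entity_decode(strip_tags($text), ENT_QUOTES, 'UTF-8');
    // Collapse runs of spaces and tabs, then trim each line.
    $text = preg_replace('/[ \t]+/', ' ', $text);
    $lines = array_map('trim', explode("\n", $text));
    // Remove empty lines and join the rest with single newlines.
    $lines = array_filter($lines, function ($line) {
        return $line !== '';
    });
    return implode("\n", $lines);
}
```

You would call it on the captured group, for example htmlFragmentToText($_match[1]), and echo the result back to the phone.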

I'm not sure whether you are already pushing data to the other site and
getting back the response yet.

If not, you can use cURL. Let me know, I can give you some examples.
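
For the cURL part, a rough sketch of the whole relay script could look like the following. Treat it as a starting point under assumptions: the target URL, the form field name "q", and the use of POST are placeholders I made up, since I don't know how the real page expects its input. The function names fetchResponse and extractResult are also hypothetical.

```php
<?php
// Hypothetical target URL; replace it with the real page.
const TARGET_URL = 'http://example.com/search';

// Send the user's string to the remote page and return the raw HTML.
// Assumes the page accepts a POST with a field named "q" (an assumption).
function fetchResponse(string $url, string $query): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => http_build_query(['q' => $query]),
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT        => 10,
    ]);
    $html = curl_exec($ch);
    if ($html === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    return $html;
}

// Return the text between </form> and the next </div>, or null if absent.
function extractResult(string $html): ?string
{
    if (!preg_match('/<\/form>(.*?)<\/div>/si', $html, $match)) {
        return null;
    }
    return trim(strip_tags($match[1]));
}

// Entry point: runs only when the phone app passes ?q=... to this script.
if (isset($_GET['q'])) {
    header('Content-Type: text/plain; charset=utf-8');
    $result = extractResult(fetchResponse(TARGET_URL, (string) $_GET['q']));
    echo $result === null ? '' : $result;
}
```

The phone then receives only the short plain-text string instead of the full page. If the remote page expects a GET request instead, drop the two POST options and append the query string to the URL.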

Author Comment

ID: 16397605
Thanks, Brian. Sorry it took me so long to get back to this question.

