
Parsing out data and adding it to an array

Posted on 2015-01-31
Last Modified: 2015-02-01
I have an html file that I need to parse for information and put that information into a number of arrays.

Can you put together an example of how I can do this?

I've got something like this:

$string = <<<EOD

<html>
<body>
<div> various text </div>
<div class="container">
<h3 class="main">title here</h3>                         //text in bold would be $main[0]
<div> various text here</div>
<div class="listing">more text here</div>            //text in bold would be $listing[0]
<div> various other text here</div>
<span class "wrap">wrap up text</span>           //text in bold would be $wrap[0]
<div>end of this listing</div>
</div>

<div class="container">
<h3 class="main">title 2 here</h3>                        //text in bold would be $main[1]
<div> various text here</div>
<div class="listing">more text here 2</div>          //text in bold would be $listing[1]
<div> various other text here</div>
<span class "wrap">wrap up text 2</span>           //text in bold would be $wrap[1]
<div>end of this listing</div>
</div>

<div class="container">
<h3 class="main">title 3 here</h3>                                  //text in bold would be $main[2]
<div> various text here</div>
<div class="listing">more text here 3</div>        //text in bold would be $listing[2]
<div> various other text here</div>
<span class "wrap">wrap up text 3</span>        //text in bold would be $wrap[2]
<div>end of this listing</div>
</div>

<div> various text</div>
</body>
</html>
EOD;


So for each <div class="container">  I need to pull out the text between:

<h3 class="main">          </h3>

<div class="listing">      </div>

<span class "wrap">      </span>

and make each into its own array, so I can later say:

echo '<p>';
echo $main[0] . '<br />';
echo $listing[0] . '<br />';
echo $wrap[0] . '<br />';
echo '</p><p>';
echo $main[1] . '<br />';
echo $listing[1] . '<br />';
echo $wrap[1] . '<br />';
echo '</p><p>';
echo $main[2] . '<br />';
echo $listing[2] . '<br />';
echo $wrap[2] . '<br />';
echo '</p>';

There will be 25 of these container blocks in all.

I know some PHP basics, of course, but I don't know how to write a while or foreach loop, nor how to pull the data out from between the tags... but I can certainly work from a good example if you can put one together for me.
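The "pull the data out from between text" step can be sketched with nothing more than strpos(), substr() and a while loop. This is a minimal illustration, not the accepted answer's method; the helper name between_all() and the short sample string are hypothetical.

```php
<?php
// Collect every substring found between $open and $close, scanning left to right.
// between_all() is a hypothetical helper name used only for this illustration.
function between_all(string $haystack, string $open, string $close): array
{
    $found  = [];
    $offset = 0;
    while (($start = strpos($haystack, $open, $offset)) !== false) {
        $start += strlen($open);                    // skip past the opening marker
        $end = strpos($haystack, $close, $start);   // look for the closing marker
        if ($end === false) {
            break;                                  // unterminated marker: stop scanning
        }
        $found[] = substr($haystack, $start, $end - $start);
        $offset  = $end + strlen($close);           // resume after the closing marker
    }
    return $found;
}

$html = '<h3 class="main">title here</h3><p>x</p><h3 class="main">title 2 here</h3>';
$main = between_all($html, '<h3 class="main">', '</h3>');

foreach ($main as $i => $title) {
    echo $i, ': ', $title, PHP_EOL;
}
```

Calling it once per marker pair (h3/main, div/listing, span/wrap) would give the three arrays the question asks for.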

Thanks!   Chris
Question by:St_Aug_Beach_Bum

Expert Comment

by:gheist
ID: 40582378
So you want to extract data from a webpage?

Expert Comment

by:Kyle Hamilton
ID: 40582398
Sounds like you're looking for a scraper. If your PHP knowledge is that limited, you might be better off finding a library that does it.
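One library-based route that needs no downloads is PHP's bundled DOM extension (usually enabled by default), which parses the page into a tree and lets you query it with XPath. The sketch below assumes valid markup, i.e. `class="wrap"` rather than the `class "wrap"` in the question's sample, and the helper first_text() is a hypothetical name.

```php
<?php
// Sketch: DOMDocument + DOMXPath instead of regular expressions.
// ASSUMPTION: valid markup (class="wrap"); the question's sample has class "wrap".
$html = <<<'EOD'
<div class="container">
<h3 class="main">title here</h3>
<div class="listing">more text here</div>
<span class="wrap">wrap up text</span>
</div>
<div class="container">
<h3 class="main">title 2 here</h3>
<div class="listing">more text here 2</div>
<span class="wrap">wrap up text 2</span>
</div>
EOD;

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Return the trimmed text of the first node matching $query under $context, or '' if none.
function first_text(DOMXPath $xpath, string $query, DOMNode $context): string
{
    $node = $xpath->query($query, $context)->item(0);
    return $node === null ? '' : trim($node->textContent);
}

$main = $listing = $wrap = [];
foreach ($xpath->query('//div[@class="container"]') as $box) {
    // Relative queries (.//) search only inside the current container.
    $main[]    = first_text($xpath, './/h3[@class="main"]', $box);
    $listing[] = first_text($xpath, './/div[@class="listing"]', $box);
    $wrap[]    = first_text($xpath, './/span[@class="wrap"]', $box);
}
```

Working container by container keeps the three arrays aligned: if one block lacks a listing, that slot becomes an empty string instead of shifting every later entry. Note that `@class="container"` is an exact match, so an element with several classes would need a different predicate.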

Expert Comment

by:Kyle Hamilton
ID: 40582413
[link to competing web site removed]

Expert Comment

by:Ray Paseur
ID: 40582481
Is this really the data you're working with?  It's not even valid HTML!

Accepted Solution

by:Ray Paseur
ID: 40582520
This does what you're asking for, but I have a feeling that there is a "backstory" here, and if we understood that we might be able to lead you to a more suitable solution.
http://iconoun.com/demo/temp_staug.php 

Web scraping is fraught with risk and you should expect any web scraping script to fail at any time without notice, so don't depend on this automation to do anything important, or to produce any output that goes directly into another automated process.  The reason this is risky is that publishers can, and do, tinker with their HTML documents all the time.  Any and all assumptions about the structure of an HTML document are at risk.
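Because the structure can change without notice, one cheap defence is to check the scraped arrays before trusting them. This is a minimal sketch; check_scrape() is a hypothetical helper, and "equal, non-zero counts" is only one possible sanity rule.

```php
<?php
// Hypothetical guard: fail loudly when the scraped arrays stop lining up,
// instead of silently pairing a title with the wrong listing.
function check_scrape(array $main, array $listing, array $wrap): void
{
    $counts = [count($main), count($listing), count($wrap)];
    if ($counts[0] === 0) {
        throw new RuntimeException('No titles found: the page layout may have changed.');
    }
    if (count(array_unique($counts)) !== 1) {
        throw new RuntimeException('Mismatched counts: ' . implode('/', $counts));
    }
}

check_scrape(['a', 'b'], ['x', 'y'], ['p', 'q']);   // passes silently

$message = '';
try {
    check_scrape(['a', 'b'], ['x'], ['p', 'q']);    // one listing went missing
} catch (RuntimeException $e) {
    $message = $e->getMessage();
    echo $message, PHP_EOL;                         // Mismatched counts: 2/1/2
}
```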

A safer and better approach to getting external data is to ask the publisher to expose an API.  APIs are typically version-controlled and stable.  API version 1.0 will always return documents in the same format (probably XML or JSON), so you can depend on that format.  API version 1.1 will represent improvements and additions to version 1.0.  Things won't really change until you get to API version 2+, etc.
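For comparison, consuming a versioned JSON API needs no text matching at all. The payload below and its field names ("version", "items", "title", "listing", "wrap") are hypothetical, invented to mirror the question's three arrays; JSON_THROW_ON_ERROR requires PHP 7.3 or later.

```php
<?php
// Sketch of reading a versioned JSON API response (hypothetical payload shape).
$json = '{"version":"1.0","items":['
      . '{"title":"title here","listing":"more text here","wrap":"wrap up text"},'
      . '{"title":"title 2 here","listing":"more text here 2","wrap":"wrap up text 2"}]}';

// Decode to associative arrays; throw instead of returning null on malformed JSON.
$data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

// Refuse to continue if the major version is not the one this code was written for.
if (explode('.', $data['version'])[0] !== '1') {
    throw new UnexpectedValueException('Unsupported API version ' . $data['version']);
}

// Pull one column out of the list of records to get the same three arrays.
$main    = array_column($data['items'], 'title');
$listing = array_column($data['items'], 'listing');
$wrap    = array_column($data['items'], 'wrap');
```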

If the publisher wants you to be able to use its data, it should expose an API for you.  However, this may come at a cost, since the publisher is the copyright holder and can legally charge for the use of its data.

<?php

/**
 * See http://www.experts-exchange.com/Programming/Languages/Scripting/PHP/Q_28607748.html
 */
error_reporting(E_ALL);

// TEST DATA CREATED USING NOWDOC SYNTAX
$str = <<<'EOD'

<html>
<body>
<div> various text </div>
<div class="container">
<h3 class="main">title here</h3>                         //text in bold would be $main[0]
<div> various text here</div>
<div class="listing">more text here</div>            //text in bold would be $listing[0]
<div> various other text here</div>
<span class "wrap">wrap up text</span>           //text in bold would be $wrap[0]
<div>end of this listing</div>
</div>

<div class="container">
<h3 class="main">title 2 here</h3>                        //text in bold would be $main[1]
<div> various text here</div>
<div class="listing">more text here 2</div>          //text in bold would be $listing[1]
<div> various other text here</div>
<span class "wrap">wrap up text 2</span>           //text in bold would be $wrap[1]
<div>end of this listing</div>
</div>

<div class="container">
<h3 class="main">title 3 here</h3>                                  //text in bold would be $main[2]
<div> various text here</div>
<div class="listing">more text here 3</div>        //text in bold would be $listing[2]
<div> various other text here</div>
<span class "wrap">wrap up text 3</span>        //text in bold would be $wrap[2]
<div>end of this listing</div>
</div>

<div> various text</div>
</body>
</html>
EOD;


// THE SIGNAL INFORMATION: THE MARKUP THAT OPENS AND CLOSES EACH VALUE
$trap['main']    = [ '<h3 class="main">',      '</h3>' ];
$trap['listing'] = [ '<div class="listing">',  '</div>' ];
$trap['wrap']    = [ '<span class "wrap">',    '</span>' ];


// EXTRACT THE DATA AND BUILD NEW ARRAYS NAMED $main, $listing AND $wrap
foreach ($trap as $var => $arr)
{
    // ESCAPE BOTH MARKERS (INCLUDING THE # DELIMITER) AND CAPTURE THE SHORTEST TEXT BETWEEN THEM
    $rgx = '#' . '(' . preg_quote($arr[0], '#') . ')(.*?)(' . preg_quote($arr[1], '#') . ')#';
    preg_match_all($rgx, $str, $mat);
    $$var = $mat[2]; // VARIABLE VARIABLE: $main, $listing OR $wrap GETS THE CAPTURED TEXT
}


// SHOW THE WORK PRODUCTS, ONE PARAGRAPH PER CONTAINER
for ($kount = 0; $kount < count($main); $kount++)
{
    echo '<p>';
    echo $main[$kount]    . '<br />';
    echo $listing[$kount] . '<br />';
    echo $wrap[$kount]    . '<br />';
    echo '</p>' . PHP_EOL;
}
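A possible follow-up, not part of the accepted answer: zip the three parallel arrays into one row per container, and escape the scraped text before echoing it into your own page, since text taken from another site should not be trusted as HTML. The sample values below are copied from the question's test data.

```php
<?php
// Sketch: pair up the three parallel arrays and escape scraped text on output.
$main    = ['title here', 'title 2 here'];
$listing = ['more text here', 'more text here 2'];
$wrap    = ['wrap up text', 'wrap up text 2'];

// array_map() with a null callback zips the arrays: [[main0, listing0, wrap0], ...]
$rows = array_map(null, $main, $listing, $wrap);

$out = '';
foreach ($rows as [$title, $text, $wrapUp]) {
    $out .= '<p>'
          . htmlspecialchars($title, ENT_QUOTES, 'UTF-8') . '<br />'
          . htmlspecialchars($text, ENT_QUOTES, 'UTF-8') . '<br />'
          . htmlspecialchars($wrapUp, ENT_QUOTES, 'UTF-8') . '<br />'
          . '</p>' . PHP_EOL;
}
echo $out;
```

If the arrays ever have different lengths, array_map() pads the shorter ones with null rather than failing, so this pairs well with a count check before output.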


Author Closing Comment

by:St_Aug_Beach_Bum
ID: 40582748
Yikes, ok all.

Thank you for the help. It's not the actual HTML; I just threw that together as an example so I can work from it. I'm pulling data from a number of sites for a project on trends, not a critical application. I looked at several spidering services, but they didn't do quite what I wanted and were costly for a small project.

Thanks again.

Expert Comment

by:Ray Paseur
ID: 40582770
OK, good luck with it.  As long as you understand the risks...

best regards, ~Ray